Weather model verification using Sodankylä mast measurements

Sodankylä, in the heart of the Arctic Research Centre of the Finnish Meteorological Institute (FMI ARC) in northern Finland, is an ideal site for atmospheric and environmental research in the boreal and sub-Arctic zone. With temperatures ranging from −50 to +30 °C, it provides a challenging testing ground for numerical weather prediction (NWP) models as well as for weather forecasting in general. An extensive set of measurements has been carried out in Sodankylä for more than 100 years. In 2000, a 48 m-high micrometeorological mast was erected in the area. In this article, the use of Sodankylä mast measurements in NWP model verification is described. Starting in 2000 with the NWP model HIRLAM and Sodankylä measurements, the verification system has since been expanded to include comparisons between 12 NWP models and seven measurement masts distributed across Europe. A case study comparing forecasted and observed radiation fluxes is also presented. It was found that three different radiation schemes applicable in the NWP model HARMONIE-AROME produced somewhat different downwelling longwave radiation fluxes on cloudy days; however, these differences did not change the overall cold bias of the predicted screen-level temperature.


Introduction
Nocturnal and wintertime surface temperature inversions still pose a difficult challenge to weather forecast models. Various atmosphere-to-surface coupling issues are also problematic in climate models, especially at Arctic latitudes. Versatile measurements are essential for model development. The Arctic Research Centre of the Finnish Meteorological Institute (FMI ARC, http://fmiarc.fmi.fi/) is well suited for this purpose. FMI ARC consists of two main stations, the headquarters in Sodankylä (67.368° N, 26.633° E) and the Pallas clean air research station (67.967° N, 24.117° E), both of which provide ideal locations for atmospheric and environmental research in the boreal and sub-Arctic zone.
FMI ARC dates back to the mid-nineteenth century when, in 1858, the Societas Scientiarum Fennica founded the first weather station in Sodankylä. Continuous meteorological measurements were started in 1908 and have continued to this day (Savunen et al., 2014). Being accessible from all parts of the world, FMI ARC is also an excellent base for studying various themes of global change in a northern context.
Today, an extensive set of measurements, ranging from basic meteorological data to heat and carbon fluxes as well as ozone and Arctic snow cover measurements, is being performed at FMI ARC. Sodankylä observatory also provides facilities for receiving and processing polar satellite images, and FMI has conducted systematic aurora observations in Finnish Lapland since the late 1950s. The FMI ARC research sites belong to the Lapland Biosphere-Atmosphere Facility (LAP-BIAT, http://www.sgo.fi/lapbiat/), an infrastructure project through which the EU can fund visiting research groups. The sites have also hosted various measurement campaigns (e.g. the NOPEX/WINTEX campaign in 1997; Halldin et al., 2001), as well as various EU projects and measurement networks, such as CEOP (Savunen et al., 2014, http://data.eol.ucar.edu/master_list/?project=CEOP/EOP-3/4), CarboEurope IP (http://www.carboeurope.org/), and ICOS (https://www.icos-ri.eu/).
In weather model verification, the traditional approach is to perform detailed studies of model analyses and forecasts by comparing them with measurements afterwards. Another way to gain insight into model behaviour is to compare measurements with forecasts in parallel with the model runs, in near-real time. Although based partly on less accurate (unchecked) measurements, this approach nevertheless provides valuable information about model behaviour and, when monitored frequently, can also act as a kind of alarm bell, alerting model developers to apparent problems with the forecasts. Data collected this way can also be used in model performance studies (Atlaskin and Kangas, 2006). As an added benefit, the approach provides a means of monitoring the measurements themselves.

M. Kangas et al.: Weather model verification using Sodankylä mast measurements
The measurements at FMI ARC have been used to verify weather model forecasts in near-real time since 2000. The verification was started with the NWP model HIRLAM (Undén et al., 2002; Eerola, 2013) and Sodankylä measurements but has later been extended to cover several other NWP models and mast measurement stations. At present, a total of 12 models and seven measurement masts are included. The models represent the activities of the HIRLAM (http://hirlam.org) and ALADIN (http://www.cnrm.meteo.fr/aladin/) NWP consortia, as well as those of ECMWF (European Centre for Medium-Range Weather Forecasts, http://www.ecmwf.int/). The masts are located across Europe and run by various European institutions. The forecast-measurement comparison plots, with statistical analyses, are provided online as a part of the HIRLAM forecast runs.
The harmonized and quality-checked data sets collected in Sodankylä are also available for more detailed research and model development. From a research point of view, the most valuable feature of the Sodankylä site is the possibility to combine various simultaneous measurements, including those from a micrometeorological mast and a radiation tower, as well as from dedicated snow and soil observations, an automatic weather station (AWS), and atmospheric soundings (see e.g. Coustau et al., 2014). In the present article, these data sets are used to compare radiation fluxes from the HARMONIE-AROME forecast system (Seity et al., 2011) with the radiation measured in Sodankylä.
The Sodankylä measurements are likewise important for the initialization of NWP models in operational forecasting. Of the measurements performed in Sodankylä, balloon soundings (temperature, humidity, wind components) and some SYNOP measurements (surface pressure, screen-level temperature, snow depth) are assimilated in the upper-air and surface analyses of the HIRLAM and HARMONIE-AROME models.
Section 2 describes the Sodankylä site and Sect. 3 the mast verification system. A comparative study of HARMONIE-AROME radiation schemes is presented in Sect. 4, and conclusions are given in Sect. 5.

Sodankylä measurements
The terrain around the FMI ARC Sodankylä observatory (67.368° N, 26.633° E, altitude 179 m a.s.l., http://fmiarc.fmi.fi/) is moderately undulating, with isolated fells reaching up to 500 m altitude. The observatory is located on the eastern bank of the river Kitinen, 7 km southeast of the Sodankylä town centre and about 100 km north of the Polar Circle and Rovaniemi. The vegetation in the Sodankylä area is typical of the northern boreal zone, with coniferous forest (mostly managed) and large open mires dominating the landscape. The climate is characterized by long, cold continental-type winters and relatively warm but short summers. During 1981-2010, the mean annual screen-level temperature was −0.4 °C, annual precipitation 527 mm, and snow cover duration 200 days (from 26 October to 14 May). The absolute minimum screen-level temperature during the same period was −49.5 °C and the absolute maximum +30.0 °C.
Owing to the warming effect of the Gulf Stream, the area can be classified as continental sub-Arctic or boreal taiga, i.e. Köppen climate region Dfc. However, with regard to stratospheric meteorology, Sodankylä can be classified as an Arctic site, often lying beneath the middle or the edge of the stratospheric polar vortex and in a zone displaying intermittent polar stratospheric ozone depletion (Savunen et al., 2014).
Continuous meteorological measurements have been performed in Sodankylä since 1908. Ground-station observations every 3 h record the weather conditions prevailing at ground level. In addition to standard weather observations, the basic observational duties at the observatory include regular recordings of solar radiation, sunshine, and hydrological quantities. Radiosonde measurements are carried out twice a day. During the NOPEX/WINTEX measurement campaign, an aircraft campaign to measure boundary layer properties was performed (Kangas et al., 1998), the results of which were then used in studies of satellite-based reflectance measurements (Kangas et al., 2001) and of regional momentum and sensible heat fluxes (Batchvarova et al., 2001).
Data from most of the measurements are collected into a central database at http://litdb.fmi.fi/.It contains data not only from Sodankylä but also from other FMI ARC measurement sites.In the following, the measurements used in the mast verification are briefly described.

Micrometeorological mast
In 2000, a 48 m-high micrometeorological mast was erected in the immediate vicinity of the Sodankylä observatory (http://litdb.fmi.fi/micrometeorologicalmast.php), and it has been producing data ever since. The height of the mast was limited by the presence of a nearby airfield. The mast stands in a sparse Scots pine forest on sandy podzol. The average tree height is 12 m, the tree density 210 000 trunks per km², the tree age 60-160 years, and the projected leaf area index 1.2 m² m⁻² (http://en.ilmatieteenlaitos.fi/GHG-measurement-sites).
The mast is extensively instrumented with temperature, wind, humidity, and radiation measurements at various levels (Fig. 1, Table 1). Turbulent fluxes are measured at the 23 m level by the eddy covariance method (described in more detail below). Additional near-ground measurements, including soil temperature and moisture profiles, soil heat flux, snow depth, and below-canopy photosynthetically active radiation (PAR), are performed in the vicinity of the mast (http://litdb.fmi.fi/micrometeorologicalmastfield.php).

Heat and momentum fluxes
The in situ fluxes of sensible heat, latent heat, and momentum are measured at the micrometeorological mast by the eddy covariance (EC) method, which provides direct measurements of the fluxes averaged on an ecosystem scale. In the EC method, the vertical flux is obtained as the covariance of the high-frequency (10 Hz) observations of vertical wind speed and the variable in question (temperature, H2O concentration, or horizontal wind speed) (Baldocchi, 2003).
The eddy covariance measurement system at Sodankylä includes a USA-1 (METEK GmbH, Elmshorn, Germany) three-axis sonic anemometer/thermometer and a closed-path LI-7000 (Li-Cor, Inc., Lincoln, NE, USA) CO2/H2O gas analyser. The measurements are performed at 23 m, 5-10 m above the mean forest height. The EC fluxes are calculated as half-hourly averages, with the appropriate corrections applied. The measurement systems and the post-processing procedures are presented in more detail by Thum et al. (2009) and Aurela et al. (2015). See also Table 3.
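The covariance calculation described above can be sketched in a few lines. This is a minimal illustration only, assuming equally spaced 10 Hz samples of vertical wind speed (m s−1) and sonic temperature (K), a constant air density and specific heat (illustrative values), and simple removal of the block mean. The coordinate rotation, detrending, spectral and humidity corrections mentioned above are deliberately omitted, and sonic temperature is treated as air temperature (strictly, this yields a buoyancy flux). It is not the processing chain of Thum et al. (2009) or Aurela et al. (2015).

```python
from statistics import fmean

RHO_AIR = 1.2     # assumed constant air density, kg m-3 (illustrative value)
CP_AIR = 1005.0   # specific heat of dry air at constant pressure, J kg-1 K-1


def covariance(w, x):
    """Block covariance <w'x'>: remove the block means, then average the product of deviations."""
    if len(w) != len(x) or not w:
        raise ValueError("series must be non-empty and of equal length")
    w_mean, x_mean = fmean(w), fmean(x)
    return fmean((wi - w_mean) * (xi - x_mean) for wi, xi in zip(w, x))


def sensible_heat_flux(w, t_sonic, rho=RHO_AIR, cp=CP_AIR):
    """Kinematic heat flux <w'T'> (K m s-1) converted to W m-2; no corrections applied."""
    return rho * cp * covariance(w, t_sonic)


def half_hourly_fluxes(w, t_sonic, rate_hz=10):
    """Split the record into complete 30 min blocks and return one flux per block."""
    n = int(rate_hz * 1800)  # 18 000 samples per block at 10 Hz
    return [sensible_heat_flux(w[i:i + n], t_sonic[i:i + n])
            for i in range(0, len(w) - n + 1, n)]
```

Incomplete blocks at the end of a record are simply dropped in this sketch. An operational system would instead flag them and apply the quality criteria of the post-processing.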

Solar radiation tower
In addition to the basic synoptic measurements, a set of additional measurements is performed on an 18 m-high solar radiation tower in the observatory area. For consistency, all radiation data used in the mast verification are obtained from the radiation tower. The instruments on the radiation tower are also easily reachable, which allows more frequent maintenance than on the micrometeorological mast. The data are quality-controlled, and snow found on the instruments, for example, is removed. All instruments except those used for the outgoing longwave (LW) radiation are ventilated. No heating is applied, as that would interfere with the measurements.
3 The mast verification system

Near-real-time comparison
Since 2002, near-real-time comparisons of model forecasts and in situ measurements have been performed as a part of the operational runs of the HIRLAM weather forecast model at FMI. Starting with the HIRLAM forecast and Sodankylä measurements, the comparison has expanded to comprise a total of 12 models and seven masts from around Europe. An eighth mast, in Estonia, is presently being introduced into the system (Table 2). In addition to the direct online comparison, long-term comparison statistics are provided. Table 3 lists the parameters included in the comparison.
To enable rapid updates, the comparison plots are produced as a part of the operational HIRLAM forecast cycle (currently four times a day, after the synoptic hours 00:00, 06:00, 12:00, and 18:00 UTC) using the latest available data.
The HIRLAM programme website (http://hirlam.org) is used as the data pool, into which the data providers transfer their data in a prescribed format and from which the data are retrieved by the plotting routines located at FMI. The plotting is performed with Gnuplot (http://www.gnuplot.info/) scripts, which are generated and run by a data retrieval program based on Perl and UNIX scripts.
The parameters currently plotted include temperature, wind speed, and humidity at specified levels as well as various heat and radiation fluxes (Table 3). In keeping with the original aim of studying surface temperature inversions, the temperature difference between 2 m and a higher level (usually the first model level) is also included in the plots as a measure of the surface inversion. A sample plot comparing the screen-level (2 m) temperature from the HIRLAM forecast with the Sodankylä mast measurement (at 3 m) is shown in Fig. 2.
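To illustrate the inversion measure described above, the sketch below computes the temperature difference between an upper level and 2 m, together with the fraction of time steps showing an inversion (a positive difference). The function names and the zero threshold are illustrative assumptions and are not taken from the verification system itself.

```python
def inversion_strength(t_upper, t_2m):
    """Return T(upper) - T(2 m) for each time step; positive values indicate a surface inversion."""
    if len(t_upper) != len(t_2m):
        raise ValueError("series must have equal length")
    return [tu - t2 for tu, t2 in zip(t_upper, t_2m)]


def inversion_fraction(t_upper, t_2m, threshold=0.0):
    """Fraction of time steps in which the upper level is warmer than 2 m by more than threshold (K)."""
    diffs = inversion_strength(t_upper, t_2m)
    if not diffs:
        raise ValueError("empty series")
    return sum(d > threshold for d in diffs) / len(diffs)


# Hypothetical example: -20 °C aloft over -25 °C at 2 m is a 5 K inversion;
# -15 °C aloft over -14 °C at 2 m is a normal (decreasing with height) profile.
diffs = inversion_strength([-20.0, -15.0], [-25.0, -14.0])
```

Applying the same functions to both forecast and observed temperatures lets the modelled and measured inversion strengths be compared directly. In practice, the upper level in the model and the mast level differ and must be matched first.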
An interactive web page for browsing the comparison results has been set up. It enables side-by-side comparison of different mast-model combinations. Not all model-mast-parameter combinations are possible, however, because the measured parameters vary between masts and not all mast locations are covered by all model integration areas. In these cases, an appropriate subset of the plots is shown. Information about the parameters as well as brief descriptions of the masts and models is also included. The page is available to all HIRLAM and ALADIN consortia participants and to data suppliers as a part of the general HIRLAM forecast visualization pages.

Statistical comparison
The mast comparison also calculates seasonal statistics for each observatory, or mast site, covering the models available at that site. These seasonal summaries of the daily comparisons, including a variety of descriptive and comparative statistics, are shown under a separate heading on the interactive web page.
Graphs include time series of observed and modelled variables and of the departures of model output from the observations. They provide a qualitative view of model performance and of how it has varied during the season, thus linking model performance to the prevailing conditions. These graphs are also useful for identifying gaps in the data.
Graphs of average model biases and root-mean-square errors (RMSEs) as a function of forecast lead time quantify the errors, while scatterplots, histograms, and mean diurnal cycles help to interpret the errors physically by linking the average errors to specific conditions or hours of the day.
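The bias and RMSE statistics described above follow the usual definitions: bias = mean(forecast − observation) and RMSE = sqrt(mean((forecast − observation)²)). The following minimal sketch assumes matched forecast-observation pairs tagged with a lead time and a valid hour (the record layout is hypothetical). It shows how grouping by lead time, or by hour of day, yields both the lead-time curves and the mean diurnal cycle of the errors from the same code.

```python
from collections import defaultdict
from math import sqrt
from typing import Callable, NamedTuple


class Match(NamedTuple):
    """One forecast-observation pair (hypothetical record layout)."""
    lead_h: int        # forecast lead time, hours
    valid_hour: int    # valid time, hour of day (UTC)
    forecast: float
    observed: float


def bias_rmse(pairs):
    """Return (bias, RMSE) of forecast minus observation for a list of (forecast, observed)."""
    errors = [f - o for f, o in pairs]
    n = len(errors)
    bias = sum(errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    return bias, rmse


def stats_by(records, key: Callable[[Match], int]):
    """Group matches by key (e.g. lead time or valid hour) and compute bias and RMSE per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append((rec.forecast, rec.observed))
    return {k: bias_rmse(v) for k, v in sorted(groups.items())}


records = [Match(3, 3, 1.0, 0.0), Match(3, 3, -1.0, 0.0), Match(6, 6, 2.0, 1.0)]
by_lead = stats_by(records, key=lambda r: r.lead_h)      # lead-time curves
by_hour = stats_by(records, key=lambda r: r.valid_hour)  # mean diurnal cycle
```

At lead time 3 h in this toy example, the bias is zero while the RMSE is 1. This illustrates why both measures are plotted: compensating errors cancel in the bias but not in the RMSE.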
As an example, Fig. 3 shows the RMSE and bias of the screen-level (3 m in the mast) temperature and the upwelling longwave radiation (LWUP, obtained from the 18 m radiation tower, see Table 3) for the spring period (March-April-May) of 2014. The plots include data from four models, HIRLAM (FMI), HARMONIE-AROME (FMI), IFS (ECMWF), and Arpege (Météo France), and show the first 24 h of the 00:00 UTC forecasts. The FMI operational HIRLAM clearly overestimates both LWUP and the screen-level temperature. Here, LWUP represents the surface temperature over open land in the measurements and that of the whole forest-covered 50 km² grid box in the model. HARMONIE-AROME and Arpege slightly underestimate both parameters, especially at about midday. For IFS, the correspondence between the errors of these two parameters is less clear.

Comparison of HARMONIE-AROME radiation fluxes to Sodankylä observations: a case study
Spectrally integrated (broadband) shortwave and longwave radiation fluxes at the surface are predicted output variables of contemporary NWP models. They are directly comparable to the observed radiation fluxes, which could thus be used for forecast validation along with the near-surface temperature and humidity, anemometer-level wind, cloudiness, and other variables diagnosed from the NWP model output in standard station verification. In particular, comparing the simulated and observed radiation fluxes can give useful insight for the development of cloud and radiation parameterizations in NWP models. Both in reality and in the models, the short-term variability of the surface radiation fluxes is mostly related to variations of cloud and aerosol particles in the air. In Sodankylä, the influence of aerosol on atmospheric radiative transfer is minor. In this section, we test different atmospheric radiation parameterizations in an experimental version of the HARMONIE-AROME forecast system (based on reference cycle 38h1.2, http://hirlam.org/index.php/hirlam-programme-53/general-model-description/mesoscale-harmonie) against the Sodankylä radiation tower measurements.

Measurements and numerical experiments
For the model-observation comparison, six components of the radiation flux measured on the 18 m-high Sodankylä radiation tower are available (Table 3): shortwave downwards (SWDN, or global radiation) and upwards (reflected); direct normal solar irradiance (DNI); diffuse shortwave solar radiation; and longwave radiation downwards (LWDN) and upwards (LWUP). In this study, we compared the observed SWDN and LWDN with their model counterparts for the period 15 January-15 May 2014. The available 1 min flux measurements were averaged over 3 h periods and compared with the 3 h average fluxes derived from the accumulated radiation fluxes of the +6 h and +3 h HARMONIE-AROME forecasts, which were initiated every 6 h (00:00, 06:00, 12:00, 18:00 UTC). In addition, the screen-level temperature observations provided by the Sodankylä automatic weather station (AWS), representing the middle of each 3 h period, were selected for comparison with the forecasted screen-level temperature. Sodankylä daily average precipitation observations were extracted from the FMI climatological database.
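The time averaging described above can be sketched as follows. Assuming that the model outputs radiation fluxes as energy accumulated since the forecast start (J m−2), the mean flux over the window from +3 h to +6 h is the difference between the two accumulations divided by the window length of 10 800 s. The 1 min observations are block-averaged over the same 3 h window. The function names and the 80 % completeness threshold are hypothetical choices for illustration.

```python
from statistics import fmean

MINUTES_PER_BLOCK = 180          # 3 h of 1 min observations
SECONDS_PER_BLOCK = 3 * 3600     # 10 800 s


def block_mean(values, min_valid_fraction=0.8):
    """Mean of the non-missing values (None = missing), or None if too few are valid.

    The 80 % completeness threshold is a hypothetical choice for illustration.
    """
    valid = [v for v in values if v is not None]
    if not values or len(valid) < min_valid_fraction * len(values):
        return None
    return fmean(valid)


def three_hourly_obs(minute_values):
    """Average a 1 min flux series (W m-2) over consecutive, complete 3 h blocks."""
    return [block_mean(minute_values[i:i + MINUTES_PER_BLOCK])
            for i in range(0, len(minute_values) - MINUTES_PER_BLOCK + 1, MINUTES_PER_BLOCK)]


def mean_flux_from_accumulations(acc_start, acc_end, period_s=SECONDS_PER_BLOCK):
    """Mean flux (W m-2) between two lead times, assuming accumulations in J m-2 since forecast start."""
    return (acc_end - acc_start) / period_s


# For the window +3 h ... +6 h, pass the +3 h accumulation as acc_start
# and the +6 h accumulation as acc_end.
```

Differencing two accumulations, rather than using an instantaneous model flux, gives a true time average over the window. It is therefore the natural counterpart of the block-averaged observations.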
The default atmospheric radiation parameterization of AROME (Seity et al., 2011), denoted here as ifsradia, is based on the radiation transfer code of the Integrated Forecast System (IFS cycle 25R1, implemented at ECMWF in 2002); see ECMWF (2012) and Mascart and Bougeault (2011). An alternative radiation scheme, hereafter denoted as acraneb2, originates in ALADIN (Mašek et al., 2016). The radiation scheme of HIRLAM (based on Savijärvi, 1990; see also Nielsen et al., 2014), hereafter denoted as hlradia, was also available for experimentation. All three schemes were tested within the framework of the AROME physical parameterizations by running three series of experiments with a dedicated version (harmonie-38h1.radiation) of HARMONIE-AROME over a domain covering Finland. A horizontal resolution of 2.5 km and 65 vertical levels were used. Lateral boundary conditions for the experiments were obtained from the ECMWF analyses. For the initial state of each +27 h forecast, the objective analysis of the surface variables was combined with the atmospheric analysis extracted from the boundary files. For surface-related parameterization, AROME uses the external surface scheme SURFEX (Masson et al., 2013).

Model-observation comparison in early spring 2014
Most of the winter days before mid-March 2014 were cloudy in Sodankylä. Most observed and predicted clouds were essentially nonprecipitating. The nonprecipitating clouds predicted by HARMONIE-AROME consisted mainly of (supercooled) liquid droplets, while the ice crystal content was small. Some (precipitating) snow and graupel were practically always present in the simulated clouds, and some liquid-ice condensate was often predicted at the lowest model level. This is due to a recent change in the treatment of cloud microphysics in the HARMONIE reference system (K.-I. Ivarsson, personal communication, 2015). Every month, there were several days when more than 1 mm of precipitation, corresponding roughly to 1 cm of snowfall, was observed and predicted, while the first significant rainfall occurred at the end of April. These precipitation events were predicted well by the model. Falling precipitation was observed during the periods when HARMONIE also suggested a significant snow and graupel content in the clouds. This indicates that in the model most particles classified as precipitating indeed reached the surface, in agreement with the observations. Typically, the simulated condensate content of the precipitating particles was 2-3 times the liquid droplet water content, which in turn was an order of magnitude larger than the ice water content. In our experiments, only the cloud liquid droplets and ice crystals, but not the precipitating particles, were allowed to influence the radiative transfer in the atmosphere. This deviated from the default HARMONIE (cycle 38h1.2) settings, according to which a fraction of the snow and graupel particles is accounted for when determining the cloud optical properties.
Figure 4 shows time series of the observed and forecasted (+24 h) screen-level temperature, SWDN, and LWDN, as well as the difference between the observed and forecasted LWDN, in February 2014. Compared with the AWS observations, the screen-level temperature forecast showed an overall cold bias with every radiation scheme (Fig. 4a). Typically, the forecast was 1-2 °C colder than observed.
In February, the solar radiation flux (Fig. 4b) is small, as Sodankylä is located north of the Polar Circle. In February 2014, the maximum observed SWDN value was ca. 160 W m−2, while a typical daily maximum was less than 80 W m−2. As longwave effects (Fig. 4c) are therefore expected to dominate the surface radiation balance, we focus on the LWDN comparison.
Generally, the LWDN flux was predicted well (Fig. 4c and d). The largest differences between predicted and observed LWDN were found on 1-2, 7-8, and 19-21 February. The results were best with the ifsradia and acraneb2 schemes, while hlradia showed larger deviations.
Automatic weather station observations (not shown) indicated that during February 2014, only the afternoon and night after the 20th were cloudless in Sodankylä. In this truly clear-sky case (both observed and simulated), all schemes correctly produced small LWDN fluxes and low screen-level temperatures. When observed clouds were not captured by the model, LWDN fluxes were underestimated by all schemes; this was the case, e.g., on 21 February. Downwelling longwave radiation was overestimated by hlradia (Fig. 4c, d) when the simulated clouds were optically thick (due to the assumed large supercooled liquid water content, not shown), for example during 9-12 February. During some periods (7-8 and 17-19 February), the cold bias of the screen-level temperature was most evident for hlradia, which showed the most strongly underestimated LWDN values on those days. The integrated cloud liquid water content was then also smaller in the experiment with hlradia than in those with the other schemes. This might indicate secondary effects due to cloud-radiation interactions in the model. However, more studies are needed to estimate the significance of this difference and to understand the mechanism behind it.
The simulated LWUP (Fig. 4e) generally followed the observations much more closely than the screen-level temperature did. This indicates that the surface (skin) temperature seen by the radiation parameterization was predicted well in most cases (with the exception of the first 2 days and 7-8 February). In the model, the properties of the snow cover and, to some extent, the soil and vegetation properties under the snow influence the surface temperature and the grid-average LWUP.
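The link between LWUP and the surface (skin) temperature follows from the Stefan-Boltzmann law. For a surface of emissivity ε, LWUP = εσT_s⁴ + (1 − ε)LWDN, where the second term is the reflected part of the downwelling radiation, so T_s can be recovered from the radiation measurements. The sketch below inverts this relation. The emissivity is an assumption supplied by the user (for snow it is close to 1), and the function name is hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m-2 K-4


def skin_temperature(lwup, lwdn=0.0, emissivity=1.0):
    """Invert LWUP = eps*sigma*Ts**4 + (1 - eps)*LWDN for the skin temperature Ts in K."""
    if not 0.0 < emissivity <= 1.0:
        raise ValueError("emissivity must be in (0, 1]")
    emitted = lwup - (1.0 - emissivity) * lwdn   # remove the reflected part of LWDN
    return (emitted / (emissivity * SIGMA)) ** 0.25
```

Because of the fourth-power relation, a relative error in LWUP maps to only a quarter of that relative error in T_s. For example, 1 % in LWUP corresponds to roughly 0.6 K at 250 K, which makes LWUP a sensitive but forgiving check on the predicted skin temperature.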
The different LWDN fluxes produced by the different radiation schemes do not, however, explain the systematic bias of the predicted screen-level temperature. LWDN is a part of the surface energy balance, which determines the (snow and soil) surface temperature that interacts with the atmosphere. In the model, the diagnostic screen-level temperature is obtained by interpolating between the surface temperature and the predicted temperature at the lowest model level (which represents the layer up to ca. 28 m above the surface), taking the surface-layer stability into account. The simulated screen-level temperature was thus evidently strongly influenced by the lowest-model-level temperature, which in turn was dominated by temperature advection in the lower troposphere. The diagnostic estimation of the screen-level temperature is also likely to add uncertainty to the model-observation comparison.
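To illustrate why the diagnosed screen-level temperature tracks the lowest-model-level temperature, the sketch below uses a simple neutral logarithmic profile between the surface (at the thermal roughness length z0h) and the lowest model level. This is a deliberate simplification: the model also accounts for surface-layer stability, which this sketch ignores. The default z0h and all heights are hypothetical values, not HARMONIE-AROME or SURFEX settings.

```python
import math


def screen_level_temperature(t_surface, t_lowest, z_lowest, z_screen=2.0, z0h=0.01):
    """Interpolate temperature (K) to z_screen (m) with a neutral log profile.

    T(z) = Ts + (T1 - Ts) * ln(z / z0h) / ln(z1 / z0h); stability corrections
    are ignored, and the default z0h (m) is a hypothetical value.
    """
    if not 0.0 < z0h < z_screen < z_lowest:
        raise ValueError("require 0 < z0h < z_screen < z_lowest")
    weight = math.log(z_screen / z0h) / math.log(z_lowest / z0h)
    return t_surface + weight * (t_lowest - t_surface)
```

With z0h = 0.01 m, a 2 m screen level and a hypothetical lowest level at 12 m, the weight given to the lowest-level temperature is ln(200)/ln(1200), about 0.75. Even in this idealized neutral case, therefore, errors in the lowest-level temperature carry over largely into the screen-level diagnostic.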
In a model-observation comparison at a single location, phase errors of the large-scale forecast in time and space show up if, e.g., the arrival of an atmospheric frontal system has been forecasted incorrectly. However, phase errors can hardly explain a systematic bias. A comparison of the predicted lowest-model-level temperature with the corresponding measurements of the micrometeorological mast, as well as a comparison of the predicted surface temperature with the corresponding snow and soil surface temperatures, might shed light on the problem. Predicted solar radiation fluxes, although small in this period, also deserve evaluation against the observations. This falls, however, outside the scope of the present study.

Conclusions
The near-real-time mast verification of NWP forecasts, started in 2000, has proved very useful in NWP model verification. Having started with only one model and one mast (HIRLAM and Sodankylä), it has now expanded to include 12 models and seven masts across Europe. The mast verification system has been integrated with the operational runs of the NWP model HIRLAM, with data for other models and masts obtained through a common data pool. The results are shown as a part of the HIRLAM web-based visualization pages that are available to all data suppliers and members of the HIRLAM and ALADIN NWP model consortia. The system does not depend on HIRLAM runs, though, and could also be run separately.
Comparison statistics, such as the long-term bias, are also included in the verification, although they are updated on a seasonal rather than a daily basis. They provide seasonal summaries of the daily comparisons, including a variety of descriptive and comparative statistics.
A comparative study of different radiation schemes applicable within the HARMONIE-AROME NWP system was also presented for early spring 2014. Based on this example, we conclude that the three radiation schemes produced generally good but somewhat different LWDN fluxes on cloudy days; in February 2014, there was only one afternoon and night free of clouds in Sodankylä. The hlradia scheme differed most from the other two schemes, ifsradia and acraneb2. It tended to overestimate LWDN in the case of optically thick clouds and possibly to underestimate it in the case of optically thin clouds. However, when the simulated screen-level temperatures were compared with those observed by the AWS, every scheme seemed to lead to a systematic cold bias of the order of 1-2 °C. The reason for this bias seems to lie outside the radiation parameterization and requires further study.

Table 2. Masts and weather forecast models included in the mast verification.

Table 3. Mast verification comparison parameters and their measurement in Sodankylä. Parameters 1-5 and 12-15 are from the micrometeorological mast, and parameters 6-11 are from the radiation tower. In Sodankylä, screen-level temperature and humidity are measured at a height of 3 m, and wind speed at 18 m.
*Usually the lowest model level.