Emission rates of greenhouse gases (GHGs) entering the atmosphere can be
inferred using mathematical inverse approaches that combine observations from
a network of stations with forward atmospheric transport models. Some
locations for collecting observations are better than others for constraining
GHG emissions through the inversion, but the best locations for the inversion
may be inaccessible or limited by economic and other non-scientific factors.
We present a method to design an optimal GHG observing network in the
presence of multiple objectives that may be in conflict with each other. As a
demonstration, we use our method to design a prototype network of six
stations to monitor summertime emissions in California of the potent GHG
1,1,1,2-tetrafluoroethane (CH

Greenhouse gas (GHG) emissions are difficult to measure directly, which has
led to the development of two indirect methods to estimate their emission
rates. “Bottom-up” methods stitch together data on economic activity, fuel
consumption, emission factors, and other disparate sources to form GHG
emissions inventories

The viability of using a top-down approach to constrain GHG emissions hinges
on the network of observing stations. Measurements from the network are
compared objectively to simulations from an atmospheric transport model using
inverse methods

Optimization techniques can be used to strategically place stations and
select sampling strategies in a network in order to maximize the information
obtained for top-down inversion systems. Quantitative methods for designing
“optimal” observing networks have been described for inferring carbon
dioxide (CO

Network optimization studies typically construct and optimize a single
objective function, which is usually related to the performance of the
observing network

Economic and operational factors also heavily influence the design of
observing networks

We apply a multiobjective genetic algorithm to quantify and optimize the
performance-cost tradeoff curve for a prototypical top-down GHG observing
network. Multiobjective optimization is a powerful generalization of
standard single-objective optimization methods

Extending the work of

The forward atmospheric simulations used in this study were conducted as part
of an effort to constrain HFC-134a emissions in California using atmospheric
measurements and an inverse method

The archive was constructed using version 3.4 of the Weather Research and
Forecasting model with coupled chemistry (WRF-Chem)

The HFC-134a time series in the archive were generated using version 4.1 of
the gridded emissions inventory from EDGAR

Candidate sites for new observing stations are assessed by creating synthetic
observations of HFC-134a from the forward model simulations and then using
the observations in the Bayesian inversion scheme described in
Sect.

The background air advected into our simulation domain from the west (see
Fig.

Both figures show the spatial domain and model grid used for the simulations of HFC-134a using WRF-Chem. The figure on the left shows the 15 regions used for tagging the HFC-134a tracers (regions 1–14 in California, 15 outside of California), and the locations of seven existing measurement sites (white dots). The figure on the right shows the spatial distribution of HFC-134a emissions from the version 4.1 EDGAR inventory on the WRF-Chem model grid.

In order to produce synthetic observations that are reasonably consistent
with actual observations, we first uniformly scale all of the simulated
HFC-134a mole fractions using

The figure shows the HFC-134a time series from the forward model
simulations (black lines,

The purpose of the noise is to inject uncertainty into the problem that can
arise from a variety of factors, including imprecise measurements, scale
representation errors, model imperfections, and other sources. Depending upon
the relative magnitude of the noise amplitude (

Figure

These synthetic observations thus provide a realistic challenge for inversion
algorithms. The skewed, non-Gaussian component reflects model structural
errors or systematic biases that can affect the source inversion

The surface emissions of HFC-134a (model inputs) are inferred by solving an
inverse problem that minimizes the differences between observed and simulated
mole fractions in the atmosphere (model outputs). The target “observations”
are taken as the values from Eq. (

Due to the linear relationship between emission levels and atmospheric mole
fractions, the net time series of HFC-134a simulated by the model is
a weighted sum of the time series of the individual HFC-134a tracers emitted
and tagged from the separate regions. This relationship is expressed as
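
Numerically, this relationship is a matrix-vector product between the tagged tracer time series and the regional emission weights; a sketch with hypothetical values:

```python
import numpy as np

# Columns: time series of each tagged regional tracer
# (here 3 regions and 4 time steps; all values hypothetical).
C = np.array([[1.0, 0.5, 0.2],
              [2.0, 0.4, 0.1],
              [1.5, 0.6, 0.3],
              [0.8, 0.7, 0.2]])

w = np.array([1.2, 0.9, 1.0])  # emission weights for each region

# Net simulated mole fraction: weighted sum of the tagged tracers.
net = C @ w
print(net)  # approximately [1.85, 2.86, 2.64, 1.79]
```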

The goal of the inversion is to determine the values of the emissions
weights,

Given observations of HFC-134a, the probability distribution of weights for
the emissions is obtained from Bayes' rule,

The prior distribution of weights for the emissions is modeled as

For differences between simulated and observed mole fractions that are
normally distributed, the likelihood function is given by the product of probabilities,

Using these forms for the prior distribution and likelihood function, the
posterior distribution of weights for the emissions is also Gaussian,

Evidence approximation is used to estimate
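
For a linear-Gaussian model of this kind, the posterior mean and covariance of the weights have the standard closed form; a sketch with hypothetical dimensions, noise level, and prior:

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs, n_regions = 50, 15            # 15 tagged regions, as in the text
C = rng.random((n_obs, n_regions))   # hypothetical tagged-tracer basis
w_true = np.ones(n_regions)
y = C @ w_true + 0.1 * rng.standard_normal(n_obs)

sigma2 = 0.1 ** 2                    # observation-noise variance
prior_mean = np.ones(n_regions)      # prior weight of 1: inventory as-is
prior_cov = np.eye(n_regions)        # hypothetical prior covariance
prior_prec = np.linalg.inv(prior_cov)

# Gaussian conjugacy: precisions add; means combine precision-weighted.
post_cov = np.linalg.inv(C.T @ C / sigma2 + prior_prec)
post_mean = post_cov @ (C.T @ y / sigma2 + prior_prec @ prior_mean)
print(np.round(post_mean, 2))
```

The posterior variances (diagonal of `post_cov`) are the quantities a network design seeks to shrink.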

The figures illustrate the simple multiobjective problem described
in Sect.

A primary goal of network design problems is to determine the best locations
and sampling strategies for a collection of instruments or sensors that
optimize a given set of objectives. In designing a wireless communications
network, for example, the objectives may be to achieve complete coverage over
a given area using a limited number of transmitters

Multiobjective optimization problems are formulated mathematically as

To better illustrate the concept of multiobjective optimization and the
Pareto frontier, consider the simple example given below and shown in
Fig.

The right panel in Fig.
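
The Pareto frontier consists of the non-dominated designs. For distinct points and two objectives that are both minimized, a direct filter looks like:

```python
def pareto_front(points):
    """Return the non-dominated points when both objectives are
    minimized. For distinct points, p is dominated if some other point
    is no worse in both objectives (and hence strictly better in one).
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

pts = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(pts))  # [(1, 5), (2, 3), (4, 1)]
```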

The goal for the multiobjective HFC-134a network design demonstration problem is to select “optimal” locations for placing six observing stations to monitor summertime emissions of HFC-134a from California. Optimal locations are determined by jointly maximizing the scientific performance and minimizing the measurement costs of the observing network. Seven “existing sites” are available that have related measurement capabilities. Including any of these existing sites in the network will reduce the costs, but may decrease the performance.

This section provides further mathematical details of the optimization problem (design variables, search space, and objectives) and describes the numerical algorithm used to solve the problem.

Given the size and complexity of the problem, and the nature of the numerical optimization algorithm, it is important to keep in mind that the resulting observing networks are not globally optimal solutions. Instead, they represent plausible locally optimal designs that are significantly better than a random selection of sites. Moreover, we caution against using these designs as a basis for a real-world HFC-134a observing network, as many factors were not included in this demonstration (e.g., biases in WRF-Chem transport, inter-seasonal variations of HFC-134a, year-to-year changes in meteorology and emissions, and terms not represented in the idealized cost model).

Two types of design variables are considered for our HFC-134a observing
network test problem. We consider different locations for placing the six
observing stations (

As with the station locations, the daily measurement frequency is also
represented as an integer-valued design variable, though we use the same
frequency for all six of the locations. Measurement frequency is included as
a design variable because changing the number of measurements leads to an
interesting tradeoff between network performance and cost. This variable
takes integer values 1–6 and maps them to the six different ways of
dividing 24 h into regular sampling intervals using the 2-hourly
WRF-Chem output:
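
With 2-hourly output, the admissible regular sampling intervals are the multiples of 2 h that divide 24 h, which gives exactly six options. One natural encoding of the integer design variable is shown below (the exact index ordering used in the study may differ):

```python
# The six regular ways of dividing 24 h on a 2-hourly output grid:
# the sampling interval must be a multiple of 2 h that divides 24 h.
INTERVALS_H = [h for h in range(2, 25, 2) if 24 % h == 0]
print(INTERVALS_H)  # [2, 4, 6, 8, 12, 24]

def samples_per_day(freq_index):
    """Map the integer design variable (1-6) to measurements per day."""
    return 24 // INTERVALS_H[freq_index - 1]

print([samples_per_day(i) for i in range(1, 7)])  # [12, 6, 4, 3, 2, 1]
```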

These design variables are independent directions in a seven-dimensional
integer-valued search space. Brute force search methods are impractical for
searching through a space this large. To illustrate, first consider the
simple case of choosing a location for just a single monitoring station with
a fixed measurement frequency. For this case, 2921 candidate sites need to be
assessed to optimize the objectives. Choosing the locations for a pair of
fixed-frequency stations, however, yields a search space containing roughly
4.2 million design points. The number of ways of selecting
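
The combinatorial growth of the search space can be checked directly with exact binomial coefficients:

```python
from math import comb

n_sites = 2921  # candidate grid cells, from the text above

print(comb(n_sites, 1))  # 2921 ways to place a single station
print(comb(n_sites, 2))  # 4264660 pairs, i.e. roughly 4.2 million
# Six station locations combined with six measurement frequencies:
print(comb(n_sites, 6) * 6)
```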

Two objectives are jointly optimized in the network design. These are to find
design points that maximize performance,

Performance is optimized by minimizing
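
A common performance metric in inverse network design is the total posterior variance (the trace of the posterior covariance) of the inferred weights. The sketch below uses that metric as an illustrative stand-in, with hypothetical inputs; it is not necessarily identical to the objective used in the study:

```python
import numpy as np

def network_performance(C, sigma2=1.0, prior_cov=None):
    """Trace of the posterior covariance of the emission weights for a
    candidate network whose synthetic observations give the basis
    matrix C (rows: observations, columns: tagged regions).
    Smaller is better. Illustrative stand-in for the paper's objective.
    """
    n = C.shape[1]
    if prior_cov is None:
        prior_cov = np.eye(n)
    post_prec = C.T @ C / sigma2 + np.linalg.inv(prior_cov)
    return float(np.trace(np.linalg.inv(post_prec)))

rng = np.random.default_rng(2)
few = rng.random((10, 5))    # a network yielding few observations
many = rng.random((200, 5))  # a network yielding many observations
print(network_performance(few) > network_performance(many))  # True
```

More (or better-placed) observations shrink the posterior variance, which is why measurement frequency trades off against cost.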

For the cost objective, we assume that it is less expensive to set up
HFC-134a monitoring capabilities near sites where infrastructure already
exists and atmospheric measurements or soundings are routinely taken (e.g.,
sites in the National Oceanic and Atmospheric Administration's Cooperative
Air Sampling Network). The following seven locations in California are
considered as “existing sites” where costs can be minimized: Trinidad Head,
Chico, Walnut Grove, Sutro Tower, Fresno, Los Angeles, and Scripps. The
locations of these sites are shown by the white circles in
Fig.

The total cost for the six-station observing network is calculated using

For a candidate site

The operational cost is assumed to depend linearly on
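
A toy version of this cost structure is sketched below; every coefficient is hypothetical and serves only to illustrate the two terms (setup discounted at existing sites, operations linear in the daily measurement frequency):

```python
# Idealized cost-model sketch; all constants are hypothetical.
SETUP_NEW = 10.0       # setup cost at a greenfield site
SETUP_EXISTING = 2.0   # reduced setup cost near an existing site
OPS_PER_SAMPLE = 0.5   # operational cost per daily measurement

def network_cost(sites, existing, samples_per_day):
    """Total cost: per-site setup (cheaper at existing sites) plus an
    operational term linear in the daily measurement frequency."""
    cost = 0.0
    for s in sites:
        cost += SETUP_EXISTING if s in existing else SETUP_NEW
        cost += OPS_PER_SAMPLE * samples_per_day
    return cost

existing = {"Trinidad Head", "Walnut Grove", "Scripps"}
print(network_cost(["Trinidad Head", "SiteX", "SiteY"], existing, 4))
```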

Although the cost model described by Eqs. (

Unlike the Pareto frontier that was derived analytically for the simple
example in Sect.

Genetic algorithms evolve generations of a population of potential designs
through a search space using notions such as survival-of-the-fittest and
reproduction. Each loop of a genetic algorithm represents one generation, and
at each generation four genetic operations are applied:

For multiobjective problems, modern genetic algorithms also apply niche
operators to promote diversity of the designs across the Pareto frontier.
A genetic algorithm can therefore derive a diverse set of Pareto optimal solutions
in a single optimization run, which is a great advantage over other methods
that require multiple runs to characterize the multiobjective space. For
our network design problem, we use the multiobjective genetic algorithm
(MOGA)
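
As an illustration of these genetic operations, the sketch below is a minimal single-objective GA over integer-valued station indices. The fitness function is a cheap placeholder, not the inversion-based objective, and all parameter settings are hypothetical:

```python
import random

random.seed(0)

N_SITES = 2921  # candidate grid cells (from the text)
N_STATIONS = 6
POP, GENS, MUT_P = 30, 40, 0.2

def fitness(design):
    # Placeholder objective standing in for the (expensive)
    # inversion-based performance metric; lower is better.
    target = 1460
    return sum(abs(s - target) for s in design)

def tournament(pop):
    # Fitness assessment + selection: better of two random designs.
    a, b = random.sample(pop, 2)
    return min(a, b, key=fitness)

def crossover(p1, p2):
    # Single-point crossover of two parent designs.
    cut = random.randrange(1, N_STATIONS)
    return p1[:cut] + p2[cut:]

def mutate(design):
    # Each station index is resampled with probability MUT_P.
    return [random.randrange(N_SITES) if random.random() < MUT_P else s
            for s in design]

pop = [[random.randrange(N_SITES) for _ in range(N_STATIONS)]
       for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP)]
best = min(pop, key=fitness)
print(fitness(best))
```

A multiobjective variant such as MOGA replaces the scalar fitness with non-dominated ranking plus a niche operator, but the generational loop is the same.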

Settings used in the genetic algorithms.

Refer to

The figure displays the raw objective function evaluations during the evolution of a population of network designs using SOGA to optimize performance. The horizontal black line shows the SOGA Best case; the SOGA Efficient and IO cases are also displayed.

As a benchmark for referencing the algorithmic performance of SOGA, we also
employed the incremental optimization (IO) strategy described and benchmarked
by

IO uses an intuitive, recursive approach to build up a network of observing
stations. Starting with the first station, all of the candidate sites are
evaluated and the station is placed at the location that optimizes objective
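
A sketch of this greedy recursion is given below, with a toy separation-based objective standing in for the inversion variance:

```python
def incremental_design(candidates, n_stations, objective):
    """Incremental optimization (IO): add stations one at a time, each
    at the candidate site that minimizes the objective given the
    stations already placed."""
    network = []
    for _ in range(n_stations):
        best = min(
            (s for s in candidates if s not in network),
            key=lambda s: objective(network + [s]),
        )
        network.append(best)
    return network

def toy_objective(network):
    """Toy stand-in for inversion variance: networks whose stations
    are well separated score lower (better)."""
    if len(network) < 2:
        return 0.0
    gaps = [abs(a - b) for i, a in enumerate(network)
            for b in network[i + 1:]]
    return -min(gaps)

print(incremental_design(list(range(0, 100, 5)), 4, toy_objective))
# [0, 95, 45, 70]
```

Because each station is fixed once placed, IO cannot revisit earlier choices, which is one reason a population-based search can find better networks.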

SOGA is used to optimize only the performance of the HFC-134a observing
network for a fixed cost (i.e., minimize

Although 6000 objective function evaluations may appear to be a large number,
it is a tiny fraction of the number of design points that occupy the full
six-dimensional search space (recall that

To further put these results in perspective, we compare SOGA to the IO method
described in Sect.

There is an additional factor besides the “evaluation time” that affects
the efficiency of IO and SOGA. The “decision time” is the amount of time it
takes for the algorithm to decide which station or stations to add to the
network. For IO, the “decision time” is negligible and is based on a
sort/search for the station with the smallest inversion variance at each
stage. The “decision time” for SOGA, on the other hand, is tied to the four
genetic operators (fitness assessment, reproduction, crossover and mutation)
and varies from generation to generation because the population changes. We
estimated an average “decision time” for SOGA and found that it is much
smaller than the “evaluation time” and does not hinder the performance of
the algorithm. We therefore conclude that, for our problem, SOGA is a more
efficient algorithm for network design than IO. By our estimates, SOGA is
about 2–7 times more efficient than IO, depending upon which design is
used (i.e., SOGA Efficient versus SOGA Best). These results counter the
findings of

Figure

Given the spatial distribution of HFC-134a emissions shown in
Fig.

Because the HFC-134a observations are synthesized using
Eqs. (

The figure shows the locations of observing stations in the SOGA Best case (stars), SOGA Efficient case (squares), and IO case (triangles). Reference locations of the seven existing observing sites are also shown (white circles).

The figure shows the posterior weights for the emissions from the
15 regions for the SOGA Best, SOGA Efficient, and IO networks shown in
Fig.

The values of

MOGA is used to jointly optimize the performance and cost of the HFC-134a
observing network and to estimate the Pareto frontier between the two
objectives. In the previous section, we showed that SOGA outperforms the IO
optimization scheme

The plots in Fig.

The figure displays the raw objective function evaluations during the evolution of a population of network designs using MOGA to optimize performance (upper panel) and cost (lower panel).

The figure displays the minimum value of the performance objective (blue line) and cost objective (red line) for each generation during the evolution of a population of network designs using MOGA.

Because it is difficult to ascertain convergence through the raw objective
function evaluation plots, Fig.

Figure

The tradeoffs between performance and cost in Fig.

The figure displays the evolution of the performance and cost objectives over generations of observing networks using MOGA. The stage of the evolution is denoted by circle size, with the earliest and latest generations corresponding to the smallest and largest circles, respectively. The measurement frequencies of the networks are color coded. Late generation points along the leading edge represent the approximate Pareto frontier, and points A–G are described in the text. The gray lines approximate the tangents to the objective minima, and their intersection defines the “utopia” point.

The station locations for three representative networks near points A, B
and E along the Pareto frontier are shown in Fig.

The figure shows the locations of observing stations in three
networks that lie near the approximate Pareto frontier (see points A, B,
and E in Fig.

In this report, we demonstrate the use of single objective and multiobjective
genetic algorithms to design optimal observing networks to constrain GHG
emissions through top-down inverse approaches. In particular, we use the
algorithms to design a network of six stations to monitor HFC-134a emissions in
California. The genetic algorithms search for station locations that optimize
both the performance

Given a set of seven existing sites that could host observing stations at a minimal cost, the multiobjective genetic algorithm jointly optimizes the performance and cost of an HFC-134a observing network. The algorithm evolves different network configurations toward the Pareto frontier (i.e., the optimal combinations of the two objectives). The Pareto frontier is convex and clearly shows the tradeoffs between performance and cost. Low performing networks can be improved with minor increases in cost, but high performing networks require substantial increases in cost to achieve further improvements. The Pareto frontier thus provides a useful quantitative guide for decision makers to understand the tradeoffs in designing a GHG observing network. Because multiobjective genetic algorithms can easily accommodate additional, highly complex objectives that account for other GHGs and measurement modalities, we expect our method will provide a useful basis for designing practical GHG observing networks.

To better understand how the prototype GHG observing network could be
extended to a real-world network design, we summarize below some of the key
assumptions in our analysis. We have also released a data set of simulation
time series to two public domain data repositories

The structure of the noise used to generate the synthetic observations could
affect the network. Although the noise differs from one location to another
because different random seeds are used in Eq. (

As a matter of convenience, we used the same measurement frequency at all of the stations in the network. Additional design variables could easily be introduced to optimize the location and frequency of each station, though the computational time to design the network would increase. We expect that such a change would yield a network whose stations sample more frequently at locations far from important sources (e.g., regions 1 and 6) than at nearby locations (e.g., regions 7 and 12).

Last, we reiterate that the cost function used in the network design is idealistic. The form of the cost function is chosen to illustrate the notion of competing objectives (performance versus cost) and impart convexity to the Pareto frontier. Because we have more expertise on the performance aspects of network design than the cost side, it is difficult for us to extrapolate our results to situations involving realistic, detailed cost models. We invite researchers to use the publicly released data set to better explore the impacts of different cost decisions and models on network design.

This work was funded by the National Institute of Standards and Technology (grant number 60NANB10D026) and Laboratory Directed Research and Development projects at the Lawrence Livermore National Laboratory (tracking codes GS-07ERD064 and PLS-14ERD006). The work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, and is released under UCRL number LLNL-JRNL-659224. Edited by: L. Vazquez