Introduction
The intensification of climate research over the past decade has produced a
steadily increasing number of data sets combining different general
circulation or earth system models, CO2 emission scenarios and
downscaling techniques. Turning future projections into robust and reliable
information available at a local scale is imperative for the successful
modelling of the impacts of climate change on nature and society. The
substantial financial and safety challenges of mitigation and
adaptation call for thorough validation, improvement and extension of
current downscaling techniques.
The comparison of climate models to weather data raises interesting
statistical problems. For a statistician, the most natural definition
of the climate is that it is the distribution of weather (and other
earth system variables) over multidecadal timescales. A climate model (general circulation model
or more generally
earth system model) describes the distribution of observable variables
based on physical principles. Because some of the processes (e.g.
convection, clouds) occur on scales smaller than the large grid squares
needed to approximate a solution to the Navier–Stokes equations, such processes are
often calculated using simple approximations (or parameterizations).
A multitude of models have emerged for projection of future climate change at
different spatial (and temporal) scales. Essential in the process of going
from the coarse resolution of the global models to finer spatial scales are
the regional climate models (RCMs). Such models propagate information from a
coarse-scale model along the boundary of a higher-resolution area of
interest, using a more detailed terrain description, a finer model
resolution, and improved physical process parameterizations. The
boundary conditions may be computed either from a global weather model forced
with updated historical observations to calculate consistently the state of
the atmosphere (reanalysis), or from a global climate model. A regional model
using reanalysis boundary conditions is sometimes said to be run in “weather
forecasting mode”, and is the closest one can hope to get to observed weather
using a regional climate model. described approaches to
downscaling precipitation.
One major purpose of regional climate models is to give end users, such as stakeholders and
decision makers, a representation of future weather, preferably a reliable projection, at a
practically useful spatio-temporal scale. In the
insurance industry, for instance, the interest lies in projections of high precipitation
under various possible future scenarios to assess the changing risk of damage
to buildings or of flooding. Typically, scenario runs
are built from the regional model forced by global coupled
ocean–atmosphere or earth system model runs. The question then becomes how reliable these
regional models are at the scale needed by the impact study at
hand. For example, to understand patterns of risk for the insurance
of buildings, precipitation at the meso- to local scale is needed.
compared precipitation from the HIRHAM regional
model, run over Europe and forced with ERA-40
reanalysis boundary conditions, to a gridded precipitation
product for Norway on a 25 × 25 km² scale. They
used a variety of statistical measures for comparing the two data sets. The
regional model output was found to describe low levels of precipitation
fairly well, but failed to reproduce large quantities. found
that most of the regional models in ENSEMBLES give fairly accurate
descriptions of drought indices and other functions of low precipitation
regimes. These findings, together with the need for representative scenarios
in most impact studies, serve as motivation for improving the
local-scale description of extreme precipitation in a future climate.
It has long been understood that regional models tend to be regionally biased
in terms of precipitation (e.g. ).
Bias correction is an approach that attempts to adjust (statistically or
otherwise) the climate model output (regional or global) to make it closer to
observed data for historical runs. The idea is then that applying this bias
correction to future simulations should also provide more realistic
projections. develop a framework for assessing the bias
correction, and the assumptions needed to apply such corrections to
projections. They focus on temperature data, and can therefore assume normal
distributions, which is not appropriate for precipitation.
There is a variety of bias correction methods in the literature
( contains a review). The simplest is a multiplicative
correction that makes the empirical means of the data and the RCM output agree
. Some authors (e.g. ) prefer to make
the correction only to the mean of precipitation on rainy days. An
intermediate approach adjusts the coefficient of variation of data and model
output . More advanced methods try to match quantiles,
either by fitting gamma distributions (with point mass at zero) to data and
models or non-parametrically using full quantile mappings
. The bias corrections are typically done grid square by
grid square, without an explicit spatial model for between grid square
dependence. Quantile corrections (or smoothed estimates thereof) typically
are found superior in comparative assessments
.
In this paper we consider approaches to statistical adjustment of the
regional model output, obtaining a calibrated product that is closer in
distribution to the observed data than the original output. We first
investigate the Doksum shift function, which makes a full
quantile calibration, as the basic tool for adjustment. Next, we restrict
ourselves to less ambitious models that correct individual quantiles
separately. Considering gridded data products covering Norway, we build
transformations either separately for each grid cell, or via models that
incorporate some kind of spatial structure. The models are fitted to a
training set of downscaled ERA-40 data, and then used to correct downscaled
ERA-40 on a test set. We also try to correct downscalings of historical
climate model runs using the same transformations built on downscaled ERA-40
data. Unless such calibrations are successful, it is difficult to argue that
scenario-based downscaled climate projections are realistic and useful for
decision makers.
The paper continues as follows. In Sect. we present the
various data sets used in the analysis. Section
deals with using the shift function to do full quantile bias correction,
while Sect. focuses on bias correction of individual
quantiles. Section discusses the potential use of the
methodology in assessment and uncertainty quantification of regional climate
models.
All code needed to run the analysis on the data is available at .
Full quantile calibration of Norwegian precipitation
The results of the evaluation of the regional model
underline the need for enhanced climate projections at a local scale.
Discrepancies between the distributions of observed and downscaled
precipitation exist for the whole range of data, suggesting that a full
quantile calibration function is needed. In Sect. we
address this issue using a calibration that will make the model data
distribution closer to that of the observed data.
Distributional calibration using Doksum's shift
We characterize the transfer function between two distribution functions, in
our case those of a model and of observations, using Doksum's shift function
. To define this function, consider data from two
distributions, $F$ and $G$, and let $\Delta(x) = G^{-1}(F(x)) - x$. If $X \sim F$ (i.e. $X$ is a random variable with cumulative distribution function
$F$), it is easy to see that $X + \Delta(X) \sim G$. In other words, $\Delta(x)$ measures how much the distribution $F$ needs to be shifted at a value $x$
in order to coincide with the distribution $G$. The shift function $\Delta$
can be estimated using the empirical distribution functions of $F$ and $G$:
$$\hat{\Delta}(x) = \hat{G}^{-1}(\hat{F}(x)) - x,$$
where $\hat{F}$ and $\hat{G}$ are the empirical cumulative
distribution functions of $F$ and $G$, respectively. The empirical cumulative
distribution function is a step function:
$$\hat{F}(t) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le t),$$
where $I(A)$ is the indicator of the event $A$, and $(x_1,\dots,x_n)$ are
observations of independent and identically distributed real random variables distributed according to $F$. We
follow the standard statistical notation where $X_i$ stands for a random
variable and $x_i$ for its observed value.
If the shift function is constant, there is only a
difference in location between the two distributions (in particular, if
that constant is zero, the two distributions coincide). If the shift function is
linear, a location–scale transformation is implied.
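As a concrete illustration (not part of the original analysis), the empirical shift function can be computed directly from two samples using the empirical distribution and quantile functions. In the R sketch below, x and y are simulated stand-ins for model output and observations at a single grid cell.

```r
# Minimal sketch (R) of the empirical shift function for one grid cell;
# x (model-like) and y (observation-like) are simulated stand-ins, not real data.
set.seed(1)
x <- rgamma(3000, shape = 0.6, scale = 6)
y <- rgamma(3000, shape = 0.8, scale = 5)

Fhat <- ecdf(x)                                   # empirical F
delta_hat <- function(v) {
  # Ghat^{-1}(Fhat(v)) - v; type = 1 gives the left inverse of the ECDF of y
  quantile(y, probs = Fhat(v), names = FALSE, type = 1) - v
}

v <- c(0.5, 5, 20)
v + delta_hat(v)                                  # equals Ghat^{-1}(Fhat(v))
```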
Figure: Fisher tests of the 95 % quantile for the winter season:
uncalibrated dERA40 test data (left panel), calibrated dERA40 test data
(middle panel), and calibrated dBCM test data (right panel). The plots
show significance level α = 5 %.
Assume next that a region is divided into $S$ grid cells. For grid cell $i$, $i=1,\dots,S$, let $X_i$ denote downscaled model precipitation and $Y_i$
observations. Let $F_i$ be the cumulative distribution function of $X_i$ and
$G_i$ that of $Y_i$. Assume further that we have downscaled model output
$x_{it}$ and observations $y_{it}$ for days $t=1,\dots,T$ (using a common
$T$ implies no loss of generality; should the numbers of data points for
$\hat{F}$ and $\hat{G}$ rather be $T_{\hat{F}}$ and
$T_{\hat{G}}$, respectively, those are used instead). Our interest lies
in distributional coherence between the downscaled model data and the
observations, rather than daily correspondence between $x_{it}$ and $y_{it}$.
Calibration of a new value $x_{it'}^{\mathrm{uncal}}$ drawn from the same distribution
$\hat{F}_i$ but with $t' \notin \{1,\dots,T\}$ is done by adding its
Doksum shift,
$$x_{it'}^{\mathrm{cal}} = x_{it'}^{\mathrm{uncal}} + \hat{\Delta}_i(x_{it'}^{\mathrm{uncal}}) = x_{it'}^{\mathrm{uncal}} + \hat{G}_i^{-1}\bigl(\hat{F}_i(x_{it'}^{\mathrm{uncal}})\bigr) - x_{it'}^{\mathrm{uncal}} = \hat{G}_i^{-1}\bigl(\hat{F}_i(x_{it'}^{\mathrm{uncal}})\bigr),$$
showing that this calibration is indeed a full quantile calibration.
Transferability of calibration
Assume that we want to apply the calibration in Eq. () to
data from another data set, i.e. to $z_{it'}^{\mathrm{H}}$ not necessarily distributed
according to $F$ and where $t'$ may or may not overlap with $1,\dots,T$. A
typical example would be extending the calibration established for a
reanalysis to a historical climate model run. As before, we use data from
$F$ and $G$ to calculate the empirical distributions $\hat{F}_i$ and
$\hat{G}_i$, respectively, where $\hat{F}_i$ is estimated from
$x_{it}$, $t=1,\dots,T$, and $\hat{G}_i$ from $y_{it}$, $t=1,\dots,T$.
We then correct the new data set, $z_{it'}^{\mathrm{H}}$, by
$$z_{it'}^{\mathrm{cal}} = z_{it'}^{\mathrm{H}} + \hat{\Delta}_i(z_{it'}^{\mathrm{H}}) = z_{it'}^{\mathrm{H}} + \hat{G}_i^{-1}\bigl(\hat{F}_i(z_{it'}^{\mathrm{H}})\bigr) - z_{it'}^{\mathrm{H}} = \hat{G}_i^{-1}\bigl(\hat{F}_i(z_{it'}^{\mathrm{H}})\bigr).$$
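The transfer described above amounts to evaluating $\hat{G}_i^{-1}(\hat{F}_i(\cdot))$, estimated on the training data, at the new values. A minimal R sketch under illustrative placeholder data (x_tr, y_tr for the training model output and observations, z for the new model series) could look as follows; the loop over columns reflects that the correction is applied cell by cell with no explicit spatial model.

```r
# Sketch: transfer the calibration estimated from (x, y) in the training period
# to a new series z (e.g. a downscaled historical climate model run).
# All objects are illustrative placeholders; the correction is done cell by cell.
calibrate <- function(z, x, y) {
  quantile(y, probs = ecdf(x)(z), names = FALSE, type = 1)   # Ghat^{-1}(Fhat(z))
}

S <- 3; n <- 1000                                      # toy sizes: 3 cells, 1000 days
x_tr <- matrix(rgamma(n * S, 0.6, scale = 6), n, S)    # training model output
y_tr <- matrix(rgamma(n * S, 0.8, scale = 5), n, S)    # training observations
z    <- matrix(rgamma(n * S, 0.6, scale = 7), n, S)    # new model series, z^H

z_cal <- sapply(seq_len(S), function(i) calibrate(z[, i], x_tr[, i], y_tr[, i]))
```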
Shift function calibration results
In it was shown using several criteria that the dERA40
and OBS data sets lack agreement. A detailed comparison of specific local
features showed that the global disagreement was due to poor agreement for
high quantiles. As mentioned in , the day-by-day
correlation is partly lost when downscaling the ERA-40 data; hence we compare
distributions rather than using day-by-day test measures. We test the
calibration models described in Sects.
and by considering different seasons separately. The
seasons used are winter (December to February), spring (March to May), summer
(June to August) and autumn (September to November).
In our current setup, the dERA40 and OBS data are further divided into a
training set and a test set. The training set is used to fit the calibration
model. The transfer function thus obtained is applied to dERA40 data for the
test period, which then is compared to observations for the test period. Here
the training data are chosen to be the first 80 % of the total data, i.e. the
years from 1961 to 1992. The test data are chosen to be the last 20 % of the
data, i.e. the years from 1993 to 2000.
The dERA40 data are calibrated using Doksum's shift function as described in
Sect. . In particular, for a specific $x_{it}^{\mathrm{uncal}}$
from the test data set, its calibrated value $x_{it}^{\mathrm{cal}}$ is calculated
from Eq. (), where $\hat{F}$ and $\hat{G}$ are both
estimated from the training data.
Assuming independence between the test statistics for different grid squares,
if all null hypotheses are true, we would expect about 39 spurious
significances (0.05 × 777 ≈ 39) at the 95 % confidence level in a plot with 777 grid cells. We have
carried out the same kind of comparisons as in , but here we
only report the Fisher test of the 95th percentile. Figure shows a substantial proportion of rejections (74 %) in the
uncalibrated dERA40, with an improvement (24 % rejections) for the calibrated
data, and a deterioration (48 % rejections) for the downscaled Bergen climate
model. Things are worse for the Kolmogorov–Smirnov test, in particular for
the climate model data (77, 18 and 79 %, respectively). Since we are
estimating the calibration from the training data, we do not expect to get
only 5 % rejections in the test set. Furthermore, spatial dependence also
affects the rejection rates.
The main reason for the difficulty of making a full quantile calibration is
that the bulk of the distribution is concentrated around very small
precipitation values, and the Kolmogorov–Smirnov statistic tends to focus on
these well-estimated parts of the distribution, where very small differences
in amounts are the reason for rejection. It would be natural to hope to use a
spatial model to borrow strength from nearby grid squares. However, the high
variability in the quantile correction for large values (occurring since
there are relatively few high observations of precipitation) makes it
difficult to fit a spatial functional model. Instead, we will focus on
directly calibrating the quantities of greatest interest for adaptation, namely
high quantiles, where the full calibration performed somewhat better.
Calibrating individual quantiles
We now focus on calibrating a fixed quantile, $q$, over a time period of $r$
days. Because of the cross-validation comparison we will perform in the next
section, we denote these time periods as folds. An observation $Y_{i,k}^{q}$ is
the $q$th empirical quantile at location $s_i$ and fold $k$. That is,
$$Y_{i,k}^{q} = \hat{G}_{i,\,r(k-1)+1:rk}^{-1}(q),$$
where $\hat{G}_{i,\,r(k-1)+1:rk}^{-1}(q)$ is the left inverse of the
empirical distribution function made from the observations corresponding to the days
of fold $k$, $Y_{i,\,r(k-1)+1:rk}$. To link the downscaled precipitation to
$Y_{i,k}^{q}$, we construct
$$X_{i,k}^{q} = \hat{F}_{i,\,r(k-1)+1:rk}^{-1}(q).$$
The goal is now to predict the spatial field $Y_{i,k}^{q}$ using a calibrated
version of $X_{i,k}^{q}$, where the calibration is estimated using all data
that are not in fold $k$, that is, $Y_{i,-k}^{q}$ and $X_{i,-k}^{q}$, where the
subscript $-k$ stands for all folds except the $k$th.
We will use the notation $Y_{k}^{q}$ for the vector $[Y_{1,k}^{q},\dots,Y_{S,k}^{q}]$, and similarly for $X_{k}^{q}$. Further, we will denote a
diagonal matrix with diagonal entries $b$ by $\mathrm{diag}(b)$.
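To fix ideas, the fold quantiles $Y_{i,k}^{q}$ and $X_{i,k}^{q}$ can be computed as in the R sketch below. The fold length r, the number of folds K, the number of cells S, and the simulated daily matrices obs and mod are illustrative stand-ins, not the actual data dimensions.

```r
# Sketch: empirical q-quantiles per grid cell and fold; sizes and data are illustrative.
q <- 0.95; r <- 450; K <- 8; S <- 4                # e.g. roughly five winter seasons per fold
obs <- matrix(rgamma(r * K * S, 0.8, scale = 5), r * K, S)
mod <- matrix(rgamma(r * K * S, 0.6, scale = 6), r * K, S)

fold_quantile <- function(z, k) {
  rows <- ((k - 1) * r + 1):(k * r)
  # type = 1: left inverse of the empirical distribution function
  apply(z[rows, , drop = FALSE], 2, quantile, probs = q, type = 1)
}

Y_kq <- t(sapply(seq_len(K), fold_quantile, z = obs))   # K x S matrix of Y_{i,k}^q
X_kq <- t(sapply(seq_len(K), fold_quantile, z = mod))   # K x S matrix of X_{i,k}^q
```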
As a baseline method, we use the empirical quantile at each location,
$Y_{i,-k}^{q}$, as the predictor of $Y_{i,k}^{q}$. We will later denote this as
Model 0, and it should be noted that this prediction does not use the
downscaled data. However, it should be a reasonable prediction, assuming that
the climate is stationary.
As a reference model, we use the smoothing spline method that performed well
in . We will later denote this as Model Ref. The method
matches (all) quantiles of the model output to (all) quantiles of the
observations using a cubic spline regression, for days with non-zero
precipitation.
Model 1: linear regression
As a first method for doing the calibration, we perform linear regression with
$X_{i,k}^{q}$ as covariate. Since the model is for precipitation data, which are
positive and skewed, we formulate the regression on the log scale as
$$\log(Y_{i,k}^{q}) = \alpha + \log(X_{i,k}^{q})\,\beta_i + \varepsilon_{i,k},$$
where $\varepsilon_{i,k} \sim N(0,\sigma^2)$. Note that we have one parameter
$\beta_i$ for
each location, and thus have a spatially varying calibration of the downscaled data.
The parameters $(\alpha, \beta)$ are estimated by ordinary least
squares, and we use $\hat{Y}_{k}^{q} = \exp\bigl(\hat{\alpha} + \mathrm{diag}(\log(X_{k}^{q}))\hat{\beta}\bigr)$ as
a predictor for $Y_{k}^{q}$. That is, we use the median as a point estimate (and not the mean).
Finally, to apply the method to other data sets, as discussed in Sect. ,
one simply replaces $X_{k}^{q}$ with $X_{k}^{\mathrm{H},q}$.
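A minimal R sketch of Model 1 is given below, assuming the illustrative fold-quantile matrices Y_kq and X_kq from the previous sketch. Fitting a common intercept with one slope per cell through an interaction term in lm is one straightforward way to obtain the ordinary least squares estimates (not necessarily the authors' implementation), and exponentiating the linear predictor gives the median-type point prediction.

```r
# Sketch of Model 1: common intercept, one slope per cell, OLS on log scale.
k <- 1                                             # held-out fold
S <- ncol(Y_kq); K <- nrow(Y_kq)
dat <- data.frame(
  logY = as.vector(log(Y_kq[-k, ])),
  logX = as.vector(log(X_kq[-k, ])),
  cell = factor(rep(seq_len(S), each = K - 1))
)
fit <- lm(logY ~ logX:cell, data = dat)            # alpha + beta_i * log(X_i)

# Median-type point prediction for the held-out fold
newdat <- data.frame(logX = as.vector(log(X_kq[k, ])),
                     cell = factor(seq_len(S)))
Y_pred <- exp(predict(fit, newdata = newdat))
```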
Figure: Pointwise 95 % quantiles of OBS (left), dERA40 (middle), and
dBCM (right) for the first 5-year period in the cross-validation.
Table: Cross-validation mean square error for the 95 % quantile. The best model for
each fold is marked with an asterisk.

k     Model 0   Model 1   Model 2   Model 1s   Model 2s   Model Ref
1      13.29      9.68      8.06      8.59       7.91*     10.69
2      31.67      7.83     13.17      6.16*     14.97       6.57
3      14.45     11.23      8.70*    10.72       9.15      14.11
4      13.54      6.04      7.85      5.29*      8.61       5.95
5      10.93      5.63      6.59      4.99*      7.22       5.79
6      23.90     10.15     12.11      8.37*     13.55       9.78
7      25.47     11.84     14.79      9.50*     15.83      10.15
8      16.35     10.72     11.47      9.54*     12.03      10.96
Mean   18.70      9.14     10.34      7.89      11.16       9.25
Model 2: incorporating the spatial dependence
There is clearly spatial dependence in the data, which we want to incorporate
in the model to improve the predictions. We can do this by assuming that the
regression coefficients are spatially dependent, using the following stochastic
model:
$$\log(Y_{k}^{q}) = \alpha + \mathrm{diag}(\log(X_{k}^{q}))\,\beta + \varepsilon_k, \qquad \beta \sim N\bigl(0, \Sigma(\nu,\kappa,\phi)\bigr),$$
where again $\varepsilon_k \sim N(0,\sigma^2 I)$ and $\Sigma_{ij} = C(\lVert s_i - s_j \rVert)$, where $C$ is a Matérn covariance function:
$$C(d) = \phi^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)} \, (\kappa d)^{\nu} K_{\nu}(\kappa d).$$
Here $\phi$ determines the variance of the process, $\nu$ is a shape
parameter of the covariance function, and $\kappa$ determines the correlation
range.
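For reference, this parameterization of the Matérn covariance can be evaluated with the modified Bessel function besselK available in base R. The sketch below (with the illustrative names matern_cov and Sigma_fun) is reused in the later sketches to build covariance matrices from grid-cell coordinates.

```r
# Sketch of the Matern covariance in the parameterization above;
# matern_cov and Sigma_fun are illustrative names, not the authors' code.
matern_cov <- function(d, phi, nu, kappa) {
  out <- phi^2 * (2^(1 - nu) / gamma(nu)) * (kappa * d)^nu * besselK(kappa * d, nu)
  out[d == 0] <- phi^2                             # limit as d -> 0
  out
}

# Covariance matrix for grid-cell coordinates (coords: S x 2 matrix)
Sigma_fun <- function(coords, phi, nu, kappa) {
  D <- as.matrix(dist(coords))
  matrix(matern_cov(as.vector(D), phi, nu, kappa), nrow = nrow(D))
}
```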
Table: Cross-validation mean square error for the 95 % quantile using dBCM
predictions. The best model (excluding Model 0) for each fold is marked with an
asterisk. For this comparison, Model 0 can be seen as the target value we want to
reach with the dBCM-based predictions.

k     Model 0   Model 1   Model 2   Model 1s   Model 2s   Model Ref
1      13.29     17.35     11.76     17.95      10.53*     19.80
2      31.67     12.34     14.31      8.40*     13.53      12.15
3      14.45     60.26     33.90     73.64      31.18*     64.41
4      13.54     17.03      9.34     19.53       7.77*     20.24
5      10.93     33.23     13.60     42.82      10.90*     41.06
6      23.90     57.05     37.95     65.99      35.37*     64.30
7      25.47     98.60     58.07    118.3       52.40*    114.17
8      16.35     47.72     29.71     55.38      27.33*     51.96
Mean   18.70     42.95     26.08     50.25      23.63      48.52
A more computationally efficient alternative to the covariance-based model
would be to use a Markov random field prior on β, similar to that
by . However, this is not needed since the data are measured
only at 777 spatial locations, and we therefore use the simpler
covariance-based approach here.
The model parameters $\theta = \{\alpha, \kappa, \sigma, \phi, \nu\}$ are
estimated using maximum likelihood. Up to an additive constant, the log-likelihood function is given by
$$L(\theta; Y) = -\frac{1}{2}\sum_{j \neq k} \log\lvert\Sigma_j\rvert - \frac{1}{2}\sum_{j \neq k} \bigl(\log(Y_{j}^{q}) - \alpha\bigr)^{T} \Sigma_j^{-1} \bigl(\log(Y_{j}^{q}) - \alpha\bigr),$$
where $\Sigma_j = \mathrm{diag}(\log(X_{j}^{q}))\,\Sigma(\nu,\kappa,\phi)\,\mathrm{diag}(\log(X_{j}^{q})) + \sigma^2 I$. We find the ML estimates $\hat{\theta} = \{\hat{\alpha}, \hat{\kappa}, \hat{\sigma}, \hat{\phi}, \hat{\nu}\}$ of the
parameters using numerical optimization of the log-likelihood function.
Specifically, the function optim in is used for the
optimization; see the source code at for further
computational details.
The predictor of $Y_{k}^{q}$ is obtained as $\hat{Y}_{k}^{q} = \exp\bigl(\hat{\alpha} + \mathrm{diag}(\log(X_{k}^{q}))\hat{\beta}\bigr)$, where
$$\hat{\beta} = E(\beta \mid Y_{-k}^{q}, \hat{\theta}) = \Bigl(\Sigma(\hat{\nu},\hat{\kappa},\hat{\phi})^{-1} + \frac{1}{\hat{\sigma}^2}\sum_{j \neq k}\mathrm{diag}\bigl(\log(X_{j}^{q})^{2}\bigr)\Bigr)^{-1} \sum_{j \neq k}\frac{1}{\hat{\sigma}^2}\mathrm{diag}\bigl(\log(X_{j}^{q})\bigr)\bigl(\log(Y_{j}^{q}) - \hat{\alpha}\bigr).$$
To apply the model to other data sets, one simply replaces $X_{k}^{q}$ with $X_{k}^{\mathrm{H},q}$.
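The following R sketch indicates one possible way to evaluate the likelihood and the predictor above for a single held-out fold, reusing the illustrative Y_kq, X_kq and Sigma_fun objects from the earlier sketches; the variance and range parameters are optimized on the log scale, and the placeholder coordinates coords stand in for the actual grid-cell locations. This is a sketch of the approach, not the authors' implementation.

```r
# Sketch of a possible Model 2 fit for one held-out fold k (illustrative only).
k <- 1
S <- ncol(Y_kq)
coords <- cbind(runif(S), runif(S))            # placeholder grid-cell coordinates
train  <- setdiff(seq_len(nrow(Y_kq)), k)

# Negative log-likelihood; sigma, phi, nu, kappa are optimized on log scale.
negloglik <- function(par) {
  alpha <- par[1]
  sigma <- exp(par[2]); phi <- exp(par[3]); nu <- exp(par[4]); kappa <- exp(par[5])
  Sig_beta <- Sigma_fun(coords, phi, nu, kappa)
  nll <- 0
  for (j in train) {
    A <- diag(log(X_kq[j, ]))                  # diag(log X_j^q)
    Sig_j <- A %*% Sig_beta %*% A + sigma^2 * diag(S)
    r <- log(Y_kq[j, ]) - alpha
    ch <- chol(Sig_j)
    nll <- nll + sum(log(diag(ch))) + 0.5 * sum(backsolve(ch, r, transpose = TRUE)^2)
  }
  nll
}
opt <- optim(c(0, 0, 0, 0, 0), negloglik, method = "Nelder-Mead",
             control = list(maxit = 2000))
alpha_hat <- opt$par[1]
sigma_hat <- exp(opt$par[2]); phi_hat <- exp(opt$par[3])
nu_hat    <- exp(opt$par[4]); kappa_hat <- exp(opt$par[5])

# Posterior mean of beta given the training folds, then the median-type predictor.
Sig_beta <- Sigma_fun(coords, phi_hat, nu_hat, kappa_hat)
P <- solve(Sig_beta)                           # prior precision Sigma^{-1}
b <- 0
for (j in train) {
  a <- log(X_kq[j, ])
  P <- P + diag(a^2) / sigma_hat^2
  b <- b + a * (log(Y_kq[j, ]) - alpha_hat) / sigma_hat^2
}
beta_hat <- solve(P, b)
Y_pred   <- exp(alpha_hat + log(X_kq[k, ]) * beta_hat)
```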
Figure: Example calibration of the pointwise 95 % quantiles using
Model 2s with the dBCM covariate (right). The result is for the final
5-year period in the cross-validation study. The observed quantiles
(left) and uncalibrated dBCM (middle) are shown as references.
Table: Average mean square error for models trained on dERA40 data. The values in the
table are averages across the eight folds for each season. The RCM row indicates
whether the fitted calibration is applied to dERA40 or dBCM input.

         Model 0   Model 1s           Model 2s           Model Ref
RCM        –       dERA40    dBCM    dERA40    dBCM     dERA40    dBCM
Winter    18.70     7.89     50.25    11.16    23.63      9.25    48.52
Spring     8.25     5.98     18.39     6.86     9.54      7.69    21.14
Summer     7.67     5.12     28.80     7.05     9.18      6.99    36.28
Autumn    13.17     9.26     60.08    11.14    20.05     11.92    67.25
Model 1s and Model 2s: pre-smoothing the covariates
A somewhat surprising feature of the data is that the quantiles of the
observed data, $Y$, are spatially smoother than those of the downscaled climate
model output (see Fig. ). Because of this, it is natural
to add a step in the analysis where the covariate is smoothed spatially before it
is used in the regression model. This is done using the following model:
$$\log(X_{k}^{q}) = \log(\tilde{X}_{k}^{q}) + \varepsilon_k, \qquad \log(\tilde{X}_{k}^{q}) \sim N\bigl(\mu_x, \Sigma(\nu_x,\kappa_x,\phi_x)\bigr).$$
Here the $\log(\tilde{X}_{k}^{q})$ are independent realizations of a Gaussian–Matérn field and $\varepsilon_k \sim N(0,\sigma_x^2 I)$. We estimate the
parameters by numerical maximization of the log-likelihood
$$L(\mu_x,\sigma_x,\nu_x,\kappa_x,\phi_x; X) = -\frac{8}{2}\log\lvert\hat{\Sigma}\rvert - \frac{1}{2}\sum_{k=1}^{8}\bigl(\log(X_{k}^{q}) - \mu_x\bigr)^{T}\hat{\Sigma}^{-1}\bigl(\log(X_{k}^{q}) - \mu_x\bigr),$$
where $\hat{\Sigma} = \Sigma(\nu_x,\kappa_x,\phi_x) + \sigma_x^2 I$. Model 1s and Model 2s are then obtained by using
$$E\bigl(\log(\tilde{X}_{k}^{q}) \mid X_{k}^{q}, \hat{\mu}_x, \hat{\sigma}_x, \hat{\nu}_x, \hat{\kappa}_x, \hat{\phi}_x\bigr) = \hat{\mu}_x + \Bigl(\Sigma(\hat{\nu}_x,\hat{\kappa}_x,\hat{\phi}_x)^{-1} + \frac{1}{\hat{\sigma}_x^2} I\Bigr)^{-1} \frac{1}{\hat{\sigma}_x^2}\bigl(\log(X_{k}^{q}) - \hat{\mu}_x\bigr)$$
instead of $\log(X_{k}^{q})$ as the covariate in Model 1 and Model 2, respectively.
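Under the same illustrative setup, the pre-smoothing step could be sketched in R as follows; the smoothed log-quantile fields then replace log(X_kq) as the covariate in Model 1 and Model 2, giving Model 1s and Model 2s. As before, X_kq, Sigma_fun and coords are the placeholder objects from the earlier sketches, not the authors' data or code.

```r
# Sketch of the pre-smoothing step for Models 1s/2s (illustrative only).
negloglik_x <- function(par) {
  mu   <- par[1]
  sigx <- exp(par[2]); phix <- exp(par[3]); nux <- exp(par[4]); kapx <- exp(par[5])
  Sig <- Sigma_fun(coords, phix, nux, kapx) + sigx^2 * diag(ncol(X_kq))
  ch  <- chol(Sig)
  nll <- 0
  for (k in seq_len(nrow(X_kq))) {
    r <- log(X_kq[k, ]) - mu
    nll <- nll + sum(log(diag(ch))) + 0.5 * sum(backsolve(ch, r, transpose = TRUE)^2)
  }
  nll
}
optx <- optim(c(mean(log(X_kq)), 0, 0, 0, 0), negloglik_x, method = "Nelder-Mead")

mu_hat   <- optx$par[1]
sigx_hat <- exp(optx$par[2])
Sig_x    <- Sigma_fun(coords, exp(optx$par[3]), exp(optx$par[4]), exp(optx$par[5]))

# E(log Xtilde_k | X_k) = mu + Sig_x (Sig_x + sig_x^2 I)^{-1} (log X_k - mu)
smooth_field <- function(xk) {
  mu_hat + as.vector(Sig_x %*% solve(Sig_x + sigx_hat^2 * diag(ncol(X_kq)),
                                     log(xk) - mu_hat))
}
X_kq_smooth <- exp(t(apply(X_kq, 1, smooth_field)))   # use in place of X_kq above
```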
Results for individual quantiles
In this section, we evaluate the performance of the methods described in
Sect. for calibrating individual quantiles. As the tests
in Sect. were based on the 95 % quantile, we
focus on predicting $Q_{0.95}^{\mathrm{OBS}}$.
The results in Sect. rested on predicting the last
20 % of the data (8 years) based on the first 80 %. Here, we make a more
detailed investigation using 8-fold cross-validation to evaluate the
performance of the models. The data are divided into eight 5-year periods,
and the quantile for each 5-year period is predicted using a model
estimated on the rest of the data. The quantiles for the observations,
dERA40, and dBCM data for the first 5-year period can be seen in
Fig. .
The results using the various models can be seen in Table , and the results obtained when training the models on the
dERA40 data but using dBCM for prediction are shown in Table . One can note that the extension of Model 1 to Model 2 by
adding spatial dependence does not improve the results for the dERA40-based
predictions, whereas it greatly improves the dBCM-based predictions.
Furthermore, pre-smoothing improves Model 1 for the dERA40 predictions,
whereas it improves Model 2 for the dBCM predictions. The reference model,
Model Ref, performs very similarly to Model 1.
Overall, Model 1s performs best for the dERA40 predictions, whereas Model 2s
performs best for the dBCM predictions. For the dBCM predictions, Model 2s
performs satisfactorily compared with the target performance for that
case, which is given by the Model 0 results. An example prediction using this
model can be seen in Fig. .
The results for the different seasons are summarized in
Table . For all seasons, the conclusion is that Model 1s
is preferable if we both train and test the model on dERA40 data, whereas
Model 2s is favoured if we train the model on dERA40 data and use that
transfer function to calibrate dBCM input. The reason for this is likely that
the additional smoothing done in Model 2s compensates for the added
uncertainty when using dBCM data in the model trained on dERA40 data.
Discussion
The low quality of Norwegian precipitation in the HIRHAM regional model
forced by reanalysis necessitates a full quantile
recalibration/bias correction. Our assessment of such a recalibration on test
data indicates that it does a credible job of correcting the dERA40 model,
even under changing weather conditions. In order to apply the calibration to
climate projections, which is the ultimate goal of this research, we first
experiment with the same regional model forced by a global climate model (GCM)
run with historical forcings, corrected using the same calibration as for
dERA40. Downscaled global models are unable to describe the observations
well. When correcting these downscaled global models, we would of course not
expect to get a perfect calibration to data, but would hope that the
downscaled GCM/earth system model would describe a similar distribution to that of the
observations over a reasonably long period. Unfortunately, this is not the
case.
Instead of adjusting the entire distribution, we were able to achieve better
performance by focusing on adjusting an individual quantile. In that case we
achieved error rates indicating that the corrected downscaled
climate model performed almost as well as the reanalysis-forced downscaling,
indicating that this approach can be a useful tool in downscaling climate
projections of precipitation over Norway.
There is a case in between the full quantile adjustment and the individual
quantile adjustment, namely simultaneous adjustment of several quantiles.
This will be subject to further research.
The sensitivity of regional dynamic downscalings to the lateral boundary
conditions is well known (e.g. , and references therein), and
one possibility would be to downscale other reanalyses and compare the
results. Since we did not have access to such RCM runs, we were not able to
pursue this. On the other hand, we have been able to look at other RCMs (such
as the Swedish RCA3, with the same reanalysis as
boundary conditions) and other GCMs (such as the Hadley Centre HadCM3Q0
model). The bias correction based on other regional and global models is very
similar to that based on HIRHAM and BCM (results not shown).