Bivariate spatial analysis of temperature and precipitation from general circulation models and observation proxies

Abstract. This study validates the near-surface temperature and precipitation output from decadal runs of eight atmospheric ocean general circulation models (AOGCMs) against observational proxy data from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis temperatures and Global Precipitation Climatology Project (GPCP) precipitation data. We model the joint distribution of these two fields with a parsimonious bivariate Matérn spatial covariance model, accounting for the two fields’ spatial cross-correlation as well as their own smoothnesses. We fit output from each AOGCM (30-year seasonal averages from 1981 to 2010) to a statistical model on each of 21 land regions. Both variance and smoothness values agree for both fields over all latitude bands except southern mid-latitudes. Our results imply that temperature fields have smaller smoothness coefficients than precipitation fields, while both have decreasing smoothness coefficients with increasing latitude. Models predict fields with smaller smoothness coefficients than observational proxy data for the tropics. The estimated spatial cross-correlations of these two fields, however, are quite different for most GCMs in mid-latitudes. Model correlation estimates agree well with those for observational proxy data for Australia, at high northern latitudes across North America, Europe and Asia, as well as across the Sahara, India, and Southeast Asia, but elsewhere, little consistent agreement exists.


Introduction
Atmospheric ocean general circulation models (AOGCMs or simply GCMs) are being developed by various scientific organizations to study climate science, including the human impact on climate change.Recently, the World Climate Research Programme organized the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor et al., 2012) to support the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5).The goals of the CMIP5 project include a more complete understanding of limitations and strengths of the various models.
Toward this end, we perform a bivariate spatial statistical analysis and validation of time-averaged output on two climate variables, precipitation and temperature, from eight GCM models plus one set of observational proxies.The statistical method we propose has a straightforward extension for dealing with more than two (that is, multivariate) cli-mate variables.Methodologically, our validation procedure compares second-order spatial statistics from GCM output against those from reanalysis temperatures (NCEP/NCAR) and observed precipitation (Global Precipitation Climatology Project, GPCP) data.We use the word "validate" to mean that output from GCMs should match those from corresponding real-world values, but limited by the contrapositive argument that if the output are inconsistent, the climate model must be faulty (Oreskes et al., 1994).
There have been several works that validate climate model output using various statistical methods.For instance, Jun et al. (2008) quantify the "errors" of climate models (defined as the difference between climate model output and corresponding observation) using a spatial kernel averaging method.They only considered temperature and they focused on quantifying the dependence of these errors across different GCMs.Thorarinsdottir et al. (2013) proposed to use proper divergence functions to evaluate climate models.They compared temperature extremes from 15 climate models to those from reanalysis data.In this paper, we consider two climate variables at the same time and use various covariance parameters that represent not only marginal but crossspatial dependence structure of climate variables as a measure for the validation.We are performing "operational validation" as defined by Sargent (2013), which is to assess models' output accuracy against the real system and against other models.The comparisons are done for each parameter.Ultimately, our goal is to leverage multivariate spatial statistics to probe the differences and similarities of GCMs and observation proxies.
Precipitation and near-surface air temperature were chosen for this study because they are two of the most important climate model output fields, as well as the two variables most commonly downscaled (Colette et al., 2012;Li et al., 2010;Samadi et al., 2013;Zhang and Georgakakos, 2012).Precipitation continues to pose challenges for climate models while temperature is well studied and GCMs simulate it reliably (Christensen et al., 2007;Stocker et al., 2013).Kleiber and Nychka (2012) wrote that a "major hurdle for climate scientists is to simultaneously model temperature and precipitation".Both precipitation and temperature are critical to understanding the impact of climate on Earth's biosphere, especially those aspects directly impacting human activities such as agriculture, forestry, wildfire, and even building design (Khlebnikova et al., 2012).The most recent IPCC Assessment concludes that changes in the global water cycle are likely to be nonuniform with increasing variance, increasing frequency and intensity (Stocker et al., 2013), while others report trends of rainfall redistribution (Zhang et al., 2007).The temporal relationship of temperature and precipitation are well studied, but the time-averaged spatial characteristics of these two fields is still not well understood (Trenberth and Shea, 2005;Adler et al., 2008).Gaining a better understanding of the bivariate spatial nature of these two fields should aid in these research questions.
In the literature for statistical analysis with climate model output and observations, climate variable fields are often considered individually as univariate spatial fields.For example, Jun et al. (2008) quantified cross-correlation of the errors of multi-model ensembles of the temperature field at each grid point, accounting for the spatial dependence of each error field.Sang et al. (2011) used parametric cross-covariance models to build a joint spatial model of multiple climate model errors for temperature.Lee et al. (2013) estimated local smoothness of temperature using a composite local likelihood approach.
There are few studies of the cross-dependence of two climate variables from climate model output.Tebaldi and Sansó (2009) used a hierarchical Bayesian model to jointly model temperature and precipitation data, but did not model the spatial dependence between the fields.Jun (2011) developed nonstationary cross-covariance models to jointly fit temperature and precipitation data globally using output from a single climate model.Sain et al. (2011) used a multivariate Markov random field model to account for spatial crossdependencies between temperature and precipitation from regional climate models.They used multiple regional climate model output, though dependency between different regional climate model output was not considered.
In this paper, our focus is to validate CMIP5 ensembles by investigating bivariate properties of climate variables and compare them across output from multiple climate models as well as observation proxies.Our goal is to perform validation on more than just means and variances of temperature and precipitation fields.In particular, we are interested in how cross-correlation of surface temperature and precipitation compares across model ensembles and observational proxy data.Considering the cost of running each climate model, validating climate models through various statistics in addition to simple means and variances is valuable.To the extent that each climate model accurately represents the true nature of Earth's climate, any statistics beyond means and variances should be comparable across multi-model ensembles, as well as corresponding observational proxies.Furthermore, smoothness and cross-correlation are among the key important quantities in describing the underlying distribution of the climate processes, so we also compare local smoothness of each variable across model ensembles and proxies.
The rest of the paper is organized as follows.Section 2 describes the two types of data used in this study, observational proxy data and GCM output.Section 3 introduces the statistical methodology used to estimate the statistical model parameters.Section 4 summarizes results and Sect. 5 makes recommendations for further study.

Global observation proxies
Near-surface air temperature values for 1981-2010 are taken from the NCEP/NCAR reanalysis data, provided by the Earth System Research Laboratory in the National Oceanic and Atmospheric Administration (http://www.esrl.noaa.gov/psd/).This data set originated from a project where a wide variety of data were assimilated from multiple sources including weather stations, ships, aircraft, radar, and satellites from 1957 onward (Kalnay et al., 1996).Data since that time continue to be assimilated and quality controlled so that complete data both in space and time are reliable and readily available (Kistler et al., 2001).The data we use are monthly averages on a 2.5 • × 2.5 • resolution grid.Temperatures range from −61 to 39 • C; temperature fields are shown in the bottom half of Fig. 1 for seasonally averaged boreal summer, June-August (JJA) and boreal winter, December-February (DJF).
NCEP reanalysis data is based on a system that uses forecasts and hindcasts to fill in the gaps between missing data, which works well for fields such as temperature.Kalnay et al. (1996) claimed that reanalysis temperature data "provide an estimate of the state of the atmosphere better than would be obtained by observations alone".Other fields, such as precipitation, have biases induced by the models.For this reason, we used a different data source for precipitation data.
Precipitation data was taken from the GPCP (Adler et al., 2003).Satellite and station data are compiled into a 2.5 • × 2.5 • resolution grid of observed monthly average precipitation values from 1979 through the present.
Values are reported in millimeters per day but are converted to kilograms per square meter per second (to match units of GCM output) assuming all precipitation has a density of 1000 kg m −3 , and range from zero to a maximum of 5.45 ×10 −4 kg m −2 s −1 , which is equivalent to 47.1 mm day −1 .
The temperature values and precipitation values are defined at slightly different grid points, so each field had to be adjusted to reconcile them to the same 144 × 72 grid.This was done by interpolating precipitation longitudinally and temperature values latitudinally.We compute 30-year seasonal averages for boreal summer (JJA) and boreal winter (DJF) for the years 1981-2010.Although the averaged temperature seems to be approximately Gaussian distributed, the averaged precipitation requires transformation to alleviate skewness (Kleiber and Nychka, 2012); we use the cuberoot transformation.The transformed precipitation fields for GPCP are shown in the bottom half of Fig. 2 for JJA and DJF.

General circulation models
Output from eight GCMs were obtained from the Program in Climate Model Diagnosis and Intercomparison (PCMDI) server (http://cmip-pcmdi.llnl.gov/cmip5/),which archives the experimental results of the CMIP5 project.Specifically, near-surface air temperature and precipitation from 30-year decadal predictions were used (Meehl et al., 2009).The names, abbreviations and spatial resolutions of these models, all part of the CMIP5 project, are summarized in Table 1.
The data from these runs were used to compute 30-year seasonal averages for boreal summer (JJA) and boreal winter (DJF), again, for the years 1981-2010.Decadal runs were not available for HAD-GEM2-ES, the high-resolution Hadley Centre Earth Systems model, so data from part of one historical run was used, specifically December 1972-August 2003.
All precipitation values were cube-root transformed.The seasonal average temperature fields from one GCM, that of the Geophysical Fluid Dynamics Laboratory (GFDL), are shown in the top half of Fig. 1, and the transformed seasonal average precipitation fields for this same GCM are shown in the top half of Fig. 2. Comparisons of GFDL fields to those of the observation proxies (NCEP/GPCP) in Figs. 1 and 2 show excellent agreement for both fields, though the details of the distributions for precipitation are not as good as for temperature, particularly in the extremes (JJA for the Amazon and both JJA and DJF for the Sahara and Southeast Asia).

Statistical method
Observed proxy data as well as climate model simulation data are analyzed using a two-step process.First, we fit a regression mean structure (Sect.3.1) and, second, a sixparameter bivariate cross-covariance model that assumes both temperature and precipitation fields are isotropic.Here,   (Giorgi and Francisco, 2000).
isotropic covariance structure implies that the covariance between two spatial locations only depends on the distance between the two locations (see Sect. 3.2 for more formal definition).
Clearly the isotropic assumption is not reasonable on the globe as a whole due to the spatially varying nature of the processes' spatial dependence structure.Hence, the data are first blocked into 31 regions, that is, 21 land regions as defined by Giorgi and Francisco (2000) plus 10 rectangular ocean regions (see Fig. 3 and Table 2).We assume that each region is isotropic in isolation, which is reasonable because the size of the regions are typically no more than a few thousand kilometers in each direction.Each region was designated as equatorial, mid-latitude north, mid-latitude south, or high-latitude north (labeled Equat, Mid-N, Mid-S, and North, respectively."North" because the only high-latitude land region in the southern hemisphere, Antarctica, is not included in this study due to poor data coverage).Each region's name, latitude band designation, boundaries, and size, is summa- rized in Table 2.The ocean regions were only included in the exploratory data analysis phase to check our method.

Mean filtering
Before modeling the spatial dependence structure of the two climate variables, we first filter the mean structure of each of the two fields, temperature and cube-root transformed precipitation, for each region separately, using simple linear regression.We write, where Y is latitude and E elevation.We assume that the residuals are normally distributed with mean of zero.
To choose the appropriate mean structure, in addition to Eq. ( 1), we considered a variety of predictors, specifically longitude X, as well as quadratic interaction terms such as XY, XE, Y E, X 2 , Y 2 , etc. Figure 4 shows the residuals for four different regression models for cube-root precipitation (dashed black line) and temperature (solid gray line) for the  eastern North America (ENA) region in boreal winter (DJF) for the observational proxy data.The abscissa is the index, which is arranged in scan order from the western to eastern boundaries of the region.Vertical lines denote indices where the latitude jumps +2.5 • and longitude jumps back to the west boundary for a new scan back to the east.The increasing width between vertical bars is due to the triangular shape of ENA.As a result, the latitude dependence can be seen grossly across the graph, while longitude dependence appears within each subsection.Clearly both temperature and precipitation decrease with latitude and, at least for southern strips, temperature rises with longitude (between vertical lines).Toward the north end of this region, positive temperature excursions are larger from west to east.These figures imply that a simple mean subtraction as in Eq. ( 1), without higher-order terms of X, Y , and E, is inadequate.Physically it makes sense to subtract the linear elevation (lapse rate) and latitude dependence (solar flux), leaving the relevant secondorder structure for the covariance estimation procedure.We checked figures similar to Fig. 4 for all regions for both observational proxy data and GCMs and, generally, regardless of which mean structure regression was used for filtering, beyond that chosen Eq. ( 1), the remaining signals show very similar second-order structure.
Figure 5 shows the results of the mean field filtering process.Each plot is a specific coefficient from Eq. ( 1), α 0 , α 1 , . ..β 2 , for each GCM as well as observation proxies.The abscissa is the region number (1 = ALA, . .., 21 = SSA).The elevation coefficients have the least agreement of the three coefficients, but generally agree well between NCEP/GPCP and GCMs.Note that only land regions are included in this analysis, principally because the goal of this study is to look at those regions defined in the Giorgi and Francisco (2000) study.
Figures 6 and 7 show the residual fields for five regions: Alaska, western, central, and eastern North America, plus Greenland.The black lines in these figures delineate the boundaries of these five regions.Figure 6 shows boreal summer (JJA) values while Fig. 7 shows boreal winter (DJF) values.Figure 6a and b show the residual temperature and precipitation fields for GFDL, while Fig. 6c and d show those for observation proxies (NCEP/NCAR-GPCP).GFDL was chosen for this example because it has the same spatial resolution as the observation proxies.The other GCMs have very similar residual plots.

Covariance model
We denote bivariate data consisting of the residuals from Eq. (1) at location s as Z(s) = (Z T (s), Z P (s)).Such bivariate data are assumed to be isotropic, that is, where h .= ||h|| is the distance between the two locations, s and s + h, and i, j = T or P .The covariance models, M ij , are allowed to be different in each region to account for the fact that the spatial cross-dependence structure may vary over space on a large scale.
For modeling M ij we use the parsimonious bivariate Matérn covariance structure developed in Gneiting et al. (2010).The Matérn covariance function is widely used to characterize the covariance of an isotropic spatial field because of its flexibility (Stein, 1999).For a univariate field, Z, the Matérn covariance function can be written as where K ν (•) is the Bessel function of the second kind of order ν and (•) is the standard gamma function, a > 0, ν > 0.
The covariance parameters are the variance, σ 2 , smoothness, ν, and the inverse spatial scale, a, per kilometer.Gneiting et al. ( 2010) offer a bivariate version of the function in Eq. (3) in the following way.Marginal covariance of each field, Z T or Z P , is given by the Matérn function in Eq. (3).The cross-covariance of the two fields, Z T and Z P , is modeled as Here, ρ gives the spatially co-located correlation coefficient satisfying a complex condition related to a P , a T , a P T , ν P , ν T , and ν P T (see Theorem 3 of Gneiting et al., 2010) to guarantee a positive definite bivariate covariance function.
A parsimonious version of the bivariate Matérn function imposes a condition on the covariance parameters: a = a P = a T = a P T and ν P T = (ν P + ν T )/2.The condition on ρ reduces to |ρ| ≤ √ ν P ν T 1 2 (ν P +ν T ) (Gneiting et al., 2010).Therefore, the six covariance parameters to be estimated are σ 2 T , σ 2 P , a, ν T , ν P , and ρ.We use a maximum likelihood estimation method to estimate these parameters (refer to the Appendix for details on this procedure).We compute the asymptotic standard error for each parameter estimate, and Table 3 shows those for just one region as an example.Western North America (WNA) was chosen for this example because it is an intermediate sized region.Tables S1-S21 of the Supplement contain point estimates and asymptotic standard errors for all land regions for both seasons.Gneiting et al. (2010) argue that the assumption of common range parameter for the parsimonious version is not restrictive and may even be preferred due to the difficulty in estimating some of the parameters in Matérn class.In our case, it is not unreasonable to assume that temperature and precipitation have similar spatial scales.
Co-located correlation is the spatial correlation between the precipitation and temperature fields after having been averaged over time, which is fundamentally distinct from the more commonly computed temporal correlation at each location (as in Trenberth and Shea, 2005, Adler et al., 2008, Tebaldi and Sansó, 2009, or Wu et al., 2013).As such, several observations are in order.First, this correlation can be computed given just one realization.Temporal correlation requires multiple time points to determine the extent to which the two fields correlate over time at each point in space.Second, the spatial cross-correlation coefficient can be thought of as quantifying the degree to which the residuals of the two fields share the same spatial pattern.The distinction, in terms of interpretation, is that temporal correlation tells us how the two fields compare as time unfolds for each point in space, while spatial correlation tells us how the two time-averaged fields "unfold" in space.Since we are assuming isotropic fields for each region separately, the direction separating two points is ignored, only the distance, or spatial lag, matters.We emphasize this because, while our results share features with previous studies involving temporal correlations, they also differ in important ways.

Results
Many of the results are presented using box plots of point estimates; Figs.8-10 show the median as a dark center line with a box running from the first to third quartile, and whiskers extending out to the farthest data point that is no more than 1.5 times the box height (the interquartile range) from the box.In all cases, outliers (defined as points that fall outside of the end points of whiskers) are not displayed to rewww.adv-stat-clim-meteorol-oceanogr.net/1/29/2015/duce clutter.Typically 2-6 % of the point estimates are identified as outliers, and they appear for each of the six parameters.There are more outliers in Equatorial and Mid-North latitude bands than Mid-South, and North bands because there are more Mid-North and Equatorial regions.All box plots include only point estimates for land regions aggregated across latitude bands and sources (either NCEP/GPCP alone or all eight GCMs).Note that oceans are not included in the aggregated data.For Figs. 9 and 10, the variation within each box is due to the number of regions within a particular latitude band (5, 9, 3, and 4 points for Equatorial, Mid-N, Mid-S, and North, respectively).For Fig. 8, the dark gray NCEP/GPCP box contains this same variation, but the light gray box contains the variation over all eight GCMs and regions (40, 72, 24, and 32 points for Equatorial, Mid-N, Mid-S, and North, respectively).Note that box plots for σ T have a logarithmic ordinate axis.The pattern of both observed and modeled values shows the recognized pattern that there is very little temperature variation in the tropics throughout the year and for any latitude during the summer months, whereas there is much more temperature variation for mid-latitude and high-latitude locations during the winter months compared to summer months due to mid-to high-latitude storms (G.R. North, personal communication, 2014).This pattern is appropriately reversed for mid-latitude Southern Hemisphere regions (SAF, SSA, AUS), which show larger variance in summer (DJF) than winter (JJA) for both reanalysis data and GCMs.
Figure 9a and b give the estimates of σ T versus latitude band for reanalysis data (in red) and each GCM separately.The variation within each box is due to the multiple regions within a latitude band and multiple realizations within GCMs.Note that for most models, the variance increases with latitude during winter, but not nearly as much during the summer.The distribution of σ T values for GEOS and MIROC, especially during boreal winter, are much more spread than other models.Equatorial regions consistently give smaller variance during DJF than JJA.Generally, the GCM models tend to overestimate the variability somewhat, particularly for high-latitude JJA.This pattern is expected because the tropics have larger rainfall, especially during the summer, and high-latitude sites have limited precipitation due to reduced water holding capacity of air at lower temperatures (Trenberth and Shea, 2005).Mid-S regions break this pattern, however, because they show nearly as much variance as equatorial JJA.GCMs and GPCP data differ more for Mid-S regions than other latitude bands for both JJA and DJF, suggesting that GCMs differ from one another more in mid-latitude southern land areas than in northern and equatorial land regions.For Mid-S JJA and North DJF, GCMs underestimate precipitation variance, consistent with Zhang et al. (2007), but models overestimate variance for DJF Equat, Mid-N, and Mid-S, with all other combinations essentially equal.In all cases except Mid-S, the spread of parameter estimates overlap well.
To explore precipitation variance for individual institute's GCMs, Fig. 9c and d show the estimates of σ P for GPCP data, (in red) and each institute's GCMs separately by latitude band.This shows the same patterns as Fig. 8c and  d.Again, the largest discrepancies are for Mid-S summer (DJF), where all of the models except HAD and MPI overestimate the precipitation variance, and Mid-S winter (JJA), where BCC, HAD, and MPI underestimate, while MIROC and possibly GEOS and NCAR overestimate the precipitation variance.

Temperature smoothness, ν T
When fit simultaneously with precipitation residuals using a common spatial scale, the NCEP temperature residual field tends to have a smoothness coefficient of about 1.0, consistent with results of previous studies (North et al., 2011;Jun, 2011).Because we constrain the spatial scale to be the same for temperature and precipitation, the smoothness coefficients not only characterize the traditional concept of smoothness, but also any true spatial-scale difference be- tween the fields.Figure 8e and f show reanalysis data substantially smoother in the tropics in JJA and southern summer mid-latitudes (Mid-S DJF). Figure 10a and b show individual GCM values by region and season.BCC and MIROC tend to have the smoothest fields, while GEOS and HAD the roughest fields.
Because each GCM is evaluated at its native resolution (Table 1), we tested the dependence of this smoothness estimate on the grid resolution.We tested for, and failed to see, association between temperature smoothness and the resolution of the GCM.Large smoothness values occur for coarse models such as BCC as well as fine-gridded models such as MIROC, and vice versa.An additional test was run using GEMS at its full resolution, 288 × 192, versus the same model at one-quarter its native resolution, 144×96, with little change in final parameter estimates for most cases.

Precipitation smoothness, ν P
The estimates for the smoothness parameter of the precipitation field in Fig. 8g and h show excellent agreement between GPCP and GCMs, with roughness generally increasing with latitude.Figure 10c and d show the same trends by individual model.BCC has much more variation and generally smoother fields, while GEOS and HAD show smaller smoothness coefficients.Again, as for temperature smoothness, grid resolution does not seem to be associated with precipitation smoothness.
Comparing the smoothness coefficients for temperature and precipitation, those for precipitation are mostly larger than those for temperature, consistent with results of Jun (2011) and Jun (2014).Only for observational proxy data in southern mid-latitudes DJF and high-latitude DJF is temperature clearly smoother, and the two are essentially equal for proxy data in equatorial JJA and GCMs for high-latitude JJA and DJF.Both coefficients are probably biased upward somewhat, a conclusion based on a simulation study of the parsimonious bivariate Matérn model (Apanasovich et al., 2012).Where we fix the spatial scale and allow both smoothness coefficients to be fitted, Kleiber and Nychka (2012) fixed both temperature smoothness and precipitation smoothness values to 2.0, consistent with the range of values estimated here.However, the specific values are not our principle interest, but rather the comparison between estimates for GCMs and those for observational proxies.With that in mind, GCMs tend to predict less smooth fields for both temperature and precipitation in the tropics and also for southern mid-latitude summer (DJF).

Co-located cross-correlation, ρ
Estimates for the co-located cross-correlation parameter between precipitation and temperature fields are given in Fig. 8i  and j, and for GCM-specific plots in Fig. 10e and f.Values for proxy data and models agree very well for high-latitude regions in both seasons.Only GEOS and HAD fail to capture the correct sign for this latitude band.Equatorial regions have good agreement in JJA and fair agreement for DJF.All GCMs tend to predict negative correlations over land for both seasons except high-latitude winter.Observational proxies differ most dramatically from the GCM predictions for midlatitudes, especially during winter (DJF for Mid-N and JJA for Mid-S).Of the models considered, only a few generate positive correlations in mid-latitudes: BCC, GEMS, HAD, MIROC and, to a limited extent, GFDL and MPI.
Maps of these correlation estimates are shown for observational proxy data and two GCMs (GFDL and NCAR) in www.adv-stat-clim-meteorol-oceanogr.net/1/29/2015/   , S2, and S3 of the Supplement.Most GCMs agree with proxy data for both JJA and DJF for Alaska (ALA), northern Europe (NEU), northern Asia (NAS), Southeast Asia (SEA), Sahara (SAH), Australia (AUS), and India (SAS).The Mediterranean (MED), Caspian (CAS), and Amazon (AMZ) agree only for JJA, but not DJF.The remaining regions, including ocean regions, do not match well between models and observational proxies.
The values for the maps in Fig. 11 may be visualized by comparing temperature residuals and precipitation residuals from pairs of maps from Fig. 6 or 7. Consider, for example, the Greenland region (GRL) for GFDL DJF, Fig. 11d, which has positive temperature residuals (Fig. 7a) where its precipitation residuals are also positive (Fig. 7b) and negative temperature residuals where its precipitation residuals are negative, indicating a positive spatial cross-correlation.Note though that the corresponding case of observation proxies for DJF (Fig. 7c, d) leads to a small negative cross-correlation for the GRL region due to the competing effects of same-sign residuals with opposite-sign residuals (including a few large residuals which have actual values beyond those plotted, but marked with an "X").Many of the pixel pairs in this region indicate positive cross-correlations, but there are pixel pairs -in Saskatchewan, in the southern tip of Greenland, in the northern half of Qaasuitsup, and eastern Baffin Island -that appear negatively correlated.The net result is a small negative spatial cross-correlation for this region, indicated by the negative (blue) region in Fig. 11b.These values are tabulated in Table S5.

Correlation length, 1/a
Finally, the estimated correlation length, which is constrained to be the same for both fields' covariance as well as the cross-covariance function, ranges from about 200 to 2000 km (see Figs. 8k and l and 9e and f), which is somewhat smaller than North et al. (2011) reports, but comparable to values from Jun (2011).

Discussion
We present an approach for statistical models of the joint distribution of temperature and precipitation accounting for spatial dependence structure.Using a parsimonious bivariate covariance model, we compute spatial coefficients describing variation and smoothness for each field individually, as well as spatial scale and co-located correlation between the fields.Our results generally agree with previous work with several notable exceptions.Temperature variance increases with latitude in boreal winter while remaining small in summer; GCMs track these well, though with some tendency to overestimate the variance of reanalysis data.Precipitation variance decreases with increasing latitude and GCMs match observed values well except for southern mid-latitudes.For DJF GPCP data, precipitation variance seems to increase slightly with latitude, but decreases with latitude for GCMs.It is not clear for any latitude band or season that GCMs consistently underestimate or overestimate this variance.Smoothness estimates for temperature fields tend to be smaller than corresponding smoothness estimates for precipitation, and both tend to decrease with increasing latitude.Observational proxies, particularly in the tropics, tend to have a wider range, and smoother fields, than GCMs.Smoothness estimates for precipitation are almost entirely larger than those for temperature, both for proxies and all models collectively.For high latitudes, GCMs are collectively estimated to have about the same smoothness for precipitation and temperature.Overall, GCMs tend to predict smoothness coefficients near those predicted by observation proxies except in the tropics, where GCMs tend to predict less smooth fields for both temperature and precipitation.
Co-located cross-correlations from GCMs largely agree among models but differ from estimates from observation proxies in many regions, particularly in equatorial and midlatitude zones.Temporal correlations are sensitive to any mean, seasonal variation, or trend that may not have been fully removed and, to some extent, this is true for spatial correlation estimates, though we have worked hard to minimize this.Since the aim of this study is to validate GCMs against observational proxy data, having the "right" mean structure model is not the dominant concern, but rather the consistent application across all GCMs and proxy data.Given this perspective, it is clear that GCMs are not fully consistent with one another or with proxies, except for a few regions.Exploration of this aspect of this modeling effort deserves additional attention, as well as a more complete treatment of all ocean regions.
To improve the latitudinal resolution and minimize the effects of specific mean structure filters, it may be possible to redefine spatial regions to smaller, more geographically appropriate grows.Using some form of non-stationary covariance formulation would be a more elegant way than regional "chunking" to address this problem.
Several parameter estimates for equatorial zones show differences between JJA and DJF, notably σ T andσ P , but also the smoothness coefficients, ν T , ν P , and possibly the crosscorrelation, ρ.While this may be due to noise, we think that because there are significant annual cycles in the tropics (Xie, 1994;DeWitt and Schneider, 2010;Yu et al., 2013) and the fact that we consider land-only regions, it seems likely that annual temperature and precipitation effects may arise.We are not aware of any work specifically on landonly equatorial-region annual cycle effects, but our statistical method identifies these.
Finally, the statistical methods used here for two fields can be naturally extended to multivariate climate variables, for example, to quantify the cross-correlations of more than two variables, accounting for their spatial dependence.

Figure 5 .
Figure 5. Mean filtering coefficients by region number (land regions only) for temperature intercept (a) JJA and (b) DJF, precipitation intercept (c) JJA and (d) DJF.Temperature latitude coefficient (e) JJA and (f) DJF, and precipitation latitude coefficient (g) JJA and (h) DJF.Temperature elevation coefficient (i) JJA and (j) DJF, and precipitation elevation coefficient (k) JJA and (l) DJF.Plots include results from NCEP/GPCP plus all eight GCMs as separate lines.

Figure 7 .
Figure 7. North America and Greenland DJF residuals: GFDL for (a) temperature and (b) 1000 (precipitation) 1/3 , and from NCEP/GPCP for (c) temperature and (d) 1000 (precipitation) 1/3 .In (c) and (d) there are a few pixels marked with an "X" with large residuals which have actual values beyond those plotted.

4. 1 T
Figure 8a and b show the estimates of σ T versus latitude band for observational proxy data (NCEP/GPCP) and GCMs over land only.Note that box plots for σ T have a logarithmic ordinate axis.The pattern of both observed and modeled values shows the recognized pattern that there is very little temperature variation in the tropics throughout the year and for any latitude during the summer months, whereas there is much more temperature variation for mid-latitude and high-latitude locations during the winter months compared to summer months

4. 2 P
Figure8c and dshow the estimates of σ P for observations, GPCP data and GCMs by latitude band.Models and GPCP data follow the same basic pattern, where precipitation variance decreases with increasing latitude during the summer.This pattern is expected because the tropics have larger rainfall, especially during the summer, and high-latitude sites have limited precipitation due to reduced water holding capacity of air at lower temperatures(Trenberth and Shea, 2005).Mid-S regions break this pattern, however, because

Figure 9 .
Figure 9. Parameter estimates by source for each season by latitude band, for JJA (left column) and DJF (right column).(a) JJA and (b) DJF for σT , (c) JJA and (d) DJF for σP , and (e) JJA and (f) DJF for 1/ â (in log scale).

Figure 10 .
Figure 10.Parameter estimates by source for each season by latitude band, for JJA (left column) and DJF (right column).(a) JJA and (b) DJF for νT , (c) JJA and (d) DJF for νP , and (e) JJA and (f) DJF for ρ.

Fig. 11 .
Fig. 11.The maps of these, plus the other six GCMs' correlation estimates are shown in Figs.S1, S2, and S3 of the Supplement.Most GCMs agree with proxy data for both JJA and DJF for Alaska (ALA), northern Europe (NEU), northern Asia (NAS), Southeast Asia (SEA), Sahara (SAH), Australia (AUS), and India (SAS).The Mediterranean (MED), Caspian (CAS), and Amazon (AMZ) agree only for JJA, but not DJF.The remaining regions, including ocean regions, do not match well between models and observational proxies.The values for the maps in Fig.11may be visualized by comparing temperature residuals and precipitation residuals from pairs of maps from Fig.6or 7. Consider, for example, the Greenland region (GRL) for GFDL DJF, Fig.11d,

Table 1 .
Names of modeling institutes and sources for observational proxy-data data.