Analysis of Variability of Tropical Pacific Sea Surface Temperatures

Sea surface temperature (SST) in the Pacific Ocean is a key component of many global climate models and the El Niño Southern Oscillation (ENSO) phenomenon. We shall analyse SST for the period November 1981 – December 2014. To study the temporal variability of the ENSO phenomenon, we have selected a subregion of the tropical Pacific Ocean, namely the Niño 3.4 region, as it is thought to be the area where SST anomalies indicate most clearly ENSO’s influence on the global atmosphere. SST anomalies, obtained by subtracting the appropriate monthly averages from the data, are the focus of the 5 majority of previous analyses of the Pacific and other oceans’ SSTs. Preliminary data analysis showed that not only Niño 3.4 spatial means but also Niño 3.4 spatial variances varied with month of the year. In this article, we conduct an analysis of the raw SST data and introduce diagnostic plots (here, plots of variability versus central tendency). These plots show strong negative dependence between the spatial standard deviation and the spatial mean. Outliers are present, so we consider robust regression to obtain intercept and slope estimates for the twelve individual months and for all-months-combined. Based on this 10 mean-standard deviation relationship, we define a variance-stabilising transformation. On the transformed scale, we describe the Niño 3.4 SST time series with a statistical model that is linear, heteroskedastic, and dynamical.


Introduction
Sea surface temperature (SST) is an important component of many global climate models.The thermal inertia of the oceanic surface layer means the air-sea interaction that occurs at the surface of the ocean is important to global temperature models and prediction.Monthly SST datasets are a combination of satellite, ship, and buoy observations.Typically, these are interpolated to produce a cohesive dataset, such as can be found on the NOAA (National Oceanic and Atmospheric Administration) website (NOAA, 2016).SST anomalies rather than raw SST data are often analysed; SST anomalies are determined by calculating the long-term (usually 30-year) temporal mean for each month and then subtracting the appropriate monthly average from each data point.For an illustration, the SST data (at a resolution of 1 • × 1 • ) and the corresponding anomalies for January 1983 in the tropical Pacific Ocean are shown in Fig. 1, where the climatology base period is 1971-2000.SST in the tropical Pacific Ocean is a key component of the El Niño-Southern Oscillation (ENSO) phenomenon (Alexander et al., 2002;Bjerknes, 1969).Furthermore, tropical SST anomalies have their greatest effect in the western Pacific Ocean, where SST is normally higher than the global average SST (Holton, 2004).
The term El Niño was originally used to refer to the upwelling of warm water in the Pacific Ocean off the South American coast.However, the term is now used to describe a broader range of interconnected oceanic and atmospheric effects.ENSO describes the distribution of warmer-thanaverage waters in the tropical Pacific Ocean, the associated atmospheric variations, and the resulting weather conditions.ENSO has two canonical states or regimes, an El Niño event (warmer eastern tropical Pacific) and a La Niña event (cooler eastern tropical Pacific).There are also periods where the ocean is in a transition phase between these two states, which is referred to as the neutral phase (Holton, 2004).During an El Niño event, the trade winds (the prevailing pattern of easterly tropical surface winds) are weaker and the warm water in the western tropical Pacific Ocean spreads eastwards into the central and eastern Pacific Ocean (Cai et al., 2014;McPhaden, 2004).There is an increase in dry conditions (reduced precipitation) in Australia, particularly in the east and across Indonesia, New Guinea, Micronesia, Fiji, New Caledonia, and Hawaii (Pui et al., 2012;Ropelewski and Halpert, 1987).Concordantly, there is an increase in precipitation in the central Pacific Ocean, the western and southern United States, Central America, and South America (Rasmusson and Carpenter, 1982;Ropelewski andHalpert, 1986, 1987).El Niño is also associated with anticyclonic anomalous flow in the upper troposphere on either side of the Equator (Arkin, 1982).
During a La Niña event, the trade winds are stronger, the thermocline across the Pacific Ocean gets steeper, and the western tropical Pacific Ocean has warmer-than-normal SSTs (McPhaden, 2004).There is increased rainfall in Australia, and La Niña events have been correlated with an increased number of tropical cyclones during the cyclone season (Chand et al., 2013;Ropelewski and Halpert, 1989).In the south-western United States, a La Niña event is typically associated with unusually dry conditions (Cole et al., 2002).
In the neutral phase, there is low surface pressure in the western tropical Pacific over the warm ocean around Indonesia, high surface pressure in the eastern tropical Pacific, and the trade winds blow across the tropical Pacific Ocean (Holton, 2004).
Although ENSO events are characterized in the tropical Pacific Ocean, the global atmosphere and oceans are highly interconnected and, thus, climatic effects in regions of the world outside the Pacific can be correlated with ENSO phases.For example, the ENSO phenomenon teleconnects with precipitation during the monsoon season in India, the rainy season in south-eastern Africa (Ropelewski andHalpert, 1987, 1989), and global precipitation (Dai and Wigley, 2000;Dai et al., 1998;Shi et al., 2002).Consequently, El Niño and La Niña events have worldwide implications, and their accurate forecasting is an extremely important problem that would benefit from further statistical analysis.
Characterizing the ENSO phenomenon's temporal variability has, in the past, concentrated on SST anomalies.In this article, we show that this strategy misses a fundamental spatial mean-variability relationship that can guide us to modelling and forecasting SST on a different scale, where the issues associated with forecasting El Niño and La Niña events are more transparent.Section 2 presents the dataset we analyse.In Sect.3, we explore the mean-standard deviation dependence of tropical Pacific SST data through simple linear regression.This identifies a need for uncertainty quantification of the regression slope estimates, which is introduced in Sect. 4. In Sect.5, a variance-stabilizing transform is defined from the mean-standard deviation dependence, for all-months-combined and for each month individually.In Sect.6, a statistical model is built on the transformed scale, where an autoregressive process is fitted.Conclusions and a discussion of implications from this study of SST variability are given in Sect.7, and an appendix summarizes a bootstrap algorithm that we use for uncertainty quantification.

Dataset
The dataset we shall analyse is a subset of the global monthly SST from the Climate Modelling Branch (CMB) of NOAA using the Reynolds and Smith optimum interpolation version 2 algorithm (Reynolds et al., 2002;NOAA, 2016).In this article, we consider data from November 1981 to December 2014, inclusive.It is called "optimum interpolation version 2" because in November 2001 the fields were recalculated from November 1981 onwards.Optimum interpolation is the method by which irregularly spaced observations in time and space are combined to form a cohesive dataset (Reynolds, 1988;Reynolds and Marsico, 1993;Reynolds et al., 2002).
The optimum interpolation analysis is produced weekly using in situ (buoys and ships) and bias-corrected satellite datasets, combined with SST estimations based on sea-ice cover.The buoy observations are from both moored and drifting buoys, and they are considered to be more accurate than ship observations (Reynolds et al., 2002).Satellite measurements of SST have been obtained from geostationary and polar low-earth-orbiting platforms, using the advanced very high-resolution radiometer (AVHRR) and advanced microwave scanning radiometer (AMSR) instru-ments.The satellite observations measure approximately the top millimetre of the ocean, while the ship and buoy measurements measure the top few metres (Dash et al., 2012;Deser et al., 2010;Reynolds, 1988;Reynolds and Marsico, 1993;Reynolds et al., 2002).
To obtain monthly SST fields, the weekly fields are linearly downscaled to daily fields, and then the daily fields are averaged for the month.The monthly SST data are defined on a 1 • × 1 • latitude-longitude grid and are in units of degrees Celsius.

Spatial region
To study the temporal variability of the ENSO phenomenon, we have selected a subregion of the tropical Pacific Ocean, namely the Niño 3.4 region.The Niño 3.4 region combines part of each of the Niño 3 region (5 • S-5 • N and 90-150 • W) and the Niño 4 region (5 • S-5 • N and 160 • E-150 • W).It is defined as the region in the latitude range, 5 • S to 5 • N, and the longitude range, 120 to 170 • W (Barnston et al., 1997); see Fig. 1.We chose the Niño 3.4 region for our analysis as it is widely used in tropical Pacific SST studies, and it is thought to be the area where SST anomalies indicate most clearly ENSO's influence on the global atmosphere (Cane et al., 1997).
In what follows, the ith grid cell in the Niño 3.4 region is referenced by its centroid s i .These latitude-longitude co-ordinates are zonally and meridionally one degree apart and take half-degree values.The spatial region of interest (Niño 3.4 region) is made up of 10 × 50 = 500 ocean pixels, namely D s = {s 1 , . .., s 500 }.

Temporal period
We analyse the SST dataset for the period November 1981-December 2014.Thus, the temporal period of interest is D t = {1, . .., 398}, where each t corresponds to a time period of 1 month.For some of our analyses, we subset D t into D Jan t , D Feb t , . .., D Dec t , corresponding to the months January, February, . . ., December, respectively; in this case, t = 1 corresponds to the first year (November 1981-October 1982), t = 2 corresponds to the second year (November 1982-October 1983), and so forth.

Mean-standard deviation dependence in the Niño 3.4 region
The majority of previous analyses of Pacific SST focus on the SST anomalies, obtained by subtracting the appropriate monthly averages from the data.This removes seasonal effects, but some of our preliminary data analysis showed that not only spatial means but also spatial variances varied with month of the year.We conjectured that the monthly spatial variances might be related to the monthly spatial means, which led to this study.In this article, we conduct a spatiotemporal analysis of the raw SST data and introduce diagnostic plots (here, plots of spatial variability vs. spatial central tendency) for all-months-combined and for each of the 12 months separately.By going directly to anomalies, important nonlinear behaviour in the raw data may be overlooked.

Spatial mean and spatial standard deviation
Define the spatial mean of a spatio-temporal dataset {z(s, t) : s ∈ D s , t ∈ D t } for each time as where |D s | denotes the number of pixels (here, 500) in D s .Define the (unbiased) spatial variance of the data {z(s, t) : s ∈ D s , t ∈ D t } for each time as Define the spatial standard deviation as the square root of the spatial variance and denote it as S z (t).

Diagnostic plot of mean-standard deviation dependence
We plotted the spatial standard deviation vs. the spatial mean in the Niño 3.4 region for all t ∈ D t ; see Fig. 2.There is an apparent negative trend; that is, when the Niño 3.4 region has a higher spatial mean, it has a lower spatial standard deviation.This negative trend can also be seen in other regions of the tropical Pacific Ocean, for example the Niño 1+2 region off the coast of Peru.Recall from Sect. 2 the definitions of the monthly temporal indices, D Jan t , D Feb t , . .., D Dec t .We repeated the plot of the spatial standard deviation vs. the spatial mean in the Niño 3.4 region given in Fig. 2, but we used different colours to represent the different seasons; see Fig. 3.While there is a range of points within each of the seasons, there does appear to be substantial between-season variability.For example, the small-spatial-mean and small-spatial-standard-deviation region of the figure is predominantly austral summer months (i.e.December, January, and February) and, in general, the austral spring months (i.e.September, October, and November) have higher spatial standard deviations.The straight lines fitted for each season are obtained using the robust methodology described in Sect.3.4.
To understand this variability, we plotted the spatial standard deviation vs. the spatial mean in the Niño 3.4 region for all t ∈ D Jan t , t ∈ D Feb t , . .., t ∈ D Dec t ; see the 12 plots in Fig. 4. Combining some of the months together would increase the number of observations in each group, potentially improving the power of hypothesis tests we intend to apply (Sect.4).However, care is needed not to introduce bias by combining months that are dissimilar.Further, we would lose the opportunity to see monthly behaviours and their dependencies on other environmental factors.
The negative slope identified in Fig. 2 also appears in Fig. 4 where the data are separated into the 12 individual months.The straight lines fitted for each month were obtained using the robust methodology described in Sect.3.4.If the slope of any line is significantly different from zero, this could indicate that tropical Pacific Ocean SSTs should be transformed at certain times of the year to a different scale, where there is no mean-standard deviation relationship.We have not been able to find any mention in the literature of the relationships apparent in Figs.2-4.In particular, Fig. 4 shows a consistent negative slope throughout the months of the year, albeit different in strength from month-to-month.The cause could be coupled dynamics with other ocean and atmospheric variables (e.g.ocean currents or surface winds), which manifests as a mean-standard deviation relationship.
It is generally accepted that the ENSO phenomenon is nonlinear (e.g.Hoerling et al., 1997;Berliner et al., 2000;Kondrashov et al., 2005), and it would appear that the relationships shown in Fig. 4 represent another way to describe this.Nevertheless, the goal of this paper is to describe rather than to explain the dependence between the spatial mean and the spatial standard deviation.Our description (Sect.5) has implications for subsequent statistical modelling and forecasting of Pacific SSTs (Sect.6).
To obtain the slope of the linear relationships by month (Fig. 4), we could have used an ordinary-least-squares (OLS) estimator, but because of apparent outliers, we used a robust estimator.In what follows, we motivate robust estimation for simple linear regression, from an alternative expression for OLS estimation of the straight line's intercept and slope.

Ordinary least squares for simple linear regression
The simple-linear-regression equation based on n data, (x 1 , y 1 ), . .., (x n , y n ), is typically written as where y i represents the dependent variable, x i is the explanatory variable, and i is the error with mean 0 and variance σ 2 , for i = 1, . .., n.Estimates for α and β, namely α and β, can be derived by minimizing the sum of squares of the residuals, and their formulas are well known.Cressie and Keightley (1981) show that the same estimates can be obtained alternatively, by calculating a weighted average of the slopes, s i,j = y i −y j x i −x j , between all points (x i , y i ) and (x j , y j ), where x i = x j .That is, the OLS estimate of the slope, β, can be written alternatively as the weighted average, where the weights are w i,j = (x i − x j ) 2 , as suggested by the Gauss-Markov theorem.(For the case where x i = x j , then w i,j = 0, and that term in the sum shown in the numerator of Eq. ( 4) is interpreted as zero.) The OLS estimate of the intercept, α, can also be written as a trivial weighted average involving all pairs (x i , y i ) and (x j , y j ).Define weights w i,j = 1; then the OLS estimate of the intercept can be written alternatively as . (5)

Robust regression
Detecting and rejecting outliers can involve subjectivity or difficult simultaneous inferences, but robust methods provide an automatic way to deal with them.Most robust-regression methods minimize some function of the residuals.However, here we use the alternative expressions for the OLS estimates of slope and intercept given by Eqs. ( 4) and ( 5), respectively, to implement robust regression.

Theil-Sen method
Recall from Sect.3.3 the definition of the pairwise slopes, {s i,j }.It can be seen that E(s i,j ) = β for all i, j .Hence, a natural estimator of the slope β in Eq. ( 3) is One way to robustify this estimate of the slope would be to replace the average with the median.Hence consider βUTS ≡ median y i − y j x i − x j : which is known as the unweighted Theil-Sen estimator (Sen, 1968;Theil, 1992).The estimate, βUTS , is an unbiased estimator of the true slope, β, of the simple linear regression of y on x given by Eq. ( 3) (Cressie and Keightley, 1981;Wang and Yu, 2005).

Weighted Theil-Sen method
The estimate βUTS can be made more efficient by using weights in the median in Eq. ( 7).The weighted median as an estimate of a location parameter was first proposed by Edgeworth (1888).Given observations a 1 , . .., a n , attach weights w 1 , . .., w n , where w k ≥ 0 and k w k > 0 (Scholz, 1978).
Then a distribution function G(b) can be defined as follows: where 1 [a i ≤b] is the indicator function such that 1  Consider βWTS = weighted median{s ij : x i < x j ; w} for given weights w = {w i,j }.This estimator is called the weighted Theil-Sen estimator.Sievers (1978), Scholz (1978), and Cressie and Keightley (1981) suggest weighting by w ij = |x i − x j |.Cressie (1980) showed this to be an asymptotically optimal choice, giving more weight to the slopes between points that are further apart in the explanatory variable.Note that if w i,j = 1, then the unweighted Theil-Sen estimator is obtained.
Henceforth, we shall use βWTS = weighted median where w = {|x i − x j | : i < j }.The formula for the WTS estimator given by Eq. ( 10) should be compared to that of the OLS estimator given by Eq. ( 4), to see how the robustification takes place.The corresponding robust intercept estimator using the weighted Theil-Sen method is an unweighted median, namely (Cressie and Keightley, 1981): The formula for the WTS intercept estimator given by Eq. ( 11) should be compared to that of the OLS estimator given by Eq. ( 5), to see how the robustification takes place.

Statistical properties of weighted Theil-Sen (WTS) estimates
In the rest of this subsection, we summarize the statistical theory associated with WTS estimation.

Variance
From Cressie and Keightley (1981), and var( αWTS where g is the probability density function of the errors { i }; I (g) 2 = g 2 (y) dy 2 = 1/(12σ 2 ), for g a normal density with mean µ and variance σ 2 ; ξ (x) is defined as the sum of squared deviations of x ≡ {x i : i = 1, . .., n} about its mean: Note that the formula given by Cressie and Keightley (1981) for the asymptotic variance of βWTS contains a typographic error, which is corrected in Eq. ( 12).

Hypothesis testing
In the regression setting, it is common to test for the dependence of y on x through the slope β being non-zero, so the null hypothesis is that of no dependence.That is, the null hypothesis is H 0 : β = 0, vs. the alternative hypothesis, H 1 : A commonly used test statistic for this hypothesis test is where β and σ are the usual estimators obtained from OLS.If the errors are assumed to have a Gaussian distribution (i.e.i has probability density, g( ) = (2π σ 2 ) −1/2 exp(− 2 /2σ 2 ), for all i), then the test statistic T follows a t-distribution.
Since βWTS has a limiting normal distribution, we shall use the normal distribution based on Eq. ( 12) for WTS estimation.A robust estimate, σ , of σ is used (Sect.3.5).
For a chosen significance level of 5 % (say), we reject the null hypothesis if T > z 97.5 or if T < −z 97.5 .The value z 97.5 can be obtained by looking up the normal-distribution table, which can be found in many statistical software packages and books.These tables solve for z γ in the equation, Pr(z ≤ z γ ) = γ /100, where z is a standard normal random variable, and 0 ≤ γ ≤ 100.

Confidence intervals
A 95 % two-sided confidence interval for β can be calculated as where σ is defined below in Sect.3.5.Notice that the null hypothesis, H 0 , is rejected if zero is not contained in the confidence interval.

Weighted Theil-Sen (WTS) estimation in the diagnostic plots
In Fig. 4, we plotted the spatial standard deviation vs. the spatial mean in the Niño 3.4 region for all t ∈ D Jan t , t ∈ D Feb t , . .., t ∈ D Dec t , and superimposed the individual WTSestimated lines.Also, in Fig. 2, we plotted the spatial standard deviation vs. the spatial mean in the Niño 3.4 region for all t ∈ D t and superimposed the WTS-estimated line.Each of the estimated lines in the 12 individual plots and the allmonths-combined plot clearly have a negative slope.The WTS-estimated slopes and the associated 95 % confidence intervals, given by Eq. ( 17), for each month and all-monthscombined are listed in Table 1.The successive time points in each of our plots are 1 year apart and the autocorrelation values within each plot are small, so we are not concerned about temporal statistical dependence upsetting our inferences on the slopes of the straight lines.The presence of outliers (e.g.May 1988, which was the start of a La Niña event, and November-December 1982, which were months in the middle of an El Niño event) motivated our use of the WTS robust-regression method.It should be noted that the outliers do not contradict the observed linear relationship; rather, they appear to be extreme observations caused by unusual SST conditions.Implementing robust regression provides us with some assurance that our analysis of variability will not be dominated by the outliers.
The asymptotic variance of the WTS slope estimate was given in Sect.3.4, which depends on σ 2 ; see Eq. ( 12).To maintain robustness, we use the median absolute deviation (MAD) to estimate σ (Hampel, 1974): The square of the MAD, σ 2 , is a robust estimate of σ 2 (Rousseeuw and Croux, 1993).This estimate of σ was also used to calculate the asymptotic 95 % confidence intervals and p values from a two-sided normal test, given in Table 1.The p values show that January is the only month without significant linear dependence between the spatial standard deviation and the spatial mean.

Uncertainty quantification of slope estimates
The results given in Sect.3.5 make distributional assumptions to obtain the p values and confidence intervals.A resampling approach that is distribution free, called the bootstrap, was first developed by Efron (1979).In bootstrapping, the observed data are treated as a population from which samples are randomly drawn with replacement.The parameter or statistic of interest is then calculated from these resampled data, and the process is repeated a large number of times.Because of its nonparametric nature, it does not rely on Gaussian errors.We find this an attractive feature, and hence it is used for all subsequent uncertainty quantification.
Bootstrapping in regression can be carried out in at least two ways.Draw randomly with replacement from the (x, y) pairs, where x is the spatial mean and y is the spatial standard deviation, and then estimate the variance of the slope estimator from a large number of such resamples.Another bootstrap procedure, and the one we shall use, is to resample the residuals from the fitted model, which treats the x values as fixed and assumes that the errors are homoskedastic.In our case, the spatial mean, z(t), has much less variability than the spatial standard deviation, S z (t), and the number of pixels contributing to the spatial moments are the same for each t ∈ D t .Hence, the bootstrap assumptions for regression are not unreasonable in our case.The bootstrap algorithm we use is given in Appendix A.
We have calculated the standard deviation and the 95 % bootstrap percentile confidence intervals of the bootstrap replicates of the regression slope, β, using B = 1000 boot- .., 1000} were clearly unimodal for each month and for all-monthscombined (not shown).According to the 95 % bootstrap percentile confidence intervals for WTS estimation, most of the months individually (Fig. 5) and combined have non-zero slopes at the (point-wise) 5 % level of significance.That is, there is a significant mean-standard deviation dependence in the Niño 3.4 SST data.This is also apparent from Fig. 5, where the upper confidence envelope misses the zero line for most of the months and barely crosses it for January and February.The estimated slopes for each month are all negative, and the austral autumn and winter months (May, June, July, and August) have the steepest slopes.

Variance-stabilizing transformation
Consider a random variable X such that E(X 2 ) < ∞; then a variance-stabilizing transformation, f , satisfies for a constant c that does not depend on E (f (X)).The approximation in Eq. ( 19) relies on σ 2 x ≡ var(X) being small relative to µ x ≡ E(X).
The delta method can be used to identify a variancestabilizing transformation as follows.Assuming that f is twice differentiable, we can write a first-order Taylor-series expansion of f (x) about a real value ν as see, for example, Harding and Quinney (1986).Then put ν = µ x , and hence an approximation to the variance of f (X) is given by Let var(X) = σ 2 x be some function of the mean µ x , which we write as h(µ x ).To find a function f that satisfies Eq. ( 19), we rewrite Eq. ( 21) as follows: This implies that for any function h(µ x ), where 1/ √ h(µ x ) is integrable with respect to µ x , the relationship produces an f that satisfies (Eq.19); that is, f given by Eq. ( 23) is a variance-stabilizing transformation (Bartlett, 1947).In what follows, we are particularly interested in the case h(µ x ) = (α + βµ x ) 2 .In the Niño 3.4 region, the spatial standard deviation of SST has an approximately linear relationship with the spatial mean of SST.That is, If we now replace the empirical (i.e.spatial) moments of z with theoretical moments in Eq. ( 24), we obtain modulo an additive and multiplicative constant, and provided α + βx > 0. That is, in the domain α + βx > 0, Eq. ( 25) says that f (x) = log(α + βx) is approximately a variancestabilizing transformation.For the SST data, we write for constants α and β such that α + βz(s, t) > 0 for all {z(s, t)}.
The analysis we propose in this article is to obtain the slope β and the intercept α in Eq. ( 26) by WTS estimation; see Table 1 where, for all-months-combined, αWTS = 5.5559 and βWTS = −0.1692.For this choice, α + βz(s, t) > 0 for all {z(s, t)}.Then for all-months-combined we plotted the monthly spatial standard deviation vs. the monthly spatial mean of the transformed data {u(s, t) : s ∈ D s , t ∈ D t }; see Fig. 6 where a WTS-fitted line is also superimposed.The negative dependence seen previously in the raw data (Fig. 2) is no longer present in the all-months-combined plot, but the colours for each season suggest that there are still different behaviours between the seasons.
Because of the seasonal differences seen in Fig. 6, we should not be surprised that a single intercept and slope estimate in Eq. ( 26) does not remove the mean-standard deviation dependence in the individual months, January, February, . . ., December.In plots analogous to Figs. 4 and 5 for the transformed data, {u(s, t)}, the slopes still show a strong pattern, although they now oscillate around zero.The goal is to transform the data so that there is no mean-standard deviation dependence; that is, for each month (and for allmonths-combined), the confidence interval for the slope β should contain 0. An attempt was made to use what is known about the ENSO spring barrier to forecasting SSTs and El Niño/La Niña events (e.g.Lopez and Kirtman, 2014): the data from contiguous months either side of this barrier were combined, but a pattern of estimated slopes remained.Based on this data analysis (not shown), we concluded that each month has its own individual mean-standard deviation dependence that should be respected.
To illustrate the next stage of our analysis, we consider the month of January.The top left-hand panel of Fig. 4 shows a plot of y = S z (t) vs. x = z(t), for all t ∈ D Jan t .We used robust regression to obtain a WTS-fitted line, y = αJan + βJan x, where from Table 1, αJan = 2.117 and βJan = −0.04215.Following the transformation given by Eq. ( 26), but now just for the data in the month of January, we define for t ∈ D Jan t , u Jan (s, t) ≡ log αJan + βJan z(s, t) .
Similarly, we define u Feb (s, t) for t ∈ D Feb t , . . ., and u Dec (s, t) for t ∈ D Dec t .Note that all arguments of the log transformation in Eq. ( 27) were positive for January and all other months.Finally, we combine these individual-month definitions to define a transformation of all the data, {z(s, t)}, as v(s, t) ≡ u M (s, t) , for t ∈ M and M ∈ {Jan, Feb, . .., Dec} .(28) The analogous plot to Fig. 5, for the transformed data {v(s, t)}, is given in Fig. 7 which, using obvious notation, is based on v(t) and S v (t).Clearly, the transformation given in Eq. ( 28) has successfully removed the strong pattern in the estimated slopes, leaving a zero slope inside all 12 confidence intervals; see Table 2.With the mean-standard deviation dependence removed, it is now meaningful to look at the intercept estimates in Table 2.These show how the spatial standard deviations change from month-to-month, but they are now unconfounded with the level of warming or cooling in the Niño 3.4 region.In other words, on the transformed scale, the data {v(s, t)} show us the pure spatial variability of tropical Pacific SSTs by month.Like Fig. 7, which was for the estimated slope estimates based on the transformed data, Fig. 8 shows the estimated intercept estimates based on the transformed data with the associated point-wise bootstrap percentile confidence intervals.The ENSO spring barrier between April and May stands out.Here we recognize it as the month-to-month transition where the standard deviation increases the most.

Predicting future SSTs
In this section we fit time series models to {z(t)} and {v(t)} for the whole time period from November 1981 to December 2014.We then use hindcasting to compare the models' respective predictions to the observed in-sample SSTs.Because we hindcast in the same way for both models, the respective skills can be compared.First define the anomalies where z M ≡ t∈M z(t)/ t∈M 1.A small number of outliers were identified and removed.An autoregressive process of order 2, or AR(2), was identified using the partial autocorrelation function (PACF), and a fit was obtained using the ar function (yule-walker method) in the R stats package.At time t, z a (t) and z a (t − 1) were used to forecast z a (t + τ ), for τ = 1. .., 7. We denote these predictions as pz (t; τ ) ; τ = 1. .., 7.
www.adv-stat-clim-meteorol-oceanogr.net/2/155/2016/On the transformed scale, as defined by Eq. ( 28), we standardize v(t) by calculating its anomalies and then dividing by the intercepts (I M v ) in Fig. 8 and Table 2.That is, we define Again the PACF suggested an AR(2) process, and the fitted model was used in a similar way to forecast up to 7 months into the future on the transformed scale.Unbiased predic-  tions on the original degrees-Celsius scale were obtained using the back-transforms described by Cressie (1993, pp. 135-136).We denote these predictions as pv (t; τ ).
The square root absolute difference (RAD) for each predicted value was used as a measure of the predictive skill: where z(t + τ ) was extracted from the complete Niño 3.4 dataset.A similar definition of RAD M v (t; τ ) for pv (t; τ ) allows us to calculate a relative skill between the two types of forecast: If the relative skill is greater than 1, then pv is a better predictor than the standard predictor, pz , and if the relative skill is less than 1, pv does not perform as well as pz .
For some months, specifically the austral summer months (December, January, and February), pv was much better than pz ; see the box plots of relative skill for January in Fig. 9.For the months March-September, pv shows almost no improvement on pz , while for October and November pv did not perform as well as pz ; see the box plots of relative skill for October in Fig. 10.
Our study involved fitting an autoregressive model directly to the data {z a (t)} or {v s (t)}.No measurement error was accounted for, and hence any forecast at time t + τ relied only on the current and the immediate-past datum at time t and at time t −1, respectively.A more thorough analysis would fit a hierarchical statistical model, for which a Kalman filter could be developed.Then a forecast at time t + τ would depend on all data up to and including time t.It would be interesting to see whether the same seasonal pattern of relative predictive skill emerged.

Discussion and conclusions
The first part of this article gives an exploratory data analysis of tropical Pacific SSTs during the period November 1981-December 2014.Most analyses and models found in the literature work directly with the SST anomalies.We make a strong case here that there is structure in the raw data that these analyses miss, namely a spatial mean-variability relationship that suggests a nonlinear transformation of the data.This structure is also consistent with the generally accepted nonlinearity of the ENSO phenomenon.
Working with anomalies implies that large-scale seasonal processes and smaller-scale processes are additive.The approach based on anomalies subtracts the seasonal component, leaving behind a residual component (made up of the anomalies) that is modelled.We believe that this strategy can cause difficulties with modelling and forecasting.This is because there is a mean-standard deviation relationship that needs to be respected first, before anomalies are considered and dynamical models are built.
In this article, we give a statistical methodology that removes the mean-standard deviation relationship by transforming the data.The transformation is empirically driven, but it is based on 33 years of data for which a consistent pattern is seen; outliers are apparent for less than 3 % of the 398 months that we considered.In order that the outliers did not overly affect the patterns we inferred, we used robust methods in our exploratory data analysis.
The transformation we derived is logarithmic, monotone, and nonlinear, and it respects the variability seen in SSTs from month-to-month during the year.At the very least, the estimated parameters of the transformation offer another characterization of the enigmatic patterns of SSTs that lead to El Niño and La Niña events.
In the latter part of this article, we fitted an autoregressive process to the standardized anomalies on the transformed scale.Forecasting based on autoregressive processes is quite straightforward, as is the back-transform that yields forecasts on the original scale, in degrees Celsius.The model we propose is fundamentally statistical, and its skill relative to a standard forecast on the original scale is seen to be seasonal (Sect.6).It could clearly be enhanced by embedding it in a hierarchical model.There is also the potential in this work to understand more clearly various geophysical phenomena involving tropical Pacific SSTs, such as the ENSO spring barrier and interconnected oceanic and atmospheric effects, by transforming the SSTs to a different scale.

Figure 1 .
Figure 1.Comparison of the SST data to the SST anomaly data for January 1983.The climatology base period for the anomalies is 1971-2000.The Niño 3.4 region (5 • S-5 • N and 120-170 • W) is inside the black rectangle.Units on the colour scale are • C.

Figure 4 .
Figure 4. Spatial standard deviation, S z (t), vs. spatial mean, z(t), in the Niño 3.4 region, for all t ∈ D Jan t , t ∈ D Feb t , . .., t ∈ D Dec t , with the WTS-estimated lines superimposed.Units on both axes are • C.
and is zero otherwise.Now find m 1 = inf {b : G(b) ≥ 0.5} and m 2 = sup {b : G(b) ≤ 0.5}; then the weighted median m w is defined as

Figure 5 .
Figure5.Time sequences from December, January, February, . . ., November, December, January, showing WTS-estimated slope coefficients with upper and lower limits from point-wise 95 % bootstrap percentile confidence intervals, for the spatial standard deviation vs. spatial mean of the original data {z(s, t)}.The horizontal solid blue line is the WTS-estimated slope coefficient for allmonths-combined, and the horizontal dotted black line is the zero line.

Figure 6 .
Figure 6.Spatial standard deviation, S u (t), vs. the spatial mean, u(t), of the transformed data {u(s, t)}, in the Niño 3.4 region, for all t ∈ D t , with the WTS-estimated line superimposed.Each season has a different colour (DJF points are red; MAM points are yellow; JJA points are blue; SON points are green).

Figure 8 .
Figure8.Time sequences from December, January, February, . . ., November, December, January, showing WTS estimates of the intercept with upper and lower limits from point-wise 95 % bootstrap percentile confidence intervals, for the spatial standard deviation vs. spatial mean of the transformed data {v(s, t)}.The intercept estimates are a measure of the monthly spatial variability on the transformed scale.
Spatial standard deviation, S z (t), vs. spatial mean, z(t), in the Niño 3.4 region, for all t ∈ D t , with the WTS-estimated line superimposed.Units on both axes are • C.
Spatial standard deviation, S z (t), vs. spatial mean, z(t), in the Niño 3.4 region, for all t ∈ D t , where each season has a different colour.The WTS-estimated lines for each season are superimposed (DJF line is red; MAM line is yellow; JJA line is blue; SON line is green).Units on both axes are • C.

Table 1 .
Results from a linear regression of the spatial standard deviation, S z (t), on the spatial mean, z(t), in the Niño 3.4 region, for all t ∈ D Jan t , t ∈ D Feb t , . .., t ∈ D Dec t , and t ∈ D t .Shown are the WTS-estimated intercept and slope coefficients for each month and for allmonths-combined, and the associated model-based asymptotic 95 % confidence intervals and p values for the slope estimate from a two-sided normal test.

Table 2 .
Results from a linear regression of the spatial standard deviation, S v (t), on the spatial mean, v(t), in the Niño 3.4 region, for all t ∈ D Jan t , t ∈ D Feb t , . .., t ∈ D Dec t , and t ∈ D t .Shown are the WTS-estimated intercept and slope coefficients for each month and for all months combined.Also shown are bootstrap (using 1000 replicates) standard deviations (SD) and 95 % confidence intervals.