A comparison of two methods for detecting abrupt changes in the variance of climatic time series

Two methods for detecting abrupt shifts in the variance, Integrated Cumulative Sum of Squares (ICSS) and Sequential Regime Shift Detector (SRSD), have been compared on both synthetic and observed time series. In Monte Carlo experiments, SRSD outperformed ICSS in the overwhelming majority of the modelled scenarios with different sequences of variance regimes. The SRSD advantage was particularly apparent in the case of outliers in the series. When tested on climatic time series, in most cases both methods detected the same change points in the longer series (252-787 monthly values). The only exception was the Arctic Ocean SST series, when ICSS found one extra change point that appeared to be spurious. As for the shorter time series (66-136 yearly values), ICSS failed to detect any change points even when the variance doubled or tripled from one regime to another. For these time series, SRSD is recommended. Interestingly, all the climatic time series tested, from the Arctic to the Tropics, had one thing in common: the last shift detected in each of these series was toward a high-variance regime. This is consistent with other findings of increased climate variability in recent decades.

. These two methods are described in Section 2. In Section 3, the methods are tested using Monte Carlo experiments with different sequences of variance regimes, magnitudes of shifts and positions of change points. The effect of outliers is also evaluated. The real-world examples are presented in Section 4. The methods are applied to climatic time series from different geographic zones, from the Arctic to the Tropics. The results are summarized in Section 5.
If ( 1 : 2 ) is greater than the critical value * from Table 1 in Inclan and Tiao (1994), then there is a change point at * ( [ 1 : 2 ]) and proceed to step 2a. If ( 1 : 2 ) < * , the is no evidence of variance changes in the series and the algorithm stops.
Step 2b. Now do a similar search for the second part of the series from the change point found in step 1 till the end of the time series, i.e., 1 = * + 1 and 2 = . If ( 1 : 2 ) > * , step 2b is repeated using a new (larger) 1 , until ( 1 : 2 ) < * . When this occurs, the last point of change is = 1 − 1.
If during this testing period RSSI remains positive in the case of increasing variance, or negative in the case of decreasing variance, the null hypothesis of a constant variance is rejected, and becomes a true change point +1 . If RSSI changes its sign, the test for stops, because the null hypothesis cannot be rejected. The observation is included in the current regime , and the regime variance 2 is recalculated incrementally as in Finch (2009): where is the length of regime j before adding a new observation.

Handling outliers
An outlier is a data point whose value is substantially different from the other data points in a sample. The presence of outliers can lead to large errors in estimates of regime statistics and greatly affect the timing of regime shifts. This is particularly true for the variance, because the data values are squared.
When dealing with outliers, it is desirable to leave an observation intact if it falls within a "normal" range of variation, and assign it a small weight if it is outside that range (Huber, 1981). In SRSD, each observation xi in the current regime j is assigned a Huber-type weight wi, which is defined as where h is a tuning constant and scale is first estimated as the median absolute deviation (MAD) = (| |), = , + 1, … , − 1.
After the weights are estimated, the regime variance is calculated as where To improve accuracy of the estimates, the weights are recalculated, this time using the weighted standard deviation as scale. Then the regime variance is recalculated one more time as in Eq. 10 using more accurate weights.
When a potential regime shift is tested by calculating RSSI, the weights for data points = , + 1, … , + − 1, are assigned as If the test cannot reject the null hypothesis and is added to the current regime, the weight for this observation is recalculated using Eq. 8 with = . The weighted version of Eq. 7 for incremental recalculation of the regime variance is 2 = 1 2 + 2 2 1 + 2 . (13)

Monte Carlo experiments
The majority of experiments were performed using samples of size n = 100, which is a typical length (in years) of instrumental climatic time series. For each test, 10000 samples of random normally distributed numbers are generated with zero mean and one or two change points in the variance. A change points is defined as in SRSD, that is, as the first point of a regime. Since ICSS defines a change point as the last points of a regime, all the change points detected by ICSS are shifted one step forward.
A total of three groups of experiments were conducted: 1) with one change point located in the middle and closer to the ends of time series, 2) with two change points and different combinations of variance regimes, and 3) with outliers. Before starting comparing the two methods, it is important to clarify the role of the significance level p and cut-off length l on regime shift detection by SRSD.

The effect of tuning parameters p and l
As shown above, the performance of SRSD can be controlled by three parameters: the significance level p, cut-off length l, and Huber's tuning constant h. This will be denoted as SRSD (p, l, h). Let us clarify the role of the first two parameters. The Huber's tuning constant becomes really important only in the presence of outliers.
The significance level p (also called the target probability level) controls how sensitive SRSD is to shifts in the variance. Since the critical value decreases with the increase of , SRSD becomes sensitive to smaller and smaller shifts in the variance, thus reducing a type II error ("false negative"). On the other hand, a type I error ("false positive") increases, leading to spurious shifts. This is demonstrated in Fig. 1 that shows the frequency of shift detection for different values of .
The true change point is located at = 51 and the variance doubles from the first regime to the second. At = 0.05, the percentage of hits (change points detected exactly at = 51) is 4.5. In 57% of the time, SRSD fails to detect any shift in the series ( Table 1). As increases, the percentage of hits also increases, but the tails of the distribution in Fig. 1 become heavier.
It is important to note, that since SRSD is a sequential algorithm, the frequency of change points found at the end of the series increases substantially, as illustrated for = 0.3. But these are only potential change points that may or may not pass the test.
Therefore, when comparing SRSD and ICSS, only the fully resolved change points were considered. More specifically, the numbers of change points detected in each series (as presented in Tables 1-5) were counted for the period 1, … , − for both methods.
The choice of depends on the magnitude of shifts. Figure 2 shows that as the magnitude of a shift increases, the accuracy of shift detection is increasing faster for smaller . Thus, if the ratio of variances between two adjacent regimes is greater than 4, it is better to use = 0.05, which provides better accuracy of detection and fewer false positives, than = 0.1 or = 0.3. If the goal is to detect shifts of smaller magnitude, the choice of = 0.1 or higher would be appropriate. The target probability level can be considered as the upper limit of the desired significance level of the shifts; the p-values, calculated after all shifts in a series have been detected, are usually lower.
The cut-off length controls the time scale of the detected regimes. As an example, Fig. 3a shows the results for time series with three variance regimes, 2 = (1, 3, 1), and two change points, = (31, 61). The best results were obtained when l was set to be equal the length of the first two regimes (l = 30). In 73% of the time, SRSD detected two shifts simultaneously and the percentage of hits was the highest (Table 2). At = 20, the test statistics in Table 2 do not change much, except for the percentage of hits for 1 . When the cut-off length is larger than the regime length ( = 40), the effect on the test statistics is more dramatic. The percentage of two shifts detected simultaneously in a series sharply decreases, while the percentage of one or zero shifts increases. The first shift is affected more significantly than the second one. For a reversed order of the variance regimes (3, 1, 3), the percentage of hits for both shifts is about the same for all cut-off lengths used (Fig. 3b). Table 2 shows, however, that at = 40, only one of the two shifts is detected most of the time (69%).
In the Monte Carlo experiments below, the values of are chosen so that to make type I errors about the same for both methods. The values of are used a bit smaller than a specified regime length, because in practice the exact length of regimes in a series is usually unknown. When working with real time series it is recommended to experiment with different values of and , in the same way as it is always recommended in spectral analysis to try different window shapes and sizes when smoothing a periodogram.

Series with one change point
The first two experiments were performed with a change point positioned in the middle of the series ( 1 = 51), while the variance either increased from one to two ( Fig. 4a) or decreased from two to one (Fig. 4b). In both cases, SRSD outperformed ICSS in terms of the correct number of change points detected and percentage of hits (Table 4). Note that ICSS found no shifts at all in about 40% of the generated series.
It is worth noting an asymmetry in the frequency distribution of detected change points, particularly for ICSS. Change points tend to be detected more frequently after 1 than before if the variance increases, and the opposite is true if the variance decreases. This is because of a higher probability density near the mean (zero in this case) in a normal distribution. Thus, when the variance increases, the probability of getting small values of at 1 or right after it is relatively high, which delays a shift detection. In the case of SRSD, the frequency distribution is more symmetric, because the probability of a potential change point passing the test increases toward the true change point 1 .
When the variance increases from one regime to another, ICSS performs particularly poor if the change point is located closer to the beginning of a time series. In an experiment with a change point at 1 = 25, and a regime shift from one to three, ICSS found one change point in 46% of time, but these change points were all over the place, with only 1.1% of hits ( Fig. 4c). In contrast, ICSS performed surprisingly well, being on par with SRSD, when a change point was shifted toward the end of a time series, unless it was too close to the end. Since SRSD is sequential method, designed to work in a monitoring mode, it is not surprising that it outperformed ICSS when a change point was placed at 1 = 90 (Fig. 4d).
When the variance decreases from one regime to another, ICSS performs much better if a change point is shifted toward the beginning of a time series than toward the end. In an experiment when a change point was placed at 1 = 21, ICSS and SRSD showed similar results (Fig. 4e), but when a change point was moved to 1 = 71, the percentage of hits for SRSD was more than twice that of ICSS (Fig. 4f).

Series with two change points
In this group of experiments, the locations of change points remain constant at = (34, 67). The main goal here is to test different regime sequences. For the first regime sequence, 2 = (1, 3, 1), ICSS correctly detected two change points in 39% of the series, versus 62% for SRSD (Table 4). In more than 50% of the series, ICSS detected no shifts at all. The accuracy of change point detection, as expressed by the prominence of the peaks at 1 and 2 in Fig. 5a is much lower for ICSS than for SRSD.
The situation becomes even worse for ICSS, if the lower variance regime is in the middle, i.e., 2 = (3, 1, 3) ( Fig.   5b). In this case, ICSS correctly detected two change points in only 21% of the series, and the accuracy of detection is about one third of that for SRSD. This is consistent with the results of Inclan & Tiao (1994), who found that ICSS performs best when the larger variance is in the middle, and its performance deteriorates in the opposite situation.
The most difficult situation, according to Inclan & Tiao (1994), is when the variances change in a monotone way, i.e., the variance increases at the first change point and increases again at the second change point. They estimate that, if this is the case, then it is necessary 500 or more observations for ICSS to be able to detect two change points in more than half of the time. Indeed, for a sequence of variances 2 = (1, 3, 9) ( Fig. 5c), ICSS detected two change points in only 44% of the series versus 69% for SRSD (Table 4). Especially problematic for ICSS was the detection of the first change point, the percentage of hits for which is much lower than for 2 . The detection statistics for 2 is about the same for both methods.
When the variance decreased in a monotone way as 2 = (9, 3, 1), ICSS detected the first change point more often than the second one, while SRSD detected both change points with about the same accuracy (Fig. 5d). The difference in the frequency of detection of two change points simultaneously between the two methods is quite substantial: 84% for SRSD and only 34% for ICSS (Table 5).

Series with outliers
Even a single outlier can have a substantial impact on regime shift detection procedure; it is like a monkey wrench thrown in the works. Figure 6 and accompanying Table 5 show a few examples of negative effects of outliers. In this set of experiments, the variance changes sixfold from one regime to another and an outlier ( * ) has a value of six. In the first example (Fig. 6a), a true change point was placed at = 34 and an outlier at = 65. Both SRSD and ICSS detected the change point equally well.
The results for ICSS, however, revealed a spurious shift at = 65. Interestingly, in 41% of the series ICSS found three or more change points. The outlier had practically no effect on SRSD.
The effect of an outlier was even more dramatic when it was placed closer to the true change point (Fig. 6b). In this case, ICSS detected one change point in 94% of the series, but the overwhelming majority of those change points were found at = 45 (position of the outlier), not at = 34 (true change point). Again, the results for SRSD were not affected, except for a small bump in the frequency distribution at = 45 (Fig. 5b).
In some situations, an outlier may cause not a spurious shift, but rather a drastic deterioration of ICSS performance.
The only difference is that, in the latter case, there was an outlier at = 50. As a result, both the sensitivity and accuracy of ICSS was drastically reduced. In the majority of the series (63%), ICSS found no change points at all, and the percentage of hits for 1 and 2 was reduced to 5.2 and 4.4, respectively. The performance of SRSD was not seriously affected by the outlier.

Arctic Ocean sea surface temperature
In recent decades, the Arctic is warming faster than other parts of the globe, a phenomenon known as Arctic Amplification.
Some observational analyses found evidence for a 'wavier' jet stream in response to rapid Arctic warming (Francis and Vavrus, 2015), which, in turn, may lead to greater temperature variability in the Arctic. Figure 7a shows monthly sea surface temperature (SST) anomalies for the Arctic Ocean (65-90°N), based on the NOAA Optimum Interpolated SST dataset. This dataset (a.k.a. Reynolds OI.v2) is compiled using observations from ship inlets, buoys (both moored and drifting) and from satellites. Apart from an obvious warming trend, one can notice that SST fluctuations in recent years becomes more intense.
The exact cause of increased SST variability in recent years are beyond the scope of this paper, but this time series is a good example to test the regime shift detection algorithms. First, it is necessary to detrend the SST series that can be done in a number of ways, depending on the researcher's view on what constitutes a trend. For this example, the LOWESS (local weighted regression) technique (Cleveland and Devlin, 1988) was used with a smoothing parameter of 0.1. When the LOWESS curve was subtracted from the SST anomalies, the residuals still contained a strong autocorrelation ("red noise"), therefore a prewhitening was needed. Given a lag-1 autocorrelation coefficient of 0.72 (OLS estimate), the prewhitening was performed as = − 0.72 −1 .
The result of the application of SRSD (0.05, 120, 3) to time series { } is presented in Fig. 7b. Three variance regime have been identified, with two change points: December 1996 and July 2007. The variance is lowest during the middle regime between these two change points ( 2 2 = 0.003). The variance is twice as high during the first regime ( 1 2 = 0.006) and almost three time as high during the most recent regime ( 3 2 = 0.010). These regime shifts are highly statistically significant, especially the second one, with the observed p-values of 1.2 • 10 −4 and 5.5 • 10 −9 , respectively.
The ICSS algorithm found three change points, two of which coincided with those found by SRSD, and the third one is, much shorter than the used cut-off length of 120 months. And second, given the Huber's tuning constant of 2, the weighted variance for this short regime is only 0.012, which makes the difference between the two regimes, statistically insignificant.

Arctic Oscillation
The Arctic Oscillation (AO) index is defined as the first empirical orthogonal function of sea level pressure in the Northern Hemisphere 20-90°N (Thompson and Wallace, 1998 (Hanna et al., 2015;Woollings et al., 2010) argue that AO variability is most likely a result of atmospheric internal variability, although it can be an early sign of destabilization of the polar jet stream and an increased susceptibility to external sources (Overland and Wang, 2015). Firm attribution of recent increased circulation variance is currently not possible (Trenberth et al., 2015).

Temperature in the midlatitudes
Analyzing long-term temperature records (since 1820) in eastern Minnesota, Skaggs et al. (1995) described a period of increased variability centered around 1920-1940, followed by sharp minimum in the middle to late 1960s, after which the variance started increasing again. They underscored, that this pattern of temperature variability was not characteristic of just eastern Minnesota. The period of low temperature variability in the late 1960searly 1970s occurred more or less simultaneously over the United States. Using more recent gridded temperature data, Huntingford et al. (2013) examined changes to standard deviations before and after 1980. Although the choice of this change point year was made rather arbitrary, their map showed prominent changes over the midlatitudes for both hemispheres, with large parts of Europe and North America experiencing increased variability in more recent decades.
To illustrate the temporal pattern of changes in temperature variability noticed by Skaggs et al. (1995), Fig. 9 shows the first differences of mean winter (DJF) temperature in Minneapolis, MN, along with the 13-yr running standard deviations.
Using the first differences is a simple way of eliminating a trend in the mean and focus on variability. The running standard deviations exhibit a general decline toward the middle of the series and an increase in more recent decades. Although running deviations are useful in depicting trends in variability, they make it hard to pinpoint the exact locations of regime shifts. This trend of increasing temperature variability is projected to continue and even intensify in the 21st century (Fischer et al., 2012). Figure 10 shows the results of a regime shift analysis for monthly surface air temperature (SAT) anomalies in western Europe (45-50°N, 0-10°E). This analysis was performed in two steps. First, regime shifts in the mean were identified using the sequential t-test algorithm included in SRSD (Rodionov, 2004). The tree detected regimes are presented in Fig. 10a as a stepwise trend. Although the differences between these regimes are seemingly small (their mean values are 0.16, -0.19, and 0.22, respectively), they are highly statistically significant, with the p-values of 0.003 for the shift down in February 1962, and 6.0 • 10 −6 for the shift up in August 1987. On the second step, this stepwise trend was removed from the original data, and the residuals were used as an input to SRSD and ICSS. The results for the two methods are exactly the same: three variance regimes with the change points between them in July 1971 and September 1981 (Fig. 10b). The p-values for the shifts in variance are 0.007 and 7.6 • 10 −5 , respectively.

El Niño -Southern Oscillation
The magnitude of ENSO events has varied significantly over time, with multidecadal periods of strong and weak variability. Trenberth and Hoar (1996) noted strong ENSO variations from 1880 to the 1920s, and after about 1950 and, except for a strong event during 1939-1942, weaker variations from the mid-1920s to 1950. Exceptional ENSO variability can also be found in long ENSO-proxy records (Dunbar et al., 1994). Those periods usually coincided with large pre-industrial climate variations (Quinn and Neal, 1992). More recently, Hu et al. (2013)  largely changed from bucket to engine intake measurements and to lessen the secular trend in the time series that has been associated with global warming. Apparently, this has led to overcorrection of the CTI, because now there is a statistically significant (p = 0.03) step shift downward in 1943, as detected by SRSD (Fig. 11a).

Summary and conclusions
Changes in the variance of climatic time series are starting to attract more attention, especially from the perspective of global warming, because changes in the variance may have greater impact on temperature extremes than changes in the mean. In these circumstances, statistical methods capable of detecting shifts in the variance regimes are needed. A review of climate literature reveals that there are just a few practical methods available for that purpose and their applications are very limited so far. The situation is much more advanced in the area of econometrics, but the methods developed there need to be tested on climatic time series, which have their own specifics, such as a relatively short length and smaller magnitudes of shifts.
This paper compares two methods: SRSD developed by the author and ICSS developed by Inclan and Tiao (1994).
The latter method is currently one of the most popular in econometric research. Both SRSD and ICSS are CUSUM-type methods, but while ICSS provides retrospective detection, SRSD utilizes a sequential approach, which makes it well suited for monitoring of regime shifts. Another difference is that, unlike ICSS, SRSD has two tuning parameters, the target probability level and cut-off length, that control the magnitude and time scale of variance regimes to be detected. This allows the users to tune up SRSD software, so that it better suits their needs. Also, it is important that SRSD has an internal mechanism for dealing with outliers. Due to the lack of such mechanism in ICSS, Inclan and Tiao (1994) advice to complement their procedure with some other procedure for outlier detection.
A comparison of the two methods was first implemented using Monte Carlo simulations with series of 100 points, which is a typical length (in years) of instrumental climatic time series. An interesting feature of ICSS is that it performs poor if a change point is located in the first half of a time series when the variance increases from the first regime to the second, or if a change point is in the second half of the series when the variance decreases. In contrast, ICSS performs better in the opposite types of situations, that is, when a change point is in the first half of a series in the case of decreasing variance, or it is in the second half in the case of increasing variance. With two change points, ICSS detects the second point much more often than the first one if the variance increases in a monotone way, that is, from the first regime to the second and then again to the third. If the variance decreases in a monotone way, ICSS more often selects the first change point than the second one.
As for SRSD, it detects both change points with about the same degree of accuracy, regardless of the way the variance changes.
Overall, SRSD outperforms ICSS in the overwhelming majority of the modelled situations. The only case when the two methods showed similar results was when a change point was located off the center of the series toward the end, but not too close to the end. For time series longer than 100 points, ICSS performs better, but it takes 500 or more points before it starts showing the results on par with SRSD. The experiments with series of more than 100 points, however, were very limited.
Some applications of ICSS for econometric time series have shown that it tends to overstate the number of actual change points in the variance (Fernández, 2004). This is probably only true for relatively long time series. As shown here, for time series of 100 points, ICSS rather tends to understate the actual number of change points, and in many cases fails to find any change points at all.     Table 3.  Table 4.  Table 5.   1907 1917 1927 1937 1947 1957 1967 1977 1987 1997 2007