Edited by: Hans Von Storch, Helmholtz-Zentrum Geesthacht Zentrum für Material- und Küstenforschung, Germany

Reviewed by: Manolis G. Grillakis, Technical University of Crete, Greece; Andras Bardossy, University of Stuttgart, Germany

This article was submitted to Atmospheric Science, a section of the journal Frontiers in Earth Science

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

A cointegrated relationship has been identified between the January sea level pressure anomaly at the climatological location of the North Pacific High (NPH) and seasonal precipitation throughout California (Costa-Cabral et al.,

Precipitation in the Sierra Nevada mountain ranges of California occurs primarily in the winter months, in the form of snow and rain (Pandey et al.,

Here we develop a model, to be used in early February of each year, for forecasting the snow water content (SWC) 2 months later, on April 1, at key Eastern Sierra snow stations on Owens Valley tributary watersheds. Freshwater from this watershed, transported more than 300 miles via aqueducts, is one of the most important water sources for over 4 million people in the Los Angeles metropolitan area (Costa-Cabral et al.,

The impetus for this work was the finding by Costa-Cabral et al. (

Establishing a statistical association between observed large-scale climate patterns and precipitation to come in the months ahead is an approach that has been used at many world locations for forecasting precipitation and anticipating water resources availability (for example, Fernando et al.,

The El Niño-Southern Oscillation (ENSO) phenomenon has been identified as a major driver of climate variability worldwide, and arises from the coupled ocean-atmosphere system of the Pacific basin. Several studies have examined the influence of ENSO on precipitation and temperature over North America, and have documented associations between the strength and phase of ENSO and precipitation frequency and intensity over different regions—particularly the southwestern United States—due to ENSO's influence on the East Asian jet stream position. Thus, California has an increased likelihood of storms, precipitation extremes, and precipitation totals under El Niño conditions (see for example, Chikamoto et al.,

Roughly half the time, however, ENSO is in a neutral phase. Such neutral conditions are not an indication of average meteorology over California. The recent multi-year drought in California provides an example of an extreme meteorological drought occurring at a time when both ENSO and the Pacific decadal oscillation (PDO; Mantua et al.,

The strength and position of the NPH, expressed as sea level pressure anomalies and geopotential height anomalies over the northeast Pacific region, affect the position of the jet stream and associated storm tracks. As shown in Costa-Cabral et al. (

Costa-Cabral et al. (

By exploiting the cointegrated relationship between SWC and the NPH anomaly, two models were developed to estimate seasonal precipitation for the Owens River watershed, and provide advance information to support reservoir operations in the Owens Valley:

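The VECM model, for forecasting the April 1 SWC.

The Categorical Model, for estimating the probability of the dry, normal, or wet categories.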

The mathematical formulation, parameter fitting, validation, and application of the models are described in sections Methods and Results and Model Evaluation Based on Hindcasts, where the performance of both models in hindcasts (1951–2016) is also evaluated. The VECM model hindcasts of April 1 SWC are compared against those obtained by linear regression from observed February 1 SWC, and show significantly higher skill.

As a test, the January mean 850 hPa geopotential height over the NPH region was used in the VECM model instead of the sea level pressure, achieving comparable results in the hindcasts (Figures

The hindcasts reveal that the VECM model has considerable forecast skill. In the case of the Categorical Model, a much larger sample size would be required for evaluating the probability values that it provides, but the hindcast for the available sample size appears consistent with the calculated probabilities.

Figure

Cumulative standardized anomaly of April 1 snow water content at Mammoth Pass (orange) and January mean sea level pressure at the North Pacific High region [NCEP/NCAR grid cell centered at (35°N, 232.5°E)] (blue). The blue line refers to the vertical axis on the right, where values are in reverse order to aid comparison between the two variables, given that they are inversely correlated.

The Owens Valley snow water content (SWC) on April 1 depends mainly on seasonal precipitation totals but also on factors that influence snowmelt, including temperature, solar radiation and wind. Elevation is sufficiently high that nearly all winter precipitation at the sites of interest is in the form of snow. Years in which snowmelt occurs early will have diminished SWC on April 1. The model presented here does not account for these factors, but relies on the cointegrated relationship between SWC and NPH anomalies.

The cointegrated relationship between SWC and NPH anomalies is exploited in the VECM model described in section The VECM Model, for Forecasting the April 1 SWC. The categorical model is described in section The Categorical Model, for Estimating Probability of Dry, Normal, or Wet Categories.

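A vector autoregressive (VAR) model of order p can be written as

y_{t} = c + A_{1} y_{t−1} + ⋯ + A_{p} y_{t−p} + ε_{t}    (1)

where y_{t} denotes the values of the k variables at time t.

Here the vector c (intercept) and the matrices A_{i} (coefficients) are estimated model parameters and the vectors ε_{t} are random errors.

The year-specific vector at time t is the first difference of y_{t}:

Δy_{t} = y_{t} − y_{t−1}    (2)

Combining Equation (1) and (2) and writing Γ_{i} = −(A_{i+1}+⋯+A_{p}) for each i = 1, …, p−1 and Π = −(I − A_{1} − ⋯ − A_{p}) gives:

Δy_{t} = c + Π y_{t−1} + Γ_{1} Δy_{t−1} + ⋯ + Γ_{p−1} Δy_{t−p+1} + ε_{t}    (3)

If the components of y_{t} are non-stationary (in particular, they have a unit root) while some (non-zero) linear combination β′y_{t} is stationary, then y_{t} is said to be cointegrated. More intuitively, although the components of y_{t} are non-stationary and vary randomly, the distance between them tends to stay within a fixed distance. In fact, there may be up to k − 1 linearly independent cointegrating vectors β for y_{t}.

In Equation (3), if y_{t} is cointegrated then Π has reduced rank r < k and can be factored as Π = αβ′. Once β′ has been estimated, α and Γ_{i} can be estimated by OLS. This type of VAR model in which y_{t} is cointegrated is known as a vector error correction model (VECM).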
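As a worked instance, the error-correction form can be written out for a lag order of 3, which matches the three preceding time periods used in the prediction steps of this study. The symbols c and A_{i} for the intercept and the VAR coefficient matrices follow common textbook notation and are an assumption here, as is the presence of an intercept term.

```latex
\begin{aligned}
y_t &= c + A_1 y_{t-1} + A_2 y_{t-2} + A_3 y_{t-3} + \varepsilon_t, \\
\Delta y_t &= c + \Pi\, y_{t-1} + \Gamma_1 \Delta y_{t-1} + \Gamma_2 \Delta y_{t-2} + \varepsilon_t, \\
\Pi &= -(I - A_1 - A_2 - A_3), \qquad
\Gamma_1 = -(A_2 + A_3), \qquad
\Gamma_2 = -A_3 .
\end{aligned}
```

Substituting the three definitions in the last line into the second line and collecting terms in y_{t−1}, y_{t−2}, and y_{t−3} recovers the first line, which is a quick way to check the signs.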

The observed snow water content (SWC) on February 1 and April 1 of each year, starting in 1948, at the four Owens Valley snow stations of interest (Mammoth Pass, Rock Creek #2, Sawmill, and Cottonwood #1) was provided by LADWP and is used as a predictor variable in the VECM model.

Also used as a predictor variable in the VECM model is the January mean sea level pressure at a location near the climatological position (i.e., the average position over time) of the NPH. The monthly mean sea level pressure data from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP-NCAR) reanalysis dataset (originally described in Kalnay et al.,

Reanalysis datasets, such as the ones used in this work, are based on simulations by dynamic climate models combined with observations. Such datasets represent estimates subject to uncertainty, characterized, for example, in Bosilovich et al. (

Because the VECM model uses cumulative standardized anomalies, the raw variables are first transformed into standardized anomalies, by subtracting the series mean then dividing by the standard deviation. The mean and standard deviation values used are for the entire record period, 1948–2016. The standardized anomalies are then added successively over time to obtain the cumulative standardized anomalies.
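The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from itertools import accumulate
from statistics import mean, pstdev


def cumulative_standardized_anomaly(values):
    """Return the running sum of the standardized anomalies of `values`."""
    mu = mean(values)
    # Assumption: population standard deviation (divide by n).
    sigma = pstdev(values)
    anomalies = [(v - mu) / sigma for v in values]
    # Add the anomalies successively over time.
    return list(accumulate(anomalies))


# Made-up April 1 SWC values (inches), for illustration only.
swc = [30.0, 50.0, 40.0, 20.0, 60.0]
csa = cumulative_standardized_anomaly(swc)
print([round(x, 3) for x in csa])
```

Because the anomalies are taken relative to the mean of the same record, they sum to zero, so the cumulative series returns to zero at the end of the record period.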

For fitting, the following steps were followed:

1) Clip the time series to the calibration period.

2) Specify number of lags. This study used p = 3, so each forecast draws on the preceding 3 time periods.

3) Estimate β′ by Full Information Maximum Likelihood (FIML; Johansen,

4) Estimate parameters α and Γ_{i} by OLS. Calculate parameters A_{i} from those.

For prediction, the following steps were followed:

5) Convert back to VAR form: (Γ_{i}, Π) ↦ (A_{i}).

6) Use Equation (1) to calculate the forecasted cumulative value ŷ_{t} using data from the preceding 3 time periods and with ε_{t} = 0.

7) Use Equation (2) to obtain the desired forecasted value Δŷ_{t} = ŷ_{t} − y_{t−1}.
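The prediction steps can be sketched as follows, assuming the standard relations between the error-correction and VAR forms: A_1 = I + Π + Γ_1, A_j = Γ_j − Γ_{j−1} for 1 < j < p, and A_p = −Γ_{p−1}. All function names and numerical values below are hypothetical and chosen only for illustration (two variables, three lags); they are not the fitted parameters of this study.

```python
def identity(k):
    """k x k identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def vecm_to_var(pi, gammas):
    """Map (Pi, Gamma_1..Gamma_{p-1}) to VAR matrices A_1..A_p (p >= 2)."""
    k = len(pi)
    a_mats = [mat_add(mat_add(identity(k), pi), gammas[0])]   # A_1
    for i in range(1, len(gammas)):
        a_mats.append(mat_sub(gammas[i], gammas[i - 1]))      # A_2..A_{p-1}
    a_mats.append([[-x for x in row] for row in gammas[-1]])  # A_p
    return a_mats


def forecast_cumulative(c, a_mats, history):
    """One-step VAR forecast with the error term set to zero.

    history[-1] is y_{t-1}, history[-2] is y_{t-2}, and so on.
    """
    y_hat = list(c)
    for lag, a in enumerate(a_mats, start=1):
        y_hat = [u + v for u, v in zip(y_hat, mat_vec(a, history[-lag]))]
    return y_hat


# Hypothetical parameters: Pi has rank 1 (one cointegrating relation).
c = [0.0, 0.0]
pi = [[-0.5, -0.2], [0.0, 0.0]]
gammas = [[[0.1, 0.0], [0.0, 0.2]], [[0.05, 0.0], [0.0, 0.0]]]
# Hypothetical cumulative anomalies: y_{t-3}, y_{t-2}, y_{t-1}.
history = [[0.0, 0.0], [1.0, -0.5], [1.5, -1.0]]

a_mats = vecm_to_var(pi, gammas)                   # step 5
y_hat = forecast_cumulative(c, a_mats, history)    # step 6
delta = [u - v for u, v in zip(y_hat, history[-1])]  # step 7

# Consistency check: the same increment computed from the VECM form.
d1 = [u - v for u, v in zip(history[-1], history[-2])]
d2 = [u - v for u, v in zip(history[-2], history[-3])]
parts = [c, mat_vec(pi, history[-1]), mat_vec(gammas[0], d1), mat_vec(gammas[1], d2)]
delta_vecm = [sum(vals) for vals in zip(*parts)]
```

The two routes give the same increment, which is why converting back to the VAR form is only a convenience for forecasting the cumulative value.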

Model verification is described in section VECM Model.

The Categorical Model was developed as a complement to the VECM model. The two models are distinct in intent and formulation. The purpose of the Categorical Model is to estimate in early February the probability of the upcoming April 1 SWC falling into each of the three categories, dry, normal, or wet. These probabilities can be denoted P_{d}, P_{n}, and P_{w}, respectively.

The only input to the Categorical Model is the April 1 SWC value forecast by the VECM model. As with any forecast, there is uncertainty in the value forecast by the VECM model, and this implies that in a general sense no one of the three categories—dry, normal, or wet—can be ruled out as a possible outcome. Instead, each category has some non-zero probability of occurring.

Figure

Mammoth Pass April 1 SWC forecasts for year 2010 (red line). The red dashed lines indicate the 95% confidence interval, assuming the errors are normally distributed about the forecast value. The blue line is the Gaussian probability density function (PDF), with mean equal to the forecast value and standard error estimated from the 1951–2016 forecasts. The green line is the normal value (the average for 1966–2015), and the dashed green lines indicate the category boundaries defined by a 20% deviation from the normal value.
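The Gaussian picture described in this caption can be reproduced with a short sketch. It is not the Categorical Model itself, which is fit by Bayesian inference; it only shows how a forecast value, a standard error, and a normal value translate into a 95% interval and into the areas under the Gaussian curve on either side of the 20% boundaries. The function names and the numbers passed in are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_summary(forecast, std_error, normal_value, band=0.20):
    """95% interval and category areas under a Gaussian centered on the forecast."""
    lower = normal_value * (1.0 - band)   # dry/normal boundary
    upper = normal_value * (1.0 + band)   # normal/wet boundary
    ci = (forecast - 1.96 * std_error, forecast + 1.96 * std_error)
    p_dry = normal_cdf(lower, forecast, std_error)
    p_wet = 1.0 - normal_cdf(upper, forecast, std_error)
    p_normal = 1.0 - p_dry - p_wet
    return ci, {"dry": p_dry, "normal": p_normal, "wet": p_wet}


# Illustrative inputs only (inches): forecast, standard error, normal value.
ci, areas = gaussian_summary(40.0, 9.0, 42.5)
print(ci, areas)
```

Even when the forecast sits inside the normal band, a standard error of this size leaves substantial area in the dry and wet tails, which is the motivation for reporting category probabilities alongside the point forecast.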

Once the VECM model has been fit for each location, the Categorical Model estimates the probability that the April 1 SWC value will fall into each of a pre-specified set of water year classes, using the categorical distribution (see for example Murphy), which is parameterized by a vector θ = (θ_{1}, …, θ_{K}) with entries summing to 1. θ_{k} gives the probability that an observation falls into class k.

The Categorical model uses the VECM-forecast April 1 SWC value,

The entries of θ are rescaled to sum to 1 via the softmax function:
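The softmax rescaling can be illustrated with a minimal sketch. The softmax function itself is standard; the linear score for each class (an intercept plus a slope times the forecast value) is an assumption made here for illustration, and the parameter values are invented rather than taken from the fitted model.

```python
import math


def softmax(scores):
    """Rescale arbitrary real scores to positive values that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_probabilities(x, intercepts, slopes):
    """Class probabilities from hypothetical linear scores a_k + b_k * x."""
    scores = [a + b * x for a, b in zip(intercepts, slopes)]
    return softmax(scores)


# Invented parameters for (dry, normal, wet); x is a standardized forecast.
theta = category_probabilities(0.5, [0.2, 0.5, 0.1], [-1.5, 0.0, 1.5])
print(theta)
```

Softmax is unchanged when the same constant is added to every score, so only differences between class scores matter.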

Using Bayesian inference, the model parameters (two for each class k) are fit with Markov chain Monte Carlo (MCMC), which draws samples from the model posterior distribution. All years were used in the model fitting. The following prior distributions were used:

Parameter fitting is described in section Parameter fitting. For model verification, 10 years (2007–2016) were withheld from parameter fitting. This 10-year period includes a few wet and some very dry years, thus offering a range of different conditions for verification. In Figures

For the final model parameters, we repeated the parameter fitting, this time including all available years, 1948–2016. The results for Mammoth Pass are displayed in Figure

VECM model predictions for Mammoth Pass, using the final parameters.

The VECM model is evaluated in this section using hindcasts for the period from 1951 through 2016. Because each forecast relies on data from the preceding three years (we have

Observations are plotted against forecasts in the figure captioned below. The coefficients of determination, R^{2}, are reported on each figure panel and in Table

Observed April 1 SWC each year in 1951–2016 plotted against the model's forecast value, for the four Owens Valley sites. Dashed gray lines indicate the 95% confidence interval. Dashed green lines indicate a 20% deviation from the normal value, separating between the categories dry, normal, and wet.

Measures of VECM model April 1 SWC forecast performance.

Measure | Mammoth Pass | Rock Creek #2 | Sawmill | Cottonwood #1 |
R^{2} in Figure | 0.751 | 0.729 | 0.783 | 0.696 |
No. of 10 wettest years misclassified | 1^{c} | 1^{c} | 0 | 0 |
No. of 10 driest years misclassified | 0 | 0 | 0 | 0 |
No. of opposite-category classifications^{a} | 0 | 2 | 0 | 0 |
No. of category misclassifications^{b} | 16 | 20 | 18 | 26 |
No. years over-predicted / No. under-predicted | 34/32 | 36/30 | 35/31 | 35/31 |
Ave. deviation for over-predictions (in) | 7.40 | 2.64 | 3.68 | 3.43 |
Ave. deviation for under-predictions (in) | −7.38 | −2.90 | −3.91 | −3.77 |
Max. deviation for over-predictions (in) | 17.06 | 7.64 | 11.09 | 9.51 |
Max. deviation for under-predictions (in) | −26.95 | −10.63 | −12.47 | −11.42 |

By definition of a 95% confidence interval, 5% of the points (i.e., 1 in 20) are expected to fall outside the interval in a large sample. Here, the sample of 66 years (1951–2016) is relatively small, so the number of points outside the 95% confidence interval is expected to be only approximately 66 × 0.05 = 3.3. The number of points lying outside the 95% confidence interval bounds in Figure
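The expected count quoted above, and the spread around it, can be checked with a binomial calculation. The sketch assumes that the yearly forecast errors are independent, which the text does not state, and the variable names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_years = 66   # hindcast years, 1951-2016
p_out = 0.05   # chance that one year falls outside a 95% interval

expected = n_years * p_out
# Probability that between 1 and 6 years (inclusive) fall outside.
prob_1_to_6 = sum(binom_pmf(k, n_years, p_out) for k in range(1, 7))
print(expected, prob_1_to_6)
```

With a sample this small, counts of a few more or a few fewer than 3.3 are unremarkable, so the count alone is a weak test of the interval's calibration.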

LADWP defines the normal SWC value at each site as the average of 50 recent years. The 50-year range is updated every 5 years. At the time of writing this manuscript, the 50-year period used by LADWP is 1966–2015, and the normal values are the following:

Of special importance to LADWP is an April 1 SWC forecast in the form of three categories:

The observed frequencies of these categories in the 66 years of record are: for Mammoth Pass, 24 dry, 22 normal, and 20 wet years; for Rock Creek #2, 32 dry, 13 normal, and 21 wet years; for Sawmill, 30 dry, 17 normal, and 19 wet years; and for Cottonwood #1, 33 dry, 13 normal, and 20 wet years.
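The category assignment can be sketched as a simple threshold rule, using the 20% deviation from the normal value that the figure captions give as the category boundary. The function names are hypothetical, the treatment of values exactly on a boundary is an assumption, and the normal value of 42.5 in is only an approximation back-calculated from the Mammoth Pass percentages quoted in the text (45.9 in reported as 108% of normal).

```python
def classify_swc(swc, normal_value, band=0.20):
    """Return 'dry', 'normal', or 'wet' for an April 1 SWC value."""
    if swc < normal_value * (1.0 - band):
        return "dry"
    if swc > normal_value * (1.0 + band):
        return "wet"
    return "normal"  # assumption: boundary values count as normal


def is_opposite(forecast_cat, observed_cat):
    """True when a dry year is forecast as wet, or a wet year as dry."""
    return {forecast_cat, observed_cat} == {"dry", "wet"}


NORMAL_MAMMOTH = 42.5  # approximate, back-calculated; not an official value

# The 1986 Mammoth Pass values quoted in the text (inches).
forecast_1986 = classify_swc(45.9, NORMAL_MAMMOTH)
observed_1986 = classify_swc(72.9, NORMAL_MAMMOTH)
print(forecast_1986, observed_1986, is_opposite(forecast_1986, observed_1986))
```

Under this rule the 1986 forecast is a category miss (normal forecast, wet observed) but not an opposite-category miss.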

Figure

There is considerable agreement between observed categories across the four stations, and though there are several instances where a normal year at one station was dry or wet at another station, there are no instances where one station was dry and another wet in the same year, i.e., opposite categories were not observed across stations. The same is true of the VECM model predictions: in no year was one station predicted to be wet while another station was predicted to be dry. This can be seen in Figure

Additional measures of the VECM model performance are listed in Table

The year with the largest under-prediction for Mammoth Pass was 1986, where the forecast was for 45.9 in (108% of normal) and the observation was 72.9 in (171% of normal). The year with the largest over-prediction was 2015, where the forecast was for 17.1 in (40% of normal) and the observation was 1.4 in (3% of normal). This was a year when snowmelt occurred earlier than usual, depleting the snowpack before April 1.

This section compares the VECM model performance in forecasting April 1 SWC against the results achieved using the historical linear regression equation relating observed April 1 SWC to February 1 SWC based on 1948–2016 observations for each station. The February linear regression method was available prior to this study, and represents a baseline against which the VECM model forecasting skill can be compared.
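The February baseline can be sketched as an ordinary least-squares line through (February 1 SWC, April 1 SWC) pairs, scored with a coefficient of determination. The data below are invented, the function names are hypothetical, and the formula R^{2} = 1 − SS_res/SS_tot is an assumption, since the text does not define how R^{2} was computed.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented SWC pairs (inches), for illustration only.
feb = [5.0, 12.0, 20.0, 31.0, 44.0]
apr = [14.0, 30.0, 41.0, 70.0, 90.0]

slope, intercept = fit_line(feb, apr)
pred = [slope * f + intercept for f in feb]
r2 = r_squared(apr, pred)
print(slope, intercept, r2)
```

The same `r_squared` function can be applied to any set of forecasts, which allows a like-for-like comparison between a regression baseline and another model's hindcasts.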

The VECM model was more successful than the February linear regression, as a comparison of the two performance tables shows. For Mammoth Pass, the coefficient of determination (R^{2}) is 0.751 for the VECM model and 0.630 for the February linear regression. The forecast errors of the February linear regression, i.e., the differences between the forecast value and the observation in the same year, have larger average magnitudes and larger maximum values (positive and negative) than those of the VECM model. See also Figures

Performance measures for forecasts based on the February linear regression equations.

Measure | Mammoth Pass | Rock Creek #2 | Sawmill | Cottonwood #1 |
R^{2} | 0.631 | 0.589 | 0.675 | 0.559 |
No. of 10 wettest years misclassified | 1 | 2 | 1 | 3 |
No. of 10 driest years misclassified | 0 | 1 | 1 | 1 |
No. of opposite-category classifications^{a} | 0 | 1 | 1 | 2 |
No. of category misclassifications^{b} | 16 | 23 | 21 | 26 |
No. years over-predicted / No. under-predicted | 34/32 | 38/28 | 38/28 | 38/28 |
Ave. deviation for over-predictions (in) | 8.54 | 3.05 | 4.09 | 3.84 |
Ave. deviation for under-predictions (in) | −8.89 | −3.87 | −5.51 | −5.29 |
Max. deviation for over-predictions (in) | 24.93 | 9.54 | 11.75 | 12.10 |
Max. deviation for under-predictions (in) | −32.61 | −14.65 | −17.29 | −16.23 |

The VECM model had no instances (for any station) where one of the 10 driest years was misclassified, while the February linear regression had one such instance for each station except Mammoth Pass (Table

Given the probabilistic nature of the Categorical Model, it is expected that the observed April 1 SWC will often fall into a category (dry, normal, or wet) other than the one to which the model attributed the highest probability of the three. The observed April 1 SWC is expected to most often fall into the category assigned the highest probability, but to also often fall into the category assigned the second-highest probability, and occasionally to fall into the category assigned the lowest probability.

A rigorous evaluation of the Categorical Model's performance would require a much larger sample than the available 66 years (1951–2016). The available 66 years, however, do allow an approximate evaluation through qualitative examination of the figure in which the probability assigned to “wet” (P_{w}) is plotted against the probability assigned to “dry” (P_{d}). Each point represents a year between 1951 and 2016. Because P_{w} + P_{d} + P_{n} = 1, we can write P_{n} = 1 − (P_{w} + P_{d}), so the probability assigned to category “normal” (P_{n}) is a function of the two probabilities plotted. “Most likely wet” corresponds to P_{w} being larger than either P_{d} or P_{n}; “most likely dry” corresponds to P_{d} > P_{w}, P_{n}; and “most likely normal” corresponds to P_{n} > P_{w}, P_{d}.

Probability assigned to categories “dry” (x axis) and “wet” (y axis) each year in 1951–2016. The actual observed category each year is indicated in color, according to the legend. The distribution of colors appears consistent with the probabilities.

In Figure

The Categorical Model is designed specifically for assigning probabilities to each of the three categories, not to provide a forecast. If it were used to forecast the category directly, this model would do somewhat less well than the VECM model, especially because of a larger number of opposite-category forecasts (i.e., wet years forecast to be dry, and dry years forecast to be wet) in the period 1951–2016. Therefore, the VECM model forecast April 1 SWC value should be used to forecast the category, and the Categorical Model should be used to evaluate the uncertainty pertaining to the category.

As an example, Figure

Probability of April 1 SWC falling into each of the categories – dry, normal, or wet – issued by the Categorical Model for 2010.

The VECM model and the Categorical model were parameterized using observations for 1948-2016 (section Methods). The models were completed and delivered to LADWP in January of 2017. Since then, the model has been used to forecast April 1 snow water content (SWC) in 2017 and 2018. These were actual forecasts as opposed to hindcasts. The forecast April 1 SWC is compared against the observed value in Figure

Forecast for years 2017 (red) and 2018 (orange) plotted against the respective observations, for the four Owens Valley sites. The gray points represent 1951–2016 hindcasts, previously shown in Figure

Year 2017 had one of the largest snowpacks of the period plotted, especially at the most important station, Mammoth Pass. On February 1, the observed SWC was already almost double the normal (i.e., average) value for that date at Mammoth Pass and Sawmill, and more than triple the normal value at Rock Creek #2 and Cottonwood #1. By April 1, the observed SWC was about double the normal value for that date at Mammoth Pass and Sawmill, and about 2.5 times the normal at Rock Creek #2 and Cottonwood #1. The VECM model provided April 1 SWC forecasts that were mildly over-estimated for Mammoth Pass and Sawmill, and slightly under-estimated for Rock Creek #2 and Cottonwood #1 (Figure

Year 2018 was a more complex year, representing a good test case. SWC was very low on February 1 but, thanks to late-season storms in February and March, it reached near-normal values on April 1. The VECM model correctly forecast these near-normal values. For example, on February 1, the SWC at Rock Creek #2 was at 10% of the normal value for that date, but by April 1 it had reached 40% of the normal value for that date. The VECM model correctly forecast the substantial SWC increase that occurred in February and March, producing approximate forecasts for the four stations (Figure

The VECM model developed and tested in this study has proven to have considerable skill in forecasting Owens Valley April 1 SWC in early February. Its performance in hindcasts (1951–2016) was shown to surpass the skill of the pre-existing alternative, which consisted of using a linear regression to forecast April 1 SWC based on the observed February 1 SWC. The VECM model's performance was superior to the February linear regression on nearly all measures, including a higher coefficient of determination (R^{2}), smaller average and maximum errors (defined as the forecast value minus the observed value), fewer misclassifications of years, defined as years when the forecast and observed April 1 SWC are not in the same category (dry, normal, or wet), and fewer severe misclassifications of extreme years (i.e., years among the 10 wettest or 10 driest that were forecast to be in the wrong category).

As a complement to the VECM model, the Categorical Model was developed to express forecast uncertainty by estimating the probability that April 1 SWC would fall into each of the three categories: dry, normal, and wet. While the sample size of the hindcast (66 years: 1951–2016) is too small for rigorous testing of the Categorical Model, the probabilities it produced for these hindcast years appear consistent with the observations. The April 1 SWC value forecast by the VECM model should also be used to forecast the category, and the Categorical Model should be used to evaluate the uncertainty pertaining to the category.

Since the models were completed, using 1948–2016 observations, they have been used to forecast April 1 snow water content (SWC) in 2017 and 2018. These were actual forecasts as opposed to hindcasts. The 2017 and 2018 forecasts were compared against the observed April 1 SWC values and proved successful, deviating no more from observations than in most years of the hindcast period. The Categorical model also attributed the highest probability to the category that was observed on April 1. Year 2017 was exceptionally wet, and 2018 was overall a dry year that received late-season storms in February and March. The successful forecasts in both 2017 and 2018 add confidence in the VECM model and the Categorical model.

While the VECM model was shown to provide considerable forecast skill for April 1 SWC, there is significant uncertainty associated with its forecast in any individual year. This is the case with any meteorological or hydrological forecast model. Model uncertainty was clearly characterized in this report using hindcasts. Future forecasts may incur smaller or larger errors than those seen in the hindcasts or the two forecast years of 2017 and 2018. Forecast uncertainty must therefore be taken into account by LADWP in its decision making.

Many water supply reservoirs in California capture snowmelt in spring for supply later in summer. Under current operations, some reservoir storage is set aside for future floods over the course of the wet season—the flood control volume varying by month—and excess runoff is released downstream (e.g., Willis et al.,

The forecasting tool presented in this paper allows issuing forecasts in early February for the remainder of the wet season, i.e., through April 1. The tool is based on the sea-level pressure anomaly at the climatological location of the NPH measured in mid-January.

The forecast lead time may potentially be increased if the mid-January NPH sea level pressure anomaly can be accurately forecasted, whether by statistical or dynamical models. The NPH exhibits a closer relationship with precipitation throughout California compared to the ENSO indices (see Costa-Cabral et al.,

This work demonstrates that advancements in forecasts of NPH are expected to have significant benefits for water resources, agriculture, energy, insurance, drought preparedness, and flood risk management in California. We hope that future research will investigate the present ability of the different models in the North American Multi-Model Ensemble (NMME; Kirtman et al.,

This work may also contain important hints for future research by climate scientists. The cointegrated relationship identified means that the principal relationship between these two variables (NPH anomalies and California precipitation) is between their integrals. This suggests that the climate processes involved have characteristics analogous to reservoirs, which are integrals of stochastic inputs and outputs. This line of thinking, if further explored, might bear fruit in understanding the low-frequency variability in climate, such as decadal variability. Precipitation depends on sea surface temperatures (SST) at different locations. Temperature measures heat content, a reservoir-type variable which may be at the origin of cointegrated relationships between climatic variables.

MC-C led this study and had the idea of using cointegration as the basis for the forecasting model; JR suggested the VECM model and the categorical model mathematical frameworks, fitted the model parameters, and encoded these models in Excel for operational use; MC-C conducted the hindcasts and model evaluation, and respective graphics, and wrote most of the text of this manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This work was commissioned and funded by the Los Angeles Department of Water and Power (LADWP). We thank LADWP for the opportunity of developing this forecasting approach and applying it to the Owens Valley watershed. Sujoy B. Roy provided comments that improved the quality of this manuscript, and his contribution is gratefully acknowledged. Two reviews contributed to clarity and a more expanded discussion in this paper, and are gratefully acknowledged. The preparation of this manuscript was funded by the employers of the two authors, Northwest Hydraulic Consultants Inc, and Tetra Tech Inc.

The Supplementary Material for this article can be found online at:
