Edited by: Ahsan H. Khandoker, Khalifa University, United Arab Emirates

Reviewed by: Philip Thomas Clemson, Lancaster University, United Kingdom; Felix Putze, University of Bremen, Germany

*Correspondence: Nathan Gold

This article was submitted to Computational Physiology and Medicine, a section of the journal Frontiers in Physiology

†These authors have contributed equally to this work.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Experimentally and clinically collected time series data are often contaminated with significant confounding noise, creating short, noisy time series. This noise, due to natural variability and measurement error, poses a challenge to conventional change point detection methods. We propose a novel and robust statistical method for change point detection in noisy biological time sequences. Our method is a significant improvement over traditional change point detection methods, which examine a potential anomaly only at a single time point. In contrast, our method considers all suspected anomaly points together with the joint probability distribution of the number of change points and the elapsed time between two consecutive anomalies. We validate our method with three simulated time series, a widely accepted benchmark data set, two geological time series, a data set of ECG recordings, and a physiological data set of heart rate variability measurements from a fetal sheep model of human labor, comparing it to three existing methods. Our method demonstrates significantly improved performance over the existing point-wise detection methods.

Various biological and medical settings require constant monitoring, collecting massive volumes of data in time series typically containing confounding noise (Grassberger and Procaccia,

The heavy contamination of noise due to measurement error and naturally varying phenomena, however, makes the detection of change points challenging, as existing techniques will often observe non-pathological changes, resulting in false alarms and mistrust of detection techniques (O'Carrol,

The detection of change points in a non-stationary time series is a well-studied problem, which has produced many techniques (Basseville and Nikiforov,

Non-stationary time series may also be viewed as segments of piecewise locally stationary time series (Adak,

As the current change point detection methodology we consider operates in a point-wise manner, temporal information of change points is lost, such as how often they can be expected to occur, and if they should occur in quick succession or not. Especially in physiological time series, where temporal information and patterns of change points may be highly relevant to practitioners, a point-wise approach may be ill-suited to these time series containing noise. To rectify this problem, we propose a novel change point detection method to analyse the pattern of change points and their inter-arrival times in a small time window so as to observe additional information that may be missed using a point-wise approach. It is our intention this method will reduce the

While the existing methods (Basseville and Nikiforov,

In this section, we will describe the underlying change point detection methodology our Delta point method extends, as well as the theoretical background of these methodologies. We begin with a notation section for the reader to refer to, and subsequently describe the Bayesian online change point detection methodology introduced by Adams and MacKay (

Throughout the remainder of the article, we use the following notation (we omit descriptions in this section). Vectors are denoted in _{t} is the run length of the change point detection algorithm. _{1:t} is a vector of time series observations from time _{t} the observation at time _{(r)} denotes the vector of time series observations since the previous change point. τ ∈ [1, _{[ti,tj]} denotes the number of change points in an interval [_{i}, _{j}]. _{i}, _{j}]. α is a computation parameter, and ℓ is a computation parameter for the input length scale. ϵ_{t} is Gaussian white-noise, and σ^{2} is the variance of the noise. _{*} is a vector computed by the covariance function, and

We begin with a review of the Bayesian online change point detection (BOCPD) algorithm (Adams and MacKay,

In detail, BOCPD combines a predictive model of future observations of the time series, an integer quantity, r_{t}, known as the run length, and a hazard function, P(r_{t}|r_{t−1}), which gives the probability of a change point occurring with respect to the last change point, to calculate the probability of a change point occurring. Bayes' rule is used to compute the posterior distribution of the run length as

P(r_{t}|x_{1:t}) = P(r_{t}, x_{1:t}) / P(x_{1:t}),    (1)

where x_{1:t} is a vector of past observations of the time series, P(r_{t}, x_{1:t}) is the joint likelihood of the cumulative run length and observations, calculated at each step, and P(x_{1:t}) is the marginal likelihood of the observations. The joint distribution P(r_{t}, x_{1:t}) is computed with each new observation using a message passing algorithm,

P(r_{t}, x_{1:t}) = Σ_{r_{t−1}} P(r_{t}|r_{t−1}) P(x_{t}|r_{t−1}, x_{(r)}) P(r_{t−1}, x_{1:t−1}),    (2)

where P(r_{t}|r_{t−1}) is the hazard function, P(x_{t}|r_{t−1}, x_{(r)}) is the prediction model with observations x_{(r)} since the last change point, and P(r_{t−1}, x_{1:t−1}) is the previous iteration of the algorithm.

Once the online message passing algorithm of Equation (2) is computed, the probability of a change point having occurred or not is given by,

respectively.
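The message passing recursion described above can be illustrated with a minimal sketch. To keep it short, it assumes a constant hazard and swaps the Gaussian process predictive model for a much simpler stand-in: a Gaussian observation model with known noise variance and a Normal prior on the segment mean. The function name `bocpd_gaussian` and all default parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def bocpd_gaussian(x, hazard=1.0 / 50.0, mu0=0.0, var0=1.0, noise_var=1.0):
    """Run length posterior P(r_t | x_{1:t}) under a constant hazard.

    Assumption: a Gaussian model with known noise variance and a Normal
    prior on the mean stands in for the Gaussian process predictive model.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    # R[t, r] holds P(r_t = r | x_{1:t}); row 0 is the prior (run length 0).
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    # Posterior mean and variance of the segment mean, one entry per run length.
    mu = np.array([mu0])
    var = np.array([var0])
    for t in range(1, T + 1):
        # Predictive density of the new observation for each previous run length.
        pred_var = var + noise_var
        pred = np.exp(-0.5 * (x[t - 1] - mu) ** 2 / pred_var) / np.sqrt(2.0 * np.pi * pred_var)
        prev = R[t - 1, :t]
        # Growth: no change point, so the run length increases by one.
        R[t, 1:t + 1] = prev * pred * (1.0 - hazard)
        # Change point: the run length resets to zero.
        R[t, 0] = np.sum(prev * pred * hazard)
        # Normalising the joint at every step yields the posterior (Bayes' rule).
        R[t, :t + 1] /= R[t, :t + 1].sum()
        # Conjugate update of the segment mean for every run length.
        new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
        new_mu = new_var * (mu / var + x[t - 1] / noise_var)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R

# A mean shift after 50 observations: the most probable run length resets.
R = bocpd_gaussian([0.0] * 50 + [5.0] * 50)
print(np.argmax(R[50]), np.argmax(R[60]))
```

With a constant hazard, the posterior mass at run length zero always equals the hazard itself, so in this sketch a change is read off from a drop in the most probable run length rather than from the zero run length probability alone.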

The performance of the BOCPD algorithm is highly dependent on the choice of predictive model for the next time step ahead prediction of the time series data. State of the art performance for the BOCPD algorithm was recently achieved by the use of a Gaussian process time series predictive model (Turner,

A Gaussian process is a Gaussian distribution over functions—that is, the distribution of the possible values of the function follows a multivariate Gaussian distribution (Rasmussen and William,

where α is a computing parameter, and ℓ is the input scale, or distance between inputs

In the Gaussian process time series model we used, the time index t is the input and the observation x_{t} is the output. This is collected in the model,

where

To predict future observations of the time series, we appeal to Bayes' rule, using the GPTS predictive model as a prior over functions. By the rules of conditioning Gaussian distributions (Rasmussen and William,

where _{*} = _{(r)}, _{t}) is an (τ − 1) × 1 vector, where τ is the time since the last change-point, _{(r)}, _{(r)}) is an (τ − 1) × (τ − 1) matrix, and _{*.*} = _{t}, _{t}). The mean and covariance of the Gaussian predictive distribution, given above in Equations (7) and (8), respectively, are derived by conditioning the previous Gaussian distributed observations on the current observation, which due to the Gaussian process model also follows a Gaussian distribution. For a formal derivation of these formulae, see Bishop (

The GPTS predictive model was then used as the predictive model for the BOCPD algorithm to make next time step ahead predictions of the time series.
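The conditioning step behind the predictive mean and covariance can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the covariance function is assumed to be a rational quadratic kernel (one common kernel with a shape parameter α and a length scale ℓ), the prior mean is zero, and the function names `rq_kernel` and `gp_predict` and the default values are hypothetical.

```python
import numpy as np

def rq_kernel(a, b, alpha=1.0, length=1.0):
    """Rational quadratic covariance between two sets of time indices (assumed kernel)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d2 = (a[:, None] - b[None, :]) ** 2
    return (1.0 + d2 / (2.0 * alpha * length ** 2)) ** (-alpha)

def gp_predict(t_obs, x_obs, t_new, alpha=1.0, length=1.0, noise_var=0.1):
    """Predictive mean and variance of the observation at time t_new."""
    t_obs = np.asarray(t_obs, dtype=float)
    x_obs = np.asarray(x_obs, dtype=float)
    # Covariance of the observations since the last change point, plus noise.
    K = rq_kernel(t_obs, t_obs, alpha, length) + noise_var * np.eye(len(t_obs))
    # Covariance between past time indices and the new time index.
    k_star = rq_kernel(t_obs, [t_new], alpha, length)[:, 0]
    # Prior variance of the new observation, including noise.
    k_ss = rq_kernel([t_new], [t_new], alpha, length)[0, 0] + noise_var
    # Solve K w = k_star rather than forming the inverse explicitly.
    w = np.linalg.solve(K, k_star)
    mean = w @ x_obs
    var = k_ss - w @ k_star
    return mean, var

mean, var = gp_predict([0.0, 1.0, 2.0], [0.1, 0.3, 0.2], 3.0)
print(mean, var)
```

Solving the linear system instead of inverting the covariance matrix is a standard numerical choice; for long segments a Cholesky factorisation updated at each step would usually be preferred.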

Now that we have reviewed the previous change point detection method and Gaussian process time series predictive model, we can introduce the newly proposed Delta point method. The BOCPD algorithm with the Gaussian process time series predictive model returns a vector of change points. We term the change points stored in this returned vector _{t}|_{t−1}), is then updated in the message-passing scheme (Equation 2) with highly varying information. The probability of change points occurring, as computed by the posterior run length distribution given by Equation (1), will be set artificially high due to this, thus resulting in over-detection of change points.

The Delta point method is designed to classify, with highest probability, from a vector of suspected change points, the change point most representative of a significant change in the generative process of the time series, and not those given by confounding noise.

The method consists generally of dividing the time series into intervals of user-specified, domain specific length for which a suspected change point may be contained. The number of change points and average run length of the change points in each interval containing a declared change point is then computed, and the interval with the fewest change points and longest average run length is selected as the interval with the highest probability of containing the change point most relevant to the user—the

Given a vector of suspected change points from the BOCPD algorithm, we can view the observed change points as a realization of a doubly stochastic point process, that is, a point process where the intensity rate determining inter-arrival times of events is itself a stochastic process. As the Delta point method is concerned with determining the probability of finding the fewest number of change points and longest average run length in an interval, the point process view is natural to take. This framework allows us to determine explicit probabilities of each suspected change point being the change point of interest, by computing the probabilities of a specific number of change points occurring in each interval. Rather than following a homogeneous point process with a constant and deterministic intensity rate, the Delta point method must have a stochastic intensity rate. The doubly stochastic term arises naturally from the intensity function of the change point process being generated from the BOCPD algorithm, which returns a probability of change points occurring.
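To make the doubly stochastic view concrete, the sketch below computes the probability of observing k events in an interval when the intensity is itself random. Conditional on one intensity path, the count is Poisson with mean equal to the integrated intensity; the unconditional probability is the average over paths. The sampled paths here are arbitrary placeholders, not the posterior run length distribution, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of k events for a Poisson count with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def integrated_intensity(times, intensity):
    """Trapezoidal approximation of the integral of one intensity path."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (intensity[i] + intensity[i - 1]) * (times[i] - times[i - 1])
    return total

def count_probability(k, times, intensity_paths):
    """Probability of k events in the interval, averaged over intensity paths."""
    probs = [poisson_pmf(k, integrated_intensity(times, path)) for path in intensity_paths]
    return sum(probs) / len(probs)

# Two equally likely constant intensity paths on the interval [0, 1].
print(count_probability(0, [0.0, 1.0], [[1.0, 1.0], [3.0, 3.0]]))
```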

Let {_{t⩾0} be a counting process representing the number of change points which are declared by the BOCPD algorithm. For clarity of exposition, we begin with a rapid review of the Poisson counting process. A Poisson process with _{t⩾0} with independent increments, and for which the number of events in a time interval of length _{Nt}t⩾0} is a doubly stochastic Poisson process, where the intensity {Λ(_{t⩾0} is itself a stochastic process (Lefebvre, _{2} ⩾ _{1} ⩾ 0, we have

where,

The probability distribution of the intensity process, {Λ(_{t⩾0}, is given by the posterior run length distribution _{t}|_{1:t}). Since the intensity process is a continuous stochastic process, we need a continuous version of the posterior run length distribution from the BOCPD algorithm, which is given by combining Equations (1) and (2). This is given as,

In the time interval (_{i}, _{i+1}], where

The expected value of a doubly stochastic point process {_{t⩾0} is computed as in a standard Poisson process, however there is a complication due to the stochastic intensity rate {Λ(_{t⩾0}. Consider the doubly stochastic point process

For the Delta point algorithm, the length _{i}, _{i+1}] as defined above is user-defined. Let _{i} be the number of change points in the interval (_{i}, _{i+1}]. Thus, _{i} = _{i+1}) − _{i}), where {

where _{n} is the run length associated with each change point in the interval.

Following the average run length computation, we then consider the joint probability distribution of _{i} and _{i}, _{i+1}]. The average run length is conditioned by the probability of observing _{i} many change-points in the interval (_{i}, _{i+1}],

The conditioned averaged run length probability is computed for each interval. Due to the Gaussian process predictive model properties, for a noisy time series, once a change point has been observed in an interval, the probability of more change points being detected in that interval increases. Further, the average run length of that interval decreases accordingly. In the traditional point-wise methods, this will result in many false positives. To avoid this difficulty, we take the opposite approach by observing that the interval with the fewest number of change points and the longest average run length has the highest probability of containing a representative change in the system, and not erroneous change points introduced by noise; that is, it contains a

Once this interval has been determined, it is searched for the run length associated with each declared change point it contains. The declared change point with the longest run length r_{n} is declared the Delta point.
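The selection steps can be sketched in a few lines. This is a simplified illustration, assuming that intervals are ranked lexicographically (fewest change points first, then longest average run length) instead of through the joint probability described above, and that intervals are aligned to multiples of the interval length. The function name `delta_point` and its arguments are hypothetical.

```python
from collections import defaultdict

def delta_point(change_points, run_lengths, interval_length):
    """Select one change point from a list of suspected change points.

    change_points: time indices declared by the detector
    run_lengths: run length associated with each declared change point
    interval_length: user-defined interval length
    """
    if not change_points:
        return None
    # Group the declared change points by the interval that contains them.
    buckets = defaultdict(list)
    for cp, rl in zip(change_points, run_lengths):
        buckets[cp // interval_length].append((cp, rl))

    def score(items):
        count = len(items)
        mean_rl = sum(rl for _, rl in items) / count
        # Fewest change points first, then longest average run length.
        return (count, -mean_rl)

    best = min(buckets.values(), key=score)
    # Within the chosen interval, keep the point with the longest run length.
    return max(best, key=lambda item: item[1])[0]

print(delta_point([12, 15, 18, 95, 130, 133], [3, 3, 3, 77, 35, 3], 40))
```

In this toy input, the cluster of three closely spaced points is treated as noise, while the isolated point with a long preceding run is selected.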

The only parameter in the Delta point method is the user-defined length of the interval (_{i}, _{i+1}], or

In future work, we will explore the structure of the doubly stochastic Poisson process of declared change points in greater detail. As the intensity function is determined predominately by the predictive model, the kernel of the Gaussian process is a natural place to begin investigation. Further, we aim to derive rigorous results on the interval length for optimal performance.

We tested the Delta point method on several simulated and real world time series data sets. The simulated time series consist of three synthetic time series of our own design, and two widely used benchmark curves. The real world data sets are made up of well-log recordings from geophysical drilling measurements, annual water levels of the Nile river and 100 clinical ECG recordings. We compared the Delta point method to three competing non-stationary change point detection algorithms, namely Takeuchi and Yamanishi (TY) (Takeuchi and Yamanishi,

To compare the Delta point method to competing methods, we performed several statistical tests on declared change points from each method. For the simulated data, we computed the mean square error (MSE) for each method, taking the absolute temporal difference between the user labeled change point and the declared change point. We then performed two-sided

To test the efficacy of the Delta point method, we produced 1,000 simulations of three different time series, each 500 data points in length. Each time series was designed to simulate change points that may be seen in real world settings, and to have a specific change point that is of more interest than others in the time series. By change point of interest we are referring to either a change in mean, in the case of simulated Series 1 and Series 2, or the introduction of a linear trend in the data, in Series 3. These cases are chosen so as to be representative of changes that may occur in real world settings such as sensor failure, or a changing physiological condition.

Series 1 has two change points, with the change point of interest occurring at

where ϵ_{t} ~ _{1} values are uniformly sampled from [0,1].

Series 2 is a simulated autoregressive model with a large jump, and then a return back to the original process. The change point of interest was chosen as the onset of the jump (

where, ϵ_{t} ~ _{2} values are uniformly sampled from [0,1].

Series 3 is a simulated autoregressive moving average model with an introduced linear trend and subsequent return to the autoregressive moving average model. The change point of interest was chosen as the beginning of the linear trend (

where, ϵ_{t} ~ _{3} values are uniformly sampled from [0,1].

Simulated time series from Series 1–Series 3 are displayed in Figures _{1} = 0.7, ρ_{2} = 0.4, and ρ_{3} = 0.5.

Simulation time series 1, ρ_{1} = 0.7.

Simulation time series 2, ρ_{2} = 0.4.

Simulation time series 3, ρ_{3} = 0.5.

The BOCPD method learned parameter values through training, so we only list the values we used to initialize the method. For Series 1, Series 2, and Series 3, we used a training set of 200 data points, taken at the beginning of the time series. The Gaussian process model used a non-biased parameter initialization, with an assumed standard normal distribution prior for Series 1, Series 2, and Series 3. The hazard rate parameter used for the hazard function for initial training for each time series is θ_{h} = −3.982. The Delta point interval length for each time series was set at 40 for training, as this should protect against the problem that arises when the interval is set too short, namely the BOCPD possibly declaring too many erroneous change points. A longer interval length should produce similar results. The techniques TY, LS, and L all require a threshold value above which a change point will be declared. We performed cross-validation over several threshold values for each method, choosing the value for each time series that allows the most accurate detection of the change point of interest. We select the change point declared by each method that is closest to the significant change point described above.

The results of each method are displayed in Table for ρ_{1} = 0.7, ρ_{2} = 0.4, and ρ_{3} = 0.5, respectively; the results for different values of ρ_{1}, ρ_{2}, and ρ_{3} are not significantly different. For Series 1, the Delta point method significantly (

Simulation results.

Method | Absolute detection difference (mean ± SD) | MSE (×10^{3}) | Two-sided test vs. Delta (1 = significant difference) |
---|---|---|---|
Series 1 | | | |
Delta | 9.718 ± 9.881 | 0.192 | N/A |
TY | 40.110 ± 28.367 | 2.413 | 1 ( |
LS | 8.953 ± 12.783 | 0.243 | 0 ( |
L | 59.459 ± 25.713 | 4.196 | 1 ( |
Series 2 | | | |
Delta | 3.399 ± 12.099 | 0.158 | N/A |
TY | 13.802 ± 10.7298 | 0.306 | 1 ( |
LS | 6.205 ± 9.6699 | 0.132 | 1 ( |
L | 27.728 ± 22.961 | 1.296 | 1 ( |
Series 3 | | | |
Delta | 63.603 ± 30.762 | 4.991 | N/A |
TY | 54.645 ± 39.358 | 4.534 | 1 ( |
LS | 65.263 ± 63.469 | 8.284 | 0 ( |
L | 69.911 ± 30.418 | 5.812 | 1 ( |
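As a consistency check on the simulation results, the reported MSE values agree with the identity MSE = mean² + SD² applied to the detection differences, scaled by 10³. The short sketch below reproduces two entries; it assumes the reported standard deviations are population standard deviations, which the text does not state, and the function name is hypothetical.

```python
def mse_from_mean_sd(mean, sd):
    """Mean square error implied by the mean and standard deviation of the differences."""
    return mean ** 2 + sd ** 2

# Delta and TY on Series 1, expressed in units of 10^3.
print(mse_from_mean_sd(9.718, 9.881) / 1e3)
print(mse_from_mean_sd(40.110, 28.367) / 1e3)
```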

Simulation series 1 boxplot. TY, Takeuchi and Yamanishi (

Simulation series 2 boxplot. TY, Takeuchi and Yamanishi (

Simulation series 3 boxplot. TY, Takeuchi and Yamanishi (

To further analyse the performance of the Delta point method, we tested it and the existing methods on the Donoho-Johnstone Benchmark non-stationary time series (Donoho and Johnstone,

For training for the Delta point method, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_{h} = −3.982. We set the Delta point interval to 50 for the Bump curve and the Block curve. We selected 50 time points for the interval length so that we could observe sufficient temporal structure for the doubly stochastic Poisson process. The training set consisted of the first 800 data of each curve, to correspond to the rule of thumb of using the first 35–40% of time series data for training (Turner,

The results of the methods for the Bump and Block curves are displayed in Table

Donoho-Johnstone Benchmark curves results.

Method | Detected change point | Absolute difference |
---|---|---|
Bump | Labeled: 440 | |
Delta | 445 | 5 |
TY | 475 | 34 |
LS | 444 | 4 |
L | 468 | 28 |
Block | Labeled: 1331 | |
Delta | 1332 | 1 |
TY | 1348 | 17 |
LS | 1353 | 22 |
L | 1335 | 24 |

The well-log data set consists of 4,050 nuclear magnetic resonance measurements obtained during the drilling of a well (Ó Ruanaidh et al.,

The Nile river time series consists of a record of the lowest annual water levels between 622 and 1,284 CE, recorded on the Island of Roda, near Cairo, Egypt (Beran,

For training for the Delta point method for both time series, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_{h} = −3.982. For the well-log data, the Delta point interval was chosen as 30 due to the length of the time series and sensor noise, and for the Nile river time series, the Delta point interval was chosen to be 50, as the curve is smoother. The training set consisted of the first 1,000 data for the well-log series, and first 250 data for the Nile river set.

The results of each method are displayed in Table

Well-log and Nile River results.

Method | Detected change point | Absolute difference |
---|---|---|
Well-log | Labeled: 1070 | |
Delta | 1072 | 2 |
TY | 1083 | 13 |
LS | 1085 | 15 |
L | 1103 | 33 |
Nile river | Labeled: 715 | |
Delta | 720 | 5 |
TY | 723 | 8 |
LS | 722 | 7 |
L | 725 | 10 |

Well-log and Nile River level Delta point method.

The ECG dataset consists of short time series, with varying features. It comprises 100 clinical ECG recordings, each 136 data points in length, taken from a 67-year-old patient Chen et al. (

For training for the Delta point method, we used a standard normal distribution prior for the Gaussian process, and hazard rate parameter θ_{h} = −3.982. Due to the short nature of these time series, the training set length was selected to be the first 30 data points; the training set never included the QRS complex for any of the 100 instances. The Delta point interval was set to 5, as the QRS complex is very short, and occurs rapidly in the series. The time series rapidly changes here, so a shorter interval performed best.

The performance of each method is displayed in Table

ECG (ECGFiveDays) QRS complex results.

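Method | Absolute detection difference (mean ± SD) | MSE | Two-sided test vs. Delta (1 = significant difference) |
---|---|---|---|
Delta | 3.51 ± 1.352 | 14.13 | N/A |
TY | 5.08 ± 2.54 | 32.2 | 1 ( |
LS | 4.21 ± 2.944 | 26.31 | 1 ( |
L | 3.53 ± 3.221 | 22.73 | 0 ( |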

ECG (ECGFiveDays) boxplot. TY, Takeuchi and Yamanishi (

We applied the Delta point method to a data set consisting of 14 experimental time series of a measure of fetal heart rate variability (HRV) known as the root mean square of successive differences (RMSSD) of R-R intervals of ECG recorded during umbilical cord occlusions (UCO) (Frasch et al.,

During UCO mimicking human labor, a hypotensive blood pressure response to the occlusions manifests as the introduction of a new trend in the recorded time series. This response is induced by the vagal nerve activation triggered by worsening acidemia during UCO as discussed in Frasch et al. (

The experimental time series are short—<200 observations—and confounded with a large amount of noise due to experimental conditions and measurement error. The time series are piecewise locally stationary, and contain naturally occurring biological fluctuations due, for example, to non-linear brain-body interactions (Berntson et al.,

To avoid false alarms, we defined a clinical _{h} = −3.982. We trained the Delta point method with 48 data points per time series, corresponding to 2 h of recording. The Delta point interval was set at 10 data, which corresponded to 25 min of experiment time. This interval was chosen to coincide with the clinical ROI.

The Delta point method significantly outperformed competing methods, with 11 of 14 declared change points in the ROI, compared to 3 of 14 for TY (two-sided Fisher's exact test, p = 0.007028), 5 of 14 for LS (p = 0.054238), and 2 of 14 for L (p = 0.001838). The Delta point method applied to one animal from the data set (ID473378) is displayed in Figure

Fetal Sheep ID473378 Delta point method. Delta point method for Fetal Sheep ID473378 RMSSD time series. The y-axis denotes the RMSSD of the animal over the experimental course, and the x-axis denotes experimental time. Suspected change points are denoted with red crosses, the expert sentinel value with a black cross (6:13), and the detected Delta point with an orange box (6:24).

Fetal sheep experiment results.

8003 | 15:56 | −00:05 | −00:15 | ||
473351 | 13:38 | −00:43 | −00:27 | 01:00 | |
473362 | 11:05 | −00:48 | − | −00:28 | |
473376 | 12:36 | − | −01:02 | −00:15 | |
473726 | 12:04 | 00:25 | −00:10 | ||
461060 | 12:43 | −00:25 | 01:30 | 01:02 | |
473361 | 12:51 | −00:08 | 00:35 | ||
473352 | 13:17 | 00:24 | 00:36 | −00:14 | |
473377 | 12:12 | − | −00:13 | ||
473378 | 13:22 | 00:47 | 00:37 | −00:12 | |
473727 | 11:03 | −00:07 | −00:17 | −00:45 | |
5054 | 12:53 | 01:26 | −00:30 | 01:14 | 00:44 |
5060 | 11:26 | 00:32 | 00:28 | ||
473360 | 13:59 | 01:07 | −00:17 | −00:42 | |
Total | 11/14 | 3/14 | 5/14 | 2/14 |
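The Fisher's exact test values reported for the ROI counts can be reproduced from the totals above. The sketch below is a small standard-library implementation of the two-sided test (summing all tables no more probable than the observed one); the function name and the layout of the 2×2 table (method in rows, inside or outside the ROI in columns) are choices made for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    tail = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return tail / comb(total, col1)

# Delta (11 of 14 in the ROI) against TY, LS, and L.
for name, hits in [("TY", 3), ("LS", 5), ("L", 2)]:
    print(name, round(fisher_exact_two_sided(11, 3, hits, 14 - hits), 6))
```

Integer weights are compared before normalising, which avoids floating point ties when deciding which tables belong to the tail.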

We also computed Bland-Altman plots for the experimental time series to compare the Delta point method to each other method. In Figure

Bland-Altman plots of methods for Fetal Sheep dataset. TY, Takeuchi and Yamanishi (

We observed that the Delta point method is effective at finding change points of interest in piecewise locally stationary time series of different types. For the simulated time series of our own design, the Delta point method performed better than, or indistinguishably from, the best performing methods for Series 1 and Series 2. For Series 1, the Delta point method had the lowest MSE, which suggests that it accurately identifies change points of interest. For Series 2, the Delta point method significantly outperformed the competing methods in terms of mean absolute difference in detection time for labeled change points of interest. Although method LS had a lower MSE for this series, the mean detection difference of the Delta point method is closer to 0. In Series 3, method L performed the best, with the smallest mean absolute difference in detection time, and MSE. Series 3 consisted of the introduction of a linear trend to the autoregressive moving average model, where the onset of the trend was obscured by added noise. Since method L compares density ratios of the time series, its good performance on this time series is likely due to detecting these changing ratios before other methods noticed the trend.

For the Donoho-Johnstone Bump curve, the Delta point method performed nearly as well as the best performing method, method LS, with a smaller absolute difference in detection time compared to the other methods, TY and L. The Delta point method performance for the Donoho-Johnstone Block curve was better than the other methods, exemplifying the strength of the Delta point method for piecewise locally stationary time series. Our test results for the well-log data set also provide evidence of the performance of the Delta point method for piecewise locally stationary time series. For the Nile river data set, as the installation of the Nilometer is the most significant change point in the time series, and can even be noticed visually, we expected that all methods should accurately detect this change point with little variation. Indeed, our results confirm this hypothesis.

For the clinical ECG data set, ECGFiveDays, the Delta point method performs significantly better than methods TY and LS. Its performance is statistically indistinguishable from that of method L, although the Delta point method has the lowest MSE. Due to the rapidly varying nature of the time series when the QRS complex begins, the ability of method L to compare density ratios between components of the time series is beneficial and improves its performance compared to other methods.

With regard to the fetal sheep experimental data set, the early detection of acidemia is better than late detection from a clinical perspective. Hence, we defined the clinical ROI according to expert physician input. The 20 min window before the expert-labeled sentinel point provides adequate warning to clinicians to increase monitoring, or expedite delivery, while the 3 min window after the expert-labeled sentinel point is sufficiently close to be included in the experimental procedure. In clinical settings, we believe that earlier detection is better, as it provides longer decision making time, and justification for increased monitoring.
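The ROI membership rule is simple enough to state as code. The sketch below assumes that detection offsets are written as signed hh:mm strings relative to the sentinel point (as in the fetal sheep results table) and that both window boundaries are inclusive; neither detail is stated in the text, and the function names are hypothetical.

```python
def offset_minutes(text):
    """Convert an offset such as '−00:05' or '01:26' (hh:mm) to signed minutes."""
    # Accept the Unicode minus sign as well as the ASCII hyphen.
    text = text.strip().replace("\u2212", "-")
    sign = -1 if text.startswith("-") else 1
    hours, minutes = text.lstrip("+-").split(":")
    return sign * (int(hours) * 60 + int(minutes))

def in_roi(offset, before=20, after=3):
    """True if the offset lies within 20 min before to 3 min after the sentinel point."""
    return -before <= offset <= after

print(in_roi(offset_minutes("−00:05")), in_roi(offset_minutes("01:26")))
```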

The novelty of the current work is that our method permits statistical-level predictions about concomitant changes in individual bivariate time series, simulated or physiological such as HRV measure RMSSD and ABP in an animal model of human labor. Our method is able to predict cardiovascular de-compensation by identifying ABP responses to UCO, a sensitive measure of acidosis. These predictions are reliable even in the instances when the signals are noisy. This is based on our observation that here, to mimic the online recording situation, no artifact correction for RMSSD was undertaken as is usually done for HRV offline processing (Seely et al.,

Although the Delta point method performs well in settings with noise, the method is not designed to work accurately for time series that exhibit periodic structure. Due to the Gaussian process time series predictive model that is used for updating predictions, the accuracy of predictions, and thus of the change points detected by the BOCPD algorithm, depends on the kernel selected by the user. Indeed, periodic kernels do exist, as shown in Rasmussen and William (

We have intentionally focused our analysis on the change point detection time, due to our interest in early detection of possibly negative phenomena in biological systems. For this reason, our analysis focuses only on the sensitivity of the method. Other methods may be more ideally suited for analysis with a certain specificity in mind. Additionally, it may be interesting to consider different time series predictive models, such as the dynamical Bayesian inference models of Duggento et al. (

We have developed a novel change point detection method for effectively isolating a change point of interest in short, noisy, non-stationary, and non-periodic time series. Our method is able to effectively extract clinically relevant changes in the time series, allowing informed decision making, an essential challenge posed by Seely and co-authors for the future of intelligent monitoring (Seely et al.,

Data and code associated with the manuscript may be found at the following sources: BOCPD code:

Donoho-Johnstone data:

Well-log data:

Nile river data:

ECGFiveDays:

Simulation data:

The human ECG dataset (Chen et al.,

Conception and design: XW, MGF, and NG; Acquisition of data: MGF and NG; Analysis and interpretation of data: NG, MGF, XW, CLH, and BSR; Drafting the manuscript: NG, MGF, and XW; Revising it for intellectual content: NG, MGF, XW, CLH, and BSR; Final approval of the completed manuscript: NG, MGF, XW, CLH, and BSR.

BSR and MGF are inventors of related patent applications entitled “EEG Monitor of Fetal Health” including U.S. Patent Application Serial No. 12/532.874 and CA 2681926 National Stage Entries of PCT/CA08/0058 filed March 28, 2008, with priority to US provisional patent application 60/908,587, filed March 28, 2007 (US 9,215,999). The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors gratefully acknowledge technical support by Dr. Qiming Wang and Dr. Michael Last. We would also like to thank Ms. Patrycja Jankowski and Ms. Dana Gurevich for excellent artwork assistance. We also gratefully acknowledge the referees, whose comments greatly improved the manuscript.