
Edited by: Lili Lei, Nanjing University, China

Reviewed by: Shuchih Yang, National Central University, Taiwan; Mengbin Zhu, State Key Laboratory of Geo-Information Engineering, China

This article was submitted to Dynamical Systems, a section of the journal Frontiers in Applied Mathematics and Statistics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Ensemble Kalman filters are powerful tools to merge model dynamics and observation data. For large system models, they are known to diverge due to subsampling errors at small ensemble size and the resulting spurious correlations in forecast error covariances. The Local Ensemble Transform Kalman filter (LETKF) remedies these disadvantages by localization in observation space. However, its application to nonlocal observations is still under debate since it is not clear how to optimally localize nonlocal observations. The present work studies intermittent divergence of filter innovations and shows that it increases forecast errors. Nonlocal observations enhance such innovation divergence under certain conditions, whereas choosing similar localization radius and sensitivity function width of nonlocal observations minimizes the divergence rate. The analysis of the LETKF reveals inconsistencies in the assimilation of observed and unobserved model grid points which may yield detrimental effects; these inconsistencies are minimized when localization radius and sensitivity function width coincide.

Data assimilation (DA) merges models and observations to gain optimal model state estimates. It is well-established in meteorology [

Merging the model forecast state and observations, the ensemble Kalman filter pulls the analysis, i.e., the newly estimated state, toward the model forecast state and thus underestimates the forecast error covariance matrix due to a limited ensemble size [

Ensemble member inflation and localization improve the filter performance. The present work considers a perfect model and thus neglects model errors. By virtue of this study design, all observed divergence effects result from undersampling and localization. The present work chooses a small ensemble size compared to the model dimension, fixes the ensemble inflation to a flow-independent additive inflation, and investigates the effect of localization.

In addition to the filter divergence described above, ensemble Kalman filters may exhibit catastrophic filter divergence, which drives the filter forecasts to numerical machine infinity [

The underlying motivation of this work is the experience from meteorological data assimilation that satellite data are detrimental to forecasts if the assimilation procedure is not well-tuned [

Example of the effect of nonlocal observations on departure statistics. Verification of model equivalents in observation space by local observations at three spatial positions. The sensitivity function width is 10 and the localization radius is 10. Further details on the model, observations, and assimilation parameters are given in section 2.7.

Section 2 introduces the essential elements of the LETKF and recalls its analytical description for a single observation in section 2.5. Section 2.8 provides conventional and new markers of filter divergence that help to elucidate possible underlying divergence mechanisms. Section 3 briefly presents the findings, which are put into context in section 4.

The Lorenz96 model [ ] describes the temporal evolution of the variables x_{k}(t) according to

with cyclic boundary conditions x_{k} = x_{k+N} and initial conditions x_{k}(0) = 8.0 + ξ_{k}, x_{N/2}(0) = 8.01 + ξ_{N/2}, where ξ_{k} is a normally distributed random variable.
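The Lorenz96 dynamics with cyclic boundary conditions and the perturbed initial state described above can be sketched in Python. This is a minimal illustration; the dimension N = 40, the forcing F = 8, and the integration step are assumptions here, chosen to match the standard Lorenz96 setup rather than taken verbatim from the text.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Tendency of the Lorenz96 model with cyclic boundary conditions:
    dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One 4th-order Runge-Kutta step, as used for the model integration."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# perturbed initial condition as in the text: x_k(0) = 8 plus a small kick
N = 40                       # assumed model dimension
x = np.full(N, 8.0)
x[N // 2] += 0.01
for _ in range(100):
    x = rk4_step(x, dt=1e-3 / 12.0)
```

Note that `np.roll` implements the cyclic boundary condition x_k = x_{k+N} without explicit index arithmetic.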

The model field _{H}. _{H}.

Typically, data assimilation techniques are applied to merge observations with solutions of imperfect models, and the true dynamics of the underlying system is not known. To illustrate the impact of nonlocal observations, we assume (unrealistically in practice) that the model under consideration (1) is perfect, and hence emerging differences between observations and model equivalents do not originate from model error.

The aim of data assimilation is to estimate a state that optimally describes both a model (or background) state x^{b} ∈ ℝ^{N} and corresponding observations y ∈ ℝ^{S} of number S. The analysis x^{a} ∈ ℝ^{N} minimizes the cost function

with the state x ∈ ℝ^{N}, the background error covariance B ∈ ℝ^{N×N}, and the observation error covariance R ∈ ℝ^{S×S}. The observation operator H: ℝ^{N} → ℝ^{S} is linear in the present work; it projects a model state into observation space and thus links model states and model equivalents in observation space.
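For illustration, the quadratic cost function with its background and observation terms can be evaluated directly. This is a sketch in standard variational notation (x, x^b, y, B, R, H); the factor-1/2 convention used by some texts is omitted.

```python
import numpy as np

def cost(x, xb, y, B, R, H):
    """Variational cost J(x) = (x - x^b)^T B^{-1} (x - x^b)
    + (y - Hx)^T R^{-1} (y - Hx) for a linear observation operator H."""
    db = x - xb              # departure from the background state
    do = y - H @ x           # departure from the observations
    return float(db @ np.linalg.solve(B, db) + do @ np.linalg.solve(R, do))

# at x = x^b with consistent observations y = H x^b the cost vanishes
xb = np.zeros(4)
H = np.eye(2, 4)
J0 = cost(xb, xb, H @ xb, np.eye(4), np.eye(2), H)
```

Using `np.linalg.solve` instead of explicit matrix inversion is the numerically preferred way to apply B^{-1} and R^{-1}.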

The LETKF estimates the background error covariance

with the background perturbation matrix X^{b} ∈ ℝ^{N×L}, whose columns are the background ensemble member perturbations; {x^{b,l}} is the set of background ensemble members and

Then the coordinate transformation from physical space to ensemble space

describes a state

in the new coordinates, where Y^{b} = H(X^{b}) is the corresponding model equivalent of X^{b}. This implies [

which is valid for linear observation operators.

The minimization of the cost function (5) yields

with

Equation (4) provides the analysis ensemble mean

Then the square root filter-ansatz [

where w^{a,l} is the l-th column of W^{a} = [(L−1)P̃^{a}]^{1/2}. The square root of a symmetric matrix A = VΛV^{t} with the diagonal eigenvalue matrix Λ is computed as A^{1/2} = VΛ^{1/2}V^{t}.

Finally the analysis ensemble members in physical space read

see [
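The analysis steps above — transform to ensemble space, mean update, and symmetric square root of the transform — can be condensed into a global (unlocalized) ensemble transform sketch. This is an illustration in standard ETKF notation, not the author's exact implementation; inflation and localization are omitted here.

```python
import numpy as np

def etkf_analysis(Xb, y, H, R):
    """One global ETKF analysis step.

    Xb : (N, L) background ensemble, y : (S,) observations,
    H : (S, N) linear observation operator, R : (S, S) obs error covariance.
    Returns the (N, L) analysis ensemble.
    """
    N, L = Xb.shape
    xb_mean = Xb.mean(axis=1)
    Xpert = Xb - xb_mean[:, None]            # background perturbations X^b
    Ypert = H @ Xpert                        # Y^b = H X^b (linear H)
    Rinv = np.linalg.inv(R)
    # analysis covariance in ensemble space: [(L-1) I + Y^T R^-1 Y]^{-1}
    Pa = np.linalg.inv((L - 1) * np.eye(L) + Ypert.T @ Rinv @ Ypert)
    # mean update weights: w^a = P~^a Y^T R^-1 (y - H xb_mean)
    wa = Pa @ Ypert.T @ Rinv @ (y - H @ xb_mean)
    # symmetric square root W^a = [(L-1) P~^a]^{1/2} via eigendecomposition
    evals, evecs = np.linalg.eigh((L - 1) * Pa)
    Wa = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
    # analysis members in physical space
    return xb_mean[:, None] + Xpert @ (wa[:, None] + Wa)
```

With a very large observation error, the analysis reduces to the background ensemble, as expected.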

Specifically, we have chosen

In principle there are two types of observations. Local observations are measured at a single spatial location in the system, whereas nonlocal observations are integrals over a set of spatial locations. Examples for local observations are radiosondes measuring humidity and temperature in the atmosphere at a certain vertical altitude and horizontal position. Typical nonlocal observations are satellite measurements capturing the radiation in a vertical atmospheric column.

The present work considers observations

where the observation noise has true covariance R_{t} and H ∈ ℝ^{S×N}. In the following, the linear operator

with sensitivity function width _{H} or both observation types (

where the local observation is captured at spatial location

In the subsequent sections, the sensitivity function width _{H} varies in the range 1 ≤ _{H} ≤ 10. Please note that _{H} = 1 approximates the model description of a local observation. Moreover, in the following, a grid point whose activity contributes to a model equivalent in observation space is called an observed grid point (H_{nk} ≠ 0), otherwise an unobserved grid point (H_{nk} = 0).
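A boxcar sensitivity row of the observation operator H, distinguishing observed (H_{nk} ≠ 0) from unobserved (H_{nk} = 0) grid points on a cyclic grid, might look as follows. The 1/width normalization is an assumption, chosen so that a spatially uniform field is reproduced by the observation.

```python
import numpy as np

def boxcar_row(N, center, width):
    """One row of a nonlocal observation operator H: a normalized boxcar
    average over `width` cyclic grid points centred at `center`.
    width = 1 recovers a local observation."""
    h = np.zeros(N)
    idx = (center - width // 2 + np.arange(width)) % N   # cyclic indices
    h[idx] = 1.0 / width
    return h
```

Stacking such rows for several observation positions builds the full operator H ∈ ℝ^{S×N}.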

In this work, a single partial study considers a smooth sensitivity function instead of the boxcar function described above. The sensitivity function is then the Gaspari-Cohn function [ ], which approximates the Gaussian function by a smooth function with finite support of twice the sensitivity width.

The observations are captured at discrete time instances t_{n}, n = 1, 2, …, and are noise-free with R_{t} = 0, i.e., observations are perfect in the sense that they reflect the underlying perfect model, cf. section 2.1. We take the point of view that we do not know that the model and observations are perfect and hence we guess the observation error.

This approach has been taken in most cases in this work. Since, however, this implicit filter error may already contribute to a filter instability or even may induce it, a short partial study has assumed perfect knowledge of the observation error. To this end, in this short partial study we have assumed (R_{t})_{jj} = 0.1 and set the assumed observation error equal to R_{t}.

Although techniques have been developed to estimate

In the LETKF, the background covariance matrix

The observation error matrix is weighted with the distance d_{ij} between the observation location and the analysis grid point, employing the Gaspari-Cohn function as weighting function, where _{l} is the radius of the localization function with 0 ≤ d_{ij} ≤ _{l}. Consequently, the observation error takes its minimum at d_{ij} = 0 and increases monotonically with distance to its maximum at d_{ij} = _{l}. In the present implementation, we use ε = 10^{−7}.

The observation error close to the border of the localization area about a grid point is negligible if _{low} is low enough. By virtue of the monotonic decrease of the Gaspari-Cohn weighting f(d, _{l}/2) with respect to distance d, for each _{l} there exists a corrected radius _{c} such that f(d, _{l}/2) < _{low} for d > _{c}. In other words, for distances beyond _{c}, the observation errors _{nn} are so large that observations at such distances contribute only poorly to the analysis. For instance, if _{low} = 0.01, then _{l} = 5 → _{c} = 3, _{l} = 10 → _{c} = 7, and _{l} = 15 → _{c} = 11. It is important to note that this corrected localization radius depends on the width of the Gaspari-Cohn function and thus on the original localization radius _{l}, i.e., _{c} = _{c}(_{l}). In most of the following study cases, results are given for original localization radii _{l}, while the usage of the corrected localization radius is stated explicitly. The existence of a corrected localization radius _{c} illustrates the insight that there is not a single optimal localization radius for smooth localization functions but a certain range of equivalent localization radii. For non-smooth localization functions with sharp edges, e.g., a boxcar function, this variability would not exist.
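The Gaspari-Cohn function and the distance-weighted observation error described above can be sketched as follows. The floor ε = 10^{-7} follows the text; the half-radius scaling of the Gaspari-Cohn argument and the function and argument names are illustrative assumptions.

```python
import numpy as np

def gaspari_cohn(z):
    """Gaspari-Cohn fifth-order correlation function with compact
    support on |z| < 2; G(0) = 1 and G(2) = 0."""
    z = np.atleast_1d(np.abs(np.asarray(z, dtype=float)))
    g = np.zeros_like(z)
    m1 = z <= 1.0
    m2 = (z > 1.0) & (z < 2.0)
    z1, z2 = z[m1], z[m2]
    g[m1] = -0.25 * z1**5 + 0.5 * z1**4 + 0.625 * z1**3 - (5.0 / 3.0) * z1**2 + 1.0
    g[m2] = (z2**5 / 12.0 - 0.5 * z2**4 + 0.625 * z2**3 + (5.0 / 3.0) * z2**2
             - 5.0 * z2 + 4.0 - 2.0 / (3.0 * z2))
    return g

def localized_obs_error(r, d, R_loc, eps=1e-7):
    """Distance-weighted observation error: divide the error variance r by
    the Gaspari-Cohn weight of the distance d; eps floors the weight so
    the error stays finite at the localization border."""
    w = np.maximum(gaspari_cohn(np.asarray(d, dtype=float) / (R_loc / 2.0)), eps)
    return r / w
```

The error is minimal (equal to r) at the observation location and grows monotonically toward the border of the localization area.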

The present work considers primarily nonlocal observations. Since these are not located at a single spatial site, it is non-trivial to include them in the LETKF that assumes a single observation location. To this end, several previous studies have suggested corresponding approaches [_{H} is large. In fact, this localization scheme introduces an additional contribution to the observation error. The present implementation considers this definition. This results in the localization of the nonlocal observation at grid point

In a large part of this work, we consider a single observation with

Considering the localization scheme described above, at the model grid point

where ^{L} is a row vector with

The term _{1}.

Now utilizing the Woodbury matrix identity [

for real matrices ^{n×n}, ^{n×k}, ^{k×k}, and ^{k×n} with
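The Woodbury identity (A + UCV)^{-1} = A^{-1} − A^{-1}U(C^{-1} + VA^{-1}U)^{-1}VA^{-1} can be verified numerically for random, well-conditioned matrices; here U and V are scaled down so that A + UCV is guaranteed to stay invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric, well-conditioned, invertible
U = 0.3 * rng.standard_normal((n, k))
C = np.eye(k)
V = 0.3 * rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
# left-hand side: direct inverse of the rank-k update
lhs = np.linalg.inv(A + U @ C @ V)
# right-hand side: Woodbury identity, inverting only a k x k matrix
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv
```

In the LETKF context this is what makes the ensemble-space formulation efficient: the expensive inverse is replaced by one of the (small) ensemble dimension.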

where

with

Since _{i} takes its maximum at the observation location and is very small when the observation is at the localization border. This means that

Now let us focus on the ensemble members. Equations (18) and (9) give the analysis ensemble members at grid point

where

The singular value decomposition serves as a tool to compute

where ^{L×L} are the normalized eigenvectors of

i.e., ^{t} is an eigenvector of _{i} with eigenvalue 0 < λ_{i} < 1. By virtue of the properties of _{i}, λ_{i} takes its minimum at the observation location at

The remaining eigenvectors of number _{n} ⊥

Hence

This leads to

The ensemble Kalman filter underestimates the forecast error covariance matrix due to the limited ensemble size. To counteract this, the ensemble perturbations X^{b} in (3) are modified by white Gaussian additive noise W ∈ ℝ^{N×L}

with matrix elements _{add} = 0.1.
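Additive covariance inflation as described here amounts to perturbing the background perturbation matrix with white Gaussian noise. A minimal sketch follows; the convention that the inflation factor is the noise standard deviation is an assumption.

```python
import numpy as np

def additive_inflation(Xpert, sigma_add=0.1, rng=None):
    """Additive covariance inflation: add white Gaussian noise of standard
    deviation sigma_add to the background ensemble perturbations to
    counteract covariance underestimation at small ensemble size."""
    rng = np.random.default_rng() if rng is None else rng
    return Xpert + sigma_add * rng.standard_normal(Xpert.shape)
```

Since the noise is flow-independent, repeated trials with identical initial ensembles differ only through these random draws, as noted in section 2.7.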

The present study investigates solutions computed with time step Δt = 10^{−3}/12 for 100 time steps applying a 4th-order Runge-Kutta integration scheme. According to [

Each trial assumes identical initial ensemble members, and the only difference between trials results from the additive noise of the additive covariance inflation, cf. section 2.6.

By virtue of the primarily numerical nature of the present work, it is mandatory to vary certain parameters, such as perturbations to the observations or the factor of additive inflation. For instance, the data assimilation results in

with _{H} = 10 and _{1} = 1, _{2} = 27, _{3} = 54. The localization radius is identical to the sensitivity function width, _{l} = _{H}, and data assimilation is performed during 250 filter cycles with an initial phase of 50 forecast steps. For stabilization reasons, we have increased the model integration time step to Δt = 10^{−2}/12 but reduced the number of model integrations to 10 steps, cf. [

As mentioned above, typically the measurement process is not known in all details. For instance, the observation error is assumed to be _{t} = 0. This is the valid setting for all simulations but a few sets of trials shown in _{t} = _{add} = 0.1, but in two single sets of experiments [cf. _{add} = 0.05. In addition, the weighting function of nonlocal observations is a boxcar window function with sharp borders, but in a single set of experiments the weighting function is a smooth Gaspari-Cohn function, cf.

The verification measures bias and RMSE are computed for the local observations only according to Equations (23) and (24).

The Kalman filter may diverge for several reasons [

The present work focuses primarily on a non-catastrophic filter divergence type showing a strong increase of the innovation magnitude to values much larger than the model equivalent in observation space of the attractor. This divergence may be temporally intermittent with finite duration. Since this intermittent innovation divergence results in increased first-guess departures and hence worsens forecasts, it is important to detect these divergences and control them. By definition, the innovation process diverges if it exceeds a threshold; the corresponding first time instance is denoted the divergence time _{b} in the following. This criterion for innovation divergence is hard: if the innovation reaches the threshold σ_{th}, then innovation divergence occurs. The corresponding divergence rate γ is the ratio of the number of divergent trials to the total number of trials. For instance, for γ = 1 all numerical trials diverge whereas γ = 0 reflects stability in all numerical trials.
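The hard threshold criterion and the divergence rate γ over a set of trials can be sketched as follows (function names are illustrative):

```python
import numpy as np

def divergence_time(innovation, sigma_th):
    """Return the first filter cycle at which |innovation| exceeds the
    hard threshold sigma_th, or None if the trial never diverges."""
    idx = np.flatnonzero(np.abs(np.asarray(innovation)) > sigma_th)
    return int(idx[0]) if idx.size else None

def divergence_rate(trials, sigma_th):
    """gamma: fraction of trials classified as divergent (gamma = 1 means
    all trials diverge, gamma = 0 means all trials stay stable)."""
    flags = [divergence_time(t, sigma_th) is not None for t in trials]
    return sum(flags) / len(flags)
```

Note that this criterion is conservative: trials with large but sub-threshold innovations are counted as non-divergent, which motivates the departure statistics below.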

Moreover, it is possible that

and the corresponding root mean-square error

quantify the forecast error in such trials. Larger values of bias and RMSE indicate larger innovation values.
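The first-guess departure statistics might be computed as follows (a sketch; the averaging over filter cycles follows the definitions of bias and RMSE above):

```python
import numpy as np

def departure_stats(y, hxb):
    """First-guess departure statistics over filter cycles:
    bias = mean(y - Hx^b) and RMSE = sqrt(mean((y - Hx^b)^2))."""
    d = np.asarray(y, dtype=float) - np.asarray(hxb, dtype=float)
    return float(d.mean()), float(np.sqrt(np.mean(d**2)))
```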

To quantify filter divergence, Tong et al. [

and

at time _{n}, where the norm is defined by the corresponding matrix elements _{nm}. The quantity Θ_{n} represents the ensemble spread in observation space and Ξ_{n} is the covariation of observed and unobserved ensemble perturbations assuming local observations. Large values of Ξ indicate catastrophic filter divergence as pointed out in [
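A possible implementation of spread and covariation diagnostics in the spirit of Θ_n and Ξ_n is sketched below. The exact norm convention was not recoverable here, so Frobenius norms and a simple projection observation operator (selected grid indices) are assumed.

```python
import numpy as np

def tong_stats(Xpert, obs_idx):
    """Divergence diagnostics in the spirit of Tong et al.:
    theta = spread of the observed components of the ensemble
    perturbations, xi = covariation of unobserved and observed
    components, both as Frobenius norms (assumed convention)."""
    Xpert = np.asarray(Xpert, dtype=float)
    obs_idx = np.asarray(obs_idx)
    unobs_idx = np.setdiff1d(np.arange(Xpert.shape[0]), obs_idx)
    Xo, Xu = Xpert[obs_idx], Xpert[unobs_idx]
    theta = np.linalg.norm(Xo)         # ensemble spread in observation space
    xi = np.linalg.norm(Xu @ Xo.T)     # observed/unobserved covariation
    return theta, xi
```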

An interesting feature to estimate the degree of divergence is the time of maximum ensemble spread _{Θ} and the time of maximum covariation of observed and unobserved ensemble perturbations _{Ξ}:

Moreover, previous studies have pointed out that catastrophic filter divergence in ensemble Kalman filter implies alignment of ensemble members. This may also represent an important mechanism in non-catastrophic filter divergence. The new quantity

is the probability of alignment and unalignment, where _{a} is the number of aligned ensemble member perturbation pairs

and _{u} is the number of anti-aligned member pairs with

for all pairs with cos β_{lk} > 0.5 (cos β_{lk} < −0.5). It holds 0 ≤ _{a,u} ≤ 1, and the larger _{a} (_{u}), the more ensemble members are aligned (anti-aligned) with each other.
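The pairwise alignment probabilities P_a and P_u over ensemble member perturbations can be sketched as follows (the 0.5 cosine threshold follows the text; nonzero perturbations are assumed):

```python
import numpy as np

def alignment_probs(Xpert, thresh=0.5):
    """P_a (P_u): fraction of distinct ensemble-perturbation pairs (l, k)
    whose cosine cos(beta_lk) exceeds thresh (falls below -thresh)."""
    Xpert = np.asarray(Xpert, dtype=float)
    L = Xpert.shape[1]
    norms = np.linalg.norm(Xpert, axis=0)
    C = (Xpert.T @ Xpert) / np.outer(norms, norms)   # cos(beta_lk) matrix
    cos_lk = C[np.triu_indices(L, k=1)]              # distinct pairs l < k
    return float((cos_lk > thresh).mean()), float((cos_lk < -thresh).mean())
```

For example, three members with perturbations v, v, and −v give one aligned and two anti-aligned pairs.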

Considering the importance of member alignment to each other for catastrophic divergence, it may be interesting to estimate the alignment degree of background member perturbation with the analysis increments ^{a, l} − ^{b,l} by

The term x^{a,l} − x^{b,l} is the analysis ensemble member perturbation from the background members, and for cos α_{l} → 1 (cos α_{l} → −1) the analysis ensemble members point in the same (opposite) direction as the background ensemble members. In addition,

are the percentages of aligned and anti-aligned ensemble members for which cos α_{l} > 0.5 (of number _{a}) and cos α_{l} < −0.5 (of number _{u}), respectively.

The stability of the ensemble Kalman filter depends heavily on the model and the nature of observations. To gain some insight into the effect of nonlocal observations, the present work considers primarily nonlocal observations only (section 3.1). Then the last section 3.2 shows briefly the divergence rates in the presence of both local and nonlocal observations.

The subsequent sections consider nonlocal observations only and show how they affect the filter stability. To this end, the first studies are purely numerical and are complemented by an additional analytical study.

In order to find out how the choice of localization radius _{l} affects the stability of the LETKF, a large number of numerical experiments help to investigate statistically under which conditions the filter diverges. ^{b} and the model equivalents in observation space ^{b} for two different localization radii. In _{l} = _{H}. Conversely, observations and their model equivalents diverge after some time for _{l} ≠ _{H}. This is visible in the ensemble mean (_{l} = _{H}, whereas the ensemble spread is larger for _{l} ≠ _{H}. The ensembles at _{l} = _{H} and _{l} ≠ _{H} are close to each other.

Temporal solutions of the filter process with _{H} = 5 with two different localization radii _{l}. ^{b} [top row, solid blue line, denoted as Hfg for model equivalent (H) of the first guess (fg)] and the ensemble members ^{(b,l)} (bottom row, dotted blue line). The time represents the number of analysis steps. _{1n} = 0, and at the single spatial location

This result can be generalized to a larger number of localization radii and sensitivity function widths, cf. _{H} = 1, no filter process diverges for a large range of localization radii _{l}, i.e., the LETKF is stable (dashed black line in _{H} = 1 corresponds to local observations. Now increasing the observation area with _{H} > 1, the filter may diverge and its divergence rate γ depends on the localization radius. We observe that the filter diverges least when the localization radius is close to the sensitivity function width. These findings hold true for both the original localization radius and the corrected radius _{c}, cf. section 2.4 and

Stability of the LETKF of nonlocal observations dependent on the sensitivity function width _{H} and the localization radius _{l}. The divergence rate γ is defined in section 2.8. _{l}. _{c} and _{low} = 0.01. Here, the observations are noise-free with _{t} = 0 but the chosen observation error is assumed to _{t} due to lack of knowledge of this true value.

These results also hold true if observations are subjected to additive noise and the observation error is chosen as the true value, cf. _{l}. The situation is different if the sensitivity function is not a non-smooth boxcar function, as in the majority of the studies, but a smooth Gaspari-Cohn function. Then the divergence rate still exhibits a minimum, but the corresponding localization radius of this minimum is much smaller than _{H}, cf. dotted-dashed line in

LETKF stability for different parameters and _{H} = 10. The solid line denotes the divergence rate γ if the true observation error _{t} = 0.1 is known, i.e., _{t}, and the inflation rate is _{add} = 0.1; the dashed line denotes the divergence rate for lower inflation rate _{add} = 0.05, otherwise identical to the solid line case; the dotted-dashed line marks results identical to the dashed line case but with a smooth Gaspari-Cohn sensitivity function. The dotted line is taken from _{t} = 0, _{l}. _{c} with _{low} = 0.01.

All these results consider the realistic case of a small number of ensemble members _{l} < _{H} and stability with zero divergence rate for _{l} > _{H}. This means the full ensemble does not show a local minimum divergence rate as observed for

The divergence criterion is conservative with a hard threshold and trials with large but sub-threshold innovations, i.e., with innovations that do not exceed the threshold, are not detected as being divergent. Nevertheless to quantify intermittent large innovations in the filter, _{l} that are similar to the sensitivity function width _{H} (_{c} and _{H} agree well at minimum bias and RMSE, cf.

First guess departure statistics of trials that do not reach the divergence threshold. Here _{H} = 5 (black) and _{H} = 10 (red). _{l}. _{c} with _{low} = 0.01. All statistical measures are based on 100 trials.

Now understanding that localization radii _{l} ≠ _{H} may destabilize the filter, the question arises where this instability comes from and which mechanisms may be responsible for the innovation divergence. _{n} diverges (_{l} < _{H} and _{l} ≫ _{H}, whereas Θ_{n} remains finite for _{l} ≈ _{H}. Interestingly, for _{l} < _{H} a certain number of ensemble members align and anti-align intermittently but do not align in the instance of divergence (_{l} ≫ _{H} ensemble members both align and anti-align while the filter diverges. These results already indicate different divergence mechanisms for _{l} ≤ _{H} and _{l} > _{H}. Accordingly, for _{l} < _{H} and _{l} ≈ _{H} background member perturbations align with the analysis member perturbations with cos α_{l} → 1 (_{l} fluctuates between 1 and −1 for _{l} ≫ _{H} while diverging.

Various measures reflecting stability of the LETKF dependent on the localization radius _{l} in single trials. _{o} (black) and its model equivalent _{n} (top) and Θ_{n} (bottom), for definition see section 2.8. ^{a} − ^{b} according to Equation (27). The different localization radii are _{l} = 1 (left panel), _{l} = 6 (center panel), and _{l} = 20 (right panel) with the sensitivity function width _{H} = 5.

The times _{Θ} and _{Ξ} indicate when the respective quantities Θ_{n} and Ξ_{n} are maximum. These time instances agree well with the divergence times _{b}. This confirms the single trial finding in _{n} and Ξ_{n} are good markers for filter innovation divergence. Moreover, only few background members align and anti-align for _{l} ≤ _{H} (small values of _{a,u}), whereas many more background members align and anti-align for _{l} ≫ _{H} (_{l} ≤ _{H} (_{a} = 1, _{u} = 0) and most analysis members still align with their background members for _{l} ≫ _{H} (

Divergence times and ensemble member alignment dependent on the localization radius _{l}. _{n} (_{Θ}, black), time of maximum Ξ_{n} (_{Ξ}, blue), and the divergence time _{b} (green), see the Methods section 2.8 for definitions. _{a} (black) and anti-alignment ratio _{u} (red) defined in Equation (26). _{a} (black) and anti-alignment ratio _{u} (red) defined in Equation (28). In addition _{H} = 5 and results are based on the 200 numerical trials from

According to _{l} may be smaller (cases _{H} or both may be equal (cases

Sketch of different configurations of sensitivity function and localization area. The circles denote the different cases (n.m) The sensitivity function (blue) has its center at the center of the spatial domain and the localization function (red) is located about model grid element

Now let us take a closer look at each case, cf.

and

with the corresponding ensemble means at observed grid points _{o,i} and the analysis ensemble members

_{l} ≤ _{H}, |_{H}, and |_{l}

with the corresponding unobserved ensemble means _{u,i} and the analysis ensemble member

and

Firstly, let us consider the limiting case of local observations with _{H} = 1. Then case _{l} ≤ 20. Moreover, the sensitivity function of the observation is non-zero at the observation location only and hence the localization of the observation to the position of the sensitivity maximum (cf. section 2.4) is trivial. In case _{i}. This situation changes in case of nonlocal observations with _{H} > 1. Then case _{H}, the larger the error induced by this localization approximation. Consequently, updates at grid points far from the observation location still consider the observation with weighted observation error _{i}; however, the observation includes a much larger error than _{i}, introducing an analysis update error.

From a mathematical perspective, in cases

Hence these two latter cases may cause detrimental effects. Consequently, cases _{l} ≠ _{H} yield bad estimates of analysis updates that make the Kalman filter diverge. Conversely, case _{l} = _{H} involves consistent updates only, and the detrimental effects described for the other cases are not present. These effects may explain the enhanced filter divergence for _{l} ≠ _{H} and the minimum filter divergence for _{l} = _{H} seen in _{l} ≈ _{H} shown in

The important terms in case

and α_{i} appear in both cases. The terms _{o,u} represent the covariances between model perturbations and model-equivalent perturbations over ensemble members, and they may contribute differently to the intermittent divergence with increasing |_{l}−_{H}|. For a closer investigation of these terms, let us consider

in case

in case

may be helpful. The term _{o} (_{u}) is the maximum over time of the mean of (_{o})_{i}α_{i} ((_{u})_{i}α_{i}). This mean is computed over the set of observed (unobserved) grid points _{o} (_{u}). Consequently, _{i} compared to the observed covariances. This down-weighting results from the fact that unobserved grid points are more distant from the observation which yields smaller values of α_{i}. By definitions (35) and (36), thus

The corresponding quantities

define the time instances when these maxima are reached and Δ

To illustrate the importance of _{o} and its corresponding occurrence time _{o}, _{o} relative to the stop time _{stop} of filter iteration, i.e., _{stop} − _{o}. For divergent trials, _{stop} = _{b} is the time of divergence and for non-divergent trials _{stop} = 200 is the maximum time. _{o} is very close to the divergence time, whereas _{o} is widely distributed about _{o} = 110 (_{stop} − _{o} = 90) in non-divergent trials. This indicates that _{o} is strongly correlated with the underlying divergence mechanism.

The divergence correlates with the weighted model-observation covariances at observed grid points _{o}. The plots show the times of maxima _{o} (cf. Equation 37) to stop, i.e., _{stop} − _{o}. _{o} is the time when the mean model-observation covariance _{o} is maximum, for divergent (red-colored with break time _{stop}) and non-divergent (black colored with maximum time _{stop} = 200) trials. Here it is _{H} = 5.

Now that _{o} is strongly correlated with the filter innovation divergence, the question arises whether the difference between weighted observed and unobserved model-observation covariances is related to the innovation divergence. _{o} − _{u} and Δ_{o} − _{u} for divergent and non-divergent experimental trials. Most trials exhibit stronger model-observation covariances in unobserved grid points than in observed grid points (

Comparison of weighted model-observation covariances in observed and non-observed grid points. _{o} − _{u} is the difference between maximum weighted model-observation covariances in observed and unobserved grid points. _{o} − _{u} is the difference of times when the weighted model-observation covariances reach their maximum, cf. Equation (38). It is _{H} = 5.

In this context, recall that _{u} > _{o} but _{u} < _{o} in divergent trials, i.e., unobserved grid points reach their larger maximum faster than observed grid points. This indicates that the model-observation covariance _{u} reflects the instability of the filter.

Several international weather services apply ensemble Kalman filters and assimilate both nonlocal and local observations. Performing assimilation experiments similar to the experiments for nonlocal observations but now with a single additional local observation at grid point _{H} = 1, the filter rarely diverges due to large innovations (with fewest trials at _{l} ≈ 10) but more often than in the absence of local observations, cf. _{l} ≈ 10. In sum, the least number of divergent trials occurs at _{l} = _{H} = 1 (blue curve in _{H} = 5 with a minimum innovation divergence rate at _{l} ≈ _{H} and a maximum catastrophic divergence rate at _{l} ≈ 10. Again, the least number of trials diverge at _{l} = _{H}.

Rate of filter divergence γ (innovation divergence, black line) and catastrophic filter divergence (any ensemble member diverges, red line) in the presence of a single local and a single nonlocal observation. The total number of divergent trials is the sum of innovation and member divergence-trials (blue line). Results are based on 200 numerical trials.

Ensemble Kalman filtering of nonlocal observations may increase the innovation in the filter process leading to larger observation-background departure bias and RMSE, cf.

The majority of previous stability studies of Kalman filtering involving nonlocal observations consider catastrophic filter divergence. Kelly et al. [

The present work has studied the stability of the LETKF dependent on the sensitivity function width _{H} and localization radius _{l}. The LETKF diverges least when _{l} ≈ _{H} and hence this choice of localization radius is called optimal, i.e., the filter is least divergent. This insight agrees with the finding in an operational weather prediction framework involving the LETKF [

It is important to point out that, under certain conditions, it may be beneficial to further enlarge the localization area compared to the sensitivity function. If the system's activity synchronizes on a larger spatial scale, then information is shared between observed and unobserved grid points and a larger localization radius would be beneficial. Examples for such synchronization phenomena are deep clouds or large-scale winds in meteorology or locally self-organized spots in physical complex systems. In other words, to decide how to choose the localization radius one should take a closer look at the system's dynamics: if larger spatially synchronized phenomena are expected, then _{l} ≫ _{H} is preferable, otherwise _{l} ≈ _{H}.

Several previous studies have derived optimal localization radii for local observations in ensemble Kalman filter [

Essentially, it is important to point out that there may not be a single optimal localization radius but a range of more or less equivalent localization radii. This holds true for smooth localization functions, whereas non-smooth (i.e., unrealistic) localization functions do not show this uncertainty, cf. section 2.4.

It is important to understand why some numerical trials diverge and some do not. Direct and indirect markers indicate which dynamical features play an important role in divergence. The most obvious direct markers are the absolute values of the innovation and the ensemble member perturbation spread Θ_{n}, and both increase sharply during filter innovation divergence, cf. _{n} also increases during divergence. Interestingly, Θ_{n} and Ξ_{n} remain finite and take their maxima just before the instance of divergence, cf. _{n} increases if both observed and unobserved errors increase. Kelly et al. [_{l} > _{H}, the larger the ensemble error in unobserved grid points compared to observed grid points. Hence the model-observation covariance reflects a degree of instability (and thus of divergence) in the filter, and this is stronger in unobserved grid points than in observed grid points.

The minimum divergence rate at _{l} ≈ _{H} hints at different underlying filter divergence contributions. If _{l} < _{H}, too few grid points are updated by the nonlocal observation (_{l} < _{H}.

For _{l} ≫ _{H}, a large number of grid points are updated which, however, consider an observation with a large intrinsic error resulting from, e.g., a too small number of ensemble members. The corresponding assimilation error is more subtle than for _{l} < _{H} and increases for larger localization radii only. The localized nonlocal observation comprises a representation error due to the reduction of the broad sensitivity function to a single location. For small ensembles, this implicit observation error contributes to the analysis update error and, finally, to filter divergence. In sum, the two inconsistencies illustrated in

Moreover, there is some evidence that ensemble member alignment may cause catastrophic filter divergence [_{l} ≤ _{H} but enhanced alignment and anti-alignment for _{l} > _{H}. The authors in [

In addition to the alignment mechanism, Equation (34) represents the covariation of ensemble perturbations in spatial and observation space at observed and unobserved spatial locations. For observed spatial locations, it is maximum just before the innovation divergence time. Moreover, it reaches its maximum at unobserved locations almost always before the maximum at observed locations are reached (

The present work considers the specific case of finite low ensemble size and application of the localization scheme. To better understand the origin of the filter divergence, it is insightful to study in detail the limiting case of large ensemble sizes, i.e., close to the model dimension, without localization. Although this limit is far from practice in seismology and meteorology, where the model systems are too large to study it, it is of theoretical interest and future work will consider it in some detail.

There is some evidence that the optimal localization radius is flow-dependent [

In the majority of studies, the present work considers a non-smooth boxcar sensitivity function in order to distinguish observed and unobserved grid points. Although this simplification allows us to gain deeper understanding of possible contributions to the filter divergence, the sensitivity function is unrealistic. A more realistic sensitivity function is smooth and unimodal or bimodal.

Moreover, the localization scheme for nonlocal observations applied in the present work is very basic due to its choice of the maximum sensitivity location as the observation location. Future work will investigate cut-off criteria such as in [

Nevertheless, the present work introduces the problem of intermittent innovation divergence, extends lines of reasoning on the origin of filter divergence to nonlocal observations, and proposes new markers of innovation divergence.

The datasets generated for this study are available on request to the corresponding author.

AH conceived the work, performed all simulations, and wrote the manuscript.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AH would like to thank Roland Potthast for insightful hints on the stability of Kalman filters. Moreover, the author greatly appreciates the valuable comments of the reviewers, which helped to improve the manuscript considerably.