Edited by: George Michailidis, University of Florida, United States

Reviewed by: Annette Witt, Max-Planck-Institute for Dynamics and Self-Organisation, Germany; Alex Jung, Aalto University, Finland

This article was submitted to Statistics and Probability, a section of the journal Frontiers in Applied Mathematics and Statistics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Smartphone-based earthquake early warning systems (EEWSs) are emerging as a complementary solution to classic EEWSs based on expensive scientific-grade instruments. Smartphone-based systems, however, are characterized by a highly dynamic network geometry and by noisy measurements. Thus, there is a need to control the probability of false alarms and the probability of missed detection. This study proposes a statistical methodology to address this challenge and to jointly estimate, in near real time, earthquake parameters such as epicenter and depth. The methodology is based on a parametric statistical model, on hypothesis testing, and on Monte Carlo simulation. The methodology is tested using data obtained from the Earthquake Network (EQN), a citizen science initiative that implements a global smartphone-based EEWS. It is found that, when the probability of missing an earthquake is fixed at 1%, the probability of false alarm is 0.8%, indicating that EQN is a robust smartphone-based EEWS.

Wireless sensor networks (WSNs) enable solutions in multiple fields, and they are adopted in environmental, health, urban, and military applications [

This study focuses on earthquake early warning systems (EEWSs) [

Classic EEWSs are based on a dense network of scientific-grade instruments, with construction and operating costs on the order of millions of euros [

Due to smartphone technology, low-cost EEWSs have been recently implemented at the global level [

Within the EQN EEWS, nodes of the WSN are the smartphones voluntarily made available by citizens. This poses many challenges because personal smartphones mainly sense the “anthropic noise” connected with human activities.

The primary challenge faced by the EQN is to control the probability of false alarms and the probability of missing an earthquake. Alerts may be triggered by events unrelated to earthquakes, and some (possibly strong) earthquakes may be missed, especially if the number of monitoring smartphones is small. Both false alarms and missed detections may undermine people's trust in the EQN.

In the pivotal study by Finazzi and Fassò [

This study proposes a statistical methodology for 1) controlling the probability of false alarms, 2) controlling the probability of missed detection, 3) classifying a detection between true and false earthquake, and 4) estimating earthquake epicenter and depth (if the detection is classified as a true earthquake).

The methodology is based on a statistical parametric model, statistical hypothesis testing, and Monte Carlo simulation. Contrary to model-less approaches (see for instance [

Because of the nature of the application, real-time operation is a constraint. Ideally, classification and earthquake parameter estimation should not exceed 1 or 2 s of computing time.

The smartphone-based EQN is used to test the statistical methodology, which is then applied to some true and false EQN detections.

Before formalizing the classification and the earthquake parameter estimation problems, it is useful to detail the output of the earthquake detection algorithm currently implemented by the EQN [

An earthquake detection made by the EQN is defined in terms of a positive number of triggers. This number is not a constant, meaning that each detection is characterized by a different number of triggers. Each trigger is described by the feature vector as follows:

where _{i}∈ℝ is the triggering time, while ^{3}. The _{j}×3 matrix

Let

The aim is to learn a hypothesis map

A statistical parametric model ^{s}, with _{j} as the vector size. The hypothesis map is then

When dealing with EEW systems, it is required to control two parameters: the probability α of missed detections (true earthquakes which are not detected by the system) and the probability β of false detections (detections which are not related to any occurred earthquake). It is thus reasonable to adopt a 0/1 loss function as follows:

and to learn a

As discussed by Jung [_{j}. Assuming to have a data set

and

Note that solving Equation (2) is equivalent to solving

where it is made explicit that the probabilities of missed and false detections depend on

From an EEW perspective, the solution provided by Equation (3) is not necessarily the best. In some contexts, a missed detection has a larger negative impact than a false detection, while in other contexts, it is the opposite. In this case, one probability is fixed to the desired level, and the other probability is minimized. Two other minimization problems for learning
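As a minimal sketch of how the two probabilities can be estimated empirically under the 0/1 loss, the snippet below counts misclassified detections in a labeled data set. The label coding (−1 for a true earthquake, +1 for a false detection) and the function name are assumptions made for illustration.

```python
def empirical_error_rates(y_true, y_pred):
    """Return (alpha_hat, beta_hat) for labels coded -1 / +1.

    Assumed coding: -1 = true earthquake, +1 = false detection.
    alpha_hat: fraction of true earthquakes classified as false (missed).
    beta_hat:  fraction of false detections classified as true (false alarms).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n_true = sum(1 for y in y_true if y == -1)
    n_false = sum(1 for y in y_true if y == 1)
    missed = sum(1 for y, p in zip(y_true, y_pred) if y == -1 and p == 1)
    false_alarms = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == -1)
    alpha_hat = missed / n_true if n_true else float("nan")
    beta_hat = false_alarms / n_false if n_false else float("nan")
    return alpha_hat, beta_hat
```

Fixing one of the two rates and minimizing the other then amounts to evaluating this function for each candidate hypothesis map.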

In this section, we propose a statistical parametric model for the generic data point

where

with

as the distance between the hypocentre and the smartphone location, _{O}∈ℝ is the earthquake origin time.

In Equation (8), _{i, E} is the distance between the epicenter _{E}∈[0, 500] is the earthquake depth, and

The role of the random component ϵ_{i} is to model the difference between the expected and the observed triggering time. This difference is mainly due to the smartphone detection delay and a seismic wave velocity that may differ from the expected value.
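The deterministic part of the model can be sketched as follows, assuming a constant wave velocity, a straight ray from the hypocentre, and a hypocentral distance obtained by combining the epicentral distance and the depth with Pythagoras' theorem. The haversine formula for the epicentral distance, the function names, and the velocity used in any call are assumptions for illustration, not values taken from this study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def expected_trigger_time(lat_s, lon_s, lat_e, lon_e, depth_km,
                          origin_time_s, velocity_km_s):
    """Expected triggering time of a smartphone at (lat_s, lon_s)."""
    d_epi = epicentral_distance_km(lat_s, lon_s, lat_e, lon_e)
    d_hypo = math.hypot(d_epi, depth_km)  # hypocentre-to-smartphone distance
    return origin_time_s + d_hypo / velocity_km_s
```

The observed triggering time is then this expected value plus the random component ϵ_{i}.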

Equations (6–8) fully define the statistical model

Model estimation is based on the maximum likelihood method. For a generic EQN detection, the log-likelihood function based on the joint probability distribution of

The Δ_{i} are assumed to be independent. This assumption is realistic because smartphones do not share a common clock, detection delays are independent, and the detection by each smartphone is influenced by local factors (e.g., where the smartphone is located, at which floor of the building, and the accelerometer sensitivity).

Maximum likelihood estimates of the epicenter latitude and longitude, the earthquake depth, and the origin time are given by

The solution of Equation (10) cannot be obtained in closed form due to the non-linearity of Equation (8); hence, estimates of the epicenter latitude and longitude, the depth, and the origin time are obtained numerically. The minimization in Equation (10) is possible because, for any “proposed” values of the model parameters, the objective can be evaluated from the observed triggers.

At convergence, the BFGS quasi-Newton method also returns the Hessian matrix. Since maximum likelihood estimates for model parameters are obtained from a minimization problem, the Hessian is equivalent to the observed Fisher information matrix. The variance–covariance matrix of the three parameters is then the inverse of the Hessian matrix, from which standard errors are easily computed.
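A minimal sketch of this estimation step is given below, using `scipy.optimize.minimize` with `method="BFGS"`. Several simplifications are assumed: smartphone coordinates are expressed in a local planar frame in km rather than in latitude and longitude, errors are taken as Gaussian so that maximum likelihood reduces to least squares, and the inverse Hessian is the BFGS approximation returned by SciPy rather than an exact one. The function name and argument layout are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_detection(xy_km, t_s, velocity_km_s, start):
    """Fit (x_E, y_E, depth, origin time) by least squares with BFGS.

    xy_km: (n, 2) smartphone coordinates in a local planar frame (km).
    t_s:   (n,) triggering times (s).
    start: initial guess [x_E, y_E, depth, t_O].
    """
    xy_km = np.asarray(xy_km, dtype=float)
    t_s = np.asarray(t_s, dtype=float)

    def sum_of_squares(theta):
        x_e, y_e, depth, t_o = theta
        d_hypo = np.sqrt((xy_km[:, 0] - x_e) ** 2
                         + (xy_km[:, 1] - y_e) ** 2 + depth ** 2)
        residuals = t_s - (t_o + d_hypo / velocity_km_s)
        return float(np.sum(residuals ** 2))

    res = minimize(sum_of_squares, np.asarray(start, dtype=float), method="BFGS")
    sigma2_hat = res.fun / len(t_s)  # ML estimate of the error variance
    # For Gaussian errors the negative log-likelihood is S / (2 sigma^2) + const,
    # so its inverse Hessian is 2 sigma^2 times the inverse Hessian of S.
    cov = 2.0 * sigma2_hat * res.hess_inv
    std_err = np.sqrt(np.diag(cov))
    return res.x, sigma2_hat, std_err
```

Depth is left unconstrained here and the objective is symmetric in its sign, so a production version would need bounds or a reparameterization.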

Finally, the maximum likelihood estimate of the variance is as follows:

where

Among all elements of

In this study,

The null hypothesis is rejected when the variance is higher than expected, namely, when smartphone triggering times do not follow the propagation law of the primary or secondary seismic wave. As is customary in statistical hypothesis testing, the probability α is fixed, and it represents the probability of rejecting the null hypothesis when it is actually true (namely, it is the probability of missing a true earthquake).
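The decision rule can be illustrated with a textbook one-sided chi-square test on a variance. This is a sketch under assumptions: the statistic is taken as the degrees of freedom times the estimated variance divided by δ², which is the standard form but may differ in scaling from the statistic used in this study, and the labels −1 (true earthquake) and +1 (false detection) are an assumed coding.

```python
from scipy.stats import chi2


def classify_detection(sigma2_hat, delta, df, alpha=0.01):
    """One-sided chi-square variance test.

    Returns (label, statistic, critical_value); label is -1 when the null
    hypothesis is not rejected (true earthquake) and +1 otherwise.
    """
    statistic = df * sigma2_hat / delta ** 2  # assumed scaling of the statistic
    critical_value = chi2.ppf(1.0 - alpha, df)  # (1 - alpha)-quantile
    label = 1 if statistic > critical_value else -1
    return label, statistic, critical_value
```

For instance, with 18 degrees of freedom and α = 0.01 the critical value is about 34.8, so any detection whose statistic falls below that value is classified as a true earthquake.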

The test statistic is as follows:

which, under the null hypothesis, is distributed as a chi-square with _{(1−α), df} is the (1−α)-quantile of a chi-square distribution with

Since we do not know which seismic wave is detected by the smartphones, two models

It is worth noting that the statistical hypothesis test is equivalent to a linear map. Indeed, setting

then

Finally, δ is obtained by solving the problem

EQN detection classification and earthquake parameters estimation.

1: Initialisations: the number of times _{E}, _{E}, _{E} and _{O} are randomized when solving (Equation 10). The degrees of freedom |

2: |

3: Sample _{E}, _{E}, _{E} and _{O} from uniform distributions. |

4: Solve the minimization problem in Equation (10). |

5: Compute |

6: Let _{E, z}, _{E, z}, _{E, z}, _{O, z} and |

7: Let _{E, z}, _{E, z}, _{E, z}, _{O, z}. |

8: |

9: Solve |

10: Maximum likelihood estimates of model parameters are |

11: Compute the quantile _{1−α, df} of a chi-square distribution with |

12: Set the vector |

13: Compute |

14: Return the classification ŷ. |

15: if ŷ = −1 then |

16: Return the estimated earthquake parameters |

17: |
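The restart logic in the listing can be sketched as a generic multi-start wrapper. This is a minimal illustration rather than the EQN implementation: the function names, the callable interface, and the rule of keeping the restart with the lowest objective value are assumptions.

```python
import random


def multi_start_minimize(solve, sample_start, n_starts, seed=0):
    """Run a local solver from several random starts and keep the best result.

    solve(start) must return (params, objective_value).
    sample_start(rng) must return a random starting point.
    """
    rng = random.Random(seed)
    best_params, best_value = None, float("inf")
    for _ in range(n_starts):
        params, value = solve(sample_start(rng))
        if value < best_value:  # keep the restart with the lowest objective
            best_params, best_value = params, value
    return best_params, best_value
```

In this setting `sample_start` would draw latitude, longitude, depth, and origin time from uniform distributions, and `solve` would run the minimization of Equation (10) from that starting point. Randomized restarts reduce the risk that a single run stops at a poor local minimum.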

The minimization problem in Equation (17) has no closed-form solution. For this reason, we implement a Monte Carlo simulation that aims to simulate a data set

A total of 1,000 true EQN detections and 1,000 false EQN detections are simulated considering the true locations of 1,000 smartphones of the EQN in Lima (Peru).

The probability of missed detection is fixed to α = 0.01, while δ is varied from 0.1 to 1.5 in steps of 0.1. For each value of δ, β(δ) is computed by estimating the model on each data point of the simulated data set.
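The grid search over δ can be sketched as below. The selection rule is an assumption: among the grid values whose empirical missed-detection rate does not exceed α, the one with the smallest empirical false-alarm rate is kept. The input layout (one tuple per simulated detection with estimated variance, degrees of freedom, and chi-square critical value) and the scaling of the test statistic are also assumptions.

```python
def error_rates_for_delta(delta, true_dets, false_dets):
    """Empirical (alpha, beta) for one delta.

    true_dets / false_dets: lists of (sigma2_hat, df, critical_value) tuples
    for simulated true and false detections, respectively.
    """
    def rejected(det):
        sigma2_hat, df, critical_value = det
        return df * sigma2_hat / delta ** 2 > critical_value

    alpha_hat = sum(rejected(d) for d in true_dets) / len(true_dets)
    beta_hat = sum(not rejected(d) for d in false_dets) / len(false_dets)
    return alpha_hat, beta_hat


def select_delta(true_dets, false_dets, alpha=0.01):
    """Pick delta on the grid 0.1, 0.2, ..., 1.5 (assumed selection rule)."""
    grid = [round(0.1 * k, 1) for k in range(1, 16)]
    feasible = []
    for delta in grid:
        a, b = error_rates_for_delta(delta, true_dets, false_dets)
        if a <= alpha:  # keep only values that respect the fixed alpha
            feasible.append((b, delta))
    return min(feasible)[1] if feasible else None
```

Because a larger δ makes rejection less likely, the missed-detection rate decreases and the false-alarm rate increases along the grid, which is why a constrained choice is needed.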

For simulating a true earthquake, the following aspects are taken into account: the earthquake epicenter and depth, the arrival time of the seismic wave at the smartphone locations, the earthquake detectability by the smartphone, and the error on the triggering time. Finally, we account for the fact that smartphones may detect events unrelated to the earthquake.

The epicenter coordinates (latitude and longitude) are simulated uniformly inside the coordinate box [−12.39°, −11.74°] for latitude and [−77.17°, −76.66°] for longitude. The box encompasses the EQN of Lima. The earthquake depth, instead, is simulated uniformly in the range [0, 100] km, independently of the earthquake epicenter.

The arrival time of the seismic wave at each smartphone location is simulated from Equation (6) assuming _{O} = 0 and

Of the remaining 30% of smartphones, which do not trigger, 6% are made to trigger at random with a triggering time uniformly generated in the range [0, 12] s. This implies that, when the earthquake is detected by the EQN detection algorithm, the list of triggering smartphones may include triggers unrelated to the earthquake dynamics.

Once the list of triggering smartphones is defined and sorted by triggering time, the EQN detection algorithm is applied to the list. The algorithm stops when the detection condition is satisfied, and the sub-list of triggers that contributed to the earthquake detection is given as the output.
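The trigger-generation part of this simulation can be sketched as follows. The coordinate box, the depth range, the 6% share of random triggers, the [0, 12] s window, and the zero origin time come from the description above; the 70% detection probability is inferred from the "remaining 30%", while the wave velocity, the Gaussian timing error, and its standard deviation are hypothetical parameters. The EQN detection algorithm itself is not reproduced.

```python
import math
import random


def _distance_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in km (inputs in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def simulate_true_triggers(phones, velocity_km_s, noise_sd_s,
                           p_detect=0.70, p_random=0.06, seed=0):
    """phones: list of (lat, lon). Returns (epicenter, depth) and sorted triggers.

    Each trigger is (time_s, phone_index, related_to_earthquake).
    """
    rng = random.Random(seed)
    lat_e = rng.uniform(-12.39, -11.74)
    lon_e = rng.uniform(-77.17, -76.66)
    depth = rng.uniform(0.0, 100.0)
    triggers = []
    for i, (lat, lon) in enumerate(phones):
        if rng.random() < p_detect:
            # Arrival time with origin time 0, plus an assumed Gaussian error.
            d_hypo = math.hypot(_distance_km(lat, lon, lat_e, lon_e), depth)
            t = d_hypo / velocity_km_s + rng.gauss(0.0, noise_sd_s)
            triggers.append((t, i, True))
        elif rng.random() < p_random:
            # Trigger unrelated to the earthquake, uniform in [0, 12] s.
            triggers.append((rng.uniform(0.0, 12.0), i, False))
    triggers.sort()
    return (lat_e, lon_e, depth), triggers
```

The sorted list is what would then be passed to the detection algorithm.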

Simulated true earthquake detection based on the EQN smartphone network of Lima (Peru). The diameter of circles is proportional to the triggering time.

To simulate a false detection, we assume that smartphones trigger at random with a triggering time that does not follow the law of seismic wave propagation. Only 30% of the smartphones are made to trigger, and the triggering time is uniformly sampled in the range [0, 12] s.
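A false detection as described here can be generated in a few lines. The function name and the seed handling are assumptions; the 30% share and the [0, 12] s window are taken from the text.

```python
import random


def simulate_false_triggers(n_phones, p_trigger=0.30, window_s=12.0, seed=0):
    """Return sorted (time_s, phone_index) pairs for a simulated false detection."""
    rng = random.Random(seed)
    triggers = [
        (rng.uniform(0.0, window_s), i)  # time unrelated to any wave propagation
        for i in range(n_phones)
        if rng.random() < p_trigger
    ]
    triggers.sort()
    return triggers
```

Since triggering times carry no spatial structure, the fitted model is expected to leave a large residual variance, which is what the hypothesis test detects.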

Simulated false earthquake detection based on the EQN smartphone network of Lima (Peru). The diameter of circles is proportional to the triggering time.

The minimization of Equation (17) is attained when

Empirical distributions of

A by-product of detection classification is the estimate of the earthquake parameters.

Box plot of the errors on epicenter location (latitude and longitude) and depth.

The methodology developed in this study is applied to true and false detections made by the EQN. As a true earthquake, the event that occurred near Genoa (Italy) on 4 October 2022 at 21:41:10.5 UTC is considered.

EQN triggers for the earthquake that occurred on 4 October 2022 close to Genoa (Italy). The diameter of circles is proportional to the triggering time.

Detection classification and earthquake parameters estimation for the EQN detection near Genoa (Italy). The two estimate/error column pairs correspond to the two assumed seismic wave velocities; intervals in square brackets are 99% confidence intervals.

Parameter | True value | Estimate (first velocity) | Error | Estimate (second velocity) | Error |
Latitude (°) | 44.46 | 44.43 | 0.03 | 44.43 | 0.02 |
 | | [44.38, 44.47] | | [44.40, 44.45] | |
Longitude (°) | 9.06 | 9.06 | 0.00 | 9.03 | 0.03 |
 | | [9.01, 9.11] | | [9.03, 9.09] | |
Depth (km) | 8.00 | 0.01 | 7.99 | 0.01 | 7.99 |
 | | [0.00, 8.36] | | [0.00, 3.80] | |
Estimated variance | - | 0.57 | - | 1.03 | - |
Test statistic value | - | 17.18 | - | 31.02 | - |
Critical value | - | 34.80 | - | 34.80 | - |
Classification | - | True earthquake | - | True earthquake | - |

The number of triggering smartphones is

For both seismic wave velocities, we can observe that latitude and longitude are accurately estimated, while the error in depth is not negligible. Nonetheless, the true values are within the 99% confidence intervals evaluated from the standard errors on the model parameters. In addition, the earthquake is classified as true under both velocities since both observed test statistics are lower than the test critical value. This happens because triggers are close to the epicenter, and primary and secondary seismic waves are nearly concurrent.
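A confidence interval of this kind can be computed from an estimate and its standard error as sketched below. The normal approximation and the optional clipping to the admissible parameter range (for example, a non-negative depth) are assumptions; the function name is hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, std_err, level=0.99, lower=None, upper=None):
    """Normal-approximation confidence interval, optionally clipped to bounds."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 2.576 for level = 0.99
    lo, hi = estimate - z * std_err, estimate + z * std_err
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi
```

A wide interval for depth relative to those for latitude and longitude reflects how weakly depth is constrained when triggers are close to the epicenter.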

The estimation and classification results were obtained in less than 1 s using an Intel(R) Core(TM) i7-9750H CPU @2.60GHz, suggesting that the approach can be adopted for real-time applications.

The null hypothesis is rejected in both cases, and the detection is classified as false. In this particular case, the detection was caused by a strong lightning bolt. The speed of sound, however, is around 0.3 km/s, a value much smaller than the speed of primary and secondary seismic waves.

Triggers for the false EQN detection that occurred on 25 September 2022, close to Acapulco (Mexico). The diameter of circles is proportional to the triggering time.

The methodology developed in this study makes it possible to classify detections made by smartphone-based earthquake early warning systems as true (related to a real earthquake) or false. This is done by analyzing the information content of the smartphone triggers that contributed to the detection.

With respect to classic classification problems, the data point describing the triggers has a varying dimension which depends on the smartphone network geometry. The proposed solution is based on two steps. First, a statistical parametric model is used to convert the data point into a parameter vector with a fixed (and small) dimension. Second, a hypothesis test is implemented for classification.

While we do not claim our choices of ^{*} =

Classification and earthquake parameter estimation are performed in near real time, making the statistical methodology suitable for implementation in operational systems. On the other hand, the methodology does not fully exploit the information available on the EQN system. Specifically, the model considers only the triggering smartphones, while the active non-triggering smartphones are ignored. Knowing, at the EQN detection time, which smartphones have not (yet) triggered may better constrain epicenter and depth, thus improving their estimates.

In addition, for an EEWS like EQN that works globally, it would be important to study if the data set

Finally, a limitation of the approach proposed by this study is that the statistical methodology is applied downstream of EQN detections. Ideally, the detection, the classification, and the earthquake parameter estimation problems should be jointly addressed in a unified approach. In this regard, the vast literature on wireless sensor networks may help propose a solution under the real-time constraint.

These open problems, along with the estimation of the earthquake magnitude, will be the focus of future work.

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

FF: conceptualization, writing–review, and editing. FM: investigation, methodology, validation, and writing–original draft preparation. All authors contributed to the article and approved the submitted version.

This article was funded by the European Union's Horizon 2020 Research and Innovation Program under grant agreement RISE No. 821115.

The authors thank the reviewers and the associate editor for the well-targeted suggestions that considerably improved the quality of the article.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Opinions expressed in this article solely reflect the authors' views and the EU is not responsible for any use that may be made of information it contains.