Front. Appl. Math. Stat. Frontiers in Applied Mathematics and Statistics Front. Appl. Math. Stat. 2297-4687 Frontiers Media S.A. 602621 10.3389/fams.2020.602621 Applied Mathematics and Statistics Original Research Spatiotemporal Large-Scale Networks Shaped by Air Mass Movements Choufany et al. Spatiotemporal Air Mass Movements Networks Choufany M. 1 * Martinetti D. 1 Senoussi R. 1 Morris C. E. 2 Soubeyrand S. 1 INRAE, BioSP, 84914 Avignon, France INRAE, Pathologie Végétale, 84143 Avignon, France

Edited by: Quoc Thong Le Gia, University of New South Wales, Australia

Reviewed by: Xin Guo, Hong Kong Polytechnic University, Hong Kong

Lu Xiong, Middle Tennessee State University, United States

Michele Bellingeri, University of Parma, Italy

*Correspondence: Maria Choufany, maria.choufany@iplesp.upmc.fr

This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics

30 03 2021 2020 6 602621 03 09 2020 02 12 2020 Copyright © 2021 Choufany, Martinetti, Senoussi, Morris and Soubeyrand. 2021 Choufany, Martinetti, Senoussi, Morris and Soubeyrand

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The movement of atmospheric air masses can be seen as a continuous flow of gases and particles hovering over our planet, and it can be locally simplified by means of three-dimensional trajectories. These trajectories can hence be seen as a way of connecting distant areas of the globe during a given period of time. In this paper we present a mathematical formalism to construct spatial and spatiotemporal networks where the nodes represent the subsets of a partition of a geographical area and the links between them are inferred from sampled trajectories of air masses passing over and across them. We propose different estimators of the intensity of the links, relying on different bio-physical hypotheses and covering adjustable time periods. This construction leads to a new definition of spatiotemporal networks characterized by adjacency matrices giving, e.g., the probability of connection between distant areas during a chosen period of time. We applied our methodology to characterize tropospheric connectivity in two real geographical contexts: the watersheds of the French region Provence-Alpes-Côte d’Azur and the coastline of the Mediterranean Sea. The analysis of the constructed networks allowed identifying a marked seasonal pattern in air mass movements in the two study areas. If our methodology is applied to samples of air-mass trajectories, with potential implications in aerobiology and plant epidemiology, it could be applied to other types of trajectories, such as animal trajectories, to characterize connectivity between different components of the landscape hosting the animals.

aerobiology air masses dynamics connectivity spatiotemporal network spatial network
Introduction

Atmospheric air masses are volumes of air with a defined temperature and water vapor content that have long been known to rule fundamental atmospheric phenomena like weather and air currents. Their composition is mostly inert gases, but both organic and inorganic particles have been found to linger in high-altitude air as a consequence of the constant interaction of air masses with the earth’s surface below them. A non-exhaustive list includes gases and minerals like wildfire smoke, radioactive material, dust, sand, volcanic ash and sea salt, but also living organisms such as pollen, fungal spores, bacteria, virus and small insects. Despite the relative sparse density of these particles with respect to the volume of an air mass, their presence and transportation across the planet has proven to have strong effects on many phenomena impacting human health and safety (pollen , dust concentrations [4, 5], nuclear byproducts [6, 7], human, animal and plant epidemics , air pollution , and rainfall ).

The rise in the number of publications on these subjects suggests a growing interest of the scientific community on the effects of air-mass movements on the biosphere, that has surely been boosted by recent available developments, such as the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT ), allowing reconstruction of actual air-mass movements at rather fine geographical and temporal scales and with a global cover.

The vast majority of studies focused on isolated events, such as dust storms or peaks of air pollutants, that are rather concentrated in time (from few hours to few weeks) and/or space (just a few locations such as cities). Nonetheless, the movement of air masses is expected to have impacts on a broader spatiotemporal scale, as reviewed in recent studies [8, 21]. The purpose of the present paper is then to propose a mathematical framework for studying air-mass movements on large spatiotemporal scales, under the hypothesis that these movements can create stable and recurrent connections between distant portions of a territory. The very nature of these connections will be further specified throughout the manuscript, but as a general rule we will consider that any pair of points (or areas) in space can have a certain degree of connection, regardless of their geographic distance, provided that there are recurrent air-mass trajectories that connect the two points (or areas). The direction and strength of these connections will be estimated by looking at the trajectories linking every pair of points/areas and weighting them according to appropriate measures. In this perspective, it seems natural to resort to graph and network theory, since the formalism of nodes and edges provides an adequate environment for describing complex connections and can further be used to deepen into the topology of the constructed networks in order to infer interesting properties of the graphs, such as the presence of hubs.

From a generic statistical point of view, we aim to (i) estimate the weighted and directed edges of a graph using a sample of trajectories of individuals traveling through the space formed by the nodes of the graph, and (ii) characterize the estimated graph based on relevant statistics.

In the following sections, we first introduce the definitions and properties that will allow us to describe and then estimate connections between points/areas in space via spatiotemporal trajectories. Then, we propose several types of measures to model diverse types of connections. The expected output consists of a spatiotemporal graph describing the network of links induced by trajectories. It’s worth noting that our approach is meant to infer connectivity induced by air-mass movements and it is readily applicable to HYSPLIT-type data, but we have maintained a sufficient level of generality to be applied to other phenomena, provided that trajectory data are available (e.g., animal trajectories). Finally, we apply our method to two case studies concerning the coastline of the Mediterranean sea and the French region of Provence-Alpes-Côte d’Azur. The two case studies have different spatiotemporal granularities and they will be used to provide examples of application of the proposed methodology.

Framework for the Definition of Trajectory–Based Networks

In this section we show how a set of trajectories evolving within space during a finite time interval can be used to construct pertinent spatiotemporal networks. We first recall some basic definitions related to networks (Section 2.1) and then propose a statistical methodology to infer the network structure from a data set of trajectories (Section 2.2).

Network Theory

Network theory (a.k.a. graph theory) is a mathematical formalism introduced by Leonhard Euler to describe the famous Königsberg bridge problem . The two basic components of a network are a set of nodes linked by a set of edges. Nodes can represent a variety of things, such as persons, regions, computers, neurons, etc., while edges are used to describe the connections between those nodes. Formally, a network G = ( V , E ) is defined as a set of nodes (or vertices) V = { v 1 , v 2 , , v N } connected by a set of edges E = { e i j } i , j { 1 , , N } . A natural way of representing a network is given by means of a N × N square matrix M, usually referred to as an adjacency matrix, whose term (i, j), M ij , is non-zero an edge exists between i and j. By convention, adjacency matrices are defined to have an empty diagonal (i.e. M i i = 0 , i { 1 , , N } ), meaning that nodes cannot be self connected. If M is symmetrical (i.e. M i j = M j i , i , j { 1 , , N } ), then the network is said to be undirected, and directed otherwise. If M i j { 0,1 } , the network is said to be binary, meaning that an edge between two nodes i and j either exists or does not. Otherwise, if M i j , the network is said to be weighted, meaning that the value of the edge between nodes i and j accounts for the magnitude of their interaction. For example, in real social networks, it can measure the number of interactions between individuals, in transportation networks, the number of passengers between hubs (stations, airports, etc.) or in scientific networks, the number of collaborations between co-authors .

In this paper, a network is said to be spatial  when nodes correspond to geographic locations, while we use the term temporal  to refer to networks where edge values can change over time. Finally, we will use the term spatiotemporal network to refer to network that are simultaneously spatial and temporal, under the constraint that nodes cannot change position, neither appear nor disappear over time. The networks considered in this paper also fall into the rather generic definition of spatiotemporal networks. If the spatial qualifier means that the nodes of the networks represent fixed geographical locations, the temporal qualifier is more complex. Indeed, temporal networks are generally divided in the literature into two main classes, namely contact graphs or interval graphs . The former type refers to networks where edges represents instantaneous contacts between nodes (Figure 1A), while in the second type edges are active over time intervals instead of instants of time (Figure 1B). In this paper we propose a new definition of spatiotemporal networks where nodes correspond to disjoint regions of the space and edges are computed as a function of the flow of trajectories linking these nodes (Figure 1C), as it will be explained in the rest of the current section.

Types of temporal networks: The time of activation is indicated within the grey bar next to the edges (ranging between 0 and 8). For contact networks (A), edges activate only for one instant at the time and are marked with black vertical lines inside the grey bars. For example, in panel (A), the edge between nodes A and B is only active at instants 2, 4, 6, and 7. For interval networks (B), edges can be activated during an interval of time. For example, the edge between A and B in panel (B) is active during the time intervals (0, 3) and (6, 8). For contact networks (C), the edges are quantitatively more or less active across time, and the quantity of activity of any edge is described by a temporal function.

Flows and Trajectory Segments

We consider a function Φ : × × Ω Ω , usually called flow on the spatial domain Ω of d , satisfying the following properties: { Φ ( t , s , x ) = Φ ( t , t , Φ ( t , s , x ) ) Φ 1 ( t , s , . ) = Φ ( s , t , . ) , where s , t , t and x Ω . For fixed t and s, the flow Φ ( t , s , ) is a spatial transformation. For fixed x and varying s or t, the function gives a forward or backward trajectory of a particle over Ω between times t s = inf ( t , s ) and t s = sup ( t , s ) . If s t , y = Φ ( t , s , x ) gives the future location at time t of the particle presently located at x at time s. Contrarily, if s t , y = Φ ( t , s , x ) gives the location at past time t that was occupied by the particle located at x at present time s. Φ ( t , s , . ) is assumed to be a bijective mapping meaning that particles following distinct trajectories cannot be at the same location at the same time.

In general, a flow is defined with respect to a possibly time-dependent vector field F over × Ω , as the solution u : Ω of an ordinary differential equation (see e.g., Hamilton’s equations in classical mechanics) with specified initial condition at a specified time s: { d u d t ( t ) = F ( t , u ( t ) ) u ( s ) = x , where F is continuous and Lipschitzian over × Ω . In the setting introduced above, Φ ( t , s , x ) = u ( t ) with Φ ( s , s , x ) = u(s) = x. The solution u represents the trajectory of the particle located at x at time s. Varying the initial condition in System (2), i.e. varying s and x, leads to consider pieces of trajectories of all particles which dynamics are governed by the vector field F. In this article, the vector field F will not be made explicit, but we will consider samples of trajectory segments (defined below) for constructing trajectory-based networks.

Definition 2.1. The trajectory segment associated to the flow Φ over the time interval Δ t s = [ t s , t s ] , s , t , for a particle located at x Ω at time s is defined as follows: Γ ( t , s , x ) = { ( t , Φ ( t , s , x ) ) : t Δ t s } . If s < t (resp. s > t ), Γ ( t , s , x ) is a forward (resp. backward) trajectory segment. In this article, we are mainly interested in backward trajectories, but the framework presented here encompasses forward trajectories as well.

Example 2.1. The notions of flow and trajectory segment can be adapted to cope with air mass trajectories over the Earth surface. In this case, the spatial domain Ω representing the Earth surface is the sphere S 2 in 3 . If in addition, air masses are characterized by altitude and temperature evolving in space and time, then Ω = S 2 × + × , where + (resp. ) is the domain of the altitude (resp. temperature) coordinate.

Example 2.2. Animal movements and behaviour activities can also be represented with the notions of flows and trajectory segments, providing, for instance, the animal locations and the covariate value indicating whether animals are feeding or not. In this case, Ω = 2 × { 0,1 } , where 1 stands for ‘the animal is feeding’ and 0 otherwise. The use of a binary variable for describing the feeding activity may require the use of stochastic processes or generalized functions undergoing dynamic analog to the System (2) for constructing the flow if it is defined with respect to a vector field F.

Pointwise and Integrated Connectivities

Trajectory-based networks are grounded on the notion of connectivity used as a quantitative, directed measurement of edges between graph nodes. In this aim, we first define the pointwise connectivity as a measure (or submeasure), in the mathematical sense, of the connectivity between a subset A and a point x of Ω induced by the trajectory segments Γ ( t , s , x ) of a particle located at x at time s. Then, we use the pointwise connectivity to define the integrated connectivity between two subsets A and B of Ω over a temporal domain Δ of ( Δ can be the union of disjoint intervals).

Definition 2.2. Let x Ω and A ( Ω ) , where ( Ω ) is a σ-algebra of subsets of Ω . The pointwise connectivity associated to the flow Φ is defined as a real valued function Ψ on ( Ω ) × × × Ω , conveniently denoted by Ψ ( A | t , s , x ) , where A Ψ ( A | t , s , x ) is a measure or a submeasure on Ω for each t , s , x

Diverse types of the pointwise connectivity can be constructed, either using trajectory segments generated by Φ , or directly using Φ . Specific pointwise connectivities can include environmental covariates and even covariates associated to very the movements of particles. Below, we give several examples of such specifications. Some of these examples are graphically represented in Figure 2. Most examples are particularly relevant when Ω is a simple geographic domain and when Φ defines movements of individuals (e.g., air masses, animals or particles) within Ω .

Illustration of contact-based, duration-based and length-based pointwise connectivities: Ψ C , Ψ D and Ψ L are between the elliptic spatial domain A 2 and different spatial points x at time s = 1 , for Δ t s = [ 0,1 ] . The left curve on panel (A) never enters the domain A . The middle curve on panel (A) enters A (red part of the curve) over a relatively long duration [as shown by Panel (B)] but a short distance [as shown by Panel (B)]. The right curve on panel (A) enters A over a shorter duration but a longer distance. Thus, Ψ C ( A | t , s , x ) , Ψ D ( A | t , s , x ) and Ψ L ( A | t , s , x ) are zero for the left curve; Ψ C ( A | t , s , x ) = 1 for the two other curves; Ψ D ( A | t , s , x ) is larger for the middle curve than for the right one, whereas Ψ L ( A | t , s , x ) is larger for the right curve than for the middle one.

Example 2.3. The contact-based pointwise connectivity is defined by: Ψ C ( A | t , s , x ) = 1 { A t s Γ Γ ( t , s , x ) } , where A t s = Δ t s × A and 1 denotes the indicator function. Ψ C ( A | t , s , x ) indicates whether or not the particle whose movement in Ω is governed by Φ ( , s , x ) hit A during the time interval Δ t s . Note that A Ψ C ( A | t , s , x ) is only a submeasure on Ω since Ψ C ( A A | t , s , x ) Ψ C ( A | t , s , x ) + Ψ C ( A | t , s , x ) for disjoint sets A and A of ( Ω ) .

Remark 1. This example based on the simple contact between sets can be considered as too strict from a statistical and measure-theory perspective since the length or the duration of a contact may be null. Instead, a positive constraint on contact length for example can be used to define another version of the contact-based pointwise connectivity: Eq. 4 could then be replaced by Ψ C ˜ ( A | t , s , x ) = 1 { L ( A t s Γ ( t , s , x ) ) > 0 } , where L ( A t s Γ ( t , s , x ) ) denotes the length of the curve Γ within A. The length operator L will be made explicit in Example 2.5.

Example 2.4. The duration-based pointwise connectivity is defined by: Ψ D ( A | t , s , x ) = Δ t s 1 { Φ ( v , s , x ) A } d v , to measure the duration spent by the particle in A during Δ t s .

Example 2.5. The length-based pointwise connectivity is defined by: Ψ L ( A | t , s , x ) = Δ t s 1 { Φ ( v , s , x ) A } D v Φ ( v , s , x ) d v , where D v Φ ( v , s , x ) stands for the derivative with respect to the time variable v of the flow Φ (which is a vector function), and | | | | denotes the Euclidean norm. Ψ L ( A | t , s , x ) measures the distance travelled within A by the particle during Δ t s .

Example 2.6. The pointwise connectivity based on local volume is defined by: Ψ V ( A | t , s , x ) = Δ t s 1 ( Φ ( v , s , x ) A ) | det ( J Φ x ( v , s ) ) | d v where det ( J Φ x ( v , s ) ) is the determinant of the Jacobian matrix (with respect to x) of the spatial transformation Φ ( v , s , ) . The absolute value | det ( J Φ x ( v , s ) ) | of the Jacobian determinant at x gives the ratio by which the function Φ ( v , s , ) expands/shrinks infinitesimal volumes around location x into infinitesimal volumes around location Φ ( v , s , x ) . In other words, Ψ V ( A | t , s , x ) assesses how particle density increases or decreases from x to A along the time interval Δ t s . Intuitively, if n particles are initially in A and if the infinitesimal volume around any of these particles tends to shrink from A to x, then one expects a high concentration of particles in a fixed volume around x and, therefore, a high connectivity from A to x. Conversely, if the infinitesimal volume around a particle tends to expand from A to x, then one expects a lower concentration of particles in the same fixed volume around x and, therefore, a lower connectivity from A to x.

More sophisticated specifications of the pointwise connectivity can be proposed by incorporating spatio-temporal covariates in its formulation, like in the following examples.

Example 2.7. Let G denote a time-varying vector field defined over × Ω . The pointwise connectivity based on the external vector field G is defined by: Ψ G ( A | t , s , x ) = Δ t s 1 ( Φ ( v , s , x ) A ) | < D v Φ ( v , s , x ) , G ( v , Φ ( v , s , x ) ) > | d v where < D v Φ ( v , s , x ) , G ( v , Φ ( v , s , x ) ) > is the scalar product between the derivative with respect to the time variable v of the flow Φ (which is a vector function) and the vector field G. Larger the average collinearity in A between the instantaneous movement of the particle and the simultaneous direction of the vector field G, higher the connectivity between A and x. For instance, if Φ gives the movement of air masses and G provides the intensity and the direction of a continuous release of specific particles, then the connectivity will be high (resp. low) if the movement of the air in A and the movement of particles released in A are approximately collinear (resp. orthogonal).

Example 2.8. Let Z and Z ˜ be positive real valued spatio-temporal functions defined over × Ω . The pointwise connectivity based on Z and Z ˜ is defined by: Ψ Z , Z ˜ ( A | t , s , x ) = Z ( s , x ) Δ t s 1 ( Φ ( v , s , x ) A ) Z ˜ ( v , Φ ( v , s , x ) ) d v . This form of pointwise connectivity may represent, for example, (i) the negative effect of the altitude of the air mass when it is above A on the recruitment of specific particles from the ground, and (ii) the positive effect of rainfall at ( s , x ) on the deposition of particles from the air mass to the ground (see Figure 3). Thus, lower the average altitude of the air mass above A and more intense the rainfall at ( s , x ) , larger the contribution to the connectivity between A and x. This is expressed in Eq. 8 as follows: (i) Z ˜ is defined as the binary function indicating whether or not the altitude of the air mass (located at x at time s) is lower than a threshold h when it is located at Φ ( v , s , x ) at time v; (ii) Z is a function of the local rainfall intensity at ( s , x ) .

Illustration of pointwise connectivity based on a covariate measured along the trajectory: (see Example 2.8). In this illustration, the passage of the particle in the elliptic spatial domain A contributes to the pointwise connectivity [red part of the curve in Panel (A)] only when the particle is at an altitude lower than a threshold value [grey part of the curve in Panel (B)].

Remark 2. If in Example 2.8, the altitude of the air mass is incorporated as the third coordinate of Φ and A is a 3D-domain vertically limited by the threshold value h, then, Eq. 8 is simplify reduced to Eq. 5.

Remark 3. Example 2.8 could be generalized by considering a measure, say μ, over , to handle the potential contribution of discrete-time events to the pointwise connectivity: Ψ Z , μ ( A | t , s , x ) = Z ( s , x ) Δ t s 1 ( Φ ( v , s , x ) A ) Z ˜ ( v , Φ ( v , s , x ) ) d μ ( v ) . Remark 4. In the same vein, Example 2.8 can also be modified by adding within the integral the term D v Φ ( v , s , x ) arising in Eq. 6 to account for a supplementary effect of the distance travelled within A on the pointwise connectivity.

Each pointwise connectivity defined above can be used for defining the integrated connectivity, which measures the quantitative directional link between two subsets A and B of ( Ω ) generated by trajectories of particles located in B at times belonging to the temporal domain T.

Definition 2.3. Let A and B be two sets of ( Ω ) and T a subset of the temporal domain . The δ-lag integrated connectivity linking B to A over T is defined by: Ψ ν , δ ( 2 ) ( B × A | T ) = T × B Ψ ( A | s + δ , s , x ) ν ( d s , d x ) , where δ and ν is a measure on × Ω .

Definition 2.3. encompasses connectivities generated by either forward or backward trajectories, depending on the sign of δ. The use of a unique duration | δ | could be relaxed to account for space-time heterogeneities in the duration of trajectories. It could even be infinite by introducing a measure over time like in Eq. 9.

The measure ν in Definition 2.3 can be continuous, discrete or hybrid over × Ω . Indeed, if particles of interest are air masses, then B can be considered as continuously filled in space and time. Conversely, if particles of interest are animals of a specific species, then animals occupy only punctual locations in B at each time and the measure x ν ( d s , d x ) , given s, is discrete in Ω , whereas the temporal component of ν is continuous. Another examples occurs when the time s corresponds to death times of animals, then ν is both discrete in space and time with a mass only at a countable collection of space-time points.

Trajectory–Based Network

Definition 2.4. A trajectory-based network generated by Ψ ν , δ ( 2 ) (given by Eq. 10) over the temporal domain T , is a graph whose nodes A i , i = { 1 , , I } , are disjoint sets of Ω in ( Ω ) and whose directed edges are weighted by integrated connectivities M i j = Ψ ν , δ ( 2 ) ( A i × A j | T ) , 1 i , j I and i j .

Definition 2.4. corresponds to a spatial trajectory-based network evaluated over the fixed temporal domain T. It can be extended in different ways to obtain spatiotemporal analogs. For example, if T 1 , , T K denote K disjoint but successive time intervals with equal lengths, then the sequence of trajectory-based networks generated by Ψ ν , δ ( 2 ) ( × | T k ) , k = 1 , , K , forms a spatiotemporal trajectory-based network that can be analyzed to assess how connectivities across space are changing with time. This is one of the issues considered in Section 4.2.

Estimation of Integrated Connectivities

In practice, the integral defining the integrated connectivities between subsets of Ω (Definition 2.3) cannot be analytically computed in general, but can be estimated from a sample of trajectories. For instance, the integrated connectivity Ψ ν , δ ( 2 ) ( B × A | T ) can be estimated by its empirical counterpart obtained by importance sampling, say Ψ ^ ν , δ ( 2 ) ( B × A | T ) : Ψ ^ ν , δ ( 2 ) ( B × A | T ) = | T | N | B | N k = 1 N l = 1 N Ψ ( A | s k + δ , s k , b l ) , where s 1 , , s N T and b 1 , , b N B are times and locations, respectively, randomly drawn under the measure ν restricted to T × B , | T | and | B | are the length and area of T and B, respectively, and Ψ ( A | s k + δ , s k , b l ) is the pointwise connectivity associated to the trajectory of the particle located at b l at time s k and observed over Δ s k + δ , s k = [ s k s k + δ , s k s k + δ ] .

If ν is constant, other classical numerical approaches can be applied to approximate the integral, such as an hybrid approach in which the mid-point rule is applied in time and a regular point process is used in space. In such a case, the integrated connectivity estimator is also given by Eq. 11.

Example 3.1. Using Eq. 11, the contact-based integrated connectivity corresponding to Example 2.3 is estimated by: Ψ ^ C , δ ( 2 ) ( B × A | T ) = | T | N | B | N k = 1 N l = 1 N 1 { A s k + δ , s k Γ ( s k + δ , s k , b l ) } , where A s k + δ , s k = Δ s k + δ , s k × A . Thus, Ψ ^ C , δ ( 2 ) ( B × A | T ) is simply the proportion of sampled trajectories intersecting A, multiplied by | T | | B | .

Example 3.2. Using Eq. 11, the duration-based integrated connectivity corresponding to Example 2.4 is estimated by the average duration of the intersections between the sampled trajectories and A, multiplied by | T | | B | .

Example 3.3 Using Eq. 11, the length-based integrated connectivity corresponding to Example 2.5 is estimated by the average length of the intersections between the sampled trajectories and A, multiplied by | T | | B | .

Applications

In this section, we applied our general framework to the flow of air mass movements. Indeed, these movements compiled over years were used to characterize climatic patterns  and to describe the transport of pollutants . We show now how to deploy our approach for constructing air-mass movement networks in two real geographical contexts, namely the coastline of the Mediterranean Sea and the French region of Provence-Alpes-Côte d’Azur. These two examples have been chosen in order to prove the flexibility of our approach to different situations and geographical scales.

Case Study Regions and Network Construction

The first study region corresponds to the coast of the Mediterranean Sea, ranging approximately 1,600 km from north to south and 3,860 km from east to west. The temperate climate of the chosen region is strongly influenced by the presence of the Mediterranean Sea, with mild winters, hot summers and relatively scarce precipitations events. The landscape is characterized by coastal vegetation, typically shrubs and pines, and densely populated areas with intensive crop production of wheat, barley, vegetables and fruits, especially olive, grapes and citrus. In this paper, we characterize recurrent movements of air masses through the Mediterranean region by defining a grid with mesh size 74 km covering the coastline from 5 up to 250 km inland from the coast, including the four largest islands (namely Sicily, Sardinia, Cyprus and Corsica). Thus, we divide the region into 604 cells, where the centroids of the cells will be used as arrival locations of air-mass trajectories and will correspond to the nodes of the constructed network.

The second study region corresponds to the French region of Provence-Alpes-Côte d’Azur (PACA, hereafter), located in the south-eastern part of France and characterized by a rather complex landscape formed by a densely-populated coastline, agricultural lands (high-value-crops with fruit and olive orchards, vineyards, vegetable cultivation and horticulture), and natural mostly-alpines areas. The choice of this particular region is justified in the context of a research project aimed at assessing the potential long-distance dissemination of phytopathogenic bacterial populations that are known to be transported by air currents. The bacteria of interest (e.g., Pseudomonas syringae) can be lifted in to the air from a source location and then be passively transported by air masses until they are deposited back to land onto a different, far away sink location. Since the life cycles of the considered species of bacteria are strongly linked to the water cycle , we naturally partitioned the study area in a way that fit this assumption. Hence, we considered the 294 watersheds of the PACA region to define the sites that will later constitute the nodes of the constructed network. Since watersheds have irregular shapes and varying sizes, we selected a certain number of arrival locations per watershed (between 1 and 10 and proportionally to the watershed area) in order to cover the watersheds consistently according to the relative importance of their size and estimate the integrated connectivities. In total, a set of 833 arrival locations for air-mass trajectories was generated.

Once the arrival points for the two study regions have been established, we turned to the computation of air-mass trajectories arriving at the prescribed locations using the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT ). The HYSPLIT model can be fed with meteorological data from the Global Data Assimilation System files with a 0.5-degree spatial resolution (GDAS) and was tuned by us to return 48 h backward air-mass trajectories arriving at the prescribed locations at an altitude of 500 m above mean sea level. A single trajectory consists of a vector containing the hourly positions (longitude, latitude and altitude) visited by the air mass before arriving at the specified location and time. Air-mass trajectories have been computed for every arrival location (604 for the Mediterranean region and 833 for the PACA region) and for every day between January 1, 2011 and December 31, 2017 (arrival hour is 12:00 GMT). The total number of computed trajectories is 1,543,220 for the Mediterranean region and 2,128,315 for the PACA region.

The final step for the construction of the networks is the estimation of the adjacency matrices of the networks, based on the methodology presented in the previous sections. To do that, for each pair of subsets of the spatial domain, we used the daily 48 h backward trajectories arriving at the locations sampled within the receptor subset, and computed the contact-based estimator (see Example 3.1). The subsets of the spatial domain are the watersheds for PACA and circular buffers of radius 20 km for the Mediterranean region, as in .

In this work we will consider networks corresponding to three temporal contexts: (i) the spatial networks obtained when T is the entire period 2011–2017, (ii) the yearly spatiotemporal networks formed by the seven spatial networks obtained when T 1 encompasses the year 2011, T 2 encompasses 2012 and so on, and (iii) monthly spatiotemporal networks formed by the twelve spatial networks obtained when T 1 represents every January from 2011 to 2017, T 2 every February from 2011 to 2017, and so on. In all these cases, we consider that the length of the time interval was one to easily compare the inferred networks (i.e., | T | = 1 in Eqs. 11, 12).

Network Analysis

The networks we constructed are directed and weighted by contact-based connectivities generated by air mass trajectories and estimated with Eq. 12. They are inherently complex by the sheer amount of spatial and temporal information that they encompass. Hence, there is no easy way of representing the results either graphically or numerically, without compromising the original complexity of the networks. While a comprehensive study of the spatiotemporal properties of these networks goes beyond the scope of the paper, we explore some generic properties by means of well-known global topological indices:

• Density (Dens), is computed as the ratio between the sum of all edge weights and the number of all possible edges  and it measures how dense is a network with respect to a fully-connected network having the same number of nodes.

• Transitivity (Trans) or clustering, is computed by averaging the weighted clustering coefficient proposed by  across all nodes. This index measures the local cohesiveness of a node i with all the triples of its connected neighbors by considering their relative weights with respect to the node strength (defined as the sum of the weights of all the edges pointing to or from a node).

• Strength correlation (SC) is measured as the correlation between the incoming and outgoing strengths, computed as the sum of the weights of the edges pointing to or from a given node, respectively. Networks with positive (resp. negative) strength correlation are known to foster (resp. hamper) epidemic spread .

Other measures that are usually adopted to characterize network topology are meant to measure the length or cost associated to the movement between nodes following the edges of the network. We notice that, under the current framework, the weight computed between two nodes is proportional to the number of air-mass trajectories that connect them. Hence, higher values of the edge weight are associated to a higher connectivity between nodes. This is nonetheless incompatible with existing network search algorithms used to identify the shortest path between nodes since they usually consider the weight of an edge as a kind of distance or cost, hence the higher the weight, the less likely the connection between the nodes (e.g., the Dijkstra’s algorithm for weighted directed networks; ). On the other hand, it suffices to transform our weights into effective distances d i j = 1 log ( w i j ) in order to obtain a representation of edge weights that is coherent with the distance interpretation of the search algorithms (), after having row standardized the adjacency matrices in order to ensure that w i j [ 0,1 ] , i , j . After this transformation, we then computed four more indices:

• the average shortest path between every pair of nodes, computed as the average shortest effective distance (ED);

• the average number of nodes in the shortest paths (ASP);

• the network diameter (Diam), computed as the maximum effective distance of a network, and the number of nodes that have to be crossed in this longest path;

• the small worldness (SW) property, computed as the ratio between the clustering and the average shortest path distance [33,34], that accounts for the facts that certain networks are highly clustered and have relatively short shortest paths, a condition that is known to favor rapid disease spread on the network.

Results

The two spatial trajectory-based networks representing the strength of tropospheric connections in the Mediterranean region and PACA during the entire period 2012 to 2017 are represented in Figure 4. It can be observed that the strongest connections tend to link nodes that are geographically close, but nonetheless moderate connections also exist between rather distant nodes (see also Supplementary Figure S1). The connectivity in PACA is mostly oriented from North-East to South-West, which corresponds to the direction of the prevailing wind in this region. For the Mediterranean basin, the orientation of the edges depends on the region and does not have a fixed direction (see Supplementary Figure S2). An interesting additional difference between the two networks is that the one for PACA has negative strength correlation (−0.85), meaning that nodes having a high incoming strength tend to have low outgoing degree, and vice versa. On the other hand, for the Mediterranean network, the value of the strength correlation is moderately positive (0.31), meaning that nodes having high incoming strength tend to have also high outgoing strength.

Networks weighted by contact-based connectivities generated by air mass trajectories: The connectivities are generated by air mass trajectories between (A) the 604 sampled circular areas within the Mediterranean basin and (B) the 294 watersheds of PACA. Edges with weights lower than 0.3 for (A) and 2 × 10 3 for (B) are not drawn. The cuts of the intervals in the two legends are chosen in such a way that each interval contains 20% of the observed data. The differences in the values taken by the connectivities in (A) and (B) are due to different measures of the area | B | in Eqs. 11, 12: | B | = 1 for each node in (A), whereas B is the actual area (expressed in km2) of each whatershed in (B).

Interestingly, we can observe that the indices provided in Tables 1,2 are overall more variable for the monthly spatiotemporal trajectory-based networks than for the yearly ones, indicating that the average air-mass connectivity is more variable within any given year than between different years. In order to confirm this observation, we performed a Ward-linkage hierarchical clustering method  based on the Euclidean distances of the standardized values of the 7 indices of Tables 1,2 for the monthly networks of the Mediterranean and PACA.

Network indices [Diameter (Diam), density (Dens), transitivity (Trans), average shortest path (ASP), effective distance (ED), small worldness (SW), strength correlation (SC)] calculated from the networks covering the Mediterranean region and estimated in three temporal contexts: the entire period 2012–2017, yearly time periods from 2012 to 2017 and monthly time periods.

Mediterranean region
Diam Dens (×10−3) Trans ASP ED SW SC
2012–2017 7 0.30 0.74 3.11 14.13 2.34 0.05
2012 8 11.0 0.78 3.66 14.06 2.35 −0.04
2013 8 11.4 0.77 3.50 13.91 2.35 0.04
2014 16 11.6 0.78 4.18 17.90 2.25 0.06
2015 13 11.1 0.78 3.80 15.57 2.34 0.08
2016 9 14.0 0.77 3.61 14.65 2.28 0.08
2017 9 11.0 0.77 3.51 13.99 2.37 0.02
January 11 0.31 0.74 3.74 12.44 5.05 −0.11
February 22 0.30 0.74 4.16 21.67 5.65 −0.12
March 12 0.30 0.74 4.14 15.34 5.60 −0.03
April 13 0.31 0.75 4.16 16.11 5.53 −0.01
May 11 0.32 0.75 4.13 14.71 5.50 −0.18
June 14 0.32 0.73 4.69 18.10 6.46 −0.10
July 15 0.31 0.71 4.54 16.80 6.36 −0.20
August 12 0.29 0.72 4.50 16.87 6.23 −0.18
September 11 0.29 0.73 4.50 16.72 6.18 −0.06
October 19 0.28 0.74 5.14 22.42 6.94 −0.15
November 18 0.27 0.74 4.98 21.58 6.78 −0.05
December 12 0.30 0.72 3.62 10.08 4.99 −0.13

Network indices [diameter (Diam), density (Dens), transitivity (Trans), average shortest path (ASP), effective distance (ED), small worldness (SW), strength correlation (SC)] calculated from the networks covering PACA and estimated in three temporal contexts: the entire period 2012–2017, yearly time periods from 2012 to 2017 and monthly time periods.

PACA
Diam Dens (×10−3) Trans ASP ED SW SC
2012–2017 5 2.51 0.99 2.47 4.78 0.11 0.22
2012 5 0.99 0.91 2.6 4.73 2.85 0.13
2013 5 1.01 0.92 2.6 4.57 2.83 −0.07
2014 8 1.00 0.92 2.86 4.58 3.12 0.004
2015 7 1.02 0.92 2.67 4.60 2.90 −0.01
2016 8 1.01 0.91 2.81 4.58 3.09 0.02
2017 5 1.01 0.92 2.54 4.66 2.76 0.11
January 19 2.35 0.77 4.09 5.90 5.30 −0.54
February 19 2.16 0.81 4.60 6.29 5.71 −0.50
March 27 2.30 0.78 6.05 5.13 7.75 −0.52
April 15 2.40 0.97 3.54 4.08 3.64 −0.59
May 22 2.67 0.87 5.66 4.62 6.49 −0.64
June 17 2.55 0.79 4.93 6.48 6.22 −0.61
July 18 2.63 0.80 4.82 6.53 6.06 −0.62
August 21 2.63 0.80 4.91 6.32 6.15 −0.62
September 18 2.64 0.74 4.54 5.05 6.10 −0.64
October 6 2.69 0.99 2.37 4.07 2.40 −0.64
November 8 2.33 0.92 2.79 4.58 2.94 −0.49
December 18 2.34 0.80 5.90 5.74 7.36 −0.56

Figure 5A leads to the identification of three distinct seasons: summer (June, July, August, September), spring and winter (January, February, March, April, May and December) and fall (October, November). The spatial networks derived from the clustering are shown in Figures 5B–D, which display clear differences in the connectivity patterns: during spring and winter seasons, consistent and long-distance connections are observable along the Italian peninsula, moving from North to South, with sporadic connections on both sides towards Croatia and the western islands of Sardinia and Corsica. In the rest of the basin, connections tend to be local and with low or moderate intensity. During the summer season, the longest and most intense connections are observed in the western part of the basin (from Southern France and the Spanish southern coast to the north-western coast Africa). The network during the fall season correspond to intermediate situation between the above-mentionned networks. In terms of network indices (Figure 6A), it is clear that the winter air-mass connectivity network is more efficient than the summer one: it has higher density, transitivity and strength correlation, meaning that nodes are more frequently and strongly connected, while the longest path between every two nodes is shorter (lower diameter) and the average shortest path between nodes is lower, indicating that it is easier to move between any two given nodes of the network. These observations are particularly interesting if we consider the dissemination of airborne pathogens, that will be facilitated by winter connectivity regimes (irrespective of pathogens preference in terms of seasons).

Clustering according to the indices calculated over the Mediterranean spatio-temporal network: (A): Dendrogram of the months obtained from a hierarchical cluster analysis of the Mediterranean spatio-temporal network based on the monthly dissimilarities of the indices presented in Table 1. (B), (C), and (D): Networks corresponding to the three identified clusters.

Boxplot for the computed indices over the Mediterranean and PACA regions: Boxplot for the different indices (Diameter, density, transitivity, average shortest path, small worldness, effective distance, strength correlation) obtained from (A) the two clusters identified for the Mediterranean region (see Figure 5) and (B) the three clusters for PACA (see Figure 7).

For the PACA region, the dendogram in Figure 7A allows us to identify three distinct clusters: winter (December, January, February, March), summer (May, June, July, August and September), and a set of transition months (April, October and November) separating the two above-mentionned seasons. Amongst the three clusters, the transition months stand out both in terms of network indices values (Figure 6B) and in terms of obtained spatial network (Figures 7B–D). In fact, it is characterized by the lowest diameter, average shortest path, small-worldness index and a strongly negative strength correlation, while it has the highest transitivity index and a rather high density. Compared to the two other clusters of networks, we observe that the networks of the transition months are more efficient in facilitating the flow between nodes and this can have interesting consequences when considering the possibility of airborne pathogen dispersion.

Clustering according to the indices calculated over the PACA spatio-temporal network: (A): Dendrogram of the months obtained from a hierarchical cluster analysis of the PACA spatio-temporal network based on the monthly dissimilarities of the indices presented in Table 2. (B), (C), and (D): Networks corresponding to the three identified clusters.

Discussion

We presented a framework for estimating and characterizing spatial and spatio-temporal networks generated by trajectory data. The development of this framework was motivated by the study of networks resulting from the movement of air masses sampled over long time periods and large spatial scales. Thus, in the application, we investigated the tropospheric connectivities across the Mediterranean basin and the French region PACA, and their variations through years and months. Our approach could be applied to diverse phenomena, from which trajectories can be observed. For instance, one could estimate networks generated by the movement of animals on the landscape scale based on animal trajectories observed with GPS devices . This would allow the characterization of connectivity between different landscape components. Trajectories of human movement, transportation of specific goods (such as plant material) or trajectories of knowledge in social communities (that cannot be exhaustively observed) could also be used to estimate networks in other applied settings.

In Section 2.3, we proposed diverse measures of connectivity with different underlying (physical or biological) interpretations. Thus, the analyst can adapt the connectivity measure to the mechanistic processes he investigates, while we only considered the contact-based connectivity in our applications. Comparisons of contact-based, length-based and duration-based connectivities, not shown in this manuscript, led to little variations for the two case studies considered in this article. However, the use of covariates such as local rainfall and air-mass altitude for defining connectivities, as proposed in Section 2.3, is expected to impact the inferred networks and deserves to be explored. This perspective would be particularly relevant in the context of aerobiology: e.g., the airborne transport of organic particles, such as bacteria and fungal spores, that can be influenced by rainfall favoring the deposition of these particles .

In statistics, we are not only interested in point estimation, but also in the assessment of estimation uncertainties. In this paper, we however, focused on connectivity estimation, even if quantifying the estimation variance could have been useful for more rigorously investigating temporal variation in connectivities. Formally, the connectivity measures that we defined are integrals. Hence, results on integral numerical approximations (e.g., midpoint, trapezoidal or Monte Carlo integration) can be exploited to assess errors or variances of the connectivity estimates . However, for this assessment, one should ideally consider dependencies between connectivity estimates for different pairs of nodes, which is not trivial. Further in-depth methodological developments are required to tackle this issue.

To more finely estimate connectivity, and its uncertainty, one could also take into account, if relevant, the uncertainty about the trajectories themselves. For example, when observed trajectories are smoothed versions of actual trajectories (as it is likely the case for air-mass trajectories calculated with HYSPLIT) or when the trajectories are partially observed and rather erratic, (i) a probabilistic model grounded on, for instance, a stochastic differential equation, could be used to reconstruct probable trajectories and (ii) the connectivity would be estimated from these reconstructed trajectories. Obviously, step (ii) should incorporate the uncertainty about the trajectory reconstruction impacted by an eventual preliminary step consisting in estimating the parameters of the above-mentioned probabilistic model.

Concerning the application treated in this article, we observed distinct seasonal (intra-annual) patterns in the temporal variation of the networks covering the Mediterranean coastline and PACA, whereas no evidence of inter-annual variation has been observed. In the case of the Mediterranean coastline, the networks corresponding to the three clusters shown in Figures 5B–D exhibit clearly distinct spatial patterns and, by looking at the variability of network indices in Figure 6A, we can observe that during winter the flow of air is more efficient in connecting distant parts of the region. In the case of PACA, we identified a third cluster that corresponds to transition months separating the spring-winter and summer seasons. Interestingly, these transition months show the highest connectivity. In the context of aerobiology, and more specifically regarding the diffusion of airborne plant pathogens, it is particularly important to identify the seasons where the spread of pathogen is more probable and to be able to represent the spatial connectivity during these periods.

In the long-term context of our applied research projects connected to aerobiology, the construction and exploration of networks generated by air-mass movements are a way to unravel epidemiological dynamics (and the resulting genetic patterns) of microbial pathogens disseminated at long distance via air movements in the troposphere (see ; for a proof of concept). Indeed, even if the pathogen is not explicitly taken into account by the framework proposed in this article, the description of connectivities that it offers provides us a proxy of airborne pathogen movements over long temporal terms and large spatial scales. This proxy is a mean to understand pathogen transportation and to anticipate its long distance dissemination. Specifically, network indices such as those calculated in this article can be associated with particular epidemiological properties such as the probability of long-distance transport of pathogens . For instance, for plant pathogens, recent studies  showed that airborne populations of bacteria and fungi are rather constant across the years, while higher diversity can be observed in different seasons. This statement resonates with our analyses where we observed clear seasonal signals in the estimated monthly spatiotemporal networks in Section 4.3 whereas the yearly signals were less obvious.

Finally, the networks estimated using our approach could be a basis for developing epidemiological models (explicitly handling the pathogen) incorporating long-distance dissemination conditional on recurrent air-mass movements. Such models could be exploited to set up surveillance strategies for early warning and epidemic anticipation in order to help reduce the impacts of airborne pathogens on human health, agricultural production and ecosystem functioning .

Data Availability Statement

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

Author Contributions

MC, DM, CEM, RS, SS contributed equally to this work. SS, RS and MC developed the theory; MC, DM, CEM and SS performed the applications. MC, DM and SS wrote the manuscript, RS and CEM reviewed it. All authors read and approved the final manuscript.

Funding

This research was funded by the SPREE project from the French National Research Agency (Grant No. ANR–17–CE32–0004–01) and the PHYTOSENTINEL project (Grant No. IB–2019–SPE).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors thank Loïc Houde for his technical assistance in the calculation of trajectories with HYSPLIT. This article has been released as a pre-print at arxiv, .

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2020.602621/full#supplementary-material.