Tropical Cyclone Track Forecasting Using Fused Deep Learning From Aligned Reanalysis Data

Forecasting tropical cyclone trajectories is crucial for the protection of people and property. Although dynamical forecast models can provide high-precision short-term forecasts, they are computationally demanding, and current statistical forecasting models have much room for improvement given that the database of past hurricanes is constantly growing. Machine learning methods, which can capture non-linearities and complex relations, have only scarcely been tested for this application. We propose a neural network model fusing past trajectory data and reanalysis atmospheric images (wind and pressure 3D fields). We use a moving frame of reference that follows the storm center for the 24 h track forecast. The network is trained to estimate the longitude and latitude displacement of tropical cyclones and depressions from a large database covering both hemispheres (more than 3,000 storms since 1979, sampled at a 6 h frequency). The advantage of the fused network is demonstrated, and a comparison with current forecast models shows that deep learning methods could provide a valuable and complementary prediction. Moreover, our method can give a forecast for a new storm in a few seconds, which is an important asset for real-time forecasting compared to traditional forecasts.


Introduction
Cyclones, hurricanes and typhoons are words designating the same phenomenon: a rare and complex event characterized by strong winds surrounding a low pressure area. The ability to forecast their trajectory and intensity is crucial for the protection of people and property. However, their evolution depends on many factors at different scales, altitudes and times, which leads to modeling difficulties (Emanuel, 2003). As the dynamical models evolve, their forecast accuracy improves; however, historical tropical cyclone databases have scarcely been used by machine learning and deep learning methods to further improve forecast accuracy.

Existing Storm Forecasts Methods
Today, the forecasts (track and intensity) are provided by numerous guidance models. Dynamical models solve the physical equations governing motions in the atmosphere. They are influenced by their physical models (convective schemes such as Kain-Fritsch or Simplified Arakawa Schubert, cloud microphysics, land surface model, ocean model, sea/land ice model, planetary boundary layer scheme, surface layer scheme, longwave and shortwave radiation schemes, subgrid-scale diffusion) and by their data assimilation methods (such as 4D-VAR). They are computationally demanding, and in current practice older model runs are adjusted so that they can be considered early methods, i.e. available in real time.
Statistical models, in contrast, are based on historical relationships between storm behavior and various other parameters (DeMaria et al., 2005). Current forecasts produced by Regional Specialized Meteorological Centers, like the American Official NHC Forecast (OFCL), are driven by consensus or ensemble methods able to combine different dynamical models (up to 20 models for the Global Ensemble Forecast System).

Deep Learning and Convolutional Neural Networks (CNN)
A convolutional neural network (CNN) is a deep learning architecture widely adopted as a very effective model for analyzing images or image-like data for pattern recognition (Krizhevsky et al., 2012; Milletari et al., 2016). A CNN is structured in layers: an input layer connected to the data, an output layer connected to the quantities to estimate, and multiple hidden layers in between. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. The convolutional operations are inspired by the visual cortex, where each neuron only processes data from its receptive field. Fully connected (FC) layers, usually at the end of the network, connect every neuron in one layer to every neuron in another layer.

Machine Learning and Deep Learning in Forecasting Problems
Current statistical forecasting models still perform poorly with respect to dynamical models, even though the database of past tropical cyclones is constantly growing. Machine learning methods, which are able to capture non-linearities and complex relations, have only scarcely been tested for tropical cyclone tracking. Yet, they have recently shown their efficiency in a number of other forecasting tasks. In particular, convolutional neural networks (CNNs) have attracted attention as they are suited to large imaging (2D or 3D) data. In Xingjian et al. (2015), a convolutional LSTM model was used for precipitation forecasting. Another recent study predicts the evolution of sea surface temperature maps by combining CNNs with physical knowledge (de Bezenac et al., 2017). CNNs have also been used for the detection of extreme weather events like tropical cyclones from weather model variables such as integrated water vapor, as in Racah et al. (2017).
Only a few preliminary studies have tackled tropical cyclone track forecasting using machine learning. One of them uses reanalysis maps, among other inputs, in a hybrid CNN-LSTM network in order to learn the (x, y) track coordinates (Mudigonda et al., 2017). While these methods are usually not compared with existing forecast methods, some of them even seem to perform worse than a baseline of constant speed and direction (see Giffard-Roisin et al., 2018b).

Frame of Reference
When dealing with image-like data, these studies consider a fixed regional map for tracking storms, of size 160 x 80 deg (longitude/latitude) for Mudigonda et al. (2017) and of the size of the Korean peninsula area (around 30 x 30 deg) for Rüttgers et al. (2018). However, a fixed region for tropical cyclone forecasting has three major limitations. First, the tracked storm must stay in the region even though tracks often cross oceans (see Figure 1), forcing the use of a large region even if it leads to memory issues (Mudigonda et al., 2017). Second, learning local phenomena on a large and non-centered image can be difficult. Finally, it prevents information transfer between storms coming from different basins or regions, where ground truth data is scarce. In our recent work (Giffard-Roisin et al., 2018b), we showed the advantage of using a moving reference CNN model for forecasting tropical cyclone tracks 6 hours into the future. This gave roughly a 30 km mean error, whereas other learning methods gave more than 60 km (Moradi Kordmahalleh et al., 2016; Rüttgers et al., 2018) and a constant speed baseline gave a 46 km mean error. However, a 6h forecast is of little use for catastrophe planning, and it cannot be compared to existing forecast methods because the shortest standard forecast horizon is 24 hours.

Contributions
We propose to extend this previous work by using a moving frame of reference that follows the storm center for a 24h track forecasting task. We pose the tracking problem as the estimation of the displacement vector, d, between current and future locations. Moreover, we propose to use the reanalysis data as cropped images (25 x 25 degrees) centered on the storm location. That way, the computation time is reduced and we can infer information from storms coming from a large number of tropical cyclone basins in both hemispheres.
In particular, our database is made up of more than 3,000 storms since 1979, sampled at a 6 hour frequency (more than 90,000 time steps). We include past temporal information by adding the reanalysis maps from previous time steps. We propose a fusion convolutional neural network taking into account past trajectories and different fields from reanalysis images (wind fields and pressure), and we treat each time step of a storm as a training data point. This paper focuses on a 24h forecast as a proof of concept, and could easily be extended to longer forecast times.
We aim at building an end-to-end model using two types of data (track data and 3D reanalysis) as input. For each time step of each storm, we want to independently estimate its future displacement. After presenting the data, we will show how we designed CNNs to learn from the reanalysis and then improved the result by combining it with history tracks and other 0D features (such as longitude, latitude, and maximal sustained windspeed). Figure 2 summarizes the fusion pipeline that predicts the 24h storm displacement. Lastly, we will show the results on the test set and compare these with current forecast models.

2 Data Description

Storm Tracks
The raw storm track data used in this study is composed of more than 3,000 tropical and extra-tropical storm tracks since 1979, extracted from the NOAA database IBTrACS (Knapp et al., 2010), shown in Figure 1. The tracks were produced by multiple governmental agencies, depending on the basin. They are defined by the 6-hourly center locations (latitude and longitude), and the database also includes some associated descriptors such as the windspeed (see Section 2.3). It includes both hemispheres, and the number of records per storm varies from 2 to 120 time steps. In total, the database counts more than 90,000 time steps, and we used our method to predict the 24h track forecast for each single time step.

Reanalysis Data
The trajectory of a storm depends on large-scale atmospheric flows. We chose to extract analyzed atmospheric fields from reanalysis data, not the forecast fields. We used the ERA-Interim database (Dee et al., 2011), which is one of the reanalysis datasets covering the data-rich period since 1979. Reanalysis is a systematic approach to producing datasets for climate monitoring and research, covering the entire globe from the Earth's surface to well above the stratosphere and estimating hundreds of variables. ERA-Interim is a global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is produced in near real time. The spectral resolution is T255 (around 80 km), the time resolution is 6 hours, and there are 60 vertical levels up to 0.1 hPa (altitude around 64 km).

Feature Selection
In this work, we used storm track data and reanalysis outputs to forecast tropical cyclone tracks. We can classify them into four types of information:
• Past displacements (1D). We define a displacement as the values (δlong_Δt, δlat_Δt) between the locations of a storm's center, as recorded in the storm track data, at different times, the time difference Δt being a multiple of 6 hours. The historical displacements of a storm help predict its future displacement (δlong_24h, δlat_24h). We used the current displacement (i.e. between times t − 6h and t) and the past displacement (between t − 12h and t − 6h). These features are 1D in the sense that they are defined for each past time step (1D temporal data).
• Meta data (0D). We chose the following useful features extracted from the IBTrACS database: the current center-point latitude and longitude, the current windspeed at the center of the storm, the current distance to land, and the Jday predictor (a Gaussian function of the Julian day of storm init minus the peak day of the tropical cyclone season in the hemisphere; see DeMaria et al., 2005). We refer to such features as 0D because they are not defined on a spatial grid.
• Wind fields u and v (spatial fields, 3D). We applied a sparse feature selection technique (Automatic Relevance Determination, ARD, based on linear regression to the target displacement shift) over the 10 available reanalysis fields on pressure levels, which highlighted the usefulness of two reanalysis fields in particular: wind fields and the geopotential height. Wind fields are the direct observations of the atmospheric flows, so their importance is clear. In order to have a moving frame of reference, we extracted the wind fields in the neighborhood of the storm at every time step from the ERA-Interim reanalysis database (see Figure 3). Specifically, we extracted the u-wind and v-wind fields on a 25x25 degree grid centered on the current storm location, at three atmospheric pressure levels (700 hPa, 500 hPa, and 225 hPa).
• Geopotential height fields z (spatial fields, 3D). As previously mentioned, the ARD regression also found the geopotential height relevant for this task. Similar to the wind fields, we extracted the geopotential height (or iso-pressure altitude) fields in the neighborhood of the storm at every time step on a 25x25 degree grid centered on the current storm location, at three atmospheric pressure levels (700 hPa, 500 hPa, and 225 hPa).
In order to capture the dynamics, we extracted the wind fields and the geopotential height measured at times t and t − 6h at the same location. These fields are thus 3D (spatial) x 1D (temporal). We point out that we first used surface reanalysis data, including sea surface temperature, sea level pressure and 10 meter winds, but because they had no significant impact on the results, we concentrated our efforts on atmospheric wind and geopotential fields.
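The storm-centered extraction described above can be illustrated with a minimal sketch. It assumes a global field stored as rows (latitude) by columns (longitude) on a 1-degree grid, so that a 25x25 degree window is 25x25 grid points; the function name `crop_centered`, the grid layout, the longitude wrap-around and the clamping at the poles are illustrative assumptions, not details taken from the paper.

```python
def crop_centered(field, center_row, center_col, half_width=12):
    """Return a (2 * half_width + 1)-square window centered on the storm.

    field: list of rows (latitude) of lists (longitude) covering the globe.
    Columns wrap around in longitude; rows are clamped at the poles
    (a simplifying assumption for this sketch).
    """
    n_rows = len(field)
    n_cols = len(field[0])
    window = []
    for d_row in range(-half_width, half_width + 1):
        row = min(max(center_row + d_row, 0), n_rows - 1)
        window.append([
            field[row][(center_col + d_col) % n_cols]
            for d_col in range(-half_width, half_width + 1)
        ])
    return window


# Toy global field (hypothetical values): 181 latitudes x 360 longitudes.
toy_field = [[1000 * r + c for c in range(360)] for r in range(181)]
patch = crop_centered(toy_field, center_row=90, center_col=0)
```

Because the window follows the storm, the same code applies to any basin, which is what allows storms from both hemispheres to share one training set.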

Set Separation
The storms were randomly separated into three sets as follows: training (60%) / validation (20%) / testing (20%). Thus, the storms in the test set have never been seen before by the learning algorithm. Then, within each set, all time instants were treated independently. The training set was used for optimizing the parameters of the neural networks (back-propagation). The validation set was used to select the architecture of the network (Section 3). Finally, the test set was kept hidden and was only used to show the final prediction accuracy at test time (Section 4).
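A minimal sketch of such a storm-level split is given below. The key point it illustrates is that the random split is applied to storm identifiers, and only afterwards is each set expanded into independent time steps, so no storm contributes samples to two sets. The function names, the seed and the toy storm identifiers are hypothetical.

```python
import random


def split_storms(storm_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split storm identifiers into train / validation / test."""
    ids = sorted(storm_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    # The test set receives the remaining storms.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def time_step_samples(storm_ids, steps_per_storm):
    """Expand a set of storms into independent (storm, time step) samples."""
    return [(sid, step) for sid in storm_ids
            for step in range(steps_per_storm[sid])]


# Toy data (hypothetical): 100 storms with 2 to 6 recorded time steps each.
storm_ids = [f"storm_{i:03d}" for i in range(100)]
steps_per_storm = {sid: 2 + (i % 5) for i, sid in enumerate(storm_ids)}

train_ids, val_ids, test_ids = split_storms(storm_ids)
train_samples = time_step_samples(train_ids, steps_per_storm)
```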
3 Methodology: a Deep Fusion Model

Overview
Because of the differing nature of the data sources, it is not straightforward to mix all the data into a neural network (NN); different learning rates are needed.
We propose a new fusion NN architecture taking into account the four sources of information. An overview of the architecture we developed is shown in Figure 2.
We divided our fusion architecture into three branches: a Wind CNN, a Pressure CNN and a Past tracks + meta NN. The Wind CNN and Pressure CNN are 2D CNNs that take atmospheric fields (long, lat, stacked over height and time) as input, while the Past tracks + meta NN is a small neural network which takes 0D features as input (stacked over time). Each branch of the network makes its predictions independently. We train the parameters of each individual branch of the network for the same task, i.e. predicting the 24h track forecast. We then integrate the three networks into a fused network and fine-tune the parameters.
The different steps will be outlined in the following sections.

Convolutional Neural Network for Reanalysis
We propose two similar CNNs for the wind and the pressure fields. We separate them into two networks because the type of data is different and thus different learning rates were needed. We stacked the data over height (pressure level) and time, such that the inputs of the CNNs consist of multiple 2D (long, lat) frames or channels. The Pressure CNN has six input channels (each one of size 25x25), while the Wind CNN input consists of 12 channels (u and v are stacked). We used a typical CNN architecture, alternating convolutional layers (Conv layer) and max-pooling layers, with fully connected layers at the end (Simonyan and Zisserman, 2014). Following conventional wisdom in the computer vision literature, all hidden layers are equipped with the rectification (ReLU) non-linearity and batch normalization.
We evaluated the performance on 24-hour storm track prediction for the Wind CNN. The result of the architecture evaluation on the validation set is shown in Table 3. We give two scores: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), in kilometers. Increasing the model depth brings no clear improvement. Since adding more convolutional layers allows the network to learn features at more levels of abstraction, we chose the intermediate Network C.
We also evaluated whether adding more historical features from past time steps to the input data can improve performance. Beyond t and t − 6h, we did not observe any noticeable improvement from including more data from the same location at previous time steps. We thus only kept the times t and t − 6h.
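The channel counts quoted in this section follow from stacking three pressure levels and two time steps per variable. The short sketch below makes that arithmetic explicit; the channel ordering, the dictionary layout and the names `channel_keys` and `stack_channels` are assumptions made for illustration only.

```python
LEVELS_HPA = (700, 500, 225)   # pressure levels used for each field
TIME_OFFSETS_H = (0, -6)       # times t and t - 6h


def channel_keys(variables):
    """List the (variable, level, time offset) key of every input channel."""
    return [(var, level, offset)
            for var in variables
            for level in LEVELS_HPA
            for offset in TIME_OFFSETS_H]


def stack_channels(crops, variables):
    """Stack the 25x25 crops of the requested variables into one input."""
    return [crops[key] for key in channel_keys(variables)]


# Placeholder crops (all zeros) standing in for real reanalysis windows.
blank = [[0.0] * 25 for _ in range(25)]
crops = {key: blank for key in channel_keys(("u", "v", "z"))}

wind_input = stack_channels(crops, ("u", "v"))    # 2 x 3 x 2 = 12 channels
pressure_input = stack_channels(crops, ("z",))    # 1 x 3 x 2 = 6 channels
```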

Past tracks + meta Neural Network
Another important source of information is the previous displacements and the other IBTrACS features (see Section 2). They can be treated as a size-9 vector of 0D components. We designed a small NN of two small fully connected layers (the green branch in Figure 2) to learn the future track from the 0D features. We use two past displacements, from t − 12h to t − 6h and from t − 6h to t, because more past tracks did not improve the performance.
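One way to assemble such a size-9 vector is sketched below: two displacements (four values) plus latitude, longitude, windspeed, distance to land and the Jday predictor. The ordering of the components, the width of the Gaussian (`width_days`), the units and the handling of the date line are all assumptions of this sketch and are not specified in the text.

```python
import math


def lon_diff(lon_a, lon_b):
    """Signed longitude difference lon_a - lon_b, wrapped to [-180, 180)."""
    return (lon_a - lon_b + 180.0) % 360.0 - 180.0


def jday_predictor(julian_day, peak_day, width_days=25.0):
    """Gaussian function of (Julian day - peak day); the width is assumed."""
    return math.exp(-((julian_day - peak_day) / width_days) ** 2)


def meta_vector(track, windspeed, dist_to_land, julian_day, peak_day):
    """Build the 9 features from positions at t - 12h, t - 6h and t.

    track: [(lon, lat) at t - 12h, (lon, lat) at t - 6h, (lon, lat) at t]
    """
    (lon2, lat2), (lon1, lat1), (lon0, lat0) = track
    current = (lon_diff(lon0, lon1), lat0 - lat1)   # t - 6h  -> t
    past = (lon_diff(lon1, lon2), lat1 - lat2)      # t - 12h -> t - 6h
    return [*current, *past, lat0, lon0, windspeed, dist_to_land,
            jday_predictor(julian_day, peak_day)]


# Hypothetical storm positions and descriptors.
features = meta_vector(
    track=[(-60.0, 15.0), (-61.0, 15.5), (-62.5, 16.0)],
    windspeed=65.0, dist_to_land=420.0, julian_day=250, peak_day=250,
)
```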

Fused Neural Network for Wind, Altitude, and Tracks
Because of the differing natures of the wind fields, pressure fields and past track data, it is not straightforward to mix them as the input of a NN. Indeed, our preliminary experiments on training a network combining these three types of inputs simultaneously did not give satisfactory results. Instead, we first train the three individual branches of the network separately. We then concatenate their two last layers and add a layer at the end of the network (see Figure 2). The new concatenated layers consist of the same weights as before in each branch, plus new connections from each branch to the other ones, which we initialize to zero. That way, the function computed is (at the start) the same as previously. We then re-train the whole fused network by allowing every weight to be re-optimized. The number of fused layers (here two) was determined by comparing four different configurations on the validation set, and a different learning rate was tuned on the validation set for this final optimization.
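The zero initialization of the cross-branch connections can be checked on a toy example. The sketch below builds the weight matrix of one concatenated fully connected layer as a block-diagonal matrix (pre-trained blocks on the diagonal, zeros elsewhere) and verifies that its output equals the concatenation of the separate branch outputs. It covers a single linear layer with two hypothetical branches and omits the activation and the additional final layer; it is an illustration of the principle, not the authors' implementation.

```python
def dense(weights, bias, x):
    """Plain fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse_block_diagonal(branch_weights):
    """Concatenate branch layers; new cross-branch weights start at zero."""
    n_inputs = sum(len(w[0]) for w in branch_weights)
    fused = []
    col = 0
    for w in branch_weights:
        width = len(w[0])
        for row in w:
            fused.append([0.0] * col + list(row)
                         + [0.0] * (n_inputs - col - width))
        col += width
    return fused


# Two toy pre-trained branches (hypothetical weights and activations).
w_a, b_a, x_a = [[1.0, 2.0], [0.0, 1.0]], [0.5, 0.0], [1.0, 1.0]
w_b, b_b, x_b = [[3.0, -1.0, 0.5]], [1.0], [2.0, 0.0, 4.0]

separate = dense(w_a, b_a, x_a) + dense(w_b, b_b, x_b)
fused = dense(fuse_block_diagonal([w_a, w_b]), b_a + b_b, x_a + x_b)
```

Since `fused` equals `separate` at initialization, fine-tuning starts from the pre-trained solution and can only depart from it where the new connections help.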

Algorithmic Details
We trained our networks using the root mean square error (RMSE) in kilometers between the forecast and the true storm location at t + 24h as the loss function.
We added an L2 penalty on the weights of the model (coefficient = 0.01).
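The quantities involved in this loss can be sketched as follows. The text does not state how the distance in kilometers is obtained from longitude and latitude; the haversine great-circle formula and the Earth radius used here are assumptions, as are the function names. The MAE is included because it is the second score reported for the architecture comparison.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def rmse(errors_km):
    """Root mean square of a list of distance errors."""
    return math.sqrt(sum(e * e for e in errors_km) / len(errors_km))


def mae(errors_km):
    """Mean absolute distance error."""
    return sum(abs(e) for e in errors_km) / len(errors_km)


def l2_penalty(weights, coef=0.01):
    """L2 regularization term added to the loss."""
    return coef * sum(w * w for w in weights)
```

For a batch of forecasts, the training objective in this sketch would be `rmse(errors) + l2_penalty(weights)`, with `errors` the per-sample distances between forecast and true locations.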

Results on the Whole Dataset (all basins)
We have compared the fused network, fusing all three branches, with the three single branches of the network. Figure 4 shows the 24h forecast results on the test set, which comprises 14,256 time steps in total, in absolute distance error. We can see the improvement of the fused network (mean error: 130 km) with respect to the Wind CNN (mean: 148.9 km), the Pressure CNN (mean: 172.7 km) and the Past tracks + meta NN (mean: 186.6 km) alone. We can also see the importance of separately pre-training the three networks before the fusion, as it improves the mean result by 5 km. We have also calculated a persistence forecast baseline: a 24-hour prediction that is four times the storm's last displacement from t − 6h to t. The mean error of this baseline on the test set is 196 km, which is more than 60 km higher than that of our method.
Moreover, if we only test on tropical cyclone time steps, excluding depressions, which are storms of lower intensity, our mean prediction error drops from 130 km to 109.3 km. Figure 5(a) shows the global trend: tracks from more intense storms are predicted with a lower mean error than those from less intense storms. The mean error for tropical cyclones of categories 4 and 5 is below 90 km. Figure 5(b) shows the forecast errors with respect to the current distance to land. We can see that a small distance to land, 200 km or less, is one of the factors degrading the prediction quality. Lastly, Table 4 gives the results on the test set for the different regions or basins: the best results are in the North Atlantic, with a mean error of 130.2 km, or 26.5% of the mean 24h displacement distance. The largest error is found in the South Pacific basin, which is also the basin with the smallest number of samples.
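The persistence baseline used in this section (a 24-hour prediction equal to four times the last 6-hour displacement) can be sketched as below. Whether the displacement is extrapolated in degrees or in kilometers is not stated in the text; this sketch works in degrees and wraps longitudes at the date line, both of which are assumptions, as are the function names.

```python
def wrap_lon(lon):
    """Wrap a longitude in degrees to the interval [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def persistence_forecast(prev_lon, prev_lat, lon, lat, n_steps=4):
    """Extrapolate the last 6h displacement over n_steps 6h intervals."""
    d_lon = wrap_lon(lon - prev_lon)
    d_lat = lat - prev_lat
    return wrap_lon(lon + n_steps * d_lon), lat + n_steps * d_lat


# Hypothetical positions at t - 6h and t.
forecast_24h = persistence_forecast(-61.0, 15.5, -62.5, 16.0)
```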

Comparison with Statistical/Consensus Forecasts Methods
We also compared our fusion CNN model with two existing forecasting models: CLP5, a statistical model which is often used to benchmark other storm track forecasting methods, and OFCL, the National Hurricane Center official forecast (a consensus of dynamical models). We extracted the CLP5 prediction results for the years 1989-2016 in the Atlantic and Eastern Pacific basins. In Table 5, we compare our fused network with the statistical CLP5 model on the test tropical cyclone time instants at which both methods provided a forecast. This means we compared only when there is a one-to-one correspondence, which amounts to 4349 time steps from 258 storms. In both basins, our fused network performs better than the CLP5 model on average and in standard deviation. Moreover, the frequency of forecast errors larger than 200 km, or busts, is also lower for our method, especially in the Atlantic (10% compared to 18%). Such a comparison is not possible with the OFCL, as this model is modified every year and only forecasts of version N of the model are provided for year N. We do not know the performance of the recent OFCL models on previous years, and it would be unfair to them to compare with old results that were potentially obtained with earlier, less efficient models.
That is why we compared the yearly results of our fused network with the two models on the same subset of the test set. These results (mean and standard deviation) are shown in Figure 6. During the 2010s, the OFCL method improved and its mean errors per year were smaller than ours. We can also notice that none of the large error peaks (ATL: 1993, 2003, 2012; EPAC: 1993, 2009, 2013) involve our model, which seems to indicate that our method is robust. Finally, we qualitatively compared the predictions with both OFCL and CLP5 models for recent storms of the test set, such as Tropical Cyclone Odile in 2014 (Figure 8), Tropical Cyclone Hermine in 2016 (Figure 9), and Tropical Cyclone Blas in 2016 (Figure 10). The small bars connect each pair of predicted and ground truth locations after 24 hours. The longer the bar, the larger the error. Even though the official OFCL model has globally smaller forecast errors, at some time points our model outperforms the OFCL. It seems that our method still performs poorly over land (see Figure 9). A future improvement could be to add the sea/land map as an additional feature. Moreover, the three forecasts often have different directions. A neural network model can thus help current forecast modellers by providing a complementary prediction that could be integrated in a consensus method, as their mistakes are different.
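The paired comparison against CLP5 (mean, standard deviation and bust frequency restricted to time steps where both methods issued a forecast) can be sketched as follows. The dictionary layout keyed by (storm, time step), the use of the population standard deviation and the function name are assumptions of this sketch; the error values are made up for the example.

```python
import math


def paired_summary(errors_a, errors_b, bust_threshold_km=200.0):
    """Compare two methods on the time steps where both gave a forecast.

    errors_a, errors_b: dicts mapping (storm_id, time_step) -> error in km.
    Returns the number of common time steps and one summary per method.
    """
    common = sorted(set(errors_a) & set(errors_b))
    if not common:
        raise ValueError("the two methods share no forecast time step")

    def summarize(errors):
        values = [errors[key] for key in common]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        bust_rate = sum(v > bust_threshold_km for v in values) / len(values)
        return {"mean": mean, "std": std, "bust_rate": bust_rate}

    return len(common), summarize(errors_a), summarize(errors_b)


# Made-up errors (km) for two methods; only two time steps are shared.
ours = {("s1", 0): 100.0, ("s1", 1): 300.0, ("s2", 0): 50.0}
other = {("s1", 0): 150.0, ("s1", 1): 250.0}
n_common, stats_ours, stats_other = paired_summary(ours, other)
```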

Discussion
Our method only needs to be trained once, although this training can be improved with more data. After that, only a few seconds are needed to give a forecast for a new storm, because prediction or inference using such models is much faster than training them. This is a significant time improvement over dynamical models, whose bottleneck is the computing speed. However, one has to keep in mind that our method needs current and past reanalysis fields. While they are usually calculated quickly, within a few hours, this does increase the total forecast time accordingly.
We have shown a proof of concept for 24-hour forecasting, and Giffard-Roisin et al. (2018b) shows that the 6-hour results are also very satisfactory. Longer-term forecasts could be made using the same structure. We conjecture that for very long forecasts, images larger than 25x25 degrees might be needed.
Moreover, we worked here on trajectory prediction, yet this model can be easily modified by changing the last layer and be trained for another task, such as intensity prediction (see Giffard-Roisin et al. (2018a)).
Other useful features could be found by using different reanalysis fields.
Although our choice of wind and geopotential height fields was driven by an automated feature selection method, we did not test all the possible configurations at every pressure level. Potentially, a more refined selection could increase the overall performance. As an example, for intensity prediction, we think that surface fields such as sea surface temperature should be reconsidered. We could also represent the wind field by streamfunction and velocity potential as opposed to u- and v-wind components, which might help to have less correlated features. Moreover, while the machine learning algorithm could learn the differences in flow direction between North and South, a future improvement could be to flip the fields North-South and to change the sign of the v-wind component. The recent release of the new version of the ERA reanalysis, ERA 5, might also increase the accuracy. As Hodges et al. (2017) show, the mean offset in tropical cyclone center position in the ERA-Interim reanalysis product can be up to 1 degree for the period from 1979 to 2012, so moving to ERA 5 and using the GFDL Vortex Tracker (Marchok, 2002) could increase our performance. A comparison to other baseline forecasts, such as TVCN (Track Variable ConseNsus), would also be interesting. Finally, our method could easily be transferred to operational Numerical Weather Prediction data by filtering it to the same spatial resolution.
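The North-South flip suggested above as a future improvement could look like the following sketch, which mirrors storm-centered fields about the storm latitude and negates the meridional wind. It assumes each field is a list of rows ordered by latitude; the function name is hypothetical, and this is an untested idea from the discussion rather than part of the evaluated method.

```python
def flip_north_south(u, v, z):
    """Mirror storm-centered fields in latitude and negate the v-wind.

    u, v, z: lists of rows ordered by latitude (each row runs in longitude).
    Returns new lists; the inputs are left unchanged.
    """
    u_flipped = [list(row) for row in reversed(u)]
    v_flipped = [[-value for value in row] for row in reversed(v)]
    z_flipped = [list(row) for row in reversed(z)]
    return u_flipped, v_flipped, z_flipped


# Tiny 2x2 toy fields (hypothetical values).
u = [[1.0, 2.0], [3.0, 4.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
z = [[5.0, 6.0], [7.0, 8.0]]
u_f, v_f, z_f = flip_north_south(u, v, z)
```

Under the same reasoning, the latitude component of the target displacement would presumably also need its sign changed for mirrored samples, which is an assumption of this note rather than a statement from the text.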

Conclusion
We designed a neural network for 24h tropical cyclone track forecasting using a moving frame of reference, which makes use of a common dataset and a single trained NN for every tropical cyclone of both hemispheres. When a new tropical cyclone occurs, our network can give a forecast in only a few seconds. We demonstrated the benefit of coupling past displacements and aligned reanalysis images. Moreover, we compared with traditional forecasting methods and showed the improvement with respect to the statistical CLP5 model. This is only a proof of concept of deep learning for tropical cyclone forecasting, yet we think that an approach as different as machine learning and NNs can be very beneficial if integrated in a consensus method.

Figure 1: Database: more than 3,000 tropical/extra-tropical storm tracks since 1979. Dots = initial position, colors = maximal storm strength according to the Saffir-Simpson scale.

Figure 2: General architecture: the three types of data feed three neural networks trained separately. The final fused network is re-trained before predicting the 24h forecast displacement.

Figure 3: Global atmospheric grids centered on the storm location: wind fields (u and v) and geopotential height (z).
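The training was performed with the Adam optimizer, and each model converged within 200 epochs. Every evaluation was repeated three times and the scores were averaged.
[Figure 4 panel annotations, mean error in km: 186.6 (Past tracks + meta NN), 172.7 (Pressure CNN), 148.9 (Wind CNN), 135.8 (fused network without separate pre-training), 130.4 (fused network with pre-training)]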

Figure 4: Comparison between the three simple networks (the 0D Neural Network, the Pressure CNN and the Wind CNN), the fused network without separate pre-training (gray), and the fused network with pre-training (red, proposed method). 24h forecast results on the test set (storms coming from all oceanic basins), in distance between predicted and real locations.

Figure 5: 24h-forecast mean errors on the whole test set with respect to (a) the current Saffir-Simpson hurricane category (a higher category means a stronger hurricane, dep means tropical depression, storm means tropical storm); (b) its current distance to land.

Figure 6: Average of 24-hour storm track forecasting errors (km) and standard deviation on the test set (top figure for storms in the Atlantic, bottom figure for storms in the East Pacific) for our fused network forecasts (blue), the CLP5 model forecasts (green) and the official NHC forecasts (red), 1989-2016.

Figure 7: Number of storms and timesteps used to compare in the two basins (Atlantic and East Pacific) for every year, 1989-2016.

Figure 8: 24h forecast errors (4 time steps ahead) on Tropical Cyclone Odile in 2014. The bars connect each pair of predicted and ground truth locations. The longer the bar, the larger the error. At the beginning, the forecasts were not always available (a complete absence of an error bar should be interpreted as no forecast).

Figure 9: 24h forecast errors (4 time steps ahead) on Tropical Cyclone Hermine in 2016. The bars connect each pair of predicted and ground truth locations. The longer the bar, the larger the error. At the beginning and at the end of the track, the forecasts were not always available (a complete absence of an error bar should be interpreted as no forecast).

Figure 10: 24h forecast errors (4 time steps ahead) on Tropical Cyclone Blas in 2016. The bars connect each pair of predicted and ground truth locations. The longer the bar, the larger the error. At the beginning and at the end of the track, the forecasts were not always available (a complete absence of an error bar should be interpreted as no forecast).

All hidden layers are equipped with the rectification (ReLU) non-linearity and batch normalization. The different configurations that we evaluated for the Wind CNN and Pressure CNN are outlined in Table 1, one per column. All configurations follow the generic design described above.

Table 1: Different configurations of the Wind CNN tested. The depth of the configuration increases from left to right, as more layers are added. conv3-32 indicates a convolution of size 3x3 with 32 output features. FC means fully connected layer. maxpool indicates a 3x3 max-pooling layer. The ReLU activation and batch normalization layers (applied after each conv. or FC layer) are not shown in the table.

Table 2: Number of parameters (in millions) of the four network configurations tested in Table 1.

Table 3: Performance of candidate configurations (Wind CNN) on 24-hour storm track prediction, on the validation set using wind fields.

Table 4: 24h forecast results for the different regions (basins), on the test set. Mean error in km and relative mean error with respect to the mean 24h displacement distance.

Table 5: Mean and standard deviation of the 24h forecast errors for the Atlantic and Eastern Pacific basins on the subset of the test set where both predictions were available (total = 4349 time steps). Busts correspond to the ratio of track errors exceeding 200 km (and 250 km).