This article was submitted to Environmental Informatics and Remote Sensing, a section of the journal Frontiers in Earth Science

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

_{2} and 8.5% for CO_{2} when compared against independent membrane DO and laser spectrometer instruments. This represents a predictive accuracy of 92–95% for both gases. The most important factor in a skillful bias correction is evidently the measurement of the secondary environmental conditions that are likely to correlate with the instrument bias. These statistical learning methods are extremely flexible and permit the inclusion of a nearly unlimited number of correlates in finding the best bias correction solution.

The uncalibrated signal (

Experimental chemistry has been slow to consider bias and systematic error, in part because the end goal of many studies was the demonstration of a correlative relationship rather than a process model (

Within the geosciences, the problem of chronic undersampling in diffusive environments, such as air and water (

The approach of regular switching to a reference compound is a proven means to correct for drift in continuous instruments. However, the instrumental conditions that we confront in this study differ in two significant ways from the previously described continuous measurement methods. The first difference has to do with the magnitude of the bias, compared to the signal of interest. Previous underway studies have confronted bias corrections of a few to 10% of the overall instrumental signal, while the instrumental bias that we face can vary by 100% or more. The magnitude of this bias renders the true environmental signal unrecognizable until the correction has been applied. The second major difference is that previous studies have identified the most likely sources of bias, but they have not quantified those sources to implement the bias correction. When the instrumental bias masks the true environmental signal, the bias must be treated as a continuously varying function, and therefore, a simple linear correction to baseline drift is not adequate. This bias correction problem lends itself to time series and multivariate regression techniques, including partial least squares, ridge regression, generalized linear, and generalized additive models (

Multivariate time series predictions have undergone a period of rapid development and availability thanks to the popularity of another member of the statistical learning family, neural networks, which have proven facile at, e.g., image and speech recognition. Neural networks are also suited for time series applications including forecasting or prediction (

In this application, we apply and compare a generalized additive model (GAM) and an LSTM neural network model to observe their performance in baseline correction of mass spectrometer data. A schematic depiction of the bias correction workflow can be observed in _{i}) and the instrument signal (_{i} and to alter the functional form (e.g., linear, polynomial, and cubic spline) that is fit between s and each _{i}. The effect is to give the user greater control over the functional form and the partial influence of each correlate on the total solution.

A schematic depiction of the effect of environmental factors on the introduction of bias into

The signals of interest to this study are measurements of gases dissolved in water and seawater using field-portable quadrupole mass spectrometers (QMS). We present examples of the GAM and LSTM applied to data from a submersible wet inlet mass spectrometer (SWIMS) that was used to measure dissolved oxygen in the top 150 m of the Sargasso Sea and Gulf Stream, in the subtropical Atlantic Ocean. We present a second example of signals collected with a similar mass spectrometer aboard a ship that was used to measure dissolved carbon dioxide at the ocean surface, within the sea ice-covered Ross Sea, Antarctica.

Throughout this text, we make references to Python modules that were used to implement the individual solutions. The implementation of the GAM backfit algorithm, as well as example scripts for applying these methods to SWIMS data, can be found in the

The bias correction models were each applied to ocean measurements of gases dissolved in seawater. These measurements were made using a QMS. The QMS is an ideal tool for ocean measurements because it is compact, and it can scan over a large range of atomic masses. In this study, we refer to the mass-to-charge ratio (m/z), where m represents the atomic mass of the molecule of interest, and z represents the positive charge state. For example, water vapor is measured in the QMS at m/z = 18, and molecular oxygen (O_{2}) is measured at m/z = 32. In this study, z = 1 in every instance. The QMS can be connected to a variety of gas inlet configurations. Further detail on the principles of quadrupole mass spectrometry can be found in

The first ocean dataset was collected in July 2017 along a dynamic section of the subtropical Atlantic between 35° and 40° N latitude (

Maps showing the ocean regions where QMS data were collected as part of oceanographic surveys. Panel

Ocean properties during the SWIMS tow in the N. Atlantic, across the Gulf Stream and into coastal waters influenced by the Labrador Current. The track lines of the tow are shown in panel

This ocean section began in the North Atlantic subtropical gyre, a circulation feature that is known to be highly depleted of nutrients with low biomass (

The SWIMS was being used to measure oxygen, argon, carbon dioxide, nitrogen, and methane in the surface ocean. Each of these dissolved gases has significance for biology and geochemistry of the ocean. Our

The bias correction models were also tested on data from a shipboard QMS that continuously sampled dissolved gases in the Ross Sea sector of the Southern Ocean, south of 75°S. These measurements were collected between May 16 and June 4, 2017. The partial pressure of carbon dioxide (pCO_{2}) was measured by connecting the QMS directly to a turbulent air-water equilibrator of the type described by _{2} by infrared absorption spectroscopy (^{–5} torr.

Carbon dioxide was measured with the QMS by scanning at the atomic mass m/z = 44. The reconstruction of pCO_{2} was carried out with a daily 3-point linear calibration with reference gases of pCO_{2} = 0%, 0.4%, and 0.1%. These signals can be seen in the expanded scale on the right side of _{2} partial pressure (pCO_{2}) measured at atomic mass m/z = 44. Instead, the GAM and LSTM models were trained on relatively stable ion current signals measured during a four-day period between May 27 and June 1.
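A multi-point linear calibration of the kind mentioned above can be sketched as an ordinary least-squares line that maps ion current to the reference gas value. This is only an illustration: the function names are hypothetical, the ion currents are invented placeholder values, and only the three reference percentages are taken from the text.

```python
def fit_linear_calibration(ion_currents, reference_values):
    """Least-squares line mapping ion current to a reference gas value."""
    n = len(ion_currents)
    mean_x = sum(ion_currents) / n
    mean_y = sum(reference_values) / n
    sxx = sum((x - mean_x) ** 2 for x in ion_currents)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ion_currents, reference_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(ion_current, slope, intercept):
    """Convert a measured ion current to calibrated units."""
    return slope * ion_current + intercept


# Reference gas values in percent (from the text) and HYPOTHETICAL
# ion currents in arbitrary units, chosen only for illustration.
reference_pco2_percent = [0.0, 0.1, 0.4]
ion_current_m44 = [2.0e-12, 4.0e-12, 1.0e-11]

slope, intercept = fit_linear_calibration(ion_current_m44, reference_pco2_percent)
calibrated = apply_calibration(6.0e-12, slope, intercept)
```

With three points, the residuals of the fitted line also give a quick check on whether the instrument response is close to linear over the calibrated range.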

Time series of environmental correlates used to bias correct the pCO_{2} signal measured by shipboard QMS at m/z = 44 (panels

This late autumn period in the Southern Hemisphere was cold and windy with continual disaggregated ice formation in the surface ocean. The principal source of bias appeared from the thermal cycling in the room where the QMS and equilibrator were operating (

The SWIMS passes seawater directly over a gas-permeable silicone membrane under conditions that approach a constant flow rate while maintaining constant water temperature using a resistive heater and aluminum block (

The six environmental correlates that were measured by the SWIMS instrument or CTD to capture variations in the environment that are likely to influence the signal response (s) of the SWIMS for m/z = 32 and other dissolved gases. Panel

To discover the instrument response, it is necessary to remove

Therefore, the steps to obtain _{2}] using the ion current measured at m/z = 32. The raw ion current (_{2} dissolved in the water but also to other environmental correlates,

After bias removal, _{2}] = 0, the ion current does not reach zero because of electronic noise, and the potential for “virtual leaks” as gas is desorbed from the walls of the QMS under vacuum. In other words, _{2} = 0 in ultrapure N_{2} gas, as described in

The bias corrections that we evaluate here belong to a family of statistics called supervised learning. These corrections compare correlating inputs with corresponding outputs to develop a predictor that can be applied to any set of inputs. To develop the prediction, a sufficiently large dataset is divided into subsets—often referred to as “train” and “test” subsets (

Statistical learning models are exceedingly flexible and conform to almost any feature at any scale within a time series. This can result in “overfitting,” a condition where the learning algorithm attempts to reproduce small scale noise or other shapes in the data that do not improve the prediction or bias correction. Overfitting results because of the imperfect separation between the bias and the random error. This imperfect separation between

A GAM achieves smooth fitting by using the sum of fitting functions that individually represent the covariance between an individual input (_{i} _{i} _{i}) and the response (_{i}) data,_{j}) is flexible, although a typical choice is a natural cubic spline. Natural cubic splines are piecewise cubic polynomials joined at knots, with the second derivative equal to zero at the endpoints (the boundary knots). By specifying more knots, the splines can represent higher-frequency fluctuations. The fit between _{j}’s means that the influence of each _{j} on the global solution can be observed, plotted, and evaluated. As mentioned, this is one of the principal strengths of the GAM, and it permits a more interactive and nuanced approach to determining the significance of each input variable and the behavior of each _{j}.
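For orientation, the additive structure of a GAM can be written in standard textbook notation. The symbols here are assumptions: s_i is taken to be the i-th signal observation, x_{ij} the i-th observation of the j-th environmental correlate, f_j the smooth function fit to correlate j, and ε_i the residual error.

```latex
s_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \epsilon_i
```

Because each f_j enters the sum separately, the contribution of each correlate can be inspected on its own.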

We implemented the penalized least squares using the ridge regression algorithm in the Scikit-learn library with a specified value for penalization and normalization of all input variables:

>> model = Ridge(alpha =

The natural cubic spline matrix with k = 9 knots was implemented using the Patsy module.

>> from patsy import dmatrix
>> basis = dmatrix("cr(train, df=10) - 1", {"train": X[j]})

We incorporated this penalized regression into the global fit using the backfit algorithm (_{p}), or the difference between the signal response (_{j},

1. Standardize or remove the mean from s:

2. Set the initial spline functions to zero:

3. Use linear regression to fit f_{j} to e_{p}: basis = model.fit(basis,e_{p})

4. Estimate y from f_{j}:

5. Recompute e_{p}:

6. Repeat steps 3 through 5 until e_{p} stops changing.
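The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each f_j is a ridge-penalized straight line rather than a natural cubic spline, the partial residual e_p is taken to be the signal minus all fitted terms except f_j, and the function name `backfit`, its arguments, and the demonstration data are hypothetical.

```python
def backfit(X, s, lam=0.0, tol=1e-12, max_iter=500):
    """Backfit an additive model with one penalized linear term per correlate.

    X   : list of columns, one list of values per environmental correlate
    s   : list of signal values
    lam : ridge penalty applied to each term
    """
    n = len(s)
    s_mean = sum(s) / n
    s_c = [v - s_mean for v in s]              # step 1: remove the mean from s
    X_c = []
    for col in X:                              # center each correlate as well
        col_mean = sum(col) / n
        X_c.append([v - col_mean for v in col])

    coef = [0.0] * len(X_c)                    # step 2: every f_j starts at zero
    y = [0.0] * n
    prev_sse = float("inf")

    for _ in range(max_iter):
        for j, col in enumerate(X_c):
            # partial residual: signal minus every term except f_j
            e_p = [s_c[i] - y[i] + coef[j] * col[i] for i in range(n)]
            # step 3: penalized least-squares fit of f_j to e_p
            coef[j] = (sum(c * e for c, e in zip(col, e_p))
                       / (sum(c * c for c in col) + lam))
            # step 4: re-estimate y from all f_j
            y = [sum(coef[k] * X_c[k][i] for k in range(len(X_c)))
                 for i in range(n)]
        # steps 5 and 6: recompute the residual and stop once it settles
        sse = sum((s_c[i] - y[i]) ** 2 for i in range(n))
        if abs(prev_sse - sse) < tol:
            break
        prev_sse = sse

    return s_mean, coef, [s_mean + v for v in y]


# Hypothetical noise-free data: s = 3 + 2*x1 - x2
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
s_demo = [3.0 + 2.0 * a - b for a, b in zip(x1, x2)]
s_mean, coef, fitted = backfit([x1, x2], s_demo, lam=0.0)
```

Replacing the single slope per correlate with a spline basis and a ridge solver recovers the structure described in the text, while the outer loop stays the same.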

More complex examples, involving other link functions between _{j} and the imposition of different probability distributions on _{i} (e.g., Gamma, Poisson or exponential), are all treated in more detail in

To determine the optimal fit, we iteratively apply the backfit algorithm to the training data subset and then compute the generalized cross validation (GCV), as it varies with

The GCV score can be computed directly using

>> gcv = model.score(X_test, s_test)
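For comparison, the textbook form of the generalized cross validation score can be computed by hand. Note that in Scikit-learn the `score` method of a regressor returns the coefficient of determination (R²), so the sketch below is offered as an assumption about what a GCV calculation would look like, not as the authors' code. The function name and the `edf` argument (effective degrees of freedom, i.e., the trace of the smoother matrix) are hypothetical and must be supplied by the caller.

```python
def gcv_score(observed, predicted, edf):
    """Generalized cross validation: n * RSS / (n - edf)**2."""
    n = len(observed)
    if edf >= n:
        raise ValueError("effective degrees of freedom must be less than n")
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * rss / (n - edf) ** 2


# Hypothetical values for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
gcv = gcv_score(observed, predicted, edf=2)
```

A lower score is better; the denominator inflates the score as the model uses more effective degrees of freedom, which discourages overfitting.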

Because the components of the GAM model are separable, it is also possible to determine which environmental correlates contribute most to the best-fit solution. This avoids the inclusion of correlates that make no contribution or may even degrade the GAM solution. The Bayesian information criterion (BIC) considers the model fit quality but also penalizes for models of increasing complexity (

The normalized Bayesian information criterion (

Recurrent neural networks (RNN) can be used to interpret sequential data, like time series, where each data record may be related to the records that preceded it. The neural network uses functional dependencies along a network of nodes, and the influence of these dependencies is weighted based on their relative importance. The RNN keeps track of these network weights as a means to archive predictive information as memory (

Because there are so many types of problems that can be solved using neural networks, it is helpful to list out the characteristics of this particular time series solution because this affects the structure of the neural network (

Instrument bias correction can be thought of as time series prediction. Even though our approach is to use a multivariate set of inputs to help develop the bias prediction, the potential for long-term transients in the instrument signal encourages the interpretation of bias correction as a sequential and time-dependent statistical problem. Examples of instrumental memory include the silicone membrane stiffening (

We use the Keras Sequential() model. The 2D environmental array

After the

Keras allows a user to take control of when the RNN weights are updated; this is known as controlling the model state or “stateful = True.” By default, Keras updates the LSTM state after a “batch” is processed. A batch is a collection of sample sequences, where each sample sequence has
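Keras LSTM layers expect input with three dimensions: samples, time steps, and features. One common way to build such input from a 2D array of correlates is a sliding window, sketched below in plain Python. The window length, the alignment of each label with the last row of its window, and the function name are all assumptions made for illustration.

```python
def make_sequences(features, targets, timesteps):
    """Turn a 2D record-by-feature table into overlapping sample sequences.

    Returns (samples, labels), where samples[k] holds `timesteps`
    consecutive rows and labels[k] is the target aligned with the
    last row of that window.
    """
    samples, labels = [], []
    for end in range(timesteps, len(features) + 1):
        samples.append(features[end - timesteps:end])
        labels.append(targets[end - 1])
    return samples, labels


# Hypothetical table: 5 records of 2 environmental correlates
features = [[0.1, 10.0], [0.2, 11.0], [0.3, 12.0], [0.4, 13.0], [0.5, 14.0]]
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
samples, labels = make_sequences(features, targets, timesteps=3)
```

The nested list has the shape (samples, timesteps, features) = (3, 3, 2) and can be converted to an array before being passed to a model.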

During tuning and iteration of the GAM model, we used GCV(

To evaluate the overall fit quality, we measured the RMSE between the independent O_{2} and CO_{2} instruments (_{ind}), and the bias-corrected signal from the QMS and SWIMS instruments as defined by
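A minimal sketch of this comparison, assuming the usual root-mean-square definition and expressing the misfit as a percentage of the mean reference value, is given below. The function names and the sample values are hypothetical.

```python
import math


def rmse(reference, corrected):
    """Root mean squared error between two equal-length series."""
    if len(reference) != len(corrected):
        raise ValueError("series must have the same length")
    squared = sum((r - c) ** 2 for r, c in zip(reference, corrected))
    return math.sqrt(squared / len(reference))


def percent_misfit(rmse_value, mean_reference):
    """RMSE expressed as a percentage of the mean reference value."""
    return 100.0 * rmse_value / mean_reference


# Hypothetical concentrations for illustration only
independent = [200.0, 190.0, 210.0]
bias_corrected = [203.0, 186.0, 210.0]
error = rmse(independent, bias_corrected)
```

As a check on the arithmetic, an RMSE of 9.8 µM against a mean concentration of 196 µM corresponds to `percent_misfit(9.8, 196.0)`, or about 5.0%.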

The bias correction workflow is depicted in

Sequence showing the SWIMS tow and bias correction using the generalized additive model (GAM). Panel _{2} concentration units, alongside the Seabird DO sensor.

The ability to choose a functional form for each _{j} environmental correlate was an attractive feature of the GAM because early tests revealed that oxygen (m/z = 32) strongly correlated with water vapor (m/z = 18), and signal from the SWIMS showed m/z = 18 ion currents outside the range observed during

All the environmental correlates (_{2}] or m/z = 32. Although these parameterizations showed a stiffer, more proportional response to the large-scale variations in m/z = 18, the natural cubic spline ultimately produced the lowest-RMSE solution.

Having chosen a cubic spline functional form for _{i}, there remain only two additional parameters that can be used to tune the solution—the number of knots in each spline and the value of the penalty function, λ (^{–10} to λ = 10^{10} (^{–2} < λ < 10^{5}, with a minimum near λ = 10^{5}, so this value of the penalty was implemented in the solution.
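A sweep over the penalty on a logarithmic grid can be sketched as follows. This is an illustration only: it uses a one-parameter ridge fit through the origin and a held-out mean squared error as a stand-in for the score used in the study, and the function names and data are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge estimate of a single slope through the origin."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def sweep_penalty(x_train, y_train, x_test, y_test, exponents=range(-10, 11)):
    """Score each penalty 10**k on held-out data and return the best one."""
    results = []
    for k in exponents:
        lam = 10.0 ** k
        slope = ridge_slope(x_train, y_train, lam)
        mse = sum((yt - slope * xt) ** 2
                  for xt, yt in zip(x_test, y_test)) / len(x_test)
        results.append((lam, mse))
    best_lam, best_mse = min(results, key=lambda pair: pair[1])
    return best_lam, best_mse, results


# Hypothetical training and test data
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_test, y_test = [1.5, 2.5, 3.5], [3.0, 5.0, 7.0]
best_lam, best_mse, results = sweep_penalty(x_train, y_train, x_test, y_test)
```

Plotting the score against the logarithm of the penalty makes it easy to see whether the minimum is sharp or sits on a broad plateau.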

Test of the GAM solution using a range of values for the penalty term

As noted, the Keras LSTM algorithm requires iteration to choose appropriate values for the

Because the choice of batch_size, epoch number, and dropout regularization cannot be determined a priori, but has a preponderant influence on overfitting, we objectively determined the optimal values for these three hyperparameters using the GridSearchCV() algorithm from Scikit-learn, applied to the Keras model. The approach tries all combinations of the hyperparameters and measures the fit quality using the RMSE and a

Root mean squared deviation between the train and test subsets during successive training epochs. While the training RMSE continually decreases, suggesting improvement, the test RMSE begins to increase after 20 epochs, suggesting that the solution is being overfit.
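The behavior described in this caption suggests stopping at the epoch where the test RMSE is lowest. A minimal sketch of that selection, with a hypothetical `patience` parameter that ends the scan after several consecutive epochs without improvement, is shown below; Keras offers an `EarlyStopping` callback for the same purpose. The RMSE values are invented for illustration.

```python
def best_epoch(test_rmse, patience=3):
    """Return the 1-based epoch with the lowest test RMSE.

    The scan stops early once `patience` consecutive epochs
    fail to improve on the best value seen so far.
    """
    if not test_rmse:
        raise ValueError("test_rmse must not be empty")
    best_idx = 0
    waited = 0
    for idx in range(1, len(test_rmse)):
        if test_rmse[idx] < test_rmse[best_idx]:
            best_idx = idx
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx + 1


# Hypothetical test RMSE by epoch
curve = [5.0, 4.0, 3.0, 2.5, 2.6, 2.7, 2.8, 2.0]
stop_short = best_epoch(curve, patience=3)
stop_long = best_epoch(curve, patience=10)
```

A short patience stops at the first clear minimum, while a long patience scans further and can pick up a later improvement at the cost of more training time.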

Finally, the choice of

Normally, the procedure to evaluate a statistical learning algorithm involves validating the solution against the test data (

The final list of environmental correlates was determined using the

The SWIMS tow between 35° and 40° N recorded a total of N = 49,181 individual measurements of dissolved O_{2}. A contour plot of dissolved O_{2} reveals the tracer field in _{2}] in this section was 196 µM, suggesting a 5.7% deviation between the two instruments. Within the same section, the neural network LSTM bias correction yielded RMSE = 9.8 µM or 5.0% deviation overall. Both GAM and LSTM bias corrections tended to fit some regions better than others; however, the fit quality of the GAM and fit quality of the LSTM did not degrade in the same places, suggesting some differences in how the two models respond to the environmental correlates (

North Atlantic section, including Gulf Stream and Labrador waters showing temperature

It should be noted that we are focusing on interpretation of the relative RMSE between the GAM and LSTM solutions. The absolute value of the RMSE is less meaningful because the calibration intercept (

The bias corrections in the shipboard QMS were fit using training data over a four-day period of the surface ocean equilibrator time series from May 27 to June 1. The RMSE between the GAM solution and training data subset was 3.5%, and the LSTM misfit was 1.8%. Unlike the SWIMS tows, it was not possible to evaluate _{2} during the bias correction. However, the ambient changes in pCO_{2} should reflect the biology and chemistry which in turn are only partly dependent on the exogenous environmental correlates. The endogenous environmental correlates reflect instrument behavior, which should have zero correlation with environmental pCO_{2}. The environmental correlates used to develop the bias correction model included, 1) temperature of the lab where the QMS was installed, 2) the total gas pressure in the QMS measured as voltage, 3) the seawater flow rate through the turbulent equilibrator, 4) water vapor measured at m/z = 18, and 5) m/z = 15. Similar to the SWIMS tow, we found that three environmental correlates caused an increase (no decrease) in _{2} cell temperature, the water wall flow rate, and the second equilibrator temperature reading were eliminated from the bias correction solutions (

After bias correction, the raw ion current was calibrated to CO_{2} partial pressure, using the three-point calibration of reference standards that were measured daily. There are additional corrections to gas measurements that are made using a turbulent equilibrator, and these are described by _{2}, they would drop out of the comparison between GAM and LSTM bias corrections; so, these additional data corrections are not material to this evaluation.

In this case, the GAM model was better at removing the periodic oscillation in the QMS ion current at m/z = 44. However, a level of noise persists even after the bias correction, suggesting that the environmental correlates may be missing some components of the bias. In total, the 18-day time series contains 5043 unique measurements of pCO_{2} by infrared absorption spectroscopy and by QMS. The RMSE between the IR pCO_{2} and GAM-corrected pCO_{2} was 31.3 μatm; the average pCO_{2} was 411 μatm, revealing an overall misfit of 7.5% (_{2}. In this case, it appears the LSTM (not pictured) may have slightly overfit the training data, resulting in a degraded fit to the overall time series. Nevertheless, the difference in RMSE between GAM and LSTM was less than 1%, which suggests that both methods produce very similar overall bias correction outcomes.

Bias-corrected and calibrated pCO_{2} from shipboard QMS alongside measurements of pCO_{2} by infrared absorption spectroscopy (IR pCO_{2}) in the Ross Sea, 2017.

This study presents two models for instrument bias correction, a GAM and an LSTM neural network model. The two models represent philosophically different approaches to multivariate prediction; the GAM allows the user to investigate the intermediate model fit products and choose the functional form f() for optimal regression between the results and the individual environmental correlates in

The LSTM RNN model gives the user fewer intermediate diagnostics, which produces an initial lack of confidence in the robustness of the solution because it can be challenging to understand or visualize the nature of the solution. Nevertheless, there is an emerging recognition that, compared to the human brain, computers are much more capable instruments at assigning appropriate weights to an

The difference between GAM and LSTM RMSE was 1% or less for both ocean sections, suggesting that both models performed similarly well. The RMSE for both methods was better than 6% for O_{2} and less than 9% for CO_{2}, demonstrating a predictive accuracy of better than 91% for both dissolved gases. The quality of the bias removal solution was significantly more dependent on the availability of coincidently sampled environmental correlates as inputs. We further found that the

The overall performance of the GAM and LSTM models was highly comparable, making it difficult to declare a clear winner in this case. The primary advantage conferred by the GAM model is the ability to evaluate the fit to each individual correlate separately. This is a considerable advantage when it is necessary to better understand an instrument's behavior, and it might even lead to engineering solutions that eliminate the biggest source of bias. In comparison, the skill that an LSTM RNN brings to time series prediction can potentially serve to model longer-term transients in the signal, which could lead to a better bias model when few or no environmental correlates have been measured.

The entire workflow including code, writeup, and source data with BCO-DMO DOI can be found at

BL developed and tested the bias correction methods described here, and led the field data collection program. RT Short co-developed the SWIMS instrument and in-situ calibration system, and participated in the field data collection. ST participated in the field data collection.

This work was supported by a grant from the National Science Foundation, Award # 1429940.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This research was supported by an award from the National Science Foundation Chemical and Biological Oceanography Program #1429940. We thank two anonymous reviewers for the comments and suggestions that have improved this manuscript. The GAM backfit algorithm is available at

The Supplementary Material for this article can be found online at: