Edited by: Adam James Carroll, The Australian National University, Australia

Reviewed by: Guillaume Georges Tcherkez, Université Paris-Sud, France; Lei Song, National Cancer Institute, USA

^{†}Present address: Bettina Länger, Siemens AG Austria, Vienna, Austria

Specialty section: This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Bioengineering and Biotechnology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Inferring dynamics of metabolic networks directly from metabolomics data provides a promising way to elucidate the underlying mechanisms of biological systems, as reported in our previous studies (Weckwerth, ^{T}^{1}

Understanding regulatory mechanisms of metabolic networks at the systems level is a demanding, yet essential task. Metabolomics is the study of all metabolites identified and quantified in a biological organism under a specified physiological state and provides a promising approach to potentially unravel the complex dynamics in metabolic systems by measuring many metabolites participating in particular biochemical processes and across many biological samples (Nicholson et al.,

As a more analytical approach, mathematical modeling represents metabolic networks as a set of ordinary differential equations (ODEs, Eq. _{1}, _{2}, … , _{n}_{1}, _{2}, … , _{n}

The Jacobian matrix _{i}_{j}

To obtain the Jacobian matrix, it is natural to build mathematical models (such as Eq.

Steuer et al. established a fundamental link between metabolic covariance data ^{T}

For a system with ^{⋆}(^{2} variables in the non-symmetric

We can circumvent this problem by introducing the stoichiometric matrix (STOI) of a metabolic network, which is typically very sparse (Weckwerth,

Overdetermined systems have best approximation solutions. To make it clearer to understand, with simple matrix operations, Eq. ^{2}^{2} matrix derived from ^{2}-^{2}-^{2}
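The rewriting of the matrix equation into an overdetermined linear system can be sketched as follows. This is a minimal illustration, not the COVAIN implementation: it assumes the Lyapunov-type relation J C + C J^T = -2 D, assumes the zero pattern of J is known (here passed as a hypothetical boolean `mask`, standing in for the information supplied by the stoichiometric matrix), and uses an invented three-metabolite toy Jacobian. The names `lyapunov_to_linear_system`, `J_true` and `mask` are illustrative only.

```python
import numpy as np

def lyapunov_to_linear_system(C, D, mask):
    """Rewrite J C + C J^T = -2 D as A x = b.

    x collects only the entries of J that `mask` allows to be non-zero
    (row-major order); all other entries of J are fixed to 0.
    """
    n = C.shape[0]
    unknowns = [(i, j) for i in range(n) for j in range(n) if mask[i, j]]
    A = np.zeros((n * n, len(unknowns)))
    for col, (i, j) in enumerate(unknowns):
        E = np.zeros((n, n))
        E[i, j] = 1.0                          # J with a single unit entry
        A[:, col] = (E @ C + C @ E.T).ravel()  # its contribution to J C + C J^T
    b = (-2.0 * D).ravel()
    return A, b, unknowns

# Toy values for illustration only (not taken from any of the models).
J_true = np.array([[-2.0, 1.0, 0.0],
                   [0.0, -3.0, 1.0],
                   [0.0, 0.0, -4.0]])
D = np.diag([1.0, 0.5, 2.0])
n = J_true.shape[0]

# Covariance consistent with J_true and D: the same equation solved for C.
I = np.eye(n)
C = np.linalg.solve(np.kron(J_true, I) + np.kron(I, J_true),
                    (-2.0 * D).ravel()).reshape(n, n)

mask = J_true != 0                       # sparsity pattern, assumed known
A, b, unknowns = lyapunov_to_linear_system(C, D, mask)
x, *_ = np.linalg.lstsq(A, b, rcond=None)

J_est = np.zeros((n, n))
for value, (i, j) in zip(x, unknowns):
    J_est[i, j] = value

print(A.shape)  # (9, 5): nine equations, five unknown Jacobian entries
```

Because C is symmetric, only n(n+1)/2 of the n^2 rows are independent, so the sparsity of J is what makes the system solvable in this toy case.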

The most popular method is ordinary least squares (OLS). It minimizes the squared residual error of

The solution ^{T}
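As a hedged illustration of OLS, the sketch below solves a small overdetermined system through the normal equations and compares the result with NumPy's least-squares routine. The generic names `A`, `b` and `x` and the numerical values are assumptions made for the example; they are not data from this study.

```python
import numpy as np

def ols_solve(A, b):
    """Ordinary least squares via the normal equations: x = (A^T A)^{-1} A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

# Four equations, two unknowns (illustrative values).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 2.9, 5.2, 6.8])

x_ols = ols_solve(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)  # SVD-based reference solution
residual = b - A @ x_ols                        # orthogonal to the columns of A

print(x_ols, x_ref)
```

Forming A^T A explicitly squares the condition number, which is one reason the normal equations become unreliable for ill-conditioned problems.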

However, in some cases, when A^{T}A is ill-conditioned, (A^{T}A)^{−1} cannot be stably obtained, resulting in inaccurate solutions. In the singular value decomposition of A, the singular values satisfy σ_{1} ≥ σ_{2} ≥ … ≥ σ_{n}, and σ_{i}^{2} are the eigenvalues of the matrix A^{T}A.

The metric for ill-posed problems, the condition number of A, is defined as σ_{1}/σ_{n}.
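The condition number can be computed directly from the singular values. The following sketch assumes the usual definition as the ratio of the largest to the smallest singular value; the two matrices are invented examples of a well-conditioned and a nearly rank-deficient system.

```python
import numpy as np

def condition_number(A):
    """Ratio of the largest to the smallest singular value of A."""
    s = np.linalg.svd(A, compute_uv=False)  # returned in descending order
    return s[0] / s[-1]

# Well-conditioned example: singular values are 2 and 1.
A_good = np.array([[2.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])

# Nearly collinear columns: the smallest singular value is about 1e-6.
A_bad = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-6],
                  [1.0, 1.0 - 1e-6]])

# The squared singular values are the eigenvalues of A^T A.
s_good = np.linalg.svd(A_good, compute_uv=False)
eig_good = np.sort(np.linalg.eigvalsh(A_good.T @ A_good))[::-1]

print(condition_number(A_good), condition_number(A_bad))
```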

One method to alleviate ill-posed problems is to truncate
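A minimal sketch of truncated SVD is given below, assuming a generic system A x = b and a user-chosen truncation level `k` (how k is selected is not addressed here). The matrix, the noise vector and the function name `tsvd_solve` are illustrative assumptions.

```python
import numpy as np

def tsvd_solve(A, b, k):
    """Least-squares solution built from the k largest singular values of A only."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :k].T @ b) / s[:k]   # components along the retained directions
    return Vt[:k].T @ coeffs

# Ill-conditioned toy system: the two columns are almost identical.
A = np.array([[1.0, 1.0],
              [1.0, 1.001],
              [1.0, 0.999]])
x_true = np.array([1.0, 1.0])
b = A @ x_true + np.array([0.01, -0.01, 0.005])  # small measurement noise

x_full = tsvd_solve(A, b, k=2)    # no truncation: same as ordinary least squares
x_trunc = tsvd_solve(A, b, k=1)   # smallest singular value discarded

print(x_full, x_trunc)
```

In this example the untruncated solution is dominated by noise amplified through the small singular value, whereas the truncated solution stays close to `x_true`.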

A similar method is truncated total least squares (TTLS). Unlike the original truncated SVD form Eq. 6, it implements SVD on the combined matrix [
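For orientation, a textbook formulation of truncated total least squares is sketched below. It takes the SVD of the augmented matrix [A b], partitions the right singular vectors, and forms the solution from the discarded block. Whether this matches the implementation used in this study in every detail is an assumption; the function name `ttls_solve` and the example data are hypothetical.

```python
import numpy as np

def ttls_solve(A, b, k):
    """Truncated total least squares with truncation level k (1 <= k <= n).

    Assumes A has at least n + 1 rows, where n is the number of unknowns.
    With k = n this reduces to ordinary total least squares.
    """
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]), full_matrices=False)
    V = Vt.T
    V12 = V[:n, k:]   # rows for x, columns of the discarded singular vectors
    V22 = V[n, k:]    # last row (the component along b) of the same columns
    return -V12 @ V22 / (V22 @ V22)

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
x_true = np.array([1.0, 2.0])

x_exact = ttls_solve(A, A @ x_true, k=2)                # consistent data
b_noisy = A @ x_true + np.array([0.1, -0.1, 0.2, -0.2])
x_noisy = ttls_solve(A, b_noisy, k=2)                   # errors in A and b allowed

print(x_exact, x_noisy)
```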

Another method is called “regularization,” which adds a penalty form in the Eq. _{0} is the initial estimation of _{0} is unknown, it is just 0s. Γ is a function of
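The effect of the penalty term can be illustrated with a minimal Tikhonov sketch. It assumes the simplest choice Γ = λI with a hand-picked λ, which may differ from the choice of Γ made in this study, and it uses zeros for the initial estimate when none is given. The names `tikhonov_solve` and `lam` are illustrative.

```python
import numpy as np

def tikhonov_solve(A, b, lam, x0=None):
    """Minimise ||A x - b||^2 + lam^2 ||x - x0||^2 (Gamma = lam * I assumed)."""
    n = A.shape[1]
    if x0 is None:
        x0 = np.zeros(n)                  # no prior estimate: use zeros
    lhs = A.T @ A + lam ** 2 * np.eye(n)
    rhs = A.T @ b + lam ** 2 * x0
    return np.linalg.solve(lhs, rhs)

# Ill-conditioned toy system with noisy right-hand side.
A = np.array([[1.0, 1.0],
              [1.0, 1.001],
              [1.0, 0.999]])
x_true = np.array([1.0, 1.0])
b = A @ x_true + np.array([0.01, -0.01, 0.005])

x_ols = tikhonov_solve(A, b, lam=0.0)    # no penalty: ordinary least squares
x_reg = tikhonov_solve(A, b, lam=0.05)   # penalised solution

print(x_ols, x_reg)
```

Setting `lam` to zero recovers OLS, while a moderate value damps the contribution of the smallest singular value at the cost of a small bias.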

Regarding with _{0}| is the absolute least distance between _{0}, and Eq. _{0}, and Eq.

So far, we have introduced methods to solve the inverse Jacobian from metabolomics covariance data. In our previous work, we established a reverse Jacobian calculation pipeline and implemented OLS, TLS, and TIKH in the software COVAIN (Sun and Weckwerth,

We applied our approaches to a real metabolomics dataset (Nägele et al.,

However, the “no free lunch theorem in optimization” also holds for these inverse methods, since they involve an optimization process. Some methods may perform better than others under specific conditions and for some types of data; therefore, understanding the factors that affect the performance of the inverse methods is important. Additionally, two practical challenges relate to the covariance matrix and the fluctuation matrix. First, estimation of the covariance matrix is often problematic due to missing values and outliers in the measurements. Post-experimental data processing, for instance missing value imputation and outlier adjustment, further perturbs the original covariance matrix, i.e., the ideal “true” one with no missing values or outliers. Second, the fluctuation matrix can be derived from prior biological knowledge, for example, that fluctuation is associated with only a few particular metabolites or with all metabolites, but such information may not accurately reflect the “true” fluctuation in biological organisms. Therefore, in both cases, it is reasonable to check how such uncertainties affect the reverse Jacobian.

Since our aim is to study the effects of a large condition number, the imperfect covariance matrix and uncertain fluctuation matrix, we choose to use experimentally validated

BIOMD0000000023 (abbreviated as Sucrose BM23,

BIOMD0000000042 (Glycolysis BM42,

BIOMD0000000066 (Signaling BM66,

v_{1}, v_{2}, etc., are the reaction rates with simple mass action kinetics. Abbreviations: Pext, “external” phosphate in the chloroplast; TPext, “external” triose phosphate in the chloroplast; P, phosphate; TP, triose phosphate; F6P, fructose-6-phosphate; G6P, glucose-6-phosphate; UDP, uridine diphosphoglucose – glucose; SP, sucrose phosphate; SUC, sucrose; GLC, glucose; FRC, fructose.

The detailed information of these three models including original publications, kinetic equations, and parameters can be accessed from the BioModels database (Le Novère et al.,

The overall workflow is as follows. We first obtained the

To obtain the metabolomics covariance data, first, we converted the ODEs of above models to SDEs by adding Gaussian white noise to the right side of Eq. _{0} in the control condition as a diagonal matrix (diagonal entries are non-zero and all off-diagonal entries are 0s which means there are no cross-talks between metabolites). Third, we iteratively simulated the SDEs with the predefined _{0} for _{0} and Jacobian _{0}^{−1} s^{−1} or mmol mL^{−1} s^{−1}, meaning the concentration change per second. After partial derivation on the concentration variables ^{−1}, that is, the inverse of time. The covariance matrix ^{−1})^{2} or (mmol mL^{−1})^{2}.
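The generation of covariance data from a stochastic simulation can be sketched on a toy system. The example below is not the simulation protocol of this study: it assumes a linearized two-variable system dx = J x dt + sqrt(2D) dW around its steady state, a diagonal fluctuation matrix, a plain Euler-Maruyama integrator, and invented values for `J`, `D`, the step size and the number of repeats. The sample covariance is then compared with the solution of J C + C J^T = -2 D.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Jacobian and diagonal fluctuation matrix (illustrative values).
J = np.array([[-2.0, 1.0],
              [0.0, -3.0]])
D = np.diag([1.0, 0.5])
n = J.shape[0]

dt, n_steps, n_repeats = 0.0025, 2000, 10000
noise_scale = np.sqrt(2.0 * np.diag(D) * dt)   # per-step noise standard deviation

# Each row is one independent repeat; x holds deviations from the steady state.
x = np.zeros((n_repeats, n))
for _ in range(n_steps):
    x = x + (x @ J.T) * dt + noise_scale * rng.standard_normal((n_repeats, n))

C_sim = np.cov(x, rowvar=False)                # covariance across repeats

# Reference covariance from J C + C J^T = -2 D (row-major vectorization).
I = np.eye(n)
C_theory = np.linalg.solve(np.kron(J, I) + np.kron(I, J),
                           (-2.0 * D).ravel()).reshape(n, n)

print(np.round(C_sim, 3))
print(np.round(C_theory, 3))
```

Reducing `n_repeats` makes `C_sim` a noisier estimate of `C_theory`, which is the kind of imperfect covariance estimation examined in the perturbation experiments.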

The perturbation on _{0} was obtained by reducing the repeat times to _{1}, _{2}, _{3}, etc., thus represent imperfect estimation of _{0}, based on the “Law of large numbers” theorem that the covariance estimated from a subset of data does not give the actual approximation of the covariance calculated from the original data. The perturbation magnitude δ_{0}, i.e.,

The perturbed _{0} was achieved by adding different levels of Gaussian white noise to all entries of _{0} as _{i}^{2}))_{0} where _{0}, _{i}_{0} and _{0}, 100 repeats were obtained.

In the inverse Jacobian calculation procedure, we use these perturbed covariance _{i}_{i}_{i}_{i}^{2} values of linear regression between _{0}_{i}^{2} for linear regression is that they often contain a constant offset from the origin point, and if that happens with the reverse Jacobian approach, it means that entries of _{0}_{i}_{0}_{i}_{0}_{i}_{0}_{i}^{2} is a good metric of the goodness of the reverse Jacobian (Figures S1–S4 in Supplementary Material).
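The accuracy metric can be sketched as the coefficient of determination of a straight-line fit between the vectorized true and estimated Jacobians. This is one plausible reading of the metric described above; the function name `r_squared` and the toy matrices are assumptions.

```python
import numpy as np

def r_squared(J_true, J_est):
    """R^2 of the linear regression of vec(J_est) on vec(J_true)."""
    x = np.asarray(J_true, dtype=float).ravel()
    y = np.asarray(J_est, dtype=float).ravel()
    slope, intercept = np.polyfit(x, y, 1)            # straight-line fit
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

J0 = np.array([[-2.0, 1.0, 0.0],
               [0.5, -3.0, 1.0],
               [0.0, 2.0, -4.0]])

rng = np.random.default_rng(1)
J_noisy = J0 + 0.5 * rng.standard_normal(J0.shape)   # imperfect estimate
J_shifted = 2.0 * J0 + 3.0                            # rescaled and offset copy

print(r_squared(J0, J0), r_squared(J0, J_noisy), r_squared(J0, J_shifted))
```

Because R^2 is insensitive to a constant offset or rescaling, `J_shifted` scores as well as a perfect estimate, so a scatter plot of the entries is a useful complement to the number itself.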

As explained in Section “_{A}_{A}_{A}_{A}

_{A}_{10} scale.

Without perturbation, i.e., δ = 0, the Sucrose PGM model has a low condition number (10^{3}–10^{4}), which may be a result of its simple mass action kinetics. Sucrose BM23, on the other hand, shows a surprisingly high condition number (over 10^{16}), which may result from its complex kinetics. In fact, this small model with only five metabolites has 11 reactions, including bireactant Michaelis–Menten kinetics and inhibition regulation, as well as 63 kinetic parameters. Higher model complexity may lead to increased fluctuation propagation and result in larger variance–covariance matrices. The other two models, Glycolysis BM42 and Signaling BM66, which contain more metabolites and reactions than the Sucrose PGM model and have simpler kinetics than the Sucrose BM23 model, have moderately high condition numbers (around 10^{5}–10^{6}).

When the perturbation level increases from 0, the condition number changes abruptly at around 30–60% perturbation amplitude. The threshold varies among the models: 50% for the Sucrose PGM model, 60% for Sucrose BM23, 30% for Glycolysis BM42, and 55% for Signaling BM66. Beyond this perturbation level, all models become ill-posed problems with very high condition numbers.

Under no or small covariance perturbations (δ^{2} > 0.9 for Sucrose PGM and Sucrose BM23 model. OLS and TSVD are exactly the same for models with small condition number including Sucrose PGM, BM42, and BM66 (Figures ^{2} around 0.3) but is still better or similar compared to other methods (Figures

^{2} values when regressed to the true Jacobian (vectorized, see

Total least squares (TLS) appears to perform better under large covariance perturbations. This is consistent with its principle (see

BM42 and BM66 models show a relatively low accuracy of reverse Jacobian even at small perturbations (Figures ^{2} calculation. To estimate such stiffness, we calculated the ratio between maximal and minimal absolute values of non-zero Jacobian entries, and found that these ratios for BM42 and BM66 are much bigger than in the other two models. The ratio is Sucrose PGM, 388; Sucrose BM23, 3192; Glycolysis BM42, 1.3e6; and Signaling BM66, 1.0e6.
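The ratio used above as a stiffness estimate is simple to compute. The sketch below follows the description in the text (largest over smallest absolute value among the non-zero Jacobian entries); the function name `entry_magnitude_ratio` and the toy matrix are assumptions.

```python
import numpy as np

def entry_magnitude_ratio(J):
    """Largest over smallest absolute value among the non-zero entries of J."""
    magnitudes = np.abs(J[J != 0])
    return magnitudes.max() / magnitudes.min()

# Toy Jacobian whose entries span more than two orders of magnitude.
J_toy = np.array([[-388.0, 2.0],
                  [0.0, -1.0]])

ratio = entry_magnitude_ratio(J_toy)
print(ratio)
```

A large ratio means that a few entries dominate the regression against the true Jacobian, so small entries contribute little to R^{2} even when they are poorly recovered.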

We investigated the effects of perturbations on the fluctuation matrix

We found that for small-to-medium fluctuation matrix perturbations (δ^{2} which are generally over 0.90 for all reverse calculation methods (Figure

^{2} when regressed to the true Jacobian (vectorized, see ^{2} over 100 iterations.

When the perturbation level increases to 100% and the fluctuation matrix becomes fully randomized, the accuracy of the reverse Jacobian drops significantly (Figure …; R^{2} centering around 0.3 and ranging from 0 to 0.8). TLS shows the largest drop, indicating that it is more sensitive to fluctuation perturbations. For the other methods (OLS, TIKH, and TSVD), although more than 75% of the R^{2} values are below 0.6, some are as high as 0.8. This indicates that a good reverse Jacobian might be achievable under some as yet unknown conditions without any knowledge of the fluctuation matrix. However, this needs to be further investigated.

Combining the previous results, we give a full map of the combined effect of perturbations on both covariance and fluctuation matrices with the Sucrose PGM model (Figure ^{2} ≥ 0.7) lies around 30%

^{2} when regressed to the true Jacobian (vectorized, see

For this specific Sucrose PGM model, owing to its low condition number, OLS and TSVD produce similar patterns with large high-accuracy borders (Figures

Understanding the regulatory mechanisms of metabolic networks is a challenging yet essential task in current biochemical studies. We previously established a reverse Jacobian reconstruction algorithm to infer the regulation of the metabolic network directly from the covariance data (Sun and Weckwerth,

We benchmarked these four inverse calculation methods under small-to-large perturbations on the covariance and fluctuation matrices. We found that the accuracy of reverse Jacobian is dependent on these factors: (1) the condition number of

Tested on the four models, TSVD achieved the highest reverse Jacobian accuracy. OLS performs well when both the condition number of

By systematically comparing inverse calculation methods on systems with inherent errors or uncertainties, our study contributes not only to solving the Jacobian from metabolomics covariance data, but also to solving ill-posed inverse problems, which are widely studied in many other sciences.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at

^{1}In control theory, the generic form of equation ^{T}