Edited by: Tao Huang, Shanghai Institutes for Biological Sciences (CAS), China

Reviewed by: Huanfei Ma, Soochow University, China; Ling-Yun Wu, Academy of Mathematics and Systems Science (CAS), China

This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The progression of complex diseases is generally divided into a normal state, a pre-disease state or tipping point, and a disease state. Developing an individual-specific method that can identify the pre-disease state just before a catastrophic deterioration is critical for patients with complex diseases. However, with only a single case sample, it is challenging to detect a pre-disease state, which differs little from a normal state in terms of phenotypes and gene expression. In this study, by regarding the tipping point as the end point of a stationary Markov process, we proposed a single-sample-based hidden Markov model (HMM) approach to explore the dynamical differences between the normal and pre-disease states, and thus to signal the upcoming critical transition immediately after a pre-disease state. Using this method, we identified the pre-disease state or tipping point in a numerical simulation and in two real datasets, stomach adenocarcinoma and influenza infection, which demonstrates the effectiveness of the method.

Considerable evidence suggests that during the progression of many complex diseases the deterioration is not necessarily smooth but abrupt (Litt et al.,

The outline for identifying the SSI score based on HMM. The differential network N_{t} was constructed by PCC.

Recently, the dynamical network biomarker (DNB) method was proposed to detect the pre-disease state (Chen et al.,

In this work, by exploring the differential information between the normal and pre-disease states, we proposed a single-sample-based hidden Markov model (HMM) to signal the tipping point, even when only one case sample was available. Specifically, the normal state was modeled as a stationary Markov process because of its highly stable dynamics, while the pre-disease state was viewed as a time-varying Markov process because of its dynamical instability. Taking multiple normal samples as the reference or background, a differential network, whose edges carried the differential information before and after combining a single sample with the references, was obtained specific to the single sample derived at a time point (

The theoretical basis of this study is the DNB theory, which provides the following generic properties when a dynamical system approaches a bifurcation point (Chen et al.,

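SD(z_{in}) increases sharply, where z_{in} represents the expression of any DNB member, and SD means the standard deviation.

PCC(z_{in,1}, z_{in,2}) increases sharply, where z_{in,1} and z_{in,2} represent the expressions of any two DNB members, and PCC means the Pearson correlation coefficient.

PCC(z_{in}, z_{out}) decreases sharply, where z_{in} and z_{out} represent the expressions of a DNB member and a non-DNB gene, respectively.

Neither SD(z_{out}) nor PCC(z_{out,1}, z_{out,2}) changes significantly, where z_{out}, z_{out,1}, and z_{out,2} represent expressions of non-DNB genes.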
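As an illustration of the two statistics used in these properties, the following minimal Python sketch computes SD and PCC on toy expression values. The gene names, the numbers, and the helper names `std_dev` and `pearson` are hypothetical and are not taken from the study; the sketch only shows how a rise in fluctuation and in correlation between two genes would appear numerically.

```python
import math


def std_dev(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # convention chosen here for constant input
    return cov / (sx * sy)


# Hypothetical expression values of two genes in two conditions.
normal = {"g1": [1.0, 1.1, 0.9, 1.0], "g2": [2.0, 2.1, 2.0, 1.9]}
near_tipping = {"g1": [1.0, 2.0, 0.2, 1.6], "g2": [2.0, 3.9, 0.5, 3.1]}

for label, data in (("normal", normal), ("near tipping point", near_tipping)):
    print(label,
          "SD(g1) =", round(std_dev(data["g1"]), 3),
          "PCC(g1, g2) =", round(pearson(data["g1"], data["g2"]), 3))
```

In this toy example both the SD of g1 and the PCC between g1 and g2 are larger in the second condition, which is the pattern the first two properties describe for DNB members.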

The detailed description and derivation of DNB can be seen in reference (Liu et al.,

A sketch of the single-sample-based HMM algorithm was provided in

A few samples that represent a relatively healthy condition were chosen as the reference or background. Generally, for individual-specific samples (e.g., the samples for each symptomatic subject in the influenza infection dataset), samples from a few initial time points of an individual (as shown in

First, we added each single case sample to the reference (

Second, based on the observation samples at each time point t, a differential network N_{t} was constructed from the difference in the corresponding Pearson correlation coefficient (PCC) between the reference and combined samples (

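where g_{i} and g_{j} represent gene expressions for any pair of genes. Then |ΔPCC(g_{i}, g_{j})| was employed to construct the differential network, i.e., when |ΔPCC(g_{i}, g_{j})| exceeds a threshold, there is an edge between g_{i} and g_{j}. Across time points, this yields a sequence of differential networks {N_{1}, N_{2}, …, N_{T}, …}.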
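The construction of a differential network from one case sample can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold value 0.1, the toy expression values, and the function names are hypothetical; only the idea of comparing PCC before and after adding the single sample to the reference comes from the text.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def differential_network(reference, case_sample, threshold):
    """Return {(gene_i, gene_j): delta_pcc} for pairs with |delta_pcc| > threshold.

    reference   : list of dicts mapping gene name -> expression (reference samples)
    case_sample : one dict mapping gene name -> expression (the single sample)
    threshold   : assumed cutoff on |delta_pcc|; its value is not given in the text
    """
    combined = reference + [case_sample]
    edges = {}
    for gi, gj in combinations(sorted(case_sample), 2):
        pcc_ref = pearson([s[gi] for s in reference], [s[gj] for s in reference])
        pcc_new = pearson([s[gi] for s in combined], [s[gj] for s in combined])
        delta = pcc_new - pcc_ref
        if abs(delta) > threshold:
            edges[(gi, gj)] = delta
    return edges


# Hypothetical reference samples and two hypothetical case samples.
reference = [
    {"g1": 1.0, "g2": 2.0, "g3": 5.0},
    {"g1": 2.0, "g2": 1.0, "g3": 3.0},
    {"g1": 3.0, "g2": 4.0, "g3": 4.0},
    {"g1": 4.0, "g2": 3.0, "g3": 1.0},
    {"g1": 5.0, "g2": 5.0, "g3": 2.0},
]
stable_case = {"g1": 3.0, "g2": 3.0, "g3": 3.0}
perturbed_case = {"g1": 10.0, "g2": 10.0, "g3": 3.0}

print(differential_network(reference, stable_case, threshold=0.1))
print(differential_network(reference, perturbed_case, threshold=0.1))
```

A case sample that resembles the reference leaves the correlations unchanged and produces no edges, whereas the perturbed sample shifts the correlations and produces differential edges.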

Third, suppose a time point t = T is a candidate tipping point. The observation sequence is then split into a training part O_{T−1} = {o_{1}, o_{2}, …, o_{T−1}} = {N_{1}, N_{2}, …, N_{T−1}}, and a testing part starting from O_{T} = {N_{T}}. Let {s_{1}, s_{2}, …, s_{t}} represent the state sequence up to time t, and let S_{0} and S_{1}, respectively, denote the normal state (S_{0}) and a possible pre-disease state (S_{1}), which are two unobserved (hidden) states. Then, based on the training samples O_{T−1} = {N_{1}, N_{2}, …, N_{T−1}}, an HMM

was trained by the Baum-Welch procedures (Bilmes,

with

with

Where #1(_{T−1}, _{q}. The initial probabilities are

with π_{i} = _{q − 1} = _{i}),

Based on the testing sample _{T−1} = {N_{T}} we tested if the candidate point

Given the HMM θ^{T−1}, the SSI score was calculated by a forward algorithm. According to above settings, the calculation of probability _{T} = {N_{T}} is added to the training set, and the algorithm continues with

The algorithm of the single-sample-based HMM. The above flowchart shows how the algorithm works based on a series of single case samples. Regarding a point

According to the DNB theory, a differential network constructed in the normal stage contains few differential edges because the system is highly stable during that stage. However, when the system approaches the critical transition point, many differential edges appear in the differential network because of the time-varying and fluctuating dynamics of the system. Specifically, the algorithm rests on generic properties 2 and 3 listed in section Theoretical Basis.
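For readers unfamiliar with the forward algorithm mentioned in the method description, the sketch below evaluates the probability of an observation sequence under a two-state discrete HMM. It is a generic textbook version under assumptions: the parameter values are invented, each differential network is assumed to have been encoded as a symbol (0 for few differential edges, 1 for many), and the SSI score itself is not reproduced here.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM.

    obs   : list of observation symbols (integers)
    start : start[i] is the probability that the first hidden state is i
    trans : trans[i][j] is the probability of moving from state i to state j
    emit  : emit[i][k] is the probability of emitting symbol k in state i
    """
    n_states = len(start)
    # Initialization: joint probability of the first symbol and each state.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    # Induction: propagate one step and weight by the emission probability.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    # Termination: sum over the final hidden state.
    return sum(alpha)


# Hypothetical parameters: state 0 = normal, state 1 = possible pre-disease.
START = [0.9, 0.1]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.8, 0.2], [0.3, 0.7]]

# Likelihood of a sequence that stays "quiet" versus one ending with many edges.
print(forward_likelihood([0, 0, 0], START, TRANS, EMIT))
print(forward_likelihood([0, 0, 1], START, TRANS, EMIT))
```

Under these invented parameters, a test observation with many differential edges is less probable than a quiet one after a quiet training history, which is the kind of inconsistency a model trained on the normal stage can expose.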

Two gene expression profiling datasets including the time-course dataset for influenza virus infection process (GSE30550) downloaded from the NCBI GEO database (

When the algorithm was applied to the two disease datasets, two extra steps were taken, as follows.

First, the expression profiling information was mapped to the protein-protein interaction networks from STRING (

Second, the differential network was partitioned into local networks to reduce computational complexity. Each local network contained a center node and its first-order neighbors. The local SSI score for each local network was calculated through the above algorithm. Given

Where _{i} denotes the number of nodes in the _{i} stands for the local
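The partition into local networks can be sketched as follows, assuming the differential network is given as a list of undirected edges. The edge list and the function name `local_networks` are hypothetical; the sketch covers only the partition step (a center node plus its first-order neighbors) and does not attempt the combination of local scores.

```python
def local_networks(edges):
    """Map each center node to the set holding it and its first-order neighbors.

    edges : iterable of (node_a, node_b) pairs describing undirected edges
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {center: {center} | nbrs for center, nbrs in neighbors.items()}


# Hypothetical differential edges among four genes.
example_edges = [("A", "B"), ("A", "C"), ("C", "D")]

for center, members in sorted(local_networks(example_edges).items()):
    print(center, sorted(members), "size:", len(members))
```

Each local network is small compared with the whole differential network, so a score can be computed per center node independently.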

The networks were visualized using Cytoscape (

The proposed computational method and SSI score were applied to a numerical simulation dataset, which was generated from a nine-node regulatory network (

The application of SSI score in numerical simulation.

In Equation (S1), the parameter value

Cancer of the stomach is difficult to cure unless it is found at an early stage (before its metastasis). Unfortunately, because early stomach cancer causes few symptoms, the disease is usually advanced when the diagnosis is made (Wadhwa et al.,

The proposed method was applied to the STAD dataset from TCGA and identified the tipping point of distant metastasis (stage IIIA). This dataset contained RNA-Seq data and included 141 tumor samples and 33 tumor-adjacent samples. The tumor samples were grouped into seven stages of stomach cancer: stage IA (9 samples), stage IB (18 samples), stage IIA (23 samples), stage IIB (29 samples), stage IIIA (27 samples), stage IIIB (20 samples), and stage IV (15 samples). The tumor-adjacent samples were regarded as control data and were employed as reference samples.

As shown in

The application of SSI score in STAD dataset.

Based on IPA analysis, the common SSI-signaling genes were highly related to the function annotation “Digestive organ tumor” (

We applied the proposed method to a time-course dataset of a live influenza infection challenge (GSE30550), in which 17 subjects were inoculated with influenza virus (H3N2/Wisconsin). Among the 17 subjects, nine (subjects 1, 5, 6, 7, 8, 10, 12, 13, and 15) became infected and showed clinical symptoms, while the other eight (subjects 2, 3, 4, 9, 11, 14, 16, and 17) stayed healthy and showed no clinical symptoms during the whole period of the infection challenge (

The application of SSI score in influenza infection dataset.

The individual-specific SSI scores in

The dynamical evolution of subject-specific networks. To illustrate the dynamical evolution of the differential network, the individual-specific networks of two symptomatic subjects (subjects 1 and 12) were exhibited.

Based on IPA analysis, the common SSI-signaling genes were highly related to the function annotation “Quantity of lymphocytes” (

Detecting the early-warning signal before a sudden deterioration into a severe disease state is crucial to patients all over the world. However, it is generally challenging to signal such a critical transition from only a single case sample, since the lack of samples invalidates statistical indices and thus causes conventional methods to fail. In this work, we proposed a computational method to identify the pre-disease state on the basis of a single sample. Specifically, given a number of reference samples, which can be the normal samples of an individual (

Compared with traditional methods, which are mostly based on the differential expression of observed biomolecules, the proposed method explores the dynamic information of differential associations among biomolecules when a biological system is in the vicinity of a tipping point. This method thus has several advantages. First, it works when only a single case sample is available, which benefits analysis in personalized medicine. Second, it detects the pre-disease state rather than the disease state, which may help to achieve early diagnosis of some complex diseases. Third, it clearly exhibits the critical properties at the network level, such as the abnormally arising differential associations, which may provide new insights into catastrophic deterioration.

Although the proposed method is merely a step toward the identification of the pre-disease state, and the algorithm is expected to be improved in both sensitivity and accuracy, it follows the idea of personalized medicine and provides a computational way to achieve individual-specific analysis and prediction using only a single sample.

Publicly available datasets were analyzed in this study. This data can be found here:

RL and PC conceived the project. PC supervised the project. JZ, XY, and YL performed the computation and analysis. All authors wrote the manuscript and read and approved the final manuscript.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplementary Material for this article can be found online at: