Edited by: Alexander Rodriguez-Palacios, Department of Medicine, Case Western Reserve University, United States

Reviewed by: Herve Seligmann, The Hebrew University of Jerusalem, Israel; Tauqeer Hussain Mallhi, Department of Clinical Pharmacy, College of Pharmacy, Al Jouf University, Saudi Arabia

This article was submitted to Infectious Diseases - Surveillance, Prevention and Treatment, a section of the journal Frontiers in Public Health

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

COVID-19 is a pandemic viral disease with catastrophic global impact. This disease is more contagious than influenza, such that cluster outbreaks occur frequently. If patients with symptoms quickly underwent testing and contact tracing, these outbreaks could be contained. Unfortunately, COVID-19 patients have symptoms similar to those of other common illnesses. Here, we hypothesize that the order of symptom occurrence could help patients and medical professionals more quickly distinguish COVID-19 from other respiratory diseases, yet such essential information is largely unavailable. To this end, we apply a Markov Process to a graded partially ordered set based on clinical observations of COVID-19 cases to ascertain the most likely order of discernible symptoms (i.e., fever, cough, nausea/vomiting, and diarrhea) in COVID-19 patients. We then compare the progression of these symptoms in COVID-19 to that in other respiratory diseases, such as influenza, SARS, and MERS, to observe whether the diseases present differently. Our model predicts that influenza initiates with cough, whereas COVID-19, like other coronavirus-related diseases, initiates with fever. However, COVID-19 differs from SARS and MERS in the order of gastrointestinal symptoms. Our results support the notion that fever should be used to screen for entry into facilities as regions begin to reopen after the outbreak of Spring 2020. Additionally, our findings suggest that good clinical practice should involve recording the order of symptom occurrence in COVID-19 and other diseases. If such a systematic clinical practice had been standard since ancient diseases, perhaps the transition from local outbreak to pandemic could have been avoided.

The current pandemic of Coronavirus Disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has undergone an observed exponential increase of cases that has overrun hospitals across the world (

To this end, we assumed that symptoms and their orders are independent variables and created a model that approximates the probability of symptoms occurring in specific orders using available, non-ordered patient data. The use of these assumptions and data was necessary given the lack of ordered data. To do this, we applied a Markov Process to determine the order of occurrence of common symptoms of respiratory diseases. We have previously used a Markov Chain to predict cancer metastasis location (

In this study, we first defined this specific application of a Markov Process applied to a graded partially ordered set (poset), which we refer to as the Stochastic Progression Model. In this case, our graded poset represents all possible combinations of symptoms and all possible orders of symptom occurrence. It is graded because the possible combinations of symptoms are ranked by the number of symptoms that they each represent. For example, the symptom combination of fever and cough has the same rank as the combination of cough and diarrhea. We found that the Stochastic Progression Model for adults that are symptomatic indicates that there may be an order of discernible symptoms in COVID-19, but the order of symptoms seems to be independent of severity of the case on admission. From there, we compared the most likely order of symptoms in other respiratory diseases to COVID-19. To expand on our results, we analyzed a larger set of symptoms that are common to all respiratory diseases studied here and sought to decipher further distinctions.

Patient data for this study were collected from various reports in the literature on the frequencies of symptoms in COVID-19, influenza, MERS, and SARS (

The main dataset of COVID-19 patients of the World Health Organization, containing 55,924 confirmed cases, was obtained through review of national and local governmental reports and observations made during visits to areas with infected individuals in China that occurred from February 16 to 24, 2020 (

We used initial frequency data of MERS and SARS to further ascertain early symptoms of disease. The MERS initial symptom frequency dataset, containing 45 confirmed cases, was collected from electronic medical records at the Samsung Medical Center in Seoul, South Korea that contained onset symptom data about patients in the 2015 Korean MERS outbreak (

Lastly, two additional datasets were collected to determine the utility of using first symptoms as early indicators of COVID-19 and influenza. The COVID-19 dataset used, containing 138 patients, was independent of all prior COVID-19 datasets. This data was obtained from electronic medical records of patients admitted to the Zhongnan Hospital of Wuhan University from January 1 to 28, 2020 (

The Stochastic Progression Model was built in R under version 3.5.2 and was illustrated by using the hasse function in the hasseDiagram_0.1.3 library (code available online:

We then simulated data for 500,000 patients by randomly selecting whether each patient has or does not have each symptom, using the procedure described above, and stored that information in a data frame with patients as rows and symptoms as columns. We assumed that the occurrences of symptoms are random and independent. Under these assumptions, we built the character arrays by applying the jar of marbles method to each simulated patient: for each symptom, a marble is pulled from a jar representing that symptom. The information from each randomly pulled marble is stored in the cell of the character array whose column represents the symptom and whose row represents the simulated patient. This process is repeated for all symptoms and all 500,000 simulated patients.
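The jar of marbles procedure can be sketched as one independent Bernoulli draw per symptom per patient. The following is a minimal Python illustration, not the authors' R code; the function name `simulate_patients`, the seed, the reduced patient count, and the symptom frequencies are all hypothetical placeholders and are not taken from any dataset in this study.

```python
import random


def simulate_patients(symptom_freqs, n_patients, seed=0):
    """Draw one independent 'marble' per symptom for each simulated patient.

    symptom_freqs maps a symptom name to its assumed frequency in [0, 1].
    Returns the symptom names (columns) and a list of Boolean vectors (rows).
    """
    rng = random.Random(seed)
    symptoms = list(symptom_freqs)
    patients = []
    for _ in range(n_patients):
        # Each symptom is drawn independently of all the others.
        row = tuple(1 if rng.random() < symptom_freqs[s] else 0 for s in symptoms)
        patients.append(row)
    return symptoms, patients


# Hypothetical frequencies, for illustration only.
example_freqs = {
    "fever": 0.80,
    "cough": 0.60,
    "nausea_vomiting": 0.10,
    "diarrhea": 0.05,
}
symptoms, patients = simulate_patients(example_freqs, n_patients=10_000)
```

Each row is a Boolean vector in the same column order as `symptoms`, which matches the patients-as-rows, symptoms-as-columns layout described above.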

The Stochastic Progression Model is illustrated as a directed acyclic graph whose nodes represent the power set of the symptoms as Boolean vectors. Each Boolean vector represents a possible state of a patient by noting the absence or presence of specific symptoms. The edges, which illustrate the transition from one state to another, were selected using key definitions and assumptions to create a poset. We defined the state at each node as the set of symptoms that a patient has experienced up to that point. We created edges directed from states with fewer symptoms to states with more, starting at the minimum element, a Boolean vector of all zeros, which indicates a person with no symptoms. First, we assume that symptoms occur one at a time, even if the difference in time is infinitesimal. With this assumption, a node can only be directed to other nodes that denote the same set of symptoms plus one additional symptom. Second, we assume that if a patient does not digress and does not die, they will eventually acquire all symptoms, reaching the maximum element, a Boolean vector of all ones, which represents a patient that has exhibited all symptoms. Applying these assumptions to form the directed acyclic graph creates a Hasse Diagram of a graded poset that follows a Markov Process, which together comprise the Stochastic Progression Model.
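The graph construction described above can be sketched as follows, assuming states are stored as tuples of 0s and 1s. This is an illustrative Python sketch rather than the authors' implementation; the names `build_hasse_edges` and `rank` are hypothetical.

```python
from itertools import product


def build_hasse_edges(n_symptoms):
    """Return the nodes and directed edges of the Boolean lattice.

    Nodes are all Boolean vectors of length n_symptoms. An edge goes from a
    state to every state that has exactly one additional symptom.
    """
    nodes = list(product((0, 1), repeat=n_symptoms))
    edges = []
    for state in nodes:
        for i, bit in enumerate(state):
            if bit == 0:
                # Turn on exactly one absent symptom.
                successor = state[:i] + (1,) + state[i + 1:]
                edges.append((state, successor))
    return nodes, edges


def rank(state):
    """Rank of a state in the graded poset: the number of symptoms present."""
    return sum(state)


nodes, edges = build_hasse_edges(4)
```

For four symptoms this gives 16 nodes and 32 edges, and every edge raises the rank by exactly one, which is what makes the poset graded.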

The nodes in the Hasse Diagram represent states of a patient by indicating the specific symptoms exhibited, and the edges represent transitions between these states. Therefore, we next needed to apply state probabilities to each node and transition probabilities to the directed edges. First, we labeled each simulated patient by summing the respective Boolean vector to find the number of symptoms for each patient. Then, to get the state probability of each node, we divided the number of simulated patients that are represented by the current Boolean vector by the total number of patients who have the same number of symptoms. To approximate the transition probability between two nodes (originating and terminating), we divided the number of simulated patients that are represented by the terminating node by the number of simulated patients that are represented by nodes characterized by the same number of symptoms as the terminating node, including the terminating node. The error of each node is determined by the sum of the products of the transition probabilities leading to that node subtracted from the state probability of the node. Then, the error of each implementation of the model was defined as the error of the node with the highest absolute value of error (
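One possible reading of these definitions is sketched below in Python. Two points are assumptions rather than statements from the text: the transition probability is normalized over the states directly reachable from the originating node, and the most likely path is chosen greedily by taking the highest transition probability at each step. All function names and the toy patient list are hypothetical.

```python
from collections import Counter


def state_probabilities(patients):
    """State probability: count of a state divided by the count of all
    simulated patients with the same number of symptoms."""
    counts = Counter(patients)
    rank_totals = Counter()
    for state, c in counts.items():
        rank_totals[sum(state)] += c
    return {state: c / rank_totals[sum(state)] for state, c in counts.items()}


def transition_probabilities(patients, origin):
    """Transition probabilities from origin to each one-symptom-larger state.

    Assumption: the denominator is restricted to the successors of origin.
    """
    counts = Counter(patients)
    successors = [
        origin[:i] + (1,) + origin[i + 1:]
        for i, bit in enumerate(origin)
        if bit == 0
    ]
    total = sum(counts[s] for s in successors)
    if total == 0:
        return {s: 0.0 for s in successors}
    return {s: counts[s] / total for s in successors}


def most_likely_path(patients, n_symptoms):
    """Greedy walk from no symptoms to all symptoms (an assumed selection rule)."""
    state = (0,) * n_symptoms
    path = [state]
    while sum(state) < n_symptoms:
        probs = transition_probabilities(patients, state)
        state = max(probs, key=probs.get)
        path.append(state)
    return path


# Toy two-symptom population, invented for illustration.
toy_patients = [(0, 0)] * 1 + [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 4
toy_state_probs = state_probabilities(toy_patients)
toy_first_step = transition_probabilities(toy_patients, (0, 0))
toy_path = most_likely_path(toy_patients, n_symptoms=2)
```

In the toy population, the first symptom is the first column with probability 0.75, so the greedy path passes through `(1, 0)` before reaching `(1, 1)`.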

The WHO-China Joint Report from February 16 to 24, 2020 includes rates of symptom occurrence at presentation from 55,924 confirmed cases of COVID-19 (

Development of the stochastic progression model for COVID-19.

We then created another implementation of the Stochastic Progression Model and utilized the data in the WHO-China Joint Report (COVID-19 with

To further investigate these symptom paths, we implemented the Stochastic Progression Model with the main dataset (COVID-19 with

The confirmation dataset of COVID-19 cases (

The most and least likely paths of discernible symptoms in severe and non-severe COVID-19 cases on admission.

The four discernible symptoms are objective and relatively easy for patients and clinicians to confirm. So, we developed implementations of the Stochastic Progression Model using these symptoms to determine the most likely and least likely paths for four respiratory diseases: COVID-19, influenza, MERS, and SARS (

The most likely and least likely paths of discernible symptoms in respiratory diseases.

Although active surveillance of the order of discernible symptoms (i.e., fever, cough, nausea/vomiting, and diarrhea) could be useful due to the distinctive most and least likely paths that we determined, we expanded our analysis to the seven symptoms commonly observed in all four respiratory diseases studied here. So, we created a second set of symptoms that adds sore throat, myalgia, and headache to the original set of symptoms (

The most likely path of common respiratory symptoms in COVID-19. The most likely path of seven common symptoms of COVID-19 in two datasets, determined by the transition probabilities listed between nodes.

We also implemented the Stochastic Progression Model with the same seven symptoms in influenza, SARS, and MERS datasets to compare and contrast disease progression with that in COVID-19 (

The most likely paths of symptoms in influenza, MERS, and SARS vs. COVID-19.

To illustrate the uniqueness of the most likely path of COVID-19, we found the transition probabilities of the same path in the other respiratory diseases (

The most likely path of symptoms in COVID-19 vs. influenza, MERS, and SARS.

Also, comparing the transition probabilities of paths in the same disease illustrates the significance of the most likely pathways. For example, the lowest transition probability in the most likely path of influenza is 0.578 (

The COVID-19 and influenza implementations of the Stochastic Progression Model suggest that there is a high likelihood of fever and cough occurring first, respectively. We desired to find metrics quantifying the possible link between first symptom and these two diseases. So, we determined the recall and the selectivity when using the initial symptom as an indicator of COVID-19 or influenza, with all other possible diseases excluded in a theoretical patient population. First, we simulated patient datasets using reported data that were independent from all previous work that we integrated in our analyses above (

Recall and selectivity of linking fever as a first symptom of patients with COVID-19.

| Sample | Recall (mean) | Recall (SD) | Selectivity (mean) | Selectivity (SD) |
| --- | --- | --- | --- | --- |
| 10 Patients out of 200 | 0.980 | 0.063 | 0.661 | 0.030 |
| 20 Patients out of 400 | 0.990 | 0.021 | 0.665 | 0.030 |
| 30 Patients out of 600 | 0.977 | 0.035 | 0.668 | 0.017 |
| 40 Patients out of 800 | 0.973 | 0.018 | 0.665 | 0.020 |
| 50 Patients out of 1,000 | 0.966 | 0.031 | 0.665 | 0.016 |

Recall and selectivity of linking cough as a first symptom of patients with influenza.

| Sample | Recall (mean) | Recall (SD) | Selectivity (mean) | Selectivity (SD) |
| --- | --- | --- | --- | --- |
| 10 Patients out of 200 | 0.810 | 0.110 | 0.369 | 0.031 |
| 20 Patients out of 400 | 0.820 | 0.067 | 0.364 | 0.030 |
| 30 Patients out of 600 | 0.777 | 0.061 | 0.364 | 0.015 |
| 40 Patients out of 800 | 0.765 | 0.092 | 0.367 | 0.023 |
| 50 Patients out of 1,000 | 0.804 | 0.051 | 0.362 | 0.014 |

When analyzing the link between COVID-19 and fever as a first symptom, the mean recall ranges from 0.966 to 0.990, with standard deviations of 0.031 and 0.021, respectively. The maximum standard deviation across sample sizes is 0.063, for the mean of 0.980. On the other hand, the selectivity of fever as a first symptom of COVID-19 ranges from 0.661 to 0.668, with standard deviations of 0.030 and 0.017, respectively; 0.030 is the maximum standard deviation, with corresponding means of 0.661 and 0.665 (

The selectivity in both cases is lower than the recall, which indicates that this analysis categorizes some patients as infected when they are not, whereas the high recall indicates that most infected patients did align with the first symptom that we predicted. In the future, we expect to confirm this analysis with data on first symptoms, as opposed to simulated data; the purpose of this analysis was to show that further study of the order of symptoms might lead to earlier recognition.
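For reference, recall and selectivity can be computed from a labeled population as in the Python sketch below. It assumes the standard definitions (recall as true positives over all infected patients, selectivity as true negatives over all non-infected patients); the function name and the small example population are hypothetical and do not reproduce the simulated datasets of this study.

```python
def recall_and_selectivity(infected, flagged):
    """Compute recall and selectivity for a first-symptom screening rule.

    infected[i] is True if patient i has the disease.
    flagged[i] is True if patient i shows the predicted first symptom.
    """
    pairs = list(zip(infected, flagged))
    tp = sum(1 for d, f in pairs if d and f)          # infected and flagged
    fn = sum(1 for d, f in pairs if d and not f)      # infected but missed
    tn = sum(1 for d, f in pairs if not d and not f)  # healthy and not flagged
    fp = sum(1 for d, f in pairs if not d and f)      # healthy but flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one infected and one non-infected patient")
    recall = tp / (tp + fn)
    selectivity = tn / (tn + fp)
    return recall, selectivity


# Invented example: 10 infected patients (9 flagged), 20 non-infected (5 flagged).
infected = [True] * 10 + [False] * 20
flagged = [True] * 9 + [False] * 1 + [True] * 5 + [False] * 15
recall, selectivity = recall_and_selectivity(infected, flagged)
```

In this invented population the recall is 0.9 and the selectivity is 0.75, the same qualitative pattern of high recall with lower selectivity reported in the tables above.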

In this study, we found evidence that supports the notion that there is a most common order of discernible symptoms in COVID-19 that is also different from other prominent respiratory diseases. The most likely initial symptom is fever in the three diseases studied that are caused by coronaviruses (i.e., COVID-19, SARS, and MERS) and cough in influenza. The most likely order of the four easily discernible symptoms is identical in MERS and SARS, but the most likely path of COVID-19 has one key difference. The first two symptoms of COVID-19, SARS, and MERS are fever and cough. However, the upper GI tract (i.e., nausea/vomiting) seems to be affected before the lower GI tract (i.e., diarrhea) in COVID-19, which is the opposite from MERS and SARS. In all diseases, we found that fever and cough occur before nausea/vomiting and diarrhea. When observing the set of seven symptoms including three subjective ones (i.e., sore throat, headache, and myalgia), we found that the initial symptoms of the most likely path are the same as in the most likely path of the four discernible symptoms. Also, in both the four and seven symptoms implementations, the GI tract symptoms are last. A separate MERS dataset included the initial symptoms of patients on admission, which listed the symptoms from highest to lowest probability as fever, myalgia, cough, and diarrhea (

The simulation data used to approximate the state and transition probabilities in the Stochastic Progression Model relies on the assumption that symptoms included in the model are independent. Using the definition of independence, we observed the individual probabilities of fever and cough in a dataset from a case study of influenza, and we found that the product of the individual probabilities of fever and cough is almost equal to the probability of both occurring (
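The independence check described here compares the product of the marginal probabilities with the joint probability. A minimal Python sketch follows; the function name and the counts are hypothetical and are not the values from the influenza case study.

```python
def independence_gap(n_total, n_a, n_b, n_both):
    """Compare P(A) * P(B) with P(A and B) using raw patient counts.

    Returns the product of the marginals, the joint probability, and the
    absolute difference between them (zero under exact independence).
    """
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_both = n_both / n_total
    product = p_a * p_b
    return product, p_both, abs(product - p_both)


# Invented counts: 100 patients, 80 with fever, 50 with cough, 40 with both.
product, joint, gap = independence_gap(n_total=100, n_a=80, n_b=50, n_both=40)
```

A gap close to zero is consistent with independence of the two symptoms, while a large gap suggests that the independence assumption does not hold for that pair.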

This study supports the idea that symptoms occur in a predictable order, but future work is needed to improve aspects of the Stochastic Progression Model and confirm the results found here. Our finding that COVID-19 first presents with a fever supports the measures recommended by the CDC, which state that the public should take their temperature at home and when entering facilities as an early checking method (

Furthermore, when analyzing fever as the first symptom of COVID-19, a low selectivity indicates a high Type I error (i.e., rate of false positives), and a high recall indicates a low Type II error (i.e., rate of false negatives). We found a moderate selectivity value and, as a result, a moderate Type I error in this case. This Type I error is acceptable in our use of investigating fever as an initial symptom of COVID-19, because it suggests that more people who are not infected get tested, rather than fewer people who are infected get tested, as with Type II error (
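The relationship between these error rates and the reported metrics can be written out explicitly, assuming the standard definitions in terms of true and false positives and negatives (TP, FP, TN, FN):

```latex
\text{Type I error rate} = \frac{FP}{FP + TN} = 1 - \text{selectivity},
\qquad
\text{Type II error rate} = \frac{FN}{FN + TP} = 1 - \text{recall}
```

Applied to the tabulated means for fever as a first symptom of COVID-19, a selectivity of 0.661 to 0.668 corresponds to a Type I error rate of about 0.332 to 0.339, and a recall of 0.966 to 0.990 corresponds to a Type II error rate of about 0.010 to 0.034.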

The importance of knowing first symptoms is rooted in the need to stop the spread of COVID-19, a disease that is two to three times more transmissible than influenza and results in outbreaks of clusters (

Publicly available datasets were analyzed for this study. These can be found here:

JL and JH conceived the model. JL and JM conceived the project. JL created the model. JL, MM, and JM analyzed results. JL and MM wrote the manuscript. PK and JH supervised the project. All authors read, edited, and approved the final manuscript.

MM is employed by the company Nexus Development PA LLC. JM is employed by the company NanoCarrier Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We wish to thank Dr. Jorge Nieva for discussions and advisement and Libere Ndacayisaba for critical reading of the manuscript.

The Supplementary Material for this article can be found online at: