
This article was submitted to Social Physics, a section of the journal Frontiers in Physics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Understanding the transmission process is crucial for the prevention and mitigation of COVID-19 spread. This paper contributes to the knowledge of COVID-19 by analyzing the incubation period, the transmission rate from close contact to infection, and the properties of multiple-generation transmission. The data regarding these parameters are extracted from a detailed line-list database of 9,120 cases reported in mainland China from January 15 to February 29, 2020. The incubation period of COVID-19 has a mean, median, and mode of 7.83, 7, and 5 days, respectively, and exceeds 14 days in 12.5% of cases. The number of close contacts for these cases during the incubation period and a few days before hospitalization follows a log-normal distribution, which may lead to super-spreading events. The disease transmission rate from close contact roughly decreases as the number of close contacts increases, with a median of 0.13. The average numbers of secondary cases are 2.10, 1.35, and 2.2 for the first, second, and third generations, conditioned on at least one offspring. However, the ratios of no further spread in the 2nd, 3rd, and 4th generations are 26.2, 93.9, and 90.7%, respectively. Moreover, the conditioned reproduction number in the second generation is geometrically distributed. Our findings suggest that, in order to effectively control the pandemic, prevention measures such as social distancing, wearing masks, and isolating from close contacts would be the most important and least costly measures.

As of July 2020, the cumulative confirmed cases of COVID-19 worldwide have exceeded 17.4 million, with over 572 thousand deaths. There were 22 countries with more than 100,000 confirmed cases as of July 14, 2020. The high transmissibility of the SARS-CoV-2 virus has substantially changed people’s hygiene habits, social relations, and forms of work and schooling during and after the pandemic [

Understanding the characteristics of the COVID-19 transmission process is crucial in finding a middle ground between restoring economic and societal order and controlling the pandemic. Previous research has shown that COVID-19 can be infectious pre-symptomatically [

Regarding the incubation period, as of Jan. 26, the mean and median were 5 and 4.75 days (obtained from 125 patients) [

The transmission rate is defined as the probability that an infection occurs among susceptible people within a specific group. It is an important index for providing an indication of how social interactions are related to transmission risk. Nine reports were listed in [

One of the most important indices for an infectious disease is the basic reproductive number. Numerous studies are devoted to its estimation. It is estimated to be 2.2 [

The best-known model within infectious disease epidemiology is the SEIR (susceptible-exposed-infectious-recovered) model and its various generalizations. These models are used at the population level to track the proportion of the population in each state at a given time, with the aim of investigating strategic decisions or the effectiveness of mitigation measures. For illustration, effective containment can explain the subexponential growth in China [

Clinical investigations may suffer from a limited sample size and biased sampling from the population, leading to geographically or demographically dependent results. Different samples and different methods also lead to different results in data analysis and estimation. Simulations of disease spread and mitigation policies require a precise setting of the incubation period [

In this work, we estimate the parameters of concern from a large-scale epidemiological line-list database, which contains the contact history and epidemiological timelines of 9,120 confirmed COVID-19 cases in China [

The distribution of the incubation period is fitted by a Weibull distribution with a mean and median of 7.83 and 7 days, respectively; this is in agreement with [

The rest of the paper is organized as follows.

The line-list database used in this paper contains hand-coded information extracted from 9,120 cases publicly reported by mainland China health commissions from January 15 to February 29, 2020. A typical reported item is as follows:

“Patient ID: Huainan-25.

The patient Huainan-25 is a 59-year-old woman who is the wife of the Huainan-26 patient. On February 12, she developed fever, muscle soreness, and other symptoms. On February 14, she went to the hospital for treatment and stayed at the hospital for observation. On February 15, her nucleic acid test was tested positive, and doctors diagnosed her as a suspected patient. Two days later, she was confirmed. Doctors have traced back 3 close contacts, all of whom have been quarantined for medical observation. During the New Year’s holiday, she had close contact with her daughter, son-in-law, and granddaughter. Her son-in-law, an asymptomatic patient with a history of suspicious exposure in Hefei, stayed at a designated hospital for observation. Doctors have traced back his 46 close contacts, all of whom have been quarantined for medical observation.”

The original extracted line-list database contains the epidemiological timelines, e.g., the possible date of virus exposure and the date of symptom onset, for each case. We define the incubation period as the time between virus exposure and symptom onset. There are 457 cases with both the date of exposure and the date of symptom onset reported in the line-list database.
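The definition above can be illustrated with a minimal sketch, assuming each line-list record carries an exposure date and an onset date. The record layout, field names, and dates below are hypothetical and are not taken from the database; only cases reporting both dates contribute.

```python
from datetime import date
from statistics import median

# Hypothetical line-list records (illustrative values, not from the database).
records = [
    {"case_id": "A-1", "exposure": date(2020, 1, 20), "onset": date(2020, 1, 27)},
    {"case_id": "A-2", "exposure": date(2020, 1, 22), "onset": date(2020, 2, 8)},
    {"case_id": "A-3", "exposure": None, "onset": date(2020, 2, 1)},
    {"case_id": "A-4", "exposure": date(2020, 1, 25), "onset": date(2020, 1, 30)},
]


def incubation_days(rows):
    """Days from exposure to symptom onset, for rows that report both dates."""
    return [
        (r["onset"] - r["exposure"]).days
        for r in rows
        if r["exposure"] is not None and r["onset"] is not None
    ]


periods = incubation_days(records)

# Summary statistics of the kind reported in the text.
median_days = median(periods)
share_over_14 = sum(d > 14 for d in periods) / len(periods)
```

Record A-3 is dropped because its exposure date is missing, which mirrors the restriction to cases with both dates reported.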

Close contact events are social events and scenarios such as living together, dining together, traveling together, and working together. There were 412 close contact events with the numbers of close contacts and secondary infections reported. Multiple-generation transmissions can form tree structures that originated from an initial infection. There are 421 transmission chains identified from the line-list.

The incubation period is a vital variable for the control of the pandemic. The quarantine period for people who have been in close contact with an infected individual depends on this variable. The quarantine was usually 14 days for COVID-19. However, for strict prevention, it was suggested at the Information Office of Beijing Municipality press conference on June 28 that, after the first 14 days, another 14-day quarantine is necessary in some high-risk areas.

The reason why another 14-day quarantine is necessary can be found in the distribution of the incubation time. The sample of 457 incubation times reveals a skewed distribution, see

The empirical distribution and Weibull distribution fitting of incubation time. The Weibull distribution has density function

That is to say, the chance that an infected individual develops symptoms only after more than 14 days is about 12.5%. For strict control of COVID-19, a longer quarantine is necessary. A Weibull distribution is fitted to the empirical data, with a shift of 1 to the right to avoid zero values. The density function is
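As a hedged illustration of how a Weibull fit and the tail probability beyond 14 days might be computed, the sketch below uses maximum likelihood with a bisection search for the shape parameter. The fitting method, the function names, and the synthetic data are assumptions made for illustration; the shift is interpreted here as fitting the incubation time plus one day.

```python
import math
import random


def fit_weibull_mle(samples, lo=0.05, hi=50.0, tol=1e-9):
    """Maximum-likelihood Weibull fit for strictly positive samples.

    Returns (shape, scale). The shape solves the profile-likelihood
    equation, whose left-hand side is increasing in the shape parameter.
    """
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)

    def score(k):
        powers = [x ** k for x in samples]
        weighted = sum(w * v for w, v in zip(powers, logs)) / sum(powers)
        return weighted - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = (sum(x ** shape for x in samples) / len(samples)) ** (1.0 / shape)
    return shape, scale


def weibull_tail(t, shape, scale):
    """P(X > t) for a Weibull(shape, scale) random variable."""
    return math.exp(-((t / scale) ** shape))


# Synthetic stand-in for (incubation days + 1); NOT the paper's data.
random.seed(0)
shifted_days = [random.weibullvariate(8.0, 1.8) for _ in range(5000)]

shape_hat, scale_hat = fit_weibull_mle(shifted_days)

# Probability that the unshifted incubation time exceeds 14 days.
p_over_14 = weibull_tail(14 + 1, shape_hat, scale_hat)
```

With real data, the input would be the 457 observed incubation times plus one; the parameters used to generate the synthetic sample above carry no empirical meaning.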

The scale of a close contact event is the number of people involved in one event in which people have gathered together in a specific way.

Numbers of different types of close contact events.

Type | Number of events | Proportion (%) |
---|---|---|
Living | 386 | 93.69 |
Dining | 7 | 1.70 |
Working | 3 | 0.73 |
Traveling | 1 | 0.24 |
Others | 15 | 3.64 |

The period covered by our dataset is the early stage of COVID-19 spread in China. The distribution of the scale of close contact events is a natural feature seen when people are free to move, regardless of the COVID-19 pandemic. The contact scale is intrinsically positive, with a few enormously high data points typically arising. The log-normal distribution is an ideal descriptor of such data, with a positive range, right skewness, a heavy right tail, and easily computed parameter estimates. Supported by the K-S test with a value of 0.18, the log-normal distribution provides a suitable fit among the positive, skewed, heavy-tailed candidate distributions. The mechanism behind log-normally distributed data in ecology can be obtained from a stochastic differential equation [

The empirical distribution of scale of close contact events with log-normal density fitting. The density function of this log-normal distribution is

The density function of this log-normal distribution is
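As a hedged illustration of how a log-normal fit and a K-S statistic of this kind might be computed, the sketch below estimates the two log-normal parameters from the log-transformed data and measures the largest gap between the empirical and fitted distribution functions. The function names and the synthetic sample are assumptions made for illustration, and the sketch computes only the K-S statistic, not a p-value.

```python
import math
import random


def fit_lognormal(samples):
    """Maximum-likelihood (mu, sigma): mean and std of the log-samples."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """Distribution function of a log-normal variable at x > 0."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model with the empirical CDF just after and just before x.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


# Synthetic contact scales; NOT the paper's data.
random.seed(1)
scales = [random.lognormvariate(1.0, 0.8) for _ in range(2000)]

mu_hat, sigma_hat = fit_lognormal(scales)
d_stat = ks_statistic(scales, lambda x: lognormal_cdf(x, mu_hat, sigma_hat))
```

Observed contact scales are integer counts, so a continuous log-normal is an approximation; the same functions can be applied to such counts as long as every value is positive.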

We define the transmission rate as the number of people infected in one close contact event over the number of people in that event.

The scatter plot of

Let

The empirical distribution of the transmission rate. The mean and median of the transmission rate are 0.20 and 0.13, with an interquartile range of 0–0.3. No common distribution fits the empirical distribution properly.
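The transmission rate defined above, and its summary statistics, can be sketched as follows. The event list is hypothetical, and whether the index case counts toward the event size is an assumption that would need to match the data coding.

```python
from statistics import mean, median, quantiles

# Hypothetical close contact events: (people in the event, people infected).
events = [(3, 1), (4, 0), (10, 1), (2, 1), (5, 1), (8, 0), (6, 3)]


def transmission_rate(event_size, infected):
    """Infected people in one event divided by the people in that event."""
    if event_size <= 0 or not 0 <= infected <= event_size:
        raise ValueError("need 0 <= infected <= event_size and event_size > 0")
    return infected / event_size


rates = [transmission_rate(size, infected) for size, infected in events]

mean_rate = mean(rates)
median_rate = median(rates)

# Quartiles; the interquartile range runs from q1 to q3.
q1, q2, q3 = quantiles(rates, n=4, method="inclusive")
```

Sorting the events by size before computing the rates would also allow a check of how the rate changes with the number of close contacts.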

Transmission events can create tree structures to map disease spread. There are in total 421 chains verified from the record data. Among the chains, there are 311 chains with secondary cases, out of which there are 654 children in the second generation. However, due to effective prevention, there are only 54 and 11 children in the third and fourth generations, respectively. No fifth generation is observed in our dataset.
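The generation counts described above can be obtained by walking the transmission trees level by level. The following is a minimal sketch, assuming the line-list has been reduced to infector-to-infectee links; the case identifiers and links are hypothetical.

```python
# Hypothetical infector -> infectee links (not from the database).
links = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("Q1", "Q2")]

# Hypothetical first-generation cases, one per chain; R1 infects nobody.
index_cases = ["P1", "Q1", "R1"]


def generation_sizes(roots, edges):
    """Number of cases in each generation, starting from the chain roots.

    Assumes the links form trees (each case has one infector, no cycles).
    """
    children = {}
    for infector, infectee in edges:
        children.setdefault(infector, []).append(infectee)

    sizes = []
    current = list(roots)
    while current:
        sizes.append(len(current))
        # Everyone infected by the current generation forms the next one.
        current = [c for case in current for c in children.get(case, [])]
    return sizes


sizes = generation_sizes(index_cases, links)
```

Applied to the real chains, this kind of traversal would yield the generation sizes quoted in the text, with the loop ending once a generation has no children.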

The reproduction number of an infection is the number of secondary infectees infected by the same confirmed individual. We define the reproduction number in each generation by dividing the number of infected people in the next generation by the number in the present one. Conditioned on the existence of at least one child in the next generation, the mean reproduction numbers in the first, second, and third generations are 2.10, 1.35, and 2.2. However, without this condition, the means are 1.55, 0.08, and 0.2, respectively, see

The reproduction number in each generation of transmission in the spreading tree.

Tree depth (Size) | Ratio of no child (Size) | Conditioned mean | Mean |
---|---|---|---|
1(421) | 26.2% (110) | 2.10 | 1.55 |
2(654) | 93.9% (614) | 1.35 | 0.08 |
3(54) | 90.7% (49) | 2.2 | 0.20 |
4(11) | 100.0% (11) | 0 | 0 |
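The conditioned and unconditioned means in the table follow from the generation sizes and the counts of cases without children. The short sketch below redoes that arithmetic using only the counts shown in the table; the function and variable names are illustrative.

```python
# Counts copied from the table: cases per tree depth and cases with no child.
sizes = [421, 654, 54, 11]
no_child = [110, 614, 49, 11]


def generation_means(sizes, no_child):
    """Mean offspring per case, with and without conditioning on spreading."""
    rows = []
    for depth in range(len(sizes) - 1):
        offspring = sizes[depth + 1]                # cases in the next generation
        spreaders = sizes[depth] - no_child[depth]  # cases with at least one child
        rows.append(
            {
                "depth": depth + 1,
                "conditioned_mean": offspring / spreaders,
                "mean": offspring / sizes[depth],
            }
        )
    return rows


rows = generation_means(sizes, no_child)
for row in rows:
    print(row["depth"], round(row["conditioned_mean"], 2), round(row["mean"], 2))
```

For the first generation, for example, 654 second-generation cases spread over 311 spreaders give about 2.10, while the same 654 cases spread over all 421 first-generation cases give about 1.55.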

The empirical distribution of the number of secondary cases caused by the 311 infectors in the first generation, together with a geometric fit, is shown in

The empirical distribution of infection numbers in the second generation with geometric fitting. The geometric distribution law is
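As a hedged sketch of how a geometric law could be fitted to these counts, the code below uses the maximum-likelihood estimate for a geometric distribution supported on 1, 2, 3, and so on, which is the number of infectors divided by their total number of secondary cases. Both the support and the fitting method are assumptions made for illustration; the inputs are the 311 infectors and 654 second-generation cases reported in the text.

```python
def geometric_mle(n_infectors, total_offspring):
    """MLE of p for P(K = k) = p * (1 - p) ** (k - 1), k = 1, 2, ..."""
    if n_infectors <= 0 or total_offspring < n_infectors:
        raise ValueError("each infector needs at least one secondary case")
    return n_infectors / total_offspring


def geometric_pmf(k, p):
    """Probability of exactly k secondary cases, k >= 1."""
    return p * (1.0 - p) ** (k - 1)


# 311 first-generation infectors caused 654 second-generation cases.
p_hat = geometric_mle(311, 654)

# Under this model the mean number of secondary cases is 1 / p.
model_mean = 1.0 / p_hat

# Fitted probabilities for 1 to 5 secondary cases.
fitted = [geometric_pmf(k, p_hat) for k in range(1, 6)]
```

By construction, the mean of the fitted model equals the conditioned mean of about 2.10 for the first generation.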

In this study, based on the details of confirmed cases reported by the mass media, the following features are explored: the Weibull distribution of the incubation period, the log-normal distribution of the scale of close contact events, the geometric distribution of the reproduction number in different generations of virus transmission, and the statistical features of the secondary attack rate.

As far as we know, this is the first time that the scale of close contact events has been reported to be log-normally distributed, which was previously not possible due to a lack of data. This heavy-tailed distribution reveals a relatively larger possibility of super-spreading events compared with light-tailed distributions. To reduce secondary infections, it is important to take adequate measures to reduce the scale of close contact events. Moreover, efforts should be made to trace back the close contacts to cut off possible spreading chains in advance.

It is notable that the method here is applicable to all infectious diseases. The crucial step is the line-list record of each confirmed case and the detailed transmission relationships in the spreading tree structure. For infectious diseases where only non-pharmaceutical measures can be applied to prevent their spread, detailed record keeping of each confirmed case and its contact history is crucial. The tree structure is good evidence of the spreading trend and is helpful for the precise estimation of the effective reproductive number. Moreover, contact history is useful to nip severe infectious diseases in the bud.

Theoretically, the reproduction number, say

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below:

LZ and JZ contributed equally as first authors. JZ, X-FL, and X-KX designed the analysis. LZ, XW, JY, and X-KX analyzed the data. LZ and X-FL wrote the paper.

This work was jointly supported by the Fundamental Research Funds for the Central Universities (No. 2019XD-A11), the National Natural Science Foundation of China (Grant Nos. 11971074, 61671005, 61672108, 61976025, 61773091), and the LiaoNing Revitalization Talents Program (XLYC1807106).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.