
Edited by: Martin Hartmann, ETH Zürich, Switzerland

Reviewed by: David Anthony Nipperess, Macquarie University, Australia; Alex Washburne, Montana State University System, United States

This article was submitted to Terrestrial Microbiology, a section of the journal Frontiers in Microbiology

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Understanding the drivers of diversity is a fundamental question in ecology. An extensive literature discusses different methods for describing diversity and documenting its effects on ecosystem health and function. However, it is widely believed that diversity depends on the intensity of sampling. I discuss a statistical perspective on diversity, framing the diversity of an environment as an unknown parameter and discussing the bias and variance of plug-in and rarefied estimates. I describe the state of the statistical literature for addressing these problems, focusing on the analysis of microbial diversity. I argue that latent variable models can address issues with variance, but that bias corrections are needed as well. I encourage ecologists to use estimates of diversity that account for unobserved species, and to use measurement error models to compare diversity across ecosystems.

Alpha diversity metrics summarize the structure of an ecological community with respect to its richness (number of taxonomic groups), evenness (distribution of abundances of the groups), or both. Because many perturbations to a community affect its alpha diversity, summarizing and comparing community structure via alpha diversity is a ubiquitous approach to analyzing community surveys. In microbial ecology, analyzing the alpha diversity of amplicon sequencing data is a common first approach to assessing differences between environments.

Unfortunately, determining how to meaningfully estimate and compare alpha diversity is not trivial. To illustrate, consider the following example where the alpha diversity metric of interest is strain-level richness of a microbial community (the total number of strain variants present in the environment). Suppose I conduct an experiment in which I take a sample from Environment A and count the number of different microbial taxa present in my sample. I then take a sample from Environment B, count the number of different taxa in that sample, and compare it to the number of taxa in Environment A. I am likely to observe higher numbers of different taxa in the sample with more microbial reads. The library sizes can dominate the biology in determining the result of the diversity analysis (Lande,

Rarefaction is a method that adjusts for differences in library sizes across samples to aid comparisons of alpha diversity. First proposed by Sanders (

Unfortunately, rarefaction is neither justifiable nor necessary, a view framed statistically by McMurdie and Holmes (

Imagine that we had complete knowledge of every microbe in existence, including identity, abundance and location. To compare microbial diversity, we would define specific environments (e.g., the distal gut of women aged 35 living in the contiguous U.S.) and compare diversity metrics across different ecological gradients (e.g., with or without irritable bowel syndrome diagnoses). Alpha diversity could be compared exactly, because we would know entire microbial populations with perfect precision.

Unfortunately, we do not have knowledge of every microbe. We take samples from environments, and investigate the microbial community present in the sample. We use our findings about the sample to draw inferences about the environment that we are truly interested in. The samples are not of particular interest, except that they reflect the environment from which they were sampled. As we sample more and more of the environment using larger samples, we get closer to understanding the true and total microbial community of interest. This means that as we increase sampling, our calculation of any diversity metric [e.g., richness (Fisher et al.,

Observing small samples from a large population is not an experimental set-up unique to microbial ecology: it is almost universal in statistics. The set-up where an estimate of a quantity converges to the correct value as more samples are obtained is also well understood in statistics. The unique property of microbiome experiments and alpha diversity analysis is that samples do not faithfully represent the entire microbial community under study. There is unadjusted error in using our samples as proxies for the entire community.

To illustrate this distinction, I contrast microbial diversity experiments with a non-microbial experiment. Suppose we are interested in modeling the CO_{2} flux of soil treated with different amendments. We would measure the flux of equally sized soil sites treated with the different amendments, performing biological replicates using multiple sites for each amendment. To assess if the amendments affect the flux, we would fit a regression-type model (such as ANOVA) to flux with amendment as an explanatory variable. Implicitly, this model acknowledges that we can assess the flux with high precision; that is, the margin for error for determining flux is negligible.

Now suppose we knew that our flux-measuring machine consistently underestimated flux by exactly 5 units. We would adjust for the measurement error by adding 5 units to each measurement before comparing them. But what happens when we have random measurement error? If the measurement error on the machine was random (e.g., with 0 mean and variance of 1 unit for all amendments), this would not favor any particular amendment. However, detecting a difference between the effects of the amendments on flux would be more challenging statistically: we would require more samples to detect a true difference compared to the case without measurement error. To account for the additional experimental noise, we would use a model that accounts for measurement error in assessing differences between amendments. If the variance in the measurement error was 1 unit for amendment A but 5 units for amendment B, we would similarly adjust with a measurement error model.
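The two kinds of error described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the article: the baseline flux of 10 units, the true amendment effect of 1 unit, the site-to-site standard deviation of 1 unit, the 10 sites per amendment, and the function names are all hypothetical choices made for illustration. A known constant bias is removed by a simple shift, while random measurement error lowers the chance that a standard two-sample t-test detects a true difference.

```python
import random
import statistics

N_SITES = 10          # sites per amendment (hypothetical)
T_CRIT_DF18 = 2.101   # two-sided 5% critical value of Student's t, 18 degrees of freedom


def correct_known_bias(measurements, bias):
    """Remove a known constant bias (bias = -5 means the machine reads 5 units low)."""
    return [m - bias for m in measurements]


def pooled_t(x, y):
    """Classical two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def simulate_power(measurement_sd, true_diff=1.0, site_sd=1.0, n_sims=1000, seed=1):
    """Fraction of simulated experiments in which the t-test rejects 'no difference'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Each observation = true site flux + random measurement error.
        a = [rng.gauss(10.0, site_sd) + rng.gauss(0.0, measurement_sd)
             for _ in range(N_SITES)]
        b = [rng.gauss(10.0 + true_diff, site_sd) + rng.gauss(0.0, measurement_sd)
             for _ in range(N_SITES)]
        if abs(pooled_t(a, b)) > T_CRIT_DF18:
            rejections += 1
    return rejections / n_sims


power_without_error = simulate_power(measurement_sd=0.0)
power_with_error = simulate_power(measurement_sd=3.0)
print("power without measurement error:", power_without_error)
print("power with measurement error:   ", power_with_error)
```

The t-test here ignores the measurement error entirely; a measurement error model would instead treat the error variance as a separate, known or estimated, component of each observation.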

To decide if measurement error must be accounted for when observations are made in an experiment, it is necessary to consider the effect of repeating the observational process on the same experimental unit. In the flux experiment, this would involve measuring the flux of the same soil sites again using the same experimental conditions. Without measurement error in the observations, we would consistently observe the same flux measurement, while if we had random measurement error, we would most likely observe slightly different flux measurements. Because technical replicates in microbiome experiments yield different numbers of reads, different community compositions, and different levels of alpha diversity, we have measurement error in microbial experiments. We currently do not account for measurement error in microbial diversity studies.

While measurement error in microbiome studies affects all analyses of microbiome data, alpha diversity is particularly affected because commonly used estimates of alpha diversity are heavily biased compared to other estimation problems in microbial ecology (such as estimating relative abundances). Some tools to address problems with bias in alpha diversity exist in the statistical literature (Chao and Bunge,

To clarify this discussion, I will focus on taxonomic richness (the simplest case), and later generalize the argument to other alpha diversity metrics. Consider a setting in which Environment A's richness (C_{A}) is higher than Environment B's richness (C_{B}). Suppose we have two biological replicates of samples from each environment: n_{A1} and n_{A2} reads from Environment A, n_{B1} and n_{B2} reads from Environment B, and n_{A1} < n_{B1} < n_{A2} < n_{B2}. Let c_{ij} be the observed richness of environment i in replicate j, and suppose that we observe c_{A1} < c_{A2} < c_{B1} < c_{B2}.

Figure: Expected sample taxonomic richness increases with the number of reads.
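The dependence of observed richness on the number of reads can be reproduced with a short simulation. The sketch below is illustrative only: the communities, their 1/rank abundance profile, the read depths, and the function names are hypothetical assumptions and do not come from the article. It draws reads from a community with known true richness and counts the distinct taxa observed, then compares a shallow sample from a richer environment with a deep sample from a poorer one.

```python
import random


def zipf_abundances(n_taxa):
    """Relative abundances proportional to 1/rank (a hypothetical community)."""
    weights = [1.0 / rank for rank in range(1, n_taxa + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_richness(abundances, n_reads, rng):
    """Draw n_reads reads from the community and count the distinct taxa seen."""
    reads = rng.choices(range(len(abundances)), weights=abundances, k=n_reads)
    return len(set(reads))


rng = random.Random(2024)
env_a = zipf_abundances(1000)  # Environment A: true richness 1000
env_b = zipf_abundances(500)   # Environment B: true richness 500

# Observed richness in Environment A at increasing sequencing depth.
for n_reads in (100, 1000, 10000):
    print(n_reads, "reads:", observed_richness(env_a, n_reads, rng), "taxa observed")

# A shallow sample from the richer environment versus a deep sample from the poorer one.
shallow_a = observed_richness(env_a, 200, rng)
deep_b = observed_richness(env_b, 20000, rng)
print("A with 200 reads:", shallow_a, "| B with 20000 reads:", deep_b)
```

Because a sample of 200 reads cannot contain more than 200 taxa, the shallow sample from Environment A shows lower observed richness than the deep sample from Environment B even though A's true richness is twice as large.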

There are currently two commonly used methods for comparing alpha diversity. The first method is to use the raw observed richness values c_{A1}, c_{A2}, c_{B1}, and c_{B2}, and to perform modeling and hypothesis testing (such as ANOVA) as if both the bias and variance of these estimates were zero (see, for example, Makipaa et al.,

The second method is rarefaction: each sample is subsampled down to n_{A1} reads (the number of reads in the smallest sample), giving c_{A1},
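Rarefying to the smallest library size can be sketched as follows. This is a minimal illustration of the general technique (subsampling reads without replacement to a common depth), not code from the article; the count vectors and function names are hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from per-taxon read counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one taxon label per read, then keep `depth` of them.
    reads = [taxon for taxon, count in enumerate(counts) for _ in range(count)]
    kept = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in kept:
        rarefied[taxon] += 1
    return rarefied


def richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


sample_small = [40, 30, 20, 5, 3, 1, 1, 0]        # 100 reads (hypothetical)
sample_large = [400, 300, 150, 80, 40, 20, 6, 4]  # 1000 reads (hypothetical)

rng = random.Random(7)
depth = min(sum(sample_small), sum(sample_large))
rarefied_large = rarefy(sample_large, depth, rng)

print("richness before rarefying:", richness(sample_large))
print("richness after rarefying: ", richness(rarefied_large))
```

Rarefied richness can never exceed the richness of the full sample, and 900 of the 1000 reads in the larger sample are discarded to make the comparison.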

Here I advocate for a third strategy: adjust the sample richness of each ecosystem by adding to it an estimate of the number of unobserved species, estimate the variance in the total richness estimate, and compare the diversities relative to these errors (
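The idea of adding an estimate of the unobserved taxa, attaching a variance, and comparing estimates relative to that variance can be sketched with a classical estimator. The example below uses the Chao1 estimator and its standard variance formula purely as a simple, well-known stand-in; it is not necessarily the estimator recommended in this article, and richness estimators adapted to microbiome data may behave quite differently. The z-type statistic is likewise a deliberate simplification of a full measurement error model, and the count vectors and function names are hypothetical.

```python
import math


def chao1(counts):
    """Return (estimate, variance) of total richness using the classical Chao1 formula.

    estimate = observed + f1^2 / (2 * f2), where f1 and f2 are the numbers of
    taxa observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 == 0:
        raise ValueError("this simple sketch needs at least one doubleton")
    ratio = f1 / f2
    estimate = observed + f1 * f1 / (2 * f2)
    variance = f2 * (0.5 * ratio ** 2 + ratio ** 3 + 0.25 * ratio ** 4)
    return estimate, variance


def z_for_difference(est_a, var_a, est_b, var_b):
    """Difference in estimated richness, scaled by its standard error.

    Assumes independent samples and nonzero total variance.
    """
    return (est_a - est_b) / math.sqrt(var_a + var_b)


counts_a = [10, 5, 1, 1, 1, 2, 2, 0]  # hypothetical sample from Environment A
counts_b = [8, 6, 4, 3, 2, 1, 0, 0]   # hypothetical sample from Environment B

est_a, var_a = chao1(counts_a)
est_b, var_b = chao1(counts_b)
print("A:", est_a, "+/-", math.sqrt(var_a))
print("B:", est_b, "+/-", math.sqrt(var_b))
print("z statistic for A - B:", z_for_difference(est_a, var_a, est_b, var_b))
```

With biological replicates, the per-sample variances would enter a model that also includes between-replicate variation, rather than a single pairwise statistic.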

Modeling parameters observed with estimation error is not a new suggestion: this approach is from the field of statistical

While the example discussed here is richness, this approach to estimating and comparing alpha diversity using a bias correction (incorporating unobserved taxa) and a variance adjustment (measurement error model) could apply to any alpha diversity metric. However, richness estimation has a well-studied statistical literature, and richness estimators that are adapted to microbiome data exist (see Bunge et al.,

Plug-in estimates of many alpha diversity indices (including richness and Shannon diversity) are negatively biased for the environment's alpha diversity parameter, that is, they underestimate the true alpha diversity (Lande,
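The negative bias of plug-in estimates can be checked numerically for the Shannon index. The following sketch is illustrative only: the community of 50 equally abundant taxa, the depth of 50 reads, and the function names are hypothetical assumptions. It compares the true Shannon diversity of the community with the average plug-in estimate computed from repeated samples.

```python
import math
import random


def shannon(proportions):
    """Shannon diversity index, -sum(p * log p), over nonzero proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def plug_in_shannon(abundances, n_reads, rng):
    """Shannon index computed from the observed proportions in one sample."""
    reads = rng.choices(range(len(abundances)), weights=abundances, k=n_reads)
    counts = {}
    for taxon in reads:
        counts[taxon] = counts.get(taxon, 0) + 1
    return shannon([c / n_reads for c in counts.values()])


rng = random.Random(11)
truth = [1 / 50] * 50        # 50 equally abundant taxa (hypothetical)
true_h = shannon(truth)      # equals log(50) for an even community

estimates = [plug_in_shannon(truth, 50, rng) for _ in range(500)]
mean_h = sum(estimates) / len(estimates)

print("true Shannon diversity:  ", true_h)
print("mean plug-in estimate:   ", mean_h)
print("largest plug-in estimate:", max(estimates))
```

Unobserved taxa contribute nothing to the plug-in sum, so the estimate from a sample that misses taxa falls below the environment's true value.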

It has recently been argued that studying microbial diversity without context is distracting us from gaining insight into ecological mechanisms (Shade,

AW wrote the manuscript and performed the data analysis.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This article is based on course notes presented by the author at the Marine Biological Laboratory at the STAMPS course in 2013, 2014, 2015, 2016, 2017, and 2018. The author is grateful to Berry Brosi, the MBL, the STAMPS course directors, and the STAMPS participants for countless discussions on this topic. The author also thanks Thea Whitman and two referees for many thoughtful suggestions on the manuscript. This manuscript has been released as a preprint via bioRxiv (Willis,