1 Introduction

One of the most basic questions we can ask about microbial communities is: How do different microbial taxa vary in abundance across space, time, and host or environmental conditions? Marker-gene and shotgun metagenomic sequencing (jointly, MGS) can measure the abundances of thousands of species simultaneously, making it possible to ask this question on a community-wide scale. In these differential-abundance (DA) analyses, the change in abundance of a microbial taxon across samples or conditions is used to infer ecological dynamics or to find microbes that are associated with specific host diseases or environmental conditions. Standard MGS measurements lose information about total microbial density and so are typically used to analyze the abundances of taxa relative to each other. But new methods that provide information about absolute abundance are increasingly used, making it possible to analyze changes in absolute cell density. In its various forms, DA analysis remains one of the most common, if not the most common, forms of analysis applied to MGS data to elucidate the inner workings of microbiomes and their relationships to host and environmental health.

Yet these DA analyses are built on a fundamentally flawed foundation. MGS measurements are taxonomically biased: Microbial species vary dramatically (e.g., 10- to 1000-fold) in how efficiently they are measured (that is, converted from cells into taxonomically classified sequencing reads) by a given MGS protocol (McLaren, Willis, and Callahan (2019)). This bias arises from variation in how species respond to each step in an MGS protocol, from sample collection to bioinformatic classification. Although taxonomic bias is often associated with features specific to marker-gene sequencing, such as variation among species in marker copy number and in primer-binding and amplification efficiencies, large variation in DNA extraction efficiency and in the ability to correctly classify reads makes it a universal feature of both marker-gene and shotgun measurements. As a result of taxonomic bias, MGS measurements provide inaccurate representations of actual community composition and tend to differ across protocols, studies, and even experimental batches (Yeh et al. (2018), McLaren, Willis, and Callahan (2019)). These errors have in some cases been found to exceed sizable biological differences (e.g., Lozupone et al. (2013)) and may have contributed to failed replications of prominent findings in the human microbiome literature, such as the associations of Bacteroides and Firmicutes in stool with obesity (Finucane et al. (2014)) and the associations of certain species in the vagina of pregnant women with preterm birth (Callahan et al. (2017)).

The extent to which taxonomic bias has affected the DA results in the scientific literature is unknown. The typical approach to countering taxonomic bias is to standardize the measurement protocol used within a given study, with the (often tacit) assumption that samples measured by the same protocol will be affected by bias in the same way, so that the measured differences between samples will be unaffected. For example, if taxonomic bias were to cause the measured proportion of a species to be consistently 10X too high, we would still be able to accurately infer the fold change in its proportion across samples (Kevorkian et al. (2018), Lloyd et al. (2020)). Unfortunately, mathematical arguments and analysis of experiments with artificially constructed ('mock') communities demonstrate that this assumption is not always warranted: Consistent taxonomic bias can lead to variable fold errors in species' proportions (Figure 1.1, McLaren, Willis, and Callahan (2019)). These varying errors can lead to spurious conclusions about how the proportion of a taxon varies across samples, even about the direction of change (for example, causing a taxon that decreases to appear to increase) (McLaren, Willis, and Callahan (2019)). Yet McLaren, Willis, and Callahan (2019) also found that certain types of DA analysis, namely those based on fold changes in the ratios among species, were robust to bias. The implications of these findings for DA analysis of absolute abundances, and for the joint analysis of variation in many species across many samples as typically done in microbiome association testing, have yet to be investigated.
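The contrast between proportions and ratios can be illustrated with a small numerical sketch. It assumes the multiplicative model of McLaren, Willis, and Callahan (2019), in which each species i has a fixed relative measurement efficiency B_i under a given protocol, so the measured proportion of species i in sample s is A_is B_i / sum_j A_js B_j, where A_is is its actual proportion. The denominator (the sample's mean efficiency) varies with composition, so the measured fold change in a proportion equals the actual fold change multiplied by the ratio of the two samples' mean efficiencies. The measured ratio of species i to species j, by contrast, is off by the same factor B_i / B_j in every sample, and that factor cancels in a fold change. The efficiencies, compositions, and function names below are hypothetical illustration values, not data from any study.

```python
# Minimal sketch, assuming the multiplicative bias model of McLaren, Willis,
# and Callahan (2019). All efficiencies and compositions are hypothetical.

def measure(actual, efficiency):
    """Return measured proportions when each species' actual proportion is
    scaled by its fixed efficiency and the result is renormalized."""
    weighted = [a * b for a, b in zip(actual, efficiency)]
    mean_efficiency = sum(weighted)  # actual proportions sum to 1
    return [w / mean_efficiency for w in weighted]


def fold_change(before, after):
    return after / before


# Hypothetical per-species efficiencies; the same protocol means the same
# bias (identical efficiencies) applies to every sample.
efficiency = [1.0, 10.0, 0.5]

# Hypothetical actual proportions of three species in two samples.
sample1 = [0.5, 0.4, 0.1]
sample2 = [0.4, 0.05, 0.55]

m1 = measure(sample1, efficiency)
m2 = measure(sample2, efficiency)

# Proportion of species 0: it actually decreases, but the drop in the
# sample's mean efficiency makes it appear to increase.
actual_fc = fold_change(sample1[0], sample2[0])
measured_fc = fold_change(m1[0], m2[0])

# Ratio of species 0 to species 2: the factor efficiency[0] / efficiency[2]
# appears in both samples and cancels, so the fold change is unaffected.
actual_ratio_fc = fold_change(sample1[0] / sample1[2], sample2[0] / sample2[2])
measured_ratio_fc = fold_change(m1[0] / m1[2], m2[0] / m2[2])

print(f"proportion FC: actual {actual_fc:.3f}, measured {measured_fc:.3f}")
print(f"ratio FC:      actual {actual_ratio_fc:.3f}, measured {measured_ratio_fc:.3f}")
```

With these values, the proportion of species 0 actually falls by a factor of 0.8 but is measured as rising roughly 3.1-fold, because the high-efficiency species 1 is far less abundant in the second sample. The fold change in the species 0 to species 2 ratio is about 0.145 whether computed from actual or measured values.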

Here we use a combination of theoretical analysis, simulation, and re-analysis of published experiments to consider when and why taxonomic bias in MGS measurements leads to spurious results in DA analysis of relative and absolute abundance. Our analysis clarifies how the received wisdom that taxonomic bias does not affect the analysis of change across samples is only partially correct and can give a false sense of security in the accuracy of DA results. Yet we also present several potential solutions: methods for quantifying, correcting, or otherwise accounting for the effect of taxonomic bias in DA analyses that can be deployed today with only modest changes to existing experimental and analytical workflows. Over time, application of these methods to past and future experiments will provide crucial quantitative information about the conditions under which taxonomic bias creates spurious results for various DA methodologies. Collectively, these methods and insights may provide practical solutions to taxonomic bias in DA analysis and the confidence necessary to codify the statistical findings of microbiome studies into readily translatable scientific knowledge.

Figure 1.1: Mock community experiments show that taxonomic bias can distort the measured fold change in an individual species’ proportion across samples. The figure shows the measured vs. actual proportions for a single bacterial species, Lactobacillus crispatus, in a set of bacterial cellular mock communities, and the resulting fold changes between community samples. The inconsistent error in the measured proportions of individual samples (Panel A) leads to inaccurate measurements of fold changes (Panel B). Mock communities were constructed and measured with 16S sequencing by Brooks et al. (2015). The data was re-analyzed by McLaren, Willis, and Callahan (2019), who showed that despite the inconsistency of the errors in Panel A, taxonomic bias acted consistently across samples.

References

Brooks, J Paul, David J Edwards, Michael D Harwich, Maria C Rivera, Jennifer M Fettweis, Myrna G Serrano, Robert A Reris, et al. 2015. “The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies.” BMC Microbiol. BioMed Central. https://doi.org/10.1186/s12866-015-0351-6.
Callahan, Benjamin J, Daniel B DiGiulio, Daniela S Aliaga Goltsman, Christine L Sun, Elizabeth K Costello, Pratheepa Jeganathan, Joseph R Biggio, et al. 2017. “Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women.” Proc. Natl. Acad. Sci. U. S. A. 114 (37): 9966–71. https://doi.org/10.1073/pnas.1705899114.
Finucane, Mariel M., Thomas J. Sharpton, Timothy J. Laurent, and Katherine S. Pollard. 2014. “A Taxonomic Signature of Obesity in the Microbiome? Getting to the Guts of the Matter.” Edited by Markus M. Heimesaat. PLoS One 9 (1): e84689. https://doi.org/10.1371/journal.pone.0084689.
Kevorkian, Richard, Jordan T Bird, Alexander Shumaker, and Karen G Lloyd. 2018. “Estimating Population Turnover Rates by Relative Quantification Methods Reveals Microbial Dynamics in Marine Sediment.” Appl. Environ. Microbiol. 84 (1): e01443–17. https://doi.org/10.1128/AEM.01443-17.
Lloyd, Karen G., Jordan T. Bird, Joy Buongiorno, Emily Deas, Richard Kevorkian, Talor Noordhoek, Jacob Rosalsky, and Taylor Roy. 2020. “Evidence for a Growth Zone for Deep-Subsurface Microbial Clades in Near-Surface Anoxic Sediments.” Appl. Environ. Microbiol. 86 (19): 1–15. https://doi.org/10.1128/AEM.00877-20.
Lozupone, Catherine A, Jesse Stombaugh, Antonio Gonzalez, Gail Ackermann, Janet K Jansson, Jeffrey I Gordon, Doug Wendel, Yoshiki Va, and Rob Knight. 2013. “Meta-analyses of studies of the human microbiota.” Genome Res., 1704–14. https://doi.org/10.1101/gr.151803.112.
McLaren, Michael R, Amy D Willis, and Benjamin J Callahan. 2019. “Consistent and correctable bias in metagenomic sequencing experiments.” Elife 8 (September): 46923. https://doi.org/10.7554/eLife.46923.
Yeh, Yi-Chun, David M. Needham, Ella T. Sieradzki, and Jed A. Fuhrman. 2018. “Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run.” mSystems 3 (3): e00023–18. https://doi.org/10.1128/mSystems.00023-18.