1 Introduction

Abstract: To assess how the species in microbial communities vary across conditions, researchers often measure the relative or absolute abundances of all species simultaneously using marker-gene and metagenomic sequencing. These measurements are taxonomically biased, with some species measured many times more efficiently than others. Measurement error caused by taxonomic bias is generally ignored in such differential abundance (DA) analyses, likely due to a combination of 1) a widespread belief that protocol standardization is sufficient to yield accurate differences between samples and 2) a lack of practical solutions for addressing it. We use theoretical arguments and analysis of real and simulated experiments to characterize the impact of taxonomic bias on relative and absolute DA analyses. We show that taxonomic bias does in fact create inferential errors for a class of popular DA methods; however, whether these errors are biologically significant depends on experimental context. We present six approaches to addressing the error, each suited to different experimental questions and systems. These approaches provide practical ways to mitigate the effect of taxonomic bias for a large fraction of DA questions. If adopted, they may lead to notable improvements in the reproducibility and biological interpretability of the statistical findings of microbiome studies.

One of the most basic questions we can ask about microbial communities is: How do different microbial taxa vary in abundance across space, time, and host or environmental conditions? Marker-gene and shotgun metagenomic sequencing (jointly, MGS) can be used to measure the abundances of thousands of species simultaneously, making it possible to ask this question on a community-wide scale. In these differential-abundance (DA) analyses, the change in abundance of a microbial taxon across samples or conditions is used to infer ecological dynamics or find microbes that are associated with specific host diseases or environmental conditions. Standard MGS measurements lose information about total microbial density and so are typically used to analyze the abundances of taxa relative to each other. But new methods are increasingly used to provide absolute information, making it possible to analyze changes in absolute cell density. In its various forms, DA analysis remains one of the most common, if not the most common, forms of analysis applied to MGS data to elucidate the inner workings of microbiomes and their relationships to host and environmental health.

Yet these DA analyses are built on a fundamentally flawed foundation. MGS measurements are taxonomically biased: Microbial species vary dramatically (e.g. 10-1000X) in how efficiently they are measured (that is, converted from cells into taxonomically classified sequencing reads) by a given MGS protocol (McLaren, Willis, and Callahan 2019). This bias arises from variation in how species respond to each step in an MGS protocol, from sample collection to bioinformatic classification. Taxonomic bias is often associated with features specific to marker-gene sequencing, namely the variation among species in marker copy numbers and in primer-binding and amplification efficiencies. However, the large variation in DNA extraction efficiencies and in the ability to correctly classify reads makes taxonomic bias a universal feature of both marker-gene and shotgun measurements. As a result, MGS measurements provide inaccurate representations of actual community composition and tend to differ across protocols, studies, and even experimental batches (Yeh et al. 2018; McLaren, Willis, and Callahan 2019). These errors can supersede sizable biological differences (e.g. Lozupone et al. 2013) and may have contributed to failed replications of prominent findings such as the associations of Bacteroides and Firmicutes in stool with obesity (Finucane et al. 2014) and the associations of species in the vaginas of pregnant women with preterm birth (Callahan et al. 2017).

The standard approach to countering taxonomic bias is to standardize the measurement protocol used within a given study. Statistical analyses are then conducted with the (often tacit) assumption that all samples will be affected by bias in the same way and so the differences between samples will be unaffected. This argument is at least intuitively plausible for DA analyses based on multiplicative or fold changes in a taxon’s abundance. If bias caused the abundance of a species to be consistently measured as 10X too high (what is known as proportional error), then we would still correctly measure its multiplicative variation across samples. However, McLaren, Willis, and Callahan (2019) showed mathematically and with MGS measurements of artificially constructed (‘mock’) communities that consistent taxonomic bias can create non-proportional errors that can substantially distort cross-sample comparisons. In particular, they showed that a species’ proportion (the most common measure of relative abundance) has non-proportional errors which affect the inferred changes between samples and can even lead to incorrect inferences about the direction of change (for example, by causing a taxon that decreased to appear to increase). Yet McLaren, Willis, and Callahan (2019) also found that other abundance measures, those based on the ratios among species, have proportional errors and may lead to more robust DA analyses. The implications of these findings for DA analysis of absolute abundances and for the joint analysis of variation of many species across many samples, as is typical in microbiome association testing, have yet to be investigated.
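The difference between proportional and non-proportional error can be illustrated with a minimal sketch, assuming the multiplicative model of consistent bias from McLaren, Willis, and Callahan (2019): each species' measured abundance equals its actual abundance times a species-specific efficiency, renormalized to proportions. The efficiencies and sample compositions below are hypothetical values chosen for illustration, not data from any study, and the function names are likewise invented for this example.

```python
# Hypothetical relative measurement efficiencies for three species.
EFFICIENCY = [1.0, 18.0, 6.0]


def measure(actual, efficiency=EFFICIENCY):
    """Apply the same multiplicative bias to a sample, then renormalize."""
    weighted = [a * e for a, e in zip(actual, efficiency)]
    total = sum(weighted)
    return [w / total for w in weighted]


def ratio(props, i, j):
    """Ratio of species i to species j within one sample."""
    return props[i] / props[j]


# Hypothetical actual proportions of the three species in two samples.
actual_1 = [0.80, 0.05, 0.15]
actual_2 = [0.05, 0.75, 0.20]

# Both samples are measured with the same (consistent) bias.
measured_1 = measure(actual_1)
measured_2 = measure(actual_2)

# Fold change from sample 1 to sample 2 in the proportion of species 3.
fc_prop_actual = actual_2[2] / actual_1[2]
fc_prop_measured = measured_2[2] / measured_1[2]

# Fold change in the ratio of species 3 to species 1.
fc_ratio_actual = ratio(actual_2, 2, 0) / ratio(actual_1, 2, 0)
fc_ratio_measured = ratio(measured_2, 2, 0) / ratio(measured_1, 2, 0)

print(f"proportion fold change: actual {fc_prop_actual:.2f}, "
      f"measured {fc_prop_measured:.2f}")
print(f"ratio fold change:      actual {fc_ratio_actual:.2f}, "
      f"measured {fc_ratio_measured:.2f}")
```

In this toy case the proportion of species 3 actually increases between samples (a fold change above 1) but appears to decrease in the measurements (a fold change below 1), because the sample-wide mean efficiency shifts as the highly efficient species 2 becomes dominant. The fold change in the ratio of species 3 to species 1 is the same in the actual and measured data, since the constant efficiency ratio cancels.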

Here we use a combination of theoretical analysis, simulation, and re-analysis of published experiments to consider when and why taxonomic bias in MGS measurements leads to spurious results in DA analysis of relative and absolute abundance. Our analysis clarifies that the received wisdom that taxonomic bias does not affect the analysis of change across samples is only partially correct and can give a false sense of security in the accuracy of DA results. Yet we also present several potential solutions: methods for quantifying, correcting, or otherwise accounting for the effect of taxonomic bias in DA analyses that can be deployed today with only modest changes to existing experimental and analytical workflows. Over time, application of these methods to past and future experiments will provide crucial quantitative information about the conditions under which taxonomic bias creates spurious results for various DA methodologies. Collectively, these methods and insights may provide practical solutions to taxonomic bias in DA analysis and the confidence that is necessary to codify the statistical findings of microbiome studies into readily translatable scientific knowledge.

References

Callahan, Benjamin J, Daniel B DiGiulio, Daniela S Aliaga Goltsman, Christine L Sun, Elizabeth K Costello, Pratheepa Jeganathan, Joseph R Biggio, et al. 2017. “Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women.” Proc. Natl. Acad. Sci. U. S. A. 114 (37): 9966–71. https://doi.org/10.1073/pnas.1705899114.

Finucane, Mariel M., Thomas J. Sharpton, Timothy J. Laurent, and Katherine S. Pollard. 2014. “A Taxonomic Signature of Obesity in the Microbiome? Getting to the Guts of the Matter.” Edited by Markus M. Heimesaat. PLoS One 9 (1): e84689. https://doi.org/10.1371/journal.pone.0084689.

Lozupone, Catherine A, Jesse Stombaugh, Antonio Gonzalez, Gail Ackermann, Janet K Jansson, Jeffrey I Gordon, Doug Wendel, Yoshiki Vázquez-Baeza, and Rob Knight. 2013. “Meta-analyses of studies of the human microbiota.” Genome Res., 1704–14. https://doi.org/10.1101/gr.151803.112.

McLaren, Michael R, Amy D Willis, and Benjamin J Callahan. 2019. “Consistent and correctable bias in metagenomic sequencing experiments.” eLife 8 (September): e46923. https://doi.org/10.7554/eLife.46923.

Yeh, Yi-Chun, David M. Needham, Ella T. Sieradzki, and Jed A. Fuhrman. 2018. “Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run.” mSystems 3 (3): e00023–18. https://doi.org/10.1128/mSystems.00023-18.