1 Introduction

Abstract: To assess how the species in microbial communities vary across conditions, researchers often measure the relative or absolute abundances of all species simultaneously using marker-gene and metagenomic sequencing. However, these measurements are taxonomically biased: Some species are measured more efficiently than others, yielding measured abundances that are higher or lower than the truth. Taxonomic bias is generally ignored in differential abundance (DA) analyses, likely due to a combination of 1) a widespread belief that protocol standardization is sufficient to yield accurate differences between samples and 2) a lack of practical solutions for addressing it. We use theoretical arguments and case studies to analyze the impact of taxonomic bias on DA analyses of relative and absolute abundances. We show that taxonomic bias does create inferential errors for a class of popular DA methods; however, whether these errors are biologically significant depends on experimental context. We present several approaches to mitigating the effect of bias, suited to a range of experimental questions and systems. If adopted, they could improve the reproducibility and interpretability of sequencing-based studies of microbial communities.

One of the most basic questions we can ask about microbial communities is: How do different microbial taxa vary in abundance—across space, time, and host or environmental conditions? Marker-gene and shotgun metagenomic sequencing (jointly, MGS) can be used to measure the abundances of thousands of species simultaneously, making it possible to ask this question on a community-wide scale. In these differential-abundance (DA) analyses, the change in abundance of a microbial taxon across samples or conditions is used to infer ecological dynamics or find microbes that are associated with specific host diseases or environmental conditions. Standard MGS measurements lose information about total microbial density and so are typically used to analyze the abundances of taxa relative to each other (relative abundances). But new methods are increasingly used to provide absolute information, making it possible to analyze changes in absolute cell density, biomass, or genome copy number (absolute abundances). In its various forms, DA is among the most common analyses applied to MGS data.

Unfortunately, these DA analyses are built on a fundamentally flawed foundation.

MGS measurements are taxonomically biased: Microbial species vary dramatically in how efficiently they are measured (that is, converted from cells into taxonomically classified sequencing reads) by a given MGS protocol (McLaren, Willis, and Callahan (2019)). This bias arises from variation in how species respond to each step in an MGS protocol, from sample collection to bioinformatic classification. Taxonomic bias is often associated with features specific to marker-gene sequencing, namely the variation among species in marker copy numbers and in primer-binding and amplification efficiencies. However, large variation among species in DNA extraction efficiency and in the ability to correctly classify reads makes taxonomic bias a universal feature of both marker-gene and shotgun measurements. As a result, MGS measurements provide inaccurate representations of actual community composition and tend to differ systematically across protocols, studies, and even experimental batches within a study (Yeh et al. (2018), McLaren, Willis, and Callahan (2019)). These errors can outweigh sizable biological differences (e.g., Lozupone et al. (2013)) and may have contributed to failed replications of prominent DA results such as the associations of Bacteroides and Firmicutes in stool with obesity (Finucane et al. (2014)) and the associations of species in the vaginas of pregnant women with preterm birth (Callahan et al. (2017)).

The standard approach to countering taxonomic bias is to standardize the measurement protocol used within a given study. Statistical analyses are then conducted with the tacit assumption that all samples will be affected by bias in the same way and so the differences between samples will be unaffected. This argument is at least intuitively plausible for DA analyses based on multiplicative or fold differences (FDs) in a taxon’s abundance. If bias caused a species’ abundance to be consistently measured as 10-fold greater than its actual value, then we would still recover the correct FDs among samples. However, McLaren, Willis, and Callahan (2019) showed mathematically and with MGS measurements of artificially constructed (‘mock’) communities that consistent taxonomic bias can create fold errors (FEs) that vary across samples and, as a result, substantially distort cross-sample comparisons. In particular, they showed that the FE in a species’ proportion, the most common measure of relative abundance, varies among samples, distorting the observed FDs between samples. In some cases, bias can even lead to incorrect inferences about the direction of change (for example, by causing a taxon that decreased to appear to increase). Yet McLaren, Willis, and Callahan (2019) also found that other abundance measures, those based on the ratios among species, have FEs that are constant across samples and so may lead to more robust DA analyses. The implications of these findings for DA analysis of absolute abundances and for the joint analysis of variation of many species across many samples, as is typical in microbiome association testing, have yet to be investigated.
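The contrast between proportions and ratios can be illustrated with a small numerical sketch. It assumes a simple multiplicative model of bias in which each species has a fixed relative measurement efficiency that is the same in every sample; the three species, their efficiencies, and the two sample compositions are invented for illustration and are not taken from any dataset.

```python
import math

# Hypothetical relative measurement efficiencies, assumed identical in every
# sample (consistent bias). The values are invented for illustration.
EFFICIENCY = {"A": 1.0, "B": 10.0, "C": 0.5}


def measure(actual):
    """Multiply each actual abundance by its efficiency, then renormalize."""
    biased = {sp: x * EFFICIENCY[sp] for sp, x in actual.items()}
    total = sum(biased.values())
    return {sp: x / total for sp, x in biased.items()}


def ratio_a_to_c(sample):
    """Ratio of species A to species C within one sample."""
    return sample["A"] / sample["C"]


# Invented actual proportions for two samples.
actual_1 = {"A": 0.2, "B": 0.1, "C": 0.7}
actual_2 = {"A": 0.3, "B": 0.6, "C": 0.1}
measured_1 = measure(actual_1)
measured_2 = measure(actual_2)

# Fold error (measured / actual) in the proportion of species A, per sample.
fe_prop_1 = measured_1["A"] / actual_1["A"]
fe_prop_2 = measured_2["A"] / actual_2["A"]

# Fold difference (sample 2 / sample 1) in the proportion of species A.
fd_prop_actual = actual_2["A"] / actual_1["A"]
fd_prop_measured = measured_2["A"] / measured_1["A"]

# The same quantities for the ratio of species A to species C.
fe_ratio_1 = ratio_a_to_c(measured_1) / ratio_a_to_c(actual_1)
fe_ratio_2 = ratio_a_to_c(measured_2) / ratio_a_to_c(actual_2)
fd_ratio_actual = ratio_a_to_c(actual_2) / ratio_a_to_c(actual_1)
fd_ratio_measured = ratio_a_to_c(measured_2) / ratio_a_to_c(measured_1)

print(f"FE in proportion of A: {fe_prop_1:.3f} (sample 1), {fe_prop_2:.3f} (sample 2)")
print(f"FD in proportion of A: actual {fd_prop_actual:.3f}, measured {fd_prop_measured:.3f}")
print(f"FE in ratio A/C:       {fe_ratio_1:.3f} (sample 1), {fe_ratio_2:.3f} (sample 2)")
print(f"FD in ratio A/C:       actual {fd_ratio_actual:.3f}, measured {fd_ratio_measured:.3f}")
print("Ratio FD preserved:", math.isclose(fd_ratio_actual, fd_ratio_measured))
```

With these invented numbers, the actual proportion of species A increases from sample 1 to sample 2, yet its measured proportion decreases, because the efficiently measured species B is far more abundant in sample 2 and changes the normalizing total. The ratio of A to C carries the same fold error in both samples under this model, so its fold difference between samples is recovered.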

Here we use a combination of theoretical analysis, simulation, and re-analysis of published experiments to consider when and why taxonomic bias in MGS measurements leads to spurious results in DA analysis of relative and absolute abundances. We show that, in contrast to received wisdom, taxonomic bias can affect the results obtained from many standard DA methods; however, other methods are robust to bias, provided that it is truly consistent across samples. We then use several case studies to explore the real-world conditions in which bias is likely to cause serious scientific errors. Finally, we present several methods for quantifying, correcting, or otherwise accounting for taxonomic bias in DA analyses; in many cases, these methods can be deployed with only modest changes to existing experimental and analytical workflows. Application of these insights and methods can build the confidence needed to turn the findings of microbiome studies into readily translatable scientific knowledge.

References

Callahan, Benjamin J, Daniel B DiGiulio, Daniela S Aliaga Goltsman, Christine L Sun, Elizabeth K Costello, Pratheepa Jeganathan, Joseph R Biggio, et al. 2017. “Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women.” Proc. Natl. Acad. Sci. U. S. A. 114 (37): 9966–71. https://doi.org/10.1073/pnas.1705899114.

Finucane, Mariel M., Thomas J. Sharpton, Timothy J. Laurent, and Katherine S. Pollard. 2014. “A Taxonomic Signature of Obesity in the Microbiome? Getting to the Guts of the Matter.” Edited by Markus M. Heimesaat. PLoS One 9 (1): e84689. https://doi.org/10.1371/journal.pone.0084689.

Lozupone, Catherine A, Jesse Stombaugh, Antonio Gonzalez, Gail Ackermann, Janet K Jansson, Jeffrey I Gordon, Doug Wendel, Yoshiki Va, and Rob Knight. 2013. “Meta-analyses of studies of the human microbiota.” Genome Res., 1704–14. https://doi.org/10.1101/gr.151803.112.

McLaren, Michael R, Amy D Willis, and Benjamin J Callahan. 2019. “Consistent and correctable bias in metagenomic sequencing experiments.” eLife 8 (September): 46923. https://doi.org/10.7554/eLife.46923.

Yeh, Yi-Chun, David M. Needham, Ella T. Sieradzki, and Jed A. Fuhrman. 2018. “Taxon Disappearance from Microbiome Analysis Reinforces the Value of Mock Communities as a Standard in Every Sequencing Run.” mSystems 3 (3): e00023–18. https://doi.org/10.1128/mSystems.00023-18.