1 Introduction
One of the most basic questions we can ask about microbial communities is: How do different microbial taxa vary in abundance—across space, time, and host or environmental conditions? Advances in sequencing technology allow us to simultaneously measure the abundances of 100s to 1000s of species using marker-gene and shotgun metagenomic sequencing (jointly, MGS). Although standard MGS measurements lose information about total microbial density—and so are typically used to analyze the abundances of taxa relative to each other or their total—new studies are increasingly employing strategies to enable the analysis of cell density or other measures of “absolute abundance.” These relative and absolute abundances serve as the basis for a differential-abundance (DA) analysis, in which the change in abundance of a microbial taxon across samples or conditions is used to learn about the biology of the taxon and its impact on the host and other microbes as well as to detect predictive biomarkers of host and environmental health and disease.
Although MGS-based DA analysis has been widely deployed and achieved many notable successes, it faces serious concerns over accuracy and reproducibility due to the inherent technical limitations of MGS measurements. In particular, MGS measurements are taxonomically biased: Taxa vary dramatically (e.g. 10-1000X) in how efficiently they are measured, that is, converted from cells into taxonomically classified sequencing reads, by a given MGS protocol (McLaren, Willis, and Callahan (2019)). As a result, the abundance measurements obtained by MGS are inaccurate representations of the actual abundances and also tend to differ across protocols, studies, and even experimental batches (Yeh et al. (2018); McLaren, Willis, and Callahan (2019)). This bias arises from variation in how taxa respond to each step in an MGS protocol, from sample collection to bioinformatic classification. Although taxonomic bias is often associated with variation in primer binding, amplification rates, and marker-gene copy number, large variation in DNA extraction efficiency and in the ability to correctly classify reads makes it a feature of both shotgun and marker-gene measurements. The errors it causes have in some cases been found to supersede sizable biological differences (e.g. Lozupone et al. (2013)) and have plausibly caused replication failures for prominent findings such as the association of decreased Bacteroides and increased Firmicutes in stool with obesity (Finucane et al. (2014)) and the association of certain taxa in the vagina of pregnant women with preterm birth (Callahan et al. (2017)).
The typical approach to countering taxonomic bias in DA analysis is to standardize the measurement protocol used within a given study. In broad strokes, the thinking is that samples measured by the same protocol will be affected by bias in the same way, and so the inferred differences between samples (the focus of DA analysis) will be unaffected. For example, if taxonomic bias consistently causes the measured proportion of a given species to be 10X too high, we can still accurately infer its fold changes across samples (Kevorkian et al. (2018); Lloyd et al. (2020)). However, McLaren, Willis, and Callahan (2019) used theoretical arguments and sequencing measurements of defined bacterial (or ‘mock’) communities to show that consistent taxonomic bias can lead to variable fold errors in measured proportions (Figure 1.1). These varying errors can lead to spurious conclusions about how the proportion of a taxon varies across samples, even in the direction of change (for example, causing a taxon that decreases to appear to increase). Yet McLaren, Willis, and Callahan (2019) also found that the fold error in the ratios among species was constant, making fold changes in ratios robust to bias. The implications of these findings for changes in absolute abundance (which remain subject to taxonomic bias in the underlying MGS measurement) and for DA analysis across many species and many samples (as commonly done in microbiome association testing) have yet to be investigated.
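The contrast between proportions and ratios can be made concrete with a small numerical sketch. It assumes that bias acts multiplicatively, with each species having a fixed relative measurement efficiency that is the same in every sample, and that measured proportions are obtained by renormalizing the efficiency-weighted actual proportions. The efficiencies, the two sample compositions, and the function name `measured_proportions` are all hypothetical values chosen for illustration; they are not taken from any of the cited studies.

```python
import numpy as np


def measured_proportions(actual, efficiency):
    """Apply a consistent multiplicative bias, then renormalize to proportions.

    Assumed model: measured proportion is proportional to
    (actual proportion) x (species-specific efficiency).
    """
    weighted = np.asarray(actual, dtype=float) * np.asarray(efficiency, dtype=float)
    return weighted / weighted.sum()


# Hypothetical relative efficiencies for three species (same in every sample).
efficiency = np.array([1.0, 10.0, 0.5])

# Hypothetical actual proportions of the three species in two samples.
actual_a = np.array([0.2, 0.1, 0.7])
actual_b = np.array([0.3, 0.6, 0.1])

measured_a = measured_proportions(actual_a, efficiency)
measured_b = measured_proportions(actual_b, efficiency)

# Fold error in the proportion of species 0: depends on the sample composition.
error_a = measured_a[0] / actual_a[0]
error_b = measured_b[0] / actual_b[0]

# Fold change in the proportion of species 0 from sample A to sample B.
actual_fc = actual_b[0] / actual_a[0]
measured_fc = measured_b[0] / measured_a[0]

# Fold error in the ratio of species 0 to species 1: the same in both samples.
ratio_error_a = (measured_a[0] / measured_a[1]) / (actual_a[0] / actual_a[1])
ratio_error_b = (measured_b[0] / measured_b[1]) / (actual_b[0] / actual_b[1])

print(f"fold error in proportion: A={error_a:.3f}, B={error_b:.3f}")
print(f"fold change in proportion: actual={actual_fc:.3f}, measured={measured_fc:.3f}")
print(f"fold error in ratio: A={ratio_error_a:.3f}, B={ratio_error_b:.3f}")
```

In this toy example the actual proportion of species 0 increases 1.5-fold from sample A to sample B, yet its measured proportion falls to roughly 0.37 of its value in A, because the highly efficient species 1 makes up much more of sample B. The fold error in the ratio of species 0 to species 1 equals the ratio of their efficiencies in both samples, so the fold change in that ratio is recovered exactly under the assumed model.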
Here we use a combination of theoretical analysis, simulation, and re-analysis of published experiments to consider when and why taxonomic bias in MGS measurements leads to spurious results in DA analysis of relative and absolute abundance. Our analysis clarifies how the folk wisdom that taxonomic bias does not affect the analysis of change across samples is only partially correct and can give a false sense of security in the accuracy of DA results. Yet we also present several potential solutions—methods for quantifying, correcting, or otherwise accounting for the effect of taxonomic bias in DA analyses that can be deployed today with only modest changes to existing experimental and analytical workflows. Over time, application of these methods to past and future experiments will provide crucial quantitative information about the conditions under which taxonomic bias creates spurious results for various DA methodologies. Collectively, these methods and insights may provide practical solutions to taxonomic bias in DA analysis and the confidence that is necessary to codify the statistical findings of microbiome studies into readily-translatable scientific knowledge.
Figure 1.1: Mock community experiments show that taxonomic bias can distort the measured fold change in an individual species’ proportion across samples. The figure shows the measured vs. actual proportions for a single bacterial species, Lactobacillus crispatus, in a set of bacterial cellular mock communities, and the resulting fold changes between community samples. The inconsistent error in the measured proportions of individual samples (Panel A) leads to inaccurate measurements of fold changes (Panel B). Mock communities were constructed and measured with 16S sequencing by Brooks et al. (2015). The data was re-analyzed by McLaren, Willis, and Callahan (2019), who showed that despite the inconsistency of the errors in Panel A, taxonomic bias acted consistently across samples.