4 Implications for real-world inference 

Are biologically significant errors in DA analyses likely in practice? We can begin to answer this question using a combination of theoretical analysis, consideration of hypothetical scenarios, and analysis of case studies for which some information about taxonomic bias is available.

4.1 Systematic error in slope or LFC estimates

Recall from Section 3 that the error in the slope or LFC in a DA regression analysis is proportional to the covariance of the log mean efficiency with the covariate. This covariance can be split into two components: the standard deviation of (log) mean efficiency and its correlation with the covariate. Hence when considering whether taxonomic bias creates large error, it can be useful to separately ask whether the mean efficiency is likely to vary across samples and whether it is likely to be correlated with a covariate of interest.

One approach to studying the mean efficiency empirically is to analyze data from studies for which control measurements allow direct measurement of species’ efficiencies. Brooks et al. (2015) performed 16S rRNA gene sequencing of mock communities of seven bacterial species thought to play a critical role in the human vaginal microbiome with the MGS protocol developed for the Vaginal Human Microbiome Project (VaHMP). McLaren, Willis, and Callahan (2019) later used this data to estimate the efficiencies of these seven species and develop a method to use these estimates to infer the true compositions of natural samples (see Section @ref(#calibrate-compositions)). SI Analysis (MOMS-PI) uses these estimated efficiencies to correct the compositions and explore the implied variation in mean efficiency in the VaHMP’s Multi-Omic Microbiome Study: Pregnancy Initiative (MOMS-PI) Fettweis et al. (2019). The vaginal microbiome during pregnancy is characterized by relatively low diversity, with a single Lactobacillus species often forming a majority of reads (considered to be the healthy state) but with occasional shifts to high proportions of other species, such as Gardnerella vaginalis, Atopobium vaginae, and Streptococcus agalactiae, which are thought to cause or indicate poor vaginal health and increased risk of preterm birth. The analysis of McLaren, Willis, and Callahan (2019) showed that the efficiencies of two common Lactobacillus species, L. crispatus and L. iners, were 20-30X greater than that of G. vaginalis, A. vaginae, and S. agalactiae. Analysis of microbiome trajectories of a single patient across visits indicates that the mean efficiency typically decreases by a factor of 3-10X when the community shifts from a high to low-Lactobacillus state. (NOTE: This is a super preliminary claim based on looking at just one woman, and is just a placeholder) Decreases in mean efficiency during transitions from Lactobacillus to Gardnerella dominance can be expected to be even more extreme for commonly-used vaginal microbiome primers that fail to amplify Gardnerella.

A second study in which control communities make it possible to directly analyze the impact of taxonomic bias in experimental samples was performed by Leopold and Busby (2020). To study the interactions between a host plant and its fungal commensals and pathogen, the authors inoculated trees with a fungal synthetic community and later exposed plants to a fungal pathogen. The authors used ITS amplicon sequencing to measure communities before and after infection, along with mock communities which they used to estimate the species efficiencies with the method of McLaren, Willis, and Callahan (2019). (DNA mocks were used, so that bias due to DNA extraction is not included, but other major bias sources such as PCR and ribosomal copy-number-variation are included.) The pathogen was 10X more efficiently measured than the median commensal and 40X more efficiently measured than the lowest-efficiency commensal. A re-analysis of their data (SI Analysis) shows that the mean efficiency varies by ??? in pre-infection communities and by an average of ??? pre- and post-infection, with the latter driven by the dramatic increase in the high-efficiency pathogen.

These case studies indicate that one mechanism for variability in the mean efficiency is swings between very high (\(\sim 1\)) and low (\(\ll 1\)) proportions of species with particularly high or low efficiencies. In highly diverse ecosystems, as seen in soil and the human gut (CHECK), one species rarely dominates within a given sample and so such large swings are not possible. We might therefore think that the mean efficiency will remain fairly stable in these settings. In fact, one can show mathematically that in randomly assembled communities, the mean efficiency becomes more stable as (Inverse Simpson) diversity increases, as the mean effectively averages over a greater number of species. Real communities do not assemble randomly, however. Variance in the mean efficiency may remain high if environmental filtering leads to different groups of organisms dominating in different environments, particularly if shared phenotypic values or evolutionary history causes these groups to differ in their efficiencies as well as their ecology. For example, gut microbiomes typically have high diversity at the species level but are dominated by just a small number of phyla, and two in particular: The Bacteroidetes and the Firmicutes, the ratio of which varies substantially across individuals (CHECK). Many DNA extraction protocols have been found to more efficiently lyse Gram-negative Bacteroidetes species than Gram-positive Firmicutes species (though variation can also be substantial within phyla, McLaren, Willis, and Callahan (2019)). Gut samples with large differences in their Bacteroidetes and Firmicutes proportions may therefore have large variation in mean efficiency even if the individual species within each phylum never reach a large proportion. On the other hand, competitive-exclusion dynamics can favor more stable taxonomic and/or phenotypic compositions across samples and might thereby drive a more stable mean efficiency than under random assembly

Systematic error in DA results requires that the variation in log mean efficiency is correlated with the covariate. The examples above each suggest plausible biologically-significant scenarios in which such correlations might arise. For example, in the vaginal microbiome a decline in Lactobacillus and rise of species including Gardnerella vaginalis is associated with preterm birth in pregnant women; hence the variation in mean efficiency driven by these species may also create an association of mean efficiency with preterm status. In the plant-fungal experiment of Leopold and Busby (2020), the increase in pathogen proportion after infection leads to a large increase in the mean efficiency over time, which systematic increases the observed decline in the proportions of commensal species (SI Analysis). The ratio of Bacteroidetes to Firmicutes has been linked to several host traits and health conditions, and thus the variation in efficiency between these two phyla may also drive an association of mean efficiency with these traits.

Are there general mechanisms by which we might expect associations to arise between mean efficiency and the covariates commonly of interest in microbiome studies? One possibility is that the ecology and measurement efficiency of species may become correlated simply due to shared evolutionary history. A species’ efficiency is heavily influenced by genetically determined traits such as cell-wall structure, ribosomal-operon copy number, sequence in a primer-targeted region, GC content, and even whether the species is present in a given taxonomic database. Shared evolutionary history creates positive trait associations among more closely related species, such that we should expect a significant degree of phylogenetic conservation in efficiency. Meanwhile, the same shared evolutionary history creates phylogenetic conservation in ecological traits that drives species abundance dynamics. In this way, we may expect evolution to lead to a situation in which groups of species with similar efficiencies have similar shifts in abundance between conditions, creating the potential for commensurate shifts in mean efficiency. For example, a change in diet that favors Bacteroidetes species relative to Firmicutes based on the resulting gut conditions will also increase the relative abundance of easy-to-lyse species.

Associations between efficiency and abundance can also arise more directly when a single microbial trait affects both. For example, microbes at the ocean floor are slowly buried, sinking into a low nutrient, low oxygen environment. Lloyd et al. (2020) estimate the log fold change in the estimated absolute abundance of various taxa with sediment depth (as a proxy for time) to determine which taxa are able to persist and even grow in this difficult environment. It is plausible that microbes with tougher cell walls would tend to persist longer (alive or dead) in the sediment, while at the same time being more difficult to extract DNA from than microbes with weaker cell walls. As the relative abundance of tougher species increases with depth, the mean extraction efficiency might therefore decrease. This decrease would increase the inferred log fold changes and could lead to inferred growth of taxa that are actually just persisting or even slowly dying off. Another example of a trait that might simultaneously affect efficiency and relative abundance is ribosomal copy number, which increases the measurement efficiency in ribosomal amplicon experiments and is also linked to differences in ecology and population dynamics among species.

The biological significance of the error in \(\hat \beta\) depends on the relative magnitude of the LFC (or covariance with the covariate \(x\)) in the focal species vs in the mean efficiency. It may generally be true that the largest species LFCs in a DA analysis, which are often of primary interest in a microbiome study, will tend to be larger than the LFC of the mean efficiency, at least in high-diversity settings. (This seems intuitively likely and true in examples, but I haven’t analyzed it much mathematically and don’t have a firm handle on this question.) Further investigation into this question may improve our confidence in the top hits identified in published DA analyses meeting certain conditions. However, we must remain weary of cases (as in the vaginal-microbiome and plant-fungal-interactions examples above) in which a shift in a single species can drive large changes in mean efficiency and lead to significant errors for other species. The experimental and computational methods discussed in Section 5 can mitigate taxonomic bias in these clearly-problematic cases while also helping to amass empirical evidence on the conditions in which bias is unlikely to cause major inferential errors.

  • Todo: Address DA analysis of aggregate (or synthetic) taxa, such as phyla, which may be prone to errors due to changes in the species-level make-up of the taxon.

Error in total-density measurements:

So far we have ignored error in total-density measurements that are used to convert estimated proportions to estimated densities; however, in reality these measurements are likely to contain substantial systematic error that can either worsen or improve the error in LFC estimates.

  • Todo: Add to previous section or Appendix with the relevant theory.

To understand how, we can consider the question of whether bulk 16S qPCR or flow cytometry is a more appropriate method of measuring total bacterial density for use in proportion-based density measurement (Galazzo et al. (2020), Jian, Salonen, and Korpela (2021)). 16S rRNA-gene qPCR measures the concentration of 16S rRNA gene copies in the extracted DNA and is therefore affected by variation among species in DNA extraction efficiency, 16S copy number, and primer-binding and amplification efficiency. For example, McLaren, Willis, and Callahan (2019) found that the product of extraction efficiency and 16S copy number for Lactobacillus iners was 20X larger than Gardnerella vaginalis (CHECK NUMBERS), such that we should expect samples dominated by L. iners to result in qPCR estimates 10-20X higher than equally dense samples dominated by G. vaginalis. These observations suggest that 16S qPCR is a poor proxy for fold changes in total bacterial density. Yet, as recently suggested by Jian, Salonen, and Korpela (2021), this taxonomic bias in qPCR may simultaneously make it a good measure of total density for use with 16S amplicon sequencing for proportion-based density estimation. 16S amplicon sequencing shares these same sources of bias and so is likely to have species efficiencies highly similar to those for the qPCR measurements, which we can predict to cause the mean 16S-qPCR efficiency to closely follow the mean 16S-sequencing efficiency, such that the error induced by one largely cancels the error induced by the other (Ref Appendix). In contrast, species-specific efficiencies in bulk cell counting are less likely to be strongly correlated with those from 16S sequencing. Even supposing cell counting were to provide perfectly accurate measures of total density, it may actually perform worse than qPCR for DA analysis of proportion-based density estimation precisely because the error does not mirror that in the 16S measurement.

On the other hand, bulk or targeted measurement (or DNA spike-ins) that occurs post DNA-extraction is subject to other error that is not likely to counter that of the 16S-based proportion estimates. The yield of DNA extraction can show highly random variation and also be highly non-linear (as a function of cellular biomass) at high concentrations. This features create error that is unlikely to offset the variation in mean efficiency in the 16S sequencing (todo: clarify why). Targeted measurement of cells and cellular spike-ins, discussed in Section 5, can avoid this issue while also facilitating ratio-based analysis while (unlike bulk cell counting) remaining robust mean-efficiency variation.

4.2 Loss of power and precision (stub)

Variation in mean efficiency that is not associated with covariates can still majorly hinder the goals of microbiome studies by causing reductions in the precision of LFC estimates and the power to detect true associations of a given size.

  • Illustrate with case studies?
  • Mention counting noise as an important factor that we’ve so far ignored but that can interact with taxonomic bias to lead to very low precision in LFC estimates. Low efficiency can put a species below the detection limit set by sample read depth, making it impossible to get precise LFCs. Study-specific efficiencies can explain why associations often do not replicate across studies. Example: Gardnerella vaginalis associations are not detected in certain vaginal microbiome studies that used V13 primers which we now know are bad at amplifying it.
  • Conclude with a nod to the value of the solutions in Section 5 to address these issues: Correcting the errors caused by mean-efficiency variation can improve precision, and bias-sensitivity analysis and bias-aware meta-analysis can give a better sense of the increased uncertainty in LFC estimates due to unknown efficiencies, thereby helping us to understand the increased rate of false positives and/or the larger range of LFC estimates that are compatible with the observed data than what is implied by a bias-free model.

References

Brooks, J Paul, David J Edwards, Michael D Harwich, Maria C Rivera, Jennifer M Fettweis, Myrna G Serrano, Robert A Reris, et al. 2015. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies.” BMC Microbiol. 15 (1): 66. https://doi.org/10.1186/s12866-015-0351-6.
Fettweis, Jennifer M., Myrna G. Serrano, Jamie Paul Brooks, David J. Edwards, Philippe H. Girerd, Hardik I. Parikh, Bernice Huang, et al. 2019. The vaginal microbiome and preterm birth.” Nat. Med. 25 (6): 1012–21. https://doi.org/10.1038/s41591-019-0450-2.
Galazzo, Gianluca, Niels van Best, Birke J. Benedikter, Kevin Janssen, Liene Bervoets, Christel Driessen, Melissa Oomen, et al. 2020. How to Count Our Microbes? The Effect of Different Quantitative Microbiome Profiling Approaches.” Front. Cell. Infect. Microbiol. 10 (August). https://doi.org/10.3389/fcimb.2020.00403.
Jian, Ching, Anne Salonen, and Katri Korpela. 2021. Commentary: How to Count Our Microbes? The Effect of Different Quantitative Microbiome Profiling Approaches.” Front. Cell. Infect. Microbiol. 11 (March): 10–12. https://doi.org/10.3389/fcimb.2021.627910.
Leopold, Devin R, and Posy E Busby. 2020. Host Genotype and Colonist Arrival Order Jointly Govern Plant Microbiome Composition and Function.” Curr. Biol. 30 (16): 3260–3266.e5. https://doi.org/10.1016/j.cub.2020.06.011.
Lloyd, Karen G., Jordan T. Bird, Joy Buongiorno, Emily Deas, Richard Kevorkian, Talor Noordhoek, Jacob Rosalsky, and Taylor Roy. 2020. Evidence for a Growth Zone for Deep-Subsurface Microbial Clades in Near-Surface Anoxic Sediments.” Appl. Environ. Microbiol. 86 (19): 1–15. https://doi.org/10.1128/AEM.00877-20.
McLaren, Michael R, Amy D Willis, and Benjamin J Callahan. 2019. Consistent and correctable bias in metagenomic sequencing experiments.” Elife 8 (September): 46923. https://doi.org/10.7554/eLife.46923.