6 Conclusion

It is commonly thought that analyses of the differences between samples are robust to the taxonomic bias inherent in MGS measurement as long as all samples have been subjected to the same MGS workflow. Our results show, however, that consistent taxonomic bias can still cause scientific errors in common forms of DA analysis. In particular, we showed mathematically that bias affects DA analysis methods based on proportions or on absolute abundances derived from them (using total-abundance normalization), because the mean efficiency varies across samples of differing taxonomic composition. The size of the resulting error depends on the experimental context and may often be negligible in a practical sense. Moreover, the error can be quantified and corrected through the use of reference-taxon measurements and/or community calibration controls. In addition, bias-aware sensitivity analyses and meta-analyses make it possible to rigorously account for bias even in the absence of such controls. Applications of these methods will gradually improve our understanding of the experimental contexts in which variation in the mean efficiency is likely to be problematic and should be accounted for.

We also showed that other DA methods are more robust, and in some cases entirely invariant, to consistent taxonomic bias. In particular, analyses based on multiplicative variation in ratios or in absolute abundances derived from them (using reference-species normalization) are invariant to bias in the MGS measurement, because bias causes a constant multiplicative error that cancels in cross-sample comparisons. In addition, careful pairing of total-abundance and MGS measurement methods can counter the variation in mean efficiency of the MGS measurement, leading to more consistent fold errors across samples.
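The contrast between proportion-based and ratio-based analyses can be illustrated with a minimal numerical sketch. The three taxa, their true proportions, and their relative efficiencies below are hypothetical values chosen only for illustration, not data from this study, and the function names are our own. The sketch assumes perfectly consistent multiplicative bias and no other sources of error.

```python
# Hypothetical true proportions of three taxa in two samples (illustrative only).
actual = {
    "sample_1": [0.6, 0.3, 0.1],
    "sample_2": [0.1, 0.3, 0.6],
}
# Hypothetical relative efficiencies, identical in every sample (consistent bias).
efficiency = [1.0, 4.0, 16.0]


def mean_efficiency(props, eff):
    """Abundance-weighted mean efficiency of a sample."""
    return sum(p * e for p, e in zip(props, eff))


def measure(props, eff):
    """Observed proportions under consistent multiplicative bias."""
    total = mean_efficiency(props, eff)
    return [p * e / total for p, e in zip(props, eff)]


def fold_changes(values_1, values_2):
    """Per-taxon fold change from sample 1 to sample 2."""
    return [b / a for a, b in zip(values_1, values_2)]


def ratios_to_reference(props, ref=0):
    """Ratio of each taxon to a reference taxon (index `ref`)."""
    return [p / props[ref] for p in props]


observed = {name: measure(props, efficiency) for name, props in actual.items()}

# Fold changes in proportions: the observed values are off by a common factor.
fc_prop_actual = fold_changes(actual["sample_1"], actual["sample_2"])
fc_prop_observed = fold_changes(observed["sample_1"], observed["sample_2"])
prop_fold_error = [o / a for o, a in zip(fc_prop_observed, fc_prop_actual)]

# That common factor is the inverse ratio of the samples' mean efficiencies.
mean_eff_1 = mean_efficiency(actual["sample_1"], efficiency)
mean_eff_2 = mean_efficiency(actual["sample_2"], efficiency)
expected_fold_error = mean_eff_1 / mean_eff_2

# Fold changes in ratios to a reference taxon: the bias cancels.
fc_ratio_actual = fold_changes(
    ratios_to_reference(actual["sample_1"]),
    ratios_to_reference(actual["sample_2"]),
)
fc_ratio_observed = fold_changes(
    ratios_to_reference(observed["sample_1"]),
    ratios_to_reference(observed["sample_2"]),
)

print("fold error in proportion-based fold changes:", prop_fold_error)
print("ratio of mean efficiencies:", expected_fold_error)
print("actual ratio-based fold changes:  ", fc_ratio_actual)
print("observed ratio-based fold changes:", fc_ratio_observed)
```

With these values the mean efficiency rises from 3.4 in the first sample to 10.9 in the second. Every proportion-based fold change is therefore multiplied by 3.4/10.9 (about 0.31), and the second taxon, whose true proportion is unchanged, appears to decrease roughly 3-fold. The ratio-based fold changes are identical before and after measurement.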

Important open questions and future research directions remain. The open questions include 1) determining the extent to which bias is consistent across samples for different taxonomic levels, MGS methods, and sample types; 2) assessing the validity of post-extraction measurements as measures of pre-extraction abundance; 3) understanding how the multiplicative error from bias interacts with the non-multiplicative error from contamination and taxonomic misassignment; and 4) understanding how different underlying community dynamics, and in particular the source of zero counts in MGS measurements, affect bias sensitivity. In addition, while we have shown some simple methods for incorporating bias and control measurements into DA analyses, more sophisticated statistical tools are needed to properly account for both taxonomic bias and random variation in the underlying MGS and supplemental (e.g. qPCR) measurements. Finally, more concrete experimental recommendations and user-friendly statistical workflows are needed for implementing the solutions we propose in various experimental contexts.

Our theoretical framework and example analyses provide a foundation for addressing these open questions and for developing experimental protocols and statistical tools that implement our proposed solutions. We look to a future in which microbiome researchers regularly choose a combination of experimental and data-analytic methods that is capable of answering their fundamental question while also accounting for taxonomic bias and other limitations inherent to MGS measurement. In doing so, we will collectively gain the confidence needed to codify the findings from MGS-based microbiome studies into true scientific knowledge.