D Supplemental figures

Figure D.1: In the pre-infection samples from Leopold and Busby (2020), multiplicative variation in taxa proportions is much larger than that in the mean efficiency. Panel A shows the distribution of the proportions of each commensal isolate (denoted by its genus) across all samples collected prior to pathogen inoculation; Panel C shows the distribution of the (estimated) sample mean efficiency across these same samples on the same scale; and Panel B shows the efficiency of each taxon estimated from DNA mock communities as point estimates and 90% bootstrap percentile confidence intervals. Efficiencies are shown relative to the most efficiently measured taxon (Fusarium).



Figure D.2: The mean efficiency tends to increase after infection due to the high proportion of the pathogen.



Figure D.3: Bias correction increases the estimated change in log proportion in response to infection for commensal taxa across all host genotypes. Shown are the estimated log fold changes (LFCs) and 95% confidence intervals from simple linear regression of log (base e) proportion against experimental timepoint for commensal taxa. Negative values indicate that the proportion of the taxon decreased on average in response to infection, which we expect due to an increase in pathogen abundance and the sum-to-one constraint of proportions. Bias leads to artificially low estimates, as the increased pathogen proportion drives an increase in mean efficiency.



Figure D.4: Fold changes in the mean efficiency within and between women in the MOMS-PI study.



Figure D.5: In vaginal microbiome measurements, shifts between Lactobacillus and Gardnerella dominance can drive spurious fold changes in other, lower-abundance species. The figure shows species proportions and mean efficiency trajectories over consecutive clinical visits for a subject in the MOMS-PI study whose microbiome samples showed substantial variation in mean efficiency. The subject’s samples are dominated by Gardnerella vaginalis and Lachnospiraceae BVAB1 during the first three visits before transitioning to being dominated by Lactobacillus iners between visits 3 and 4. This transition drives a sharp increase in the mean efficiency, which significantly distorts the fold changes in the observed (uncalibrated) microbiome measurements for species with less dramatic fold changes. Two exemplar species are shown to illustrate the magnitude (Ureaplasma cluster 23) and sign (Megasphaera OTU70 type1) errors that can arise in this situation.



Figure D.6: Bias distorts log fold changes (LFCs) in species proportions in a regression analysis of vaginal microbiome samples from the MOMS-PI study. Samples were split into low, medium, and high diversity groups based on Shannon diversity in observed (uncalibrated) microbiome profiles. The LFC in proportion from low- to high-diversity samples was estimated for 30 common species by simple linear regression, using calibrated (bias-corrected) and observed (uncorrected) microbiome profiles following a simple zero-replacement procedure. Panel A shows the distribution of point estimates; Panel B shows the point estimates and 95% confidence intervals for each species. For each species, the observed estimate minus the calibrated estimate equals the negative of the LFC in mean efficiency.
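
The relationship between the calibrated and observed estimates can be checked numerically. The sketch below is only an illustration on simulated data, not the analysis behind the figure: the efficiencies, the two-group covariate `x`, and the Dirichlet-generated proportions are all assumptions made for the example. It assumes the usual multiplicative bias model, in which each observed proportion equals the calibrated proportion times the taxon's efficiency, divided by the sample mean efficiency (the proportion-weighted average of the efficiencies).

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_taxa = 60, 5
# Hypothetical relative efficiencies (assumed values, not study estimates).
efficiency = np.array([1.0, 0.3, 4.0, 0.1, 2.0])

# Hypothetical binary covariate, e.g. low (0) versus high (1) diversity.
x = np.repeat([0.0, 1.0], n_samples // 2)

# Simulated "calibrated" proportions (rows: samples, columns: taxa);
# the second group is drawn with different Dirichlet parameters.
alpha = np.where(x[:, None] == 0.0, [5.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 3.0, 1.0, 1.0])
calibrated = np.vstack([rng.dirichlet(a) for a in alpha])

# Sample mean efficiency: proportion-weighted average of the efficiencies.
mean_eff = calibrated @ efficiency

# Observed proportions under the multiplicative bias model.
observed = calibrated * efficiency / mean_eff[:, None]

def slope(y, x):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

lfc_cal = np.array([slope(np.log(calibrated[:, i]), x) for i in range(n_taxa)])
lfc_obs = np.array([slope(np.log(observed[:, i]), x) for i in range(n_taxa)])
lfc_mean_eff = slope(np.log(mean_eff), x)

# Every taxon's estimate is shifted by the same amount.
print(lfc_obs - lfc_cal)
print(-lfc_mean_eff)
```

Because the least-squares slope is linear in the response and the log efficiency of a taxon is constant across samples, the shift is identical for every taxon, which is why the observed estimates in a plot of this kind appear as a uniform translation of the calibrated ones.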



Figure D.7: A bias-sensitivity analysis can be performed to examine how sensitive the results of a DA analysis are to assumptions about taxonomic bias in community measurements. The figure shows the results of a bias-sensitivity analysis used to study the effect of bias on the association of Gardnerella vaginalis and preterm birth that was investigated by Callahan et al. (2017). One hundred random efficiency vectors were drawn at each of 6 different bias strengths (quantified by the variance in log efficiency, \(\sigma_{e}^{2}\)). Each efficiency vector was used to calibrate the MGS profiles and perform a DA association test of G. vaginalis versus the host’s preterm birth outcome; regression coefficients \(\hat \beta\) indicate the increase in the average logit proportion of G. vaginalis in women who experienced preterm birth.
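
A procedure of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figure: the profiles and outcome are simulated stand-ins, log efficiencies are assumed to be independent normal draws with standard deviation `sigma` (so the variance is `sigma ** 2`), the association test is replaced by an ordinary least-squares slope, and the helper names `calibrate`, `logit`, and `sensitivity` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def calibrate(observed, efficiency):
    """Divide observed proportions by efficiencies and renormalize each sample."""
    unnormalized = observed / efficiency
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

def logit(p):
    return np.log(p) - np.log1p(-p)

def sensitivity(observed, outcome, focal, sigmas, n_draws, rng):
    """Return {sigma: slope estimates}, one estimate per random efficiency vector."""
    results = {}
    for sigma in sigmas:
        betas = np.empty(n_draws)
        for d in range(n_draws):
            # Assumption: log efficiencies are i.i.d. normal with sd sigma.
            eff = np.exp(rng.normal(0.0, sigma, size=observed.shape[1]))
            cal = calibrate(observed, eff)
            # OLS slope of the focal taxon's logit proportion on the outcome.
            betas[d] = np.polyfit(outcome, logit(cal[:, focal]), 1)[0]
        results[sigma] = betas
    return results

# Simulated stand-in data: 40 samples, 8 taxa, binary outcome.
observed = rng.dirichlet(np.ones(8), size=40)
outcome = np.repeat([0.0, 1.0], 20)

# Six hypothetical bias strengths; sigma = 0 corresponds to no bias.
sigmas = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
res = sensitivity(observed, outcome, focal=0, sigmas=sigmas, n_draws=100, rng=rng)

for sigma, betas in res.items():
    print(f"sigma={sigma}: spread of estimates = {betas.std():.3f}")
```

With `sigma = 0` every draw yields the same estimate, and the spread of estimates across draws at larger `sigma` gives a rough sense of how strongly the conclusion depends on the unknown bias.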

References

Callahan, Benjamin J, Daniel B DiGiulio, Daniela S Aliaga Goltsman, Christine L Sun, Elizabeth K Costello, Pratheepa Jeganathan, Joseph R Biggio, et al. 2017. “Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women.” Proc. Natl. Acad. Sci. U. S. A. 114 (37): 9966–71. https://doi.org/10.1073/pnas.1705899114.
Leopold, Devin R, and Posy E Busby. 2020. “Host Genotype and Colonist Arrival Order Jointly Govern Plant Microbiome Composition and Function.” Curr. Biol. 30 (16): 3260–3266.e5. https://doi.org/10.1016/j.cub.2020.06.011.