B Review of experimental methods for obtaining absolute densities

Many experimental techniques can add absolute-density information to MGS measurements. Here we review these techniques; the next section considers the implications for systematic error.

NOTE: Right now I don’t consistently address why various targeted methods might be expected to produce constant fold errors. In revision, seek to connect each method with the relevant theory.

B.1 Measurement of total cell density

Total cell density in the original sample can be directly measured by cell counting, either via microscopy (Kevorkian et al. (2018), Lloyd et al. (2020)) or flow cytometry (Props et al. (2017), Vandeputte et al. (2017)). Total cell density or biomass can also be measured via properties assumed to be proportional to cell density, such as fluorescence (as in fluorescence spectroscopy, Wang et al. (2021)), components of microbial cell membranes (as in PLFA analysis, Smets et al. (2016)), and the rate of microbial respiration (as in the SIR method, Smets et al. (2016)). So far, it is primarily the cell counting methods that have been used for species-density measurement (rather than simply for the total density), by multiplying the estimated total density by the MGS proportions.
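The multiplication described above can be sketched in a few lines of Python. This is a minimal illustration, not a published implementation: the function name `species_densities`, the read counts, and the total density of 2e9 cells per gram are all hypothetical.

```python
import numpy as np

def species_densities(total_density, read_counts):
    """Scale MGS read proportions by a separately measured total density."""
    counts = np.asarray(read_counts, dtype=float)
    proportions = counts / counts.sum()  # MGS proportions
    return total_density * proportions

# Hypothetical sample: total density from cell counting, reads for three species.
densities = species_densities(2e9, [500, 300, 200])
print(densities)
```

Any taxonomic bias in the MGS proportions carries through to the estimated species densities, since the total density only rescales them.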

B.2 Measurement of total DNA density post extraction

It is also common to use density of bulk DNA or a marker gene as a proxy for total community density. Marker-gene density is typically measured with qPCR or ddPCR using ‘universal primers’ that target the marker gene of interest (typically the 16S gene for bacterial microbiome experiments) (Tettamanti Boshier et al. (2020), Jian et al. (2020), Galazzo et al. (2020)). Bulk DNA density can be measured using fluorescence-based DNA quantification assays (Contijoch et al. (2019); Korpela et al. (2018)). In either case, the DNA density is measured after DNA extraction and so is affected by taxonomic bias in the extraction process, such as variation in lysis efficiency among species. Other sources of bias that affect the DNA density measurement include variation in marker-gene copy number (for marker-gene density) and variation in genome size (for bulk DNA density).

The measured DNA density is typically used as a direct proxy for cell density in the original sample. In particular, it is assumed that a doubling of cell density in the original sample leads to a doubling of DNA density in the extraction (possibly after adjustment for known dilution factors). This linearity assumption may be violated for several reasons. First, taxonomic bias can make the DNA yield per cell depend on species composition. For example, samples dominated by easy-to-lyse species will give more DNA per cell than samples dominated by hard-to-lyse species. Second, systematic non-linearity may occur in the DNA yield as a function of input, even if species composition is held fixed. For example, DNA yield may saturate at high sample inputs. Third, the DNA yield may vary apparently randomly due to subtle differences in sample chemistry or handling during the experiment.
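The first of these violations can be illustrated with a small numerical sketch. The per-cell DNA yields and cell densities below are invented for illustration; the point is only that two samples with identical total cell density can give very different DNA densities when their compositions differ.

```python
import numpy as np

# Hypothetical DNA yield per cell (arbitrary units):
# species 0 is easy to lyse, species 1 is hard to lyse.
yield_per_cell = np.array([1.0, 0.2])

def dna_density(cell_densities):
    """Total extracted DNA as the yield-weighted sum of species cell densities."""
    return float(np.dot(cell_densities, yield_per_cell))

sample_a = np.array([9e8, 1e8])  # dominated by the easy-to-lyse species
sample_b = np.array([1e8, 9e8])  # dominated by the hard-to-lyse species

dna_a = dna_density(sample_a)
dna_b = dna_density(sample_b)
ratio = dna_a / dna_b  # differs from 1 although total cell density is equal
print(sample_a.sum(), sample_b.sum(), ratio)
```

Under these assumed yields, the DNA-based proxy would report a roughly 3.3-fold difference in total density between two samples that in fact contain the same number of cells.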

B.3 Equivolumetric protocol

Total reads sequenced do not directly correspond to total density in the sample, in large part because MGS experiments are typically designed to yield a similar number of sequencing reads from each sample, regardless of total density. Cruz, Christoff, and Oliveira (2021) propose instead designing the MGS experiment so as to make total reads proportional to total density. The ‘equivolumetric protocol’ they develop represents a first attempt in this direction. In their protocol, total reads is a saturating function of total density; this function can be measured with a calibration experiment, and the calibration curve used to predict the total density in the source sample. This total density estimate is then used to scale the read counts to estimate species densities in a manner equivalent to the total-community density method.
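As a rough sketch of how such a calibration curve might be inverted, the Python below interpolates between calibration points on a log scale. This is an assumption made for illustration and is not the procedure of Cruz, Christoff, and Oliveira (2021); the calibration values and the function names are hypothetical.

```python
import numpy as np

# Hypothetical calibration experiment: known total densities and the
# total reads each produced (reads saturate at high density).
cal_density = np.array([1e5, 1e6, 1e7, 1e8, 1e9])
cal_reads = np.array([2e3, 1.5e4, 6e4, 1.1e5, 1.3e5])

def predict_total_density(total_reads):
    """Invert the calibration curve by log-log linear interpolation.

    np.interp clamps to the end points, so reads outside the calibrated
    range return the smallest or largest calibrated density.
    """
    log_density = np.interp(
        np.log10(total_reads), np.log10(cal_reads), np.log10(cal_density)
    )
    return 10.0 ** log_density

def species_densities(read_counts):
    """Scale read proportions by the total density predicted from total reads."""
    counts = np.asarray(read_counts, dtype=float)
    total = predict_total_density(counts.sum())
    return total * counts / counts.sum()

estimate = species_densities([3e4, 3e4])
print(estimate)
```

Because the curve flattens at high density, small differences in total reads near saturation map to large differences in predicted density, so predictions in that region would be expected to be less precise.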

B.4 Housekeeping species

We use housekeeping species (by analogy with housekeeping genes used for normalization in RNAseq experiments) to refer to species whose density is assumed to be constant, either in the MGS sample or in the source ecosystem it is derived from.

Housekeeping species can sometimes be identified from prior scientific knowledge. Several studies that have employed shotgun sequencing of host-associated microbiomes have used the plant or animal host for this purpose. A study of Arabidopsis microbiomes used the ratio of bacterial to host reads in shotgun sequencing as a proxy for total bacterial density, which they then used for total-community normalization of 16S amplicon sequencing measurements (Karasov et al. (2020), Regalado et al. (2020)). Chng et al. (2020) similarly used the ratio of bacterial to host reads in shotgun sequencing of mouse fecal samples as a proxy for total bacterial density (though they did not use this measurement for community normalization). They also used reads from dietary plants for the same purpose. Wallace et al. (2021) used shotgun sequencing to study the virome of Drosophila, and normalized virus reads to Drosophila reads to measure viral abundance per fly. Organelle reads can also be used. Diener et al. (2021) used mitochondrial reads in 16S sequencing of mouse fecal pellets to assess total microbial load, though in a qualitative fashion (as mitochondrial reads were only non-zero at very low bacterial densities induced by antibiotics).

In some cases, there may also be microbes or viruses thought to have stable densities. A recent example is that the most abundant DNA virus in human feces, crAssphage, and the most abundant RNA virus in human feces, Pepper Mild Mottle Virus, have been treated as stable reference species in wastewater monitoring for SARS-CoV-2. Although primarily used in the context of qPCR measurements, these viruses could also be used as references in RNA and DNA shotgun sequencing experiments.

Housekeeping species may not be known a priori; to address this case, several methods have been put forward to computationally identify unchanging microbial species directly from MGS measurements. These studies are often focused on mammalian gut bacterial communities. It is perhaps unreasonable to expect bacterial species to be unchanging across hosts, but weaker assumptions can be made to develop normalization methods for the MGS measurements with a similar spirit to reference-based normalization. These studies have instead developed normalization methods based on assumptions such as that most species do not change between any pair of samples (David et al. (2014)) or that the mean (log) abundance between two sample conditions is unchanged for at least some species (Mandal et al. (2015), Kumar et al. (2018)).

When housekeeping species are sequenced along with the primary MGS measurement, they can be used to obtain species densities via reference-normalization (Equation (2.13)). In this case, the only relevant taxonomic bias is that of the primary MGS measurement; if it is constant then it will cancel in fold-change calculations. Because the density of the housekeeping species is unknown, we can consider either that the density of the focal species has a constant error, or that it is in units of the housekeeping species. Non-constant error might arise if the species is treated as constant when it is in fact not.
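The cancellation of a constant bias can be checked with a short simulation. The sketch below is illustrative only: the densities, the relative measurement efficiencies, and the function names are hypothetical, and expected reads are computed without sampling noise.

```python
import numpy as np

def reference_normalize(read_counts, ref_index):
    """Express each species' reads in units of the reference species' reads."""
    counts = np.asarray(read_counts, dtype=float)
    return counts / counts[ref_index]

# Hypothetical true densities: index 0 is a focal species, index 1 is the
# housekeeping species, whose density is the same in both samples.
true_a = np.array([1e6, 5e5])
true_b = np.array([4e6, 5e5])

# Hypothetical taxonomic bias, assumed constant across samples.
efficiency = np.array([3.0, 0.5])

def expected_reads(true_density, depth):
    """Expected reads under multiplicative bias and a fixed sequencing depth."""
    biased = true_density * efficiency
    return depth * biased / biased.sum()

est_a = reference_normalize(expected_reads(true_a, 1e5), ref_index=1)
est_b = reference_normalize(expected_reads(true_b, 2e5), ref_index=1)

fold_change = est_b[0] / est_a[0]           # estimated fold change of the focal species
fold_error = est_a[0] / (true_a[0] / true_a[1])  # error relative to the true ratio
print(fold_change, fold_error)
```

In this example the focal species' estimate is off by the same factor (the ratio of the two efficiencies) in both samples, so the estimated fold change matches the true fold change of 4 despite the bias and the different sequencing depths.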

Housekeeping species have also been used to estimate total community density by the ratio \(\text{reads}_{S} / \text{reads}_{R}\), where \(R\) denotes the housekeeping (reference) species and \(S\) the remaining species. This estimate has been used to study variation in total density across samples, or for total-density normalization.

B.5 Spike-ins

Spike-in methods differ by the biological spike-in material: Cellular spike-ins can be added prior to DNA extraction, and DNA spike-ins can be added prior to or following DNA extraction. In either case, a variety of methods can be used to actually leverage the spike-ins for absolute density analysis.

Cellular spike-ins: Cellular spike-ins are added to the sample prior to DNA extraction. Some sample processing has typically occurred prior to spiking, for the purposes of storage (e.g. a freeze-thaw cycle) and homogenization. We should expect there to be taxonomic bias between the spike-in species and those naturally in the sample due to genetic differences and because of physiological differences induced by the sample processing prior to spiking and the experimental procedure used to grow and prepare the spike-in cells. Our analysis acknowledges this bias, but assumes that it is consistent across samples. The nominal density of the spike-in species added to each sample is subject to random and systematic fold error; but systematic fold error that is shared across samples will induce a constant fold error in species densities and so not impact DA analysis. For instance, if the source stock is actually 1.5X more concentrated than thought, the true spike-in concentration will be 1.5X greater than nominal in all samples. This will not pose a problem for accurate DA inference, beyond leading to a greater than intended sequencing effort being expended on the spike-in.

Example studies include Stämmler et al. (2016), Ji et al. (2019), and Rao et al. (2021).
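The effect of a shared error in the nominal spike-in density can be sketched numerically. The example below reuses the 1.5X stock error from the text; all other values and the function name `spikein_density` are hypothetical, taxonomic bias is ignored for simplicity, and reads are noise-free expectations.

```python
import numpy as np

def spikein_density(read_counts, spike_index, nominal_spike_density):
    """Estimate species densities from read ratios to the spike-in."""
    counts = np.asarray(read_counts, dtype=float)
    return nominal_spike_density * counts / counts[spike_index]

nominal = 1e6                        # spike-in density we believe we added
stock_error = 1.5                    # stock is actually 1.5X more concentrated
true_spike = nominal * stock_error   # density actually added to every sample

# Hypothetical true densities of one focal species in two samples.
true_focal = {"A": 3e6, "B": 6e6}

estimates = {}
for name, focal in true_focal.items():
    # Expected reads for [focal species, spike-in] at a depth of 1e5 reads.
    reads = 1e5 * np.array([focal, true_spike]) / (focal + true_spike)
    estimates[name] = spikein_density(reads, 1, nominal)[0]

fold_error = {k: estimates[k] / true_focal[k] for k in estimates}
fold_change = estimates["B"] / estimates["A"]
print(fold_error, fold_change)
```

Both samples are underestimated by the same factor of 1/1.5, so the estimated fold change between samples equals the true value of 2.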

DNA spike-ins: Another possibility is to add DNA spike-ins, derived from natural or artificial sequence. DNA spike-ins can be added to the samples before DNA extraction (e.g. Smets et al. (2016), Tkacz, Hortala, and Poole (2018), Zemb et al. (2020)) or after DNA extraction (e.g. Hardwick et al. (2018), Tkacz, Hortala, and Poole (2018)). Adding spike-ins prior to extraction is thought to be preferable as it makes it possible to detect and correct for variation in DNA extraction yield among samples (Tkacz, Hortala, and Poole (2018), Zemb et al. (2020), Harrison et al. (2021)). Below, we consider the distinction between pre- and post-extraction spike-ins in the light of taxonomic bias coupled with other sources of variation in extraction efficiency.

How spike-ins are used: Like housekeeping species, spike-ins have been used in a variety of ways to analyze absolute abundances. Let \(R\) (for reference) be the spike-in species and \(S\) be the native species. Smets et al. (2016) used the ratio of \(S\) to \(R\) reads as an estimate of total density, which they were interested in for its own sake, though one could imagine then also using this total density estimate in community normalization (Equation (2.10)). Zemb et al. (2020) used the spike-ins to measure total density from the ratio of \(S\) to \(R\) qPCR abundance estimates and then used Equation (2.10). Stämmler et al. (2016) and others used the ratio-based method of Equation (2.13).

B.6 Targeted measurements

A variety of methods exist for targeted measurement of the absolute density of a specific species (or higher-order taxon). The most common approach is to use qPCR or ddPCR to measure the concentration of a marker gene in the extracted DNA, using primers specific to the target taxon. This approach is therefore subject to sources of taxonomic bias including extraction, marker-gene copy number, and primer binding. It is also subject to non-species-specific variation in extraction yields unless these are otherwise controlled for. It is also possible to measure the cell density of a target taxon directly. Some species can be measured by CFU counting after plating on selective media (REFs), and ddPCR has been used to directly measure cells (Dreo et al. (2014), Morella et al. (2018)) and viruses (Pavšič, Žel, and Milavec (2016), Morella et al. (2018)) without first performing an extraction. Species-specific fluorescent probes also make it possible to measure individual species via microscopy or flow cytometry (REFs).

TODO: Argue that these methods may yield constant fold errors.

References

Chng, Kern Rei, Tarini Shankar Ghosh, Yi Han Tan, Tannistha Nandi, Ivor Russel Lee, Amanda Hui Qi Ng, Chenhao Li, et al. 2020. “Metagenome-wide association analysis identifies microbial determinants of post-antibiotic ecological recovery in the gut.” Nat. Ecol. Evol. 4 (9): 1256–67. https://doi.org/10.1038/s41559-020-1236-0.
Contijoch, Eduardo J, Graham J Britton, Chao Yang, Ilaria Mogno, Zhihua Li, Ruby Ng, Sean R Llewellyn, et al. 2019. “Gut microbiota density influences host physiology and is shaped by host and microbial factors.” Elife 8 (January). https://doi.org/10.7554/eLife.40553.
Cruz, Giuliano Netto Flores, Ana Paula Christoff, and Luiz Felipe Valter de Oliveira. 2021. “Equivolumetric Protocol Generates Library Sizes Proportional to Total Microbial Load in 16S Amplicon Sequencing.” Front. Microbiol. 12 (February): 1–16. https://doi.org/10.3389/fmicb.2021.638231.
David, Lawrence A, Arne C Materna, Jonathan Friedman, Maria I Campos-Baptista, Matthew C Blackburn, Allison Perrotta, Susan E Erdman, and Eric J Alm. 2014. “Host lifestyle affects human microbiota on daily timescales.” Genome Biol. 15 (7): R89. https://doi.org/10.1186/gb-2014-15-7-r89.
Diener, Christian, Anna C. H. Hoge, Sean M. Kearney, Ulrike Kusebauch, Sushmita Patwardhan, Robert L. Moritz, Susan E. Erdman, and Sean M. Gibbons. 2021. “Non-responder phenotype reveals apparent microbiome-wide antibiotic tolerance in the murine gut.” Commun. Biol. 4 (1). https://doi.org/10.1038/s42003-021-01841-8.
Dreo, Tanja, Manca Pirc, Živa Ramšak, Jernej Pavšič, Mojca Milavec, Jana Žel, and Kristina Gruden. 2014. “Optimising droplet digital PCR analysis approaches for detection and quantification of bacteria: a case study of fire blight and potato brown rot.” Anal. Bioanal. Chem. 406 (26): 6513–28. https://doi.org/10.1007/s00216-014-8084-1.
Galazzo, Gianluca, Niels van Best, Birke J. Benedikter, Kevin Janssen, Liene Bervoets, Christel Driessen, Melissa Oomen, et al. 2020. “How to Count Our Microbes? The Effect of Different Quantitative Microbiome Profiling Approaches.” Front. Cell. Infect. Microbiol. 10 (August). https://doi.org/10.3389/fcimb.2020.00403.
Hardwick, Simon A., Wendy Y. Chen, Ted Wong, Bindu S. Kanakamedala, Ira W. Deveson, Sarah E. Ongley, Nadia S. Santini, et al. 2018. “Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis.” Nat. Commun. 9 (1): 3096. https://doi.org/10.1038/s41467-018-05555-0.
Harrison, Joshua G., W. John Calder, Bryan Shuman, and C. Alex Buerkle. 2021. “The quest for absolute abundance: The use of internal standards for DNA‐based community ecology.” Mol. Ecol. Resour. 21 (1): 30–43. https://doi.org/10.1111/1755-0998.13247.
Ji, Brian W., Ravi U. Sheth, Purushottam D. Dixit, Yiming Huang, Andrew Kaufman, Harris H. Wang, and Dennis Vitkup. 2019. “Quantifying spatiotemporal variability and noise in absolute microbiota abundances using replicate sampling.” Nat. Methods 16 (8): 731–36. https://doi.org/10.1038/s41592-019-0467-y.
Jian, Ching, Panu Luukkonen, Hannele Yki-Järvinen, Anne Salonen, and Katri Korpela. 2020. “Quantitative PCR provides a simple and accessible method for quantitative microbiota profiling.” Edited by Ivone Vaz-Moreira. PLoS One 15 (1): e0227285. https://doi.org/10.1371/journal.pone.0227285.
Karasov, Talia L., Manuela Neumann, Alejandra Duque-Jaramillo, Sonja Kersten, Ilja Bezrukov, Birgit Schröppel, Efthymia Symeonidi, et al. 2020. “The relationship between microbial population size and disease in the Arabidopsis thaliana phyllosphere.” bioRxiv. https://doi.org/10.1101/828814.
Kevorkian, Richard, Jordan T Bird, Alexander Shumaker, and Karen G Lloyd. 2018. “Estimating Population Turnover Rates by Relative Quantification Methods Reveals Microbial Dynamics in Marine Sediment.” Appl. Environ. Microbiol. 84 (1): e01443–17. https://doi.org/10.1128/AEM.01443-17.
Korpela, Katri, Elin W. Blakstad, Sissel J. Moltu, Kenneth Strømmen, Britt Nakstad, Arild E. Rønnestad, Kristin Brække, Per O. Iversen, Christian A. Drevon, and Willem de Vos. 2018. “Intestinal microbiota development and gestational age in preterm neonates.” Sci. Rep. 8 (1): 1–9. https://doi.org/10.1038/s41598-018-20827-x.
Kumar, M. Senthil, Eric V. Slud, Kwame Okrah, Stephanie C. Hicks, Sridhar Hannenhalli, and Héctor Corrada Bravo. 2018. “Analysis and correction of compositional bias in sparse sequencing count data.” BMC Genomics 19 (1): 799. https://doi.org/10.1186/s12864-018-5160-5.
Lloyd, Karen G., Jordan T. Bird, Joy Buongiorno, Emily Deas, Richard Kevorkian, Talor Noordhoek, Jacob Rosalsky, and Taylor Roy. 2020. “Evidence for a Growth Zone for Deep-Subsurface Microbial Clades in Near-Surface Anoxic Sediments.” Appl. Environ. Microbiol. 86 (19): 1–15. https://doi.org/10.1128/AEM.00877-20.
Mandal, Siddhartha, Will Van Treuren, Richard A. White, Merete Eggesbø, Rob Knight, and Shyamal D. Peddada. 2015. “Analysis of composition of microbiomes: a novel method for studying microbial composition.” Microb. Ecol. Heal. Dis. 26 (1): 27663. https://doi.org/10.3402/mehd.v26.27663.
Morella, Norma M., Shangyang Christopher Yang, Catherine A. Hernandez, and Britt Koskella. 2018. “Rapid quantification of bacteriophages and their bacterial hosts in vitro and in vivo using droplet digital PCR.” J. Virol. Methods 259 (May): 18–24. https://doi.org/10.1016/j.jviromet.2018.05.007.
Pavšič, Jernej, Jana Žel, and Mojca Milavec. 2016. “Digital PCR for direct quantification of viruses without DNA extraction.” Anal. Bioanal. Chem. 408 (1): 67–75. https://doi.org/10.1007/s00216-015-9109-0.
Props, Ruben, Frederiek-Maarten Kerckhof, Peter Rubbens, Jo De Vrieze, Emma Hernandez Sanabria, Willem Waegeman, Pieter Monsieurs, Frederik Hammes, and Nico Boon. 2017. “Absolute quantification of microbial taxon abundances.” ISME J. 11 (2): 584–87. https://doi.org/10.1038/ismej.2016.117.
Rao, Chitong, Katharine Z. Coyte, Wayne Bainter, Raif S. Geha, Camilia R. Martin, and Seth Rakoff-Nahoum. 2021. “Multi-kingdom ecological drivers of microbiota assembly in preterm infants.” Nature 591 (7851): 633–38. https://doi.org/10.1038/s41586-021-03241-8.
Regalado, Julian, Derek S. Lundberg, Oliver Deusch, Sonja Kersten, Talia Karasov, Karin Poersch, Gautam Shirsekar, and Detlef Weigel. 2020. “Combining whole-genome shotgun sequencing and rRNA gene amplicon analyses to improve detection of microbe–microbe interaction networks in plant leaves.” ISME J., May, 823492. https://doi.org/10.1038/s41396-020-0665-8.
Smets, Wenke, Jonathan W. Leff, Mark A. Bradford, Rebecca L. McCulley, Sarah Lebeer, and Noah Fierer. 2016. “A method for simultaneous measurement of soil bacterial abundances and community composition via 16S rRNA gene sequencing.” Soil Biol. Biochem. 96: 145–51. https://doi.org/10.1016/j.soilbio.2016.02.003.
Stämmler, Frank, Joachim Gläsner, Andreas Hiergeist, Ernst Holler, Daniela Weber, Peter J. Oefner, André Gessner, and Rainer Spang. 2016. “Adjusting microbiome profiles for differences in microbial load by spike-in bacteria.” Microbiome 4 (1): 28. https://doi.org/10.1186/s40168-016-0175-0.
Tettamanti Boshier, Florencia A., Sujatha Srinivasan, Anthony Lopez, Noah G. Hoffman, Sean Proll, David N. Fredricks, and Joshua T. Schiffer. 2020. “Complementing 16S rRNA Gene Amplicon Sequencing with Total Bacterial Load To Infer Absolute Species Concentrations in the Vaginal Microbiome.” mSystems 5 (2): 1–14. https://doi.org/10.1128/mSystems.00777-19.
Tkacz, Andrzej, Marion Hortala, and Philip S. Poole. 2018. “Absolute quantitation of microbiota abundance in environmental samples.” Microbiome 6 (1): 110. https://doi.org/10.1186/s40168-018-0491-7.
Vandeputte, Doris, Gunter Kathagen, Kevin D’hoe, Sara Vieira-Silva, Mireia Valles-Colomer, João Sabino, Jun Wang, et al. 2017. “Quantitative microbiome profiling links gut community variation to microbial load.” Nature 551 (7681): 507. https://doi.org/10.1038/nature24460.
Wallace, Megan A., Kelsey A. Coffman, Clément Gilbert, Sanjana Ravindran, Gregory F. Albery, Jessica Abbott, Eliza Argyridou, et al. 2021. “The discovery, distribution, and diversity of DNA viruses associated with Drosophila melanogaster in Europe.” Virus Evol. 7 (1): 1–23. https://doi.org/10.1093/ve/veab031.
Wang, Xiaofan, Samantha Howe, Feilong Deng, and Jiangchao Zhao. 2021. “Current Applications of Absolute Bacterial Quantification in Microbiome Studies and Decision-Making Regarding Different Biological Questions.” Microorganisms 9 (9): 1797. https://doi.org/10.3390/microorganisms9091797.
Zemb, Olivier, Caroline S Achard, Jerome Hamelin, Marie‐Léa De Almeida, Béatrice Gabinaud, Laurent Cauquil, Lisanne M. G. Verschuren, and Jean-jacques Godon. 2020. “Absolute quantitation of microbes using 16S rRNA gene metabarcoding: A rapid normalization of relative abundances by quantitative PCR targeting a 16S rRNA gene spike‐in standard.” Microbiologyopen 9 (3): 1–21. https://doi.org/10.1002/mbo3.977.