2 How bias affects abundance measurements

This section extends the theoretical results of McLaren, Willis, and Callahan (2019) to describe how taxonomic bias in an MGS experiment affects the relative and absolute abundances measured for various microbial species. We show that some approaches to quantifying species abundance yield constant fold errors (FEs), while others yield FEs that depend on overall community composition and thus can vary across samples.

2.1 A model of MGS measurements

Our primary tool for understanding the impact of taxonomic bias on MGS measurement is the theoretical model of MGS measurement developed and empirically validated by McLaren, Willis, and Callahan (2019). This model describes the mathematical relationship between the read counts obtained by MGS and the (actual) abundances of the various species in a sample. Here we extend the model as first described by McLaren, Willis, and Callahan (2019), which considers only relative abundances, to also consider absolute abundances. For concreteness, we will consider absolute abundance, or simply abundance, to refer to the number of cells per unit volume in a sample (cell concentration). That said, our results apply equally to other definitions of absolute abundance, such as the total number of cells in a sample or ecosystem, and to other abundance units, such as biomass or genome copy number.

This model is the simplest that respects the multiplicative nature of taxonomic bias and the compositional nature of MGS measurements. The actual abundance of a species in a given sample, multiplied by its measurement efficiency—its rate of conversion from cells to taxonomically assigned sequencing reads—determines the species’ read count in that sample. Taxonomic bias presents as variation in the measurement efficiencies among species within an MGS experiment. The read counts further depend on sample-specific experimental factors that are typically unknown, such that they are best interpreted as only providing relative abundances (such data is said to be compositional; Gloor et al. (2017)).

We consider a set of microbiome samples measured by a specific MGS protocol that extracts, sequences, and taxonomically assigns reads to a set of microbial species \(S\). We make several simplifying assumptions to facilitate our analysis and presentation. First, we consider only species-level assignment, and suppose that reads that cannot be uniquely assigned to a single species in \(S\) are discarded. Second, we ignore the possibility that reads are misassigned to the wrong species or the wrong sample. Third, we suppose that taxonomic bias acts consistently across samples at the species level; that is, a given species is always measured more efficiently than another to the same degree. Finally, unless otherwise stated, we treat sequencing measurements as deterministic, ignoring the ‘random’ variation in read counts that arises from the sampling of sequencing reads and other aspects of the MGS process. These assumptions, though unrealistic descriptions of most MGS experiments, serve the purpose of clearly demonstrating when and why consistent taxonomic bias creates errors in DA analysis.

Our model stipulates that the taxonomically-assigned read count for a species \(i\) in a sample \(a\) equals its abundance \(A_{i}^{(a)}\) multiplied by a species-specific factor \(B_{i}\) and a sample-specific factor \(F^{(a)}\), \[\begin{align} \tag{2.1} M_i^{(a)} = A_i^{(a)} B_i F^{(a)}. \end{align}\] The species-specific factor, \(B_{i}\), is the relative measurement efficiency (or simply efficiency) of the species relative to an arbitrary baseline species (McLaren, Willis, and Callahan (2019)). The variation in efficiency among species corresponds to the taxonomic bias of the MGS protocol. The sample-specific factor, \(F^{(a)}\), describes the effective sequencing effort for that sample; it equals the number of reads per unit abundance that would be obtained for a species with an efficiency of 1. We can write the total number of assigned reads for the sample as \[\begin{align} \tag{2.2} M_{\text{tot}}^{(a)} = A_{\text{tot}}^{(a)} \bar B^{(a)} F^{(a)}, \end{align}\] where \(M_{{\text{tot}}}^{(a)} \equiv \sum_{j\in S} M_j^{(a)}\) is the total read count and \(A_{{\text{tot}}}^{(a)} \equiv \sum_{j\in S}A_j^{(a)}\) is the total abundance for all species \(S\), and \[\begin{align} \tag{2.3} \bar B^{(a)} \equiv \frac{\sum_{i\in S} A_i^{(a)} B_i}{\sum_{i\in S} A_i^{(a)}} \end{align}\] is the sample mean efficiency, defined as the mean efficiency of all species weighted by their abundance.
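As a numerical illustration of Equations (2.1) to (2.3), the following Python sketch computes read counts and the sample mean efficiency for a single sample. The efficiencies (1, 18, and 6) are those of the three species in Figure 2.1; the abundances and the sequencing-effort factor are hypothetical values chosen only for illustration, as are all function and variable names.

```python
# Efficiencies B_i as in Figure 2.1; all other numbers are hypothetical.
efficiency = {"sp1": 1.0, "sp2": 18.0, "sp3": 6.0}      # B_i
abundance = {"sp1": 100.0, "sp2": 100.0, "sp3": 100.0}  # A_i^(a), assumed cell concentrations
sequencing_effort = 0.5                                 # F^(a), assumed


def read_counts(abundance, efficiency, effort):
    """Equation (2.1): M_i = A_i * B_i * F for every species."""
    return {sp: abundance[sp] * efficiency[sp] * effort for sp in abundance}


def mean_efficiency(abundance, efficiency):
    """Equation (2.3): abundance-weighted mean of the species efficiencies."""
    total = sum(abundance.values())
    return sum(abundance[sp] * efficiency[sp] for sp in abundance) / total


reads = read_counts(abundance, efficiency, sequencing_effort)
total_reads = sum(reads.values())
total_abundance = sum(abundance.values())
b_bar = mean_efficiency(abundance, efficiency)

# Equation (2.2): total reads = total abundance * mean efficiency * effort.
assert abs(total_reads - total_abundance * b_bar * sequencing_effort) < 1e-9

print(reads)
print(b_bar)
```

With equal abundances, the mean efficiency is the plain average of the three efficiencies, which matches the value of 8.33 quoted for the even community in Figure 2.1.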

2.2 Relative abundance

We distinguish between two types of species-level relative abundances within a sample. The proportion \(P_{i}^{(a)}\) of species \(i\) in sample \(a\) equals its abundance divided by the total abundance of all species in \(S\), \[\begin{align} \tag{2.4} P_{i}^{(a)} &\equiv \frac{A_i^{(a)}}{A_{\text{tot}}^{(a)}}. \end{align}\] The ratio \(R_{i/j}^{(a)}\) between two species \(i\) and \(j\) equals the abundance of \(i\) divided by that of \(j\), \[\begin{align} \tag{2.5} R_{i/j}^{(a)} \equiv \frac{A_i^{(a)}}{A_j^{(a)}}. \end{align}\]

The measured proportion of a species is given by its proportion of all the assigned reads in a sample, \[\begin{align} \tag{2.6} \tilde P_{i}^{(a)} &\equiv \frac{M_i^{(a)}}{M_{\text{tot}}^{(a)}}. \end{align}\] We use the tilde to distinguish the measurement from the actual quantity being measured. From Equations (2.1), (2.2), and (2.6), it follows that the measured and actual proportions are related by \[\begin{align} \tag{2.7} \tilde P_{i}^{(a)} &= P_{i}^{(a)} \cdot \frac{B_i}{\bar B^{(a)}}. \end{align}\] Taxonomic bias creates a fold error (FE) in the measured proportion of a species that is equal to its efficiency divided by the mean efficiency in the sample. Since the mean efficiency varies across samples, so does the FE. This phenomenon can be seen for Species 3 in the two hypothetical communities in Figure 2.1. Species 3, which has an efficiency of 6, is under-measured in Sample 1 (FE < 1) but over-measured (FE > 1) in Sample 2. This difference occurs because the even distribution of species in Sample 1 yields a mean efficiency of 8.33; in contrast, the lopsided distribution in Sample 2, which is dominated by the low-efficiency Species 1, has a mean efficiency of just 3.15. A demonstration in bacterial mock communities is shown in Figure 3C of McLaren, Willis, and Callahan (2019).
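The sample dependence of the fold error in Equation (2.7) can be checked numerically. The Python sketch below uses the efficiencies from Figure 2.1 (1, 18, and 6) and an even community like Sample 1. The lopsided composition is a hypothetical stand-in chosen for illustration; it is not the composition plotted in the figure, so its mean efficiency differs from the 3.15 quoted above.

```python
# Efficiencies B_i of Species 1-3, as in Figure 2.1.
EFFICIENCY = [1.0, 18.0, 6.0]


def measured_proportions(actual, efficiency):
    """Measured proportions: each species' share of the assigned reads (Equation 2.6)."""
    weighted = [p * b for p, b in zip(actual, efficiency)]
    total = sum(weighted)  # equals the sample mean efficiency when `actual` sums to 1
    return [w / total for w in weighted]


def fold_errors(actual, efficiency):
    """Fold error of each measured proportion, i.e. measured / actual."""
    measured = measured_proportions(actual, efficiency)
    return [m / p for m, p in zip(measured, actual)]


even = [1 / 3, 1 / 3, 1 / 3]   # even community, as in Sample 1
lopsided = [0.80, 0.05, 0.15]  # hypothetical community dominated by Species 1

fe_even = fold_errors(even, EFFICIENCY)
fe_lopsided = fold_errors(lopsided, EFFICIENCY)

print([round(x, 2) for x in fe_even])
print([round(x, 2) for x in fe_lopsided])
```

In this sketch Species 3 has a fold error below 1 in the even community and above 1 in the lopsided one, even though its efficiency is the same in both.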

The measured ratio \(\tilde R_{i/j}^{(a)}\) between species \(i\) and \(j\) is given by the ratio of their read counts, \[\begin{align} \tag{2.8} \tilde R_{i/j}^{(a)} \equiv \frac{M_{i}^{(a)}}{M_{j}^{(a)}}. \end{align}\] From Equations (2.1) and (2.8), it follows that the measured and actual ratios are related by \[\begin{align} \tag{2.9} \tilde R_{i/j}^{(a)} = R_{i/j}^{(a)} \cdot \frac{B_i}{B_j}. \end{align}\] Taxonomic bias creates an FE in the measured ratio that is equal to the ratio of the species’ efficiencies; the FE is therefore constant across samples. For instance, in Figure 2.1, the ratio of Species 3 (with an efficiency of 6) to Species 1 (with an efficiency of 1) is over-measured by a factor of 6 in both communities despite their varying compositions. A demonstration in bacterial mock communities is shown in Figure 3D of McLaren, Willis, and Callahan (2019).
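The constancy of the fold error in ratios (Equation (2.9)) can be illustrated in the same way. In the Python sketch below, the efficiencies of Species 1 and Species 3 follow Figure 2.1, while the abundances and sequencing-effort factors of the two samples are hypothetical.

```python
# Efficiencies of Species 1 and Species 3, as in Figure 2.1.
B = {"sp1": 1.0, "sp3": 6.0}


def reads(abundance, efficiency, effort):
    """Equation (2.1) for a single species in a single sample."""
    return abundance * efficiency * effort


# Two hypothetical samples with different compositions and sequencing efforts.
samples = [
    {"sp1": 200.0, "sp3": 200.0, "effort": 0.5},
    {"sp1": 900.0, "sp3": 50.0, "effort": 2.0},
]

ratio_fold_errors = []
for s in samples:
    actual = s["sp3"] / s["sp1"]
    measured = reads(s["sp3"], B["sp3"], s["effort"]) / reads(s["sp1"], B["sp1"], s["effort"])
    ratio_fold_errors.append(measured / actual)

print(ratio_fold_errors)
```

The sample-specific factor cancels in the ratio of read counts, so the fold error equals the ratio of the two efficiencies in both samples.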

Figure 2.1: Taxonomic bias creates fold errors in species proportions that vary across samples and lead to inaccurate fold differences between samples. Top row: Error in proportions measured by MGS in two hypothetical microbiome samples that contain different relative abundances of three species. Bottom row: Error in the measured fold difference in the third species that is derived from these measurements. Species’ proportions may be measured as too high or too low depending on sample composition. For instance, Species 3 has an efficiency of 6 and is under-measured in Sample 1 (which has a mean efficiency of 8.33) but over-measured in Sample 2 (which has a mean efficiency of 3.15).

Higher-order taxa: We can consider a higher-order taxon \(I\), such as a genus or phylum, as a set of species, \(\{i \in I\}\). The abundance of taxon \(I\) in sample \(a\) is the sum of the abundances of its constituent species, \(A_{I}^{(a)} \equiv \sum_{i \in I} A_{i}^{(a)}\). Similarly, the read count of taxon \(I\) is the sum \(M_{I}^{(a)} \equiv \sum_{i \in I} M_{i}^{(a)}\). We further define the efficiency of taxon \(I\) as the abundance-weighted average of the efficiencies of its constituent species, \[\begin{align} \tag{2.10} B_I^{(a)} \equiv \frac{\sum_{i\in I} A_{i}^{(a)} B_{i}}{\sum_{i\in I} A_{i}^{(a)}}. \end{align}\] With these definitions, the read count for higher-order taxon \(I\) can be expressed as \(M_{I}^{(a)} = A_{I}^{(a)} B_I^{(a)} F^{(a)}\). Thus \(B_I^{(a)}\) plays a role analogous to the efficiency of an individual species, but differs in that it is not constant across samples: If the constituent species have different efficiencies, then the efficiency of the higher-order taxon \(I\) depends on the relative abundances of its constituents and so will vary across samples (McLaren, Willis, and Callahan (2019)). As an example, suppose that Species 1 and Species 2 in Figure 2.1 were in the same phylum. The efficiency of the phylum would then be \(\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 18 = 9.5\) in Sample 1 and \(\tfrac{15}{16} \cdot 1 + \tfrac{1}{16} \cdot 18 \approx 2.1\) in Sample 2. Equations (2.7) and (2.9) continue to describe the measurement error in proportions and ratios involving higher-order taxa, so long as the sample-dependent, higher-order taxa efficiencies \(B_I^{(a)}\) and \(B_J^{(a)}\) are used. In this way, we see that both proportions and ratios among higher-order taxa may have inconsistent FEs.
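The phylum example above can be reproduced with a short Python sketch of Equation (2.10). The efficiencies and the 1:1 and 15:1 abundance ratios of Species 1 and Species 2 come from the text; the abundance given to Species 3 in the second sample is a hypothetical placeholder, included only to show that it does not enter the calculation.

```python
EFFICIENCY = {"sp1": 1.0, "sp2": 18.0, "sp3": 6.0}  # B_i, as in Figure 2.1


def taxon_efficiency(abundance, efficiency, members):
    """Equation (2.10): abundance-weighted mean efficiency of the member species."""
    total = sum(abundance[sp] for sp in members)
    return sum(abundance[sp] * efficiency[sp] for sp in members) / total


phylum = ["sp1", "sp2"]

# Abundances in arbitrary units; only the sp1:sp2 ratios are taken from the text.
sample_1 = {"sp1": 1.0, "sp2": 1.0, "sp3": 1.0}
sample_2 = {"sp1": 15.0, "sp2": 1.0, "sp3": 6.0}  # sp3 value is hypothetical

eff_1 = taxon_efficiency(sample_1, EFFICIENCY, phylum)
eff_2 = taxon_efficiency(sample_2, EFFICIENCY, phylum)

print(eff_1, eff_2)
```

The phylum's efficiency drops from 9.5 to roughly 2.1 between the two samples because the low-efficiency Species 1 makes up a larger share of the phylum in the second sample.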

2.3 Absolute abundance

Several extensions of the standard MGS experiment make it possible to measure absolute species abundances. These extensions fall into two general approaches. The first approach leverages information about the abundance of the total community; for example, Vandeputte et al. (2017) measured total-community abundance using flow cytometry and multiplied this number by genus proportions measured by MGS to quantify the absolute abundances of individual genera. A second approach leverages information about the abundance of one or more individual species; for example, a researcher might ‘spike in’ a known, fixed amount of an extraneous species to all samples prior to MGS, and normalize the read counts of all species to the spike-in species (Harrison et al. (2021)). We consider each approach in detail to determine how taxonomic bias affects the resulting absolute-abundance measurements.

2.3.1 Leveraging information about total-community abundance

Suppose that the total abundance of all species in the sample, \(A_{{\text{tot}}}^{(a)}\), has been measured by a non-MGS method, yielding a measurement \(\tilde A_{\text{tot}}^{(a)}\). The absolute abundance of an individual species can be quantified by multiplying the species’ proportion from MGS by this total-abundance measurement, \[\begin{align} \tag{2.11} \tilde A_i^{(a)} &= \tilde P_i^{(a)} \tilde A_{\text{tot}}^{(a)}. \end{align}\] Total-abundance measurements recently used for this purpose include counting cells with microscopy (Lloyd et al. (2020)) or flow cytometry (Props et al. (2017), Vandeputte et al. (2017), Galazzo et al. (2020)), measuring the concentration of a marker gene with qPCR or ddPCR (Zhang et al. (2017), Barlow, Bogatyrev, and Ismagilov (2020), Galazzo et al. (2020), Tettamanti Boshier et al. (2020)), and measuring bulk DNA concentration with a fluorescence-based DNA quantification method (Contijoch et al. (2019)).

Importantly, these methods of measuring total abundance are themselves subject to taxonomic bias that is analogous to, but quantitatively different from, that of the MGS relative-abundance measurements. Flow cytometry may yield lower cell counts for species whose cells tend to clump together or are prone to lysis during steps involved in sample collection, storage, and preparation. Marker-gene concentrations measured by qPCR are affected by variation among species in extraction efficiency, marker-gene copy number, and PCR binding and amplification efficiency (Lloyd et al. (2013)). We can easily understand the impact of taxonomic bias on total-abundance measurement under simplifying assumptions analogous to those in our MGS model. Suppose that each species \(i\) has an absolute efficiency \(B_{i}^{{\text{[tot]}}}\) for the total-abundance measurement that is constant across samples. Further, let \(\bar B^{{\text{[tot]}}(a)}\) be the abundance-weighted average of these efficiencies in sample \(a\), that is, the mean efficiency of the total-abundance measurement. Neglecting other error sources, the total-abundance measurement equals \[\begin{align} \tag{2.12} \tilde A_{\text{tot}}^{(a)} &= \sum_{i\in S} A_i^{(a)} B_{i}^{{\text{[tot]}}} \\&= A_{\text{tot}}^{(a)} \bar B^{{\text{[tot]}}(a)}. \end{align}\]

Species abundance measurements derived by this method (Equation (2.11)) are affected by taxonomic bias in both the MGS and the total-abundance measurement. We can determine the resulting fold error (FE) in the estimate \(\tilde A_i^{(a)}\) by substituting Equations (2.7) and (2.12) into Equation (2.11) and using \(P_i^{(a)} A_{\text{tot}}^{(a)} = A_i^{(a)}\), yielding \[\begin{align} \tag{2.13} \tilde A_i^{(a)} = A_i^{(a)} \cdot \frac{B_i \bar B^{{\text{[tot]}}(a)}}{\bar B^{(a)}}. \end{align}\] Equation (2.13) indicates that the FE in the measured absolute abundance of a species equals its MGS efficiency relative to the mean MGS efficiency in the sample, multiplied by the mean efficiency of the total-abundance measurement. As in the case of proportions (Equation (2.7)), the FE depends on sample composition through the two mean efficiency terms and so will, in general, vary across samples.
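The fold error described above, namely the species' MGS efficiency divided by the mean MGS efficiency and multiplied by the mean efficiency of the total-abundance measurement, can be verified numerically. The Python sketch below applies Equation (2.11) to simulated biased measurements. The MGS efficiencies follow Figure 2.1; the efficiencies of the total-abundance method and the abundances of both samples are hypothetical.

```python
B_MGS = [1.0, 18.0, 6.0]  # MGS efficiencies, as in Figure 2.1
B_TOT = [1.0, 0.5, 0.8]   # hypothetical efficiencies of the total-abundance method


def weighted_mean(weights, values):
    """Mean of `values` weighted by `weights`."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def estimate_abundances(actual):
    """Equation (2.11): measured proportion times measured total abundance."""
    reads = [a * b for a, b in zip(actual, B_MGS)]  # the effort factor cancels in proportions
    total_reads = sum(reads)
    measured_props = [m / total_reads for m in reads]
    measured_total = sum(a * b for a, b in zip(actual, B_TOT))  # Equation (2.12)
    return [p * measured_total for p in measured_props]


def predicted_fold_errors(actual):
    """Predicted FE: B_i times the mean total-method efficiency over the mean MGS efficiency."""
    b_bar = weighted_mean(actual, B_MGS)
    b_bar_tot = weighted_mean(actual, B_TOT)
    return [b * b_bar_tot / b_bar for b in B_MGS]


observed_by_sample = []
predicted_by_sample = []
for actual in ([100.0, 100.0, 100.0], [800.0, 50.0, 150.0]):  # hypothetical abundances
    estimate = estimate_abundances(actual)
    observed_by_sample.append([e / a for e, a in zip(estimate, actual)])
    predicted_by_sample.append(predicted_fold_errors(actual))

print(observed_by_sample)
print(predicted_by_sample)
```

The observed and predicted fold errors agree within each sample but differ between the two samples, which reflects the dependence on both mean efficiencies.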

2.3.2 Leveraging information about a reference species

Suppose that the absolute abundance of a reference species \(r\) has been fixed by the experimenter or been measured by independent means. This known or measured abundance \(\tilde A_{r}^{(a)}\) can be used in conjunction with the MGS read counts to obtain absolute abundances for all species. In the absence of taxonomic bias, the ratio of a species’ absolute abundance to its MGS read count is the same for all species in a given sample (Equation (2.1)). Hence the known ratio for the reference species can serve as a conversion factor for obtaining the absolute abundance of a species \(i\) from its read count, \[\begin{align} \tag{2.14} \tilde A_i^{(a)} &= M_i^{(a)} \cdot \frac{\tilde A_r^{(a)}}{M_r^{(a)}}. \end{align}\] Let \(\mathop{\mathrm{FE}}[\tilde A_r^{(a)}] \equiv {\tilde A_r^{(a)}}/{A_r^{(a)}}\) be the FE in the reference measurement. The effect on \(\tilde A_i^{(a)}\) of taxonomic bias in the MGS measurement can be determined by substituting Equation (2.9) into Equation (2.14), yielding \[\begin{align} \tag{2.15} \tilde A_i^{(a)} = A_i^{(a)} \cdot \frac{B_i}{B_r} \cdot \mathop{\mathrm{FE}}\left[\tilde A_r^{(a)}\right]. \end{align}\] The FE in \(\tilde A_i^{(a)}\) consists of two factors: the relative efficiency of species \(i\) to species \(r\) in the MGS measurement (\({B_i}/{B_r}\)) and the FE in the reference species’ abundance (\({\tilde A_r^{(a)}}/{A_r^{(a)}}\)).
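To illustrate the reference-species calculation in Equation (2.14), the Python sketch below simulates a spike-in whose abundance is assumed to be known exactly, so that the FE in the reference measurement is 1. The efficiencies of Species 1 to 3 follow Figure 2.1; the spike-in efficiency, the abundances, and the sequencing-effort factors are hypothetical.

```python
# Efficiencies of Species 1-3 as in Figure 2.1; the spike-in efficiency is hypothetical.
EFFICIENCY = {"sp1": 1.0, "sp2": 18.0, "sp3": 6.0, "spike": 3.0}
SPIKE_ABUNDANCE = 50.0  # assumed to be known exactly, so the reference FE is 1


def simulate_reads(abundance, effort):
    """Equation (2.1) applied to every species in a sample."""
    return {sp: a * EFFICIENCY[sp] * effort for sp, a in abundance.items()}


def estimate_from_reference(reads, reference, reference_abundance):
    """Equation (2.14): scale read counts by the reference's abundance-to-reads ratio."""
    factor = reference_abundance / reads[reference]
    return {sp: m * factor for sp, m in reads.items()}


# Two hypothetical samples: (actual abundances, sequencing-effort factor).
samples = [
    ({"sp1": 100.0, "sp2": 100.0, "sp3": 100.0}, 0.5),
    ({"sp1": 800.0, "sp2": 50.0, "sp3": 150.0}, 2.0),
]

fold_errors_by_sample = []
for actual, effort in samples:
    with_spike = dict(actual, spike=SPIKE_ABUNDANCE)
    reads = simulate_reads(with_spike, effort)
    estimate = estimate_from_reference(reads, "spike", SPIKE_ABUNDANCE)
    fold_errors_by_sample.append({sp: estimate[sp] / actual[sp] for sp in actual})

print(fold_errors_by_sample)
```

Each species' fold error equals its efficiency divided by that of the spike-in and is the same in both samples, in contrast to the total-abundance approach.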

A common application of this approach involves adding a ‘spike-in’ (as described above) in a known (and typically constant) abundance across samples (Stämmler et al. (2016), Ji et al. (2019), Tkacz, Hortala, and Poole (2018), Harrison et al. (2021), Rao et al. (2021)). In this case, the reference abundance \(\tilde A_r^{(a)}\) is determined from the concentration of the spike-in stock multiplied by the ratio of the spike-in to sample volumes.

Others have instead sought to identify naturally occurring species whose abundance is thought to be constant across samples; we refer to such species as housekeeping species by analogy with the housekeeping genes used for absolute-abundance conversion in gene-expression studies (Silver et al. (2006)). Housekeeping species can sometimes be identified using prior scientific knowledge; for example, in shotgun sequencing experiments, researchers have used sequencing reads from the plant or animal host as a reference (Karasov et al. (2020), Regalado et al. (2020), Wallace et al. (2021)). A related approach involves computationally identifying species that are constant between pairs of samples (David et al. (2014)) or between sample conditions (Mandal et al. (2015), Kumar et al. (2018)). The abundance of a housekeeping species is typically unknown; therefore, to estimate the abundances of other species, we simply set \(\tilde A_r^{(a)}\) to 1 in Equation (2.14). The resulting abundance measurements have unknown but fixed units, which is sufficient for measuring fold changes across samples.
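The claim that unknown but fixed units suffice for fold changes can be checked with a small Python sketch. It assumes an idealized housekeeping reference (labeled `host` here) whose abundance really is identical in both samples, together with the consistent-bias model of Equation (2.1); the host efficiency, the abundances, and the sequencing-effort factors are all hypothetical.

```python
# Efficiencies of Species 1 and 2 as in Figure 2.1; the host efficiency is hypothetical.
EFFICIENCY = {"sp1": 1.0, "sp2": 18.0, "host": 4.0}


def relative_to_reference(abundance, effort, reference="host"):
    """Equation (2.14) with the reference abundance set to 1: reads divided by reference reads."""
    reads = {sp: a * EFFICIENCY[sp] * effort for sp, a in abundance.items()}
    return {sp: reads[sp] / reads[reference] for sp in reads}


# Hypothetical samples; the host abundance is assumed constant and unknown to the analyst.
sample_a = {"sp1": 100.0, "sp2": 100.0, "host": 1000.0}
sample_b = {"sp1": 400.0, "sp2": 25.0, "host": 1000.0}

estimate_a = relative_to_reference(sample_a, effort=0.5)
estimate_b = relative_to_reference(sample_b, effort=2.0)

measured_fold_change = {sp: estimate_b[sp] / estimate_a[sp] for sp in ("sp1", "sp2")}
actual_fold_change = {sp: sample_b[sp] / sample_a[sp] for sp in ("sp1", "sp2")}

print(measured_fold_change)
print(actual_fold_change)
```

The measured and actual fold changes agree because the efficiencies and the constant reference abundance cancel when the two samples are compared. If the reference abundance differed between samples, the measured fold change would be off by the inverse of that difference.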

We suggest an additional way of using the reference-species strategy even in the absence of a spike-in or constant species: performing targeted measurements of the absolute abundance of one or more naturally occurring species. These species can then be used as reference species in Equation (2.14) to measure the absolute abundances of all species. The most common form of targeted measurement involves using qPCR or ddPCR to measure the concentration of a marker gene in the extracted DNA. It is also possible to directly measure cell concentration by performing ddPCR prior to DNA extraction (Morella et al. (2018)), flow cytometry with species-specific fluorescent probes, or CFU counting on selective media.

References

Barlow, Jacob T., Said R. Bogatyrev, and Rustem F. Ismagilov. 2020. “A quantitative sequencing framework for absolute abundance measurements of mucosal and lumenal microbial communities.” Nat. Commun. 11 (1): 1–13. https://doi.org/10.1038/s41467-020-16224-6.

Contijoch, Eduardo J, Graham J Britton, Chao Yang, Ilaria Mogno, Zhihua Li, Ruby Ng, Sean R Llewellyn, et al. 2019. “Gut microbiota density influences host physiology and is shaped by host and microbial factors.” Elife 8 (January). https://doi.org/10.7554/eLife.40553.

David, Lawrence A, Arne C Materna, Jonathan Friedman, Maria I Campos-Baptista, Matthew C Blackburn, Allison Perrotta, Susan E Erdman, and Eric J Alm. 2014. “Host lifestyle affects human microbiota on daily timescales.” Genome Biol. 15 (7): R89. https://doi.org/10.1186/gb-2014-15-7-r89.

Galazzo, Gianluca, Niels van Best, Birke J. Benedikter, Kevin Janssen, Liene Bervoets, Christel Driessen, Melissa Oomen, et al. 2020. “How to Count Our Microbes? The Effect of Different Quantitative Microbiome Profiling Approaches.” Front. Cell. Infect. Microbiol. 10 (August). https://doi.org/10.3389/fcimb.2020.00403.

Gloor, Gregory B., Jean M. Macklaim, Vera Pawlowsky-Glahn, and Juan J. Egozcue. 2017. “Microbiome Datasets Are Compositional: And This Is Not Optional.” Front. Microbiol. 8 (November): 2224. https://doi.org/10.3389/fmicb.2017.02224.

Harrison, Joshua G., W. John Calder, Bryan Shuman, and C. Alex Buerkle. 2021. “The quest for absolute abundance: The use of internal standards for DNA‐based community ecology.” Mol. Ecol. Resour. 21 (1): 30–43. https://doi.org/10.1111/1755-0998.13247.

Ji, Brian W., Ravi U. Sheth, Purushottam D. Dixit, Yiming Huang, Andrew Kaufman, Harris H. Wang, and Dennis Vitkup. 2019. “Quantifying spatiotemporal variability and noise in absolute microbiota abundances using replicate sampling.” Nat. Methods 16 (8): 731–36. https://doi.org/10.1038/s41592-019-0467-y.

Karasov, Talia L., Manuela Neumann, Alejandra Duque-Jaramillo, Sonja Kersten, Ilja Bezrukov, Birgit Schröppel, Efthymia Symeonidi, et al. 2020. “The relationship between microbial population size and disease in the Arabidopsis thaliana phyllosphere.” bioRxiv. https://doi.org/10.1101/828814.

Kumar, M. Senthil, Eric V. Slud, Kwame Okrah, Stephanie C. Hicks, Sridhar Hannenhalli, and Héctor Corrada Bravo. 2018. “Analysis and correction of compositional bias in sparse sequencing count data.” BMC Genomics 19 (1): 799. https://doi.org/10.1186/s12864-018-5160-5.

Lloyd, Karen G., Jordan T. Bird, Joy Buongiorno, Emily Deas, Richard Kevorkian, Talor Noordhoek, Jacob Rosalsky, and Taylor Roy. 2020. “Evidence for a Growth Zone for Deep-Subsurface Microbial Clades in Near-Surface Anoxic Sediments.” Appl. Environ. Microbiol. 86 (19): 1–15. https://doi.org/10.1128/AEM.00877-20.

Lloyd, Karen G., Megan K. May, Richard T. Kevorkian, and Andrew D. Steen. 2013. “Meta-analysis of quantification methods shows that archaea and bacteria have similar abundances in the subseafloor.” Appl. Environ. Microbiol. 79 (24): 7790–99. https://doi.org/10.1128/AEM.02090-13.

Mandal, Siddhartha, Will Van Treuren, Richard A. White, Merete Eggesbø, Rob Knight, and Shyamal D. Peddada. 2015. “Analysis of composition of microbiomes: a novel method for studying microbial composition.” Microb. Ecol. Heal. Dis. 26 (1): 27663. https://doi.org/10.3402/mehd.v26.27663.

McLaren, Michael R, Amy D Willis, and Benjamin J Callahan. 2019. “Consistent and correctable bias in metagenomic sequencing experiments.” Elife 8 (September): 46923. https://doi.org/10.7554/eLife.46923.

Morella, Norma M., Shangyang Christopher Yang, Catherine A. Hernandez, and Britt Koskella. 2018. “Rapid quantification of bacteriophages and their bacterial hosts in vitro and in vivo using droplet digital PCR.” J. Virol. Methods 259 (May): 18–24. https://doi.org/10.1016/j.jviromet.2018.05.007.

Props, Ruben, Frederiek-Maarten Kerckhof, Peter Rubbens, Jo De Vrieze, Emma Hernandez Sanabria, Willem Waegeman, Pieter Monsieurs, Frederik Hammes, and Nico Boon. 2017. “Absolute quantification of microbial taxon abundances.” ISME J. 11 (2): 584–87. https://doi.org/10.1038/ismej.2016.117.

Rao, Chitong, Katharine Z. Coyte, Wayne Bainter, Raif S. Geha, Camilia R. Martin, and Seth Rakoff-Nahoum. 2021. “Multi-kingdom ecological drivers of microbiota assembly in preterm infants.” Nature 591 (7851): 633–38. https://doi.org/10.1038/s41586-021-03241-8.

Regalado, Julian, Derek S. Lundberg, Oliver Deusch, Sonja Kersten, Talia Karasov, Karin Poersch, Gautam Shirsekar, and Detlef Weigel. 2020. “Combining whole-genome shotgun sequencing and rRNA gene amplicon analyses to improve detection of microbe–microbe interaction networks in plant leaves.” ISME J., May, 823492. https://doi.org/10.1038/s41396-020-0665-8.

Silver, Nicholas, Steve Best, Jie Jiang, and Swee Lay Thein. 2006. “Selection of Housekeeping Genes for Gene Expression Studies in Human Reticulocytes Using Real-Time PCR.” BMC Mol. Biol. 7 (October): 33. https://doi.org/10.1186/1471-2199-7-33.

Stämmler, Frank, Joachim Gläsner, Andreas Hiergeist, Ernst Holler, Daniela Weber, Peter J. Oefner, André Gessner, and Rainer Spang. 2016. “Adjusting microbiome profiles for differences in microbial load by spike-in bacteria.” Microbiome 4 (1): 28. https://doi.org/10.1186/s40168-016-0175-0.

Tettamanti Boshier, Florencia A., Sujatha Srinivasan, Anthony Lopez, Noah G. Hoffman, Sean Proll, David N. Fredricks, and Joshua T. Schiffer. 2020. “Complementing 16S rRNA Gene Amplicon Sequencing with Total Bacterial Load To Infer Absolute Species Concentrations in the Vaginal Microbiome.” mSystems 5 (2): 1–14. https://doi.org/10.1128/mSystems.00777-19.

Tkacz, Andrzej, Marion Hortala, and Philip S. Poole. 2018. “Absolute quantitation of microbiota abundance in environmental samples.” Microbiome 6 (1): 110. https://doi.org/10.1186/s40168-018-0491-7.

Vandeputte, Doris, Gunter Kathagen, Kevin D’hoe, Sara Vieira-Silva, Mireia Valles-Colomer, João Sabino, Jun Wang, et al. 2017. “Quantitative microbiome profiling links gut community variation to microbial load.” Nature 551 (7681): 507. https://doi.org/10.1038/nature24460.

Wallace, Megan A., Kelsey A. Coffman, Clément Gilbert, Sanjana Ravindran, Gregory F. Albery, Jessica Abbott, Eliza Argyridou, et al. 2021. “The discovery, distribution, and diversity of DNA viruses associated with Drosophila melanogaster in Europe.” Virus Evol. 7 (1): 1–23. https://doi.org/10.1093/ve/veab031.

Zhang, Zhaojing, Yuanyuan Qu, Shuzhen Li, Kai Feng, Shang Wang, Weiwei Cai, Yuting Liang, et al. 2017. “Soil bacterial quantification approaches coupling with relative abundances reflecting the changes of taxa.” Sci. Rep. 7 (1): 1–11. https://doi.org/10.1038/s41598-017-05260-w.