This method merges species that have the same taxonomy at a certain taxonomic rank. Its approach is analogous to tip_glom(), but uses categorical data instead of a tree. In principle, other categorical data known for all taxa could also be used in place of taxonomy, but for the moment, this must be stored in the taxonomyTable of the data. Also, columns/ranks to the right of the rank chosen for agglomeration will be replaced with NA, because they should be meaningless following agglomeration.

tax_glom(
  physeq,
  taxrank = rank_names(physeq)[1],
  NArm = TRUE,
  bad_empty = c(NA, "", " ", "\t"),
  reorder = FALSE
)

Arguments

physeq

(Required). phyloseq-class() or tax_table().

taxrank

A character string specifying the taxonomic level that you want to agglomerate over. Should be among the results of rank_names(physeq). The default value is rank_names(physeq)[1], which may agglomerate too broadly for a given experiment. You are strongly encouraged to try different values for this argument.
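
One way to follow the advice above is to agglomerate at a few ranks and compare how many taxa remain. The following is a minimal sketch, assuming the phyloseq and speedyseq packages are installed and using the GlobalPatterns dataset that ships with phyloseq; the names `ranks` and `n_taxa` are chosen here for illustration only.

```r
library(phyloseq)
library(speedyseq)

data(GlobalPatterns)

# Ranks to compare; each must be one of rank_names(GlobalPatterns).
ranks <- c("Phylum", "Class", "Family")
stopifnot(all(ranks %in% rank_names(GlobalPatterns)))

# Agglomerate at each rank and record the number of taxa that remain.
n_taxa <- sapply(ranks, function(r) {
  ntaxa(tax_glom(GlobalPatterns, taxrank = r))
})
print(n_taxa)
```

Broader ranks usually leave far fewer taxa, so a comparison like this can help decide whether a given rank is too coarse for the question at hand.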

NArm

(Optional). Logical, length equal to one. Default is TRUE. CAUTION. The decision to prune (or not) taxa for which you lack categorical data could have a large effect on downstream analysis. You may want to re-compute your analysis under both conditions, or at least think carefully about what the effect might be and the reasons explaining the absence of information for certain taxa. In the case of taxonomy, it is often a result of imprecision in taxonomic designation based on short phylogenetic sequences and a patchy system of nomenclature. If this seems to be an issue for your analysis, think about also trying the nomenclature-agnostic tip_glom() method if you have a phylogenetic tree available.
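
To gauge how much the pruning decision matters for a particular dataset, the agglomeration can be run under both settings and compared. This is a sketch under the assumption that phyloseq and speedyseq are installed; the object names `fam_pruned` and `fam_kept` are illustrative.

```r
library(phyloseq)
library(speedyseq)

data(GlobalPatterns)

# Agglomerate at Family, pruning taxa with no Family assignment (the default).
fam_pruned <- tax_glom(GlobalPatterns, taxrank = "Family", NArm = TRUE)
# Same agglomeration, but keeping taxa that lack a Family assignment.
fam_kept <- tax_glom(GlobalPatterns, taxrank = "Family", NArm = FALSE)

# Number of taxa under each setting.
ntaxa(fam_pruned)
ntaxa(fam_kept)

# Fraction of all reads that survive pruning.
sum(taxa_sums(fam_pruned)) / sum(taxa_sums(GlobalPatterns))
```

If the retained fraction of reads is low, a substantial part of the community is unclassified at the chosen rank, and downstream results may differ noticeably between the two settings.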

bad_empty

(Optional). Character vector. Default: c(NA, "", " ", "\t"). Defines the bad/empty values that should be ignored and/or considered unknown. They will be removed from the internal agglomeration vector derived from the argument to tax, and therefore agglomeration will not combine taxa according to the presence of these values in tax. Furthermore, the corresponding taxa can be optionally pruned from the output if NArm is set to TRUE.
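
The effect of bad_empty can be illustrated on a toy taxonomy matrix using base R only. This is a conceptual sketch, not the package's internal code; the matrix `tax` and the taxon names are made up for the example.

```r
# Default set of values treated as bad/empty.
bad_empty <- c(NA, "", " ", "\t")

# Hypothetical taxonomy table: 4 taxa, 2 ranks.
tax <- matrix(
  c("Bacteria", "Firmicutes",
    "Bacteria", NA,
    "Bacteria", "",
    "Archaea",  "Euryarchaeota"),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("otu", 1:4), c("Kingdom", "Phylum"))
)

# Flag taxa whose value at the chosen rank is bad/empty.
# %in% matches NA against NA, so missing values are flagged too.
is_bad <- tax[, "Phylum"] %in% bad_empty
rownames(tax)[is_bad]

# Only the remaining taxa carry a usable grouping value.
groups <- tax[!is_bad, "Phylum"]
groups
```

Here otu2 and otu3 are flagged, so they would not be grouped with each other or with any other taxon on the basis of their Phylum entry.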

reorder

Logical specifying whether to reorder the taxa by taxonomy strings or keep initial order. Ignored if physeq has a phylogenetic tree.

Value

A taxonomically agglomerated, optionally pruned object whose class matches the class of physeq.

Details

This is the speedyseq reimplementation of phyloseq::tax_glom(). It should produce results that are identical to phyloseq up to taxon order.

If physeq is a phyloseq object with a phylogenetic tree, then the new taxa will be ordered as they are in the tree. Otherwise, the taxa order can be controlled by the reorder argument, which behaves like the reorder argument in base::rowsum(). reorder = FALSE will keep taxa in the original order, determined by where the first member of each group appears in taxa_names(physeq); reorder = TRUE will order new taxa alphabetically according to taxonomy (string of concatenated rank values).
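
The analogy with base::rowsum() can be seen directly on a small count matrix. This sketch uses base R only; the matrix `counts` and the grouping vector `family` are invented for illustration and stand in for an OTU table and a column of the taxonomy table.

```r
# Toy abundance matrix: 4 taxa (rows) by 2 samples (columns).
counts <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("otu", 1:4), c("s1", "s2"))
)

# Hypothetical group (e.g. Family) for each taxon.
family <- c("Zeta", "Alpha", "Zeta", "Alpha")

# reorder = FALSE: groups appear in order of first occurrence.
keep_order <- rowsum(counts, group = family, reorder = FALSE)
rownames(keep_order)
#> [1] "Zeta"  "Alpha"

# reorder = TRUE: groups are sorted alphabetically.
sorted <- rowsum(counts, group = family, reorder = TRUE)
rownames(sorted)
#> [1] "Alpha" "Zeta"
```

In both cases the summed counts are the same; only the row order of the result differs.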

Acknowledgements: Documentation and general strategy derived from phyloseq::tax_glom().

Examples

data(GlobalPatterns)
# print the available taxonomic ranks
colnames(tax_table(GlobalPatterns))
#> [1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
# agglomerate at the Family taxonomic rank
(x1 <- tax_glom(GlobalPatterns, taxrank="Family"))
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 341 taxa and 26 samples ]:
#> sample_data() Sample Data:       [ 26 samples by 7 sample variables ]:
#> tax_table()   Taxonomy Table:    [ 341 taxa by 7 taxonomic ranks ]:
#> phy_tree()    Phylogenetic Tree: [ 341 tips and 340 internal nodes ]:
#> taxa are rows
# How many taxa before/after agglomeration?
ntaxa(GlobalPatterns); ntaxa(x1)
#> [1] 19216
#> [1] 341