Genome statistics

NCBI genome lengths and 16S copy numbers

## # A tibble: 9 x 4
##   ncbi_species         ncbi_ssu_count ncbi_total_length ncbi_refseq_catego…
##   <chr>                         <dbl>             <dbl> <chr>              
## 1 Atopobium vaginae                 1           1449613 representative gen…
## 2 Atopobium vaginae                 1           1430526 representative gen…
## 3 Gardnerella vaginal…              2           1667350 reference genome   
## 4 Gardnerella vaginal…              2           1617545 representative gen…
## 5 Lactobacillus crisp…              4           2043161 representative gen…
## 6 Lactobacillus iners               1           1277649 representative gen…
## 7 Prevotella bivia                  4           2521238 representative gen…
## 8 Sneathia amnii                    3           1330224 representative gen…
## 9 Streptococcus agala…              7           2160267 reference genome

An NCBI annotation of a single 16S copy may simply mean that the genome assembly collapsed multiple 16S copies into one, so A. vaginae and L. iners in particular warrant further investigation.
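As a minimal sketch of how the suspect species could be flagged, the NCBI table above can be transcribed by hand and filtered for species whose assemblies all report a single copy. The data frame and its shortened column names (`species`, `ssu_count`) are introduced here for illustration only and are not the objects used in the original analysis.

```r
# NCBI annotations transcribed from the table above. Species names are
# written out in full; column names are shortened for this sketch.
ncbi <- data.frame(
  species = c("Atopobium vaginae", "Atopobium vaginae",
              "Gardnerella vaginalis", "Gardnerella vaginalis",
              "Lactobacillus crispatus", "Lactobacillus iners",
              "Prevotella bivia", "Sneathia amnii",
              "Streptococcus agalactiae"),
  ssu_count = c(1, 1, 2, 2, 4, 1, 4, 3, 7),
  stringsAsFactors = FALSE
)

# A species is suspect if none of its assemblies reports more than one copy.
max_count <- tapply(ncbi$ssu_count, ncbi$species, max)
suspect <- names(max_count)[max_count == 1]
print(suspect)
```

Using the per-species maximum rather than the first row means a species is only flagged when every available assembly reports a single copy.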

rrnDB: 16S copy number

## # A tibble: 3 x 6
##   ncbi_species                 n  mean median   min   max
##   <chr>                    <int> <dbl>  <dbl> <dbl> <dbl>
## 1 Gardnerella vaginalis        5  2         2     2     2
## 2 Sneathia amnii               1  3         3     3     3
## 3 Streptococcus agalactiae    57  6.82      7     5     8

These look like reliable numbers for these three species, and they agree with the NCBI annotations.
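The agreement can be checked directly. The sketch below hand-copies the rrnDB medians and the NCBI counts for the three shared species from the two tables above and joins them; the object and column names (`rrndb`, `ncbi`, `rrndb_median`, `agree`) are illustrative assumptions, not names from the original analysis.

```r
# rrnDB summary for the three species found there (values from the table above)
rrndb <- data.frame(
  species = c("Gardnerella vaginalis", "Sneathia amnii",
              "Streptococcus agalactiae"),
  rrndb_median = c(2, 3, 7),
  stringsAsFactors = FALSE
)

# NCBI 16S counts for the same species; G. vaginalis has two assemblies
ncbi <- data.frame(
  species = c("Gardnerella vaginalis", "Gardnerella vaginalis",
              "Sneathia amnii", "Streptococcus agalactiae"),
  ncbi_ssu_count = c(2, 2, 3, 7),
  stringsAsFactors = FALSE
)

# Collapse duplicate assemblies, join on species, and compare the two sources
cmp <- merge(rrndb, unique(ncbi), by = "species")
cmp$agree <- cmp$rrndb_median == cmp$ncbi_ssu_count
print(cmp)
```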

Yuan2012: 16S copy numbers

Yuan S, Cohen DB, Ravel J, Abdo Z, Forney LJ. 2012. Evaluation of Methods for the Extraction and Purification of DNA from the Human Microbiome. PLoS One 7:e33865.

They determined copy numbers for Atopobium vaginae and Lactobacillus iners by pulsed-field gel electrophoresis and found:

Species               16S CN
Atopobium vaginae     2
Lactobacillus iners   5

(see their Table 4 and Methods)

Check relatives in the rrnDB

## # A tibble: 2 x 2
## # Groups:   is.na(ncbi_species) [2]
##   `is.na(ncbi_species)`      n
##   <lgl>                  <int>
## 1 FALSE                 101078
## 2 TRUE                   24165
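For reference, a tally of this kind can be produced by counting the missing values in the species column. The sketch below uses a small made-up vector (`toy_species`) to show the pattern, then computes the share of records lacking an NCBI species from the two counts reported above.

```r
# Toy vector standing in for the ncbi_species column (hypothetical values)
toy_species <- c("Lactobacillus iners", NA, "Sneathia amnii", NA, NA)
toy_counts <- table(is.na(toy_species))
print(toy_counts)

# Share of records without an NCBI species, using the counts reported above
n_with_species <- 101078
n_missing <- 24165
frac_missing <- n_missing / (n_with_species + n_missing)
print(round(frac_missing, 3))
```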

Tree with tips that have an NCBI species

Lactobacillus

Get the subtree for the clade descending from the MRCA of L. crispatus and L. iners.

Let’s take a look at how L. iners and L. crispatus fall on the tree:

These groupings of L. iners and L. crispatus agree with those of Duar2017 (Figure 2).

Next, we will look for nearby species in the rrnDB. For L. iners, let’s define an “L. iners group” consisting of all species descending from the MRCA of L. iners and L. gasseri:

## [1] "Lactobacillus iners"       "Lactobacillus hominis"    
## [3] "Lactobacillus taiwanensis" "Lactobacillus johnsonii"  
## [5] "Lactobacillus gasseri"
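One way to define such a group is to find the MRCA node of the two anchor species and extract every tip below it. The sketch below assumes the `ape` package and uses a small hand-written Newick tree whose topology is hypothetical, chosen only so that the clade contains the five species listed above; it is not the tree used in this analysis.

```r
library(ape)

# Hypothetical toy tree; the topology is illustrative only
toy_tree <- read.tree(text = paste0(
  "((L_iners,((L_gasseri,L_johnsonii),(L_hominis,L_taiwanensis))),",
  "(L_crispatus,L_acidophilus));"
))

# Node number of the most recent common ancestor of the two anchor tips
mrca_node <- getMRCA(toy_tree, c("L_iners", "L_gasseri"))

# Subtree rooted at that node; its tip labels define the group
iners_clade <- extract.clade(toy_tree, mrca_node)
iners_group <- iners_clade$tip.label
print(iners_group)
```

Anchoring the group on two named species keeps the definition reproducible if the tree is later re-rooted or pruned, as long as both tips remain present.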

Check the copy numbers of these species in the rrnDB:

## # A tibble: 12 x 3
##    ncbi_scientific_name          x16s_gene_count evidence                  
##    <chr>                                   <dbl> <chr>                     
##  1 Lactobacillus gasseri                       4 Machine processing of NCB…
##  2 Lactobacillus gasseri ATCC 3…               6 Machine processing of NCB…
##  3 Lactobacillus gasseri DSM 14…               6 Machine processing of NCB…
##  4 Lactobacillus johnsonii                     7 Machine processing of NCB…
##  5 Lactobacillus johnsonii                     7 Machine processing of NCB…
##  6 Lactobacillus johnsonii                     7 Machine processing of NCB…
##  7 Lactobacillus johnsonii                     7 Machine processing of NCB…
##  8 Lactobacillus johnsonii                     7 Machine processing of NCB…
##  9 Lactobacillus johnsonii DPC …               4 Machine processing of NCB…
## 10 Lactobacillus johnsonii FI97…               4 Machine processing of NCB…
## 11 Lactobacillus johnsonii N6.2                4 Machine processing of NCB…
## 12 Lactobacillus johnsonii NCC …               6 Machine processing of NCB…
## # A tibble: 1 x 5
##       n  mean median   min   max
##   <int> <dbl>  <dbl> <dbl> <dbl>
## 1    12  5.75      6     4     7

These numbers are consistent with the copy number (CN) of 5 found in Yuan2012. Given that L. iners is quite distant from its relatives, I will go with the estimate of 5 for L. iners determined experimentally by Yuan2012.
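The summary statistics above can be reproduced from the twelve rrnDB counts listed for L. gasseri and L. johnsonii, together with a simple check that the Yuan2012 value lies within the observed range. The variable names are introduced here for illustration.

```r
# 16S gene counts for the 12 rrnDB records in the L. iners group,
# in the order they appear in the table above
iners_group_cn <- c(4, 6, 6, 7, 7, 7, 7, 7, 4, 4, 4, 6)

cn_summary <- c(
  n      = length(iners_group_cn),
  mean   = mean(iners_group_cn),
  median = median(iners_group_cn),
  min    = min(iners_group_cn),
  max    = max(iners_group_cn)
)
print(cn_summary)

# Experimental estimate for L. iners from Yuan2012
yuan_cn <- 5
within_range <- yuan_cn >= min(iners_group_cn) && yuan_cn <= max(iners_group_cn)
print(within_range)
```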

Now let’s do the same for L. crispatus, defining its group somewhat broadly to include all descendants of the MRCA of L. crispatus and L. acidophilus:

## [1] "Lactobacillus acidophilus"     "Lactobacillus gallinarum"     
## [3] "Lactobacillus helveticus"      "Lactobacillus crispatus"      
## [5] "Lactobacillus ultunensis"      "Lactobacillus kitasatonis"    
## [7] "Lactobacillus amylovorus"      "Lactobacillus kefiranofaciens"

Check the copy numbers of these species in the rrnDB:

## # A tibble: 26 x 3
##    ncbi_scientific_name