We perform the regression analysis of Leopold and Busby (2020) with and without bias correction, finding that there is negligible impact of bias correction on the results.
This document asks whether bias correction has a significant impact on the regression analysis of Leopold and Busby (2020).
This script follows the analysis of code/jsdModels.R
in https://github.com/dleopold/Populus_priorityEffects.
We start by running the setup of libraries and loading the phyloseq object and bias estimate. The code is modified from jsdModels.R
to adjust working directories for reading files.
library(tidyverse)
library(magrittr)
library(phyloseq)
library(mvabund)
library(gt)
this_dir <- getwd()
setwd(here('notebook/_data/leopold2020host/dleopold-Populus_priorityEffects-8594f7c/'))
source("code/Rfunctions.R")
# load phyloseq data
(phy <- loadPhyloseq())
phyloseq-class experiment-level object
otu_table() OTU Table: [ 8 taxa and 247 samples ]
sample_data() Sample Data: [ 247 samples by 16 sample variables ]
tax_table() Taxonomy Table: [ 8 taxa by 7 taxonomic ranks ]
refseq() DNAStringSet: [ 8 reference sequences ]
Bias is accounted for in the original regression analysis through the use of taxon and sample-specific offsets, stored in a matrix effort
. These offsets also account for the variation in read depth across samples. To perform the analysis with and without bias correction, we therefore compute two versions of the offsets, using the estimated bias vector and a dummy bias vector where the efficiencies are all set to 1.
bias_list <- list(
'Corrected' = bias,
'Uncorrected' = bias %>% mutate(Bhat = 1)
)
# 'effort' (offset) is calculated following jsdModels.R
effort_fun <- function(bias, phy) {
(sample_sums(unbias(phy, bias)) %*%
t(bias$Bhat[match(taxa_names(phy), bias$Taxon)])
) %>% log
}
effort_list <- bias_list %>%
map(effort_fun, phy = phy)
The offset calculation is done as in jsdModels.R
. In section ‘Quantification and statistical analysis’), Leopold and Busby (2020) explain
We accounted for 2 sources of unequal sampling effort, variable sampling depth and the species-specific sequencing biases, by including an offset term (\(effort\)) for each species \(i\) in each sample \(j\), in the form: \(effort_{ij} = \log(bias_{i} \times depth_{j})\), where \(bias_i\) is the sequencing bias correction factor for species \(i\), estimated from our mock communities (see above, Estimating Sequencing Bias), and \(depth_j\) is the total sum of all species in sample \(j\), after dividing each by their species-specific sequencing bias correction factor.
The call to sample_sums(unbias(phy, bias))
returns the \(depth_{j}\) terms as described in this paragraph.
To understand why this calculation yields the correct offset, consider that under our model (and in our notation) we have \[\begin{align} \log \text{reads}_i(a) &= \log \text{prop}_i (a) + \underbrace{\log \left[\frac{\text{efficiency}_i}{\text{efficiency}_S(a)} \cdot \text{reads}_S(a) \right]}_{\text{offset}}. \end{align}\] It can be shown with some algebra that \[\begin{align} \frac{\text{reads}_S(a)}{\text{efficiency}_S(a)} = \sum_i \frac{\text{reads}_i(a)}{\text{efficiency}_i}. \end{align}\] The left-hand side is the ratio of total reads to mean efficiency of the sample \(a\), and the right-hand side is the \(depth_{a}\) of sample \(a\) defined by Leopold and Busby (2020). It follows that the ‘effort’ terms computed by Leopold and Busby (2020) are indeed the offsets implied by our model.
We’ll wrap the analysis from jsdModels.R
in a function to call with the corrected and uncorrected ‘effort’ offsets. Toggle to see the code.
analyze_jsd_models <- function(effort, phy, nBoot=4999) {
##########################
### Fit genotype model ###
##########################
# Make mvabund object
mvDat <- otu_table(phy) %>% data.frame %>% mvabund
# Fit joint-species model for genotype effect
mv.full <- manyglm(mvDat ~ Genotype*Treatment,
offset=effort,
family="negative.binomial",
data=data.frame(sample_data(phy)))
# Check model assumptions
#plot(mv.full)
#meanvar.plot(mvDat~sample_data(phy)$Treatment)
#meanvar.plot(mvDat~sample_data(phy)$Genotype)
# Test with anova.manyglm
# Using unstructured correlation matrix and wald tests.
# Including univariate test with adjustment for multiple testing.
mv.anova <- anova(mv.full, nBoot=nBoot, p.uni="adjusted", cor.type="shrink", test="wald")
#> saveRDS(mv.anova, "output/rds/mv.genotype.rds")
########################
### Fit region model ###
########################
#' # Fit joint-species model for genotype effect
mv.region <- manyglm(mvDat ~ Region*Treatment,
offset=effort,
family="negative.binomial",
data=data.frame(sample_data(phy)))
#' ## Check model assumptions
#plot(mv.region)
#meanvar.plot(mvDat~sample_data(phy)$Region)
#' ## Test with anova.manyglm
#+ cache=T, results='asis'
mv.region.anova <- anova(mv.region, nBoot=nBoot, p.uni="adjusted", cor.type="shrink", test="wald")
#> saveRDS(mv.region.anova, "output/rds/mv.region.rds")
mv.results <- bind_cols(
mv.anova$table %>%
rownames_to_column() %>%
filter(rowname!='(Intercept)') %>%
mutate(rowname = gsub("Genotype","Host",rowname)),
mv.region.anova$table %>%
rownames_to_column() %>%
filter(rowname!='(Intercept)') %>%
select(-rowname),
# NOTE: [MRM] I set the name repair function to match the original behavior
.name_repair = function(x) make.unique(x, sep = '')
) %>%
gt(rowname_col = "rowname") %>%
tab_spanner(
label = "Genotype",
columns = vars(Res.Df, Df.diff, wald, 'Pr(>wald)')
) %>%
tab_spanner(
label = "Ecotype",
columns = vars(Res.Df1, Df.diff1, wald1, 'Pr(>wald)1')
) %>%
fmt_number(c(4,8),
decimals = 1) %>%
fmt(c(5,9),
fns = function(x) {
ifelse(x>=0.001,round(x,3),"< 0.001")
}) %>%
cols_label('Pr(>wald)'=md("*P*-value"),
wald=md("Wald-χ<sup>2<sup>"),
Res.Df=md("Df.resid"),
Df.diff="Df",
'Pr(>wald)1'=md("*P*-value"),
wald1=md("Wald-χ<sup>2<sup>"),
Res.Df1=md("Df.resid"),
Df.diff1="Df") %>%
cols_move_to_start(3) %>%
cols_move(7,5) %>%
cols_align("center")
#> gtsave(mv.results,"output/figs/jsdModels.png")
list(
mv.full = mv.full,
mv.anova = mv.anova,
mv.region = mv.region,
mv.region.anova = mv.region.anova,
mv.results = mv.results
)
}
Finally, we run the analysis. The call is wrapped in xfun::cache_rds()
to cache the results; the hash is based on the inputs and the analysis function’s source code.
Next we compare the results of the analysis with and without bias correction.
First, let’s compare the results summary tables.
With bias correction
Genotype | Ecotype | |||||||
---|---|---|---|---|---|---|---|---|
Df | Df.resid | Wald-χ2 | P-value | Df | Df.resid | Wald-χ2 | P-value | |
Host | 11 | 235 | 20.1 | < 0.001 | 1 | 245 | 12.4 | < 0.001 |
Treatment | 4 | 231 | 21.8 | < 0.001 | 4 | 241 | 21.6 | < 0.001 |
Host:Treatment | 44 | 187 | 23.5 | 0.038 | 4 | 237 | 6.1 | 0.298 |
Without bias correction
Genotype | Ecotype | |||||||
---|---|---|---|---|---|---|---|---|
Df | Df.resid | Wald-χ2 | P-value | Df | Df.resid | Wald-χ2 | P-value | |
Host | 11 | 235 | 20.3 | < 0.001 | 1 | 245 | 12.4 | < 0.001 |
Treatment | 4 | 231 | 21.5 | < 0.001 | 4 | 241 | 21.4 | < 0.001 |
Host:Treatment | 44 | 187 | 23.5 | 0.03 | 4 | 237 | 6.2 | 0.261 |
With bias correction
res[['Corrected']][['mv.full']]
Call: manyglm(formula = mvDat ~ Genotype * Treatment, family = "negative.binomial", data = data.frame(sample_data(phy)), offset = effort)
[1] "negative.binomial"
Nuisance Parameter(s) phi estimated by the PHI method.
Aureobasidium Trichoderma Fusarium Penicillium
0.164 0.167 0.085 0.684
Alternaria Cladosporium Dioszegia Epicoccum
0.016 0.104 0.189 0.089
Degrees of Freedom: 246 Total (i.e. Null); 187 Residual
Aureobasidium Trichoderma Fusarium Penicillium
2*log-likelihood: -2279.7 -1861.5 -4056.8 -1581.0
Residual Deviance: 256.4 324.6 250.4 305.9
AIC: 2401.7 1983.5 4178.8 1703.0
Alternaria Cladosporium Dioszegia Epicoccum
2*log-likelihood: -4059.2 -2800.9 -2386.3 -4014.2
Residual Deviance: 247.8 250.0 256.2 251.1
AIC: 4181.2 2922.9 2508.3 4136.2
Without bias correction
res[['Uncorrected']][['mv.full']]
Call: manyglm(formula = mvDat ~ Genotype * Treatment, family = "negative.binomial", data = data.frame(sample_data(phy)), offset = effort)
[1] "negative.binomial"
Nuisance Parameter(s) phi estimated by the PHI method.
Aureobasidium Trichoderma Fusarium Penicillium
0.176 0.177 0.067 0.678
Alternaria Cladosporium Dioszegia Epicoccum
0.021 0.110 0.200 0.091
Degrees of Freedom: 246 Total (i.e. Null); 187 Residual
Aureobasidium Trichoderma Fusarium Penicillium
2*log-likelihood: -2296.4 -1869.7 -3997.7 -1579.4
Residual Deviance: 256.6 321.8 249.7 306.1
AIC: 2418.4 1991.7 4119.6 1701.4
Alternaria Cladosporium Dioszegia Epicoccum
2*log-likelihood: -4130.5 -2813.6 -2399.7 -4019.2
Residual Deviance: 247.9 250.0 256.2 251.1
AIC: 4252.5 2935.6 2521.7 4141.2
Let’s compare the estimated coefficients for the various response variables.
tidy_manyglm <- function(x) {
x %>% coef %>%
as_tibble(rownames = 'term') %>%
pivot_longer(-term, names_to = 'response', values_to = 'estimate')
}
coef_ests <- res %>%
map('mv.full') %>%
map_dfr(tidy_manyglm, .id = 'type')
x <- coef_ests %>%
filter(term != '(Intercept)') %>%
pivot_wider(names_from = type, values_from = estimate)
mse <- x %>%
summarize(
mse = mean((Corrected - Uncorrected)^2)
) %>% .[[1]] %>% signif(2)
x %>%
ggplot(aes(Uncorrected, Corrected)) +
coord_fixed() +
geom_abline(color = 'darkred') +
geom_point() +
labs(title = 'Non-intercept coefficients, full model',
subtitle = str_glue('Mean squared difference: {mse}')
) +
theme(plot.title.position = 'plot') +
scale_color_brewer(type = 'qual', palette = 2)
With bias correction
res[['Corrected']][['mv.region']]
Call: manyglm(formula = mvDat ~ Region * Treatment, family = "negative.binomial", data = data.frame(sample_data(phy)), offset = effort)
[1] "negative.binomial"
Nuisance Parameter(s) phi estimated by the PHI method.
Aureobasidium Trichoderma Fusarium Penicillium
0.270 0.217 0.118 1.148
Alternaria Cladosporium Dioszegia Epicoccum
0.021 0.154 0.266 0.114
Degrees of Freedom: 246 Total (i.e. Null); 237 Residual
Aureobasidium Trichoderma Fusarium Penicillium
2*log-likelihood: -2398.9 -1901.2 -4140.6 -1674.6
Residual Deviance: 259.4 314.0 251.8 296.2
AIC: 2420.9 1923.2 4162.6 1696.6
Alternaria Cladosporium Dioszegia Epicoccum
2*log-likelihood: -4134.5 -2897.9 -2468.6 -4078.0
Residual Deviance: 247.9 252.0 257.7 251.9
AIC: 4156.5 2919.9 2490.6 4100.0
Without bias correction
res[['Uncorrected']][['mv.region']]
Call: manyglm(formula = mvDat ~ Region * Treatment, family = "negative.binomial", data = data.frame(sample_data(phy)), offset = effort)
[1] "negative.binomial"
Nuisance Parameter(s) phi estimated by the PHI method.
Aureobasidium Trichoderma Fusarium Penicillium
0.287 0.229 0.093 1.142
Alternaria Cladosporium Dioszegia Epicoccum
0.028 0.165 0.280 0.117
Degrees of Freedom: 246 Total (i.e. Null); 237 Residual
Aureobasidium Trichoderma Fusarium Penicillium
2*log-likelihood: -2413.6 -1909.5 -4078.9 -1673.6
Residual Deviance: 259.9 311.7 250.8 296.2
AIC: 2435.6 1931.5 4100.9 1695.6
Alternaria Cladosporium Dioszegia Epicoccum
2*log-likelihood: -4200.8 -2915.9 -2481.6 -4083.4
Residual Deviance: 248.1 252.4 258.1 252.0
AIC: 4222.8 2937.9 2503.6 4105.4
Let’s compare the estimated coefficients for the various response variables.
coef_ests <- res %>%
map('mv.region') %>%
map_dfr(tidy_manyglm, .id = 'type')
x <- coef_ests %>%
filter(term != '(Intercept)') %>%
pivot_wider(names_from = type, values_from = estimate)
mse <- x %>%
summarize(
mse = mean((Corrected - Uncorrected)^2)
) %>% .[[1]] %>% signif(2)
x %>%
ggplot(aes(Uncorrected, Corrected)) +
coord_fixed() +
geom_abline(color = 'darkred') +
geom_point() +
labs(title = 'Non-intercept coefficients, region model',
subtitle = str_glue('Mean squared difference: {mse}')
) +
theme(plot.title.position = 'plot') +
scale_color_brewer(type = 'qual', palette = 2)
Bias correction has little impact on the estimated coefficients for the non-intercept terms for either model.
sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────
setting value
version R version 4.1.2 (2021-11-01)
os Arch Linux
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2022-01-06
pandoc 2.14.1 @ /usr/bin/ (via rmarkdown)
─ Packages ────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
ade4 1.7-18 2021-09-16 [1] CRAN (R 4.1.1)
ape 5.5 2021-04-25 [1] CRAN (R 4.1.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
Biobase 2.52.0 2021-05-19 [1] Bioconductor
BiocGenerics 0.38.0 2021-05-19 [1] Bioconductor
biomformat 1.20.0 2021-05-19 [1] Bioconductor
Biostrings 2.60.1 2021-06-06 [1] Bioconductor
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
broom 0.7.10 2021-10-31 [1] CRAN (R 4.1.2)
bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.1)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.1)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.0)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.2)
cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
cluster 2.1.2 2021-04-17 [2] CRAN (R 4.1.2)
codetools 0.2-18 2020-11-04 [2] CRAN (R 4.1.2)
colorspace 2.0-2 2021-08-11 [1] R-Forge (R 4.1.1)
commonmark 1.7 2018-12-01 [1] CRAN (R 4.0.0)
cowplot * 1.1.1 2021-08-27 [1] Github (wilkelab/cowplot@555c9ae)
crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.1)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.4)
dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.0.5)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
distill 1.3 2021-10-13 [1] CRAN (R 4.1.1)
downlit 0.4.0 2021-10-29 [1] CRAN (R 4.1.2)
dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.4)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.4)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.0.4)
foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.3)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.1)
GenomeInfoDb 1.28.1 2021-07-01 [1] Bioconductor
GenomeInfoDbData 1.2.6 2021-05-31 [1] Bioconductor
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
glue 1.5.1 2021-11-30 [1] CRAN (R 4.1.2)
gt * 0.3.1 2021-08-07 [1] CRAN (R 4.1.1)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.1)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.0.5)
highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.1)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
igraph 1.2.9 2021-11-23 [1] CRAN (R 4.1.2)
IRanges 2.26.0 2021-05-19 [1] Bioconductor
iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.3)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3)
knitr 1.36 2021-09-29 [1] CRAN (R 4.1.1)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.3)
lattice 0.20-45 2021-09-22 [2] CRAN (R 4.1.2)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.1)
magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
MASS 7.3-54 2021-05-03 [2] CRAN (R 4.1.2)
Matrix 1.3-4 2021-06-01 [2] CRAN (R 4.1.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.2)
mgcv 1.8-38 2021-10-06 [2] CRAN (R 4.1.2)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.0)
multtest 2.48.0 2021-05-19 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
mvabund * 4.1.12 2021-05-28 [1] CRAN (R 4.1.2)
NADA 1.6-1.1 2020-03-22 [1] CRAN (R 4.0.1)
nlme 3.1-153 2021-09-07 [2] CRAN (R 4.1.2)
nvimcom * 0.9-102 2021-11-12 [1] local
patchwork * 1.1.1 2020-12-17 [1] CRAN (R 4.0.3)
permute 0.9-5 2019-03-12 [1] CRAN (R 4.0.0)
phyloseq * 1.36.0 2021-05-19 [1] Bioconductor
pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.0)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RCurl 1.98-1.5 2021-09-17 [1] CRAN (R 4.1.1)
readr * 2.1.1 2021-11-30 [1] CRAN (R 4.1.2)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.0)
reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.0)
rhdf5 2.36.0 2021-05-19 [1] Bioconductor
rhdf5filters 1.4.0 2021-05-19 [1] Bioconductor
Rhdf5lib 1.14.2 2021-07-06 [1] Bioconductor
rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
rmarkdown * 2.11 2021-09-14 [1] CRAN (R 4.1.1)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.3)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.1)
S4Vectors 0.30.0 2021-05-19 [1] Bioconductor
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
statmod 1.4.36 2021-05-10 [1] CRAN (R 4.1.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
survival 3.2-13 2021-08-24 [2] CRAN (R 4.1.2)
tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
tidyr * 1.1.4 2021-09-27 [1] CRAN (R 4.1.1)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
truncnorm 1.0-8 2018-02-27 [1] CRAN (R 4.0.1)
tweedie 2.3.3 2021-01-20 [1] CRAN (R 4.0.4)
tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.2)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
vegan 2.5-7 2020-11-28 [1] CRAN (R 4.0.3)
withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
xfun 0.28 2021-11-04 [1] CRAN (R 4.1.2)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.1.2)
XVector 0.32.0 2021-05-19 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
zCompositions 1.3.4 2020-03-04 [1] CRAN (R 4.0.1)
zlibbioc 1.38.0 2021-05-19 [1] Bioconductor
[1] /home/michael/.local/lib/R/library
[2] /usr/lib/R/library
───────────────────────────────────────────────────────────────────────────────