This function obtains a set of comparisons from a DESeq2 analysis, given a named list of contrasts. It allows additional model parameters to be specified and a design matrix to be manually adjusted.
Arguments
- dds
A SummarizedExperiment::SummarizedExperiment or DESeq2::DESeqDataSet object.
- contrasts
A named list of contrasts, e.g.
list("condition" = c("condition", "A", "B"))
. The first element is the variable of interest, the second is the test, and the third is the reference level. The name of each element in the list will be used as a name in the results table.- res.list
A named list to hold DESeq2 result tables. Allows the function to be run multiple times if needed and append to the same list. Defaults to an empty list.
- user.mat
A logical indicating whether a user-specified model matrix is provided. Defaults to FALSE.
- block
A vector of additional terms to be considered in the model, beyond the main effect. Defaults to NULL.
- design
The design formula or matrix. If a matrix is provided, ensure 'user.mat' is set to TRUE. Defaults to NULL.
- alpha
The significance level for hypothesis testing. Defaults to 0.05.
- lfc.th
A numeric vector of log2 fold-change thresholds. Defaults to `c(log2(1.25), log2(1.5))“.
- shrink.method
The method used for shrinkage estimation. Must be one of "apeglm", "ashr", or NULL. Defaults to "ashr".
- norm.ercc
A logical indicating whether to normalize to ERCC spike-ins.
- add.rowData
A vector of column names from the rowData slot of the DESeqDataSet to be added to the results table. Defaults to NULL.
- BPPARAM
The BiocParallelParam object specifying the parallel back-end to be used. Defaults to NULL.
Value
A named list of DESeq2::DESeqResults objects for the specified contrasts.
If add.rowData
is supplied, these will be returned as S4Vectors::DFrame objects instead.
Details
It is important to note that LFC shrinkage is independent of the typical MLE results table.
That is, if lfc.th
is provided, the results table p-values will reflect that testing threshold.
If shrink.method
is set to ashr
with lfc.th
, s-values will be returned in addition to the MLE p-values.
It is possible to have shrunken FCs that are near 0 that still have a significant p-value. This is frustrating, as it means you end up having to do post-hoc filtering on LFC or filter with s-values, which are more difficult to interpret than the adjusted p-values.
Examples
library(DESeq2)
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomicRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
dds_de <- makeExampleDESeqDataSet(n = 100, m = 12, betaSD = 2) # DE genes
rownames(dds_de) <- paste0("gene", 1001:1100)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12) # Non-DE genes
dds <- rbind(dds, dds_de)
dds <- DESeq(dds)
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
contrasts <- list("condition" = c("condition", "A", "B"))
res <- get_DESeq2_res(dds, contrasts)
#> Loading required package: ashr
#>
#> Attaching package: ‘ashr’
#> The following object is masked from ‘package:generics’:
#>
#> prune
#>
#> Design for condition_A_vs_B is ~condition
#> using pre-existing size factors
#> estimating dispersions
#> found already estimated dispersions, replacing these
#> gene-wise dispersion estimates
#> found already estimated gene-wise dispersions, removing these
#> mean-dispersion relationship
#> final dispersion estimates
#> found already estimated dispersions, removing these
#> fitting model and testing
#> using 'ashr' for LFC shrinkage. If used in published research, please cite:
#> Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
#> https://doi.org/10.1093/biostatistics/kxw041
#> using 'ashr' for LFC shrinkage. If used in published research, please cite:
#> Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
#> https://doi.org/10.1093/biostatistics/kxw041
#> computing FSOS 'false sign or small' s-values (T=0.322)
#> using 'ashr' for LFC shrinkage. If used in published research, please cite:
#> Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
#> https://doi.org/10.1093/biostatistics/kxw041
#> computing FSOS 'false sign or small' s-values (T=0.585)
names(res)
#> [1] "condition-shLFC" "condition" "condition-shLFC0.322"
#> [4] "condition-LFC0.322" "condition-shLFC0.585" "condition-LFC0.585"
head(res[[1]])
#> log2 fold change (MMSE): condition A vs B
#> Wald test p-value: condition A vs B
#> DataFrame with 6 rows and 5 columns
#> baseMean log2FoldChange lfcSE pvalue padj
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> gene1 4.12848 -0.01272617 0.2089038 0.764262 0.984393
#> gene2 17.06387 -0.01278438 0.1027523 0.539679 0.972929
#> gene3 178.18356 0.00789678 0.0810028 0.669902 0.974509
#> gene4 5.60499 0.02010562 0.1819637 0.568633 0.974509
#> gene5 90.75194 0.00999421 0.0755899 0.597016 0.974509
#> gene6 5.55006 -0.00378005 0.1580818 0.906002 0.992109