Run DiffBind analysis on a samplesheet

RunDiffBind performs a high-level differential binding analysis with DiffBind. Together with ProcessDBRs, it forms the crux of the ChIP-seq portion of this package.

RunDiffBind(outpath, samplesheet, txdb, dba = NULL,
  level = c("Treatment", "Condition", "Tissue", "Factor"), se = NULL,
  fdr.thresh = 0.05, fc.thresh = 2, block = NULL,
  heatmap.colors = NULL, heatmap.preset = NULL, reverse = FALSE,
  n.consensus = 2, breaks = c(seq(-3, -1.0001, length = 250), seq(-1,
  -0.1, length = 250), seq(-0.0999, 0.0999, length = 1), seq(0.1, 1, length
  = 250), seq(1.0001, 3, length = 250)), plot.enrich = TRUE,
  enrich.libs = c("GO_Molecular_Function_2018",
  "GO_Cellular_Component_2018", "GO_Biological_Process_2018",
  "KEGG_2019_Human", "Reactome_2016", "BioCarta_2016", "Panther_2016"),
  promoters = c(-2000, 2000), method = c("DESeq2", "edgeR"),
  scale.full = TRUE, flank.anno = TRUE, flank.dist = 5000)

Arguments

outpath	Path to directory to be used for output. Additional directories will be generated within this folder.
samplesheet	Path to samplesheet containing sample metadata.
txdb	`TxDb` object to use for annotation.
dba	DBA object as returned by `dba.count`, `dba.analyze` or from this function itself. If provided, `samplesheet` and `n.consensus` are ignored.
level	String defining variable of interest from `samplesheet`. Must be one of: "Treatment", "Condition", "Tissue", or "Factor".
se	Path to file containing consensus super enhancers (SEs), which will be used to annotate whether individual peaks fall within an SE.
fdr.thresh	Number or numeric vector indicating the false discovery rate (FDR) cutoff(s) to be used for determining "significant" differential binding. If multiple are given, multiple tables/plots will be generated using all combinations of `fdr.thresh` and `fc.thresh`.
fc.thresh	Number or numeric vector indicating the log2 fold-change cutoff(s) to be used for determining "significant" differential binding. If multiple are given, multiple tables/plots will be generated using all combinations of `fdr.thresh` and `fc.thresh`.
block	String or character vector defining the column(s) in `samplesheet` to use to block for unwanted variance, e.g. batch or technical effects. Each value must be one of: "Treatment", "Condition", "Tissue", or "Factor".
heatmap.colors	Character vector containing custom colors to use for heatmaps in hex (e.g. `c("#053061", "#f5f5f5", "#67001f")`).
heatmap.preset	String indicating which of the color presets to use in heatmaps. Available presets (low to high) are: "BuRd" (blue to red), "OrPu" (orange to purple), "BrTe" (brown to teal), "PuGr" (purple to green), and "BuOr" (sea blue to orange).
reverse	Boolean indicating whether to flip heatmap color scheme (high color will become low, etc).
n.consensus	Number of samples in which peaks must overlap for the peaks to be merged and included in the consensus peak set.
breaks	Vector of sequences to be used as breaks for signal heatmaps.
plot.enrich	Boolean indicating whether enrichment analyses for DBRs should be run and plotted for each comparison.
enrich.libs	Vector of valid `enrichR` libraries to test the genes against. Available libraries can be viewed with `listEnrichrDbs` from the `enrichR` package.
promoters	Numeric vector of length two giving how many base pairs upstream and downstream of the TSS should be used to define gene promoters.
method	String indicating method to be used for differential binding analysis. Can be "DESeq2" or "edgeR".
scale.full	Boolean indicating whether the full library size (total number of reads) for each sample is used for scaling normalization. If `FALSE`, the total number of reads present in the peaks for each sample is used (preferable if overall binding levels are expected to be similar between samples).
flank.anno	Boolean indicating whether flanking gene information for each peak should be retrieved. Useful for broad peaks and super enhancers.
flank.dist	Integer for distance from edges of peak to search for flanking genes. Ignored if `flank.anno = FALSE`.
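
The default `breaks` value is easier to reason about once it is evaluated. The following is a minimal sketch in base R that rebuilds the default from the usage above and inspects it; the narrower `custom.breaks` vector is a hypothetical alternative for illustration, not a recommended setting.

```r
# Rebuild the default `breaks` vector exactly as written in the usage section.
breaks <- c(seq(-3, -1.0001, length.out = 250),
            seq(-1, -0.1, length.out = 250),
            seq(-0.0999, 0.0999, length.out = 1),
            seq(0.1, 1, length.out = 250),
            seq(1.0001, 3, length.out = 250))

length(breaks)                          # total number of break points
range(breaks)                           # lowest and highest break
!is.unsorted(breaks, strictly = TRUE)   # TRUE if strictly increasing

# Hypothetical alternative: evenly spaced breaks over a narrower range.
custom.breaks <- seq(-2, 2, length.out = 101)
```

Any vector passed to `breaks` should be sorted in increasing order, as the default is.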

Value

A DBA object from dba.analyze.

Details

The default parameters should be an adequate starting point for most users. Users who aren't sure how stringent or lenient they need to be with their data can provide multiple thresholds to fdr.thresh and/or fc.thresh.
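
To get a sense of how many sets of tables/plots to expect when several thresholds are supplied, the combinations can be enumerated in base R. This is only a sketch of the "all combinations" rule described under Arguments; the threshold values are hypothetical, and how the function iterates internally is not shown here.

```r
# Hypothetical thresholds to explore; fc.thresh is on the log2 scale.
fdr.thresh <- c(0.05, 0.01)
fc.thresh <- c(1, 2)

# Every pairing of an FDR cutoff with a fold-change cutoff.
combos <- expand.grid(fdr.thresh = fdr.thresh, fc.thresh = fc.thresh)
print(combos)
nrow(combos)  # number of threshold combinations
```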

Providing the resulting DBA object as input to this function can be useful when running multiple times with different levels and blocks or thresholds, as it skips bam loading, which is by far the most time-intensive part.
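
A rough sketch of that workflow follows. The file paths, output directory names, and the choice of `TxDb.Hsapiens.UCSC.hg38.knownGene` as annotation are assumptions for illustration; the block only runs if `RunDiffBind` is available in the session and that annotation package is installed.

```r
# Sketch only: guarded so it is skipped when the prerequisites are missing.
if (exists("RunDiffBind") &&
    requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)) {
  txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene

  # First run: reads are loaded from the bam files, which is the slow step.
  res <- RunDiffBind(outpath = "diffbind_treatment",   # hypothetical path
                     samplesheet = "samplesheet.csv",  # hypothetical path
                     txdb = txdb, level = "Treatment")

  # Later run: pass the returned DBA object back in through `dba`.
  # `samplesheet` and `n.consensus` are ignored when `dba` is provided.
  res2 <- RunDiffBind(outpath = "diffbind_condition",  # hypothetical path
                      samplesheet = "samplesheet.csv",
                      txdb = txdb, dba = res,
                      level = "Condition", block = "Tissue")
}
```

Using a separate output directory per run keeps the results of different levels and blocks from being mixed together.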

It's generally best to provide an empty directory as the output path, as several directories will be generated.
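
One way to make sure the output path is empty before starting is sketched below in base R. The directory location is hypothetical (a temporary directory is used so the example is safe to run).

```r
# Hypothetical output location; replace with your own path.
outpath <- file.path(tempdir(), "diffbind_out")

# Refuse to continue if the directory already holds files.
existing <- list.files(outpath, all.files = TRUE, no.. = TRUE)
if (dir.exists(outpath) && length(existing) > 0) {
  stop("Output directory is not empty: ", outpath)
}

# Create the directory (and any missing parents) if it does not exist yet.
dir.create(outpath, recursive = TRUE, showWarnings = FALSE)
```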
