RunDiffBind performs a high-level differential binding analysis with DiffBind. Together with ProcessDBRs, it forms the crux of the ChIP-seq portion of this package.

RunDiffBind(outpath, samplesheet, txdb, dba = NULL,
  level = c("Treatment", "Condition", "Tissue", "Factor"), se = NULL,
  fdr.thresh = 0.05, fc.thresh = 2, block = NULL,
  heatmap.colors = NULL, heatmap.preset = NULL, reverse = FALSE,
  n.consensus = 2, breaks = c(seq(-3, -1.0001, length = 250), seq(-1,
  -0.1, length = 250), seq(-0.0999, 0.0999, length = 1), seq(0.1, 1, length
  = 250), seq(1.0001, 3, length = 250)), plot.enrich = TRUE,
  enrich.libs = c("GO_Molecular_Function_2018",
  "GO_Cellular_Component_2018", "GO_Biological_Process_2018",
  "KEGG_2019_Human", "Reactome_2016", "BioCarta_2016", "Panther_2016"),
  promoters = c(-2000, 2000), method = c("DESeq2", "edgeR"),
  scale.full = TRUE, flank.anno = TRUE, flank.dist = 5000)

Arguments

outpath

Path to directory to be used for output. Additional directories will be generated within this folder.

samplesheet

Path to samplesheet containing sample metadata.

txdb

TxDb object to use for annotation.

dba

DBA object as returned by dba.count, dba.analyze or from this function itself. If provided, samplesheet and n.consensus are ignored.

level

String defining variable of interest from samplesheet. Must be one of: "Treatment", "Condition", "Tissue", or "Factor".

se

Path to file containing consensus super enhancers (SEs), which will be used to annotate whether individual peaks fall within an SE.

fdr.thresh

Number or numeric vector indicating the false discovery rate (FDR) cutoff(s) to be used for determining "significant" differential binding. If multiple values are given, tables and plots will be generated for every combination of fdr.thresh and fc.thresh.

fc.thresh

Number or numeric vector indicating the log2 fold-change cutoff(s) to be used for determining "significant" differential binding. If multiple values are given, tables and plots will be generated for every combination of fdr.thresh and fc.thresh.
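
As a rough illustration of what "every combination" means, the sketch below enumerates threshold pairs with expand.grid and counts how many regions pass each pair. The toy results table and the exact filter (FDR at or below the cutoff, absolute log2 fold change at or above the cutoff) are assumptions made for illustration, not a description of the package internals.

```r
# Two FDR cutoffs and two log2 fold-change cutoffs give four combinations.
fdr.thresh <- c(0.05, 0.01)
fc.thresh <- c(1, 2)
combos <- expand.grid(fdr.thresh = fdr.thresh, fc.thresh = fc.thresh)

# Toy results table (hypothetical values, not real data).
res <- data.frame(
  peak = paste0("peak", 1:5),
  Fold = c(2.5, -1.2, 0.4, -3.1, 1.1),
  FDR = c(0.001, 0.03, 0.2, 0.008, 0.04)
)

# Assumed filter: FDR <= cutoff and |log2 fold change| >= cutoff.
combos$n.sig <- mapply(function(fdr, fc) {
  sum(res$FDR <= fdr & abs(res$Fold) >= fc)
}, combos$fdr.thresh, combos$fc.thresh)

print(combos)
```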

block

String or character vector defining the column(s) in samplesheet to use to block for unwanted variance, e.g. batch or technical effects. Each value must be one of: "Treatment", "Condition", "Tissue", or "Factor".

heatmap.colors

Character vector containing custom colors to use for heatmaps in hex (e.g. c("#053061", "#f5f5f5", "#67001f")).

heatmap.preset

String indicating which of the color presets to use in heatmaps.

Available presets (low to high) are:

  • "BuRd" Blue to red.

  • "OrPu" Orange to purple.

  • "BrTe" Brown to teal.

  • "PuGr" Purple to green.

  • "BuOr" Sea blue to orange.

reverse

Boolean indicating whether to flip the heatmap color scheme (the high color becomes the low color, and vice versa).

n.consensus

Number of samples in which peaks must overlap for the peaks to be merged and included in the consensus peak set.
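
The idea behind this filter can be sketched with a toy presence matrix: a candidate region is kept when it is called in at least n.consensus samples. The matrix and peak names below are hypothetical, and the real overlap computation on genomic intervals is handled by DiffBind, not by this snippet.

```r
# Rows are candidate merged regions, columns are samples (toy data).
presence <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,
    FALSE, FALSE, TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("peak", 1:4), paste0("sample", 1:3))
)

n.consensus <- 2

# Count the samples supporting each region and keep those meeting the cutoff.
n.samples <- rowSums(presence)
consensus <- names(n.samples)[n.samples >= n.consensus]
print(consensus)
```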

breaks

Vector of sequences to be used as breaks for signal heatmaps.
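
The default value can be inspected directly in base R. The sketch below rebuilds it and pairs it with a palette interpolated from three hex colors. It assumes the heatmap uses one color per interval between consecutive breaks, which is how common heatmap functions treat breaks, and that reversing simply flips the palette; neither is confirmed here as the package's implementation.

```r
# Default breaks from the usage signature: dense sampling between -3 and 3,
# with a single value (-0.0999) contributed by the middle seq() call.
breaks <- c(seq(-3, -1.0001, length = 250), seq(-1, -0.1, length = 250),
            seq(-0.0999, 0.0999, length = 1),
            seq(0.1, 1, length = 250), seq(1.0001, 3, length = 250))
length(breaks)

# Assumption: one color per interval, so one fewer color than breaks.
pal <- colorRampPalette(c("#053061", "#f5f5f5", "#67001f"))(length(breaks) - 1)

# Assumption: a reversed scheme is the same palette in opposite order.
pal.reversed <- rev(pal)
```

Custom breaks should stay strictly increasing, as the default is.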

plot.enrich

Boolean indicating whether enrichment analyses for DBRs should be run and plotted for each comparison.

enrich.libs

Vector of valid enrichR libraries to test the genes against.

Available libraries can be viewed with listEnrichrDbs from the enrichR package.

promoters

Numeric vector of length two giving how many base pairs upstream and downstream of the TSS should be used to define gene promoters.
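
A minimal sketch of how such a window might be derived from TSS positions follows. The gene table is hypothetical, and the strand handling (the first element is the upstream offset, mirrored for minus-strand genes) is an assumption about the intended meaning, not something this page states.

```r
promoters <- c(-2000, 2000)

# Toy gene table (hypothetical coordinates).
genes <- data.frame(
  gene = c("geneA", "geneB"),
  tss = c(10000, 50000),
  strand = c("+", "-")
)

# Assumption: upstream means lower coordinates on "+" and higher on "-".
plus <- genes$strand == "+"
genes$prom.start <- ifelse(plus, genes$tss + promoters[1], genes$tss - promoters[2])
genes$prom.end <- ifelse(plus, genes$tss + promoters[2], genes$tss - promoters[1])

print(genes)
```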

method

String indicating method to be used for differential binding analysis. Can be "DESeq2" or "edgeR".

scale.full

Boolean indicating whether the full library size (total number of reads) for each sample is used for scaling normalization. If FALSE, the total number of reads present in the peaks for each sample is used (preferable if overall binding levels are expected to be similar between samples).
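
The difference between the two choices can be seen with a simplified scaling factor, library size divided by the mean library size. This is only an illustration with made-up read counts; the helper size_factors is hypothetical, and the normalization actually performed by DiffBind with DESeq2 or edgeR is more involved.

```r
# Hypothetical helper: relative scaling factors from the chosen library size.
size_factors <- function(total.reads, reads.in.peaks, scale.full = TRUE) {
  lib.size <- if (scale.full) total.reads else reads.in.peaks
  lib.size / mean(lib.size)
}

# Toy read counts: sample s2 was sequenced twice as deeply as s1.
total.reads <- c(s1 = 20e6, s2 = 40e6)
reads.in.peaks <- c(s1 = 2.0e6, s2 = 2.4e6)

sf.full <- size_factors(total.reads, reads.in.peaks, scale.full = TRUE)
sf.peaks <- size_factors(total.reads, reads.in.peaks, scale.full = FALSE)
print(rbind(full = sf.full, in.peaks = sf.peaks))
```

In this toy case the full library sizes differ twofold while the reads in peaks differ by only 20%, so the two settings would scale the samples quite differently.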

flank.anno

Boolean indicating whether flanking gene information for each peak should be retrieved. Useful for broad peaks and super enhancers.

flank.dist

Integer for distance from edges of peak to search for flanking genes. Ignored if flank.anno = FALSE.

Value

A DBA object from dba.analyze.

Details

The default parameters should be an adequate starting place for most users. Users who are unsure how stringent or lenient to be with their data can provide multiple thresholds to fdr.thresh and/or fc.thresh and compare the results.

Providing the resulting DBA object as input to this function can be useful when running it multiple times with different levels, blocks, or thresholds, as it skips BAM loading, which is by far the most time-intensive part.
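
A sketch of that reuse pattern is shown below. The samplesheet path, the output directories, and the choice of TxDb.Hsapiens.UCSC.hg38.knownGene are placeholders chosen for illustration. The calls are guarded so the snippet only runs when RunDiffBind, the samplesheet, and the annotation package are all available.

```r
# Placeholder inputs: replace with your own paths.
samplesheet <- "samplesheet.csv"
outpath <- "diffbind_out"

# Only attempt the analysis when everything it needs is present.
ready <- exists("RunDiffBind") &&
  file.exists(samplesheet) &&
  requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)

if (ready) {
  txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene

  # First run: loads the BAM files (slow) and returns a DBA object.
  res <- RunDiffBind(file.path(outpath, "treatment"), samplesheet, txdb,
                     level = "Treatment")

  # Later run: pass the DBA object back in to skip BAM loading, here with
  # a blocking column and two FDR cutoffs.
  res.blocked <- RunDiffBind(file.path(outpath, "treatment_blocked"),
                             samplesheet, txdb, dba = res,
                             level = "Treatment", block = "Condition",
                             fdr.thresh = c(0.05, 0.01))
}
```

Writing each run to its own empty directory keeps the generated subfolders from different settings apart.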

It's generally best to provide an empty directory as the output path, as several directories will be generated.

See also

dba, dba.count, dba.contrast, dba.analyze, dba.report for more about ChIP-seq differential binding analysis.

ProcessDBRs, for analyzing and visualizing the results.