The "some day" List

Stuff that I'd like to get to at some point that isn't important enough to prioritize now.

Development

  • Cram convenience/plotting functions into an R package for ease of maintenance/sharing.
  • Find a way to summarize lots of GO terms into a human-interpretable output that is both succinct and doesn't suck.
    • Fine tune an LLM and use RAG with the term descriptions?
    • Preferably from within R without relying on external setup to run models.
  • Write up CRISPRball for JOSS or such & submit.
    • Get public version live on DMZ VM.
  • Make IBET more performant.
    • Avoid complete re-plotting when adding labels.
    • Add option to automatically label highlighted genes/genesets.
  • Make the nf-core Cut & Run pipeline script for fragment length histograms more memory efficient: instead of building a new np.array on each iteration, append each iteration's values to a list and concatenate the list into a single array at the end.
    • This sucker takes like 500+ GB of memory for large datasets unnecessarily.
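
A minimal sketch of the list-then-concatenate idea from the last item above. This is not the pipeline's actual code: the function name, the shape of the input (an iterable of per-iteration length collections), and the int32 dtype are all assumptions for illustration. The point is that appending to a Python list avoids re-copying everything collected so far on every iteration, so the full array is only allocated once.

```python
import numpy as np


def collect_fragment_lengths(chunks, dtype=np.int32):
    """Gather fragment lengths from an iterable of chunks.

    Hypothetical helper: `chunks` stands in for whatever the real script
    produces on each loop iteration (e.g. lengths from one file or region).
    """
    parts = []  # cheap to append to; nothing is copied here
    for chunk in chunks:
        # Store each chunk as a compact array; int32 is an assumed dtype
        # that is plenty for fragment lengths and half the size of int64.
        parts.append(np.asarray(chunk, dtype=dtype))

    if not parts:
        return np.empty(0, dtype=dtype)

    # One allocation and one copy at the very end.
    return np.concatenate(parts)


if __name__ == "__main__":
    lengths = collect_fragment_lengths([[150, 200], [], [320]])
    print(lengths)
```

If only the histogram is needed, keeping running bin counts instead of the raw lengths would cut memory further, but that is a different change from the one described above.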

Data Processing

  • Typical params.yaml files for nf-core pipelines that point to our reference data, rather than having those paths crammed into configs.
  • Template notebooks for ATAC-seq.
  • Template notebooks for ChIP-seq/CnR.
  • Set up nf-core rnafusion for easy running on RNA-seq data.

Reference Data

  • Get genesets from: https://www.science.org/doi/10.1126/science.add7046
  • Make combined reference genomes for nf-core pipelines, e.g. mm10 + hg38. GTFs and FASTAs, with clear species indicators.
    • At least for RNA-seq, this takes advantage of the fractional counts based on probability (EM) from salmon for ambiguous reads.
    • Have clear scripts for creation.
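
A rough sketch of what the "clear species indicators" step of a creation script could look like. It assumes the indicator is a prefix on sequence names (e.g. hg38_chr1, mm10_chr1) applied identically to the FASTA headers and the GTF seqname column; the function names and the prefix scheme are hypothetical. Gene IDs/names inside the GTF attributes are left untouched here, which may or may not be enough depending on how downstream tools report genes.

```python
def prefix_fasta(lines, tag):
    """Yield FASTA lines with every sequence name prefixed by `tag`_."""
    for line in lines:
        if line.startswith(">"):
            # ">chr1 some description" -> ">hg38_chr1 some description"
            yield f">{tag}_{line[1:]}"
        else:
            yield line


def prefix_gtf(lines, tag):
    """Yield GTF lines with the seqname (first column) prefixed by `tag`_."""
    for line in lines:
        # Leave comment/header lines and blank lines as they are.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.split("\t")
        fields[0] = f"{tag}_{fields[0]}"
        yield "\t".join(fields)


if __name__ == "__main__":
    fasta = [">chr1 example\n", "ACGT\n"]
    gtf = ['chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "G1";\n']
    print("".join(prefix_fasta(fasta, "hg38")), end="")
    print("".join(prefix_gtf(gtf, "hg38")), end="")
```

Both functions work on any iterable of lines, so they can be fed open file handles for each species and the outputs written one after another into the combined FASTA and GTF.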

Organization

  • List all interactive reports/apps on wiki.