├── CONTRIBUTING.md
├── LICENSE
└── README.md
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 |
3 | Your contributions are always welcome!
4 |
5 | * Open an issue with any suggestion/correction
6 | * Send a Pull Request
7 | * Contact me on Twitter @mikhaildozmorov or by e-mail mikhail.dozmorov@gmail.com
8 |
9 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Mikhail Dozmorov
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Notes on ChIP-seq and other-seq-related tools
2 |
3 | [License: MIT](https://opensource.org/licenses/MIT) [PRs Welcome](http://makeapullrequest.com)
4 |
5 | ChIP-seq- and ATAC-seq-related tools and genomics data analysis resources. Please [contribute and get in touch](CONTRIBUTING.md)! See [MDmisc notes](https://github.com/mdozmorov/MDmisc_notes) for other programming and genomics-related notes.
6 |
7 | # Table of contents
8 |
9 |
10 |
11 |
12 |
13 | - [Databases](#databases)
14 | - [Motif DBs](#motif-dbs)
15 | - [ChIP-seq](#chip-seq)
16 | - [ChIP-seq pipelines](#chip-seq-pipelines)
17 | - [Normalization](#normalization)
18 | - [CUT&RUN](#cutrun)
19 | - [Quality control](#quality-control)
20 | - [Peaks](#peaks)
21 | - [Enhancers](#enhancers)
22 | - [Visualization](#visualization)
23 | - [Intersections](#intersections)
24 | - [Motif analysis](#motif-analysis)
25 | - [Differential peak detection](#differential-peak-detection)
26 | - [Enrichment](#enrichment)
27 | - [Interpretation](#interpretation)
28 | - [Excludable](#excludable)
29 | - [DNAse-seq](#dnase-seq)
30 | - [ATAC-seq](#atac-seq)
31 | - [ATAC-seq pipelines](#atac-seq-pipelines)
32 | - [Histone-seq](#histone-seq)
33 | - [Broad peak analysis](#broad-peak-analysis)
34 | - [Technology](#technology)
35 | - [Machine learning](#machine-learning)
36 | - [Misc](#misc)
37 |
38 |
39 |
40 | ## Databases
41 |
42 | - [UniBind database](https://unibind.uio.no/) - TFBS predictions of approx. 56 million TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in > 1000 cell lines and tissues. Processed approx. 10,000 public ChIP-seq datasets from nine species using [ChIP-eat](https://bitbucket.org/CBGR/chip-eat/src/master/). ChIP-eat combines both computational (high PWM score) and experimental (centrality to ChIP-seq peak summit) support to find high-confidence direct TF-DNA interactions in a ChIP-seq experiment-specific manner, uses the DAMO tool. Input data - ReMap 2018 and GTRD. Robust and permissive collections. Over 197,000 Cis-regulatory modules. [Downloads](https://unibind.uio.no/downloads/) of BED, FASTA, PWMs, [Tracks for the UCSC GenomeBrowser](https://unibind.uio.no/genome-tracks/), [API](https://unibind.uio.no/api/), [Enrichment analysis, online](https://unibind.uio.no/enrichment/) with or without background, differential enrichment. [UniBind Enrichment BitBucket](https://bitbucket.org/CBGR/unibind_enrichment/src/master/).
43 | Paper
44 | Puig, Rafael Riudavets, Paul Boddie, Aziz Khan, Jaime Abraham Castro-Mondragon, and Anthony Mathelier. “UniBind: Maps of High-Confidence Direct TF-DNA Interactions across Nine Species” BMC Genomics, (December 2021) https://doi.org/10.1186/s12864-021-07760-6
45 |
46 | Gheorghe, Marius, Geir Kjetil Sandve, Aziz Khan, Jeanne Chèneby, Benoit Ballester, and Anthony Mathelier. “A Map of Direct TF–DNA Interactions in the Human Genome.” Nucleic Acids Research 47, no. 4 (February 28, 2019): e21–e21. https://doi.org/10.1093/nar/gky1210
47 |
48 |
49 | - [ReMap](https://remap.univ-amu.fr/) is an integrative analysis of Homo sapiens, Mus musculus and Arabidopsis thaliana transcriptional regulators from DNA-binding experiments such as ChIP-seq, ChIP-exo, DAP-seq from public sources (GEO, ENCODE, ENA). Human hg38 and Arabidopsis TAIR10. All peaks, non-redundant peaks, cis-Regulatory Modules. [GitHub](https://github.com/remap-cisreg). [Download](https://remap.univ-amu.fr/download_page) genomic coordinates.
50 | Paper
51 | Chèneby, Jeanne, Zacharie Ménétrier, Martin Mestdagh, Thomas Rosnet, Allyssa Douida, Wassim Rhalloussi, Aurélie Bergon, Fabrice Lopez, and Benoit Ballester. “[ReMap 2020: A Database of Regulatory Regions from an Integrative Analysis of Human and Arabidopsis DNA-Binding Sequencing Experiments](https://doi.org/10.1093/nar/gkz945).” Nucleic Acids Research, October 29, 2019
52 |
53 | Hammal, Fayrouz, Pierre de Langen, Aurélie Bergon, Fabrice Lopez, and Benoit Ballester. “ReMap 2022: A Database of Human, Mouse, Drosophila and Arabidopsis Regulatory Regions from an Integrative Analysis of DNA-Binding Sequencing Experiments.” Nucleic Acids Research, November 9, 2021, gkab996. https://doi.org/10.1093/nar/gkab996.
54 |
55 |
56 | - [ADASTRA](https://adastra.autosome.org) - the database of Allelic Dosage-corrected Allele-Specific human Transcription factor binding sites (over 500K sites across 1073 human TFs and 649 cell types, reprocessed data from [GTRD](#gtrd), pipeline at [GitHub](https://github.com/autosome-ru/ADASTRA-pipeline)) at nearly 270K SNPs. Background Allele Dosage (BAD) maps. Many SNPs overlap eQTLs.
57 | Paper
58 | Abramov, Sergey, Alexandr Boytsov, Daria Bykova, Dmitry D. Penzar, Ivan Yevshin, Semyon K. Kolmykov, Marina V. Fridman, et al. “Landscape of Allele-Specific Transcription Factor Binding in the Human Genome.” Nature Communications 12, no. 1 (December 2021): 2751. https://doi.org/10.1038/s41467-021-23007-0.
59 |
60 |
61 | - [ANANASTRA](https://ananastra.autosome.org/) - ANnotation and enrichment ANalysis of Allele-Specific TRAnscription factor binding at SNPs. Annotates a given list of SNPs with allele-specific binding events across a wide range of transcription factors and cell types using [ADASTRA](#adastra). Enrichment analysis of SNPs in cell type-specific TFBSs (Fisher's exact, one-sided). [API](https://ananastra.autosome.org/api/v4/).
62 | Paper
63 | Boytsov, Alexandr, Sergey Abramov, Ariuna Z Aiusheeva, Alexandra M Kasianova, Eugene Baulin, Ivan A Kuznetsov, Yurii S Aulchenko, et al. “ANANASTRA: Annotation and Enrichment Analysis of Allele-Specific Transcription Factor Binding at SNPs.” Nucleic Acids Research, April 21, 2022, gkac262. https://doi.org/10.1093/nar/gkac262.
64 |
65 |
66 | - [Catchitt](http://jstacs.de/index.php/Catchitt) - method for predicting TFBSs, leader of ENCODE-DREAM challenge. Other methods - table in supplementary. AUPRC to benchmark performance. DNAse-seq is the best predictor, RNA-seq and sequence-based features are not informative. Java implementation, predicted peaks for 32 transcription factors in 22 primary cell types and tissues (682 total) BED hg19 files, conservative and relaxed predictions, [download](https://www.synapse.org/#!Synapse:syn11526239/wiki/497341).
67 | Paper
68 | Keilwagen, Jens, Stefan Posch, and Jan Grau. “Accurate Prediction of Cell Type-Specific Transcription Factor Binding.” Genome Biology 20, no. 1 (December 2019). https://doi.org/10.1186/s13059-018-1614-y
69 |
70 |
71 | - [C4S DB](https://c4s.site/) - Comprehensive Collection and Comparison for ChIP-Seq Database. Over 16K human ChIP-seq experiments. Data aligned to GRCh37 (hs37d5) genome. "Gene browser" and "global similarity" views. Search for gene symbol, tissue/cell line, ChIP target, sample description/ID.
72 | Paper
73 | Anzawa, Hayato, and Kengo Kinoshita. “C4S DB: Comprehensive Collection and Comparison for ChIP-Seq Database.” Journal of Molecular Biology, May 2023, 168157. https://doi.org/10.1016/j.jmb.2023.168157.
74 |
75 |
76 | - [RAEdb](http://www.computationalbiology.cn/RAEdb/index.php) - enhancer database. Enhancers identified from STARR-seq and MPRA studies. Epromoters - promoters containing enhancers. Human (hg38)/mouse (mm10) data, select cell lines. BED/FASTQ download. Links to EnhancerAtlas, VISTA, SuperEnhancer databases.
77 | Paper
78 | Cai, Zena, Ya Cui, Zhiying Tan, Gaihua Zhang, Zhongyang Tan, Xinlei Zhang, and Yousong Peng. “RAEdb: A Database of Enhancers Identified by High-Throughput Reporter Assays.” Database: The Journal of Biological Databases and Curation 2019 (January 1, 2019). https://doi.org/10.1093/database/bay140.
79 |
80 |
81 | - [ChIP-Atlas](http://chip-atlas.org/) - a large database and analysis suite of public ChIP-seq and DNAse-seq experiments (Over 76K experiments, SRA uniformly processed data). Analyses: **Visualization** of peaks in IGV browser, BED file download, **Target genes** identification, **Colocalization** of factors (antigens), **Enrichment analysis** - permutation enrichment of BED regions, with custom background possible. [GitHub](https://github.com/shinyaoki/chipatlas/tree/master/sh), [Documentation](https://github.com/inutano/chip-atlas/wiki).
82 | Paper
83 | Oki, Shinya, Tazro Ohta, Go Shioi, Hideki Hatanaka, Osamu Ogasawara, Yoshihiro Okuda, Hideya Kawaji, Ryo Nakaki, Jun Sese, and Chikara Meno. “ChIP‐Atlas: A Data‐mining Suite Powered by Full Integration of Public ChIP‐seq Data” EMBO Reports, (December 2018) https://doi.org/10.15252/embr.201846255
84 |
85 |
86 | - [TRRUST](https://www.grnpedia.org/trrust/) database (Transcriptional Regulatory Relationships Unraveled by Sentence-based Text mining). Over 8K regulatory interactions for 800 TFs in human, and over 6K interactions for 828 mouse TFs. Mouse and human TF regulatory networks overlap, complement each other. More information than in PAZAR, TFactS, TRED, TFe databases. [Download](https://www.grnpedia.org/trrust/downloadnetwork.php), TSV format. [Tools](https://www.grnpedia.org/trrust/Network_search_form.php): 1. Search a gene, 2. Enrichment of key regulators for query genes.
87 | Paper
88 | Han, Heonjong, Jae-Won Cho, Sangyoung Lee, Ayoung Yun, Hyojin Kim, Dasom Bae, Sunmo Yang, et al. “TRRUST v2: An Expanded Reference Database of Human and Mouse Transcriptional Regulatory Interactions.” Nucleic Acids Research 46, no. D1 (January 4, 2018): D380–86. https://doi.org/10.1093/nar/gkx1013.
89 |
90 |
91 | - [GTRD](http://gtrd.biouml.org) - transcription factor binding sites and data (ChIP-seq, ChIP-exo, DNAse-seq, MNase-seq, ATAC-seq, RNA-seq), uniformly processed, over 35K experiments. Seven species, TFs linked to [CIS-BP](#cisbp). All cell types are assigned ontology terms. Experiment search, processed data/peaks download (BED, bigBed, bigWig).
92 | Paper
93 | Yevshin, Ivan, Ruslan Sharipov, Tagir Valeev, Alexander Kel, and Fedor Kolpakov. “GTRD: A Database of Transcription Factor Binding Sites Identified by ChIP-Seq Experiments.” Nucleic Acids Research 45, no. D1 (January 4, 2017): D61–67. https://doi.org/10.1093/nar/gkw951.
94 |
95 | Kolmykov, Semyon, Ivan Yevshin, Mikhail Kulyashov, Ruslan Sharipov, Yury Kondrakhin, Vsevolod J Makeev, Ivan V Kulakovskiy, Alexander Kel, and Fedor Kolpakov. “GTRD: An Integrated View of Transcription Regulation.” Nucleic Acids Research 49, no. D1 (January 8, 2021): D104–11. https://doi.org/10.1093/nar/gkaa1057.
96 |
97 |
98 | - [Cistrome DB v3.0](http://db3.cistrome.org/browser) - a resource of ChIP-seq, ATAC-seq and DNase-seq data from humans and mice. One-page interface to search by target gene and cell type, by genomic region, find similar BED sets for the uploaded BED.
99 | Paper
100 | Taing, Len, Ariaki Dandawate, Sehi L’Yi, Nils Gehlenborg, Myles Brown, and Clifford A Meyer. “Cistrome Data Browser: Integrated Search, Analysis and Visualization of Chromatin Data.” Nucleic Acids Research, November 16, 2023, gkad1069. https://doi.org/10.1093/nar/gkad1069.
101 |
102 |
103 | - [Cistrome DB](http://cistrome.org/db/#/) - ChIP-seq peaks for TFs, histone modifications, DNAse/ATAC. Downloadable cell type-specific, hg38 BED files. [Toolkit](http://dbtoolkit.cistrome.org/) to answer questions like "What factors regulate your gene of interest?", "What factors bind in your interval?", "What factors have a significant binding overlap with your peak set?"
104 | Paper
105 | Zheng R, Wan C, Mei S, Qin Q, Wu Q, Sun H, Chen CH, Brown M, Zhang X, Meyer CA, Liu XS. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res, 2018 Nov 20. https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky1094/5193328
106 |
107 | Mei S, Qin Q, Wu Q, Sun H, Zheng R, Zang C, Zhu M, Wu J, Shi X, Taing L, Liu T, Brown M, Meyer CA, Liu XS. Cistrome data browser: a data portal for ChIP-Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res, 2017 Jan 4;45(D1):D658-D662. https://academic.oup.com/nar/article/45/D1/D658/2333932
108 |
109 |
110 | - [CODEX ChIP-seq](http://codex.stemcells.cam.ac.uk/) - CODEX provides access to processed and curated NGS experiments, including ChIP-Seq (transcription factors and histones), RNA-Seq and DNase-Seq. Human, mouse. Download tracks, analyze correlations, motifs, compare between organisms, more.
111 | Paper
112 | Sánchez-Castillo, Manuel, David Ruau, Adam C. Wilkinson, Felicia S.L. Ng, Rebecca Hannah, Evangelia Diamanti, Patrick Lombard, Nicola K. Wilson, and Berthold Gottgens. “CODEX: A Next-Generation Sequencing Experiment Database for the Haematopoietic and Embryonic Stem Cell Communities.” Nucleic Acids Research, Database Issue, September 2014. https://doi.org/10.1093/nar/gku895
113 |
114 |
115 | - [hTFtarget](http://bioinfo.life.hust.edu.cn/hTFtarget/) - database of TF-gene target regulations from >7K human ChIP-seq experiments.
116 |
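The ANANASTRA entry above mentions enrichment analysis of SNPs in cell type-specific TFBSs with a one-sided Fisher's exact test. As a rough illustration of that test only, the sketch below computes the hypergeometric upper-tail p-value for a 2x2 table using the Python standard library. The function name, the table layout, and the example counts are hypothetical and are not taken from ANANASTRA or any other tool listed here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the 2x2 table

        [[a, b],    a: query SNPs overlapping the binding sites
         [c, d]]    b: query SNPs not overlapping
                    c: background SNPs overlapping
                    d: background SNPs not overlapping

    Returns P(X >= a) under the hypergeometric null, where X is the
    number of overlapping SNPs among the query SNPs.
    """
    n_query = a + b          # SNPs drawn (query set size)
    n_hits = a + c           # overlapping SNPs in the whole universe
    total = a + b + c + d    # universe size
    upper = min(n_query, n_hits)
    tail = sum(
        comb(n_hits, k) * comb(total - n_hits, n_query - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_query)


# Hypothetical counts: 8 of 10 query SNPs overlap, 1 of 6 background SNPs do.
p_value = fisher_exact_greater(8, 2, 1, 5)
print(f"one-sided p = {p_value:.4f}")  # ≈ 0.0245
```

The tail is summed with exact integer arithmetic, so the only rounding happens in the final division. For genome-scale counts a statistics library would be the more practical choice.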
117 | ### Motif DBs
118 |
119 | - [CIS-BP](http://cisbp.ccbr.utoronto.ca/index.php) (The Catalog of Inferred Sequence Binding Preferences) - database of inferred sequence binding preferences. DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. PBM microarray assays to analyze TF binding preferences. Closely related DBDs (70% Amino Acid identity) almost always have very similar DNA sequence preferences, enabling inference of motifs for approx. 34% of the 70,000 known or predicted eukaryotic TFs. Tools to scan single sequence for TF binding, two sequences for differential TF binding (including SNP effect scan), protein scan, motif scan. Bulk download of PWMs, protein sequences, TF information, logos.
120 | Paper
121 | Weirauch, Matthew T., Ally Yang, Mihai Albu, Atina G. Cote, Alejandro Montenegro-Montero, Philipp Drewe, Hamed S. Najafabadi, et al. “Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity.” Cell 158, no. 6 (September 2014): 1431–43. https://doi.org/10.1016/j.cell.2014.08.009.
122 |
123 |
124 | - [HOCOMOCO](https://hocomoco11.autosome.org/) (Homo sapiens comprehensive model collection) - TFBS models and PWMs. Human- and mouse-specific models. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices. Uniformly processed data from [GTRD](#gtrd), peaks called with four peak callers. Used [ChIPMunk](https://autosome.org/ChIPMunk/) in four computational models, including using DNA shape. Added [MoLoTool](https://molotool.autosome.org/), a web app to scan DNA sequences for TFBSs with PWMs. One model per TF is manually selected. Twice as many models as in JASPAR.
125 | Paper
126 | Kulakovskiy, Ivan V., Yulia A. Medvedeva, Ulf Schaefer, Artem S. Kasianov, Ilya E. Vorontsov, Vladimir B. Bajic, and Vsevolod J. Makeev. “HOCOMOCO: A Comprehensive Collection of Human Transcription Factor Binding Sites Models.” Nucleic Acids Research 41, no. D1 (January 1, 2013): D195–202. https://doi.org/10.1093/nar/gks1089.
127 |
128 | Kulakovskiy, Ivan V., Ilya E. Vorontsov, Ivan S. Yevshin, Anastasiia V. Soboleva, Artem S. Kasianov, Haitham Ashoor, Wail Ba-alawi, et al. “HOCOMOCO: Expansion and Enhancement of the Collection of Transcription Factor Binding Sites Models.” Nucleic Acids Research 44, no. D1 (January 4, 2016): D116–25. https://doi.org/10.1093/nar/gkv1249.
129 |
130 | Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Ruslan N Sharipov, Alla D Fedorova, Eugene I Rumynskiy, Yulia A Medvedeva, et al. “HOCOMOCO: Towards a Complete Collection of Transcription Factor Binding Models for Human and Mouse via Large-Scale ChIP-Seq Analysis.” Nucleic Acids Research 46, no. D1 (January 4, 2018): D252–59. https://doi.org/10.1093/nar/gkx1106.
131 |
132 |
133 | - [SwissRegulon](https://swissregulon.unibas.ch/sr/) - a database of regulatory motifs (PWMs) across model organisms (prokaryotes, eukaryotes). Data partly comes from JASPAR and TRANSFAC, reprocessing of ChIP-seq experiments. GBrowse for browsing TFBSs. Other tools.
134 | Paper
135 | Pachkov, Mikhail, Piotr J. Balwierz, Phil Arnold, Evgeniy Ozonov, and Erik van Nimwegen. “SwissRegulon, a Database of Genome-Wide Annotations of Regulatory Sites: Recent Updates.” Nucleic Acids Research 41, no. D1 (November 23, 2012): D214–20. https://doi.org/10.1093/nar/gks1145.
136 |
137 |
138 | - [tangermeme](https://github.com/jmschrei/tangermeme) - Python interface to the MEME suite
139 |
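The motif databases above distribute position weight matrices (PWMs), and tools such as MoLoTool scan DNA sequences with them. The sketch below shows the basic idea of a log-odds PWM scan along one strand in plain Python. The toy count matrix, the pseudocount, the uniform background, and the function names are all assumptions made for illustration; they do not reproduce the scoring used by HOCOMOCO, CIS-BP, or MoLoTool.

```python
from math import log2


def counts_to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2(frequency / background) scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return a list of (start, score)."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical 4-bp motif strongly preferring "TATA".
toy_counts = [
    {"T": 20},
    {"A": 20},
    {"T": 20},
    {"A": 20},
]
hits = scan("GGCTATAAGC", counts_to_log_odds(toy_counts))
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

A full scanner would also score the reverse complement and compare scores to a threshold derived from a background model. Both are left out to keep the sketch short.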
140 | ## ChIP-seq
141 |
142 | - [ChIP-seq-analysis](https://github.com/crazyhottommy/ChIP-seq-analysis) - ChIP-seq analysis notes from Ming Tang
143 |
144 | ### ChIP-seq pipelines
145 |
146 | - [ChIP-AP](https://github.com/JSuryatenggara/ChIP-AP) - ChIP-seq analysis pipeline integrating multiple tools and peak callers (FastQC, Clumpify and BBDuk from the BBMap Suite, Trimmomatic, BWA, Samtools, deepTools, MACS2, GEM, SICER2, HOMER, Genrich, IDR, and the MEME-Suite). QC, cleanup, alignment, peak-calling, pathway analysis. High-confidence peaks based on overlaps by different peak callers. Input - single- or paired-end FASTQ files or aligned BAM files. Conda installable. Command line and GUI. [Documentation](https://github.com/JSuryatenggara/ChIP-AP/wiki/ChIP-AP-Guide).
147 | Paper
148 | Suryatenggara, Jeremiah, Kol Jia Yong, Danielle E. Tenen, Daniel G. Tenen, and Mahmoud A. Bassal. "ChIP-AP: an integrated analysis pipeline for unbiased ChIP-seq analysis." Briefings in Bioinformatics 23, no. 1 (January 2022) https://doi.org/10.1093/bib/bbab537
149 |
150 |
151 | - Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data. http://www.regulatory-genomics.org, https://github.com/CostaLab/reg-gen
152 | - HINT (Hmm-based IdeNtification of Transcription factor footprints) is a framework for detection of DNA footprints from DNase-Seq and histone modification ChIP-Seq data.
153 | - Motif Analysis tools allow the search of motifs with binding sites enriched in particular genomic regions
154 | - ODIN and THOR are HMM-based approaches to detect and analyse differential peaks in pairs of ChIP-seq data.
155 | - RGT-Viz is a collection of tests for association analysis and tools for visualization of genomic data such as files in BED and BAM format
156 | - Triplex Domain Finder (TDF) statistically characterizes the triple helix potential of RNA and DNA regions.
157 |
158 | - ENCODE3 pipeline v1 specifications, https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#heading=h.9ecc41kilcvq
159 |
160 | - [CHIPS](https://github.com/liulab-dfci/chips) - A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data (ChIP-seq, ATAC-seq). Alignment, extensive QC, peak calling, downstream analysis (annotation, motif finding, putative targets). Generates an HTML report, plotly interactive plots. Distributed as a Conda recipe.
161 | - Taing, Len, Gali Bai, Clara Cousins, Paloma Cejas, Xintao Qiu, Zachary T. Herbert, Myles Brown, et al. “[CHIPS: A Snakemake Pipeline for Quality Control and Reproducible Processing of Chromatin Profiling Data](https://doi.org/10.12688/f1000research.52878.1).” F1000Research, (June 30, 2021)
162 |
163 | - `AQUAS` ChIP-seq processing pipeline - The AQUAS pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications, https://github.com/kundajelab/chipseq_pipeline
164 |
165 | - `Crunch` - Completely Automated Analysis of ChIP-seq Data, http://crunch.unibas.ch/, https://www.biorxiv.org/content/early/2016/03/09/042903
166 |
167 | - `ChiLin` - QC, peak calling, motif analysis for ChIP-seq and DNAse-seq data used by CistromeDb. References to other tools. https://github.com/cfce/chilin
168 | - Qin, Qian, Shenglin Mei, Qiu Wu, Hanfei Sun, Lewyn Li, Len Taing, Sujun Chen, et al. “ChiLin: A Comprehensive ChIP-Seq and DNase-Seq Quality Control and Analysis Pipeline.” BMC Bioinformatics 17, no. 1 (October 3, 2016): 404. https://doi.org/10.1186/s12859-016-1274-4.
169 |
170 | - `ChIP-eat` - a pipeline for aligning reads, calling peaks, predicting TFBSs. https://bitbucket.org/CBGR/chip-eat/src/master/
171 | - Gheorghe, Marius, Geir Kjetil Sandve, Aziz Khan, Jeanne Chèneby, Benoit Ballester, and Anthony Mathelier. “A Map of Direct TF–DNA Interactions in the Human Genome.” Nucleic Acids Research 47, no. 4 (February 28, 2019): e21–e21. https://doi.org/10.1093/nar/gky1210.
172 |
173 | - `ChIPLine` - a pipeline for ChIP-seq analysis, https://github.com/ay-lab/ChIPLine
174 |
175 | #### Normalization
176 |
177 | - [BAMscale](https://github.com/ncbi/BAMscale) - BAMscale is a one-step tool for either 1) quantifying and normalizing the coverage of peaks or 2) generating scaled BigWig files for easy visualization of commonly used DNA-seq capture based methods.
178 |
179 | - [CHIPIN](https://github.com/BoevaLab/CHIPIN) - ChIP-seq Intersample Normalization using gene expression. Assumption - non-differential genes should have non-differential peaks.
180 |
181 | - `S3norm` - ChIP-seq normalization to sequencing depth AND signal-to-noise ratio to the common reference. Negative Binomial for modeling background, convert counts to -log10(p-values), use monotonic nonlinear model to match the means of the common peaks and backgrounds in two datasets. https://github.com/guanjue/S3norm
182 | - Xiang, Guanjue, Cheryl Keller, Belinda Giardine, Lin An, Ross Hardison, and Yu Zhang. “S3norm: Simultaneous Normalization of Sequencing Depth and Signal-to-Noise Ratio in Epigenomic Data.” BioRxiv, January 1, 2018, 506634. https://doi.org/10.1101/506634.
183 |
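S3norm is described above as modeling background with a Negative Binomial (NB) and converting counts to -log10(p-values). The sketch below illustrates only that conversion step in plain Python, fitting the NB to background bin counts by the method of moments. The moment-based fit, the function names, and the toy counts are assumptions for illustration and are not S3norm's implementation. The monotonic nonlinear matching of peak and background means is not shown.

```python
from math import exp, lgamma, log, log10


def fit_nb_moments(background_counts):
    """Method-of-moments NB fit.

    Returns (r, p) in the parameterization with mean = r * (1 - p) / p
    and variance = mean / p.
    """
    n = len(background_counts)
    mean = sum(background_counts) / n
    var = sum((x - mean) ** 2 for x in background_counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return mean ** 2 / (var - mean), mean / var


def nb_log_pmf(k, r, p):
    """Log of the NB probability mass at count k."""
    return (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
            + r * log(p) + k * log(1 - p))


def neg_log10_pvalue(count, r, p):
    """-log10 of the upper-tail probability P(X >= count)."""
    lower = sum(exp(nb_log_pmf(j, r, p)) for j in range(count))
    # 1 - lower loses precision for very large counts; clamp to avoid log10(0).
    tail = max(1.0 - lower, 1e-300)
    return -log10(tail)


# Hypothetical read counts in background (non-peak) bins.
background = [0, 1, 0, 2, 1, 0, 3, 0, 1, 5, 0, 2]
r, p = fit_nb_moments(background)
scores = [neg_log10_pvalue(count, r, p) for count in (0, 2, 5, 10)]
print([round(score, 2) for score in scores])
```

Working on the -log10(p-value) scale puts samples with different background levels on a comparable footing before any cross-sample matching is attempted.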
184 | #### CUT&RUN
185 |
186 | - [CUT&Tag Data Processing and Analysis Tutorial](https://yezhengstat.github.io/CUTTag_tutorial/)
187 | - [Methods with detailed commands of CUT&RUN data analysis](https://www.biorxiv.org/content/10.1101/2020.08.31.272856v1.full-text) - from Divya S. Vinjamur et al. "[ZNF410 represses fetal globin by devoted control of CHD4/NuRD](https://doi.org/10.1101/2020.08.31.272856)," bioRxiv, August 31, 2020.
188 |
189 | - CUT&Tag technology (Cleavage Under Targets and Tagmentation). Compared with CUT&RUN, which uses MNase, it uses Tn5 transposase; reactions are performed within intact cells on a solid support (tethered). Better suited for low cell numbers, low cost. Tested on H3K27me3 and RNAPII profiling in K562, compared with the same CUT&RUN data. Sharper peaks, nearly 20X more than ChIP-seq. Compared with ATAC-seq in K562, H3K4me2, better signal-to-noise ratio, even at low sequencing depth. Tested using NPAT and CTCF transcription factors. Methods - alignment (bowtie2) and peak calling (MACS2) settings.
190 | - Kaya-Okur, Hatice S., Steven J. Wu, Christine A. Codomo, Erica S. Pledger, Terri D. Bryson, Jorja G. Henikoff, Kami Ahmad, and Steven Henikoff. “[CUT&Tag for Efficient Epigenomic Profiling of Small Samples and Single Cells](https://doi.org/10.1038/s41467-019-09982-5).” Nature Communications 10, no. 1 (December 2019)
191 |
192 | - CUT&RUN technology, chromatin profiling strategy, antibody-targeted controlled cleavage by micrococcal nuclease. Cost-efficient, low input requirements, easier.
193 | - Skene, Peter J, and Steven Henikoff. “[An Efficient Targeted Nuclease Strategy for High-Resolution Mapping of DNA Binding Sites](https://elifesciences.org/articles/21856).” Genes and Chromosomes
194 |
195 | - [SEACR](https://seacr.fredhutch.org/) (Sparse Enrichment Analysis for CUT&RUN) - peak caller for CUT&RUN data. Data-driven, calls peaks with respect to global background or IgG control. Compared to MACS2 and HOMER, more precise and maintains true positive rate at low read depth. Better at calling wide peaks. Input - bedGraph, output - BED. Command line and [web server](https://seacr.fredhutch.org/)
196 | - Meers, MP, Tenenbaum, D and Henikoff S (2019). "[Peak calling by sparse enrichment analysis for CUT&RUN chromatin profiling](https://epigeneticsandchromatin.biomedcentral.com/articles/10.1186/s13072-019-0287-4)". Epigenetics & Chromatin 2019 12:42.
197 |
198 | - [CUT&RUNTools 2.0](https://github.com/fl-yu/CUT-RUNTools-2.0) - extended functionality to handle single-cell data, data normalization, peak calling (MACS2, SEACR), dimensionality reduction (Latent Semantic Indexing), downstream functional analysis.
199 | - Yu, Fulong, Vijay G Sankaran, and Guo-Cheng Yuan. “[CUT&RUNTools 2.0: A Pipeline for Single-Cell and Bulk-Level CUT&RUN and CUT&Tag Data Analysis](https://doi.org/10.1093/bioinformatics/btab507),” Bioinformatics, 09 July 2021
200 |
201 | - [CUT&RUNTools](https://bitbucket.org/qzhudfci/cutruntools/src/master/) - a pipeline to fully process CUT&RUN data and identify protein binding and genomic footprinting from antibody-targeted primary cleavage data. Implemented in R, Python, Bash, runs under the SLURM job scheduler. At the core, creates a cut matrix from enzyme cleavage data. Compared with Atactk and Centipede. (Tested, didn't work)
202 | - Zhu, Qian. “[CUT&RUNTools: A Flexible Pipeline for CUT&RUN Processing and Footprint Analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4),” 2019, 12.
203 |
204 | ### Quality control
205 |
206 | - [ChIPQC](https://bioconductor.org/packages/ChIPQC/) - Quality metrics for ChIPseq data
207 |
208 | - `phantompeakqualtools` - This package computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays. https://github.com/kundajelab/phantompeakqualtools
209 |
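The phantompeakqualtools entry above mentions estimating the predominant fragment length, which is commonly done with strand cross-correlation: the plus-strand and minus-strand read-start profiles correlate best when shifted by about one fragment length. The sketch below shows the idea on toy data in plain Python. The function names and the simulated reads are assumptions for illustration. It does not reproduce the tool's implementation or its quality metrics.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def strand_cross_correlation(plus_starts, minus_starts, genome_len, max_shift):
    """Correlate plus-strand 5' end counts with minus-strand 5' end counts
    moved upstream by each shift in 0..max_shift. Returns {shift: r}."""
    plus = [0] * genome_len
    minus = [0] * genome_len
    for pos in plus_starts:
        plus[pos] += 1
    for pos in minus_starts:
        minus[pos] += 1
    return {
        shift: pearson(plus[:genome_len - shift], minus[shift:])
        for shift in range(max_shift + 1)
    }


# Toy data: every fragment contributes a plus-strand read at its start and a
# minus-strand read one fragment length downstream (an idealized assumption).
random.seed(0)
genome_len, fragment_len = 2000, 50
fragment_starts = [random.randrange(genome_len - fragment_len) for _ in range(300)]
plus_starts = fragment_starts
minus_starts = [start + fragment_len for start in fragment_starts]

profile = strand_cross_correlation(plus_starts, minus_starts, genome_len, max_shift=100)
estimated_fragment_len = max(profile, key=profile.get)
print(estimated_fragment_len)
```

Real single-end data gives a much noisier profile, often with an extra peak at the read length, so the maximum is usually interpreted alongside the rest of the curve.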
210 | ### Peaks
211 |
212 | - Benchmarking of 14 ChIP-seq tools for peak calling and differential analysis (described in [supplementary](https://academic.oup.com/bib/article/17/6/953/2453197#47712124)). Experimental and simulated data, narrow and broad peaks, with/without input, replicates. DEG enrichment analysis. Poor agreement. MACS2 performs OK for sharp peaks. [Figure 7](https://academic.oup.com/view-large/figure/47712115/bbv110f7p.tif) - decision tree for tool selection.
213 | Paper
214 | Steinhauser, Sebastian, Nils Kurzawa, Roland Eils, and Carl Herrmann. “A Comprehensive Comparison of Tools for Differential ChIP-Seq Analysis,” n.d., 14.
215 |
216 |
217 | - [LanceOtron](https://github.com/LHentges/LanceOtron) - deep learning-based peak caller from TF and histone ChIP-seq, ATAC-seq, DNAse-seq. Input - bigWig coverage file (+input, if available). Image recognition using wide and deep model (logistic regression producing enrichment scores, CNN, multilayer perceptron, Fig. 1c, Methods). Trained on hand-labeled data. Outperforms MACS2. Visualization using [MLV genome visualization](https://github.com/Hughes-Genome-Group/mlv) software. [Website](https://lanceotron.molbiol.ox.ac.uk/) with videos, documentation.
218 | Paper
219 | Hentges, Lance D., Martin J. Sergeant, Damien J. Downes, Jim R. Hughes, and Stephen Taylor. "LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq." bioRxiv (2021). https://doi.org/10.1101/2021.01.25.428108
220 |
221 |
222 | - [epic2](https://github.com/biocore-ntnu/epic2) - diffuse ChIP-seq peak caller, Cython reimplementation of SICER, 30X faster, 1/7 memory use. Available on Conda and [GitHub](https://github.com/biocore-ntnu/epic2)
223 | - Stovner, Endre Bakken. “[Epic2 Efficiently Finds Diffuse Domains in ChIP-Seq Data](https://doi.org/10.1093/bioinformatics/btz232),” Bioinformatics. 2019 Nov 1
224 |
225 | - `Genrich` - Detecting sites of genomic enrichment in ChIP-seq and ATAC-seq. https://github.com/jsh58/Genrich, unpublished but highly tested and recommended, https://informatics.fas.harvard.edu/atac-seq-guidelines.html
226 |
227 | - `mosaics` - This package provides functions for fitting MOSAiCS and MOSAiCS-HMM, a statistical framework to analyze one-sample or two-sample ChIP-seq data of transcription factor binding and histone modification. https://bioconductor.org/packages/release/bioc/html/mosaics.html
228 |
229 | - `RSEG` - ChIP-seq broad domain analysis. http://smithlabresearch.org/software/rseg/
230 | - Song, Qiang, and Andrew D Smith. “Identifying Dispersed Epigenomic Domains from ChIP-Seq Data.” Bioinformatics 27, no. 6 (2011): 870–871.
231 |
232 | - `triform` - finds enriched regions (peaks) in transcription factor ChIP-sequencing data. https://bioconductor.org/packages/release/bioc/html/triform.html
233 |
234 | - [Enriched Domain Detector (EDD)](https://github.com/CollasLab/edd) - a ChIP-seq peak caller for detection of megabase domains of enrichment.
235 |
236 | ### Enhancers
237 |
238 | - [ROSE](https://bitbucket.org/young_computation/rose) - rank-ordering of super-enhancers using H3K27ac ChIP-seq data, by the Young lab.
239 |
240 | - [LILY](https://github.com/BoevaLab/LILY) - a pipeline by Boeva lab for detection of super-enhancers using H3K27ac ChIP-seq data, which includes explicit correction for copy number variation inherent to cancer samples. The pipeline is based on the ROSE algorithm originally developed by the Young lab.
241 |
242 | - [CenhANCER](https://cenhancer.chenzxlab.cn/#/) - a cancer enhancer database, curating public H3K27ac ChIP-seq data from 805 primary tissue samples and 671 cell line samples across 41 cancer types. 57 029 408 typical enhancers, 978 411 super-enhancers and 226 726 enriched transcription factors. Annotated with SNPs. Table 1 - comparison with other resources (CancerEnD, OncoBase, OncoCis, ENdb, DiseaseEnhancer, SEdb, SEanalysis).
243 | Paper
244 | Luo, Zhi-Hui, Meng-Wei Shi, Yuan Zhang, Dan-Yang Wang, Yi-Bo Tong, Xue-Ling Pan, and ShanShan Cheng. “CenhANCER: A Comprehensive Cancer Enhancer Database for Primary Tissues and Cell Lines.” Database 2023 (May 18, 2023): baad022. https://doi.org/10.1093/database/baad022.
245 |
246 |
247 | ### Visualization
248 |
249 | - [DeepTools](https://deeptools.readthedocs.io/en/develop/) - a suite of Python tools particularly developed for the efficient analysis and visualization of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq. [deepStats](https://github.com/gtrichard/deepStats) - a statistical toolbox with additional tools for deepTools and genomic signals.
250 |
251 | - Visualizations of ChIP-Seq data using Heatmaps, https://www.biostars.org/p/180314/
252 |
253 | - `ChAsE` - Chromatin Analysis & Exploration Tool. http://chase.cs.univie.ac.at/overview
254 |
255 | - `ChromHeatMap` - Heat map plotting by genome coordinate. https://bioconductor.org/packages/release/bioc/html/ChromHeatMap.html
256 |
257 | - `BAM2WIG` - a flexible tool to generate read coverage profile (WIG file) from a BAM file. http://www.epigenomes.ca/tools-and-software
258 |
259 | - `EaSeq` - peak calling (MACS), visualization, and analysis of ChIP-seq experiments. GUI, Windows-based, stand-alone. Figure 1, 3 - range of functionality, compared with other tools. https://easeq.net/downloadeaseq/. Description of tools: http://easeq.net/tools.pdf, Visualization examples: http://easeq.net/plots.pdf, Workflow examples: http://easeq.net/examples.pdf
260 | - Lerdrup, Mads, Jens Vilstrup Johansen, Shuchi Agrawal-Singh, and Klaus Hansen. “An Interactive Environment for Agile Analysis and Visualization of ChIP-Sequencing Data.” Nature Structural & Molecular Biology 23, no. 4 (April 2016): 349–57. https://doi.org/10.1038/nsmb.3180.
261 |
262 | - [pygv](https://github.com/manzt/pygv) - a minimal, scriptable IGV-like genome browser for python
263 |
264 | - `Zerone` - combine multiple ChIP-seq profiles into one discretized profile. HMM with zero-inflated negative multinomial emissions across windowed genome. QC using SVM trained on ENCODE data to distinguish good from bad samples. Requires two negative controls. Compared against peaks called by MACS, BayesPeak, JAMM. https://github.com/nanakiksc/zerone
265 | - Cuscó, Pol, and Guillaume J. Filion. “Zerone: A ChIP-Seq Discretizer for Multiple Replicates with Built-in Quality Control.” Bioinformatics 32, no. 19 (October 1, 2016): 2896–2902. https://doi.org/10.1093/bioinformatics/btw336.
266 |
267 | ### Intersections
268 |
269 | - [BedSect](https://imgsb.org/bedsect/) - web server for intersection analysis of genomic regions, UpSet and correlation plots. Gene-centric, GREAT enrichment analysis. Integrated with the [GTRD](https://gtrd.biouml.org/) database. [GitHub](https://github.com/sraghav-lab/Bedsect).
270 | Paper
271 | Mishra, Gyan Prakash, Arup Ghosh, Atimukta Jha, and Sunil Kumar Raghav. “BedSect: An Integrated Web Server Application to Perform Intersection, Visualization, and Functional Annotation of Genomic Regions From Multiple Datasets.” Frontiers in Genetics 11 (February 5, 2020): 3. https://doi.org/10.3389/fgene.2020.00003.
272 |
273 |
274 | - [Intervene](https://asntech.shinyapps.io/intervene/) - command line and web server for Venn diagrams of overlaps of genomic regions (up to six sets), UpSet plot, correlation heatmap. Python (pybedtools, Seaborn, Matplotlib) and R (UpSetR, Corrplot, Venerable). [BitBucket](https://bitbucket.org/CBGR/intervene/src/master/).
275 | Paper
276 | Khan, Aziz, and Anthony Mathelier. “Intervene: A Tool for Intersection and Visualization of Multiple Gene or Genomic Region Sets.” BMC Bioinformatics 18, no. 1 (December 2017): 287. https://doi.org/10.1186/s12859-017-1708-7.
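
As a rough illustration of the quantity such intersection tools report, the number of regions in one set that overlap another set can be sketched in plain Python. This is a minimal sketch, not the BedSect or Intervene implementation; the function names and toy regions are hypothetical, and regions are assumed to be BED-style half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two half-open (chrom, start, end) regions share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query, reference):
    """Count regions in `query` that overlap at least one region in `reference`."""
    # Group reference regions by chromosome so each query region is only
    # compared against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for region in reference:
        by_chrom[region[0]].append(region)
    return sum(
        any(overlaps(q, r) for r in by_chrom.get(q[0], []))
        for q in query
    )


# Toy region sets (hypothetical coordinates).
set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
set_b = [("chr1", 150, 250), ("chr2", 80, 120)]

print(count_overlapping(set_a, set_b))
```

Adjacent regions such as `("chr2", 50, 80)` and `("chr2", 80, 120)` are not counted as overlapping under the half-open convention. The quadratic scan is fine for toy data; real tools rely on sorted or indexed intervals.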
277 |
278 |
279 | ### Motif analysis
280 |
281 | - [memes](https://github.com/snystrom/memes/) - an R package interfacing MEME suite (DREME, ME, FIMO, TOMTOM). Using universalmotif_df R/Bioconductor object to make results compatible across tools. De novo motif discovery, differential motifs, known motif enrichment analysis. Visualization capabilities. Case example on ChIP-seq peaks in Drosophila wing development. Requires installation of MEME suite. Docker container with RStudio and everything configured. [Bioconductor](https://bioconductor.org/packages/memes/), [pkgdown website](https://snystrom.github.io/memes-manual/index.html).
282 | Paper
283 | Nystrom, Spencer L, and Daniel J McKay. “Memes: A Motif Analysis Environment in R Using Tools from the MEME Suite.” PLOS COMPUTATIONAL BIOLOGY, n.d., 14.
284 |
285 |
286 | - [ChEA3](https://maayanlab.cloud/chea3/) - transcription factor enrichment in gene lists. Six reference libraries of TF regulatory signatures (ARCHS4, ENCODE, GTEx, ReMap, Enrichr, Literature). Fisher's exact test. Outperforms VIPER, DoRothEA, BART, TFEA.ChIP, oPOSSUM, MAGICACT.
287 | Paper
288 | Keenan, Alexandra B, Denis Torre, Alexander Lachmann, Ariel K Leong, Megan L Wojciechowicz, Vivian Utti, Kathleen M Jagodnik, Eryk Kropiwnicki, Zichen Wang, and Avi Ma’ayan. “[ChEA3: Transcription Factor Enrichment Analysis by Orthogonal Omics Integration](https://doi.org/10.1093/nar/gkz446).” Nucleic Acids Research, (July 2, 2019)
289 |
290 |
291 | - [TFEA.ChIP](https://bioconductor.org/packages/TFEA.ChIP/) - R package for transcription factor enrichment of gene lists (hypergeometric and GSEA) using experimental ChIP-seq datasets (ENCODE, GEO). Tested on known signatures and compared with two PWM-based and ChIP-based tools, it performs comparably or better.
292 | - Puente-Santamaria, Laura, Wyeth W. Wasserman, and Luis Del Peso. "[TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets](https://doi.org/10.1093/bioinformatics/btz573)." Bioinformatics, (2019)
293 |
294 | - [PWMScan](https://ccg.epfl.ch/pwmtools/pwmscan.php) - web tool for scanning entire genomes with a position-specific weight matrix. Multiple genomes and assemblies hosted on the server. Multiple PWM collections for Eukaryotic DNA (JASPAR, [HOCOMOCO](#hocomoco), SwissRegulon, UniPROBE, [CIS-BP](#cisbp), from Jolma, Isakova publications). matrix_scan C program for matching PWMs. Compared with other motif scanning tools (PoSSuMsearch, Patser, RSAT, STORM, HOMER), overlap >99%. Output - [BEDdetail](https://genome.ucsc.edu/FAQ/FAQformat.html#format1.7) format. [Code](https://sourceforge.net/projects/pwmscan/).
295 | Paper
296 | Ambrosini, Giovanna, Romain Groux, and Philipp Bucher. “PWMScan: A Fast Tool for Scanning Entire Genomes with a Position-Specific Weight Matrix.” Edited by John Hancock. Bioinformatics 34, no. 14 (July 15, 2018): 2483–84. https://doi.org/10.1093/bioinformatics/bty127.
297 |
298 |
299 | - [gimmemotifs](https://github.com/vanheeringen-lab/gimmemotifs) - framework for TF motif analysis using an ensemble of motif predictors. `maelstrom` tool to detect differential motif activity between multiple different conditions. Includes manually curated database of motifs. Benchmark of 14 motif detection tools - Homer, MEME, BioProspector are among the top performing. Extensive analysis results. [Documentation](https://gimmemotifs.readthedocs.io). [Tweet with updates](https://twitter.com/svheeringen/status/1313778158999666688?s=20)
300 | - Bruse, Niklas, and Simon J. van Heeringen. “[GimmeMotifs: An Analysis Framework for Transcription Factor Motif Analysis](https://doi.org/10.1101/474403),” November 20, 2018
301 |
302 | - `DECOD` - Differential motif finder. k-mer-based. http://gene.ml.cmu.edu/DECOD/
303 | - Huggins, Peter, Shan Zhong, Idit Shiff, Rachel Beckerman, Oleg Laptenko, Carol Prives, Marcel H. Schulz, Itamar Simon, and Ziv Bar-Joseph. “DECOD: Fast and Accurate Discriminative DNA Motif Finding.” Bioinformatics 27, no. 17 (September 1, 2011): 2361–67. https://doi.org/10.1093/bioinformatics/btr412.
304 |
305 | - [Non-redundant TF motif matches genome-wide](https://www.vierstra.org/resources/motif_clustering#downloads) - Clustering of 2179 motif models. hg38/mm10 BED files download with coordinates
306 |
307 | - [homerkit](https://github.com/slowkow/homerkit) - Read HOMER motif analysis output in R.
308 |
309 | - [LISA](http://lisa.cistrome.org/) - epigenetic Landscape In silico Subtraction analysis, enriched TFs and chromatin regulators in a list of genes.
310 |
311 | - [Logolas](https://github.com/kkdey/Logolas) - R package for Enrichment Depletion Logos (EDLogos) and String Logos.
312 |
313 | - [marge](https://robertamezquita.github.io/marge/) - API for HOMER in R for Genomic Analysis using Tidy Conventions, [GitHub](https://github.com/robertamezquita/marge)
314 |
315 | - [motifbreakR](https://bioconductor.org/packages/motifbreakR/) - R package for predicting the disruptiveness of single nucleotide polymorphism on TFBSs. SNPs may be a list of rsIDs or a BED file. Includes MotifDB PWMs and others (ENCODE, Factorbook, Hocomoco, homer).
316 |
317 | - [motifStack](https://bioconductor.org/packages/motifStack/) - R package for plotting stacked logos for single or multiple DNA, RNA and amino acid sequence.
318 |
319 | - [pyjaspar](https://github.com/asntech/pyjaspar) - A Pythonic interface to query and access JASPAR transcription factor motifs
320 |
321 | - [RcisTarget](https://bioconductor.org/packages/RcisTarget/) - R package to identify transcription factor binding motifs enriched on a list of genes or genomic regions.
322 |
323 | - [rGADEM](https://bioconductor.org/packages/rGADEM/) - R package for de novo motif discovery.
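
Several tools in this section (e.g., ChEA3, TFEA.ChIP) test whether a TF's target set is over-represented in a gene list using Fisher's exact or hypergeometric tests. The one-sided version of both tests is the upper tail of the hypergeometric distribution, sketched below with the standard library only. This is an illustrative sketch, not any tool's actual implementation; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(overlap, targets, query, background):
    """P(X >= overlap) when `query` genes are drawn without replacement from
    `background` genes, of which `targets` are annotated targets of a TF."""
    max_overlap = min(targets, query)
    total = comb(background, query)
    # Sum the probabilities of observing `overlap` or more target genes.
    tail = sum(
        comb(targets, k) * comb(background - targets, query - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 background genes, 4 TF targets, 3 query genes, 2 in common.
p_value = hypergeom_enrichment_pvalue(overlap=2, targets=4, query=3, background=10)
print(p_value)
```

In practice the background should be the set of genes that could have appeared in the query list, and p-values across many TFs need multiple-testing correction.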
324 |
325 |
326 | ### Differential peak detection
327 |
328 | - [diffTF](https://git.embl.de/grp-zaugg/diffTF) - differential TF activity calculation and integration with RNA-seq data for classification of TFs into activators or repressors. Differential analysis on consensus peaks using permutations or statistics (diffPeaks). Input: BAM, fasta files, RNA-seq counts, external TFBS data (HOCOMOCO, JASPAR, TRRUST, or ReMap). Applied to several datasets, including multiomics, recovers known biology, experimental validation. Implemented as a Snakemake pipeline. Singularity, conda installation. [Documentation](https://difftf.readthedocs.io/en/latest/index.html).
329 | Paper
330 | Berest, Ivan, Christian Arnold, Armando Reyes-Palomares, Giovanni Palla, Kasper Dindler Rasmussen, Holly Giles, Peter-Martin Bruch, et al. “Quantification of Differential Transcription Factor Activity and Multiomics-Based Classification into Activators and Repressors: DiffTF.” Cell Reports 29, no. 10 (December 2019): 3147-3159.e12. https://doi.org/10.1016/j.celrep.2019.10.106.
331 |
332 |
333 | - [normR](https://bioconductor.org/packages/normr/) - a Bioconductor R package, data-driven normalization and difference calling approach for ChIP-seq data. Models ChIP- and control read counts by binomial mixture model. One component models the background, the other models the signal. Can work without control.
334 | - Helmuth, Johannes, et al. "[normR: Regime enrichment calling for ChIP-seq data](https://doi.org/10.1101/082263)." BioRxiv (2016)
335 |
336 | - `csaw` - Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control. https://bioconductor.org/packages/release/bioc/html/csaw.html
337 |
338 | - `DiffBind` - Differential Binding Analysis of ChIP-Seq Peak Data. https://bioconductor.org/packages/release/bioc/html/DiffBind.html
339 |
340 | ### Enrichment
341 |
342 | - UniBind Enrichment Analysis, also differential enrichment. Input - BED file in hg38 version. [LOLA](https://bioconductor.org/packages/release/bioc/html/LOLA.html) as an enrichment and database engine.
343 |
344 | ### Interpretation
345 |
346 | - [Lisa](http://lisa.cistrome.org/) - web server to determine the transcription factors and chromatin regulators that are directly responsible for the perturbation of a differentially expressed gene set (chrom-PR score). Using public and custom human and mouse DNase-seq, and H3K27ac ChIP-seq profiles (CistromeDB). Input: list of differential genes. [GitHub](https://github.com/AllenWLynch/lisa).
347 | Paper
348 | Qin, Qian, Jingyu Fan, Rongbin Zheng, Changxin Wan, Shenglin Mei, Qiu Wu, Hanfei Sun, et al. “Lisa: Inferring Transcriptional Regulators through Integrative Modeling of Public Chromatin Accessibility and ChIP-Seq Data.” Genome Biology 21, no. 1 (December 2020): 32. https://doi.org/10.1186/s13059-020-1934-6.
349 |
350 |
351 | - [Cistrome-GO](http://go.cistrome.org/) - functional enrichment analysis of genes regulated by TFs in human and mouse. Solo mode (ChIP-seq peaks only) or ensemble mode (integrates ChIP-seq peaks and RNA-seq differentially expressed genes). Implementation of BETA method. MACS2 peaks, DESeq2 output. Gene-centric regulatory potential (RP) score (sum of peaks, exponentially weighted by distance). Human (hg19/hg38), Mouse (mm9/mm10).
352 | Paper
353 | Li, Shaojuan, Changxin Wan, Rongbin Zheng, Jingyu Fan, Xin Dong, Clifford A. Meyer, and X. Shirley Liu. “Cistrome-GO: A Web Server for Functional Enrichment Analysis of Transcription Factor ChIP-Seq Peaks.” Nucleic Acids Research 47, no. W1 (July 2, 2019): W206–11. https://doi.org/10.1093/nar/gkz332.
354 |
355 |
356 | - `Toolkit for Cistrome Data Browser` - online tool to answer questions like:
357 | - What factors regulate your gene of interest?
358 | - What factors bind in your interval?
359 | - What factors have a significant binding overlap with your peak set?
360 |
361 | - Chongzhi Zang software page, http://faculty.virginia.edu/zanglab/software.htm
362 | - `BART` (Binding Analysis for Regulation of Transcription), a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input. http://bartweb.uvasomrc.io/
363 | - `MARGE` (Model-based Analysis of Regulation of Gene Expression), a comprehensive computational method for inference of cis-regulation of gene expression leveraging public H3K27ac genomic profiles in human or mouse. http://cistrome.org/MARGE/
364 | - `MANCIE` (Matrix Analysis and Normalization by Concordant Information Enhancement), a computational method for high-dimensional genomic data integration. https://cran.r-project.org/web/packages/MANCIE/index.html
365 | - `SICER` (Spatial-clustering Identification of ChIP-Enriched Regions), a ChIP-Seq data analysis method. https://home.gwu.edu/~wpeng/Software.htm
366 |
367 | - [UROPA](http://loosolab.mpi-bn.mpg.de/) - Universal RObust Peak Annotator, a command line based tool intended for genomic region annotation. Definition of overlap/proximity types. [Documentation](https://uropa-manual.readthedocs.io/)
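
The gene-centric regulatory potential (RP) score mentioned for Cistrome-GO sums the contributions of nearby peaks, with each peak down-weighted exponentially by its distance to the gene. A minimal sketch of that idea follows. The decay function (`2 ** (-distance / half_decay)`), the 10 kb `half_decay` default, and the function name are assumptions for illustration; the exact weighting used by BETA and Cistrome-GO may differ.

```python
def regulatory_potential(tss, peak_positions, half_decay=10_000):
    """Sum of peak weights that halve every `half_decay` bp from the TSS.

    `tss` and `peak_positions` are coordinates on the same chromosome.
    The decay form and the default half-decay distance are illustrative
    assumptions, not the published parameters.
    """
    return sum(2.0 ** (-abs(p - tss) / half_decay) for p in peak_positions)


# A peak at the TSS contributes 1.0; a peak 10 kb away contributes 0.5.
score = regulatory_potential(tss=1_000, peak_positions=[1_000, 11_000])
print(score)
```

Genes with many close peaks get high scores, while distal peaks contribute little, which is what makes the score usable for ranking putative target genes.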
368 |
369 |
370 | ### Excludable
371 |
372 | - CUT&RUN blacklists for human (hg38) and mouse (mm10) genomes. Different biochemical properties than ChIP-seq, SEACR peak caller that uses global background. 20 C&R negative control datasets per human/mouse genome, consistently called artifactual peaks (the highest 0.1% signals in more than 30% of replicates, peaks extended by 1Kb) are assembled into blacklists. Also contain mitochondrial sequences (NUMTs). Tested bowtie2 and bowtie alignment strategies. Cover approximately 0.2% of the genome, removing reads overlapping them increases variability among samples (PCA). Compared with the Boyle's Blacklist-generated lists. BED coordinates in [supplementary](https://www.biorxiv.org/content/10.1101/2022.11.11.516118v1.supplementary-material).
373 | Paper
374 | Nordin, Anna, Gianluca Zambanini, Pierfrancesco Pagella, and Claudio Cantù. “The CUT&RUN Blacklist of Problematic Regions of the Genome.” Preprint. Genomics, November 14, 2022. https://doi.org/10.1101/2022.11.11.516118.
375 |
376 |
377 | - [Blacklist](https://github.com/Boyle-Lab/Blacklist) - Application for making ENCODE Blacklists, and links to canonical blacklists. C, C++.
378 | Paper
379 | Amemiya, Haley M., Anshul Kundaje, and Alan P. Boyle. “The ENCODE Blacklist: Identification of Problematic Regions of the Genome.” Scientific Reports 9, no. 1 (December 2019): 9354. https://doi.org/10.1038/s41598-019-45839-z.
380 |
381 |
382 | - [GEM](https://sourceforge.net/projects/gemlibrary/files/gem-library/) - mappability calculations for each genomic region, accounting for mismatches. Pre-calculated UCSC genome browser tracks for human and mouse. Mappability of genes, both protein-coding and non-protein coding. RPKUM - unique exons for quantifying gene expression.
383 | Paper
384 | Derrien, Thomas, Jordi Estellé, Santiago Marco Sola, David G. Knowles, Emanuele Raineri, Roderic Guigó, and Paolo Ribeca. “Fast Computation and Applications of Genome Mappability.” PloS One 7, no. 1 (2012): e30377. https://doi.org/10.1371/journal.pone.0030377.
385 |
386 |
387 | - [Greenscreen](https://github.com/sklasfeld/GreenscreenProject) - an approach for removing false-positive peaks (ultra-high noise) from ChIP-seq data (also, CUT&RUN) using MACS2 (broadpeak setting, optimized significance threshold and merging distance to match Blacklist-created regions). As effective as canonical blacklists, improves true factor binding overlap, improves Standardized Standard Deviation (SSD -> 1), improves replicate correlation structure. Uses as few as three samples, 99.9% overlap with Blacklist-created regions, smaller genomic footprint, same performance as Blacklist-generated.
388 | Paper
389 | Klasfeld, Sammy, and Doris Wagner. “Greenscreen Decreases Type I Errors and Increases True Peak Detection in Genomic Datasets Including ChIP-Seq.” Preprint. Genomics, March 1, 2022. https://doi.org/10.1101/2022.02.27.482177.
390 |
391 |
392 | - [Manually annotated GRCh38 blacklisted regions](https://www.encodeproject.org/files/ENCFF356LFX/) on ENCODE data portal. [Tweet by Anshul Kundaje](https://twitter.com/anshulkundaje/status/1263546023151992832?s=20)
393 |
394 | - [Repetitive centromeric, telomeric and satellite regions known to have low sequencing confidence - blacklisted regions defined by the ENCODE project](http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg19-human/wgEncodeHg19ConsensusSignalArtifactRegions.bed.gz) - from Upton et al., “Epigenomic Profiling of Neuroblastoma Cell Lines.”
395 |
396 | - [UCSC unusual regions on assembly structure](http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=map&hgta_track=problematic&hgta_table=comments&hgta_doSchema=describe+table+schema), [Tweet](https://twitter.com/GenomeBrowser/status/1260693767125778434?s=20)
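
Whichever excludable list is chosen, applying it usually means dropping peaks (or reads) that overlap any listed region. The sketch below shows that filtering step in plain Python. It is an illustrative sketch only: the function name and toy coordinates are hypothetical, and regions are assumed to be BED-style half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def remove_excluded(peaks, excluded):
    """Return the peaks that do not overlap any excluded region."""
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        # Half-open intervals overlap when each starts before the other ends.
        hit = any(start < ex_end and ex_start < end
                  for ex_start, ex_end in by_chrom.get(chrom, []))
        if not hit:
            kept.append((chrom, start, end))
    return kept


# Toy data (hypothetical coordinates).
peaks = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chrM", 0, 500)]
excluded = [("chr1", 150, 160), ("chrM", 0, 16569)]

print(remove_excluded(peaks, excluded))
```

On real BED files the same operation is commonly done with `bedtools intersect -v -a peaks.bed -b excluded.bed`, where `peaks.bed` and `excluded.bed` are placeholder file names.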
397 |
398 |
399 | ## DNAse-seq
400 |
401 | - DNAse-seq analysis guide. Tools for QC, peak calling, analysis, footprint detection, motif analysis, visualization, all-in-one tools ([Table 2](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bby057/5053117#118754375))
402 | - Liu, Yongjing, Liangyu Fu, Kerstin Kaufmann, Dijun Chen, and Ming Chen. “A Practical Guide for DNase-Seq Data Analysis: From Data Management to Common Applications.” Briefings in Bioinformatics, July 12, 2018. https://doi.org/10.1093/bib/bby057.
403 |
404 |
405 | ## ATAC-seq
406 |
407 | - [awesome-atac-analysis](https://github.com/databio/awesome-atac-analysis) Awesome ATAC-seq analysis by Nathan Sheffield.
408 |
409 | - [Benchmarking ATAC-seq peak calling](https://bigmonty12.github.io/peak-calling-benchmark) by Austin Montgomery
410 |
411 | - [ATACdb](http://www.licpathway.net/ATACdb) - human chromatin accessibility data, uniformly processed from GEO. Provides (epi)genetic annotations, typical and super-enhancers, transcription factors (TFs), single-nucleotide polymorphisms (SNPs), risk SNPs, eQTLs, LD SNPs, methylations, chromatin interactions and TADs. Inference of TF footprints within chromatin accessibility regions.
412 | Paper
413 | Wang, Fan, Xuefeng Bai, Yuezhu Wang, Yong Jiang, Bo Ai, Yong Zhang, Yuejuan Liu, et al. “ATACdb: A Comprehensive Human Chromatin Accessibility Database,” Nucleic Acids Res. 2020 Oct 30;49(D1):D55–D64. https://doi.org/10.1093/nar/gkaa943
414 |
415 | # Accessible chromatin region: bed
416 | wget http://www.licpathway.net/ATACdb/download/packages/Accessible_chromatin_region_all.bed
417 |
418 | # TF footprint: txt
419 | wget http://www.licpathway.net/ATACdb/download/packages/TF_footprint_package.txt
420 |
421 | # Associated gene: txt
422 | wget http://www.licpathway.net/ATACdb/download/packages/Associated_gene_package.txt
423 |
424 | # Super-enhancer: bed csv
425 | wget http://www.licpathway.net/sedb/download/package/SE_package.bed
426 | wget http://www.licpathway.net/sedb/download/package/SE_package.csv
427 |
428 | # Typical enhancer: bed csv
429 | wget http://www.licpathway.net/sedb/download/package/TE_package.bed
430 | wget http://www.licpathway.net/sedb/download/package/TE_package.csv
431 |
432 | # eQTL: bed csv
433 | wget http://www.licpathway.net/ATACdb/download/packages/eqtl.txt
434 | wget http://www.licpathway.net/ATACdb/download/packages/eqtl.csv
435 |
436 |
437 | - ATAC-seq analysis considerations. Considering multiple workflows, settling on csaw-based. Normalization by library complexity (downsampling) is important. [Workflow](https://github.com/reskejak/ATAC-seq/blob/master/ATACseq_workflow.txt) and [GitHub](https://github.com/reskejak/ATAC-seq) with all scripts.
438 | Paper
439 | Reske, Jake J., Mike R. Wilson, and Ronald L. Chandler. “ATAC-Seq Normalization Method Can Significantly Affect Differential Accessibility Analysis and Interpretation.” Epigenetics & Chromatin 13, no. 1 (December 2020): 22. https://doi.org/10.1186/s13072-020-00342-y.
440 |
441 |
442 | - [UNMC_ATACseq_Tutorial](https://github.com/JRowleyLab/UNMC_ATACseq_Tutorial) - An open-source interactive pipeline tutorial for differential ATAC-seq footprint analysis on the cloud (Google, AWS, Azure)
443 |
444 | - [OCHROdb](https://dhs.ccm.sickkids.ca/) - a database of open chromatin regions (over 1.4M). 828 DNAse-I experiments, 194 cell lines, uniformly processed, QC'd, peaks called using [Hotspot](https://www.encodeproject.org/software/hotspot/), regulatory elements clustered across all samples, batch effect corrected, reproducible peaks statistically selected. Data from ENCODE, Roadmap Epigenomics Mapping Consortium (REMC), Blueprint Epigenome and Genomics of Gene Regulation (GGR). Downloadable metadata, curated DHS dataset (full and chromosome-specific, BED format with cell/tissue-specific columns with accessibility values), visualized in JBrowse.
445 | Paper
446 | Shooshtari, Parisa, Samantha Feng, Viswateja Nelakuditi, Reza Asakereh, Nader Hosseini Naghavi, Justin Foong, Michael Brudno, and Chris Cotsapas. “Developing OCHROdb, a Comprehensive Quality Checked Database of Open Chromatin Regions from Sequencing Data.” Scientific Reports 13, no. 1 (May 18, 2023): 8106. https://doi.org/10.1038/s41598-022-26791-x.
447 |
448 |
449 | - DNAseI hypersensitive sites from 733 biosamples (439 cell and tissue types and states). NMF to simplify pattern detection. NMF patterns better explain heritability. Data at ENCODE and [Zenodo](https://zenodo.org/record/3838751), [data browser](https://index.altius.org/). [Twitter](https://twitter.com/nameluem/status/1189916668807376898?s=20), [data download](https://www.meuleman.org/research/dhsindex/)
450 | Paper
451 | Meuleman, Wouter, Alexander Muratov, Eric Rynes, Jessica Halow, Kristen Lee, Daniel Bates, Morgan Diegel, et al. “Index and Biological Spectrum of Human DNase I Hypersensitive Sites.” Nature, July 29, 2020. https://doi.org/10.1038/s41586-020-2559-3.
452 |
453 |
454 |
455 | ### ATAC-seq pipelines
456 |
457 | - [ENCODE ATAC-seq pipeline](https://github.com/ENCODE-DCC/atac-seq-pipeline) - ATAC-seq and DNase-seq processing pipeline by Anshul Kundaje
458 |
459 | - [TOBIAS](https://github.com/loosolab/TOBIAS) (Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal) - transcription factor footprinting framework for ATAC-seq data. Corrects for Tn5 bias (ATACorrect module, Figure 1). Outperforms HINT-ATAC, PIQ, Wellington, similar or better performance than msCentipede. Validated using paired ATAC-seq and ChIP-seq data. Visualization of aggregated ATAC-seq signals, differential and time course analysis, TF clustering, network building. Input - BAM file, genome FASTA, BED peaks. Output - bigWigs of uncorrected, corrected signals, expected and corrected symbols. Conda, Nextflow implementation.
460 | Paper
461 | Bentsen, Mette, Philipp Goymann, Hendrik Schultheis, Kathrin Klee, Anastasiia Petrova, René Wiegandt, Annika Fust, et al. “ATAC-Seq Footprinting Unravels Kinetics of Transcription Factor Binding during Zygotic Genome Activation.” Nature Communications 11, no. 1 (August 26, 2020): 4267. https://doi.org/10.1038/s41467-020-18035-1.
462 |
463 |
464 | - [HINT-ATAC](http://www.regulatory-genomics.org/hint/introduction/) - a footprinting method considering ATAC-seq protocol biases. Uses a position dependency model (PDM) to learn the cleavage preferences (Methods). Compared against three footprinting methods, DNase2TF, PIQ, Wellington. PDMs are crucial for correction of cleavage bias for ATAC-seq for all methods. Also improves correction for DNAse-seq data. Comparison of protocols, Omni-ATAC (best performance), Fast-ATAC. Part of [RGT, Regulatory Genomics Toolbox](https://github.com/CostaLab/reg-gen). [Tutorial](https://www.regulatory-genomics.org/hint/tutorial-differential-footprints-on-scatac-seq/).
465 | Paper
466 | Li, Zhijian, Marcel H. Schulz, Thomas Look, Matthias Begemann, Martin Zenke, and Ivan G. Costa. “Identification of Transcription Factor Binding Sites Using ATAC-Seq.” Genome Biology, (December 2019). https://doi.org/10.1186/s13059-019-1642-2
467 |
468 |
469 | - [HMMRATAC](https://github.com/LiuLabUB/HMMRATAC) - hidden Markov model for ATAC-seq to identify open chromatin regions. Parametric modeling of nucleosome-free regions and three nucleosomal features (mono-, di-, and tri-nucleosomes). First, train on 1000 auto-selected regions, then predict. Tested on "active promoters" and "strong enhancers" chromatin states (positive examples), and "heterochromatin" (negative examples). Compared with MACS2, F-seq.
470 | Paper
471 | Tarbell, Evan D, and Tao Liu. “HMMRATAC: A Hidden Markov ModeleR for ATAC-Seq.” Nucleic Acids Research, June 14, 2019, gkz533. https://doi.org/10.1093/nar/gkz533
472 |
473 |
474 | - [ATACseqQC](https://bioconductor.org/packages/ATACseqQC/) - R package for ATAC-seq quality control and analysis. QC, preprocessing, read shift, peak calling, motif analysis, enrichment in nucleosome-free regions, plotting (heatmaps, library complexity). [Table 1](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4559-3/tables/1) - summary of functions. [Additional material](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4559-3#Sec15) - examples of commands, table of comparison with other pipelines.
475 | Paper
476 | Ou, Jianhong, Haibo Liu, Jun Yu, Michelle A. Kelliher, Lucio H. Castilla, Nathan D. Lawson, and Lihua Julie Zhu. “ATACseqQC: A Bioconductor Package for Post-Alignment Quality Assessment of ATAC-Seq Data.” BMC Genomics 19, no. 1 (December 2018): 169. https://doi.org/10.1186/s12864-018-4559-3.
477 |
478 |
479 | - [atac_chip_preprocess](https://github.com/ATpoint/atac_chip_preprocess) - Preprocessing workflow for ATAC-seq and ChIP-seq data, Nextflow pipeline.
480 |
481 | - ATAC-seq peak calling using MACS2: `macs2 callpeak --nomodel --nolambda --keep-dup all --call-summits -f BAMPE -g hs`
482 |
483 | - [ATACProc](https://github.com/ay-lab/ATACProc) - ATAC-seq processing pipeline
484 |
485 | - [atacseq](https://github.com/nf-core/atacseq) - nf-core ATAC-seq peak-calling and differential analysis pipeline.
486 |
487 | - [pepatac](https://github.com/databio/pepatac) - A modular, containerized pipeline for ATAC-seq data processing. [Examples and documentation](http://code.databio.org/PEPATAC/)
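
Several of the tools above (e.g., the "read shift" step in ATACseqQC) adjust read positions so they reflect the Tn5 insertion site rather than the raw alignment end. A minimal sketch of that adjustment follows, assuming the commonly used offsets of +4 bp for plus-strand reads and -5 bp for minus-strand reads. The function name, the BED-style half-open read representation, and the exact coordinate convention for the minus strand are assumptions for illustration; individual pipelines may define the shifted position slightly differently.

```python
def tn5_cut_site(start, end, strand, plus_shift=4, minus_shift=5):
    """Return the shifted 0-based 5' position of a read.

    The read is a half-open interval [start, end). On the plus strand the
    5' end is `start`; on the minus strand it is `end - 1`.
    """
    if strand == "+":
        return start + plus_shift
    if strand == "-":
        return end - 1 - minus_shift
    raise ValueError(f"unknown strand: {strand!r}")


# Toy reads (hypothetical coordinates).
print(tn5_cut_site(100, 150, "+"))
print(tn5_cut_site(100, 150, "-"))
```

Shifted cut sites matter most for footprinting and nucleosome positioning, where single-base resolution is needed; for broad accessibility peak calling the effect is small.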
488 |
489 | ## Histone-seq
490 |
491 | Histone peaks can be called with the Homer program `findPeaks` using the style `histone`, merging peaks within 1 kb into a single peak. Broad peaks in H3K36me3, H3K27me3 and H3K9me3 can be called using `findPeaks` with the options `-region -size 1000 -minDist 2500`. With these options, the initial peaks are 1 kb wide and peaks within 2.5 kb are merged.
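
The merging step described above (joining peaks that lie within a fixed distance of each other) can be sketched as follows. This is an illustration of the general idea, not HOMER's implementation: the function name is hypothetical, and measuring the distance between peak edges (rather than, for example, peak centers) is an assumption.

```python
def merge_nearby(peaks, max_gap):
    """Merge peaks on the same chromosome separated by at most `max_gap` bp.

    `peaks` is an iterable of half-open (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous peak: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]


# Toy 1 kb peaks (hypothetical coordinates), merged with a 2.5 kb threshold.
peaks = [("chr1", 0, 1000), ("chr1", 3000, 4000),
         ("chr1", 7000, 8000), ("chr2", 0, 1000)]

print(merge_nearby(peaks, max_gap=2500))
```

Here the first two peaks are 2 kb apart and are joined into one 4 kb domain, while the third peak is 3 kb away and stays separate.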
492 |
493 | - `DEScan2` - broad peak (histone, ATAC, DNAse) analysis (peak caller, peak filtering and alignment across replicates, creation of a count matrix). Peak caller uses a moving window and calculates a Poisson likelihood of a peak as compared to a region outside the window. https://bioconductor.org/packages/release/bioc/html/DEScan2.html
494 | - Righelli, Dario, John Koberstein, Nancy Zhang, Claudia Angelini, Lucia Peixoto, and Davide Risso. “Differential Enriched Scan 2 (DEScan2): A Fast Pipeline for Broad Peak Analysis.” PeerJ Preprints, 2018.
495 |
496 | - `HMCan` and `HMCan-diff` - histone ChIP-seq peak caller (and differential) that accounts for CNV, also for GC bias. Hidden Markov Model to detect peak signal. Control-FREEC to detect CNV in ChIP-seq data. Outperforms others, CCAT second best. https://www.cbrc.kaust.edu.sa/hmcan/
497 | - Ashoor, Haitham, Aurélie Hérault, Aurélie Kamoun, François Radvanyi, Vladimir B. Bajic, Emmanuel Barillot, and Valentina Boeva. “HMCan: A Method for Detecting Chromatin Modifications in Cancer Samples Using ChIP-Seq Data.” Bioinformatics (Oxford, England) 29, no. 23 (December 1, 2013): 2979–86. https://doi.org/10.1093/bioinformatics/btt524.
498 | - Ashoor, Haitham, Caroline Louis-Brennetot, Isabelle Janoueix-Lerosey, Vladimir B. Bajic, and Valentina Boeva. “HMCan-Diff: A Method to Detect Changes in Histone Modifications in Cells with Different Genetic Characteristics.” Nucleic Acids Research 45, no. 8 (05 2017): e58. https://doi.org/10.1093/nar/gkw1319.
499 |
500 | - `RSEG` - ChIP-seq analysis for identifying genomic regions and their boundaries marked by diffusive histone modification markers, such as H3K36me3 and H3K27me3, http://smithlabresearch.org/software/rseg/
501 |
502 | ### Broad peak analysis
503 |
504 | - [EDD](https://github.com/CollasLab/edd) - Enriched Domain Detector, a ChIP-seq peak caller for detection of megabase domains of enrichment.
505 |
506 | - [epic2](https://github.com/biocore-ntnu/epic2) - an ultraperformant reimplementation of SICER. It focuses on speed, low memory overhead and ease of use.
507 | - Stovner, Endre Bakken, and Pål Sætrom. "[epic2 efficiently finds diffuse domains in ChIP-seq data](https://doi.org/10.1093/bioinformatics/btz232)." Bioinformatics, (2019)
508 |
509 | - [DEScan2](https://bioconductor.org/packages/DEScan2/) - Integrated peak and differential caller, specifically designed for broad epigenomic signals, R package.
510 |
511 | ## Technology
512 |
513 | - [ATAC-STARR-seq](https://github.com/HodgesGenomicsLab/ATAC-STARR-seq) - updated protocol that combined transposase-accessible chromatin (ATAC-seq) with self-transcribing active regulatory region sequencing (STARR-seq) to selectively assay the regulatory potential of accessible DNA. Includes protocols for plasmid library generation, reporter assay, data analysis (peak-within-peak calling, adapted DESeq2 to normalize reporter RNA read counts to plasmid DNA read counts. Keep duplicates). Agrees with ATAC-seq, much less noisy. [GSE181317](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE181317) - data. [GitHub](https://github.com/HodgesGenomicsLab/ATAC-STARR-seq) - computational pipeline.
514 | Paper
515 | Hansen, Tyler J., and Emily Hodges. “Identifying Transcription Factor-Bound Activators and Silencers in the Chromatin Accessible Human Genome Using ATAC-STARR-Seq.” Preprint. Genomics, March 28, 2022. https://doi.org/10.1101/2022.03.25.485870.
516 |
517 |
518 | - **CLIP-seq** (cross-linking and immunoprecipitation) technology, detects sites bound by a protein to RNAs. Figure 1 - technology overview, Figure 2 - details of HITS-CLIP/iCLIP/irCLIP/eCLIP/PAR-CLIP/Proximity-CLIP. Computational analysis, Table 3 - peak detection software. Databases ([doRiNA](https://dorina.mdc-berlin.de/), [ENCORI](https://starbase.sysu.edu.cn/), [POSTAR3](http://111.198.139.65/)).
519 | Paper
520 | Hafner, Markus, Maria Katsantoni, Tino Köster, James Marks, Joyita Mukherjee, Dorothee Staiger, Jernej Ule, and Mihaela Zavolan. "CLIP and complementary methods." Nature Reviews Methods Primers 1, no. 1 (2021): 1-23. https://doi.org/10.1038/s43586-021-00018-1
521 |
522 |
523 | - **STARR-seq** (self-transcribing active regulatory region sequencing) technology for enhancer identification. 3 min [Video](https://youtu.be/1csmHhsnkxE) protocol. Applied to Drosophila genome. The majority (55.6%) of identified enhancers were located within introns, especially in the first intron (37.2%), and in intergenic regions (22.6%). Many genes appeared to be regulated by several independently functioning enhancers.
524 | Paper
525 | Arnold, Cosmas D., Daniel Gerlach, Christoph Stelzer, Łukasz M. Boryń, Martina Rath, and Alexander Stark. “Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-Seq.” Science 339, no. 6123 (March 2013): 1074–77. https://doi.org/10.1126/science.1232542.
526 |
527 |
528 | ## Machine learning
529 |
530 | - [maxATAC](https://github.com/MiraldiLab/maxATAC) - TFBS prediction from ATAC-seq (bulk and pseudobulk) in any cell type (whole genome, chromosome, or region). Deep dilated convolutional neural networks, bigWig and BED predictions of TFBSs. [Models available for 127 human TFs](https://github.com/MiraldiLab/maxATAC_data) (h5 files). Outperforms baseline (average ChIP-seq signal, motif scanning) for most TFs and cell lines. AUPR is similar to the top performer in the ENCODE-DREAM in vivo TFBS prediction challenge (0.4). OMNI-ATAC-seq data for three cell lines, to be available. ATAC-seq signal is scaled per replicate to 20 million mapped reads (RP20M) and min-max normalized to the 99th percentile signal. Python, separate functions for each step (prepare, average, normalize, train, predict, benchmark, peaks, variants). [Tweet 1](https://twitter.com/tareian_it_up/status/1487614524492505090?s=20&t=1dQPuanBrvUUlP_g-Uo9jQ), [Tweet 2](https://twitter.com/EmilyMiraldi/status/1494414950848253953?s=20&t=1dQPuanBrvUUlP_g-Uo9jQ).
531 | Paper
532 | Cazares, Tareian A, Faiz W Rizvi, Balaji Iyer, Xiaoting Chen, Michael Kotliar, Joseph A Wayman, Anthony Bejjani, et al. “MaxATAC: Genome-Scale Transcription-Factor Binding Prediction from ATAC-Seq with Deep Neural Networks.” Preprint. Bioinformatics, January 29, 2022. https://doi.org/10.1101/2022.01.28.478235.
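
The two normalization steps in the maxATAC note (scaling to 20 million mapped reads, then min-max normalization against the 99th percentile) can be illustrated with a minimal sketch. This is not the maxATAC implementation: the function names, the nearest-rank percentile, and the choice not to clip values above the 99th percentile are assumptions made for illustration only.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def rp20m_scale(counts, mapped_reads, target=20_000_000):
    """Rescale per-base counts as if the library had `target` mapped reads."""
    factor = target / mapped_reads
    return [c * factor for c in counts]


def minmax_to_percentile(signal, q=99):
    """Min-max normalize, using the q-th percentile in place of the maximum.

    Assumption: values above the percentile are left above 1 (no clipping).
    """
    low = min(signal)
    high = percentile(signal, q)
    if high == low:
        # Flat signal: nothing to scale against.
        return [0.0 for _ in signal]
    return [(s - low) / (high - low) for s in signal]
```

For example, `minmax_to_percentile(rp20m_scale(counts, mapped_reads))` would chain the two steps for one replicate, where `counts` and `mapped_reads` are hypothetical inputs.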
533 |
534 |
535 | - Segmentation and genome annotation (SAGA) algorithms review. Methods and tools for finding patterns from multiple ChIP-seq, histone-seq, etc. measures (Table 1). Hidden Markov Model (HMM), Dynamic Bayesian Network (DBN) algorithms. HMM intuition, math, solution algorithms. Visualization. Future work, challenges.
536 | Paper
537 | Libbrecht, Maxwell W., Rachel C. W. Chan, and Michael M. Hoffman. “Segmentation and Genome Annotation Algorithms for Identifying Chromatin State and Other Genomic Patterns.” Edited by Tamar Schlick. PLOS Computational Biology 17, no. 10 (October 14, 2021): e1009423. https://doi.org/10.1371/journal.pcbi.1009423.
538 |
539 |
540 |
541 | ## Misc
542 |
543 | - Instead of using ENCODE pre-processed p-value or fold-change signal bigWigs, which can obscure base-resolution information, we ingested the raw alignment (BAM) files for individual replicates. These BAM files were converted to base-resolution count bigWig files using the reads_to_bigwig.py script (from the ChromBPNet repository, https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/helpers/preprocessing/reads_to_bigwig.py), with default parameters. [Source](https://doi.org/10.1101/2025.06.25.661532)
544 |
545 |
546 | - `covtobed` - a tool to generate BED coverage tracks from BAM files. https://github.com/telatin/covtobed
547 |
548 | - [UCSC Genome Browser API](http://genome.ucsc.edu/goldenPath/help/api.html) to retrieve DNA sequence from coordinates.
549 | - https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM;start=4321;end=5678
550 | - https://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:4336341,4336599
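
A minimal Python sketch of the first request above, using only the standard library. The helper names are hypothetical, and two points are assumptions to check against the API help page: that the JSON reply carries the sequence in a `dna` field, and that `start` is 0-based while `end` is exclusive.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

UCSC_SEQUENCE_ENDPOINT = "https://api.genome.ucsc.edu/getData/sequence"


def build_sequence_url(genome, chrom, start, end):
    """Build a getData/sequence URL with ';'-separated parameters."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {"genome": genome, "chrom": chrom, "start": start, "end": end}
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{UCSC_SEQUENCE_ENDPOINT}?{query}"


def fetch_sequence(genome, chrom, start, end, timeout=30):
    """Download the region and return the sequence string.

    Assumption: the JSON reply stores the sequence under the "dna" key.
    """
    url = build_sequence_url(genome, chrom, start, end)
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["dna"]
```

Calling `fetch_sequence("hg38", "chrM", 4321, 5678)` issues the same request as the first URL in the list and needs network access.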
--------------------------------------------------------------------------------