├── DESCRIPTION ├── INSTALL ├── NAMESPACE ├── R └── trackplot.R ├── README.md ├── inst ├── CITATION └── extdata │ ├── bw │ ├── H1_Ctcf.bw │ ├── H1_H2az.bw │ ├── H1_Nanog.bw │ ├── H1_Oct4.bw │ ├── H1_k27ac.bw │ ├── H1_k4me1.bw │ └── H1_k4me3.bw │ └── narrowpeak │ ├── H1_Ctcf.bed │ ├── H1_H2az.bed │ ├── H1_Nanog.bed │ ├── H1_Oct4.bed │ ├── H1_Pol2.bed │ ├── H1_chromHMM.bed │ ├── H1_k27ac.bed │ ├── H1_k4me1.bed │ └── H1_k4me3.bed ├── man ├── diffpeak.Rd ├── extract_summary.Rd ├── pca_plot.Rd ├── profile_extract.Rd ├── profile_heatmap.Rd ├── profile_plot.Rd ├── profile_summarize.Rd ├── read_coldata.Rd ├── summarize_homer_annots.Rd ├── track_extract.Rd ├── track_plot.Rd ├── track_summarize.Rd └── volcano_plot.Rd ├── trackplot.Rproj └── vignettes └── trackplot.Rmd /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: trackplot 2 | Title: Generate IGV style track plots and profile plots from bigWig files 3 | Version: 1.6.00 4 | Authors@R: person("Anand", "Mayakonda", , "anandmt3@gmail.com", role = c("aut", "cre")) 5 | Description: trackplot is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files. 6 | License: MIT 7 | Encoding: UTF-8 8 | Roxygen: list(markdown = TRUE) 9 | RoxygenNote: 7.3.1 10 | Depends: data.table 11 | Imports: methods 12 | Suggests: 13 | knitr, 14 | rmarkdown, 15 | limma 16 | VignetteBuilder: knitr 17 | SystemRequirements: bwtool, MySQL: mysql-server (deb) or mysql-community-server (rpm) or mysql (brew) 18 | -------------------------------------------------------------------------------- /INSTALL: -------------------------------------------------------------------------------- 1 | trackplot requires bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`) or a directory under the PATH. 2 | 3 | 4 | * For macOS: Please download the pre-build binary from here: https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1 5 | Make it executable with `chmod +x bwtool`. macOS gatekeeper might complain that it can not run the binary downloaded from the internet. If so, allow (https://support.apple.com/en-us/HT202491) it in the security settings. 6 | 7 | * For centOS or debian: Follow the below compilation instructions 8 | 9 | git clone 'https://github.com/CRG-Barcelona/bwtool' 10 | git clone 'https://github.com/CRG-Barcelona/libbeato' 11 | git clone https://github.com/madler/zlib 12 | 13 | cd libbeato/ 14 | git checkout 0c30432af9c7e1e09ba065ad3b2bc042baa54dc2 15 | ./configure 16 | make 17 | 18 | cd ../zlib 19 | ./configure 20 | make 21 | 22 | cd ../bwtool 23 | ./configure CFLAGS='-I../libbeato -I../zlib' LDFLAGS='-L../libbeato/jkweb -L../libbeato/beato -L../zlib' 24 | make 25 | 26 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | export(diffpeak) 4 | export(extract_summary) 5 | export(pca_plot) 6 | export(profile_extract) 7 | export(profile_heatmap) 8 | export(profile_plot) 9 | export(profile_summarize) 10 | export(read_coldata) 11 | export(summarize_homer_annots) 12 | export(track_extract) 13 | export(track_plot) 14 | export(track_summarize) 15 | export(volcano_plot) 16 | import(data.table) 17 | -------------------------------------------------------------------------------- /R/trackplot.R: -------------------------------------------------------------------------------- 1 | # This R script contains functions for bigWig visualization 2 | # 3 | # Source code: https://github.com/PoisonAlien/trackplot 4 | # 5 | # MIT License 6 | # Copyright (c) 2020 Anand Mayakonda 7 | # 8 | # Change log: 9 | # Version: 1.6.00 [2024-10-28] 10 | # * Added argument `track_overlay` for a single track with line plot. #14 11 | # * Bug fix: color alpha differs between tracks. Issue: #34 12 | # Version: 1.5.10 [2024-02-14] 13 | # * Added argument `layout_ord` and `bw_ord` to `track_plot()` re-order the overall tracks and bigWig tracks 14 | # * Added `xlab` and `ylab` arguments to `profile_plot()` 15 | # Version: 1.5.01 [2023-10-17] 16 | # * Bug fix parsing loci while parsing GTF 17 | # * Small updates to profile_heatmap() 18 | # Version: 1.5.00 [2023-08-24] 19 | # * Added `read_coldata` to import bigwig and bed files along with metadata. This streamlines all the downstream processes 20 | # * Added `profile_heatmap` for plotting heatmap 21 | # * Added `diffpeak` for minimal differential peak analysis based on peak intensities 22 | # * Added `volcano_plot` for diffpeak results visualization 23 | # * Added `summarize_homer_annots` 24 | # * Support for GTF files with `track_extract` 25 | # * `track_extract` now accepts gene name as input. 26 | # * More customization to `profile_extract` `profile_plot` and `plot_pca` 27 | # * Nicer output with `extract_summary` 28 | # * Update mysql query for UCSC. Added `ideoTblName` argument for `track_extract`. Issue: #19 29 | # Version: 1.4.00 [2023-07-27] 30 | # * Updated track_plot to include chromHMM tracks and top peaks tracks 31 | # * Support to draw narrowPeak or boradPeak files with track_plot 32 | # * Support to query ucsc for chromHMM tracks 33 | # * Additional arguments to track_plot to adjust heights of all the tracks and margins 34 | # * Improved track_extract - (extracts gene models and cytobands to avoid repetitive calling ucsc genome browser) 35 | # * Additional arguments to pca_plot for better plotting 36 | # * Added example datasets 37 | # Version: 1.3.10 [2021-10-06] 38 | # * Support for negative values (Issue: https://github.com/PoisonAlien/trackplot/issues/6 ) 39 | # * Added y_min argument to track_plot. 40 | # * Change the default value for collapse_tx to TRUE 41 | # Version: 1.3.05 [2021-06-07] 42 | # * Summarize and groupScaleByCondition tracks by condition. Issue: #4 43 | # * Allow the script to install as a package. 44 | # * Added y_max argument for custom y-axis limits in track_plot. 45 | # Version: 1.3.01 [2021-04-26] 46 | # * Fix gtf bug. Issue: #3 47 | # Version: 1.3.0 [2021-03-26] 48 | # * modularize the code base to avoid repetitive data extraction and better plotting 49 | # Version: 1.2.0 [2020-12-09] 50 | # * Added bwpcaplot() 51 | # Version: 1.1.11 [2020-12-07] 52 | # * Bug fixes in profileplot(): Typo for .check_dt() and startFrom estimation 53 | # Version: 1.1.1 [2020-12-04] 54 | # * trackplot() now plots ideogram of target chromosome 55 | # Version: 1.1.0 [2020-12-01] 56 | # * Added profileplot() 57 | # Version: 1.0.0 [2020-11-27] 58 | # * Initial release 59 | 60 | #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 61 | 62 | #'Prepares meta data table from bigWig files. 63 | #'Output from this function is passed to all downstream functions. 64 | #' @param bws path to bigWig files 65 | #' @param sample_names sample names for each input files. Optional. Default NULL - creates one from file names. 66 | #' @param build Reference genome build. Default hg38 67 | #' @param input_type Default `bw`. Can be `bw` or `peak` 68 | #' @examples 69 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 70 | #' cd = read_coldata(bws = bigWigs, build = "hg19") 71 | #' beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\.bed$", full.names = TRUE) 72 | #' cd_bed = read_coldata(bws = beds, input_type = "peak", build = "hg19") 73 | #' @export 74 | 75 | read_coldata = function(bws = NULL, sample_names = NULL, build = "hg38", input_type = "bw"){ 76 | 77 | if(is.null(bws)){ 78 | stop("Please provide paths to bigWig files") 79 | } 80 | 81 | input_type = match.arg(arg = input_type, choices = c("bw", "peak")) 82 | 83 | message("Checking for files..") 84 | bws = as.character(bws) 85 | lapply(bws, function(x){ 86 | if(!file.exists(x)){ 87 | stop(paste0(x, " does not exist!")) 88 | } 89 | }) 90 | 91 | if(is.null(sample_names)){ 92 | bw_sample_names = unlist(data.table::tstrsplit(x = basename(bws), split = "\\.", keep = 1)) 93 | }else{ 94 | bw_sample_names = as.character(sample_names) 95 | } 96 | 97 | if(any(duplicated(bw_sample_names))){ 98 | stop("Found duplicates. Samples names must be unique") 99 | } 100 | if(length(bw_sample_names) != length(bws)){ 101 | stop("Please provide names for each input file") 102 | } 103 | 104 | coldata = data.table::data.table(bw_files = bws, bw_sample_names = bw_sample_names) 105 | 106 | attr(coldata, "refbuild") = build 107 | attr(coldata, "is_bw") = input_type == "bw" 108 | message("Input type: ", input_type) 109 | message("Ref genome: ", build) 110 | message("OK!") 111 | 112 | coldata 113 | } 114 | 115 | #' Extract bigWig track data for the given loci 116 | #' @param colData coldata from \code{read_coldata} 117 | #' @param loci target region to plot. Should be of format "chr:start-end". e.g; chr3:187715903-187752003 OR chr3:187,715,903-187,752,003 118 | #' @param gene gene name. This is mutually exclusive with \code{loci} 119 | #' @param binsize bin size to extract signal. Default 10 (bps). 120 | #' @param nthreads Default 1. Number of threads to use. 121 | #' @param query_ucsc Default TRUE. Queries UCSC and extracts gene models and cytoband for the loci. Requires `mysql` installation. 122 | #' @param gtf Use gtf file or data.frame as source for gene model. Default NULL. 123 | #' @param build Reference genome build. Default hg38 124 | #' @param padding Extend locus on both sides by this many bps. 125 | #' @param ideoTblName Table name for ideogram. Default `cytoBand` 126 | #' @import data.table 127 | #' @examples 128 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 129 | #' cd = read_coldata(bws = bigWigs, build = "hg19") 130 | #' oct4_loci = "chr6:31125776-31144789" 131 | #' t = track_extract(colData = cd, loci = oct4_loci, build = "hg19") 132 | #' @export 133 | track_extract = function(colData = NULL, loci = NULL, gene = NULL, binsize = 10, nthreads = 1, query_ucsc = TRUE, gtf = NULL, build = "hg38", padding = 0, ideoTblName = "cytoBand"){ 134 | 135 | if(is.null(colData)){ 136 | stop("Missing colData. Use read_coldata() to generate one.") 137 | } 138 | 139 | if(all(is.null(loci), is.null(gene))){ 140 | stop("Please provide a loci or a gene name!") 141 | } 142 | 143 | input_bw = attr(colData, "is_bw") 144 | build = attr(colData, "refbuild") 145 | 146 | .check_windows() 147 | if(input_bw){ 148 | .check_bwtool() 149 | } 150 | 151 | .check_dt() 152 | 153 | options(warn = -1) 154 | op_dir = tempdir() #For now 155 | 156 | if(!is.null(gene)){ 157 | if(!is.null(gtf)){ 158 | etbl = .parse_gtf(gtf = gtf, genename = gene) 159 | cyto = NA 160 | start = min(unlist(lapply(etbl,function(x) attr(x, "start")))) 161 | end = max(unlist(lapply(etbl,function(x) attr(x, "end")))) 162 | chr = unique(unlist(lapply(etbl,function(x) attr(x, "chr")))) 163 | }else if(query_ucsc){ 164 | message("Querying UCSC genome browser for gene model and cytoband..") 165 | etbl = .extract_geneModel_ucsc_bySymbol(genesymbol = gene, refBuild = build) 166 | chr = unique(as.character(etbl$chr)); start = min(as.numeric(etbl$start)); end = max(as.numeric(etbl$end)) 167 | if(length(chr) > 1){ 168 | message("Multiple chromosomes found! Using the first one ", chr[1]) 169 | etbl = etbl[chr %in% chr[1]] 170 | chr = unique(as.character(etbl$chr)); start = min(as.numeric(etbl$start)); end = max(as.numeric(etbl$end)) 171 | } 172 | if(!is.null(etbl)){ 173 | etbl = .make_exon_tbl(gene_models = etbl) 174 | } 175 | cyto = .extract_cytoband(chr = chr, refBuild = build) 176 | loci = paste0(chr, ":", start, "-", end) 177 | }else{ 178 | cyto = etbl = NA 179 | } 180 | if(is.null(etbl)){ 181 | stop("No transcript models found for ", gene) 182 | } 183 | }else{ 184 | message("Parsing loci..") 185 | loci_p = .parse_loci(loci = loci) 186 | chr = loci_p$chr; start = loci_p$start; end = loci_p$end 187 | if(start >= end){ 188 | stop("End must be larger than Start!") 189 | } 190 | message(" Queried region: ", chr, ":", start, "-", end, " [", end-start, " bps]") 191 | #Extract gene models for this region 192 | if(!is.null(gtf)){ 193 | etbl = .parse_gtf(gtf = gtf, chr = chr, start = start, end = end) 194 | cyto = NA 195 | }else if(query_ucsc){ 196 | message("Querying UCSC genome browser for gene model and cytoband..") 197 | etbl = .extract_geneModel_ucsc(chr, start = start, end = end, refBuild = build, txname = NULL, genename = NULL) 198 | if(!is.null(etbl)){ 199 | etbl = .make_exon_tbl(gene_models = etbl) 200 | } 201 | cyto = .extract_cytoband(chr = chr, refBuild = build, tblName = ideoTblName) 202 | }else{ 203 | cyto = etbl = NA 204 | } 205 | } 206 | 207 | start = start - as.numeric(padding) 208 | end = end + as.numeric(padding) 209 | loci = paste0(chr, ":", start, "-", end) 210 | 211 | input_files = colData$bw_files 212 | custom_names = colData$bw_sample_names 213 | 214 | if(input_bw){ 215 | windows = .gen_windows(chr = chr, start = start, end = end, window_size = binsize, op_dir = op_dir) 216 | track_summary = .get_summaries(bedSimple = windows, bigWigs = input_files, op_dir = op_dir, nthreads = nthreads) 217 | }else{ 218 | track_summary = .get_summaries_narrowPeaks(bigWigs = input_files, nthreads = nthreads, chr, start, end) 219 | } 220 | 221 | names(track_summary) = custom_names 222 | 223 | attr(track_summary, "meta") = list(etbl = etbl, cyto = cyto, loci = loci) 224 | message("OK!") 225 | 226 | list(data = track_summary, colData = colData) 227 | } 228 | 229 | #' Summarize tracks per condition 230 | #' @param summary_list Output from track_extract. Required. 231 | #' @param condition a column name in \code{coldata} containing sample conditions. Default NULL. 232 | #' @param stat can be `mean, median`, `max`, `min`. NAs are excluded. 233 | #' @export 234 | track_summarize = function(summary_list = NULL, condition = NULL, stat = "mean"){ 235 | 236 | if(is.null(summary_list)){ 237 | stop("Missing input! Expecting output from track_extract()") 238 | } 239 | 240 | stat = match.arg(arg = stat, choices = c("mean", "median", "max", "min")) 241 | 242 | meta = attr(summary_list$data, "meta") 243 | loci = meta$loci 244 | etbl = meta$etbl 245 | cyto = meta$cyto 246 | 247 | coldata = summary_list$colData 248 | is_bw = attr(coldata, "is_bw") 249 | build = attr(coldata, "refbuild") 250 | 251 | if(is.null(condition)){ 252 | stop("Please provide a column name containing sample condition!\nHere are available columns.\n", paste(colnames(coldata), collapse = " ")) 253 | } 254 | 255 | summary_list = summary_list$data 256 | 257 | if(!is.null(condition)){ 258 | if(!condition %in% colnames(coldata)){ 259 | warning(paste0(condition, " does not exists in coldata. Here are available columns.")) 260 | print(coldata) 261 | stop() 262 | }else{ 263 | colnames(coldata)[which(colnames(coldata) == condition)] = "group_condition" 264 | coldata$group_condition = as.character(coldata$group_condition) 265 | } 266 | condition = as.character(coldata$group_condition) 267 | } 268 | 269 | names(summary_list) = condition 270 | 271 | summary_list = data.table::rbindlist(l = summary_list, use.names = TRUE, fill = TRUE, idcol = "sample_name") 272 | 273 | if(stat == "mean"){ 274 | summary_list = summary_list[,mean(max, na.rm = TRUE), .(sample_name, chromosome, start, end)] 275 | }else if (stat == "median"){ 276 | summary_list = summary_list[,median(max, na.rm = TRUE), .(sample_name, chromosome, start, end)] 277 | }else if (stat == "max"){ 278 | summary_list = summary_list[,max(max, na.rm = TRUE), .(sample_name, chromosome, start, end)] 279 | }else{ 280 | summary_list = summary_list[,min(max, na.rm = TRUE), .(sample_name, chromosome, start, end)] 281 | } 282 | 283 | colnames(summary_list)[ncol(summary_list)] = "max" #this column name means nothing, just using it for the consistency 284 | summary_list = split(summary_list, summary_list$sample_name) 285 | attr(summary_list, "meta") = meta 286 | 287 | list(data = summary_list, colData = coldata) 288 | } 289 | 290 | #' Generate IGV style locus tracks with ease 291 | #' @param summary_list Output from track_extract 292 | #' @param draw_gene_track Default FALSE. If TRUE plots gene models overlapping with the queried region 293 | #' @param show_ideogram Default TRUE. If TRUE plots ideogram of the target chromosome with query loci highlighted. Works only when `query_ucsc` is TRUE. 294 | #' @param txname transcript name to draw. Default NULL. Plots all transcripts overlapping with the queried region 295 | #' @param genename gene name to draw. Default NULL. Plots all genes overlapping with the queried region 296 | #' @param collapse_txs Default FALSE. Whether to collapse all transcripts belonging to same gene into a unified gene model 297 | #' @param groupAutoScale Default TRUE 298 | #' @param y_max custom y axis upper limits for each track. Recycled if required. 299 | #' @param y_min custom y axis lower limits for each track. Recycled if required. 300 | #' @param gene_fsize Font size. Default 1 301 | #' @param col Color for tracks. Default `#2f3640`. Multiple colors can be provided for each track 302 | #' @param show_axis Default FALSE 303 | #' @param track_names Default NULL 304 | #' @param track_names_pos Default 0 (corresponds to left corner) 305 | #' @param track_names_to_left If TRUE, track names are shown to the left of the margin. Default FALSE, plots on top as a title 306 | #' @param track_overlay Draws all bigWigs in a single track as a line plot 307 | #' @param regions genomic regions to highlight. A data.frame with at-least three columns containing chr, start and end positions. 308 | #' @param boxcol color for highlighted region. Default "#192A561A" 309 | #' @param boxcolalpha Default 0.5 310 | #' @param ucscChromHMM Name of the chromHMM table. Use .get_ucsc_hmm_tbls() to see the details. 311 | #' @param chromHMM chromHMM data. Can be path to bed files or a list data.frames with first three columns containing chr,start,end and a 4th column containing integer coded state 312 | #' @param chromHMM_names name for the chromHMM track 313 | #' @param chromHMM_cols A named vector for each state (in the 4th column of chromHMM file). Default NULL 314 | #' @param peaks bed file to be highlighted. Can be path to bed files or a list data.frames with first three columns containing chr,start,end. 315 | #' @param peaks_track_names Provide a name for each loci bed file. Default NULL 316 | #' @param cytoband_track_height Default 1 317 | #' @param chromHMM_track_height Default 1 318 | #' @param gene_track_height Default 2 319 | #' @param scale_track_height Default 1 320 | #' @param peaks_track_height Default 2. 321 | #' @param bw_track_height Default 3 322 | #' @param left_mar Space to the left. Default 4 323 | #' @param bw_ord Names of the tracks to be drawn in the provided order. Default NULL. 324 | #' @param layout_ord Plot layout order. Deafult c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, cytoband track. 325 | #' @examples 326 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 327 | #' cd = read_coldata(bws = bigWigs, build = "hg19") 328 | #' oct4_loci = "chr6:31125776-31144789" 329 | #' t = track_extract(colData = cd, loci = oct4_loci, build = "hg19") 330 | #' trackplot::track_plot(summary_list = t) 331 | #' @export 332 | track_plot = function(summary_list = NULL, 333 | draw_gene_track = TRUE, 334 | show_ideogram = TRUE, 335 | col = "gray70", 336 | groupAutoScale = FALSE, 337 | y_max = NULL, 338 | y_min = NULL, 339 | txname = NULL, 340 | genename = NULL, 341 | show_axis = FALSE, 342 | gene_fsize = 1, 343 | track_names = NULL, 344 | track_names_pos = 0, 345 | track_names_to_left = FALSE, 346 | track_overlay = FALSE, 347 | regions = NULL, 348 | collapse_txs = TRUE, 349 | boxcol = "#ffc41a", 350 | boxcolalpha = 0.4, 351 | chromHMM = NULL, 352 | chromHMM_cols = NULL, 353 | chromHMM_names = NULL, 354 | ucscChromHMM = NULL, 355 | peaks = NULL, 356 | bw_track_height = 3, 357 | peaks_track_height = 2, 358 | gene_track_height = 2, 359 | scale_track_height = 2, 360 | chromHMM_track_height = 1, 361 | cytoband_track_height = 2, 362 | peaks_track_names = NULL, 363 | left_mar = NULL, 364 | bw_ord = NULL, 365 | layout_ord = c("p", "b", "h", "g", "c") 366 | ){ 367 | 368 | if(is.null(summary_list)){ 369 | stop("Missing input! Expecting output from track_extract()") 370 | } 371 | 372 | 373 | meta = attr(summary_list$data, "meta") 374 | loci = meta$loci 375 | etbl = meta$etbl 376 | cyto = meta$cyto 377 | 378 | coldata = summary_list$colData 379 | is_bw = attr(coldata, "is_bw") 380 | build = attr(coldata, "refbuild") 381 | 382 | loci_p = .parse_loci(loci = loci) 383 | chr = loci_p$chr; start = loci_p$start; end = loci_p$end 384 | 385 | # chr = summary_list$loci[1] 386 | # start = as.numeric(summary_list$loci[2]) 387 | # end = as.numeric(summary_list$loci[3]) 388 | # etbl = summary_list$etbl 389 | # cyto = summary_list$cyto 390 | # is_bw = attr(summary_list, "is_bw") 391 | # build = attr(summary_list, "refbuild") 392 | 393 | summary_list = summary_list$data 394 | 395 | #Change the order 396 | if(!is.null(bw_ord)){ 397 | bw_ord = intersect(names(summary_list), bw_ord) 398 | 399 | if(length(bw_ord) == 0){ 400 | stop("None of the provided bw_ord are presnt in the data! Available names:\n", paste(names(summary_list), collapse = ", ")) 401 | } 402 | 403 | summary_list = summary_list[bw_ord] 404 | coldata = data.table::rbindlist(split(coldata, coldata$bw_sample_names)[bw_ord]) 405 | } 406 | 407 | if(length(col) != length(summary_list)){ 408 | col = rep(x = col, length(summary_list)) 409 | } 410 | 411 | plot_regions = FALSE 412 | if(!is.null(regions)){ 413 | if(is(object = regions, class2 = "data.frame")){ 414 | regions = data.table::as.data.table(x = regions) 415 | colnames(regions)[1:3] = c("chromsome", "startpos", "endpos") 416 | regions = regions[chromsome %in% chr] 417 | if(nrow(regions) == 0){ 418 | warning("None of the regions are within the requested chromosme: ", chr) 419 | plot_regions = TRUE 420 | }else{ 421 | plot_regions = TRUE 422 | } 423 | }else{ 424 | stop("'mark_regions' must be a data.frame with first 3 columns containing : chr, start, end") 425 | } 426 | } 427 | 428 | if(!is.null(track_names)){ 429 | names(summary_list) = track_names 430 | } 431 | 432 | groupScaleByCondition = FALSE #For furture 433 | if(groupScaleByCondition){ 434 | plot_height = unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE))) 435 | plot_height_min = unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE))) 436 | plot_height = data.table::data.table(plot_height, plot_height_min, col, names(summary_list)) 437 | plot_height$og_ord = 1:nrow(plot_height) 438 | plot_height = plot_height[order(col)] 439 | plot_height_max = plot_height[,.(.N, max(plot_height)), .(col)] 440 | plot_height_min = plot_height[,.(.N, max(plot_height_min)), .(col)] 441 | plot_height$max = rep(plot_height_max$V2, plot_height_max$N) 442 | plot_height$min = rep(plot_height_min$V2, plot_height_min$N) 443 | plot_height_min = plot_height[order(og_ord)][,min] 444 | plot_height = plot_height[order(og_ord)][,max] 445 | }else if(groupAutoScale){ 446 | plot_height = max(unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE))), na.rm = TRUE) 447 | plot_height_min = min(unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE))), na.rm = TRUE) 448 | plot_height = rep(plot_height, length(summary_list)) 449 | plot_height_min = rep(plot_height_min, length(summary_list)) 450 | }else{ 451 | plot_height = unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE))) 452 | plot_height_min = unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE))) 453 | } 454 | 455 | if(!is.null(y_max)){ 456 | #If custom ylims are provided 457 | if(length(y_max) != length(summary_list)){ 458 | y_max = rep(y_max, length(summary_list)) 459 | } 460 | plot_height = y_max 461 | 462 | }else{ 463 | plot_height = round(plot_height, digits = 2) 464 | } 465 | 466 | if(!is.null(y_min)){ 467 | #If custom ylims are provided 468 | if(length(y_min) != length(summary_list)){ 469 | y_min = rep(y_min, length(summary_list)) 470 | } 471 | plot_height_min = y_min 472 | 473 | }else{ 474 | plot_height_min = round(plot_height_min, digits = 2) 475 | } 476 | 477 | if(is_bw & track_overlay){ 478 | ntracks = 1 479 | }else{ 480 | ntracks = length(summary_list) 481 | } 482 | 483 | lo = .make_layout(ntracks = ntracks, ntracks_h = bw_track_height, cytoband = show_ideogram, cytoband_h = cytoband_track_height, genemodel = draw_gene_track, 484 | genemodel_h = gene_track_height, chrHMM = any(!is.null(ucscChromHMM), !is.null(chromHMM)), chrHMM_h = chromHMM_track_height, loci = !is.null(peaks), 485 | loci_h = peaks_track_height, scale_track_height = scale_track_height, lord = layout_ord) 486 | 487 | query = data.table::data.table(chr = chr, start = start, end = end) 488 | data.table::setkey(x = query, chr, start, end) 489 | 490 | if(is.null(left_mar)){ 491 | left_mar = ifelse(test = show_axis, yes = 4, no = 2) 492 | } 493 | 494 | #Draw top peaks 495 | if(!is.null(peaks)){ 496 | 497 | if(is.list(peaks)){ 498 | peaks_data = lapply(peaks, function(l){ 499 | colnames(l)[1:3] = c("chr", "start", "end") 500 | data.table::setDT(l, key = c("chr", "start", "end")) 501 | l 502 | }) 503 | }else{ 504 | peaks_data = lapply(peaks, function(l){ 505 | l = data.table::fread(file = l) 506 | colnames(l)[1:3] = c("chr", "start", "end") 507 | data.table::setDT(l, key = c("chr", "start", "end")) 508 | l 509 | }) 510 | } 511 | 512 | if(is.null(peaks_track_names)){ 513 | names(peaks_data) = paste0("Bed", 1:length(peaks_data)) 514 | }else{ 515 | names(peaks_data) = peaks_track_names 516 | } 517 | 518 | if(show_axis){ 519 | par(mar = c(0.25, left_mar, 0.25, 1)) 520 | }else{ 521 | par(mar = c(0.25, left_mar, 0.25, 1)) 522 | } 523 | 524 | plot(NA, xlim = c(start, end), ylim = c(0, length(peaks)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 525 | 526 | for(idx in seq_along(peaks_data)){ 527 | l_idx = peaks_data[[idx]] 528 | l_idx = data.table::foverlaps(x = query, y = l_idx, type = "any", nomatch = NULL)[,.(chr, start, end)] 529 | rect(xleft = start, ybottom = idx - 0.49, xright = end, ytop = idx - 0.51, col = "gray90", border = NA) 530 | if(nrow(l_idx) > 0){ 531 | rect(xleft = l_idx$start, ybottom = idx - 0.9, xright = l_idx$end, ytop = idx - 0.1, col = "#34495e", border = NA) 532 | } 533 | text(x = start, y = idx - 0.5, labels = names(peaks_data)[idx], adj = 1.2, xpd = TRUE) 534 | } 535 | } 536 | 537 | #Draw bigWig signals 538 | boxcol = grDevices::adjustcolor(boxcol, alpha.f = boxcolalpha) 539 | if(is_bw){ 540 | if(track_overlay){ 541 | 542 | if(show_axis){ 543 | par(mar = c(0.5, left_mar, 2, 1)) 544 | }else{ 545 | par(mar = c(0.5, left_mar, 2, 1)) 546 | } 547 | 548 | plot(NA, xlim = c(start, end), ylim = c(min(plot_height_min), max(plot_height)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 549 | for(idx in 1:length(summary_list)){ 550 | x = summary_list[[idx]] 551 | if(nrow(x) == 0){ 552 | if(track_names_to_left){ 553 | text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE) 554 | #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0) 555 | #title(main = , adj = track_names_pos, font.main = 3) 556 | }else{ 557 | title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3) 558 | } 559 | next 560 | } 561 | points(x = x$start, y = x$max, col = col[idx], type = "l") 562 | } 563 | 564 | if(plot_regions){ 565 | # boxcol = "#192a56" 566 | if(nrow(x) > 0){ 567 | rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE) 568 | } 569 | } 570 | 571 | if(show_axis){ 572 | axis(side = 2, at = c(min(plot_height_min), max(plot_height)), las = 2) 573 | } 574 | legend(x = "topright", legend = names(summary_list), col = col, pch = "-") 575 | }else{ 576 | for(idx in 1:length(summary_list)){ 577 | x = summary_list[[idx]] 578 | if(show_axis){ 579 | par(mar = c(0.5, left_mar, 2, 1)) 580 | }else{ 581 | par(mar = c(0.5, left_mar, 2, 1)) 582 | } 583 | 584 | plot(NA, xlim = c(start, end), ylim = c(plot_height_min[idx], plot_height[idx]), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 585 | #If there is no signal, just add the track names and go to next 586 | if(nrow(x) == 0){ 587 | if(track_names_to_left){ 588 | text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE) 589 | #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0) 590 | #title(main = , adj = track_names_pos, font.main = 3) 591 | }else{ 592 | title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3) 593 | } 594 | next 595 | } 596 | rect(xleft = x$start, ybottom = 0, xright = x$end, ytop = x$max, col = col[idx], border = col[idx]) 597 | 598 | if(show_axis){ 599 | axis(side = 2, at = c(plot_height_min[idx], plot_height[idx]), las = 2) 600 | }else{ 601 | text(x = start, y = plot_height[idx], labels = paste0("[", plot_height_min[idx], "-", plot_height[idx], "]"), adj = 0, xpd = TRUE) 602 | } 603 | #plot(NA, xlim = c(start, end), ylim = c(0, nrow(regions)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 604 | 605 | if(plot_regions){ 606 | # boxcol = "#192a56" 607 | if(nrow(x) > 0){ 608 | rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE) 609 | } 610 | } 611 | 612 | if(track_names_to_left){ 613 | text(x = start, y = (plot_height_min[idx] + plot_height[idx])/2, labels = names(summary_list)[idx], adj = 1.1, cex = gene_fsize, xpd = TRUE) 614 | }else{ 615 | title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3) 616 | } 617 | } 618 | } 619 | }else{ 620 | for(idx in 1:length(summary_list)){ 621 | x = summary_list[[idx]] 622 | 623 | if(show_axis){ 624 | par(mar = c(0, left_mar, 0, 1)) 625 | }else{ 626 | par(mar = c(0.5, left_mar, 1, 1)) 627 | } 628 | 629 | plot(NA, xlim = c(start, end), ylim = c(0, 1), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 630 | #If there is no signal, just add the track names and go to next 631 | if(nrow(x) == 0){ 632 | if(track_names_to_left){ 633 | text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE) 634 | #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0) 635 | #title(main = , adj = track_names_pos, font.main = 3) 636 | }else{ 637 | title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3) 638 | } 639 | next 640 | } 641 | cols = cut(x$max, breaks = c(0, 166, 277, 389, 500, 612, 723, 834, 945, max(x$max)), labels = c("#FFFFFF", "#F0F0F0", "#D9D9D9", "#BDBDBD", "#969696", "#737373", 642 | "#525252", "#252525", "#000000")) 643 | rect(xleft = x$start, ybottom = 0.01, xright = x$end, ytop = 0.99, col = as.character(cols), border = NA) 644 | 645 | if(plot_regions){ 646 | # boxcol = "#192a56" 647 | boxcol = grDevices::adjustcolor(boxcol, alpha.f = boxcolalpha) 648 | if(nrow(regions) > 0){ 649 | if(nrow(x) > 0){ 650 | rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE) 651 | } 652 | } 653 | } 654 | 655 | if(track_names_to_left){ 656 | text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE) 657 | #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0) 658 | #title(main = , adj = track_names_pos, font.main = 3) 659 | }else{ 660 | title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3) 661 | } 662 | } 663 | 664 | } 665 | 666 | #Draw chrom HMM tracks 667 | plotHMM = FALSE 668 | if(!is.null(chromHMM)){ 669 | if(is.list(chromHMM)){ 670 | chromHMM = lapply(chromHMM, function(l){ 671 | colnames(l)[1:4] = c("chr", "start", "end", "name") 672 | data.table::setDT(l, key = c("chr", "start", "end")) 673 | l 674 | }) 675 | }else{ 676 | chromHMM = lapply(chromHMM, function(l){ 677 | l = data.table::fread(file = l) 678 | colnames(l)[1:4] = c("chr", "start", "end", "name") 679 | data.table::setDT(l, key = c("chr", "start", "end")) 680 | }) 681 | } 682 | 683 | hmmdata = lapply(chromHMM, function(hmm){ 684 | .load_chromHMM(chr = chr, start = start, end = end, ucsc = hmm) 685 | #.extract_chromHmm_ucsc() 686 | }) 687 | 688 | if(is.null(chromHMM_names)){ 689 | names(hmmdata) = paste0("chromHMM_", 1:length(hmmdata)) 690 | }else{ 691 | names(hmmdata) = chromHMM_names 692 | } 693 | 694 | plotHMM = TRUE 695 | }else if(!is.null(ucscChromHMM)){ 696 | hmmdata = lapply(ucscChromHMM, function(hmmtbl){ 697 | .extract_chromHmm_ucsc(chr = chr, start = start, end = end, refBuild = build, tbl = hmmtbl) 698 | }) 699 | names(hmmdata) = ucscChromHMM 700 | plotHMM = TRUE 701 | #return(hmmdata) 702 | } 703 | 704 | if(plotHMM){ 705 | if(show_axis){ 706 | par(mar = c(0.1, left_mar, 0, 1)) 707 | }else{ 708 | par(mar = c(0.1, left_mar, 0, 1)) 709 | } 710 | 711 | if(is.null(chromHMM_cols)){ 712 | chromHMM_cols = .get_ucsc_hmm_states_cols() 713 | } 714 | 715 | .plot_ucsc_chrHmm(d = hmmdata, start = start, end = end, hmm_cols = chromHMM_cols) 716 | } 717 | 718 | #Draw gene models 719 | if(draw_gene_track){ 720 | 721 | #etbl = .make_exon_tbl(gene_models = etbl, txname = txname, genename = genename) 722 | 723 | if(!is.null(etbl)){ 724 | 725 | if(!is.null(genename)){ 726 | if(length(etbl[unlist((lapply(etbl, attr, "gene"))) %in% genename]) == 0){ 727 | message("Note: Could not find any of the requested gene names! Available genes are:") 728 | print(unique(unlist((lapply(etbl, attr, "gene"))))) 729 | }else{ 730 | etbl = etbl[unlist((lapply(etbl, attr, "gene"))) %in% genename] 731 | } 732 | } 733 | 734 | if(collapse_txs){ 735 | etbl = .collapse_tx(etbl) 736 | } 737 | 738 | if(show_axis){ 739 | par(mar = c(0.25, left_mar, 0, 1)) 740 | }else{ 741 | par(mar = c(0.25, left_mar, 0, 1)) 742 | } 743 | 744 | plot(NA, xlim = c(start, end), ylim = c(0, length(etbl)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 745 | exon_col = "#192a56" 746 | for(tx_id in 1:length(etbl)){ 747 | txtbl = etbl[[tx_id]] 748 | segments(x0 = attr(txtbl, "start"), y0 = tx_id-0.45, x1 = attr(txtbl, "end"), y1 = tx_id-0.45, col = exon_col, lwd = 1) 749 | name_at = min(c(txtbl[[1]], txtbl[[2]])) 750 | if(is.na(attr(txtbl, "tx"))){ 751 | text(x = name_at, y = tx_id-0.45, labels = paste0(attr(txtbl, "gene")), adj = 1, cex = gene_fsize, xpd = TRUE, pos = 2) #x = start for outer margin 752 | }else{ 753 | text(x = name_at, y = tx_id-0.45, labels = paste0(attr(txtbl, "tx"), " [", attr(txtbl, "gene"), "]"), cex = gene_fsize, adj = 0, xpd = TRUE, pos = 2) 754 | } 755 | 756 | rect(xleft = txtbl[[1]], ybottom = tx_id-0.75, xright = txtbl[[2]], ytop = tx_id-0.25, col = exon_col, border = NA) 757 | if(attr(txtbl, "strand") == "+"){ 758 | dirat = pretty(x = c(min(txtbl[[1]]), max(txtbl[[2]]))) 759 | dirat[1] = min(txtbl[[1]]) #Avoid drawing arrows outside gene length 760 | dirat[length(dirat)] = max(txtbl[[2]]) 761 | points(x = dirat, y = rep(tx_id-0.45, length(dirat)), pch = ">", col = exon_col) 762 | }else{ 763 | dirat = pretty(x = c(min(txtbl[[1]]), max(txtbl[[2]]))) 764 | dirat[1] = min(txtbl[[1]]) #Avoid drawing arrows outside gene length 765 | dirat[length(dirat)] = max(txtbl[[2]]) 766 | points(x = dirat, y = rep(tx_id-0.45, length(dirat)), pch = "<", col = exon_col) 767 | } 768 | } 769 | } 770 | } 771 | 772 | #Draw scale 773 | if(show_axis){ 774 | par(mar = c(0, left_mar, 0, 1)) 775 | }else{ 776 | par(mar = c(0, left_mar, 0, 1)) 777 | } 778 | lab_at = pretty(c(start, end)) 779 | lab_at_lab = ifelse(test = lab_at > 1e6, yes = paste0(lab_at/1e6, "M"), no = ifelse(lab_at > 100000, yes = paste0(lab_at/1e5, "K"), no = lab_at)) 780 | plot(NA, xlim = c(start, end), ylim = c(0, 1), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 781 | rect(xleft = start, ybottom = 0.5, xright = end, ytop = 0.5, lty = 2, xpd = TRUE) 782 | rect(xleft = lab_at, ybottom = 0.45, xright = lab_at, ytop = 0.5, xpd = TRUE) 783 | text(x = lab_at, y = 0.2, labels = lab_at_lab, xpd = FALSE) 784 | #axis(side = 1, at = lab_at, lty = 2, line = -3) 785 | text(x = end, y = 0.9, labels = paste0(chr, ":", start, "-", end), adj = 1, xpd = TRUE) 786 | 787 | #Draw ideogram 788 | if(is.list(cyto)){ 789 | if(show_ideogram){ 790 | par(mar = c(0.2, 1, 0, 1)) 791 | plot(NA, xlim = c(0, max(cyto$end)), ylim = c(0, 1), axes = FALSE, frame.plot = FALSE, xlab = NA, ylab = NA) 792 | rect(xleft = cyto$start, ybottom = 0.1, xright = cyto$end, ytop = 0.6, col = cyto$color, border = "#34495e") 793 | rect(xleft = start, ybottom = 0, xright = end, ytop = 0.7, col = "#d35400", lwd = 2, border = "#d35400") 794 | text(x = 0, y = 0.5, labels = chr, adj = 1.2, font = 2, xpd = TRUE) 795 | } 796 | } 797 | 798 | } 799 | 800 | # profileplot is an ultra-fast, simple, and minimal dependency R script to generate profile-plots from bigWig files 801 | #' Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts 802 | #' @param colData from \code{read_coldata} 803 | #' @param bed bed file or a data.frame with first 3 column containing chromosome, star, end positions. 804 | #' @param binSize bin size to extract signal. Default 50 (bps). Should be >1 805 | #' @param startFrom Default "start". For bed files this can be "start", "center" or "end". For `ucsc_assembly` this can only be "start" or "end" 806 | #' @param up extend upstream by this many bps from `startFrom`. Default 2500 807 | #' @param down extend downstream by this many bps from `startFrom`. Default 2500 808 | #' @param ucsc_assembly If `bed` file not provided, setting `ucsc_assembly` to TRUE will fetch transcripts from UCSC genome browser. 809 | #' @param pc_genes Use only protein coding genes when `ucsc_assembly` is used. Default TRUE 810 | #' @param nthreads Default 4 811 | #' @seealso \code{\link{profile_summarize}} \code{\link{profile_plot}} \code{\link{profile_heatmap}} 812 | #' @export 813 | 814 | profile_extract = function(colData = NULL, bed = NULL, ucsc_assembly = TRUE, startFrom = "start", binSize = 50, 815 | up = 2500, down = 2500, pc_genes = TRUE, nthreads = 4){ 816 | .check_windows() 817 | .check_bwtool(warn = FALSE) 818 | .check_dt() 819 | 820 | if(is.null(colData)){ 821 | stop("Missing colData. Use read_coldata() to generate one.") 822 | } 823 | 824 | bigWigs = colData$bw_files 825 | custom_names = colData$bw_sample_names 826 | 827 | op_dir = tempdir() #For now 828 | 829 | if(is.null(bed)){ 830 | if(ucsc_assembly){ 831 | ucsc_assembly = attr(colData, "refbuild") 832 | message("No bed file was given. Defaulting to ucsc refseq..") 833 | startFrom = match.arg(arg = startFrom, choices = c("start", "end")) 834 | bed = .make_genome_bed(refBuild = ucsc_assembly, up = as.numeric(up), down = as.numeric(down), tss = startFrom, op_dir = op_dir, pc_genes = pc_genes, for_profile = TRUE) 835 | bed_annot = bed[[2]] 836 | bed = bed[[1]] 837 | }else{ 838 | stop("Please provide either a BED file or set ucsc_assembly to TRUE") 839 | } 840 | }else{ 841 | startFrom = match.arg(arg = startFrom, choices = c("start", "end", "center")) 842 | bed = .make_bed(bed = bed, op_dir = op_dir, up = as.numeric(up), down = as.numeric(up), tss = startFrom, for_profile = TRUE) 843 | bed_annot = NA 844 | } 845 | 846 | message("Extracting signals..") 847 | mats = parallel::mclapply(bigWigs, function(x){ 848 | .bwt_mats(bw = x, binSize = binSize, bed = bed, size = paste0(up, ":", down), startFrom = startFrom, op_dir = op_dir) 849 | }, mc.cores = nthreads) 850 | 851 | mats = as.character(unlist(x = mats)) 852 | sig_list = lapply(mats, data.table::fread) 853 | 854 | if(!is.null(custom_names)){ 855 | names(sig_list) = custom_names 856 | }else{ 857 | names(sig_list) = gsub(pattern = "*\\.matrix$", replacement = "", x = basename(path = mats)) 858 | } 859 | 860 | attr(sig_list, "args") = c(up, down) 861 | 862 | #Remove intermediate files 863 | lapply(mats, function(x) system(command = paste0("rm ", x), intern = TRUE)) 864 | 865 | list(data = sig_list, colData = colData) 866 | } 867 | 868 | 869 | #' Summarize data for profile plots 870 | #' @param sig_list Output generated from `profile_extract` 871 | #' @param stat Default `mean`. Can be `mean`, `median` 872 | #' @param condition column name with conditions in `colData`. If provided summarizes signals from samples belonging to same group or condition 873 | #' @seealso \code{\link{profile_extract}} \code{\link{profile_plot}} \code{\link{profile_heatmap}} 874 | #' @export 875 | profile_summarize = function(sig_list = NULL, stat = "mean", condition = NULL){ 876 | 877 | if(is.null(sig_list)){ 878 | stop("Missing input! Use profile_extract() to generate one.") 879 | } 880 | 881 | stat = match.arg(arg = stat, choices = c("mean", 'median')) 882 | 883 | colData = sig_list$colData 884 | collapse_replicates = FALSE 885 | if(!is.null(condition)){ 886 | if(!condition %in% colnames(colData)){ 887 | stop(condition , " not found in colData!\nAvailable columns: ", print(paste(colnames(colData), collapse = " "))) 888 | } 889 | collapse_by_idx = which(colnames(colData) == condition) 890 | condition = unlist(colData[,collapse_by_idx, with = FALSE], use.names = FALSE) 891 | collapse_replicates = TRUE 892 | } 893 | 894 | message("Summarizing..") 895 | sig_summary = .summarizeMats(mats = sig_list$data, group = condition, collapse_reps = collapse_replicates, summarizeBy = stat) 896 | attr(sig_summary, "args") = attr(sig_list$data, "args") 897 | list(data = sig_summary, colData = colData) 898 | } 899 | 900 | 901 | #' Draw a profile plot 902 | #' @param sig_list Output generated from profile_summarize 903 | #' @param color Manual colors for each bigWig. Default NULL. 904 | #' @param line_size Default 1 905 | #' @param legend_fs Legend font size. Default 1 906 | #' @param axis_fs Axis font size. Default 1 907 | #' @param xlab x axis label. Default NA 908 | #' @param ylab y axis label. Default NA 909 | #' @export 910 | profile_plot = function(sig_list = NULL, color = NULL, line_size = 1, legend_fs = 1, axis_fs = 1, xlab = NA, ylab = NA){ 911 | 912 | if(is.null(sig_list)){ 913 | stop("Missing input! Expecting output from profile_summarize()") 914 | } 915 | 916 | args = attr(sig_list$data, "args") 917 | up = as.numeric(args[1]) 918 | down = as.numeric(args[2]) 919 | 920 | sig_summary = sig_list$data 921 | 922 | if(is.null(color)){ 923 | color = c("#2f4f4f", "#8b4513", "#228b22", "#00008b", "#ff0000", "#ffd700", "#7fff00", "#00ffff", "#ff00ff", "#6495ed", "#ffe4b5", "#ff69b4") #hcl.colors(n = 10, palette = "Dark 2") 924 | color = color[1:length(sig_summary)] 925 | } 926 | 927 | y_max = max(unlist(lapply(sig_summary, max, na.rm = TRUE))) 928 | y_min = min(unlist(lapply(sig_summary, min, na.rm = TRUE))) 929 | ylabs = pretty(c(y_min, y_max), n = 5) 930 | 931 | 932 | x_max = max(unlist(lapply(sig_summary, length))) 933 | xlabs = c(up, 0, down) 934 | xticks = xticks = c(0, 935 | as.integer(length(sig_summary[[1]])/sum(as.numeric(xlabs[1]), as.numeric(xlabs[3])) * as.numeric(xlabs[1])), 936 | length(sig_summary[[1]])) 937 | 938 | #line_size = 1 939 | par(mar = c(4, 4, 2, 1)) 940 | plot(NA, xlim = c(0, x_max), ylim = c(min(ylabs), max(ylabs)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 941 | abline(h = ylabs, v = pretty(xticks), col = "gray90", lty = 2) 942 | 943 | lapply(1:length(sig_summary), function(idx){ 944 | points(sig_summary[[idx]], type = 'l', lwd = line_size, col = color[idx]) 945 | }) 946 | axis(side = 1, at = xticks, labels = xlabs, cex.axis = axis_fs) 947 | axis(side = 2, at = ylabs, las = 2, cex.axis = axis_fs) 948 | 949 | legend(x = "topright", legend = names(sig_summary), col = color, bty = "n", lty = 1, lwd = 1.2, cex = legend_fs, xpd = TRUE) 950 | 951 | mtext(text = xlab, side = 1, line = 2.5, cex = 1) 952 | mtext(text = ylab, side = 2, line = 2.5, cex = 1) 953 | 954 | invisible(list(mean_signal = sig_summary, color_codes = color, xticks = xticks, xlabs = xlabs)) 955 | } 956 | 957 | #' Draw a heatmap 958 | #' @details This function takes output from extract_matrix and draws a heatmap 959 | #' @param mat_list Input matrix list generated by \code{\link{profile_extract}} 960 | #' @param sortBy Sort matrix by.., Can be mean, median. Default mean. 961 | #' @param col_pal Color palette to use. Default Blues. Use hcl.pals(type = "sequential") to see available palettes 962 | #' @param revpal Reverse color palette? Default FALSE. 963 | #' @param sample_names Manually specify sample names. 964 | #' @param title_size size of title. Default 0.8 965 | #' @param top_profile Boolean. Whether to draw top profile plot. 966 | #' @param top_profile_h Default 2. 967 | #' @param zmins Manually specify min scores to include in heatmap 968 | #' @param zmaxs Manually specify max scores to include in heatmap 969 | #' @param scale Whether to row scale the matrix. Default FALSE 970 | #' @param file_name Default NULL. If provided saves plot as a png. 971 | #' @param hm_width Width of the plot. Default 1024 972 | #' @param hm_height Height of the plot. Default 600 973 | #' @param mat_order Default NULL. Sample order in which the heatmaps are drawn. 974 | #' @export 975 | profile_heatmap = function(mat_list, sortBy = "mean", col_pal = "Blues", revpal = FALSE, sample_names = NULL, 976 | title_size = 1, top_profile = FALSE, top_profile_h = 2, zmins = NULL, zmaxs = NULL, 977 | scale = FALSE, file_name = NULL, hm_width = 1024, hm_height = 600, mat_order = NULL){ 978 | 979 | 980 | if(!sortBy %in% c("mean", "median")){ 981 | stop("sortBy can only be mean, median") 982 | } 983 | 984 | col_pal = match.arg(arg = col_pal, choices = hcl.pals(type = "sequential")) 985 | hmcols = rev(hcl.colors(n = 255, palette = col_pal)) 986 | if(revpal){ 987 | hmcols = rev(hmcols) 988 | } 989 | 990 | cdata = mat_list$colData 991 | size = attr(mat_list$data, "args") 992 | mat_list = mat_list$data 993 | 994 | if(!is.null(mat_order)){ 995 | if(length(mat_order) != length(mat_list)){ 996 | stop("Length of mat_order should be the same as number of bw files! [", length(mat_list), "]") 997 | } 998 | mat_list = mat_list[mat_order] 999 | } 1000 | 1001 | mat_list = .order_matrix(mats = mat_list, sortBy = sortBy) 1002 | 1003 | if(top_profile){ 1004 | profile_dat = .summarizeMats(mats = mat_list, summarizeBy = "mean") 1005 | yl = c(min(unlist(x = profile_dat), na.rm = TRUE), 1006 | max(unlist(x = profile_dat), na.rm = TRUE)) 1007 | yl = round(x = yl, digits = 2) 1008 | 1009 | } 1010 | 1011 | nsamps = length(mat_list) 1012 | 1013 | if(!is.null(zmins)){ 1014 | if(length(zmins) != length(mat_list)){ 1015 | warning("zmins are recycled") 1016 | zmins = rep(zmins, length(mat_list)) 1017 | } 1018 | }else{ 1019 | zmins = unlist(lapply(profile_dat, min, na.rm = TRUE)) 1020 | } 1021 | 1022 | if(!is.null(zmaxs)){ 1023 | if(length(zmaxs) != length(mat_list)){ 1024 | warning("zmaxs are recycled") 1025 | zmaxs = rep(zmaxs, length(mat_list)) 1026 | } 1027 | }else{ 1028 | zmaxs = unlist(lapply(profile_dat, max, na.rm = TRUE)) 1029 | } 1030 | 1031 | if(!is.null(sample_names)){ 1032 | if(length(sample_names) != length(mat_list)){ 1033 | stop("Number of sample names should be equal to number of samples in the matrix") 1034 | }else{ 1035 | sample_names = cdata$bw_sample_names 1036 | names(mat_list) = sample_names 1037 | } 1038 | } 1039 | 1040 | xlabs = c(size[1], 0, size[2]) 1041 | xticks = c(0, 1042 | 1/(sum(as.integer(xlabs[1]), as.integer(xlabs[3]))/as.integer(xlabs[1])), 1043 | 1) 1044 | 1045 | if(!is.null(file_name)){ 1046 | png(filename = paste0(file_name, ".png"), res = 100, height = 1024, width = 600) 1047 | } 1048 | 1049 | if(top_profile){ 1050 | lo_mat = matrix(data = 1:(nsamps*2), nrow = 2, ncol = nsamps) 1051 | lo = graphics::layout(mat = lo_mat, heights = c(top_profile_h, 9)) 1052 | }else{ 1053 | lo_mat = matrix(data = 1:nsamps, nrow = 1) 1054 | lo = graphics::layout(mat = lo_mat) 1055 | } 1056 | 1057 | for(i in 1:nsamps){ 1058 | 1059 | if(top_profile){ 1060 | .plot_profile_mini(plot_dat = profile_dat, index = i, ylims = c(zmins[i], zmaxs[i])) 1061 | } 1062 | 1063 | par(mar = c(3,2,1.5,1), las=1, tcl=-.25) 1064 | hm.dat = mat_list[[i]] 1065 | data.table::setDF(x = hm.dat) 1066 | 1067 | if(is.null(zmins)){ 1068 | #colMin = apply(hm.dat, 2, min, na.rm = TRUE) 1069 | zmin = round(min(hm.dat, na.rm = TRUE), digits = 2) 1070 | #zmin = 0 1071 | }else{ 1072 | zmin = zmins[i] 1073 | } 1074 | 1075 | if(scale){ 1076 | hm.dat = scale(x = t(hm.dat), scale = TRUE) 1077 | #hm.dat = apply(hm.dat, 2, rev) 1078 | }else{ 1079 | hm.dat = t(apply(hm.dat, 2, rev)) 1080 | } 1081 | 1082 | if(is.null(zmaxs)){ 1083 | zmax = max(rowMeans(hm.dat, na.rm = TRUE)) #mean(apply(hm.dat, 2, max, na.rm = TRUE)) 1084 | #colMax = which(x = colMax == max(colMax), arr.ind = TRUE)[1] 1085 | #zmax = round(max(boxplot.stats(unlist(hm.dat[colMax,]))$stats), 2) 1086 | }else{ 1087 | zmax = zmaxs[i] 1088 | } 1089 | 1090 | #hmcols = grDevices::colorRampPalette(hmcols)(255) 1091 | hm.dat[hm.dat >= zmax] = zmax 1092 | #return(hm.dat) 1093 | image(hm.dat, axes = FALSE, col = hmcols, useRaster = TRUE, 1094 | zlim = c(zmin, zmax), xlim = c(-0.2, 1)) 1095 | 1096 | #Add legend 1097 | image(x = c(-0.1, -0.05), y = seq(0, 1, length.out = length(hmcols)-1), 1098 | z = matrix(data = 1:(length(hmcols)-1), nrow = 1), add = TRUE, col = hmcols) 1099 | axis(side = 2, at = seq(0, 1, length.out = 5), 1100 | labels = NA, 1101 | line = -0.6, font.axis = 2, yaxp = c(1.1, 1.2, 3), lwd = 1) 1102 | mtext(text = round(seq(zmin, zmax, length.out = 5), digits = 2), side = 2, 1103 | line = -0.1, at = seq(0, 1, length.out = 5), font = 1) 1104 | 1105 | title(main = names(mat_list)[i], cex.main = title_size, font.main = 2) 1106 | axis(side = 1, at = xticks, 1107 | labels = NA, lty = 1, lwd = 1, 1108 | font.axis = 2, cex.axis = 1, line = 0.7, tick = FALSE) 1109 | mtext(text = c(paste0("-", xlabs[1]), xlabs[2], xlabs[3]), side = 1, line = 1, at = xticks, font = 1) 1110 | 1111 | rect(xleft = 0, ybottom = 0, xright = 1, ytop = 1, border = "black", lwd = 1) 1112 | } 1113 | 1114 | if(!is.null(file_name)){ 1115 | dev.off() 1116 | } 1117 | 1118 | } 1119 | 1120 | # bwpcaplot function to perform PCA analysis based on genomic regions of interest or around TSS sites. 1121 | #' Extract area under the curve for every peak from from given bigWig files. 1122 | #' @param colData bigWig files. Default NULL. Required. 1123 | #' @param bed bed file or a data.frame with first 3 column containing chromosome, star, end positions. 1124 | #' @param binSize bin size to extract signal. Default 50 (bps). Should be >1 1125 | #' @param ucsc_assembly If `bed` file not provided, setting `ucsc_assembly` to TRUE will fetch transcripts from UCSC genome browser. 1126 | #' @param startFrom Default "start". For bed files this can be "start", "center" or "end". For `ucsc_assembly` this can only be "start" or "end" 1127 | #' @param pc_genes Use only protein coding genes when using `ucsc_assembly`. Default TRUE 1128 | #' @param up extend upstream by this many bps from `startFrom`. Default 2500 1129 | #' @param down extend downstream by this many bps from `startFrom`. Default 2500 1130 | #' @param nthreads Default 4 1131 | #' @export 1132 | extract_summary = function(colData, bed = NULL, ucsc_assembly = TRUE, startFrom = "start", binSize = 50, 1133 | up = 2500, down = 2500, pc_genes = TRUE, nthreads = 4){ 1134 | 1135 | if(is.null(colData)){ 1136 | stop("Missing colData. Use read_coldata() to generate one.") 1137 | } 1138 | 1139 | .check_windows() 1140 | .check_bwtool() 1141 | .check_dt() 1142 | 1143 | bigWigs = colData$bw_files 1144 | custom_names = colData$bw_sample_names 1145 | 1146 | op_dir = tempdir() #For now 1147 | 1148 | if(is.null(bed)){ 1149 | if(ucsc_assembly){ 1150 | ucsc_assembly = attr(colData, "refbuild") 1151 | message("No bed file was given. Defaulting to ucsc refseq..") 1152 | startFrom = match.arg(arg = startFrom, choices = c("start", "end")) 1153 | bed = .make_genome_bed(refBuild = ucsc_assembly, up = as.numeric(up), down = as.numeric(down), tss = startFrom, op_dir = op_dir, pc_genes = pc_genes, for_profile = FALSE) 1154 | bed_annot = bed[[2]] 1155 | bed = bed[[1]] 1156 | }else{ 1157 | stop("Please provide either a BED file or an ucsc_assembly name") 1158 | } 1159 | }else{ 1160 | startFrom = match.arg(arg = startFrom, choices = c("start", "end", "center")) 1161 | bed = .make_bed(bed = bed, op_dir = op_dir, up = as.numeric(up), down = as.numeric(up), tss = startFrom) 1162 | bed_annot = NA 1163 | } 1164 | 1165 | op_dir = tempdir() #For now 1166 | if(!dir.exists(paths = op_dir)){ 1167 | dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE) 1168 | } 1169 | 1170 | message("Extracting summaries..") 1171 | summaries = parallel::mclapply(seq_along(bigWigs), FUN = function(idx){ 1172 | bw = bigWigs[idx] 1173 | bn = custom_names[idx] 1174 | cmd = paste("bwtool summary -with-sum -keep-bed -header", bed, bw, paste0(op_dir, bn, ".summary")) 1175 | system(command = cmd, intern = TRUE) 1176 | paste0(op_dir, bn, ".summary") 1177 | }, mc.cores = nthreads) 1178 | 1179 | merge_anno = FALSE 1180 | 1181 | if(!is.na(bed_annot)){ 1182 | bed_annot = data.table::fread(file = bed_annot) 1183 | bed_annot[, id := paste0(V1, ":", V2, "-", V3)] 1184 | colnames(bed_annot) = c("chromosome", "start", "end", "tx", "gene", "id") 1185 | merge_anno = TRUE 1186 | } 1187 | 1188 | summary_list = lapply(summaries, function(x){ 1189 | x = data.table::fread(x) 1190 | colnames(x)[1] = 'chromosome' 1191 | x = x[,.(chromosome, start, end, size, sum)] 1192 | #x[,id := paste0(chromosome, ":", start, "-", end)] 1193 | x$sum 1194 | }) 1195 | summary_list = data.frame(summary_list) 1196 | colnames(summary_list) = custom_names 1197 | data.table::setDT(summary_list) 1198 | 1199 | if(merge_anno){ 1200 | summary_list = cbind(bed_annot, summary_list) 1201 | }else{ 1202 | bed = data.table::fread(file = bed) 1203 | colnames(bed) = c("chromosome", "start", "end") 1204 | summary_list = cbind(bed, summary_list) 1205 | } 1206 | 1207 | list(data = summary_list, colData = colData) 1208 | } 1209 | 1210 | #' Differential Peak Analysis 1211 | #' @details Takes output from \code{extract_summary} and performs differential peak analysis with Limma 1212 | #' 1213 | #' @param summary_list Output from \code{\link{extract_summary}} 1214 | #' @param condition a column name in \code{coldata} containing sample conditions. Default NULL. 1215 | #' @param log2 log2 convert data prior to testing. Default TRUE 1216 | #' @param num Numerator condition. Default NULL 1217 | #' @param den Denominator condition. Default NULL 1218 | #' @export 1219 | #' 1220 | diffpeak = function(summary_list = NULL, condition = NULL, log2 = TRUE, num = NULL, den = NULL){ 1221 | 1222 | if (is.null(condition)){ 1223 | stop("Define a condition for the differential peak analysis\n") 1224 | } 1225 | 1226 | sum_tbl = as.data.frame(summary_list$data) 1227 | sum_tbl$rid = paste0("rid_", 1:nrow(sum_tbl)) 1228 | rownames(sum_tbl) = sum_tbl$rid 1229 | coldata = as.data.frame(summary_list$colData) 1230 | 1231 | if(condition %in% colnames(coldata)){ 1232 | condition <- coldata[, condition] 1233 | }else { 1234 | print(coldata) 1235 | stop("The condition argument must be a column name in coldata. See above for list of valid colnames") 1236 | } 1237 | 1238 | exprs = sum_tbl[,coldata$bw_sample_names] 1239 | if(log2){ 1240 | exprs = log2(x = exprs + 0.1) 1241 | } 1242 | #sum_tbl = sum_tbl[complete.cases(sum_tbl),, drop = FALSE] 1243 | #sds_idx = order(apply(sum_tbl, 1, sd, na.rm = TRUE), decreasing = TRUE, na.last = TRUE) 1244 | #sum_tbl = sum_tbl[sds_idx,,drop = FALSE] 1245 | 1246 | 1247 | design <- model.matrix(~0 + as.factor(condition)) 1248 | colnames(design) <- as.character(levels(as.factor(condition))) 1249 | 1250 | if(is.null(num) & is.null(den)){ 1251 | contrast <- vector() 1252 | for (a in 1:length(levels(as.factor(condition)))) { 1253 | for (b in 1:length(levels(as.factor(condition)))) { 1254 | if (a != b) { 1255 | if (a < b) { 1256 | contrast[length(contrast) + 1] <- paste(levels(as.factor(condition))[a], 1257 | levels(as.factor(condition))[b], sep = "-") 1258 | } 1259 | } 1260 | } 1261 | } 1262 | message("Argument num and den are missing. Pefroming diffpeak analysis for below contrast:") 1263 | print(contrast[1]) 1264 | num = unlist(data.table::tstrsplit(x = contrast[1], split = "-"))[1] 1265 | den = unlist(data.table::tstrsplit(x = contrast[1], split = "-"))[2] 1266 | }else{ 1267 | if(is.null(num) | is.null(den)){ 1268 | stop("Num and Den must be provided") 1269 | }else{ 1270 | contrast = paste0(num, "-", den) 1271 | } 1272 | } 1273 | 1274 | fit <- limma::lmFit(object = exprs, design = design) 1275 | 1276 | cnt <- paste(colnames(design)[1], colnames(design)[2], sep = "-") 1277 | cMat <- limma::makeContrasts(contrasts = contrast, levels = design) 1278 | fit2 <- limma::contrasts.fit(fit, cMat) 1279 | efit <- limma::eBayes(fit2) 1280 | 1281 | tt = limma::topTable(fit = efit, coef = 1, number = "all") 1282 | tt = merge(sum_tbl, tt, by = "row.names") 1283 | data.table::setDT(x = tt) 1284 | tt$Row.names = tt$rid = NULL 1285 | 1286 | tt = tt[order(P.Value, decreasing = FALSE)] 1287 | attr(tt, "contrast") = contrast 1288 | tt 1289 | } 1290 | 1291 | #' Volcano plot for limma results from differential peak Analysis 1292 | #' @details Takes output from \code{diffpeak} and draws volcano plot 1293 | #' @param res Output from \code{\link{diffpeak}} 1294 | #' @param fdr FDR threshold. Default 0.1 1295 | #' @param upcol color for up-regulated. Default "#d35400" 1296 | #' @param downcol color for up-regulated. Default "#1abc9c" 1297 | #' @param alpha Default 0.6 1298 | #' @param size Point size. Default 0.8 1299 | #' @export 1300 | #' 1301 | volcano_plot = function(res = NULL, fdr = 0.1, upcol = "#d35400", downcol = "#1abc9c", alpha = 0.6, size = 0.8){ 1302 | 1303 | if(is.null(res)){ 1304 | stop("Missing input. Expecting output diffpeak()") 1305 | } 1306 | 1307 | contrast = attr(res, "contrast") 1308 | 1309 | ylims = max(-log10(res$P.Value), na.rm = TRUE) 1310 | xlims = range(res$logFC) 1311 | 1312 | res_sig = res[adj.P.Val < fdr] 1313 | res_nonsig = res[!adj.P.Val < fdr] 1314 | 1315 | par(mar = c(3.5, 3.5, 3, 1)) 1316 | plot(NA, xlim = xlims, ylim = c(0, ylims), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA) 1317 | grid() 1318 | points(res_nonsig$logFC, y = -log10(res_nonsig$P.Value), pch = 19, col = adjustcolor("gray", alpha.f = alpha), cex = size) 1319 | if(nrow(res_sig)>0){ 1320 | points(res_sig$logFC, y = -log10(res_sig$P.Value), pch = 19, col = ifelse(test = res_sig$logFC > 0, adjustcolor(col = upcol, alpha.f = alpha), adjustcolor(col = downcol, alpha.f = alpha)), cex = size) 1321 | } 1322 | axis(side = 1, at = pretty(xlims)) 1323 | axis(side = 2, at = pretty(c(0, ylims)), las = 2) 1324 | mtext(text = "Log Fold Change", side = 1, line = 2, col = "#34495e", font = 2) 1325 | mtext(text = "-log10(P-Value)", side = 2, line = 2, col = "#34495e", font = 2) 1326 | title(main = contrast, col.main = "#34495e") 1327 | 1328 | #Small hack to align legend properly 1329 | leg_pos = which(abs(xlims) == max(abs(xlims))) 1330 | leg_pos = ifelse(test = leg_pos == 1, yes = "bottomleft", no = "bottomright") 1331 | 1332 | legend(x = leg_pos, legend = c(paste0("Down [", nrow(res_sig[logFC < 0]), "]"), paste0("Up [", nrow(res_sig[logFC > 0]), "]")), 1333 | col = c(downcol, upcol), pch = 19, title = paste0("Diff. peaks [FDR<", fdr, "]"), adj = 0, bty = "n") 1334 | } 1335 | 1336 | #' Draw a PCA plot 1337 | #' @param summary_list output from extract_summary 1338 | #' @param top Top most variable peaks to consider for PCA. Default 1000 1339 | #' @param log2 log transform data? Default FALSE. IF TRUE, adds a small positive value and log2 converts. 1340 | #' @param xpc Default PC1 1341 | #' @param ypc Default PC2 1342 | #' @param color_by a column name in \code{coldata} to color by. Default NULL. 1343 | #' @param pch_by a column name in \code{coldata} to pch by. Default NULL. 1344 | #' @param color Manual colors for each level in `color_by` Default NULL. 1345 | #' @param show_cree If TRUE draws a cree plot. Default TRUE 1346 | #' @param size Point size. Default 1 1347 | #' @param lab_size Font size for labels. Default 1 1348 | #' @param legend_size Default 1 1349 | #' @param legendpos Default topright 1350 | #' @param legendpos2 Default bottomright 1351 | #' @export 1352 | pca_plot = function(summary_list = NULL, top = 1000, log2 = FALSE, xpc = "PC1", ypc = "PC2", color_by = NULL, pch_by = NULL, color = NULL, 1353 | show_cree = TRUE, lab_size = 1, size = 1, legend_size = 1, legendpos = "topright", legendpos2 = "bottomright"){ 1354 | 1355 | 1356 | if(is.null(summary_list)){ 1357 | stop("Missing input! Expecting output from extract_summary()") 1358 | } 1359 | 1360 | sum_tbl = as.data.frame(summary_list$data) 1361 | colData = summary_list$colData 1362 | 1363 | sum_tbl = sum_tbl[,colData$bw_sample_names] 1364 | if(log2){ 1365 | sum_tbl = log2(x = sum_tbl + 0.1) 1366 | } 1367 | #sum_tbl = sum_tbl[complete.cases(sum_tbl),, drop = FALSE] 1368 | sds_idx = order(apply(sum_tbl, 1, sd, na.rm = TRUE), decreasing = TRUE, na.last = TRUE) 1369 | sum_tbl = sum_tbl[sds_idx,,drop = FALSE] 1370 | 1371 | 1372 | if(is.null(color)){ 1373 | condition_colors = c("#2f4f4f", "#8b4513", "#228b22", "#00008b", "#ff0000", "#ffd700", "#7fff00", "#00ffff", "#ff00ff", "#6495ed", "#ffe4b5", "#ff69b4") 1374 | # condition_colors = c("#A6CEE3FF", "#1F78B4FF", "#B2DF8AFF", "#33A02CFF", "#FB9A99FF", 1375 | # "#E31A1CFF", "#FDBF6FFF", "#FF7F00FF", "#CAB2D6FF", "#6A3D9AFF", 1376 | # "#FFFF99FF", "#9E0142FF", "#D53E4FFF", "#F46D43FF", "#000000FF", 1377 | # "#EE82EEFF", "#4169E1FF", "#7B7060FF", "#535C68FF") 1378 | }else{ 1379 | condition_colors = color 1380 | } 1381 | 1382 | condition = color_by 1383 | pch = pch_by 1384 | 1385 | if(!is.null(condition)){ 1386 | group_df = cbind(sample = colData$bw_sample_names, condition = colData[,which(colnames(colData) == condition), with = FALSE]) 1387 | colnames(group_df) = c("sample", "condition") 1388 | condition_colors = condition_colors[1:nrow(group_df[,.N,condition])] 1389 | names(condition_colors) = group_df[,.N,condition][,condition] 1390 | group_df$color = condition_colors[group_df$condition] 1391 | condition_leg = TRUE 1392 | }else{ 1393 | group_df = data.table::data.table(sample = colData$bw_sample_names) 1394 | group_df$color = "black" 1395 | condition_leg = FALSE 1396 | } 1397 | 1398 | 1399 | if(!is.null(pch)){ 1400 | pch_vec = 15:19 1401 | pch_df = cbind(sample = colData$bw_sample_names, pch = colData[,which(colnames(colData) == pch), with = FALSE]) 1402 | colnames(pch_df) = c("sample", "pch") 1403 | pchs = pch_vec[1:nrow(pch_df[,.N,pch])] 1404 | names(pchs) = pch_df[,.N,pch][,pch] 1405 | pch_df$pch = pchs[pch_df$pch] 1406 | pch_leg = TRUE 1407 | }else{ 1408 | pch_df = cbind(sample = colData$bw_sample_names, pch = 19) 1409 | colnames(pch_df) = c("sample", "pch") 1410 | pchs = NA 1411 | pch_leg = FALSE 1412 | } 1413 | 1414 | group_df = merge(group_df, pch_df, by = "sample", all = TRUE) 1415 | 1416 | if(nrow(sum_tbl) < top){ 1417 | pca = prcomp(t(sum_tbl)) 1418 | }else{ 1419 | pca = prcomp(t(sum_tbl[1:top,])) 1420 | } 1421 | 1422 | pca_dat = as.data.frame(pca$x) 1423 | pca_var_explained = pca$sdev^2/sum(pca$sdev^2) 1424 | names(pca_var_explained) = paste0("PC", 1:length(pca_var_explained)) 1425 | data.table::setDT(x = pca_dat, keep.rownames = "sample") 1426 | pca_dat = merge(pca_dat, group_df, by = 'sample') 1427 | data.table::setDF(x = pca_dat) 1428 | attr(pca_dat, "percentVar") <- round(pca_var_explained, digits = 3) 1429 | #print(head(pca_dat)) 1430 | 1431 | if(show_cree){ 1432 | lo = graphics::layout(mat = matrix(data = c(1, 2), ncol = 2)) 1433 | } 1434 | 1435 | grid_cols = "gray90" 1436 | par(mar = c(3, 4, 2, 1)) 1437 | plot(NA, axes = FALSE, xlab = NA, ylab = NA, cex = 1.2, xlim = range(pretty(pca_dat[, xpc])), ylim = range(pretty(pca_dat[, ypc]))) 1438 | abline(h = pretty(pca_dat[, xpc]), v = pretty(pca_dat[, ypc]), col = grid_cols, lty = 2, lwd = 0.1) 1439 | abline(h = 0, v = 0, col = grid_cols, lty = 2, lwd = 0.8) 1440 | points(x = pca_dat[, xpc], y = pca_dat[, ypc], col = pca_dat$color, bg = pca_dat$color, pch = as.integer(pca_dat$pch), cex = size) 1441 | axis(side = 1, at = pretty(pca_dat[, xpc]), cex.axis = 0.8) 1442 | axis(side = 2, at = pretty(pca_dat[, ypc]), las = 2, cex.axis = 0.8) 1443 | mtext(text = paste0(xpc, " [", round(pca_var_explained[xpc], digits = 2), "]"), side = 1, line = 2, cex = 0.8) 1444 | mtext(text = paste0(ypc, " [", round(pca_var_explained[ypc], digits = 2), "]"), side = 2, line = 2, cex = 0.8) 1445 | text(x = pca_dat[, xpc], y = pca_dat[, ypc], labels = pca_dat$sample, pos = 3, col = pca_dat$color, xpd = TRUE, cex = lab_size) 1446 | #title(main = NA, sub = paste0("N = ", top, " peaks"), adj = 0, outer = TRUE) 1447 | 1448 | data.table::setDT(x = pca_dat) 1449 | 1450 | if(condition_leg){ 1451 | legend(x = legendpos, legend = names(condition_colors), col = condition_colors, 1452 | pch = 19, cex = legend_size, xpd = TRUE, ncol = 2, bty = "n") 1453 | } 1454 | 1455 | if(pch_leg){ 1456 | legend(x = legendpos2, legend = names(pchs), col = "black", 1457 | pch = as.integer(pchs), cex = legend_size, xpd = TRUE, ncol = 1, bty = "n") 1458 | } 1459 | 1460 | if(show_cree){ 1461 | par(mar = c(3, 4, 2, 4)) 1462 | b = barplot(height = pca_var_explained, names.arg = NA, col = "#2c3e50", ylim = c(0, 1), las = 2, axes = FALSE) 1463 | points(x = b, y = cumsum(pca_var_explained), type = 'l', lty = 2, lwd = 1.2, xpd = TRUE, col = "#c0392b") 1464 | points(x = b, y = cumsum(pca_var_explained), type = 'p', pch = 19, xpd = TRUE, col = "#c0392b") 1465 | mtext(text = paste0("PC", 1:length(pca_var_explained)), side = 1, at = b, las = 2, line = 0.5, cex = 0.8) 1466 | axis(side = 2, at = seq(0, 1, 0.1), line = 0, las = 2, cex.axis = 0.8) 1467 | mtext(text = "var. explained", side = 2, line = 2.5) 1468 | axis(side = 4, at = seq(0, 1, 0.1), line = 0, las = 2, cex.axis = 0.8) 1469 | mtext(text = "cumulative var. explained", side = 4, line = 2.5) 1470 | } 1471 | 1472 | invisible(x = pca_dat) 1473 | } 1474 | 1475 | 1476 | #' Parse peak annotations generated by homer annotatePeaks.pl 1477 | #' @details summarizes peak annotations generated with homer annotatePeaks.pl, generates a pie chart of peak distributions. 1478 | #' 1479 | #' @param anno Raw annotations generated by homer `annotatePeaks.pl`. Can be more than one file. 1480 | #' @param sample_names Sample names correspoding to each input file. Default parses from input file. 1481 | #' @param legend_font_size font size for legend. Default 1. 1482 | #' @param label_size font size for labels. Default 0.8. 1483 | #' @export 1484 | 1485 | summarize_homer_annots = function(anno, sample_names = NULL, legend_font_size = 1, label_size = 0.8){ 1486 | 1487 | homer = lapply(anno, function(homer.anno){ 1488 | homer.anno = data.table::fread(homer.anno) 1489 | colnames(homer.anno)[1] = 'peakid' 1490 | homer.anno$anno = sapply(strsplit(x = as.character(homer.anno$Annotation), split = ' (', fixed = T), '[[', 1) 1491 | 1492 | homer.anno = homer.anno[,.(Chr, Start, End, Strand, anno, `Gene Name`, `Gene Type`,`Distance to TSS`, `Nearest PromoterID`, `Nearest Ensembl`)] 1493 | homer.anno = homer.anno[order(anno, `Gene Name`)] 1494 | colnames(homer.anno) = c('Chr', 'Start', 'End', 'Strand', 'Annotation', 'Hugo_Symbol', 1495 | 'Biotype', 'Distance_to_TSS', 'Nearest_PromoterID', 'Nearest_Ens') 1496 | homer.anno$Annotation = gsub(pattern = "3' UTR", replacement = '3pUTR', x = homer.anno$Annotation) 1497 | homer.anno$Annotation = gsub(pattern = "5' UTR", replacement = '5pUTR', x = homer.anno$Annotation) 1498 | homer.anno 1499 | }) 1500 | 1501 | if(is.null(sample_names)){ 1502 | names(homer) = unlist(data.table::tstrsplit(x = basename(anno), split = "\\.", keep = 1)) 1503 | }else{ 1504 | names(homer) = sample_names 1505 | } 1506 | 1507 | homer.anno.stats = lapply(homer, function(h){ 1508 | homer.anno.stats = h[,.N,Annotation][,fract := N/sum(N)][order(N, decreasing = TRUE)] 1509 | homer.anno.stats[,leg := paste0(Annotation, " [", N, "]")] 1510 | homer.anno.stats 1511 | }) 1512 | npeaks = unlist(lapply(homer, nrow)) 1513 | 1514 | homer.anno.stats = data.table::rbindlist(l = homer.anno.stats, idcol = "Sample") 1515 | homer.anno.stats = data.table::dcast(data = homer.anno.stats, Annotation ~ Sample, value.var = "fract", fill = 0) 1516 | data.table::setDF(x = homer.anno.stats, rownames = homer.anno.stats$Annotation) 1517 | homer.anno.stats =homer.anno.stats[,-1, drop = FALSE] 1518 | 1519 | pie.cols = c('3pUTR' = '#E7298A', '5pUTR' = '#D95F02', 'Intergenic' = '#BEBADA', 1520 | 'TTS' = '#FB8072', 'exon' = '#80B1D3', 'intron' = '#FDB462', 1521 | 'non-coding' = '#FFFFB3', 1522 | 'NA' = 'gray70', 'promoter-TSS' = '#1B9E77') 1523 | 1524 | homer.anno.stats = homer.anno.stats[names(pie.cols)[names(pie.cols) %in% rownames(homer.anno.stats)],,drop = FALSE] 1525 | 1526 | pie.col = pie.cols[rownames(homer.anno.stats)] 1527 | 1528 | #graphics::layout(mat = matrix(data = c(1, 2), nrow = 2), heights = c(4, 1.25)) 1529 | par(mar = c(2, 4, 5, 3)) 1530 | b = barplot(height = as.matrix(homer.anno.stats), col = pie.col, horiz = TRUE, 1531 | las = 2, axes = FALSE, names.arg = rep(NA, ncol(homer.anno.stats)), border = "#34495e") 1532 | axis(side = 1, at = seq(0, 1, 0.25), font = 2, lwd = 2) 1533 | mtext(text = npeaks[colnames(x = homer.anno.stats)], side = 4, at = b, las = 2, font = 4, adj = -0.2) 1534 | mtext(text = colnames(homer.anno.stats), side = 2, at = b, las = 2, adj = 1, font = 2) 1535 | 1536 | #plot.new() 1537 | #par(mar = c(1, 0, 1, 1)) 1538 | .add_legend("topleft", legend = names(pie.col), col = pie.col, bty = "n", border=NA, 1539 | xpd = TRUE, text.font = 2, pch = 15, xjust = 0, yjust = 0, 1540 | cex = 0.8, y.intersp = 1.5, x.intersp = 1, 1541 | pt.cex = 1.2 * 0.8, ncol = 3) 1542 | 1543 | homer 1544 | } 1545 | 1546 | # if(plot){ 1547 | # homer.anno.stats = homer.anno[,.N,Annotation][,fract := round(N/sum(N), digits = 2)][order(N, decreasing = TRUE)] 1548 | # homer.anno.stats[,leg := paste0(Annotation, " [", N, "]")] 1549 | # 1550 | # pie.cols = c('3pUTR' = '#E7298A', '5pUTR' = '#D95F02', 'Intergenic' = '#BEBADA', 1551 | # 'TTS' = '#FB8072', 'exon' = '#80B1D3', 'intron' = '#FDB462', 'non-coding' = '#FFFFB3', 1552 | # 'NA' = 'gray70', 'promoter-TSS' = '#1B9E77') 1553 | # 1554 | # par(bty="n", mgp = c(0.5,0.5,0), las=1, tcl=-.25, font.main=4,xpd=NA, mar=c(0,0,1,3)) 1555 | # pie(x = homer.anno.stats$fract, col = pie.cols[homer.anno.stats$Annotation], labels = homer.anno.stats$fract, 1556 | # border = "white", radius = 0.8, init.angle = 0, font = 4, cex = label_size) 1557 | # symbols(0,0,circles=.3, inches=FALSE, col="white", bg="white", lty=0, add=TRUE) 1558 | # add_legend(x = "topright", bty = "n", legend = homer.anno.stats$leg, 1559 | # col = pie.cols[homer.anno.stats$Annotation], 1560 | # cex = legend_font_size, pch = 15, text.font = 4) 1561 | # } 1562 | 1563 | #------------------------------------------------------------------------------------------------------------------------------------ 1564 | # Undocumented Accessory functions 1565 | #------------------------------------------------------------------------------------------------------------------------------------ 1566 | 1567 | # Add legends outside the margin 1568 | .add_legend <- function(...) { 1569 | opar <- par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), 1570 | mar=c(0, 0, 0, 0), new=TRUE) 1571 | on.exit(par(opar)) 1572 | plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n') 1573 | legend(...) 1574 | } 1575 | 1576 | .make_layout = function(ntracks, ntracks_h = 3, cytoband = TRUE, cytoband_h = 1, genemodel = TRUE, genemodel_h = 1, chrHMM = TRUE, chrHMM_h = 1, loci = TRUE, loci_h = 2, scale_track_height = 1, lord = NULL){ 1577 | 1578 | #track heights (peaks, bigwig tracks, chromHMM, gene model, cytoband, scale) 1579 | lo_h_ord = list("p" = loci_h, "b" = rep(ntracks_h, ntracks), "h" = chrHMM_h, "g" = genemodel_h, "c" = cytoband_h, "s" = scale_track_height) 1580 | mat_ord = c("p" = 1, "b" = 2, "h" = 3, "g" = 4, "c" = 5, "s" = 6) 1581 | 1582 | case = NULL 1583 | #print(paste(cytoband, genemodel, chrHMM, loci, sep = ", ")) 1584 | 1585 | if(cytoband & genemodel & chrHMM & loci){ 1586 | #print("all") 1587 | case = 1 1588 | lo_h_ord = lo_h_ord[c("p", "b", "h", "g", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height, cytoband_h) 1589 | }else if(cytoband & genemodel & chrHMM == FALSE & loci){ 1590 | #print("no hmm") 1591 | case = 2 1592 | lo_h_ord = lo_h_ord[c("p", "b", "g", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), genemodel_h, scale_track_height, cytoband_h) 1593 | }else if(cytoband & genemodel & chrHMM == FALSE & loci == FALSE){ 1594 | #print("no hmm and no loci") 1595 | case = 3 1596 | lo_h_ord = lo_h_ord[c("b", "g", "s", "c")] #c(rep(ntracks_h, ntracks), genemodel_h, scale_track_height, cytoband_h) 1597 | }else if(cytoband & genemodel == FALSE & chrHMM == FALSE & loci == FALSE){ 1598 | #print("no hmm and no loci and no genemodel") 1599 | case = 4 1600 | lo_h_ord = lo_h_ord[c("b", "s", "c")] #c(rep(ntracks_h, ntracks), scale_track_height, cytoband_h) 1601 | }else if(cytoband == FALSE & genemodel == FALSE & chrHMM == FALSE & loci == FALSE){ 1602 | #print("no hmm and no loci and no genemodel and no cytoband") 1603 | case = 5 1604 | lo_h_ord = lo_h_ord[c("b", "s")] #c(rep(ntracks_h, ntracks), scale_track_height) 1605 | }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == TRUE & loci == FALSE){ 1606 | #print("no loci and no cytoband") 1607 | case = 6 1608 | lo_h_ord = lo_h_ord[c("b", "h", "g", "s")] #c(rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height) 1609 | }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == TRUE & loci == TRUE){ 1610 | #print("no cytoband") 1611 | case = 7 1612 | lo_h_ord = lo_h_ord[c("p", "b", "h", "g", "s")] #c(loci_h, rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height) 1613 | }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == FALSE & loci == TRUE){ 1614 | #print("no cytoband no chrHMM") 1615 | case = 8 1616 | lo_h_ord = lo_h_ord[c("p", "b", "g", "s")] #c(loci_h, rep(ntracks_h, ntracks), genemodel_h, scale_track_height) 1617 | }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == FALSE & loci == FALSE){ 1618 | #print("no cytoband no chrHMM no loci") 1619 | case = 9 1620 | lo_h_ord = lo_h_ord[c("b", "g", "s")] #c(rep(ntracks_h, ntracks), genemodel_h, scale_track_height) 1621 | }else if(cytoband == TRUE & genemodel == TRUE & chrHMM == TRUE & loci == FALSE){ 1622 | #print("no loci") 1623 | case = 10 1624 | lo_h_ord = lo_h_ord[c("b", "h", "g", "s", "c")] #c(rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height, cytoband_h) 1625 | }else if(cytoband == TRUE & genemodel == FALSE & chrHMM == FALSE & loci == TRUE){ 1626 | #print("no genemodel no chrhmm") 1627 | case = 11 1628 | lo_h_ord = lo_h_ord[c("p", "b", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), scale_track_height, cytoband_h) 1629 | }else{ 1630 | #print("Something is wrong!") 1631 | } 1632 | 1633 | lord = c(lord, "s") 1634 | lord = c(intersect(lord, names(lo_h_ord)), setdiff(names(lo_h_ord), lord)) 1635 | lo_heights = unlist(lo_h_ord[lord], use.names = FALSE) 1636 | lo = lo_h_ord[lord] 1637 | 1638 | #Order in which plots are drawn by default 1639 | plot_ord = data.frame(row.names = c("p", "b", "h", "g", "s", "c"), name = c("p", "b", "h", "g", "s", "c"), ord = 1:6) 1640 | plot_ord = plot_ord[names(lo),,drop = FALSE] 1641 | plot_ord$ord_req = 1:nrow(plot_ord) #Required order 1642 | plot_ord = plot_ord[order(plot_ord$ord),] 1643 | 1644 | ### Re-organize the layout by user specification 1645 | dt = data.table::data.table(name = plot_ord$name, ord = plot_ord$ord, ord_req = plot_ord$ord_req) 1646 | 1647 | dt$n_tracks = ifelse(test = dt$name == 'b', yes = ntracks, no = 1) 1648 | #print(dt) 1649 | dt_s = split(dt, dt$n_tracks) 1650 | #print(dt_s) 1651 | if(length(dt_s) > 1){ 1652 | dt_mult = dt_s[[2]] 1653 | dt_sing = dt_s[[1]] 1654 | dt_sing[,n_tracks := 1] 1655 | dt_sing$ord_req2 = dt_sing$ord_req 1656 | dt_mult = data.table::data.table(name = rep(dt_mult$name, dt_mult$n_tracks), ord = dt_mult$ord, ord_req = dt_mult$ord_req, ord_req2 = seq(dt_mult$ord_req, dt_mult$ord_req + dt_mult$n_tracks -1 , 1), n_tracks = dt_mult$n_tracks) 1657 | 1658 | dt = data.table::rbindlist(l = list(dt_sing, dt_mult), use.names = TRUE, fill = TRUE) 1659 | dt = dt[order(ord_req)] 1660 | 1661 | dt = split(dt, dt$ord) |> data.table::rbindlist() 1662 | dt[, ord_req4 := 1:nrow(dt)] 1663 | dt = dt[order(ord_req)] 1664 | data = dt$ord_req4 1665 | }else{ 1666 | #ifelse(test = dt$n_tracks > 1, ) 1667 | #print(dt) 1668 | dt = data.table::data.table(name = rep(dt$name, dt$n_tracks), ord = dt$ord, ord_req = dt$ord_req, ord_req2 = dt$ord_req, n_tracks = dt$n_tracks) 1669 | #print(dt) 1670 | 1671 | dt = split(dt, dt$ord) |> data.table::rbindlist() 1672 | dt[, ord_req4 := 1:nrow(dt)] 1673 | dt = dt[order(ord_req)] 1674 | data = dt$ord_req 1675 | } 1676 | 1677 | if(case == 1){ 1678 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1679 | }else if(case == 2){ 1680 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1681 | }else if(case == 3){ 1682 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1683 | }else if(case == 4){ 1684 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1685 | }else if(case == 5){ 1686 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1687 | }else if(case == 6){ 1688 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1689 | }else if(case == 7){ 1690 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1691 | }else if(case == 8){ 1692 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1693 | }else if(case == 9){ 1694 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1695 | }else if(case == 10){ 1696 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1697 | }else if(case == 11){ 1698 | lo = graphics::layout(mat = matrix(data = data), heights = lo_heights) 1699 | } 1700 | 1701 | lo_h_ord[lord] 1702 | } 1703 | 1704 | .gen_windows = function(chr = NA, start, end, window_size = 50, op_dir = getwd()){ 1705 | #chr = "chr19"; start = 15348301; end = 15391262; window_size = 50; op_dir = getwd() 1706 | message(paste0("Generating windows ", "[", window_size, " bp window size]")) 1707 | 1708 | window_dat = data.table::data.table() 1709 | #temp = start; 1710 | while(start <= end){ 1711 | window_dat = data.table::rbindlist(l = list(window_dat, data.table::data.table(start, end = start + window_size)), fill = TRUE) 1712 | start = start + window_size 1713 | } 1714 | window_dat$chr = chr 1715 | window_dat = window_dat[, .(chr, start, end)] 1716 | 1717 | print(window_dat) 1718 | 1719 | op_dir = paste0(op_dir, "/") 1720 | 1721 | if(!dir.exists(paths = op_dir)){ 1722 | dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE) 1723 | } 1724 | 1725 | temp_op_bed = tempfile(pattern = "trackr", tmpdir = op_dir, fileext = ".bed") 1726 | data.table::fwrite(x = window_dat, file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) 1727 | temp_op_bed 1728 | } 1729 | 1730 | 1731 | .get_summaries = function(bedSimple, bigWigs, op_dir = getwd(), nthreads = 1){ 1732 | #bedSimple = temp_op_bed; bigWigs = list.files(path = "./", pattern = "bw"); op_dir = getwd(); nthreads = 1 1733 | op_dir = paste0(op_dir, "/") 1734 | 1735 | if(!dir.exists(paths = op_dir)){ 1736 | dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE) 1737 | } 1738 | 1739 | message(paste0("Extracting signals")) 1740 | 1741 | summaries = parallel::mclapply(bigWigs, FUN = function(bw){ 1742 | bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "", x = basename(bw)) 1743 | message(paste0(" Processing ", bn, " ..")) 1744 | cmd = paste("bwtool summary -with-sum -keep-bed -header", bedSimple, bw, paste0(op_dir, bn, ".summary")) 1745 | system(command = cmd, intern = TRUE) 1746 | paste0(op_dir, bn, ".summary") 1747 | }, mc.cores = nthreads) 1748 | 1749 | summary_list = lapply(summaries, function(x){ 1750 | x = data.table::fread(x) 1751 | colnames(x)[1] = 'chromosome' 1752 | x = x[,.(chromosome, start, end, size, max)] 1753 | if(all(is.na(x[,max]))){ 1754 | message("No signal! Possible cause: chromosome name mismatch between bigWigs and queried loci.") 1755 | x[, max := 0] 1756 | } 1757 | x 1758 | }) 1759 | 1760 | 1761 | #Remove intermediate files 1762 | lapply(summaries, function(x) system(command = paste0("rm ", x), intern = TRUE)) 1763 | system(command = paste0("rm ", bedSimple), intern = TRUE) 1764 | 1765 | names(summary_list) = gsub(pattern = "*\\.summary$", replacement = "", x = basename(path = unlist(summaries))) 1766 | summary_list 1767 | } 1768 | 1769 | 1770 | .extract_geneModel = function(ucsc_tbl = NULL, chr = NULL, start = NULL, end = NULL, txname = txname, genename = genename){ 1771 | 1772 | message("Parsing UCSC file..") 1773 | if(is(object = ucsc_tbl, class2 = "data.frame")){ 1774 | ucsc = data.table::as.data.table(ucsc_tbl) 1775 | }else if(file.exists(ucsc_tbl)){ 1776 | ucsc = data.table::fread(file = ucsc_tbl) 1777 | }else{ 1778 | stop("Gene model must be a file or a data.frame") 1779 | } 1780 | 1781 | query = data.table::data.table(chr, start, end) 1782 | query[,chr := as.character(chr)] 1783 | query[,start := as.numeric(as.character(start))] 1784 | query[,end := as.numeric(as.character(end))] 1785 | data.table::setkey(x = query, chr, start, end) 1786 | 1787 | colnames(ucsc)[c(3, 5, 6)] = c("chr", "start", "end") 1788 | data.table::setkey(x = ucsc, chr, start, end) 1789 | 1790 | gene_models = data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL) 1791 | 1792 | if(nrow(gene_models) == 0){ 1793 | message("No features found within the requested loci! If you are not sure why..\n 1.Make sure there are no discripancies in chromosome names i.e, chr prefixes\n") 1794 | return(NULL) 1795 | }else{ 1796 | if(!is.null(txname)){ 1797 | gene_models = gene_models[name %in% txname] 1798 | } 1799 | 1800 | if(!is.null(genename)){ 1801 | gene_models = gene_models[name2 %in% genename] 1802 | } 1803 | 1804 | exon_tbls = lapply(seq_along(along.with = 1:nrow(gene_models)), function(idx){ 1805 | exon_start = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonStarts], split = ","))) 1806 | exon_end = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonEnds], split = ","))) 1807 | exon_tbl = data.frame(start = exon_start, end = exon_end) 1808 | attributes(exon_tbl) = list(start = gene_models[idx, start], end = gene_models[idx, end], strand = gene_models[idx, strand], tx = gene_models[idx, name], gene = gene_models[idx, name2]) 1809 | exon_tbl 1810 | }) 1811 | 1812 | return(exon_tbls) 1813 | } 1814 | } 1815 | 1816 | .extract_cytoband = function(chr = NULL, refBuild = "hg19", tblName = "cytoBand"){ 1817 | 1818 | if(!grepl(pattern = "^chr", x = chr)){ 1819 | message("Adding chr prefix to target chromosome for UCSC query..") 1820 | chr = paste0("chr", chr) 1821 | } 1822 | 1823 | cmd = paste0( 1824 | "mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ", 1825 | refBuild, 1826 | " -e 'select chrom, chromStart, chromEnd, name, gieStain from ", tblName, " WHERE chrom =\"", 1827 | chr, 1828 | "\"'" 1829 | ) 1830 | message(paste0("Extracting cytobands from UCSC:\n", " chromosome: ", chr, "\n", " build: ", refBuild, "\n query: ", cmd)) 1831 | 1832 | cyto = data.table::fread(cmd = cmd, colClasses = c("character", "numeric", "numeric", "character", "character")) 1833 | colnames(cyto) = c("chr", "start", "end", "band", "stain") 1834 | data.table::setkey(x = cyto, chr, start, end) 1835 | 1836 | #Color codes from https://github.com/jianhong/trackViewer (Thank you..) 1837 | ### gieStain ############################# 1838 | # #FFFFFF - gneg - Giemsa negative bands 1839 | # #999999 - gpos25 - Giemsa positive bands 1840 | # #666666 - gpos50 - Giemsa positive bands 1841 | # #333333 - gpos75 - Giemsa positive bands 1842 | # #000000 - gpos100 - Giemsa positive bands 1843 | # #660033 - acen - centromeric regions 1844 | # #660099 - gvar - variable length heterochromatic regions 1845 | # #6600cc - stalk - tightly constricted regions on the short arms of 1846 | # the acrocentric chromosomes 1847 | colorSheme = c( 1848 | "gneg" = "#FFFFFF", 1849 | "acen" = "#660033", 1850 | "gvar" = "#660099", 1851 | "stalk" = "#6600CC" 1852 | ) 1853 | gposCols <- sapply(1:100, function(i){ 1854 | i <- as.hexmode(round(256-i*2.56, digits = 0)) 1855 | i <- toupper(as.character(i)) 1856 | if(nchar(i)==1) i <- paste0("0", i) 1857 | return(paste0("#", i, i, i)) 1858 | }) 1859 | names(gposCols) <- paste0("gpos", 1:100) 1860 | colorSheme <- c(gposCols, colorSheme) 1861 | cyto$color = colorSheme[cyto[,stain]] 1862 | cyto 1863 | 1864 | cyto 1865 | } 1866 | 1867 | 1868 | .load_chromHMM = function(chr, start, end, ucsc){ 1869 | 1870 | if(nrow(ucsc) == 0){ 1871 | message("No features found within the requested loci!") 1872 | return(NULL) 1873 | } 1874 | colnames(ucsc)[1:4] = c("chr", "start", "end", "name") 1875 | if(!grepl(pattern = "^chr", x = chr)){ 1876 | ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)] 1877 | } 1878 | data.table::setkey(x = ucsc, chr, start, end) 1879 | 1880 | query = data.table::data.table(chr = chr, start = start, end = end) 1881 | data.table::setkey(x = query, chr, start, end) 1882 | 1883 | data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)[,.(chr, start, end, name)] 1884 | } 1885 | 1886 | .extract_chromHmm_ucsc = function(chr, start, end, refBuild = "hg38", tbl){ 1887 | 1888 | if(!grepl(pattern = "^chr", x = chr)){ 1889 | message("Adding chr prefix to target chromosome for UCSC query..") 1890 | tar_chr = paste0("chr", chr) 1891 | }else{ 1892 | tar_chr = chr 1893 | } 1894 | 1895 | ucsc_tbls = .get_ucsc_hmm_tbls() 1896 | tbl = match.arg(arg = tbl, choices = ucsc_tbls$TableName) 1897 | 1898 | .check_mysql() 1899 | 1900 | cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ", refBuild, " -e 'select chrom, chromStart, chromEnd, name from ", tbl, " WHERE chrom =\"", tar_chr, "\"'") 1901 | message(paste0("Extracting chromHMM from UCSC:\n", " chromosome: ", tar_chr, "\n", " build: ", refBuild, "\n query: ", cmd)) 1902 | #system(command = cmd) 1903 | ucsc = data.table::fread(cmd = cmd) 1904 | if(nrow(ucsc) == 0){ 1905 | message("No features found within the requested loci!") 1906 | return(NULL) 1907 | } 1908 | colnames(ucsc) = c("chr", "start", "end", "name") 1909 | if(!grepl(pattern = "^chr", x = chr)){ 1910 | ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)] 1911 | } 1912 | data.table::setkey(x = ucsc, chr, start, end) 1913 | 1914 | query = data.table::data.table(chr = chr, start = start, end = end) 1915 | data.table::setkey(x = query, chr, start, end) 1916 | 1917 | data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)[,.(chr, start, end, name)] 1918 | } 1919 | 1920 | .extract_geneModel_ucsc_bySymbol = function(genesymbol, refBuild){ 1921 | .check_mysql() 1922 | op_file = tempfile(pattern = "ucsc", fileext = ".tsv") 1923 | 1924 | cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ", refBuild, " -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE name2 =\"", genesymbol, "\"'") 1925 | message(paste0("Extracting gene models from UCSC:\n", " Gene: ", genesymbol, "\n", " build: ", refBuild, "\n query: ", cmd)) 1926 | 1927 | ucsc = data.table::fread(cmd = cmd, sep = "\t") 1928 | if(nrow(ucsc) == 0){ 1929 | message("No features found within the requested loci!") 1930 | return(NULL) 1931 | } 1932 | 1933 | colnames(ucsc) = c("chr", "start", "end", "strand", "name", "name2", "exonStarts", "exonEnds") 1934 | data.table::setkey(x = ucsc, chr, start, end) 1935 | ucsc 1936 | } 1937 | 1938 | .extract_geneModel_ucsc = function(chr, start = NULL, end = NULL, refBuild = "hg19", txname = NULL, genename = NULL){ 1939 | .check_mysql() 1940 | op_file = tempfile(pattern = "ucsc", fileext = ".tsv") 1941 | 1942 | if(!grepl(pattern = "^chr", x = chr)){ 1943 | message("Adding chr prefix to target chromosome for UCSC query..") 1944 | tar_chr = paste0("chr", chr) 1945 | }else{ 1946 | tar_chr = chr 1947 | } 1948 | 1949 | cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ", refBuild, " -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom =\"", tar_chr, "\"'") 1950 | message(paste0("Extracting gene models from UCSC:\n", " chromosome: ", tar_chr, "\n", " build: ", refBuild, "\n query: ", cmd)) 1951 | #system(command = cmd) 1952 | ucsc = data.table::fread(cmd = cmd) 1953 | if(nrow(ucsc) == 0){ 1954 | message("No features found within the requested loci!") 1955 | return(NULL) 1956 | } 1957 | colnames(ucsc) = c("chr", "start", "end", "strand", "name", "name2", "exonStarts", "exonEnds") 1958 | if(!grepl(pattern = "^chr", x = chr)){ 1959 | ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)] 1960 | } 1961 | data.table::setkey(x = ucsc, chr, start, end) 1962 | 1963 | query = data.table::data.table(chr = chr, start = start, end = end) 1964 | data.table::setkey(x = query, chr, start, end) 1965 | 1966 | gene_models = data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL) 1967 | 1968 | if(nrow(gene_models) == 0){ 1969 | message("No features found within the requested loci!") 1970 | return(NULL) 1971 | }else{ 1972 | return(gene_models) 1973 | } 1974 | } 1975 | 1976 | .make_exon_tbl = function(gene_models, txname = NULL, genename = NULL){ 1977 | if(!is.null(txname)){ 1978 | gene_models = gene_models[name %in% txname] 1979 | } 1980 | 1981 | if(nrow(gene_models) == 0){ 1982 | message(" Requested transcript ", txname, " does not exist within the queried region!\n Skipping gene track plotting..") 1983 | return(NULL) 1984 | } 1985 | 1986 | if(!is.null(genename)){ 1987 | gene_models = gene_models[name2 %in% genename] 1988 | } 1989 | 1990 | if(nrow(gene_models) == 0){ 1991 | message(" Requested gene ", genename, " does not exist within the queried region!\n Skipping gene track plotting..") 1992 | return(NULL) 1993 | } 1994 | 1995 | exon_tbls = lapply(seq_along(along.with = 1:nrow(gene_models)), function(idx){ 1996 | exon_start = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonStarts], split = ","))) 1997 | exon_end = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonEnds], split = ","))) 1998 | exon_tbl = data.frame(start = exon_start, end = exon_end) 1999 | attributes(exon_tbl) = list(start = gene_models[idx, start], end = gene_models[idx, end], strand = gene_models[idx, strand], tx = gene_models[idx, name], gene = gene_models[idx, name2]) 2000 | exon_tbl 2001 | }) 2002 | 2003 | return(exon_tbls) 2004 | } 2005 | 2006 | .get_ucsc_hmm_states_cols = function(){ 2007 | states = c("red", "red4", "purple", "orange", "orange", "yellow", "yellow", 2008 | "blue", "darkgreen", "darkgreen", "lightgreen", "gray", "gray90", 2009 | "gray90", "gray90") 2010 | names(states) = 1:15 2011 | states 2012 | } 2013 | 2014 | .plot_ucsc_chrHmm = function(d, start = NULL, end = NULL, hmm_cols = NULL){ 2015 | #hmm_cols = .get_ucsc_hmm_states_cols() 2016 | 2017 | if(is.null(start)){ 2018 | start = min(data.table::rbindlist(l = d, use.names = TRUE, fill = TRUE)[,start], na.rm = TRUE) 2019 | } 2020 | 2021 | if(is.null(end)){ 2022 | end = max(data.table::rbindlist(l = d, use.names = TRUE, fill = TRUE)[,end], na.rm = TRUE) 2023 | } 2024 | 2025 | plot(NA, ylim = c(0, length(d)), xlim = c(start, end), axes = FALSE, xlab = NA, ylab = NA) 2026 | lapply(seq_along(d), function(i){ 2027 | di = d[[i]] 2028 | di$state = unlist(data.table::tstrsplit(x = di$name, split = "_", keep = 1)) 2029 | rect(xleft = di$start, ybottom = i-0.9, xright = di$end, ytop = i-0.1, col = hmm_cols[di$state], border = NA) 2030 | di_name = gsub(pattern = "wgEncodeBroadHmm|HMM", replacement = "", x = names(d)[i]) 2031 | text(x = start, y = i - 0.5, labels = di_name, adj = 1.2, xpd = TRUE) 2032 | #mtext(text = di_name, side = 2, line = 1, outer = TRUE, cex = 1) 2033 | }) 2034 | 2035 | } 2036 | 2037 | .collapse_tx = function(exon_tbls){ 2038 | message("Collapsing transcripts..") 2039 | tx_tbl = lapply(exon_tbls, function(x){ 2040 | xdt = data.table::data.table(start = x[[1]], end = x[[2]]) 2041 | xdt$tx = attr(x = x, which = "tx") 2042 | xdt$gene = attr(x = x, which = "gene") 2043 | xdt$strand = attr(x = x, which = "strand") 2044 | xdt$tx_start = attr(x = x, which = "start") 2045 | xdt$tx_end = attr(x = x, which = "end") 2046 | xdt 2047 | }) 2048 | tx_tbl = data.table::rbindlist(l = tx_tbl) 2049 | tx_tbl[,id := paste0(start, ":", end)] 2050 | tx_tbl = tx_tbl[!duplicated(id)] 2051 | 2052 | exon_tbls = lapply(split(tx_tbl, as.factor(as.character(tx_tbl$gene))), function(x){ 2053 | x = x[order(start)] 2054 | exon_start = as.numeric(x[,start]) 2055 | exon_end = as.numeric(x[,end]) 2056 | gene_tbl = data.frame(start = exon_start, end = exon_end) 2057 | attributes(gene_tbl) = list(start = min(x[, tx_start]), end = max(x[, tx_end]), strand = unique(x[, strand]), tx = NA, gene = unique(x[, gene])) 2058 | gene_tbl 2059 | }) 2060 | 2061 | exon_tbls 2062 | } 2063 | 2064 | .parse_gtf = function(gtf = NULL, chr, start = NULL, end = NULL, refBuild = "hg19", txname = NULL, genename = NULL){ 2065 | message("Parsing gtf file..") 2066 | if(is(object = gtf, class2 = "data.frame")){ 2067 | gtf = data.table::as.data.table(gtf) 2068 | }else if(file.exists(gtf)){ 2069 | gtf = data.table::fread(file = gtf) 2070 | }else{ 2071 | stop("Gene model must be a file or a data.frame") 2072 | } 2073 | 2074 | colnames(gtf) = c("chr", "source", "feature", "start", "end", "ph", "strand", "ph2", "info") 2075 | gtf[,chr := as.character(chr)] 2076 | gtf[,start :=as.numeric(as.character(start))] 2077 | gtf[,end := as.numeric(as.character(end))] 2078 | data.table::setkey(x = gtf, chr, start, end) 2079 | 2080 | if(!is.null(genename)){ 2081 | gene_models = gtf[info %like% genename] 2082 | if(nrow(gene_models) == 0){ 2083 | message("No features found for the gene ", genename) 2084 | return(NULL) 2085 | } 2086 | }else{ 2087 | query = data.table::data.table(chr, start, end) 2088 | data.table::setkey(x = query, chr, start, end) 2089 | gene_models = data.table::foverlaps(x = query, y = gtf, type = "any", nomatch = NULL) 2090 | if(nrow(gene_models) == 0){ 2091 | message("No features found within the requested loci!") 2092 | return(NULL) 2093 | } 2094 | } 2095 | 2096 | 2097 | gene_models_exon = gene_models[feature %in% c("exon", "transcript")] 2098 | #gene_models_rest = gene_models[!feature %in% "exon"] 2099 | 2100 | feature_ids = data.table::tstrsplit(x = gene_models_exon$info, split = "; ") 2101 | feature_id_names = lapply(feature_ids, function(x){ 2102 | #x = x[1:50] #sample rows 2103 | x = unique(unlist(data.table::tstrsplit(x = x, split = " ", keep = 1))) 2104 | x = x[complete.cases(x)] 2105 | x[1] 2106 | }) 2107 | names(feature_ids) = feature_id_names 2108 | req_fields = c("gene_id", "transcript_id") 2109 | req_fields = req_fields[req_fields %in% unlist(feature_id_names)] 2110 | feature_ids = feature_ids[req_fields] 2111 | 2112 | feature_ids = sapply(feature_ids, function(x){ 2113 | gsub(pattern = "\"|;", replacement = "", x = unlist(data.table::tstrsplit(x = x, split = " ", keep = 2))) 2114 | }) 2115 | feature_ids = data.frame(feature_ids) 2116 | colnames(feature_ids) = c("name2", "name") 2117 | gene_models_exon = cbind(gene_models_exon, feature_ids) 2118 | #gene_models = data.table::rbindlist(list(gene_models_rest, gene_models_exon), use.names = TRUE, fill = TRUE) 2119 | gene_models = gene_models_exon[order(as.numeric(as.character(start)))] 2120 | #gene_models[,.(chr,start,end,strand, tx, gene)] 2121 | 2122 | if(nrow(gene_models) == 0){ 2123 | warning("No features found within the requested loci!") 2124 | return(NULL) 2125 | }else{ 2126 | if(!is.null(txname)){ 2127 | gene_models = gene_models[name %in% txname] 2128 | } 2129 | 2130 | if(!is.null(genename)){ 2131 | gene_models = gene_models[name2 %in% genename] 2132 | } 2133 | 2134 | if(nrow(gene_models) == 0){ 2135 | warning("Requested gene or transcript could not be found within the requested loci!") 2136 | return(NULL) 2137 | } 2138 | 2139 | gene_models = split(gene_models, as.factor(as.character(gene_models$name))) 2140 | 2141 | exon_tbls = lapply(seq_along(along.with = 1:length(gene_models)), function(idx){ 2142 | x = gene_models[[idx]] 2143 | exon_start = as.numeric(as.character(x[feature %in% "exon"][, start])) 2144 | exon_end = as.numeric(as.character(x[feature %in% "exon"][, end])) 2145 | exon_tbl = data.frame(start = exon_start, end = exon_end) 2146 | attributes(exon_tbl) = list(start = min(x[,start], na.rm = TRUE), end = max(x[,end], na.rm = TRUE), strand = unique(x[,strand]), tx = unique(x[, name]), gene = unique(x[, name2]), chr = unique(x[,chr])) 2147 | exon_tbl 2148 | }) 2149 | } 2150 | exon_tbls 2151 | } 2152 | 2153 | .make_genome_bed = function(refBuild = "hg19", tss = "start", up = 2500, down = 2500, op_dir = tempdir(), pc_genes = FALSE, for_profile = TRUE){ 2154 | if(!dir.exists(paths = op_dir)){ 2155 | dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE) 2156 | } 2157 | 2158 | tss = match.arg(arg = tss, choices = c("start", "end")) 2159 | 2160 | temp_op_bed = tempfile(pattern = "profileplot_ucsc", tmpdir = op_dir, fileext = ".bed") 2161 | 2162 | cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ", refBuild, " -e 'select chrom, txStart, txEnd, strand, name, name2 from refGene'") 2163 | message(paste0("Extracting gene models from UCSC:\n", " build: ", refBuild, "\n query: ", cmd)) 2164 | #system(command = cmd) 2165 | ucsc = data.table::fread(cmd = cmd) 2166 | colnames(ucsc) = c("chr", "start", "end", "strand", "tx_id", "gene_id") 2167 | 2168 | main_contigs = paste0("chr", c(1:22, "X", "Y")) 2169 | ucsc = ucsc[chr %in% main_contigs] 2170 | 2171 | if(pc_genes){ 2172 | ucsc = ucsc[tx_id %like% "^NM"] 2173 | } 2174 | 2175 | message("Fetched ", nrow(ucsc), " transcripts from ", nrow(ucsc[,.N,.(chr)]), " contigs") 2176 | 2177 | if(for_profile){ 2178 | #If it is only for profile plot where tss or tes are extended, we donyt extend manually here. Instead invert tss for negative strand txs. Let bwtool do the hard work 2179 | ucsc_minus = ucsc[strand %in% "-"] 2180 | colnames(ucsc_minus) = c("chr", "end", "start", "strand", "tx_id", "gene_id") 2181 | ucsc_plus = ucsc[strand %in% "+"] 2182 | ucsc_bed = data.table::rbindlist(l = list(ucsc_plus, ucsc_minus), use.names = TRUE, fill = TRUE)[,.(chr, start, end, tx_id, gene_id)] 2183 | }else{ 2184 | #If for summary, extrend the regions 2185 | ucsc_minus = ucsc[strand %in% "-"] 2186 | if(nrow(ucsc_minus) > 0){ 2187 | if(tss == "start"){ 2188 | ucsc_minus[, bed_start := end-up] 2189 | ucsc_minus[, bed_end := end+down] 2190 | }else{ 2191 | ucsc_minus[, bed_start := start-up] 2192 | ucsc_minus[, bed_end := start+down] 2193 | } 2194 | } 2195 | 2196 | ucsc_plus = ucsc[strand %in% "+"] 2197 | if(nrow(ucsc_plus) > 0){ 2198 | if(tss == "start"){ 2199 | ucsc_plus[, bed_start := start-up] 2200 | ucsc_plus[, bed_end := start+down] 2201 | }else{ 2202 | ucsc_plus[, bed_start := end-up] 2203 | ucsc_plus[, bed_end := end+down] 2204 | } 2205 | } 2206 | 2207 | ucsc_bed = data.table::rbindlist( 2208 | l = list(ucsc_plus[, .(chr, bed_start, bed_end, tx_id, gene_id)], ucsc_minus[, .(chr, bed_start, bed_end, tx_id, gene_id)]), 2209 | use.names = TRUE, 2210 | fill = TRUE 2211 | ) 2212 | } 2213 | 2214 | colnames(ucsc_bed) = c("chr", "start", "end", "tx", "gene") 2215 | data.table::setkey(x = ucsc_bed, chr, start, end) 2216 | 2217 | data.table::fwrite(x = ucsc_bed[,1:3], file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) 2218 | data.table::fwrite(x = ucsc_bed, file = paste0(temp_op_bed, "2"), sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) 2219 | 2220 | list(temp_op_bed, paste0(temp_op_bed, "2")) 2221 | } 2222 | 2223 | .make_bed = function(bed, op_dir = tempdir(), up = 2500, down = 2500, tss = "center", for_profile = FALSE){ 2224 | #bwtool tool requires only three columns 2225 | 2226 | tss = match.arg(arg = tss, choices = c("start", "end", "center")) 2227 | 2228 | if(!dir.exists(paths = op_dir)){ 2229 | dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE) 2230 | } 2231 | 2232 | temp_op_bed = tempfile(pattern = "profileplot", tmpdir = op_dir, fileext = ".bed") 2233 | 2234 | if(is.data.frame(bed)){ 2235 | bed = data.table::as.data.table(x = bed) 2236 | #data.table::setDT(x = bed) 2237 | colnames(bed)[1:3] = c("chr", "start", "end") 2238 | bed[, chr := as.character(chr)] 2239 | bed[, start := as.numeric(as.character(start))] 2240 | bed[, end := as.numeric(as.character(end))] 2241 | }else if(file.exists(bed)){ 2242 | bed = data.table::fread(file = bed, select = list(character = 1, numeric = c(2, 3)), col.names = c("chr", "start", "end")) 2243 | bed = bed[,.(chr, start, end)] 2244 | } 2245 | 2246 | if(!for_profile){ 2247 | if(tss == "center"){ 2248 | bed[, focal_point := as.integer(apply(bed[,2:3], 1, mean))] 2249 | bed[, bed_start := focal_point-up] 2250 | bed[, bed_end := focal_point+down] 2251 | }else if(tss == "start"){ 2252 | bed[, bed_start := start-up] 2253 | bed[, bed_end := start+down] 2254 | }else{ 2255 | bed[, bed_start := end-up] 2256 | bed[, bed_end := end+down] 2257 | } 2258 | bed = bed[,.(chr, bed_start, bed_end)] 2259 | data.table::setkey(x = bed, chr, bed_start, bed_end) 2260 | } 2261 | 2262 | data.table::fwrite(x = bed, file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) 2263 | 2264 | return(temp_op_bed) 2265 | } 2266 | 2267 | .bwt_mats = function(bw, binSize, bed, size, startFrom, op_dir){ 2268 | 2269 | bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "", 2270 | x = basename(bw), ignore.case = TRUE) 2271 | message(paste0("Processing ", bn, "..")) 2272 | 2273 | bw = gsub(pattern = " ", replacement = "\\ ", x = bw, fixed = TRUE) #Change spaces with \ for unix style paths 2274 | 2275 | if(startFrom == "start"){ 2276 | cmd = paste0("bwtool matrix -starts -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix")) 2277 | }else if(startFrom == "end"){ 2278 | cmd = paste0("bwtool matrix -ends -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix")) 2279 | }else{ 2280 | cmd = paste0("bwtool matrix -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix")) 2281 | } 2282 | print(cmd) 2283 | 2284 | system(command = cmd, intern = TRUE) 2285 | paste0(op_dir, "/", bn, ".matrix") 2286 | } 2287 | 2288 | 2289 | .summarizeMats = function(mats = NULL, summarizeBy = 'mean', group = NULL, collapse_reps = FALSE){ 2290 | 2291 | if(!is.null(group)){ 2292 | # gdf = data.table::data.table(sample = names(mats), condition = group) 2293 | # gdf = gdf[order(group, sample)] 2294 | group_u = unique(group) 2295 | if(collapse_reps){ 2296 | summarizedMats = lapply(group_u, function(g){ 2297 | x = apply(data.table::rbindlist(l = mats[which(group == g)], fill = TRUE, use.names = TRUE), 2, summarizeBy, na.rm = TRUE) 2298 | x 2299 | }) 2300 | names(summarizedMats) = group_u 2301 | }else{ 2302 | summarizedMats = lapply(mats[1:(length(mats))], function(x){ 2303 | if(!is.null(dim(x))){ 2304 | x = apply(x, 2, summarizeBy, na.rm = TRUE) 2305 | } 2306 | x 2307 | }) 2308 | } 2309 | }else{ 2310 | summarizedMats = lapply(mats[1:(length(mats))], function(x){ 2311 | if(!is.null(dim(x))){ 2312 | x = apply(x, 2, summarizeBy, na.rm = TRUE) 2313 | } 2314 | x 2315 | }) 2316 | } 2317 | 2318 | #summarizedMats$param = mats$param 2319 | summarizedMats 2320 | } 2321 | 2322 | 2323 | # estimate tandard deviation for CI 2324 | .estimateCI = function(mats = NULL, group = NULL, collapse_reps = FALSE){ 2325 | 2326 | if(!is.null(group)){ 2327 | if(collapse_reps){ 2328 | group_u = unique(group) 2329 | ciMats = lapply(group_u, function(g){ 2330 | x = apply(data.table::rbindlist(l = mats[which(group == g)], fill = TRUE, use.names = TRUE), 2, function(y){ 2331 | sd(y, na.rm = TRUE)/sqrt(length(y)) 2332 | }) 2333 | x 2334 | }) 2335 | names(ciMats) = group_u 2336 | }else{ 2337 | ciMats = lapply(mats[1:(length(mats))], function(x){ 2338 | if(!is.null(dim(x))){ 2339 | x = apply(x, 2, function(y){ 2340 | sd(y, na.rm = TRUE)/sqrt(length(y)) 2341 | }) 2342 | } 2343 | x 2344 | }) 2345 | } 2346 | }else{ 2347 | ciMats = lapply(mats[1:(length(mats))], function(x){ 2348 | if(!is.null(dim(x))){ 2349 | x = apply(x, 2, function(y){ 2350 | sd(y, na.rm = TRUE)/sqrt(length(y)) 2351 | }) 2352 | } 2353 | x 2354 | }) 2355 | } 2356 | 2357 | ciMats 2358 | } 2359 | 2360 | .order_by_sds = function(mat, keep_sd = FALSE){ 2361 | mat_sd = apply(as.matrix(mat), 1, sd, na.rm = TRUE) #order it based on SD 2362 | mat_sd = sort(mat_sd, decreasing = TRUE) 2363 | mat = mat[names(mat_sd),, drop = FALSE] 2364 | mat 2365 | } 2366 | 2367 | .extract_summary = function(bw, binSize, bed, op_dir){ 2368 | bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "", 2369 | x = basename(bw), ignore.case = TRUE) 2370 | message(paste0(" Processing ", bn, "..")) 2371 | 2372 | cmd = paste("bwtool summary -with-sum -keep-bed -header", bedSimple, bw, paste0(op_dir, bn, ".summary")) 2373 | paste0(op_dir, "/", bn, ".summary") 2374 | } 2375 | 2376 | .loci2df = function(loci){ 2377 | chr = as.character(unlist(data.table::tstrsplit(x = loci, spli = ":", keep = 1))) 2378 | start = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-", keep = 1)) 2379 | start = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(start)))) 2380 | end = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-", keep = 2)) 2381 | end = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(end)))) 2382 | data.table::data.table(chr, start, end) 2383 | } 2384 | 2385 | .check_bwtool = function(warn = FALSE){ 2386 | check = as.character(Sys.which(names = 'bwtool'))[1] 2387 | if(check != ""){ 2388 | if(warn){ 2389 | message("Checking for bwtool installation") 2390 | message(paste0(" All good! Found bwtool at: ", check)) 2391 | }else{ 2392 | return(invisible(0)) 2393 | } 2394 | }else{ 2395 | stop("Could not locate bwtool. Download it from here: https://github.com/CRG-Barcelona/bwtool/releases") 2396 | } 2397 | } 2398 | 2399 | .check_mysql = function(warn = FALSE){ 2400 | check = as.character(Sys.which(names = 'mysql'))[1] 2401 | if(check != ""){ 2402 | if(warn){ 2403 | message("Checking for mysql installation") 2404 | message(paste0(" All good! Found mysql at: ", check)) 2405 | }else{ 2406 | return(invisible(0)) 2407 | } 2408 | }else{ 2409 | stop("Could not locate mysql.\nInstall:\n apt install mysql-server [Debian]\n yum install mysql-server [centOS]\n brew install mysql [macOS]\n conda install -c anaconda mysql [conda]") 2410 | } 2411 | } 2412 | 2413 | .check_dt = function(){ 2414 | if(!requireNamespace("data.table", quietly = TRUE)){ 2415 | message("Could not find data.table library. Attempting to install..") 2416 | install.packages("data.table") 2417 | } 2418 | suppressPackageStartupMessages(expr = library("data.table", quietly = TRUE, warn.conflicts = FALSE, verbose = FALSE)) 2419 | } 2420 | 2421 | .check_windows = function(){ 2422 | if(Sys.info()["sysname"] == "Windows"){ 2423 | stop("Windows is not supported :(") 2424 | } 2425 | } 2426 | 2427 | .get_ucsc_hmm_tbls = function(){ 2428 | hmm = structure( 2429 | list( 2430 | TableName = c( 2431 | "wgEncodeBroadHmmGm12878HMM", 2432 | "wgEncodeBroadHmmH1hescHMM", 2433 | "wgEncodeBroadHmmHepg2HMM", 2434 | "wgEncodeBroadHmmHepg2HMM", 2435 | "wgEncodeBroadHmmHsmmHMM", 2436 | "wgEncodeBroadHmmHuvecHMM", 2437 | "wgEncodeBroadHmmK562HMM", 2438 | "wgEncodeBroadHmmNhekHMM", 2439 | "wgEncodeBroadHmmNhlfHMM" 2440 | ), 2441 | cell = c( 2442 | "GM12878", 2443 | "H1-hESC", 2444 | "HepG2", 2445 | "HMEC", 2446 | "HSMM", 2447 | "HUVEC", 2448 | "K562", 2449 | "NHEK", 2450 | "NHLF" 2451 | ), 2452 | Tier = c(1L, 2453 | 1L, 2L, 3L, 3L, 2L, 1L, 3L, 3L), 2454 | Description = c( 2455 | "B-lymphocyte, lymphoblastoid", 2456 | "embryonic stem cells", 2457 | "hepatocellular carcinoma", 2458 | "mammary epithelial cells", 2459 | "skeletal muscle myoblasts", 2460 | "umbilical vein endothelial cells", 2461 | "leukemia", 2462 | "epidermal keratinocytes", 2463 | "lung fibroblasts" 2464 | ), 2465 | Lineage = c( 2466 | "mesoderm", 2467 | "inner cell mass", 2468 | "endoderm", 2469 | "ectoderm", 2470 | "mesoderm", 2471 | "mesoderm", 2472 | "mesoderm", 2473 | "ectoderm", 2474 | "endoderm" 2475 | ), 2476 | Tissue = c( 2477 | "blood", 2478 | "embryonic stem cell", 2479 | "liver", 2480 | "breast", 2481 | "muscle", 2482 | "blood vessel", 2483 | "blood", 2484 | "skin", 2485 | "lung" 2486 | ), 2487 | Karyotype = c( 2488 | "normal", 2489 | "normal", 2490 | "cancer", 2491 | "normal", 2492 | "normal", 2493 | "normal", 2494 | "cancer", 2495 | "normal", 2496 | "normal" 2497 | ), 2498 | Sex = c("F", 2499 | "M", "M", "U", "U", "U", "F", "U", "U") 2500 | ), 2501 | row.names = c(NA,-9L), 2502 | class = c("data.table", "data.frame") 2503 | ) 2504 | 2505 | hmm 2506 | } 2507 | 2508 | .get_summaries_narrowPeaks = function(bigWigs, nthreads = 1, chr = NA, start = NA, end = NA){ 2509 | #bedSimple = temp_op_bed; bigWigs = list.files(path = "./", pattern = "bw"); op_dir = getwd(); nthreads = 1 2510 | query = data.table::data.table(chromosome = chr, start = start, end = end, key = c("chromosome", "start", "end")) 2511 | 2512 | message(paste0("Extracting signals")) 2513 | 2514 | summaries = parallel::mclapply(bigWigs, FUN = function(bw){ 2515 | bn = unlist(data.table::tstrsplit(x = basename(bw), split = "\\.", keep = 1)) 2516 | message(paste0(" Processing ", bn, " ..")) 2517 | bw = data.table::fread(file = bw) 2518 | colnames(bw)[c(1:3, 5)] = c("chromosome", "start", "end", "max") 2519 | data.table::setkeyv(x = bw, cols = c("chromosome", "start", "end")) 2520 | bw = data.table::foverlaps(x = query, y = bw, type = "any", nomatch = NULL) 2521 | bw[,.(chromosome, start, end, max)] 2522 | }, mc.cores = nthreads) 2523 | 2524 | names(summaries) = unlist(data.table::tstrsplit(x = basename(bigWigs), split = "\\.", keep = 1)) 2525 | summaries 2526 | } 2527 | 2528 | .parse_loci = function(loci){ 2529 | chr = as.character(unlist(data.table::tstrsplit(x = loci, split = ":", keep = 1))) 2530 | start = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-"))[1] 2531 | start = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(start)))) 2532 | end = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-"))[2] 2533 | end = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(end)))) 2534 | list(chr = chr, start = start, end = end) 2535 | } 2536 | 2537 | #Plot profile mini version 2538 | .plot_profile_mini = function(plot_dat, index = 1, ylims = c(0, 2)){ 2539 | #size = as.character(plot_dat$param["size"]) 2540 | 2541 | y = plot_dat[[index]] 2542 | x = (1:(length(y)))/length(y) 2543 | par(mar = c(1,2,1,1)) 2544 | plot(NA, axes = FALSE, xlim = c(-0.2, 1), ylim = ylims, xlab = NA, ylab = NA) 2545 | #grid(col = "gray90") 2546 | points(x, y, type = "l", lwd = 1.2) 2547 | axis(side = 2, at = ylims, lwd = 1, line = -0.6, labels = NA) 2548 | mtext(side = 2, at = ylims, lwd = 1, line = -0.05, text = round(ylims, digits = 2), cex = 0.8, font = 1, las = 2) 2549 | } 2550 | 2551 | # Order matrices 2552 | .order_matrix = function(mats, sortBy = NULLs, k = NULL){ 2553 | 2554 | mats_avg = lapply(mats, function(x){ 2555 | x = as.matrix(as.data.frame(x = x)) 2556 | apply(x, 1, mean, na.rm = TRUE) 2557 | }) 2558 | 2559 | mats_avg = as.data.frame(mats_avg) 2560 | cluster_row_cut = NULL 2561 | 2562 | if(sortBy == "mean"){ 2563 | mats_avg$oall_avg = rowMeans(mats_avg, na.rm = TRUE) 2564 | row_idx = order(mats_avg$oall_avg, decreasing = TRUE) 2565 | }else if(sortBy == "median"){ 2566 | mats_avg$oall_avg = apply(mats_avg, 1, median, na.rm = TRUE) 2567 | row_idx = order(mats_avg$oall_avg, decreasing = TRUE) 2568 | }else if(sortBy == "hclust"){ 2569 | set.seed(seed = 1024) 2570 | hc = hclust(d = dist(mats_avg)) 2571 | mats_avg$hc_order = hc$order 2572 | if(!is.null(k)){ 2573 | mats_avg$cluster = cutree(tree = hc, k = k) 2574 | mats_avg$row_idx = 1:nrow(mats_avg) 2575 | mats_avg_spl = split(mats_avg, f = as.factor(as.character(mats_avg$cluster))) 2576 | cluster_row_cut = unlist(lapply(mats_avg_spl, nrow)) 2577 | print(cluster_row_cut) 2578 | mats_avg_spl = lapply(mats_avg_spl, function(x){ 2579 | xhc = hclust(d = dist(x[,1:(ncol(x)-3)])) 2580 | x$row_idx2 = xhc$order 2581 | x = x[order(x$row_idx2, decreasing = FALSE),, drop = FALSE] 2582 | x 2583 | }) 2584 | mats_avg_spl = data.table::rbindlist(l = mats_avg_spl, fill = TRUE, use.names = TRUE) 2585 | #cluster_row_cut = cumsum(xhm[,.N, cluster][,N]) 2586 | row_idx = mats_avg_spl$row_idx 2587 | }else{ 2588 | row_idx = mats_avg$hc_order 2589 | } 2590 | } 2591 | 2592 | mats = lapply(mats, function(x){ 2593 | x = as.data.frame(x = x) 2594 | x = x[row_idx,,drop = FALSE] 2595 | x 2596 | }) 2597 | 2598 | mats 2599 | } 2600 | #------------------------------------------------------------------------------------------------------------------------------------ 2601 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## trackplot - Fast and easy visualisation of bigWig files in R 2 | 3 | 4 | [![GitHub closed issues](https://img.shields.io/github/issues-closed-raw/poisonalien/trackplot.svg)](https://github.com/poisonalien/trackplot/issues) 5 | 6 | 7 | ## Introduction 8 | 9 | `trackplot.R` is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files. 10 | 11 | ## Installation 12 | 13 | `trackplot.R` is a standalone R script and requires no installation. Just source it and you're good to go! 14 | 15 | ```r 16 | source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true") 17 | 18 | # OR 19 | 20 | download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R") 21 | source('trackplot.R') 22 | 23 | # OR If you prefer to have it as package 24 | 25 | remotes::install_github(repo = "poisonalien/trackplot") 26 | ``` 27 | 28 | ## Features 29 | 30 | Why `trackplot`? 31 | 32 | * It's extremely fast since most of the heavy lifting is done by [bwtool](https://github.com/CRG-Barcelona/bwtool). >15X faster than [deeptools](https://deeptools.readthedocs.io/en/develop/) for equivalent `profileplots` and `heatmaps` 33 | * Lightweight and minimal dependency 34 | - [data.table](https://cran.r-project.org/web/packages/data.table/index.html) and [bwtool](https://github.com/CRG-Barcelona/bwtool) are the only requirements. Similar R packages [GViz](https://bioconductor.org/packages/release/bioc/html/Gviz.html) and [karyoploteR](http://bioconductor.org/packages/release/bioc/html/karyoploteR.html) has over 150 dependencies. 35 | - Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages) 36 | * Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible. 37 | * Supports GTF and standard UCSC gene formats as well. 38 | * Customization: Each plot can be customized for color, scale, height, width, etc. 39 | * Tracks can be summarized per condition (by mean, median, max, min) 40 | * PCA and, optional differential peak analysis with `limma` when using uniformly processed, normalized bigWig files. 41 | 42 | ## Dependencies 43 | 44 | 1. [data.table](https://cran.r-project.org/web/packages/data.table/index.html) R package - which itself has no dependency. 45 | 2. [bwtool](https://github.com/CRG-Barcelona/bwtool) - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`) or a directory under the PATH. 46 | 47 | * For macOS: Please download the pre-build binary from [here](https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1). Make it executable with `chmod +x bwtool`. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, [allow](https://support.apple.com/en-us/HT202491) it in the security settings. 48 | 49 | * For centOS or debian: Follow these [compilation instructions](https://gist.github.com/PoisonAlien/e19b482ac6146bfb03142a0de1c4fbc8). 50 | 51 | ### Citation 52 | 53 | If you find the script useful consider citing [trackplot](https://academic.oup.com/bioinformaticsadvances/article/4/1/vbae031/7616126) and [bwtool](https://academic.oup.com/bioinformatics/article/30/11/1618/282756) 54 | 55 | **_Mayakonda A, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: [38476298](https://pubmed.ncbi.nlm.nih.gov/38476298/)_** 56 | 57 | **_Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: [24489365](https://pubmed.ncbi.nlm.nih.gov/24489365/)_** 58 | 59 | ## Usage 60 | 61 | Simple usage - Make a table of all the bigWig files to be analysed with `read_coldata()` and pass it to the downstream functions. 62 | 63 | ```mermaid 64 | flowchart TD 65 | a[bigWig file list] -->A{read_coldata} 66 | A --> B{track_extract} 67 | B --> B1[track_plot] 68 | A --> C{profile_extract} 69 | C --> C1[profile_summarize] 70 | C --> C3[profile_heatmap] 71 | C1 --> C2[profile_plot] 72 | A --> D{extract_summary} 73 | D --> D1[pca_plot] 74 | D --> D2[diffpeak] 75 | D2 --> D3[volcano_plot] 76 | ``` 77 | 78 | ```r 79 | #Path to bigWig files 80 | bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw", 81 | "H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw") 82 | 83 | #Make a table of bigWigs along with ref genome build 84 | bigWigs = read_coldata(bws = bigWigs, build = "hg19") 85 | ``` 86 | 87 | ## trackplots 88 | 89 | `track_extract()` and `track_plot()` are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, `track_summarize` can summarize tracks by condition. 90 | 91 | ### Step-1: Extract signal from bigWig files 92 | ```r 93 | #Region to plot 94 | oct4_loci = "chr6:31125776-31144789" 95 | 96 | #Extract bigWig signal for a loci of interest 97 | t = track_extract(colData = bigWigs, loci = oct4_loci) 98 | 99 | #Or you can also specifiy a gene name instead of a loci 100 | # - loci and gene models will be automatically extracted from UCSC genome browser 101 | t = track_extract(colData = bigWigs, gene = "POU5F1") 102 | ``` 103 | 104 | 105 | ### Step-2: Plot 106 | 107 | #### Basic plot 108 | ```r 109 | track_plot(summary_list = t) 110 | ``` 111 | 112 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/b50457b0-6623-47f6-b00f-f6e0de5a4808) 113 | 114 | #### Add cytoband and change colors for each track 115 | ```r 116 | track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60") 117 | track_plot(summary_list = t, 118 | col = track_cols, 119 | show_ideogram = TRUE) 120 | ``` 121 | 122 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/a0911998-aae8-4de1-96f5-18e278d19d80) 123 | 124 | #### Collapse all tracks into a single track 125 | 126 | Use `track_overlay = TRUE` to overlay all tracks into a single line track 127 | 128 | ```r 129 | track_plot(summary_list = t, col = track_cols, show_ideogram = FALSE, track_overlay = TRUE) 130 | ``` 131 | 132 | ![](https://github.com/user-attachments/assets/d286f159-8950-4209-a985-fa3ba7103c53) 133 | 134 | 135 | #### Heighilight regions of interest (any bed files would do) 136 | 137 | ```r 138 | oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files 139 | track_plot(summary_list = t, 140 | col = track_cols, 141 | show_ideogram = TRUE, 142 | peaks = oct4_nanog_peaks) 143 | ``` 144 | 145 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/2531af5e-7200-478e-aa90-4ff5f537f57a) 146 | 147 | #### Add some chromHMM tracks to the bottom 148 | 149 | chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an [example](https://github.com/PoisonAlien/trackplot/blob/master/inst/extdata/narrowpeak/H1_chromHMM.bed) file. 150 | 151 | Note that the color code for each of the 15 states are as described [here](https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeBroadHmm&db=hg19). 152 | In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument `chromHMM_cols` 153 | 154 | ```r 155 | chromHMM_peaks = "H1_chromHMM.bed" 156 | 157 | track_plot(summary_list = t, 158 | col = track_cols, 159 | show_ideogram = TRUE, 160 | peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks) 161 | ``` 162 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/5ef8d09f-1bdf-4622-9367-4245bdec63d5) 163 | 164 | #### Add some chromHMM tracks from UCSC 165 | 166 | UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data. 167 | In this case, use the argument `ucscChromHMM` with any values from TableName column of the below table. 168 | 169 | ```r 170 | TableName cell Description Tissue Karyotype 171 | 1: wgEncodeBroadHmmGm12878HMM GM12878 B-lymphocyte, lymphoblastoid blood normal 172 | 2: wgEncodeBroadHmmH1hescHMM H1-hESC embryonic stem cells embryonic stem cell normal 173 | 3: wgEncodeBroadHmmHepg2HMM HepG2 hepatocellular carcinoma liver cancer 174 | 4: wgEncodeBroadHmmHepg2HMM HMEC mammary epithelial cells breast normal 175 | 5: wgEncodeBroadHmmHsmmHMM HSMM skeletal muscle myoblasts muscle normal 176 | 6: wgEncodeBroadHmmHuvecHMM HUVEC umbilical vein endothelial cells blood vessel normal 177 | ``` 178 | 179 | ```r 180 | track_plot(summary_list = t, 181 | col = track_cols, 182 | show_ideogram = TRUE, 183 | peaks = oct4_nanog_peaks, 184 | ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM")) 185 | ``` 186 | 187 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/fecf7ab1-44cb-4308-b3f4-d8ca03cdd15d) 188 | 189 | ### Re-organize tracks 190 | 191 | By default tracks are organized from top to bottom as `c("p", "b", "h", "g", "c")` corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument `layout_ord`. Furthermore, bigWig tracks themselves can be ordered with the argument `bw_ord` which accepts the names of the bigWig tracks as input and plots them in the given order. 192 | 193 | ```r 194 | #Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end. 195 | track_plot( 196 | summary_list = t, 197 | col = track_cols, 198 | show_ideogram = TRUE, 199 | genename = c("POU5F1", "TCF19"), 200 | peaks = oct4_nanog_peaks, 201 | peaks_track_names = c("NANOG", "OCT4"), 202 | groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0, 203 | bw_ord = c("NANOG", "OCT4"), 204 | layout_ord = c("h", "g", "c") 205 | ) 206 | ``` 207 | 208 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/11c9fe9b-0292-40af-8197-7f20a5275f01) 209 | 210 | 211 | ## narrowPeaks and broadPeaks 212 | 213 | All of the above plots can also be generated with [narrowPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) or [broadPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per [UCSC convention](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) 214 | 215 | `narrowPeak` is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files. 216 | 217 | ```r 218 | narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed", 219 | "H1_k4me1.bed", "H1_k4me3.bed", "H1_Nanog.bed", 220 | "H1_Oct4.bed", "H1_Pol2.bed") 221 | 222 | #Use peak as input_type 223 | narrowPeaks = read_coldata(narrowPeaks, build = "hg19", input_type = "peak") 224 | 225 | oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example 226 | 227 | narrowPeaks_track = track_extract(colData = narrowPeaks, loci = oct4_loci) 228 | 229 | #Rest plotting is same 230 | track_plot(summary_list = narrowPeaks_track, 231 | show_ideogram = TRUE, 232 | peaks = oct4_nanog_peaks, 233 | ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM")) 234 | 235 | ``` 236 | 237 | ![image](https://github.com/PoisonAlien/trackplot/assets/8164062/fa3999fd-ab7f-4617-a43e-d3cac7f3a3b3) 238 | 239 | 240 | ## profileplots 241 | 242 | `profile_extract()` -> `profile_summarize()` -> `profile_plot()` are functions to generate density based profile-plots from bigWig files. 243 | 244 | 245 | * Below example for summarizing approx. 3,671 peaks for 3 bigWig files takes ca. 3 seconds on my 5 year old [macbook Pro](https://support.apple.com/kb/sp715?locale=en_GB). This includes generating signal matrix, summarizing, and plotting. Equivalent deeptools commands takes 20 seconds. 246 | * Optionally, it can also query UCSC genome browser for refseq transcripts of desired assembly and summarize around TSS regions 247 | * Replicates can be collapsed into single value per condition 248 | 249 | Example data from [GSE99183](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99183) where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer. 250 | 251 | ```r 252 | bws = c("GSM2634756_U87_BRD4.bw", "GSM2634757_U87_BRD4_dBET_24h.bw", "GSM2634758_U87_BRD4_dBET_2h.bw") 253 | bws = read_coldata(bws = bws, 254 | sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), 255 | build = "hg19") 256 | ``` 257 | 258 | ### Refseq transcripts 259 | 260 | ```r 261 | #Extract signals from bigWig files around refseq transcripts 262 | pe_refseq = profile_extract(colData = bws, ucsc_assembly = TRUE, 263 | startFrom = 'start', up = 1500, down = 1500) 264 | 265 | #Estimate mean signal 266 | ps_refseq = profile_summarize(sig_list = pe_refseq) 267 | 268 | #Plot 269 | profile_plot(ps_refseq) 270 | ``` 271 | 272 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/bd26cdac-6f87-44ed-b41c-685454c6d28c) 273 | 274 | 275 | ### Custom BED regions 276 | 277 | ```r 278 | #BRD4 binding sites 279 | bed = "GSM2634756_U87_BRD4_peaks.narrowPeak.gz" 280 | 281 | #Center and extend 1500 both ways from the peak center 282 | pe_bed = profile_extract(colData = bws, bed = bed, startFrom = "center", 283 | up = 1500, down = 1500, nthreads = 4) 284 | 285 | #Estimate mean signal 286 | ps_bed = profile_summarize(sig_list = pe_bed) 287 | 288 | #Plot 289 | profile_plot(ps_bed) 290 | ``` 291 | 292 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/94c1d728-e38c-418a-8469-ba50c42dc295) 293 | 294 | 295 | ## heatmap 296 | 297 | Output from `profile_extract` can be used to draw a heatmap with `profile_heatmap` 298 | 299 | ```r 300 | profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8) 301 | ``` 302 | 303 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/a82eedc8-a3f3-4439-a005-13242fce7929) 304 | 305 | 306 | 307 | ***PSA*** If you find the tool useful, consider starring this repository or up voting this [Biostars thread](https://www.biostars.org/p/475853/) so that more poeple can find it :) 308 | 309 | ### Caveat 310 | 311 | * Windows OS is not supported 312 | 313 | 314 | ### Acknowledgements 315 | 316 | [Joschka Hey](https://github.com/HeyLifeHD) for all the cool suggestions :) 317 | -------------------------------------------------------------------------------- /inst/CITATION: -------------------------------------------------------------------------------- 1 | bibentry( 2 | bibtype = "Article", 3 | title = "Trackplot: a fast and lightweight R script for epigenomic enrichment plots", 4 | author = personList(as.person("Anand Mayakonda"), 5 | as.person("Frank Westermann")), 6 | journal = "Bioinformatics Advances", 7 | year = "2024", 8 | volume = "4", 9 | doi = "doi.org/10.1093/bioadv/vbae031" 10 | ) 11 | -------------------------------------------------------------------------------- /inst/extdata/bw/H1_Ctcf.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Ctcf.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_H2az.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_H2az.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_Nanog.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Nanog.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_Oct4.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Oct4.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_k27ac.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k27ac.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_k4me1.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k4me1.bw -------------------------------------------------------------------------------- /inst/extdata/bw/H1_k4me3.bw: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k4me3.bw -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_Ctcf.bed: -------------------------------------------------------------------------------- 1 | chr6 31126946 31127607 . 562 . 13.920298 15.7 -1 2 | chr6 31130121 31130512 . 589 . 15.050033 15.7 -1 3 | chr6 31139989 31141379 . 302 . 3.232853 3.5 -1 4 | chr6 31142509 31143777 . 303 . 3.290765 3.0 -1 5 | chr6 31142721 31142850 . 526 . 12.440937 4.2 -1 6 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_H2az.bed: -------------------------------------------------------------------------------- 1 | chr6 31127041 31127157 . 1000 . 6.568079 3.1 -1 2 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_Nanog.bed: -------------------------------------------------------------------------------- 1 | chr6 31139583 31140045 peak3497 584 . 1856.580 -1 -1 2 | chr6 31140724 31141062 peak3498 90 . 285.880 -1 -1 3 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_Oct4.bed: -------------------------------------------------------------------------------- 1 | chr6 31139576 31140109 peak2363 41 . 128.810 -1 -1 2 | chr6 31140574 31141112 peak2364 93 . 288.400 -1 -1 3 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_Pol2.bed: -------------------------------------------------------------------------------- 1 | chr6 31125655 31128143 peak25632 330 . 1066.980 -1 -1 2 | chr6 31128357 31133697 peak25633 958 . 3100.000 -1 -1 3 | chr6 31136367 31139334 peak25634 318 . 1029.150 -1 -1 4 | chr6 31139365 31141730 peak25635 1000 . 3233.060 -1 -1 5 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_chromHMM.bed: -------------------------------------------------------------------------------- 1 | chr6 31125621 31126021 1 2 | chr6 31126021 31127821 2 3 | chr6 31127821 31128221 6 4 | chr6 31128221 31129421 11 5 | chr6 31129421 31129621 7 6 | chr6 31129621 31129821 6 7 | chr6 31129821 31131621 9 8 | chr6 31131621 31132621 10 9 | chr6 31132621 31133421 9 10 | chr6 31133421 31133821 5 11 | chr6 31133821 31134421 7 12 | chr6 31134421 31135221 6 13 | chr6 31135221 31137221 2 14 | chr6 31137221 31138421 1 15 | chr6 31138421 31138821 2 16 | chr6 31138821 31139221 4 17 | chr6 31139221 31139821 5 18 | chr6 31139821 31141821 4 19 | chr6 31141821 31142221 5 20 | chr6 31142221 31144021 7 21 | chr6 31144021 31145221 11 22 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_k27ac.bed: -------------------------------------------------------------------------------- 1 | chr6 31120776 31147840 . 454 . 3.950245 13.4 -1 2 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_k4me1.bed: -------------------------------------------------------------------------------- 1 | chr6 31121821 31150316 . 460 . 4.127887 12.8 -1 2 | chr6 31126483 31128222 . 696 . 7.494269 14.7 -1 3 | chr6 31129449 31131098 . 621 . 6.429799 14.5 -1 4 | chr6 31132550 31135066 . 522 . 5.004274 14.4 -1 5 | chr6 31132739 31132931 . 936 . 10.929473 6.1 -1 6 | chr6 31133481 31133603 . 1000 . 13.579328 4.7 -1 7 | chr6 31133793 31133992 . 948 . 11.100020 6.9 -1 8 | chr6 31136002 31144622 . 632 . 6.585711 13.5 -1 9 | -------------------------------------------------------------------------------- /inst/extdata/narrowpeak/H1_k4me3.bed: -------------------------------------------------------------------------------- 1 | chr6 31124287 31128550 . 538 . 12.148244 14.6 -1 2 | chr6 31134594 31142336 . 660 . 16.792120 14.4 -1 3 | -------------------------------------------------------------------------------- /man/diffpeak.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{diffpeak} 4 | \alias{diffpeak} 5 | \title{Differential Peak Analysis} 6 | \usage{ 7 | diffpeak( 8 | summary_list = NULL, 9 | condition = NULL, 10 | log2 = TRUE, 11 | num = NULL, 12 | den = NULL 13 | ) 14 | } 15 | \arguments{ 16 | \item{summary_list}{Output from \code{\link{extract_summary}}} 17 | 18 | \item{condition}{a column name in \code{coldata} containing sample conditions. Default NULL.} 19 | 20 | \item{log2}{log2 convert data prior to testing. Default TRUE} 21 | 22 | \item{num}{Numerator condition. Default NULL} 23 | 24 | \item{den}{Denominator condition. Default NULL} 25 | } 26 | \description{ 27 | Differential Peak Analysis 28 | } 29 | \details{ 30 | Takes output from \code{extract_summary} and performs differential peak analysis with Limma 31 | } 32 | -------------------------------------------------------------------------------- /man/extract_summary.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{extract_summary} 4 | \alias{extract_summary} 5 | \title{Extract area under the curve for every peak from from given bigWig files.} 6 | \usage{ 7 | extract_summary( 8 | colData, 9 | bed = NULL, 10 | ucsc_assembly = TRUE, 11 | startFrom = "start", 12 | binSize = 50, 13 | up = 2500, 14 | down = 2500, 15 | pc_genes = TRUE, 16 | nthreads = 4 17 | ) 18 | } 19 | \arguments{ 20 | \item{colData}{bigWig files. Default NULL. Required.} 21 | 22 | \item{bed}{bed file or a data.frame with first 3 column containing chromosome, star, end positions.} 23 | 24 | \item{ucsc_assembly}{If \code{bed} file not provided, setting \code{ucsc_assembly} to TRUE will fetch transcripts from UCSC genome browser.} 25 | 26 | \item{startFrom}{Default "start". For bed files this can be "start", "center" or "end". For \code{ucsc_assembly} this can only be "start" or "end"} 27 | 28 | \item{binSize}{bin size to extract signal. Default 50 (bps). Should be >1} 29 | 30 | \item{up}{extend upstream by this many bps from \code{startFrom}. Default 2500} 31 | 32 | \item{down}{extend downstream by this many bps from \code{startFrom}. Default 2500} 33 | 34 | \item{pc_genes}{Use only protein coding genes when using \code{ucsc_assembly}. Default TRUE} 35 | 36 | \item{nthreads}{Default 4} 37 | } 38 | \description{ 39 | Extract area under the curve for every peak from from given bigWig files. 40 | } 41 | -------------------------------------------------------------------------------- /man/pca_plot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{pca_plot} 4 | \alias{pca_plot} 5 | \title{Draw a PCA plot} 6 | \usage{ 7 | pca_plot( 8 | summary_list = NULL, 9 | top = 1000, 10 | log2 = FALSE, 11 | xpc = "PC1", 12 | ypc = "PC2", 13 | color_by = NULL, 14 | pch_by = NULL, 15 | color = NULL, 16 | show_cree = TRUE, 17 | lab_size = 1, 18 | size = 1, 19 | legend_size = 1, 20 | legendpos = "topright", 21 | legendpos2 = "bottomright" 22 | ) 23 | } 24 | \arguments{ 25 | \item{summary_list}{output from extract_summary} 26 | 27 | \item{top}{Top most variable peaks to consider for PCA. Default 1000} 28 | 29 | \item{log2}{log transform data? Default FALSE. IF TRUE, adds a small positive value and log2 converts.} 30 | 31 | \item{xpc}{Default PC1} 32 | 33 | \item{ypc}{Default PC2} 34 | 35 | \item{color_by}{a column name in \code{coldata} to color by. Default NULL.} 36 | 37 | \item{pch_by}{a column name in \code{coldata} to pch by. Default NULL.} 38 | 39 | \item{color}{Manual colors for each level in \code{color_by} Default NULL.} 40 | 41 | \item{show_cree}{If TRUE draws a cree plot. Default TRUE} 42 | 43 | \item{lab_size}{Font size for labels. Default 1} 44 | 45 | \item{size}{Point size. Default 1} 46 | 47 | \item{legend_size}{Default 1} 48 | 49 | \item{legendpos}{Default topright} 50 | 51 | \item{legendpos2}{Default bottomright} 52 | } 53 | \description{ 54 | Draw a PCA plot 55 | } 56 | -------------------------------------------------------------------------------- /man/profile_extract.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{profile_extract} 4 | \alias{profile_extract} 5 | \title{Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts} 6 | \usage{ 7 | profile_extract( 8 | colData = NULL, 9 | bed = NULL, 10 | ucsc_assembly = TRUE, 11 | startFrom = "start", 12 | binSize = 50, 13 | up = 2500, 14 | down = 2500, 15 | pc_genes = TRUE, 16 | nthreads = 4 17 | ) 18 | } 19 | \arguments{ 20 | \item{colData}{from \code{read_coldata}} 21 | 22 | \item{bed}{bed file or a data.frame with first 3 column containing chromosome, star, end positions.} 23 | 24 | \item{ucsc_assembly}{If \code{bed} file not provided, setting \code{ucsc_assembly} to TRUE will fetch transcripts from UCSC genome browser.} 25 | 26 | \item{startFrom}{Default "start". For bed files this can be "start", "center" or "end". For \code{ucsc_assembly} this can only be "start" or "end"} 27 | 28 | \item{binSize}{bin size to extract signal. Default 50 (bps). Should be >1} 29 | 30 | \item{up}{extend upstream by this many bps from \code{startFrom}. Default 2500} 31 | 32 | \item{down}{extend downstream by this many bps from \code{startFrom}. Default 2500} 33 | 34 | \item{pc_genes}{Use only protein coding genes when \code{ucsc_assembly} is used. Default TRUE} 35 | 36 | \item{nthreads}{Default 4} 37 | } 38 | \description{ 39 | Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts 40 | } 41 | \seealso{ 42 | \code{\link{profile_summarize}} \code{\link{profile_plot}} \code{\link{profile_heatmap}} 43 | } 44 | -------------------------------------------------------------------------------- /man/profile_heatmap.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{profile_heatmap} 4 | \alias{profile_heatmap} 5 | \title{Draw a heatmap} 6 | \usage{ 7 | profile_heatmap( 8 | mat_list, 9 | sortBy = "mean", 10 | col_pal = "Blues", 11 | revpal = FALSE, 12 | sample_names = NULL, 13 | title_size = 1, 14 | top_profile = FALSE, 15 | top_profile_h = 2, 16 | zmins = NULL, 17 | zmaxs = NULL, 18 | scale = FALSE, 19 | file_name = NULL, 20 | hm_width = 1024, 21 | hm_height = 600, 22 | mat_order = NULL 23 | ) 24 | } 25 | \arguments{ 26 | \item{mat_list}{Input matrix list generated by \code{\link{profile_extract}}} 27 | 28 | \item{sortBy}{Sort matrix by.., Can be mean, median. Default mean.} 29 | 30 | \item{col_pal}{Color palette to use. Default Blues. Use hcl.pals(type = "sequential") to see available palettes} 31 | 32 | \item{revpal}{Reverse color palette? Default FALSE.} 33 | 34 | \item{sample_names}{Manually specify sample names.} 35 | 36 | \item{title_size}{size of title. Default 0.8} 37 | 38 | \item{top_profile}{Boolean. Whether to draw top profile plot.} 39 | 40 | \item{top_profile_h}{Default 2.} 41 | 42 | \item{zmins}{Manually specify min scores to include in heatmap} 43 | 44 | \item{zmaxs}{Manually specify max scores to include in heatmap} 45 | 46 | \item{scale}{Whether to row scale the matrix. Default FALSE} 47 | 48 | \item{file_name}{Default NULL. If provided saves plot as a png.} 49 | 50 | \item{hm_width}{Width of the plot. Default 1024} 51 | 52 | \item{hm_height}{Height of the plot. Default 600} 53 | 54 | \item{mat_order}{Default NULL. Sample order in which the heatmaps are drawn.} 55 | } 56 | \description{ 57 | Draw a heatmap 58 | } 59 | \details{ 60 | This function takes output from extract_matrix and draws a heatmap 61 | } 62 | -------------------------------------------------------------------------------- /man/profile_plot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{profile_plot} 4 | \alias{profile_plot} 5 | \title{Draw a profile plot} 6 | \usage{ 7 | profile_plot( 8 | sig_list = NULL, 9 | color = NULL, 10 | line_size = 1, 11 | legend_fs = 1, 12 | axis_fs = 1, 13 | xlab = NA, 14 | ylab = NA 15 | ) 16 | } 17 | \arguments{ 18 | \item{sig_list}{Output generated from profile_summarize} 19 | 20 | \item{color}{Manual colors for each bigWig. Default NULL.} 21 | 22 | \item{line_size}{Default 1} 23 | 24 | \item{legend_fs}{Legend font size. Default 1} 25 | 26 | \item{axis_fs}{Axis font size. Default 1} 27 | 28 | \item{xlab}{x axis label. Default NA} 29 | 30 | \item{ylab}{y axis label. Default NA} 31 | } 32 | \description{ 33 | Draw a profile plot 34 | } 35 | -------------------------------------------------------------------------------- /man/profile_summarize.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{profile_summarize} 4 | \alias{profile_summarize} 5 | \title{Summarize data for profile plots} 6 | \usage{ 7 | profile_summarize(sig_list = NULL, stat = "mean", condition = NULL) 8 | } 9 | \arguments{ 10 | \item{sig_list}{Output generated from \code{profile_extract}} 11 | 12 | \item{stat}{Default \code{mean}. Can be \code{mean}, \code{median}} 13 | 14 | \item{condition}{column name with conditions in \code{colData}. If provided summarizes signals from samples belonging to same group or condition} 15 | } 16 | \description{ 17 | Summarize data for profile plots 18 | } 19 | \seealso{ 20 | \code{\link{profile_extract}} \code{\link{profile_plot}} \code{\link{profile_heatmap}} 21 | } 22 | -------------------------------------------------------------------------------- /man/read_coldata.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{read_coldata} 4 | \alias{read_coldata} 5 | \title{Prepares meta data table from bigWig files. 6 | Output from this function is passed to all downstream functions.} 7 | \usage{ 8 | read_coldata( 9 | bws = NULL, 10 | sample_names = NULL, 11 | build = "hg38", 12 | input_type = "bw" 13 | ) 14 | } 15 | \arguments{ 16 | \item{bws}{path to bigWig files} 17 | 18 | \item{sample_names}{sample names for each input files. Optional. Default NULL - creates one from file names.} 19 | 20 | \item{build}{Reference genome build. Default hg38} 21 | 22 | \item{input_type}{Default \code{bw}. Can be \code{bw} or \code{peak}} 23 | } 24 | \description{ 25 | Prepares meta data table from bigWig files. 26 | Output from this function is passed to all downstream functions. 27 | } 28 | \examples{ 29 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 30 | cd = read_coldata(bws = bigWigs, build = "hg19") 31 | beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\\\.bed$", full.names = TRUE) 32 | cd_bed = read_coldata(bws = beds, input_type = "peak", build = "hg19") 33 | } 34 | -------------------------------------------------------------------------------- /man/summarize_homer_annots.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{summarize_homer_annots} 4 | \alias{summarize_homer_annots} 5 | \title{Parse peak annotations generated by homer annotatePeaks.pl} 6 | \usage{ 7 | summarize_homer_annots( 8 | anno, 9 | sample_names = NULL, 10 | legend_font_size = 1, 11 | label_size = 0.8 12 | ) 13 | } 14 | \arguments{ 15 | \item{anno}{Raw annotations generated by homer \code{annotatePeaks.pl}. Can be more than one file.} 16 | 17 | \item{sample_names}{Sample names correspoding to each input file. Default parses from input file.} 18 | 19 | \item{legend_font_size}{font size for legend. Default 1.} 20 | 21 | \item{label_size}{font size for labels. Default 0.8.} 22 | } 23 | \description{ 24 | Parse peak annotations generated by homer annotatePeaks.pl 25 | } 26 | \details{ 27 | summarizes peak annotations generated with homer annotatePeaks.pl, generates a pie chart of peak distributions. 28 | } 29 | -------------------------------------------------------------------------------- /man/track_extract.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{track_extract} 4 | \alias{track_extract} 5 | \title{Extract bigWig track data for the given loci} 6 | \usage{ 7 | track_extract( 8 | colData = NULL, 9 | loci = NULL, 10 | gene = NULL, 11 | binsize = 10, 12 | nthreads = 1, 13 | query_ucsc = TRUE, 14 | gtf = NULL, 15 | build = "hg38", 16 | padding = 0, 17 | ideoTblName = "cytoBand" 18 | ) 19 | } 20 | \arguments{ 21 | \item{colData}{coldata from \code{read_coldata}} 22 | 23 | \item{loci}{target region to plot. Should be of format "chr:start-end". e.g; chr3:187715903-187752003 OR chr3:187,715,903-187,752,003} 24 | 25 | \item{gene}{gene name. This is mutually exclusive with \code{loci}} 26 | 27 | \item{binsize}{bin size to extract signal. Default 10 (bps).} 28 | 29 | \item{nthreads}{Default 1. Number of threads to use.} 30 | 31 | \item{query_ucsc}{Default TRUE. Queries UCSC and extracts gene models and cytoband for the loci. Requires \code{mysql} installation.} 32 | 33 | \item{gtf}{Use gtf file or data.frame as source for gene model. Default NULL.} 34 | 35 | \item{build}{Reference genome build. Default hg38} 36 | 37 | \item{padding}{Extend locus on both sides by this many bps.} 38 | 39 | \item{ideoTblName}{Table name for ideogram. Default \code{cytoBand}} 40 | } 41 | \description{ 42 | Extract bigWig track data for the given loci 43 | } 44 | \examples{ 45 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 46 | cd = read_coldata(bws = bigWigs, build = "hg19") 47 | oct4_loci = "chr6:31125776-31144789" 48 | t = track_extract(colData = cd, loci = oct4_loci, build = "hg19") 49 | } 50 | -------------------------------------------------------------------------------- /man/track_plot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{track_plot} 4 | \alias{track_plot} 5 | \title{Generate IGV style locus tracks with ease} 6 | \usage{ 7 | track_plot( 8 | summary_list = NULL, 9 | draw_gene_track = TRUE, 10 | show_ideogram = TRUE, 11 | col = "gray70", 12 | groupAutoScale = FALSE, 13 | y_max = NULL, 14 | y_min = NULL, 15 | txname = NULL, 16 | genename = NULL, 17 | show_axis = FALSE, 18 | gene_fsize = 1, 19 | track_names = NULL, 20 | track_names_pos = 0, 21 | track_names_to_left = FALSE, 22 | track_overlay = FALSE, 23 | regions = NULL, 24 | collapse_txs = TRUE, 25 | boxcol = "#ffc41a", 26 | boxcolalpha = 0.4, 27 | chromHMM = NULL, 28 | chromHMM_cols = NULL, 29 | chromHMM_names = NULL, 30 | ucscChromHMM = NULL, 31 | peaks = NULL, 32 | bw_track_height = 3, 33 | peaks_track_height = 2, 34 | gene_track_height = 2, 35 | scale_track_height = 2, 36 | chromHMM_track_height = 1, 37 | cytoband_track_height = 2, 38 | peaks_track_names = NULL, 39 | left_mar = NULL, 40 | bw_ord = NULL, 41 | layout_ord = c("p", "b", "h", "g", "c") 42 | ) 43 | } 44 | \arguments{ 45 | \item{summary_list}{Output from track_extract} 46 | 47 | \item{draw_gene_track}{Default FALSE. If TRUE plots gene models overlapping with the queried region} 48 | 49 | \item{show_ideogram}{Default TRUE. If TRUE plots ideogram of the target chromosome with query loci highlighted. Works only when \code{query_ucsc} is TRUE.} 50 | 51 | \item{col}{Color for tracks. Default \verb{#2f3640}. Multiple colors can be provided for each track} 52 | 53 | \item{groupAutoScale}{Default TRUE} 54 | 55 | \item{y_max}{custom y axis upper limits for each track. Recycled if required.} 56 | 57 | \item{y_min}{custom y axis lower limits for each track. Recycled if required.} 58 | 59 | \item{txname}{transcript name to draw. Default NULL. Plots all transcripts overlapping with the queried region} 60 | 61 | \item{genename}{gene name to draw. Default NULL. Plots all genes overlapping with the queried region} 62 | 63 | \item{show_axis}{Default FALSE} 64 | 65 | \item{gene_fsize}{Font size. Default 1} 66 | 67 | \item{track_names}{Default NULL} 68 | 69 | \item{track_names_pos}{Default 0 (corresponds to left corner)} 70 | 71 | \item{track_names_to_left}{If TRUE, track names are shown to the left of the margin. Default FALSE, plots on top as a title} 72 | 73 | \item{track_overlay}{Draws all bigWigs in a single track as a line plot} 74 | 75 | \item{regions}{genomic regions to highlight. A data.frame with at-least three columns containing chr, start and end positions.} 76 | 77 | \item{collapse_txs}{Default FALSE. Whether to collapse all transcripts belonging to same gene into a unified gene model} 78 | 79 | \item{boxcol}{color for highlighted region. Default "#192A561A"} 80 | 81 | \item{boxcolalpha}{Default 0.5} 82 | 83 | \item{chromHMM}{chromHMM data. Can be path to bed files or a list data.frames with first three columns containing chr,start,end and a 4th column containing integer coded state} 84 | 85 | \item{chromHMM_cols}{A named vector for each state (in the 4th column of chromHMM file). Default NULL} 86 | 87 | \item{chromHMM_names}{name for the chromHMM track} 88 | 89 | \item{ucscChromHMM}{Name of the chromHMM table. Use .get_ucsc_hmm_tbls() to see the details.} 90 | 91 | \item{peaks}{bed file to be highlighted. Can be path to bed files or a list data.frames with first three columns containing chr,start,end.} 92 | 93 | \item{bw_track_height}{Default 3} 94 | 95 | \item{peaks_track_height}{Default 2.} 96 | 97 | \item{gene_track_height}{Default 2} 98 | 99 | \item{scale_track_height}{Default 1} 100 | 101 | \item{chromHMM_track_height}{Default 1} 102 | 103 | \item{cytoband_track_height}{Default 1} 104 | 105 | \item{peaks_track_names}{Provide a name for each loci bed file. Default NULL} 106 | 107 | \item{left_mar}{Space to the left. Default 4} 108 | 109 | \item{bw_ord}{Names of the tracks to be drawn in the provided order. Default NULL.} 110 | 111 | \item{layout_ord}{Plot layout order. Deafult c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, cytoband track.} 112 | } 113 | \description{ 114 | Generate IGV style locus tracks with ease 115 | } 116 | \examples{ 117 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 118 | cd = read_coldata(bws = bigWigs, build = "hg19") 119 | oct4_loci = "chr6:31125776-31144789" 120 | t = track_extract(colData = cd, loci = oct4_loci, build = "hg19") 121 | trackplot::track_plot(summary_list = t) 122 | } 123 | -------------------------------------------------------------------------------- /man/track_summarize.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{track_summarize} 4 | \alias{track_summarize} 5 | \title{Summarize tracks per condition} 6 | \usage{ 7 | track_summarize(summary_list = NULL, condition = NULL, stat = "mean") 8 | } 9 | \arguments{ 10 | \item{summary_list}{Output from track_extract. Required.} 11 | 12 | \item{condition}{a column name in \code{coldata} containing sample conditions. Default NULL.} 13 | 14 | \item{stat}{can be \verb{mean, median}, \code{max}, \code{min}. NAs are excluded.} 15 | } 16 | \description{ 17 | Summarize tracks per condition 18 | } 19 | -------------------------------------------------------------------------------- /man/volcano_plot.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/trackplot.R 3 | \name{volcano_plot} 4 | \alias{volcano_plot} 5 | \title{Volcano plot for limma results from differential peak Analysis} 6 | \usage{ 7 | volcano_plot( 8 | res = NULL, 9 | fdr = 0.1, 10 | upcol = "#d35400", 11 | downcol = "#1abc9c", 12 | alpha = 0.6, 13 | size = 0.8 14 | ) 15 | } 16 | \arguments{ 17 | \item{res}{Output from \code{\link{diffpeak}}} 18 | 19 | \item{fdr}{FDR threshold. Default 0.1} 20 | 21 | \item{upcol}{color for up-regulated. Default "#d35400"} 22 | 23 | \item{downcol}{color for up-regulated. Default "#1abc9c"} 24 | 25 | \item{alpha}{Default 0.6} 26 | 27 | \item{size}{Point size. Default 0.8} 28 | } 29 | \description{ 30 | Volcano plot for limma results from differential peak Analysis 31 | } 32 | \details{ 33 | Takes output from \code{diffpeak} and draws volcano plot 34 | } 35 | -------------------------------------------------------------------------------- /trackplot.Rproj: -------------------------------------------------------------------------------- 1 | Version: 1.0 2 | 3 | RestoreWorkspace: Default 4 | SaveWorkspace: Default 5 | AlwaysSaveHistory: Default 6 | 7 | EnableCodeIndexing: Yes 8 | UseSpacesForTab: Yes 9 | NumSpacesForTab: 2 10 | Encoding: UTF-8 11 | 12 | RnwWeave: Sweave 13 | LaTeX: pdfLaTeX 14 | 15 | BuildType: Package 16 | PackageUseDevtools: Yes 17 | PackageInstallArgs: --no-multiarch --with-keep.source 18 | -------------------------------------------------------------------------------- /vignettes/trackplot.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "trackplot tutorial" 3 | output: 4 | html_document: 5 | toc: yes 6 | toc_depth: 3 7 | toc_float: yes 8 | number_sections: yes 9 | self_contained: yes 10 | vignette: > 11 | %\VignetteIndexEntry{01: trackplot tutorial} 12 | %\VignetteEngine{knitr::rmarkdown} 13 | %\VignetteEncoding{UTF-8} 14 | --- 15 | 16 | ```{r, include = FALSE} 17 | knitr::opts_chunk$set( 18 | collapse = TRUE, 19 | comment = "#>" 20 | ) 21 | ``` 22 | 23 | # Introduction 24 | 25 | `trackplot` is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots, and heatmaps from bigWig files. 26 | Libraries in R such as `Gviz` and `karyoploteR` facilitates such visualization but they are slow, complicated, and come with a large number of dependencies. 27 | 28 | `trackplot` overcomes the [dependency heaviness](https://en.wikipedia.org/wiki/Dependency_hell) by taking advantage of existing `bwtool` for faster bigWig processing, and by performing all the plotting in base R. 29 | 30 | Some of the features include: 31 | 32 | * It's significantly fast since most of the heavy lifting is done by [bwtool](https://github.com/CRG-Barcelona/bwtool). 33 | * Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible. 34 | * Supports GTF and standard UCSC gene formats as well. 35 | * Lightweight and minimal dependency 36 | - [data.table](https://cran.r-project.org/web/packages/data.table/index.html) and [bwtool](https://github.com/CRG-Barcelona/bwtool) are the only requirements. 37 | - Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages) 38 | * Customization: Each plot can be customized for color, scale, height, width, etc. 39 | * Tracks can be summarized per condition (by mean, median, max, min) 40 | 41 | ## Pre-requisites 42 | 43 | * [bwtool](https://github.com/CRG-Barcelona/bwtool) - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`). 44 | Or, you could also add the path where bwtool is located to R session with the below command. 45 | 46 | ```r 47 | #Example 48 | Sys.setenv(PATH = paste("/Users/anand/Documents/bwtool_dir/", Sys.getenv("PATH"), sep=":")) 49 | ``` 50 | 51 | * If you have trouble compiling the tool, follow [these](https://gist.github.com/PoisonAlien/e19b482ac6146bfb03142a0de1c4fbc8) instructions. Alternatively, you can download the pre-built binary for [macOS](https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1) or [centOS](https://www.dropbox.com/s/77ek89jqfhcmouu/bwtool_centOS_x86_64.tar.gz?dl=1) 52 | 53 | 54 | # Usage 55 | 56 | Here ENCODE data from human embryonic stem cell line `H1` are used for demonstration. 57 | 58 | ```{r setup} 59 | library(trackplot) 60 | ``` 61 | 62 | All you need is a list of bigWig files to be analyzed. Make a column data of bigWig file names with `read_coldata()`. 63 | 64 | ```{r, importBW} 65 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 66 | 67 | #Generate a coldata table 68 | h1 = trackplot::read_coldata(bigWigs, build = "hg19") 69 | 70 | print(h1) 71 | 72 | #You can add any sort of metadata such as condition, treatment, etc to the above - similar to coldata in RNA-Seq analysis. 73 | ``` 74 | 75 | Above table will be the input for most of the downstream functions. 76 | 77 | 78 | 79 | # trackplot 80 | 81 | `trackplot` generates an IGV style tracks with signal intensities, gene models and other desired annotation. 82 | 83 | It is a two step process involving `track_extract()` -> `track_plot()` 84 | 85 | ## Extract signal 86 | 87 | Extract bigWig signal for the genomic regions of interest. This can be either 88 | 89 | * a genomic loci [e.g; `chr6:31115776-31154789`] 90 | * a gene name [e.g; `POU5F1`] 91 | 92 | ```{r} 93 | #Region to plot 94 | oct4_loci = "chr6:31125776-31144789" 95 | 96 | #Extract bigWig signal for the above loci 97 | t_loci = track_extract(colData = h1, loci = oct4_loci) 98 | 99 | #To extract bigWig signal for the gene of interest 100 | # t_gene = track_extract(colData = h1, gene = "POU5F1") 101 | ``` 102 | 103 | ## Plot 104 | 105 | ### Basic plot 106 | 107 | ```{r, plot1, fig.height=5, fig.width=7} 108 | track_plot(summary_list = t_loci) 109 | ``` 110 | 111 | ### Change colors for each track 112 | 113 | ```{r, plot_col, fig.height=5, fig.width=7} 114 | track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60") 115 | track_plot(summary_list = t_loci, col = track_cols) 116 | ``` 117 | 118 | ### Collapse all tracks into a single track 119 | 120 | ```{r} 121 | track_plot(summary_list = t_loci, track_overlay = T, col = track_cols, show_ideogram = FALSE, genename = c("POU5F1", "TCF19"), gene_track_height = 1) 122 | ``` 123 | 124 | 125 | ### Heighlight sites at the top 126 | 127 | Using BED files or data.frame in BED format to heightlight target regions of interest 128 | 129 | ```{r, plot_tfbs, fig.height=5.2, fig.width=7} 130 | tf_beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "Nanog|Oct4", full.names = TRUE) 131 | print(basename(tf_beds)) 132 | track_plot( 133 | summary_list = t_loci, 134 | col = track_cols, 135 | peaks = tf_beds, 136 | peaks_track_names = c("NANOG", "OCT4") 137 | ) 138 | ``` 139 | 140 | 141 | 142 | ### Show only specific genes 143 | 144 | Use `genename` argument to show only specific genes in the gene track 145 | 146 | ```{r, plot_geneneames, fig.height=5.2, fig.width=7} 147 | track_plot( 148 | summary_list = t_loci, 149 | col = track_cols, 150 | peaks = tf_beds, 151 | peaks_track_names = c("NANOG", "OCT4"), 152 | genename = c("POU5F1", "TCF19") 153 | ) 154 | ``` 155 | 156 | ### Include chromHMM data 157 | 158 | chromHMM data should be a BED file with the 4th column containing chromatin state. See here for an [example](https://github.com/PoisonAlien/trackplot/blob/master/inst/extdata/narrowpeak/H1_chromHMM.bed) file. 159 | 160 | ```{} 161 | chr6 31125621 31126021 1 162 | chr6 31126021 31127821 2 163 | chr6 31127821 31128221 6 164 | chr6 31128221 31129421 11 165 | ``` 166 | 167 | 168 | Note that the color code for each of the 15 states are as described [here](https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeBroadHmm&db=hg19). 169 | In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument `chromHMM_cols` 170 | 171 | ```{r, plot_chrHMM, fig.height=5.2, fig.width=7} 172 | #Example chromHMM data for H1 from UCSC 173 | h1_chrHMM = system.file("extdata", "narrowpeak", "H1_chromHMM.bed", package = "trackplot") 174 | 175 | track_plot( 176 | summary_list = t_loci, 177 | col = track_cols, 178 | peaks = tf_beds, 179 | peaks_track_names = c("NANOG", "OCT4"), 180 | genename = c("POU5F1", "TCF19"), 181 | chromHMM = h1_chrHMM, 182 | chromHMM_names = "H1" 183 | ) 184 | ``` 185 | 186 | 187 | ## Re-organize tracks 188 | 189 | By default tracks are organized from top to bottom as `c("p", "b", "h", "g", "c")` corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument `layout_ord`. Furthermore, bigWig tracks themselves can be ordered with the argument `bw_ord` which accepts the names of the bigWig tracks as input and plots them in the given order. 190 | 191 | ```{r} 192 | #Re-organize the layout in the order chromHMM track, gene track, cytoband track, bigWig tracks and peak track. 193 | track_plot( 194 | summary_list = t_loci, 195 | col = track_cols, 196 | peaks = tf_beds, 197 | peaks_track_names = c("NANOG", "OCT4"), 198 | genename = c("POU5F1", "TCF19"), 199 | chromHMM = h1_chrHMM, 200 | chromHMM_names = "H1", layout_ord = c("h", "g", "c", "b", "p") 201 | ) 202 | ``` 203 | 204 | ## Overlay tracks 205 | 206 | ```{r} 207 | #Re-organize the layout in the order chromHMM track, gene track, cytoband track, bigWig tracks and peak track. 208 | track_plot( 209 | summary_list = t_loci, 210 | col = track_cols, 211 | peaks = tf_beds, 212 | peaks_track_names = c("NANOG", "OCT4"), 213 | genename = c("POU5F1", "TCF19"), 214 | chromHMM = h1_chrHMM, 215 | chromHMM_names = "H1", layout_ord = c("h", "g", "c", "b", "p"), track_overlay = TRUE 216 | ) 217 | ``` 218 | 219 | ## Peak files as input 220 | 221 | All of the above plots can also be generated with [narrowPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) or [broadPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per [UCSC convention](https://genome.ucsc.edu/FAQ/FAQformat.html#format1) 222 | 223 | `narrowPeak` is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files. 224 | Process is similar, `read_coldata` -> `track_extract` -> `track_plot` 225 | 226 | ```{r plot_narrowPeak, fig.height=5.2, fig.width=7} 227 | narrowPeak_files = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\.bed$", full.names = TRUE) |> grep(pattern = "chromHMM.bed$", invert = TRUE, value = TRUE) 228 | print(basename(narrowPeak_files)) 229 | 230 | #Generate a coldata table 231 | h1_peaks = read_coldata(narrowPeak_files, build = "hg19", input_type = "peak") 232 | 233 | #Extract signal intensities from narrowPeak files 234 | oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example 235 | t_loci2 = track_extract(colData = h1_peaks, loci = oct4_loci) 236 | 237 | #Plot 238 | track_plot( 239 | summary_list = t_loci2, 240 | col = track_cols, 241 | peaks = tf_beds, 242 | peaks_track_names = c("NANOG", "OCT4"), 243 | genename = c("POU5F1", "TCF19"), 244 | chromHMM = h1_chrHMM, 245 | chromHMM_names = "H1" 246 | ) 247 | ``` 248 | 249 | # profileplot 250 | 251 | Example data from [GSE99183](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99183) where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer. 252 | 253 | ```{r, profile_plot, fig.height=5.2, fig.width=7} 254 | bws = list.files(path = "/Users/anandmayakonda/Documents/Documents_MacBookPro_work/github/data.nosync/GSE99183_U87_BRD4/", pattern = "bw", full.names = TRUE) 255 | bws = grep(pattern = "BRD3|BRD2|H3K27Ac|Pol2|H3K4|E2F1", x = bws, value = TRUE, invert = TRUE) 256 | print(basename(bws)) 257 | ``` 258 | 259 | ```{r, profile_plot_cd} 260 | bws = trackplot::read_coldata(bws = bws, sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), build = "hg19") 261 | 262 | #Add some metadata info 263 | bws$condition = c("dmso", "dBET", "dBET") 264 | 265 | print(bws) 266 | ``` 267 | 268 | `profile_extract()` -> `profile_summarize()` -> `profile_plot()` 269 | 270 | ### Around TSS 271 | 272 | Setting `ucsc_assembly = TRUE` will fetch refseq transcripts from UCSC browser and plots the signal distribution around the TSS or TES. 273 | 274 | ```{r, profilePlot_tss, fig.height=5, fig.width=5} 275 | #Extract signals from bigWig files around refseq transcripts 276 | pe_refseq = trackplot::profile_extract(colData = bws, ucsc_assembly = TRUE, startFrom = 'start', up = 1500, down = 1500) 277 | #Estimate mean signal 278 | ps_refseq = trackplot::profile_summarize(sig_list = pe_refseq) 279 | ``` 280 | 281 | ```{r, fig.height=5, fig.width=6} 282 | #Plot 283 | trackplot::profile_plot(ps_refseq, legend_fs = 0.7, xlab = "TSS", ylab = "RPM") 284 | ``` 285 | 286 | ### Around BED regions of intersert 287 | 288 | Providing a BED file or a data.frame in BED format.. 289 | 290 | ```{r, profilePlot_bed} 291 | #BRD4 binding sites 292 | bed = "/Users/anandmayakonda/Documents/Documents_MacBookPro_work/github/data.nosync/GSE99183_U87_BRD4/GSM2634756_U87_BRD4_peaks.narrowPeak" 293 | #Center and extend 1500 both ways from the peak center 294 | pe_bed = trackplot::profile_extract(colData = bws, bed = bed, startFrom = "center", up = 1500, down = 1500, nthreads = 4) 295 | #Estimate mean signal 296 | ps_bed = trackplot::profile_summarize(sig_list = pe_bed) 297 | ``` 298 | 299 | ```{r, fig.height=5, fig.width=6} 300 | #Plot 301 | trackplot::profile_plot(ps_bed, legend_fs = 0.7, xlab = "peak center", ylab = "RPM") 302 | ``` 303 | 304 | # heatmap 305 | 306 | Output from `profile_extract` can be used to draw a heatmap with `profile_heatmap` 307 | 308 | ```{r, heatmap, fig.height=7, fig.width=5} 309 | trackplot::profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8) 310 | ``` 311 | 312 | 313 | # Peak intensities 314 | 315 | Extract peak intensities. Similar to `profile_extract`, setting `ucsc_assembly = TRUE` will fetch refseq transcripts from UCSC browser and extracts peak intensities aroud the TSS or TES. 316 | 317 | ```{r, es} 318 | tss_summary = trackplot::extract_summary(colData = bws, ucsc_assembly = TRUE, up = 1500, down = 1500) #For TSS regions 319 | bed_summary = trackplot::extract_summary(colData = bws, bed = bed, up = 1500, down = 1500) #For BED regions 320 | ``` 321 | 322 | output includes peak intensities across all the sites 323 | 324 | ```{r, esdata} 325 | print(tss_summary$data) 326 | ``` 327 | 328 | Above signal intensities can be used to perform PCA analysis or do differential peak analysis. 329 | 330 | ## PCA 331 | 332 | ```{r, espca} 333 | trackplot::pca_plot(summary_list = tss_summary, log2 = TRUE, color_by = "condition", lab_size = 0.6) 334 | ``` 335 | 336 | 337 | ## Differential peak analysis 338 | 339 | Although not recommended, above peak intensities can be used to perform a differential peak analysis between two conditions using [limma](https://www.bioconductor.org/packages/release/bioc/html/limma.html) 340 | 341 | WARNING: Please make sure that the bigWigs contain normalized intensities such as RPM. Otherwise you should strongly consider using tools such as [DiffBind](https://bioconductor.org/packages/devel/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf) 342 | 343 | 344 | ```{r, eslimma} 345 | library(limma) 346 | dp_bed = trackplot::diffpeak(summary_list = bed_summary, condition = "condition", num = "dBET", den = "dmso") 347 | print(dp_bed) 348 | ``` 349 | 350 | Above output can be plotted with `volcanoplot`. 351 | 352 | ```{r, volcano} 353 | trackplot::volcano_plot(res = dp_bed) 354 | ``` 355 | 356 | Here, the treatment effect is quite strong and lack of replicates makes it as a not so nice example. 357 | 358 | 359 | ## Sessioninfo 360 | 361 | ```{r} 362 | sessionInfo() 363 | ``` 364 | 365 | --------------------------------------------------------------------------------