├── DESCRIPTION
├── INSTALL
├── NAMESPACE
├── R
    └── trackplot.R
├── README.md
├── inst
    ├── CITATION
    └── extdata
    │   ├── bw
    │       ├── H1_Ctcf.bw
    │       ├── H1_H2az.bw
    │       ├── H1_Nanog.bw
    │       ├── H1_Oct4.bw
    │       ├── H1_k27ac.bw
    │       ├── H1_k4me1.bw
    │       └── H1_k4me3.bw
    │   └── narrowpeak
    │       ├── H1_Ctcf.bed
    │       ├── H1_H2az.bed
    │       ├── H1_Nanog.bed
    │       ├── H1_Oct4.bed
    │       ├── H1_Pol2.bed
    │       ├── H1_chromHMM.bed
    │       ├── H1_k27ac.bed
    │       ├── H1_k4me1.bed
    │       └── H1_k4me3.bed
├── man
    ├── diffpeak.Rd
    ├── extract_summary.Rd
    ├── pca_plot.Rd
    ├── profile_extract.Rd
    ├── profile_heatmap.Rd
    ├── profile_plot.Rd
    ├── profile_summarize.Rd
    ├── read_coldata.Rd
    ├── summarize_homer_annots.Rd
    ├── track_extract.Rd
    ├── track_plot.Rd
    ├── track_summarize.Rd
    └── volcano_plot.Rd
├── trackplot.Rproj
└── vignettes
    └── trackplot.Rmd


/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: trackplot
 2 | Title: Generate IGV style track plots and profile plots from bigWig files
 3 | Version: 1.6.00
 4 | Authors@R: person("Anand", "Mayakonda", , "anandmt3@gmail.com", role = c("aut", "cre"))
 5 | Description: trackplot is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files.
 6 | License: MIT
 7 | Encoding: UTF-8
 8 | Roxygen: list(markdown = TRUE)
 9 | RoxygenNote: 7.3.1
10 | Depends: data.table
11 | Imports: methods
12 | Suggests: 
13 |     knitr,
14 |     rmarkdown,
15 |     limma
16 | VignetteBuilder: knitr
17 | SystemRequirements: bwtool, MySQL: mysql-server (deb) or mysql-community-server (rpm) or mysql (brew)
18 | 


--------------------------------------------------------------------------------
/INSTALL:
--------------------------------------------------------------------------------
 1 | trackplot requires bwtool - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`) or a directory under the PATH. 
 2 | 
 3 | 
 4 | * For macOS: Please download the pre-build binary from here: https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1 
 5 | Make it executable with `chmod +x bwtool`. macOS gatekeeper might complain that it can not run the binary downloaded from the internet. If so, allow (https://support.apple.com/en-us/HT202491) it in the security settings. 
 6 | 
 7 | * For centOS or debian: Follow the below compilation instructions
 8 | 
 9 | git clone 'https://github.com/CRG-Barcelona/bwtool'
10 | git clone 'https://github.com/CRG-Barcelona/libbeato'
11 | git clone https://github.com/madler/zlib
12 | 
13 | cd libbeato/
14 | git checkout 0c30432af9c7e1e09ba065ad3b2bc042baa54dc2
15 | ./configure
16 | make
17 | 
18 | cd ../zlib
19 | ./configure
20 | make
21 | 
22 | cd ../bwtool
23 | ./configure CFLAGS='-I../libbeato -I../zlib' LDFLAGS='-L../libbeato/jkweb -L../libbeato/beato -L../zlib'
24 | make
25 | 
26 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(diffpeak)
 4 | export(extract_summary)
 5 | export(pca_plot)
 6 | export(profile_extract)
 7 | export(profile_heatmap)
 8 | export(profile_plot)
 9 | export(profile_summarize)
10 | export(read_coldata)
11 | export(summarize_homer_annots)
12 | export(track_extract)
13 | export(track_plot)
14 | export(track_summarize)
15 | export(volcano_plot)
16 | import(data.table)
17 | 


--------------------------------------------------------------------------------
/R/trackplot.R:
--------------------------------------------------------------------------------
   1 | # This R script contains functions for bigWig visualization
   2 | #
   3 | # Source code: https://github.com/PoisonAlien/trackplot
   4 | #
   5 | # MIT License
   6 | # Copyright (c) 2020 Anand Mayakonda <anandmt3@gmail.com>
   7 | #
   8 | # Change log:
   9 | # Version: 1.6.00 [2024-10-28]
  10 | #   * Added argument `track_overlay` for a single track with line plot. #14
  11 | #   * Bug fix: color alpha differs between tracks. Issue: #34
  12 | # Version: 1.5.10 [2024-02-14]
  13 | #   * Added argument `layout_ord` and `bw_ord` to `track_plot()` re-order the overall tracks and bigWig tracks
  14 | #   * Added `xlab` and `ylab` arguments to `profile_plot()`
  15 | # Version: 1.5.01 [2023-10-17]
  16 | #   * Bug fix parsing loci while parsing GTF
  17 | #   * Small updates to profile_heatmap()
  18 | # Version: 1.5.00 [2023-08-24]
  19 | #   * Added `read_coldata` to import bigwig and bed files along with metadata. This streamlines all the downstream processes
  20 | #   * Added `profile_heatmap` for plotting heatmap
  21 | #   * Added `diffpeak` for minimal differential peak analysis based on peak intensities
  22 | #   * Added `volcano_plot` for diffpeak results visualization
  23 | #   * Added `summarize_homer_annots`
  24 | #   * Support for GTF files with `track_extract`
  25 | #   * `track_extract` now accepts gene name as input.
  26 | #   * More customization to `profile_extract` `profile_plot` and `plot_pca`
  27 | #   * Nicer output with `extract_summary` 
  28 | #   * Update mysql query for UCSC. Added `ideoTblName` argument for `track_extract`. Issue: #19
  29 | # Version: 1.4.00 [2023-07-27]
  30 | #   * Updated track_plot to include chromHMM tracks and top peaks tracks
  31 | #   * Support to draw narrowPeak or boradPeak files with track_plot
  32 | #   * Support to query ucsc for chromHMM tracks
  33 | #   * Additional arguments to track_plot to adjust heights of all the tracks and margins
  34 | #   * Improved track_extract - (extracts gene models and cytobands to avoid repetitive calling ucsc genome browser)
  35 | #   * Additional arguments to pca_plot for better plotting
  36 | #   * Added example datasets
  37 | # Version: 1.3.10 [2021-10-06]
  38 | #   * Support for negative values (Issue: https://github.com/PoisonAlien/trackplot/issues/6 )
  39 | #   * Added y_min argument to track_plot. 
  40 | #   * Change the default value for collapse_tx to TRUE
  41 | # Version: 1.3.05 [2021-06-07]
  42 | #   * Summarize and groupScaleByCondition tracks by condition. Issue: #4
  43 | #   * Allow the script to install as a package.
  44 | #   * Added y_max argument for custom y-axis limits in track_plot. 
  45 | # Version: 1.3.01 [2021-04-26]
  46 | #   * Fix gtf bug. Issue: #3
  47 | # Version: 1.3.0 [2021-03-26]
  48 | #   * modularize the code base to avoid repetitive data extraction and better plotting
  49 | # Version: 1.2.0 [2020-12-09]
  50 | #   * Added bwpcaplot()
  51 | # Version: 1.1.11 [2020-12-07]
  52 | #   * Bug fixes in profileplot(): Typo for .check_dt() and startFrom estimation
  53 | # Version: 1.1.1 [2020-12-04]
  54 | #   * trackplot() now plots ideogram of target chromosome
  55 | # Version: 1.1.0 [2020-12-01]
  56 | #   * Added profileplot()
  57 | # Version: 1.0.0 [2020-11-27]
  58 | #   * Initial release
  59 | 
  60 | #------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  61 | 
  62 | #'Prepares meta data table from bigWig files.
  63 | #'Output from this function is passed to all downstream functions.
  64 | #' @param bws path to bigWig files
  65 | #' @param sample_names sample names for each input files. Optional. Default NULL - creates one from file names.
  66 | #' @param build Reference genome build. Default hg38
  67 | #' @param input_type Default `bw`. Can be `bw` or `peak`
  68 | #' @examples
  69 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 
  70 | #' cd = read_coldata(bws = bigWigs, build = "hg19")
  71 | #' beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\.bed$", full.names = TRUE) 
  72 | #' cd_bed = read_coldata(bws = beds, input_type = "peak", build = "hg19")
  73 | #' @export
  74 | 
  75 | read_coldata = function(bws = NULL, sample_names = NULL, build = "hg38", input_type = "bw"){
  76 |   
  77 |   if(is.null(bws)){
  78 |     stop("Please provide paths to bigWig files")
  79 |   }
  80 |   
  81 |   input_type = match.arg(arg = input_type, choices = c("bw", "peak"))
  82 |   
  83 |   message("Checking for files..")
  84 |   bws = as.character(bws)
  85 |   lapply(bws, function(x){
  86 |     if(!file.exists(x)){
  87 |       stop(paste0(x, " does not exist!"))
  88 |     }
  89 |   })
  90 |   
  91 |   if(is.null(sample_names)){
  92 |     bw_sample_names = unlist(data.table::tstrsplit(x = basename(bws), split = "\\.", keep = 1))
  93 |   }else{
  94 |     bw_sample_names = as.character(sample_names)  
  95 |   }
  96 |   
  97 |   if(any(duplicated(bw_sample_names))){
  98 |     stop("Found duplicates. Samples names must be unique")
  99 |   }
 100 |   if(length(bw_sample_names) != length(bws)){
 101 |     stop("Please provide names for each input file")
 102 |   }
 103 |   
 104 |   coldata = data.table::data.table(bw_files = bws, bw_sample_names = bw_sample_names)
 105 |   
 106 |   attr(coldata, "refbuild") = build
 107 |   attr(coldata, "is_bw") =  input_type ==  "bw"
 108 |   message("Input type: ", input_type)
 109 |   message("Ref genome: ", build)
 110 |   message("OK!")
 111 |   
 112 |   coldata
 113 | }
 114 | 
 115 | #' Extract bigWig track data for the given loci
 116 | #' @param colData coldata from \code{read_coldata}
 117 | #' @param loci target region to plot. Should be of format "chr:start-end". e.g; chr3:187715903-187752003 OR chr3:187,715,903-187,752,003
 118 | #' @param gene gene name. This is mutually exclusive with \code{loci}
 119 | #' @param binsize bin size to extract signal. Default 10 (bps).
 120 | #' @param nthreads Default 1. Number of threads to use.
 121 | #' @param query_ucsc Default TRUE. Queries UCSC and extracts gene models and cytoband for the loci. Requires `mysql` installation.
 122 | #' @param gtf Use gtf file or data.frame as source for gene model. Default NULL.
 123 | #' @param build Reference genome build. Default hg38
 124 | #' @param padding Extend locus on both sides by this many bps. 
 125 | #' @param ideoTblName Table name for ideogram. Default `cytoBand`
 126 | #' @import data.table
 127 | #' @examples
 128 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 
 129 | #' cd = read_coldata(bws = bigWigs, build = "hg19")
 130 | #' oct4_loci = "chr6:31125776-31144789"
 131 | #' t = track_extract(colData = cd, loci = oct4_loci, build = "hg19")
 132 | #' @export
 133 | track_extract = function(colData = NULL, loci = NULL, gene = NULL, binsize = 10, nthreads = 1, query_ucsc = TRUE, gtf = NULL, build = "hg38", padding = 0, ideoTblName = "cytoBand"){
 134 |   
 135 |   if(is.null(colData)){
 136 |     stop("Missing colData. Use read_coldata() to generate one.")
 137 |   }
 138 |   
 139 |   if(all(is.null(loci), is.null(gene))){
 140 |     stop("Please provide a loci or a gene name!")
 141 |   }
 142 |   
 143 |   input_bw = attr(colData, "is_bw")
 144 |   build = attr(colData, "refbuild")
 145 | 
 146 |   .check_windows()
 147 |   if(input_bw){
 148 |     .check_bwtool()  
 149 |   }
 150 |   
 151 |   .check_dt()
 152 |   
 153 |   options(warn = -1)
 154 |   op_dir = tempdir() #For now
 155 |   
 156 |   if(!is.null(gene)){
 157 |     if(!is.null(gtf)){
 158 |       etbl = .parse_gtf(gtf = gtf, genename = gene)
 159 |       cyto = NA
 160 |       start = min(unlist(lapply(etbl,function(x) attr(x, "start"))))
 161 |       end = max(unlist(lapply(etbl,function(x) attr(x, "end"))))
 162 |       chr = unique(unlist(lapply(etbl,function(x) attr(x, "chr"))))
 163 |     }else if(query_ucsc){
 164 |       message("Querying UCSC genome browser for gene model and cytoband..")
 165 |       etbl = .extract_geneModel_ucsc_bySymbol(genesymbol = gene, refBuild = build)
 166 |       chr = unique(as.character(etbl$chr)); start = min(as.numeric(etbl$start)); end = max(as.numeric(etbl$end))
 167 |       if(length(chr) > 1){
 168 |         message("Multiple chromosomes found! Using the first one ", chr[1])
 169 |         etbl = etbl[chr %in% chr[1]]
 170 |         chr = unique(as.character(etbl$chr)); start = min(as.numeric(etbl$start)); end = max(as.numeric(etbl$end))
 171 |       }
 172 |       if(!is.null(etbl)){
 173 |         etbl = .make_exon_tbl(gene_models = etbl)  
 174 |       }
 175 |       cyto = .extract_cytoband(chr = chr, refBuild = build)
 176 |       loci = paste0(chr, ":", start, "-", end)
 177 |     }else{
 178 |       cyto = etbl = NA
 179 |     }
 180 |     if(is.null(etbl)){
 181 |       stop("No transcript models found for ", gene)
 182 |     }
 183 |   }else{
 184 |     message("Parsing loci..")
 185 |     loci_p = .parse_loci(loci = loci)
 186 |     chr = loci_p$chr; start = loci_p$start; end = loci_p$end
 187 |     if(start >= end){
 188 |       stop("End must be larger than Start!")
 189 |     }
 190 |     message("    Queried region: ", chr, ":", start, "-", end, " [", end-start, " bps]")
 191 |     #Extract gene models for this region
 192 |     if(!is.null(gtf)){
 193 |       etbl = .parse_gtf(gtf = gtf, chr = chr, start = start, end = end)
 194 |       cyto = NA
 195 |     }else if(query_ucsc){
 196 |       message("Querying UCSC genome browser for gene model and cytoband..")
 197 |       etbl = .extract_geneModel_ucsc(chr, start = start, end = end, refBuild = build, txname = NULL, genename = NULL)
 198 |       if(!is.null(etbl)){
 199 |         etbl = .make_exon_tbl(gene_models = etbl)  
 200 |       }
 201 |       cyto = .extract_cytoband(chr = chr, refBuild = build, tblName = ideoTblName)
 202 |     }else{
 203 |       cyto = etbl = NA
 204 |     }
 205 |   }
 206 |   
 207 |   start = start - as.numeric(padding)  
 208 |   end = end + as.numeric(padding)  
 209 |   loci = paste0(chr, ":", start, "-", end)
 210 |   
 211 |   input_files = colData$bw_files
 212 |   custom_names = colData$bw_sample_names
 213 |   
 214 |   if(input_bw){
 215 |     windows = .gen_windows(chr = chr, start = start, end = end, window_size = binsize, op_dir = op_dir)
 216 |     track_summary = .get_summaries(bedSimple = windows, bigWigs = input_files, op_dir = op_dir, nthreads = nthreads)  
 217 |   }else{
 218 |     track_summary = .get_summaries_narrowPeaks(bigWigs = input_files, nthreads = nthreads, chr, start, end)  
 219 |   }
 220 |   
 221 |   names(track_summary) = custom_names
 222 |   
 223 |   attr(track_summary, "meta") = list(etbl = etbl, cyto = cyto, loci = loci)
 224 |   message("OK!")
 225 |   
 226 |   list(data = track_summary, colData = colData)
 227 | }
 228 | 
 229 | #' Summarize tracks per condition
 230 | #' @param summary_list Output from track_extract. Required.
 231 | #' @param condition a column name in \code{coldata} containing sample conditions. Default NULL.
 232 | #' @param stat can be `mean, median`, `max`, `min`. NAs are excluded. 
 233 | #' @export
 234 | track_summarize = function(summary_list = NULL, condition = NULL, stat = "mean"){
 235 |   
 236 |   if(is.null(summary_list)){
 237 |     stop("Missing input! Expecting output from track_extract()")
 238 |   }
 239 |   
 240 |   stat = match.arg(arg = stat, choices = c("mean", "median", "max", "min"))
 241 |   
 242 |   meta = attr(summary_list$data, "meta")
 243 |   loci = meta$loci
 244 |   etbl = meta$etbl
 245 |   cyto = meta$cyto
 246 |   
 247 |   coldata = summary_list$colData
 248 |   is_bw = attr(coldata, "is_bw")
 249 |   build = attr(coldata, "refbuild")
 250 |   
 251 |   if(is.null(condition)){
 252 |     stop("Please provide a column name containing sample condition!\nHere are available columns.\n", paste(colnames(coldata), collapse = " "))
 253 |   }
 254 |   
 255 |   summary_list = summary_list$data
 256 |   
 257 |   if(!is.null(condition)){
 258 |     if(!condition %in% colnames(coldata)){
 259 |       warning(paste0(condition, " does not exists in coldata. Here are available columns."))
 260 |       print(coldata)
 261 |       stop()
 262 |     }else{
 263 |       colnames(coldata)[which(colnames(coldata) == condition)] = "group_condition"
 264 |       coldata$group_condition = as.character(coldata$group_condition)
 265 |     }
 266 |     condition = as.character(coldata$group_condition)
 267 |   }
 268 |   
 269 |   names(summary_list) = condition
 270 |   
 271 |   summary_list = data.table::rbindlist(l = summary_list, use.names = TRUE, fill = TRUE, idcol = "sample_name")
 272 |   
 273 |   if(stat == "mean"){
 274 |     summary_list = summary_list[,mean(max, na.rm = TRUE), .(sample_name, chromosome, start, end)]  
 275 |   }else if (stat == "median"){
 276 |     summary_list = summary_list[,median(max, na.rm = TRUE), .(sample_name, chromosome, start, end)]
 277 |   }else if (stat == "max"){
 278 |     summary_list = summary_list[,max(max, na.rm = TRUE), .(sample_name, chromosome, start, end)]
 279 |   }else{
 280 |     summary_list = summary_list[,min(max, na.rm = TRUE), .(sample_name, chromosome, start, end)]
 281 |   }
 282 |   
 283 |   colnames(summary_list)[ncol(summary_list)] = "max" #this column name means nothing, just using it for the consistency
 284 |   summary_list = split(summary_list, summary_list$sample_name)
 285 |   attr(summary_list, "meta") = meta
 286 |   
 287 |   list(data = summary_list, colData = coldata)
 288 | }
 289 | 
 290 | #' Generate IGV style locus tracks with ease
 291 | #' @param summary_list Output from track_extract
 292 | #' @param draw_gene_track Default FALSE. If TRUE plots gene models overlapping with the queried region
 293 | #' @param show_ideogram Default TRUE. If TRUE plots ideogram of the target chromosome with query loci highlighted. Works only when `query_ucsc` is TRUE. 
 294 | #' @param txname transcript name to draw. Default NULL. Plots all transcripts overlapping with the queried region
 295 | #' @param genename gene name to draw. Default NULL. Plots all genes overlapping with the queried region
 296 | #' @param collapse_txs Default FALSE. Whether to collapse all transcripts belonging to same gene into a unified gene model
 297 | #' @param groupAutoScale Default TRUE
 298 | #' @param y_max custom y axis upper limits for each track. Recycled if required.
 299 | #' @param y_min custom y axis lower limits for each track. Recycled if required.
 300 | #' @param gene_fsize Font size. Default 1
 301 | #' @param col Color for tracks. Default `#2f3640`. Multiple colors can be provided for each track
 302 | #' @param show_axis Default FALSE
 303 | #' @param track_names Default NULL
 304 | #' @param track_names_pos Default 0 (corresponds to left corner)
 305 | #' @param track_names_to_left If TRUE, track names are shown to the left of the margin. Default FALSE, plots on top as a title
 306 | #' @param track_overlay Draws all bigWigs in a single track as a line plot 
 307 | #' @param regions genomic regions to highlight. A data.frame with at-least three columns containing chr, start and end positions.
 308 | #' @param boxcol color for highlighted region. Default "#192A561A"
 309 | #' @param boxcolalpha Default 0.5
 310 | #' @param ucscChromHMM Name of the chromHMM table. Use .get_ucsc_hmm_tbls() to see the details.
 311 | #' @param chromHMM chromHMM data. Can be path to bed files or a list data.frames with first three columns containing chr,start,end and a 4th column containing integer coded state
 312 | #' @param chromHMM_names name for the chromHMM track
 313 | #' @param chromHMM_cols A named vector for each state (in the 4th column of chromHMM file). Default NULL
 314 | #' @param peaks bed file to be highlighted. Can be path to bed files or a list data.frames with first three columns containing chr,start,end.
 315 | #' @param peaks_track_names Provide a name for each loci bed file. Default NULL
 316 | #' @param cytoband_track_height Default 1
 317 | #' @param chromHMM_track_height Default 1
 318 | #' @param gene_track_height Default 2 
 319 | #' @param scale_track_height Default 1
 320 | #' @param peaks_track_height Default 2.
 321 | #' @param bw_track_height Default 3
 322 | #' @param left_mar Space to the left. Default 4
 323 | #' @param bw_ord Names of the tracks to be drawn in the provided order. Default NULL.
 324 | #' @param layout_ord Plot layout order. Deafult c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, cytoband track.
 325 | #' @examples
 326 | #' bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 
 327 | #' cd = read_coldata(bws = bigWigs, build = "hg19")
 328 | #' oct4_loci = "chr6:31125776-31144789"
 329 | #' t = track_extract(colData = cd, loci = oct4_loci, build = "hg19")
 330 | #' trackplot::track_plot(summary_list = t)
 331 | #' @export
 332 | track_plot = function(summary_list = NULL,
 333 |                       draw_gene_track = TRUE,
 334 |                       show_ideogram = TRUE,
 335 |                       col = "gray70",
 336 |                       groupAutoScale = FALSE,
 337 |                       y_max = NULL,
 338 |                       y_min = NULL,
 339 |                       txname = NULL,
 340 |                       genename = NULL,
 341 |                       show_axis = FALSE,
 342 |                       gene_fsize = 1,
 343 |                       track_names = NULL,
 344 |                       track_names_pos = 0,
 345 |                       track_names_to_left = FALSE,
 346 |                       track_overlay = FALSE,
 347 |                       regions = NULL,
 348 |                       collapse_txs = TRUE,
 349 |                       boxcol = "#ffc41a",
 350 |                       boxcolalpha = 0.4,
 351 |                       chromHMM = NULL,
 352 |                       chromHMM_cols = NULL,
 353 |                       chromHMM_names = NULL,
 354 |                       ucscChromHMM = NULL,
 355 |                       peaks = NULL,
 356 |                       bw_track_height = 3,
 357 |                       peaks_track_height = 2,
 358 |                       gene_track_height = 2,
 359 |                       scale_track_height = 2,
 360 |                       chromHMM_track_height = 1,
 361 |                       cytoband_track_height = 2,
 362 |                       peaks_track_names = NULL,
 363 |                       left_mar = NULL,
 364 |                       bw_ord = NULL,
 365 |                       layout_ord = c("p", "b", "h", "g", "c")
 366 | ){
 367 |   
 368 |   if(is.null(summary_list)){
 369 |     stop("Missing input! Expecting output from track_extract()")
 370 |   }
 371 |   
 372 |   
 373 |   meta = attr(summary_list$data, "meta")
 374 |   loci = meta$loci
 375 |   etbl = meta$etbl
 376 |   cyto = meta$cyto
 377 |   
 378 |   coldata = summary_list$colData
 379 |   is_bw = attr(coldata, "is_bw")
 380 |   build = attr(coldata, "refbuild")
 381 |   
 382 |   loci_p = .parse_loci(loci = loci)
 383 |   chr = loci_p$chr; start = loci_p$start; end = loci_p$end
 384 |   
 385 |   # chr = summary_list$loci[1]
 386 |   # start = as.numeric(summary_list$loci[2])
 387 |   # end = as.numeric(summary_list$loci[3])
 388 |   # etbl = summary_list$etbl
 389 |   # cyto = summary_list$cyto
 390 |   # is_bw = attr(summary_list, "is_bw")
 391 |   # build = attr(summary_list, "refbuild")
 392 |   
 393 |   summary_list = summary_list$data
 394 |   
 395 |   #Change the order
 396 |   if(!is.null(bw_ord)){
 397 |     bw_ord = intersect(names(summary_list), bw_ord)
 398 |     
 399 |     if(length(bw_ord) == 0){
 400 |       stop("None of the provided bw_ord are presnt in the data! Available names:\n", paste(names(summary_list), collapse = ", "))
 401 |     }
 402 |     
 403 |     summary_list = summary_list[bw_ord]
 404 |     coldata = data.table::rbindlist(split(coldata, coldata$bw_sample_names)[bw_ord])
 405 |   }
 406 |   
 407 |   if(length(col) != length(summary_list)){
 408 |     col = rep(x = col, length(summary_list))
 409 |   }
 410 |   
 411 |   plot_regions = FALSE
 412 |   if(!is.null(regions)){
 413 |     if(is(object = regions, class2 = "data.frame")){
 414 |       regions = data.table::as.data.table(x = regions)
 415 |       colnames(regions)[1:3] = c("chromsome", "startpos", "endpos")
 416 |       regions = regions[chromsome %in% chr]
 417 |       if(nrow(regions) == 0){
 418 |         warning("None of the regions are within the requested chromosme: ", chr)
 419 |         plot_regions = TRUE
 420 |       }else{
 421 |         plot_regions = TRUE  
 422 |       }
 423 |     }else{
 424 |       stop("'mark_regions' must be a data.frame with first 3 columns containing : chr, start, end")
 425 |     }
 426 |   }
 427 |   
 428 |   if(!is.null(track_names)){
 429 |     names(summary_list) = track_names
 430 |   }
 431 |   
 432 |   groupScaleByCondition = FALSE #For furture
 433 |   if(groupScaleByCondition){
 434 |     plot_height = unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE)))
 435 |     plot_height_min = unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE)))
 436 |     plot_height = data.table::data.table(plot_height, plot_height_min, col, names(summary_list))
 437 |     plot_height$og_ord = 1:nrow(plot_height)
 438 |     plot_height = plot_height[order(col)]
 439 |     plot_height_max = plot_height[,.(.N, max(plot_height)), .(col)]
 440 |     plot_height_min = plot_height[,.(.N, max(plot_height_min)), .(col)]
 441 |     plot_height$max = rep(plot_height_max$V2, plot_height_max$N)
 442 |     plot_height$min = rep(plot_height_min$V2, plot_height_min$N)
 443 |     plot_height_min = plot_height[order(og_ord)][,min]
 444 |     plot_height = plot_height[order(og_ord)][,max]
 445 |   }else if(groupAutoScale){
 446 |     plot_height = max(unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE))), na.rm = TRUE)
 447 |     plot_height_min = min(unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE))), na.rm = TRUE)
 448 |     plot_height = rep(plot_height, length(summary_list))
 449 |     plot_height_min = rep(plot_height_min, length(summary_list))
 450 |   }else{
 451 |     plot_height = unlist(lapply(summary_list, function(x) max(x$max, na.rm = TRUE)))
 452 |     plot_height_min = unlist(lapply(summary_list, function(x) min(x$max, na.rm = TRUE)))
 453 |   }
 454 |   
 455 |   if(!is.null(y_max)){
 456 |     #If custom ylims are provided
 457 |     if(length(y_max) != length(summary_list)){
 458 |       y_max = rep(y_max, length(summary_list))
 459 |     }
 460 |     plot_height = y_max
 461 |     
 462 |   }else{
 463 |     plot_height = round(plot_height, digits = 2)  
 464 |   }
 465 |   
 466 |   if(!is.null(y_min)){
 467 |     #If custom ylims are provided
 468 |     if(length(y_min) != length(summary_list)){
 469 |       y_min = rep(y_min, length(summary_list))
 470 |     }
 471 |     plot_height_min = y_min
 472 |     
 473 |   }else{
 474 |     plot_height_min = round(plot_height_min, digits = 2)  
 475 |   }
 476 |   
 477 |   if(is_bw & track_overlay){
 478 |     ntracks = 1
 479 |   }else{
 480 |     ntracks = length(summary_list)  
 481 |   }
 482 |   
 483 |   lo = .make_layout(ntracks = ntracks, ntracks_h = bw_track_height, cytoband = show_ideogram, cytoband_h = cytoband_track_height, genemodel = draw_gene_track, 
 484 |                genemodel_h = gene_track_height, chrHMM = any(!is.null(ucscChromHMM), !is.null(chromHMM)), chrHMM_h = chromHMM_track_height, loci = !is.null(peaks), 
 485 |                loci_h = peaks_track_height, scale_track_height = scale_track_height, lord = layout_ord)
 486 |   
 487 |   query = data.table::data.table(chr = chr, start = start, end = end)
 488 |   data.table::setkey(x = query, chr, start, end)
 489 |   
 490 |   if(is.null(left_mar)){
 491 |     left_mar = ifelse(test = show_axis, yes = 4, no = 2)
 492 |   }
 493 |   
 494 |   #Draw top peaks
 495 |   if(!is.null(peaks)){
 496 |     
 497 |     if(is.list(peaks)){
 498 |       peaks_data = lapply(peaks, function(l){
 499 |         colnames(l)[1:3] = c("chr", "start", "end")
 500 |         data.table::setDT(l, key = c("chr", "start", "end"))
 501 |         l
 502 |       })
 503 |     }else{
 504 |       peaks_data = lapply(peaks, function(l){
 505 |         l = data.table::fread(file = l)
 506 |         colnames(l)[1:3] = c("chr", "start", "end")
 507 |         data.table::setDT(l, key = c("chr", "start", "end"))
 508 |         l
 509 |       })
 510 |     }
 511 |     
 512 |     if(is.null(peaks_track_names)){
 513 |       names(peaks_data) = paste0("Bed", 1:length(peaks_data))  
 514 |     }else{
 515 |       names(peaks_data) = peaks_track_names
 516 |     }
 517 |     
 518 |     if(show_axis){
 519 |       par(mar = c(0.25, left_mar, 0.25, 1))
 520 |     }else{
 521 |       par(mar = c(0.25, left_mar, 0.25, 1))  
 522 |     }
 523 |     
 524 |     plot(NA, xlim = c(start, end), ylim = c(0, length(peaks)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 525 |     
 526 |     for(idx in seq_along(peaks_data)){
 527 |       l_idx = peaks_data[[idx]]
 528 |       l_idx = data.table::foverlaps(x = query, y = l_idx, type = "any", nomatch = NULL)[,.(chr, start, end)]
 529 |       rect(xleft = start, ybottom = idx - 0.49, xright = end, ytop = idx - 0.51, col = "gray90", border = NA)
 530 |       if(nrow(l_idx) > 0){
 531 |         rect(xleft = l_idx$start, ybottom = idx - 0.9, xright = l_idx$end, ytop = idx - 0.1, col = "#34495e", border = NA)
 532 |       }
 533 |       text(x = start, y = idx - 0.5, labels = names(peaks_data)[idx], adj = 1.2, xpd = TRUE)
 534 |     }
 535 |   }
 536 |   
 537 |   #Draw bigWig signals
 538 |   boxcol = grDevices::adjustcolor(boxcol, alpha.f = boxcolalpha)
 539 |   if(is_bw){
 540 |     if(track_overlay){
 541 |       
 542 |       if(show_axis){
 543 |         par(mar = c(0.5, left_mar, 2, 1))
 544 |       }else{
 545 |         par(mar = c(0.5, left_mar, 2, 1))  
 546 |       }
 547 |       
 548 |       plot(NA, xlim = c(start, end), ylim = c(min(plot_height_min), max(plot_height)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 549 |       for(idx in 1:length(summary_list)){
 550 |         x = summary_list[[idx]]
 551 |         if(nrow(x) == 0){
 552 |           if(track_names_to_left){
 553 |             text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE)
 554 |             #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0)
 555 |             #title(main = , adj = track_names_pos, font.main = 3)  
 556 |           }else{
 557 |             title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3)
 558 |           }
 559 |           next
 560 |         }
 561 |         points(x = x$start, y = x$max, col = col[idx], type = "l")
 562 |       }
 563 |       
 564 |       if(plot_regions){
 565 |         # boxcol = "#192a56"
 566 |         if(nrow(x) > 0){
 567 |           rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE)  
 568 |         }
 569 |       }
 570 |       
 571 |       if(show_axis){
 572 |         axis(side = 2, at = c(min(plot_height_min), max(plot_height)), las = 2)
 573 |       }
 574 |       legend(x = "topright", legend = names(summary_list), col = col, pch = "-")
 575 |     }else{
 576 |       for(idx in 1:length(summary_list)){
 577 |         x = summary_list[[idx]]
 578 |         if(show_axis){
 579 |           par(mar = c(0.5, left_mar, 2, 1))
 580 |         }else{
 581 |           par(mar = c(0.5, left_mar, 2, 1))  
 582 |         }
 583 |         
 584 |         plot(NA, xlim = c(start, end), ylim = c(plot_height_min[idx], plot_height[idx]), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 585 |         #If there is no signal, just add the track names and go to next
 586 |         if(nrow(x) == 0){
 587 |           if(track_names_to_left){
 588 |             text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE)
 589 |             #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0)
 590 |             #title(main = , adj = track_names_pos, font.main = 3)  
 591 |           }else{
 592 |             title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3)
 593 |           }
 594 |           next
 595 |         }
 596 |         rect(xleft = x$start, ybottom = 0, xright = x$end, ytop = x$max, col = col[idx], border = col[idx])
 597 |         
 598 |         if(show_axis){
 599 |           axis(side = 2, at = c(plot_height_min[idx], plot_height[idx]), las = 2)  
 600 |         }else{
 601 |           text(x = start, y = plot_height[idx], labels = paste0("[", plot_height_min[idx], "-", plot_height[idx], "]"), adj = 0, xpd = TRUE)
 602 |         }
 603 |         #plot(NA, xlim = c(start, end), ylim = c(0, nrow(regions)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 604 |         
 605 |         if(plot_regions){
 606 |           # boxcol = "#192a56"
 607 |           if(nrow(x) > 0){
 608 |             rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE)  
 609 |           }
 610 |         }
 611 |         
 612 |         if(track_names_to_left){
 613 |           text(x = start, y = (plot_height_min[idx] + plot_height[idx])/2, labels = names(summary_list)[idx], adj = 1.1, cex = gene_fsize, xpd = TRUE)
 614 |         }else{
 615 |           title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3)  
 616 |         }
 617 |       }
 618 |     }
 619 |   }else{
 620 |     for(idx in 1:length(summary_list)){
 621 |       x = summary_list[[idx]]
 622 |       
 623 |       if(show_axis){
 624 |         par(mar = c(0, left_mar, 0, 1))
 625 |       }else{
 626 |         par(mar = c(0.5, left_mar, 1, 1))  
 627 |       }
 628 |       
 629 |       plot(NA, xlim = c(start, end), ylim = c(0, 1), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 630 |       #If there is no signal, just add the track names and go to next
 631 |       if(nrow(x) == 0){
 632 |         if(track_names_to_left){
 633 |           text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE)
 634 |           #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0)
 635 |           #title(main = , adj = track_names_pos, font.main = 3)  
 636 |         }else{
 637 |           title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3)
 638 |         }
 639 |         next
 640 |       }
 641 |       cols = cut(x$max, breaks = c(0, 166, 277, 389, 500, 612, 723, 834, 945, max(x$max)), labels = c("#FFFFFF", "#F0F0F0", "#D9D9D9", "#BDBDBD", "#969696", "#737373", 
 642 |                                                                                                       "#525252", "#252525", "#000000"))
 643 |       rect(xleft = x$start, ybottom = 0.01, xright = x$end, ytop = 0.99, col = as.character(cols), border = NA)
 644 |       
 645 |       if(plot_regions){
 646 |         # boxcol = "#192a56"
 647 |         boxcol = grDevices::adjustcolor(boxcol, alpha.f = boxcolalpha)
 648 |         if(nrow(regions) > 0){
 649 |           if(nrow(x) > 0){
 650 |             rect(xleft = regions[, startpos], ybottom = 0, xright = regions[, endpos], ytop = max(plot_height), col = boxcol, border = NA, xpd = TRUE)  
 651 |           }
 652 |         }
 653 |       }
 654 |       
 655 |       if(track_names_to_left){
 656 |         text(x = start, y = 0.5, labels = names(summary_list)[idx], adj = 1, cex = gene_fsize, xpd = TRUE)
 657 |         #mtext(text = names(summary_list)[idx], side = 2, line = -2, outer = TRUE, xpd = TRUE, las = 2, adj = 0)
 658 |         #title(main = , adj = track_names_pos, font.main = 3)  
 659 |       }else{
 660 |         title(main = names(summary_list)[idx], adj = track_names_pos, font.main = 3)
 661 |       }
 662 |     }
 663 |     
 664 |   }
 665 |   
 666 |   #Draw chrom HMM tracks
 667 |   plotHMM = FALSE
 668 |   if(!is.null(chromHMM)){
 669 |     if(is.list(chromHMM)){
 670 |       chromHMM = lapply(chromHMM, function(l){
 671 |         colnames(l)[1:4] = c("chr", "start", "end", "name")
 672 |         data.table::setDT(l, key = c("chr", "start", "end"))
 673 |         l
 674 |       })
 675 |     }else{
 676 |       chromHMM = lapply(chromHMM, function(l){
 677 |         l = data.table::fread(file =  l)
 678 |         colnames(l)[1:4] = c("chr", "start", "end", "name")
 679 |         data.table::setDT(l, key = c("chr", "start", "end"))
 680 |       })
 681 |     }
 682 |     
 683 |     hmmdata = lapply(chromHMM, function(hmm){
 684 |       .load_chromHMM(chr = chr, start = start, end = end, ucsc = hmm)
 685 |       #.extract_chromHmm_ucsc()
 686 |     })
 687 |     
 688 |     if(is.null(chromHMM_names)){
 689 |       names(hmmdata) = paste0("chromHMM_", 1:length(hmmdata))
 690 |     }else{
 691 |       names(hmmdata) = chromHMM_names
 692 |     }
 693 |     
 694 |     plotHMM = TRUE
 695 |   }else if(!is.null(ucscChromHMM)){
 696 |     hmmdata = lapply(ucscChromHMM, function(hmmtbl){
 697 |       .extract_chromHmm_ucsc(chr = chr, start = start, end = end, refBuild = build, tbl = hmmtbl)
 698 |     })
 699 |     names(hmmdata) = ucscChromHMM
 700 |     plotHMM = TRUE
 701 |     #return(hmmdata)
 702 |   }
 703 |   
 704 |   if(plotHMM){
 705 |     if(show_axis){
 706 |       par(mar = c(0.1, left_mar, 0, 1))
 707 |     }else{
 708 |       par(mar = c(0.1, left_mar, 0, 1))  
 709 |     }
 710 |     
 711 |     if(is.null(chromHMM_cols)){
 712 |       chromHMM_cols = .get_ucsc_hmm_states_cols()
 713 |     }
 714 |     
 715 |     .plot_ucsc_chrHmm(d = hmmdata, start = start, end = end, hmm_cols = chromHMM_cols)
 716 |   }
 717 |   
 718 |   #Draw gene models
 719 |   if(draw_gene_track){
 720 |     
 721 |     #etbl = .make_exon_tbl(gene_models = etbl, txname = txname, genename = genename)
 722 |     
 723 |     if(!is.null(etbl)){
 724 |       
 725 |       if(!is.null(genename)){
 726 |         if(length(etbl[unlist((lapply(etbl, attr, "gene"))) %in% genename]) == 0){
 727 |           message("Note: Could not find any of the requested gene names! Available genes are:")
 728 |           print(unique(unlist((lapply(etbl, attr, "gene")))))
 729 |         }else{
 730 |           etbl = etbl[unlist((lapply(etbl, attr, "gene"))) %in% genename]  
 731 |         }
 732 |       }
 733 |       
 734 |       if(collapse_txs){
 735 |         etbl = .collapse_tx(etbl)
 736 |       }
 737 |       
 738 |       if(show_axis){
 739 |         par(mar = c(0.25, left_mar, 0, 1))
 740 |       }else{
 741 |         par(mar = c(0.25, left_mar, 0, 1))  
 742 |       }
 743 |       
 744 |       plot(NA, xlim = c(start, end), ylim = c(0, length(etbl)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 745 |       exon_col = "#192a56"
 746 |       for(tx_id in 1:length(etbl)){
 747 |         txtbl = etbl[[tx_id]]
 748 |         segments(x0 = attr(txtbl, "start"), y0 = tx_id-0.45, x1 = attr(txtbl, "end"), y1 = tx_id-0.45, col = exon_col, lwd = 1)
 749 |         name_at = min(c(txtbl[[1]], txtbl[[2]]))
 750 |         if(is.na(attr(txtbl, "tx"))){
 751 |           text(x = name_at, y = tx_id-0.45, labels = paste0(attr(txtbl, "gene")), adj = 1, cex = gene_fsize, xpd = TRUE, pos = 2) #x = start for outer margin
 752 |         }else{
 753 |           text(x = name_at, y = tx_id-0.45, labels = paste0(attr(txtbl, "tx"), " [", attr(txtbl, "gene"), "]"), cex = gene_fsize, adj = 0, xpd = TRUE, pos = 2)  
 754 |         }
 755 |         
 756 |         rect(xleft = txtbl[[1]], ybottom = tx_id-0.75, xright = txtbl[[2]], ytop = tx_id-0.25, col = exon_col, border = NA)
 757 |         if(attr(txtbl, "strand") == "+"){
 758 |           dirat = pretty(x = c(min(txtbl[[1]]), max(txtbl[[2]])))
 759 |           dirat[1] = min(txtbl[[1]]) #Avoid drawing arrows outside gene length
 760 |           dirat[length(dirat)] = max(txtbl[[2]])
 761 |           points(x = dirat, y = rep(tx_id-0.45, length(dirat)), pch = ">", col = exon_col)
 762 |         }else{
 763 |           dirat = pretty(x = c(min(txtbl[[1]]), max(txtbl[[2]])))
 764 |           dirat[1] = min(txtbl[[1]]) #Avoid drawing arrows outside gene length
 765 |           dirat[length(dirat)] = max(txtbl[[2]])
 766 |           points(x = dirat, y = rep(tx_id-0.45, length(dirat)), pch = "<", col = exon_col)
 767 |         }
 768 |       }
 769 |     }
 770 |   }
 771 |   
 772 |   #Draw scale
 773 |   if(show_axis){
 774 |     par(mar = c(0, left_mar, 0, 1))  
 775 |   }else{
 776 |     par(mar = c(0, left_mar, 0, 1))
 777 |   }
 778 |   lab_at = pretty(c(start, end))
 779 |   lab_at_lab = ifelse(test = lab_at > 1e6, yes = paste0(lab_at/1e6, "M"), no = ifelse(lab_at > 100000, yes = paste0(lab_at/1e5, "K"), no = lab_at))
 780 |   plot(NA, xlim = c(start, end), ylim = c(0, 1), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 781 |   rect(xleft = start, ybottom = 0.5, xright = end, ytop = 0.5, lty = 2, xpd = TRUE)
 782 |   rect(xleft = lab_at, ybottom = 0.45, xright = lab_at, ytop = 0.5, xpd = TRUE)
 783 |   text(x = lab_at, y = 0.2, labels = lab_at_lab, xpd = FALSE)
 784 |   #axis(side = 1, at = lab_at, lty = 2, line = -3)
 785 |   text(x = end, y = 0.9, labels = paste0(chr, ":", start, "-", end), adj = 1, xpd = TRUE)
 786 |   
 787 |   #Draw ideogram
 788 |   if(is.list(cyto)){
 789 |     if(show_ideogram){
 790 |       par(mar = c(0.2, 1, 0, 1))
 791 |       plot(NA, xlim = c(0, max(cyto$end)), ylim = c(0, 1), axes = FALSE, frame.plot = FALSE, xlab = NA, ylab = NA)
 792 |       rect(xleft = cyto$start, ybottom = 0.1, xright = cyto$end, ytop = 0.6, col = cyto$color, border = "#34495e")
 793 |       rect(xleft = start, ybottom = 0, xright = end, ytop = 0.7, col = "#d35400", lwd = 2, border = "#d35400")
 794 |       text(x = 0, y = 0.5, labels = chr, adj = 1.2, font = 2, xpd = TRUE)
 795 |     }
 796 |   }
 797 |   
 798 | }
 799 | 
 800 | # profileplot is an ultra-fast, simple, and minimal dependency R script to generate profile-plots from bigWig files
 801 | #' Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts 
 802 | #' @param colData from \code{read_coldata}
 803 | #' @param bed bed file or a data.frame with first 3 column containing chromosome, star, end positions. 
 804 | #' @param binSize bin size to extract signal. Default 50 (bps). Should be >1
 805 | #' @param startFrom Default "start". For bed files this can be "start", "center" or "end". For `ucsc_assembly` this can only be "start" or "end"
 806 | #' @param up extend upstream by this many bps from `startFrom`. Default 2500
 807 | #' @param down extend downstream by this many bps from `startFrom`. Default 2500
 808 | #' @param ucsc_assembly If `bed` file not provided, setting `ucsc_assembly` to TRUE will fetch transcripts from UCSC genome browser.
 809 | #' @param pc_genes Use only protein coding genes when `ucsc_assembly` is used. Default TRUE
 810 | #' @param nthreads Default 4
 811 | #' @seealso \code{\link{profile_summarize}} \code{\link{profile_plot}} \code{\link{profile_heatmap}}
 812 | #' @export
 813 | 
 814 | profile_extract = function(colData = NULL, bed = NULL, ucsc_assembly = TRUE, startFrom = "start", binSize = 50,
 815 |                            up = 2500, down = 2500, pc_genes = TRUE, nthreads = 4){
 816 |   .check_windows()
 817 |   .check_bwtool(warn = FALSE)
 818 |   .check_dt()
 819 |   
 820 |   if(is.null(colData)){
 821 |     stop("Missing colData. Use read_coldata() to generate one.")
 822 |   }
 823 |   
 824 |   bigWigs = colData$bw_files
 825 |   custom_names = colData$bw_sample_names
 826 |   
 827 |   op_dir = tempdir() #For now
 828 |   
 829 |   if(is.null(bed)){
 830 |     if(ucsc_assembly){
 831 |       ucsc_assembly = attr(colData, "refbuild")
 832 |       message("No bed file was given. Defaulting to ucsc refseq..")
 833 |       startFrom = match.arg(arg = startFrom, choices = c("start", "end"))
 834 |       bed = .make_genome_bed(refBuild = ucsc_assembly, up = as.numeric(up), down = as.numeric(down), tss = startFrom, op_dir = op_dir, pc_genes = pc_genes, for_profile = TRUE)
 835 |       bed_annot = bed[[2]]
 836 |       bed = bed[[1]]
 837 |     }else{
 838 |       stop("Please provide either a BED file or set ucsc_assembly to TRUE")
 839 |     }
 840 |   }else{
 841 |     startFrom = match.arg(arg = startFrom, choices = c("start", "end", "center"))
 842 |     bed = .make_bed(bed = bed, op_dir = op_dir, up = as.numeric(up), down = as.numeric(up), tss = startFrom, for_profile = TRUE)
 843 |     bed_annot = NA
 844 |   }
 845 |   
 846 |   message("Extracting signals..")
 847 |   mats = parallel::mclapply(bigWigs, function(x){
 848 |     .bwt_mats(bw = x, binSize = binSize, bed = bed, size = paste0(up, ":", down), startFrom = startFrom, op_dir = op_dir)
 849 |   }, mc.cores = nthreads)
 850 |   
 851 |   mats = as.character(unlist(x = mats))
 852 |   sig_list = lapply(mats, data.table::fread)
 853 |   
 854 |   if(!is.null(custom_names)){
 855 |     names(sig_list) = custom_names
 856 |   }else{
 857 |     names(sig_list) = gsub(pattern = "*\\.matrix$", replacement = "", x = basename(path = mats))
 858 |   }
 859 |   
 860 |   attr(sig_list, "args") = c(up, down)
 861 |   
 862 |   #Remove intermediate files
 863 |   lapply(mats, function(x) system(command = paste0("rm ", x), intern = TRUE))
 864 |   
 865 |   list(data = sig_list, colData = colData)
 866 | }
 867 | 
 868 | 
 869 | #' Summarize data for profile plots
 870 | #' @param sig_list Output generated from `profile_extract`
 871 | #' @param stat Default `mean`. Can be `mean`, `median`
 872 | #' @param condition column name with conditions in `colData`. If provided summarizes signals from samples belonging to same group or condition
 873 | #' @seealso \code{\link{profile_extract}} \code{\link{profile_plot}} \code{\link{profile_heatmap}}
 874 | #' @export
 875 | profile_summarize = function(sig_list = NULL, stat = "mean", condition = NULL){
 876 |   
 877 |   if(is.null(sig_list)){
 878 |     stop("Missing input! Use profile_extract() to generate one.")
 879 |   }
 880 |   
 881 |   stat = match.arg(arg = stat, choices = c("mean", 'median'))
 882 |   
 883 |   colData = sig_list$colData
 884 |   collapse_replicates = FALSE
 885 |   if(!is.null(condition)){
 886 |     if(!condition %in% colnames(colData)){
 887 |       stop(condition , " not found in colData!\nAvailable columns: ", print(paste(colnames(colData), collapse = " ")))
 888 |     }
 889 |     collapse_by_idx = which(colnames(colData) == condition)
 890 |     condition = unlist(colData[,collapse_by_idx, with = FALSE], use.names = FALSE)
 891 |     collapse_replicates = TRUE
 892 |   }
 893 |   
 894 |   message("Summarizing..")
 895 |   sig_summary = .summarizeMats(mats = sig_list$data, group = condition, collapse_reps = collapse_replicates, summarizeBy = stat)
 896 |   attr(sig_summary, "args") = attr(sig_list$data, "args")
 897 |   list(data = sig_summary, colData = colData)
 898 | }
 899 | 
 900 | 
 901 | #' Draw a profile plot
 902 | #' @param sig_list Output generated from profile_summarize
 903 | #' @param color Manual colors for each bigWig. Default NULL. 
 904 | #' @param line_size Default 1
 905 | #' @param legend_fs Legend font size. Default 1
 906 | #' @param axis_fs Axis font size. Default 1
 907 | #' @param xlab x axis label. Default NA
 908 | #' @param ylab y axis label. Default NA
 909 | #' @export
 910 | profile_plot = function(sig_list = NULL, color = NULL, line_size = 1, legend_fs = 1, axis_fs = 1, xlab = NA, ylab = NA){
 911 |   
 912 |   if(is.null(sig_list)){
 913 |     stop("Missing input! Expecting output from profile_summarize()")
 914 |   }
 915 |   
 916 |   args = attr(sig_list$data, "args")
 917 |   up = as.numeric(args[1])
 918 |   down = as.numeric(args[2])
 919 |   
 920 |   sig_summary = sig_list$data
 921 |   
 922 |   if(is.null(color)){
 923 |     color = c("#2f4f4f", "#8b4513", "#228b22", "#00008b", "#ff0000", "#ffd700", "#7fff00", "#00ffff", "#ff00ff", "#6495ed", "#ffe4b5", "#ff69b4") #hcl.colors(n = 10, palette = "Dark 2")
 924 |     color = color[1:length(sig_summary)]
 925 |   }
 926 |   
 927 |   y_max = max(unlist(lapply(sig_summary, max, na.rm = TRUE)))
 928 |   y_min = min(unlist(lapply(sig_summary, min, na.rm = TRUE)))
 929 |   ylabs = pretty(c(y_min, y_max), n = 5)
 930 |   
 931 |   
 932 |   x_max = max(unlist(lapply(sig_summary, length)))
 933 |   xlabs = c(up, 0, down)
 934 |   xticks = xticks = c(0,
 935 |                       as.integer(length(sig_summary[[1]])/sum(as.numeric(xlabs[1]), as.numeric(xlabs[3])) * as.numeric(xlabs[1])),
 936 |                       length(sig_summary[[1]]))
 937 |   
 938 |   #line_size = 1
 939 |   par(mar = c(4, 4, 2, 1))
 940 |   plot(NA, xlim = c(0, x_max), ylim = c(min(ylabs), max(ylabs)), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
 941 |   abline(h = ylabs, v = pretty(xticks), col = "gray90", lty = 2)
 942 |   
 943 |   lapply(1:length(sig_summary), function(idx){
 944 |     points(sig_summary[[idx]], type = 'l', lwd = line_size, col = color[idx])
 945 |   })
 946 |   axis(side = 1, at = xticks, labels = xlabs, cex.axis = axis_fs)
 947 |   axis(side = 2, at = ylabs, las = 2, cex.axis = axis_fs)
 948 |   
 949 |   legend(x = "topright", legend = names(sig_summary), col = color, bty = "n", lty = 1, lwd = 1.2, cex = legend_fs, xpd = TRUE)
 950 |   
 951 |   mtext(text = xlab, side = 1, line = 2.5, cex = 1)
 952 |   mtext(text = ylab, side = 2, line = 2.5, cex = 1)
 953 |   
 954 |   invisible(list(mean_signal = sig_summary, color_codes = color, xticks = xticks, xlabs = xlabs))
 955 | }
 956 | 
 957 | #' Draw a heatmap
 958 | #' @details This function takes output from extract_matrix and draws a heatmap
 959 | #' @param mat_list Input matrix list generated by \code{\link{profile_extract}}
 960 | #' @param sortBy Sort matrix by.., Can be mean, median. Default mean.
 961 | #' @param col_pal Color palette to use. Default Blues. Use hcl.pals(type = "sequential") to see available palettes
 962 | #' @param revpal Reverse color palette? Default FALSE.
 963 | #' @param sample_names Manually specify sample names.
 964 | #' @param title_size size of title. Default 0.8
 965 | #' @param top_profile Boolean. Whether to draw top profile plot.
 966 | #' @param top_profile_h Default 2.
 967 | #' @param zmins Manually specify min scores to include in heatmap
 968 | #' @param zmaxs Manually specify max scores to include in heatmap
 969 | #' @param scale Whether to row scale the matrix. Default FALSE
 970 | #' @param file_name Default NULL. If provided saves plot as a png.
 971 | #' @param hm_width Width of the plot. Default 1024
 972 | #' @param hm_height Height of the plot. Default 600
 973 | #' @param mat_order Default NULL. Sample order in which the heatmaps are drawn.
 974 | #' @export
 975 | profile_heatmap = function(mat_list, sortBy = "mean", col_pal = "Blues", revpal = FALSE, sample_names = NULL,
 976 |                         title_size = 1, top_profile = FALSE, top_profile_h = 2, zmins = NULL, zmaxs = NULL,
 977 |                         scale = FALSE, file_name = NULL, hm_width = 1024, hm_height = 600, mat_order = NULL){
 978 |   
 979 |   
 980 |   if(!sortBy %in% c("mean", "median")){
 981 |     stop("sortBy can only be mean, median")
 982 |   }
 983 |   
 984 |   col_pal = match.arg(arg = col_pal, choices = hcl.pals(type = "sequential"))
 985 |   hmcols = rev(hcl.colors(n = 255, palette = col_pal))
 986 |   if(revpal){
 987 |     hmcols = rev(hmcols)
 988 |   }
 989 |   
 990 |   cdata = mat_list$colData
 991 |   size = attr(mat_list$data, "args")
 992 |   mat_list = mat_list$data
 993 |   
 994 |   if(!is.null(mat_order)){
 995 |     if(length(mat_order) != length(mat_list)){
 996 |       stop("Length of mat_order should be the same as number of bw files! [", length(mat_list), "]")
 997 |     }
 998 |     mat_list = mat_list[mat_order]
 999 |   }
1000 |   
1001 |   mat_list = .order_matrix(mats = mat_list, sortBy = sortBy)
1002 |   
1003 |   if(top_profile){
1004 |     profile_dat = .summarizeMats(mats = mat_list, summarizeBy = "mean")
1005 |     yl = c(min(unlist(x = profile_dat), na.rm = TRUE),
1006 |            max(unlist(x = profile_dat), na.rm = TRUE))
1007 |     yl = round(x = yl, digits = 2)
1008 |     
1009 |   }
1010 |   
1011 |   nsamps = length(mat_list)
1012 |   
1013 |   if(!is.null(zmins)){
1014 |     if(length(zmins) != length(mat_list)){
1015 |       warning("zmins are recycled")
1016 |       zmins = rep(zmins, length(mat_list))
1017 |     }
1018 |   }else{
1019 |     zmins = unlist(lapply(profile_dat, min, na.rm = TRUE))
1020 |   }
1021 |   
1022 |   if(!is.null(zmaxs)){
1023 |     if(length(zmaxs) != length(mat_list)){
1024 |       warning("zmaxs are recycled")
1025 |       zmaxs = rep(zmaxs, length(mat_list))
1026 |     }
1027 |   }else{
1028 |     zmaxs = unlist(lapply(profile_dat, max, na.rm = TRUE))
1029 |   }
1030 |   
1031 |   if(!is.null(sample_names)){
1032 |     if(length(sample_names) != length(mat_list)){
1033 |       stop("Number of sample names should be equal to number of samples in the matrix")
1034 |     }else{
1035 |       sample_names = cdata$bw_sample_names
1036 |       names(mat_list) = sample_names
1037 |     }
1038 |   }
1039 |   
1040 |   xlabs = c(size[1], 0, size[2])
1041 |   xticks = c(0,
1042 |              1/(sum(as.integer(xlabs[1]), as.integer(xlabs[3]))/as.integer(xlabs[1])),
1043 |              1)
1044 |   
1045 |   if(!is.null(file_name)){
1046 |     png(filename = paste0(file_name, ".png"), res = 100, height = 1024, width = 600)  
1047 |   }
1048 |   
1049 |   if(top_profile){
1050 |     lo_mat = matrix(data = 1:(nsamps*2), nrow = 2, ncol = nsamps)
1051 |     lo = graphics::layout(mat = lo_mat, heights = c(top_profile_h, 9))
1052 |   }else{
1053 |     lo_mat = matrix(data = 1:nsamps, nrow = 1)
1054 |     lo = graphics::layout(mat = lo_mat)
1055 |   }
1056 |   
1057 |   for(i in 1:nsamps){
1058 |     
1059 |     if(top_profile){
1060 |       .plot_profile_mini(plot_dat = profile_dat, index = i, ylims = c(zmins[i], zmaxs[i]))
1061 |     }
1062 |     
1063 |     par(mar = c(3,2,1.5,1), las=1, tcl=-.25)
1064 |     hm.dat = mat_list[[i]]
1065 |     data.table::setDF(x = hm.dat)
1066 |     
1067 |     if(is.null(zmins)){
1068 |       #colMin = apply(hm.dat, 2, min, na.rm = TRUE)
1069 |       zmin = round(min(hm.dat, na.rm = TRUE), digits = 2)
1070 |       #zmin = 0
1071 |     }else{
1072 |       zmin = zmins[i]
1073 |     }
1074 |     
1075 |     if(scale){
1076 |       hm.dat = scale(x = t(hm.dat), scale = TRUE)
1077 |       #hm.dat =  apply(hm.dat, 2, rev)
1078 |     }else{
1079 |       hm.dat =  t(apply(hm.dat, 2, rev))
1080 |     }
1081 |     
1082 |     if(is.null(zmaxs)){
1083 |       zmax = max(rowMeans(hm.dat, na.rm = TRUE)) #mean(apply(hm.dat, 2, max, na.rm = TRUE))
1084 |       #colMax = which(x = colMax == max(colMax), arr.ind = TRUE)[1]
1085 |       #zmax = round(max(boxplot.stats(unlist(hm.dat[colMax,]))$stats), 2)
1086 |     }else{
1087 |       zmax = zmaxs[i]
1088 |     }
1089 |     
1090 |     #hmcols = grDevices::colorRampPalette(hmcols)(255)
1091 |     hm.dat[hm.dat >= zmax] = zmax
1092 |     #return(hm.dat)
1093 |     image(hm.dat, axes = FALSE, col = hmcols, useRaster = TRUE,
1094 |           zlim = c(zmin, zmax), xlim = c(-0.2, 1))
1095 |     
1096 |     #Add legend
1097 |     image(x = c(-0.1, -0.05), y = seq(0, 1, length.out = length(hmcols)-1),
1098 |           z = matrix(data = 1:(length(hmcols)-1), nrow = 1), add = TRUE, col = hmcols)
1099 |     axis(side = 2, at = seq(0, 1, length.out = 5),
1100 |          labels = NA,
1101 |          line = -0.6, font.axis = 2, yaxp  = c(1.1, 1.2, 3), lwd = 1)
1102 |     mtext(text = round(seq(zmin, zmax, length.out = 5), digits = 2), side = 2,
1103 |           line = -0.1, at = seq(0, 1, length.out = 5), font = 1)
1104 |     
1105 |     title(main = names(mat_list)[i], cex.main = title_size, font.main = 2)
1106 |     axis(side = 1, at = xticks,
1107 |          labels = NA, lty = 1, lwd = 1,
1108 |          font.axis = 2, cex.axis = 1, line = 0.7, tick = FALSE)
1109 |     mtext(text = c(paste0("-", xlabs[1]), xlabs[2], xlabs[3]), side = 1, line = 1, at = xticks, font = 1)
1110 |     
1111 |     rect(xleft = 0, ybottom = 0, xright = 1, ytop = 1, border = "black", lwd = 1)
1112 |   }
1113 |   
1114 |   if(!is.null(file_name)){
1115 |     dev.off()
1116 |   }
1117 |   
1118 | }
1119 | 
1120 | # bwpcaplot function to perform PCA analysis based on genomic regions of interest or around TSS sites.
1121 | #' Extract area under the curve for every peak from from given bigWig files.
1122 | #' @param colData bigWig files. Default NULL. Required.
1123 | #' @param bed bed file or a data.frame with first 3 column containing chromosome, star, end positions. 
1124 | #' @param binSize bin size to extract signal. Default 50 (bps). Should be >1
1125 | #' @param ucsc_assembly If `bed` file not provided, setting `ucsc_assembly` to TRUE will fetch transcripts from UCSC genome browser.
1126 | #' @param startFrom Default "start". For bed files this can be "start", "center" or "end". For `ucsc_assembly` this can only be "start" or "end"
1127 | #' @param pc_genes Use only protein coding genes when using `ucsc_assembly`. Default TRUE
1128 | #' @param up extend upstream by this many bps from `startFrom`. Default 2500
1129 | #' @param down extend downstream by this many bps from `startFrom`. Default 2500
1130 | #' @param nthreads Default 4
1131 | #' @export
1132 | extract_summary = function(colData, bed = NULL, ucsc_assembly = TRUE, startFrom = "start", binSize = 50,
1133 |                            up = 2500, down = 2500, pc_genes = TRUE, nthreads = 4){
1134 |   
1135 |   if(is.null(colData)){
1136 |     stop("Missing colData. Use read_coldata() to generate one.")
1137 |   }
1138 |   
1139 |   .check_windows()
1140 |   .check_bwtool()
1141 |   .check_dt()
1142 |   
1143 |   bigWigs = colData$bw_files
1144 |   custom_names = colData$bw_sample_names
1145 |   
1146 |   op_dir = tempdir() #For now
1147 |   
1148 |   if(is.null(bed)){
1149 |     if(ucsc_assembly){
1150 |       ucsc_assembly = attr(colData, "refbuild")
1151 |       message("No bed file was given. Defaulting to ucsc refseq..")
1152 |       startFrom = match.arg(arg = startFrom, choices = c("start", "end"))
1153 |       bed = .make_genome_bed(refBuild = ucsc_assembly, up = as.numeric(up), down = as.numeric(down), tss = startFrom, op_dir = op_dir, pc_genes = pc_genes, for_profile = FALSE)
1154 |       bed_annot = bed[[2]]
1155 |       bed = bed[[1]]
1156 |     }else{
1157 |       stop("Please provide either a BED file or an ucsc_assembly name")
1158 |     }
1159 |   }else{
1160 |     startFrom = match.arg(arg = startFrom, choices = c("start", "end", "center"))
1161 |     bed = .make_bed(bed = bed, op_dir = op_dir, up = as.numeric(up), down = as.numeric(up), tss = startFrom)
1162 |     bed_annot = NA
1163 |   }
1164 |   
1165 |   op_dir = tempdir() #For now
1166 |   if(!dir.exists(paths = op_dir)){
1167 |     dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE)
1168 |   }
1169 |   
1170 |   message("Extracting summaries..")
1171 |   summaries = parallel::mclapply(seq_along(bigWigs), FUN = function(idx){
1172 |     bw = bigWigs[idx]
1173 |     bn = custom_names[idx]
1174 |     cmd = paste("bwtool summary -with-sum -keep-bed -header", bed, bw, paste0(op_dir, bn, ".summary"))
1175 |     system(command = cmd, intern = TRUE)
1176 |     paste0(op_dir, bn, ".summary")
1177 |   }, mc.cores = nthreads)
1178 | 
1179 |   merge_anno = FALSE
1180 |   
1181 |   if(!is.na(bed_annot)){
1182 |     bed_annot = data.table::fread(file = bed_annot)
1183 |     bed_annot[, id := paste0(V1, ":", V2, "-", V3)]
1184 |     colnames(bed_annot) = c("chromosome", "start", "end", "tx", "gene", "id")
1185 |     merge_anno = TRUE
1186 |   }
1187 |   
1188 |   summary_list = lapply(summaries, function(x){
1189 |     x = data.table::fread(x)
1190 |     colnames(x)[1] = 'chromosome'
1191 |     x = x[,.(chromosome, start, end, size, sum)]
1192 |     #x[,id := paste0(chromosome, ":", start, "-", end)]
1193 |     x$sum
1194 |   })
1195 |   summary_list = data.frame(summary_list)
1196 |   colnames(summary_list) = custom_names
1197 |   data.table::setDT(summary_list)
1198 |   
1199 |   if(merge_anno){
1200 |     summary_list = cbind(bed_annot, summary_list)
1201 |   }else{
1202 |     bed = data.table::fread(file = bed)
1203 |     colnames(bed) = c("chromosome", "start", "end")
1204 |     summary_list = cbind(bed, summary_list)
1205 |   }
1206 |   
1207 |   list(data = summary_list, colData = colData)
1208 | }
1209 | 
1210 | #' Differential Peak Analysis
1211 | #' @details Takes output from \code{extract_summary} and performs differential peak analysis with Limma
1212 | #'
1213 | #' @param summary_list Output from \code{\link{extract_summary}}
1214 | #' @param condition a column name in \code{coldata} containing sample conditions. Default NULL.
1215 | #' @param log2 log2 convert data prior to testing. Default TRUE
1216 | #' @param num Numerator condition. Default NULL
1217 | #' @param den Denominator condition. Default NULL
1218 | #' @export
1219 | #'
1220 | diffpeak = function(summary_list = NULL, condition = NULL, log2 = TRUE, num = NULL, den = NULL){
1221 |   
1222 |   if (is.null(condition)){
1223 |     stop("Define a condition for the differential peak analysis\n")
1224 |   }
1225 |   
1226 |   sum_tbl = as.data.frame(summary_list$data)
1227 |   sum_tbl$rid = paste0("rid_", 1:nrow(sum_tbl))
1228 |   rownames(sum_tbl) = sum_tbl$rid
1229 |   coldata = as.data.frame(summary_list$colData)
1230 |   
1231 |   if(condition %in% colnames(coldata)){
1232 |     condition <- coldata[, condition]
1233 |   }else {
1234 |     print(coldata)
1235 |     stop("The condition argument must be a column name in coldata. See above for list of valid colnames")
1236 |   }
1237 |   
1238 |   exprs = sum_tbl[,coldata$bw_sample_names]
1239 |   if(log2){
1240 |     exprs = log2(x = exprs + 0.1)
1241 |   }
1242 |   #sum_tbl = sum_tbl[complete.cases(sum_tbl),, drop = FALSE]
1243 |   #sds_idx = order(apply(sum_tbl, 1, sd, na.rm = TRUE), decreasing = TRUE, na.last = TRUE)
1244 |   #sum_tbl = sum_tbl[sds_idx,,drop = FALSE]
1245 |   
1246 |   
1247 |   design <- model.matrix(~0 + as.factor(condition))
1248 |   colnames(design) <- as.character(levels(as.factor(condition)))
1249 |   
1250 |   if(is.null(num) & is.null(den)){
1251 |     contrast <- vector()
1252 |     for (a in 1:length(levels(as.factor(condition)))) {
1253 |       for (b in 1:length(levels(as.factor(condition)))) {
1254 |         if (a != b) {
1255 |           if (a < b) {
1256 |             contrast[length(contrast) + 1] <- paste(levels(as.factor(condition))[a],
1257 |                                                     levels(as.factor(condition))[b], sep = "-")
1258 |           }
1259 |         }
1260 |       }
1261 |     }
1262 |     message("Argument num and den are missing. Pefroming diffpeak analysis for below contrast:")
1263 |     print(contrast[1])
1264 |     num = unlist(data.table::tstrsplit(x = contrast[1], split = "-"))[1]
1265 |     den = unlist(data.table::tstrsplit(x = contrast[1], split = "-"))[2]
1266 |   }else{
1267 |     if(is.null(num) | is.null(den)){
1268 |       stop("Num and Den must be provided")
1269 |     }else{
1270 |       contrast = paste0(num, "-", den)
1271 |     }
1272 |   }
1273 |   
1274 |   fit <- limma::lmFit(object = exprs, design = design)
1275 |   
1276 |   cnt <- paste(colnames(design)[1], colnames(design)[2], sep = "-")
1277 |   cMat <- limma::makeContrasts(contrasts = contrast, levels = design)
1278 |   fit2 <- limma::contrasts.fit(fit, cMat)
1279 |   efit <- limma::eBayes(fit2)
1280 |   
1281 |   tt = limma::topTable(fit = efit, coef = 1, number = "all")
1282 |   tt = merge(sum_tbl, tt, by = "row.names")
1283 |   data.table::setDT(x = tt)
1284 |   tt$Row.names = tt$rid = NULL
1285 |   
1286 |   tt = tt[order(P.Value, decreasing = FALSE)]
1287 |   attr(tt, "contrast") = contrast
1288 |   tt
1289 | }
1290 | 
1291 | #' Volcano plot for limma results from differential peak Analysis
1292 | #' @details Takes output from \code{diffpeak} and draws volcano plot
1293 | #' @param res Output from \code{\link{diffpeak}}
1294 | #' @param fdr FDR threshold. Default 0.1
1295 | #' @param upcol color for up-regulated. Default "#d35400"
1296 | #' @param downcol color for up-regulated. Default "#1abc9c"
1297 | #' @param alpha Default 0.6
1298 | #' @param size Point size. Default 0.8
1299 | #' @export
1300 | #'
1301 | volcano_plot = function(res = NULL, fdr = 0.1, upcol = "#d35400", downcol = "#1abc9c", alpha = 0.6, size = 0.8){
1302 |   
1303 |   if(is.null(res)){
1304 |     stop("Missing input. Expecting output diffpeak()")
1305 |   }
1306 |   
1307 |   contrast = attr(res, "contrast")
1308 |   
1309 |   ylims = max(-log10(res$P.Value), na.rm = TRUE)
1310 |   xlims = range(res$logFC)
1311 |   
1312 |   res_sig = res[adj.P.Val < fdr]
1313 |   res_nonsig = res[!adj.P.Val < fdr]
1314 |   
1315 |   par(mar = c(3.5, 3.5, 3, 1))
1316 |   plot(NA, xlim = xlims, ylim = c(0, ylims), frame.plot = FALSE, axes = FALSE, xlab = NA, ylab = NA)
1317 |   grid()
1318 |   points(res_nonsig$logFC, y = -log10(res_nonsig$P.Value), pch = 19, col = adjustcolor("gray", alpha.f = alpha), cex = size)
1319 |   if(nrow(res_sig)>0){
1320 |     points(res_sig$logFC, y = -log10(res_sig$P.Value), pch = 19, col = ifelse(test = res_sig$logFC > 0, adjustcolor(col = upcol, alpha.f = alpha), adjustcolor(col = downcol, alpha.f = alpha)), cex = size)  
1321 |   }
1322 |   axis(side = 1, at = pretty(xlims))
1323 |   axis(side = 2, at = pretty(c(0, ylims)), las = 2)
1324 |   mtext(text = "Log Fold Change", side = 1, line = 2, col = "#34495e", font = 2)
1325 |   mtext(text = "-log10(P-Value)", side = 2, line = 2, col = "#34495e", font = 2)
1326 |   title(main = contrast, col.main = "#34495e")
1327 |   
1328 |   #Small hack to align legend properly
1329 |   leg_pos = which(abs(xlims) == max(abs(xlims)))
1330 |   leg_pos = ifelse(test = leg_pos == 1, yes = "bottomleft", no = "bottomright")
1331 |   
1332 |   legend(x = leg_pos, legend = c(paste0("Down [", nrow(res_sig[logFC < 0]), "]"), paste0("Up [", nrow(res_sig[logFC > 0]), "]")), 
1333 |          col = c(downcol, upcol), pch = 19, title = paste0("Diff. peaks [FDR<", fdr, "]"), adj = 0, bty = "n")
1334 | }
1335 | 
1336 | #' Draw a PCA plot
1337 | #' @param summary_list output from extract_summary
1338 | #' @param top Top most variable peaks to consider for PCA. Default 1000
1339 | #' @param log2 log transform data? Default FALSE. IF TRUE, adds a small positive value and log2 converts.
1340 | #' @param xpc Default PC1
1341 | #' @param ypc Default PC2
1342 | #' @param color_by a column name in \code{coldata} to color by. Default NULL.
1343 | #' @param pch_by a column name in \code{coldata} to pch by. Default NULL. 
1344 | #' @param color Manual colors for each level in `color_by` Default NULL. 
1345 | #' @param show_cree If TRUE draws a cree plot. Default TRUE
1346 | #' @param size Point size. Default 1
1347 | #' @param lab_size Font size for labels. Default 1
1348 | #' @param legend_size Default 1
1349 | #' @param legendpos Default topright
1350 | #' @param legendpos2 Default bottomright
1351 | #' @export
1352 | pca_plot = function(summary_list = NULL, top = 1000, log2 = FALSE, xpc = "PC1", ypc = "PC2", color_by = NULL, pch_by = NULL, color = NULL, 
1353 |                      show_cree = TRUE, lab_size = 1, size = 1, legend_size = 1, legendpos = "topright", legendpos2 = "bottomright"){
1354 |   
1355 |   
1356 |   if(is.null(summary_list)){
1357 |     stop("Missing input! Expecting output from extract_summary()")
1358 |   }
1359 |   
1360 |   sum_tbl = as.data.frame(summary_list$data)
1361 |   colData = summary_list$colData
1362 |   
1363 |   sum_tbl = sum_tbl[,colData$bw_sample_names]
1364 |   if(log2){
1365 |     sum_tbl = log2(x = sum_tbl + 0.1)
1366 |   }
1367 |   #sum_tbl = sum_tbl[complete.cases(sum_tbl),, drop = FALSE]
1368 |   sds_idx = order(apply(sum_tbl, 1, sd, na.rm = TRUE), decreasing = TRUE, na.last = TRUE)
1369 |   sum_tbl = sum_tbl[sds_idx,,drop = FALSE]
1370 |   
1371 |   
1372 |   if(is.null(color)){
1373 |     condition_colors = c("#2f4f4f", "#8b4513", "#228b22", "#00008b", "#ff0000", "#ffd700", "#7fff00", "#00ffff", "#ff00ff", "#6495ed", "#ffe4b5", "#ff69b4")
1374 |     # condition_colors = c("#A6CEE3FF", "#1F78B4FF", "#B2DF8AFF", "#33A02CFF", "#FB9A99FF", 
1375 |     #                      "#E31A1CFF", "#FDBF6FFF", "#FF7F00FF", "#CAB2D6FF", "#6A3D9AFF", 
1376 |     #                      "#FFFF99FF", "#9E0142FF", "#D53E4FFF", "#F46D43FF", "#000000FF", 
1377 |     #                      "#EE82EEFF", "#4169E1FF", "#7B7060FF", "#535C68FF")
1378 |   }else{
1379 |     condition_colors = color
1380 |   }
1381 |   
1382 |   condition = color_by
1383 |   pch = pch_by
1384 |   
1385 |   if(!is.null(condition)){
1386 |     group_df = cbind(sample = colData$bw_sample_names, condition = colData[,which(colnames(colData) == condition), with = FALSE])
1387 |     colnames(group_df) = c("sample", "condition")
1388 |     condition_colors = condition_colors[1:nrow(group_df[,.N,condition])]
1389 |     names(condition_colors) = group_df[,.N,condition][,condition]
1390 |     group_df$color = condition_colors[group_df$condition]
1391 |     condition_leg = TRUE
1392 |   }else{
1393 |     group_df = data.table::data.table(sample = colData$bw_sample_names)
1394 |     group_df$color = "black"
1395 |     condition_leg = FALSE
1396 |   }
1397 |   
1398 |   
1399 |   if(!is.null(pch)){
1400 |     pch_vec = 15:19
1401 |     pch_df = cbind(sample = colData$bw_sample_names, pch = colData[,which(colnames(colData) == pch), with = FALSE])
1402 |     colnames(pch_df) = c("sample", "pch")
1403 |     pchs = pch_vec[1:nrow(pch_df[,.N,pch])]
1404 |     names(pchs) = pch_df[,.N,pch][,pch]
1405 |     pch_df$pch = pchs[pch_df$pch]
1406 |     pch_leg = TRUE
1407 |   }else{
1408 |     pch_df = cbind(sample = colData$bw_sample_names, pch = 19)
1409 |     colnames(pch_df) = c("sample", "pch")
1410 |     pchs = NA
1411 |     pch_leg = FALSE
1412 |   }
1413 |   
1414 |   group_df = merge(group_df, pch_df, by = "sample", all = TRUE)
1415 |   
1416 |   if(nrow(sum_tbl) < top){
1417 |     pca = prcomp(t(sum_tbl))
1418 |   }else{
1419 |     pca = prcomp(t(sum_tbl[1:top,]))
1420 |   }
1421 |   
1422 |   pca_dat = as.data.frame(pca$x)
1423 |   pca_var_explained = pca$sdev^2/sum(pca$sdev^2)
1424 |   names(pca_var_explained) = paste0("PC", 1:length(pca_var_explained))
1425 |   data.table::setDT(x = pca_dat, keep.rownames = "sample")
1426 |   pca_dat = merge(pca_dat, group_df, by = 'sample')
1427 |   data.table::setDF(x = pca_dat)
1428 |   attr(pca_dat, "percentVar") <- round(pca_var_explained, digits = 3)
1429 |   #print(head(pca_dat))
1430 |   
1431 |   if(show_cree){
1432 |     lo = graphics::layout(mat = matrix(data = c(1, 2), ncol = 2))
1433 |   }
1434 |   
1435 |   grid_cols = "gray90"
1436 |   par(mar = c(3, 4, 2, 1))
1437 |   plot(NA, axes = FALSE, xlab = NA, ylab = NA, cex = 1.2, xlim = range(pretty(pca_dat[, xpc])), ylim = range(pretty(pca_dat[, ypc])))
1438 |   abline(h = pretty(pca_dat[, xpc]), v = pretty(pca_dat[, ypc]), col = grid_cols, lty = 2, lwd = 0.1)
1439 |   abline(h = 0, v = 0, col = grid_cols, lty = 2, lwd = 0.8)
1440 |   points(x = pca_dat[, xpc], y = pca_dat[, ypc], col = pca_dat$color, bg = pca_dat$color, pch = as.integer(pca_dat$pch), cex = size)
1441 |   axis(side = 1, at = pretty(pca_dat[, xpc]), cex.axis = 0.8)
1442 |   axis(side = 2, at = pretty(pca_dat[, ypc]), las = 2, cex.axis = 0.8)
1443 |   mtext(text = paste0(xpc, " [", round(pca_var_explained[xpc], digits = 2), "]"), side = 1, line = 2, cex = 0.8)
1444 |   mtext(text = paste0(ypc, " [", round(pca_var_explained[ypc], digits = 2), "]"), side = 2, line = 2, cex = 0.8)
1445 |   text(x = pca_dat[, xpc], y = pca_dat[, ypc], labels = pca_dat$sample, pos = 3, col = pca_dat$color, xpd = TRUE, cex = lab_size)
1446 |   #title(main = NA, sub = paste0("N = ", top, " peaks"), adj = 0, outer = TRUE)
1447 |   
1448 |   data.table::setDT(x = pca_dat)
1449 |   
1450 |   if(condition_leg){
1451 |     legend(x = legendpos, legend = names(condition_colors), col = condition_colors, 
1452 |            pch = 19, cex = legend_size, xpd = TRUE, ncol = 2, bty = "n")
1453 |   }
1454 |   
1455 |   if(pch_leg){
1456 |     legend(x = legendpos2, legend = names(pchs), col = "black", 
1457 |            pch = as.integer(pchs), cex = legend_size, xpd = TRUE, ncol = 1, bty = "n")
1458 |   }
1459 |   
1460 |   if(show_cree){
1461 |     par(mar = c(3, 4, 2, 4))
1462 |     b = barplot(height = pca_var_explained, names.arg = NA, col = "#2c3e50", ylim = c(0, 1), las = 2, axes = FALSE)
1463 |     points(x = b, y = cumsum(pca_var_explained), type = 'l', lty = 2, lwd = 1.2, xpd = TRUE, col = "#c0392b")
1464 |     points(x = b, y = cumsum(pca_var_explained), type = 'p', pch = 19, xpd = TRUE, col = "#c0392b")
1465 |     mtext(text = paste0("PC", 1:length(pca_var_explained)), side = 1, at = b, las = 2, line = 0.5, cex = 0.8)
1466 |     axis(side = 2, at = seq(0, 1, 0.1), line = 0, las = 2, cex.axis = 0.8)
1467 |     mtext(text = "var. explained", side = 2, line = 2.5)
1468 |     axis(side = 4, at = seq(0, 1, 0.1), line = 0, las = 2, cex.axis = 0.8)
1469 |     mtext(text = "cumulative var. explained", side = 4, line = 2.5)
1470 |   }
1471 |   
1472 |   invisible(x = pca_dat)
1473 | }
1474 | 
1475 | 
1476 | #' Parse peak annotations generated by homer annotatePeaks.pl
1477 | #' @details summarizes peak annotations generated with homer annotatePeaks.pl, generates a pie chart of peak distributions.
1478 | #'
1479 | #' @param anno Raw annotations generated by homer `annotatePeaks.pl`. Can be more than one file.
1480 | #' @param sample_names Sample names correspoding to each input file. Default parses from input file.
1481 | #' @param legend_font_size font size for legend. Default 1.
1482 | #' @param label_size font size for labels. Default 0.8.
1483 | #' @export
1484 | 
1485 | summarize_homer_annots = function(anno, sample_names = NULL, legend_font_size = 1, label_size = 0.8){
1486 |   
1487 |   homer = lapply(anno, function(homer.anno){
1488 |     homer.anno = data.table::fread(homer.anno)
1489 |     colnames(homer.anno)[1] = 'peakid'
1490 |     homer.anno$anno = sapply(strsplit(x = as.character(homer.anno$Annotation), split = ' (', fixed = T), '[[', 1)
1491 |     
1492 |     homer.anno = homer.anno[,.(Chr, Start, End, Strand, anno, `Gene Name`, `Gene Type`,`Distance to TSS`, `Nearest PromoterID`, `Nearest Ensembl`)]
1493 |     homer.anno = homer.anno[order(anno, `Gene Name`)]
1494 |     colnames(homer.anno) = c('Chr', 'Start', 'End', 'Strand', 'Annotation', 'Hugo_Symbol',
1495 |                              'Biotype', 'Distance_to_TSS', 'Nearest_PromoterID', 'Nearest_Ens')
1496 |     homer.anno$Annotation = gsub(pattern = "3' UTR", replacement = '3pUTR', x = homer.anno$Annotation)
1497 |     homer.anno$Annotation = gsub(pattern = "5' UTR", replacement = '5pUTR', x = homer.anno$Annotation)
1498 |     homer.anno
1499 |   })
1500 |   
1501 |   if(is.null(sample_names)){
1502 |     names(homer) = unlist(data.table::tstrsplit(x = basename(anno), split = "\\.", keep = 1))
1503 |   }else{
1504 |     names(homer) = sample_names
1505 |   }
1506 |   
1507 |   homer.anno.stats = lapply(homer, function(h){
1508 |     homer.anno.stats = h[,.N,Annotation][,fract := N/sum(N)][order(N, decreasing = TRUE)]
1509 |     homer.anno.stats[,leg := paste0(Annotation, " [", N, "]")]
1510 |     homer.anno.stats
1511 |   })
1512 |   npeaks = unlist(lapply(homer, nrow))
1513 |   
1514 |   homer.anno.stats = data.table::rbindlist(l = homer.anno.stats, idcol = "Sample")
1515 |   homer.anno.stats = data.table::dcast(data = homer.anno.stats, Annotation ~ Sample, value.var = "fract", fill = 0)
1516 |   data.table::setDF(x = homer.anno.stats, rownames = homer.anno.stats$Annotation)
1517 |   homer.anno.stats =homer.anno.stats[,-1, drop = FALSE]
1518 |   
1519 |   pie.cols = c('3pUTR' = '#E7298A', '5pUTR' = '#D95F02', 'Intergenic' = '#BEBADA',
1520 |                'TTS' = '#FB8072', 'exon' = '#80B1D3', 'intron' = '#FDB462',
1521 |                'non-coding' = '#FFFFB3',
1522 |                'NA' = 'gray70', 'promoter-TSS' = '#1B9E77')
1523 |   
1524 |   homer.anno.stats = homer.anno.stats[names(pie.cols)[names(pie.cols) %in% rownames(homer.anno.stats)],,drop = FALSE]
1525 |   
1526 |   pie.col = pie.cols[rownames(homer.anno.stats)]
1527 |   
1528 |   #graphics::layout(mat = matrix(data = c(1, 2), nrow = 2), heights = c(4, 1.25))
1529 |   par(mar = c(2, 4, 5, 3))
1530 |   b = barplot(height = as.matrix(homer.anno.stats), col = pie.col, horiz = TRUE,
1531 |               las = 2, axes = FALSE, names.arg = rep(NA, ncol(homer.anno.stats)), border = "#34495e")
1532 |   axis(side = 1, at = seq(0, 1, 0.25), font = 2, lwd = 2)
1533 |   mtext(text = npeaks[colnames(x = homer.anno.stats)], side = 4, at = b, las = 2, font = 4, adj = -0.2)
1534 |   mtext(text = colnames(homer.anno.stats), side = 2, at = b, las = 2, adj = 1, font = 2)
1535 |   
1536 |   #plot.new()
1537 |   #par(mar = c(1, 0, 1, 1))
1538 |   .add_legend("topleft", legend = names(pie.col), col = pie.col,  bty = "n", border=NA,
1539 |              xpd = TRUE, text.font = 2, pch = 15, xjust = 0, yjust = 0,
1540 |              cex = 0.8, y.intersp = 1.5, x.intersp = 1,
1541 |              pt.cex = 1.2 * 0.8, ncol = 3)
1542 |   
1543 |   homer
1544 | }
1545 | 
1546 | # if(plot){
1547 | #   homer.anno.stats = homer.anno[,.N,Annotation][,fract := round(N/sum(N), digits = 2)][order(N, decreasing = TRUE)]
1548 | #   homer.anno.stats[,leg := paste0(Annotation, " [", N, "]")]
1549 | #
1550 | #   pie.cols = c('3pUTR' = '#E7298A', '5pUTR' = '#D95F02', 'Intergenic' = '#BEBADA',
1551 | #                'TTS' = '#FB8072', 'exon' = '#80B1D3', 'intron' = '#FDB462', 'non-coding' = '#FFFFB3',
1552 | #                'NA' = 'gray70', 'promoter-TSS' = '#1B9E77')
1553 | #
1554 | #   par(bty="n", mgp = c(0.5,0.5,0), las=1, tcl=-.25, font.main=4,xpd=NA, mar=c(0,0,1,3))
1555 | #   pie(x = homer.anno.stats$fract, col = pie.cols[homer.anno.stats$Annotation], labels = homer.anno.stats$fract,
1556 | #       border = "white", radius = 0.8, init.angle = 0, font = 4, cex = label_size)
1557 | #   symbols(0,0,circles=.3, inches=FALSE, col="white", bg="white", lty=0, add=TRUE)
1558 | #   add_legend(x = "topright", bty = "n", legend = homer.anno.stats$leg,
1559 | #              col = pie.cols[homer.anno.stats$Annotation],
1560 | #              cex = legend_font_size, pch = 15, text.font = 4)
1561 | # }
1562 | 
1563 | #------------------------------------------------------------------------------------------------------------------------------------
1564 | #                                                   Undocumented Accessory functions
1565 | #------------------------------------------------------------------------------------------------------------------------------------
1566 | 
1567 | # Add legends outside the margin
1568 | .add_legend <- function(...) {
1569 |   opar <- par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0),
1570 |               mar=c(0, 0, 0, 0), new=TRUE)
1571 |   on.exit(par(opar))
1572 |   plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
1573 |   legend(...)
1574 | }
1575 | 
1576 | .make_layout = function(ntracks, ntracks_h = 3, cytoband = TRUE, cytoband_h = 1, genemodel = TRUE, genemodel_h = 1, chrHMM = TRUE, chrHMM_h = 1, loci = TRUE, loci_h = 2, scale_track_height = 1, lord = NULL){
1577 |   
1578 |   #track heights (peaks, bigwig tracks, chromHMM, gene model, cytoband, scale)
1579 |   lo_h_ord =  list("p" = loci_h, "b" = rep(ntracks_h, ntracks), "h" = chrHMM_h, "g" = genemodel_h, "c" = cytoband_h, "s" = scale_track_height)
1580 |   mat_ord = c("p" = 1, "b" = 2, "h" = 3, "g" = 4, "c" = 5, "s" = 6)
1581 |   
1582 |   case = NULL
1583 |   #print(paste(cytoband, genemodel, chrHMM, loci, sep = ", "))
1584 |   
1585 |   if(cytoband & genemodel & chrHMM & loci){
1586 |     #print("all")
1587 |     case = 1
1588 |     lo_h_ord = lo_h_ord[c("p", "b", "h", "g", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height, cytoband_h)
1589 |   }else if(cytoband & genemodel & chrHMM == FALSE & loci){
1590 |     #print("no hmm")
1591 |     case = 2
1592 |     lo_h_ord = lo_h_ord[c("p", "b", "g", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), genemodel_h, scale_track_height, cytoband_h)
1593 |   }else if(cytoband & genemodel & chrHMM == FALSE & loci == FALSE){
1594 |     #print("no hmm and no loci")
1595 |     case = 3
1596 |     lo_h_ord = lo_h_ord[c("b", "g", "s", "c")] #c(rep(ntracks_h, ntracks), genemodel_h, scale_track_height, cytoband_h)
1597 |   }else if(cytoband & genemodel == FALSE & chrHMM == FALSE & loci == FALSE){
1598 |     #print("no hmm and no loci and no genemodel")
1599 |     case = 4
1600 |     lo_h_ord = lo_h_ord[c("b", "s", "c")] #c(rep(ntracks_h, ntracks), scale_track_height, cytoband_h)
1601 |   }else if(cytoband == FALSE & genemodel == FALSE & chrHMM == FALSE & loci == FALSE){
1602 |     #print("no hmm and no loci and no genemodel and no cytoband")
1603 |     case = 5
1604 |     lo_h_ord = lo_h_ord[c("b", "s")] #c(rep(ntracks_h, ntracks), scale_track_height)
1605 |   }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == TRUE & loci == FALSE){
1606 |     #print("no loci and no cytoband")
1607 |     case = 6
1608 |     lo_h_ord = lo_h_ord[c("b", "h", "g", "s")] #c(rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height)
1609 |   }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == TRUE & loci == TRUE){
1610 |     #print("no cytoband")
1611 |     case = 7
1612 |     lo_h_ord = lo_h_ord[c("p", "b", "h", "g", "s")] #c(loci_h, rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height)
1613 |   }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == FALSE & loci == TRUE){
1614 |     #print("no cytoband no chrHMM")
1615 |     case = 8
1616 |     lo_h_ord = lo_h_ord[c("p", "b", "g", "s")] #c(loci_h, rep(ntracks_h, ntracks), genemodel_h, scale_track_height)
1617 |   }else if(cytoband == FALSE & genemodel == TRUE & chrHMM == FALSE & loci == FALSE){
1618 |     #print("no cytoband no chrHMM no loci")
1619 |     case = 9
1620 |     lo_h_ord = lo_h_ord[c("b", "g", "s")] #c(rep(ntracks_h, ntracks), genemodel_h, scale_track_height)
1621 |   }else if(cytoband == TRUE & genemodel == TRUE & chrHMM == TRUE & loci == FALSE){
1622 |     #print("no loci")
1623 |     case = 10
1624 |     lo_h_ord = lo_h_ord[c("b", "h", "g", "s", "c")] #c(rep(ntracks_h, ntracks), chrHMM_h, genemodel_h, scale_track_height, cytoband_h)
1625 |   }else if(cytoband == TRUE & genemodel == FALSE & chrHMM == FALSE & loci == TRUE){
1626 |     #print("no genemodel no chrhmm")
1627 |     case = 11
1628 |     lo_h_ord = lo_h_ord[c("p", "b", "s", "c")] #c(loci_h, rep(ntracks_h, ntracks), scale_track_height, cytoband_h)
1629 |   }else{
1630 |     #print("Something is wrong!")
1631 |   }
1632 |   
1633 |   lord = c(lord, "s")
1634 |   lord = c(intersect(lord, names(lo_h_ord)), setdiff(names(lo_h_ord), lord))
1635 |   lo_heights = unlist(lo_h_ord[lord], use.names = FALSE)
1636 |   lo = lo_h_ord[lord]
1637 |   
1638 |   #Order in which plots are drawn by default
1639 |   plot_ord = data.frame(row.names = c("p", "b", "h", "g", "s", "c"), name = c("p", "b", "h", "g", "s", "c"), ord = 1:6)
1640 |   plot_ord = plot_ord[names(lo),,drop = FALSE]
1641 |   plot_ord$ord_req = 1:nrow(plot_ord) #Required order
1642 |   plot_ord = plot_ord[order(plot_ord$ord),]
1643 |   
1644 |   ### Re-organize the layout by user specification
1645 |   dt = data.table::data.table(name = plot_ord$name, ord = plot_ord$ord, ord_req = plot_ord$ord_req)
1646 |   
1647 |   dt$n_tracks = ifelse(test = dt$name == 'b', yes = ntracks, no = 1)
1648 |   #print(dt)
1649 |   dt_s = split(dt, dt$n_tracks)
1650 |   #print(dt_s)
1651 |   if(length(dt_s) > 1){
1652 |     dt_mult = dt_s[[2]]
1653 |     dt_sing = dt_s[[1]]
1654 |     dt_sing[,n_tracks := 1]
1655 |     dt_sing$ord_req2 = dt_sing$ord_req
1656 |     dt_mult = data.table::data.table(name = rep(dt_mult$name, dt_mult$n_tracks), ord = dt_mult$ord, ord_req = dt_mult$ord_req, ord_req2 = seq(dt_mult$ord_req, dt_mult$ord_req + dt_mult$n_tracks -1 , 1), n_tracks = dt_mult$n_tracks)
1657 |     
1658 |     dt = data.table::rbindlist(l = list(dt_sing, dt_mult), use.names = TRUE, fill = TRUE)
1659 |     dt = dt[order(ord_req)]
1660 |     
1661 |     dt = split(dt, dt$ord) |> data.table::rbindlist()
1662 |     dt[, ord_req4 := 1:nrow(dt)]
1663 |     dt = dt[order(ord_req)]
1664 |     data = dt$ord_req4
1665 |   }else{
1666 |     #ifelse(test = dt$n_tracks > 1, )
1667 |     #print(dt)
1668 |     dt = data.table::data.table(name = rep(dt$name, dt$n_tracks), ord = dt$ord, ord_req = dt$ord_req, ord_req2 = dt$ord_req, n_tracks = dt$n_tracks)
1669 |     #print(dt)
1670 |     
1671 |     dt = split(dt, dt$ord) |> data.table::rbindlist()
1672 |     dt[, ord_req4 := 1:nrow(dt)]
1673 |     dt = dt[order(ord_req)]
1674 |     data = dt$ord_req
1675 |   }
1676 | 
1677 |   if(case == 1){
1678 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1679 |   }else if(case == 2){
1680 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1681 |   }else if(case == 3){
1682 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1683 |   }else if(case == 4){
1684 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1685 |   }else if(case == 5){
1686 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1687 |   }else if(case == 6){
1688 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1689 |   }else if(case == 7){
1690 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1691 |   }else if(case == 8){
1692 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1693 |   }else if(case == 9){
1694 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1695 |   }else if(case == 10){
1696 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1697 |   }else if(case == 11){
1698 |     lo = graphics::layout(mat = matrix(data = data), heights = lo_heights)
1699 |   }
1700 |   
1701 |   lo_h_ord[lord]
1702 | }
1703 | 
1704 | .gen_windows = function(chr = NA, start, end, window_size = 50, op_dir = getwd()){
1705 |   #chr = "chr19"; start = 15348301; end = 15391262; window_size = 50; op_dir = getwd()
1706 |   message(paste0("Generating windows ", "[", window_size, " bp window size]"))
1707 |   
1708 |   window_dat = data.table::data.table()
1709 |   #temp = start;
1710 |   while(start <= end){
1711 |     window_dat = data.table::rbindlist(l = list(window_dat, data.table::data.table(start, end = start + window_size)), fill = TRUE)
1712 |     start = start + window_size
1713 |   }
1714 |   window_dat$chr = chr
1715 |   window_dat = window_dat[, .(chr, start, end)]
1716 |   
1717 |   print(window_dat)
1718 |   
1719 |   op_dir = paste0(op_dir, "/")
1720 |   
1721 |   if(!dir.exists(paths = op_dir)){
1722 |     dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE)
1723 |   }
1724 |   
1725 |   temp_op_bed = tempfile(pattern = "trackr", tmpdir = op_dir, fileext = ".bed")
1726 |   data.table::fwrite(x = window_dat, file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
1727 |   temp_op_bed
1728 | }
1729 | 
1730 | 
1731 | .get_summaries = function(bedSimple, bigWigs, op_dir = getwd(), nthreads = 1){
1732 |   #bedSimple = temp_op_bed; bigWigs = list.files(path = "./", pattern = "bw"); op_dir = getwd(); nthreads = 1
1733 |   op_dir = paste0(op_dir, "/")
1734 |   
1735 |   if(!dir.exists(paths = op_dir)){
1736 |     dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE)
1737 |   }
1738 |   
1739 |   message(paste0("Extracting signals"))
1740 |   
1741 |   summaries = parallel::mclapply(bigWigs, FUN = function(bw){
1742 |     bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "", x = basename(bw))
1743 |     message(paste0("    Processing ", bn, " .."))
1744 |     cmd = paste("bwtool summary -with-sum -keep-bed -header", bedSimple, bw, paste0(op_dir, bn, ".summary"))
1745 |     system(command = cmd, intern = TRUE)
1746 |     paste0(op_dir, bn, ".summary")
1747 |   }, mc.cores = nthreads)
1748 |   
1749 |   summary_list = lapply(summaries, function(x){
1750 |     x = data.table::fread(x)
1751 |     colnames(x)[1] = 'chromosome'
1752 |     x = x[,.(chromosome, start, end, size, max)]
1753 |     if(all(is.na(x[,max]))){
1754 |       message("No signal! Possible cause: chromosome name mismatch between bigWigs and queried loci.")
1755 |       x[, max := 0]
1756 |     }
1757 |     x
1758 |   })
1759 |   
1760 |   
1761 |   #Remove intermediate files
1762 |   lapply(summaries, function(x) system(command = paste0("rm ", x), intern = TRUE))
1763 |   system(command = paste0("rm ", bedSimple), intern = TRUE)
1764 |   
1765 |   names(summary_list) = gsub(pattern = "*\\.summary$", replacement = "", x = basename(path = unlist(summaries)))
1766 |   summary_list
1767 | }
1768 | 
1769 | 
1770 | .extract_geneModel = function(ucsc_tbl = NULL, chr = NULL, start = NULL, end = NULL, txname = txname, genename = genename){
1771 |   
1772 |   message("Parsing UCSC file..")
1773 |   if(is(object = ucsc_tbl, class2 = "data.frame")){
1774 |     ucsc = data.table::as.data.table(ucsc_tbl)
1775 |   }else if(file.exists(ucsc_tbl)){
1776 |     ucsc = data.table::fread(file = ucsc_tbl)  
1777 |   }else{
1778 |     stop("Gene model must be a file or a data.frame")
1779 |   }
1780 |   
1781 |   query = data.table::data.table(chr, start, end)
1782 |   query[,chr := as.character(chr)]
1783 |   query[,start := as.numeric(as.character(start))]
1784 |   query[,end := as.numeric(as.character(end))]
1785 |   data.table::setkey(x = query, chr, start, end)
1786 |   
1787 |   colnames(ucsc)[c(3, 5, 6)] = c("chr", "start", "end")
1788 |   data.table::setkey(x = ucsc, chr, start, end)
1789 |   
1790 |   gene_models = data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)
1791 |   
1792 |   if(nrow(gene_models) == 0){
1793 |     message("No features found within the requested loci! If you are not sure why..\n    1.Make sure there are no discripancies in chromosome names i.e, chr prefixes\n")
1794 |     return(NULL)  
1795 |   }else{
1796 |     if(!is.null(txname)){
1797 |       gene_models = gene_models[name %in% txname]
1798 |     }
1799 |     
1800 |     if(!is.null(genename)){
1801 |       gene_models = gene_models[name2 %in% genename]
1802 |     }
1803 |     
1804 |     exon_tbls = lapply(seq_along(along.with = 1:nrow(gene_models)), function(idx){
1805 |       exon_start = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonStarts], split = ",")))
1806 |       exon_end = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonEnds], split = ",")))
1807 |       exon_tbl = data.frame(start = exon_start, end = exon_end)
1808 |       attributes(exon_tbl) = list(start = gene_models[idx, start], end = gene_models[idx, end], strand = gene_models[idx, strand], tx = gene_models[idx, name], gene = gene_models[idx, name2])
1809 |       exon_tbl
1810 |     })
1811 |     
1812 |     return(exon_tbls)
1813 |   }
1814 | }
1815 | 
1816 | .extract_cytoband = function(chr = NULL, refBuild = "hg19", tblName = "cytoBand"){
1817 |   
1818 |   if(!grepl(pattern = "^chr", x = chr)){
1819 |     message("Adding chr prefix to target chromosome for UCSC query..")
1820 |     chr = paste0("chr", chr)
1821 |   }
1822 |   
1823 |   cmd = paste0(
1824 |     "mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ",
1825 |     refBuild,
1826 |     " -e 'select chrom, chromStart, chromEnd, name, gieStain from ",  tblName, " WHERE chrom =\"",
1827 |     chr,
1828 |     "\"'"
1829 |   )
1830 |   message(paste0("Extracting cytobands from UCSC:\n", "    chromosome: ", chr, "\n", "    build: ", refBuild, "\n    query: ", cmd))
1831 |   
1832 |   cyto = data.table::fread(cmd = cmd, colClasses = c("character", "numeric", "numeric", "character", "character"))
1833 |   colnames(cyto) = c("chr", "start", "end", "band", "stain")
1834 |   data.table::setkey(x = cyto, chr, start, end)
1835 |   
1836 |   #Color codes from https://github.com/jianhong/trackViewer (Thank you..)
1837 |   ### gieStain #############################
1838 |   # #FFFFFF - gneg    - Giemsa negative bands
1839 |   # #999999 - gpos25  - Giemsa positive bands
1840 |   # #666666 - gpos50  - Giemsa positive bands
1841 |   # #333333 - gpos75  - Giemsa positive bands
1842 |   # #000000 - gpos100 - Giemsa positive bands
1843 |   # #660033 - acen    - centromeric regions
1844 |   # #660099 - gvar    - variable length heterochromatic regions
1845 |   # #6600cc - stalk   - tightly constricted regions on the short arms of
1846 |   #                     the acrocentric chromosomes
1847 |   colorSheme = c(
1848 |     "gneg"    = "#FFFFFF",
1849 |     "acen"    = "#660033",
1850 |     "gvar"    = "#660099",
1851 |     "stalk"   = "#6600CC"
1852 |   )
1853 |   gposCols <- sapply(1:100, function(i){
1854 |     i <- as.hexmode(round(256-i*2.56, digits = 0))
1855 |     i <- toupper(as.character(i))
1856 |     if(nchar(i)==1) i <- paste0("0", i)
1857 |     return(paste0("#", i, i, i))
1858 |   })
1859 |   names(gposCols) <- paste0("gpos", 1:100)
1860 |   colorSheme <- c(gposCols, colorSheme)
1861 |   cyto$color = colorSheme[cyto[,stain]]
1862 |   cyto
1863 |   
1864 |   cyto
1865 | }
1866 | 
1867 | 
1868 | .load_chromHMM = function(chr, start, end, ucsc){
1869 |   
1870 |   if(nrow(ucsc) == 0){
1871 |     message("No features found within the requested loci!")
1872 |     return(NULL)
1873 |   }
1874 |   colnames(ucsc)[1:4] = c("chr", "start", "end", "name")
1875 |   if(!grepl(pattern = "^chr", x = chr)){
1876 |     ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)]
1877 |   }
1878 |   data.table::setkey(x = ucsc, chr, start, end)
1879 |   
1880 |   query = data.table::data.table(chr = chr, start = start, end = end)
1881 |   data.table::setkey(x = query, chr, start, end)
1882 |   
1883 |   data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)[,.(chr, start, end, name)]
1884 | }
1885 | 
1886 | .extract_chromHmm_ucsc = function(chr, start, end, refBuild = "hg38", tbl){
1887 |   
1888 |   if(!grepl(pattern = "^chr", x = chr)){
1889 |     message("Adding chr prefix to target chromosome for UCSC query..")
1890 |     tar_chr = paste0("chr", chr)
1891 |   }else{
1892 |     tar_chr = chr
1893 |   }
1894 |   
1895 |   ucsc_tbls = .get_ucsc_hmm_tbls()
1896 |   tbl = match.arg(arg = tbl, choices = ucsc_tbls$TableName)
1897 |   
1898 |   .check_mysql()
1899 |   
1900 |   cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ",  refBuild,  " -e 'select chrom, chromStart, chromEnd, name from ", tbl, " WHERE chrom =\"", tar_chr, "\"'")
1901 |   message(paste0("Extracting chromHMM from UCSC:\n", "    chromosome: ", tar_chr, "\n", "    build: ", refBuild, "\n    query: ", cmd))
1902 |   #system(command = cmd)
1903 |   ucsc = data.table::fread(cmd = cmd)
1904 |   if(nrow(ucsc) == 0){
1905 |     message("No features found within the requested loci!")
1906 |     return(NULL)
1907 |   }
1908 |   colnames(ucsc) = c("chr", "start", "end", "name")
1909 |   if(!grepl(pattern = "^chr", x = chr)){
1910 |     ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)]
1911 |   }
1912 |   data.table::setkey(x = ucsc, chr, start, end)
1913 |   
1914 |   query = data.table::data.table(chr = chr, start = start, end = end)
1915 |   data.table::setkey(x = query, chr, start, end)
1916 |   
1917 |   data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)[,.(chr, start, end, name)]
1918 | }
1919 | 
1920 | .extract_geneModel_ucsc_bySymbol = function(genesymbol, refBuild){
1921 |   .check_mysql()
1922 |   op_file = tempfile(pattern = "ucsc", fileext = ".tsv")
1923 |   
1924 |   cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ",  refBuild,  " -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE name2 =\"", genesymbol, "\"'")
1925 |   message(paste0("Extracting gene models from UCSC:\n", "    Gene: ", genesymbol, "\n", "    build: ", refBuild, "\n    query: ", cmd))
1926 |   
1927 |   ucsc = data.table::fread(cmd = cmd, sep = "\t")
1928 |   if(nrow(ucsc) == 0){
1929 |     message("No features found within the requested loci!")
1930 |     return(NULL)
1931 |   }
1932 |   
1933 |   colnames(ucsc) = c("chr", "start", "end", "strand", "name", "name2", "exonStarts", "exonEnds")
1934 |   data.table::setkey(x = ucsc, chr, start, end)
1935 |   ucsc
1936 | }
1937 | 
1938 | .extract_geneModel_ucsc = function(chr, start = NULL, end = NULL, refBuild = "hg19", txname = NULL, genename = NULL){
1939 |   .check_mysql()
1940 |   op_file = tempfile(pattern = "ucsc", fileext = ".tsv")
1941 |   
1942 |   if(!grepl(pattern = "^chr", x = chr)){
1943 |     message("Adding chr prefix to target chromosome for UCSC query..")
1944 |     tar_chr = paste0("chr", chr)
1945 |   }else{
1946 |     tar_chr = chr
1947 |   }
1948 |   
1949 |   cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ",  refBuild,  " -e 'select chrom, txStart, txEnd, strand, name, name2, exonStarts, exonEnds from refGene WHERE chrom =\"", tar_chr, "\"'")
1950 |   message(paste0("Extracting gene models from UCSC:\n", "    chromosome: ", tar_chr, "\n", "    build: ", refBuild, "\n    query: ", cmd))
1951 |   #system(command = cmd)
1952 |   ucsc = data.table::fread(cmd = cmd)
1953 |   if(nrow(ucsc) == 0){
1954 |     message("No features found within the requested loci!")
1955 |     return(NULL)
1956 |   }
1957 |   colnames(ucsc) = c("chr", "start", "end", "strand", "name", "name2", "exonStarts", "exonEnds")
1958 |   if(!grepl(pattern = "^chr", x = chr)){
1959 |     ucsc[, chr := gsub(pattern = "^chr", replacement = "", x = chr)]
1960 |   }
1961 |   data.table::setkey(x = ucsc, chr, start, end)
1962 |   
1963 |   query = data.table::data.table(chr = chr, start = start, end = end)
1964 |   data.table::setkey(x = query, chr, start, end)
1965 |   
1966 |   gene_models = data.table::foverlaps(x = query, y = ucsc, type = "any", nomatch = NULL)
1967 |   
1968 |   if(nrow(gene_models) == 0){
1969 |     message("No features found within the requested loci!")
1970 |     return(NULL) 
1971 |   }else{
1972 |     return(gene_models)
1973 |   }
1974 | }
1975 | 
1976 | .make_exon_tbl = function(gene_models, txname = NULL, genename = NULL){
1977 |   if(!is.null(txname)){
1978 |     gene_models = gene_models[name %in% txname]
1979 |   }
1980 |   
1981 |   if(nrow(gene_models) == 0){
1982 |     message("    Requested transcript ", txname, " does not exist within the queried region!\n    Skipping gene track plotting..")
1983 |     return(NULL)
1984 |   }
1985 |   
1986 |   if(!is.null(genename)){
1987 |     gene_models = gene_models[name2 %in% genename]
1988 |   }
1989 |   
1990 |   if(nrow(gene_models) == 0){
1991 |     message("    Requested gene ", genename, " does not exist within the queried region!\n    Skipping gene track plotting..")
1992 |     return(NULL)
1993 |   }
1994 |   
1995 |   exon_tbls = lapply(seq_along(along.with = 1:nrow(gene_models)), function(idx){
1996 |     exon_start = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonStarts], split = ",")))
1997 |     exon_end = as.numeric(unlist(data.table::tstrsplit(x = gene_models[idx, exonEnds], split = ",")))
1998 |     exon_tbl = data.frame(start = exon_start, end = exon_end)
1999 |     attributes(exon_tbl) = list(start = gene_models[idx, start], end = gene_models[idx, end], strand = gene_models[idx, strand], tx = gene_models[idx, name], gene = gene_models[idx, name2])
2000 |     exon_tbl
2001 |   })
2002 |   
2003 |   return(exon_tbls)
2004 | }
2005 | 
2006 | .get_ucsc_hmm_states_cols = function(){
2007 |   states = c("red", "red4", "purple", "orange", "orange", "yellow", "yellow", 
2008 |              "blue", "darkgreen", "darkgreen", "lightgreen", "gray", "gray90", 
2009 |              "gray90", "gray90")
2010 |   names(states) = 1:15
2011 |   states
2012 | }
2013 | 
2014 | .plot_ucsc_chrHmm = function(d, start = NULL, end = NULL, hmm_cols = NULL){
2015 |   #hmm_cols = .get_ucsc_hmm_states_cols()
2016 |   
2017 |   if(is.null(start)){
2018 |     start = min(data.table::rbindlist(l = d, use.names = TRUE, fill = TRUE)[,start], na.rm = TRUE)  
2019 |   }
2020 |   
2021 |   if(is.null(end)){
2022 |     end = max(data.table::rbindlist(l = d, use.names = TRUE, fill = TRUE)[,end], na.rm = TRUE)
2023 |   }
2024 |   
2025 |   plot(NA, ylim = c(0, length(d)), xlim = c(start, end), axes = FALSE, xlab = NA, ylab = NA)
2026 |   lapply(seq_along(d), function(i){
2027 |     di = d[[i]]
2028 |     di$state = unlist(data.table::tstrsplit(x = di$name, split = "_", keep = 1))
2029 |     rect(xleft = di$start, ybottom = i-0.9, xright = di$end, ytop = i-0.1, col = hmm_cols[di$state], border = NA)
2030 |     di_name = gsub(pattern = "wgEncodeBroadHmm|HMM", replacement = "", x = names(d)[i])
2031 |     text(x = start, y = i - 0.5, labels = di_name, adj = 1.2, xpd = TRUE)
2032 |     #mtext(text = di_name, side = 2, line = 1, outer = TRUE, cex = 1)
2033 |   })
2034 |   
2035 | }
2036 | 
2037 | .collapse_tx = function(exon_tbls){
2038 |   message("Collapsing transcripts..")
2039 |   tx_tbl = lapply(exon_tbls, function(x){
2040 |     xdt = data.table::data.table(start = x[[1]], end = x[[2]])
2041 |     xdt$tx = attr(x = x, which = "tx")
2042 |     xdt$gene = attr(x = x, which = "gene")
2043 |     xdt$strand = attr(x = x, which = "strand")
2044 |     xdt$tx_start = attr(x = x, which = "start")
2045 |     xdt$tx_end = attr(x = x, which = "end")
2046 |     xdt
2047 |   })
2048 |   tx_tbl = data.table::rbindlist(l = tx_tbl)
2049 |   tx_tbl[,id := paste0(start, ":", end)]
2050 |   tx_tbl = tx_tbl[!duplicated(id)]
2051 |   
2052 |   exon_tbls = lapply(split(tx_tbl, as.factor(as.character(tx_tbl$gene))), function(x){
2053 |     x = x[order(start)]
2054 |     exon_start = as.numeric(x[,start])
2055 |     exon_end = as.numeric(x[,end])
2056 |     gene_tbl = data.frame(start = exon_start, end = exon_end)
2057 |     attributes(gene_tbl) = list(start = min(x[, tx_start]), end = max(x[, tx_end]), strand = unique(x[, strand]), tx = NA, gene = unique(x[, gene]))
2058 |     gene_tbl
2059 |   })
2060 |   
2061 |   exon_tbls
2062 | }
2063 | 
2064 | .parse_gtf = function(gtf = NULL, chr, start = NULL, end = NULL, refBuild = "hg19", txname = NULL, genename = NULL){
2065 |   message("Parsing gtf file..")
2066 |   if(is(object = gtf, class2 = "data.frame")){
2067 |     gtf = data.table::as.data.table(gtf)
2068 |   }else if(file.exists(gtf)){
2069 |     gtf = data.table::fread(file = gtf)  
2070 |   }else{
2071 |     stop("Gene model must be a file or a data.frame")
2072 |   }
2073 |   
2074 |   colnames(gtf) = c("chr", "source", "feature", "start", "end", "ph", "strand", "ph2", "info")
2075 |   gtf[,chr := as.character(chr)]
2076 |   gtf[,start :=as.numeric(as.character(start))]
2077 |   gtf[,end := as.numeric(as.character(end))]
2078 |   data.table::setkey(x = gtf, chr, start, end)
2079 |   
2080 |   if(!is.null(genename)){
2081 |     gene_models = gtf[info %like% genename]
2082 |     if(nrow(gene_models) == 0){
2083 |       message("No features found for the gene ", genename)
2084 |       return(NULL)  
2085 |     }
2086 |   }else{
2087 |     query = data.table::data.table(chr, start, end)
2088 |     data.table::setkey(x = query, chr, start, end)
2089 |     gene_models = data.table::foverlaps(x = query, y = gtf, type = "any", nomatch = NULL)
2090 |     if(nrow(gene_models) == 0){
2091 |       message("No features found within the requested loci!")
2092 |       return(NULL)  
2093 |     }
2094 |   }
2095 |   
2096 |   
2097 |   gene_models_exon = gene_models[feature %in% c("exon", "transcript")]
2098 |   #gene_models_rest = gene_models[!feature %in% "exon"]
2099 |   
2100 |   feature_ids = data.table::tstrsplit(x = gene_models_exon$info, split = "; ")
2101 |   feature_id_names = lapply(feature_ids, function(x){
2102 |     #x = x[1:50] #sample rows
2103 |     x = unique(unlist(data.table::tstrsplit(x = x, split = " ", keep = 1)))
2104 |     x = x[complete.cases(x)]
2105 |     x[1]
2106 |   })
2107 |   names(feature_ids) = feature_id_names
2108 |   req_fields = c("gene_id", "transcript_id")
2109 |   req_fields = req_fields[req_fields %in% unlist(feature_id_names)]
2110 |   feature_ids = feature_ids[req_fields]
2111 |   
2112 |   feature_ids = sapply(feature_ids, function(x){
2113 |     gsub(pattern = "\"|;", replacement = "", x = unlist(data.table::tstrsplit(x = x, split = " ", keep = 2)))
2114 |   })
2115 |   feature_ids = data.frame(feature_ids)
2116 |   colnames(feature_ids) = c("name2", "name")
2117 |   gene_models_exon = cbind(gene_models_exon, feature_ids)
2118 |   #gene_models = data.table::rbindlist(list(gene_models_rest, gene_models_exon), use.names = TRUE, fill = TRUE)
2119 |   gene_models = gene_models_exon[order(as.numeric(as.character(start)))]
2120 |   #gene_models[,.(chr,start,end,strand, tx, gene)]
2121 |   
2122 |   if(nrow(gene_models) == 0){
2123 |     warning("No features found within the requested loci!")
2124 |     return(NULL)
2125 |   }else{
2126 |     if(!is.null(txname)){
2127 |       gene_models = gene_models[name %in% txname]
2128 |     }
2129 |     
2130 |     if(!is.null(genename)){
2131 |       gene_models = gene_models[name2 %in% genename]
2132 |     }
2133 |     
2134 |     if(nrow(gene_models) == 0){
2135 |       warning("Requested gene or transcript could not be found within the requested loci!")
2136 |       return(NULL)
2137 |     }
2138 |     
2139 |     gene_models = split(gene_models, as.factor(as.character(gene_models$name)))
2140 |     
2141 |     exon_tbls = lapply(seq_along(along.with = 1:length(gene_models)), function(idx){
2142 |       x = gene_models[[idx]]
2143 |       exon_start = as.numeric(as.character(x[feature %in% "exon"][, start]))
2144 |       exon_end = as.numeric(as.character(x[feature %in% "exon"][, end]))
2145 |       exon_tbl = data.frame(start = exon_start, end = exon_end)
2146 |       attributes(exon_tbl) = list(start = min(x[,start], na.rm = TRUE), end = max(x[,end], na.rm = TRUE), strand = unique(x[,strand]), tx = unique(x[, name]), gene = unique(x[, name2]), chr = unique(x[,chr]))
2147 |       exon_tbl
2148 |     })
2149 |   }
2150 |   exon_tbls
2151 | }
2152 | 
2153 | .make_genome_bed = function(refBuild = "hg19", tss = "start", up = 2500, down = 2500, op_dir = tempdir(), pc_genes = FALSE, for_profile = TRUE){
2154 |   if(!dir.exists(paths = op_dir)){
2155 |     dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE)
2156 |   }
2157 |   
2158 |   tss = match.arg(arg = tss, choices = c("start", "end"))
2159 |   
2160 |   temp_op_bed = tempfile(pattern = "profileplot_ucsc", tmpdir = op_dir, fileext = ".bed")
2161 |   
2162 |   cmd = paste0("mysql --user genome --host genome-mysql.soe.ucsc.edu -NAD ",  refBuild,  " -e 'select chrom, txStart, txEnd, strand, name, name2 from refGene'")
2163 |   message(paste0("Extracting gene models from UCSC:\n", "    build: ", refBuild, "\n    query: ", cmd))
2164 |   #system(command = cmd)
2165 |   ucsc = data.table::fread(cmd = cmd)
2166 |   colnames(ucsc) = c("chr", "start", "end", "strand", "tx_id", "gene_id")
2167 |   
2168 |   main_contigs = paste0("chr", c(1:22, "X", "Y"))
2169 |   ucsc = ucsc[chr %in% main_contigs]
2170 |   
2171 |   if(pc_genes){
2172 |     ucsc = ucsc[tx_id %like% "^NM"]
2173 |   }
2174 |   
2175 |   message("Fetched ", nrow(ucsc), " transcripts from ", nrow(ucsc[,.N,.(chr)]), " contigs")
2176 |   
2177 |   if(for_profile){
2178 |     #If it is only for profile plot where tss or tes are extended, we donyt extend manually here. Instead invert tss for negative strand txs. Let bwtool do the hard work
2179 |     ucsc_minus = ucsc[strand %in% "-"]
2180 |     colnames(ucsc_minus) = c("chr", "end", "start", "strand", "tx_id", "gene_id")
2181 |     ucsc_plus = ucsc[strand %in% "+"]
2182 |     ucsc_bed = data.table::rbindlist(l = list(ucsc_plus, ucsc_minus), use.names = TRUE, fill = TRUE)[,.(chr, start, end, tx_id, gene_id)]
2183 |   }else{
2184 |     #If for summary, extrend the regions
2185 |     ucsc_minus = ucsc[strand %in% "-"]
2186 |     if(nrow(ucsc_minus) > 0){
2187 |       if(tss == "start"){
2188 |         ucsc_minus[, bed_start := end-up]
2189 |         ucsc_minus[, bed_end := end+down]
2190 |       }else{
2191 |         ucsc_minus[, bed_start := start-up]
2192 |         ucsc_minus[, bed_end := start+down]
2193 |       }
2194 |     }
2195 |     
2196 |     ucsc_plus = ucsc[strand %in% "+"]
2197 |     if(nrow(ucsc_plus) > 0){
2198 |       if(tss == "start"){
2199 |         ucsc_plus[, bed_start := start-up]
2200 |         ucsc_plus[, bed_end := start+down]
2201 |       }else{
2202 |         ucsc_plus[, bed_start := end-up]
2203 |         ucsc_plus[, bed_end := end+down]
2204 |       }
2205 |     }
2206 |     
2207 |     ucsc_bed = data.table::rbindlist(
2208 |       l = list(ucsc_plus[, .(chr, bed_start, bed_end, tx_id, gene_id)], ucsc_minus[, .(chr, bed_start, bed_end, tx_id, gene_id)]),
2209 |       use.names = TRUE,
2210 |       fill = TRUE
2211 |     )
2212 |   }
2213 |   
2214 |   colnames(ucsc_bed) = c("chr", "start", "end", "tx", "gene")
2215 |   data.table::setkey(x = ucsc_bed, chr, start, end)
2216 |   
2217 |   data.table::fwrite(x = ucsc_bed[,1:3], file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
2218 |   data.table::fwrite(x = ucsc_bed, file = paste0(temp_op_bed, "2"), sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
2219 |   
2220 |   list(temp_op_bed, paste0(temp_op_bed, "2"))
2221 | }
2222 | 
2223 | .make_bed = function(bed, op_dir = tempdir(), up = 2500, down = 2500, tss = "center", for_profile = FALSE){
2224 |   #bwtool tool requires only three columns
2225 |   
2226 |   tss = match.arg(arg = tss, choices = c("start", "end", "center"))
2227 |   
2228 |   if(!dir.exists(paths = op_dir)){
2229 |     dir.create(path = op_dir, showWarnings = FALSE, recursive = TRUE)
2230 |   }
2231 |   
2232 |   temp_op_bed = tempfile(pattern = "profileplot", tmpdir = op_dir, fileext = ".bed")
2233 |   
2234 |   if(is.data.frame(bed)){
2235 |     bed = data.table::as.data.table(x = bed)
2236 |     #data.table::setDT(x = bed)
2237 |     colnames(bed)[1:3] = c("chr", "start", "end")
2238 |     bed[, chr := as.character(chr)]
2239 |     bed[, start := as.numeric(as.character(start))]
2240 |     bed[, end := as.numeric(as.character(end))]
2241 |   }else if(file.exists(bed)){
2242 |     bed = data.table::fread(file = bed, select = list(character = 1, numeric = c(2, 3)), col.names = c("chr", "start", "end"))
2243 |     bed = bed[,.(chr, start, end)]
2244 |   }
2245 |   
2246 |   if(!for_profile){
2247 |     if(tss == "center"){
2248 |       bed[, focal_point := as.integer(apply(bed[,2:3], 1, mean))]
2249 |       bed[, bed_start := focal_point-up]
2250 |       bed[, bed_end := focal_point+down]
2251 |     }else if(tss == "start"){
2252 |       bed[, bed_start := start-up]
2253 |       bed[, bed_end := start+down]
2254 |     }else{
2255 |       bed[, bed_start := end-up]
2256 |       bed[, bed_end := end+down]
2257 |     }
2258 |     bed = bed[,.(chr, bed_start, bed_end)]
2259 |     data.table::setkey(x = bed, chr, bed_start, bed_end)
2260 |   }
2261 |   
2262 |   data.table::fwrite(x = bed, file = temp_op_bed, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
2263 |   
2264 |   return(temp_op_bed)
2265 | }
2266 | 
2267 | .bwt_mats = function(bw, binSize, bed, size, startFrom, op_dir){
2268 |   
2269 |   bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "",
2270 |             x = basename(bw), ignore.case = TRUE)
2271 |   message(paste0("Processing ", bn, ".."))
2272 |   
2273 |   bw = gsub(pattern = " ", replacement = "\\ ", x = bw, fixed = TRUE) #Change spaces with \ for unix style paths
2274 |   
2275 |   if(startFrom == "start"){
2276 |     cmd = paste0("bwtool matrix -starts -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix"))  
2277 |   }else if(startFrom == "end"){
2278 |     cmd = paste0("bwtool matrix -ends -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix"))
2279 |   }else{
2280 |     cmd = paste0("bwtool matrix -tiled-averages=", binSize, " ", size, " " , bed , " ", bw, " ", paste0(op_dir, "/", bn, ".matrix"))
2281 |   }
2282 |   print(cmd)
2283 | 
2284 |   system(command = cmd, intern = TRUE)
2285 |   paste0(op_dir, "/", bn, ".matrix")
2286 | }
2287 | 
2288 | 
2289 | .summarizeMats = function(mats = NULL, summarizeBy = 'mean', group = NULL, collapse_reps = FALSE){
2290 |   
2291 |   if(!is.null(group)){
2292 |     # gdf = data.table::data.table(sample = names(mats), condition = group)
2293 |     # gdf = gdf[order(group, sample)]
2294 |     group_u = unique(group)
2295 |     if(collapse_reps){
2296 |       summarizedMats = lapply(group_u, function(g){
2297 |         x = apply(data.table::rbindlist(l = mats[which(group == g)], fill = TRUE, use.names = TRUE), 2, summarizeBy, na.rm = TRUE)
2298 |         x
2299 |       })
2300 |       names(summarizedMats) = group_u
2301 |     }else{
2302 |       summarizedMats = lapply(mats[1:(length(mats))], function(x){
2303 |         if(!is.null(dim(x))){
2304 |           x = apply(x, 2, summarizeBy, na.rm = TRUE)
2305 |         }
2306 |         x
2307 |       })
2308 |     }
2309 |   }else{
2310 |     summarizedMats = lapply(mats[1:(length(mats))], function(x){
2311 |       if(!is.null(dim(x))){
2312 |         x = apply(x, 2, summarizeBy, na.rm = TRUE)
2313 |       }
2314 |       x
2315 |     })
2316 |   }
2317 |   
2318 |   #summarizedMats$param = mats$param
2319 |   summarizedMats
2320 | }
2321 | 
2322 | 
2323 | # estimate tandard deviation for CI
2324 | .estimateCI = function(mats = NULL, group = NULL, collapse_reps = FALSE){
2325 |   
2326 |   if(!is.null(group)){
2327 |     if(collapse_reps){
2328 |       group_u = unique(group)
2329 |       ciMats = lapply(group_u, function(g){
2330 |         x = apply(data.table::rbindlist(l = mats[which(group == g)], fill = TRUE, use.names = TRUE), 2, function(y){
2331 |           sd(y, na.rm = TRUE)/sqrt(length(y))
2332 |         })
2333 |         x
2334 |       })
2335 |       names(ciMats) = group_u
2336 |     }else{
2337 |       ciMats = lapply(mats[1:(length(mats))], function(x){
2338 |         if(!is.null(dim(x))){
2339 |           x = apply(x, 2, function(y){
2340 |             sd(y, na.rm = TRUE)/sqrt(length(y))
2341 |           })
2342 |         }
2343 |         x
2344 |       })
2345 |     }
2346 |   }else{
2347 |     ciMats = lapply(mats[1:(length(mats))], function(x){
2348 |       if(!is.null(dim(x))){
2349 |         x = apply(x, 2, function(y){
2350 |           sd(y, na.rm = TRUE)/sqrt(length(y))
2351 |         })
2352 |       }
2353 |       x
2354 |     })
2355 |   }
2356 |   
2357 |   ciMats
2358 | }
2359 | 
2360 | .order_by_sds = function(mat, keep_sd = FALSE){
2361 |   mat_sd = apply(as.matrix(mat), 1, sd, na.rm = TRUE) #order it based on SD
2362 |   mat_sd = sort(mat_sd, decreasing = TRUE)
2363 |   mat = mat[names(mat_sd),, drop = FALSE]
2364 |   mat
2365 | }
2366 | 
2367 | .extract_summary = function(bw, binSize, bed, op_dir){
2368 |   bn = gsub(pattern = "\\.bw$|\\.bigWig$", replacement = "",
2369 |             x = basename(bw), ignore.case = TRUE)
2370 |   message(paste0("    Processing ", bn, ".."))
2371 |   
2372 |   cmd = paste("bwtool summary -with-sum -keep-bed -header", bedSimple, bw, paste0(op_dir, bn, ".summary"))
2373 |   paste0(op_dir, "/", bn, ".summary")
2374 | }
2375 | 
2376 | .loci2df = function(loci){
2377 |   chr = as.character(unlist(data.table::tstrsplit(x = loci, spli = ":", keep = 1)))
2378 |   start = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-", keep = 1))
2379 |   start = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(start))))
2380 |   end = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-", keep = 2))
2381 |   end = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(end))))
2382 |   data.table::data.table(chr, start, end)
2383 | }
2384 | 
2385 | .check_bwtool = function(warn = FALSE){
2386 |   check = as.character(Sys.which(names = 'bwtool'))[1]
2387 |   if(check != ""){
2388 |     if(warn){
2389 |       message("Checking for bwtool installation")
2390 |       message(paste0("    All good! Found bwtool at: ", check))
2391 |     }else{
2392 |       return(invisible(0))
2393 |     }
2394 |   }else{
2395 |     stop("Could not locate bwtool. Download it from here: https://github.com/CRG-Barcelona/bwtool/releases")
2396 |   }
2397 | }
2398 | 
2399 | .check_mysql = function(warn = FALSE){
2400 |   check = as.character(Sys.which(names = 'mysql'))[1]
2401 |   if(check != ""){
2402 |     if(warn){
2403 |       message("Checking for mysql installation")
2404 |       message(paste0("    All good! Found mysql at: ", check))
2405 |     }else{
2406 |       return(invisible(0))
2407 |     }
2408 |   }else{
2409 |     stop("Could not locate mysql.\nInstall:\n apt install mysql-server [Debian]\n yum install mysql-server [centOS]\n brew install mysql [macOS]\n conda install -c anaconda mysql [conda]")
2410 |   }
2411 | }
2412 | 
2413 | .check_dt = function(){
2414 |   if(!requireNamespace("data.table", quietly = TRUE)){
2415 |     message("Could not find data.table library. Attempting to install..")
2416 |     install.packages("data.table")
2417 |   }
2418 |   suppressPackageStartupMessages(expr = library("data.table", quietly = TRUE, warn.conflicts = FALSE, verbose = FALSE))
2419 | }
2420 | 
2421 | .check_windows = function(){
2422 |   if(Sys.info()["sysname"] == "Windows"){
2423 |     stop("Windows is not supported :(")
2424 |   }
2425 | }
2426 | 
2427 | .get_ucsc_hmm_tbls = function(){
2428 |   hmm = structure(
2429 |     list(
2430 |       TableName = c(
2431 |         "wgEncodeBroadHmmGm12878HMM",
2432 |         "wgEncodeBroadHmmH1hescHMM",
2433 |         "wgEncodeBroadHmmHepg2HMM",
2434 |         "wgEncodeBroadHmmHepg2HMM",
2435 |         "wgEncodeBroadHmmHsmmHMM",
2436 |         "wgEncodeBroadHmmHuvecHMM",
2437 |         "wgEncodeBroadHmmK562HMM",
2438 |         "wgEncodeBroadHmmNhekHMM",
2439 |         "wgEncodeBroadHmmNhlfHMM"
2440 |       ),
2441 |       cell = c(
2442 |         "GM12878",
2443 |         "H1-hESC",
2444 |         "HepG2",
2445 |         "HMEC",
2446 |         "HSMM",
2447 |         "HUVEC",
2448 |         "K562",
2449 |         "NHEK",
2450 |         "NHLF"
2451 |       ),
2452 |       Tier = c(1L,
2453 |                1L, 2L, 3L, 3L, 2L, 1L, 3L, 3L),
2454 |       Description = c(
2455 |         "B-lymphocyte, lymphoblastoid",
2456 |         "embryonic stem cells",
2457 |         "hepatocellular carcinoma",
2458 |         "mammary epithelial cells",
2459 |         "skeletal muscle myoblasts",
2460 |         "umbilical vein endothelial cells",
2461 |         "leukemia",
2462 |         "epidermal keratinocytes",
2463 |         "lung fibroblasts"
2464 |       ),
2465 |       Lineage = c(
2466 |         "mesoderm",
2467 |         "inner cell mass",
2468 |         "endoderm",
2469 |         "ectoderm",
2470 |         "mesoderm",
2471 |         "mesoderm",
2472 |         "mesoderm",
2473 |         "ectoderm",
2474 |         "endoderm"
2475 |       ),
2476 |       Tissue = c(
2477 |         "blood",
2478 |         "embryonic stem cell",
2479 |         "liver",
2480 |         "breast",
2481 |         "muscle",
2482 |         "blood vessel",
2483 |         "blood",
2484 |         "skin",
2485 |         "lung"
2486 |       ),
2487 |       Karyotype = c(
2488 |         "normal",
2489 |         "normal",
2490 |         "cancer",
2491 |         "normal",
2492 |         "normal",
2493 |         "normal",
2494 |         "cancer",
2495 |         "normal",
2496 |         "normal"
2497 |       ),
2498 |       Sex = c("F",
2499 |               "M", "M", "U", "U", "U", "F", "U", "U")
2500 |     ),
2501 |     row.names = c(NA,-9L),
2502 |     class = c("data.table", "data.frame")
2503 |   )
2504 |   
2505 |   hmm
2506 | }
2507 | 
2508 | .get_summaries_narrowPeaks = function(bigWigs, nthreads = 1, chr = NA, start = NA, end = NA){
2509 |   #bedSimple = temp_op_bed; bigWigs = list.files(path = "./", pattern = "bw"); op_dir = getwd(); nthreads = 1
2510 |   query = data.table::data.table(chromosome = chr, start = start, end = end, key = c("chromosome", "start", "end"))
2511 |   
2512 |   message(paste0("Extracting signals"))
2513 |   
2514 |   summaries = parallel::mclapply(bigWigs, FUN = function(bw){
2515 |     bn = unlist(data.table::tstrsplit(x =  basename(bw), split = "\\.", keep = 1))
2516 |     message(paste0("    Processing ", bn, " .."))
2517 |     bw = data.table::fread(file = bw)
2518 |     colnames(bw)[c(1:3, 5)] = c("chromosome", "start", "end", "max")
2519 |     data.table::setkeyv(x = bw, cols = c("chromosome", "start", "end"))
2520 |     bw = data.table::foverlaps(x = query, y = bw, type = "any", nomatch = NULL)
2521 |     bw[,.(chromosome, start, end, max)]
2522 |   }, mc.cores = nthreads)
2523 |   
2524 |   names(summaries) = unlist(data.table::tstrsplit(x =  basename(bigWigs), split = "\\.", keep = 1))
2525 |   summaries
2526 | }
2527 | 
2528 | .parse_loci = function(loci){
2529 |   chr = as.character(unlist(data.table::tstrsplit(x = loci, split = ":", keep = 1)))
2530 |   start = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-"))[1]
2531 |   start = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(start))))
2532 |   end = unlist(data.table::tstrsplit(x = unlist(data.table::tstrsplit(x = loci, split = ":", keep = 2)), split = "-"))[2]
2533 |   end = as.numeric(as.character(gsub(pattern = ",", replacement = "", x = as.character(end))))
2534 |   list(chr = chr, start = start, end = end)
2535 | }
2536 | 
2537 | #Plot profile mini version
2538 | .plot_profile_mini = function(plot_dat, index = 1, ylims = c(0, 2)){
2539 |   #size = as.character(plot_dat$param["size"])
2540 |   
2541 |   y = plot_dat[[index]]
2542 |   x = (1:(length(y)))/length(y)
2543 |   par(mar = c(1,2,1,1))
2544 |   plot(NA, axes = FALSE, xlim = c(-0.2, 1), ylim = ylims, xlab = NA, ylab = NA)
2545 |   #grid(col = "gray90")
2546 |   points(x, y, type = "l", lwd = 1.2)
2547 |   axis(side = 2, at = ylims, lwd = 1, line = -0.6, labels = NA)
2548 |   mtext(side = 2, at = ylims, lwd = 1, line = -0.05, text = round(ylims, digits = 2), cex = 0.8, font = 1, las = 2)
2549 | }
2550 | 
2551 | # Order matrices
2552 | .order_matrix = function(mats, sortBy = NULLs, k = NULL){
2553 |   
2554 |   mats_avg = lapply(mats, function(x){
2555 |     x = as.matrix(as.data.frame(x = x))
2556 |     apply(x, 1, mean, na.rm = TRUE)
2557 |   })
2558 |   
2559 |   mats_avg = as.data.frame(mats_avg)
2560 |   cluster_row_cut = NULL
2561 |   
2562 |   if(sortBy == "mean"){
2563 |     mats_avg$oall_avg = rowMeans(mats_avg, na.rm = TRUE)
2564 |     row_idx = order(mats_avg$oall_avg, decreasing = TRUE)
2565 |   }else if(sortBy == "median"){
2566 |     mats_avg$oall_avg = apply(mats_avg, 1, median, na.rm = TRUE)
2567 |     row_idx = order(mats_avg$oall_avg, decreasing = TRUE)
2568 |   }else if(sortBy == "hclust"){
2569 |     set.seed(seed = 1024)
2570 |     hc = hclust(d = dist(mats_avg))
2571 |     mats_avg$hc_order = hc$order
2572 |     if(!is.null(k)){
2573 |       mats_avg$cluster = cutree(tree = hc, k = k)
2574 |       mats_avg$row_idx = 1:nrow(mats_avg)
2575 |       mats_avg_spl = split(mats_avg, f = as.factor(as.character(mats_avg$cluster)))
2576 |       cluster_row_cut = unlist(lapply(mats_avg_spl, nrow))
2577 |       print(cluster_row_cut)
2578 |       mats_avg_spl = lapply(mats_avg_spl, function(x){
2579 |         xhc = hclust(d = dist(x[,1:(ncol(x)-3)]))
2580 |         x$row_idx2 = xhc$order
2581 |         x = x[order(x$row_idx2, decreasing = FALSE),, drop = FALSE]
2582 |         x
2583 |       })
2584 |       mats_avg_spl = data.table::rbindlist(l = mats_avg_spl, fill = TRUE, use.names = TRUE)
2585 |       #cluster_row_cut = cumsum(xhm[,.N, cluster][,N])
2586 |       row_idx = mats_avg_spl$row_idx
2587 |     }else{
2588 |       row_idx = mats_avg$hc_order
2589 |     }
2590 |   }
2591 |   
2592 |   mats = lapply(mats, function(x){
2593 |     x = as.data.frame(x = x)
2594 |     x = x[row_idx,,drop = FALSE]
2595 |     x
2596 |   })
2597 |   
2598 |   mats
2599 | }
2600 | #------------------------------------------------------------------------------------------------------------------------------------
2601 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ## trackplot - Fast and easy visualisation of bigWig files in R
  2 | 
  3 | <!-- badges: start -->
  4 | [![GitHub closed issues](https://img.shields.io/github/issues-closed-raw/poisonalien/trackplot.svg)](https://github.com/poisonalien/trackplot/issues)
  5 | <!-- badges: end -->
  6 | 
  7 | ## Introduction
  8 | 
  9 | `trackplot.R` is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots and heatmaps from bigWig files. 
 10 | 
 11 | ## Installation
 12 | 
 13 | `trackplot.R` is a standalone R script and requires no installation. Just source it and you're good to go!
 14 | 
 15 | ```r
 16 | source("https://github.com/PoisonAlien/trackplot/blob/master/R/trackplot.R?raw=true")
 17 | 
 18 | # OR
 19 | 
 20 | download.file(url = "https://raw.githubusercontent.com/PoisonAlien/trackplot/master/R/trackplot.R", destfile = "trackplot.R")
 21 | source('trackplot.R') 
 22 | 
 23 | # OR If you prefer to have it as package
 24 | 
 25 | remotes::install_github(repo = "poisonalien/trackplot")
 26 | ```
 27 | 
 28 | ## Features
 29 | 
 30 | Why `trackplot`?
 31 | 
 32 |   * It's extremely fast since most of the heavy lifting is done by [bwtool](https://github.com/CRG-Barcelona/bwtool).  >15X faster than [deeptools](https://deeptools.readthedocs.io/en/develop/) for equivalent `profileplots` and `heatmaps` 
 33 |   * Lightweight and minimal dependency 
 34 |     - [data.table](https://cran.r-project.org/web/packages/data.table/index.html) and [bwtool](https://github.com/CRG-Barcelona/bwtool) are the only requirements. Similar R packages [GViz](https://bioconductor.org/packages/release/bioc/html/Gviz.html) and [karyoploteR](http://bioconductor.org/packages/release/bioc/html/karyoploteR.html) has over 150 dependencies.
 35 |     - Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
 36 |   * Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
 37 |   * Supports GTF and standard UCSC gene formats as well.
 38 |   * Customization: Each plot can be customized for color, scale, height, width, etc.
 39 |   * Tracks can be summarized per condition (by mean, median, max, min)
 40 |   * PCA and, optional differential peak analysis with `limma` when using uniformly processed, normalized bigWig files.
 41 | 
 42 | ## Dependencies
 43 | 
 44 | 1. [data.table](https://cran.r-project.org/web/packages/data.table/index.html) R package - which itself has no dependency.
 45 | 2. [bwtool](https://github.com/CRG-Barcelona/bwtool) - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`) or a directory under the PATH. 
 46 | 
 47 | * For macOS: Please download the pre-build binary from [here](https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1). Make it executable with `chmod +x bwtool`. macOS gatekeeper might complain that it can't run the binary downloaded from the internet. If so, [allow](https://support.apple.com/en-us/HT202491) it in the security settings. 
 48 | 
 49 | * For centOS or debian: Follow these [compilation instructions](https://gist.github.com/PoisonAlien/e19b482ac6146bfb03142a0de1c4fbc8).
 50 | 
 51 | ### Citation
 52 | 
 53 | If you find the script useful consider citing [trackplot](https://academic.oup.com/bioinformaticsadvances/article/4/1/vbae031/7616126) and [bwtool](https://academic.oup.com/bioinformatics/article/30/11/1618/282756)
 54 | 
 55 | **_Mayakonda A, and Frank Westermann. Trackplot: a fast and lightweight R script for epigenomic enrichment plots. Bioinformatics advances vol. 4,1 vbae031. 28 Feb. 2024. PMID: [38476298](https://pubmed.ncbi.nlm.nih.gov/38476298/)_**
 56 | 
 57 | **_Pohl A, Beato M. bwtool: a tool for bigWig files. Bioinformatics. 2014 Jun 1;30(11):1618-9. doi: 10.1093/bioinformatics/btu056. Epub 2014 Jan 30. PMID: [24489365](https://pubmed.ncbi.nlm.nih.gov/24489365/)_**
 58 | 
 59 | ## Usage
 60 | 
 61 | Simple usage - Make a table of all the bigWig files to be analysed with `read_coldata()` and pass it to the downstream functions.
 62 | 
 63 | ```mermaid
 64 | flowchart TD
 65 |     a[bigWig file list] -->A{read_coldata}
 66 |     A --> B{track_extract}
 67 |     B --> B1[track_plot]
 68 |     A --> C{profile_extract}
 69 |     C --> C1[profile_summarize]
 70 |     C --> C3[profile_heatmap]
 71 |     C1 --> C2[profile_plot]
 72 |     A --> D{extract_summary}
 73 |     D --> D1[pca_plot]
 74 |     D --> D2[diffpeak]
 75 |     D2 --> D3[volcano_plot]
 76 | ```
 77 | 
 78 | ```r
 79 | #Path to bigWig files
 80 | bigWigs = c("H1_Oct4.bw", "H1_Nanog.bw", "H1_k4me3.bw", 
 81 |             "H1_k4me1.bw", "H1_k27ac.bw", "H1_H2az.bw", "H1_Ctcf.bw")
 82 | 
 83 | #Make a table of bigWigs along with ref genome build
 84 | bigWigs = read_coldata(bws = bigWigs, build = "hg19")
 85 | ```
 86 | 
 87 | ## trackplots
 88 | 
 89 | `track_extract()` and `track_plot()` are two functions to generate IGV style track plots (aka locus plots) from bigWig files. Additionally, `track_summarize` can summarize tracks by condition.
 90 |  
 91 | ### Step-1: Extract signal from bigWig files 
 92 | ```r
 93 | #Region to plot
 94 | oct4_loci = "chr6:31125776-31144789"
 95 | 
 96 | #Extract bigWig signal for a loci of interest
 97 | t = track_extract(colData = bigWigs, loci = oct4_loci)
 98 | 
 99 | #Or you can also specifiy a gene name instead of a loci 
100 | # - loci and gene models will be automatically extracted from UCSC genome browser
101 | t = track_extract(colData = bigWigs, gene = "POU5F1")
102 | ```
103 | 
104 | 
105 | ### Step-2: Plot
106 | 
107 | #### Basic plot
108 | ```r
109 | track_plot(summary_list = t)
110 | ```
111 | 
112 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/b50457b0-6623-47f6-b00f-f6e0de5a4808)
113 | 
114 | #### Add cytoband and change colors for each track
115 | ```r
116 | track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
117 | track_plot(summary_list = t, 
118 |           col = track_cols, 
119 |           show_ideogram = TRUE)
120 | ```
121 | 
122 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/a0911998-aae8-4de1-96f5-18e278d19d80)
123 | 
124 | #### Collapse all tracks into a single track
125 | 
126 | Use `track_overlay = TRUE` to overlay all tracks into a single line track
127 | 
128 | ```r
129 | track_plot(summary_list = t, col = track_cols, show_ideogram = FALSE, track_overlay = TRUE)
130 | ```
131 | 
132 | ![](https://github.com/user-attachments/assets/d286f159-8950-4209-a985-fa3ba7103c53)
133 | 
134 | 
135 | #### Heighilight regions of interest (any bed files would do)
136 | 
137 | ```r
138 | oct4_nanog_peaks = c("H1_Nanog.bed","H1_Oct4.bed") #Peak files 
139 | track_plot(summary_list = t, 
140 |           col = track_cols, 
141 |           show_ideogram = TRUE, 
142 |           peaks = oct4_nanog_peaks)
143 | ```
144 | 
145 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/2531af5e-7200-478e-aa90-4ff5f537f57a)
146 | 
147 | #### Add some chromHMM tracks to the bottom
148 | 
149 | chromHMM data should be a bed file with the 4th column containing chromatin state. See here for an [example](https://github.com/PoisonAlien/trackplot/blob/master/inst/extdata/narrowpeak/H1_chromHMM.bed) file. 
150 | 
151 | Note that the color code for each of the 15 states are as described [here](https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeBroadHmm&db=hg19). 
152 | In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument `chromHMM_cols`
153 | 
154 | ```r
155 | chromHMM_peaks = "H1_chromHMM.bed"
156 | 
157 | track_plot(summary_list = t, 
158 |           col = track_cols, 
159 |           show_ideogram = TRUE, 
160 |           peaks = oct4_nanog_peaks, chromHMM = chromHMM_peaks)
161 | ```
162 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/5ef8d09f-1bdf-4622-9367-4245bdec63d5)
163 | 
164 | #### Add some chromHMM tracks from UCSC
165 | 
166 | UCSC has 9 cell lines for which chromHMM data is available. These can be added automatically in case if you dont have your own data.
167 | In this case, use the argument `ucscChromHMM` with any values from TableName column of the below table.
168 | 
169 | ```r
170 |                     TableName    cell                      Description              Tissue Karyotype
171 | 1: wgEncodeBroadHmmGm12878HMM GM12878     B-lymphocyte, lymphoblastoid               blood    normal
172 | 2:  wgEncodeBroadHmmH1hescHMM H1-hESC             embryonic stem cells embryonic stem cell    normal
173 | 3:   wgEncodeBroadHmmHepg2HMM   HepG2         hepatocellular carcinoma               liver    cancer
174 | 4:   wgEncodeBroadHmmHepg2HMM    HMEC         mammary epithelial cells              breast    normal
175 | 5:    wgEncodeBroadHmmHsmmHMM    HSMM        skeletal muscle myoblasts              muscle    normal
176 | 6:   wgEncodeBroadHmmHuvecHMM   HUVEC umbilical vein endothelial cells        blood vessel    normal
177 | ```
178 | 
179 | ```r
180 | track_plot(summary_list = t, 
181 |           col = track_cols, 
182 |           show_ideogram = TRUE, 
183 |           peaks = oct4_nanog_peaks, 
184 |           ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))
185 | ```
186 | 
187 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/fecf7ab1-44cb-4308-b3f4-d8ca03cdd15d)
188 | 
189 | ### Re-organize tracks
190 | 
191 | By default tracks are organized from top to bottom as `c("p", "b", "h", "g", "c")`  corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument `layout_ord`. Furthermore, bigWig tracks themselves can be ordered with the argument `bw_ord` which accepts the names of the bigWig tracks as input and plots them in the given order.
192 | 
193 | ```r
194 | #Draw only NANOG, OCT4 bigWigs in that order. Re-organize the layout in the order chromHMM track, gene track, cytoband track. Rest go to the end.
195 | track_plot(
196 |   summary_list = t,
197 |   col = track_cols,
198 |   show_ideogram = TRUE,
199 |   genename = c("POU5F1", "TCF19"),
200 |   peaks = oct4_nanog_peaks,
201 |   peaks_track_names = c("NANOG", "OCT4"),
202 |   groupAutoScale = FALSE, ucscChromHMM = "wgEncodeBroadHmmH1hescHMM", y_min = 0, 
203 |   bw_ord = c("NANOG", "OCT4"), 
204 |   layout_ord = c("h", "g", "c")
205 | )
206 | ```
207 | 
208 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/11c9fe9b-0292-40af-8197-7f20a5275f01)
209 | 
210 | 
211 | ## narrowPeaks and broadPeaks 
212 | 
213 | All of the above plots can also be generated with [narrowPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) or [broadPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per [UCSC convention](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
214 | 
215 | `narrowPeak` is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files.
216 | 
217 | ```r
218 | narrowPeaks = c("H1_Ctcf.bed", "H1_H2az.bed", "H1_k27ac.bed", 
219 |                 "H1_k4me1.bed", "H1_k4me3.bed", "H1_Nanog.bed", 
220 |                 "H1_Oct4.bed", "H1_Pol2.bed")
221 | 
222 | #Use peak as input_type
223 | narrowPeaks = read_coldata(narrowPeaks, build = "hg19", input_type = "peak")
224 | 
225 | oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example
226 | 
227 | narrowPeaks_track = track_extract(colData = narrowPeaks, loci = oct4_loci)
228 | 
229 | #Rest plotting is same
230 | track_plot(summary_list = narrowPeaks_track, 
231 |           show_ideogram = TRUE, 
232 |           peaks = oct4_nanog_peaks, 
233 |           ucscChromHMM = c("wgEncodeBroadHmmH1hescHMM", "wgEncodeBroadHmmNhlfHMM"))
234 | 
235 | ```
236 |   
237 | ![image](https://github.com/PoisonAlien/trackplot/assets/8164062/fa3999fd-ab7f-4617-a43e-d3cac7f3a3b3)
238 | 
239 | 
240 | ## profileplots
241 | 
242 | `profile_extract()` -> `profile_summarize()` -> `profile_plot()` are functions to generate density based profile-plots from bigWig files.
243 | 
244 | 
245 |   * Below example for summarizing approx. 3,671 peaks for 3 bigWig files takes ca. 3 seconds on my 5 year old [macbook Pro](https://support.apple.com/kb/sp715?locale=en_GB). This includes generating signal matrix, summarizing, and plotting. Equivalent deeptools commands takes 20 seconds.
246 |   * Optionally, it can also query UCSC genome browser for refseq transcripts of desired assembly and summarize around TSS regions
247 |   * Replicates can be collapsed into single value per condition
248 | 
249 | Example data from [GSE99183](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99183) where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer.
250 | 
251 | ```r
252 | bws = c("GSM2634756_U87_BRD4.bw", "GSM2634757_U87_BRD4_dBET_24h.bw", "GSM2634758_U87_BRD4_dBET_2h.bw")
253 | bws = read_coldata(bws = bws, 
254 |                   sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), 
255 |                   build = "hg19")
256 | ```
257 | 
258 | ### Refseq transcripts
259 | 
260 | ```r
261 | #Extract signals from bigWig files around refseq transcripts
262 | pe_refseq = profile_extract(colData = bws, ucsc_assembly = TRUE, 
263 |                             startFrom = 'start', up = 1500, down = 1500)
264 | 
265 | #Estimate mean signal
266 | ps_refseq = profile_summarize(sig_list = pe_refseq) 
267 | 
268 | #Plot
269 | profile_plot(ps_refseq)
270 | ```
271 | 
272 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/bd26cdac-6f87-44ed-b41c-685454c6d28c)
273 | 
274 | 
275 | ### Custom BED regions
276 | 
277 | ```r
278 | #BRD4 binding sites 
279 | bed = "GSM2634756_U87_BRD4_peaks.narrowPeak.gz"
280 | 
281 | #Center and extend 1500 both ways from the peak center
282 | pe_bed = profile_extract(colData = bws, bed = bed, startFrom = "center", 
283 |                           up = 1500, down = 1500, nthreads = 4)
284 | 
285 | #Estimate mean signal
286 | ps_bed = profile_summarize(sig_list = pe_bed) 
287 | 
288 | #Plot
289 | profile_plot(ps_bed)
290 | ```
291 | 
292 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/94c1d728-e38c-418a-8469-ba50c42dc295)
293 | 
294 | 
295 | ## heatmap
296 | 
297 | Output from `profile_extract` can be used to draw a heatmap with `profile_heatmap`
298 | 
299 | ```r
300 | profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8)
301 | ```
302 | 
303 | ![](https://github.com/PoisonAlien/trackplot/assets/8164062/a82eedc8-a3f3-4439-a005-13242fce7929)
304 | 
305 | 
306 | 
307 | ***PSA*** If you find the tool useful, consider starring this repository or up voting this [Biostars thread](https://www.biostars.org/p/475853/) so that more poeple can find it :)
308 | 
309 | ### Caveat
310 | 
311 |  * Windows OS is not supported
312 | 
313 | 
314 | ### Acknowledgements 
315 | 
316 | [Joschka Hey](https://github.com/HeyLifeHD) for all the cool suggestions :)
317 | 


--------------------------------------------------------------------------------
/inst/CITATION:
--------------------------------------------------------------------------------
 1 | bibentry(
 2 |   bibtype  = "Article",
 3 |   title    = "Trackplot: a fast and lightweight R script for epigenomic enrichment plots",
 4 |   author   =  personList(as.person("Anand Mayakonda"),
 5 |                        as.person("Frank Westermann")),
 6 |   journal  = "Bioinformatics Advances",
 7 |   year     = "2024",
 8 |   volume   = "4",
 9 |   doi      = "doi.org/10.1093/bioadv/vbae031"
10 | )
11 | 


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_Ctcf.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Ctcf.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_H2az.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_H2az.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_Nanog.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Nanog.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_Oct4.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_Oct4.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_k27ac.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k27ac.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_k4me1.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k4me1.bw


--------------------------------------------------------------------------------
/inst/extdata/bw/H1_k4me3.bw:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PoisonAlien/trackplot/f95331672c9d353a50ce66531a1775e7a3f66df7/inst/extdata/bw/H1_k4me3.bw


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_Ctcf.bed:
--------------------------------------------------------------------------------
1 | chr6	31126946	31127607	.	562	.	13.920298	15.7	-1
2 | chr6	31130121	31130512	.	589	.	15.050033	15.7	-1
3 | chr6	31139989	31141379	.	302	.	3.232853	3.5	-1
4 | chr6	31142509	31143777	.	303	.	3.290765	3.0	-1
5 | chr6	31142721	31142850	.	526	.	12.440937	4.2	-1
6 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_H2az.bed:
--------------------------------------------------------------------------------
1 | chr6	31127041	31127157	.	1000	.	6.568079	3.1	-1
2 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_Nanog.bed:
--------------------------------------------------------------------------------
1 | chr6	31139583	31140045	peak3497	584	.	1856.580	-1	-1
2 | chr6	31140724	31141062	peak3498	90	.	285.880	-1	-1
3 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_Oct4.bed:
--------------------------------------------------------------------------------
1 | chr6	31139576	31140109	peak2363	41	.	128.810	-1	-1
2 | chr6	31140574	31141112	peak2364	93	.	288.400	-1	-1
3 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_Pol2.bed:
--------------------------------------------------------------------------------
1 | chr6	31125655	31128143	peak25632	330	.	1066.980	-1	-1
2 | chr6	31128357	31133697	peak25633	958	.	3100.000	-1	-1
3 | chr6	31136367	31139334	peak25634	318	.	1029.150	-1	-1
4 | chr6	31139365	31141730	peak25635	1000	.	3233.060	-1	-1
5 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_chromHMM.bed:
--------------------------------------------------------------------------------
 1 | chr6	31125621	31126021	1
 2 | chr6	31126021	31127821	2
 3 | chr6	31127821	31128221	6
 4 | chr6	31128221	31129421	11
 5 | chr6	31129421	31129621	7
 6 | chr6	31129621	31129821	6
 7 | chr6	31129821	31131621	9
 8 | chr6	31131621	31132621	10
 9 | chr6	31132621	31133421	9
10 | chr6	31133421	31133821	5
11 | chr6	31133821	31134421	7
12 | chr6	31134421	31135221	6
13 | chr6	31135221	31137221	2
14 | chr6	31137221	31138421	1
15 | chr6	31138421	31138821	2
16 | chr6	31138821	31139221	4
17 | chr6	31139221	31139821	5
18 | chr6	31139821	31141821	4
19 | chr6	31141821	31142221	5
20 | chr6	31142221	31144021	7
21 | chr6	31144021	31145221	11
22 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_k27ac.bed:
--------------------------------------------------------------------------------
1 | chr6	31120776	31147840	.	454	.	3.950245	13.4	-1
2 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_k4me1.bed:
--------------------------------------------------------------------------------
1 | chr6	31121821	31150316	.	460	.	4.127887	12.8	-1
2 | chr6	31126483	31128222	.	696	.	7.494269	14.7	-1
3 | chr6	31129449	31131098	.	621	.	6.429799	14.5	-1
4 | chr6	31132550	31135066	.	522	.	5.004274	14.4	-1
5 | chr6	31132739	31132931	.	936	.	10.929473	6.1	-1
6 | chr6	31133481	31133603	.	1000	.	13.579328	4.7	-1
7 | chr6	31133793	31133992	.	948	.	11.100020	6.9	-1
8 | chr6	31136002	31144622	.	632	.	6.585711	13.5	-1
9 | 


--------------------------------------------------------------------------------
/inst/extdata/narrowpeak/H1_k4me3.bed:
--------------------------------------------------------------------------------
1 | chr6	31124287	31128550	.	538	.	12.148244	14.6	-1
2 | chr6	31134594	31142336	.	660	.	16.792120	14.4	-1
3 | 


--------------------------------------------------------------------------------
/man/diffpeak.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{diffpeak}
 4 | \alias{diffpeak}
 5 | \title{Differential Peak Analysis}
 6 | \usage{
 7 | diffpeak(
 8 |   summary_list = NULL,
 9 |   condition = NULL,
10 |   log2 = TRUE,
11 |   num = NULL,
12 |   den = NULL
13 | )
14 | }
15 | \arguments{
16 | \item{summary_list}{Output from \code{\link{extract_summary}}}
17 | 
18 | \item{condition}{a column name in \code{coldata} containing sample conditions. Default NULL.}
19 | 
20 | \item{log2}{log2 convert data prior to testing. Default TRUE}
21 | 
22 | \item{num}{Numerator condition. Default NULL}
23 | 
24 | \item{den}{Denominator condition. Default NULL}
25 | }
26 | \description{
27 | Differential Peak Analysis
28 | }
29 | \details{
30 | Takes output from \code{extract_summary} and performs differential peak analysis with Limma
31 | }
32 | 


--------------------------------------------------------------------------------
/man/extract_summary.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{extract_summary}
 4 | \alias{extract_summary}
 5 | \title{Extract area under the curve for every peak from from given bigWig files.}
 6 | \usage{
 7 | extract_summary(
 8 |   colData,
 9 |   bed = NULL,
10 |   ucsc_assembly = TRUE,
11 |   startFrom = "start",
12 |   binSize = 50,
13 |   up = 2500,
14 |   down = 2500,
15 |   pc_genes = TRUE,
16 |   nthreads = 4
17 | )
18 | }
19 | \arguments{
20 | \item{colData}{bigWig files. Default NULL. Required.}
21 | 
22 | \item{bed}{bed file or a data.frame with first 3 column containing chromosome, star, end positions.}
23 | 
24 | \item{ucsc_assembly}{If \code{bed} file not provided, setting \code{ucsc_assembly} to TRUE will fetch transcripts from UCSC genome browser.}
25 | 
26 | \item{startFrom}{Default "start". For bed files this can be "start", "center" or "end". For \code{ucsc_assembly} this can only be "start" or "end"}
27 | 
28 | \item{binSize}{bin size to extract signal. Default 50 (bps). Should be >1}
29 | 
30 | \item{up}{extend upstream by this many bps from \code{startFrom}. Default 2500}
31 | 
32 | \item{down}{extend downstream by this many bps from \code{startFrom}. Default 2500}
33 | 
34 | \item{pc_genes}{Use only protein coding genes when using \code{ucsc_assembly}. Default TRUE}
35 | 
36 | \item{nthreads}{Default 4}
37 | }
38 | \description{
39 | Extract area under the curve for every peak from from given bigWig files.
40 | }
41 | 


--------------------------------------------------------------------------------
/man/pca_plot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{pca_plot}
 4 | \alias{pca_plot}
 5 | \title{Draw a PCA plot}
 6 | \usage{
 7 | pca_plot(
 8 |   summary_list = NULL,
 9 |   top = 1000,
10 |   log2 = FALSE,
11 |   xpc = "PC1",
12 |   ypc = "PC2",
13 |   color_by = NULL,
14 |   pch_by = NULL,
15 |   color = NULL,
16 |   show_cree = TRUE,
17 |   lab_size = 1,
18 |   size = 1,
19 |   legend_size = 1,
20 |   legendpos = "topright",
21 |   legendpos2 = "bottomright"
22 | )
23 | }
24 | \arguments{
25 | \item{summary_list}{output from extract_summary}
26 | 
27 | \item{top}{Top most variable peaks to consider for PCA. Default 1000}
28 | 
29 | \item{log2}{log transform data? Default FALSE. IF TRUE, adds a small positive value and log2 converts.}
30 | 
31 | \item{xpc}{Default PC1}
32 | 
33 | \item{ypc}{Default PC2}
34 | 
35 | \item{color_by}{a column name in \code{coldata} to color by. Default NULL.}
36 | 
37 | \item{pch_by}{a column name in \code{coldata} to pch by. Default NULL.}
38 | 
39 | \item{color}{Manual colors for each level in \code{color_by} Default NULL.}
40 | 
41 | \item{show_cree}{If TRUE draws a cree plot. Default TRUE}
42 | 
43 | \item{lab_size}{Font size for labels. Default 1}
44 | 
45 | \item{size}{Point size. Default 1}
46 | 
47 | \item{legend_size}{Default 1}
48 | 
49 | \item{legendpos}{Default topright}
50 | 
51 | \item{legendpos2}{Default bottomright}
52 | }
53 | \description{
54 | Draw a PCA plot
55 | }
56 | 


--------------------------------------------------------------------------------
/man/profile_extract.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{profile_extract}
 4 | \alias{profile_extract}
 5 | \title{Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts}
 6 | \usage{
 7 | profile_extract(
 8 |   colData = NULL,
 9 |   bed = NULL,
10 |   ucsc_assembly = TRUE,
11 |   startFrom = "start",
12 |   binSize = 50,
13 |   up = 2500,
14 |   down = 2500,
15 |   pc_genes = TRUE,
16 |   nthreads = 4
17 | )
18 | }
19 | \arguments{
20 | \item{colData}{from \code{read_coldata}}
21 | 
22 | \item{bed}{bed file or a data.frame with first 3 column containing chromosome, star, end positions.}
23 | 
24 | \item{ucsc_assembly}{If \code{bed} file not provided, setting \code{ucsc_assembly} to TRUE will fetch transcripts from UCSC genome browser.}
25 | 
26 | \item{startFrom}{Default "start". For bed files this can be "start", "center" or "end". For \code{ucsc_assembly} this can only be "start" or "end"}
27 | 
28 | \item{binSize}{bin size to extract signal. Default 50 (bps). Should be >1}
29 | 
30 | \item{up}{extend upstream by this many bps from \code{startFrom}. Default 2500}
31 | 
32 | \item{down}{extend downstream by this many bps from \code{startFrom}. Default 2500}
33 | 
34 | \item{pc_genes}{Use only protein coding genes when \code{ucsc_assembly} is used. Default TRUE}
35 | 
36 | \item{nthreads}{Default 4}
37 | }
38 | \description{
39 | Generate bigWig signal matrix for given genomic regions or ucsc refseq transcripts
40 | }
41 | \seealso{
42 | \code{\link{profile_summarize}} \code{\link{profile_plot}} \code{\link{profile_heatmap}}
43 | }
44 | 


--------------------------------------------------------------------------------
/man/profile_heatmap.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{profile_heatmap}
 4 | \alias{profile_heatmap}
 5 | \title{Draw a heatmap}
 6 | \usage{
 7 | profile_heatmap(
 8 |   mat_list,
 9 |   sortBy = "mean",
10 |   col_pal = "Blues",
11 |   revpal = FALSE,
12 |   sample_names = NULL,
13 |   title_size = 1,
14 |   top_profile = FALSE,
15 |   top_profile_h = 2,
16 |   zmins = NULL,
17 |   zmaxs = NULL,
18 |   scale = FALSE,
19 |   file_name = NULL,
20 |   hm_width = 1024,
21 |   hm_height = 600,
22 |   mat_order = NULL
23 | )
24 | }
25 | \arguments{
26 | \item{mat_list}{Input matrix list generated by \code{\link{profile_extract}}}
27 | 
28 | \item{sortBy}{Sort matrix by.., Can be mean, median. Default mean.}
29 | 
30 | \item{col_pal}{Color palette to use. Default Blues. Use hcl.pals(type = "sequential") to see available palettes}
31 | 
32 | \item{revpal}{Reverse color palette? Default FALSE.}
33 | 
34 | \item{sample_names}{Manually specify sample names.}
35 | 
36 | \item{title_size}{size of title. Default 0.8}
37 | 
38 | \item{top_profile}{Boolean. Whether to draw top profile plot.}
39 | 
40 | \item{top_profile_h}{Default 2.}
41 | 
42 | \item{zmins}{Manually specify min scores to include in heatmap}
43 | 
44 | \item{zmaxs}{Manually specify max scores to include in heatmap}
45 | 
46 | \item{scale}{Whether to row scale the matrix. Default FALSE}
47 | 
48 | \item{file_name}{Default NULL. If provided saves plot as a png.}
49 | 
50 | \item{hm_width}{Width of the plot. Default 1024}
51 | 
52 | \item{hm_height}{Height of the plot. Default 600}
53 | 
54 | \item{mat_order}{Default NULL. Sample order in which the heatmaps are drawn.}
55 | }
56 | \description{
57 | Draw a heatmap
58 | }
59 | \details{
60 | This function takes output from extract_matrix and draws a heatmap
61 | }
62 | 


--------------------------------------------------------------------------------
/man/profile_plot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{profile_plot}
 4 | \alias{profile_plot}
 5 | \title{Draw a profile plot}
 6 | \usage{
 7 | profile_plot(
 8 |   sig_list = NULL,
 9 |   color = NULL,
10 |   line_size = 1,
11 |   legend_fs = 1,
12 |   axis_fs = 1,
13 |   xlab = NA,
14 |   ylab = NA
15 | )
16 | }
17 | \arguments{
18 | \item{sig_list}{Output generated from profile_summarize}
19 | 
20 | \item{color}{Manual colors for each bigWig. Default NULL.}
21 | 
22 | \item{line_size}{Default 1}
23 | 
24 | \item{legend_fs}{Legend font size. Default 1}
25 | 
26 | \item{axis_fs}{Axis font size. Default 1}
27 | 
28 | \item{xlab}{x axis label. Default NA}
29 | 
30 | \item{ylab}{y axis label. Default NA}
31 | }
32 | \description{
33 | Draw a profile plot
34 | }
35 | 


--------------------------------------------------------------------------------
/man/profile_summarize.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{profile_summarize}
 4 | \alias{profile_summarize}
 5 | \title{Summarize data for profile plots}
 6 | \usage{
 7 | profile_summarize(sig_list = NULL, stat = "mean", condition = NULL)
 8 | }
 9 | \arguments{
10 | \item{sig_list}{Output generated from \code{profile_extract}}
11 | 
12 | \item{stat}{Default \code{mean}. Can be \code{mean}, \code{median}}
13 | 
14 | \item{condition}{column name with conditions in \code{colData}. If provided summarizes signals from samples belonging to same group or condition}
15 | }
16 | \description{
17 | Summarize data for profile plots
18 | }
19 | \seealso{
20 | \code{\link{profile_extract}} \code{\link{profile_plot}} \code{\link{profile_heatmap}}
21 | }
22 | 


--------------------------------------------------------------------------------
/man/read_coldata.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{read_coldata}
 4 | \alias{read_coldata}
 5 | \title{Prepares meta data table from bigWig files.
 6 | Output from this function is passed to all downstream functions.}
 7 | \usage{
 8 | read_coldata(
 9 |   bws = NULL,
10 |   sample_names = NULL,
11 |   build = "hg38",
12 |   input_type = "bw"
13 | )
14 | }
15 | \arguments{
16 | \item{bws}{path to bigWig files}
17 | 
18 | \item{sample_names}{sample names for each input files. Optional. Default NULL - creates one from file names.}
19 | 
20 | \item{build}{Reference genome build. Default hg38}
21 | 
22 | \item{input_type}{Default \code{bw}. Can be \code{bw} or \code{peak}}
23 | }
24 | \description{
25 | Prepares meta data table from bigWig files.
26 | Output from this function is passed to all downstream functions.
27 | }
28 | \examples{
29 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 
30 | cd = read_coldata(bws = bigWigs, build = "hg19")
31 | beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\\\.bed$", full.names = TRUE) 
32 | cd_bed = read_coldata(bws = beds, input_type = "peak", build = "hg19")
33 | }
34 | 


--------------------------------------------------------------------------------
/man/summarize_homer_annots.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{summarize_homer_annots}
 4 | \alias{summarize_homer_annots}
 5 | \title{Parse peak annotations generated by homer annotatePeaks.pl}
 6 | \usage{
 7 | summarize_homer_annots(
 8 |   anno,
 9 |   sample_names = NULL,
10 |   legend_font_size = 1,
11 |   label_size = 0.8
12 | )
13 | }
14 | \arguments{
15 | \item{anno}{Raw annotations generated by homer \code{annotatePeaks.pl}. Can be more than one file.}
16 | 
17 | \item{sample_names}{Sample names correspoding to each input file. Default parses from input file.}
18 | 
19 | \item{legend_font_size}{font size for legend. Default 1.}
20 | 
21 | \item{label_size}{font size for labels. Default 0.8.}
22 | }
23 | \description{
24 | Parse peak annotations generated by homer annotatePeaks.pl
25 | }
26 | \details{
27 | summarizes peak annotations generated with homer annotatePeaks.pl, generates a pie chart of peak distributions.
28 | }
29 | 


--------------------------------------------------------------------------------
/man/track_extract.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{track_extract}
 4 | \alias{track_extract}
 5 | \title{Extract bigWig track data for the given loci}
 6 | \usage{
 7 | track_extract(
 8 |   colData = NULL,
 9 |   loci = NULL,
10 |   gene = NULL,
11 |   binsize = 10,
12 |   nthreads = 1,
13 |   query_ucsc = TRUE,
14 |   gtf = NULL,
15 |   build = "hg38",
16 |   padding = 0,
17 |   ideoTblName = "cytoBand"
18 | )
19 | }
20 | \arguments{
21 | \item{colData}{coldata from \code{read_coldata}}
22 | 
23 | \item{loci}{target region to plot. Should be of format "chr:start-end". e.g; chr3:187715903-187752003 OR chr3:187,715,903-187,752,003}
24 | 
25 | \item{gene}{gene name. This is mutually exclusive with \code{loci}}
26 | 
27 | \item{binsize}{bin size to extract signal. Default 10 (bps).}
28 | 
29 | \item{nthreads}{Default 1. Number of threads to use.}
30 | 
31 | \item{query_ucsc}{Default TRUE. Queries UCSC and extracts gene models and cytoband for the loci. Requires \code{mysql} installation.}
32 | 
33 | \item{gtf}{Use gtf file or data.frame as source for gene model. Default NULL.}
34 | 
35 | \item{build}{Reference genome build. Default hg38}
36 | 
37 | \item{padding}{Extend locus on both sides by this many bps.}
38 | 
39 | \item{ideoTblName}{Table name for ideogram. Default \code{cytoBand}}
40 | }
41 | \description{
42 | Extract bigWig track data for the given loci
43 | }
44 | \examples{
45 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 
46 | cd = read_coldata(bws = bigWigs, build = "hg19")
47 | oct4_loci = "chr6:31125776-31144789"
48 | t = track_extract(colData = cd, loci = oct4_loci, build = "hg19")
49 | }
50 | 


--------------------------------------------------------------------------------
/man/track_plot.Rd:
--------------------------------------------------------------------------------
  1 | % Generated by roxygen2: do not edit by hand
  2 | % Please edit documentation in R/trackplot.R
  3 | \name{track_plot}
  4 | \alias{track_plot}
  5 | \title{Generate IGV style locus tracks with ease}
  6 | \usage{
  7 | track_plot(
  8 |   summary_list = NULL,
  9 |   draw_gene_track = TRUE,
 10 |   show_ideogram = TRUE,
 11 |   col = "gray70",
 12 |   groupAutoScale = FALSE,
 13 |   y_max = NULL,
 14 |   y_min = NULL,
 15 |   txname = NULL,
 16 |   genename = NULL,
 17 |   show_axis = FALSE,
 18 |   gene_fsize = 1,
 19 |   track_names = NULL,
 20 |   track_names_pos = 0,
 21 |   track_names_to_left = FALSE,
 22 |   track_overlay = FALSE,
 23 |   regions = NULL,
 24 |   collapse_txs = TRUE,
 25 |   boxcol = "#ffc41a",
 26 |   boxcolalpha = 0.4,
 27 |   chromHMM = NULL,
 28 |   chromHMM_cols = NULL,
 29 |   chromHMM_names = NULL,
 30 |   ucscChromHMM = NULL,
 31 |   peaks = NULL,
 32 |   bw_track_height = 3,
 33 |   peaks_track_height = 2,
 34 |   gene_track_height = 2,
 35 |   scale_track_height = 2,
 36 |   chromHMM_track_height = 1,
 37 |   cytoband_track_height = 2,
 38 |   peaks_track_names = NULL,
 39 |   left_mar = NULL,
 40 |   bw_ord = NULL,
 41 |   layout_ord = c("p", "b", "h", "g", "c")
 42 | )
 43 | }
 44 | \arguments{
 45 | \item{summary_list}{Output from track_extract}
 46 | 
 47 | \item{draw_gene_track}{Default FALSE. If TRUE plots gene models overlapping with the queried region}
 48 | 
 49 | \item{show_ideogram}{Default TRUE. If TRUE plots ideogram of the target chromosome with query loci highlighted. Works only when \code{query_ucsc} is TRUE.}
 50 | 
 51 | \item{col}{Color for tracks. Default \verb{#2f3640}. Multiple colors can be provided for each track}
 52 | 
 53 | \item{groupAutoScale}{Default TRUE}
 54 | 
 55 | \item{y_max}{custom y axis upper limits for each track. Recycled if required.}
 56 | 
 57 | \item{y_min}{custom y axis lower limits for each track. Recycled if required.}
 58 | 
 59 | \item{txname}{transcript name to draw. Default NULL. Plots all transcripts overlapping with the queried region}
 60 | 
 61 | \item{genename}{gene name to draw. Default NULL. Plots all genes overlapping with the queried region}
 62 | 
 63 | \item{show_axis}{Default FALSE}
 64 | 
 65 | \item{gene_fsize}{Font size. Default 1}
 66 | 
 67 | \item{track_names}{Default NULL}
 68 | 
 69 | \item{track_names_pos}{Default 0 (corresponds to left corner)}
 70 | 
 71 | \item{track_names_to_left}{If TRUE, track names are shown to the left of the margin. Default FALSE, plots on top as a title}
 72 | 
 73 | \item{track_overlay}{Draws all bigWigs in a single track as a line plot}
 74 | 
 75 | \item{regions}{genomic regions to highlight. A data.frame with at-least three columns containing chr, start and end positions.}
 76 | 
 77 | \item{collapse_txs}{Default FALSE. Whether to collapse all transcripts belonging to same gene into a unified gene model}
 78 | 
 79 | \item{boxcol}{color for highlighted region. Default "#192A561A"}
 80 | 
 81 | \item{boxcolalpha}{Default 0.5}
 82 | 
 83 | \item{chromHMM}{chromHMM data. Can be path to bed files or a list data.frames with first three columns containing chr,start,end and a 4th column containing integer coded state}
 84 | 
 85 | \item{chromHMM_cols}{A named vector for each state (in the 4th column of chromHMM file). Default NULL}
 86 | 
 87 | \item{chromHMM_names}{name for the chromHMM track}
 88 | 
 89 | \item{ucscChromHMM}{Name of the chromHMM table. Use .get_ucsc_hmm_tbls() to see the details.}
 90 | 
 91 | \item{peaks}{bed file to be highlighted. Can be path to bed files or a list data.frames with first three columns containing chr,start,end.}
 92 | 
 93 | \item{bw_track_height}{Default 3}
 94 | 
 95 | \item{peaks_track_height}{Default 2.}
 96 | 
 97 | \item{gene_track_height}{Default 2}
 98 | 
 99 | \item{scale_track_height}{Default 1}
100 | 
101 | \item{chromHMM_track_height}{Default 1}
102 | 
103 | \item{cytoband_track_height}{Default 1}
104 | 
105 | \item{peaks_track_names}{Provide a name for each loci bed file. Default NULL}
106 | 
107 | \item{left_mar}{Space to the left. Default 4}
108 | 
109 | \item{bw_ord}{Names of the tracks to be drawn in the provided order. Default NULL.}
110 | 
111 | \item{layout_ord}{Plot layout order. Deafult c("p", "b", "h", "g", "c") corresponding to peaks track, bigWig track, chromHmm track, gene track, cytoband track.}
112 | }
113 | \description{
114 | Generate IGV style locus tracks with ease
115 | }
116 | \examples{
117 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\\\.bw$", full.names = TRUE) 
118 | cd = read_coldata(bws = bigWigs, build = "hg19")
119 | oct4_loci = "chr6:31125776-31144789"
120 | t = track_extract(colData = cd, loci = oct4_loci, build = "hg19")
121 | trackplot::track_plot(summary_list = t)
122 | }
123 | 


--------------------------------------------------------------------------------
/man/track_summarize.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{track_summarize}
 4 | \alias{track_summarize}
 5 | \title{Summarize tracks per condition}
 6 | \usage{
 7 | track_summarize(summary_list = NULL, condition = NULL, stat = "mean")
 8 | }
 9 | \arguments{
10 | \item{summary_list}{Output from track_extract. Required.}
11 | 
12 | \item{condition}{a column name in \code{coldata} containing sample conditions. Default NULL.}
13 | 
14 | \item{stat}{can be \verb{mean, median}, \code{max}, \code{min}. NAs are excluded.}
15 | }
16 | \description{
17 | Summarize tracks per condition
18 | }
19 | 


--------------------------------------------------------------------------------
/man/volcano_plot.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/trackplot.R
 3 | \name{volcano_plot}
 4 | \alias{volcano_plot}
 5 | \title{Volcano plot for limma results from differential peak Analysis}
 6 | \usage{
 7 | volcano_plot(
 8 |   res = NULL,
 9 |   fdr = 0.1,
10 |   upcol = "#d35400",
11 |   downcol = "#1abc9c",
12 |   alpha = 0.6,
13 |   size = 0.8
14 | )
15 | }
16 | \arguments{
17 | \item{res}{Output from \code{\link{diffpeak}}}
18 | 
19 | \item{fdr}{FDR threshold. Default 0.1}
20 | 
21 | \item{upcol}{color for up-regulated. Default "#d35400"}
22 | 
23 | \item{downcol}{color for up-regulated. Default "#1abc9c"}
24 | 
25 | \item{alpha}{Default 0.6}
26 | 
27 | \item{size}{Point size. Default 0.8}
28 | }
29 | \description{
30 | Volcano plot for limma results from differential peak Analysis
31 | }
32 | \details{
33 | Takes output from \code{diffpeak} and draws volcano plot
34 | }
35 | 


--------------------------------------------------------------------------------
/trackplot.Rproj:
--------------------------------------------------------------------------------
 1 | Version: 1.0
 2 | 
 3 | RestoreWorkspace: Default
 4 | SaveWorkspace: Default
 5 | AlwaysSaveHistory: Default
 6 | 
 7 | EnableCodeIndexing: Yes
 8 | UseSpacesForTab: Yes
 9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
15 | BuildType: Package
16 | PackageUseDevtools: Yes
17 | PackageInstallArgs: --no-multiarch --with-keep.source
18 | 


--------------------------------------------------------------------------------
/vignettes/trackplot.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "trackplot tutorial"
  3 | output:
  4 |   html_document:
  5 |     toc: yes
  6 |     toc_depth: 3
  7 |     toc_float: yes
  8 |     number_sections: yes
  9 |     self_contained: yes
 10 | vignette: >
 11 |   %\VignetteIndexEntry{01: trackplot tutorial}
 12 |   %\VignetteEngine{knitr::rmarkdown}
 13 |   %\VignetteEncoding{UTF-8}
 14 | ---
 15 | 
 16 | ```{r, include = FALSE}
 17 | knitr::opts_chunk$set(
 18 |   collapse = TRUE,
 19 |   comment = "#>"
 20 | )
 21 | ```
 22 | 
 23 | # Introduction
 24 | 
 25 | `trackplot` is an ultra-fast, simple, and minimal dependency R script to generate IGV style track plots (aka locus plots), profile plots, and heatmaps from bigWig files. 
 26 | Libraries in R such as `Gviz` and `karyoploteR` facilitates such visualization but they are slow, complicated, and come with a large number of dependencies.
 27 | 
 28 | `trackplot` overcomes the [dependency heaviness](https://en.wikipedia.org/wiki/Dependency_hell) by taking advantage of existing `bwtool` for faster bigWig processing, and by performing all the plotting in base R.
 29 | 
 30 | Some of the features include:
 31 | 
 32 |   * It's significantly fast since most of the heavy lifting is done by [bwtool](https://github.com/CRG-Barcelona/bwtool). 
 33 |   * Automatically queries UCSC genome browser for gene models, cytobands, and chromHMM tracks - making analysis reproducible.
 34 |   * Supports GTF and standard UCSC gene formats as well.
 35 |   * Lightweight and minimal dependency 
 36 |     - [data.table](https://cran.r-project.org/web/packages/data.table/index.html) and [bwtool](https://github.com/CRG-Barcelona/bwtool) are the only requirements. 
 37 |     - Plots are generated in pure base R graphics (no ggplot2 or tidyverse packages)
 38 |   * Customization: Each plot can be customized for color, scale, height, width, etc.
 39 |   * Tracks can be summarized per condition (by mean, median, max, min)
 40 | 
 41 | ## Pre-requisites
 42 | 
 43 | * [bwtool](https://github.com/CRG-Barcelona/bwtool) - a command line tool for processing bigWig files. Install and move the binary to a PATH (e.g; `/usr/local/bin`). 
 44 | Or, you could also add the path where bwtool is located to R session with the below command.
 45 | 
 46 | ```r
 47 | #Example
 48 | Sys.setenv(PATH = paste("/Users/anand/Documents/bwtool_dir/", Sys.getenv("PATH"), sep=":"))
 49 | ```
 50 | 
 51 | * If you have trouble compiling the tool, follow [these](https://gist.github.com/PoisonAlien/e19b482ac6146bfb03142a0de1c4fbc8) instructions. Alternatively, you can download the pre-built binary for [macOS](https://www.dropbox.com/s/kajx9ya6erzyrim/bwtool_macOS.tar.gz?dl=1) or [centOS](https://www.dropbox.com/s/77ek89jqfhcmouu/bwtool_centOS_x86_64.tar.gz?dl=1)
 52 | 
 53 | 
 54 | # Usage
 55 | 
 56 | Here ENCODE data from human embryonic stem cell line `H1` are used for demonstration. 
 57 | 
 58 | ```{r setup}
 59 | library(trackplot)
 60 | ```
 61 | 
 62 | All you need is a list of bigWig files to be analyzed. Make a column data of bigWig file names with `read_coldata()`. 
 63 | 
 64 | ```{r, importBW}
 65 | bigWigs = system.file("extdata", "bw", package = "trackplot") |> list.files(pattern = "\\.bw$", full.names = TRUE) 
 66 | 
 67 | #Generate a coldata table 
 68 | h1 = trackplot::read_coldata(bigWigs, build = "hg19")
 69 | 
 70 | print(h1)
 71 | 
 72 | #You can add any sort of metadata such as condition, treatment, etc to the above - similar to coldata in RNA-Seq analysis. 
 73 | ```
 74 | 
 75 | Above table will be the input for most of the downstream functions.
 76 | 
 77 | <img src="https://github.com/PoisonAlien/trackplot/assets/8164062/a6624174-34ff-4e6c-af2d-8a41e247d5fe" style="width:600px;"/>
 78 | 
 79 | # trackplot
 80 | 
 81 | `trackplot` generates an IGV style tracks with signal intensities, gene models and other desired annotation.
 82 | 
 83 | It is a two step process involving `track_extract()` -> `track_plot()`
 84 | 
 85 | ## Extract signal
 86 | 
 87 | Extract bigWig signal for the genomic regions of interest. This can be either 
 88 | 
 89 | * a genomic loci [e.g; `chr6:31115776-31154789`]
 90 | * a gene name [e.g; `POU5F1`]
 91 | 
 92 | ```{r}
 93 | #Region to plot
 94 | oct4_loci = "chr6:31125776-31144789"
 95 | 
 96 | #Extract bigWig signal for the above loci
 97 | t_loci = track_extract(colData = h1, loci = oct4_loci)
 98 | 
 99 | #To extract bigWig signal for the gene of interest
100 | # t_gene = track_extract(colData = h1, gene = "POU5F1")
101 | ```
102 | 
103 | ## Plot
104 | 
105 | ### Basic plot
106 | 
107 | ```{r, plot1, fig.height=5, fig.width=7}
108 | track_plot(summary_list = t_loci)
109 | ```
110 | 
111 | ### Change colors for each track
112 | 
113 | ```{r, plot_col, fig.height=5, fig.width=7}
114 | track_cols = c("#d35400","#d35400","#2980b9","#2980b9","#2980b9", "#27ae60","#27ae60")
115 | track_plot(summary_list = t_loci, col = track_cols)
116 | ```
117 | 
118 | ### Collapse all tracks into a single track
119 | 
120 | ```{r}
121 | track_plot(summary_list = t_loci, track_overlay = T, col = track_cols, show_ideogram = FALSE, genename = c("POU5F1", "TCF19"), gene_track_height = 1)
122 | ```
123 | 
124 | 
125 | ### Heighlight sites at the top
126 | 
127 | Using BED files or data.frame in BED format to heightlight target regions of interest
128 | 
129 | ```{r, plot_tfbs, fig.height=5.2, fig.width=7}
130 | tf_beds = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "Nanog|Oct4", full.names = TRUE) 
131 | print(basename(tf_beds))
132 | track_plot(
133 |   summary_list = t_loci,
134 |   col = track_cols,
135 |   peaks = tf_beds,
136 |   peaks_track_names = c("NANOG", "OCT4")
137 | )
138 | ```
139 | 
140 | 
141 | 
142 | ### Show only specific genes
143 | 
144 | Use `genename` argument to show only specific genes in the gene track
145 | 
146 | ```{r, plot_geneneames, fig.height=5.2, fig.width=7}
147 | track_plot(
148 |   summary_list = t_loci,
149 |   col = track_cols,
150 |   peaks = tf_beds,
151 |   peaks_track_names = c("NANOG", "OCT4"),
152 |   genename = c("POU5F1", "TCF19")
153 | )
154 | ```
155 | 
156 | ### Include chromHMM data
157 | 
158 | chromHMM data should be a BED file with the 4th column containing chromatin state. See here for an [example](https://github.com/PoisonAlien/trackplot/blob/master/inst/extdata/narrowpeak/H1_chromHMM.bed) file. 
159 | 
160 | ```{}
161 | chr6    31125621        31126021        1
162 | chr6    31126021        31127821        2
163 | chr6    31127821        31128221        6
164 | chr6    31128221        31129421        11
165 | ```
166 | 
167 | 
168 | Note that the color code for each of the 15 states are as described [here](https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeBroadHmm&db=hg19). 
169 | In case if it is different for your data, you will have to define your own color codes for each state and pass it to the argument `chromHMM_cols`
170 | 
171 | ```{r, plot_chrHMM, fig.height=5.2, fig.width=7}
172 | #Example chromHMM data for H1 from UCSC
173 | h1_chrHMM = system.file("extdata", "narrowpeak", "H1_chromHMM.bed", package = "trackplot") 
174 | 
175 | track_plot(
176 |   summary_list = t_loci,
177 |   col = track_cols,
178 |   peaks = tf_beds,
179 |   peaks_track_names = c("NANOG", "OCT4"),
180 |   genename = c("POU5F1", "TCF19"),
181 |   chromHMM = h1_chrHMM,
182 |   chromHMM_names = "H1"
183 | )
184 | ```
185 | 
186 | 
187 | ## Re-organize tracks
188 | 
189 | By default tracks are organized from top to bottom as `c("p", "b", "h", "g", "c")`  corresponding to peaks track, bigWig track, chromHmm track, gene track, and cytoband track. This can be changes with the argument `layout_ord`. Furthermore, bigWig tracks themselves can be ordered with the argument `bw_ord` which accepts the names of the bigWig tracks as input and plots them in the given order.
190 | 
191 | ```{r}
192 | #Re-organize the layout in the order chromHMM track, gene track, cytoband track, bigWig tracks and peak track.
193 | track_plot(
194 |   summary_list = t_loci,
195 |   col = track_cols,
196 |   peaks = tf_beds,
197 |   peaks_track_names = c("NANOG", "OCT4"),
198 |   genename = c("POU5F1", "TCF19"),
199 |   chromHMM = h1_chrHMM,
200 |   chromHMM_names = "H1", layout_ord = c("h", "g", "c", "b", "p")
201 | )
202 | ```
203 | 
204 | ## Overlay tracks
205 | 
206 | ```{r}
207 | #Re-organize the layout in the order chromHMM track, gene track, cytoband track, bigWig tracks and peak track.
208 | track_plot(
209 |   summary_list = t_loci,
210 |   col = track_cols,
211 |   peaks = tf_beds,
212 |   peaks_track_names = c("NANOG", "OCT4"),
213 |   genename = c("POU5F1", "TCF19"),
214 |   chromHMM = h1_chrHMM,
215 |   chromHMM_names = "H1", layout_ord = c("h", "g", "c", "b", "p"), track_overlay = TRUE
216 | )
217 | ```
218 | 
219 | ## Peak files as input
220 | 
221 | All of the above plots can also be generated with [narrowPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format12) or [broadPeak](https://genome.ucsc.edu/FAQ/FAQformat.html#format13) files as input. Here, 5th column containing scores are plotted as intensity. Color coding and binning of scores are as per [UCSC convention](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
222 | 
223 | `narrowPeak` is one of the output from macs2 peak caller and are easier to visualize in the absence of bigWig files. 
224 | Process is similar, `read_coldata` -> `track_extract` -> `track_plot`
225 | 
226 | ```{r plot_narrowPeak, fig.height=5.2, fig.width=7}
227 | narrowPeak_files = system.file("extdata", "narrowpeak", package = "trackplot") |> list.files(pattern = "\\.bed$", full.names = TRUE) |> grep(pattern = "chromHMM.bed$", invert = TRUE, value = TRUE) 
228 | print(basename(narrowPeak_files))
229 | 
230 | #Generate a coldata table 
231 | h1_peaks = read_coldata(narrowPeak_files, build = "hg19", input_type = "peak")
232 | 
233 | #Extract signal intensities from narrowPeak files
234 | oct4_loci = "chr6:30,818,383-31,452,182" #633Kb region for example
235 | t_loci2 = track_extract(colData = h1_peaks, loci = oct4_loci)
236 | 
237 | #Plot
238 | track_plot(
239 |     summary_list = t_loci2,
240 |     col = track_cols,
241 |     peaks = tf_beds,
242 |     peaks_track_names = c("NANOG", "OCT4"),
243 |     genename = c("POU5F1", "TCF19"),
244 |     chromHMM = h1_chrHMM,
245 |     chromHMM_names = "H1"
246 | )
247 | ```
248 | 
249 | # profileplot
250 | 
251 | Example data from [GSE99183](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99183) where U87 glioma cell lines are treated with a DMSO and a BRD4 degradaer.
252 | 
253 | ```{r, profile_plot, fig.height=5.2, fig.width=7}
254 | bws = list.files(path = "/Users/anandmayakonda/Documents/Documents_MacBookPro_work/github/data.nosync/GSE99183_U87_BRD4/", pattern = "bw", full.names = TRUE)
255 | bws = grep(pattern = "BRD3|BRD2|H3K27Ac|Pol2|H3K4|E2F1", x = bws, value = TRUE, invert = TRUE)
256 | print(basename(bws))
257 | ```
258 | 
259 | ```{r, profile_plot_cd}
260 | bws = trackplot::read_coldata(bws = bws, sample_names = c("BRD4", "BRD4_dBET_24h", "BRD4_dBET_2h"), build = "hg19")
261 | 
262 | #Add some metadata info
263 | bws$condition = c("dmso", "dBET", "dBET")
264 | 
265 | print(bws)
266 | ```
267 | 
268 | `profile_extract()` -> `profile_summarize()` -> `profile_plot()`
269 | 
270 | ### Around TSS
271 | 
272 | Setting `ucsc_assembly = TRUE` will fetch refseq transcripts from UCSC browser and plots the signal distribution around the TSS or TES.
273 | 
274 | ```{r, profilePlot_tss, fig.height=5, fig.width=5}
275 | #Extract signals from bigWig files around refseq transcripts
276 | pe_refseq = trackplot::profile_extract(colData = bws, ucsc_assembly = TRUE, startFrom = 'start', up = 1500, down = 1500)
277 | #Estimate mean signal
278 | ps_refseq = trackplot::profile_summarize(sig_list = pe_refseq) 
279 | ```
280 | 
281 | ```{r, fig.height=5, fig.width=6}
282 | #Plot
283 | trackplot::profile_plot(ps_refseq, legend_fs = 0.7, xlab = "TSS", ylab = "RPM")
284 | ```
285 | 
286 | ### Around BED regions of intersert
287 | 
288 | Providing a BED file or a data.frame in BED format..
289 | 
290 | ```{r, profilePlot_bed}
291 | #BRD4 binding sites 
292 | bed = "/Users/anandmayakonda/Documents/Documents_MacBookPro_work/github/data.nosync/GSE99183_U87_BRD4/GSM2634756_U87_BRD4_peaks.narrowPeak"
293 | #Center and extend 1500 both ways from the peak center
294 | pe_bed = trackplot::profile_extract(colData = bws, bed = bed, startFrom = "center", up = 1500, down = 1500, nthreads = 4)
295 | #Estimate mean signal
296 | ps_bed = trackplot::profile_summarize(sig_list = pe_bed) 
297 | ```
298 | 
299 | ```{r, fig.height=5, fig.width=6}
300 | #Plot
301 | trackplot::profile_plot(ps_bed, legend_fs = 0.7, xlab = "peak center", ylab = "RPM")
302 | ```
303 | 
304 | # heatmap
305 | 
306 | Output from `profile_extract` can be used to draw a heatmap with `profile_heatmap`
307 | 
308 | ```{r, heatmap, fig.height=7, fig.width=5}
309 | trackplot::profile_heatmap(mat_list = pe_bed, top_profile = TRUE, zmaxs = 0.8)
310 | ```
311 | 
312 | 
313 | # Peak intensities
314 | 
315 | Extract peak intensities. Similar to `profile_extract`, setting `ucsc_assembly = TRUE` will fetch refseq transcripts from UCSC browser and extracts peak intensities aroud the TSS or TES.
316 | 
317 | ```{r, es}
318 | tss_summary = trackplot::extract_summary(colData = bws, ucsc_assembly = TRUE, up = 1500, down = 1500) #For TSS regions
319 | bed_summary = trackplot::extract_summary(colData = bws, bed = bed, up = 1500, down = 1500) #For BED regions
320 | ```
321 | 
322 | output includes peak intensities across all the sites
323 | 
324 | ```{r, esdata}
325 | print(tss_summary$data)
326 | ```
327 | 
328 | Above signal intensities can be used to perform PCA analysis or do differential peak analysis.
329 | 
330 | ## PCA 
331 | 
332 | ```{r, espca}
333 | trackplot::pca_plot(summary_list = tss_summary, log2 = TRUE, color_by = "condition", lab_size = 0.6)
334 | ```
335 | 
336 | 
337 | ## Differential peak analysis
338 | 
339 | Although not recommended, above peak intensities can be used to perform a differential peak analysis between two conditions using [limma](https://www.bioconductor.org/packages/release/bioc/html/limma.html)
340 | 
341 | WARNING: Please make sure that the bigWigs contain normalized intensities such as RPM. Otherwise you should strongly consider using tools such as [DiffBind](https://bioconductor.org/packages/devel/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf)
342 | 
343 | 
344 | ```{r, eslimma}
345 | library(limma)
346 | dp_bed = trackplot::diffpeak(summary_list = bed_summary, condition = "condition", num = "dBET", den = "dmso")
347 | print(dp_bed)
348 | ```
349 | 
350 | Above output can be plotted with `volcanoplot`. 
351 | 
352 | ```{r, volcano}
353 | trackplot::volcano_plot(res = dp_bed)
354 | ```
355 | 
356 | Here, the treatment effect is quite strong and lack of replicates makes it as a not so nice example. 
357 | 
358 | 
359 | ## Sessioninfo
360 | 
361 | ```{r}
362 | sessionInfo()
363 | ```
364 | 
365 | 


--------------------------------------------------------------------------------