├── .Rbuildignore ├── .gitignore ├── DESCRIPTION ├── NAMESPACE ├── NEWS.md ├── R ├── ActivePathways.r ├── cytoscape.r ├── gmt.r ├── merge_p.r └── statistical_tests.r ├── README.md ├── inst └── extdata │ ├── Adenocarcinoma_scores_subset.tsv │ ├── Differential_expression_rna_protein.tsv │ ├── enrichmentMap__legend.pdf │ ├── enrichmentMap__pathways.gmt │ ├── enrichmentMap__pathways.txt │ ├── enrichmentMap__subgroups.txt │ ├── hsapiens_REAC_subset.gmt │ └── hsapiens_REAC_subset2.gmt ├── man ├── ActivePathways.Rd ├── DPM.Rd ├── GMT.Rd ├── brownsMethod.Rd ├── columnSignificance.Rd ├── enrichmentAnalysis.Rd ├── export_as_CSV.Rd ├── hypergeometric.Rd ├── makeBackground.Rd ├── merge_p_values.Rd ├── orderedHypergeometric.Rd └── prepareCytoscape.Rd ├── tests ├── testthat.R └── testthat │ ├── helper.r │ ├── hsapiens_REAC_subset.gmt │ ├── test.gmt │ ├── test_columnContribution.r │ ├── test_columnSignificance.r │ ├── test_cytoscape.r │ ├── test_data.txt │ ├── test_data_rna_protein.tsv │ ├── test_enrichmentAnalysis.r │ ├── test_export_CSV.r │ ├── test_merge_p_values.r │ ├── test_orderedHypergeometric.r │ ├── test_return.r │ └── test_validation.r └── vignettes ├── ActivePathways-vignette.Rmd ├── CreateEnrichmentMapDialogue_V2.png ├── ImportStep_V2.png ├── LegendView.png ├── LegendView_Custom.png ├── LegendView_RColorBrewer.png ├── NetworkStep1_V2.png ├── NetworkStep2_V2.png ├── PropertiesDropDown2_V2.png ├── StylePanel_V2.png ├── border_line_type.jpg ├── legend.png ├── lineplot_tutorial.png ├── new_map.png └── set_aesthetic.jpg /.Rbuildignore: -------------------------------------------------------------------------------- 1 | ^.*\.Rproj$ 2 | ^\.Rproj\.user$ 3 | ^self_testing$ 4 | ^\.Rhistory$ 5 | ^\.gitignore$ 6 | ^Notes$ 7 | ^README.md$ 8 | ^results_ActivePathways.csv$ 9 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .Rproj 2 | .Rproj.user 3 | 
.Rhistory 4 | .RData 5 | .DS_Store 6 | ActivePathways.Rproj 7 | -------------------------------------------------------------------------------- /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: ActivePathways 2 | Title: Integrative Pathway Enrichment Analysis of Multivariate Omics Data 3 | Version: 2.0.5 4 | Authors@R: c(person("Juri", "Reimand", email = "juri.reimand@utoronto.ca", role = c("aut", "cre")), 5 | person("Jonathan", "Barenboim", email = "jon.barenboim@gmail.com", role = "ctb"), 6 | person("Mykhaylo", "Slobodyanyuk", email = "michael.slobodyanyuk@oicr.on.ca", role = "aut")) 7 | Description: Framework for analysing multiple omics datasets in the context of molecular pathways, biological processes and other types of gene sets. The package uses p-value merging to combine gene- or protein-level signals, followed by ranked hypergeometric tests to determine enriched pathways and processes. Genes can be integrated using directional constraints that reflect how the input datasets are expected to interact with one another. This approach allows researchers to interpret a series of omics datasets in the context of known biology and gene function, and discover associations that are only apparent when several datasets are combined. The recent version of the package is part of the following publication: Directional integration and pathway enrichment analysis for multi-omics data. Slobodyanyuk M^, Bahcheli AT^, Klein ZP, Bayati M, Strug LJ, Reimand J. Nature Communications (2024) . 
8 | Depends: R (>= 3.6) 9 | Imports: 10 | data.table, 11 | ggplot2 12 | License: GPL-3 13 | URL: 14 | BugReports: https://github.com/reimandlab/ActivePathways/issues 15 | Encoding: UTF-8 16 | LazyData: true 17 | RoxygenNote: 7.3.1 18 | Suggests: testthat, 19 | knitr, 20 | rmarkdown, 21 | RColorBrewer 22 | VignetteBuilder: knitr 23 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | # Generated by roxygen2: do not edit by hand 2 | 3 | S3method("$",GMT) 4 | S3method("[",GMT) 5 | S3method("[[",GMT) 6 | S3method(print,GMT) 7 | export(ActivePathways) 8 | export(DPM) 9 | export(brownsMethod) 10 | export(export_as_CSV) 11 | export(is.GMT) 12 | export(makeBackground) 13 | export(merge_p_values) 14 | export(orderedHypergeometric) 15 | export(read.GMT) 16 | export(write.GMT) 17 | import(data.table) 18 | import(ggplot2) 19 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | ### ActivePathways 2.0.5 2 | * Fixed a minor bug when exporting results from the ActivePathways() output as a data.table into a csv file. The bug occurred when all results unfiltered by statistical significance values were exported, resulting in NULL values in gene overlap columns of resulting tables. These NULL entries are now first converted to an empty string inside the export_as_CSV() function. 3 | 4 | ### ActivePathways 2.0.4 5 | * Minor update to ensure the 'scores' and 'scores_direction' matrices have the same number of rows, and that the gene row names in 'scores' are in the same order as 'scores_direction'. The method now reports an error and terminates if two matrices are misaligned. 6 | 7 | ### ActivePathways 2.0.3 8 | * Minor updates in documentation and code examples. 
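The row-alignment requirement introduced in 2.0.4 can be satisfied before calling ActivePathways() by reordering one matrix against the other. The following is a minimal base-R sketch, not package code; the gene names, dataset labels ("rna", "protein") and values are hypothetical toy inputs chosen only for illustration.

```r
# Hypothetical toy inputs (illustrative only, not shipped with the package).
# 'scores' holds p-values; 'scores_direction' holds log2 fold-changes.
scores <- matrix(c(0.01, 0.20, 0.03, 0.50, 0.04, 0.60), ncol = 2,
                 dimnames = list(c("TP53", "EGFR", "KRAS"),
                                 c("rna", "protein")))
scores_direction <- matrix(c(-0.5, 1.2, 2.0, 0.8, -1.0, 1.5), ncol = 2,
                           dimnames = list(c("KRAS", "TP53", "EGFR"),
                                           c("rna", "protein")))

# Both matrices must describe the same genes before they can be aligned.
stopifnot(setequal(rownames(scores), rownames(scores_direction)))

# Reorder rows (and columns) of scores_direction to follow scores.
scores_direction <- scores_direction[rownames(scores), colnames(scores),
                                     drop = FALSE]

# After reordering, the row names agree in both content and order.
stopifnot(identical(rownames(scores), rownames(scores_direction)))
```

Indexing by row name rather than by position keeps each fold-change attached to its gene, which is the property the 2.0.4 check is described as enforcing.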
9 | 10 | ### ActivePathways 2.0.2 11 | * Separated the directional P-value merging methods into separate functions. 'Fisher_directional', 'DPM', 'Stouffer_directional', and 'Strube_directional' methods perform directional integration. The 'scores_direction' and 'constraints_vector' parameters must be provided. 12 | 13 | ### ActivePathways 2.0.1 14 | * Changed how very small P-values are processed before P-value merging. P-values of '0' or anything less than '1e-300' are converted to '1e-300'. 15 | 16 | ### ActivePathways 2.0.0 17 | * Incorporated 'scores_direction' and 'constraints_vector' parameters to ActivePathways() and merge_p_values() to account for the direction between datasets when performing p-value merging. 18 | * Added the 'Stouffer' and 'Strube' p-value merging methods as alternatives to 'Fisher' and 'Brown', respectively. 19 | * Changed the naming convention of parameters and objects, substituting a period '.' for an underscore '_'. 20 | 21 | ### ActivePathways 1.1.1 22 | * Added an option for the colours specified in the "custom_colors" parameter to be provided in any order, as long as the vector "names()" match the column names of the "scores" matrix. 23 | * Fixed an error where the "combined" contribution label was absent from the legend.pdf ActivePathways output file. 24 | 25 | ### ActivePathways 1.1.0 26 | * Updated the filtering procedure of gene sets in the GMT file when a custom gene background is provided. Given a background gene list, the GMT gene sets are first modified to only include the genes of the background list, and second, the gene sets are filtered by gene set size. Gene sets lacking any genes from the background list are removed. This update will result in a more lenient multiple testing correction in analyses with a custom background gene list. 27 | 28 | ### ActivePathways 1.0.4 29 | * Added three new parameters to ActivePathways and prepareCytoscape functions. 
These include "color_palette", "custom_colors" and "color_integrated_only" to provide more options for node coloring in Cytoscape. 30 | 31 | ### ActivePathways 1.0.3 32 | * Removed dependency used for testing for CRAN compliance. 33 | 34 | ### ActivePathways 1.0.2 35 | * Renamed package to ActivePathways from activePathways for consistency 36 | with function and publication 37 | * Added new function export_as_CSV(res, file_name) to save data in 38 | spreadsheet-friendly formats 39 | * Updated README-file with an actionable step-by-step tutorial 40 | * Changed logic of creating files for Enrichment Map: the user can provide 41 | the parameter "cytoscape.file.tag" for creating the required files. If the 42 | parameter is NA (default), no files are created. No directories are created. 43 | * Removed the parameter "return.all" as it was redundant with the 44 | parameter "significance". 45 | * Removed the parameter "reanalyze" to simplify the package and leave the structuring 46 | of results up to the user. 47 | * Removed the dependency on the R package metap. As a result, only Fisher's and Brown's p-value 48 | merging options are available. 49 | * Updated the vignette that now describes the ActivePathways package as well as the following steps of visualising results as enrichment maps in Cytoscape. 50 | -------------------------------------------------------------------------------- /R/ActivePathways.r: -------------------------------------------------------------------------------- 1 | #' ActivePathways 2 | #' 3 | #' @param scores A numerical matrix of p-values where each row is a gene and 4 | #' each column represents an omics dataset (evidence). Rownames correspond to the genes 5 | #' and colnames to the datasets. All values must be 0<=p<=1. We recommend converting 6 | #' missing values to ones. 7 | #' @param gmt A GMT object to be used for enrichment analysis. If a filename, a 8 | #' GMT object will be read from the file. 
9 | #' @param background A character vector of gene names to be used as a 10 | #' statistical background. By default, the background is all genes that appear 11 | #' in \code{gmt}. 12 | #' @param geneset_filter A numeric vector of length two giving the lower and 13 | #' upper limits for the size of the annotated geneset to pathways in gmt. 14 | #' Pathways with a geneset shorter than \code{geneset_filter[1]} or longer 15 | #' than \code{geneset_filter[2]} will be removed. Set either value to NA 16 | #' to not enforce a minimum or maximum value, or set \code{geneset_filter} to 17 | #' \code{NULL} to skip filtering. 18 | #' @param cutoff A maximum merged p-value for a gene to be used for analysis. 19 | #' Any genes with merged, unadjusted \code{p > cutoff} will be discarded 20 | #' before testing. 21 | #' @param significant Significance cutoff for selecting enriched pathways. Pathways with 22 | #' \code{adjusted_p_val <= significant} will be selected as results. 23 | #' @param merge_method Statistical method to merge p-values. See the section on Merging P-values. 24 | #' @param correction_method Statistical method to correct p-values. See 25 | #' \code{\link[stats]{p.adjust}} for details. 26 | #' @param cytoscape_file_tag The directory and/or file prefix to which the output files 27 | #' for generating enrichment maps should be written. If NA, files will not be written. 28 | #' @param color_palette Color palette from RColorBrewer::brewer.pal to color each 29 | #' column in the scores matrix. If NULL grDevices::rainbow is used by default. 30 | #' @param custom_colors A character vector of custom colors for each column in the scores matrix. 31 | #' @param color_integrated_only A character vector of length 1 specifying the color of the 32 | #' "combined" pathway contribution. 33 | #' @param scores_direction A numerical matrix of log2 transformed fold-change values where each row is a 34 | #' gene and each column represents a dataset (evidence). 
Rownames correspond to the genes 35 | #' and colnames to the datasets. We recommend converting missing values to zero. 36 | #' Must have the same dimensions as the scores parameter. Datasets without directional information should be set to 0. 37 | #' @param constraints_vector A numerical vector of +1 or -1 values corresponding to the user-defined 38 | #' directional relationship between columns in scores_direction. Datasets without directional information should 39 | #' be set to 0. 40 | #' 41 | #' @return A data.table of terms (enriched pathways) containing the following columns: 42 | #' \describe{ 43 | #' \item{term_id}{The database ID of the term} 44 | #' \item{term_name}{The full name of the term} 45 | #' \item{adjusted_p_val}{The associated p-value, adjusted for multiple testing} 46 | #' \item{term_size}{The number of genes annotated to the term} 47 | #' \item{overlap}{A character vector of the genes enriched in the term} 48 | #' \item{evidence}{Columns of \code{scores} (i.e., omics datasets) that contributed 49 | #' individually to the enrichment of the term. Each input column is evaluated 50 | #' separately for enrichments and added to the evidence if the term is found.} 51 | #' } 52 | #' 53 | #' @section Merging P-values: 54 | #' To obtain a single p-value for each gene across the multiple omics datasets considered, 55 | #' the p-values in \code{scores} are merged row-wise using a data fusion approach of p-value merging. 56 | #' The eight available methods are: 57 | #' \describe{ 58 | #' \item{Fisher}{Fisher's method assumes p-values are uniformly 59 | #' distributed and performs a chi-squared test on the statistic sum(-2 log(p)). 
60 | #' This method is most appropriate when the columns in \code{scores} are 61 | #' independent.} 62 | #' \item{Fisher_directional}{Fisher's method modification that allows for 63 | #' directional information to be incorporated with the \code{scores_direction} 64 | #' and \code{constraints_vector} parameters.} 65 | #' \item{Brown}{Brown's method extends Fisher's method by accounting for the 66 | #' covariance in the columns of \code{scores}. It is more appropriate when the 67 | #' tests of significance used to create the columns in \code{scores} are not 68 | #' necessarily independent. The Brown's method is therefore recommended for 69 | #' many omics integration approaches.} 70 | #' \item{DPM}{DPM extends Brown's method by incorporating directional information 71 | #' using the \code{scores_direction} and \code{constraints_vector} parameters.} 72 | #' \item{Stouffer}{Stouffer's method assumes p-values are uniformly distributed 73 | #' and transforms p-values into a Z-score using the cumulative distribution function of a 74 | #' standard normal distribution. This method is appropriate when the columns in \code{scores} 75 | #' are independent.} 76 | #' \item{Stouffer_directional}{Stouffer's method modification that allows for 77 | #' directional information to be incorporated with the \code{scores_direction} 78 | #' and \code{constraints_vector} parameters.} 79 | #' \item{Strube}{Strube's method extends Stouffer's method by accounting for the 80 | #' covariance in the columns of \code{scores}.} 81 | #' \item{Strube_directional}{Strube's method modification that allows for 82 | #' directional information to be incorporated with the \code{scores_direction} 83 | #' and \code{constraints_vector} parameters.} 84 | #' } 85 | #' 86 | #' @section Cytoscape: 87 | #' To visualize and interpret enriched pathways, ActivePathways provides an option 88 | #' to further analyse results as enrichment maps in the Cytoscape software. 
89 | #' If \code{!is.na(cytoscape_file_tag)}, four files will be written that can be used 90 | #' to build enrichment maps. This requires the EnrichmentMap and enhancedGraphics apps. 91 | #' 92 | #' The four files written are: 93 | #' \describe{ 94 | #' \item{pathways.txt}{A list of significant terms and the 95 | #' associated p-value. Only terms with \code{adjusted_p_val <= significant} are 96 | #' written to this file.} 97 | #' \item{subgroups.txt}{A matrix indicating whether the significant terms (pathways) 98 | #' were also found to be significant when considering only one column from 99 | #' \code{scores}. A one indicates that the term was found to be significant 100 | #' when only p-values in that column were used to select genes.} 101 | #' \item{pathways.gmt}{A shortened version of the supplied GMT 102 | #' file, containing only the significantly enriched terms in pathways.txt. } 103 | #' \item{legend.pdf}{A legend with colours matching contributions 104 | #' from columns in \code{scores}.} 105 | #' } 106 | #' 107 | #' How to use: Create an enrichment map in Cytoscape with the file of terms 108 | #' (pathways.txt) and the shortened gmt file 109 | #' (pathways.gmt). Upload the subgroups file (subgroups.txt) as a table 110 | #' using the menu File > Import > Table from File. To paint nodes according 111 | #' to the type of supporting evidence, use the 'style' 112 | #' panel, set image/Chart1 to use the column `instruct` and the passthrough 113 | #' mapping type. Make sure the app enhancedGraphics is installed. 114 | #' Lastly, use the file legend.pdf as a reference for colors in the enrichment map. 
115 | #' 116 | #' @examples 117 | #' fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 118 | #' package = "ActivePathways") 119 | #' fname_GMT = system.file("extdata", "hsapiens_REAC_subset.gmt", 120 | #' package = "ActivePathways") 121 | #' 122 | #' dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = 'Gene')) 123 | #' dat[is.na(dat)] <- 1 124 | #' 125 | #' ActivePathways(dat, fname_GMT) 126 | #' 127 | #' @import data.table 128 | #' 129 | #' @export 130 | 131 | ActivePathways <- function(scores, gmt, background = makeBackground(gmt), 132 | geneset_filter = c(5, 1000), cutoff = 0.1, significant = 0.05, 133 | merge_method = c("Fisher", "Fisher_directional", "Brown", "DPM", "Stouffer", 134 | "Stouffer_directional", "Strube", "Strube_directional"), 135 | correction_method = c("holm", "fdr", "hochberg", "hommel", 136 | "bonferroni", "BH", "BY", "none"), 137 | cytoscape_file_tag = NA, color_palette = NULL, custom_colors = NULL, 138 | color_integrated_only = "#FFFFF0", scores_direction = NULL, 139 | constraints_vector = NULL) { 140 | 141 | merge_method <- match.arg(merge_method) 142 | correction_method <- match.arg(correction_method) 143 | 144 | ##### Validation ##### 145 | # scores 146 | if (!(is.matrix(scores) && is.numeric(scores))) stop("scores must be a numeric matrix") 147 | if (any(is.na(scores))) stop("scores cannot contain missing values, we recommend replacing NA with 1 or removing") 148 | if (any(scores < 0) || any(scores > 1)) stop("All values in scores must be in [0,1]") 149 | if (any(duplicated(rownames(scores)))) stop("scores matrix contains duplicated genes - rownames must be unique") 150 | 151 | # scores_direction and constraints_vector 152 | if (xor(!is.null(scores_direction),!is.null(constraints_vector))) stop("Both scores_direction and constraints_vector must be provided") 153 | if (!is.null(scores_direction) && !is.null(constraints_vector)){ 154 | if (!(is.numeric(constraints_vector) && 
is.vector(constraints_vector))) stop("constraints_vector must be a numeric vector") 155 | if (any(!constraints_vector %in% c(1,-1,0))) stop("constraints_vector must contain the values: 1, -1 or 0") 156 | if (!(is.matrix(scores_direction) && is.numeric(scores_direction))) stop("scores_direction must be a numeric matrix") 157 | if (any(is.na(scores_direction))) stop("scores_direction cannot contain missing values, we recommend replacing NA with 0 or removing") 158 | if (nrow(scores) != nrow(scores_direction)) stop("scores and scores_direction must have the same number of rows") 159 | if (any(!rownames(scores_direction) %in% rownames(scores))) stop ("scores_direction gene names must match scores genes") 160 | if (any(rownames(scores) != rownames(scores_direction))) stop("scores genes should be in the same order as scores_direction genes") 161 | if (is.null(colnames(scores)) || is.null(colnames(scores_direction))) stop("column names must be provided to scores and scores_direction") 162 | if (any(!colnames(scores_direction) %in% colnames(scores))) stop("scores_direction column names must match scores column names") 163 | if (length(constraints_vector) != length(colnames(scores_direction))) stop("constraints_vector should have the same number of entries as columns in scores_direction") 164 | if (merge_method %in% c("Fisher","Brown","Stouffer","Strube")) stop("Only DPM, Fisher_directional, Stouffer_directional, and Strube_directional methods support directional integration") 165 | if (any(constraints_vector %in% 0) && !all(scores_direction[,constraints_vector %in% 0] == 0)) 166 | stop("scores_direction entries must be set to 0's for columns that do not contain directional information") 167 | if (!is.null(names(constraints_vector))){ 168 | if (!all.equal(names(constraints_vector), colnames(scores_direction), colnames(scores)) == TRUE){ 169 | stop("the constraints_vector entries should match the order of scores and scores_direction columns") 170 | }}} 171 | 172 | # cutoff 
and significant 173 | stopifnot(length(cutoff) == 1) 174 | stopifnot(is.numeric(cutoff)) 175 | if (cutoff < 0 || cutoff > 1) stop("cutoff must be a value in [0,1]") 176 | stopifnot(length(significant) == 1) 177 | stopifnot(is.numeric(significant)) 178 | if (significant < 0 || significant > 1) stop("significant must be a value in [0,1]") 179 | 180 | # gmt 181 | if (!is.GMT(gmt)) gmt <- read.GMT(gmt) 182 | if (length(gmt) == 0) stop("No pathways in gmt made the geneset_filter") 183 | if (!(is.character(background) && is.vector(background))) { 184 | stop("background must be a character vector") 185 | } 186 | 187 | # geneset_filter 188 | if (!is.null(geneset_filter)) { 189 | if (!(is.numeric(geneset_filter) && is.vector(geneset_filter))) { 190 | stop("geneset_filter must be a numeric vector") 191 | } 192 | if (length(geneset_filter) != 2) stop("geneset_filter must be length 2") 193 | if (!is.numeric(geneset_filter)) stop("geneset_filter must be numeric") 194 | if (any(geneset_filter < 0, na.rm=TRUE)) stop("geneset_filter limits must be positive") 195 | } 196 | 197 | # custom_colors 198 | if (!is.null(custom_colors)){ 199 | if(!(is.character(custom_colors) && is.vector(custom_colors))){ 200 | stop("colors must be provided as a character vector") 201 | } 202 | if(length(colnames(scores)) != length(custom_colors)) stop("incorrect number of colors is provided") 203 | } 204 | if (!is.null(custom_colors) & !is.null(color_palette)){ 205 | stop("Both custom_colors and color_palette are provided. 
Specify only one of these parameters for node coloring.") 206 | } 207 | 208 | if (!is.null(names(custom_colors))){ 209 | if (!all(names(custom_colors) %in% colnames(scores))){ 210 | stop("names() of the custom colors vector should match the scores column names") 211 | } 212 | } 213 | 214 | # color_palette 215 | if (!is.null(color_palette)){ 216 | if (!(color_palette %in% rownames(RColorBrewer::brewer.pal.info))) stop("palette must be from the RColorBrewer package") 217 | } 218 | 219 | # color_integrated_only 220 | if(!(is.character(color_integrated_only) && is.vector(color_integrated_only))){ 221 | stop("color must be provided as a character vector") 222 | } 223 | if(1 != length(color_integrated_only)) stop("only a single color must be specified") 224 | 225 | # contribution 226 | contribution <- TRUE 227 | if (ncol(scores) == 1) { 228 | contribution <- FALSE 229 | message("scores matrix contains only one column. Column contributions will not be calculated") 230 | } 231 | 232 | ##### filtering and sorting #### 233 | 234 | # Remove any genes not found in the background 235 | orig_length <- nrow(scores) 236 | scores <- scores[rownames(scores) %in% background, , drop=FALSE] 237 | if(!is.null(scores_direction)){ 238 | scores_direction <- scores_direction[rownames(scores_direction) %in% background, , drop=FALSE] 239 | } 240 | if (nrow(scores) == 0) { 241 | stop("scores does not contain any genes in the background") 242 | } 243 | if (nrow(scores) < orig_length) { 244 | message(paste(orig_length - nrow(scores), "rows were removed from scores", 245 | "because they are not found in the background")) 246 | } 247 | 248 | 249 | # Filter the GMT 250 | if (!all(background %in% unique(unlist(sapply(gmt, "[", c(3)))))){ 251 | background_genes <- lapply(sapply(gmt, "[", c(3)), intersect, background) 252 | background_genes <- background_genes[lapply(background_genes,length) > 0] 253 | gmt <- gmt[names(sapply(gmt,"[",c(3))) %in% names(background_genes)] 254 | for (i in 1:length(gmt)) 
{ 255 | gmt[[i]]$genes <- background_genes[[i]] 256 | } 257 | } 258 | 259 | if(!is.null(geneset_filter)) { 260 | orig_length <- length(gmt) 261 | if (!is.na(geneset_filter[1])) { 262 | gmt <- Filter(function(x) length(x$genes) >= geneset_filter[1], gmt) 263 | } 264 | if (!is.na(geneset_filter[2])) { 265 | gmt <- Filter(function(x) length(x$genes) <= geneset_filter[2], gmt) 266 | } 267 | if (length(gmt) == 0) stop("No pathways in gmt made the geneset_filter") 268 | if (length(gmt) < orig_length) { 269 | message(paste(orig_length - length(gmt), "terms were removed from gmt", 270 | "because they did not make the geneset_filter")) 271 | } 272 | } 273 | 274 | # merge p-values to get a single score for each gene and remove any genes 275 | # that don't make the cutoff 276 | merged_scores <- merge_p_values(scores, merge_method, scores_direction, constraints_vector) 277 | merged_scores <- merged_scores[merged_scores <= cutoff] 278 | 279 | if (length(merged_scores) == 0) stop("No genes made the cutoff") 280 | 281 | # Sort genes by p-value 282 | ordered_scores <- names(merged_scores)[order(merged_scores)] 283 | 284 | ##### enrichmentAnalysis and column contribution ##### 285 | 286 | res <- enrichmentAnalysis(ordered_scores, gmt, background) 287 | adjusted_p <- stats::p.adjust(res$adjusted_p_val, method = correction_method) 288 | res[, "adjusted_p_val" := adjusted_p] 289 | 290 | significant_indeces <- which(res$adjusted_p_val <= significant) 291 | if (length(significant_indeces) == 0) { 292 | warning("No significant terms were found") 293 | return() 294 | } 295 | 296 | if (contribution) { 297 | sig_cols <- columnSignificance(scores, gmt, background, cutoff, 298 | significant, correction_method, res$adjusted_p_val) 299 | res <- cbind(res, sig_cols[, -1]) 300 | } else { 301 | sig_cols <- NULL 302 | } 303 | 304 | # if significant result were found and cytoscape file tag exists 305 | # proceed with writing files in the working directory 306 | if (length(significant_indeces) > 0 & 
!is.na(cytoscape_file_tag)) { 307 | prepareCytoscape(res[significant_indeces, c("term_id", "term_name", "adjusted_p_val")], 308 | gmt[significant_indeces], 309 | cytoscape_file_tag, 310 | sig_cols[significant_indeces,], color_palette, custom_colors, color_integrated_only) 311 | } 312 | 313 | res[significant_indeces] 314 | } 315 | 316 | 317 | #' Perform pathway enrichment analysis on an ordered list of genes 318 | #' 319 | #' @param genelist character vector of gene names, in decreasing order 320 | #' of significance 321 | #' @param gmt GMT object 322 | #' @param background character vector of gene names. List of all genes being used 323 | #' as a statistical background 324 | #' 325 | #' @return a data.table of terms with the following columns: 326 | #' \describe{ 327 | #' \item{term_id}{The id of the term} 328 | #' \item{term_name}{The full name of the term} 329 | #' \item{adjusted_p_val}{The associated p-value adjusted for multiple testing} 330 | #' \item{term_size}{The number of genes annotated to the term} 331 | #' \item{overlap}{A character vector of the genes that overlap between the 332 | #' term and the query} 333 | #' } 334 | #' @keywords internal 335 | enrichmentAnalysis <- function(genelist, gmt, background) { 336 | dt <- data.table(term_id=names(gmt)) 337 | 338 | for (i in 1:length(gmt)) { 339 | term <- gmt[[i]] 340 | tmp <- orderedHypergeometric(genelist, background, term$genes) 341 | overlap <- genelist[1:tmp$ind] 342 | overlap <- overlap[overlap %in% term$genes] 343 | if (length(overlap) == 0) overlap <- c() 344 | set(dt, i, 'term_name', term$name) 345 | set(dt, i, 'adjusted_p_val', tmp$p_val) 346 | set(dt, i, 'term_size', length(term$genes)) 347 | set(dt, i, 'overlap', list(list(overlap))) 348 | } 349 | dt 350 | } 351 | 352 | #' Determine which terms are found to be significant using each column 353 | #' individually. 
354 | #' 355 | #' @inheritParams ActivePathways 356 | #' @param pvals p-value for the pathways calculated by ActivePathways 357 | #' 358 | #' @return a data.table with columns 'term_id' and a column for each column 359 | #' in \code{scores}, indicating whether each term (pathway) was found to be 360 | #' significant or not when considering only that column. For each term, 361 | #' either report the list of related genes if that term was significant, or NA if not. 362 | 363 | columnSignificance <- function(scores, gmt, background, cutoff, significant, correction_method, pvals) { 364 | dt <- data.table(term_id=names(gmt), evidence=NA) 365 | for (col in colnames(scores)) { 366 | col_scores <- scores[, col, drop=TRUE] 367 | col_scores <- col_scores[col_scores <= cutoff] 368 | col_scores <- names(col_scores)[order(col_scores)] 369 | 370 | res <- enrichmentAnalysis(col_scores, gmt, background) 371 | set(res, i = NULL, "adjusted_p_val", stats::p.adjust(res$adjusted_p_val, correction_method)) 372 | set(res, i = which(res$adjusted_p_val > significant), "overlap", list(list(NA))) 373 | set(dt, i=NULL, col, res$overlap) 374 | } 375 | 376 | ev_names = colnames(dt[,-1:-2]) 377 | set_evidence <- function(x) { 378 | ev <- ev_names[!is.na(dt[x, -1:-2])] 379 | if(length(ev) == 0) { 380 | if (pvals[x] <= significant) { 381 | ev <- 'combined' 382 | } else { 383 | ev <- 'none' 384 | } 385 | } 386 | ev 387 | } 388 | evidence <- lapply(1:nrow(dt), set_evidence) 389 | 390 | set(dt, i=NULL, "evidence", evidence) 391 | colnames(dt)[-1:-2] = paste0("Genes_", colnames(dt)[-1:-2]) 392 | 393 | dt 394 | } 395 | 396 | #' Export the results from ActivePathways as a comma-separated values (CSV) file. 397 | #' 398 | #' @param res the data.table object with ActivePathways results. 399 | #' @param file_name location and name of the CSV file to write to. 
400 | #' @export 401 | #' 402 | #' @examples 403 | #' fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 404 | #' package = "ActivePathways") 405 | #' fname_GMT = system.file("extdata", "hsapiens_REAC_subset.gmt", 406 | #' package = "ActivePathways") 407 | #' 408 | #' dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = 'Gene')) 409 | #' dat[is.na(dat)] <- 1 410 | #' 411 | #' res <- ActivePathways(dat, fname_GMT) 412 | #'\donttest{ 413 | #' export_as_CSV(res, "results_ActivePathways.csv") 414 | #'} 415 | export_as_CSV = function (res, file_name) { 416 | overlap_index <- which(grepl("overlap", colnames(res), fixed=TRUE)) 417 | dataset_indices <- which(grepl("Genes_", colnames(res), fixed=TRUE)) 418 | for (i in c(overlap_index, dataset_indices)){ 419 | res[[i]] <- sapply(res[[i]], function(x) paste(x, collapse = "|")) 420 | } 421 | data.table::fwrite(res, file_name) 422 | } 423 | -------------------------------------------------------------------------------- /R/cytoscape.r: -------------------------------------------------------------------------------- 1 | #' Prepare files for building an enrichment map network visualization in Cytoscape 2 | #' 3 | #' This function writes four text files that are used to build a network using 4 | #' Cytoscape and the EnrichmentMap app. The files are prefixed with \code{cytoscape_file_tag}. 5 | #' The four files written are: 6 | #' \describe{ 7 | #' \item{pathways.txt}{A list of significant terms and the 8 | #' associated p-value. Only terms with \code{adjusted_p_val <= significant} are 9 | #' written to this file} 10 | #' \item{subgroups.txt}{A matrix indicating whether the significant 11 | #' pathways are found to be significant when considering only one column (i.e., type of omics evidence) from 12 | #' \code{scores}. 
A 1 indicates that that term is significant using only that 13 | #' column to test for enrichment analysis} 14 | #' \item{pathways.gmt}{A shortened version of the supplied GMT 15 | #' file, containing only the terms in pathways.txt.} 16 | #' \item{legend.pdf}{A legend with colours matching contributions 17 | #' from columns in \code{scores}} 18 | #' } 19 | #' 20 | #' @param terms A data.table object with the columns 'term_id', 'term_name', 'adjusted_p_val'. 21 | #' @param gmt An abridged GMT object containing only the pathways that were 22 | #' found to be significant in the ActivePathways analysis. 23 | #' @param cytoscape_file_tag The user-defined file prefix and/or directory defining the location of the files. 24 | #' @param col_significance A data.table object with a column 'term_id' and a column 25 | #' for each type of omics evidence indicating whether a term was also found to be significant or not 26 | #' when considering only the genes and p-values in the corresponding column of the \code{scores} matrix. 27 | #' If term was not found, NA's are shown in columns, otherwise the relevant lists of genes are shown. 28 | #' @param color_palette Color palette from RColorBrewer::brewer.pal to color each 29 | #' column in the scores matrix. If NULL grDevices::rainbow is used by default. 30 | #' @param custom_colors A character vector of custom colors for each column in the scores matrix. 31 | #' @param color_integrated_only A character vector of length 1 specifying the color of the "combined" pathway contribution. 
32 | #' @import ggplot2 33 | #' 34 | #' @return None 35 | 36 | prepareCytoscape <- function(terms, 37 | gmt, 38 | cytoscape_file_tag, 39 | col_significance, color_palette = NULL, custom_colors = NULL, color_integrated_only = "#FFFFF0") { 40 | if (!is.null(col_significance)) { 41 | 42 | # Obtain the name of each omics dataset and incorporate a 'combined' contribution 43 | tests <- colnames(col_significance)[3:length(colnames(col_significance))] 44 | tests <- substr(tests, 7, 100) 45 | tests <- append(tests, "combined") 46 | 47 | # Create a matrix of ones and zeros, where columns are omics datasets + 'combined' 48 | # and rows are enriched pathways 49 | rows <- 1:nrow(col_significance) 50 | evidence_columns = do.call(rbind, lapply(col_significance$evidence, 51 | function(x) 0+(tests %in% x))) 52 | colnames(evidence_columns) = tests 53 | col_significance = cbind(col_significance[,"term_id"], evidence_columns) 54 | 55 | # Acquire colours from grDevices::rainbow or RColorBrewer::brewer.pal if custom colors are not provided 56 | if(is.null(color_palette) & is.null(custom_colors)) { 57 | col_colors <- grDevices::rainbow(length(tests)) 58 | } else if (!is.null(custom_colors)){ 59 | if (!is.null(names(custom_colors))){ 60 | custom_colors <- custom_colors[order(match(names(custom_colors),tests))] 61 | } 62 | custom_colors <- append(custom_colors, color_integrated_only, after = match("combined",tests)) 63 | col_colors <- custom_colors 64 | } else { 65 | col_colors <- RColorBrewer::brewer.pal(length(tests),color_palette) 66 | } 67 | col_colors <- replace(col_colors, match("combined",tests),color_integrated_only) 68 | if (!is.null(names(col_colors))){ 69 | names(col_colors)[length(col_colors)] <- "combined" 70 | } 71 | 72 | instruct_str <- paste('piechart:', 73 | ' attributelist="', 74 | paste(tests, collapse=','), 75 | '" colorlist="', 76 | paste(col_colors, collapse=','), 77 | '" showlabels=FALSE', sep='') 78 | col_significance[, "instruct" := instruct_str] 79 | 80 | # 
Writing the Files 81 | utils::write.table(terms, 82 | file=paste0(cytoscape_file_tag, "pathways.txt"), 83 | row.names=FALSE, 84 | sep="\t", 85 | quote=FALSE) 86 | utils::write.table(col_significance, 87 | file=paste0(cytoscape_file_tag, "subgroups.txt"), 88 | row.names=FALSE, 89 | sep="\t", 90 | quote=FALSE) 91 | write.GMT(gmt, 92 | paste0(cytoscape_file_tag, "pathways.gmt")) 93 | 94 | # Making a Legend 95 | dummy_plot = ggplot(data.frame("tests" = factor(tests, levels = tests), 96 | "value" = 1), aes(tests, fill = tests)) + 97 | geom_bar() + 98 | scale_fill_manual(name = "Contribution", values=col_colors) 99 | 100 | grDevices::pdf(file = NULL) # Suppressing Blank Display Device from ggplot_gtable 101 | dummy_table = ggplot_gtable(ggplot_build(dummy_plot)) 102 | grDevices::dev.off() 103 | 104 | legend = dummy_table$grobs[[which(sapply(dummy_table$grobs, function(x) x$name) == "guide-box")]] 105 | 106 | # Estimating height & width 107 | legend_height = ifelse(length(tests) > 20, 108 | 5.5, 109 | length(tests)*0.25+1) 110 | legend_width = ifelse(length(tests) > 20, 111 | ceiling(length(tests)/20)*(max(nchar(tests))*0.05+1), 112 | max(nchar(tests))*0.05+1) 113 | ggsave(legend, 114 | device = "pdf", 115 | filename = paste0(cytoscape_file_tag, "legend.pdf"), 116 | height = legend_height, 117 | width = legend_width, 118 | scale = 1) 119 | 120 | } else { 121 | utils::write.table(terms, 122 | file=paste0(cytoscape_file_tag, "pathways.txt"), 123 | row.names=FALSE, 124 | sep="\t", 125 | quote=FALSE) 126 | write.GMT(gmt, 127 | paste0(cytoscape_file_tag, "pathways.gmt")) 128 | } 129 | } 130 | -------------------------------------------------------------------------------- /R/gmt.r: -------------------------------------------------------------------------------- 1 | #' Read and Write GMT files 2 | #' 3 | #' Functions to read and write Gene Matrix Transposed (GMT) files and to test if 4 | #' an object inherits from GMT. 
5 | #' 6 | #' A GMT file describes gene sets, such as biological terms and pathways. GMT files are 7 | #' tab delimited text files. Each row of a GMT file contains a single term with its 8 | #' database ID and a term name, followed by all the genes annotated to the term. 9 | #' 10 | #' @format 11 | #' A GMT object is a named list of terms, where each term is a list with the items: 12 | #' \describe{ 13 | #' \item{id}{The term ID.} 14 | #' \item{name}{The full name or description of the term.} 15 | #' \item{genes}{A character vector of genes annotated to this term.} 16 | #' } 17 | #' @rdname GMT 18 | #' @name GMT 19 | #' @aliases GMT gmt 20 | #' 21 | #' @param filename Location of the gmt file. 22 | #' @param gmt A GMT object. 23 | #' @param x The object to test. 24 | #' 25 | #' @return \code{read.GMT} returns a GMT object. \cr 26 | #' \code{write.GMT} returns NULL. \cr 27 | #' \code{is.GMT} returns TRUE if \code{x} is a GMT object, else FALSE. 28 | #' 29 | #' 30 | #' @examples 31 | #' fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", package = "ActivePathways") 32 | #' gmt <- read.GMT(fname_GMT) 33 | #' gmt[1:10] 34 | #' gmt[[1]] 35 | #' gmt[[1]]$id 36 | #' gmt[[1]]$genes 37 | #' gmt[[1]]$name 38 | #' gmt$`REAC:1630316` 39 | #' @export 40 | read.GMT <- function(filename) { 41 | gmt <- strsplit(readLines(filename), '\t') 42 | names(gmt) <- sapply(gmt, `[`, 1) 43 | gmt <- lapply(gmt, function(x) { list(id=x[1], name=x[2], genes=x[-c(1,2)]) }) 44 | class(gmt) <- 'GMT' 45 | gmt 46 | } 47 | 48 | #' @rdname GMT 49 | #' @export 50 | write.GMT <- function(gmt, filename) { 51 | if (!is.GMT(gmt)) stop("gmt is not a valid GMT object") 52 | sink(filename) 53 | for (term in gmt) { 54 | cat(term$id, term$name, paste(term$genes, collapse="\t"), sep = "\t") 55 | cat("\n") 56 | } 57 | sink() 58 | } 59 | 60 | #' Make a background list of genes (i.e., the statistical universe) based on all the terms (gene sets, pathways) considered. 
61 | #' 62 | #' Returns a character vector of all genes in a GMT object. 63 | #' 64 | #' @param gmt A \link{GMT} object. 65 | #' @return A character vector containing all genes in GMT. 66 | #' @export 67 | #' 68 | #' @examples 69 | #' fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", package = "ActivePathways") 70 | #' gmt <- read.GMT(fname_GMT) 71 | #' makeBackground(gmt)[1:10] 72 | makeBackground <- function(gmt) { 73 | if (!is.GMT(gmt)) stop('gmt is not a valid GMT object') 74 | unlist(Reduce(function(x, y) union(x, y$genes), gmt, gmt[[1]]$genes)) 75 | } 76 | 77 | ##### Subsetting functions ##### 78 | # Treat as a list but return an object of "GMT" class 79 | #' @export 80 | `[.GMT` <- function(x, i) { 81 | x <- unclass(x) 82 | res <- x[i] 83 | class(res) <- c('GMT') 84 | res 85 | } 86 | #' @export 87 | `[[.GMT` <- function(x, i, exact = TRUE) { 88 | x <- unclass(x) 89 | x[[i, exact = exact]] 90 | } 91 | 92 | #' @export 93 | `$.GMT` <- function(x, i) { 94 | x[[i]] 95 | } 96 | 97 | #' @export 98 | #' @rdname GMT 99 | is.GMT <- function(x) inherits(x, 'GMT') 100 | 101 | # Print a GMT object 102 | #' @export 103 | print.GMT <- function(x, ...) { 104 | num_lines <- min(length(x), getOption("max.print", 99999)) 105 | num_trunc <- length(x) - num_lines 106 | cat(sapply(x[1:num_lines], function(a) paste(a$id, "-", a$name, "\n", 107 | paste(a$genes, collapse=", "), '\n\n'))) 108 | if (num_trunc == 1) { 109 | cat('[ reached getOption("max.print") -- omitted 1 term ]') 110 | } else if (num_trunc > 1) { 111 | cat(paste('[ reached getOption("max.print") -- omitted', num_trunc, 'terms ]')) 112 | } 113 | } 114 | -------------------------------------------------------------------------------- /R/merge_p.r: -------------------------------------------------------------------------------- 1 | #' Merge a list or matrix of p-values 2 | #' 3 | #' @param scores Either a list/vector of p-values or a matrix where each column is a test. 
4 | #' @param method Method to merge p-values. See 'methods' section below. 5 | #' @param scores_direction Either a vector of log2 transformed fold-change values or a matrix where each column is a test. 6 | #' Must contain the same dimensions as the scores parameter. Datasets without directional information should be set to 0. 7 | #' @param constraints_vector A numerical vector of +1 or -1 values corresponding to the user-defined 8 | #' directional relationship between the columns in scores_direction. Datasets without directional information should 9 | #' be set to 0. 10 | #' 11 | #' @return If \code{scores} is a vector or list, returns a number. If \code{scores} is a 12 | #' matrix, returns a named list of p-values merged by row. 13 | #' 14 | #' @section Methods: 15 | #' Eight methods are available to merge a list of p-values: 16 | #' \describe{ 17 | #' \item{Fisher}{Fisher's method (default) assumes that p-values are uniformly 18 | #' distributed and performs a chi-squared test on the statistic sum(-2 log(p)). 19 | #' This method is most appropriate when the columns in \code{scores} are 20 | #' independent.} 21 | #' \item{Fisher_directional}{Fisher's method modification that allows for 22 | #' directional information to be incorporated with the \code{scores_direction} 23 | #' and \code{constraints_vector} parameters.} 24 | #' \item{Brown}{Brown's method extends Fisher's method by accounting for the 25 | #' covariance in the columns of \code{scores}. It is more appropriate when the 26 | #' tests of significance used to create the columns in \code{scores} are not 27 | #' necessarily independent. Note that the "Brown" method cannot be used with a 28 | #' single list of p-values. 
However, in this case Brown's method is identical 29 | #' to Fisher's method, so Fisher's method should be used instead.} 30 | #' \item{DPM}{DPM extends Brown's method by incorporating directional information 31 | #' using the \code{scores_direction} and \code{constraints_vector} parameters.} 32 | #' \item{Stouffer}{Stouffer's method assumes p-values are uniformly distributed 33 | #' and transforms p-values into a Z-score using the inverse cumulative distribution function (quantile function) of a 34 | #' standard normal distribution. This method is appropriate when the columns in \code{scores} 35 | #' are independent.} 36 | #' \item{Stouffer_directional}{Stouffer's method modification that allows for 37 | #' directional information to be incorporated with the \code{scores_direction} 38 | #' and \code{constraints_vector} parameters.} 39 | #' \item{Strube}{Strube's method extends Stouffer's method by accounting for the 40 | #' covariance in the columns of \code{scores}.} 41 | #' \item{Strube_directional}{Strube's method modification that allows for 42 | #' directional information to be incorporated with the \code{scores_direction} 43 | #' and \code{constraints_vector} parameters.} 44 | #' 45 | #' } 46 | #' 47 | #' @examples 48 | #' merge_p_values(c(0.05, 0.09, 0.01)) 49 | #' merge_p_values(list(a=0.01, b=1, c=0.0015, d=0.025), method='Fisher') 50 | #' merge_p_values(matrix(data=c(0.03, 0.061, 0.48, 0.052), nrow = 2), method='Brown') 51 | #' 52 | #' @export 53 | merge_p_values <- function(scores, method = "Fisher", scores_direction = NULL, 54 | constraints_vector = NULL) { 55 | 56 | ##### Validation ##### 57 | # scores 58 | if (is.list(scores)) scores <- unlist(scores, recursive=FALSE) 59 | if (!(is.vector(scores) || is.matrix(scores))) stop("scores must be a matrix or vector") 60 | if (any(is.na(scores))) stop("scores cannot contain missing values, we recommend replacing NA with 1 or removing") 61 | if (!is.numeric(scores)) stop("scores must be numeric") 62 | if (any(scores < 0 | scores > 1)) stop("All values 
in scores must be in [0,1]") 63 | 64 | # scores_direction and constraints_vector 65 | if (xor(!is.null(scores_direction),!is.null(constraints_vector))) stop("Both scores_direction and constraints_vector must be provided") 66 | if (!is.null(scores_direction) && !is.null(constraints_vector)){ 67 | if (!(is.numeric(constraints_vector) && is.vector(constraints_vector))) stop("constraints_vector must be a numeric vector") 68 | if (any(!constraints_vector %in% c(1,-1,0))) stop("constraints_vector must contain the values: 1, -1 or 0") 69 | if (!(is.vector(scores_direction) || is.matrix(scores_direction))) stop("scores_direction must be a matrix or vector") 70 | if (!all(class(scores_direction) == class(scores))) stop("scores and scores_direction must be the same data type") 71 | if (any(is.na(scores_direction))) stop("scores_direction cannot contain missing values, we recommend replacing NA with 0 or removing") 72 | if (!is.numeric(scores_direction)) stop("scores_direction must be numeric") 73 | if (method %in% c("Fisher","Brown","Stouffer","Strube")) stop("Only DPM, Fisher_directional, Stouffer_directional, and Strube_directional methods support directional integration") 74 | 75 | if (is.matrix(scores_direction)){ 76 | if (nrow(scores) != nrow(scores_direction)) stop("scores and scores_direction must have the same number of rows") 77 | if (any(!rownames(scores_direction) %in% rownames(scores))) stop ("scores_direction gene names must match scores genes") 78 | if (any(rownames(scores) != rownames(scores_direction))) stop("scores genes should be in the same order as scores_direction genes") 79 | if (is.null(colnames(scores)) || is.null(colnames(scores_direction))) stop("column names must be provided to scores and scores_direction") 80 | if (any(!colnames(scores_direction) %in% colnames(scores))) stop("scores_direction column names must match scores column names") 81 | if (length(constraints_vector) != length(colnames(scores_direction))) stop("constraints_vector should have 
the same number of entries as columns in scores_direction") 82 | if (any(constraints_vector %in% 0) && !all(scores_direction[,constraints_vector %in% 0] == 0)) 83 | stop("scores_direction entries must be set to 0's for columns that do not contain directional information") 84 | if (!is.null(names(constraints_vector))){ 85 | if (!isTRUE(all.equal(names(constraints_vector), colnames(scores_direction), colnames(scores)))){ 86 | stop("the constraints_vector entries should match the order of scores and scores_direction columns") 87 | }}} 88 | 89 | if (is.vector(scores_direction)){ 90 | if (length(constraints_vector) != length(scores_direction)) stop("constraints_vector should have the same number of entries as scores_direction") 91 | if (length(scores_direction) != length(scores)) stop("scores_direction should have the same number of entries as scores") 92 | if (any(constraints_vector %in% 0) && !all(scores_direction[constraints_vector %in% 0] == 0)) 93 | stop("scores_direction entries that do not contain directional information must be set to 0's") 94 | if (!is.null(names(constraints_vector))){ 95 | if (!isTRUE(all.equal(names(constraints_vector), names(scores_direction), names(scores)))){ 96 | stop("the constraints_vector entries should match the order of scores and scores_direction") 97 | }}}} 98 | 99 | # method 100 | if (!method %in% c("Fisher", "Fisher_directional", "Brown", "DPM", "Stouffer", "Stouffer_directional", "Strube", "Strube_directional")){ 101 | stop("Only Fisher, Brown, Stouffer and Strube methods are currently supported for non-directional analysis. 
102 | And only DPM, Fisher_directional, Stouffer_directional, and Strube_directional are supported for directional analysis") 103 | } 104 | if (method %in% c("Fisher_directional", "DPM", "Stouffer_directional", "Strube_directional") & 105 | is.null(scores_direction)){ 106 | stop("scores_direction and constraints_vector must be provided for directional analyses") 107 | } 108 | 109 | 110 | ##### Merge P-values ##### 111 | 112 | # Methods to merge p-values from a scores vector 113 | if (is.vector(scores)){ 114 | if (method == "Brown" || method == "Strube" || method == "DPM" || method == "Strube_directional") { 115 | stop("Brown's, DPM, Strube's, and Strube_directional methods cannot be used with a single list of p-values") 116 | } 117 | 118 | # Convert 0 or very small p-values to 1e-300 119 | if(min(scores) < 1e-300){ 120 | message(paste('warning: p-values smaller than ', 1e-300, ' are replaced with ', 1e-300)) 121 | scores <- sapply(scores, function(x) ifelse (x < 1e-300, 1e-300, x)) 122 | } 123 | 124 | if (method == "Fisher"){ 125 | p_fisher <- stats::pchisq(fishersMethod(scores),2*length(scores), lower.tail = FALSE) 126 | return(p_fisher) 127 | } 128 | 129 | if (method == "Fisher_directional"){ 130 | p_fisher <- stats::pchisq(fishersDirectional(scores, scores_direction,constraints_vector), 131 | 2*length(scores), lower.tail = FALSE) 132 | return(p_fisher) 133 | } 134 | 135 | if (method == "Stouffer"){ 136 | p_stouffer <- 2*stats::pnorm(-1*abs(stouffersMethod(scores))) 137 | return(p_stouffer) 138 | } 139 | if (method == "Stouffer_directional"){ 140 | p_stouffer <- 2*stats::pnorm(-1*abs(stouffersDirectional(scores,scores_direction,constraints_vector))) 141 | return(p_stouffer) 142 | } 143 | } 144 | 145 | # If scores is a matrix with one column, then no p-value merging can be done 146 | if (ncol(scores) == 1) return (scores[, 1, drop=TRUE]) 147 | 148 | # If scores is a matrix with multiple columns, apply the following methods 149 | if(min(scores) < 1e-300){ 150 | 
message(paste('warning: p-values smaller than ', 1e-300, ' are replaced with ', 1e-300)) 151 | scores <- apply(scores, c(1,2), function(x) ifelse (x < 1e-300, 1e-300, x)) 152 | } 153 | 154 | if (method == "Fisher"){ 155 | fisher_merged <- c() 156 | for(i in 1:length(scores[,1])) { 157 | p_fisher <- stats::pchisq(fishersMethod(scores[i,]), 2*length(scores[i,]), lower.tail = FALSE) 158 | fisher_merged <- c(fisher_merged, p_fisher) 159 | } 160 | names(fisher_merged) <- rownames(scores) 161 | return(fisher_merged) 162 | } 163 | if (method == "Fisher_directional"){ 164 | fisher_merged <- c() 165 | for(i in 1:length(scores[,1])) { 166 | p_fisher <- stats::pchisq(fishersDirectional(scores[i,], scores_direction[i,], constraints_vector), 167 | 2*length(scores[i,]), lower.tail = FALSE) 168 | fisher_merged <- c(fisher_merged,p_fisher) 169 | } 170 | names(fisher_merged) <- rownames(scores) 171 | return(fisher_merged) 172 | } 173 | if (method == "Brown") { 174 | cov_matrix <- calculateCovariances(t(scores)) 175 | brown_merged <- brownsMethod(scores, cov_matrix = cov_matrix) 176 | return(brown_merged) 177 | } 178 | if (method == "DPM") { 179 | cov_matrix <- calculateCovariances(t(scores)) 180 | dpm_merged <- DPM(scores, cov_matrix = cov_matrix, scores_direction = scores_direction, 181 | constraints_vector = constraints_vector) 182 | return(dpm_merged) 183 | } 184 | if (method == "Stouffer"){ 185 | stouffer_merged <- c() 186 | for(i in 1:length(scores[,1])){ 187 | p_stouffer <- 2*stats::pnorm(-1*abs(stouffersMethod(scores[i,]))) 188 | stouffer_merged <- c(stouffer_merged,p_stouffer) 189 | } 190 | names(stouffer_merged) <- rownames(scores) 191 | return(stouffer_merged) 192 | } 193 | if (method == "Stouffer_directional"){ 194 | stouffer_merged <- c() 195 | for(i in 1:length(scores[,1])){ 196 | p_stouffer <- 2*stats::pnorm(-1*abs(stouffersDirectional(scores[i,], scores_direction[i,],constraints_vector))) 197 | stouffer_merged <- c(stouffer_merged,p_stouffer) 198 | } 199 | 
names(stouffer_merged) <- rownames(scores) 200 | return(stouffer_merged) 201 | } 202 | if (method == "Strube"){ 203 | strube_merged <- strubesMethod(scores) 204 | return(strube_merged) 205 | } 206 | if (method == "Strube_directional"){ 207 | strube_merged <- strubesDirectional(scores,scores_direction,constraints_vector) 208 | return(strube_merged) 209 | } 210 | } 211 | 212 | 213 | fishersMethod <- function(p_values) { 214 | chisq_values <- -2*log(p_values) 215 | sum(chisq_values) 216 | } 217 | 218 | 219 | fishersDirectional <- function(p_values, scores_direction, constraints_vector) { 220 | # Sum the directional chi-squared values 221 | directionality <- constraints_vector * scores_direction/abs(scores_direction) 222 | p_values_directional <- p_values[!is.na(directionality)] 223 | chisq_directional <- abs(-2 * sum(log(p_values_directional)*directionality[!is.na(directionality)])) 224 | 225 | # Sum the non-directional chi-squared values 226 | chisq_nondirectional <- abs(-2 * sum(log(p_values[is.na(directionality)]))) 227 | 228 | # Combine both 229 | sum(c(chisq_directional, chisq_nondirectional)) 230 | } 231 | 232 | 233 | #' Merge p-values using the Brown's method. 234 | #' 235 | #' @param p_values A matrix of m x n p-values. 236 | #' @param data_matrix An m x n matrix representing m tests and n samples. NA's are not allowed. 237 | #' @param cov_matrix A pre-calculated covariance matrix of \code{data_matrix}. This is more 238 | #' efficient when making many calls with the same data_matrix. 239 | #' Only one of \code{data_matrix} and \code{cov_matrix} must be given. If both are supplied, 240 | #' \code{data_matrix} is ignored. 241 | #' @return A p-value vector representing the merged significance of multiple p-values. 
242 | #' @export 243 | 244 | # Based on the R package EmpiricalBrownsMethod 245 | # https://github.com/IlyaLab/CombiningDependentPvaluesUsingEBM/blob/master/R/EmpiricalBrownsMethod/R/ebm.R 246 | # Only significant differences are the removal of extra_info and allowing a 247 | # pre-calculated covariance matrix 248 | # 249 | brownsMethod <- function(p_values, data_matrix = NULL, cov_matrix = NULL) { 250 | if (missing(data_matrix) && missing(cov_matrix)) { 251 | stop ("Either data_matrix or cov_matrix must be supplied") 252 | } 253 | if (!(missing(data_matrix) || missing(cov_matrix))) { 254 | message("Both data_matrix and cov_matrix were supplied. Ignoring data_matrix") 255 | } 256 | if (missing(cov_matrix)) cov_matrix <- calculateCovariances(data_matrix) 257 | 258 | N <- ncol(cov_matrix) 259 | expected <- 2 * N 260 | cov_sum <- 2 * sum(cov_matrix[lower.tri(cov_matrix, diag=FALSE)]) 261 | var <- (4 * N) + cov_sum 262 | sf <- var / (2 * expected) 263 | 264 | df <- (2 * expected^2) / var 265 | if (df > 2 * N) { 266 | df <- 2 * N 267 | sf <- 1 268 | } 269 | 270 | # Acquiring the unadjusted chi-squared values from Fisher's method 271 | fisher_chisq <- c() 272 | for(i in 1:length(p_values[,1])) { 273 | fisher_chisq <- c(fisher_chisq, fishersMethod(p_values[i,])) 274 | } 275 | 276 | # Adjusted p-value 277 | p_brown <- stats::pchisq(df=df, q=fisher_chisq/sf, lower.tail=FALSE) 278 | names(p_brown) <- rownames(p_values) 279 | p_brown 280 | } 281 | 282 | 283 | #' Merge p-values using the DPM method. 284 | #' 285 | #' @param p_values A matrix of m x n p-values. 286 | #' @param data_matrix An m x n matrix representing m tests and n samples. NA's are not allowed. 287 | #' @param cov_matrix A pre-calculated covariance matrix of \code{data_matrix}. This is more 288 | #' efficient when making many calls with the same data_matrix. 289 | #' Only one of \code{data_matrix} and \code{cov_matrix} must be given. If both are supplied, 290 | #' \code{data_matrix} is ignored. 
291 | #' @param scores_direction A matrix of log2 fold-change values. Datasets without directional information should be set to 0. 292 | #' @param constraints_vector A numerical vector of +1 or -1 values corresponding to the user-defined 293 | #' directional relationship between columns in scores_direction. Datasets without directional information should 294 | #' be set to 0. 295 | #' @return A p-value vector representing the merged significance of multiple p-values. 296 | #' @export 297 | 298 | 299 | DPM <- function(p_values, data_matrix = NULL, cov_matrix = NULL, 300 | scores_direction, constraints_vector) { 301 | if (missing(data_matrix) && missing(cov_matrix)) { 302 | stop ("Either data_matrix or cov_matrix must be supplied") 303 | } 304 | if (!(missing(data_matrix) || missing(cov_matrix))) { 305 | message("Both data_matrix and cov_matrix were supplied. Ignoring data_matrix") 306 | } 307 | if (missing(cov_matrix)) cov_matrix <- calculateCovariances(data_matrix) 308 | 309 | N <- ncol(cov_matrix) 310 | expected <- 2 * N 311 | cov_sum <- 2 * sum(cov_matrix[lower.tri(cov_matrix, diag=FALSE)]) 312 | var <- (4 * N) + cov_sum 313 | sf <- var / (2 * expected) 314 | 315 | df <- (2 * expected^2) / var 316 | if (df > 2 * N) { 317 | df <- 2 * N 318 | sf <- 1 319 | } 320 | 321 | # Acquiring the unadjusted chi-squared value from Fisher's method 322 | fisher_chisq <- c() 323 | for(i in 1:length(p_values[,1])) { 324 | fisher_chisq <- c(fisher_chisq, fishersDirectional(p_values[i,], scores_direction[i,],constraints_vector)) 325 | } 326 | 327 | # Adjusted p-value 328 | p_dpm <- stats::pchisq(df=df, q=fisher_chisq/sf, lower.tail=FALSE) 329 | names(p_dpm) <- rownames(p_values) 330 | p_dpm 331 | } 332 | 333 | 334 | stouffersMethod <- function (p_values){ 335 | k = length(p_values) 336 | z_values <- stats::qnorm(p_values/2) 337 | sum(z_values)/sqrt(k) 338 | } 339 | 340 | 341 | stouffersDirectional <- function (p_values, scores_direction, constraints_vector){ 342 | k = 
length(p_values) 343 | 344 | # Sum the directional z-values 345 | directionality <- constraints_vector * scores_direction/abs(scores_direction) 346 | p_values_directional <- p_values[!is.na(directionality)] 347 | z_directional <- abs(sum(stats::qnorm(p_values_directional/2)*directionality[!is.na(directionality)])) 348 | 349 | # Sum the non-directional z-values 350 | z_nondirectional <- abs(sum(stats::qnorm(p_values[is.na(directionality)]/2))) 351 | 352 | # Combine both 353 | z_values <- c(z_directional, z_nondirectional) 354 | sum(z_values)/sqrt(k) 355 | } 356 | 357 | 358 | strubesMethod <- function (p_values){ 359 | # Acquiring the unadjusted z-value from Stouffer's method 360 | stouffer_z <- c() 361 | for(i in 1:length(p_values[,1])){ 362 | stouffer_z <- c(stouffer_z,stouffersMethod(p_values[i,])) 363 | } 364 | 365 | # Correlation matrix 366 | cor_mtx <- stats::cor(p_values, use = "complete.obs") 367 | cor_mtx[is.na(cor_mtx)] <- 0 368 | cor_mtx <- abs(cor_mtx) 369 | 370 | # Adjusted p-value 371 | k = length(p_values[1,]) 372 | adjusted_z <- stouffer_z * sqrt(k) / sqrt(sum(cor_mtx)) 373 | p_strube <- 2*stats::pnorm(-1*abs(adjusted_z)) 374 | names(p_strube) <- rownames(p_values) 375 | p_strube 376 | } 377 | 378 | 379 | strubesDirectional <- function (p_values, scores_direction, constraints_vector){ 380 | # Acquiring the unadjusted z-value from Stouffer's method 381 | stouffer_z <- c() 382 | for(i in 1:length(p_values[,1])){ 383 | stouffer_z <- c(stouffer_z,stouffersDirectional(p_values[i,], scores_direction[i,],constraints_vector)) 384 | } 385 | 386 | # Correlation matrix 387 | cor_mtx <- stats::cor(p_values, use = "complete.obs") 388 | cor_mtx[is.na(cor_mtx)] <- 0 389 | cor_mtx <- abs(cor_mtx) 390 | 391 | # Adjusted p-value 392 | k = length(p_values[1,]) 393 | adjusted_z <- stouffer_z * sqrt(k) / sqrt(sum(cor_mtx)) 394 | p_strube <- 2*stats::pnorm(-1*abs(adjusted_z)) 395 | names(p_strube) <- rownames(p_values) 396 | p_strube 397 | } 398 | 399 | 400 | transformData 
<- function(dat) { 401 | # If all values in dat are the same (equal to y), return dat. The covariance 402 | # matrix will be the zero matrix, and Brown's method gives the p-value as y 403 | # Without this check, (dat - dvm) / dvsd is NaN and ecdf throws an error 404 | if (isTRUE(all.equal(min(dat), max(dat)))) return(dat) 405 | 406 | dvm <- mean(dat, na.rm=TRUE) 407 | dvsd <- pop.sd(dat) 408 | s <- (dat - dvm) / dvsd 409 | distr <- stats::ecdf(s) 410 | sapply(s, function(a) -2 * log(distr(a))) 411 | } 412 | 413 | 414 | calculateCovariances <- function(data_matrix) { 415 | transformed_data_matrix <- apply(data_matrix, 1, transformData) 416 | stats::cov(transformed_data_matrix) 417 | } 418 | 419 | 420 | pop.var <- function(x) stats::var(x, na.rm=TRUE) * (length(x) - 1) / length(x) 421 | pop.sd <- function(x) sqrt(pop.var(x)) 422 | -------------------------------------------------------------------------------- /R/statistical_tests.r: -------------------------------------------------------------------------------- 1 | #' Hypergeometric test 2 | #' 3 | #' Perform a hypergeometric test, also known as Fisher's exact test, on a 2x2 contingency 4 | #' table with the alternative hypothesis set to 'greater'. In this application, the test finds the 5 | #' probability that counts[1, 1] or more genes would be found to be annotated to a term (pathway), 6 | #' assuming the null hypothesis of genes being distributed randomly to terms. 7 | #' 8 | #' @param counts A 2x2 numerical matrix representing a contingency table. 9 | #' 10 | #' @return a p-value of enrichment of genes in a term or pathway. 11 | hypergeometric <- function(counts) { 12 | if (any(counts < 0)) stop('counts contains negative values. 
Something went very wrong.') 13 | m <- counts[1, 1] + counts[2, 1] 14 | n <- counts[1, 2] + counts[2, 2] 15 | k <- counts[1, 1] + counts[1, 2] 16 | x <- counts[1, 1] 17 | stats::phyper(x-1, m, n, k, lower.tail=FALSE) 18 | } 19 | 20 | 21 | #' Ordered Hypergeometric Test 22 | #' 23 | #' Perform a series of hypergeometric tests (a.k.a. Fisher's Exact tests), on a ranked list of genes ordered 24 | #' by significance against a list of annotation genes. The hypergeometric tests are executed with 25 | #' increasingly larger numbers of genes representing the top genes in order of decreasing scores. 26 | #' The lowest p-value of the series is returned as the optimal enriched intersection of the ranked list of genes 27 | #' and the biological term (pathway). 28 | #' 29 | #' @param genelist Character vector of gene names, assumed to be ordered by decreasing importance. 30 | #' For example, the genes could be ranked by decreasing significance of differential expression. 31 | #' @param background Character vector of gene names. List of all genes used as a statistical background (i.e., the universe). 32 | #' @param annotations Character vector of gene names. A gene set representing a functional term, process or biological pathway. 
33 | #' 34 | #' @return a list with the items: 35 | #' \describe{ 36 | #' \item{p_val}{The lowest obtained p-value} 37 | #' \item{ind}{The index of \code{genelist} such that \code{genelist[1:ind]} 38 | #' gives the lowest p-value} 39 | #' } 40 | #' @export 41 | #' 42 | #' @examples 43 | #' orderedHypergeometric(c('HERC2', 'SP100'), c('PHC2', 'BLM', 'XPC', 'SMC3', 'HERC2', 'SP100'), 44 | #' c('HERC2', 'PHC2', 'BLM')) 45 | orderedHypergeometric <- function(genelist, background, annotations) { 46 | # Only test subsets of genelist that end with a gene in annotations since 47 | # these are the only tests for which the p-value can decrease 48 | which_in <- which(genelist %in% annotations) 49 | if (length(which_in) == 0) return(list(p_val=1, ind=1)) 50 | 51 | # Construct the counts matrix for the first which_in[1] genes 52 | gl <- genelist[1:which_in[1]] 53 | cl <- setdiff(background, gl) 54 | genelist0 <- length(gl) - 1 55 | complement1 <- length(which(cl %in% annotations)) 56 | complement0 <- length(cl) - complement1 57 | counts <- matrix(data=c(1, genelist0, complement1, complement0), nrow=2) 58 | scores <- hypergeometric(counts) 59 | 60 | if (length(which_in) == 1) return(list(p_val=scores, ind=which_in[1])) 61 | 62 | # Update counts and recalculate score for the rest of the indices in which_in 63 | # The genes in genelist[(which_in[i-1]+1):which_in[i]] are added to the genes 64 | # being tested and removed from the complement. Of these, 1 will always be 65 | # in annotations and the rest will not. 
Therefore we can just modify the 66 | # contingency table rather than recounting which genes are in annotations 67 | for (i in 2:length(which_in)) { 68 | diff <- which_in[i] - which_in[i-1] 69 | counts[1, 1] <- i 70 | counts[2, 1] <- counts[2, 1] + diff - 1 71 | counts[1, 2] <- counts[1, 2] - 1 72 | counts[2, 2] <- counts[2, 2] - diff + 1 73 | scores[i] <- hypergeometric(counts) 74 | } 75 | 76 | # Return the lowest p-value and the associated index 77 | min_score <- min(scores) 78 | 79 | ind = which_in[max(which(scores==min_score))] 80 | p_val = min_score 81 | 82 | list(p_val=p_val, ind=ind) 83 | } 84 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ActivePathways - integrative pathway analysis of multi-omics data 2 | 3 | 4 | **July 28th 2024: ActivePathways version 2.0.5 is now available on CRAN and GitHub and fixes a minor bug in exporting unfiltered results as CSV files. The major update 2.0 provides the directional p-value merging (DPM) method described in our recent publication.** 5 | 6 | ActivePathways is a tool for multivariate pathway enrichment analysis that identifies gene sets, such as pathways or Gene Ontology terms, that are over-represented in a list or matrix of genes. ActivePathways uses a data fusion method to combine multiple omics datasets, prioritises genes based on the significance and direction of signals from the omics datasets, and performs pathway enrichment analysis of these prioritised genes. We can find pathways and genes supported by single or multiple omics datasets, as well as additional genes and pathways that are only apparent through data integration and remain undetected in any single dataset alone. 7 | 8 | The new version of ActivePathways is described in our recent publication. 9 | 10 | Mykhaylo Slobodyanyuk^, Alexander T. Bahcheli^, Zoe P. Klein, Masroor Bayati, Lisa J. Strug, Jüri Reimand. 
Directional integration and pathway enrichment analysis for multi-omics data. *Nature Communications* 15, 5690 (2024). (^ - co-first authors) 11 | 12 | 13 | 14 | The first version of ActivePathways was published in Nature Communications with the PCAWG Pan-Cancer project. 15 | 16 | Marta Paczkowska^, Jonathan Barenboim^, Nardnisa Sintupisut, Natalie S. Fox, Helen Zhu, Diala Abd-Rabbo, Miles W. Mee, Paul C. Boutros, PCAWG Drivers and Functional Interpretation Working Group, PCAWG Consortium, Juri Reimand. Integrative pathway enrichment analysis of multivariate omics data. *Nature Communications* 11, 735 (2020) (^ - co-first authors) 17 | 18 | 19 | 20 | The package version 2.0.3 used in the DPM preprint and manuscript is archived on Zenodo: . 21 | 22 | ## Installation 23 | 24 | Package tested with: MacOS 14, Windows 11, Ubuntu 20.04. 25 | 26 | Software dependencies: data.table, ggplot2, testthat, knitr, rmarkdown, RColorBrewer. 27 | 28 | Installation time: less than 2 minutes. 29 | 30 | #### From CRAN: ActivePathways 2.0.5 is currently the most recent version 31 | Open R and run `install.packages('ActivePathways')` 32 | 33 | #### Using devtools on our GitHub repository 34 | Using the R package `devtools`, run 35 | `devtools::install_github('https://github.com/reimandlab/ActivePathways', build_vignettes = TRUE)` 36 | 37 | #### From source on our GitHub repository 38 | Clone the repository, for example using `git clone https://github.com/reimandlab/ActivePathways.git`. 39 | 40 | Open R in the directory where you cloned the package and run `install.packages("ActivePathways", repos = NULL, type = "source")` 41 | 42 | 43 | 44 | ## Using ActivePathways 45 | 46 | See the vignette for more details. Run `browseVignettes(package='ActivePathways')` in R. 47 | 48 | 49 | ### Examples 50 | 51 | The simplest use of ActivePathways requires only a data table and a GMT file. The data table is a matrix of p-values of genes/transcripts/proteins as rows and omics datasets as columns. 
It also needs a list of gene sets in the form of a GMT [(Gene Matrix Transposed)](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29) file. 52 | 53 | * The data table must be a numerical matrix. For a single gene list, a one-column matrix can be used. The matrix cannot contain any missing values. One conservative option is to set all missing values to 1, which treats the missing P-values as non-significant. Alternatively, one may consider removing genes with NA values. 54 | 55 | * Gene sets in the form of a GMT file can be acquired from multiple [sources](https://baderlab.org/GeneSets) such as Gene Ontology, Reactome and others. For better accuracy and statistical power these pathway databases should be combined. Acquiring an [up-to-date GMT file](http://download.baderlab.org/EM_Genesets/current_release/) is essential to avoid using unreliable outdated annotations [(see this paper)](https://www.nature.com/articles/nmeth.3963). 56 | 57 | ```R 58 | 59 | library(ActivePathways) 60 | 61 | ## 62 | # Run an example using the data files included in the ActivePathways package. 63 | # This basic example does not incorporate directionality. 64 | ## 65 | 66 | fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 67 | package = "ActivePathways") 68 | fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", 69 | package = "ActivePathways") 70 | 71 | ## 72 | # Numeric matrix of p-values is required as input. 73 | # NA values are converted to P = 1. 
74 | ## 75 | 76 | scores <- read.table(fname_scores, header = TRUE, row.names = 'Gene') 77 | scores <- as.matrix(scores) 78 | scores[is.na(scores)] <- 1 79 | 80 | 81 | ## 82 | # Main call of ActivePathways function: 83 | ## 84 | 85 | enriched_pathways <- ActivePathways(scores, fname_GMT) 86 | 87 | #35 terms were removed from gmt because they did not make the geneset_filter 88 | #91 rows were removed from scores because they are not found in the background 89 | 90 | 91 | ## 92 | # list a few first results of enriched pathways identified by ActivePathways 93 | ## 94 | 95 | enriched_pathways[1:3,] 96 | 97 | # term_id term_name adjusted_p_val term_size 98 | #1: REAC:2424491 DAP12 signaling 4.491268e-05 358 99 | #2: REAC:422475 Axon guidance 2.028966e-02 555 100 | #3: REAC:177929 Signaling by EGFR 6.245734e-04 366 101 | # overlap evidence 102 | #1: TP53,PIK3CA,KRAS,PTEN,BRAF,NRAS,... CDS 103 | #2: PIK3CA,KRAS,BRAF,NRAS,CALM2,RPS6KA3,... X3UTR,promCore 104 | #3: TP53,PIK3CA,KRAS,PTEN,BRAF,NRAS,... CDS 105 | # Genes_X3UTR Genes_X5UTR 106 | #1: NA NA 107 | #2: CALM2,ARPC2,RHOA,NUMB,CALM1,ACTB,... NA 108 | #3: NA NA 109 | # Genes_CDS 110 | #1: TP53,PTEN,KRAS,PIK3CA,BRAF,NRAS,... 111 | #2: NA 112 | #3: TP53,PTEN,KRAS,PIK3CA,BRAF,NRAS,... 113 | # Genes_promCore 114 | #1: NA 115 | #2: EFNA1,IQGAP1,COL4A1,SCN2B,RPS6KA3,CALM2,... 116 | #3: NA 117 | 118 | ## 119 | # Show enriched genes of the first pathway 'DAP12 signalling' 120 | # the column `overlap` displays genes of the integrated dataset (from 121 | # data fusion, i.e., p-value merging) that occur in the given pathway. 122 | # Genes are ranked by joint significance across input omics datasets. 123 | ## 124 | 125 | enriched_pathways[["overlap"]][[1]] 126 | # [1] "TP53" "PIK3CA" "KRAS" "PTEN" "BRAF" "NRAS" "B2M" "CALM2" 127 | # [9] "CDKN1A" "CDKN1B" 128 | 129 | ## 130 | # Save the resulting pathways as a Comma-Separated Values (CSV) file 131 | # for spreadsheets and computational pipelines. 
132 | # the data.table object cannot be saved directly as text. 133 | ## 134 | 135 | export_as_CSV(enriched_pathways, "enriched_pathways.csv") 136 | 137 | 138 | ## 139 | # Examine a few lines of the two major types of input 140 | ## 141 | 142 | ## 143 | # The scores matrix includes p-values for genes (rows) 144 | # and evidence of different omics datasets (columns). 145 | # This dataset includes predicted cancer driver mutations 146 | # in gene coding/CDS, 5'UTR, 3'UTR, and core promoter sequences 147 | ## 148 | 149 | head(scores, n = 3) 150 | 151 | # X3UTR X5UTR CDS promCore 152 | #A2M 1.0000000 0.33396764 0.9051708 0.4499201 153 | #AAAS 1.0000000 0.42506012 0.7047723 0.7257641 154 | #ABAT 0.9664126 0.04202735 0.7600985 0.1903789 155 | 156 | ## 157 | # GMT files include functional gene sets (pathways, processes). 158 | # Each tab-separated line represents a gene set: 159 | # gene set ID, description followed by gene symbols. 160 | # Gene symbols in the scores table and the GMT file need to match. 161 | # NB: this GMT file is a small subset of the real GMT file built for testing. 162 | # It should not be used for real analyses. 163 | ## 164 | 165 | readLines(fname_GMT)[11:13] 166 | 167 | #[1] "REAC:3656535\tTGFBR1 LBD Mutants in Cancer\tTGFB1\tFKBP1A\tTGFBR2\tTGFBR1\t" 168 | #[2] "REAC:73927\tDepurination\tOGG1\tMPG\tMUTYH\t" 169 | #[3] "REAC:5602410\tTLR3 deficiency - HSE\tTLR3\t" 170 | 171 | 172 | ``` 173 | 174 | ### Examples - Directional integration of multi-omics data 175 | 176 | ActivePathways 2.0 extends our integrative pathway analysis framework significantly. Users can now provide directional assumptions of input omics datasets for more accurate analyses. This allows us to prioritise genes and pathways where certain directional assumptions are met, and penalise those where the assumptions are violated. 
177 | 178 | For example, fold-change in protein expression would be expected to associate positively with mRNA fold-change of the corresponding gene, while negative associations would be unexpected and indicate more-complex situations or potential false positives. We can instruct the pathway analysis to prioritise positively-associated protein/mRNA pairs and penalise negative associations (or vice versa). 179 | 180 | Two additional inputs are included in ActivePathways that allow diverse multi-omics analyses. These inputs are optional. 181 | 182 | The scores_direction and constraints_vector parameters are provided in the merge_p_values() and ActivePathways() functions to incorporate this directional penalty into the data fusion and pathway enrichment analyses. 183 | 184 | The parameter constraints_vector is a vector that allows the user to represent the expected relationship between the input omics datasets. The vector size is n_datasets. Values include +1, -1, and 0. The constraints_vector should reflect the expected *relative* directional relationship between datasets. For example, the constraints_vector values c(-1,1) and c(1,-1) are functionally identical. When combining datasets that contain both directional datatypes (eg gene or protein expression, gene promoter methylation) and non-directional datatypes (eg gene mutational burden, ChIP-seq), we can define the relative relationship between directional datatypes with the values 1 and -1 while setting the value of non-directional datatypes to 0. 185 | 186 | The parameter scores_direction is a matrix that reflects the directions that the genes/transcripts/protein show in the data. The matrix size is n_genes * n_datasets, that is the same size as the P-value matrix. This is a numeric matrix, but only the signs of the values are accounted for. 
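
As a minimal sketch of how these two inputs fit together, the toy example here builds a P-value matrix, a matching `scores_direction` matrix, and a `constraints_vector` for three hypothetical datasets (`rna`, `methylation`, `mutation`). The gene names, dataset names, and values are invented for illustration. The agreement check is a simplified illustration of the directional logic, not the DPM calculation itself, and the call to `merge_p_values()` is guarded so that it only runs if the package is installed.

```R
# Toy inputs: all names and values are made up for illustration
genes <- c("GENE_A", "GENE_B", "GENE_C")
datasets <- c("rna", "methylation", "mutation")

# P-value matrix: genes as rows, datasets as columns (filled column-wise)
pvals <- matrix(c(0.001, 0.020, 0.500,   # rna
                  0.004, 0.030, 0.010,   # methylation
                  0.200, 0.900, 0.002),  # mutation
                nrow = 3, dimnames = list(genes, datasets))

# Signed effect sizes; only the signs are used.
# The non-directional 'mutation' column is set to 0.
effects <- matrix(c( 1.2, -0.8,  0.3,    # rna
                    -0.9, -1.1,  0.4,    # methylation
                     0.0,  0.0,  0.0),   # mutation
                  nrow = 3, dimnames = list(genes, datasets))
dirs <- sign(effects)

# Expression and promoter methylation are assumed to move in opposite
# directions; mutation burden carries no direction.
constraints_vector <- c(1, -1, 0)

# Simplified agreement check (not the DPM formula): multiply the observed
# signs by the constraints and test whether the directional datasets agree.
directional <- constraints_vector != 0
adjusted <- sweep(dirs[, directional, drop = FALSE], 2,
                  constraints_vector[directional], `*`)
agrees <- apply(adjusted, 1, function(x) length(unique(x)) == 1)
agrees
# GENE_A: TRUE, GENE_B: FALSE, GENE_C: FALSE

# Guarded call: runs only if ActivePathways is installed
if (requireNamespace("ActivePathways", quietly = TRUE)) {
  merged <- ActivePathways::merge_p_values(
    pvals, method = "DPM",
    scores_direction = dirs, constraints_vector = constraints_vector)
  print(merged)
}
```

In this sketch only `GENE_A` matches the expected relationship (expression up, methylation down), so it is the kind of gene a directional analysis would be expected to prioritise.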
187 | 188 | #### Directional data integration at the gene level 189 | 190 | ```R 191 | 192 | ## 193 | # load a dataset of P-values and fold-changes for mRNA and protein levels 194 | # this dataset is embedded in the package 195 | ## 196 | fname_data_matrix <- system.file('extdata', 197 | 'Differential_expression_rna_protein.tsv', 198 | package = 'ActivePathways') 199 | pvals_FCs <- read.table(fname_data_matrix, header = TRUE, sep = '\t') 200 | 201 | # examine a few example genes 202 | example_genes <- c('ACTN4','PIK3R4','PPIL1','NELFE','LUZP1','ITGB2') 203 | pvals_FCs[pvals_FCs$gene %in% example_genes,] 204 | 205 | # gene rna_pval rna_log2fc protein_pval protein_log2fc 206 | #73 PIK3R4 1.266285e-03 1.1557077 2.791135e-03 -0.8344799 207 | #74 PPIL1 1.276838e-03 -1.1694221 1.199303e-04 -1.1193605 208 | #606 NELFE 1.447553e-02 -0.9120687 1.615592e-05 -1.2630114 209 | #4048 LUZP1 3.253382e-05 1.5830796 4.129125e-02 0.5791377 210 | #4050 ITGB2 4.584450e-05 1.6472117 1.327997e-01 0.4221579 211 | #4052 ACTN4 5.725503e-05 1.5531533 8.238317e-07 1.4279158 212 | 213 | ## 214 | # create a matrix of gene/protein P-values. 215 | # where the columns are different omics datasets (mRNA, protein) 216 | # and the rows are genes. 
217 | ## 218 | 219 | pval_matrix <- data.frame( 220 | row.names = pvals_FCs$gene, 221 | rna = pvals_FCs$rna_pval, 222 | protein = pvals_FCs$protein_pval) 223 | pval_matrix <- as.matrix(pval_matrix) 224 | 225 | ## 226 | # examine a few genes in the P-value matrix 227 | ## 228 | 229 | pval_matrix[example_genes,] 230 | # rna protein 231 | #ACTN4 5.725503e-05 8.238317e-07 232 | #PIK3R4 1.266285e-03 2.791135e-03 233 | #PPIL1 1.276838e-03 1.199303e-04 234 | #NELFE 1.447553e-02 1.615592e-05 235 | #LUZP1 3.253382e-05 4.129125e-02 236 | #ITGB2 4.584450e-05 1.327997e-01 237 | 238 | ## 239 | # convert missing values to P = 1 240 | ## 241 | 242 | pval_matrix[is.na(pval_matrix)] <- 1 243 | 244 | ## 245 | # Create a matrix of gene/protein directions 246 | # similarly to the P-value matrix (i.e., scores_direction) 247 | ## 248 | 249 | dir_matrix <- data.frame( 250 | row.names = pvals_FCs$gene, 251 | rna = pvals_FCs$rna_log2fc, 252 | protein = pvals_FCs$protein_log2fc) 253 | dir_matrix <- as.matrix(dir_matrix) 254 | 255 | ## 256 | # ActivePathways only uses the signs of the direction values (ie +1 or -1). 257 | ## 258 | 259 | dir_matrix <- sign(dir_matrix) 260 | 261 | ## 262 | # if directions are missing (NA), we recommend setting the values to zero 263 | ## 264 | 265 | dir_matrix[is.na(dir_matrix)] <- 0 266 | 267 | ## 268 | # examine a few genes in the direction matrix 269 | ## 270 | 271 | dir_matrix[example_genes,] 272 | # rna protein 273 | #ACTN4 1 1 274 | #PIK3R4 1 -1 275 | #PPIL1 -1 -1 276 | #NELFE -1 -1 277 | #LUZP1 1 1 278 | #ITGB2 1 1 279 | 280 | ## 281 | # This matrix has to be accompanied by a vector that 282 | # provides the expected relationship between the 283 | # different datasets. Here, mRNA levels and protein 284 | # levels are expected to have consistent directions: 285 | # either both positive or both negative (eg log fold-change). 
286 | ## 287 | 288 | constraints_vector <- c(1,1) 289 | 290 | ## 291 | # Alternatively, we can use another vector to prioritise 292 | # genes/proteins where the directions are the opposite. 293 | ## 294 | 295 | # constraints_vector <- c(1,-1) 296 | 297 | ## 298 | # Now we merge the P-values of the two datasets 299 | # using directional assumptions and compare these 300 | # with the plain non-directional merging. 301 | # The top 5 scoring genes differ if we penalise genes 302 | # where this directional logic is violated: 303 | # While 4 of 5 genes retain significance, the gene PIK3R4 is penalised. 304 | # Interestingly, as a consequence of penalising PIK3R4, 305 | # other genes such as ITGB2 move up in rank. 306 | ## 307 | 308 | directional_merged_pvals <- merge_p_values(pval_matrix, 309 | method = "DPM", scores_direction = dir_matrix, constraints_vector = constraints_vector) 310 | 311 | merged_pvals <- merge_p_values(pval_matrix, method = "Brown") 312 | 313 | 314 | sort(merged_pvals)[1:5] 315 | # ACTN4 PPIL1 NELFE LUZP1 PIK3R4 316 | #1.168708e-09 2.556067e-06 3.804646e-06 1.950607e-05 4.790125e-05 317 | 318 | 319 | sort(directional_merged_pvals)[1:5] 320 | # ACTN4 PPIL1 NELFE LUZP1 ITGB2 321 | #1.168708e-09 2.556067e-06 3.804646e-06 1.950607e-05 7.920157e-05 322 | 323 | ## 324 | # PIK3R4 is penalised because the fold-changes of its mRNA and 325 | # protein levels are significant and have the opposite signs: 326 | ## 327 | 328 | pvals_FCs[pvals_FCs$gene == "PIK3R4",] 329 | # gene rna_pval rna_log2fc protein_pval protein_log2fc 330 | #73 PIK3R4 0.001266285 1.155708 0.002791135 -0.8344799 331 | 332 | pval_matrix["PIK3R4",] 333 | # rna protein 334 | #0.001266285 0.002791135 335 | 336 | dir_matrix["PIK3R4",] 337 | # rna protein 338 | # 1 -1 339 | 340 | merged_pvals["PIK3R4"] 341 | # PIK3R4 342 | #4.790125e-05 343 | 344 | directional_merged_pvals["PIK3R4"] 345 | # PIK3R4 346 | #0.8122527 347 | 348 | ``` 349 | To assess the impact of the directional penalty on gene merged P-value signals we create a plot showing 
directional results on the y axis and non-directional results on the x axis. Blue dots are prioritised hits, red dots are penalised hits, and gray dots are not significant in the non-directional analysis. 350 | 351 | ```R 352 | lineplot_df <- data.frame(original = -log10(merged_pvals), 353 | modified = -log10(directional_merged_pvals)) 354 | library(ggplot2) 355 | ggplot(lineplot_df) + 356 | geom_point(size = 2.4, shape = 19, 357 | aes(original, modified, 358 | color = ifelse(original <= -log10(0.05),"gray", 359 | ifelse(modified > -log10(0.05),"#1F449C","#F05039")))) + 360 | labs(title = "", 361 | x ="Merged -log10(P)", 362 | y = "Directional Merged -log10(P)") + 363 | geom_hline(yintercept = 1.301, linetype = "dashed", 364 | col = 'black', size = 0.5) + 365 | geom_vline(xintercept = 1.301, linetype = "dashed", 366 | col = "black", size = 0.5) + 367 | geom_abline(size = 0.5, slope = 1,intercept = 0) + 368 | scale_color_identity() 369 | 370 | ``` 371 | 372 | ![](vignettes/lineplot_tutorial.png) 373 | 374 | #### Pathway-level insight 375 | To explore how changes on the individual gene level impact biological pathways, we can compare results before and after incorporating a directional penalty. 
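
The comparison itself comes down to set operations on the `term_id` columns of the two result tables. The sketch here uses hypothetical placeholder term IDs rather than real results, and labels each pathway as shared, lost, or gained between a non-directional and a directional analysis.

```R
# Hypothetical term IDs from two analyses (placeholders, not real results)
terms_nondirectional <- c("TERM:1", "TERM:2", "TERM:3", "TERM:4")
terms_directional    <- c("TERM:1", "TERM:3", "TERM:5")

# Pathways found in both analyses
shared <- intersect(terms_nondirectional, terms_directional)
# Pathways found only without the directional penalty
lost   <- setdiff(terms_nondirectional, terms_directional)
# Pathways found only with the directional penalty
gained <- setdiff(terms_directional, terms_nondirectional)

# One row per pathway with its status
comparison <- data.frame(
  term_id = c(shared, lost, gained),
  status  = rep(c("shared", "lost", "gained"),
                times = c(length(shared), length(lost), length(gained))))
comparison
```

With real results, the two input vectors would be the `term_id` columns of the non-directional and directional `ActivePathways()` outputs.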
376 | 377 | ```R 378 | 379 | ## 380 | # use the example GMT file embedded in the package 381 | ## 382 | 383 | fname_GMT2 <- system.file("extdata", "hsapiens_REAC_subset2.gmt", 384 | package = "ActivePathways") 385 | ## 386 | # Integrative pathway enrichment analysis with no directionality 387 | ## 388 | enriched_pathways <- ActivePathways( 389 | pval_matrix, gmt = fname_GMT2, cytoscape_file_tag = "Original_") 390 | 391 | ## 392 | # Directional integration and pathway enrichment analysis 393 | # this analysis uses the directional coefficients and constraints_vector from 394 | # the gene-based analysis described above 395 | ## 396 | 397 | constraints_vector 398 | # [1] 1 1 399 | 400 | dir_matrix[example_genes,] 401 | # rna protein 402 | #ACTN4 1 1 403 | #PIK3R4 1 -1 404 | #PPIL1 -1 -1 405 | #NELFE -1 -1 406 | #LUZP1 1 1 407 | #ITGB2 1 1 408 | 409 | enriched_pathways_directional <- ActivePathways( 410 | pval_matrix, gmt = fname_GMT2, cytoscape_file_tag = "Directional_", 411 | merge_method = "DPM", scores_direction = dir_matrix, constraints_vector = constraints_vector) 412 | 413 | ## 414 | # Examine the pathways that are lost when 415 | # directional information is incorporated in the data integration 416 | ## 417 | 418 | pathways_lost_in_directional_integration <- 419 | setdiff(enriched_pathways$term_id, enriched_pathways_directional$term_id) 420 | pathways_lost_in_directional_integration 421 | #[1] "REAC:R-HSA-3858494" "REAC:R-HSA-69206" "REAC:R-HSA-69242" 422 | #[4] "REAC:R-HSA-9013149" 423 | 424 | enriched_pathways[enriched_pathways$term_id %in% pathways_lost_in_directional_integration,] 425 | # term_id term_name adjusted_p_val 426 | #1: REAC:R-HSA-3858494 Beta-catenin independent WNT signaling 0.013437464 427 | #2: REAC:R-HSA-69206 G1/S Transition 0.026263457 428 | #3: REAC:R-HSA-69242 S Phase 0.009478766 429 | #4: REAC:R-HSA-9013149 RAC1 GTPase cycle 0.047568911 430 | # term_size overlap evidence 431 | #1: 143 PSMA5,PSMB4,PSMC5,PSMD11,PSMA8,GNG13,... 
rna,protein 432 | #2: 130 PSMA5,PSMB4,CDK4,PSMC5,PSMD11,CCNB1,... protein 433 | #3: 162 PSMA5,PSMB4,RFC3,CDK4,PSMC5,PSMD11,... combined 434 | #4: 184 SRGAP1,TIAM1,BAIAP2,FMNL1,DOCK9,PAK3,... rna 435 | # Genes_rna 436 | #1: GNG13,PSMC1,PSMA5,PSMB4,ITPR3,DVL1,... 437 | #2: NA 438 | #3: NA 439 | #4: SRGAP1,TIAM1,FMNL1,ARHGAP30,FARP2,DOCK10,... 440 | # Genes_protein 441 | #1: PSMA8,PSMD11,PSMA5,PRKG1,PSMD10,PSMB4,... 442 | #2: PSMD11,PSMA5,PSMD10,PSMB4,CDK7,ORC2,... 443 | #3: NA 444 | #4: NA 445 | 446 | 447 | ## 448 | # An example of a lost pathway is Beta-catenin independent WNT signaling. 449 | # Out of the 34 genes that contribute to this pathway enrichment, 450 | # 10 genes are in directional conflict. The enrichment is no longer 451 | # identified when these genes are penalised due to the conflicting 452 | # log2 fold-change directions. 453 | ## 454 | 455 | wnt_pathway_id <- "REAC:R-HSA-3858494" 456 | enriched_pathway_genes <- unlist( 457 | enriched_pathways[enriched_pathways$term_id == wnt_pathway_id,]$overlap) 458 | enriched_pathway_genes 459 | # [1] "PSMA5" "PSMB4" "PSMC5" "PSMD11" "PSMA8" "GNG13" "SMURF1" "PSMC1" 460 | # [9] "PSMA4" "PLCB2" "PRKG1" "PSMD4" "PSMD1" "PSMD10" "PSMA6" "PSMA2" 461 | #[17] "PSMA1" "PRKCA" "PSMC6" "RHOA" "PSMB3" "PSMB1" "PSME3" "ITPR3" 462 | #[25] "AGO4" "DVL3" "PSMA3" "PPP3R1" "DVL1" "CLTA" "PSME2" "CALM1" 463 | #[33] "PSMD6" "PSMB6" 464 | 465 | ## 466 | # examine the pathway genes that have directional disagreement and 467 | # contribute to the lack of pathway enrichment in the directional analysis 468 | ## 469 | 470 | pathway_gene_pvals = pval_matrix[enriched_pathway_genes,] 471 | pathway_gene_directions = dir_matrix[enriched_pathway_genes,] 472 | 473 | directional_conflict_genes = names(which( 474 | pathway_gene_directions[,1] != pathway_gene_directions[,2] & 475 | pathway_gene_directions[,1] != 0 & pathway_gene_directions[,2] != 0)) 476 | 477 | pathway_gene_pvals[directional_conflict_genes,] 478 | # rna protein 479 | #PSMD11 
0.34121101 0.002094310 480 | #PSMA8 0.55510836 0.001415197 481 | #SMURF1 0.03353629 0.042995333 482 | #PSMD1 0.04650877 0.100178048 483 | #RHOA 0.01786687 0.474628084 484 | #PSME3 0.07148904 0.130184883 485 | #ITPR3 0.01660850 0.589929787 486 | #DVL3 0.46381447 0.022535743 487 | #PSME2 0.03274707 0.514351089 488 | #PSMB6 0.02863259 0.677224905 489 | 490 | pathway_gene_directions[directional_conflict_genes,] 491 | # rna protein 492 | #PSMD11 1 -1 493 | #PSMA8 1 -1 494 | #SMURF1 1 -1 495 | #PSMD1 1 -1 496 | #RHOA -1 1 497 | #PSME3 1 -1 498 | #ITPR3 1 -1 499 | #DVL3 1 -1 500 | #PSME2 -1 1 501 | #PSMB6 -1 1 502 | 503 | length(directional_conflict_genes) 504 | #[1] 10 505 | 506 | 507 | ``` 508 | To visualise differences in biological pathways between ActivePathways analyses with or without a directional penalty, we combine both outputs into a single enrichment map for [plotting](#visualising-directional-impact-with-node-borders). 509 | 510 | More thorough documentation of the ActivePathways function can be found in R with `?ActivePathways`, and complete tutorials can be found with `browseVignettes(package='ActivePathways')`. 511 | 512 | 513 | # Visualising pathway enrichment results using enrichment maps in Cytoscape 514 | 515 | Cytoscape provides powerful tools to visualise the enriched pathways from `ActivePathways` as a network (i.e., an enrichment map). `ActivePathways` provides the files needed for building enrichment maps in Cytoscape. To create these files, supply a file prefix to the argument `cytoscape_file_tag` in the ActivePathways() function. The prefix can be a path to an existing writable directory. 516 | 517 | ```{r} 518 | res <- ActivePathways(scores, fname_GMT, cytoscape_file_tag = "enrichmentMap__") 519 | ``` 520 | Four files are written using the prefix: 521 | 522 | * `enrichmentMap__pathways.txt`: a table of significant terms and the associated adjusted P-values. Terms include molecular pathways, biological processes, and other gene sets. 
Note that only terms with `adjusted_p_val <= significant` are written. 523 | 524 | * `enrichmentMap__subgroups.txt`: a matrix indicating which columns of the input matrix (i.e., which omics datasets) contributed to the discovery of each pathway. These values correspond to the `evidence` evaluation of input omics datasets discussed above. A value of one indicates the pathway was also detectable using a specific input omics dataset. A value of zero indicates otherwise. This file will not be generated if the input matrix is a single-column matrix of scores (just one omics dataset). 525 | 526 | * `enrichmentMap__pathways.gmt`: a shortened version of the supplied GMT file, containing only the significant pathways detected by `ActivePathways`. 527 | 528 | * `enrichmentMap__legend.pdf`: a PDF file containing a legend, with colors corresponding to the different omics datasets visualised in the enrichment map. This can be used as a reference to the generated enrichment map. 529 | 530 | ## Creating enrichment maps using results of ActivePathways 531 | 532 | Pathway enrichment analysis often leads to complex and redundant results. Enrichment maps are network-based visualisations of pathway enrichment analyses. Enrichment maps can be generated in the Cytoscape software using the EnrichmentMap app. **The EnhancedGraphics app is also required**. See the vignette for details: `browseVignettes(package='ActivePathways')`. 533 | 534 | 535 | ## Required software 536 | 537 | 1. Cytoscape, see 538 | 2. EnrichmentMap app of Cytoscape, see menu Apps>App manager or 539 | 3. EnhancedGraphics app of Cytoscape, see menu Apps>App manager or 540 | 541 | ## Creating the enrichment map 542 | 543 | * Open the Cytoscape software. 544 | * Select *Apps -> EnrichmentMap*. 545 | * In the following dialogue, click the button `+` *Add Data Set from Files* in the top left corner of the dialogue. 546 | * Change the Analysis Type to Generic/gProfiler/Enrichr. 
547 | * Upload the files `enrichmentMap__pathways.txt` and `enrichmentMap__pathways.gmt` in the *Enrichments* and *GMT* fields, respectively. 548 | * Click the checkbox *Show Advanced Options* and set *Cutoff* to 0.6. 549 | * Then click *Build* in the bottom-right corner to create the enrichment map. 550 | 551 | ![](vignettes/CreateEnrichmentMapDialogue_V2.png) 552 | 553 | ![](vignettes/NetworkStep1_V2.png) 554 | 555 | 556 | ## Colour the nodes of the network to visualise supporting omics datasets 557 | 558 | The third file `enrichmentMap__subgroups.txt` needs to be imported to Cytoscape directly in order to color nodes (i.e. terms) according to their source omics datasets. To import the file, select the menu option *File -> Import -> Table from File* and select the file `enrichmentMap__subgroups.txt`. In the following dialogue, select *To a Network Collection* in the dropdown menu *Where to Import Table Data*. Click OK to proceed. 559 | 560 | ![](vignettes/ImportStep_V2.png) 561 | 562 | Cytoscape uses the imported information to color nodes like a pie chart. To enable this click the Style tab in the left control panel and select the Image/Chart1 Property in a series of dropdown menus (*Properties -> Paint -> Custom Paint 1 -> Image/Chart 1*). 563 | 564 | ![](vignettes/PropertiesDropDown2_V2.png) 565 | 566 | The *Image/Chart 1* property now appears in the Style control panel. Click the triangle on the right, then set the *Column* to *instruct* and the *Mapping Type* to *Passthrough Mapping*. 567 | 568 | ![](vignettes/StylePanel_V2.png) 569 | 570 | This step colours the nodes corresponding to the enriched pathways according to the supporting omics datasets, based on the scores matrix initially analysed in `ActivePathways`. 571 | 572 | ![](vignettes/NetworkStep2_V2.png) 573 | 574 | `ActivePathways` generates a color legend in the file `enrichmentMap__legend.pdf` that shows which colors correspond to which omics datasets. 
575 | 576 | ![](vignettes/LegendView.png) 577 | 578 | Note that one of the colors corresponds to a subset of enriched pathways with *combined* evidence. These terms were only detected through data fusion and P-value merging, and not with any of the input datasets individually. This exemplifies the added value of integrative multi-omics pathway enrichment analysis. 579 | 580 | ## Visualising directional impact with node borders 581 | 582 | From the drop-down Properties menu, select *Border Line Type*. 583 | 584 | ![](vignettes/border_line_type.jpg) 585 | 586 | Set *Column* to *directional impact* and *Mapping Type* to *Discrete Mapping*. Now we can compare findings between a non-directional and a directional method. We highlight pathways that were shared (0), lost (1), and gained (2) between the approaches. Here, we have solid lines for the shared pathways, dots for the lost pathways, and vertical lines for the gained pathways. Border widths can be adjusted in the *Border Width* property, again with discrete mapping. 587 | 588 | ![](vignettes/set_aesthetic.jpg) 589 | 590 | This step changes node borders in the aggregated enrichment map, depicting the additional information provided by directional impact. 591 | 592 | ![](vignettes/new_map.png) 593 | 594 | ![](vignettes/legend.png) 595 | 596 | ## Alternative node coloring 597 | 598 | For a more diverse range of colors, ActivePathways supports any color palette from RColorBrewer. To use one, provide its name in the color_palette parameter. 599 | ```{r} 600 | res <- ActivePathways(scores, fname_GMT, 601 | cytoscape_file_tag = "enrichmentMap__", 602 | color_palette = "Pastel1") 603 | ``` 604 | ![](vignettes/LegendView_RColorBrewer.png) 605 | 606 | Alternatively, the custom_colors parameter can be specified as a vector to manually input the color of each dataset. This vector should contain the same number of colors as columns in the scores matrix. 
607 | ```{r} 608 | res <- ActivePathways(scores, fname_GMT, 609 | cytoscape_file_tag = "enrichmentMap__", 610 | custom_colors = c("violet","green","orange","red")) 611 | ``` 612 | ![](vignettes/LegendView_Custom.png) 613 | 614 | To change the color of the *combined* contribution, a color must be provided to the color_integrated_only parameter. 615 | 616 | **If the coloring of nodes did not work in Cytoscape after setting the options in the Style panel, check that the EnhancedGraphics Cytoscape app is installed.** 617 | 618 | ## References 619 | 620 | * See the vignette for more details: `browseVignettes(package='ActivePathways')`. 621 | 622 | * Mykhaylo Slobodyanyuk^, Alexander T. Bahcheli^, Zoe P. Klein, Masroor Bayati, Lisa J. Strug, Jüri Reimand. Directional integration and pathway enrichment analysis for multi-omics data. Nature Communications (2024) (^ - co-first authors). 623 | 624 | * Integrative Pathway Enrichment Analysis of Multivariate Omics Data. Paczkowska M^, Barenboim J^, Sintupisut N, Fox NS, Zhu H, Abd-Rabbo D, Mee MW, Boutros PC, PCAWG Drivers and Functional Interpretation Working Group; Reimand J, PCAWG Consortium. Nature Communications (2020) (^ - co-first authors). 625 | 626 | * Pathway Enrichment Analysis and Visualization of Omics Data Using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Reimand J^, Isserlin R^, Voisin V, Kucera M, Tannus-Lopes C, Rostamianfar A, Wadi L, Meyer M, Wong J, Xu C, Merico D, Bader GD. Nature Protocols (2019) (^ - co-first authors). 
627 | -------------------------------------------------------------------------------- /inst/extdata/enrichmentMap__legend.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/inst/extdata/enrichmentMap__legend.pdf -------------------------------------------------------------------------------- /inst/extdata/enrichmentMap__pathways.txt: -------------------------------------------------------------------------------- 1 | term_id term_name adjusted_p_val 2 | REAC:2424491 DAP12 signaling 4.49126833230489e-05 3 | REAC:422475 Axon guidance 0.00046259184602835 4 | REAC:177929 Signaling by EGFR 0.000619750411866923 5 | REAC:2559583 Cellular Senescence 6.59544666946083e-05 6 | REAC:5654699 SHC-mediated cascade:FGFR2 0.0284446360025451 7 | REAC:2428924 IGF1R signaling cascade 0.00472765617433565 8 | REAC:167044 Signalling to RAS 0.00725674706700891 9 | REAC:5654700 FRS-mediated FGFR2 signaling 0.0038328186164029 10 | REAC:187687 Signalling to ERKs 0.0110412411912787 11 | REAC:180336 SHC1 events in EGFR signaling 0.0036184063767845 12 | REAC:4420097 VEGFA-VEGFR2 Pathway 0.00399592999211922 13 | REAC:112399 IRS-mediated signalling 0.00376912151604491 14 | REAC:5654712 FRS-mediated FGFR4 signaling 0.0038328186164029 15 | REAC:180292 GAB1 signalosome 0.011469483859716 16 | REAC:2262752 Cellular responses to stress 0.000906522444094916 17 | REAC:198203 PI3K/AKT activation 0.00491448952862143 18 | REAC:194138 Signaling by VEGF 0.0061305631670633 19 | REAC:1257604 PIP3 activates AKT signaling 0.0108481449904663 20 | REAC:212436 Generic Transcription Pathway 0.00945486548608516 21 | REAC:74752 Signaling by Insulin receptor 0.00701199797493265 22 | REAC:186797 Signaling by PDGF 0.000806572551897785 23 | REAC:112412 SOS-mediated signalling 0.0036184063767845 24 | REAC:3214842 HDMs demethylate histones 0.00775152941354889 25 | REAC:1236394 Signaling by 
ERBB4 0.000365080271370282 26 | REAC:449147 Signaling by Interleukins 0.00289259718940152 27 | REAC:1433557 Signaling by SCF-KIT 0.000332224908895709 28 | REAC:5654695 PI-3K cascade:FGFR2 0.0108481449904663 29 | REAC:5654736 Signaling by FGFR1 0.000430526596880612 30 | REAC:2172127 DAP12 interactions 6.09352718405167e-05 31 | REAC:5655253 Signaling by FGFR2 in disease 0.0200793711374132 32 | REAC:5673000 RAF activation 0.00404111415860685 33 | REAC:448424 Interleukin-17 signaling 0.00746476885651692 34 | REAC:190236 Signaling by FGFR 0.000160119485897304 35 | -------------------------------------------------------------------------------- /inst/extdata/enrichmentMap__subgroups.txt: -------------------------------------------------------------------------------- 1 | term_id X3UTR X5UTR CDS promCore combined instruct 2 | REAC:2424491 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 3 | REAC:422475 1 0 0 1 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 4 | REAC:177929 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 5 | REAC:2559583 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 6 | REAC:5654699 0 0 0 0 1 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 7 | REAC:2428924 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 8 | REAC:167044 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 9 | REAC:5654700 0 0 1 0 0 piechart: 
attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 10 | REAC:187687 0 0 0 0 1 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 11 | REAC:180336 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 12 | REAC:4420097 1 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 13 | REAC:112399 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 14 | REAC:5654712 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 15 | REAC:180292 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 16 | REAC:2262752 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 17 | REAC:198203 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 18 | REAC:194138 1 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 19 | REAC:1257604 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 20 | REAC:212436 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 21 | REAC:74752 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 22 | 
REAC:186797 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 23 | REAC:112412 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 24 | REAC:3214842 0 0 0 0 1 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 25 | REAC:1236394 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 26 | REAC:449147 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 27 | REAC:1433557 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 28 | REAC:5654695 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 29 | REAC:5654736 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 30 | REAC:2172127 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 31 | REAC:5655253 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 32 | REAC:5673000 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 33 | REAC:448424 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 34 | REAC:190236 0 0 1 0 0 piechart: attributelist="X3UTR,X5UTR,CDS,promCore,combined" 
colorlist="#FF0000,#CCFF00,#00FF66,#0066FF,#FFFFF0" showlabels=FALSE 35 | -------------------------------------------------------------------------------- /man/ActivePathways.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ActivePathways.r 3 | \name{ActivePathways} 4 | \alias{ActivePathways} 5 | \title{ActivePathways} 6 | \usage{ 7 | ActivePathways( 8 | scores, 9 | gmt, 10 | background = makeBackground(gmt), 11 | geneset_filter = c(5, 1000), 12 | cutoff = 0.1, 13 | significant = 0.05, 14 | merge_method = c("Fisher", "Fisher_directional", "Brown", "DPM", "Stouffer", 15 | "Stouffer_directional", "Strube", "Strube_directional"), 16 | correction_method = c("holm", "fdr", "hochberg", "hommel", "bonferroni", "BH", "BY", 17 | "none"), 18 | cytoscape_file_tag = NA, 19 | color_palette = NULL, 20 | custom_colors = NULL, 21 | color_integrated_only = "#FFFFF0", 22 | scores_direction = NULL, 23 | constraints_vector = NULL 24 | ) 25 | } 26 | \arguments{ 27 | \item{scores}{A numerical matrix of p-values where each row is a gene and 28 | each column represents an omics dataset (evidence). Rownames correspond to the genes 29 | and colnames to the datasets. All values must be 0<=p<=1. We recommend converting 30 | missing values to ones.} 31 | 32 | \item{gmt}{A GMT object to be used for enrichment analysis. If a filename, a 33 | GMT object will be read from the file.} 34 | 35 | \item{background}{A character vector of gene names to be used as a 36 | statistical background. By default, the background is all genes that appear 37 | in \code{gmt}.} 38 | 39 | \item{geneset_filter}{A numeric vector of length two giving the lower and 40 | upper limits for the size of the annotated geneset to pathways in gmt. 41 | Pathways with a geneset shorter than \code{geneset_filter[1]} or longer 42 | than \code{geneset_filter[2]} will be removed. 
Set either value to NA 43 | to not enforce a minimum or maximum value, or set \code{geneset_filter} to 44 | \code{NULL} to skip filtering.} 45 | 46 | \item{cutoff}{A maximum merged p-value for a gene to be used for analysis. 47 | Any genes with merged, unadjusted \code{p > cutoff} will be discarded 48 | before testing.} 49 | 50 | \item{significant}{Significance cutoff for selecting enriched pathways. Pathways with 51 | \code{adjusted_p_val <= significant} will be selected as results.} 52 | 53 | \item{merge_method}{Statistical method to merge p-values. See the section on Merging P-values.} 54 | 55 | \item{correction_method}{Statistical method to correct p-values. See 56 | \code{\link[stats]{p.adjust}} for details.} 57 | 58 | \item{cytoscape_file_tag}{The directory and/or file prefix to which the output files 59 | for generating enrichment maps should be written. If NA, files will not be written.} 60 | 61 | \item{color_palette}{Color palette from RColorBrewer::brewer.pal to color each 62 | column in the scores matrix. If NULL, grDevices::rainbow is used by default.} 63 | 64 | \item{custom_colors}{A character vector of custom colors for each column in the scores matrix.} 65 | 66 | \item{color_integrated_only}{A character vector of length 1 specifying the color of the 67 | "combined" pathway contribution.} 68 | 69 | \item{scores_direction}{A numerical matrix of log2 transformed fold-change values where each row is a 70 | gene and each column represents a dataset (evidence). Rownames correspond to the genes 71 | and colnames to the datasets. We recommend converting missing values to zero. 72 | Must have the same dimensions as the scores parameter. Datasets without directional information should be set to 0.} 73 | 74 | \item{constraints_vector}{A numerical vector of +1 or -1 values corresponding to the user-defined 75 | directional relationship between columns in scores_direction.
Datasets without directional information should 76 | be set to 0.} 77 | } 78 | \value{ 79 | A data.table of terms (enriched pathways) containing the following columns: 80 | \describe{ 81 | \item{term_id}{The database ID of the term} 82 | \item{term_name}{The full name of the term} 83 | \item{adjusted_p_val}{The associated p-value, adjusted for multiple testing} 84 | \item{term_size}{The number of genes annotated to the term} 85 | \item{overlap}{A character vector of the genes enriched in the term} 86 | \item{evidence}{Columns of \code{scores} (i.e., omics datasets) that contributed 87 | individually to the enrichment of the term. Each input column is evaluated 88 | separately for enrichments and added to the evidence if the term is found.} 89 | } 90 | } 91 | \description{ 92 | ActivePathways 93 | } 94 | \section{Merging P-values}{ 95 | 96 | To obtain a single p-value for each gene across the multiple omics datasets considered, 97 | the p-values in \code{scores} are merged row-wise using a data fusion approach of p-value merging. 98 | The eight available methods are: 99 | \describe{ 100 | \item{Fisher}{Fisher's method assumes p-values are uniformly 101 | distributed and performs a chi-squared test on the statistic sum(-2 log(p)). 102 | This method is most appropriate when the columns in \code{scores} are 103 | independent.} 104 | \item{Fisher_directional}{Fisher's method modification that allows for 105 | directional information to be incorporated with the \code{scores_direction} 106 | and \code{constraints_vector} parameters.} 107 | \item{Brown}{Brown's method extends Fisher's method by accounting for the 108 | covariance in the columns of \code{scores}. It is more appropriate when the 109 | tests of significance used to create the columns in \code{scores} are not 110 | necessarily independent.
Brown's method is therefore recommended for 111 | many omics integration approaches.} 112 | \item{DPM}{DPM extends Brown's method by incorporating directional information 113 | using the \code{scores_direction} and \code{constraints_vector} parameters.} 114 | \item{Stouffer}{Stouffer's method assumes p-values are uniformly distributed 115 | and transforms p-values into a Z-score using the cumulative distribution function of a 116 | standard normal distribution. This method is appropriate when the columns in \code{scores} 117 | are independent.} 118 | \item{Stouffer_directional}{Stouffer's method modification that allows for 119 | directional information to be incorporated with the \code{scores_direction} 120 | and \code{constraints_vector} parameters.} 121 | \item{Strube}{Strube's method extends Stouffer's method by accounting for the 122 | covariance in the columns of \code{scores}.} 123 | \item{Strube_directional}{Strube's method modification that allows for 124 | directional information to be incorporated with the \code{scores_direction} 125 | and \code{constraints_vector} parameters.} 126 | } 127 | } 128 | 129 | \section{Cytoscape}{ 130 | 131 | To visualize and interpret enriched pathways, ActivePathways provides an option 132 | to further analyse results as enrichment maps in the Cytoscape software. 133 | If \code{!is.na(cytoscape_file_tag)}, four files will be written that can be used 134 | to build enrichment maps. This requires the EnrichmentMap and enhancedGraphics apps. 135 | 136 | The four files written are: 137 | \describe{ 138 | \item{pathways.txt}{A list of significant terms and the 139 | associated p-value. Only terms with \code{adjusted_p_val <= significant} are 140 | written to this file.} 141 | \item{subgroups.txt}{A matrix indicating whether the significant terms (pathways) 142 | were also found to be significant when considering only one column from 143 | \code{scores}.
A one indicates that the term was found to be significant 144 | when only p-values in that column were used to select genes.} 145 | \item{pathways.gmt}{A shortened version of the supplied GMT 146 | file, containing only the significantly enriched terms in pathways.txt.} 147 | \item{legend.pdf}{A legend with colours matching contributions 148 | from columns in \code{scores}.} 149 | } 150 | 151 | How to use: Create an enrichment map in Cytoscape with the file of terms 152 | (pathways.txt) and the shortened gmt file 153 | (pathways.gmt). Upload the subgroups file (subgroups.txt) as a table 154 | using the menu File > Import > Table from File. To paint nodes according 155 | to the type of supporting evidence, use the 'style' 156 | panel, set image/Chart1 to use the column `instruct` and the passthrough 157 | mapping type. Make sure the app enhancedGraphics is installed. 158 | Lastly, use the file legend.pdf as a reference for colors in the enrichment map. 159 | } 160 | 161 | \examples{ 162 | fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 163 | package = "ActivePathways") 164 | fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", 165 | package = "ActivePathways") 166 | 167 | dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = 'Gene')) 168 | dat[is.na(dat)] <- 1 169 | 170 | ActivePathways(dat, fname_GMT) 171 | 172 | } 173 | -------------------------------------------------------------------------------- /man/DPM.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/merge_p.r 3 | \name{DPM} 4 | \alias{DPM} 5 | \title{Merge p-values using the DPM method.} 6 | \usage{ 7 | DPM( 8 | p_values, 9 | data_matrix = NULL, 10 | cov_matrix = NULL, 11 | scores_direction, 12 | constraints_vector 13 | ) 14 | } 15 | \arguments{ 16 | \item{p_values}{A matrix of m x n p-values.} 17 | 18 | \item{data_matrix}{An m x n matrix
representing m tests and n samples. NA's are not allowed.} 19 | 20 | \item{cov_matrix}{A pre-calculated covariance matrix of \code{data_matrix}. This is more 21 | efficient when making many calls with the same data_matrix. 22 | Only one of \code{data_matrix} and \code{cov_matrix} must be given. If both are supplied, 23 | \code{data_matrix} is ignored.} 24 | 25 | \item{scores_direction}{A matrix of log2 fold-change values. Datasets without directional information should be set to 0.} 26 | 27 | \item{constraints_vector}{A numerical vector of +1 or -1 values corresponding to the user-defined 28 | directional relationship between columns in scores_direction. Datasets without directional information should 29 | be set to 0.} 30 | } 31 | \value{ 32 | A p-value vector representing the merged significance of multiple p-values. 33 | } 34 | \description{ 35 | Merge p-values using the DPM method. 36 | } 37 | -------------------------------------------------------------------------------- /man/GMT.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/gmt.r 3 | \name{GMT} 4 | \alias{GMT} 5 | \alias{read.GMT} 6 | \alias{gmt} 7 | \alias{write.GMT} 8 | \alias{is.GMT} 9 | \title{Read and Write GMT files} 10 | \format{ 11 | A GMT object is a named list of terms, where each term is a list with the items: 12 | \describe{ 13 | \item{id}{The term ID.} 14 | \item{name}{The full name or description of the term.} 15 | \item{genes}{A character vector of genes annotated to this term.} 16 | } 17 | } 18 | \usage{ 19 | read.GMT(filename) 20 | 21 | write.GMT(gmt, filename) 22 | 23 | is.GMT(x) 24 | } 25 | \arguments{ 26 | \item{filename}{Location of the gmt file.} 27 | 28 | \item{gmt}{A GMT object.} 29 | 30 | \item{x}{The object to test.} 31 | } 32 | \value{ 33 | \code{read.GMT} returns a GMT object. \cr 34 | \code{write.GMT} returns NULL. 
\cr 35 | \code{is.GMT} returns TRUE if \code{x} is a GMT object, else FALSE. 36 | } 37 | \description{ 38 | Functions to read and write Gene Matrix Transposed (GMT) files and to test if 39 | an object inherits from GMT. 40 | } 41 | \details{ 42 | A GMT file describes gene sets, such as biological terms and pathways. GMT files are 43 | tab-delimited text files. Each row of a GMT file contains a single term with its 44 | database ID and a term name, followed by all the genes annotated to the term. 45 | } 46 | \examples{ 47 | fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", package = "ActivePathways") 48 | gmt <- read.GMT(fname_GMT) 49 | gmt[1:10] 50 | gmt[[1]] 51 | gmt[[1]]$id 52 | gmt[[1]]$genes 53 | gmt[[1]]$name 54 | gmt$`REAC:1630316` 55 | } 56 | -------------------------------------------------------------------------------- /man/brownsMethod.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/merge_p.r 3 | \name{brownsMethod} 4 | \alias{brownsMethod} 5 | \title{Merge p-values using Brown's method.} 6 | \usage{ 7 | brownsMethod(p_values, data_matrix = NULL, cov_matrix = NULL) 8 | } 9 | \arguments{ 10 | \item{p_values}{A matrix of m x n p-values.} 11 | 12 | \item{data_matrix}{An m x n matrix representing m tests and n samples. NAs are not allowed.} 13 | 14 | \item{cov_matrix}{A pre-calculated covariance matrix of \code{data_matrix}. This is more 15 | efficient when making many calls with the same data_matrix. 16 | Only one of \code{data_matrix} and \code{cov_matrix} must be given. If both are supplied, 17 | \code{data_matrix} is ignored.} 18 | } 19 | \value{ 20 | A p-value vector representing the merged significance of multiple p-values. 21 | } 22 | \description{ 23 | Merge p-values using Brown's method.
24 | } 25 | -------------------------------------------------------------------------------- /man/columnSignificance.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ActivePathways.r 3 | \name{columnSignificance} 4 | \alias{columnSignificance} 5 | \title{Determine which terms are found to be significant using each column 6 | individually.} 7 | \usage{ 8 | columnSignificance( 9 | scores, 10 | gmt, 11 | background, 12 | cutoff, 13 | significant, 14 | correction_method, 15 | pvals 16 | ) 17 | } 18 | \arguments{ 19 | \item{scores}{A numerical matrix of p-values where each row is a gene and 20 | each column represents an omics dataset (evidence). Rownames correspond to the genes 21 | and colnames to the datasets. All values must be 0<=p<=1. We recommend converting 22 | missing values to ones.} 23 | 24 | \item{gmt}{A GMT object to be used for enrichment analysis. If a filename, a 25 | GMT object will be read from the file.} 26 | 27 | \item{background}{A character vector of gene names to be used as a 28 | statistical background. By default, the background is all genes that appear 29 | in \code{gmt}.} 30 | 31 | \item{cutoff}{A maximum merged p-value for a gene to be used for analysis. 32 | Any genes with merged, unadjusted \code{p > cutoff} will be discarded 33 | before testing.} 34 | 35 | \item{significant}{Significance cutoff for selecting enriched pathways. Pathways with 36 | \code{adjusted_p_val <= significant} will be selected as results.} 37 | 38 | \item{correction_method}{Statistical method to correct p-values.
See 39 | \code{\link[stats]{p.adjust}} for details.} 40 | 41 | \item{pvals}{p-value for the pathways calculated by ActivePathways} 42 | } 43 | \value{ 44 | a data.table with columns 'term_id' and a column for each column 45 | in \code{scores}, indicating whether each term (pathway) was found to be 46 | significant or not when considering only that column. For each term, 47 | either report the list of related genes if that term was significant, or NA if not. 48 | } 49 | \description{ 50 | Determine which terms are found to be significant using each column 51 | individually. 52 | } 53 | -------------------------------------------------------------------------------- /man/enrichmentAnalysis.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ActivePathways.r 3 | \name{enrichmentAnalysis} 4 | \alias{enrichmentAnalysis} 5 | \title{Perform pathway enrichment analysis on an ordered list of genes} 6 | \usage{ 7 | enrichmentAnalysis(genelist, gmt, background) 8 | } 9 | \arguments{ 10 | \item{genelist}{character vector of gene names, in decreasing order 11 | of significance} 12 | 13 | \item{gmt}{GMT object} 14 | 15 | \item{background}{character vector of gene names. 
List of all genes being used 16 | as a statistical background} 17 | } 18 | \value{ 19 | a data.table of terms with the following columns: 20 | \describe{ 21 | \item{term_id}{The id of the term} 22 | \item{term_name}{The full name of the term} 23 | \item{adjusted_p_val}{The associated p-value adjusted for multiple testing} 24 | \item{term_size}{The number of genes annotated to the term} 25 | \item{overlap}{A character vector of the genes that overlap between the 26 | term and the query} 27 | } 28 | } 29 | \description{ 30 | Perform pathway enrichment analysis on an ordered list of genes 31 | } 32 | \keyword{internal} 33 | -------------------------------------------------------------------------------- /man/export_as_CSV.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/ActivePathways.r 3 | \name{export_as_CSV} 4 | \alias{export_as_CSV} 5 | \title{Export the results from ActivePathways as a comma-separated values (CSV) file.} 6 | \usage{ 7 | export_as_CSV(res, file_name) 8 | } 9 | \arguments{ 10 | \item{res}{the data.table object with ActivePathways results.} 11 | 12 | \item{file_name}{location and name of the CSV file to write to.} 13 | } 14 | \description{ 15 | Export the results from ActivePathways as a comma-separated values (CSV) file. 
16 | } 17 | \examples{ 18 | fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 19 | package = "ActivePathways") 20 | fname_GMT = system.file("extdata", "hsapiens_REAC_subset.gmt", 21 | package = "ActivePathways") 22 | 23 | dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = 'Gene')) 24 | dat[is.na(dat)] <- 1 25 | 26 | res <- ActivePathways(dat, fname_GMT) 27 | \donttest{ 28 | export_as_CSV(res, "results_ActivePathways.csv") 29 | } 30 | } 31 | -------------------------------------------------------------------------------- /man/hypergeometric.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/statistical_tests.r 3 | \name{hypergeometric} 4 | \alias{hypergeometric} 5 | \title{Hypergeometric test} 6 | \usage{ 7 | hypergeometric(counts) 8 | } 9 | \arguments{ 10 | \item{counts}{A 2x2 numerical matrix representing a contingency table.} 11 | } 12 | \value{ 13 | a p-value of enrichment of genes in a term or pathway. 14 | } 15 | \description{ 16 | Perform a hypergeometric test, also known as the Fisher's exact test, on a 2x2 contingency 17 | table with the alternative hypothesis set to 'greater'. In this application, the test finds the 18 | probability that counts[1, 1] or more genes would be found to be annotated to a term (pathway), 19 | assuming the null hypothesis of genes being distributed randomly to terms. 
20 | } 21 | -------------------------------------------------------------------------------- /man/makeBackground.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/gmt.r 3 | \name{makeBackground} 4 | \alias{makeBackground} 5 | \title{Make a background list of genes (i.e., the statistical universe) based on all the terms (gene sets, pathways) considered.} 6 | \usage{ 7 | makeBackground(gmt) 8 | } 9 | \arguments{ 10 | \item{gmt}{A \link{GMT} object.} 11 | } 12 | \value{ 13 | A character vector containing all genes in GMT. 14 | } 15 | \description{ 16 | Returns a character vector of all genes in a GMT object. 17 | } 18 | \examples{ 19 | fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt", package = "ActivePathways") 20 | gmt <- read.GMT(fname_GMT) 21 | makeBackground(gmt)[1:10] 22 | } 23 | -------------------------------------------------------------------------------- /man/merge_p_values.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/merge_p.r 3 | \name{merge_p_values} 4 | \alias{merge_p_values} 5 | \title{Merge a list or matrix of p-values} 6 | \usage{ 7 | merge_p_values( 8 | scores, 9 | method = "Fisher", 10 | scores_direction = NULL, 11 | constraints_vector = NULL 12 | ) 13 | } 14 | \arguments{ 15 | \item{scores}{Either a list/vector of p-values or a matrix where each column is a test.} 16 | 17 | \item{method}{Method to merge p-values. See the 'Methods' section below.} 18 | 19 | \item{scores_direction}{Either a vector of log2 transformed fold-change values or a matrix where each column is a test. 20 | Must have the same dimensions as the scores parameter.
Datasets without directional information should be set to 0.} 21 | 22 | \item{constraints_vector}{A numerical vector of +1 or -1 values corresponding to the user-defined 23 | directional relationship between the columns in scores_direction. Datasets without directional information should 24 | be set to 0.} 25 | } 26 | \value{ 27 | If \code{scores} is a vector or list, returns a number. If \code{scores} is a 28 | matrix, returns a named list of p-values merged by row. 29 | } 30 | \description{ 31 | Merge a list or matrix of p-values 32 | } 33 | \section{Methods}{ 34 | 35 | Eight methods are available to merge a list of p-values: 36 | \describe{ 37 | \item{Fisher}{Fisher's method (default) assumes that p-values are uniformly 38 | distributed and performs a chi-squared test on the statistic sum(-2 log(p)). 39 | This method is most appropriate when the columns in \code{scores} are 40 | independent.} 41 | \item{Fisher_directional}{Fisher's method modification that allows for 42 | directional information to be incorporated with the \code{scores_direction} 43 | and \code{constraints_vector} parameters.} 44 | \item{Brown}{Brown's method extends Fisher's method by accounting for the 45 | covariance in the columns of \code{scores}. It is more appropriate when the 46 | tests of significance used to create the columns in \code{scores} are not 47 | necessarily independent. Note that the "Brown" method cannot be used with a 48 | single list of p-values. However, in this case Brown's method is identical 49 | to Fisher's method and should be used instead.} 50 | \item{DPM}{DPM extends Brown's method by incorporating directional information 51 | using the \code{scores_direction} and \code{constraints_vector} parameters.} 52 | \item{Stouffer}{Stouffer's method assumes p-values are uniformly distributed 53 | and transforms p-values into a Z-score using the cumulative distribution function of a 54 | standard normal distribution. 
This method is appropriate when the columns in \code{scores} 55 | are independent.} 56 | \item{Stouffer_directional}{Stouffer's method modification that allows for 57 | directional information to be incorporated with the \code{scores_direction} 58 | and \code{constraints_vector} parameters.} 59 | \item{Strube}{Strube's method extends Stouffer's method by accounting for the 60 | covariance in the columns of \code{scores}.} 61 | \item{Strube_directional}{Strube's method modification that allows for 62 | directional information to be incorporated with the \code{scores_direction} 63 | and \code{constraints_vector} parameters.} 64 | 65 | } 66 | } 67 | 68 | \examples{ 69 | merge_p_values(c(0.05, 0.09, 0.01)) 70 | merge_p_values(list(a=0.01, b=1, c=0.0015, d=0.025), method='Fisher') 71 | merge_p_values(matrix(data=c(0.03, 0.061, 0.48, 0.052), nrow = 2), method='Brown') 72 | 73 | } 74 | -------------------------------------------------------------------------------- /man/orderedHypergeometric.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/statistical_tests.r 3 | \name{orderedHypergeometric} 4 | \alias{orderedHypergeometric} 5 | \title{Ordered Hypergeometric Test} 6 | \usage{ 7 | orderedHypergeometric(genelist, background, annotations) 8 | } 9 | \arguments{ 10 | \item{genelist}{Character vector of gene names, assumed to be ordered by decreasing importance. 11 | For example, the genes could be ranked by decreasing significance of differential expression.} 12 | 13 | \item{background}{Character vector of gene names. List of all genes used as a statistical background (i.e., the universe).} 14 | 15 | \item{annotations}{Character vector of gene names. 
A gene set representing a functional term, process or biological pathway.} 16 | } 17 | \value{ 18 | a list with the items: 19 | \describe{ 20 | \item{p_val}{The lowest obtained p-value} 21 | \item{ind}{The index of \code{genelist} such that \code{genelist[1:ind]} 22 | gives the lowest p-value} 23 | } 24 | } 25 | \description{ 26 | Perform a series of hypergeometric tests (a.k.a. Fisher's Exact tests), on a ranked list of genes ordered 27 | by significance against a list of annotation genes. The hypergeometric tests are executed with 28 | increasingly larger numbers of genes representing the top genes in order of decreasing scores. 29 | The lowest p-value of the series is returned as the optimal enriched intersection of the ranked list of genes 30 | and the biological term (pathway). 31 | } 32 | \examples{ 33 | orderedHypergeometric(c('HERC2', 'SP100'), c('PHC2', 'BLM', 'XPC', 'SMC3', 'HERC2', 'SP100'), 34 | c('HERC2', 'PHC2', 'BLM')) 35 | } 36 | -------------------------------------------------------------------------------- /man/prepareCytoscape.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/cytoscape.r 3 | \name{prepareCytoscape} 4 | \alias{prepareCytoscape} 5 | \title{Prepare files for building an enrichment map network visualization in Cytoscape} 6 | \usage{ 7 | prepareCytoscape( 8 | terms, 9 | gmt, 10 | cytoscape_file_tag, 11 | col_significance, 12 | color_palette = NULL, 13 | custom_colors = NULL, 14 | color_integrated_only = "#FFFFF0" 15 | ) 16 | } 17 | \arguments{ 18 | \item{terms}{A data.table object with the columns 'term_id', 'term_name', 'adjusted_p_val'.} 19 | 20 | \item{gmt}{An abridged GMT object containing only the pathways that were 21 | found to be significant in the ActivePathways analysis.} 22 | 23 | \item{cytoscape_file_tag}{The user-defined file prefix and/or directory defining the location of the files.} 24 | 25 | 
\item{col_significance}{A data.table object with a column 'term_id' and a column 26 | for each type of omics evidence indicating whether a term was also found to be significant or not 27 | when considering only the genes and p-values in the corresponding column of the \code{scores} matrix. 28 | If a term was not found, NAs are shown in the columns; otherwise the relevant lists of genes are shown.} 29 | 30 | \item{color_palette}{Color palette from RColorBrewer::brewer.pal to color each 31 | column in the scores matrix. If NULL, grDevices::rainbow is used by default.} 32 | 33 | \item{custom_colors}{A character vector of custom colors for each column in the scores matrix.} 34 | 35 | \item{color_integrated_only}{A character vector of length 1 specifying the color of the "combined" pathway contribution.} 36 | } 37 | \value{ 38 | None 39 | } 40 | \description{ 41 | This function writes four files that are used to build a network using 42 | Cytoscape and the EnrichmentMap app. The files are prefixed with \code{cytoscape_file_tag}. 43 | The four files written are: 44 | \describe{ 45 | \item{pathways.txt}{A list of significant terms and the 46 | associated p-value. Only terms with \code{adjusted_p_val <= significant} are 47 | written to this file} 48 | \item{subgroups.txt}{A matrix indicating whether the significant 49 | pathways are found to be significant when considering only one column (i.e., type of omics evidence) from 50 | \code{scores}.
A 1 indicates that the term is significant when only that 51 | column is used to test for enrichment} 52 | \item{pathways.gmt}{A shortened version of the supplied GMT 53 | file, containing only the terms in pathways.txt.} 54 | \item{legend.pdf}{A legend with colours matching contributions 55 | from columns in \code{scores}} 56 | } 57 | } 58 | -------------------------------------------------------------------------------- /tests/testthat.R: -------------------------------------------------------------------------------- 1 | library(testthat) 2 | library(ActivePathways) 3 | 4 | test_check("ActivePathways") 5 | -------------------------------------------------------------------------------- /tests/testthat/helper.r: -------------------------------------------------------------------------------- 1 | # Prepare testing data 2 | gmt <- read.GMT('test.gmt') 3 | gmt_reac <- read.GMT('hsapiens_REAC_subset.gmt') 4 | dat <- as.matrix(read.table('test_data.txt', header=TRUE, row.names='Gene')) 5 | dat[is.na(dat)] <- 1 6 | background <- makeBackground(gmt) 7 | 8 | # File prefix (tag) for the Cytoscape output files 9 | CStag <- "CS_files" 10 | 11 | # Prepare testing data for scores_direction and constraints_vector 12 | df <- read.table('test_data_rna_protein.tsv', header = TRUE, row.names = "gene", sep = '\t') 13 | scores_test <- data.frame(row.names = rownames(df), rna = df$rna_pval, protein = df$protein_pval) 14 | scores_test <- as.matrix(scores_test) 15 | scores_test[is.na(scores_test)] <- 1 16 | direction_test <- data.frame(row.names = rownames(df), rna = df$rna_log2fc, protein = df$protein_log2fc) 17 | direction_test <- as.matrix(direction_test) 18 | direction_test[is.na(direction_test)] <- 0 19 | constraints_vector_test <- c(1,1) 20 | 21 | # Run ActivePathways quickly 22 | run_ap_short <- function(dat) ActivePathways(dat[,1, drop = FALSE], gmt[1:3], cutoff=1, significant=1) 23 | run_ap_short_contribution <- function(dat) ActivePathways(dat, gmt[1:3], cutoff=1, significant=1) 24 | run_ap <- 
function(scores_test, direction_test, constraints_vector_test) ActivePathways(scores=scores_test, 25 | merge_method="DPM", 26 | gmt=gmt_reac, cutoff=1, significant=1, 27 | scores_direction=direction_test, 28 | constraints_vector=constraints_vector_test) 29 | 30 | 31 | 32 | # Data for testing enrichmentAnalysis 33 | ea_gmt <- gmt[1:4] 34 | ea_gmt[[1]]$genes <- c('PHC2', 'XPC', 'BLM') 35 | ea_gmt[[2]]$genes <- c('HERC2', 'SP100', 'BLM') 36 | ea_gmt[[3]]$genes <- c('HERC2', 'XPC') 37 | ea_gmt[[4]]$genes <- c('XPC') 38 | ea_genelist <- c('HERC2', 'SP100', 'BLM') 39 | ea_genelist2 <- c('HERC2', letters, 'XPC') 40 | ea_background <- makeBackground(ea_gmt) 41 | ea_background2 <- c(ea_background, letters) 42 | 43 | # Expectation to test if two lists contain the same items, ignoring order 44 | expect_setequal <- function(actual, expected) { 45 | # Test that the sets from two objects are the same (differences in either direction count) 46 | differences <- union(setdiff(actual, expected), setdiff(expected, actual)) 47 | sets_equal <- length(differences) == 0 48 | message <- paste("Sets not equal. First difference was:", head(differences, 1)) # head() avoids a subscript error when there are no differences 49 | expect(sets_equal, message) 50 | invisible(actual) 51 | } 52 | -------------------------------------------------------------------------------- /tests/testthat/test.gmt: -------------------------------------------------------------------------------- 1 | REAC:450513 Tristetraprolin (TTP, ZFP36) binds and destabilizes mRNA EXOSC9 XRN1 EXOSC8 EXOSC2 EXOSC6 DCP1A EXOSC3 YWHAB DIS3 DCP2 EXOSC1 EXOSC4 EXOSC5 TNPO1 ZFP36 EXOSC7 MAPKAPK2 2 | REAC:912631 Regulation of signaling by CBL PIK3R3 PIK3R1 VAV1 AL358075.4 BLNK LYN CBL SYK PIK3CA FYN GRB2 RAPGEF1 AL672043.1 CRK YES1 CRKL PIK3CB PIK3R2 HCK PIK3CD 3 | REAC:434316 Fatty Acids bound to GPR40 (FFAR1) regulate insulin secretion GNA14 GNA11 GNAQ PLCB2 PLCB1 GNA15 FFAR1 PLCB3 4 | REAC:446199 Synthesis of Dolichyl-phosphate DOLPP1 DHDDS DOLK MVD NUS1 SRD5A3 5 | REAC:3304356 SMAD2/3 Phosphorylation Motif Mutants in Cancer TGFB1 SMAD2 SMAD3 TGFBR2 ZFYVE9 TGFBR1 6 | REAC:5213460 RIPK1-mediated regulated necrosis BIRC3 FASLG FAS MLKL TNFRSF10B TRAF2 RIPK3 FADD CASP8 RIPK1 TNFRSF10A XIAP TRADD TNFSF10 BIRC2 CFLAR 7 | REAC:5635851 GLI proteins bind promoters of Hh responsive genes to promote transcription PTCH1 PTCH2 GLI3 GLI1 BOC HHIP GLI2 8 | REAC:2644605 FBXW7 Mutants and NOTCH1 in Cancer SKP1 CUL1 FBXW7 RBX1 NOTCH1 9 | REAC:2024101 CS/DS degradation DCN HYAL3 HEXA NCAN NAT6 ARSB HYAL1 BGN HEXB CSPG5 VCAN CSPG4 BCAN IDS AC244197.3 IDUA 10 | REAC:5358752 T41 mutants of beta-catenin aren't phosphorylated AXIN1 PPP2R5C PPP2R5B CSNK1A1 PPP2R1B PPP2R5E PPP2R5A PPP2R5D AMER1 GSK3B PPP2CA APC PPP2CB CTNNB1 PPP2R1A 11 | REAC:3595172 Defective CHST3 causes SEDCJD VCAN CSPG4 NCAN BGN DCN CSPG5 BCAN 12 | REAC:167826 The fatty acid cycling model SLC25A27 SLC25A14 UCP2 UCP3 UCP1 13 | REAC:77348 Beta oxidation of octanoyl-CoA to hexanoyl-CoA ACADM HADHA HADHB HADH ECHS1 14 | REAC:5083625 Defective GALNT3 causes familial hyperphosphatemic tumoral calcinosis (HFTC) MUC2 MUC5B MUC19 MUC3A MUC12 
MUC7 MUC6 MUC5AC MUC21 MUC15 MUCL1 MUC4 MUC20 MUC1 MUC13 MUC16 MUC17 15 | REAC:1912420 Pre-NOTCH Processing in Golgi ATP2A3 MFNG RAB6A B4GALT1 ST3GAL3 LFNG NOTCH1 TMED2 ATP2A1 RFNG NOTCH4 ATP2A2 NOTCH3 NOTCH2 SEL1L ST3GAL6 ST3GAL4 FURIN 16 | REAC:5358493 Synthesis of diphthamide-EEF2 DPH3 EEF2 DPH5 DNAJC24 DPH6 DPH7 DPH1 DPH2 17 | REAC:210746 Regulation of gene expression in endocrine-committed (NEUROG3+) progenitor cells NKX2-2 PAX4 NEUROG3 NEUROD1 INSM1 18 | REAC:2142712 Synthesis of 12-eicosatetraenoic acid derivatives ALOX15 ALOXE3 ALOX12B GPX4 GPX2 GPX1 ALOX12 19 | REAC:75157 FasL/ CD95L signaling CASP8 FAS FASLG FADD CASP10 20 | REAC:2022857 Keratan sulfate degradation ACAN FMOD OGN HEXA KERA HEXB GLB1L GNS OMD PRELP LUM GLB1 21 | REAC:111932 CaMK IV-mediated phosphorylation of CREB CAMK4 CALM3 CALM1 CALM2 CREB1 22 | REAC:2470946 Cohesin Loading onto Chromatin STAG1 SMC1A STAG2 PDS5A WAPL NIPBL SMC3 RAD21 PDS5B MAU2 23 | REAC:5603041 IRAK4 deficiency (TLR2/4) IRAK4 TLR4 TLR6 TIRAP CD36 MYD88 TLR2 BTK CD14 LY96 TLR1 24 | REAC:388479 Vasopressin-like receptors AVPR1B AVPR1A OXTR OXT AVP AVPR2 25 | REAC:111448 Activation of NOXA and translocation to mitochondria TFDP2 TFDP1 TP53 E2F1 PMAIP1 26 | REAC:180689 APOBEC3G mediated resistance to HIV-1 infection HMGA1 BANF1 PPIA PSIP1 APOBEC3G 27 | REAC:5083632 Defective C1GALT1C1 causes Tn polyagglutination syndrome (TNPS) MUC19 MUC5B MUC2 MUC3A MUC21 MUC5AC MUC6 MUC7 MUC12 MUC15 MUC4 MUCL1 MUC20 MUC13 MUC1 MUC17 MUC16 28 | REAC:5625900 RHO GTPases activate CIT KIF14 RHOC RNASE1 RAC1 RHOB MYH9 MYL12B MYH11 MYL9 CDKN1B PRC1 RHOA MYL6 MYH10 DLG4 MYH14 CIT 29 | REAC:4839748 AMER1 mutants destabilize the destruction complex PPP2CB PPP2R1A GSK3B AMER1 PPP2R5D APC PPP2CA PPP2R1B PPP2R5A PPP2R5E PPP2R5C PPP2R5B AXIN1 CSNK1A1 30 | REAC:3656225 Defective CHST6 causes MCDC1 KERA PRELP LUM OMD ACAN OGN FMOD 31 | REAC:427601 Multifunctional anion exchangers SLC26A1 SLC26A4 SLC26A7 SLC26A11 SLC26A2 SLC26A3 SLC26A6 SLC26A9 SLC5A12 
32 | REAC:2142770 Synthesis of 15-eicosatetraenoic acid derivatives GPX1 PTGS2 GPX4 GPX2 ALOX15 ALOX15B 33 | REAC:372708 p130Cas linkage to MAPK signaling for integrins FN1 TLN1 ITGA2B SRC ITGB3 FGA VWF PTK2 FGG RAP1B FGB APBB1IP RAP1A CRK BCAR1 34 | REAC:77588 SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs NCBP1 SNRPD3 SLBP NCBP2 SNRPF LSM11 SNRPE ZNF473 SNRPB LSM10 SNRPG 35 | REAC:193634 Axonal growth inhibition (RHOA activation) ARHGDIA ARHGEF1 MAG RHOA LINGO1 NGFR OMG RTN4 36 | REAC:3315487 SMAD2/3 MH2 Domain Mutants in Cancer TGFBR1 ZFYVE9 TGFBR2 SMAD4 SMAD3 SMAD2 TGFB1 37 | REAC:164843 2-LTR circle formation XRCC4 XRCC5 XRCC6 HMGA1 PSIP1 BANF1 LIG4 38 | REAC:622312 Inflammasomes NLRP3 HSP90AB1 P2RX7 PSTPIP1 MEFV TXNIP SUGT1 BCL2L1 APP CASP1 NLRP1 PANX1 BCL2 NLRC4 PYCARD AIM2 TXN 39 | REAC:442380 Zinc influx into cells by the SLC39 gene family SLC39A7 SLC39A10 SLC39A14 SLC39A4 SLC39A1 SLC39A8 SLC39A6 SLC39A5 SLC39A3 SLC39A2 40 | REAC:418889 Ligand-independent caspase activation via DCC DCC APPL1 CASP9 DAPK2 MAGED1 UNC5B UNC5A CASP3 DAPK3 DAPK1 41 | REAC:975110 TRAF6 mediated IRF7 activation in TLR7/8 or 9 signaling IRF7 TLR9 TRAF6 UBE2N IRAK1 TLR8 DHX36 MYD88 UBE2V1 TLR7 IRAK4 42 | REAC:2485179 Activation of the phototransduction cascade GNAT1 CNGA1 SLC24A1 RHO PDE6B PDE6A CNGB1 SAG PDE6G GNB1 GNGT1 43 | REAC:168799 Neurotoxicity of clostridium toxins STX1B SV2A SNAP25 SV2B STX1A VAMP1 VAMP2 SYT1 SV2C SYT2 44 | REAC:196819 Vitamin B1 (thiamin) metabolism SLC19A2 TPK1 THTPA SLC25A19 SLC19A3 45 | REAC:196780 Biotin transport and metabolism MCCC1 ACACB ACACA HLCS BTD PC PDZD11 PCCB MCCC2 PCCA SLC5A6 46 | REAC:5666185 RHO GTPases Activate Rhotekin and Rhophilins ROPN1 LIN7B RHOC RHOA RTKN RHPN2 RHPN1 RHOB TAX1BP3 47 | REAC:110329 Cleavage of the damaged pyrimidine SMUG1 TDG UNG MBD4 NTHL1 NEIL1 OGG1 NEIL2 48 | REAC:1236977 Endosomal/Vacuolar pathway LNPEP HLA-H HLA-G CTSS CTSL HLA-F CTSV HLA-B HLA-A HLA-C B2M HLA-E 49 | REAC:1663150 The 
activation of arylsulfatases SUMF2 ARSE ARSI ARSK ARSG ARSF SUMF1 STS ARSA ARSH ARSJ ARSB ARSD 50 | REAC:2562578 TRIF-mediated programmed cell death TICAM1 CD14 TICAM2 TLR3 LY96 CASP8 FADD RIPK3 TLR4 RIPK1 51 | REAC:3000484 Scavenging by Class F Receptors HSPH1 HYOU1 SCARF1 APOB HSP90AA1 CALR 52 | REAC:2028269 Signaling by Hippo STK4 AMOTL2 STK3 YAP1 LATS2 TJP1 WWTR1 AMOT DVL2 LATS1 YWHAE CASP3 MOB1B NPHP4 MOB1A WWC1 SAV1 TJP2 YWHAB AMOTL1 53 | REAC:190861 Gap junction assembly GJA1 GJB1 GJB7 GJD4 GJD3 GJC1 GJB5 GJA10 GJD2 GJA8 GJB6 GJB3 GJC2 GJA4 GJA5 GJA9 GJB4 GJA3 GJB2 54 | REAC:74713 IRS activation INS-IGF2 INS IRS2 GRB10 IRS1 INSR 55 | REAC:391906 Leukotriene receptors GPR17 CYSLTR2 LTB4R2 LTB4R CYSLTR1 56 | REAC:6803207 TP53 Regulates Transcription of Caspase Activators and Caspases TP53 PIDD1 APAF1 CASP2 TP63 CASP1 TP73 NLRC4 CRADD ATM CASP10 CASP6 57 | REAC:3595174 Defective CHST14 causes EDS, musculocontractural type NCAN BGN CSPG4 VCAN DCN CSPG5 BCAN 58 | REAC:193670 p75NTR negatively regulates cell cycle via SC1 NGFR NGF HDAC2 PRDM4 HDAC3 HDAC1 59 | REAC:193144 Estrogen biosynthesis CYP19A1 HSD17B11 HSD17B1 HSD17B14 AKR1B15 HSD17B2 60 | REAC:1433559 Regulation of KIT signaling SOS1 GRB2 SOCS1 LCK KIT FYN SH2B3 SRC YES1 KITLG PRKCA PTPN6 CBL LYN SOCS6 SH2B2 61 | REAC:209560 NF-kB is activated and signals survival TRAF6 UBB SQSTM1 RPS27A NFKBIA NFKB1 NGFR UBC IKBKB RELA UBA52 NGF IRAK1 62 | REAC:4839744 truncated APC mutants destabilize the destruction complex CSNK1A1 PPP2R5C PPP2R5B AXIN1 PPP2R5E PPP2R5A PPP2R1B APC PPP2CA GSK3B PPP2R5D AMER1 PPP2R1A PPP2CB 63 | REAC:1839117 Signaling by cytosolic FGFR1 fusion mutants GAB2 CPSF6 GRB2 STAT5B PIK3CA FGFR1OP2 MYO18A BCR ZMYM2 CNTRL STAT1 PIK3R1 STAT5A FGFR1OP STAT3 TRIM24 LRRFIP1 CUX1 64 | REAC:622323 Presynaptic nicotinic acetylcholine receptors CHRNA3 CHRNA6 CHRNA1 CHRNB3 CHRNG CHRNA2 CHRNB2 CHRNA5 CHRNA4 CHRNE CHRNB4 CHRND 65 | REAC:5368598 Negative regulation of TCF-dependent signaling by DVL-interacting 
proteins DVL2 CXXC4 DVL1 CCDC88C DVL3 66 | REAC:5656121 Translesion synthesis by POLI RPA1 POLI RFC5 RPA3 REV1 RPA2 RFC3 RFC2 UBC UBA52 UBB RPS27A MAD2L2 PCNA RFC1 REV3L RFC4 67 | REAC:444209 Free fatty acid receptors FFAR3 FFAR2 GPR31 FFAR4 FFAR1 68 | REAC:159424 Conjugation of carboxylic acids GLYATL2 ACSM5 ACSM1 GLYAT GLYATL3 ACSM2B ACSM2A ACSM4 GLYATL1 69 | REAC:2408517 SeMet incorporation into proteins AIMP1 QARS RARS EPRS MARS KARS DARS AIMP2 IARS EEF1E1 LARS 70 | REAC:5603027 IKBKG deficiency causes anhidrotic ectodermal dysplasia with immunodeficiency (EDA-ID) (via TLR) NFKB1 NFKBIA CHUK RELA NFKB2 NFKBIB IKBKG IKBKB 71 | REAC:112411 MAPK1 (ERK2) activation MAPK1 JAK2 MAP2K2 PTPN11 TYK2 IL6R JAK1 IL6ST IL6 72 | REAC:193048 Androgen biosynthesis SRD5A2 HSD17B3 HSD3B1 CGA HSD17B12 LHB SRD5A3 SRD5A1 POMC CYP17A1 HSD3B2 73 | REAC:879518 Transport of organic anions SLCO2B1 SLCO1B3 SLCO1A2 SLCO1C1 SLCO1B1 AVP SLCO3A1 SLCO2A1 SLCO4C1 SLC16A2 SLCO4A1 ALB 74 | REAC:203641 NOSTRIN mediated eNOS trafficking WASL CAV1 NOS3 NOSTRIN DNM2 75 | REAC:2660826 Constitutive Signaling by NOTCH1 t(7;9)(NOTCH1:M1580_K2555) Translocation Mutant JAG2 NOTCH1 DLL4 DLL1 ADAM17 JAG1 ADAM10 76 | REAC:209822 Glycoprotein hormones FSHB CGB5 INHBC LHB INHBA CGB3 TSHB CGA CGB8 INHA INHBE INHBB 77 | REAC:189451 Heme biosynthesis FECH COX10 UROD ALAS1 PPOX HMBS COX15 ALAS2 UROS CPOX ALAD 78 | REAC:5637815 Signaling by Ligand-Responsive EGFR Variants in Cancer PIK3CA UBB GRB2 SOS1 GAB1 NRAS RPS27A HSP90AA1 HRAS KRAS CDC37 EGFR SHC1 PIK3R1 EGF UBC PLCG1 CBL UBA52 79 | REAC:75205 Dissolution of Fibrin Clot PLG SERPINB8 PLAUR PLAT SERPINB2 SERPINF2 SERPINB6 S100A10 HRG SERPINE2 PLAU ANXA2 SERPINE1 80 | REAC:111471 Apoptotic factor-mediated response CASP7 CYCS CASP3 APAF1 CASP9 XIAP DIABLO 81 | REAC:2408508 Metabolism of ingested SeMet, Sec, MeSec into H2Se AHCY CBS GNMT MAT1A NNMT HNMT SCLY CTH 82 | REAC:200425 Import of palmitoyl-CoA into the mitochondrial matrix PRKAG2 PRKAB2 CPT1B PPARD THRSP 
CPT2 MID1IP1 PRKAA2 RXRA CPT1A SLC22A5 SLC25A20 83 | REAC:176974 Unwinding of DNA MCM3 MCM6 CDC45 MCM5 GINS3 MCM2 MCM4 MCM8 GINS2 GINS1 GINS4 MCM7 84 | REAC:5684264 MAP3K8 (TPL2)-dependent MAPK1/3 activation UBB CUL1 MAP3K8 FBXW11 MAP2K4 MAP2K1 RPS27A IKBKG CHUK SKP1 TNIP2 BTRC NFKB1 IKBKB UBC UBA52 85 | REAC:69416 Dimerization of procaspase-8 RIPK1 FADD FAS CASP8 FASLG TRAF2 CFLAR TNFRSF10B TNFRSF10A TNFSF10 TRADD 86 | REAC:174495 Synthesis And Processing Of GAG, GAGPOL Polyproteins RPS27A UBAP1 VPS37B VPS37D UBB VPS37C MVB12B MVB12A NMT2 UBA52 TSG101 VPS28 UBC VPS37A 87 | REAC:2022377 Metabolism of Angiotensinogen to Angiotensins C9ORF3 GZMH MME CPB1 ACE2 CTSZ ACE CPB2 ENPEP AGT REN CTSD ATP6AP2 CTSG ANPEP CPA3 CMA1 88 | REAC:561048 Organic anion transport SLC22A8 SLC22A7 SLC22A11 SLC22A6 SLC22A12 89 | REAC:189085 Digestion of dietary carbohydrate AMY2A AMY2B MGAM AMY1A LCT AMY1B AMY1C TREH SI 90 | REAC:4793953 Defective B4GALT1 causes B4GALT1-CDG (CDG-2d) LUM KERA PRELP OMD FMOD OGN ACAN 91 | REAC:8857538 PTK6 promotes HIF1A stabilization HIF1A PTK6 HBEGF LINC01139 GPNMB LRRK2 EGFR 92 | REAC:111931 PKA-mediated phosphorylation of CREB ADCY7 ADCY2 ADCY9 PRKAR1A PRKAR2B ADCY5 ADCY4 ADCY1 PRKACA ADCY6 PRKACG PRKAR1B PRKACB PRKAR2A CREB1 ADCY8 ADCY3 93 | REAC:2197563 NOTCH2 intracellular domain regulates transcription FCER2 HES1 CREB1 EP300 RBPJ MAML1 GZMB MAML3 HES5 NOTCH2 MAMLD1 MAML2 94 | REAC:111957 Cam-PDE 1 activation CALM3 CALM1 PDE1C CALM2 PDE1A PDE1B 95 | REAC:5676934 Protein repair MSRB2 MSRA PCMT1 MSRB1 MSRB3 TXN 96 | REAC:4420332 Defective B3GALT6 causes EDSP2 and SEMDJL1 SDC2 BCAN VCAN CSPG4 GPC3 CSPG5 GPC5 SDC1 SDC3 BGN AGRN HSPG2 SDC4 GPC1 NCAN GPC6 GPC2 DCN GPC4 97 | REAC:428540 Activation of Rac RAC1 PAK3 BUB1B-PAK6 SOS2 ROBO1 PAK6 PAK1 NCK2 RNASE1 PAK2 GPC1 PAK5 SOS1 NCK1 PAK4 SLIT2 98 | REAC:549132 Organic cation/anion/zwitterion transport SLC22A1 RSC1A1 SLC22A15 SLC22A18 SLC22A5 SLC22A12 SLC22A6 SLC22A4 SLC22A11 RUNX1 SLC22A7 SLC22A2 SLC22A16 
SLC22A3 SLC22A8 99 | REAC:1839130 Signaling by activated point mutants of FGFR3 FGF4 FGF2 FGF18 FGF8 FGFR3 FGF16 FGF23 FGF9 FGF20 FGF5 FGF17 FGF1 100 | REAC:5688849 Defective CSF2RB causes pulmonary surfactant metabolism dysfunction 5 (SMDP5) SFTPA2 SFTPB CSF2RB CSF2RA SFTA3 SFTPC SFTPD SFTPA1 101 | -------------------------------------------------------------------------------- /tests/testthat/test_columnContribution.r: -------------------------------------------------------------------------------- 1 | context('columnContribution function') 2 | 3 | 4 | test_that('Column Contribution ratio is correct', { 5 | col <- colnames(dat)[1] 6 | 7 | res <- ActivePathways(dat, gmt, significant = 1) 8 | res_just_column <- ActivePathways(dat[, 1, drop = FALSE], gmt, significant = 1) 9 | expect_equal(res_just_column$overlap, res[[paste0("Genes_", col)]]) 10 | }) 11 | -------------------------------------------------------------------------------- /tests/testthat/test_columnSignificance.r: -------------------------------------------------------------------------------- 1 | context('columnSignificance Function') 2 | 3 | 4 | test_that('columnSignificance agrees with testing individual columns', { 5 | col <- colnames(dat)[1] 6 | 7 | res1 <- columnSignificance(dat, gmt, background, 0.1, 0.05, 'holm', rep(0.05, length(gmt))) 8 | res2 <- ActivePathways(dat[, col, drop = FALSE], gmt, 9 | correction_method = 'holm', geneset_filter = NULL) 10 | 11 | # Pathways that are significant for the tested column according to columnSignificance 12 | comp1 <- res1$term_id[ sapply(res1[[paste0("Genes_", col)]], function(x) !all(is.na(x))) ] 13 | 14 | expect_true(setequal(comp1, res2$term_id)) 15 | }) 16 | 17 | test_that('Column names of columnSignificance result are correct', { 18 | res <- columnSignificance(dat, gmt, background, 0.1, 0.05, 'holm', rep(0.05, length(gmt))) 19 | expect_equal(colnames(res), c('term_id', 'evidence', paste0("Genes_", colnames(dat)))) 20 | }) 21 | 
-------------------------------------------------------------------------------- /tests/testthat/test_cytoscape.r: -------------------------------------------------------------------------------- 1 | context("Validation of Cytoscape Files and Test that the files are written") 2 | 3 | 4 | test_that("cytoscape_file_tag specified", { 5 | 6 | expect_error(ActivePathways(dat, gmt, cytoscape_file_tag = CStag, significant = 1), NA) 7 | expect_message(ActivePathways(dat[,1, drop = FALSE], gmt, cytoscape_file_tag = CStag, significant = 1), 8 | "scores matrix contains only one column. Column contributions will not be calculated", 9 | fixed = TRUE) 10 | }) 11 | 12 | 13 | test_that("Cytoscape files are written", { 14 | 15 | CS_fnames <- paste0(CStag, c("pathways.txt", "subgroups.txt", "pathways.gmt", "legend.pdf")) 16 | 17 | suppressWarnings(file.remove(CS_fnames)) 18 | ActivePathways(dat, gmt, cytoscape_file_tag = CStag, significant = 0.9, cutoff = 1) 19 | expect_equal(file.exists(CS_fnames), c(TRUE, TRUE, TRUE, TRUE)) 20 | 21 | suppressWarnings(file.remove(CS_fnames)) 22 | ActivePathways(dat[,1, drop = FALSE], gmt, cytoscape_file_tag = CStag, significant = 0.9, cutoff = 1) 23 | expect_equal(file.exists(CS_fnames), c(TRUE, FALSE, TRUE, FALSE)) 24 | 25 | suppressWarnings(file.remove(CS_fnames)) 26 | suppressWarnings(ActivePathways(dat, gmt, cytoscape_file_tag = CStag, significant = 0)) 27 | expect_equal(file.exists(CS_fnames), c(FALSE, FALSE, FALSE, FALSE)) 28 | 29 | suppressWarnings(file.remove(CS_fnames)) 30 | suppressWarnings(ActivePathways(dat, gmt, cytoscape_file_tag = NA)) 31 | expect_equal(file.exists(CS_fnames), c(FALSE, FALSE, FALSE, FALSE)) 32 | 33 | suppressWarnings(file.remove(CS_fnames)) 34 | }) 35 | -------------------------------------------------------------------------------- /tests/testthat/test_data.txt: -------------------------------------------------------------------------------- 1 | Gene cds promoter enhancer 2 | YWHAB 0.181759849432863 0.573086408819819 
0.0801841709343925 3 | PIK3R1 0.602307993749334 0.207969196040606 0.663872931721775 4 | LYN 0.470537676163286 0.674624033841617 0.267676513421447 5 | CBL 0.550952588371561 0.116561937843907 0.209648324944454 6 | PIK3CA 0.0540076617030512 0.0290456574381137 0.382803198202891 7 | FYN NaN 0.673002840468483 0.0162136835995857 8 | GRB2 0.286803788768734 0.134829611925831 0.0631235526924534 9 | CRK 0.719999151213222 0.0655267271138409 0.185142055481746 10 | YES1 0.578556194515032 0.124656734114965 0.228050611170295 11 | FFAR1 0.00303599268752283 0.0636714240840086 0.0492807721449381 12 | SRD5A3 0.288074096116829 0.0358067339897629 0.123682154635008 13 | TGFB1 0.0612956812817711 0.0610529986408099 0.323547128645997 14 | SMAD2 0.568142036584863 0.0947513326601436 NaN 15 | SMAD3 0.176208589757222 0.203036016818637 0.621776050218014 16 | TGFBR2 0.215742489942542 0.0625617834021545 0.323493076750721 17 | ZFYVE9 0.0224740590393185 0.388509516563473 0.403654829321766 18 | TGFBR1 0.538995651513955 0.00104396926512306 0.00660456963077945 19 | FASLG 0.826268064502935 0.570385053029772 0.614228033751506 20 | FAS 0.129042188048758 0.20908063560038 0.00372549509946649 21 | TNFRSF10B 0.0787923912677442 0.0574973243105529 0.386117968844315 22 | TRAF2 0.361874014385668 0.22442794052215 0.219061256254318 23 | RIPK3 0.136275282672798 0.0490289357618559 0.139143487766408 24 | FADD 0.140990912559879 0.328272367301317 0.0514925745297319 25 | CASP8 0.491529744931601 0.300376305357171 0.108679339959835 26 | RIPK1 0.342228607177507 0.168452777932317 0.294468711326761 27 | TNFRSF10A 0.462360632575749 0.0348048284650228 0.337221848126996 28 | XIAP 0.280093566017082 0.231719806343191 0.215701169481379 29 | TRADD 0.163648477250899 0.0241196800137681 0.223511220917899 30 | TNFSF10 0.295085663460857 0.0143604971451271 0.619212329533342 31 | CFLAR 0.509505723632392 0.0577364266381152 0.0473195089397215 32 | SKP1 0.440620422727399 0.653130867621918 0.0107270688797775 33 | CUL1 0.69308480218818 
0.00752388811354779 0.0031797652177559 34 | rty 0.187285536302794 0.397634913189841 0.0120889812105367 35 | DCN 0.172936536608252 0.00167639656635933 0.0340881081725078 36 | HEXA 0.33022116641044 0.931526053768471 0.27134635449122 37 | NCAN 0.18238263117686 0.299748397560577 0.474814417973435 38 | ARSB 0.23111211631101 0.34311876315205 0.533967882602817 39 | zxc 0.284584475728181 0.351512281944281 0.592723365828653 40 | HEXB 0.648283156662765 0.239973120916298 0.086056202465542 41 | CSPG5 0.659346099292787 0.0487821028774078 0.358557947073338 42 | VCAN 0.0686878027536848 0.491104585717545 0.0293370810717161 43 | fgh 0.0509080834579079 0.187641833440054 0.901662542354382 44 | BCAN 0.787631766257433 0.225030615489533 0.0374954394106719 45 | qwe 0.326165001483189 0.227565735120978 0.236017704377374 46 | PPP2R5C 0.0256205616326249 0.237770242497341 0.703038785186687 47 | PPP2R5B 0.668138052886544 0.133410161184475 0.16182500769308 48 | CSNK1A1 0.095428723884147 0.496964753033534 0.399608162580152 49 | asd 0.00682200165226474 0.081460833287022 0.0066516898093222 50 | PPP2R5E 0.218919950042612 0.570025483190649 NaN 51 | PPP2R5A 0.111667365168111 0.0697380171623881 0.016823949197242 52 | PPP2R5D 0.52537266912045 0.063956104952396 0.0804431493305053 53 | AMER1 0.0635450420358293 0.152257093354487 0.245071437603002 54 | GSK3B 0.508044461840615 0.255655872271908 0.593995143698125 55 | PPP2CA 0.00626565432006055 0.124474012488303 0.240561275336224 56 | APC 0.362197080075817 0.411662319752638 0.132461065475608 57 | PPP2CB 0.205843495067803 0.635863265436401 0.378447105687548 58 | PPP2R1A 0.0985574849943051 0.588028264171712 0.255382889202054 59 | MUC2 0.405105057017385 0.154423601969691 0.387027882147937 60 | MUC5B 0.174949507957768 0.535187173560105 0.00204348738970449 61 | MUC19 0.106674213966976 0.674560255787201 0.278825827268245 62 | MUC3A 0.0117273200509688 0.160412973337993 0.295858270553739 63 | MUC12 0.0495774824192386 0.786175224347218 0.419916319436669 64 | MUC7 
0.636459665502816 0.580504673105255 NaN 65 | MUC6 0.395838210813141 0.0481391996153076 0.108968288063993 66 | MUC5AC 0.279811607955896 NaN 0.197066465382849 67 | MUC21 0.175460048777475 0.274879467257154 0.150139895113747 68 | MUC15 0.333821054134149 0.228122891010576 0.0111420352647364 69 | MUCL1 0.379818836725266 0.220906795424679 0.535082329455193 70 | MUC4 0.904550466894486 0.0527523599134175 0.237503425736444 71 | MUC20 0.179841497617397 0.096855549952906 0.0337274434852643 72 | MUC1 0.886684840037558 0.149391315008235 0.696308352577933 73 | MUC13 0.711293035984503 0.5343909634298 0.0179609866278288 74 | MUC16 0.423176647603287 0.132447232115612 0.0677308970025233 75 | MUC17 0.500910607124346 0.173990975121157 0.0248499519296264 76 | NOTCH2 0.266290030147768 0.753648701095022 0.0762196409571534 77 | ALOX15 0.118226541780289 0.155017854300937 0.361492803953278 78 | GPX4 0.00843781888462744 0.336992751483621 0.270437507231239 79 | GPX2 0.741130866151387 0.564265427419989 0.251102107978515 80 | GPX1 0.0030950654526386 0.108904545446975 0.10684374021452 81 | CASP10 0.0554205690135064 0.102944850372978 0.0826767820182217 82 | ACAN 0.784223631802517 0.0841097031577299 0.10120852124552 83 | FMOD 0.743174281732335 0.129713597785276 0.326095776862619 84 | OGN 0.0581407235165238 0.386195902945662 0.122358916983575 85 | KERA 0.0937110839782113 0.0107751800483757 0.371142369002521 86 | OMD 0.68830180098311 0.460768175970274 0.00994624933071638 87 | PRELP 0.0468314374082694 0.387730193019547 0.31670497294304 88 | LUM 0.155238545178946 0.450677720254535 0.300686039230981 89 | CALM3 0.294082753617955 0.0973512997693561 0.0757295548189794 90 | CALM1 0.0162164472903082 0.0146462814141986 0.0883705169615348 91 | CALM2 0.28867463584822 NaN 0.476320391605935 92 | CREB1 0.100589558043417 0.00441554136643305 0.193774879008658 93 | IRAK4 0.00187477740643946 0.385542838344349 0.317205656228724 94 | TLR4 0.113487144001835 0.0509786886990751 0.144689976899324 95 | MYD88 
0.0531301110581846 NaN 0.0970400569269218 96 | CD14 0.262592331139313 0.164184846544988 0.442698832228852 97 | LY96 0.586673957860476 0.400579860275021 0.318368637777921 98 | AVP 0.0205120962953295 0.13026984986983 0.27948032541776 99 | TP53 0.227437347116785 0.116820408172034 0.0672024581528924 100 | HMGA1 0.175266313377645 0.195205552944568 0.55227284759047 101 | BANF1 0.00862024755915763 0.404321216235234 NaN 102 | PSIP1 0.0315422441663664 0.260179410362494 0.0193001785639445 103 | RHOC 0.46282590745913 0.0800188748102712 0.38912825821739 104 | RNASE1 0.391651541447617 0.296354310690846 0.375541368175919 105 | RAC1 0.0497605816406729 0.473192699224232 0.00785975981540628 106 | RHOB 0.15536420963248 0.00831086962178004 0.59812700514789 107 | RHOA 0.173077362901154 0.0654017654847227 0.135245141152461 108 | SRC 0.212359437153009 0.0122535219756262 0.507312746796874 109 | NGFR 0.322377774987638 0.181379556715301 0.352457843008307 110 | CASP1 0.308680244342293 0.216286729560113 0.0126951770622138 111 | NLRC4 0.223874641936647 0.530559156350974 0.158444218992153 112 | TXN 0.210539932795287 0.00727720974666793 0.227056053007002 113 | CASP9 0.249826923228618 0.368385912313289 0.302337101590475 114 | CASP3 0.38916880865594 0.270582585454458 0.394530508850117 115 | TRAF6 0.00499926355836968 0.757390671934309 0.314606112914376 116 | IRAK1 0.0368293027963075 0.213197299245414 0.0213526357655416 117 | HSP90AA1 0.33071309399982 0.084418215333913 0.11277048299554 118 | DVL2 0.497118703694093 0.177351832995164 0.0176069871193889 119 | APAF1 0.00638770741570969 0.0280786242682516 0.00253383433249355 120 | NGF 0.701357809121609 0.224052417264222 0.205976732317063 121 | SOS1 0.43215780379985 NaN 0.0398484022264086 122 | UBB 0.402179697719774 0.155442919318031 0.0912067445566898 123 | RPS27A 0.763321282798381 0.105533938373264 0.278925020638349 124 | NFKBIA 0.322306480886568 0.881330034849885 0.0149157670047246 125 | NFKB1 0.750516107024684 0.524829883640447 0.186042826898887 126 
| UBC 0.0333904103535825 0.34319503837792 0.142243777205839 127 | IKBKB 0.426156664406726 0.158072265510563 0.153501809385075 128 | RELA 0.631518450016278 0.0187994230118444 0.691485574359667 129 | UBA52 0.477150317357163 0.0345409139103193 0.337136959982154 130 | CHUK 0.33664114526757 0.11167441558727 0.0591199470884607 131 | IKBKG 0.0273153924865842 0.0485566114907025 0.216172251093514 132 | CGA 0.0304592868066794 0.330087367618374 0.0704885247069825 133 | LHB 0.116528473112306 0.22983749350818 0.215878879567404 134 | EGFR 0.245749468620911 0.428237554651949 0.157023897185237 135 | SLC22A5 0.221200111634252 0.64464870188557 0.0176161706232961 136 | SLC22A8 0.452956386587425 0.0075086814505186 0.382419380566935 137 | SLC22A7 0.416167622932431 0.270523771951322 0.126673352104803 138 | SLC22A11 0.576297231810438 0.421769362201008 0.137786002942712 139 | SLC22A6 0.280550250201698 0.623302031329024 0.109465727804541 140 | SLC22A12 0.421394167036946 0.0769854744091124 0.00664221909901664 141 | GPC1 0.140697512299858 0.12154599641336 NaN 142 | -------------------------------------------------------------------------------- /tests/testthat/test_data_rna_protein.tsv: -------------------------------------------------------------------------------- 1 | gene rna_pval rna_log2fc protein_pval protein_log2fc 2 | TBX20 0.000357525 -1.105589466 NA NA 3 | TPRG1 0.000450743 1.692824405 0.015971771 0.605260822 4 | TAGLN3 0.000586889 -0.934074291 NA NA 5 | COL2A1 0.000756495 -1.11660567 0.893790743 -0.144821024 6 | TMEM220 0.000756495 -1.591869755 NA NA 7 | SLC17A2 0.001814256 -1.646219828 NA NA 8 | DAPK2 0.001944503 -1.236295967 0.045711209 -0.507885739 9 | MAMDC2 0.001944503 -1.698242123 0.076839798 -1.964908518 10 | DCLK1 0.002421783 -1.038307475 0.015971771 -1.127911513 11 | KHNYN 0.002995212 1.142517108 0.005479684 -1.017432094 12 | TMEM55B 0.002995212 0.808984306 0.978697383 0.291851953 13 | HUS1B 0.002995212 -1.005525351 NA NA 14 | SLC22A10 0.003010222 0.74624427 NA NA 15 | 
PPP1R13B 0.003684403 1.153023448 0.018784685 0.98526776 16 | CHD8 0.003684403 1.318268287 0.37602023 0.620428325 17 | SYPL1 0.00450474 1.231264698 0.05232352 1.005819252 18 | CRLS1 0.00450474 -1.292625821 NA NA 19 | HSD3B7 0.00450474 1.090055947 0.111888112 0.311957592 20 | RAB2B 0.00450474 1.171185417 0.768886795 -0.067745508 21 | RAMP1 0.00450474 -1.203869726 NA NA 22 | IZUMO1 0.00450474 0.977368481 NA NA 23 | NARF 0.005479684 -0.82458169 0.688522201 0.177777166 24 | DNAJA4 0.005479684 -1.004742924 0.086761533 -0.846859147 25 | DTWD2 0.005479684 1.072007456 1 0.012433014 26 | TMEM198 0.005479684 -0.867986706 NA NA 27 | GIP 0.005648776 -1.021586722 NA NA 28 | DGKK 0.005828361 -0.982368597 NA NA 29 | AP1G2 0.006628079 1.37972769 0.294537238 0.253267234 30 | TAF7L 0.006628079 -0.744959654 NA NA 31 | C19orf57 0.006628079 1.427460164 NA NA 32 | JSRP1 0.006628079 -1.013067608 0.122517932 -0.894223144 33 | RBM12B 0.006628079 1.1387136 0.347492645 0.525667888 34 | POLN 0.006628079 -0.547246086 NA NA 35 | ASCL1 0.007082013 -1.045419331 NA NA 36 | ARID4A 0.00797954 1.070053132 0.059675788 0.484819373 37 | RBBP4 0.00797954 -1.805544172 1 -0.224044148 38 | SYN1 0.00797954 -1.240076496 1 -0.243574681 39 | MYH15 0.00797954 -0.903393125 0.076839798 -0.512752607 40 | IPO11 0.00797954 1.088966953 0.574292637 0.361459873 41 | MEPCE 0.00797954 1.518069613 0.320321905 0.720907206 42 | RHBDF2 0.00797954 0.723731174 0.034513009 1.036065362 43 | RNF145 0.00797954 0.960588002 0.218848025 0.607235045 44 | NOP9 0.00797954 1.008246249 0.006628079 0.983550708 45 | CPNE7 0.00797954 1.154804721 NA NA 46 | TRPV5 0.008090787 0.644002696 NA NA 47 | SLC31A1 0.009556372 0.784097699 NA NA 48 | GPX7 0.009556372 -1.170961288 0.151872392 -0.939190446 49 | ING1 0.009556372 -1.095231053 0.103708514 -1.206054698 50 | PFKP 0.009556372 0.734171072 0.029822126 0.653130432 51 | SCAMP1 0.009556372 1.350597405 0.978697383 0.657603859 52 | FAF2 0.009556372 0.821466416 0.270123647 0.560733539 53 | PLCE1 
0.009556372 -1.109233066 NA NA 54 | FAR2 0.009556372 0.456814184 NA NA 55 | AP5M1 0.009556372 1.141371435 0.136614041 0.641827057 56 | MRPS11 0.009556372 -1.335737706 0.810041728 0.127962304 57 | DHRS1 0.009556372 0.864620275 0.076839798 0.786212305 58 | SLFN13 0.009556372 -1.00963914 NA NA 59 | CADM2 0.009556372 -0.749532162 NA NA 60 | ZNF425 0.009556372 1.422190487 NA NA 61 | PGBD2 0.009556372 1.181635837 NA NA 62 | CCDC177 0.009761752 0.68807853 NA NA 63 | TRIM60 0.009972042 -1.545409372 NA NA 64 | USP10 0.011393573 1.028713556 0.136614041 0.837526835 65 | GTF2IRD1 0.011393573 1.011279562 0.127428127 1.289800091 66 | TOX4 0.011393573 0.889082427 0.469593677 -0.114132794 67 | PRMT5 0.011393573 0.768809429 0.086761533 0.82702538 68 | NCOA2 0.011393573 0.774338167 0.503319039 0.550695269 69 | RUSC1 0.011393573 0.936215214 0.864616785 -0.092534527 70 | MTURN 0.011393573 -1.635825951 0.674396811 -0.118331203 71 | SMIM20 0.011393573 -0.794011156 0.025666212 -1.113577341 72 | ADPRM 0.011393573 -1.211928747 NA NA 73 | MAPK4 0.011393573 -0.870785701 NA NA 74 | WNT5B 0.011393573 -1.065140524 NA NA 75 | MAP3K14 0.011393573 -1.177319935 NA NA 76 | HBM 0.01275843 1.427961962 0.236985237 1.160652335 77 | CA7 0.013151255 -0.594299142 NA NA 78 | RALYL 0.013166999 -1.173671993 NA NA 79 | PSG3 0.013257371 0.798473654 NA NA 80 | CROCC2 0.013310573 -1.035260571 0.294537238 1.917226435 81 | AP2B1 0.013518835 -1.219757499 0.893790743 0.128839961 82 | CUX1 0.013518835 0.75074608 0.015971771 1.011067942 83 | FGD1 0.013518835 -0.807987005 NA NA 84 | GAS8 0.013518835 1.311705956 NA NA 85 | EIF4H 0.013518835 0.586479532 0.122517932 0.253250419 86 | FBP2 0.013518835 -0.822130547 0.225413534 -0.694679452 87 | SOCS2 0.013518835 -1.358713506 NA NA 88 | LRRC49 0.013518835 -0.666150035 NA NA 89 | PANK3 0.013518835 1.334098652 0.405904275 0.705984706 90 | EMC6 0.013518835 -0.750929476 1 1.347986834 91 | EPPK1 0.013518835 1.156332217 0.059675788 0.744371633 92 | TRIM41 0.013518835 1.186057465 
0.630528914 0.142162735 93 | CRY1 0.013518835 -0.977021758 NA NA 94 | FXYD1 0.013518835 -0.767150995 NA NA 95 | TTLL1 0.013518835 -0.789939528 NA NA 96 | ZYG11A 0.013518835 0.992600549 NA NA 97 | RAD51L1 0.013518835 0.922141757 NA NA 98 | FIGN 0.013518835 -0.743659684 NA NA 99 | CCDC62 0.013518835 -0.761940866 NA NA 100 | SIM1 0.014359259 -0.816299684 NA NA 101 | CALML3 0.015971771 0.974592667 0.015971771 1.16270074 102 | MTTP 0.015971771 -1.492207502 0.628571429 -1.815562057 103 | EIF5B 0.015971771 0.710031225 0.893790743 -0.145090034 104 | SCFD1 0.015971771 1.062474486 0.009556372 1.090109309 105 | HBP1 0.015971771 1.120776017 NA NA 106 | TMOD2 0.015971771 -0.866905749 0.109548295 -0.620059409 107 | DBR1 0.015971771 0.972127105 0.247091129 0.287829105 108 | TBC1D7 0.015971771 -1.733317374 0.955089355 -0.168800252 109 | FUNDC2 0.015971771 -0.924000602 0.018784685 -0.877012634 110 | EFCAB1 0.015971771 -1.329640051 NA NA 111 | REEP4 0.015971771 0.37786569 0.649510605 0.183245443 112 | COL21A1 0.015971771 -0.612811973 0.572760573 -0.865929997 113 | RNF170 0.015971771 1.286652391 0.766432484 0.037020852 114 | ATG10 0.015971771 1.135243111 NA NA 115 | CORO6 0.015971771 -1.168547316 0.437101706 -0.820529902 116 | SLC5A12 0.015971771 0.710350342 NA NA 117 | RALGAPA1 0.015971771 0.30796241 0.002995212 1.138278243 118 | SRPK3 0.015971771 -1.138555211 NA NA 119 | DEFA4 0.01616114 1.048160071 0.039795012 0.952835518 120 | CCK 0.01793474 -0.722583633 NA NA 121 | F12 0.018784685 0.861768435 0.611412419 -0.46230423 122 | FNTA 0.018784685 0.72152704 0.20508663 0.312886485 123 | HIST1H1A 0.018784685 -0.82293859 0.025666212 -0.659115125 124 | IFI27 0.018784685 0.793442768 NA NA 125 | CD99 0.018784685 -0.862031929 0.005479684 -0.96600878 126 | XPNPEP2 0.018784685 -0.907463709 1 -0.031775315 127 | CLINT1 0.018784685 0.311587342 0.018784685 0.712626692 128 | CTCF 0.018784685 1.132315778 0.469593677 -0.484562394 129 | EML2 0.018784685 0.822731442 0.039795012 0.843028384 130 | LRP10 
0.018784685 1.312518564 0.00797954 1.271599045 131 | SUSD5 0.018784685 -1.446855602 NA NA 132 | MAT2B 0.018784685 0.900001515 0.045711209 0.63366498 133 | SERPINA10 0.018784685 -1.133937437 0.405904275 -0.229143393 134 | ARMCX1 0.018784685 -0.869225137 0.018794774 -1.14734795 135 | CDKAL1 0.018784685 -1.679050116 0.247091129 -0.616640144 136 | PARP16 0.018784685 -0.467599741 0.31959707 -0.208408305 137 | C1GALT1 0.018784685 -0.775887675 1 0.182597728 138 | TRMT5 0.018784685 0.720214087 0.076839798 0.600977224 139 | C14orf93 0.018784685 1.285617223 NA NA 140 | ZNF655 0.018784685 1.721029817 0.089209552 0.459080313 141 | NLRX1 0.018784685 1.058067919 0.20508663 0.415301806 142 | EFHC2 0.018784685 -0.992127041 NA NA 143 | C1orf21 0.018784685 -0.893406586 0.623878701 -0.539503655 144 | TPGS1 0.018784685 -0.427021579 NA NA 145 | LRRN4 0.018784685 -1.020970634 NA NA 146 | TRIM7 0.018784685 0.860553697 0.036522301 1.517764677 147 | SLC39A3 0.018784685 -1.081669893 NA NA 148 | SPIRE2 0.018784685 0.66259784 NA NA 149 | FUT1 0.018784685 1.063330371 NA NA 150 | DHRS4L1 0.018784685 1.122289655 NA NA 151 | SLC13A2 0.019108754 0.548421006 NA NA 152 | TMEM82 0.020817976 -0.722850121 NA NA 153 | ADIPOQ 0.021751781 -0.822727063 0.574292637 -0.570247296 154 | BIN1 0.0220045 -1.189585728 0.109548295 -0.974888675 155 | ARHGAP5 0.0220045 0.851355772 0.076839798 0.834539815 156 | CYC1 0.0220045 1.338254833 0.574292637 0.107577951 157 | CYP24A1 0.0220045 1.151659865 0.93616176 0.095761425 158 | ESRRA 0.0220045 0.660113769 0.149964105 0.277820695 159 | HSD17B4 0.0220045 1.208505275 0.011393573 0.964397863 160 | HUS1 0.0220045 -0.887604587 0.059675788 -1.074655885 161 | LTA4H 0.0220045 0.605073618 0.029822126 0.656911544 162 | MSH3 0.0220045 0.595499727 0.37602023 0.195951859 163 | PAH 0.0220045 -1.050932983 NA NA 164 | PSME1 0.0220045 1.085163107 0.151872392 1.021176065 165 | SLC20A2 0.0220045 0.90146853 0.437101706 0.555860354 166 | TST 0.0220045 0.495652934 0.270123647 0.048579392 167 | 
VDAC3 0.0220045 0.950739895 0.810041728 0.335234937 168 | WFS1 0.0220045 0.388992952 0.009556372 0.409441223 169 | DGKD 0.0220045 -0.993987219 0.674396811 0.682444501 170 | DGAT1 0.0220045 0.848706873 0.180652681 -0.442313433 171 | VPS4B 0.0220045 0.870010447 0.015971771 0.789952594 172 | SCRN1 0.0220045 -0.995964561 0.002421783 -1.178174456 173 | NFASC 0.0220045 -0.836618651 0.571428571 -0.477647981 174 | PAMR1 0.0220045 -1.04891043 0.650549746 0.131151306 175 | TMEM98 0.0220045 -0.93615148 NA NA 176 | FBXO4 0.0220045 -1.317559593 0.009556372 -1.139629931 177 | TINF2 0.0220045 0.702787892 0.125942685 0.630886873 178 | NMD3 0.0220045 0.979023773 0.097642444 0.771109828 179 | MINDY4 0.0220045 -0.714932152 0.4 -1.332866921 180 | LCA5 0.0220045 -1.122629113 NA NA 181 | FAM45A 0.0220045 0.644317329 0.37602023 0.45236434 182 | C6orf132 0.0220045 0.83027806 0.05232352 0.722427323 183 | TMPRSS11D 0.0220045 0.72611518 0.405904275 0.300725076 184 | FAM189A2 0.0220045 -0.779077616 NA NA 185 | KCNB1 0.0220045 -1.135530768 NA NA 186 | TRIM71 0.0220045 -0.390862224 NA NA 187 | BCL2L2-PABPN1 0.0220045 1.142463011 NA NA 188 | ZKSCAN7 0.0220045 -0.76199694 NA NA 189 | KIAA1377 0.0220045 -1.108424161 NA NA 190 | HCAR3 0.0220045 0.581568191 NA NA 191 | EGF 0.025666212 -0.619128093 NA NA 192 | GOLGA2 0.025666212 0.771911066 0.538245486 0.20414804 193 | GPR39 0.025666212 -0.697353915 NA NA 194 | HES1 0.025666212 0.902489219 0.78594874 0.549922842 195 | IGF1 0.025666212 -1.054153995 0.638888889 -0.300969816 196 | PNN 0.025666212 0.854128804 0.574292637 -0.291719599 197 | RAB3B 0.025666212 -1.348240928 0.320321905 -0.635523855 198 | RABGGTA 0.025666212 0.867452733 0.097642444 0.763373148 199 | SRP54 0.025666212 0.575056607 0.20508663 0.535352723 200 | ZBTB16 0.025666212 -1.021094326 NA NA 201 | PLA2G7 0.025666212 -1.180397876 0.000980713 1.445835206 202 | CLDN2 0.025666212 -1.145554858 NA NA 203 | WDR47 0.025666212 0.725264358 0.005479684 0.99459523 204 | ICOSLG 0.025666212 -0.706157977 
NA NA 205 | SACS 0.025666212 -1.773699496 0.270123647 -1.290294442 206 | RANGRF 0.025666212 -0.712027992 0.565783927 -0.447246023 207 | UGGT1 0.025666212 -0.684258871 0.294537238 -0.451493623 208 | AKR1B10 0.025666212 0.736256055 0.018784685 0.749245676 209 | HOMEZ 0.025666212 1.241970113 0.075414781 1.013902997 210 | FAM129B 0.025666212 1.008986306 0.039795012 0.701572406 211 | DCAF11 0.025666212 0.954693826 0.347492645 0.864227715 212 | SFXN1 0.025666212 1.950901287 0.186070034 1.060301035 213 | CAVIN3 0.025666212 -1.259241046 0.136614041 -0.514935324 214 | DCBLD2 0.025666212 -1.010200191 0.458520823 -0.518327588 215 | GPAT4 0.025666212 0.634149344 0.776210828 0.19256231 216 | POTEI 0.025666212 -1.269284259 0.503319039 0.399375873 217 | PPP4R4 0.025666212 -0.722077425 0.111888112 0.446241484 218 | ASIC2 0.025666212 -0.95651339 NA NA 219 | ATRNL1 0.025666212 -0.883351194 NA NA 220 | DIRAS1 0.025666212 -0.846225689 NA NA 221 | FAM171B 0.025666212 -1.151664498 NA NA 222 | LMTK3 0.025666212 0.879223878 NA NA 223 | NRXN1 0.025666212 -1.155990484 NA NA 224 | TMEM127 0.025666212 0.776671103 NA NA 225 | YY2 0.025666212 -0.790033094 NA NA 226 | HSF4 0.025666212 0.848922808 NA NA 227 | SLC22A9 0.027571262 0.601907638 NA NA 228 | GUCA2A 0.027976658 -1.233709192 NA NA 229 | KHDRBS2 0.029545414 -0.729337219 NA NA 230 | CDK7 0.029822126 0.717408259 0.503319039 -0.003201871 231 | CEACAM8 0.029822126 0.92764653 0.151872392 0.486575438 232 | CYP17A1 0.029822126 -0.517859844 NA NA 233 | ECI1 0.029822126 0.997321563 0.045711209 0.577918021 234 | DMXL1 0.029822126 0.942172393 0.001944503 0.944059703 235 | FOXC1 0.029822126 -0.876245738 0.690511445 -0.162206933 236 | GSTZ1 0.029822126 0.839064803 0.109548295 0.55781303 237 | KCNQ1 0.029822126 1.241307002 0.437562438 -0.505915781 238 | MTRR 0.029822126 -0.772622893 0.893790743 0.00029816 239 | MYBPC1 0.029822126 -0.741461098 0.109548295 -0.709638265 240 | RP9 0.029822126 -0.53249663 0.44283318 -0.064119047 241 | STAT5B 0.029822126 
-0.838160572 0.611412419 -0.05131994 242 | THRSP 0.029822126 -1.070123136 NA NA 243 | ARHGEF7 0.029822126 -0.83108183 0.067832625 -0.699966957 244 | MTMR6 0.029822126 -0.597017158 0.20508663 -0.7680694 245 | CDC42BPB 0.029822126 0.632109389 0.247091129 0.968694876 246 | PJA2 0.029822126 1.76674202 0.029822126 3.631251948 247 | RCOR1 0.029822126 0.524906095 0.05232352 0.65783498 248 | CBLC 0.029822126 0.720283994 0.054750046 0.907683978 249 | ARGLU1 0.029822126 -1.061546865 0.768886795 -0.231311495 250 | NAXD 0.029822126 -0.799851541 0.437101706 -0.466816789 251 | PDXP 0.029822126 -1.118598413 0.122517932 -0.585684256 252 | FIGNL1 0.029822126 -0.799517155 0.574292637 0.276193984 253 | PCYOX1L 0.029822126 0.922707856 0.097642444 1.720411254 254 | DBNDD1 0.029822126 0.853383409 NA NA 255 | TCTN1 0.029822126 -0.513004375 NA NA 256 | ITIH5 0.029822126 -1.166672347 0.00797954 -1.086591356 257 | SSH2 0.029822126 -1.123319274 0.38650761 -0.806467388 258 | FOPNL 0.029822126 0.66878713 NA NA 259 | TMEM30B 0.029822126 -0.026183708 NA NA 260 | LEMD2 0.029822126 -0.814490056 0.029822126 -0.831200423 261 | CA11 0.029822126 1.012893105 0.832944833 -0.029635882 262 | HTR2A 0.029822126 -0.781494287 NA NA 263 | KCNA6 0.029822126 -0.765476197 NA NA 264 | SAMD14 0.029822126 -0.973650526 NA NA 265 | TNF 0.029822126 -0.914800779 NA NA 266 | SCN4A 0.029822126 -0.834432686 NA NA 267 | CCNB3 0.029822126 -1.024624824 NA NA 268 | EXTL1 0.029822126 -0.476942125 NA NA 269 | BARX1 0.029822126 -0.564863949 NA NA 270 | SPP2 0.029834637 1.224432494 0.690511445 0.156005773 271 | GAGE2E 0.029834637 1.583140873 0.851719324 0.287340998 272 | ACTL8 0.031049234 0.947267059 0.114285714 1.139548513 273 | PPEF2 0.031639945 0.350530754 NA NA 274 | CLRN3 0.032694975 -0.786013419 NA NA 275 | ARVCF 0.034513009 -0.28878153 0.076839798 -0.471929965 276 | ATP5B 0.034513009 1.283139251 0.768886795 -0.103179028 277 | ADGRB1 0.034513009 0.551521379 NA NA 278 | CACNB1 0.034513009 -1.132236922 0.768886795 -0.561325178 
279 | GRB7 0.034513009 0.820494179 0.059675788 0.718004563 280 | KIF3C 0.034513009 -0.516679957 0.151872392 -0.590733048 281 | MAP3K1 0.034513009 0.648220135 0.098617585 0.876780763 282 | MYO7B 0.034513009 -1.489369136 NA NA 283 | NDUFB3 0.034513009 0.942576391 0.37602023 -0.065719394 284 | PRKCA 0.034513009 -0.555980508 0.067832625 -0.983469881 285 | SCG5 0.034513009 -1.084599746 NA NA 286 | PABPN1 0.034513009 0.795814607 NA NA 287 | NEMF 0.034513009 -0.048227574 0.168348749 0.121192656 288 | CNOT8 0.034513009 1.08673033 0.574292637 0.142202779 289 | FAM13A 0.034513009 -0.675793845 NA NA 290 | PAXIP1 0.034513009 0.662050916 0.0220045 1.192916982 291 | GTPBP4 0.034513009 0.669593891 0.469593677 0.163213392 292 | MYEF2 0.034513009 -1.325128799 0.186070034 -1.831217261 293 | HDGFL3 0.034513009 -0.988133695 0.136614041 -0.784387515 294 | BORCS6 0.034513009 -0.596910571 0.978697383 0.150097775 295 | RFWD3 0.034513009 1.097444934 NA NA 296 | EAPP 0.034513009 0.33467958 0.574292637 0.069426313 297 | ECHDC1 0.034513009 -0.825820557 0.097642444 -0.557956764 298 | RBM25 0.034513009 0.700942626 0.294537238 0.578090304 299 | ENGASE 0.034513009 -1.272550498 0.37602023 -0.695045321 300 | THSD4 0.034513009 -0.680615312 0.649510605 0.08296643 301 | WDR61 0.034513009 -1.156926971 0.688522201 -0.290526632 302 | L3MBTL3 0.034513009 -0.851380686 0.035827949 -0.914739299 303 | MYOM3 0.034513009 -0.764256686 0.168348749 -0.709368019 304 | SPIN4 0.034513009 -1.080150253 NA NA 305 | ATP6V0D2 0.034513009 0.99958212 NA NA 306 | INAFM2 0.034513009 -0.717788811 NA NA 307 | NAALAD2 0.034513009 -0.82289103 NA NA 308 | ADCY2 0.034513009 -0.977049108 NA NA 309 | BRINP1 0.034513009 -0.816822634 NA NA 310 | DCUN1D2 0.034513009 -0.647168739 NA NA 311 | DHRS4L2 0.034513009 1.449805344 NA NA 312 | C20ORF26 0.034513009 -1.189640274 NA NA 313 | ADCY5 0.039795012 -0.651252415 NA NA 314 | SERPINH1 0.039795012 -1.21097985 0.405904275 -0.941129078 315 | CSNK1A1 0.039795012 0.776147709 0.122517932 
0.596504981 316 | CSNK1G3 0.039795012 0.864514937 0.93616176 0.129762887 317 | DLD 0.039795012 0.806401832 0.270123647 0.275045981 318 | EFEMP1 0.039795012 -0.832409223 0.247091129 -0.480885831 319 | GPD1 0.039795012 -0.78445361 0.109548295 -0.694408228 320 | HSPA1L 0.039795012 -0.894905096 0.649510605 0.131199462 321 | ITPR2 0.039795012 -0.692725971 0.122517932 -0.816765168 322 | MPZ 0.039795012 -2.752270396 0.039795012 -0.994593646 323 | MTHFD1 0.039795012 0.624434563 0.151872392 0.634283566 324 | MYBPC2 0.039795012 -0.871399419 0.034513009 -0.870032612 325 | NPM1 0.039795012 0.565764768 0.186070034 0.485280263 326 | PLAT 0.039795012 0.685971318 0.20508663 0.353659204 327 | POLB 0.039795012 0.818134497 0.320321905 0.316010598 328 | RAF1 0.039795012 1.111704393 0.168348749 1.454881036 329 | RDH5 0.039795012 -0.914158262 NA NA 330 | S100A9 0.039795012 0.625982826 0.045711209 0.268649605 331 | S100A12 0.039795012 0.67518107 0.025666212 0.897236487 332 | HLTF 0.039795012 0.763470346 0.045711209 0.866193423 333 | TP63 0.039795012 0.683873932 0.20508663 0.676795971 334 | CYP7B1 0.039795012 -0.664634301 0.003684403 -1.249145944 335 | SLC25A17 0.039795012 -0.948418917 0.722342673 0.090684908 336 | SLU7 0.039795012 1.077910187 0.086761533 0.552436048 337 | WASF3 0.039795012 -0.936502889 0.93616176 -0.019477191 338 | SNRNP27 0.039795012 -0.561702735 0.086761533 -0.554640111 339 | ITGB1BP2 0.039795012 -1.04415137 0.948729289 -0.077492877 340 | TMCO6 0.039795012 1.195371806 0.463403263 0.15696866 341 | MYDGF 0.039795012 -0.57973669 0.649510605 0.025193216 342 | TMX4 0.039795012 -0.773290876 0.086761533 -0.839913139 343 | WDR19 0.039795012 -0.094038305 0.109548295 -0.382523145 344 | PLEKHA2 0.039795012 0.927808852 0.067832625 0.444131475 345 | FKBP10 0.039795012 -0.805704253 0.469593677 -0.744502273 346 | KCTD14 0.039795012 -1.128630873 0.810041728 -1.257278315 347 | SAT2 0.039795012 -0.720497196 0.405904275 -0.747623943 348 | DTD2 0.039795012 1.530944369 0.076839798 
0.52092256 349 | ERP27 0.039795012 -0.661313449 NA NA 350 | CHCHD4 0.039795012 0.357638682 0.320321905 0.169422049 351 | GRPEL2 0.039795012 0.72070874 0.270123647 0.560033871 352 | NRK 0.039795012 -1.003083126 0.004096213 -2.198015889 353 | RFLNB 0.039795012 -0.893534297 0.088293857 0.286211365 354 | CCDC9B 0.039795012 -0.675971046 0.649510605 -0.659561902 355 | HACD4 0.039795012 -0.600886689 NA NA 356 | CDK11A 0.039795012 1.037339358 0.20508663 0.432442214 357 | RORC 0.039795012 -0.554863717 NA NA 358 | SH3RF2 0.039795012 0.968825306 0.072024691 0.949101285 359 | KIFC2 0.039795012 0.535560156 NA NA 360 | LRP3 0.039795012 -0.987289465 NA NA 361 | TRPM3 0.039795012 -0.799898414 NA NA 362 | ULK2 0.039795012 -2.233954345 NA NA 363 | SMIM15 0.039795012 0.547421946 NA NA 364 | MXI1 0.039795012 0.94461409 NA NA 365 | EYA1 0.039795012 -0.781897056 NA NA 366 | SEC16B 0.039795012 -0.738029433 NA NA 367 | LONRF3 0.039795012 -1.090501791 NA NA 368 | KCNH5 0.040597074 0.685617795 NA NA 369 | KLHL33 0.041220825 -0.550658021 NA NA 370 | HMGCLL1 0.04133803 -0.518453521 NA NA 371 | XAGE2 0.045295054 -1.565172561 NA NA 372 | AARS 0.045711209 0.553615181 0.122517932 0.5500708 373 | ART3 0.045711209 -1.225786998 0.315151515 -1.0834237 374 | CAPN1 0.045711209 0.646412481 0.086761533 0.631288119 375 | RCC1 0.045711209 0.77600977 0.93616176 0.204211254 376 | CYP51A1 0.045711209 0.362820339 0.93616176 0.068043127 377 | PHC2 0.045711209 -1.630758311 0.225413534 -0.1250244 378 | GM2A 0.045711209 0.909252838 0.978697383 0.239526145 379 | HNRNPAB 0.045711209 0.656769193 0.37602023 -0.169481376 380 | ITGAE 0.045711209 -0.537837254 0.810041728 0.308307694 381 | KRT13 0.045711209 0.750261117 0.018784685 1.039614316 382 | PITX1 0.045711209 0.956409679 0.015971771 1.589687565 383 | PPP3CA 0.045711209 -0.74625322 0.045711209 -0.630252974 384 | PSMC2 0.045711209 1.31747828 0.122517932 0.856237559 385 | PTX3 0.045711209 -0.910625409 0.810041728 0.288983993 386 | RBP2 0.045711209 0.731318626 NA NA 
387 | S100A8 0.045711209 0.530925996 0.006628079 0.57114929 388 | SLC7A2 0.045711209 -0.739340279 NA NA 389 | SPTBN1 0.045711209 -0.889868051 0.247091129 -0.846156642 390 | STYX 0.045711209 0.734460792 0.125541126 0.089115416 391 | TCF7 0.045711209 -0.760099593 1 -0.339344957 392 | UBE2G2 0.045711209 -0.632388776 0.649510605 0.021765304 393 | ZNF136 0.045711209 -0.719672989 NA NA 394 | SLC43A1 0.045711209 -1.869062941 NA NA 395 | ENC1 0.045711209 -1.178077807 NA NA 396 | BTRC 0.045711209 0.16220394 NA NA 397 | USP13 0.045711209 -0.601228072 0.05232352 -0.660709659 398 | IRF9 0.045711209 1.043068914 0.039795012 1.148263075 399 | TUBGCP3 0.045711209 -0.429019743 0.034513009 -0.808442861 400 | WDHD1 0.045711209 0.611711185 0.086761533 0.637415529 401 | RHOBTB3 0.045711209 -1.37356497 NA NA 402 | TBC1D9B 0.045711209 0.835208726 0.347492645 0.185417989 403 | N4BP3 0.045711209 1.158699803 0.538245486 -0.096491108 404 | SDF2L1 0.045711209 -0.850359915 0.574292637 0.084765797 405 | CCDC9 0.045711209 0.66807693 0.247091129 0.630753022 406 | GOLGA7 0.045711209 0.588777365 0.136614041 0.426756319 407 | FAM8A1 0.045711209 -0.534656325 0.197808115 -0.836992917 408 | SH3TC1 0.045711209 -0.744389602 0.768886795 -0.246788132 409 | RBM28 0.045711209 0.724368622 0.893790743 0.274292152 410 | RSAD1 0.045711209 -0.771828283 0.60952381 0.412231478 411 | MIS18BP1 0.045711209 -0.747909753 0.335664336 -0.155844504 412 | CENPN 0.045711209 0.673167011 0.778865579 0.285349943 413 | CTTNBP2NL 0.045711209 0.626355776 0.076839798 0.617989143 414 | CLTRN 0.045711209 -0.851537416 NA NA 415 | SRR 0.045711209 -0.691950029 0.059234883 -0.834253563 416 | ZNF106 0.045711209 -0.903889697 0.846998719 0.101090761 417 | RUFY1 0.045711209 0.932995631 0.005479684 0.759156984 418 | CCDC115 0.045711209 1.116927725 0.039795012 0.779625771 419 | SCRN2 0.045711209 -0.73511872 0.186070034 -0.602675344 420 | ANKRD40 0.045711209 -1.153013624 0.070238585 -0.669341506 421 | CXorf40A 0.045711209 -0.675008159 
0.755050505 -0.543254465 422 | RMI2 0.045711209 0.682104735 0.688911089 -0.407829423 423 | ZNF526 0.045711209 0.807707675 1 -0.073834295 424 | UBLCP1 0.045711209 0.719950306 0.097642444 0.30488639 425 | MOSPD2 0.045711209 -0.553994524 0.688522201 -0.137724547 426 | PLCXD3 0.045711209 -0.595259729 NA NA 427 | MAST4 0.045711209 1.089172734 0.076839798 0.654830622 428 | PHLDB3 0.045711209 0.896317211 0.029822126 0.909743033 429 | ARMCX4 0.045711209 -0.75530203 NA NA 430 | CELSR3 0.045711209 0.539737052 NA NA 431 | OMG 0.045711209 -0.695696171 NA NA 432 | GRIN2C 0.045711209 -0.835381614 NA NA 433 | FNTB 0.045711209 0.540012678 NA NA 434 | CCZ1B 0.045711209 -0.777086647 NA NA 435 | KCNH7 0.045711209 -3.244430944 NA NA 436 | B4GALT6 0.045711209 -1.263078613 NA NA 437 | FBXO24 0.045711209 0.746589638 NA NA 438 | LRRC43 0.045711209 -0.765784453 NA NA 439 | NXPE1 0.045711209 -0.686872578 NA NA 440 | UGT2B4 0.045965915 -0.901677493 NA NA 441 | HS3ST4 0.046682742 -0.698299805 0.177777778 1.10616322 442 | LRRC14B 0.047064403 -0.460242353 NA NA 443 | CPA2 0.049771984 -0.788299252 NA NA 444 | GABBR2 0.050169199 -0.809696673 NA NA 445 | SIX3 0.050928434 -0.506402257 NA NA 446 | CACNA1S 0.05232352 -0.672505139 0.768886795 -0.060026461 447 | CAPN6 0.05232352 -0.893177722 1 -0.318557076 448 | CAPZA1 0.05232352 0.255055831 0.086761533 0.479885627 449 | EPHA1 0.05232352 0.607019081 0.768886795 0.124069886 450 | ESRRG 0.05232352 -0.90337727 NA NA 451 | FKBP3 0.05232352 -0.869191896 0.136614041 -1.00777756 452 | BLOC1S1 0.05232352 -0.563624694 0.405904275 -0.225353741 453 | GTF2A1 0.05232352 0.553110076 0.978697383 -0.089194076 454 | HPN 0.05232352 -0.948812655 NA NA 455 | IDH1 0.05232352 1.221109848 0.097642444 1.061969094 456 | LSAMP 0.05232352 -0.782624715 0.005620386 -0.899563797 457 | NARS 0.05232352 0.472389924 0.097642444 0.367744916 458 | NPR1 0.05232352 -0.613934096 0.133333333 1.227104972 459 | SERPINA5 0.05232352 -1.031962163 0.151872392 -1.115540894 460 | PROX1 0.05232352 
-0.154908799 0.202020202 0.129473866 461 | PSMC6 0.05232352 0.692472155 0.039795012 0.831791831 462 | SCO1 0.05232352 -1.015603881 0.018784685 -0.958615937 463 | SOD3 0.05232352 -0.844217815 0.109548295 -0.571521752 464 | TFAP2C 0.05232352 0.436639236 0.347492645 0.213036183 465 | TPD52 0.05232352 0.362377927 0.168348749 0.158733081 466 | UCP1 0.05232352 -0.568136764 NA NA 467 | CTNNAL1 0.05232352 -0.710408163 0.086761533 -0.398881335 468 | MYOM2 0.05232352 -1.117877346 0.05232352 -0.807723081 469 | STOML1 0.05232352 -0.90166518 0.380952381 -0.971166133 470 | VPS9D1 0.05232352 1.226804618 0.016161616 1.446408809 471 | ZSCAN12 0.05232352 -0.557567714 NA NA 472 | KPTN 0.05232352 0.767002241 0.202967311 0.583699445 473 | DDX42 0.05232352 -0.909405413 0.347492645 -0.271502699 474 | HABP4 0.05232352 -1.453223776 0.574292637 -0.987195417 475 | RFTN1 0.05232352 -0.750816346 0.015971771 -0.994821355 476 | FBXO46 0.05232352 0.855604981 NA NA 477 | PYGO1 0.05232352 -0.607265547 NA NA 478 | ARHGEF16 0.05232352 0.865790882 0.186070034 0.869048136 479 | PACSIN1 0.05232352 -1.095115699 0.097642444 -0.924780627 480 | DCTN4 0.05232352 0.882414074 0.649510605 0.571080565 481 | PIGG 0.05232352 -0.133347609 0.247091129 -0.64990835 482 | CHTF8 0.05232352 0.942639755 0.775624376 0.138198302 483 | WDR70 0.05232352 -0.879132826 0.122517932 -1.310253867 484 | RNLS 0.05232352 -0.582210045 0.343434343 0.304644102 485 | ISY1 0.05232352 0.481851941 0.688522201 -0.098566832 486 | RELCH 0.05232352 0.630986424 0.437101706 0.618175604 487 | MRPS9 0.05232352 0.904047243 0.538245486 0.322843993 488 | SOWAHC 0.05232352 0.861338825 0.018784685 1.273956604 489 | TNIP2 0.05232352 -0.518296309 0.611412419 -0.28349582 490 | PAAF1 0.05232352 -0.946705377 0.503319039 -0.337209398 491 | HYI 0.05232352 -0.372390134 0.225413534 -0.52736661 492 | ARMC10 0.05232352 1.651834221 0.574292637 0.856225525 493 | SLC9A7 0.05232352 -0.740705146 0.051218939 -0.60583178 494 | MYO18B 0.05232352 -0.857535041 0.538245486 
-0.405578441 495 | UNK 0.05232352 -0.542720909 0.347492645 -0.209853396 496 | PXYLP1 0.05232352 -0.775144494 NA NA 497 | PPP1R14A 0.05232352 -0.581823181 0.136614041 -0.527614232 498 | VSTM2L 0.05232352 -0.915128205 NA NA 499 | C22orf39 0.05232352 -0.487653761 NA NA 500 | MTPN 0.05232352 0.533682717 0.270123647 0.485724883 501 | -------------------------------------------------------------------------------- /tests/testthat/test_enrichmentAnalysis.r: -------------------------------------------------------------------------------- 1 | context("Test the enrichmentAnalysis function") 2 | 3 | 4 | test_that('Overlap Found by enrichmentAnalysis is correct', { 5 | res <- enrichmentAnalysis(ea_genelist, ea_gmt, ea_background) 6 | res2 <- enrichmentAnalysis(ea_genelist2, ea_gmt, ea_background2) 7 | 8 | expect_true(setequal(res[[1, 'overlap']], c('BLM'))) 9 | expect_true(setequal(res[[2, 'overlap']], c('HERC2', 'SP100', 'BLM'))) 10 | expect_true(setequal(res2[[3, 'overlap']], c('HERC2'))) 11 | expect_true(setequal(res[[4, 'overlap']], NULL)) 12 | }) 13 | -------------------------------------------------------------------------------- /tests/testthat/test_export_CSV.r: -------------------------------------------------------------------------------- 1 | context("Export of results as CSV file") 2 | 3 | test_that("CSV file structure is expected for single evidence", { 4 | 5 | res = ActivePathways(dat[,1, drop = F], gmt) 6 | CSV_fname = "res.csv" 7 | suppressWarnings(file.remove(CSV_fname)) 8 | 9 | export_as_CSV(res, CSV_fname) 10 | res_from_CSV = read.csv(CSV_fname, stringsAsFactors = F) 11 | suppressWarnings(file.remove(CSV_fname)) 12 | 13 | expect_equal(colnames(res_from_CSV), colnames(res)) 14 | expect_equal(res_from_CSV$term_id, res$term_id) 15 | 16 | }) 17 | 18 | 19 | test_that("CSV file structure is expected for multiple evidence", { 20 | 21 | res = ActivePathways(dat, gmt) 22 | CSV_fname = "res.csv" 23 | suppressWarnings(file.remove(CSV_fname)) 24 | 25 | 
export_as_CSV(res, CSV_fname) 26 | res_from_CSV = read.csv(CSV_fname, stringsAsFactors = F) 27 | suppressWarnings(file.remove(CSV_fname)) 28 | 29 | expect_equal(colnames(res_from_CSV), colnames(res)) 30 | expect_equal(res_from_CSV$term_id, res$term_id) 31 | 32 | }) 33 | 34 | 35 | test_that("CSV file is exported when there are NULL entries", { 36 | 37 | res = ActivePathways(scores=scores_test, gmt=gmt_reac, significant=1) 38 | CSV_fname = "res.csv" 39 | suppressWarnings(file.remove(CSV_fname)) 40 | 41 | export_as_CSV(res, CSV_fname) 42 | res_from_CSV = read.csv(CSV_fname, stringsAsFactors = F) 43 | suppressWarnings(file.remove(CSV_fname)) 44 | 45 | # convert res overlap column values to a string type 46 | res$overlap <- sapply(res$overlap, function(x) paste(x, collapse = "|")) 47 | expect_equal(res_from_CSV$overlap, res$overlap) 48 | 49 | }) 50 | -------------------------------------------------------------------------------- /tests/testthat/test_merge_p_values.r: -------------------------------------------------------------------------------- 1 | context("merge_p_values function") 2 | 3 | test_list <- list(a=0.01, b=0.06, c=0.8, d=0.0001, e=0, f=1) 4 | test_matrix <- matrix(c(0.01, 0.06, 0.08, 0.0001, 0, 1), ncol=2) 5 | 6 | comparison_list <- test_list 7 | comparison_list[[5]] = 1e-300 8 | 9 | test_matrix = matrix(unlist(test_list), ncol = 2) 10 | comparison_matrix = matrix(unlist(comparison_list), ncol = 2) 11 | 12 | test_that("scores is a numeric matrix or list with valid p-values", { 13 | 14 | expect_error(merge_p_values(unlist(test_list), "Fisher"), NA) 15 | expect_error(merge_p_values(test_list, "Brown"), 16 | "Brown's, DPM, Strube's, and Strube_directional methods cannot be used with a single list of p-values") 17 | expect_error(merge_p_values(unlist(test_list), "Brown"), 18 | "Brown's, DPM, Strube's, and Strube_directional methods cannot be used with a single list of p-values") 19 | 20 | 21 | 22 | test_list[[1]] <- -0.1 23 | 
expect_error(merge_p_values(test_list), 'All values in scores must be in [0,1]', fixed=TRUE) 24 | test_list[[1]] <- 1.1 25 | expect_error(merge_p_values(test_list), 'All values in scores must be in [0,1]', fixed=TRUE) 26 | test_list[[1]] <- NA 27 | expect_error(merge_p_values(test_list), 'scores cannot contain missing values, we recommend replacing NA with 1 or removing') 28 | test_list[[1]] <- 'c' 29 | expect_error(merge_p_values(test_list), 'scores must be numeric') 30 | 31 | 32 | test_matrix[1, 1] <- NA 33 | expect_error(merge_p_values(test_matrix), 'scores cannot contain missing values, we recommend replacing NA with 1 or removing') 34 | test_matrix[1, 1] <- -0.1 35 | expect_error(merge_p_values(test_matrix), "All values in scores must be in [0,1]", fixed=TRUE) 36 | test_matrix[1, 1] <- 1.1 37 | expect_error(merge_p_values(test_matrix), "All values in scores must be in [0,1]", fixed=TRUE) 38 | test_matrix[1, 1] <- 'a' 39 | expect_error(merge_p_values(test_matrix), 'scores must be numeric') 40 | 41 | 42 | }) 43 | 44 | 45 | test_direction_vector <- c(1,-1) 46 | 47 | 48 | test_that("Merged p-values are correct", { 49 | 50 | this_tolerance = 1e-7 51 | answer1 = c(1.481551e-05, 4.167534e-299, 9.785148e-01) 52 | answer2 = c(2.52747e-05, 0.00000e+00, 9.73873e-01) 53 | answer3 = 7.147579e-296 54 | 55 | expect_equal(merge_p_values(test_matrix, "Fisher"), answer1, tolerance = this_tolerance) 56 | 57 | expect_equal(merge_p_values(test_matrix, "Brown"), answer2, tolerance = this_tolerance) 58 | 59 | expect_equal(merge_p_values(test_matrix[, 1, drop=FALSE], "Fisher"), test_matrix[, 1, drop=TRUE]) 60 | expect_equal(merge_p_values(test_matrix[, 1, drop=FALSE], "Brown"), test_matrix[, 1, drop=TRUE]) 61 | 62 | expect_equal(merge_p_values(test_list, "Fisher"), answer3, tolerance = this_tolerance) 63 | 64 | test_pval_vector <- c(0.05,0.01) 65 | test_direction_vector <- c(1,-1) 66 | constraints_vector1 <- c(-1,1) 67 | constraints_vector2 <- c(1,-1) 68 | 69 | 
expect_equal(merge_p_values(test_pval_vector,"Fisher_directional",test_direction_vector,constraints_vector1), 70 | merge_p_values(test_pval_vector,"Fisher_directional",test_direction_vector,constraints_vector2)) 71 | 72 | inflated_pvals <- c(1, 1e-400) 73 | threshold_pvals <- c(1, 1e-300) 74 | expect_equal(merge_p_values(inflated_pvals), merge_p_values(threshold_pvals)) 75 | 76 | inflated_pval_matrix = matrix(c(1, 1, 1e-320, 1e-310), ncol = 2) 77 | threshold_pval_matrix = matrix(c(1, 1, 1e-300, 1e-300), ncol = 2) 78 | expect_equal(merge_p_values(inflated_pval_matrix), merge_p_values(threshold_pval_matrix)) 79 | 80 | }) 81 | 82 | 83 | test_matrix <- matrix(c(0.01, 0.06, 0.08, 0.0001, 0, 1), ncol=2) 84 | test_direction_matrix <- matrix(c(1,-1,1,-1,-1,1), ncol=2) 85 | constraints_vector <- c(1,1) 86 | 87 | colnames(test_matrix) <- c("RNA", "Protein") 88 | rownames(test_matrix) <- c("TP53", "CHRNA1","PTEN") 89 | 90 | colnames(test_direction_matrix) <- colnames(test_matrix) 91 | rownames(test_direction_matrix) <- rownames(test_matrix) 92 | 93 | test_that("scores_direction and constraints_vector are valid", { 94 | 95 | expect_error(merge_p_values(test_matrix, "Fisher_directional",test_direction_matrix),'Both scores_direction and constraints_vector must be provided') 96 | expect_error(merge_p_values(test_matrix, "Fisher_directional",constraints_vector = constraints_vector),'Both scores_direction and constraints_vector must be provided') 97 | expect_error(merge_p_values(test_matrix, "Fisher_directional", test_direction_matrix, c(1,"b")), 'constraints_vector must be a numeric vector') 98 | expect_error(merge_p_values(test_matrix, "Fisher_directional", test_direction_matrix, c(1,0)), "scores_direction entries must be set to 0's for columns that do not contain directional information") 99 | expect_error(merge_p_values(test_matrix, "Fisher_directional", test_direction_matrix, c(1,5)), "constraints_vector must contain the values: 1, -1 or 0") 100 | 101 | test_dir <- 
as.vector(test_direction_matrix)
    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_dir, c(1,1)), 'scores and scores_direction must be the same data type')

    test_dir <- test_direction_matrix
    test_dir[1,1] <- NA
    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_dir, c(1,1)), 'scores_direction cannot contain missing values, we recommend replacing NA with 0 or removing')

    test_dir <- test_direction_matrix
    test_dir[1,1] <- 'a'
    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_dir, c(1,1)), 'scores_direction must be numeric')

    test_dir <- test_direction_matrix
    colnames(test_dir) <- NULL
    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_dir, c(1,1)), 'column names must be provided to scores and scores_direction')

    test_m <- test_matrix[1:2,]
    expect_error(merge_p_values(test_m, "Fisher_directional", test_direction_matrix, c(1,1)), 'scores and scores_direction must have the same number of rows')

    test_m <- test_matrix
    rownames(test_m) <- c("TP53", "GENE2", "GENE3")
    expect_error(merge_p_values(test_m, "Fisher_directional", test_direction_matrix, c(1,1)), 'scores_direction gene names must match scores genes')

    test_m <- test_matrix
    rownames(test_m) <- c("CHRNA1","TP53","PTEN")
    expect_error(merge_p_values(test_m, "Fisher_directional", test_direction_matrix, c(1,1)), 'scores genes should be in the same order as scores_direction genes')

    test_m <- test_matrix
    colnames(test_m) <- c("RNA","Mutation")
    expect_error(merge_p_values(test_m, "Fisher_directional", test_direction_matrix, c(1,1)),
                 'scores_direction column names must match scores column names')

    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_direction_matrix, c(1,1,-1)),
                 'constraints_vector should have the same number of entries as columns in scores_direction')

    names(constraints_vector) <- c("Protein","RNA")
    expect_error(merge_p_values(test_matrix, "Fisher_directional", test_direction_matrix, constraints_vector),
                 'the constraints_vector entries should match the order of scores and scores_direction columns')
})



test_that("P-value merging methods are correct", {
    expect_error(merge_p_values(c(0.05,0.10), "Fisher", c(1,1), c(1,1)),
                 'Only DPM, Fisher_directional, Stouffer_directional, and Strube_directional methods support directional integration')

    expect_error(merge_p_values(c(0.05,0.10), "Tippett"),
                 'Only Fisher, Brown, Stouffer and Strube methods are currently supported for non-directional analysis.
                 And only DPM, Fisher_directional, Stouffer_directional, and Strube_directional are supported for directional analysis')

    expect_error(merge_p_values(c(0.05,0.10), "Fisher_directional"),
                 'scores_direction and constraints_vector must be provided for directional analyses')

})

--------------------------------------------------------------------------------
/tests/testthat/test_orderedHypergeometric.r:
--------------------------------------------------------------------------------
context("Ordered Hypergeometric Statistical Test")


test_that('hypergeometric gives the same results as fisher.test', {
    counts <- matrix(c(0,0,0,0), nrow=2)
    expect_equal(hypergeometric(counts), fisher.test(counts, alternative='greater')$p.value)

    counts <- matrix(c(0, 5, 16, 1683), nrow=2)
    expect_equal(hypergeometric(counts), fisher.test(counts, alternative='greater')$p.value)

    counts <- matrix(c(2, 2, 2, 2), nrow=2)
    expect_equal(hypergeometric(counts), fisher.test(counts, alternative='greater')$p.value)
})


test_that('orderedHypergeometric returns the lowest p-value and correct index', {
    genelist <- c('HERC2', 'SP100', 'BLM')
    background <- c('PHC2', 'BLM', 'XPC', 'SMC3', 'HERC2', 'SP100')
    annotations <- c('HERC2', 'PHC2', 'BLM')

    get_pvalue <- function(genes) {
        complement <- setdiff(background, genes)
        genelist1 <- length(which(genes %in% annotations))
        genelist0 <- length(genes) - genelist1
        complement1 <- length(which(complement %in% annotations))
        complement0 <- length(complement) - complement1
        counts <- matrix(c(genelist1, genelist0, complement1, complement0), 2)
        hypergeometric(counts)
    }
    p_values <- sapply(1:length(genelist), function(i) get_pvalue(genelist[1:i]))

    smallest_value <- min(p_values)
    smallest_index <- match(smallest_value, p_values)
    exp <- list(p_val=smallest_value, ind=smallest_index)

    expect_equal(orderedHypergeometric(genelist, background, annotations), exp)
})

--------------------------------------------------------------------------------
/tests/testthat/test_return.r:
--------------------------------------------------------------------------------
context("Format of the returned object and output files")


test_that("Column names of data.table is correct", {
    expect_equal(colnames(run_ap_short(dat)),
                 c('term_id', 'term_name', 'adjusted_p_val', 'term_size', 'overlap'))
    expect_equal(colnames(run_ap_short_contribution(dat)),
                 c('term_id', 'term_name', 'adjusted_p_val', 'term_size', 'overlap',
                   'evidence', 'Genes_cds', 'Genes_promoter', 'Genes_enhancer'))
})


test_that("All results or only significant ones are returned", {
    # The result column is adjusted_p_val; `$p_val` would return NULL and make all() vacuously TRUE
    expect_true(all(ActivePathways(dat, gmt, cytoscape_file_tag = NA)$adjusted_p_val < 0.05))
    expect_equal(nrow(ActivePathways(dat, gmt, cytoscape_file_tag = NA, significant = 1)), length(gmt))
})


test_that("No significant results are found", {
    expect_warning(res1 <- ActivePathways(dat, gmt, cytoscape_file_tag = NA, significant = 0),
                   "No significant terms were found", fixed = TRUE)
    expect_equal(nrow(res1), NULL)
})


--------------------------------------------------------------------------------
/tests/testthat/test_validation.r:
--------------------------------------------------------------------------------
context("Validation on input to ActivePathways")


test_that("scores is a numeric matrix with valid p-values", {
    dat2 <- dat
    dat2[1, 1] <- 'a'
    expect_error(run_ap_short(dat2), 'scores must be a numeric matrix')

    dat2 <- dat
    dat2[1, 1] <- NA
    expect_error(run_ap_short(dat2), 'scores cannot contain missing values, we recommend replacing NA with 1 or removing')

    dat2[1, 1] <- -0.1
    expect_error(run_ap_short(dat2), "All values in scores must be in [0,1]", fixed=TRUE)

    dat2[1, 1] <- 1.1
    expect_error(run_ap_short(dat2), "All values in scores must be in [0,1]", fixed=TRUE)

    dat2[1, 1] <- 1
    expect_error(run_ap_short(dat2), NA)

    dat2[1, 1] <- 0
    expect_error(run_ap_short(dat2), NA)
})

test_that("scores_direction and constraints_vector have valid input",{

    expect_error(run_ap(scores_test, direction_test,NULL),'Both scores_direction and constraints_vector must be provided')
    expect_error(run_ap(scores_test, NULL,constraints_vector = constraints_vector_test),'Both scores_direction and constraints_vector must be provided')

    constraints_vector <- c('a','b')
    expect_error(run_ap(scores_test,direction_test,constraints_vector), 'constraints_vector must be a numeric vector')
    expect_error(run_ap(scores_test, direction_test, c(1,0)), "scores_direction entries must be set to 0's for columns that do not contain directional information")

    dir_test <- direction_test
    dir_test[1,1] <- NA
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'scores_direction cannot contain missing values, we recommend replacing NA with 0 or removing')

    dir_test <- direction_test
    dir_test[1,1] <- 'a'
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'scores_direction must be a numeric matrix')

    dir_test <- direction_test[1:3,]
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'scores and scores_direction must have the same number of rows')

    dir_test <- direction_test
    rownames(dir_test) <- 1:length(direction_test[,1])
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'scores_direction gene names must match scores genes')

    dir_test <- direction_test
    rownames(dir_test) <- rev(rownames(direction_test))
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'scores genes should be in the same order as scores_direction genes')

    dir_test <- direction_test
    colnames(dir_test) <- NULL
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test), 'column names must be provided to scores and scores_direction')

    constraints_vector <- c(1,1,-1)
    expect_error(run_ap(scores_test,direction_test,constraints_vector),
                 'constraints_vector should have the same number of entries as columns in scores_direction')

    constraints_vector <- c(1,-1)
    names(constraints_vector) <- c("protein","rna")
    expect_error(run_ap(scores_test,direction_test,constraints_vector),
                 'the constraints_vector entries should match the order of scores and scores_direction columns')

    dir_test <- direction_test
    colnames(dir_test) <- c("rna","Mutation")
    expect_error(run_ap(scores_test,dir_test,constraints_vector_test),
                 'scores_direction column names must match scores column names')

    constraints_vector <- c(1,0)
    expect_error(run_ap(scores_test, direction_test, constraints_vector), "scores_direction entries must be set to 0's for columns that do not contain directional information")
})

test_that("significant is valid", {
    expect_error(ActivePathways(dat, gmt, significant=-0.1),
                 "significant must be a value in [0,1]", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, significant = 1.1),
                 "significant must be a value in [0,1]", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, significant=NULL),
                 "length(significant) == 1 is not TRUE", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, significant=c(1,2)),
                 "length(significant) == 1 is not TRUE", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, significant='qwe'),
                 "is.numeric(significant) is not TRUE", fixed=TRUE)
    expect_warning(ActivePathways(dat, gmt, significant = 0),
                   "No significant terms were found")
    expect_error(ActivePathways(dat, gmt, significant = 1), NA)
})


test_that("cutoff is valid", {
    expect_error(ActivePathways(dat, gmt, cutoff=-0.1),
                 "cutoff must be a value in [0,1]", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff = 1.1),
                 "cutoff must be a value in [0,1]", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff=NULL),
                 "length(cutoff) == 1 is not TRUE", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff=c(1,2)),
                 "length(cutoff) == 1 is not TRUE", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff='qwe'),
                 "is.numeric(cutoff) is not TRUE", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff=0),
                 "No genes made the cutoff", fixed=TRUE)
    expect_error(ActivePathways(dat, gmt, cutoff=1), NA)
})


test_that("background is a character vector", {
    error_msg <- "background must be a character vector"
    expect_error(ActivePathways(dat, gmt, background=c(1,5,2)), error_msg)
    expect_error(ActivePathways(dat, gmt, background=matrix(c('a', 'b', 'c', 'd'), 2)), error_msg)
})


test_that("genes not found in background are removed", {
    expect_message(ActivePathways(dat, gmt, background=rownames(dat)[-(1:10)], significant=1, cutoff=1),
                   "10 rows were removed from scores because they are not found in the background")
    expect_error(ActivePathways(dat, gmt, background='qwerty'),
                 "scores does not contain any genes in the background")
})

test_that("geneset_filter is a numeric vector of length 2", {
    expect_error(ActivePathways(dat, gmt, geneset_filter=1),
                 "geneset_filter must be length 2")
    expect_error(ActivePathways(dat, gmt, geneset_filter=list(1,2)),
                 "geneset_filter must be a numeric vector")
    expect_error(ActivePathways(dat, gmt, geneset_filter=c('q', 2)),
                 "geneset_filter must be a numeric vector")
    expect_error(ActivePathways(dat, gmt, geneset_filter=c(1, -2)),
                 "geneset_filter limits must be positive")
    expect_error(ActivePathways(dat, gmt, geneset_filter=c(0, 0)),
                 "No pathways in gmt made the geneset_filter", fixed=TRUE)
    expect_message(ActivePathways(dat, gmt, geneset_filter=c(NA, 10)),
                   "[0-9]+ terms were removed from gmt because they did not make the geneset_filter")
    expect_error(ActivePathways(dat, gmt, geneset_filter=c(0, NA)), NA)
    expect_error(ActivePathways(dat, gmt, geneset_filter=NULL), NA)
})

test_that("custom colors is a character vector that is equal in length to the number of columns in scores",{
    expect_error(ActivePathways(scores = dat, gmt = gmt, custom_colors = list("red","blue", "green")),
                 "colors must be provided as a character vector",fixed = TRUE)
    expect_error(ActivePathways(scores = dat, gmt = gmt, custom_colors = c("red","blue")),
                 "incorrect number of colors is provided",fixed = TRUE)

    incorrect_color_names <- c("red","blue", "green")
    names(incorrect_color_names) <- c("promoter","lds", "enhancer")
    expect_error(ActivePathways(scores = dat, gmt = gmt, custom_colors = incorrect_color_names),
                 "names() of the custom colors vector should match the scores column names",fixed = TRUE)
})

test_that("color palette is from the RColorBrewer package",{
    expect_error(ActivePathways(scores = dat, gmt = gmt, color_palette = "flamingo"),
                 "palette must be from the RColorBrewer package",fixed = TRUE)
})

test_that("color palette and custom colors parameters are never specified together",{
    expect_error(ActivePathways(scores = dat, gmt = gmt, color_palette = "Pastel1", custom_colors = c("red","blue", "green")),
                 "Both custom_colors and color_palette are provided. Specify only one of these parameters for node coloring.",fixed = TRUE)
})

test_that("color_integrated_only is a character vector of length 1",{
    expect_error(ActivePathways(scores = dat, gmt = gmt, color_integrated_only = list(1,2,3)),
                 "color must be provided as a character vector",fixed = TRUE)
    expect_error(ActivePathways(scores = dat, gmt = gmt, color_integrated_only = c("red","blue")),
                 "only a single color must be specified",fixed = TRUE)
})

--------------------------------------------------------------------------------
/vignettes/CreateEnrichmentMapDialogue_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/CreateEnrichmentMapDialogue_V2.png
--------------------------------------------------------------------------------
/vignettes/ImportStep_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/ImportStep_V2.png
--------------------------------------------------------------------------------
/vignettes/LegendView.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/LegendView.png
--------------------------------------------------------------------------------
/vignettes/LegendView_Custom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/LegendView_Custom.png
--------------------------------------------------------------------------------
/vignettes/LegendView_RColorBrewer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/LegendView_RColorBrewer.png
--------------------------------------------------------------------------------
/vignettes/NetworkStep1_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/NetworkStep1_V2.png
--------------------------------------------------------------------------------
/vignettes/NetworkStep2_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/NetworkStep2_V2.png
--------------------------------------------------------------------------------
/vignettes/PropertiesDropDown2_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/PropertiesDropDown2_V2.png
--------------------------------------------------------------------------------
/vignettes/StylePanel_V2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/StylePanel_V2.png
--------------------------------------------------------------------------------
/vignettes/border_line_type.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/border_line_type.jpg
--------------------------------------------------------------------------------
/vignettes/legend.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/legend.png
--------------------------------------------------------------------------------
/vignettes/lineplot_tutorial.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/lineplot_tutorial.png
--------------------------------------------------------------------------------
/vignettes/new_map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/new_map.png
--------------------------------------------------------------------------------
/vignettes/set_aesthetic.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reimandlab/ActivePathways/2cd1931cfc750e96282533bbd0928b5273f5035f/vignettes/set_aesthetic.jpg
--------------------------------------------------------------------------------