├── DESCRIPTION
├── MD5
├── NAMESPACE
├── NEWS.md
├── R
│   ├── GOCluster.R
│   ├── GOCore.R
│   ├── GOHeat.R
│   ├── GOVenn.R
│   └── Helper.R
├── README.md
├── build
│   └── vignette.rds
├── data
│   └── EC.rda
├── inst
│   ├── CITATION
│   └── doc
│       ├── GOplot_vignette.R
│       ├── GOplot_vignette.Rmd
│       └── GOplot_vignette.html
├── man
│   ├── EC.Rd
│   ├── GOBar.Rd
│   ├── GOBubble.Rd
│   ├── GOChord.Rd
│   ├── GOCircle.Rd
│   ├── GOCluster.Rd
│   ├── GOHeat.Rd
│   ├── GOVenn.Rd
│   ├── chord_dat.Rd
│   ├── circle_dat.Rd
│   └── reduce_overlap.Rd
└── vignettes
    ├── GOBar.png
    ├── GOBubble1.png
    ├── GOBubble2.png
    ├── GOBubble3.png
    ├── GOBubble4.png
    ├── GOChord1.png
    ├── GOCirc.png
    ├── GOCluster.png
    ├── GOCluster2.png
    ├── GOHeat_lfc.png
    ├── GOHeat_nolfc.png
    ├── GOVenn.png
    ├── GOplot.css
    ├── GOplot_vignette.Rmd
    └── Titel.png


/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: GOplot
2 | Type: Package
3 | Title: Visualization of Functional Analysis Data
4 | Version: 1.0.2
5 | Date: 2016-03-30
6 | Authors@R: c(
7 |     person("Wencke", "Walter", , email = "wencke.walter@arcor.de", role = c("aut", "cre")),
8 |     person("Fatima", "Sanchez-Cabo", , role = "aut")
9 |     )
10 | URL: https://github.com/wencke/wencke.github.io
11 | BugReports: https://github.com/wencke/wencke.github.io/issues
12 | Description: Implementation of multilayered visualizations for enhanced
13 |     graphical representation of functional analysis data. It combines and integrates
14 |     omics data derived from expression and functional annotation enrichment
15 |     analyses. Its plotting functions have been developed with a hierarchical
16 |     structure in mind: starting from a general overview to identify the most
17 |     enriched categories (modified bar plot, bubble plot) to a more detailed one
18 |     displaying different types of relevant information for the molecules in a given
19 |     set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).
20 | Depends: ggplot2 (>= 2.0.0), ggdendro (>= 0.1-17), gridExtra (>=
21 |     2.0.0), RColorBrewer (>= 1.1.2), R (>= 3.2.3)
22 | License: GPL-2
23 | Suggests: knitr, rmarkdown
24 | VignetteBuilder: knitr
25 | LazyData: TRUE
26 | RoxygenNote: 5.0.1
27 | NeedsCompilation: no
28 | Packaged: 2016-03-30 08:24:21 UTC; BioinfoNerd
29 | Author: Wencke Walter [aut, cre],
30 |   Fatima Sanchez-Cabo [aut]
31 | Maintainer: Wencke Walter
32 | Repository: CRAN
33 | Date/Publication: 2016-03-30 20:35:02
34 |

--------------------------------------------------------------------------------
/MD5:
--------------------------------------------------------------------------------
1 | 1715bfe67bec477dd3cb8d7602e4fa20 *DESCRIPTION
2 | e61dd1ca8c29cbb323e13d8517d9c02a *NAMESPACE
3 | 4b28a97ef1acd9ed6c62896137721df8 *NEWS.md
4 | 23abcbfe83b5aba778ee015295f137aa *R/GOCluster.R
5 | e572f5815342fade1478975babe13886 *R/GOCore.R
6 | dfc4d334a0d33f9a93b570eff3e9e80f *R/GOHeat.R
7 | 42cbd333583bc85dd011a233cb3f2f74 *R/GOVenn.R
8 | c5f5d6bd7353ce68cefdc2be652b5960 *R/Helper.R
9 | 584ae11d874b64d6e5c4ff9c5b575d22 *README.md
10 | 0a9da26cff7c27c4304cd6c957e04af8 *build/vignette.rds
11 | 322babedca883c827c36456263b2a8ec *data/EC.rda
12 | b7bcdd8ea7db60feca771ffc0332b250 *inst/CITATION
13 | fcd68b105afd0eeb7427bbf4fd9d6841 *inst/doc/GOplot_vignette.R
14 | bb17de94d29ab2e4eb59b63b01accfb6 *inst/doc/GOplot_vignette.Rmd
15 | 452ddf9993f7c63c2ee59ce2ece6f673 *inst/doc/GOplot_vignette.html
16 | 4251e378a6add0b3ae7a596abdcbb2a6 *man/EC.Rd
17 | e90a8e3541703f5e5883bb3266e24a19 *man/GOBar.Rd
18 | c5bb4945d9ff65d94135ce4efaaf5459 *man/GOBubble.Rd
19 | 44149f89ac4b3e6198648742e408de8c *man/GOChord.Rd
20 | ee2fcf3f06f78085787602c7f8c6093b *man/GOCircle.Rd
21 | ce63714fd31d2075dc19890283c90c32 *man/GOCluster.Rd
22 | 0e75b8f0a887a4b435f3101e75977931 *man/GOHeat.Rd
23 | 12b7a3dc5e14f387675f436209b22346 *man/GOVenn.Rd
24 | 5c0c6435c09dab8eede3bb7bc3c91be0 *man/chord_dat.Rd
25 | 8c9fc2bea8e29d8e69191d14876c4177 *man/circle_dat.Rd
26 | e571685ac3bd31d377069f5340b3902f *man/reduce_overlap.Rd
27 | da9c6b713c73d7cf668dab80ac28f2de *vignettes/GOBar.png
28 | 9d64131438541b889d07e8e2b7f44d30 *vignettes/GOBubble1.png
29 | 0ea657fee85af6e5dc3c8f43e2ec9cbd *vignettes/GOBubble2.png
30 | fbf7808d3c7235e04feac329eadc9295 *vignettes/GOBubble3.png
31 | cdb1824e3bf1830599ffe85eb60bbcd4 *vignettes/GOBubble4.png
32 | e5787103b84b40771253e4e76beaf5b3 *vignettes/GOChord1.png
33 | ea9374ef982d235dcbcc1f8b2158a804 *vignettes/GOCirc.png
34 | 8e332df56b7dd05c8c81bbda25811926 *vignettes/GOCluster.png
35 | 93aa247b84c448148ecbc7edf4e94c50 *vignettes/GOCluster2.png
36 | d6fb473e4aaa993d3532bc78fa32b5c9 *vignettes/GOHeat_lfc.png
37 | 140ee05839dbcb04483ab1735cc18f62 *vignettes/GOHeat_nolfc.png
38 | 0514ae4a1ca467974d4197efddba0639 *vignettes/GOVenn.png
39 | 5f57576c51fe172d93f3622cc433f762 *vignettes/GOplot.css
40 | bb17de94d29ab2e4eb59b63b01accfb6 *vignettes/GOplot_vignette.Rmd
41 | f9eb01d9aba255d27107fd69cc246ba5 *vignettes/Titel.png
42 |

--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 | # Generated by roxygen2: do not edit by hand
2 |
3 | export(GOBar)
4 | export(GOBubble)
5 | export(GOChord)
6 | export(GOCircle)
7 | export(GOCluster)
8 | export(GOHeat)
9 | export(GOVenn)
10 | export(chord_dat)
11 | export(circle_dat)
12 | export(reduce_overlap)
13 | import(RColorBrewer)
14 | import(ggdendro)
15 | import(ggplot2)
16 | import(grDevices)
17 | import(graphics)
18 | import(gridExtra)
19 | import(stats)
20 |

--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
1 | GOplot 1.0.2 (2016-03-29)
2 | ----------------------------------------
3 |
4 | * Add function 'reduce_overlap' to reduce the number of redundant terms and improve readability of plots
5 |
6 | * Add parameter 'bg.col' to GOBubble() to enable panel background colour of facet plot
7 |
8 | * Add new plot function GOHeat()
9 |
10 | * Fix various bugs in draw_table()
11 |
12 |
13 | GOplot 1.0.1 (2015-07-15)
14 | ----------------------------------------
15 |
16 | * Fix various bugs in GOVenn()
17 |
18 | * Fix bug in 'process.label' argument of GOChord()
19 |
20 | * Adjust draw_table() to new release of gridExtra
21 |
22 | * Add parameter 'limit' to chord_dat() to restrict the dimension of the binary matrix

--------------------------------------------------------------------------------
/R/GOCluster.R:
--------------------------------------------------------------------------------
1 | #'
2 | #' @name GOCluster
3 | #' @title Circular dendrogram.
4 | #' @description GOCluster generates a circular dendrogram of the \code{data},
5 | #'   clustered by default with euclidean distance and average linkage. The
6 | #'   inner ring displays the color-coded logFC, while the outer one encodes the
7 | #'   terms assigned to each gene.
8 | #' @param data A data frame which should be the result of
9 | #'   \code{\link{circle_dat}} in case the data contains only one logFC column.
10 | #'   Otherwise \code{data} is a data frame in which the first column contains the
11 | #'   genes, the second the term and the following columns the logFCs of the
12 | #'   different contrasts.
13 | #' @param process A character vector of selected processes (ID or term
14 | #'   description)
15 | #' @param metric A character vector specifying the distance measure to be used
16 | #'   (default='euclidean'), see \code{dist}
17 | #' @param clust A character vector specifying the agglomeration method to be
18 | #'   used (default='average'), see \code{hclust}
19 | #' @param clust.by A character vector specifying whether the clustering should
20 | #'   be based on the gene expression pattern ('logFC') or on the functional
21 | #'   categories ('term'). By default the functional categories are used.
22 | #' @param nlfc If TRUE \code{data} contains multiple logFC columns (default=
23 | #'   FALSE)
24 | #' @param lfc.col Character vector to define the color scale for the logFC of
25 | #'   the form c(high, midpoint, low)
26 | #' @param lfc.min Specifies the minimum value of the logFC scale (default = -3)
27 | #' @param lfc.max Specifies the maximum value of the logFC scale (default = 3)
28 | #' @param lfc.space The space between the leaves of the dendrogram and the ring
29 | #'   for the logFC
30 | #' @param lfc.width The width of the logFC ring
31 | #' @param term.col A character vector specifying the colors of the term bands
32 | #' @param term.space The space between the logFC ring and the term ring
33 | #' @param term.width The width of the term ring
34 | #' @details The inner ring can be split into smaller rings to display multiple
35 | #'   logFC values resulting from various comparisons.
36 | #' @import ggplot2
37 | #' @import ggdendro
38 | #' @import RColorBrewer
39 | #' @import stats
40 | #' @examples
41 | #' \dontrun{
42 | #' #Load the included dataset
43 | #' data(EC)
44 | #'
45 | #' #Generating the circ object
46 | #' circ<-circle_dat(EC$david, EC$genelist)
47 | #'
48 | #' #Creating the cluster plot
49 | #' GOCluster(circ, EC$process)
50 | #'
51 | #' #Cluster the data according to gene expression and assign a different color scale for the logFC
52 | #' GOCluster(circ,EC$process,clust.by='logFC',lfc.col=c('darkgoldenrod1','black','cyan1'))
53 | #' }
54 | #' @export
55 | #'
56 |
57 | GOCluster<-function(data, process, metric, clust, clust.by, nlfc, lfc.col, lfc.min, lfc.max, lfc.space, lfc.width, term.col, term.space, term.width){
58 |   x <- y <- xend <- yend <- width <- space <- logFC <- NULL
59 |   if (missing(metric)) metric<-'euclidean'
60 |   if (missing(clust)) clust<-'average'
61 |   if (missing(clust.by)) clust.by<-'term'
62 |   if (missing(nlfc)) nlfc <- 0
63 |   if (missing(lfc.col)) lfc.col<-c('firebrick1','white','dodgerblue')
64 |   if (missing(lfc.min)) lfc.min <- -3
65 |   if (missing(lfc.max)) lfc.max <- 3
66 |   if (missing(lfc.space)) lfc.space<- (-0.5) else lfc.space<-lfc.space*(-1)
67 |   if (missing(lfc.width)) lfc.width<- (-1.6) else lfc.width<-lfc.space-lfc.width-0.1
68 |   if (missing(term.col)) term.col<-brewer.pal(length(process), 'Set3')
69 |   if (missing(term.space)) term.space<- lfc.space+lfc.width else term.space<-term.space*(-1)+lfc.width
70 |   if (missing(term.width)) term.width<- 2*lfc.width+term.space else term.width<-term.width*(-1)+term.space
71 |
72 |   # Note: 'chord' is not created in this function; a binary matrix (see chord_dat) must be visible from the calling scope
73 |   if (clust.by=='logFC') distance <- stats::dist(chord[,dim(chord)[2]], method=metric)
74 |   if (clust.by=='term') distance <- stats::dist(chord, method=metric)
75 |   cluster <- stats::hclust(distance, method=clust)
76 |   dendr <- dendro_data(cluster)
77 |   y_range <- range(dendr$segments$y)
78 |   x_pos <- data.frame(x=dendr$label$x, label=as.character(dendr$label$label))
79 |   chord <- as.data.frame(chord)
80 |   chord$label <- as.character(rownames(chord))
81 |   all <- merge(x_pos, chord, by='label')
82 |   all$label <- as.character(all$label)
83 |   if (nlfc){
84 |     lfc_rect <- all[,c(2, dim(all)[2])]
85 |     for (l in 4:dim(data)[2]) lfc_rect <- cbind(lfc_rect, sapply(all$label, function(x) data[match(x, data$genes), l]))
86 |     num <- dim(data)[2]-1
87 |     tmp <- seq(lfc.space, lfc.width, length = num)
88 |     lfc<-data.frame(x=numeric(),width=numeric(),space=numeric(),logFC=numeric())
89 |     for (l in 1:(length(tmp)-1)){
90 |       tmp_df<-data.frame(x=lfc_rect[,1],width=tmp[l+1],space=tmp[l],logFC=lfc_rect[,l+1])
91 |       lfc<-rbind(lfc,tmp_df)
92 |     }
93 |   }else{
94 |     lfc <- all[,c(2, dim(all)[2])]
95 |     lfc$space <- lfc.space
96 |     lfc$width <- lfc.width
97 |   }
98 |   term <- all[,c(2:(length(process)+2))]
99 |   color<-NULL;termx<-NULL;tspace<-NULL;twidth<-NULL
100 |   for (row in 1:dim(term)[1]){
101 |     idx <- which(term[row,-1] != 0)
102 |     if(length(idx) != 0){
103 |       termx<-c(termx,rep(term[row,1],length(idx)))
104 |       color<-c(color,term.col[idx])
105 |       tmp<-seq(term.space,term.width,length=length(idx)+1)
106 |       tspace<-c(tspace,tmp[1:(length(tmp)-1)])
107 |       twidth<-c(twidth,tmp[2:length(tmp)])
108 |     }
109 |   }
110 |   tmp <- sapply(lfc$logFC, function(x) ifelse(x > lfc.max, lfc.max, x))
111 |   logFC <- sapply(tmp, function(x) ifelse(x < lfc.min, lfc.min, x))
112 |   lfc$logFC <- logFC
113 |   term_rect <- data.frame(x = termx, width = twidth, space = tspace, col = color)
114 |   legend <- data.frame(x = 1:length(process),label = process)
115 |
116 |   ggplot()+
117 |     geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend))+
118 |     geom_rect(data=lfc,aes(xmin=x-0.5,xmax=x+0.5,ymin=width,ymax=space,fill=logFC))+
119 |     scale_fill_gradient2('logFC', space = 'Lab', low=lfc.col[3],mid=lfc.col[2],high=lfc.col[1],guide=guide_colorbar(title.position='top',title.hjust=0.5),breaks=c(min(lfc$logFC),max(lfc$logFC)),labels=c(round(min(lfc$logFC)),round(max(lfc$logFC))))+
120 |     geom_rect(data=term_rect,aes(xmin=x-0.5,xmax=x+0.5,ymin=width,ymax=space),fill=term_rect$col)+
121 |     geom_point(data=legend,aes(x=x,y=0.1,size=factor(label,levels=label),shape=NA))+
122 |     guides(size=guide_legend("GO Terms",ncol=4,byrow=T,override.aes=list(shape=22,fill=term.col,size = 8)))+
123 |     coord_polar()+
124 |     scale_y_reverse()+
125 |     theme(legend.position='bottom',legend.background = element_rect(fill='transparent'),legend.box='horizontal',legend.direction='horizontal')+
126 |     theme_blank
127 |
128 | }
129 |
130 | #'
131 | #' @name GOChord
132 | #' @title Displays the relationship between genes and terms.
133 | #' @description The GOChord function generates a circularly composited overview
134 | #'   of selected/specific genes and their assigned processes or terms. More
135 | #'   generally, it joins genes and processes via ribbons in an intersection-like
136 | #'   graph. The input can be generated with the \code{\link{chord_dat}}
137 | #'   function.
138 | #' @param data The matrix represents the binary relation (1= is related to, 0=
139 | #'   is not related to) between a set of genes (rows) and processes (columns); a
140 | #'   column for the logFC of the genes is optional
141 | #' @param title The title (on top) of the plot
142 | #' @param space The space between the chord segments of the plot
143 | #' @param gene.order A character vector defining the order of the displayed gene
144 | #'   labels
145 | #' @param gene.size The size of the gene labels
146 | #' @param gene.space The space between the gene labels and the segment of the
147 | #'   logFC
148 | #' @param nlfc Defines the number of logFC columns (default=1)
149 | #' @param lfc.col The fill color for the logFC specified in the following form:
150 | #'   c(color for low values, color for the mid point, color for the high values)
151 | #' @param lfc.min Specifies the minimum value of the logFC scale (default = -3)
152 | #' @param lfc.max Specifies the maximum value of the logFC scale (default = 3)
153 | #' @param ribbon.col The background color of the ribbons
154 | #' @param border.size Defines the size of the ribbon borders
155 | #' @param process.label The size of the legend entries
156 | #' @param limit A vector with two cutoff values (default= c(0,0)). The first
157 | #'   value defines the minimum number of terms a gene has to be assigned to. The
158 | #'   second the minimum number of genes assigned to a selected term.
159 | #' @details The \code{gene.order} argument has three possible options: "logFC",
160 | #'   "alphabetical", "none", which are self-explanatory.
161 | #'
162 | #'   Perhaps the most important argument of the function is \code{nlfc}. If your
163 | #'   \code{data} does not contain a column of logFC values you have to set
164 | #'   \code{nlfc = 0}. Differential expression analysis can be performed for
165 | #'   multiple conditions and/or batches. Therefore, the data frame might contain
166 | #'   more than one logFC value per gene. To adjust to this situation the
167 | #'   \code{nlfc} argument is used as well. It is a numeric value and it defines
168 | #'   the number of logFC columns of your \code{data}. The default is "1"
169 | #'   assuming that most of the time only one contrast is considered.
170 | #'
171 | #'   To represent the data more usefully, it might be necessary to reduce the
172 | #'   dimension of \code{data}. This can be achieved with \code{limit}. The first
173 | #'   value of the vector defines the threshold for the minimum number of terms a
174 | #'   gene has to be assigned to in order to be represented in the plot. Most of
175 | #'   the time it is more meaningful to represent genes with various functions. A
176 | #'   value of 3 excludes all genes with fewer than three term assignments.
177 | #'   The second value of the parameter restricts the number of terms
178 | #'   according to the number of assigned genes. All terms with a count smaller
179 | #'   than or equal to the threshold are excluded.
180 | #' @seealso \code{\link{chord_dat}}
181 | #' @import ggplot2
182 | #' @import grDevices
183 | #' @examples
184 | #' \dontrun{
185 | #' # Load the included dataset
186 | #' data(EC)
187 | #'
188 | #' # Generating the binary matrix
189 | #' chord<-chord_dat(circ,EC$genes,EC$process)
190 | #'
191 | #' # Creating the chord plot
192 | #' GOChord(chord)
193 | #'
194 | #' # Excluding processes with fewer than 5 assigned genes
195 | #' GOChord(chord, limit = c(0,5))
196 | #'
197 | #' # Creating the chord plot with genes ordered by logFC and a different logFC color scale
198 | #' GOChord(chord,space=0.02,gene.order='logFC',lfc.col=c('red','black','cyan'))
199 | #' }
200 | #' @export
201 |
202 | GOChord <- function(data, title, space, gene.order, gene.size, gene.space, nlfc = 1, lfc.col, lfc.min, lfc.max, ribbon.col, border.size, process.label, limit){
203 |   y <- id <- xpro <- ypro <- xgen <- ygen <- lx <- ly <- ID <- logFC <- NULL
204 |   Ncol <- dim(data)[2]
205 |
206 |   if (missing(title)) title <- ''
207 |   if (missing(space)) space <- 0
208 |   if (missing(gene.order)) gene.order <- 'none'
209 |   if (missing(gene.size)) gene.size <- 3
210 |   if (missing(gene.space)) gene.space <- 0.2
211 |   if (missing(lfc.col)) lfc.col <- c('brown1', 'azure', 'cornflowerblue')
212 |   if (missing(lfc.min)) lfc.min <- -3
213 |   if (missing(lfc.max)) lfc.max <- 3
214 |   if (missing(border.size)) border.size <- 0.5
215 |   if (missing(process.label)) process.label <- 11
216 |   if (missing(limit)) limit <- c(0, 0)
217 |
218 |   if (gene.order == 'logFC') data <- data[order(data[, Ncol], decreasing = T), ]
219 |   if (gene.order == 'alphabetical') data <- data[order(rownames(data)), ]
220 |   if (sum(!is.na(match(colnames(data), 'logFC'))) > 0){
221 |     if (nlfc == 1){
222 |       cdata <- check_chord(data[, 1:(Ncol - 1)], limit)
223 |       lfc <- sapply(rownames(cdata), function(x) data[match(x,rownames(data)), Ncol])
224 |     }else{
225 |       cdata <- check_chord(data[, 1:(Ncol - nlfc)], limit)
226 |       lfc <- sapply(rownames(cdata), function(x) data[, (Ncol - nlfc + 1)])
227 |     }
228 |   }else{
229 |     cdata <- check_chord(data, limit)
230 |     lfc <- 0
231 |   }
232 |   if (missing(ribbon.col)) colRib <- grDevices::rainbow(dim(cdata)[2]) else colRib <- ribbon.col
233 |   nrib <- colSums(cdata)
234 |   ngen <- rowSums(cdata)
235 |   Ncol <- dim(cdata)[2]
236 |   Nrow <- dim(cdata)[1]
237 |   colRibb <- c()
238 |   for (b in 1:length(nrib)) colRibb <- c(colRibb, rep(colRib[b], 202 * nrib[b]))
239 |   r1 <- 1; r2 <- r1 + 0.1
240 |   xmax <- c(); x <- 0
241 |   for (r in 1:length(nrib)){
242 |     perc <- nrib[r] / sum(nrib)
243 |     xmax <- c(xmax, (pi * perc) - space)
244 |     if (length(x) <= Ncol - 1) x <- c(x, x[r] + pi * perc)
245 |   }
246 |   xp <- c(); yp <- c()
247 |   l <- 50
248 |   for (s in 1:Ncol){
249 |     xh <- seq(x[s], x[s] + xmax[s], length = l)
250 |     xp <- c(xp, r1 * sin(x[s]), r1 * sin(xh), r1 * sin(x[s] + xmax[s]), r2 * sin(x[s] + xmax[s]), r2 * sin(rev(xh)), r2 * sin(x[s]))
251 |     yp <- c(yp, r1 * cos(x[s]), r1 * cos(xh), r1 * cos(x[s] + xmax[s]), r2 * cos(x[s] + xmax[s]), r2 * cos(rev(xh)), r2 * cos(x[s]))
252 |   }
253 |   df_process <- data.frame(x = xp, y = yp, id = rep(c(1:Ncol), each = 4 + 2 * l))
254 |   xp <- c(); yp <- c(); logs <- NULL
255 |   x2 <- seq(0 - space, -pi - (-pi / Nrow) - space, length = Nrow)
256 |   xmax2 <- rep(-pi / Nrow + space, length = Nrow)
257 |   for (s in 1:Nrow){
258 |     xh <- seq(x2[s], x2[s] + xmax2[s], length = l)
259 |     if (nlfc <= 1){
260 |       xp <- c(xp, (r1 + 0.05) * sin(x2[s]), (r1 + 0.05) * sin(xh), (r1 + 0.05) * sin(x2[s] + xmax2[s]), r2 * sin(x2[s] + xmax2[s]), r2 * sin(rev(xh)), r2 * sin(x2[s]))
261 |       yp <- c(yp, (r1 + 0.05) * cos(x2[s]), (r1 + 0.05) * cos(xh), (r1 + 0.05) * cos(x2[s] + xmax2[s]), r2 * cos(x2[s] + xmax2[s]), r2 * cos(rev(xh)), r2 * cos(x2[s]))
262 |     }else{
263 |       tmp <- seq(r1, r2, length = nlfc + 1)
264 |       for (t in 1:nlfc){
265 |         logs <- c(logs, data[s, (dim(data)[2] + 1 - t)])
266 |         xp <- c(xp, (tmp[t]) * sin(x2[s]), (tmp[t]) * sin(xh), (tmp[t]) * sin(x2[s] + xmax2[s]), tmp[t + 1] * sin(x2[s] + xmax2[s]), tmp[t + 1] * sin(rev(xh)), tmp[t + 1] * sin(x2[s]))
267 |         yp <- c(yp, (tmp[t]) * cos(x2[s]), (tmp[t]) * cos(xh), (tmp[t]) * cos(x2[s] + xmax2[s]), tmp[t + 1] * cos(x2[s] + xmax2[s]), tmp[t + 1] * cos(rev(xh)), tmp[t + 1] * cos(x2[s]))
268 |       }}}
269 |   if(lfc[1] != 0){
270 |     if (nlfc == 1){
271 |       df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:Nrow), each = 4 + 2 * l), logFC = rep(lfc, each = 4 + 2 * l))
272 |     }else{
273 |       df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:(nlfc*Nrow)), each = 4 + 2 * l), logFC = rep(logs, each = 4 + 2 * l))
274 |     }
275 |   }else{
276 |     df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:Nrow), each = 4 + 2 * l))
277 |   }
278 |   aseq <- seq(0, 180, length = length(x2)); angle <- c()
279 |   for (o in aseq) if((o + 270) <= 360) angle <- c(angle, o + 270) else angle <- c(angle, o - 90)
280 |   df_texg <- data.frame(xgen = (r1 + gene.space) * sin(x2 + xmax2/2),ygen = (r1 + gene.space) * cos(x2 + xmax2 / 2),labels = rownames(cdata), angle = angle)
281 |   df_texp <- data.frame(xpro = (r1 + 0.15) * sin(x + xmax / 2),ypro = (r1 + 0.15) * cos(x + xmax / 2), labels = colnames(cdata), stringsAsFactors = FALSE)
282 |   cols <- rep(colRib, each = 4 + 2 * l)
283 |   x.end <- c(); y.end <- c(); processID <- c()
284 |   for (gs in 1:length(x2)){
285 |     val <- seq(x2[gs], x2[gs] + xmax2[gs], length = ngen[gs] + 1)
286 |     pros <- which((cdata[gs, ] != 0) == T)
287 |     for (v in 1:(length(val) - 1)){
288 |       x.end <- c(x.end, sin(val[v]), sin(val[v + 1]))
289 |       y.end <- c(y.end, cos(val[v]), cos(val[v + 1]))
290 |       processID <- c(processID, rep(pros[v], 2))
291 |     }
292 |   }
293 |   df_bezier <- data.frame(x.end = x.end, y.end = y.end, processID = processID)
294 |   df_bezier <- df_bezier[order(df_bezier$processID,-df_bezier$y.end),]
295 |   x.start <- c(); y.start <- c()
296 |   for (rs in 1:length(x)){
297 |     val<-seq(x[rs], x[rs] + xmax[rs], length = nrib[rs] + 1)
298 |     for (v in 1:(length(val) - 1)){
299 |       x.start <- c(x.start, sin(val[v]), sin(val[v + 1]))
300 |       y.start <- c(y.start, cos(val[v]), cos(val[v + 1]))
301 |     }
302 |   }
303 |   df_bezier$x.start <- x.start
304 |   df_bezier$y.start <- y.start
305 |   df_path <- bezier(df_bezier, colRib)
306 |   if(length(df_genes$logFC) != 0){
307 |     tmp <- sapply(df_genes$logFC, function(x) ifelse(x > lfc.max, lfc.max, x))
308 |     logFC <- sapply(tmp, function(x) ifelse(x < lfc.min, lfc.min, x))
309 |     df_genes$logFC <- logFC
310 |   }
311 |
312 |   g<- ggplot() +
313 |     geom_polygon(data = df_process, aes(x, y, group=id), fill='gray70', inherit.aes = F,color='black') +
314 |     geom_polygon(data = df_process, aes(x, y, group=id), fill=cols, inherit.aes = F,alpha=0.6,color='black') +
315 |     geom_point(aes(x = xpro, y = ypro, size = factor(labels, levels = labels), shape = NA), data = df_texp) +
316 |     guides(size = guide_legend("GO Terms", ncol = 4, byrow = T, override.aes = list(shape = 22, fill = unique(cols), size = 8))) +
317 |     theme(legend.text = element_text(size = process.label)) +
318 |     geom_text(aes(xgen, ygen, label = labels, angle = angle), data = df_texg, size = gene.size) +
319 |     geom_polygon(aes(x = lx, y = ly, group = ID), data = df_path, fill = colRibb, color = 'black', size = border.size, inherit.aes = F) +
320 |     labs(title = title) +
321 |     theme_blank
322 |
323 |   if (nlfc >= 1){
324 |     g + geom_polygon(data = df_genes, aes(x, y, group = id, fill = logFC), inherit.aes = F, color = 'black') +
325 |       scale_fill_gradient2('logFC', space = 'Lab', low = lfc.col[3], mid = lfc.col[2], high = lfc.col[1], guide = guide_colorbar(title.position = "top", title.hjust = 0.5),
326 |                            breaks = c(min(df_genes$logFC), max(df_genes$logFC)), labels = c(round(min(df_genes$logFC)), round(max(df_genes$logFC)))) +
327 |       theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal')
328 |   }else{
329 |     g + geom_polygon(data = df_genes, aes(x, y, group = id), fill = 'gray50', inherit.aes = F, color = 'black')+
330 |       theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal')
331 |   }
332 | }

--------------------------------------------------------------------------------
/R/GOCore.R:
--------------------------------------------------------------------------------
1 | #' Transcriptomic information of endothelial cells.
2 | #'
3 | #' The data set contains the transcriptomic information of endothelial cells
4 | #' from two steady state tissues (brain and heart). More detailed information
5 | #' can be found in the paper by Nolan et al. 2013. The data was normalized and a
6 | #' statistical analysis was performed to determine differentially expressed
7 | #' genes. The DAVID functional annotation tool was used to perform a
8 | #' gene-annotation enrichment analysis of the set of differentially expressed genes
9 | #' (adjusted p-value < 0.05).
10 | #'
11 | #' @docType data
12 | #' @keywords datasets
13 | #' @name EC
14 | #' @usage data(EC)
15 | #' @format A list containing 5 items
16 | #' @source \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067}
17 | "EC"
18 |
19 | #'
20 | #' @name circle_dat
21 | #' @title Creates a plotting object.
22 | #' @description The function takes the results from a functional analysis (for
23 | #'   example DAVID) and combines them with a list of selected genes and their
24 | #'   logFC. The resulting data frame can be used as an input for various plotting
25 | #'   functions.
26 | #' @param terms A data frame with columns for 'category', 'ID', 'term', adjusted
27 | #'   p-value ('adj_pval') and 'genes'
28 | #' @param genes A data frame with columns for 'ID', 'logFC'
29 | #' @details Since most gene-annotation enrichment analyses are based on
30 | #'   the gene ontology database, the package was built with this structure in
31 | #'   mind, but is not restricted to it. Gene ontology is structured as an
32 | #'   acyclic graph and it provides terms covering different areas. These terms
33 | #'   are grouped into three independent \code{categories}: BP (biological
34 | #'   process), CC (cellular component) or MF (molecular function).
35 | #'
36 | #'   The "ID" and "term" columns of the \code{terms} data frame refer to the ID
37 | #'   and the term description; the ID is optional.
38 | #'
39 | #'   The "ID" column of the \code{genes} data frame can contain any unique
40 | #'   identifier. However, the identifier has to match the one used in "genes"
41 | #'   from \code{terms}.
42 | #' @examples
43 | #' \dontrun{
44 | #' #Load the included dataset
45 | #' data(EC)
46 | #'
47 | #' #Building the circ object
48 | #' circ<-circle_dat(EC$david, EC$genelist)
49 | #' }
50 | #' @export
51 |
52 | circle_dat <- function(terms, genes){
53 |
54 |   colnames(terms) <- tolower(colnames(terms))
55 |   terms$genes <- toupper(terms$genes)
56 |   genes$ID <- toupper(genes$ID)
57 |   tgenes <- strsplit(as.vector(terms$genes), ', ')
58 |   if (length(tgenes[[1]]) == 1) tgenes <- strsplit(as.vector(terms$genes), ',')
59 |   count <- sapply(1:length(tgenes), function(x) length(tgenes[[x]]))
60 |   logFC <- sapply(unlist(tgenes), function(x) genes$logFC[match(x, genes$ID)])
61 |   if(class(logFC) == 'factor'){
62 |     logFC <- gsub(",", ".", gsub("\\.", "", logFC))
63 |     logFC <- as.numeric(logFC)
64 |   }
65 |   s <- 1; zsc <- c()
66 |   for (c in 1:length(count)){
67 |     value <- 0
68 |     e <- s + count[c] - 1
69 |     value <- sapply(logFC[s:e], function(x) ifelse(x > 0, 1, -1))
70 |     zsc <- c(zsc, sum(value) / sqrt(count[c]))
71 |     s <- e + 1
72 |   }
73 |   if (is.null(terms$id)){
74 |     df <- data.frame(category = rep(as.character(terms$category), count), term = rep(as.character(terms$term), count),
75 |                      count = rep(count, count), genes = as.character(unlist(tgenes)), logFC = logFC, adj_pval = rep(terms$adj_pval, count),
76 |                      zscore = rep(zsc, count), stringsAsFactors = FALSE)
77 |   }else{
78 |     df <- data.frame(category = rep(as.character(terms$category), count), ID = rep(as.character(terms$id), count), term = rep(as.character(terms$term), count),
79 |                      count = rep(count, count), genes = as.character(unlist(tgenes)), logFC = logFC, adj_pval = rep(terms$adj_pval, count),
80 |                      zscore = rep(zsc, count), stringsAsFactors = FALSE)
81 |   }
82 |   return(df)
83 | }
84 |
85 | #'
86 | #' @name chord_dat
87 | #' @title Creates a binary matrix.
88 | #' @description The function creates a matrix which represents the binary
89 | #'   relation (1= is related to, 0= is not related to) between selected genes
90 | #'   (row) and processes (column). The resulting matrix can be visualized with
91 | #'   the \code{\link{GOChord}} function.
92 | #' @param data A data frame with at least two columns: GO ID|term and genes.
93 | #'   Each row contains exactly one GO ID|term and one gene. A column containing
94 | #'   logFC values is optional and might be used if \code{genes} is missing.
95 | #' @param genes A character vector of selected genes OR data frame with columns
96 | #'   for gene ID and logFC.
97 | #' @details If more than one logFC value is available for each gene, only one
98 | #'   should be used to create the binary matrix. The other values have to be
99 | #'   added manually later.
100 | #' @param process A character vector of selected processes
101 | #' @return A binary matrix
102 | #' @seealso \code{\link{GOChord}}
103 | #' @examples
104 | #' \dontrun{
105 | #' # Load the included dataset
106 | #' data(EC)
107 | #'
108 | #' # Building the circ object
109 | #' circ <- circle_dat(EC$david, EC$genelist)
110 | #'
111 | #' # Building the binary matrix
112 | #' chord <- chord_dat(circ, EC$genes, EC$process)
113 | #'
114 | #' }
115 | #' @export
116 |
117 | chord_dat <- function(data, genes, process){
118 |   id <- term <- logFC <- BPprocess <- NULL
119 |
120 |   colnames(data) <- tolower(colnames(data))
121 |   if (missing(genes)){
122 |     if (is.null(data$logFC)){
123 |       genes <- as.character(unique(data$genes))
124 |     }else{
125 |       genes <- subset(data, !duplicated(genes), c(genes, logFC))
126 |     }
127 |   }else{
128 |     if(is.vector(genes)){
129 |       genes <- as.character(genes)
130 |     }else{
131 |       if(class(genes[, 2]) != 'numeric') genes[, 2] <- as.numeric(levels(genes[, 2]))[genes[, 2]]
132 |       genes[, 1] <- as.character(genes[, 1])
133 |       colnames(genes) <- c('genes', 'logFC')
134 |     }
135 |   }
136 |   if (missing(process)){
137 |     process <- as.character(unique(data$term))
138 |   }else{
139 |     if(class(process) != 'character') process <- as.character(process)
140 |   }
141 |   if (strsplit(process[1],':')[[1]][1] == 'GO'){
142 |     subData <- subset(data, id%in%process)
143 |     colnames(subData)[which(colnames(subData) == 'id')] <- 'BPprocess'
144 |   }else{
145 |     subData <- subset(data, term%in%process)
146 |     colnames(subData)[which(colnames(subData) == 'term')] <- 'BPprocess'
147 |   }
148 |
149 |   if(is.vector(genes)){
150 |     M <- genes[genes%in%unique(subData$genes)]
151 |     mat <- matrix(0, ncol = length(process), nrow = length(M))
152 |     rownames(mat) <- M
153 |     colnames(mat) <- process
154 |     for (p in 1:length(process)){
155 |       sub2 <- subset(subData, BPprocess == process[p])
156 |       for (g in 1:length(M)) mat[g, p] <- ifelse(M[g]%in%sub2$genes, 1, 0)
157 |     }
158 |   }else{
159 |     genes <- subset(genes, genes %in% unique(subData$genes))
160 |     N <- length(process) + 1
161 |     M <- genes[,1]
162 |     mat <- matrix(0, ncol = N, nrow = length(M))
163 |     rownames(mat) <- M
164 |     colnames(mat) <- c(process, 'logFC')
165 |     mat[,N] <- genes[,2]
166 |     for (p in 1:(N-1)){
167 |       sub2 <- subset(subData, BPprocess == process[p])
168 |       for (g in 1:length(M)) mat[g, p] <- ifelse(M[g]%in%sub2$genes, 1, 0)
169 |     }
170 |   }
171 |   return(mat)
172 | }
173 |
174 | #'
175 | #' @name reduce_overlap
176 | #' @title Eliminates redundant terms.
177 | #' @description The function eliminates all terms with a gene overlap >= set
178 | #'   threshold (\code{overlap}). The reduced dataset can be used to improve the
179 | #'   readability of plots such as \code{GOBubble} and \code{GOBar}
180 | #' @param data A data frame created with \code{circle_dat}.
181 | #' @param overlap Scalar indicating the threshold for gene overlap (default = 0.75).
182 | #' @details The function is currently very slow.
183 | #' @examples 184 | #' \dontrun{ 185 | #' # Load the included dataset 186 | #' data(EC) 187 | #' 188 | #' # Building the circ object 189 | #' circ <- circle_dat(EC$david, EC$genelist) 190 | #' 191 | #' # Eliminate redundant terms 192 | #' reduced_circ <- reduce_overlap(circ) 193 | #' 194 | #' # Plot reduced data 195 | #' GOBubble(reduced_circ) 196 | #' 197 | #' } 198 | #' @export 199 | 200 | reduce_overlap <- function(data, overlap){ 201 | term <- genes <- NULL 202 | if (missing(overlap)) overlap <- 0.75 203 | terms <- unique(data$term) 204 | FUN <- function(x,y) round(sum(x$genes %in% y$genes)/nrow(x), digits = 2) 205 | tmp <- matrix(0, ncol = length(terms), nrow = length(terms), dimnames = list(terms, terms)) 206 | for (row in 1:nrow(tmp)){ 207 | for (col in 1:ncol(tmp)){ 208 | tmp[row, col] <- FUN(subset(data, term == terms[row], genes), subset(data, term == terms[col], genes)) 209 | } 210 | } 211 | tmp[base::upper.tri(tmp)] <- 0 212 | for(col in 1:ncol(tmp)){ 213 | idx <- which(tmp[,col] >= overlap) 214 | sel_col <- idx[which(idx != col)] 215 | tmp[,sel_col] <- 0 216 | } 217 | sel_terms <- colnames(tmp)[colSums(tmp) != 0] 218 | dat <- subset(data, term %in% sel_terms) 219 | data <- dat[!duplicated(dat$term), ] 220 | return(data) 221 | } 222 | 223 | #' 224 | #' @name GOBubble 225 | #' @title Bubble plot. 226 | #' @description The function creates a bubble plot of the input \code{data}. The 227 | #' input \code{data} can be created with the help of the 228 | #' \code{\link{circle_dat}} function. 229 | #' @param data A data frame with columns for category, GO ID, term, adjusted 230 | #' p-value, z-score, count (number of genes) 231 | #' @param display A character vector.
Indicates whether it should be a single 232 | #' plot ('single') or a facet plot with panels for each category 233 | #' (default='single') 234 | #' @param title The title (on top) of the plot 235 | #' @param colour A character vector which defines the colour of the bubbles for 236 | #' each category 237 | #' @param labels Sets a threshold for the displayed labels. The threshold refers 238 | #' to the -log10(adjusted p-value) (default=5) 239 | #' @param ID If TRUE then labels are IDs else terms 240 | #' @param table.legend Defines whether a table of GO ID and GO term should be 241 | #' displayed on the right side of the plot or not (default = TRUE) 242 | #' @param table.col If TRUE then the table entries are coloured according to 243 | #' their category, if FALSE then entries are black 244 | #' @param bg.col Should only be used in case of a facet plot. If TRUE then the 245 | #' panel backgrounds are coloured according to the displayed category 246 | #' @details The x-axis of the plot represents the z-score. The negative 247 | #' logarithm of the adjusted p-value (corresponding to the significance of the 248 | #' term) is displayed on the y-axis. The area of the plotted circles is 249 | #' proportional to the number of genes assigned to the term. Each circle is 250 | #' coloured according to its category and labelled with either the ID or the 251 | #' term name. If static is set to FALSE the mouse hover effect will be enabled.
252 | #' @import ggplot2 253 | #' @import gridExtra 254 | #' @import graphics 255 | #' @examples 256 | #' \dontrun{ 257 | #' # Load the included dataset 258 | #' data(EC) 259 | #' 260 | #' # Building the circ object 261 | #' circ <- circle_dat(EC$david, EC$genelist) 262 | #' 263 | #' # Creating the bubble plot colouring the table entries according to the category 264 | #' GOBubble(circ, table.col = TRUE) 265 | #' 266 | #' # Creating the bubble plot displaying the term instead of the ID and without the table 267 | #' GOBubble(circ, ID = FALSE, table.legend = FALSE) 268 | #' 269 | #' # Faceting the plot 270 | #' GOBubble(circ, display = 'multiple') 271 | #' } 272 | #' @export 273 | GOBubble <- function(data, display, title, colour, labels, ID = TRUE, table.legend = TRUE, table.col = TRUE, bg.col = FALSE){ 274 | zscore <- adj_pval <- category <- count <- id <- term <- NULL 275 | if (missing(display)) display <- 'single' 276 | if (missing(title)) title <- '' 277 | if (missing(colour)) cols <- c("chartreuse4", "brown2", "cornflowerblue") else cols <- colour 278 | if (missing(labels)) labels <- 5 279 | if (bg.col && display == 'single') cat("Parameter bg.col will be ignored.
To use the parameter change display to 'multiple'") 280 | 281 | colnames(data) <- tolower(colnames(data)) 282 | if(!'count'%in%colnames(data)){ 283 | rang <- c(5, 5) 284 | data$count <- rep(1, dim(data)[1]) 285 | }else {rang <- c(1, 30)} 286 | data$adj_pval <- -log(data$adj_pval, 10) 287 | sub <- data[!duplicated(data$term), ] 288 | g <- ggplot(sub, aes(zscore, adj_pval, fill = category, size = count))+ 289 | labs(title = title, x = 'z-score', y = '-log (adj p-value)')+ 290 | geom_point(shape = 21, col = 'black', alpha = 1 / 2)+ 291 | geom_hline(yintercept = 1.3, col = 'orange')+ 292 | scale_size(range = rang, guide = 'none') 293 | if (!is.character(labels)) sub2 <- subset(sub, subset = sub$adj_pval >= labels) else sub2 <- subset(sub, sub$id%in%labels | sub$term%in%labels) 294 | if (display == 'single'){ 295 | g <- g + scale_fill_manual('Category', values = cols, labels = c('Biological Process', 'Cellular Component', 'Molecular Function'))+ 296 | theme(legend.position = 'bottom')+ 297 | annotate ("text", x = min(sub$zscore)+0.2, y = 1.4, label = "Threshold", colour = "orange", size = 4) 298 | if (ID) g <- g+ geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = id), size = 5) else g <- g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = term), size = 4) 299 | if (table.legend){ 300 | if (table.col) table <- draw_table(sub2, col = cols) else table <- draw_table(sub2) 301 | g <- g + theme(axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'), 302 | axis.title = element_text(size = 14, face = 'bold'), panel.background = element_blank(), panel.grid.minor = element_blank(), 303 | panel.grid.major = element_line(colour = 'grey80'), plot.background = element_blank()) 304 | graphics::par(mar = c(0.1, 0.1, 0.1, 0.1)) 305 | grid.arrange(g, table, ncol = 2) 306 | }else{ 307 | g + theme(axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), axis.ticks = 
element_line(colour = 'grey80'), 308 | axis.title = element_text(size = 14, face = 'bold'), panel.background = element_blank(), panel.grid.minor = element_blank(), 309 | panel.grid.major = element_line(colour = 'grey80'), plot.background = element_blank()) 310 | } 311 | }else{ 312 | if(bg.col){ 313 | dummy_col <- data.frame(category = c('BP', 'CC', 'MF'), adj_pval = sub$adj_pval[1:3], zscore = sub$zscore[1:3], size = 1:3, count = 1:3) 314 | g <- g + geom_rect(data = dummy_col, aes(fill = category), xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0.1)+ 315 | facet_grid(.~category, space = 'free_x', scales = 'free_x')+ 316 | scale_fill_manual(values = cols, guide ='none') 317 | }else{ 318 | g <- g + facet_grid(.~category, space = 'free_x', scales = 'free_x')+ 319 | scale_fill_manual(values = cols, guide ='none') 320 | } 321 | if (ID) { 322 | g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = id), size = 5) + 323 | theme(axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), 324 | axis.ticks = element_line(colour = 'grey80'), panel.border = element_rect(fill = 'transparent', colour = 'grey80'), 325 | panel.background = element_blank(), panel.grid = element_blank(), plot.background = element_blank()) 326 | }else{ 327 | g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = term), size = 5) + 328 | theme(axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), 329 | axis.ticks = element_line(colour = 'grey80'), panel.border = element_rect(fill = 'transparent', colour = 'grey80'), 330 | panel.background = element_blank(), panel.grid = element_blank(), plot.background = element_blank()) 331 | } 332 | } 333 | } 334 | 335 | #' 336 | #' @name GOBar 337 | #' @title Z-score coloured barplot. 
338 | #' @description Z-score coloured barplot of terms ordered either by 339 | #' z-score or by the negative logarithm of the adjusted p-value 340 | #' @param data A data frame containing at least the term ID and/or term, the 341 | #' adjusted p-value and the z-score. A possible input can be generated with 342 | #' the \code{circle_dat} function 343 | #' @param display A character vector indicating whether a single plot ('single') 344 | #' or a facet plot with panels for each category should be drawn 345 | #' (default='single') 346 | #' @param order.by.zscore Defines the order of the bars. If TRUE the bars are 347 | #' ordered according to the z-scores of the processes. Otherwise the bars are 348 | #' ordered by the negative logarithm of the adjusted p-value 349 | #' @param title The title of the plot 350 | #' @param zsc.col Character vector to define the colour scale for the z-score of 351 | #' the form c(high, midpoint, low) 352 | #' @details If \code{display} is used to facet the plot, the width of the panels 353 | #' will be proportional to the length of the x scale.
354 | #' @import ggplot2 355 | #' @import gridExtra 356 | #' @import stats 357 | #' @examples 358 | #' \dontrun{ 359 | #' # Load the included dataset 360 | #' data(EC) 361 | #' 362 | #' # Building the circ object 363 | #' circ <- circle_dat(EC$david, EC$genelist) 364 | #' 365 | #' # Creating the bar plot 366 | #' GOBar(circ) 367 | #' 368 | #' # Faceting the plot 369 | #' GOBar(circ, display = 'multiple') 370 | #' } 371 | #' @export 372 | 373 | GOBar <- function(data, display, order.by.zscore = TRUE, title, zsc.col){ 374 | id <- adj_pval <- zscore <- NULL 375 | if (missing(display)) display <- 'single' 376 | if (missing(title)) title <- '' 377 | if (missing(zsc.col)) zsc.col <- c('firebrick1', 'white', 'dodgerblue1') 378 | colnames(data) <- tolower(colnames(data)) 379 | data$adj_pval <- -log(data$adj_pval, 10) 380 | sub <- data[!duplicated(data$term), ] 381 | 382 | if (order.by.zscore) { 383 | sub <- sub[order(sub$zscore, decreasing = TRUE), ] 384 | leg <- theme(legend.position = 'bottom') 385 | g <- ggplot(sub, aes(x = factor(id, levels = stats::reorder(id, adj_pval)), y = adj_pval, fill = zscore)) + 386 | geom_bar(stat = 'identity', colour = 'black') + 387 | scale_fill_gradient2('z-score', space = 'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), 388 | breaks = c(min(sub$zscore), max(sub$zscore)), labels = c('decreasing', 'increasing')) + 389 | labs(title = title, x = '', y = '-log (adj p-value)') + 390 | leg 391 | }else{ 392 | sub <- sub[order(sub$adj_pval, decreasing = TRUE), ] 393 | leg <- theme(legend.justification = c(1, 1), legend.position = c(0.98, 0.995), legend.background = element_rect(fill = 'transparent'), 394 | legend.box = 'vertical', legend.direction = 'horizontal') 395 | g <- ggplot(sub, aes( x = factor(id, levels = stats::reorder(id, adj_pval)), y = zscore, fill = adj_pval)) + 396 | geom_bar(stat = 'identity', colour = 'black') + 397 | scale_fill_gradient2('Significance', space =
'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), breaks = c(min(sub$adj_pval), max(sub$adj_pval)), labels = c('low', 'high')) + 398 | labs(title = title, x = '', y = 'z-score') + 399 | leg 400 | } 401 | if (display == 'single'){ 402 | g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'), 403 | axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), panel.background = element_blank(), 404 | panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank()) 405 | }else{ 406 | g + facet_grid(.~category, space = 'free_x', scales = 'free_x')+ 407 | theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'), 408 | axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), panel.background = element_blank(), 409 | panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank()) 410 | } 411 | } 412 | 413 | #' 414 | #' @name GOCircle 415 | #' @title Circular visualization of the results of a functional analysis. 416 | #' @description The circular plot combines gene expression and gene-annotation 417 | #' enrichment data. A subset of terms is displayed like the \code{GOBar} plot 418 | #' in combination with a scatterplot of the gene expression data. The whole 419 | #' plot is drawn on a polar coordinate system to achieve the circular 420 | #' layout. The segments are labeled with the term ID.
421 | #' @param data A special data frame which should be the result of 422 | #' \code{circle_dat} 423 | #' @param title The title of the plot 424 | #' @param nsub A numeric or character vector. If it's numeric then the number 425 | #' defines how many processes are displayed (starting from the first row of 426 | #' \code{data}). If it's a character vector of processes then these processes 427 | #' are displayed. By default at most 10 processes are displayed 428 | #' @param rad1 The radius of the inner circle (default=2) 429 | #' @param rad2 The radius of the outer circle (default=3) 430 | #' @param table.legend Should a table be displayed or not? (default=TRUE) 431 | #' @param zsc.col Character vector to define the colour scale for the z-score of 432 | #' the form c(high, midpoint, low) 433 | #' @param lfc.col A character vector specifying the colour for up- and 434 | #' down-regulated genes 435 | #' @param label.size Size of the segment labels (default=5) 436 | #' @param label.fontface Font style of the segment labels (default='bold') 437 | #' @details The outer circle shows a scatter plot for each term of the logFC of 438 | #' the assigned genes. The colours can be changed with the argument 439 | #' \code{lfc.col}. 440 | #' 441 | #' The \code{nsub} argument needs a bit more explanation to be used wisely. First of 442 | #' all, it can be a numeric or a character vector. If it is a character vector 443 | #' then it contains the IDs or term descriptions of the displayed processes. If 444 | #' \code{nsub} is a numeric vector then the number defines how many terms are 445 | #' displayed. It starts with the first row of the input data frame.
446 | #' @import ggplot2 447 | #' @import gridExtra 448 | #' @import stats 449 | #' @import graphics 450 | #' @seealso \code{\link{circle_dat}}, \code{\link{GOBar}} 451 | #' @examples 452 | #' \dontrun{ 453 | #' # Load the included dataset 454 | #' data(EC) 455 | #' 456 | #' # Building the circ object 457 | #' circ <- circle_dat(EC$david, EC$genelist) 458 | #' 459 | #' # Creating the circular plot 460 | #' GOCircle(circ) 461 | #' 462 | #' # Creating the circular plot with a different colour scale for the logFC 463 | #' GOCircle(circ, lfc.col = c('purple', 'orange')) 464 | #' 465 | #' # Creating the circular plot with a different colour scale for the z-score 466 | #' GOCircle(circ, zsc.col = c('yellow', 'black', 'cyan')) 467 | #' 468 | #' # Creating the circular plot with different font style 469 | #' GOCircle(circ, label.size = 5, label.fontface = 'italic') 470 | #' } 471 | #' @export 472 | 473 | GOCircle <- function(data, title, nsub, rad1, rad2, table.legend = T, zsc.col, lfc.col, label.size, label.fontface){ 474 | xmax <- y1<- zscore <- y2 <- ID <- logx <- logy2 <- logy <- logFC <- NULL 475 | if (missing(title)) title <- '' 476 | if (missing(nsub)) if (dim(data)[1] > 10) nsub <- 10 else nsub <- dim(data)[1] 477 | if (missing(rad1)) rad1 <- 2 478 | if (missing(rad2)) rad2 <- 3 479 | if (missing(zsc.col)) zsc.col <- c('red', 'white', 'blue') 480 | if (missing(lfc.col)) lfc.col <- c('cornflowerblue', 'firebrick1') else lfc.col <- rev(lfc.col) 481 | if (missing(label.size)) label.size = 5 482 | if (missing(label.fontface)) label.fontface = 'bold' 483 | 484 | data$adj_pval <- -log(data$adj_pval, 10) 485 | suby <- data[!duplicated(data$term), ] 486 | if (is.numeric(nsub) == T){ 487 | suby <- suby[1:nsub, ] 488 | }else{ 489 | if (strsplit(nsub[1], ':')[[1]][1] == 'GO'){ 490 | suby <- suby[suby$ID%in%nsub, ] 491 | }else{ 492 | suby <- suby[suby$term%in%nsub, ] 493 | } 494 | nsub <- length(nsub)} 495 | N <- dim(suby)[1] 496 | r_pval <- round(range(suby$adj_pval), 0) + 
c(-2, 2) 497 | ymax <- c() 498 | for (i in 1:length(suby$adj_pval)){ 499 | val <- (suby$adj_pval[i] - r_pval[1]) / (r_pval[2] - r_pval[1]) 500 | ymax <- c(ymax, val)} 501 | df <- data.frame(x = seq(0, 10 - (10 / N), length = N), xmax = rep(10 / N - 0.2, N), y1 = rep(rad1, N), y2 = rep(rad2, N), ymax = ymax, zscore = suby$zscore, ID = suby$ID) 502 | scount <- data[!duplicated(data$term), which(colnames(data) == 'count')][1:nsub] 503 | idx_term <- which(!duplicated(data$term) == T) 504 | xm <- c(); logs <- c() 505 | for (sc in 1:length(scount)){ 506 | idx <- c(idx_term[sc], idx_term[sc] + scount[sc] -1) 507 | val <- stats::runif(scount[sc], df$x[sc] + 0.06, (df$x[sc] + df$xmax[sc] - 0.06)) 508 | xm <- c(xm, val) 509 | r_logFC <- round(range(data$logFC[idx[1]:idx[2]]), 0) + c(-1, 1) 510 | for (lfc in idx[1]:idx[2]){ 511 | val <- (data$logFC[lfc] - r_logFC[1]) / (r_logFC[2] - r_logFC[1]) 512 | logs <- c(logs, val)} 513 | } 514 | cols <- c() 515 | for (ys in 1:length(logs)) cols <- c(cols, ifelse(data$logFC[ys] > 0, 'upregulated', 'downregulated')) 516 | dfp <- data.frame(logx = xm, logy = logs, logFC = factor(cols), logy2 = rep(rad2, length(logs))) 517 | c <- ggplot()+ 518 | geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y1, ymax = y1 + ymax, fill = zscore), colour = 'black') + 519 | geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2, ymax = y2 + 1), fill = 'gray70') + 520 | geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.5, ymax = y2 + 0.5), colour = 'white') + 521 | geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.25, ymax = y2 + 0.25), colour = 'white') + 522 | geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.75, ymax = y2 + 0.75), colour = 'white') + 523 | geom_text(data = df, aes(x = x + (xmax / 2), y = y2 + 1.3, label = ID, angle = 360 - (x = x + (xmax / 2)) / (10 / 360)), size = label.size, fontface = label.fontface) + 524 | coord_polar() + 525 | labs(title = title) + 526 | ylim(1, rad2 + 
1.6) + 527 | xlim(0, 10) + 528 | theme_blank + 529 | scale_fill_gradient2('z-score', space = 'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), breaks = c(min(df$zscore), max(df$zscore)),labels = c('decreasing', 'increasing')) + 530 | theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal') + 531 | geom_point(data = dfp, aes(x = logx, y = logy2 + logy), pch = 21, fill = 'transparent', colour = 'black', size = 3)+ 532 | geom_point(data = dfp, aes(x = logx, y = logy2 + logy, colour = logFC), size = 2.5)+ 533 | scale_colour_manual(values = lfc.col, guide = guide_legend(title.position = "top", title.hjust = 0.5)) 534 | 535 | if (table.legend){ 536 | table <- draw_table(suby) 537 | graphics::par(mar = c(0.1, 0.1, 0.1, 0.1)) 538 | grid.arrange(c, table, ncol = 2) 539 | }else{ 540 | c + theme(plot.background = element_rect(fill = 'aliceblue'), panel.background = element_rect(fill = 'white')) 541 | } 542 | } 543 | -------------------------------------------------------------------------------- /R/GOHeat.R: -------------------------------------------------------------------------------- 1 | #' 2 | #' @name GOHeat 3 | #' @title Displays heatmap of the relationship between genes and terms. 4 | #' @description The GOHeat function generates a heatmap of the relationship 5 | #' between genes and terms. Biological processes are displayed in rows and 6 | #' genes in columns. In addition genes are clustered to highlight groups of 7 | #' genes with similar annotated functions. The input can be generated with the 8 | #' \code{\link{chord_dat}} function. 
9 | #' @param data A matrix representing the binary relation (1 = is related to, 0 = 10 | #' is not related to) between a set of genes (rows) and processes (columns) 11 | #' @param nlfc Defines the number of logFC columns (default = 0) 12 | #' @param fill.col Defines the color scale break points 13 | #' @details The heatmap has two modes which depend on the \code{nlfc} 14 | #' argument. If \code{nlfc = 0}, so no logFC values are available, the 15 | #' coloring encodes the overall number of processes the respective gene is 16 | #' assigned to. In case of \code{nlfc = 1} the color corresponds to the logFC 17 | #' of the gene. 18 | #' @import ggplot2 19 | #' @examples 20 | #' \dontrun{ 21 | #' # Load the included dataset 22 | #' data(EC) 23 | #' 24 | #' # Generate the circ object 25 | #' circ <- circle_dat(EC$david, EC$genelist) 26 | #' 27 | #' # Generate the chord object 28 | #' chord <- chord_dat(circ, EC$genes, EC$process) 29 | #' 30 | #' # Create the plot with user-defined colors 31 | #' GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green')) 32 | #' } 33 | #' @export 34 | #' 35 | 36 | GOHeat <- function(data, nlfc, fill.col){ 37 | x <- y <- z <- NULL 38 | if(missing(nlfc)) nlfc <- 0 39 | if(missing(fill.col)) fill.col <- c('firebrick', 'white', 'dodgerblue') 40 | 41 | distance <- dist(data) 42 | cluster <- hclust(distance) 43 | M <- dim(data)[2] 44 | nterm <- M - nlfc 45 | if(nlfc == 0){ 46 | s <- rowSums(data[,1:nterm]) 47 | tmp <- NULL 48 | for(r in 1:nrow(data)){ 49 | tmp <- c(tmp, as.numeric(gsub(1, s[r], data[r, 1:nterm]))) 50 | } 51 | }else{ 52 | tmp <- NULL 53 | for(r in 1:nrow(data)){ 54 | tmp <- c(tmp, as.numeric(gsub(1, data[r, (nterm + 1)], data[r, 1:nterm]))) 55 | } 56 | } 57 | df <- data.frame(x = rep(cluster$order, each = nterm), y = rep(colnames(data[,1:nterm]), length(rownames(data))), z = tmp, 58 | lab = rep(rownames(data), each = nterm)) 59 | df_o <- df[order(df$x),] 60 | 61 | g <-
ggplot() + 62 | geom_tile(data = df_o, aes(x = x, y = y, fill = z))+ 63 | scale_x_discrete(breaks = 1:length(unique(df_o$x)), labels = unique(df_o$lab)) + 64 | theme(axis.text.x = element_text(angle = 90, vjust = 0.5), axis.title.x=element_blank(), axis.title.y=element_blank(), 65 | axis.text.y = element_text(size = 14), panel.background=element_blank(), panel.grid.major=element_blank(), 66 | panel.grid.minor=element_blank()) 67 | if(nlfc == 0){ 68 | g + scale_fill_gradient2('Count', space = 'Lab', low=fill.col[2], mid=fill.col[3], high=fill.col[1]) 69 | }else{ 70 | g + scale_fill_gradient2('logFC', space = 'Lab', low=fill.col[3], mid=fill.col[2], high=fill.col[1]) 71 | } 72 | } 73 | 74 | 75 | 76 | -------------------------------------------------------------------------------- /R/GOVenn.R: -------------------------------------------------------------------------------- 1 | #' 2 | #' @name GOVenn 3 | #' @title Venn diagram of differentially expressed genes. 4 | #' @description The function compares lists of differentially expressed genes 5 | #' and illustrates possible relations. Additionally, it represents the variety 6 | #' of gene expression patterns within the intersection in small pie charts 7 | #' with three segments. Shown clockwise are the numbers of commonly 8 | #' up-regulated, commonly down-regulated and contra-regulated genes. 9 | #' @param data1 A data frame consisting of two columns: ID, logFC 10 | #' @param data2 A data frame consisting of two columns: ID, logFC 11 | #' @param data3 A data frame consisting of two columns: ID, logFC 12 | #' @param title The title of the plot 13 | #' @param label A character vector to define the legend keys 14 | #' @param lfc.col A character vector determining the background colors of the 15 | #' pie segments representing up- and down-regulated genes 16 | #' @param circle.col A character vector to assign clockwise colors for the 17 | #' circles 18 | #' @param plot If TRUE only the venn diagram is plotted.
Otherwise the function 19 | #' returns a list with two items: the actual plot and a list containing the 20 | #' overlap entries (default = TRUE) 21 | #' @details The \code{plot} argument can be used to adjust the amount of 22 | #' information that is returned by calling the function. If you are only 23 | #' interested in the actual plot of the venn diagram, \code{plot} should be 24 | #' set to TRUE. Sometimes you also want to know the elements of the 25 | #' intersections. In this case \code{plot} should be set to FALSE and the 26 | #' function call will return a list of two items. The first item, which can be 27 | #' accessed by $plot, contains the plotting information. Additionally, a list 28 | #' ($table) will be returned containing the elements of the various overlaps. 29 | #' @import ggplot2 30 | #' @examples 31 | #' \dontrun{ 32 | #' # Load the included dataset 33 | #' data(EC) 34 | #' 35 | #' # Generating the circ object 36 | #' circ <- circle_dat(EC$david, EC$genelist) 37 | #' 38 | #' # Selecting terms of interest 39 | #' l1 <- subset(circ, term == 'heart development', c(genes, logFC)) 40 | #' l2 <- subset(circ, term == 'plasma membrane', c(genes, logFC)) 41 | #' l3 <- subset(circ, term == 'tissue morphogenesis', c(genes, logFC)) 42 | #' 43 | #' GOVenn(l1, l2, l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis')) 44 | #' } 45 | #' @export 46 | 47 | GOVenn<-function(data1, data2, data3, title, label, lfc.col, circle.col, plot=TRUE){ 48 | id <- NULL 49 | if (missing(label)) label<-c('List1','List2','List3') 50 | if (missing(lfc.col)) lfc.col<-c('firebrick1','gold','cornflowerblue') 51 | if (missing(circle.col)) circle.col<-c('brown1','chartreuse3','cornflowerblue') 52 | if (missing(title)) title<-'' 53 | if (!missing(data3)) { 54 | three<-TRUE 55 | overlap<-get_overlap(data1,data2,data3) 56 | venn_df<-overlap$venn_df 57 | table<-overlap$table 58 | }else{ 59 | three<-FALSE 60 | overlap<-get_overlap2(data1,data2) 61 | venn_df<-overlap$venn_df 62 | table<-overlap$table 63 | }
64 | 65 | ### calc Venn ### 66 | if (three){ 67 | center<-data.frame(x=c(0.4311,0.4308,0.6380),y=c(0.6197,0.3801,0.5001),diameter=c(0.4483,0.4483,0.4483)) 68 | outerCircle<-data.frame(x=numeric(),y=numeric(),id=numeric()) 69 | for (var in 1:3){ 70 | dat <- circleFun(c(center$x[var],center$y[var]),center$diameter[var],npoints = 100) 71 | outerCircle<-rbind(outerCircle,dat) 72 | } 73 | outerCircle$id<-rep(c(label[1],label[2],label[3]),each=100) 74 | outerCircle$id<-factor(outerCircle$id, levels=c(label[1],label[2],label[3])) 75 | }else{ 76 | center<-data.frame(x=c(0.33,0.6699),y=c(0.5,0.5),diameter=c(0.6180,0.6180)) 77 | outerCircle<-data.frame(x=numeric(),y=numeric(),id=numeric()) 78 | for (var in 1:2){ 79 | dat <- circleFun(c(center$x[var],center$y[var]),center$diameter[var],npoints = 100) 80 | outerCircle<-rbind(outerCircle,dat) 81 | } 82 | outerCircle$id<-rep(c(label[1],label[2]),each=100) 83 | outerCircle$id<-factor(outerCircle$id, levels=c(label[1],label[2])) 84 | } 85 | 86 | ### calc single pies ### 87 | if (three){ 88 | Pie<-data.frame(x=numeric(),y=numeric(),id=numeric()) 89 | dat <- circleFun(c(center$x[1],max(subset(outerCircle,id==label[1])$y)-0.05),0.1,npoints = 100) 90 | Pie<-rbind(Pie,dat) 91 | dat <- circleFun(c(center$x[2],min(subset(outerCircle,id==label[2])$y)+0.05),0.1,npoints = 100) 92 | Pie<-rbind(Pie,dat) 93 | dat <- circleFun(c(max(subset(outerCircle,id==label[3])$x)-0.05,center$y[3]),0.1,npoints = 100) 94 | Pie<-rbind(Pie,dat) 95 | Pie$id<-rep(1:3,each=100) 96 | UP<-Pie[c(1:50,100:150,200:250),] 97 | Down<-Pie[c(50:100,150:200,250:300),] 98 | }else{ 99 | Pie<-data.frame(x=numeric(),y=numeric(),id=numeric()) 100 | dat <- circleFun(c(min(subset(outerCircle,id==label[1])$x)+0.05,center$y[1]),0.1,npoints = 100) 101 | Pie<-rbind(Pie,dat) 102 | dat <- circleFun(c(max(subset(outerCircle,id==label[2])$x)-0.05,center$y[2]),0.1,npoints = 100) 103 | Pie<-rbind(Pie,dat) 104 | Pie$id<-rep(1:2,each=100) 105 | UP<-Pie[c(1:50,100:150),] 106 | 
Down<-Pie[c(50:100,150:200),] 107 | } 108 | 109 | ### calc single pie text ### 110 | if (three){ 111 | x<-c();y<-c() 112 | for (i in unique(Pie$id)){ 113 | x<-c(x,rep((min(subset(Pie,id==i)$x)+max(subset(Pie,id==i)$x))/2,2)) 114 | y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2+0.02) 115 | y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2-0.02) 116 | } 117 | pieText<-data.frame(x=x,y=y,label=c(venn_df$UP[1],venn_df$DOWN[1],venn_df$UP[2],venn_df$DOWN[2],venn_df$UP[3],venn_df$DOWN[3])) 118 | }else{ 119 | x<-c();y<-c() 120 | for (i in unique(Pie$id)){ 121 | x<-c(x,rep((min(subset(Pie,id==i)$x)+max(subset(Pie,id==i)$x))/2,2)) 122 | y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2+0.02) 123 | y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2-0.02) 124 | } 125 | pieText<-data.frame(x=x,y=y,label=c(venn_df$UP[1],venn_df$DOWN[1],venn_df$UP[2],venn_df$DOWN[2])) 126 | } 127 | 128 | ### calc overlap pies ### 129 | if (three){ 130 | smc<-data.frame(x=c(0.6,0.59,0.31,0.5),y=c(0.66,0.34,0.5,0.5)) 131 | PieOv<-data.frame(x=numeric(),y=numeric()) 132 | PieOv<-rbind(PieOv,circleFun(c(smc$x[1],smc$y[1]),0.06,npoints = 100)) 133 | PieOv<-rbind(PieOv,circleFun(c(smc$x[2],smc$y[2]),0.06,npoints = 100)) 134 | PieOv<-rbind(PieOv,circleFun(c(smc$x[3],smc$y[3]),0.06,npoints = 100)) 135 | PieOv<-rbind(PieOv,circleFun(c(smc$x[4],smc$y[4]),0.06,npoints = 100)) 136 | PieOv$id<-rep(1:4,each=100) 137 | smc$id<-1:4 138 | UPOv<-rbind(smc[1,],PieOv[1:33,],smc[1,],smc[2,],PieOv[100:133,],smc[2,],smc[3,],PieOv[200:233,],smc[3,],smc[4,],PieOv[300:333,],smc[4,]) 139 | Change<-rbind(smc[1,],PieOv[33:66,],smc[1,],smc[2,],PieOv[133:166,],smc[2,],smc[3,],PieOv[233:266,],smc[3,],smc[4,],PieOv[333:366,],smc[4,]) 140 | DownOv<-rbind(smc[1,],PieOv[66:100,],smc[1,],smc[2,],PieOv[166:200,],smc[2,],smc[3,],PieOv[266:300,],smc[3,],smc[4,],PieOv[366:400,],smc[4,]) 141 | }else{ 142 | PieOv<-data.frame(x=numeric(),y=numeric(),id=numeric()) 143 | 
PieOv<-rbind(PieOv,circleFun(c(0.5,0.5),0.08,npoints = 100)) 144 | PieOv$id<-rep(1,100) 145 | center<-data.frame(x=0.5, y=0.5, id=1) 146 | UPOv<-rbind(center[1,],PieOv[1:33,]) 147 | Change<-rbind(center[1,],PieOv[33:66,]) 148 | DownOv<-rbind(center[1,],PieOv[66:100,]) 149 | } 150 | 151 | ### calc overlap pie text ### 152 | if (three){ 153 | x<-c();y<-c() 154 | for (i in unique(PieOv$id)){ 155 | x<-c(x,subset(UPOv,id==i)$x[1]+0.0115,subset(DownOv,id==i)$x[1]-0.018,subset(Change,id==i)$x[1]+0.01) 156 | y<-c(y,subset(UPOv,id==i)$y[1]+0.01,subset(DownOv,id==i)$y[1],subset(Change,id==i)$y[1]-0.013) 157 | } 158 | small.pieT<-data.frame(x=x,y=y,label=c(venn_df$UP[5],venn_df$Change[5],venn_df$DOWN[5],venn_df$UP[6],venn_df$Change[6],venn_df$DOWN[6],venn_df$UP[4],venn_df$Change[4],venn_df$DOWN[4],venn_df$UP[7],venn_df$Change[7],venn_df$DOWN[7])) 159 | }else{ 160 | x<-c(subset(UPOv,id==1)$x[1]+0.015,subset(DownOv,id==1)$x[1]-0.018,subset(Change,id==1)$x[1]+0.01) 161 | y<-c(subset(UPOv,id==1)$y[1]+0.015,subset(DownOv,id==1)$y[1],subset(Change,id==1)$y[1]-0.013) 162 | small.pieT<-data.frame(x=x,y=y,label=c(venn_df$UP[3],venn_df$Change[3],venn_df$DOWN[3])) 163 | } 164 | 165 | g<- ggplot()+ 166 | geom_polygon(data=outerCircle, aes(x,y, group=id, fill=id) ,alpha=0.5,color='black')+ 167 | scale_fill_manual(values=circle.col)+ 168 | guides(fill=guide_legend(title=''))+ 169 | geom_polygon(data=UP, aes(x,y,group=id),fill=lfc.col[1],color='white')+ 170 | geom_polygon(data=Down, aes(x,y,group=id),fill=lfc.col[3],color='white')+ 171 | geom_text(data=pieText, aes(x=x,y=y,label=label),size=5)+ 172 | geom_polygon(data=UPOv, aes(x,y,group=id),fill=lfc.col[1],color='white')+ 173 | geom_polygon(data=DownOv, aes(x,y,group=id),fill=lfc.col[3],color='white')+ 174 | geom_polygon(data=Change, aes(x,y,group=id),fill=lfc.col[2],color='white')+ 175 | geom_text(data=small.pieT,aes(x=x,y=y,label=label),size=4)+ 176 | theme_blank+ 177 | labs(title=title) 178 | 179 | if (plot) return(g) else 
return(list(plot=g,table=table)) 180 | } 181 | 182 | 183 | 184 | -------------------------------------------------------------------------------- /R/Helper.R: -------------------------------------------------------------------------------- 1 | ############## 2 | # In general # 3 | ############## 4 | 5 | # Theme blank 6 | theme_blank <- theme(axis.line = element_blank(), axis.text.x = element_blank(), 7 | axis.text.y = element_blank(), axis.ticks = element_blank(), axis.title.x = element_blank(), 8 | axis.title.y = element_blank(), panel.background = element_blank(), panel.border = element_blank(), 9 | panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank()) 10 | 11 | # Draw adjacent table for GOBubble and GOCircle 12 | draw_table <- function(data, col){ 13 | id <- term <- NULL 14 | colnames(data) <- tolower(colnames(data)) 15 | if (missing(col)){ 16 | tt1 <- ttheme_default() 17 | }else{ 18 | text.col <- c(rep(col[1], sum(data$category == 'BP')), rep(col[2], sum(data$category == 'CC')), rep(col[3], sum(data$category == 'MF'))) 19 | tt1 <- ttheme_minimal( 20 | core = list(bg_params = list(fill = text.col, col=NA, alpha= 1/3)), 21 | colhead = list(fg_params = list(col = "black"))) 22 | } 23 | table <- tableGrob(subset(data, select = c(id, term)), cols = c('ID', 'Description'), rows = NULL, theme = tt1) 24 | return(table) 25 | } 26 | 27 | ########### 28 | # GOChord # 29 | ########### 30 | 31 | # Bezier function for drawing ribbons 32 | bezier <- function(data, process.col){ 33 | x <- c() 34 | y <- c() 35 | Id <- c() 36 | sequ <- seq(0, 1, by = 0.01) 37 | N <- dim(data)[1] 38 | sN <- seq(1, N, by = 2) 39 | if (process.col[1] == '') col_rain <- grDevices::rainbow(N) else col_rain <- process.col 40 | for (n in sN){ 41 | xval <- c(); xval2 <- c(); yval <- c(); yval2 <- c() 42 | for (t in sequ){ 43 | xva <- (1 - t) * (1 - t) * data$x.start[n] + t * t * data$x.end[n] 44 | xval <- c(xval, xva) 45 | xva2 <- (1 - t) * (1 - t) * 
data$x.start[n + 1] + t * t * data$x.end[n + 1] 46 | xval2 <- c(xval2, xva2) 47 | yva <- (1 - t) * (1 - t) * data$y.start[n] + t * t * data$y.end[n] 48 | yval <- c(yval, yva) 49 | yva2 <- (1 - t) * (1 - t) * data$y.start[n + 1] + t * t * data$y.end[n + 1] 50 | yval2 <- c(yval2, yva2) 51 | } 52 | x <- c(x, xval, rev(xval2)) 53 | y <- c(y, yval, rev(yval2)) 54 | Id <- c(Id, rep(n, 2 * length(sequ))) 55 | } 56 | df <- data.frame(lx = x, ly = y, ID = Id) 57 | return(df) 58 | } 59 | 60 | # Check function for GOChord argument 'limit' 61 | check_chord <- function(mat, limit){ 62 | 63 | if(all(colSums(mat) >= limit[2]) & all(rowSums(mat) >= limit[1])) return(mat) 64 | 65 | tmp <- mat[(rowSums(mat) >= limit[1]),] 66 | mat <- tmp[,(colSums(tmp) >= limit[2])] 67 | 68 | mat <- check_chord(mat, limit) 69 | return(mat) 70 | } 71 | 72 | ########## 73 | # GOVenn # 74 | ########## 75 | 76 | # Calculate points to draw a circle 77 | circleFun <- function(center = c(0,0),diameter = 1, npoints = 100){ 78 | r = diameter / 2 79 | tt <- seq(0,2*pi,length.out = npoints) 80 | xx <- center[1] + r * cos(tt) 81 | yy <- center[2] + r * sin(tt) 82 | return(data.frame(x = xx, y = yy)) 83 | } 84 | 85 | # Calculate overlap for three lists 86 | get_overlap<-function(A,B,C){ 87 | colnames(A)<-c('ID','logFC') 88 | colnames(B)<-c('ID','logFC') 89 | colnames(C)<-c('ID','logFC') 90 | UP<-NULL;DOWN<-NULL;Change<-NULL 91 | if (class(A$logFC)!='numeric'){ 92 | A$logFC<-gsub(",", ".", gsub("\\.", "", A$logFC)) 93 | A$Trend<-sapply(as.numeric(A$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 94 | }else{ A$Trend<-sapply(A$logFC, function(x) ifelse(x > 0,'UP','DOWN'))} 95 | if (class(B$logFC)!='numeric'){ 96 | B$logFC<-gsub(",", ".", gsub("\\.", "", B$logFC)) 97 | B$Trend<-sapply(as.numeric(B$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 98 | }else{ B$Trend<-sapply(B$logFC, function(x) ifelse(x > 0,'UP','DOWN'))} 99 | if (class(C$logFC)!='numeric'){ 100 | C$logFC<-gsub(",", ".", gsub("\\.", "", C$logFC)) 101 | 
C$Trend<-sapply(as.numeric(C$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 102 | }else{ C$Trend<-sapply(C$logFC, function(x) ifelse(x > 0,'UP','DOWN'))} 103 | if (sum(((A$ID%in%B$ID)==T)==T)==0){ 104 | AB<-data.frame() 105 | }else{ 106 | AB<-A[(A$ID%in%B$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))] 107 | BA<-B[(B$ID%in%A$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))] 108 | AB<-merge(AB,BA,by="ID") 109 | rownames(AB)<-AB$ID 110 | AB<-AB[,-1] 111 | } 112 | if (sum(((A$ID%in%C$ID)==T)==T)==0){ 113 | AC<-data.frame() 114 | }else{ 115 | AC<-A[(A$ID%in%C$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))] 116 | CA<-C[(C$ID%in%A$ID)==T,which(colnames(C)%in%c('ID','logFC','Trend'))] 117 | AC<-merge(AC,CA,by="ID") 118 | rownames(AC)<-AC$ID 119 | AC<-AC[,-1] 120 | } 121 | if (sum(((B$ID%in%C$ID)==T)==T)==0){ 122 | BC<-data.frame() 123 | }else{ 124 | BC<-B[(B$ID%in%C$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))] 125 | CB<-C[(C$ID%in%B$ID)==T,which(colnames(C)%in%c('ID','logFC','Trend'))] 126 | BC<-merge(BC,CB,by="ID") 127 | rownames(BC)<-BC$ID 128 | BC<-BC[,-1] 129 | } 130 | if (sum(((A$ID%in%B$ID)==T & (A$ID%in%C$ID)==T))==0){ 131 | ABC<-data.frame() 132 | }else{ 133 | ABC<-A[((A$ID%in%B$ID)==T & (A$ID%in%C$ID)==T),which(colnames(A)%in%c('ID','logFC','Trend'))] 134 | BAC<-B[((B$ID%in%A$ID)==T & (B$ID%in%C$ID)==T),which(colnames(B)%in%c('ID','logFC','Trend'))] 135 | CAB<-C[((C$ID%in%A$ID)==T & (C$ID%in%B$ID)==T),which(colnames(C)%in%c('ID','logFC','Trend'))] 136 | ABC<-merge(ABC,BAC,by='ID') 137 | ABC<-merge(ABC,CAB,by='ID') 138 | rownames(ABC)<-ABC$ID 139 | ABC<-ABC[,-1] 140 | } 141 | A_only<-A[((A$ID%in%B$ID)==F & (A$ID%in%C$ID)==F),which(colnames(A)%in%c('ID','logFC','Trend'))] 142 | rownames(A_only)<-A_only$ID 143 | A_only<-A_only[,-1] 144 | B_only<-B[((B$ID%in%A$ID)==F & (B$ID%in%C$ID)==F),which(colnames(A)%in%c('ID','logFC','Trend'))] 145 | rownames(B_only)<-B_only$ID 146 | B_only<-B_only[,-1] 147 | C_only<-C[((C$ID%in%A$ID)==F & 
(C$ID%in%B$ID)==F),which(colnames(C)%in%c('ID','logFC','Trend'))] 148 | rownames(C_only)<-C_only$ID 149 | C_only<-C_only[,-1] 150 | UP<-c(UP,sum(A_only$Trend=='UP'));DOWN<-c(DOWN,sum(A_only$Trend=='DOWN'));Change<-c(Change,sum(A_only$Trend=='Change')) 151 | UP<-c(UP,sum(B_only$Trend=='UP'));DOWN<-c(DOWN,sum(B_only$Trend=='DOWN'));Change<-c(Change,sum(B_only$Trend=='Change')) 152 | UP<-c(UP,sum(C_only$Trend=='UP'));DOWN<-c(DOWN,sum(C_only$Trend=='DOWN'));Change<-c(Change,sum(C_only$Trend=='Change')) 153 | if (dim(AB)[1]==0){ 154 | OvAB<-data.frame() 155 | UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0) 156 | }else{ 157 | tmp<-NULL 158 | for (t in 1:dim(AB)[1]) tmp<-c(tmp,ifelse(AB$Trend.x[t]==AB$Trend.y[t],AB$Trend.x[t],'Change')) 159 | OvAB<-data.frame(logFC_A=AB$logFC.x,logFC_B=AB$logFC.y,Trend=tmp) 160 | rownames(OvAB)<-rownames(AB) 161 | AB<-OvAB[order(OvAB$Trend),] 162 | UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change')) 163 | } 164 | if (dim(AC)[1]==0){ 165 | OvAC<-data.frame() 166 | UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0) 167 | }else{ 168 | tmp<-NULL 169 | for (t in 1:dim(AC)[1]) tmp<-c(tmp,ifelse(AC$Trend.x[t]==AC$Trend.y[t],AC$Trend.x[t],'Change')) 170 | OvAC<-data.frame(logFC_A=AC$logFC.x,logFC_C=AC$logFC.y,Trend=tmp) 171 | rownames(OvAC)<-rownames(AC) 172 | AC<-OvAC[order(OvAC$Trend),] 173 | UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change')) 174 | } 175 | if (dim(BC)[1]==0){ 176 | OvBC<-data.frame() 177 | UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0) 178 | }else{ 179 | tmp<-NULL 180 | for (t in 1:dim(BC)[1]) tmp<-c(tmp,ifelse(BC$Trend.x[t]==BC$Trend.y[t],BC$Trend.x[t],'Change')) 181 | OvBC<-data.frame(logFC_B=BC$logFC.x,logFC_C=BC$logFC.y,Trend=tmp) 182 | rownames(OvBC)<-rownames(BC) 183 | BC<-OvBC[order(OvBC$Trend),] 184 | UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change')) 185 | } 186 | if (dim(ABC)[1]==0){ 187 | 
OvABC<-data.frame() 188 | UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0) 189 | }else{ 190 | tmp<-NULL 191 | for (t in 1:dim(ABC)[1]) tmp<-c(tmp,ifelse(((ABC$Trend.x[t]==ABC$Trend.y[t]) & (ABC$Trend.x[t]==ABC$Trend[t])),ABC$Trend.x[t],'Change')) 192 | OvABC<-data.frame(logFC_A=ABC$logFC.x,logFC_B=ABC$logFC.y,logFC_C=ABC$logFC,Trend=tmp) 193 | rownames(OvABC)<-rownames(ABC) 194 | ABC<-OvABC[order(OvABC$Trend),] 195 | UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change')) 196 | } 197 | counts<-data.frame(Contrast=c('A_only','B_only','C_only','AB','AC','BC','ABC'),Count=c(dim(A_only)[1],dim(B_only)[1],dim(C_only)[1],dim(AB)[1],dim(AC)[1],dim(BC)[1],dim(ABC)[1]),UP=UP,DOWN=DOWN,Change=Change) 198 | venn<-list(A_only=A_only,B_only=B_only,C_only=C_only,AB=AB,BC=BC,AC=AC,ABC=ABC) 199 | return(list(venn_df=counts,table=venn)) 200 | } 201 | 202 | # Calculate overlap for two lists 203 | get_overlap2<-function(A,B){ 204 | colnames(A)<-c('ID','logFC') 205 | colnames(B)<-c('ID','logFC') 206 | UP<-NULL;DOWN<-NULL;Change<-NULL 207 | if (class(A$logFC)!='numeric'){ 208 | A$logFC<-gsub(",", ".", gsub("\\.", "", A$logFC)) 209 | A$Trend<-sapply(as.numeric(A$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 210 | }else{ A$Trend<-sapply(A$logFC, function(x) ifelse(x > 0,'UP','DOWN'))} 211 | if (class(B$logFC)!='numeric'){ 212 | B$logFC<-gsub(",", ".", gsub("\\.", "", B$logFC)) 213 | B$Trend<-sapply(as.numeric(B$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 214 | }else{ B$Trend<-sapply(B$logFC, function(x) ifelse(x > 0,'UP','DOWN'))} 215 | AB<-A[(A$ID%in%B$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))] 216 | BA<-B[(B$ID%in%A$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))] 217 | A_only<-A[(A$ID%in%B$ID)==F,which(colnames(A)%in%c('ID','logFC','Trend'))] 218 | B_only<-B[(B$ID%in%A$ID)==F,which(colnames(B)%in%c('ID','logFC','Trend'))] 219 | AB<-merge(AB,BA,by='ID') 220 | 
UP<-c(UP,sum(A_only$Trend=='UP'));DOWN<-c(DOWN,sum(A_only$Trend=='DOWN'));Change<-c(Change,sum(A_only$Trend=='Change')) 221 | UP<-c(UP,sum(B_only$Trend=='UP'));DOWN<-c(DOWN,sum(B_only$Trend=='DOWN'));Change<-c(Change,sum(B_only$Trend=='Change')) 222 | rownames(A_only)<-A_only$ID 223 | A_only<-A_only[,-1] 224 | A_only<-A_only[order(A_only$Trend),] 225 | rownames(B_only)<-B_only$ID 226 | B_only<-B_only[,-1] 227 | B_only<-B_only[order(B_only$Trend),] 228 | tmp<-NULL 229 | for (t in seq_len(dim(AB)[1])) tmp<-c(tmp,ifelse(AB$Trend.x[t]==AB$Trend.y[t],AB$Trend.x[t],'Change')) 230 | OvAB<-data.frame(logFC_A=AB$logFC.x,logFC_B=AB$logFC.y,Trend=tmp) 231 | rownames(OvAB)<-AB$ID 232 | AB<-OvAB[order(OvAB$Trend),] 233 | UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change')) 234 | counts<-data.frame(Contrast=c('A_only','B_only','AB'),Count=c(dim(A_only)[1],dim(B_only)[1],dim(AB)[1]),UP=UP,DOWN=DOWN,Change=Change) 235 | venn<-list(A_only=A_only,B_only=B_only,AB=AB) 236 | return(list(venn_df=counts,table=venn,dim=c(dim(A)[1],dim(B)[1]))) 237 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # GOplot 2 | 3 | Despite the plethora of methods available for the functional analysis of omics data, obtaining a comprehensive yet detailed understanding of the results remains challenging. GOplot takes the output of any general enrichment analysis and generates plots at different levels of detail: from a general overview to identify the most enriched categories (bar plot, bubble plot) to a more detailed view displaying different types of information for molecules in a given set of categories (circle plot, chord plot, cluster plot). The package provides a deeper insight into omics data and allows scientists to generate insightful plots with only a few lines of code to easily communicate the findings. 
4 | 5 | ## Installation 6 | 7 | GOplot is available via CRAN: http://cran.r-project.org/web/packages/GOplot 8 | 9 | * the latest released version: `install.packages("GOplot")` 10 | * the latest development version: `install_github("wencke/wencke.github.io")` 11 | 12 | ## Available functions 13 | 14 | For preprocessing: circle_dat(), chord_dat() and reduce_overlap() 15 | 16 | For plotting: GOBubble(), GOBar(), GOChord(), GOCluster(), GOCircle(), GOVenn(), GOHeat() 17 | 18 | A manual can be found on the website https://wencke.github.io/ 19 | -------------------------------------------------------------------------------- /build/vignette.rds: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/build/vignette.rds -------------------------------------------------------------------------------- /data/EC.rda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/data/EC.rda -------------------------------------------------------------------------------- /inst/CITATION: -------------------------------------------------------------------------------- 1 | citHeader("To cite GOplot in publications use:") 2 | 3 | citEntry(entry = "Article", 4 | title = "GOplot: an R package for visually combining expression data with functional analysis", 5 | author = personList(as.person("Wencke Walter"), as.person("Fatima Sanchez-Cabo"), 6 | as.person("Mercedes Ricote")), 7 | journal = "Bioinformatics", 8 | year = "2015", 9 | 10 | textVersion = paste("Walter, Wencke, Fatima Sanchez-Cabo, and Mercedes Ricote.", 11 | "GOplot: an R package for visually combining expression data with functional analysis.", 12 | "Bioinformatics (2015): btv300.") 13 | ) 14 | -------------------------------------------------------------------------------- /inst/doc/GOplot_vignette.R: 
-------------------------------------------------------------------------------- 1 | ## ----table1, echo = FALSE, results = 'asis'------------------------------ 2 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7')) 3 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)')) 4 | 5 | ## ----glimpse, warning = FALSE, message = FALSE--------------------------- 6 | library(GOplot) 7 | # Load the dataset 8 | data(EC) 9 | # Get a glimpse of the data format of the results of the functional analysis... 
10 | head(EC$david) 11 | # ...and of the data frame of selected genes 12 | head(EC$genelist) 13 | 14 | ## ----circ_object, warning = FALSE, message = FALSE----------------------- 15 | # Generate the plotting object 16 | circ <- circle_dat(EC$david, EC$genelist) 17 | 18 | ## ----GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6---- 19 | # Generate a simple barplot 20 | GOBar(subset(circ, category == 'BP')) 21 | 22 | ## ----GOBar2, eval = FALSE, warning = FALSE, message = FALSE-------------- 23 | # # Facet the barplot according to the categories of the terms 24 | # GOBar(circ, display = 'multiple') 25 | 26 | ## ----GOBar3, eval = FALSE, warning = FALSE, message = FALSE-------------- 27 | # # Facet the barplot, add a title and change the colour scale for the z-score 28 | # GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan')) 29 | 30 | ## ----GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'------ 31 | # Generate the bubble plot with a label threshold of 3 32 | GOBubble(circ, labels = 3) 33 | 34 | ## ----GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'------ 35 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold 36 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3) 37 | 38 | ## ----GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'------ 39 | # Colour the background according to the category 40 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = T, labels = 3) 41 | 42 | ## ----GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE---- 43 | # # Reduce redundant terms with a gene overlap >= 0.75... 
44 | # reduced_circ <- reduce_overlap(circ, overlap = 0.75) 45 | # # ...and plot it 46 | # GOBubble(reduced_circ, labels = 2.8) 47 | 48 | ## ----GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'------ 49 | # Generate a circular visualization of the results of gene- annotation enrichment analysis 50 | GOCircle(circ) 51 | 52 | ## ----GOCircle2, eval = FALSE--------------------------------------------- 53 | # # Generate a circular visualization of selected terms 54 | # IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047') 55 | # GOCircle(circ, nsub = IDs) 56 | 57 | ## ----GOCircle3, eval = FALSE--------------------------------------------- 58 | # # Generate a circular visualization for 10 terms 59 | # GOCircle(circ, nsub = 10) 60 | 61 | ## ----GOChord1, warning = FALSE, message = FALSE-------------------------- 62 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 63 | # sample contains the data frame of selected genes and their logFC. Have a look... 
64 | head(EC$genes) 65 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process) 66 | EC$process 67 | # Now it is time to generate the binary matrix 68 | chord <- chord_dat(circ, EC$genes, EC$process) 69 | head(chord) 70 | 71 | ## ----GOChord2, eval=FALSE, warning = FALSE, message = FALSE-------------- 72 | # # Generate the matrix with a list of selected genes 73 | # chord <- chord_dat(data = circ, genes = EC$genes) 74 | # # Generate the matrix with selected processes 75 | # chord <- chord_dat(data = circ, process = EC$process) 76 | 77 | ## ----GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'------- 78 | # Create the plot 79 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5) 80 | 81 | ## ----GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'------- 82 | # Display only genes which are assigned to at least three processes 83 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC') 84 | 85 | ## ----GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'-------- 86 | # First, we use the chord object without logFC column to create the heatmap 87 | GOHeat(chord[,-8], nlfc = 0) 88 | 89 | ## ----GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'-------- 90 | # First, we use the chord object without logFC column to create the heatmap 91 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green')) 92 | 93 | ## ----GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'---- 94 | # GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2) 95 | 96 | ## ----GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'---- 97 | # GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1')) 98 | 99 | ## ----GOVenn, warning=FALSE, message=FALSE, fig.keep='none'--------------- 100 | l1 <- subset(circ, term == 'heart development', c(genes,logFC)) 101 | l2 <- subset(circ, term == 'plasma membrane', 
c(genes,logFC)) 102 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC)) 103 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis')) 104 | 105 | -------------------------------------------------------------------------------- /inst/doc/GOplot_vignette.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "GOplot 1.0.2" 3 | author: "Wencke Walter" 4 | date: "`r Sys.Date()`" 5 | output: 6 | rmarkdown::html_vignette: 7 | css: GOplot.css 8 | vignette: > 9 | %\VignetteIndexEntry{GOplot_0.2} 10 | %\VignetteEngine{knitr::rmarkdown} 11 | \usepackage[utf8]{inputenc} 12 | --- 13 | 14 | A manual to exploit the possibilities and limitations of the R package GOplot. 15 | 16 | 17 | ##Introduction 18 | The GOplot package concentrates on the visualization of biological data. More precisely, the package will help combine and integrate expression data with the results of a functional analysis. The package cannot be used to perform any of these analyses. It is for visualization purpose only. In all the scientific fields we visualize information to meet a basic need- to tell a story. Attributable to space restrictions and a general need to present everything neat and tidy most of the times it is simply not possible to actually tell a story. Therefore, we use vision to communicate information. A well designed and elaborated figure provides the beholder with high-dimensional information in a much smaller space than for example a table. The idea of the package is to provide the user with functions that allow a quick examination of large amounts of data, expose trends and find patterns & correlations within the data. Effective data visualization is an important tool in the decision making process and helps to find further pieces of the puzzle picturing the answer of your biological question. Based on that you will be able to confirm or falsify your hypotheses. 
You might even start to look in a different direction to investigate your topic relying on the insight a new visualization provides. The plotting functions of the package were developed with a hierarchical structure in mind; starting with a general overview and closing with definite subsets of selected genes and terms. To explain the idea let us use an example. 19 | 20 | ##The toy example 21 | GOplot comes with a manually compiled data set. Selected samples were downloaded from gene expression omnibus (accession number: *[GSE47067](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067)*). As a brief summary, the data set contains the transcriptomic information of endothelial cells from two steady state tissues (brain and heart). More detailed information can be found in the paper by *[Nolan et al. 2013](http://www.ncbi.nlm.nih.gov/pubmed/23871589)*. The data was normalized and a statistical analysis was performed to determine differentially expressed genes. *DAVID* functional annotation tool was used to perform a gene- annotation enrichment analysis of the set of differentially expressed genes (adjusted p-value < 0.05). 
The data set contains the following five items: 22 | 23 | ```{r table1, echo = FALSE, results = 'asis'} 24 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7')) 25 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)')) 26 | ``` 27 | 28 | ##Getting started 29 | As a first step we want to get an overview of the enriched GO terms of our differentially expressed genes. But before we start plotting we need to bring the data into the right format for the plotting functions. In general, the data object of the plotting functions can be created manually, but the package includes a function that does the job for you. The *circle_dat* function combines the result of the functional analysis with a list of selected genes and their logFC, most likely a list of differentially expressed genes. *circle_dat* takes two data frames as input. The first one contains the results of the functional analysis and should have at least four columns (category, term, genes, adjusted p-value). Additionally, a data frame of the selected genes and their logFC is needed. This data frame can be, for example, the result from a statistical analysis performed with *limma*. Let us have a look at the mentioned data frames. 30 | 31 | ```{r glimpse, warning = FALSE, message = FALSE} 32 | library(GOplot) 33 | # Load the dataset 34 | data(EC) 35 | # Get a glimpse of the data format of the results of the functional analysis... 
36 | head(EC$david) 37 | # ...and of the data frame of selected genes 38 | head(EC$genelist) 39 | ``` 40 | 41 | Now that we know what the input data looks like, it's time to use the *circle_dat* function to create the plotting object. 42 | 43 | ```{r circ_object, warning = FALSE, message = FALSE} 44 | # Generate the plotting object 45 | circ <- circle_dat(EC$david, EC$genelist) 46 | ``` 47 | 48 | The **circ** object has eight columns with the following names: 49 | 50 | * category 51 | * ID 52 | * term 53 | * count 54 | * genes 55 | * logFC 56 | * adj_pval 57 | * zscore 58 | 59 | Since most gene-annotation enrichment analyses are based on the gene ontology database, the package was built with this structure in mind, but it is not restricted to it. As explained by *[Ashburner et al.](http://www.ncbi.nlm.nih.gov/pubmed/10802651)* in their paper from the year 2000, gene ontology is structured as an acyclic graph and it provides terms covering different areas. These terms are grouped into three independent categories: BP (biological process), CC (cellular component) or MF (molecular function). The first column of the **circ** object contains this information, which was already given in the input. For more information on the structure of gene ontology, have a look at the documentation section of the gene ontology consortium [website](http://geneontology.org/page/ontology-documentation). 60 | All the terms from inside the gene ontology database come with a GO **ID** and a GO **term** description. The **ID** column of the circ object is optional, so in case you want to use a functional analysis tool that is not based on gene ontology you won't have an **ID** column. The term description column contains just that: a description of the term. The performance of the implemented functions does not depend on any resemblance to gene ontology terms. **Count** is the number of genes assigned to a term. 
**Gene** names and their **logFC** are taken from the input list of selected genes. The significance of a term is indicated by the adjusted p-value (**adj_pval**). Terms with an adjusted p-value < 0.05 are considered significantly enriched and are more likely to provide reliable information. The last column contains the **zscore**, an easy-to-calculate value that hints at whether the biological process (/molecular function/cellular component) is more likely to be decreased (negative value) or increased (positive value). 61 | 62 | $$zscore=\frac{(up-down)}{\sqrt{count}}$$ 63 | 64 | Here, *up* and *down* are the numbers of assigned genes that are up-regulated (logFC>0) or down-regulated (logFC<0) in the data, respectively. 65 | 66 | ##The plots 67 | 68 | ###The modified barplot (GOBar) 69 | Since we do not really know what to expect from our data, the aim of the first figure should be to display as many terms as possible without going into too much detail. Nevertheless, the figure should help us to pick the interesting and valuable terms. Therefore, we need some parameters to quantify the importance. Since most scientific data is plotted using bar charts, a modified version of the normal barplot function, named *GOBar*, is included. The *GOBar* function allows the user to quickly create an appealing barplot. 70 | 71 | ```{r GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6} 72 | # Generate a simple barplot 73 | GOBar(subset(circ, category == 'BP')) 74 | ``` 75 | 76 | On the y-axis the significance of the terms is shown and the bars are ordered according to their z-score. If you want, you can change the order by setting the argument *order.by.zscore* to FALSE. In this case the bars are ordered based on their significance. Additionally, the barplot can be easily faceted according to the categories of the terms using the argument *display* of the plotting function (output not shown). 
77 | 78 | ```{r GOBar2, eval = FALSE, warning = FALSE, message = FALSE} 79 | # Facet the barplot according to the categories of the terms 80 | GOBar(circ, display = 'multiple') 81 | ``` 82 | 83 | To add a title use *title* and to change the colour scale of the z-score use the argument *zsc.col* (output not shown). 84 | 85 | ```{r GOBar3, eval = FALSE, warning = FALSE, message = FALSE} 86 | # Facet the barplot, add a title and change the colour scale for the z-score 87 | GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan')) 88 | ``` 89 | 90 | Barplots are common and very easy to read, but they might not always be the best solution. Another possibility to display an overview for high-dimensional data is the bubble plot. 91 | 92 | ###The bubble plot (GOBubble) 93 | The bubble plot is another possibility to get an overview of the enriched terms. The z-score is assigned to the x-axis and the negative logarithm of the adjusted p-value to the y-axis, as in the barplot (the higher the more significant). The area of the displayed circles is proportional to the number of genes (circ$count) assigned to the term and the colour corresponds to the category. 94 | The help page of the plotting function (?GOBubble) lists all the arguments to change the layout of the plot. By default the circles are labeled with the term ID. Therefore, a table connecting the IDs and terms is displayed on the right side by default. You can hide it by setting the argument *table.legend* to FALSE. If you want to display the term description instead, set the argument *ID* to FALSE. Not all the circles are labeled due to the limited space and the overlap of the circles. A threshold for the labeling is set (default=5) based on the negative logarithm of the adjusted p-value. 
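To make the labeling rule concrete, here is a minimal sketch of how the y-axis value and the label threshold relate. It is not the package's internal code: the data frame `terms`, its IDs and its p-values are hypothetical, and the sketch assumes a base-10 logarithm and that a circle is labeled once its value reaches the threshold.

```r
# Hypothetical adjusted p-values for four terms (illustrative, not from EC)
terms <- data.frame(
  ID = c('T1', 'T2', 'T3', 'T4'),
  adj_pval = c(1e-8, 1e-4, 0.002, 0.04),
  stringsAsFactors = FALSE
)

# y-axis value of the bubble plot: negative log10 of the adjusted p-value
terms$neg_log10 <- -log10(terms$adj_pval)

# Assumed rule: a circle is labeled if its value reaches the threshold
label_threshold <- 3
labeled <- terms$ID[terms$neg_log10 >= label_threshold]
labeled
```

With these numbers only T1 and T2 would be labeled; T3 (about 2.7) and T4 (about 1.4) fall below the threshold of 3.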
95 | 96 | 97 | ```{r GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'} 98 | # Generate the bubble plot with a label threshold of 3 99 | GOBubble(circ, labels = 3) 100 | ``` 101 | 102 | ![GOBubble1.](GOBubble1.png) 103 | 104 | To add a title, change the colour of the circles, facet the plot and to change the label threshold use the following arguments: 105 | 106 | ```{r GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'} 107 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold 108 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3) 109 | ``` 110 | 111 | ![GOBubble2.](GOBubble2.png) 112 | 113 | For the facet plot it is also possible to colour the background of the panels according to the displayed category by setting *bg.col* to TRUE. 114 | 115 | ```{r GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'} 116 | # Colour the background according to the category 117 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = T, labels = 3) 118 | ``` 119 | 120 | ![GOBubble3.](GOBubble3.png) 121 | 122 | A new function, *reduce_overlap*, was included in the updated version of the package to reduce the number of redundant terms. So far, the implemented method is very simple + slow and needs further refinement. Nevertheless, by reducing the number of redundant terms the readability of plots, like the bubble plot, improves significantly. The function deletes all terms that have a gene overlap greater than or equal to a set threshold. The function keeps one term per group as a representative without taking into consideration the GO hierarchy. 123 | 124 | ```{r GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE} 125 | # Reduce redundant terms with a gene overlap >= 0.75... 
126 | reduced_circ <- reduce_overlap(circ, overlap = 0.75) 127 | # ...and plot it 128 | GOBubble(reduced_circ, labels = 2.8) 129 | ``` 130 | 131 | ![GOBubble4.](GOBubble4.png) 132 | 133 | ### Circular visualization of the results of gene-annotation enrichment analysis (GOCircle) 134 | The overview plots should help you decide which terms are the most interesting. Of course, this decision also depends on the hypothesis and ideas you want to confirm with your data. The most significant terms are not always the ones you are interested in. So, after manually selecting a set of valuable terms (EC$process), the next figure should provide more details on these specific terms. One of the major issues we noticed when presenting the plots was that the information provided by the z-score was sometimes difficult to interpret, since the measure is not that common. As shown above it is simply the number of up-regulated genes minus the number of down-regulated genes divided by the square root of the count. The *GOCircle* plot emphasizes this fact. 135 | 136 | ```{r GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'} 137 | # Generate a circular visualization of the results of gene-annotation enrichment analysis 138 | GOCircle(circ) 139 | ``` 140 | 141 | ![Circle plot.](GOCirc.png) 142 | 143 | The outer circle shows a scatter plot for each term of the logFC of the assigned genes. Red circles display up-regulation and blue ones down-regulation by default. The colours can be changed with the argument *lfc.col*. This makes it easier to understand why, in some cases, highly significant terms have a z-score close to zero. A z-score of zero does not mean that the term is not important, at least not as long as it is significantly enriched. It just shows that the z-score is a crude measure, because the score does not take into account the functional level and activation dependencies of the single genes within a process.
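To make the definition of the z-score concrete, it can be recomputed by hand from a vector of logFC values. The following is a minimal sketch on made-up numbers; the helper `zscore_sketch` and the toy logFC values are illustrative assumptions and not part of GOplot.

```r
# Hypothetical helper (not a GOplot function): z-score of a single term,
# computed as (up - down) / sqrt(count) from the logFC of its genes.
zscore_sketch <- function(logFC) {
  up <- sum(logFC > 0)     # number of up-regulated genes
  down <- sum(logFC < 0)   # number of down-regulated genes
  (up - down) / sqrt(length(logFC))
}

# Toy logFC values for two made-up terms
balanced <- c(2.1, -1.8, 1.5, -2.4)    # two up, two down
mostly_up <- c(1.2, 0.8, 2.5, -0.6)    # three up, one down

zscore_sketch(balanced)    # 0
zscore_sketch(mostly_up)   # 1
```

The balanced term ends up at zero even though every gene in it is clearly regulated, which is the situation the circle plot makes visible.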
144 | You can change the layout of the plot with various arguments, see ?GOCircle. The *nsub* argument needs a little more explanation to be used wisely. First of all, it can be a numeric or a character vector. If it is a character vector then it contains the IDs or term descriptions of the processes you want to display (output not shown). 145 | 146 | ```{r GOCircle2, eval = FALSE} 147 | # Generate a circular visualization of selected terms 148 | IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047') 149 | GOCircle(circ, nsub = IDs) 150 | ``` 151 | 152 | If *nsub* is numeric then the number defines how many terms are displayed, starting with the first row of the input data frame (output not shown). 153 | 154 | ```{r GOCircle3, eval = FALSE} 155 | # Generate a circular visualization for 10 terms 156 | GOCircle(circ, nsub = 10) 157 | ``` 158 | 159 | This kind of visualization is only useful for a smaller set of terms. The maximum number of terms lies around 12. As the number of terms decreases, the amount of displayed information increases. 160 | 161 | ### Display of the relationship between genes and terms (GOChord) 162 | Based on the **[Circos](http://circos.ca/)** plots designed by *[Martin Krzywinski](http://mkweb.bcgsc.ca/)* the *GOChord* plotting function was implemented. It displays the relationship between a list of selected genes and terms, as well as the logFC of the genes. A binary membership matrix is required as input. You can build the matrix on your own or use the implemented function *chord_dat*, which does the job for you. The function takes three arguments: *data*, *genes* and *process*; of the last two, only one is mandatory. So, the *circle_dat* function combined your expression data with the results from the functional analysis.
The bar and bubble plot allowed you to get a first impression of your data, and now you have selected a list of genes and processes you think are valuable. *GOCircle* adds a layer to display the expression values of the genes assigned to the terms, but it lacks information on the relationship between the genes and the terms. It is not easy to figure out if some of the genes are linked to multiple processes. The chord plot fills the void left by *GOCircle*. 163 | 164 | ```{r GOChord1, warning = FALSE, message = FALSE} 165 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 166 | # sample contains the data frame of selected genes and their logFC. Have a look... 167 | head(EC$genes) 168 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process) 169 | EC$process 170 | # Now it is time to generate the binary matrix 171 | chord <- chord_dat(circ, EC$genes, EC$process) 172 | head(chord) 173 | ``` 174 | 175 | Rows are genes and columns are terms. A '0' indicates that the gene is not assigned to the term; a '1' the opposite. As mentioned before it is possible to leave out either the *genes* or the *process* argument. If you omit the *process* argument the binary matrix is built for the list of selected genes and all the processes with at least one assigned gene. On the other hand, if you just provide a set of processes without limiting the list of genes, the binary matrix is generated for all the genes which are assigned to at least one of the processes from your list (output not shown).
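If you prefer to build the membership matrix yourself, the idea can be sketched in base R. This is only an illustration on a made-up long-format table (one gene-term pair per row); all object names are hypothetical and the code does not reproduce the internals of *chord_dat*.

```r
# Toy long-format annotation: one row per gene-term pair (made-up data)
anno <- data.frame(
  genes = c("A", "A", "B", "C", "C", "C"),
  term  = c("t1", "t2", "t1", "t1", "t2", "t3"),
  stringsAsFactors = FALSE
)
# Toy logFC values, one per gene (made-up data)
lfc <- c(A = 1.5, B = -0.7, C = 2.2)

# Cross-tabulate genes against terms; every pair occurs at most once,
# so the counts are already 0/1 membership indicators
member <- unclass(table(anno$genes, anno$term))

# Append the logFC as the last column, matching the rows by gene name
chord_sketch <- cbind(member, logFC = lfc[rownames(member)])
chord_sketch
```

The result has genes in rows, one 0/1 column per term and the logFC in the last column, which mirrors the layout described above for a single contrast.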
176 | 177 | ```{r GOChord2, eval=FALSE, warning = FALSE, message = FALSE} 178 | # Generate the matrix with a list of selected genes 179 | chord <- chord_dat(data = circ, genes = EC$genes) 180 | # Generate the matrix with selected processes 181 | chord <- chord_dat(data = circ, process = EC$process) 182 | ``` 183 | 184 | Be aware that omitting either *genes* or *process* might lead to a large binary matrix, which results in a confusing visualization. The chart was designed for smaller subsets of high-dimensional data. 185 | Like the other plotting functions, *GOChord* provides a lot of arguments to change the layout of the plot, see ?GOChord. Most of the arguments adjust the font size of the labels, the space between them, the colour scale for the logFC and the colour of the ribbons. Apart from the aesthetics there are two other arguments: *gene.order* and *nlfc*. The first argument defines the order of the genes, with three possible options: 'logFC', 'alphabetical' and 'none'. The options are quite self-explanatory. Sometimes you perform the differential expression analysis for multiple conditions and/or batches and therefore want to include more than one logFC value per gene. For this situation you should use the *nlfc* argument. It is a numeric value that defines the number of logFC columns within your binary membership matrix. The default is '1', assuming that most of the time you just have one contrast and one logFC value per gene. 186 | 187 | 188 | ```{r GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'} 189 | # Create the plot 190 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5) 191 | ``` 192 | ![Chord1.](GOChord1.png) 193 | 194 | The *space* argument defines the space between the coloured rectangles representing the logFC. The font size of the gene labels (*gene.size*) and the space between them (*gene.space*) were also changed.
The genes were ordered according to their logFC values by setting *gene.order* to 'logFC'. 195 | 196 | Sometimes the plot gets a bit crowded and you would like to reduce the number of displayed genes or processes. You can do this automatically with the *limit* argument. *limit* is a vector with two cutoff values (default = c(0, 0)). The first value defines the minimum (>=) number of terms a gene has to be assigned to. The second value defines the minimum number of genes that have to be assigned to a selected term. For example, to display only genes which are assigned to at least three processes you would use the following line of code (output not shown): 197 | 198 | ```{r GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'} 199 | # Display only genes which are assigned to at least three processes 200 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC') 201 | ``` 202 | 203 | ### Heatmap of genes and terms (GOHeat) 204 | Thanks to a very nice suggestion from *[Maureen Sartor, Ph.D.](http://sartorlab.ccmb.med.umich.edu/)* I implemented *GOHeat*. The *GOHeat* function generates a heatmap of the relationship between genes and terms similar to *GOChord*. Biological processes are displayed in rows and genes in columns. Each column is divided into smaller rectangles and the colouring of the tiles depends on the presence or absence of logFC values. In addition, genes are clustered to highlight groups of genes with similar annotated functions. The function has two modes, depending on the *nlfc* argument. If *nlfc = 0*, i.e. no logFC values are available, the colouring encodes the overall number of processes the respective gene is assigned to. Let's have a look at an example...
205 | 206 | ```{r GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'} 207 | # First, we use the chord object without logFC column to create the heatmap 208 | GOHeat(chord[,-8], nlfc = 0) 209 | ``` 210 | 211 | ![Heat1.](GOHeat_nolfc.png) 212 | 213 | In case of *nlfc = 1* the colour corresponds to the logFC of the gene... 214 | 215 | ```{r GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'} 216 | # Now we use the chord object including the logFC column and a user-defined colour scale 217 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green')) 218 | ``` 219 | 220 | ![Heat2.](GOHeat_lfc.png) 221 | 222 | ### Golden eye (GOCluster) 223 | The idea behind the *GOCluster* function is to visualize as much information as possible. Here is an example: 224 | 225 | ```{r GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'} 226 | GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2) 227 | ``` 228 | ![GOCluster.](GOCluster.png) 229 | 230 | Hierarchical clustering is a popular method for gene expression analysis due to its unsupervised nature assuring an unbiased result. Genes are grouped together based on their expression patterns, thus clusters are likely to contain sets of co-regulated or functionally related genes. *GOCluster* performs the hierarchical clustering of the gene expression profiles using the *hclust* function in core R. If you want to change the distance metric or the clustering algorithm use the arguments *metric* and *clust*, respectively. The resulting dendrogram is transformed with the help of *ggdendro* to be suitable for a visualization with *ggplot2*. As before a circular layout was chosen, because it is not only effective but also visually appealing. The first ring next to the dendrogram represents the logFC of the genes, which are actually the leaves of the clustering tree. In case you are interested in more than one contrast the *nlfc* argument is also available for this function.
By default it is set to '1', so only one ring is drawn. As always, the logFC values are colour-coded with a user-definable colour scale (*lfc.col*). The next ring represents the terms assigned to the genes. For aesthetic reasons the terms should be reduced to a reasonable number with the argument *process*. The terms are colour-coded as well and you can change the default colours by using the argument *term.col*. Once again, the plotting function provides a number of arguments to change the layout of the plot; you can check them out on the help page, ?GOCluster. Probably the most important argument of the function is *clust.by*. It expects a character vector specifying whether the clustering should be done for gene expression pattern ('logFC', as in the figure above) or functional categories ('term'). 231 | 232 | 233 | ```{r GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'} 234 | GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1')) 235 | ``` 236 | ![GOCluster2.](GOCluster2.png) 237 | 238 | ### Venn diagram (GOVenn) 239 | In this biological context we implemented a Venn diagram that can be used to detect relations between various lists of differentially expressed genes or to explore the intersection of genes of multiple terms from the functional analysis. The Venn diagram not only displays the number of overlapping genes, but also information about the gene expression patterns (commonly up-regulated, commonly down-regulated or contra-regulated). At the moment, at most three datasets are allowed as input. The input data frame contains at least two columns: one for the gene names and one for the logFC value.
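The classification into commonly up-regulated, commonly down-regulated and contra-regulated genes can be sketched in a few lines of base R. The two toy data frames and all object names are made up for illustration; this is not the code used inside *GOVenn*.

```r
# Two toy gene lists with an ID column and a logFC column (made-up data)
l1 <- data.frame(ID = c("A", "B", "C", "D"), logFC = c(1.2, -0.5, 2.0, -1.1))
l2 <- data.frame(ID = c("B", "C", "D", "E"), logFC = c(-0.9, 1.4, 0.7, 3.0))

# Keep only the genes present in both lists
both <- merge(l1, l2, by = "ID", suffixes = c(".1", ".2"))

# Classify each shared gene by the sign of its logFC in the two lists
both$pattern <- ifelse(both$logFC.1 > 0 & both$logFC.2 > 0, "up",
                ifelse(both$logFC.1 < 0 & both$logFC.2 < 0, "down", "contra"))

# Number of shared genes per expression pattern
table(both$pattern)
```

In this toy case the three shared genes fall into one pattern each: B is commonly down-regulated, C is commonly up-regulated and D is contra-regulated.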
240 | 241 | ```{r GOVenn, warning=FALSE, message=FALSE, fig.keep='none'} 242 | l1 <- subset(circ, term == 'heart development', c(genes,logFC)) 243 | l2 <- subset(circ, term == 'plasma membrane', c(genes,logFC)) 244 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC)) 245 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis')) 246 | ``` 247 | ![Venn diagram.](GOVenn.png) 248 | 249 | For example, heart development and tissue morphogenesis share a set of 22 genes, of which 5 are commonly up-regulated and 17 are commonly down-regulated. The important thing to notice is that the pie charts don't display redundant information. Thus, if you compare three datasets the genes which are shared by all datasets (pie chart in the middle) are not included in the other pie charts. 250 | The following [link](https://wwalter.shinyapps.io/Venn/) refers to the Shiny app of this tool. The web tool is slightly more interactive, since the circles are area-proportional to the number of genes of the dataset and the small pie charts can be moved with sliders. It also has all the other options of the *GOVenn* function to change the layout of the plot. You can easily download the picture and gene lists. 251 | -------------------------------------------------------------------------------- /man/EC.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \docType{data} 4 | \name{EC} 5 | \alias{EC} 6 | \title{Transcriptomic information of endothelial cells.} 7 | \format{A list containing 5 items} 8 | \source{ 9 | \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067} 10 | } 11 | \usage{ 12 | data(EC) 13 | } 14 | \description{ 15 | The data set contains the transcriptomic information of endothelial cells 16 | from two steady state tissues (brain and heart).
More detailed information 17 | can be found in the paper by Nolan et al. 2013. The data was normalized and a 18 | statistical analysis was performed to determine differentially expressed 19 | genes. DAVID functional annotation tool was used to perform a gene- 20 | annotation enrichment analysis of the set of differentially expressed genes 21 | (adjusted p-value < 0.05). 22 | } 23 | \keyword{datasets} 24 | 25 | -------------------------------------------------------------------------------- /man/GOBar.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{GOBar} 4 | \alias{GOBar} 5 | \title{Z-score coloured barplot.} 6 | \usage{ 7 | GOBar(data, display, order.by.zscore = T, title, zsc.col) 8 | } 9 | \arguments{ 10 | \item{data}{A data frame containing at least the term ID and/or term, the 11 | adjusted p-value and the z-score. A possible input can be generated with 12 | the \code{circle_dat} function} 13 | 14 | \item{display}{A character vector indicating whether a single plot ('single') 15 | or a facet plot with panels for each category should be drawn 16 | (default='single')} 17 | 18 | \item{order.by.zscore}{Defines the order of the bars. If TRUE the bars are 19 | ordered according to the z-scores of the processes. Otherwise the bars are 20 | ordered by the negative logarithm of the adjusted p-value} 21 | 22 | \item{title}{The title of the plot} 23 | 24 | \item{zsc.col}{Character vector to define the colour scale for the z-score of 25 | the form c(high, midpoint,low)} 26 | } 27 | \description{ 28 | Z-score coloured barplot of terms ordered alternatively by 29 | z-score or the negative logarithm of the adjusted p-value 30 | } 31 | \details{ 32 | If \code{display} is used to facet the plot the width of the panels 33 | will be proportional to the length of the x scale. 
34 | } 35 | \examples{ 36 | \dontrun{ 37 | #Load the included dataset 38 | data(EC) 39 | 40 | #Building the circ object 41 | circ <- circle_dat(EC$david, EC$genelist) 42 | 43 | #Creating the bar plot 44 | GOBar(circ) 45 | 46 | #Faceting the plot 47 | GOBar(circ, display='multiple') 48 | } 49 | } 50 | 51 | -------------------------------------------------------------------------------- /man/GOBubble.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{GOBubble} 4 | \alias{GOBubble} 5 | \title{Bubble plot.} 6 | \usage{ 7 | GOBubble(data, display, title, colour, labels, ID = T, table.legend = T, 8 | table.col = T, bg.col = F) 9 | } 10 | \arguments{ 11 | \item{data}{A data frame with columns for category, GO ID, term, adjusted 12 | p-value, z-score, count (number of genes)} 13 | 14 | \item{display}{A character vector. Indicates whether it should be a single 15 | plot ('single') or a facet plot with panels for each category 16 | (default='single')} 17 | 18 | \item{title}{The title (on top) of the plot} 19 | 20 | \item{colour}{A character vector which defines the colour of the bubbles for 21 | each category} 22 | 23 | \item{labels}{Sets a threshold for the displayed labels. The threshold refers 24 | to the -log(adjusted p-value) (default=5)} 25 | 26 | \item{ID}{If TRUE then labels are IDs else terms} 27 | 28 | \item{table.legend}{Defines whether a table of GO ID and GO term should be 29 | displayed on the right side of the plot or not (default = TRUE)} 30 | 31 | \item{table.col}{If TRUE then the table entries are coloured according to 32 | their category, if FALSE then entries are black} 33 | 34 | \item{bg.col}{Should only be used in case of a facet plot. If TRUE then the 35 | panel backgrounds are coloured according to the displayed category} 36 | } 37 | \description{ 38 | The function creates a bubble plot of the input \code{data}.
The 39 | input \code{data} can be created with the help of the 40 | \code{\link{circle_dat}} function. 41 | } 42 | \details{ 43 | The x-axis of the plot represents the z-score. The negative 44 | logarithm of the adjusted p-value (corresponding to the significance of the 45 | term) is displayed on the y-axis. The area of the plotted circles is 46 | proportional to the number of genes assigned to the term. Each circle is 47 | coloured according to its category and labeled alternatively with the ID or 48 | term name. If static is set to FALSE the mouse hover effect will be enabled. 49 | } 50 | \examples{ 51 | \dontrun{ 52 | #Load the included dataset 53 | data(EC) 54 | 55 | #Building the circ object 56 | circ <- circle_dat(EC$david, EC$genelist) 57 | 58 | #Creating the bubble plot colouring the table entries according to the category 59 | GOBubble(circ, table.col = T) 60 | 61 | #Creating the bubble plot displaying the term instead of the ID and without the table 62 | GOBubble(circ, ID = F, table.legend = F) 63 | 64 | #Faceting the plot 65 | GOBubble(circ, display = 'multiple') 66 | } 67 | } 68 | 69 | -------------------------------------------------------------------------------- /man/GOChord.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCluster.R 3 | \name{GOChord} 4 | \alias{GOChord} 5 | \title{Displays the relationship between genes and terms.} 6 | \usage{ 7 | GOChord(data, title, space, gene.order, gene.size, gene.space, nlfc = 1, 8 | lfc.col, lfc.min, lfc.max, ribbon.col, border.size, process.label, limit) 9 | } 10 | \arguments{ 11 | \item{data}{The matrix represents the binary relation (1= is related to, 0= 12 | is not related to) between a set of genes (rows) and processes (columns); a 13 | column for the logFC of the genes is optional} 14 | 15 | \item{title}{The title (on top) of the plot} 16 | 17 | \item{space}{The space between the
chord segments of the plot} 18 | 19 | \item{gene.order}{A character vector defining the order of the displayed gene 20 | labels} 21 | 22 | \item{gene.size}{The size of the gene labels} 23 | 24 | \item{gene.space}{The space between the gene labels and the segment of the 25 | logFC} 26 | 27 | \item{nlfc}{Defines the number of logFC columns (default=1)} 28 | 29 | \item{lfc.col}{The fill color for the logFC specified in the following form: 30 | c(color for low values, color for the mid point, color for the high values)} 31 | 32 | \item{lfc.min}{Specifies the minimum value of the logFC scale (default = -3)} 33 | 34 | \item{lfc.max}{Specifies the maximum value of the logFC scale (default = 3)} 35 | 36 | \item{ribbon.col}{The background color of the ribbons} 37 | 38 | \item{border.size}{Defines the size of the ribbon borders} 39 | 40 | \item{process.label}{The size of the legend entries} 41 | 42 | \item{limit}{A vector with two cutoff values (default= c(0,0)). The first 43 | value defines the minimum number of terms a gene has to be assigned to. The 44 | second the minimum number of genes assigned to a selected term.} 45 | } 46 | \description{ 47 | The GOChord function generates a circularly composited overview 48 | of selected/specific genes and their assigned processes or terms. More 49 | generally, it joins genes and processes via ribbons in an intersection-like 50 | graph. The input can be generated with the \code{\link{chord_dat}} 51 | function. 52 | } 53 | \details{ 54 | The \code{gene.order} argument has three possible options: "logFC", 55 | "alphabetical", "none", which are quite self-explanatory. 56 | 57 | Maybe the most important argument of the function is \code{nlfc}. If your 58 | \code{data} does not contain a column of logFC values you have to set 59 | \code{nlfc = 0}. Differential expression analysis can be performed for 60 | multiple conditions and/or batches. Therefore, the data frame might contain 61 | more than one logFC value per gene.
To adjust to this situation the 62 | \code{nlfc} argument is used as well. It is a numeric value and it defines 63 | the number of logFC columns of your \code{data}. The default is "1" 64 | assuming that most of the time only one contrast is considered. 65 | 66 | To make the plot more readable it might be necessary to reduce the 67 | dimension of \code{data}. This can be achieved with \code{limit}. The first 68 | value of the vector defines the threshold for the minimum number of terms a 69 | gene has to be assigned to in order to be represented in the plot. Most of 70 | the time it is more meaningful to represent genes with various functions. A 71 | value of 3 excludes all genes with fewer than three term assignments. 72 | The second value of the parameter restricts the number of terms 73 | according to the number of assigned genes. All terms with a count smaller 74 | than or equal to the threshold are excluded. 75 | } 76 | \examples{ 77 | \dontrun{ 78 | # Load the included dataset 79 | data(EC) 80 | 81 | # Generating the binary matrix 82 | chord<-chord_dat(circ,EC$genes,EC$process) 83 | 84 | # Creating the chord plot 85 | GOChord(chord) 86 | 87 | # Excluding process with less than 5 assigned genes 88 | GOChord(chord, limit = c(0,5)) 89 | 90 | # Creating the chord plot genes ordered by logFC and a different logFC color scale 91 | GOChord(chord,space=0.02,gene.order='logFC',lfc.col=c('red','black','cyan')) 92 | } 93 | } 94 | \seealso{ 95 | \code{\link{chord_dat}} 96 | } 97 | 98 | -------------------------------------------------------------------------------- /man/GOCircle.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{GOCircle} 4 | \alias{GOCircle} 5 | \title{Circular visualization of the results of a functional analysis.} 6 | \usage{ 7 | GOCircle(data, title, nsub, rad1, rad2, table.legend = T, zsc.col, lfc.col, 8 |
label.size, label.fontface) 9 | } 10 | \arguments{ 11 | \item{data}{A special data frame which should be the result of 12 | \code{circle_dat}} 13 | 14 | \item{title}{The title of the plot} 15 | 16 | \item{nsub}{A numeric or character vector. If it's numeric then the number 17 | defines how many processes are displayed (starting from the first row of 18 | \code{data}). If it's a character string of processes then these processes 19 | are displayed} 20 | 21 | \item{rad1}{The radius of the inner circle (default=2)} 22 | 23 | \item{rad2}{The radius of the outer circle (default=3)} 24 | 25 | \item{table.legend}{Shall a table be displayed or not? (default=TRUE)} 26 | 27 | \item{zsc.col}{Character vector to define the colour scale for the z-score of 28 | the form c(high, midpoint,low)} 29 | 30 | \item{lfc.col}{A character vector specifying the colour for up- and 31 | down-regulated genes} 32 | 33 | \item{label.size}{Size of the segment labels (default=5)} 34 | 35 | \item{label.fontface}{Font style of the segment labels (default='bold')} 36 | } 37 | \description{ 38 | The circular plot combines gene expression and gene-annotation 39 | enrichment data. A subset of terms is displayed like the \code{GOBar} plot 40 | in combination with a scatterplot of the gene expression data. The whole 41 | plot is drawn on a specific coordinate system to achieve the circular 42 | layout. The segments are labeled with the term ID. 43 | } 44 | \details{ 45 | The outer circle shows a scatter plot for each term of the logFC of 46 | the assigned genes. The colours can be changed with the argument 47 | \code{lfc.col}. 48 | 49 | The \code{nsub} argument needs a bit more explanation to be used wisely. First of 50 | all, it can be a numeric or a character vector. If it is a character vector 51 | then it contains the IDs or term descriptions of the displayed processes. If 52 | \code{nsub} is a numeric vector then the number defines how many terms are 53 | displayed.
It starts with the first row of the input data frame. 54 | } 55 | \examples{ 56 | \dontrun{ 57 | # Load the included dataset 58 | data(EC) 59 | 60 | # Building the circ object 61 | circ <- circle_dat(EC$david, EC$genelist) 62 | 63 | # Creating the circular plot 64 | GOCircle(circ) 65 | 66 | # Creating the circular plot with a different colour scale for the logFC 67 | GOCircle(circ, lfc.col = c('purple', 'orange')) 68 | 69 | # Creating the circular plot with a different colour scale for the z-score 70 | GOCircle(circ, zsc.col = c('yellow', 'black', 'cyan')) 71 | 72 | # Creating the circular plot with different font style 73 | GOCircle(circ, label.size = 5, label.fontface = 'italic') 74 | } 75 | } 76 | \seealso{ 77 | \code{\link{circle_dat}}, \code{\link{GOBar}} 78 | } 79 | 80 | -------------------------------------------------------------------------------- /man/GOCluster.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCluster.R 3 | \name{GOCluster} 4 | \alias{GOCluster} 5 | \title{Circular dendrogram.} 6 | \usage{ 7 | GOCluster(data, process, metric, clust, clust.by, nlfc, lfc.col, lfc.min, 8 | lfc.max, lfc.space, lfc.width, term.col, term.space, term.width) 9 | } 10 | \arguments{ 11 | \item{data}{A data frame which should be the result of 12 | \code{\link{circle_dat}} in case the data contains only one logFC column. 
13 | Otherwise \code{data} is a data frame in which the first column contains the 14 | genes, the second the term and the following columns the logFCs of the 15 | different contrasts.} 16 | 17 | \item{process}{A character vector of selected processes (ID or term 18 | description)} 19 | 20 | \item{metric}{A character vector specifying the distance measure to be used 21 | (default='euclidean'), see \code{dist}} 22 | 23 | \item{clust}{A character vector specifying the agglomeration method to be 24 | used (default='average'), see \code{hclust}} 25 | 26 | \item{clust.by}{A character vector specifying if the clustering should be 27 | done for gene expression pattern or functional categories. By default the 28 | clustering is done based on the functional categories.} 29 | 30 | \item{nlfc}{If TRUE \code{data} contains multiple logFC columns (default= 31 | FALSE)} 32 | 33 | \item{lfc.col}{Character vector to define the color scale for the logFC of 34 | the form c(high, midpoint,low)} 35 | 36 | \item{lfc.min}{Specifies the minimum value of the logFC scale (default = -3)} 37 | 38 | \item{lfc.max}{Specifies the maximum value of the logFC scale (default = 3)} 39 | 40 | \item{lfc.space}{The space between the leaves of the dendrogram and the ring 41 | for the logFC} 42 | 43 | \item{lfc.width}{The width of the logFC ring} 44 | 45 | \item{term.col}{A character vector specifying the colors of the term bands} 46 | 47 | \item{term.space}{The space between the logFC ring and the term ring} 48 | 49 | \item{term.width}{The width of the term ring} 50 | } 51 | \description{ 52 | GOCluster generates a circular dendrogram of the \code{data} 53 | clustering using by default euclidean distance and average linkage. The 54 | inner ring displays the color coded logFC while the outside one encodes the 55 | assigned terms to each gene. 56 | } 57 | \details{ 58 | The inner ring can be split into smaller rings to display multiple 59 | logFC values resulting from various comparisons.
60 | } 61 | \examples{ 62 | \dontrun{ 63 | #Load the included dataset 64 | data(EC) 65 | 66 | #Generating the circ object 67 | circ <- circle_dat(EC$david, EC$genelist) 68 | 69 | #Creating the cluster plot 70 | GOCluster(circ, EC$process) 71 | 72 | #Cluster the data according to gene expression and assign a different color scale for the logFC 73 | GOCluster(circ,EC$process,clust.by='logFC',lfc.col=c('darkgoldenrod1','black','cyan1')) 74 | } 75 | } 76 | 77 | -------------------------------------------------------------------------------- /man/GOHeat.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOHeat.R 3 | \name{GOHeat} 4 | \alias{GOHeat} 5 | \title{Displays heatmap of the relationship between genes and terms.} 6 | \usage{ 7 | GOHeat(data, nlfc, fill.col) 8 | } 9 | \arguments{ 10 | \item{data}{The matrix represents the binary relation (1= is related to, 0= 11 | is not related to) between a set of genes (rows) and processes (columns)} 12 | 13 | \item{nlfc}{Defines the number of logFC columns (default = 0)} 14 | 15 | \item{fill.col}{Defines the color scale break points} 16 | } 17 | \description{ 18 | The GOHeat function generates a heatmap of the relationship 19 | between genes and terms. Biological processes are displayed in rows and 20 | genes in columns. In addition genes are clustered to highlight groups of 21 | genes with similar annotated functions. The input can be generated with the 22 | \code{\link{chord_dat}} function. 23 | } 24 | \details{ 25 | The heatmap has in general two modes which depend on the \code{nlfc} 26 | argument. If \code{nlfc = 0}, so no logFC values are available, the 27 | coloring encodes the overall number of processes the respective gene is 28 | assigned to. In case of \code{nlfc = 1} the color corresponds to the logFC 29 | of the gene.
30 | } 31 | \examples{ 32 | \dontrun{ 33 | # Load the included dataset 34 | data(EC) 35 | 36 | # Generate the circ object 37 | circ <- circle_dat(EC$david, EC$genelist) 38 | 39 | # Generate the chord object 40 | chord <- chord_dat(circ, EC$genes, EC$process) 41 | 42 | # Create the plot with user-defined colors 43 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green')) 44 | } 45 | } 46 | 47 | -------------------------------------------------------------------------------- /man/GOVenn.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOVenn.R 3 | \name{GOVenn} 4 | \alias{GOVenn} 5 | \title{Venn diagram of differentially expressed genes.} 6 | \usage{ 7 | GOVenn(data1, data2, data3, title, label, lfc.col, circle.col, plot = T) 8 | } 9 | \arguments{ 10 | \item{data1}{A data frame consisting of two columns: ID, logFC} 11 | 12 | \item{data2}{A data frame consisting of two columns: ID, logFC} 13 | 14 | \item{data3}{A data frame consisting of two columns: ID, logFC} 15 | 16 | \item{title}{The title of the plot} 17 | 18 | \item{label}{A character vector to define the legend keys} 19 | 20 | \item{lfc.col}{A character vector determining the background colors of the 21 | pie segments representing up- and down- regulated genes} 22 | 23 | \item{circle.col}{A character vector to assign clockwise colors for the 24 | circles} 25 | 26 | \item{plot}{If TRUE only the venn diagram is plotted. Otherwise the function 27 | returns a list with two items: the actual plot and a list containing the 28 | overlap entries (default= TRUE)} 29 | } 30 | \description{ 31 | The function compares lists of differentially expressed genes 32 | and illustrates possible relations. Additionally it represents the variety 33 | of gene expression patterns within the intersection in small pie charts 34 | with three segments.
Clockwise are shown the number of commonly up- 35 | regulated, commonly down- regulated and contra- regulated genes. 36 | } 37 | \details{ 38 | The \code{plot} argument can be used to adjust the amount of 39 | information that is returned by calling the function. If you are only 40 | interested in the actual plot of the venn diagram, \code{plot} should be 41 | set to TRUE. Sometimes you also want to know the elements of the 42 | intersections. In this case \code{plot} should be set to FALSE and the 43 | function call will return a list of two items. The first item, which can be 44 | accessed by $plot, contains the plotting information. Additionally, a list 45 | ($table) will be returned containing the elements of the various overlaps. 46 | } 47 | \examples{ 48 | \dontrun{ 49 | #Load the included dataset 50 | data(EC) 51 | 52 | #Generating the circ object 53 | circ<-circle_dat(EC$david, EC$genelist) 54 | 55 | #Selecting terms of interest 56 | l1<-subset(circ,term=='heart development',c(genes,logFC)) 57 | l2<-subset(circ,term=='plasma membrane',c(genes,logFC)) 58 | l3<-subset(circ,term=='tissue morphogenesis',c(genes,logFC)) 59 | 60 | GOVenn(l1,l2,l3, label=c('heart development','plasma membrane','tissue morphogenesis')) 61 | } 62 | } 63 | 64 | -------------------------------------------------------------------------------- /man/chord_dat.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{chord_dat} 4 | \alias{chord_dat} 5 | \title{Creates a binary matrix.} 6 | \usage{ 7 | chord_dat(data, genes, process) 8 | } 9 | \arguments{ 10 | \item{data}{A data frame with at least two columns: GO ID|term and genes. 11 | Each row contains exactly one GO ID|term and one gene.
A column containing 12 | logFC values is optional and might be used if \code{genes} is missing.} 13 | 14 | \item{genes}{A character vector of selected genes OR data frame with columns 15 | for gene ID and logFC.} 16 | 17 | \item{process}{A character vector of selected processes} 18 | } 19 | \value{ 20 | A binary matrix 21 | } 22 | \description{ 23 | The function creates a matrix which represents the binary 24 | relation (1= is related to, 0= is not related to) between selected genes 25 | (row) and processes (column). The resulting matrix can be visualized with 26 | the \code{\link{GOChord}} function. 27 | } 28 | \details{ 29 | If more than one logFC value for each gene is at disposal, only one 30 | should be used to create the binary matrix. The other values have to be 31 | added manually later. 32 | } 33 | \examples{ 34 | \dontrun{ 35 | # Load the included dataset 36 | data(EC) 37 | 38 | # Building the circ object 39 | circ <- circle_dat(EC$david, EC$genelist) 40 | 41 | # Building the binary matrix 42 | chord <- chord_dat(circ, EC$genes, EC$process) 43 | 44 | } 45 | } 46 | \seealso{ 47 | \code{\link{GOChord}} 48 | } 49 | 50 | -------------------------------------------------------------------------------- /man/circle_dat.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{circle_dat} 4 | \alias{circle_dat} 5 | \title{Creates a plotting object.} 6 | \usage{ 7 | circle_dat(terms, genes) 8 | } 9 | \arguments{ 10 | \item{terms}{A data frame with columns for 'category', 'ID', 'term', adjusted 11 | p-value ('adj_pval') and 'genes'} 12 | 13 | \item{genes}{A data frame with columns for 'ID', 'logFC'} 14 | } 15 | \description{ 16 | The function takes the results from a functional analysis (for 17 | example DAVID) and combines it with a list of selected genes and their 18 | logFC.
The resulting data frame can be used as an input for various plotting 19 | functions. 20 | } 21 | \details{ 22 | Since most of the gene- annotation enrichment analyses are based on 23 | the gene ontology database the package was built with this structure in 24 | mind, but is not restricted to it. Gene ontology is structured as an 25 | acyclic graph and it provides terms covering different areas. These terms 26 | are grouped into three independent \code{categories}: BP (biological 27 | process), CC (cellular component) or MF (molecular function). 28 | 29 | The "ID" and "term" columns of the \code{terms} data frame refer to the ID 30 | and term description, where the ID is optional. 31 | 32 | The "ID" column of the \code{genes} data frame can contain any unique 33 | identifier. Nevertheless, the identifier has to be the same as in "genes" 34 | from \code{terms}. 35 | } 36 | \examples{ 37 | \dontrun{ 38 | #Load the included dataset 39 | data(EC) 40 | 41 | #Building the circ object 42 | circ<-circle_dat(EC$david, EC$genelist) 43 | } 44 | } 45 | 46 | -------------------------------------------------------------------------------- /man/reduce_overlap.Rd: -------------------------------------------------------------------------------- 1 | % Generated by roxygen2: do not edit by hand 2 | % Please edit documentation in R/GOCore.R 3 | \name{reduce_overlap} 4 | \alias{reduce_overlap} 5 | \title{Eliminates redundant terms.} 6 | \usage{ 7 | reduce_overlap(data, overlap) 8 | } 9 | \arguments{ 10 | \item{data}{A data frame created with \code{circle_dat}.} 11 | 12 | \item{overlap}{Scalar indicating the threshold for gene overlap (default = 0.75).} 13 | } 14 | \description{ 15 | The function eliminates all terms with a gene overlap >= set 16 | threshold (\code{overlap}). The reduced dataset can be used to improve the 17 | readability of plots such as \code{GOBubble} and \code{GOBar}. 18 | } 19 | \details{ 20 | The function is currently very slow.
21 | } 22 | \examples{ 23 | \dontrun{ 24 | # Load the included dataset 25 | data(EC) 26 | 27 | # Building the circ object 28 | circ <- circle_dat(EC$david, EC$genelist) 29 | 30 | # Eliminate redundant terms 31 | reduced_circ <- reduce_overlap(circ) 32 | 33 | # Plot reduced data 34 | GOBubble(reduced_circ) 35 | 36 | } 37 | } 38 | 39 | -------------------------------------------------------------------------------- /vignettes/GOBar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBar.png -------------------------------------------------------------------------------- /vignettes/GOBubble1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble1.png -------------------------------------------------------------------------------- /vignettes/GOBubble2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble2.png -------------------------------------------------------------------------------- /vignettes/GOBubble3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble3.png -------------------------------------------------------------------------------- /vignettes/GOBubble4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble4.png -------------------------------------------------------------------------------- /vignettes/GOChord1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOChord1.png -------------------------------------------------------------------------------- /vignettes/GOCirc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCirc.png -------------------------------------------------------------------------------- /vignettes/GOCluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCluster.png -------------------------------------------------------------------------------- /vignettes/GOCluster2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCluster2.png -------------------------------------------------------------------------------- /vignettes/GOHeat_lfc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOHeat_lfc.png -------------------------------------------------------------------------------- /vignettes/GOHeat_nolfc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOHeat_nolfc.png -------------------------------------------------------------------------------- /vignettes/GOVenn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOVenn.png 
-------------------------------------------------------------------------------- /vignettes/GOplot.css: -------------------------------------------------------------------------------- 1 | body { 2 | background-color: #fff; 3 | margin: 1em auto; 4 | max-width: 800px; 5 | overflow: visible; 6 | padding-left: 2em; 7 | padding-right: 2em; 8 | font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; 9 | font-size: 14px; 10 | line-height: 20px; 11 | } 12 | 13 | #header { 14 | text-align: center; 15 | color: #660000; 16 | } 17 | 18 | #TOC { 19 | clear: both; 20 | margin: 0 0 10px 0; 21 | padding: 4px; 22 | border: 1px solid #CCCCCC; 23 | border-radius: 5px; 24 | background-color: #f6f6f6; 25 | font-size: 13px; 26 | line-height: 1.3; 27 | } 28 | #TOC .toctitle { 29 | font-weight: bold; 30 | font-size: 15px; 31 | margin-left: 5px; 32 | } 33 | 34 | #TOC ul { 35 | padding-left: 40px; 36 | margin-left: -1.5em; 37 | margin-top: 5px; 38 | margin-bottom: 5px; 39 | } 40 | #TOC ul ul { 41 | margin-left: -2em; 42 | } 43 | #TOC li { 44 | line-height: 16px; 45 | } 46 | 47 | table { 48 | margin: auto; 49 | min-width: 40%; 50 | border-width: 1px; 51 | border-color: #DDDDDD; 52 | border-style: outset; 53 | border-collapse: collapse; 54 | } 55 | table[summary="R argblock"] { 56 | width: 100%; 57 | border: none; 58 | } 59 | table th { 60 | border-width: 2px; 61 | padding: 5px; 62 | border-style: inset; 63 | } 64 | table td { 65 | border-width: 1px; 66 | border-style: inset; 67 | line-height: 18px; 68 | padding: 5px 5px; 69 | } 70 | table, table th, table td { 71 | border-left-style: none; 72 | border-right-style: none; 73 | } 74 | table tr.odd { 75 | background-color: #E5E4E2; 76 | } 77 | 78 | p { 79 | margin: 0.5em 0; 80 | } 81 | 82 | blockquote { 83 | background-color: #f6f6f6; 84 | padding: 13px; 85 | padding-bottom: 1px; 86 | } 87 | 88 | hr { 89 | border-style: solid; 90 | border: none; 91 | border-top: 1px solid #777; 92 | margin: 28px 0; 93 | background-color: darked; 94 | } 95 
| 96 | dl { 97 | margin-left: 0; 98 | } 99 | dl dd { 100 | margin-bottom: 13px; 101 | margin-left: 13px; 102 | } 103 | dl dt { 104 | font-weight: bold; 105 | } 106 | 107 | ul { 108 | margin-top: 0; 109 | } 110 | ul li { 111 | list-style: circle outside; 112 | } 113 | ul ul { 114 | margin-bottom: 0; 115 | } 116 | 117 | pre, code { 118 | background-color: #f5f5f5; 119 | border-radius: 3px; 120 | color: #333; 121 | } 122 | pre { 123 | overflow-x: auto; 124 | border-radius: 3px; 125 | margin: 5px 0px 10px 0px; 126 | padding: 10px; 127 | } 128 | pre:not([class]) { 129 | background-color: white; 130 | border: #f5f5f5 1px solid; 131 | } 132 | pre:not([class]) code { 133 | color: #444; 134 | background-color: white; 135 | } 136 | code { 137 | font-family: monospace; 138 | font-size: 90%; 139 | } 140 | p > code, li > code { 141 | padding: 2px 4px; 142 | color: #d14; 143 | border: 1px solid #e1e1e8; 144 | white-space: inherit; 145 | } 146 | div.figure { 147 | text-align: center; 148 | width: 100%; 149 | height: 50%; 150 | } 151 | table > caption, div.figure p.caption { 152 | font-style: italic; 153 | } 154 | table > caption span, div.figure p.caption span { 155 | font-style: normal; 156 | font-weight: bold; 157 | } 158 | p { 159 | margin: 0 0 10px; 160 | } 161 | table { 162 | margin: auto auto 10px auto; 163 | } 164 | 165 | img { 166 | background-color: #FFFFFF; 167 | padding: 2px; 168 | border: 1px solid darkred; 169 | border-radius: 3px; 170 | margin-left: auto; 171 | margin-right: auto; 172 | max-width: 95%; 173 | } 174 | 175 | h1 { 176 | margin-top: 0; 177 | font-size: 35px; 178 | line-height: 40px; 179 | } 180 | 181 | h2 { 182 | border-bottom: 3px solid darkred; 183 | padding-top: 10px; 184 | padding-bottom: 2px; 185 | font-size: 145%; 186 | } 187 | 188 | h3 { 189 | padding-top: 10px; 190 | font-size: 120%; 191 | } 192 | 193 | h4 { 194 | margin-left: 8px; 195 | font-size: 105%; 196 | } 197 | 198 | h5, h6 { 199 | font-size: 105%; 200 | } 201 | 202 | a { 203 | color: 
#0033dd; 204 | text-decoration: none; 205 | } 206 | a:hover { 207 | color: #6666ff; } 208 | a:visited { 209 | color: #800080; } 210 | a:visited:hover { 211 | color: #BB00BB; } 212 | a[href^="http:"] { 213 | text-decoration: underline; } 214 | a[href^="https:"] { 215 | text-decoration: underline; } 216 | 217 | div.r-help-page { 218 | background-color: #f9f9f9; 219 | border-bottom: #ddd 1px solid; 220 | margin-bottom: 10px; 221 | padding: 10px; 222 | } 223 | div.r-help-page:hover { 224 | background-color: #f4f4f4; 225 | } 226 | 227 | /* Class described in https://benjeffrey.com/posts/pandoc-syntax-highlighting-css 228 | Colours from https://gist.github.com/robsimmons/1172277 */ 229 | 230 | code > span.kw { color: #555; font-weight: bold; } /* Keyword */ 231 | code > span.dt { color: black; } /* DataType */ 232 | code > span.dv { color: #40a070; } /* DecVal (decimal values) */ 233 | /*code > span.bn { color: #d14; } BaseN */ 234 | /*code > span.fl { color: #d14; } Float */ 235 | /*code > span.ch { color: #d14; } Char */ 236 | /*code > span.st { color: #d14; } String */ 237 | code > span.co { color: darkred; font-style: italic; } /* Comment */ 238 | /*code > span.ot { color: #007020; } OtherToken */ 239 | /*code > span.al { color: #ff0000; font-weight: bold; } AlertToken */ 240 | /*code > span.fu { color: #900; font-weight: bold; } Function calls */ 241 | /*code > span.er { color: #a61717; background-color: #e3d2d2; } ErrorTok */ 242 | -------------------------------------------------------------------------------- /vignettes/GOplot_vignette.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "GOplot 1.0.2" 3 | author: "Wencke Walter" 4 | date: "`r Sys.Date()`" 5 | output: 6 | rmarkdown::html_vignette: 7 | css: GOplot.css 8 | vignette: > 9 | %\VignetteIndexEntry{GOplot_0.2} 10 | %\VignetteEngine{knitr::rmarkdown} 11 | \usepackage[utf8]{inputenc} 12 | --- 13 | 14 | A manual to exploit the possibilities and limitations 
of the R package GOplot. 15 | 16 | 17 | ##Introduction 18 | The GOplot package concentrates on the visualization of biological data. More precisely, the package will help combine and integrate expression data with the results of a functional analysis. The package cannot be used to perform any of these analyses. It is for visualization purpose only. In all the scientific fields we visualize information to meet a basic need- to tell a story. Attributable to space restrictions and a general need to present everything neat and tidy most of the times it is simply not possible to actually tell a story. Therefore, we use vision to communicate information. A well designed and elaborated figure provides the beholder with high-dimensional information in a much smaller space than for example a table. The idea of the package is to provide the user with functions that allow a quick examination of large amounts of data, expose trends and find patterns & correlations within the data. Effective data visualization is an important tool in the decision making process and helps to find further pieces of the puzzle picturing the answer of your biological question. Based on that you will be able to confirm or falsify your hypotheses. You might even start to look in a different direction to investigate your topic relying on the insight a new visualization provides. The plotting functions of the package were developed with a hierarchical structure in mind; starting with a general overview and closing with definite subsets of selected genes and terms. To explain the idea let us use an example. 19 | 20 | ##The toy example 21 | GOplot comes with a manually compiled data set. Selected samples were downloaded from gene expression omnibus (accession number: *[GSE47067](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067)*). As a brief summary, the data set contains the transcriptomic information of endothelial cells from two steady state tissues (brain and heart). 
More detailed information can be found in the paper by *[Nolan et al. 2013](http://www.ncbi.nlm.nih.gov/pubmed/23871589)*. The data was normalized and a statistical analysis was performed to determine differentially expressed genes. The *DAVID* functional annotation tool was used to perform a gene- annotation enrichment analysis of the set of differentially expressed genes (adjusted p-value < 0.05). The data set contains the following five items: 22 | 23 | ```{r table1, echo = FALSE, results = 'asis'} 24 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7')) 25 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)')) 26 | ``` 27 | 28 | ##Getting started 29 | As a first step we want to get an overview of the enriched GO terms of our differentially expressed genes. But before we start plotting we need to bring the data into the right format for the plotting functions. In general, the data object of the plotting functions can be created manually, but the package includes a function that does the job for you. The *circle_dat* function combines the result of the functional analysis with a list of selected genes and their logFC, most likely a list of differentially expressed genes. *circle_dat* takes two data frames as an input. The first one contains the results of the functional analysis and should have at least four columns (category, term, genes, adjusted p-value). Additionally, a data frame of the selected genes and their logFC is needed.
This data frame can be, for example, the result from a statistical analysis performed with *limma*. Let us have a look at the mentioned data frames. 30 | 31 | ```{r glimpse, warning = FALSE, message = FALSE} 32 | library(GOplot) 33 | # Load the dataset 34 | data(EC) 35 | # Get a glimpse of the data format of the results of the functional analysis... 36 | head(EC$david) 37 | # ...and of the data frame of selected genes 38 | head(EC$genelist) 39 | ``` 40 | 41 | Now that we know what the input data looks like, it's time to use the *circle_dat* function to create the plotting object. 42 | 43 | ```{r circ_object, warning = FALSE, message = FALSE} 44 | # Generate the plotting object 45 | circ <- circle_dat(EC$david, EC$genelist) 46 | ``` 47 | 48 | The **circ** object has eight columns with the following names: 49 | 50 | * category 51 | * ID 52 | * term 53 | * count 54 | * gene 55 | * logFC 56 | * adj_pval 57 | * zscore 58 | 59 | Since most of the gene- annotation enrichment analyses are based on the gene ontology database the package was built with this structure in mind, but is not restricted to it. As explained by *[Ashburner et al.](http://www.ncbi.nlm.nih.gov/pubmed/10802651)* in their paper from the year 2000, gene ontology is structured as an acyclic graph and it provides terms covering different areas. These terms are grouped into three independent categories: BP (biological process), CC (cellular component) or MF (molecular function). The first column of the **circ** object contains this information, which was already given in the input. For more information on the structure of gene ontology, have a look at the documentation section of the gene ontology consortium [website](http://geneontology.org/page/ontology-documentation). 60 | All the terms from inside the gene ontology database come with a GO **ID** and a GO **term** description. The **ID** column of the circ object is optional.
So in case you want to use a functional analysis tool that is not based on gene ontology you won't have an **ID** column. The term description column does contain just that: a description of the term and the performance of the implemented functions does not depend on possible resemblance with gene ontology terms. **Count** is the number of genes assigned to a term. **Gene** names and their **logFC** are taken from the input list of selected genes. The significance of a term is indicated by the adjusted p-value (**adj_pval**). Terms with an adjusted p-value < 0.05 are considered as significantly enriched and are more likely to provide reliable information. The last column contains the **zscore**, an easy to calculate value to give you a hint if the biological process (/molecular function/cellular components) is more likely to be decreased (negative value) or increased (positive value). 61 | 62 | $$zscore=\frac{(up-down)}{\sqrt{count}}$$ 63 | 64 | Whereas *up* and *down* are the number of assigned genes up-regulated (logFC>0) in the data or down- regulated (logFC<0), respectively. 65 | 66 | ##The plots 67 | 68 | ###The modified barplot (GOBar) 69 | Since we do not really know what to expect from our data the aim of the first figure should be to display as many terms as possible without going too much into details. Nevertheless, the figure shall help us to pick the interesting and valuable terms. Therefore, we need to have some parameters to quantify the importance. Since the majority of scientific sampled data is plotted using bar charts a modified version of the normal barplot function, named *GOBar*, is included. The *GOBar* function allows the user to quickly create an appealing barplot. 
70 | 71 | ```{r GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6} 72 | # Generate a simple barplot 73 | GOBar(subset(circ, category == 'BP')) 74 | ``` 75 | 76 | On the y-axis the significance of the terms is shown and the bars are ordered according to their z-score. If you want, you can change the order by setting the argument *order.by.zscore* to FALSE. In this case the bars are ordered based on their significance. Additionally, the barplot can be easily faceted according to the categories of the terms using the argument *display* of the plotting function (output not shown). 77 | 78 | ```{r GOBar2, eval = FALSE, warning = FALSE, message = FALSE} 79 | # Facet the barplot according to the categories of the terms 80 | GOBar(circ, display = 'multiple') 81 | ``` 82 | 83 | To add a title use *title* and to change the colour scale of the z-score use the argument *zsc.col* (output not shown). 84 | 85 | ```{r GOBar3, eval = FALSE, warning = FALSE, message = FALSE} 86 | # Facet the barplot, add a title and change the colour scale for the z-score 87 | GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan')) 88 | ``` 89 | 90 | Barplots are common and very easy to read, but they might not be the absolute solution. Another possibility to display an overview for high- dimensional data is the bubble plot. 91 | 92 | ###The bubble plot (GOBubble) 93 | The bubble plot is another possibility to get an overview of the enriched terms. The z-score is assigned to the x-axis and the negative logarithm of the adjusted p-value to the y-axis, as in the barplot (the higher the more significant). The area of the displayed circles is proportional to the number of genes (circ$count) assigned to the term and the colour corresponds to the category. 94 | The help page of the plotting function (?GOBubble) lists all the arguments to change the layout of the plot. As a default the circles are labeled with the term ID. 
Therefore, a table connecting the IDs and terms is displayed on the right side by default. You can hide it by setting the argument *table.legend* to FALSE. If you want to display the term description instead set the argument *ID* to FALSE. Not all the circles are labeled due to the limited space and the overlap of the circles. A threshold for the labeling is set (default=5) based on the negative logarithm of the adjusted p-value. 95 | 96 | 97 | ```{r GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'} 98 | # Generate the bubble plot with a label threshold of 3 99 | GOBubble(circ, labels = 3) 100 | ``` 101 | 102 | ![GOBubble1.](GOBubble1.png) 103 | 104 | To add a title, change the colour of the circles, facet the plot and to change the label threshold use the following arguments: 105 | 106 | ```{r GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'} 107 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold 108 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3) 109 | ``` 110 | 111 | ![GOBubble2.](GOBubble2.png) 112 | 113 | For the facet plot it is also possible to colour the background of the panels according to the displayed category by setting *bg.col* to TRUE. 114 | 115 | ```{r GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'} 116 | # Colour the background according to the category 117 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = TRUE, labels = 3) 118 | ``` 119 | 120 | ![GOBubble3.](GOBubble3.png) 121 | 122 | A new function, *reduce_overlap*, was included in the updated version of the package to reduce the number of redundant terms. So far, the implemented method is very simple and slow and needs further refinement. Nevertheless, by reducing the number of redundant terms the readability of plots, like the bubble plot, improves significantly.
The function deletes all terms that have a gene overlap greater than or equal to a set threshold. The function keeps one term per group as a representative without taking into consideration the GO hierarchy. 123 | 124 | ```{r GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE} 125 | # Reduce redundant terms with a gene overlap >= 0.75... 126 | reduced_circ <- reduce_overlap(circ, overlap = 0.75) 127 | # ...and plot it 128 | GOBubble(reduced_circ, labels = 2.8) 129 | ``` 130 | 131 | ![GOBubble4.](GOBubble4.png) 132 | 133 | ### Circular visualization of the results of gene- annotation enrichment analysis (GOCircle) 134 | The overview plots shall help to decide which of the terms are the most interesting to us. Of course, this decision also depends on the hypothesis and ideas you want to confirm with your data. The most significant terms are not always the ones you are interested in. So, after manually selecting a set of valuable terms (EC$process) the next figure should provide us with more details on these specific terms. One of the major issues we noticed when presenting the plots was that it was sometimes difficult to interpret the information the z-score provides, since the measure is not that common. As shown above it is simply the number of up- regulated genes minus the number of down- regulated genes divided by the square root of the count. The *GOCircle* plot emphasizes this fact. 135 | 136 | ```{r GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'} 137 | # Generate a circular visualization of the results of gene- annotation enrichment analysis 138 | GOCircle(circ) 139 | ``` 140 | 141 | ![Circle plot.](GOCirc.png) 142 | 143 | The outer circle shows a scatter plot for each term of the logFC of the assigned genes. Red circles display up- regulation and blue ones down- regulation by default. The colours can be changed with the argument *lfc.col*.
Therefore, it is easier to understand why in some cases highly significant terms have a z-score close to zero. A z-score of zero does not mean that the term is not important. At least not as long as it is significantly enriched. It just shows that the z-score is a crude measure, because obviously the score does not take into account the functional level and activation dependencies of the single genes within a process. 144 | You can change the layout of the plot with various arguments, see ?GOCircle. The *nsub* argument needs a little bit more explanation to be used wisely. First of all, it can be a numeric or a character vector. If it is a character vector then it contains the IDs or term descriptions of the processes you want to display (output not shown). 145 | 146 | ```{r GOCircle2, eval = FALSE} 147 | # Generate a circular visualization of selected terms 148 | IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047') 149 | GOCircle(circ, nsub = IDs) 150 | ``` 151 | 152 | If *nsub* is a numeric vector then the number defines how many terms are displayed. It starts with the first row of the input data frame (output not shown). 153 | 154 | ```{r GOCircle3, eval = FALSE} 155 | # Generate a circular visualization for 10 terms 156 | GOCircle(circ, nsub = 10) 157 | ``` 158 | 159 | This kind of visualization is only useful for a smaller set of terms. The maximum number of terms lies around 12. While the number of terms decreases the amount of displayed information increases. 160 | 161 | ### Display of the relationship between genes and terms (GOChord) 162 | Based on the **[Circos](http://circos.ca/)** plots designed by *[Martin Krzywinski](http://mkweb.bcgsc.ca/)* the *GOChord* plotting function was implemented. It displays the relationship between a list of selected genes and terms, as well as the logFC of the genes. As an input a binary membership matrix is necessary.
You can build the matrix on your own or use the implemented function *chord_dat*, which does the job for you. The function takes three arguments: *data*, *genes* and *process*, of which only one of the last two is mandatory. So far, *circle_dat* combined your expression data with the results from the functional analysis. The bar and bubble plots gave you a first impression of your data, and now you have selected a list of genes and processes you think are valuable. *GOCircle* adds a layer to display the expression values of the genes assigned to the terms, but it lacks information about the relationship between the genes and the terms. It is not easy to figure out whether some of the genes are linked to multiple processes. The chord plot fills the void left by *GOCircle*. 163 | 164 | ```{r GOChord1, warning = FALSE, message = FALSE} 165 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 166 | # sample contains the data frame of selected genes and their logFC. Have a look... 167 | head(EC$genes) 168 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process) 169 | EC$process 170 | # Now it is time to generate the binary matrix 171 | chord <- chord_dat(circ, EC$genes, EC$process) 172 | head(chord) 173 | ``` 174 | 175 | Rows are genes and columns are terms. A '0' indicates that the gene is not assigned to the term; a '1' indicates that it is. As mentioned before, it is possible to leave out either the *genes* or the *process* argument. If you omit the *process* argument, the binary matrix is built for the list of selected genes and all the processes with at least one assigned gene. On the other hand, if you just provide a set of processes without limiting the list of genes, the binary matrix is generated for all the genes which are assigned to at least one of the processes from your list (output not shown).
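For readers who prefer to build the matrix on their own, the layout described above (genes in rows, terms in columns, logFC as the last column) can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame, its gene names and its values are hypothetical, and the code is not how *chord_dat* is implemented.

```r
# Hypothetical toy data in long format: one row per gene/term assignment.
# The column names (term, genes, logFC) mirror those used with `circ`;
# the gene names and values are made up for illustration.
toy <- data.frame(
  term  = c("heart development", "heart development",
            "tissue morphogenesis", "tissue morphogenesis", "plasma membrane"),
  genes = c("GeneA", "GeneB", "GeneA", "GeneC", "GeneB"),
  logFC = c(1.5, -2.0, 1.5, 0.8, -2.0),
  stringsAsFactors = FALSE
)

# Cross-tabulate genes against terms; any count > 0 becomes a 1
membership <- unclass(table(toy$genes, toy$term))
membership[membership > 0] <- 1

# Append one logFC value per gene as the last column (layout for nlfc = 1)
lfc <- toy$logFC[match(rownames(membership), toy$genes)]
manual_chord <- cbind(membership, logFC = lfc)
manual_chord
```

The result has one row per gene, one 0/1 column per term and a final logFC column, which is the shape the text above describes for the output of *chord_dat*.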
 176 | 177 | ```{r GOChord2, eval=FALSE, warning = FALSE, message = FALSE} 178 | # Generate the matrix with a list of selected genes 179 | chord <- chord_dat(data = circ, genes = EC$genes) 180 | # Generate the matrix with selected processes 181 | chord <- chord_dat(data = circ, process = EC$process) 182 | ``` 183 | 184 | Be aware that omitting either *genes* or *process* might lead to a large binary matrix, which results in a confusing visualization. The chart was designed for smaller subsets of high-dimensional data. 185 | Like the other plotting functions, *GOChord* provides a lot of arguments to change the layout of the plot, see ?GOChord. Most of them adjust the font size of the labels, the space between them, the colour scale for the logFC and the colour of the ribbons. Besides the aesthetics there are two other arguments: *gene.order* and *nlfc*. The first defines the order of the genes, with three possible options: 'logFC', 'alphabetical' and 'none', which are quite self-explanatory. Sometimes you perform the differential expression analysis for multiple conditions and/or batches and therefore want to include more than one logFC value per gene. For this situation, use the *nlfc* argument. It is a numeric value that defines the number of logFC columns within your binary membership matrix. The default is '1', assuming that most of the time you have just one contrast and one logFC value per gene. 186 | 187 | 188 | ```{r GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'} 189 | # Create the plot 190 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5) 191 | ``` 192 | ![Chord1.](GOChord1.png) 193 | 194 | The *space* argument defines the space between the coloured rectangles representing the logFC. The font size of the gene labels (*gene.size*) and the space between them (*gene.space*) were changed as well.
The genes were ordered according to their logFC values by setting *gene.order* to 'logFC'. 195 | 196 | Sometimes the plot gets a bit crowded and you would like to reduce the number of displayed genes or processes. You can do this automatically with the *limit* argument. *limit* is a vector with two cutoff values (default = c(0, 0)). The first value defines the minimum number (>=) of terms a gene has to be assigned to. The second value defines the minimum number of genes assigned to a selected term. For example, to display only genes which are assigned to at least three processes you would use the following line of code (output not shown): 197 | 198 | ```{r GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'} 199 | # Display only genes which are assigned to at least three processes 200 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC') 201 | ``` 202 | 203 | ### Heatmap of genes and terms (GOHeat) 204 | Thanks to a very nice suggestion from *[Maureen Sartor, Ph.D.](http://sartorlab.ccmb.med.umich.edu/)* I implemented *GOHeat*. The *GOHeat* function generates a heatmap of the relationship between genes and terms, similar to *GOChord*. Biological processes are displayed in rows and genes in columns. Each column is divided into smaller rectangles and the colouring of the tiles depends on the presence or absence of logFC values. In addition, genes are clustered to highlight groups of genes with similar annotated functions. Basically, the function has two modes depending on the *nlfc* argument. If *nlfc = 0*, i.e. no logFC values are available, the colouring encodes the overall number of processes the respective gene is assigned to. Let's have a look at an example...
 205 | 206 | ```{r GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'} 207 | # First, we use the chord object without logFC column to create the heatmap 208 | GOHeat(chord[,-8], nlfc = 0) 209 | ``` 210 | 211 | ![Heat1.](GOHeat_nolfc.png) 212 | 213 | In case of *nlfc = 1* the colour corresponds to the logFC of the gene... 214 | 215 | ```{r GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'} 216 | # Now we use the chord object with the logFC column to create the heatmap 217 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green')) 218 | ``` 219 | 220 | ![Heat2.](GOHeat_lfc.png) 221 | 222 | ### Golden eye (GOCluster) 223 | The idea behind the *GOCluster* function is to visualize as much information as possible. Here is an example: 224 | 225 | ```{r GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'} 226 | GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2) 227 | ``` 228 | ![GOCluster.](GOCluster.png) 229 | 230 | Hierarchical clustering is a popular method for gene expression analysis because its unsupervised nature assures an unbiased result. Genes are grouped together based on their expression patterns, thus clusters are likely to contain sets of co-regulated or functionally related genes. *GOCluster* performs the hierarchical clustering of the gene expression profiles using the *hclust* method in core R. If you want to change the distance metric or the clustering algorithm, use the arguments *metric* and *clust*, respectively. The resulting dendrogram is transformed with the help of *ggdendro* to be suitable for visualization with *ggplot2*. As before, a circular layout was chosen, because it is not only effective but also visually appealing. The first ring next to the dendrogram represents the logFC of the genes, which are the leaves of the clustering tree. In case you are interested in more than one contrast, the *nlfc* argument is also available for this function.
By default it is set to '1', so only one ring is drawn. As always, the logFC values are colour-coded with a user-definable colour scale (*lfc.col*). The next ring represents the terms assigned to the genes. For aesthetic reasons the terms should be reduced to a reasonable number with the argument *process*. The terms are colour-coded as well and you can change the default colours with the argument *term.col*. Once again, the plotting function provides a number of arguments to change the layout of the plot; you can check them out on the help page, ?GOCluster. Probably the most important argument of the function is *clust.by*. It expects a character vector specifying whether the clustering should be done by gene expression pattern ('logFC', as in the figure above) or by functional categories ('term'). 231 | 232 | 233 | ```{r GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'} 234 | GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1')) 235 | ``` 236 | ![GOCluster2.](GOCluster2.png) 237 | 238 | ### Venn diagram (GOVenn) 239 | For this biological context we implemented a Venn diagram that can be used to detect relations between various lists of differentially expressed genes or to explore the intersection of the genes of multiple terms from the functional analysis. The Venn diagram displays not only the number of overlapping genes, but also information about the gene expression patterns (commonly up-regulated, commonly down-regulated or contra-regulated). At the moment, at most three datasets are allowed as input. Each input data frame contains at least two columns: one for the gene names and one for the logFC value.
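To make the input format and the three expression patterns concrete, here is a minimal base-R sketch that intersects two gene lists and classifies the shared genes by the sign of their logFC. The gene lists, names and values are hypothetical, and the code only illustrates the idea; it is not the implementation used by *GOVenn*.

```r
# Hypothetical toy gene lists; names and values are made up for illustration.
# Each list has a gene name column and a logFC column, as described above.
list_a <- data.frame(genes = c("GeneA", "GeneB", "GeneC", "GeneD"),
                     logFC = c(1.2, -0.7, 2.1, -1.5),
                     stringsAsFactors = FALSE)
list_b <- data.frame(genes = c("GeneB", "GeneC", "GeneD", "GeneE"),
                     logFC = c(-1.1, -0.4, -2.0, 0.9),
                     stringsAsFactors = FALSE)

# Keep only the genes present in both lists
shared <- merge(list_a, list_b, by = "genes", suffixes = c(".a", ".b"))

# Classify each shared gene by the sign of its logFC in both lists.
# Simplification: a logFC of exactly 0 would fall into "contra" here.
shared$pattern <- ifelse(shared$logFC.a > 0 & shared$logFC.b > 0, "up",
                  ifelse(shared$logFC.a < 0 & shared$logFC.b < 0, "down",
                         "contra"))

# Number of shared genes per expression pattern
table(shared$pattern)
```

In the plot, these per-pattern counts correspond to the slices of the small pie charts drawn in the overlapping regions.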
 240 | 241 | ```{r GOVenn, warning=FALSE, message=FALSE, fig.keep='none'} 242 | l1 <- subset(circ, term == 'heart development', c(genes,logFC)) 243 | l2 <- subset(circ, term == 'plasma membrane', c(genes,logFC)) 244 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC)) 245 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis')) 246 | ``` 247 | ![Venn diagram.](GOVenn.png) 248 | 249 | For example, heart development and tissue morphogenesis share a set of 22 genes, of which 5 are commonly up-regulated and 17 are commonly down-regulated. The important thing to notice is that the pie charts do not display redundant information. Thus, if you compare three datasets, the genes which are shared by all datasets (pie chart in the middle) are not included in the other pie charts. 250 | The following [link](https://wwalter.shinyapps.io/Venn/) leads to the Shiny app of this tool. The web tool is slightly more interactive, since the circles are area-proportional to the number of genes in each dataset and the small pie charts can be moved with sliders. It also offers all the other options of the *GOVenn* function to change the layout of the plot. You can easily download the picture and the gene lists. 251 | -------------------------------------------------------------------------------- /vignettes/Titel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/Titel.png --------------------------------------------------------------------------------