├── DESCRIPTION
├── MD5
├── NAMESPACE
├── NEWS.md
├── R
    ├── GOCluster.R
    ├── GOCore.R
    ├── GOHeat.R
    ├── GOVenn.R
    └── Helper.R
├── README.md
├── build
    └── vignette.rds
├── data
    └── EC.rda
├── inst
    ├── CITATION
    └── doc
    │   ├── GOplot_vignette.R
    │   ├── GOplot_vignette.Rmd
    │   └── GOplot_vignette.html
├── man
    ├── EC.Rd
    ├── GOBar.Rd
    ├── GOBubble.Rd
    ├── GOChord.Rd
    ├── GOCircle.Rd
    ├── GOCluster.Rd
    ├── GOHeat.Rd
    ├── GOVenn.Rd
    ├── chord_dat.Rd
    ├── circle_dat.Rd
    └── reduce_overlap.Rd
└── vignettes
    ├── GOBar.png
    ├── GOBubble1.png
    ├── GOBubble2.png
    ├── GOBubble3.png
    ├── GOBubble4.png
    ├── GOChord1.png
    ├── GOCirc.png
    ├── GOCluster.png
    ├── GOCluster2.png
    ├── GOHeat_lfc.png
    ├── GOHeat_nolfc.png
    ├── GOVenn.png
    ├── GOplot.css
    ├── GOplot_vignette.Rmd
    └── Titel.png


/DESCRIPTION:
--------------------------------------------------------------------------------
 1 | Package: GOplot
 2 | Type: Package
 3 | Title: Visualization of Functional Analysis Data
 4 | Version: 1.0.2
 5 | Date: 2016-03-30
 6 | Authors@R: c(
 7 |     person("Wencke", "Walter", , email = "wencke.walter@arcor.de", role = c("aut", "cre")),
 8 |     person("Fatima", "Sanchez-Cabo", , role = "aut")
 9 |     )
10 | URL: https://github.com/wencke/wencke.github.io
11 | BugReports: https://github.com/wencke/wencke.github.io/issues
12 | Description: Implementation of multilayered visualizations for enhanced
13 |     graphical representation of functional analysis data. It combines and integrates
14 |     omics data derived from expression and functional annotation enrichment
15 |     analyses. Its plotting functions have been developed with a hierarchical
16 |     structure in mind: starting from a general overview to identify the most
17 |     enriched categories (modified bar plot, bubble plot) to a more detailed one
18 |     displaying different types of relevant information for the molecules in a given
19 |     set of categories (circle plot, chord plot, cluster plot, Venn diagram, heatmap).
20 | Depends: ggplot2 (>= 2.0.0), ggdendro (>= 0.1-17), gridExtra (>=
21 |         2.0.0), RColorBrewer (>= 1.1.2), R (>= 3.2.3)
22 | License: GPL-2
23 | Suggests: knitr, rmarkdown
24 | VignetteBuilder: knitr
25 | LazyData: TRUE
26 | RoxygenNote: 5.0.1
27 | NeedsCompilation: no
28 | Packaged: 2016-03-30 08:24:21 UTC; BioinfoNerd
29 | Author: Wencke Walter [aut, cre],
30 |   Fatima Sanchez-Cabo [aut]
31 | Maintainer: Wencke Walter <wencke.walter@arcor.de>
32 | Repository: CRAN
33 | Date/Publication: 2016-03-30 20:35:02
34 | 


--------------------------------------------------------------------------------
/MD5:
--------------------------------------------------------------------------------
 1 | 1715bfe67bec477dd3cb8d7602e4fa20 *DESCRIPTION
 2 | e61dd1ca8c29cbb323e13d8517d9c02a *NAMESPACE
 3 | 4b28a97ef1acd9ed6c62896137721df8 *NEWS.md
 4 | 23abcbfe83b5aba778ee015295f137aa *R/GOCluster.R
 5 | e572f5815342fade1478975babe13886 *R/GOCore.R
 6 | dfc4d334a0d33f9a93b570eff3e9e80f *R/GOHeat.R
 7 | 42cbd333583bc85dd011a233cb3f2f74 *R/GOVenn.R
 8 | c5f5d6bd7353ce68cefdc2be652b5960 *R/Helper.R
 9 | 584ae11d874b64d6e5c4ff9c5b575d22 *README.md
10 | 0a9da26cff7c27c4304cd6c957e04af8 *build/vignette.rds
11 | 322babedca883c827c36456263b2a8ec *data/EC.rda
12 | b7bcdd8ea7db60feca771ffc0332b250 *inst/CITATION
13 | fcd68b105afd0eeb7427bbf4fd9d6841 *inst/doc/GOplot_vignette.R
14 | bb17de94d29ab2e4eb59b63b01accfb6 *inst/doc/GOplot_vignette.Rmd
15 | 452ddf9993f7c63c2ee59ce2ece6f673 *inst/doc/GOplot_vignette.html
16 | 4251e378a6add0b3ae7a596abdcbb2a6 *man/EC.Rd
17 | e90a8e3541703f5e5883bb3266e24a19 *man/GOBar.Rd
18 | c5bb4945d9ff65d94135ce4efaaf5459 *man/GOBubble.Rd
19 | 44149f89ac4b3e6198648742e408de8c *man/GOChord.Rd
20 | ee2fcf3f06f78085787602c7f8c6093b *man/GOCircle.Rd
21 | ce63714fd31d2075dc19890283c90c32 *man/GOCluster.Rd
22 | 0e75b8f0a887a4b435f3101e75977931 *man/GOHeat.Rd
23 | 12b7a3dc5e14f387675f436209b22346 *man/GOVenn.Rd
24 | 5c0c6435c09dab8eede3bb7bc3c91be0 *man/chord_dat.Rd
25 | 8c9fc2bea8e29d8e69191d14876c4177 *man/circle_dat.Rd
26 | e571685ac3bd31d377069f5340b3902f *man/reduce_overlap.Rd
27 | da9c6b713c73d7cf668dab80ac28f2de *vignettes/GOBar.png
28 | 9d64131438541b889d07e8e2b7f44d30 *vignettes/GOBubble1.png
29 | 0ea657fee85af6e5dc3c8f43e2ec9cbd *vignettes/GOBubble2.png
30 | fbf7808d3c7235e04feac329eadc9295 *vignettes/GOBubble3.png
31 | cdb1824e3bf1830599ffe85eb60bbcd4 *vignettes/GOBubble4.png
32 | e5787103b84b40771253e4e76beaf5b3 *vignettes/GOChord1.png
33 | ea9374ef982d235dcbcc1f8b2158a804 *vignettes/GOCirc.png
34 | 8e332df56b7dd05c8c81bbda25811926 *vignettes/GOCluster.png
35 | 93aa247b84c448148ecbc7edf4e94c50 *vignettes/GOCluster2.png
36 | d6fb473e4aaa993d3532bc78fa32b5c9 *vignettes/GOHeat_lfc.png
37 | 140ee05839dbcb04483ab1735cc18f62 *vignettes/GOHeat_nolfc.png
38 | 0514ae4a1ca467974d4197efddba0639 *vignettes/GOVenn.png
39 | 5f57576c51fe172d93f3622cc433f762 *vignettes/GOplot.css
40 | bb17de94d29ab2e4eb59b63b01accfb6 *vignettes/GOplot_vignette.Rmd
41 | f9eb01d9aba255d27107fd69cc246ba5 *vignettes/Titel.png
42 | 


--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
 1 | # Generated by roxygen2: do not edit by hand
 2 | 
 3 | export(GOBar)
 4 | export(GOBubble)
 5 | export(GOChord)
 6 | export(GOCircle)
 7 | export(GOCluster)
 8 | export(GOHeat)
 9 | export(GOVenn)
10 | export(chord_dat)
11 | export(circle_dat)
12 | export(reduce_overlap)
13 | import(RColorBrewer)
14 | import(ggdendro)
15 | import(ggplot2)
16 | import(grDevices)
17 | import(graphics)
18 | import(gridExtra)
19 | import(stats)
20 | 


--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
 1 | GOplot 1.0.2 (2016-03-29) 
 2 | ----------------------------------------
 3 | 
 4 | * Add function 'reduce_overlap' to reduce the number of redundant terms and improve readability of plots
 5 | 
 6 | * Add parameter 'bg.col' to GOBubble() to enable panel background colour of facet plot 
 7 | 
 8 | * Add new plot function GOHeat()
 9 | 
10 | * Fix various bugs of draw_table()
11 | 
12 | 
13 | GOplot 1.0.1 (2015-07-15) 
14 | ----------------------------------------
15 | 
16 | * Fix various bugs of GOVenn()
17 | 
 18 | * Fix bug of 'process.label' argument in GOChord
19 | 
20 | * Adjust draw_table() to new release of gridExtra
21 | 
 22 | * Add parameter 'limit' to chord_dat() to restrict the dimension of the binary matrix


--------------------------------------------------------------------------------
/R/GOCluster.R:
--------------------------------------------------------------------------------
  1 | #' 
  2 | #' @name GOCluster
  3 | #' @title Circular dendrogram.
  4 | #' @description GOCluster generates a circular dendrogram of the clustered
  5 | #'   \code{data}, using Euclidean distance and average linkage by default. The
  6 | #'   inner ring displays the color-coded logFC, while the outer ring encodes the
  7 | #'   terms assigned to each gene.
  8 | #' @param data A data frame which should be the result of 
  9 | #'   \code{\link{circle_dat}} in case the data contains only one logFC column. 
 10 | #'   Otherwise \code{data} is a data frame in which the first column contains the
 11 | #'   genes, the second the term, and the following columns the logFCs of the 
 12 | #'   different contrasts.
 13 | #' @param process A character vector of selected processes (ID or term
 14 | #'   description)
 15 | #' @param metric A character vector specifying the distance measure to be used 
 16 | #'   (default='euclidean'), see \code{dist}
 17 | #' @param clust A character vector specifying the agglomeration method to be 
 18 | #'   used (default='average'), see \code{hclust}
 19 | #' @param clust.by A character vector specifying if the clustering should be 
 20 | #'   done for gene expression pattern or functional categories. By default the 
 21 | #'   clustering is done based on the functional categories.
 22 | #' @param nlfc If TRUE \code{data} contains multiple logFC columns (default= 
 23 | #'   FALSE)
 24 | #' @param lfc.col Character vector to define the color scale for the logFC of 
 25 | #'   the form c(high, midpoint, low)
 26 | #' @param lfc.min Specifies the minimum value of the logFC scale (default = -3)
 27 | #' @param lfc.max Specifies the maximum value of the logFC scale (default = 3)
 28 | #' @param lfc.space The space between the leaves of the dendrogram and the ring 
 29 | #'   for the logFC
 30 | #' @param lfc.width The width of the logFC ring
 31 | #' @param term.col A character vector specifying the colors of the term bands
 32 | #' @param term.space The space between the logFC ring and the term ring
 33 | #' @param term.width The width of the term ring
 34 | #' @details The inner ring can be split into smaller rings to display multiple
 35 | #'   logFC values resulting from various comparisons.
 36 | #' @import ggplot2
 37 | #' @import ggdendro
 38 | #' @import RColorBrewer
 39 | #' @import stats
 40 | #' @examples
 41 | #' \dontrun{
 42 | #' #Load the included dataset
 43 | #' data(EC)
 44 | #' 
 45 | #' #Generating the circ object
 46 | #' circ<-circle_dat(EC$david, EC$genelist)
 47 | #' 
 48 | #' #Creating the cluster plot
 49 | #' GOCluster(circ, EC$process)
 50 | #' 
 51 | #' #Cluster the data according to gene expression and assign a different color scale for the logFC
 52 | #' GOCluster(circ,EC$process,clust.by='logFC',lfc.col=c('darkgoldenrod1','black','cyan1'))
 53 | #' }
 54 | #' @export
 55 | #' 
 56 | 
 57 | GOCluster<-function(data, process, metric, clust, clust.by, nlfc, lfc.col, lfc.min, lfc.max, lfc.space, lfc.width, term.col, term.space, term.width){
 58 |   x <- y <- xend <- yend <- width <- space <- logFC <- NULL
 59 |   if (missing(metric)) metric<-'euclidean'
 60 |   if (missing(clust)) clust<-'average'
 61 |   if (missing(clust.by)) clust.by<-'term'
 62 |   if (missing(nlfc)) nlfc <- 0
 63 |   if (missing(lfc.col)) lfc.col<-c('firebrick1','white','dodgerblue')
 64 |   if (missing(lfc.min)) lfc.min <- -3
 65 |   if (missing(lfc.max)) lfc.max <- 3
 66 |   if (missing(lfc.space)) lfc.space<- (-0.5) else lfc.space<-lfc.space*(-1)
 67 |   if (missing(lfc.width)) lfc.width<- (-1.6) else lfc.width<-lfc.space-lfc.width-0.1
 68 |   if (missing(term.col)) term.col<-brewer.pal(length(process), 'Set3')
 69 |   if (missing(term.space)) term.space<- lfc.space+lfc.width else term.space<-term.space*(-1)+lfc.width
 70 |   if (missing(term.width)) term.width<- 2*lfc.width+term.space else term.width<-term.width*(-1)+term.space
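 71 |   # Build the gene-by-term matrix used below; assumes 'genes' and 'logFC' columns so that logFC ends up in the last column
 72 |   chord <- chord_dat(data, genes = data[!duplicated(data$genes), c('genes', 'logFC')], process = process)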
 73 |   if (clust.by=='logFC') distance <- stats::dist(chord[,dim(chord)[2]], method=metric)
 74 |   if (clust.by=='term') distance <- stats::dist(chord, method=metric)
 75 |   cluster <- stats::hclust(distance, method=clust)
 76 |   dendr <- dendro_data(cluster)
 77 |   y_range <- range(dendr$segments$y)
 78 |   x_pos <- data.frame(x=dendr$label$x, label=as.character(dendr$label$label))
 79 |   chord <- as.data.frame(chord)
 80 |   chord$label <- as.character(rownames(chord))
 81 |   all <- merge(x_pos, chord, by='label')
 82 |   all$label <- as.character(all$label)
 83 |   if (nlfc){
 84 |     lfc_rect <- all[,c(2, dim(all)[2])]
 85 |     for (l in 4:dim(data)[2]) lfc_rect <- cbind(lfc_rect, sapply(all$label, function(x) data[match(x, data$genes), l]))
 86 |     num <- dim(data)[2]-1
 87 |     tmp <- seq(lfc.space, lfc.width, length = num)
 88 |     lfc<-data.frame(x=numeric(),width=numeric(),space=numeric(),logFC=numeric())
 89 |     for (l in 1:(length(tmp)-1)){
 90 |       tmp_df<-data.frame(x=lfc_rect[,1],width=tmp[l+1],space=tmp[l],logFC=lfc_rect[,l+1])
 91 |       lfc<-rbind(lfc,tmp_df)
 92 |     }
 93 |   }else{
 94 |     lfc <- all[,c(2, dim(all)[2])]  
 95 |     lfc$space <- lfc.space
 96 |     lfc$width <- lfc.width
 97 |   }
 98 |   term <- all[,c(2:(length(process)+2))]
 99 |   color<-NULL;termx<-NULL;tspace<-NULL;twidth<-NULL
100 |   for (row in 1:dim(term)[1]){
101 |     idx <- which(term[row,-1] != 0)
102 |     if(length(idx) != 0){
103 |       termx<-c(termx,rep(term[row,1],length(idx)))
104 |       color<-c(color,term.col[idx])
105 |       tmp<-seq(term.space,term.width,length=length(idx)+1)
106 |       tspace<-c(tspace,tmp[1:(length(tmp)-1)])
107 |       twidth<-c(twidth,tmp[2:length(tmp)])
108 |     }
109 |   }
110 |   # Clamp logFC to the displayed range [lfc.min, lfc.max]
111 |   logFC <- pmax(pmin(lfc$logFC, lfc.max), lfc.min)
112 |   lfc$logFC <- logFC
113 |   term_rect <- data.frame(x = termx, width = twidth, space = tspace, col = color)
114 |   legend <- data.frame(x = 1:length(process),label = process)
115 | 
116 |   ggplot()+
117 |     geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend))+
118 |     geom_rect(data=lfc,aes(xmin=x-0.5,xmax=x+0.5,ymin=width,ymax=space,fill=logFC))+
119 |     scale_fill_gradient2('logFC', space = 'Lab', low=lfc.col[3],mid=lfc.col[2],high=lfc.col[1],guide=guide_colorbar(title.position='top',title.hjust=0.5),breaks=c(min(lfc$logFC),max(lfc$logFC)),labels=c(round(min(lfc$logFC)),round(max(lfc$logFC))))+
120 |     geom_rect(data=term_rect,aes(xmin=x-0.5,xmax=x+0.5,ymin=width,ymax=space),fill=term_rect$col)+
121 |     geom_point(data=legend,aes(x=x,y=0.1,size=factor(label,levels=label),shape=NA))+
122 |     guides(size=guide_legend("GO Terms",ncol=4,byrow=T,override.aes=list(shape=22,fill=term.col,size = 8)))+
123 |     coord_polar()+
124 |     scale_y_reverse()+
125 |     theme(legend.position='bottom',legend.background = element_rect(fill='transparent'),legend.box='horizontal',legend.direction='horizontal')+
126 |     theme_blank  
127 |     
128 | }
129 | 
130 | #' 
131 | #' @name GOChord
132 | #' @title Displays the relationship between genes and terms.
133 | #' @description The GOChord function generates a circularly composited overview 
134 | #'   of selected/specific genes and their assigned processes or terms. More 
135 | #'   generally, it joins genes and processes via ribbons in an intersection-like
136 | #'   graph. The input can be generated with the \code{\link{chord_dat}} 
137 | #'   function.
138 | #' @param data The matrix represents the binary relation (1= is related to, 0= 
139 | #'   is not related to) between a set of genes (rows) and processes (columns); a
140 | #'   column for the logFC of the genes is optional
141 | #' @param title The title (on top) of the plot
142 | #' @param space The space between the chord segments of the plot
143 | #' @param gene.order A character vector defining the order of the displayed gene
144 | #'   labels
145 | #' @param gene.size The size of the gene labels
146 | #' @param gene.space The space between the gene labels and the segment of the 
147 | #'   logFC
148 | #' @param nlfc Defines the number of logFC columns (default=1)
149 | #' @param lfc.col The fill color for the logFC specified in the following form: 
150 | #'   c(color for high values, color for the mid point, color for the low values)
151 | #' @param lfc.min Specifies the minimum value of the logFC scale (default = -3)
152 | #' @param lfc.max Specifies the maximum value of the logFC scale (default = 3)
153 | #' @param ribbon.col The background color of the ribbons
154 | #' @param border.size Defines the size of the ribbon borders
155 | #' @param process.label The size of the legend entries
156 | #' @param limit A vector with two cutoff values (default= c(0,0)). The first 
157 | #' value defines the minimum number of terms a gene has to be assigned to. The 
158 | #' second the minimum number of genes assigned to a selected term.
159 | #' @details The \code{gene.order} argument has three possible options: "logFC", 
160 | #'   "alphabetical", "none", which are self-explanatory.
161 | #'   
162 | #'   Maybe the most important argument of the function is \code{nlfc}. If your 
163 | #'   \code{data} does not contain a column of logFC values you have to set
164 | #'   \code{nlfc = 0}. Differential expression analysis can be performed for
165 | #'   multiple conditions and/or batches. Therefore, the data frame might contain
166 | #'   more than one logFC value per gene. To adjust to this situation the
167 | #'   \code{nlfc} argument is used as well. It is a numeric value and it defines
168 | #'   the number of logFC columns of your \code{data}. The default is "1"
169 | #'   assuming that most of the time only one contrast is considered.
170 | #'   
171 | #'   To make the plot more informative it might be necessary to reduce the 
172 | #'   dimension of \code{data}. This can be achieved with \code{limit}. The first
173 | #'   value of the vector defines the threshold for the minimum number of terms a
174 | #'   gene has to be assigned to in order to be represented in the plot. Most of
175 | #'   the time it is more meaningful to represent genes with various functions. A
176 | #'   value of 3 excludes all genes with fewer than three term assignments. 
177 | #'   The second value of the parameter restricts the number of terms 
178 | #'   according to the number of assigned genes. All terms with a count smaller 
179 | #'   than or equal to the threshold are excluded.
180 | #' @seealso \code{\link{chord_dat}}
181 | #' @import ggplot2
182 | #' @import grDevices
183 | #' @examples
184 | #' \dontrun{
185 | #' # Load the included dataset
186 | #' data(EC)
187 | #' # Building the circ object and the binary matrix
188 | #' circ<-circle_dat(EC$david, EC$genelist)
189 | #' chord<-chord_dat(circ,EC$genes,EC$process)
190 | #' 
191 | #' # Creating the chord plot
192 | #' GOChord(chord)
193 | #' 
194 | #' # Excluding processes with fewer than 5 assigned genes
195 | #' GOChord(chord, limit = c(0,5))
196 | #' 
197 | #' # Creating the chord plot genes ordered by logFC and a different logFC color scale
198 | #' GOChord(chord,space=0.02,gene.order='logFC',lfc.col=c('red','black','cyan'))
199 | #' }
200 | #' @export
201 | 
202 | GOChord <- function(data, title, space, gene.order, gene.size, gene.space, nlfc = 1, lfc.col, lfc.min, lfc.max, ribbon.col, border.size, process.label, limit){
203 |   y <- id <- xpro <- ypro <- xgen <- ygen <- lx <- ly <- ID <- logFC <- NULL
204 |   Ncol <- dim(data)[2]
205 |   
206 |   if (missing(title)) title <- ''
207 |   if (missing(space)) space <- 0
208 |   if (missing(gene.order)) gene.order <- 'none'
209 |   if (missing(gene.size)) gene.size <- 3
210 |   if (missing(gene.space)) gene.space <- 0.2
211 |   if (missing(lfc.col)) lfc.col <- c('brown1', 'azure', 'cornflowerblue')
212 |   if (missing(lfc.min)) lfc.min <- -3
213 |   if (missing(lfc.max)) lfc.max <- 3
214 |   if (missing(border.size)) border.size <- 0.5
215 |   if (missing(process.label)) process.label <- 11
216 |   if (missing(limit)) limit <- c(0, 0)
217 |   
218 |   if (gene.order == 'logFC') data <- data[order(data[, Ncol], decreasing = T), ]
219 |   if (gene.order == 'alphabetical') data <- data[order(rownames(data)), ]
220 |   if ('logFC' %in% colnames(data)){
221 |     if (nlfc == 1){
222 |       cdata <- check_chord(data[, 1:(Ncol - 1)], limit)
223 |       lfc <- sapply(rownames(cdata), function(x) data[match(x,rownames(data)), Ncol])
224 |     }else{
225 |       cdata <- check_chord(data[, 1:(Ncol - nlfc)], limit)
226 |       lfc <- sapply(rownames(cdata), function(x) data[match(x, rownames(data)), (Ncol - nlfc + 1)])
227 |     }
228 |   }else{
229 |     cdata <- check_chord(data, limit)
230 |     lfc <- 0
231 |   }
232 |   if (missing(ribbon.col)) colRib <- grDevices::rainbow(dim(cdata)[2]) else colRib <- ribbon.col
233 |   nrib <- colSums(cdata)
234 |   ngen <- rowSums(cdata)
235 |   Ncol <- dim(cdata)[2]
236 |   Nrow <- dim(cdata)[1]
237 |   colRibb <- c()
238 |   for (b in 1:length(nrib)) colRibb <- c(colRibb, rep(colRib[b], 202 * nrib[b]))
239 |   r1 <- 1; r2 <- r1 + 0.1
240 |   xmax <- c(); x <- 0
241 |   for (r in 1:length(nrib)){
242 |     perc <- nrib[r] / sum(nrib)
243 |     xmax <- c(xmax, (pi * perc) - space)
244 |     if (length(x) <= Ncol - 1) x <- c(x, x[r] + pi * perc)
245 |   }
246 |   xp <- c(); yp <- c()
247 |   l <- 50
248 |   for (s in 1:Ncol){
249 |     xh <- seq(x[s], x[s] + xmax[s], length = l)
250 |     xp <- c(xp, r1 * sin(x[s]), r1 * sin(xh), r1 * sin(x[s] + xmax[s]), r2 * sin(x[s] + xmax[s]), r2 * sin(rev(xh)), r2 * sin(x[s]))
251 |     yp <- c(yp, r1 * cos(x[s]), r1 * cos(xh), r1 * cos(x[s] + xmax[s]), r2 * cos(x[s] + xmax[s]), r2 * cos(rev(xh)), r2 * cos(x[s]))
252 |   }
253 |   df_process <- data.frame(x = xp, y = yp, id = rep(c(1:Ncol), each = 4 + 2 * l))
254 |   xp <- c(); yp <- c(); logs <- NULL
255 |   x2 <- seq(0 - space, -pi - (-pi / Nrow) - space, length = Nrow)
256 |   xmax2 <- rep(-pi / Nrow + space, length = Nrow)
257 |   for (s in 1:Nrow){
258 |     xh <- seq(x2[s], x2[s] + xmax2[s], length = l)
259 |     if (nlfc <= 1){
260 |       xp <- c(xp, (r1 + 0.05) * sin(x2[s]), (r1 + 0.05) * sin(xh), (r1 + 0.05) * sin(x2[s] + xmax2[s]), r2 * sin(x2[s] + xmax2[s]), r2 * sin(rev(xh)), r2 * sin(x2[s]))
261 |       yp <- c(yp, (r1 + 0.05) * cos(x2[s]), (r1 + 0.05) * cos(xh), (r1 + 0.05) * cos(x2[s] + xmax2[s]), r2 * cos(x2[s] + xmax2[s]), r2 * cos(rev(xh)), r2 * cos(x2[s]))
262 |     }else{
263 |       tmp <- seq(r1, r2, length = nlfc + 1)
264 |       for (t in 1:nlfc){
265 |         logs <- c(logs, data[s, (dim(data)[2] + 1 - t)])
266 |         xp <- c(xp, (tmp[t]) * sin(x2[s]), (tmp[t]) * sin(xh), (tmp[t]) * sin(x2[s] + xmax2[s]), tmp[t + 1] * sin(x2[s] + xmax2[s]), tmp[t + 1] * sin(rev(xh)), tmp[t + 1] * sin(x2[s]))
267 |         yp <- c(yp, (tmp[t]) * cos(x2[s]), (tmp[t]) * cos(xh), (tmp[t]) * cos(x2[s] + xmax2[s]), tmp[t + 1] * cos(x2[s] + xmax2[s]), tmp[t + 1] * cos(rev(xh)), tmp[t + 1] * cos(x2[s]))
268 |       }}}
269 |   if(lfc[1] != 0){
270 |     if (nlfc == 1){
271 |       df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:Nrow), each = 4 + 2 * l), logFC = rep(lfc, each = 4 + 2 * l))
272 |     }else{
273 |       df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:(nlfc*Nrow)), each = 4 + 2 * l), logFC = rep(logs, each = 4 + 2 * l))  
274 |     }
275 |   }else{
276 |     df_genes <- data.frame(x = xp, y = yp, id = rep(c(1:Nrow), each = 4 + 2 * l))
277 |   }
278 |   aseq <- seq(0, 180, length = length(x2)); angle <- c()
279 |   for (o in aseq) if((o + 270) <= 360) angle <- c(angle, o + 270) else angle <- c(angle, o - 90)
280 |   df_texg <- data.frame(xgen = (r1 + gene.space) * sin(x2 + xmax2/2),ygen = (r1 + gene.space) * cos(x2 + xmax2 / 2),labels = rownames(cdata), angle = angle)
281 |   df_texp <- data.frame(xpro = (r1 + 0.15) * sin(x + xmax / 2),ypro = (r1 + 0.15) * cos(x + xmax / 2), labels = colnames(cdata), stringsAsFactors = FALSE)
282 |   cols <- rep(colRib, each = 4 + 2 * l)
283 |   x.end <- c(); y.end <- c(); processID <- c()
284 |   for (gs in 1:length(x2)){
285 |     val <- seq(x2[gs], x2[gs] + xmax2[gs], length = ngen[gs] + 1)
286 |     pros <- which((cdata[gs, ] != 0) == T)
287 |     for (v in 1:(length(val) - 1)){
288 |       x.end <- c(x.end, sin(val[v]), sin(val[v + 1]))
289 |       y.end <- c(y.end, cos(val[v]), cos(val[v + 1]))
290 |       processID <- c(processID, rep(pros[v], 2))
291 |     }
292 |   }
293 |   df_bezier <- data.frame(x.end = x.end, y.end = y.end, processID = processID)
294 |   df_bezier <- df_bezier[order(df_bezier$processID,-df_bezier$y.end),]
295 |   x.start <- c(); y.start <- c()
296 |   for (rs in 1:length(x)){
297 |     val<-seq(x[rs], x[rs] + xmax[rs], length = nrib[rs] + 1)
298 |     for (v in 1:(length(val) - 1)){
299 |       x.start <- c(x.start, sin(val[v]), sin(val[v + 1]))
300 |       y.start <- c(y.start, cos(val[v]), cos(val[v + 1]))
301 |     }
302 |   }	
303 |   df_bezier$x.start <- x.start
304 |   df_bezier$y.start <- y.start
305 |   df_path <- bezier(df_bezier, colRib)
306 |   if(length(df_genes$logFC) != 0){
307 |     # Clamp logFC to the displayed range [lfc.min, lfc.max]
308 |     logFC <- pmax(pmin(df_genes$logFC, lfc.max), lfc.min)
309 |     df_genes$logFC <- logFC
310 |   }
311 |   
312 |   g<- ggplot() +
313 |     geom_polygon(data = df_process, aes(x, y, group=id), fill='gray70', inherit.aes = F,color='black') +
314 |     geom_polygon(data = df_process, aes(x, y, group=id), fill=cols, inherit.aes = F,alpha=0.6,color='black') +	
315 |     geom_point(aes(x = xpro, y = ypro, size = factor(labels, levels = labels), shape = NA), data = df_texp) +
316 |     guides(size = guide_legend("GO Terms", ncol = 4, byrow = T, override.aes = list(shape = 22, fill = unique(cols), size = 8))) +
317 |     theme(legend.text = element_text(size = process.label)) +
318 |     geom_text(aes(xgen, ygen, label = labels, angle = angle), data = df_texg, size = gene.size) +
319 |     geom_polygon(aes(x = lx, y = ly, group = ID), data = df_path, fill = colRibb, color = 'black', size = border.size, inherit.aes = F) +		
320 |     labs(title = title) +
321 |     theme_blank
322 |   
323 |   if (nlfc >= 1){
324 |     g + geom_polygon(data = df_genes, aes(x, y, group = id, fill = logFC), inherit.aes = F, color = 'black') +
325 |       scale_fill_gradient2('logFC', space = 'Lab', low = lfc.col[3], mid = lfc.col[2], high = lfc.col[1], guide = guide_colorbar(title.position = "top", title.hjust = 0.5), 
326 |                            breaks = c(min(df_genes$logFC), max(df_genes$logFC)), labels = c(round(min(df_genes$logFC)), round(max(df_genes$logFC)))) +
327 |       theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal')
328 |   }else{
329 |     g + geom_polygon(data = df_genes, aes(x, y, group = id), fill = 'gray50', inherit.aes = F, color = 'black')+
330 |       theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal')
331 |   }
332 | }
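
The `limit` argument described in the GOChord documentation can be illustrated with a small base-R sketch. The helper name `filter_by_limit` is hypothetical, and so is the order of the two filtering steps. The package's own `check_chord()` is not shown in this listing, so this follows the wording of the `@details` section and is not the package's implementation.

```r
# Hypothetical helper: filter a binary gene-by-term matrix as the
# GOChord @details section describes the 'limit' argument.
filter_by_limit <- function(mat, limit = c(0, 0)) {
  # Keep genes assigned to at least limit[1] terms
  mat <- mat[rowSums(mat) >= limit[1], , drop = FALSE]
  # Drop terms whose gene count is smaller than or equal to limit[2]
  mat <- mat[, colSums(mat) > limit[2], drop = FALSE]
  mat
}

# Toy matrix (made-up memberships): 4 genes, 3 terms
m <- matrix(c(1, 1, 1,
              1, 1, 0,
              1, 0, 0,
              0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), c("termA", "termB", "termC")))

# Genes need at least 2 terms; terms need more than 1 remaining gene
res <- filter_by_limit(m, limit = c(2, 1))
print(res)
```

Filtering genes first means the term counts are taken over the remaining genes only. Reversing the two steps can give a different result, which is why the order is flagged as an assumption.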

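The clustering step inside GOCluster relies only on `stats::dist` and `stats::hclust`, so it can be tried in isolation. The sketch below uses a made-up matrix in the layout the function expects (binary term memberships with logFC in the last column). The values and gene names are illustrative assumptions, not data from the package.

```r
# Toy gene-by-term matrix with logFC as the last column (hypothetical values)
chord <- cbind(
  termA = c(1, 1, 0, 0),
  termB = c(0, 1, 1, 0),
  termC = c(0, 0, 1, 1),
  logFC = c(2.1, 1.8, -1.5, -2.0)
)
rownames(chord) <- c("gene1", "gene2", "gene3", "gene4")

# clust.by = 'term': distances over the whole matrix, as GOCluster does
d_term <- stats::dist(chord, method = "euclidean")

# clust.by = 'logFC': distances over the last column only
d_lfc <- stats::dist(chord[, ncol(chord)], method = "euclidean")

# Average linkage is the default agglomeration method in GOCluster
hc_term <- stats::hclust(d_term, method = "average")
hc_lfc <- stats::hclust(d_lfc, method = "average")

# Leaf order of each dendrogram
print(hc_term$labels[hc_term$order])
print(hc_lfc$labels[hc_lfc$order])
```

Note that with `clust.by = 'term'` the logFC column is part of the distance computation, because the whole matrix is passed to `dist`.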

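The z-score that `circle_dat()` in R/GOCore.R attaches to each term is the number of up-regulated genes minus the number of remaining genes, divided by the square root of the gene count. A minimal sketch with made-up logFC values follows. The function name `term_zscore` is hypothetical and only mirrors the loop in `circle_dat()`.

```r
# Hypothetical helper mirroring the z-score loop in circle_dat():
# every gene contributes +1 if logFC > 0 and -1 otherwise.
term_zscore <- function(logFC) {
  direction <- ifelse(logFC > 0, 1, -1)
  sum(direction) / sqrt(length(logFC))
}

# Made-up logFC values for the genes assigned to one term
lfc <- c(1.2, 0.8, -0.5, 2.3)

up <- sum(lfc > 0)      # genes counted as up-regulated
down <- sum(lfc <= 0)   # everything else, including logFC == 0
z <- term_zscore(lfc)

# Both formulations agree
print(c(zscore = z, by_counts = (up - down) / sqrt(length(lfc))))
```

A gene with a logFC of exactly zero is counted as down-regulated under this rule, which follows from the `ifelse(x > 0, 1, -1)` expression in the source.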
--------------------------------------------------------------------------------
/R/GOCore.R:
--------------------------------------------------------------------------------
  1 | #' Transcriptomic information of endothelial cells.
  2 | #' 
  3 | #' The data set contains the transcriptomic information of endothelial cells
  4 | #' from two steady state tissues (brain and heart). More detailed information
  5 | #' can be found in the paper by Nolan et al. 2013. The data was normalized and a
  6 | #' statistical analysis was performed to determine differentially expressed
  7 | #' genes. The DAVID functional annotation tool was used to perform a
  8 | #' gene-annotation enrichment analysis of the set of differentially expressed genes
  9 | #' (adjusted p-value < 0.05).
 10 | #' 
 11 | #' @docType data
 12 | #' @keywords datasets
 13 | #' @name EC
 14 | #' @usage data(EC)
 15 | #' @format A list containing 5 items 
 16 | #' @source \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067}
 17 | "EC"
 18 | 
 19 | #' 
 20 | #' @name circle_dat
 21 | #' @title Creates a plotting object.
 22 | #' @description The function takes the results from a functional analysis (for 
 23 | #'   example DAVID) and combines it with a list of selected genes and their 
 24 | #'   logFC. The resulting data frame can be used as an input for various plotting
 25 | #'   functions.
 26 | #' @param terms A data frame with columns for 'category', 'ID', 'term', adjusted
 27 | #'   p-value ('adj_pval') and 'genes'
 28 | #' @param genes A data frame with columns for 'ID', 'logFC'
 29 | #' @details Since most gene-annotation enrichment analyses are based on 
 30 | #'   the Gene Ontology database, the package was built with this structure in 
 31 | #'   mind, but is not restricted to it. The Gene Ontology is structured as a 
 32 | #'   directed acyclic graph and provides terms covering different areas. These terms 
 33 | #'   are grouped into three independent \code{categories}: BP (biological 
 34 | #'   process), CC (cellular component) and MF (molecular function).
 35 | #'   
 36 | #'   The "ID" and "term" columns of the \code{terms} data frame refer to the ID 
 37 | #'   and term description; the ID column is optional.
 38 | #'   
 39 | #'   The "ID" column of the \code{genes} data frame can contain any unique 
 40 | #'   identifier. Nevertheless, the identifier has to be the same as in "genes" 
 41 | #'   from \code{terms}.
 42 | #' @examples
 43 | #' \dontrun{
 44 | #' #Load the included dataset
 45 | #' data(EC)
 46 | #' 
 47 | #' #Building the circ object
 48 | #' circ<-circle_dat(EC$david, EC$genelist)
 49 | #' }
 50 | #' @export
 51 | 
 52 | circle_dat <- function(terms, genes){
 53 |   
 54 |   colnames(terms) <- tolower(colnames(terms))
 55 |   terms$genes <- toupper(terms$genes)
 56 |   genes$ID <- toupper(genes$ID)
 57 |   tgenes <- strsplit(as.vector(terms$genes), ', ')
 58 |   if (length(tgenes[[1]]) == 1) tgenes <- strsplit(as.vector(terms$genes), ',')
 59 |   count <- sapply(1:length(tgenes), function(x) length(tgenes[[x]]))
 60 |   logFC <- sapply(unlist(tgenes), function(x) genes$logFC[match(x, genes$ID)])
 61 |   if(is.factor(logFC)){
 62 |     logFC <- gsub(",", ".", gsub("\\.", "", logFC))  # European number format: drop thousands dots, comma becomes decimal point
 63 |     logFC <- as.numeric(logFC)
 64 |   }
 65 |   s <- 1; zsc <- c()
 66 |   for (c in 1:length(count)){
 67 |     value <- 0
 68 |     e <- s + count[c] - 1
 69 |     value <- sapply(logFC[s:e], function(x) ifelse(x > 0, 1, -1))
 70 |     zsc <- c(zsc, sum(value) / sqrt(count[c]))
 71 |     s <- e + 1
 72 |   }
 73 |   if (is.null(terms$id)){
 74 |     df <- data.frame(category = rep(as.character(terms$category), count), term = rep(as.character(terms$term), count),
 75 |                      count = rep(count, count), genes = as.character(unlist(tgenes)), logFC = logFC, adj_pval = rep(terms$adj_pval, count),
 76 |                      zscore = rep(zsc, count), stringsAsFactors = FALSE)
 77 |   }else{
 78 |     df <- data.frame(category = rep(as.character(terms$category), count), ID = rep(as.character(terms$id), count), term = rep(as.character(terms$term), count),
 79 |                      count = rep(count, count), genes = as.character(unlist(tgenes)), logFC = logFC, adj_pval = rep(terms$adj_pval, count),
 80 |                      zscore = rep(zsc, count), stringsAsFactors = FALSE)
 81 |   }
 82 |   return(df)
 83 | }
 84 | 
 85 | #' 
 86 | #' @name chord_dat
 87 | #' @title Creates a binary matrix.
 88 | #' @description The function creates a matrix which represents the binary 
 89 | #'   relation (1= is related to, 0= is not related to) between selected genes 
 90 | #'   (row) and processes (column). The resulting matrix can be visualized with 
 91 | #'   the \code{\link{GOChord}} function.
 92 | #' @param data A data frame with at least two columns: GO ID|term and genes. 
 93 | #'   Each row contains exactly one GO ID|term and one gene. A column containing
 94 | #'   logFC values is optional and might be used if \code{genes} is missing.
 95 | #' @param genes A character vector of selected genes OR data frame with columns
 96 | #'   for gene ID and logFC.
 97 | #' @details If more than one logFC value for each gene is at disposal, only one 
 98 | #'   should be used to create the binary matrix. The other values have to be 
 99 | #'   added manually later. 
100 | #' @param process A character vector of selected processes
101 | #' @return A binary matrix
102 | #' @seealso \code{\link{GOChord}}
103 | #' @examples
104 | #' \dontrun{
105 | #' # Load the included dataset
106 | #' data(EC)
107 | #' 
108 | #' # Building the circ object
109 | #' circ <- circle_dat(EC$david, EC$genelist)
110 | #' 
111 | #' # Building the binary matrix
112 | #' chord <- chord_dat(circ, EC$genes, EC$process)
113 | #' 
114 | #' }
115 | #' @export
116 | 
117 | chord_dat <- function(data, genes, process){
118 |   id <- term <- logFC <- logfc <- BPprocess <- NULL
119 | 
120 |   colnames(data) <- tolower(colnames(data))
121 |   if (missing(genes)){
122 |     # column names were lower-cased above, so the logFC column is now 'logfc'
123 |     if (is.null(data$logfc)){
124 |       genes <- as.character(unique(data$genes))
125 |     }else{
125 |       genes <- subset(data, !duplicated(genes), c(genes, logfc))
126 |     }
127 |   }else{
128 |     if(is.vector(genes)){
129 |       genes <- as.character(genes) 
130 |     }else{
131 |       if(!is.numeric(genes[, 2])) genes[, 2] <- as.numeric(as.character(genes[, 2]))
132 |       genes[, 1] <- as.character(genes[, 1])
133 |       colnames(genes) <- c('genes', 'logFC')
134 |     }
135 |   }
136 |   if (missing(process)){
137 |     process <- as.character(unique(data$term))
138 |   }else{
139 |     if(!is.character(process)) process <- as.character(process)
140 |   }
141 |   if (strsplit(process[1],':')[[1]][1] == 'GO'){
142 |     subData <- subset(data, id%in%process)
143 |     colnames(subData)[which(colnames(subData) == 'id')] <- 'BPprocess'
144 |   }else{
145 |     subData <- subset(data, term%in%process)
146 |     colnames(subData)[which(colnames(subData) == 'term')] <- 'BPprocess'
147 |   }
148 |   
149 |   if(is.vector(genes)){
150 |     M <- genes[genes%in%unique(subData$genes)]
151 |     mat <- matrix(0, ncol = length(process), nrow = length(M))
152 |     rownames(mat) <- M
153 |     colnames(mat) <- process
154 |     for (p in seq_along(process)){
155 |       sub2 <- subset(subData, BPprocess == process[p])
156 |       for (g in seq_along(M)) mat[g, p] <- ifelse(M[g]%in%sub2$genes, 1, 0)
157 |     }
158 |   }else{
159 |     genes <- subset(genes, genes %in% unique(subData$genes))
160 |     N <- length(process) + 1
161 |     M <- genes[,1] 
162 |     mat <- matrix(0, ncol = N, nrow = length(M))
163 |     rownames(mat) <- M
164 |     colnames(mat) <- c(process, 'logFC') 
165 |     mat[,N] <- genes[,2]
166 |     for (p in seq_along(process)){
167 |       sub2 <- subset(subData, BPprocess == process[p])
168 |       for (g in seq_along(M)) mat[g, p] <- ifelse(M[g]%in%sub2$genes, 1, 0)
169 |     }
170 |   }
171 |   return(mat)
172 | }
173 | 
174 | #' 
175 | #' @name reduce_overlap
176 | #' @title Eliminates redundant terms.
177 | #' @description The function eliminates all terms with a gene overlap >= set
178 | #'   threshold (\code{overlap}). The reduced dataset can be used to improve the
179 | #'   readability of plots such as \code{GOBubble} and \code{GOBar}.
180 | #' @param data A data frame created with \code{circle_dat}.
181 | #' @param overlap Scalar indicating the threshold for gene overlap (default = 0.75).
182 | #' @details The function is currently very slow.
183 | #' @examples
184 | #' \dontrun{
185 | #' # Load the included dataset
186 | #' data(EC)
187 | #' 
188 | #' # Building the circ object
189 | #' circ <- circle_dat(EC$david, EC$genelist)
190 | #' 
191 | #' # Eliminate redundant terms
192 | #' reduced_circ <- reduce_overlap(circ)
193 | #' 
194 | #' # Plot reduced data
195 | #' GOBubble(reduced_circ)
196 | #' 
197 | #' }
198 | #' @export
199 | 
200 | reduce_overlap <- function(data, overlap){
201 |   term <- genes <- NULL
202 |   if (missing(overlap)) overlap <- 0.75
203 |   terms <- unique(data$term)
204 |   FUN <- function(x,y) round(sum(x$genes %in% y$genes)/nrow(x), digits = 2)
205 |   tmp <- matrix(0, ncol = length(terms), nrow = length(terms), dimnames = list(terms, terms))
206 |   for (row in 1:nrow(tmp)){
207 |     for (col in 1:ncol(tmp)){
208 |       tmp[row, col] <- FUN(subset(data, term == terms[row], genes), subset(data, term == terms[col], genes))
209 |     }
210 |   }
211 |   tmp[base::upper.tri(tmp)] <- 0
212 |   for(col in 1:ncol(tmp)){
213 |     idx <- which(tmp[,col] >= overlap)
214 |     sel_col <- idx[which(idx != col)]
215 |     tmp[,sel_col] <- 0
216 |   }
217 |   sel_terms <- colnames(tmp)[colSums(tmp) != 0]
218 |   dat <- subset(data, term %in% sel_terms)
219 |   data <- dat[!duplicated(dat$term), ]
220 |   return(data)
221 | }
222 | 
223 | #' 
224 | #' @name GOBubble
225 | #' @title Bubble plot.
226 | #' @description The function creates a bubble plot of the input \code{data}. The
227 | #'   input \code{data} can be created with the help of the 
228 | #'   \code{\link{circle_dat}} function.
229 | #' @param data A data frame with columns for category, GO ID, term, adjusted
230 | #'   p-value, z-score, count (number of genes)
231 | #' @param display A character vector. Indicates whether it should be a single 
232 | #'   plot ('single') or a facet plot with panels for each category 
233 | #'   (default='single')
234 | #' @param title The title (on top) of the plot
235 | #' @param colour A character vector which defines the colour of the bubbles for 
236 | #'   each category
237 | #' @param labels Either a numeric threshold on the -log10(adjusted p-value) above which
238 | #'   terms are labeled (default=5), or a character vector of the IDs or terms to label
239 | #' @param ID If TRUE then labels are IDs else terms
240 | #' @param table.legend Defines whether a table of GO ID and GO term should be 
241 | #'   displayed on the right side of the plot or not (default = TRUE)
242 | #' @param table.col If TRUE then the table entries are coloured according to 
243 | #'   their category, if FALSE then entries are black
244 | #' @param bg.col Should only be used in case of a facet plot. If TRUE then the
245 | #'   panel backgrounds are coloured according to the displayed category
246 | #' @details The x-axis of the plot represents the z-score. The negative
247 | #'   logarithm of the adjusted p-value (corresponding to the significance of the
248 | #'   term) is displayed on the y-axis. The area of the plotted circles is
249 | #'   proportional to the number of genes assigned to the term. Each circle is
250 | #'   coloured according to its category and labeled with either the ID or the
251 | #'   term name.
252 | #' @import ggplot2
253 | #' @import gridExtra
254 | #' @import graphics
255 | #' @examples
256 | #' \dontrun{
257 | #' #Load the included dataset
258 | #' data(EC)
259 | #' 
260 | #' #Building the circ object
261 | #' circ <- circle_dat(EC$david, EC$genelist)
262 | #' 
263 | #' #Creating the bubble plot colouring the table entries according to the category
264 | #' GOBubble(circ, table.col = T)
265 | #' 
266 | #' #Creating the bubble plot displaying the term instead of the ID and without the table
267 | #' GOBubble(circ, ID = F, table.legend = F)
268 | #' 
269 | #' #Faceting the plot
270 | #' GOBubble(circ, display = 'multiple')
271 | #' }
272 | #' @export
273 | GOBubble <- function(data, display, title, colour, labels, ID = T, table.legend = T, table.col = T, bg.col = F){
274 |   zscore <- adj_pval <- category <- count <- id <- term <- NULL
275 |   if (missing(display)) display <- 'single'
276 |   if (missing(title)) title <- ''
277 |   if (missing(colour)) cols <- c("chartreuse4", "brown2", "cornflowerblue") else cols <- colour
278 |   if (missing(labels)) labels <- 5
279 |   if (bg.col && display == 'single') cat("Parameter bg.col will be ignored. To use the parameter change display to 'multiple'\n")
280 |   
281 |   colnames(data) <- tolower(colnames(data))
282 |   if(!'count'%in%colnames(data)){
283 |     rang <- c(5, 5)
284 |     data$count <- rep(1, dim(data)[1])
285 |   }else {rang <- c(1, 30)}
286 |   data$adj_pval <- -log(data$adj_pval, 10)
287 |   sub <- data[!duplicated(data$term), ]
288 |   g <- ggplot(sub, aes(zscore, adj_pval, fill = category, size = count))+
289 |     labs(title = title, x = 'z-score', y = '-log (adj p-value)')+
290 |     geom_point(shape = 21, col = 'black', alpha = 1 / 2)+
291 |     geom_hline(yintercept = 1.3, col = 'orange')+
292 |     scale_size(range = rang, guide = 'none')
293 |   if (!is.character(labels)) sub2 <- subset(sub, subset = sub$adj_pval >= labels) else sub2 <- subset(sub, sub$id%in%labels | sub$term%in%labels)
294 |   if (display == 'single'){
295 |     g <- g + scale_fill_manual('Category', values = cols, labels = c('Biological Process', 'Cellular Component', 'Molecular Function'))+
296 |       theme(legend.position = 'bottom')+
297 |       annotate ("text", x = min(sub$zscore)+0.2, y = 1.4, label = "Threshold", colour = "orange", size = 4)
298 |     if (ID) g <- g+ geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = id), size = 5) else g <- g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = term), size = 4)
299 |     if (table.legend){
300 |       if (table.col) table <- draw_table(sub2, col = cols) else table <- draw_table(sub2)
301 |       g <- g + theme(axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'), 
302 |                      axis.title = element_text(size = 14, face = 'bold'), panel.background = element_blank(), panel.grid.minor = element_blank(), 
303 |                      panel.grid.major = element_line(colour = 'grey80'), plot.background = element_blank()) 
304 |       graphics::par(mar = c(0.1, 0.1, 0.1, 0.1))
305 |       grid.arrange(g, table, ncol = 2)
306 |     }else{
307 |       g + theme(axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'), 
308 |                 axis.title = element_text(size = 14, face = 'bold'), panel.background = element_blank(), panel.grid.minor = element_blank(), 
309 |                 panel.grid.major = element_line(colour = 'grey80'), plot.background = element_blank())
310 |     }
311 |   }else{
312 |     if(bg.col){
313 |       dummy_col <- data.frame(category = c('BP', 'CC', 'MF'), adj_pval = sub$adj_pval[1:3], zscore = sub$zscore[1:3], size = 1:3, count = 1:3)
314 |       g <- g + geom_rect(data = dummy_col, aes(fill = category), xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0.1)+
315 |              facet_grid(.~category, space = 'free_x', scales = 'free_x')+ 
316 |              scale_fill_manual(values = cols, guide ='none')
317 |     }else{
318 |       g <- g + facet_grid(.~category, space = 'free_x', scales = 'free_x')+ 
319 |                scale_fill_manual(values = cols, guide ='none') 
320 |     }
321 |     if (ID) {
322 |       g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = id), size = 5) + 
323 |         theme(axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), 
324 |               axis.ticks = element_line(colour = 'grey80'), panel.border = element_rect(fill = 'transparent', colour = 'grey80'),
325 |               panel.background = element_blank(), panel.grid = element_blank(), plot.background = element_blank()) 
326 |     }else{
327 |       g + geom_text(data = sub2, aes(x = zscore, y = adj_pval, label = term), size = 5) + 
328 |         theme(axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), axis.line = element_line(colour = 'grey80'), 
329 |               axis.ticks = element_line(colour = 'grey80'), panel.border = element_rect(fill = 'transparent', colour = 'grey80'),
330 |               panel.background = element_blank(), panel.grid = element_blank(), plot.background = element_blank())
331 |     }
332 |   }
333 | }
334 | 
335 | #' 
336 | #' @name GOBar
337 | #' @title Z-score coloured barplot.
338 | #' @description Z-score coloured barplot of terms ordered either by
339 | #'   z-score or by the negative logarithm of the adjusted p-value
340 | #' @param data A data frame containing at least the term ID and/or term, the 
341 | #'   adjusted p-value and the z-score. A possible input can be generated with 
342 | #'   the \code{circle_dat} function
343 | #' @param display A character vector indicating whether a single plot ('single')
344 | #'   or a facet plot with panels for each category should be drawn 
345 | #'   (default='single')
346 | #' @param order.by.zscore Defines the order of the bars. If TRUE the bars are 
347 | #'   ordered according to the z-scores of the processes. Otherwise the bars are 
348 | #'   ordered by the negative logarithm of the adjusted p-value
349 | #' @param title The title of the plot
350 | #' @param zsc.col Character vector to define the colour scale for the z-score of
351 | #'   the form c(high, midpoint, low)
352 | #' @details If \code{display} is used to facet the plot the width of the panels 
353 | #'   will be proportional to the length of the x scale.
354 | #' @import ggplot2
355 | #' @import gridExtra
356 | #' @import stats
357 | #' @examples
358 | #' \dontrun{
359 | #' #Load the included dataset
360 | #' data(EC)
361 | #' 
362 | #' #Building the circ object
363 | #' circ <- circle_dat(EC$david, EC$genelist)
364 | #' 
365 | #' #Creating the bar plot
366 | #' GOBar(circ)
367 | #' 
368 | #' #Faceting the plot
369 | #' GOBar(circ, display='multiple')
370 | #' }
371 | #' @export
372 | 
373 | GOBar <- function(data, display, order.by.zscore = T, title, zsc.col){
374 |   id <- adj_pval <- zscore <- NULL
375 |   if (missing(display)) display <- 'single'
376 |   if (missing(title)) title <- ''
377 |   if (missing(zsc.col)) zsc.col <- c('firebrick1', 'white', 'dodgerblue1')
378 |   colnames(data) <- tolower(colnames(data))
379 |   data$adj_pval <- -log(data$adj_pval, 10)
380 |   sub <- data[!duplicated(data$term), ]
381 | 
382 |   if (order.by.zscore == T) {
383 |     sub <- sub[order(sub$zscore, decreasing = T), ]
384 |     leg <- theme(legend.position = 'bottom')
385 |     g <-  ggplot(sub, aes(x = factor(id, levels = stats::reorder(id, adj_pval)), y = adj_pval, fill = zscore)) +
386 |       geom_bar(stat = 'identity', colour = 'black') +
387 |       scale_fill_gradient2('z-score', space = 'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), 
388 |                            breaks = c(min(sub$zscore), max(sub$zscore)), labels = c('decreasing', 'increasing')) +
389 |       labs(title = title, x = '', y = '-log (adj p-value)') +
390 |       leg
391 |   }else{
392 |     sub <- sub[order(sub$adj_pval, decreasing = T), ]
393 |     leg <- theme(legend.justification = c(1, 1), legend.position = c(0.98, 0.995), legend.background = element_rect(fill = 'transparent'),
394 |                  legend.box = 'vertical', legend.direction = 'horizontal')
395 |     g <-  ggplot(sub, aes( x = factor(id, levels = reorder(id, adj_pval)), y = zscore, fill = adj_pval)) +
396 |       geom_bar(stat = 'identity', colour = 'black') +
397 |       scale_fill_gradient2('Significance', space = 'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), breaks = c(min(sub$adj_pval), max(sub$adj_pval)), labels = c('low', 'high')) +
398 |       labs(title = title, x = '', y = 'z-score') +
399 |       leg
400 |   }
401 |   if (display == 'single'){
402 |     g + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'),
403 |               axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), panel.background = element_blank(), 
404 |               panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank())        
405 |   }else{
406 |     g + facet_grid(.~category, space = 'free_x', scales = 'free_x')+
407 |         theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), axis.line = element_line(colour = 'grey80'), axis.ticks = element_line(colour = 'grey80'),
408 |             axis.title = element_text(size = 14, face = 'bold'), axis.text = element_text(size = 14), panel.background = element_blank(), 
409 |             panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank())
410 |   }
411 | }
412 | 
413 | #' 
414 | #' @name GOCircle
415 | #' @title Circular visualization of the results of a functional analysis.
416 | #' @description The circular plot combines gene expression and gene-annotation
417 | #'   enrichment data. A subset of terms is displayed like the \code{GOBar} plot
418 | #'   in combination with a scatterplot of the gene expression data. The whole
419 | #'   plot is drawn on a specific coordinate system to achieve the circular
420 | #'   layout. The segments are labeled with the term ID.
421 | #' @param data A special data frame which should be the result of 
422 | #'   \code{circle_dat}
423 | #' @param title The title of the plot
424 | #' @param nsub A numeric or character vector. If it's numeric then the number 
425 | #'   defines how many processes are displayed (starting from the first row of 
426 | #'   \code{data}). If it's a character string of processes then these processes 
427 | #'   are displayed
428 | #' @param rad1 The radius of the inner circle (default=2)
429 | #' @param rad2 The radius of the outer circle (default=3)
430 | #' @param table.legend Should a table be displayed or not? (default=TRUE)
431 | #' @param zsc.col Character vector to define the colour scale for the z-score of
432 | #'   the form c(high, midpoint, low)
433 | #' @param lfc.col A character vector specifying the colour for up- and 
434 | #'   down-regulated genes
435 | #' @param label.size Size of the segment labels (default=5)
436 | #' @param label.fontface Font style of the segment labels (default='bold')
437 | #' @details The outer circle shows a scatter plot for each term of the logFC of 
438 | #'   the assigned genes. The colours can be changed with the argument 
439 | #'   \code{lfc.col}.
440 | #'   
441 | #'   The \code{nsub} argument needs a bit more explanation to be used wisely. First of
442 | #'   all, it can be a numeric or a character vector. If it is a character vector
443 | #'   then it contains the IDs or term descriptions of the displayed processes. If
444 | #'   \code{nsub} is a numeric vector then the number defines how many terms are
445 | #'   displayed. It starts with the first row of the input data frame.
446 | #' @import ggplot2
447 | #' @import gridExtra
448 | #' @import stats
449 | #' @import graphics
450 | #' @seealso \code{\link{circle_dat}}, \code{\link{GOBar}}
451 | #' @examples
452 | #' \dontrun{
453 | #' # Load the included dataset
454 | #' data(EC)
455 | #' 
456 | #' # Building the circ object
457 | #' circ <- circle_dat(EC$david, EC$genelist)
458 | #' 
459 | #' # Creating the circular plot
460 | #' GOCircle(circ)
461 | #' 
462 | #' # Creating the circular plot with a different colour scale for the logFC
463 | #' GOCircle(circ, lfc.col = c('purple', 'orange'))
464 | #' 
465 | #' # Creating the circular plot with a different colour scale for the z-score
466 | #' GOCircle(circ, zsc.col = c('yellow', 'black', 'cyan'))
467 | #' 
468 | #' # Creating the circular plot with different font style
469 | #' GOCircle(circ, label.size = 5, label.fontface = 'italic')
470 | #' }
471 | #' @export
472 | 
473 | GOCircle <- function(data, title, nsub, rad1, rad2, table.legend = T, zsc.col, lfc.col, label.size, label.fontface){
474 |   xmax <- y1<- zscore <- y2 <- ID <- logx <- logy2 <- logy <- logFC <- NULL
475 |   if (missing(title)) title <- ''
476 |   if (missing(nsub)) if (dim(data)[1] > 10) nsub <- 10 else nsub <- dim(data)[1]
477 |   if (missing(rad1)) rad1 <- 2
478 |   if (missing(rad2)) rad2 <- 3
479 |   if (missing(zsc.col)) zsc.col <- c('red', 'white', 'blue')
480 |   if (missing(lfc.col)) lfc.col <- c('cornflowerblue', 'firebrick1') else lfc.col <- rev(lfc.col)
481 |   if (missing(label.size)) label.size <- 5
482 |   if (missing(label.fontface)) label.fontface <- 'bold'
483 |   
484 |   data$adj_pval <- -log(data$adj_pval, 10)
485 |   suby <- data[!duplicated(data$term), ]
486 |   if (is.numeric(nsub) == T){		
487 |     suby <- suby[1:nsub, ]
488 |   }else{
489 |     if (strsplit(nsub[1], ':')[[1]][1] == 'GO'){
490 |       suby <- suby[suby$ID%in%nsub, ]
491 |     }else{
492 |       suby <- suby[suby$term%in%nsub, ]
493 |     }
494 |     nsub <- length(nsub)}
495 |   N <- dim(suby)[1]
496 |   r_pval <- round(range(suby$adj_pval), 0) + c(-2, 2)
497 |   ymax <- c()
498 |   for (i in 1:length(suby$adj_pval)){
499 |     val <- (suby$adj_pval[i] - r_pval[1]) / (r_pval[2] - r_pval[1])
500 |     ymax <- c(ymax, val)}
501 |   df <- data.frame(x = seq(0, 10 - (10 / N), length = N), xmax = rep(10 / N - 0.2, N), y1 = rep(rad1, N), y2 = rep(rad2, N), ymax = ymax, zscore = suby$zscore, ID = suby$ID)
502 |   scount <- data[!duplicated(data$term), which(colnames(data) == 'count')][1:nsub]
503 |   idx_term <- which(!duplicated(data$term) == T)
504 |   xm <- c(); logs <- c()
505 |   for (sc in 1:length(scount)){
506 |     idx <- c(idx_term[sc], idx_term[sc] + scount[sc] -1)
507 |     val <- stats::runif(scount[sc], df$x[sc] + 0.06, (df$x[sc] + df$xmax[sc] - 0.06))
508 |     xm <- c(xm, val)
509 |     r_logFC <- round(range(data$logFC[idx[1]:idx[2]]), 0) + c(-1, 1)
510 |     for (lfc in idx[1]:idx[2]){
511 |       val <- (data$logFC[lfc] - r_logFC[1]) / (r_logFC[2] - r_logFC[1])
512 |       logs <- c(logs, val)}
513 |   }
514 |   cols <- c()
515 |   for (ys in 1:length(logs)) cols <- c(cols, ifelse(data$logFC[ys] > 0, 'upregulated', 'downregulated'))
516 |   dfp <- data.frame(logx = xm, logy = logs, logFC = factor(cols), logy2 = rep(rad2, length(logs)))
517 |   c <-	ggplot()+
518 |     geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y1, ymax = y1 + ymax, fill = zscore), colour = 'black') +
519 |     geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2, ymax = y2 + 1), fill = 'gray70') +
520 |     geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.5, ymax = y2 + 0.5), colour = 'white') +
521 |     geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.25, ymax = y2 + 0.25), colour = 'white') +
522 |     geom_rect(data = df, aes(xmin = x, xmax = x + xmax, ymin = y2 + 0.75, ymax = y2 + 0.75), colour = 'white') +
523 |     geom_text(data = df, aes(x = x + (xmax / 2), y = y2 + 1.3, label = ID, angle = 360 - (x = x + (xmax / 2)) / (10 / 360)), size = label.size, fontface = label.fontface) +
524 |     coord_polar() +
525 |     labs(title = title) +
526 |     ylim(1, rad2 + 1.6) +
527 |     xlim(0, 10) +
528 |     theme_blank +
529 |     scale_fill_gradient2('z-score', space = 'Lab', low = zsc.col[3], mid = zsc.col[2], high = zsc.col[1], guide = guide_colourbar(title.position = "top", title.hjust = 0.5), breaks = c(min(df$zscore), max(df$zscore)),labels = c('decreasing', 'increasing')) +
530 |     theme(legend.position = 'bottom', legend.background = element_rect(fill = 'transparent'), legend.box = 'horizontal', legend.direction = 'horizontal') +	
531 |     geom_point(data = dfp, aes(x = logx, y = logy2 + logy), pch = 21, fill = 'transparent', colour = 'black', size = 3)+
532 |     geom_point(data = dfp, aes(x = logx, y = logy2 + logy, colour = logFC), size = 2.5)+
533 |     scale_colour_manual(values = lfc.col, guide = guide_legend(title.position = "top", title.hjust = 0.5))		
534 |   
535 |   if (table.legend){
536 |     table <- draw_table(suby)
537 |     graphics::par(mar = c(0.1, 0.1, 0.1, 0.1))
538 |     grid.arrange(c, table, ncol = 2)
539 |   }else{
540 |     c + theme(plot.background = element_rect(fill = 'aliceblue'), panel.background = element_rect(fill = 'white'))
541 |   }
542 | }	
543 | 


--------------------------------------------------------------------------------
/R/GOHeat.R:
--------------------------------------------------------------------------------
 1 | #' 
 2 | #' @name GOHeat
 3 | #' @title Displays heatmap of the relationship between genes and terms.
 4 | #' @description The GOHeat function generates a heatmap of the relationship 
 5 | #'   between genes and terms. Biological processes are displayed in rows and
 6 | #'   genes in columns. In addition genes are clustered to highlight groups of
 7 | #'   genes with similar annotated functions. The input can be generated with the
 8 | #'   \code{\link{chord_dat}} function.
 9 | #' @param data The matrix represents the binary relation (1= is related to, 0= 
10 | #'   is not related to) between a set of genes (rows) and processes (columns)
11 | #' @param nlfc Defines the number of logFC columns (default = 0)
12 | #' @param fill.col A character vector of three colors defining the fill gradient (default = c('firebrick', 'white', 'dodgerblue'))
13 | #' @details The heatmap has two modes, which depend on the \code{nlfc}
14 | #'   argument. If \code{nlfc = 0}, meaning no logFC values are available, the
15 | #'   coloring encodes the overall number of processes the respective gene is
16 | #'   assigned to. In case of \code{nlfc = 1} the color corresponds to the logFC
17 | #'   of the gene.
18 | #' @import ggplot2
19 | #' @examples
20 | #' \dontrun{
21 | #' # Load the included dataset
22 | #' data(EC)
23 | #' 
24 | #' # Generate the circ object
25 | #' circ <- circle_dat(EC$david, EC$genelist)
26 | #' 
27 | #' # Generate the chord object
28 | #' chord <- chord_dat(circ, EC$genes, EC$process)
29 | #' 
30 | #' # Create the plot with user-defined colors
31 | #' GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))
32 | #' }
33 | #' @export
34 | #' 
35 | 
36 | GOHeat <- function(data, nlfc, fill.col){
37 |   x <- y <- z <- NULL
38 |   if(missing(nlfc)) nlfc <- 0
39 |   if(missing(fill.col)) fill.col <- c('firebrick', 'white', 'dodgerblue')
40 |   
41 |   distance <- dist(data)
42 |   cluster <- hclust(distance)
43 |   M <- dim(data)[2]
44 |   nterm <- M - nlfc
45 |   if(nlfc == 0){
46 |     s <- rowSums(data[,1:nterm])
47 |     tmp <- NULL
48 |     for(r in 1:nrow(data)){
49 |       tmp <- c(tmp, as.numeric(gsub(1, s[r], data[r, 1:nterm])))
50 |     }
51 |   }else{
52 |     tmp <- NULL
53 |     for(r in 1:nrow(data)){
54 |       tmp <- c(tmp, as.numeric(gsub(1, data[r, (nterm + 1)], data[r, 1:nterm])))
55 |     }
56 |   }
57 |   df <- data.frame(x = rep(cluster$order, each = nterm), y = rep(colnames(data[,1:nterm]), length(rownames(data))), z = tmp, 
58 |                    lab = rep(rownames(data), each = nterm))
59 |   df_o <- df[order(df$x),]
60 |   
61 |   g <- ggplot() + 
62 |         geom_tile(data = df_o, aes(x = x, y = y, fill = z))+
63 |         scale_x_discrete(breaks = 1:length(unique(df_o$x)), labels = unique(df_o$lab)) +
64 |         theme(axis.text.x = element_text(angle = 90, vjust = 0.5), axis.title.x=element_blank(), axis.title.y=element_blank(),
65 |               axis.text.y = element_text(size = 14), panel.background=element_blank(), panel.grid.major=element_blank(),
66 |               panel.grid.minor=element_blank())
67 |   if(nlfc == 0){
68 |     g + scale_fill_gradient2('Count', space = 'Lab', low=fill.col[2], mid=fill.col[3], high=fill.col[1])
69 |   }else{
70 |     g + scale_fill_gradient2('logFC', space = 'Lab', low=fill.col[3], mid=fill.col[2], high=fill.col[1])
71 |   }
72 | }
73 | 
74 | 
75 | 
76 | 


--------------------------------------------------------------------------------
/R/GOVenn.R:
--------------------------------------------------------------------------------
  1 | #' 
  2 | #' @name GOVenn
  3 | #' @title Venn diagram of differentially expressed genes.
  4 | #' @description The function compares lists of differentially expressed genes
  5 | #'   and illustrates possible relations. Additionally it represents the variety
  6 | #'   of gene expression patterns within the intersection in small pie charts
  7 | #'   with three segments. Clockwise are shown the number of commonly
  8 | #'   up-regulated, commonly down-regulated and contra-regulated genes.
  9 | #' @param data1 A data frame consisting of two columns: ID, logFC
 10 | #' @param data2 A data frame consisting of two columns: ID, logFC
 11 | #' @param data3 A data frame consisting of two columns: ID, logFC
 12 | #' @param title The title of the plot
 13 | #' @param label A character vector to define the legend keys
 14 | #' @param lfc.col A character vector determining the background colors of the
 15 | #'   pie segments representing up- and down-regulated genes
 16 | #' @param circle.col A character vector to assign clockwise colors for the 
 17 | #'   circles
 18 | #' @param plot If TRUE only the venn diagram is plotted. Otherwise the function 
 19 | #'   returns a list with two items: the actual plot and a list containing the 
 20 | #'   overlap entries (default= TRUE)
 21 | #' @details The \code{plot} argument can be used to adjust the amount of 
 22 | #'   information that is returned by calling the function. If you are only 
 23 | #'   interested in the actual plot of the venn diagram, \code{plot} should be 
 24 | #'   set to TRUE. Sometimes you also want to know the elements of the 
 25 | #'   intersections. In this case \code{plot} should be set to FALSE and the 
 26 | #'   function call will return a list of two items. The first item, that can be 
 27 | #'   accessed by $plot, contains the plotting information. Additionally, a list
 28 | #'   ($table) will be returned containing the elements of the various overlaps. 
 29 | #' @import ggplot2
 30 | #' @examples
 31 | #' \dontrun{
 32 | #' #Load the included dataset
 33 | #' data(EC)
 34 | #' 
 35 | #' #Generating the circ object
 36 | #' circ <- circle_dat(EC$david, EC$genelist)
 37 | #' 
 38 | #' #Selecting terms of interest
 39 | #' l1<-subset(circ,term=='heart development',c(genes,logFC))
 40 | #' l2<-subset(circ,term=='plasma membrane',c(genes,logFC))
 41 | #' l3<-subset(circ,term=='tissue morphogenesis',c(genes,logFC))
 42 | #' 
 43 | #' GOVenn(l1,l2,l3, label=c('heart development','plasma membrane','tissue morphogenesis'))
 44 | #' }
 45 | #' @export
 46 | 
 47 | GOVenn<-function(data1, data2, data3, title, label, lfc.col, circle.col, plot=T){
 48 |   id <- NULL
 49 |   if (missing(label)) label<-c('List1','List2','List3')
 50 |   if (missing(lfc.col)) lfc.col<-c('firebrick1','gold','cornflowerblue')
 51 |   if (missing(circle.col)) circle.col<-c('brown1','chartreuse3','cornflowerblue')
 52 |   if (missing(title)) title<-''
 53 |   if (!missing(data3)) {
 54 |     three<-T
 55 |     overlap<-get_overlap(data1,data2,data3)
 56 |     venn_df<-overlap$venn_df
 57 |     table<-overlap$table
 58 |   }else{
 59 |     three<-F
 60 |     overlap<-get_overlap2(data1,data2)
 61 |     venn_df<-overlap$venn_df
 62 |     table<-overlap$table
 63 |   }
 64 |   
 65 | 	### calc Venn ###
 66 |   if (three){
 67 |     center<-data.frame(x=c(0.4311,0.4308,0.6380),y=c(0.6197,0.3801,0.5001),diameter=c(0.4483,0.4483,0.4483))
 68 |     outerCircle<-data.frame(x=numeric(),y=numeric(),id=numeric())
 69 | 	  for (var in 1:3){
 70 | 		  dat <- circleFun(c(center$x[var],center$y[var]),center$diameter[var],npoints = 100)
 71 | 		  outerCircle<-rbind(outerCircle,dat)
 72 | 	  }
 73 | 	  outerCircle$id<-rep(c(label[1],label[2],label[3]),each=100)
 74 | 	  outerCircle$id<-factor(outerCircle$id, levels=c(label[1],label[2],label[3]))
 75 |   }else{
 76 |     center<-data.frame(x=c(0.33,0.6699),y=c(0.5,0.5),diameter=c(0.6180,0.6180))
 77 |     outerCircle<-data.frame(x=numeric(),y=numeric(),id=numeric())
 78 |     for (var in 1:2){
 79 |       dat <- circleFun(c(center$x[var],center$y[var]),center$diameter[var],npoints = 100)
 80 |       outerCircle<-rbind(outerCircle,dat)
 81 |     }
 82 |     outerCircle$id<-rep(c(label[1],label[2]),each=100)
 83 |     outerCircle$id<-factor(outerCircle$id, levels=c(label[1],label[2]))
 84 |   }
 85 | 
 86 | 	### calc single pies ### 
 87 |   if (three){
 88 |     Pie<-data.frame(x=numeric(),y=numeric(),id=numeric())
 89 | 	  dat <- circleFun(c(center$x[1],max(subset(outerCircle,id==label[1])$y)-0.05),0.1,npoints = 100)
 90 | 	  Pie<-rbind(Pie,dat)
 91 | 	  dat <- circleFun(c(center$x[2],min(subset(outerCircle,id==label[2])$y)+0.05),0.1,npoints = 100)
 92 | 	  Pie<-rbind(Pie,dat)
 93 | 	  dat <- circleFun(c(max(subset(outerCircle,id==label[3])$x)-0.05,center$y[3]),0.1,npoints = 100)
 94 | 	  Pie<-rbind(Pie,dat)
 95 | 	  Pie$id<-rep(1:3,each=100)
 96 | 	  UP<-Pie[c(1:50,100:150,200:250),]
 97 | 	  Down<-Pie[c(50:100,150:200,250:300),]
 98 |   }else{
 99 |     Pie<-data.frame(x=numeric(),y=numeric(),id=numeric())
100 |     dat <- circleFun(c(min(subset(outerCircle,id==label[1])$x)+0.05,center$y[1]),0.1,npoints = 100)
101 |     Pie<-rbind(Pie,dat)
102 |     dat <- circleFun(c(max(subset(outerCircle,id==label[2])$x)-0.05,center$y[2]),0.1,npoints = 100)
103 |     Pie<-rbind(Pie,dat)
104 |     Pie$id<-rep(1:2,each=100)
105 |     UP<-Pie[c(1:50,100:150),]
106 |     Down<-Pie[c(50:100,150:200),]
107 |   }
108 | 
109 | 	### calc single pie text ###
110 | 	if (three){
111 |     x<-c();y<-c()
112 | 	  for (i in unique(Pie$id)){
113 | 		  x<-c(x,rep((min(subset(Pie,id==i)$x)+max(subset(Pie,id==i)$x))/2,2))
114 | 		  y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2+0.02)
115 | 		  y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2-0.02)
116 | 	  }
117 | 	  pieText<-data.frame(x=x,y=y,label=c(venn_df$UP[1],venn_df$DOWN[1],venn_df$UP[2],venn_df$DOWN[2],venn_df$UP[3],venn_df$DOWN[3]))
118 |   }else{
119 |     x<-c();y<-c()
120 |     for (i in unique(Pie$id)){
121 |       x<-c(x,rep((min(subset(Pie,id==i)$x)+max(subset(Pie,id==i)$x))/2,2))
122 |       y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2+0.02)
123 |       y<-c(y,(min(subset(Pie,id==i)$y)+max(subset(Pie,id==i)$y))/2-0.02)
124 |     }
125 |     pieText<-data.frame(x=x,y=y,label=c(venn_df$UP[1],venn_df$DOWN[1],venn_df$UP[2],venn_df$DOWN[2]))
126 |   }
127 |   
128 | 	### calc overlap pies ### 
129 |   if (three){
130 |     smc<-data.frame(x=c(0.6,0.59,0.31,0.5),y=c(0.66,0.34,0.5,0.5))
131 |     PieOv<-data.frame(x=numeric(),y=numeric())
132 | 	  PieOv<-rbind(PieOv,circleFun(c(smc$x[1],smc$y[1]),0.06,npoints = 100))
133 | 	  PieOv<-rbind(PieOv,circleFun(c(smc$x[2],smc$y[2]),0.06,npoints = 100))
134 | 	  PieOv<-rbind(PieOv,circleFun(c(smc$x[3],smc$y[3]),0.06,npoints = 100))
135 | 	  PieOv<-rbind(PieOv,circleFun(c(smc$x[4],smc$y[4]),0.06,npoints = 100))
136 | 	  PieOv$id<-rep(1:4,each=100)
137 | 	  smc$id<-1:4
138 | 	  UPOv<-rbind(smc[1,],PieOv[1:33,],smc[1,],smc[2,],PieOv[100:133,],smc[2,],smc[3,],PieOv[200:233,],smc[3,],smc[4,],PieOv[300:333,],smc[4,])
139 | 	  Change<-rbind(smc[1,],PieOv[33:66,],smc[1,],smc[2,],PieOv[133:166,],smc[2,],smc[3,],PieOv[233:266,],smc[3,],smc[4,],PieOv[333:366,],smc[4,])
140 | 	  DownOv<-rbind(smc[1,],PieOv[66:100,],smc[1,],smc[2,],PieOv[166:200,],smc[2,],smc[3,],PieOv[266:300,],smc[3,],smc[4,],PieOv[366:400,],smc[4,])
141 |   }else{
142 |     PieOv<-data.frame(x=numeric(),y=numeric(),id=numeric())
143 |     PieOv<-rbind(PieOv,circleFun(c(0.5,0.5),0.08,npoints = 100))
144 |     PieOv$id<-rep(1,100)
145 |     center<-data.frame(x=0.5, y=0.5, id=1)
146 |     UPOv<-rbind(center[1,],PieOv[1:33,])
147 |     Change<-rbind(center[1,],PieOv[33:66,])
148 |     DownOv<-rbind(center[1,],PieOv[66:100,])
149 |   }
150 |   
151 |   ### calc overlap pie text ###
152 |   if (three){
153 |     x<-c();y<-c()
154 | 	  for (i in unique(PieOv$id)){
155 | 		  x<-c(x,subset(UPOv,id==i)$x[1]+0.0115,subset(DownOv,id==i)$x[1]-0.018,subset(Change,id==i)$x[1]+0.01)
156 | 		  y<-c(y,subset(UPOv,id==i)$y[1]+0.01,subset(DownOv,id==i)$y[1],subset(Change,id==i)$y[1]-0.013)
157 | 	  }
158 | 	  small.pieT<-data.frame(x=x,y=y,label=c(venn_df$UP[5],venn_df$Change[5],venn_df$DOWN[5],venn_df$UP[6],venn_df$Change[6],venn_df$DOWN[6],venn_df$UP[4],venn_df$Change[4],venn_df$DOWN[4],venn_df$UP[7],venn_df$Change[7],venn_df$DOWN[7]))
159 |   }else{
160 |     x<-c(subset(UPOv,id==1)$x[1]+0.015,subset(DownOv,id==1)$x[1]-0.018,subset(Change,id==1)$x[1]+0.01)
161 |     y<-c(subset(UPOv,id==1)$y[1]+0.015,subset(DownOv,id==1)$y[1],subset(Change,id==1)$y[1]-0.013)
162 |     small.pieT<-data.frame(x=x,y=y,label=c(venn_df$UP[3],venn_df$Change[3],venn_df$DOWN[3]))
163 |   }
164 |   
165 | 	g<- ggplot()+
166 |     geom_polygon(data=outerCircle, aes(x,y, group=id, fill=id) ,alpha=0.5,color='black')+
167 | 	  scale_fill_manual(values=circle.col)+
168 |     guides(fill=guide_legend(title=''))+
169 | 	  geom_polygon(data=UP, aes(x,y,group=id),fill=lfc.col[1],color='white')+
170 | 	  geom_polygon(data=Down, aes(x,y,group=id),fill=lfc.col[3],color='white')+
171 | 	  geom_text(data=pieText, aes(x=x,y=y,label=label),size=5)+
172 |     geom_polygon(data=UPOv, aes(x,y,group=id),fill=lfc.col[1],color='white')+
173 | 	  geom_polygon(data=DownOv, aes(x,y,group=id),fill=lfc.col[3],color='white')+
174 | 	  geom_polygon(data=Change, aes(x,y,group=id),fill=lfc.col[2],color='white')+
175 | 	  geom_text(data=small.pieT,aes(x=x,y=y,label=label),size=4)+
176 |     theme_blank+
177 |     labs(title=title)
178 |   
179 |   if (plot) return(g) else return(list(plot=g,table=table))
180 | }
181 | 
182 | 
183 | 	
184 | 


--------------------------------------------------------------------------------
/R/Helper.R:
--------------------------------------------------------------------------------
  1 | ##############
  2 | # In general #
  3 | ##############
  4 | 
  5 | # Theme blank
  6 | theme_blank <- theme(axis.line = element_blank(), axis.text.x = element_blank(),
  7 |                      axis.text.y = element_blank(), axis.ticks = element_blank(), axis.title.x = element_blank(),
  8 |                      axis.title.y = element_blank(), panel.background = element_blank(), panel.border = element_blank(),
  9 |                      panel.grid.major = element_blank(), panel.grid.minor = element_blank(), plot.background = element_blank())
 10 | 
 11 | # Draw adjacent table for GOBubble and GOCircle
 12 | draw_table <- function(data, col){
 13 |   id <- term <- NULL
 14 |   colnames(data) <- tolower(colnames(data))
 15 |   if (missing(col)){
 16 |     tt1 <- ttheme_default()
 17 |   }else{
 18 |     text.col <- c(rep(col[1], sum(data$category == 'BP')), rep(col[2], sum(data$category == 'CC')), rep(col[3], sum(data$category == 'MF')))
 19 |     tt1 <- ttheme_minimal(
 20 |       core = list(bg_params = list(fill = text.col, col=NA, alpha= 1/3)), 
 21 |       colhead = list(fg_params = list(col = "black")))
 22 |   }
 23 |   table <- tableGrob(subset(data, select = c(id, term)), cols = c('ID', 'Description'), rows = NULL, theme = tt1)
 24 |   return(table)
 25 | }
 26 | 
 27 | ###########
 28 | # GOChord #
 29 | ###########
 30 | 
 31 | # Bezier function for drawing ribbons
 32 | bezier <- function(data, process.col){
 33 |   x <- c()
 34 |   y <- c()
 35 |   Id <- c()
 36 |   sequ <- seq(0, 1, by = 0.01)
 37 |   N <- dim(data)[1]
 38 |   sN <- seq(1, N, by = 2)
 39 |   if (process.col[1] == '') col_rain <- grDevices::rainbow(N) else col_rain <- process.col
 40 |   for (n in sN){
 41 |     xval <- c(); xval2 <- c(); yval <- c(); yval2 <- c()
 42 |     for (t in sequ){
 43 |       xva <- (1 - t) * (1 - t) * data$x.start[n] + t * t * data$x.end[n]
 44 |       xval <- c(xval, xva)
 45 |       xva2 <- (1 - t) * (1 - t) * data$x.start[n + 1] + t * t * data$x.end[n + 1]
 46 |       xval2 <- c(xval2, xva2)
 47 |       yva <- (1 - t) * (1 - t) * data$y.start[n] + t * t * data$y.end[n]  
 48 |       yval <- c(yval, yva)
 49 |       yva2 <- (1 - t) * (1 - t) * data$y.start[n + 1] + t * t * data$y.end[n + 1]
 50 |       yval2 <- c(yval2, yva2)			
 51 |     }
 52 |     x <- c(x, xval, rev(xval2))
 53 |     y <- c(y, yval, rev(yval2))
 54 |     Id <- c(Id, rep(n, 2 * length(sequ)))
 55 |   }
 56 |   df <- data.frame(lx = x, ly = y, ID = Id)
 57 |   return(df)
 58 | }
 59 | 
 60 | # Check function for GOChord argument 'limit'
 61 | check_chord <- function(mat, limit){
 62 |   
 63 |   if(all(colSums(mat) >= limit[2]) & all(rowSums(mat) >= limit[1])) return(mat)
 64 |   
 65 |   tmp <- mat[(rowSums(mat) >= limit[1]),]
 66 |   mat <- tmp[,(colSums(tmp) >= limit[2])]
 67 |   
 68 |   mat <- check_chord(mat, limit)
 69 |   return(mat)
 70 | }
 71 | 
 72 | ##########
 73 | # GOVenn #
 74 | ##########
 75 | 
 76 | # Calculate points to draw a circle
 77 | circleFun <- function(center = c(0,0),diameter = 1, npoints = 100){
 78 |   r = diameter / 2
 79 |   tt <- seq(0,2*pi,length.out = npoints)
 80 |   xx <- center[1] + r * cos(tt)
 81 |   yy <- center[2] + r * sin(tt)
 82 |   return(data.frame(x = xx, y = yy))
 83 | }
 84 | 
 85 | # Calculate overlap for three lists
 86 | get_overlap<-function(A,B,C){
 87 |   colnames(A)<-c('ID','logFC')
 88 |   colnames(B)<-c('ID','logFC')
 89 |   colnames(C)<-c('ID','logFC')
 90 |   UP<-NULL;DOWN<-NULL;Change<-NULL
 91 |   if (class(A$logFC)!='numeric'){
 92 |     A$logFC<-gsub(",", ".", gsub("\\.", "", A$logFC))
 93 |     A$Trend<-sapply(as.numeric(A$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 
 94 |   }else{ A$Trend<-sapply(A$logFC, function(x) ifelse(x > 0,'UP','DOWN'))}
 95 |   if (class(B$logFC)!='numeric'){
 96 |     B$logFC<-gsub(",", ".", gsub("\\.", "", B$logFC))
 97 |     B$Trend<-sapply(as.numeric(B$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 
 98 |   }else{ B$Trend<-sapply(B$logFC, function(x) ifelse(x > 0,'UP','DOWN'))}
 99 |   if (class(C$logFC)!='numeric'){
100 |     C$logFC<-gsub(",", ".", gsub("\\.", "", C$logFC))
101 |     C$Trend<-sapply(as.numeric(C$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 
102 |   }else{ C$Trend<-sapply(C$logFC, function(x) ifelse(x > 0,'UP','DOWN'))}
103 |   if (sum(((A$ID%in%B$ID)==T)==T)==0){
104 |     AB<-data.frame() 
105 |   }else{
106 |     AB<-A[(A$ID%in%B$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))]
107 |     BA<-B[(B$ID%in%A$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))]
108 |     AB<-merge(AB,BA,by="ID")
109 |     rownames(AB)<-AB$ID
110 |     AB<-AB[,-1]   
111 |   }
112 |   if (sum(((A$ID%in%C$ID)==T)==T)==0){
113 |     AC<-data.frame() 
114 |   }else{
115 |     AC<-A[(A$ID%in%C$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))]
116 |     CA<-C[(C$ID%in%A$ID)==T,which(colnames(C)%in%c('ID','logFC','Trend'))]
117 |     AC<-merge(AC,CA,by="ID")
118 |     rownames(AC)<-AC$ID
119 |     AC<-AC[,-1]
120 |   }
121 |   if (sum(((B$ID%in%C$ID)==T)==T)==0){
122 |     BC<-data.frame() 
123 |   }else{
124 |     BC<-B[(B$ID%in%C$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))]
125 |     CB<-C[(C$ID%in%B$ID)==T,which(colnames(C)%in%c('ID','logFC','Trend'))]
126 |     BC<-merge(BC,CB,by="ID")
127 |     rownames(BC)<-BC$ID
128 |     BC<-BC[,-1]
129 |   }
130 |   if (sum(((A$ID%in%B$ID)==T & (A$ID%in%C$ID)==T))==0){
131 |     ABC<-data.frame() 
132 |   }else{
133 |     ABC<-A[((A$ID%in%B$ID)==T & (A$ID%in%C$ID)==T),which(colnames(A)%in%c('ID','logFC','Trend'))]
134 |     BAC<-B[((B$ID%in%A$ID)==T & (B$ID%in%C$ID)==T),which(colnames(B)%in%c('ID','logFC','Trend'))]
135 |     CAB<-C[((C$ID%in%A$ID)==T & (C$ID%in%B$ID)==T),which(colnames(C)%in%c('ID','logFC','Trend'))]
136 |     ABC<-merge(ABC,BAC,by='ID')
137 |     ABC<-merge(ABC,CAB,by='ID')
138 |     rownames(ABC)<-ABC$ID
139 |     ABC<-ABC[,-1]
140 |   }
141 |   A_only<-A[((A$ID%in%B$ID)==F & (A$ID%in%C$ID)==F),which(colnames(A)%in%c('ID','logFC','Trend'))]
142 |   rownames(A_only)<-A_only$ID
143 |   A_only<-A_only[,-1]
144 |   B_only<-B[((B$ID%in%A$ID)==F & (B$ID%in%C$ID)==F),which(colnames(B)%in%c('ID','logFC','Trend'))]
145 |   rownames(B_only)<-B_only$ID
146 |   B_only<-B_only[,-1]
147 |   C_only<-C[((C$ID%in%A$ID)==F & (C$ID%in%B$ID)==F),which(colnames(C)%in%c('ID','logFC','Trend'))]
148 |   rownames(C_only)<-C_only$ID
149 |   C_only<-C_only[,-1]
150 |   UP<-c(UP,sum(A_only$Trend=='UP'));DOWN<-c(DOWN,sum(A_only$Trend=='DOWN'));Change<-c(Change,sum(A_only$Trend=='Change'))
151 |   UP<-c(UP,sum(B_only$Trend=='UP'));DOWN<-c(DOWN,sum(B_only$Trend=='DOWN'));Change<-c(Change,sum(B_only$Trend=='Change'))
152 |   UP<-c(UP,sum(C_only$Trend=='UP'));DOWN<-c(DOWN,sum(C_only$Trend=='DOWN'));Change<-c(Change,sum(C_only$Trend=='Change'))
153 |   if (dim(AB)[1]==0){
154 |     OvAB<-data.frame()
155 |     UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0)
156 |   }else{
157 |     tmp<-NULL
158 |     for (t in 1:dim(AB)[1]) tmp<-c(tmp,ifelse(AB$Trend.x[t]==AB$Trend.y[t],AB$Trend.x[t],'Change'))
159 |     OvAB<-data.frame(logFC_A=AB$logFC.x,logFC_B=AB$logFC.y,Trend=tmp)
160 |     rownames(OvAB)<-rownames(AB)
161 |     AB<-OvAB[order(OvAB$Trend),]
162 |     UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change'))
163 |   }
164 |   if (dim(AC)[1]==0){
165 |     OvAC<-data.frame()
166 |     UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0)
167 |   }else{
168 |     tmp<-NULL
169 |     for (t in 1:dim(AC)[1]) tmp<-c(tmp,ifelse(AC$Trend.x[t]==AC$Trend.y[t],AC$Trend.x[t],'Change'))
170 |     OvAC<-data.frame(logFC_A=AC$logFC.x,logFC_C=AC$logFC.y,Trend=tmp)
171 |     rownames(OvAC)<-rownames(AC)
172 |     AC<-OvAC[order(OvAC$Trend),]
173 |     UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change'))
174 |   }
175 |   if (dim(BC)[1]==0){
176 |     OvBC<-data.frame()
177 |     UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0)
178 |   }else{
179 |     tmp<-NULL
180 |     for (t in 1:dim(BC)[1]) tmp<-c(tmp,ifelse(BC$Trend.x[t]==BC$Trend.y[t],BC$Trend.x[t],'Change'))
181 |     OvBC<-data.frame(logFC_B=BC$logFC.x,logFC_C=BC$logFC.y,Trend=tmp)
182 |     rownames(OvBC)<-rownames(BC)
183 |     BC<-OvBC[order(OvBC$Trend),]
184 |     UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change'))
185 |   }
186 |   if (dim(ABC)[1]==0){
187 |     OvABC<-data.frame()
188 |     UP<-c(UP,0);DOWN<-c(DOWN,0);Change<-c(Change,0)
189 |   }else{
190 |     tmp<-NULL
191 |     for (t in 1:dim(ABC)[1]) tmp<-c(tmp,ifelse(((ABC$Trend.x[t]==ABC$Trend.y[t]) & (ABC$Trend.x[t]==ABC$Trend[t])),ABC$Trend.x[t],'Change'))
192 |     OvABC<-data.frame(logFC_A=ABC$logFC.x,logFC_B=ABC$logFC.y,logFC_C=ABC$logFC,Trend=tmp)
193 |     rownames(OvABC)<-rownames(ABC)
194 |     ABC<-OvABC[order(OvABC$Trend),]
195 |     UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change'))
196 |   }
197 |   counts<-data.frame(Contrast=c('A_only','B_only','C_only','AB','AC','BC','ABC'),Count=c(dim(A_only)[1],dim(B_only)[1],dim(C_only)[1],dim(AB)[1],dim(AC)[1],dim(BC)[1],dim(ABC)[1]),UP=UP,DOWN=DOWN,Change=Change)
198 |   venn<-list(A_only=A_only,B_only=B_only,C_only=C_only,AB=AB,BC=BC,AC=AC,ABC=ABC)
199 |   return(list(venn_df=counts,table=venn))
200 | }
201 | 
202 | # Calculate overlap for two lists
203 | get_overlap2<-function(A,B){
204 |   colnames(A)<-c('ID','logFC')
205 |   colnames(B)<-c('ID','logFC')
206 |   UP<-NULL;DOWN<-NULL;Change<-NULL
207 |   if (class(A$logFC)!='numeric'){
208 |     A$logFC<-gsub(",", ".", gsub("\\.", "", A$logFC))
209 |     A$Trend<-sapply(as.numeric(A$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 
210 |   }else{ A$Trend<-sapply(A$logFC, function(x) ifelse(x > 0,'UP','DOWN'))}
211 |   if (class(B$logFC)!='numeric'){
212 |     B$logFC<-gsub(",", ".", gsub("\\.", "", B$logFC))
213 |     B$Trend<-sapply(as.numeric(B$logFC), function(x) ifelse(x > 0,'UP','DOWN')) 
214 |   }else{ B$Trend<-sapply(B$logFC, function(x) ifelse(x > 0,'UP','DOWN'))}
215 |   AB<-A[(A$ID%in%B$ID)==T,which(colnames(A)%in%c('ID','logFC','Trend'))]
216 |   BA<-B[(B$ID%in%A$ID)==T,which(colnames(B)%in%c('ID','logFC','Trend'))]
217 |   A_only<-A[(A$ID%in%B$ID)==F,which(colnames(A)%in%c('ID','logFC','Trend'))]
218 |   B_only<-B[(B$ID%in%A$ID)==F,which(colnames(B)%in%c('ID','logFC','Trend'))]
219 |   AB<-merge(AB,BA,by='ID')
220 |   UP<-c(UP,sum(A_only$Trend=='UP'));DOWN<-c(DOWN,sum(A_only$Trend=='DOWN'));Change<-c(Change,sum(A_only$Trend=='Change'))
221 |   UP<-c(UP,sum(B_only$Trend=='UP'));DOWN<-c(DOWN,sum(B_only$Trend=='DOWN'));Change<-c(Change,sum(B_only$Trend=='Change'))
222 |   rownames(A_only)<-A_only$ID
223 |   A_only<-A_only[,-1]
224 |   A_only<-A_only[order(A_only$Trend),]
225 |   rownames(B_only)<-B_only$ID
226 |   B_only<-B_only[,-1]
227 |   B_only<-B_only[order(B_only$Trend),]
228 |   tmp<-NULL
229 |   for (t in seq_len(dim(AB)[1])) tmp<-c(tmp,ifelse(AB$Trend.x[t]==AB$Trend.y[t],AB$Trend.x[t],'Change')) # seq_len avoids iterating over 1:0 when the lists share no IDs
230 |   OvAB<-data.frame(logFC_A=AB$logFC.x,logFC_B=AB$logFC.y,Trend=tmp)
231 |   rownames(OvAB)<-AB$ID
232 |   AB<-OvAB[order(OvAB$Trend),]
233 |   UP<-c(UP,sum(tmp=='UP'));DOWN<-c(DOWN,sum(tmp=='DOWN'));Change<-c(Change,sum(tmp=='Change'))
234 |   counts<-data.frame(Contrast=c('A_only','B_only','AB'),Count=c(dim(A_only)[1],dim(B_only)[1],dim(AB)[1]),UP=UP,DOWN=DOWN,Change=Change)
235 |   venn<-list(A_only=A_only,B_only=B_only,AB=AB)
236 |   return(list(venn_df=counts,table=venn,dim=c(dim(A)[1],dim(B)[1])))
237 | }


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # GOplot
 2 | 
 3 | Despite the plethora of methods available for the functional analysis of omics data, obtaining a comprehensive yet detailed understanding of the results remains challenging. GOplot takes the output of any general enrichment analysis and generates plots at different levels of detail: from a general overview to identify the most enriched categories (bar plot, bubble plot) to a more detailed view displaying different types of information for molecules in a given set of categories (circle plot, chord plot, cluster plot). The package provides deeper insight into omics data and allows scientists to generate informative plots with only a few lines of code to communicate their findings easily.
 4 | 
 5 | ## Installation
 6 | 
 7 | GOplot is available via CRAN: http://cran.r-project.org/web/packages/GOplot
 8 | 
 9 | * the latest released version: `install.packages("GOplot")`
10 | * the latest development version: `devtools::install_github("wencke/wencke.github.io")` (requires the devtools package)
11 | 
12 | ## Available functions
13 | 
14 | For preprocessing: circle_dat(), chord_dat() and reduce_overlap()
15 | 
16 | For plotting: GOBubble(), GOBar(), GOChord(), GOCluster(), GOCircle(), GOVenn(), GOHeat()
17 | 
18 | A manual can be found on the website https://wencke.github.io/
19 | 


--------------------------------------------------------------------------------
/build/vignette.rds:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/build/vignette.rds


--------------------------------------------------------------------------------
/data/EC.rda:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/data/EC.rda


--------------------------------------------------------------------------------
/inst/CITATION:
--------------------------------------------------------------------------------
 1 | citHeader("To cite GOplot in publications use:")
 2 | 
 3 | citEntry(entry = "Article",
 4 |   title = "GOplot: an R package for visually combining expression data with functional analysis",
 5 |   author = personList(as.person("Wencke Walter"), as.person("Fatima Sanchez-Cabo"),
 6 |                       as.person("Mercedes Ricote")),
 7 |   journal = "Bioinformatics",
 8 |   year = "2015",
 9 | 
10 |   textVersion = paste("Walter, Wencke, Fatima Sanchez-Cabo, and Mercedes Ricote.", 
11 |                       "GOplot: an R package for visually combining expression data with functional analysis.",
12 |                       "Bioinformatics (2015): btv300.")
13 | )
14 | 


--------------------------------------------------------------------------------
/inst/doc/GOplot_vignette.R:
--------------------------------------------------------------------------------
  1 | ## ----table1, echo = FALSE, results = 'asis'------------------------------
  2 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7'))
  3 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)'))
  4 | 
  5 | ## ----glimpse, warning = FALSE, message = FALSE---------------------------
  6 | library(GOplot)
  7 | # Load the dataset
  8 | data(EC)
  9 | # Get a glimpse of the data format of the results of the functional analysis... 
 10 | head(EC$david)
 11 | # ...and of the data frame of selected genes
 12 | head(EC$genelist)
 13 | 
 14 | ## ----circ_object, warning = FALSE, message = FALSE-----------------------
 15 | # Generate the plotting object
 16 | circ <- circle_dat(EC$david, EC$genelist)
 17 | 
 18 | ## ----GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6----
 19 | # Generate a simple barplot
 20 | GOBar(subset(circ, category == 'BP'))
 21 | 
 22 | ## ----GOBar2, eval = FALSE, warning = FALSE, message = FALSE--------------
 23 | #  # Facet the barplot according to the categories of the terms
 24 | #  GOBar(circ, display = 'multiple')
 25 | 
 26 | ## ----GOBar3, eval = FALSE, warning = FALSE, message = FALSE--------------
 27 | #  # Facet the barplot, add a title and change the colour scale for the z-score
 28 | #  GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan'))
 29 | 
 30 | ## ----GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'------
 31 | # Generate the bubble plot with a label threshold of 3
 32 | GOBubble(circ, labels = 3)
 33 | 
 34 | ## ----GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'------
 35 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold
 36 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3)
 37 | 
 38 | ## ----GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'------
 39 | # Colour the background according to the category
 40 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = T, labels = 3)
 41 | 
 42 | ## ----GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE----
 43 | #  # Reduce redundant terms with a gene overlap >= 0.75...
 44 | #  reduced_circ <- reduce_overlap(circ, overlap = 0.75)
 45 | #  # ...and plot it
 46 | #  GOBubble(reduced_circ, labels = 2.8)
 47 | 
 48 | ## ----GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'------
 49 | # Generate a circular visualization of the results of the gene-annotation enrichment analysis
 50 | GOCircle(circ)
 51 | 
 52 | ## ----GOCircle2, eval = FALSE---------------------------------------------
 53 | #  # Generate a circular visualization of selected terms
 54 | #  IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047')
 55 | #  GOCircle(circ, nsub = IDs)
 56 | 
 57 | ## ----GOCircle3, eval = FALSE---------------------------------------------
 58 | #  # Generate a circular visualization for 10 terms
 59 | #  GOCircle(circ, nsub = 10)
 60 | 
 61 | ## ----GOChord1, warning = FALSE, message = FALSE--------------------------
 62 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 
 63 | # sample contains the data frame of selected genes and their logFC. Have a look...
 64 | head(EC$genes)
 65 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process)
 66 | EC$process
 67 | # Now it is time to generate the binary matrix
 68 | chord <- chord_dat(circ, EC$genes, EC$process)
 69 | head(chord)
 70 | 
 71 | ## ----GOChord2, eval=FALSE, warning = FALSE, message = FALSE--------------
 72 | #  # Generate the matrix with a list of selected genes
 73 | #  chord <- chord_dat(data = circ, genes = EC$genes)
 74 | #  # Generate the matrix with selected processes
 75 | #  chord <- chord_dat(data = circ, process = EC$process)
 76 | 
 77 | ## ----GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'-------
 78 | # Create the plot
 79 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5)
 80 | 
 81 | ## ----GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'-------
 82 | # Display only genes which are assigned to at least three processes
 83 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC')
 84 | 
 85 | ## ----GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'--------
 86 | # First, we use the chord object without logFC column to create the heatmap
 87 | GOHeat(chord[,-8], nlfc = 0)
 88 | 
 89 | ## ----GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'--------
 90 | # Now we use the chord object with the logFC column and a user-defined colour scale
 91 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))
 92 | 
 93 | ## ----GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'----
 94 | #  GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2)
 95 | 
 96 | ## ----GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'----
 97 | #  GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1'))
 98 | 
 99 | ## ----GOVenn, warning=FALSE, message=FALSE, fig.keep='none'---------------
100 | l1 <- subset(circ, term == 'heart development', c(genes,logFC))
101 | l2 <- subset(circ, term == 'plasma membrane', c(genes,logFC))
102 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC))
103 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis'))
104 | 
105 | 


--------------------------------------------------------------------------------
/inst/doc/GOplot_vignette.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "GOplot 1.0.2"
  3 | author: "Wencke Walter"
  4 | date: "`r Sys.Date()`"
  5 | output: 
  6 |   rmarkdown::html_vignette:
  7 |     css: GOplot.css
  8 | vignette: >
  9 |   %\VignetteIndexEntry{GOplot_0.2}
 10 |   %\VignetteEngine{knitr::rmarkdown}
 11 |   \usepackage[utf8]{inputenc}
 12 | ---
 13 | 
 14 | A manual to exploit the possibilities and limitations of the R package GOplot.
 15 | 
 16 | 
 17 | ## Introduction
 18 | The GOplot package concentrates on the visualization of biological data. More precisely, the package helps to combine and integrate expression data with the results of a functional analysis. The package cannot be used to perform any of these analyses; it is for visualization purposes only. In all scientific fields we visualize information to meet a basic need: to tell a story. Because of space restrictions and a general need to present everything neat and tidy, it is often simply not possible to tell that story in words. Therefore, we use vision to communicate information. A well-designed figure provides the beholder with high-dimensional information in a much smaller space than, for example, a table. The idea of the package is to provide the user with functions that allow a quick examination of large amounts of data, expose trends and find patterns and correlations within the data. Effective data visualization is an important tool in the decision-making process and helps to find further pieces of the puzzle that answers your biological question. Based on that, you will be able to confirm or falsify your hypotheses. You might even start to look in a different direction, relying on the insight a new visualization provides. The plotting functions of the package were developed with a hierarchical structure in mind, starting with a general overview and closing with definite subsets of selected genes and terms. To explain the idea, let us use an example.
 19 | 
 20 | ## The toy example
 21 | GOplot comes with a manually compiled data set. Selected samples were downloaded from the Gene Expression Omnibus (accession number: *[GSE47067](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067)*). In brief, the data set contains the transcriptomic information of endothelial cells from two steady-state tissues (brain and heart). More detailed information can be found in the paper by *[Nolan et al. 2013](http://www.ncbi.nlm.nih.gov/pubmed/23871589)*. The data was normalized and a statistical analysis was performed to determine differentially expressed genes. The *DAVID* functional annotation tool was used to perform a gene-annotation enrichment analysis of the set of differentially expressed genes (adjusted p-value < 0.05). The data set contains the following five items:
 22 | 
 23 | ```{r table1, echo = FALSE, results = 'asis'}
 24 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7'))
 25 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)'))
 26 | ```
 27 | 
 28 | ## Getting started
 29 | As a first step we want to get an overview of the enriched GO terms of our differentially expressed genes. Before we start plotting, we need to bring the data into the right format for the plotting functions. In general, the data object of the plotting functions can be created manually, but the package includes a function that does the job for you. The *circle_dat* function combines the result of the functional analysis with a list of selected genes and their logFC, most likely a list of differentially expressed genes. *circle_dat* takes two data frames as input. The first one contains the results of the functional analysis and should have at least four columns (category, term, genes, adjusted p-value). The second one is a data frame of the selected genes and their logFC. This data frame can be, for example, the result of a statistical analysis performed with *limma*. Let us have a look at the mentioned data frames.
 30 | 
 31 | ```{r glimpse, warning = FALSE, message = FALSE}
 32 | library(GOplot)
 33 | # Load the dataset
 34 | data(EC)
 35 | # Get a glimpse of the data format of the results of the functional analysis... 
 36 | head(EC$david)
 37 | # ...and of the data frame of selected genes
 38 | head(EC$genelist)
 39 | ```
 40 | 
 41 | Now that we know what the input data looks like, it's time to use the *circle_dat* function to create the plotting object.
 42 | 
 43 | ```{r circ_object, warning = FALSE, message = FALSE}
 44 | # Generate the plotting object
 45 | circ <- circle_dat(EC$david, EC$genelist)
 46 | ```
 47 | 
 48 | The **circ** object has eight columns with the following names: 
 49 | 
 50 | * category
 51 | * ID
 52 | * term
 53 | * count
 54 | * genes
 55 | * logFC
 56 | * adj_pval
 57 | * zscore
 58 | 
 59 | Since most gene-annotation enrichment analyses are based on the gene ontology database, the package was built with this structure in mind, but it is not restricted to it. As explained by *[Ashburner et al.](http://www.ncbi.nlm.nih.gov/pubmed/10802651)* in their paper from the year 2000, gene ontology is structured as an acyclic graph and it provides terms covering different areas. These terms are grouped into three independent categories: BP (biological process), CC (cellular component) or MF (molecular function). The first column of the **circ** object contains this information, which was already given in the input. For more information on the structure of gene ontology, have a look at the documentation section of the gene ontology consortium [website](http://geneontology.org/page/ontology-documentation).
 60 | All terms in the gene ontology database come with a GO **ID** and a GO **term** description. The **ID** column of the circ object is optional, so if you use a functional analysis tool that is not based on gene ontology you won't have an **ID** column. The term column contains just that: a description of the term. The implemented functions do not depend on any resemblance to gene ontology terms. **Count** is the number of genes assigned to a term. **Gene** names and their **logFC** are taken from the input list of selected genes. The significance of a term is indicated by the adjusted p-value (**adj_pval**). Terms with an adjusted p-value < 0.05 are considered significantly enriched and are more likely to provide reliable information. The last column contains the **zscore**, an easy-to-calculate value that hints at whether the biological process (or molecular function or cellular component) is more likely to be decreased (negative value) or increased (positive value).
 61 | 
 62 | $$zscore=\frac{(up-down)}{\sqrt{count}}$$
 63 | 
 64 | Here, *up* and *down* are the numbers of assigned genes that are up-regulated (logFC > 0) or down-regulated (logFC < 0) in the data, respectively.
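
To make the formula concrete, here is a minimal sketch in base R that computes the z-score for a single term. The logFC values are made up for illustration and are not taken from the EC data set; the sketch only mirrors the formula above and is not the package's internal code.

```r
# Hypothetical logFC values of the genes assigned to one term
logFC <- c(1.2, -0.8, 2.1, -1.5, -0.3, 0.9, -2.2, -0.4, -1.1)

up    <- sum(logFC > 0)  # number of up-regulated genes
down  <- sum(logFC < 0)  # number of down-regulated genes
count <- length(logFC)   # number of genes assigned to the term

# z-score as defined above: (up - down) / sqrt(count)
zscore <- (up - down) / sqrt(count)
zscore
# -1
```

With three up-regulated and six down-regulated genes the score is negative, which hints that the term is more likely to be decreased.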
 65 | 
 66 | ##The plots
 67 | 
 68 | ###The modified barplot (GOBar)
 69 | Since we do not really know what to expect from our data, the first figure should display as many terms as possible without going into too much detail. Nevertheless, the figure should help us pick the interesting and valuable terms, so we need some parameters to quantify their importance. Since bar charts are among the most widely used plots for scientific data, a modified version of the normal barplot function, named *GOBar*, is included. The *GOBar* function allows the user to quickly create an appealing barplot.
 70 | 
 71 | ```{r GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6}
 72 | # Generate a simple barplot
 73 | GOBar(subset(circ, category == 'BP'))
 74 | ```
 75 | 
 76 | On the y-axis the significance of the terms is shown and the bars are ordered according to their z-score. If you want, you can change the order by setting the argument *order.by.zscore* to FALSE. In this case the bars are ordered based on their significance. Additionally, the barplot can be easily faceted according to the categories of the terms using the argument *display* of the plotting function (output not shown).
 77 | 
 78 | ```{r GOBar2, eval = FALSE, warning = FALSE, message = FALSE}
 79 | # Facet the barplot according to the categories of the terms 
 80 | GOBar(circ, display = 'multiple')
 81 | ```
 82 | 
 83 | To add a title use *title* and to change the colour scale of the z-score use the argument *zsc.col* (output not shown).
 84 | 
 85 | ```{r GOBar3, eval = FALSE, warning = FALSE, message = FALSE}
 86 | # Facet the barplot, add a title and change the colour scale for the z-score
 87 | GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan'))
 88 | ```
 89 | 
 90 | Barplots are common and very easy to read, but they are not always the best solution. Another way to display an overview of high-dimensional data is the bubble plot.
 91 | 
 92 | ###The bubble plot (GOBubble)
 93 | The bubble plot is another possibility to get an overview of the enriched terms. The z-score is assigned to the x-axis and the negative logarithm of the adjusted p-value to the y-axis, as in the barplot (the higher the more significant). The area of the displayed circles is proportional to the number of genes (circ$count) assigned to the term and the colour corresponds to the category.
 94 | The help page of the plotting function (?GOBubble) lists all the arguments to change the layout of the plot. By default the circles are labeled with the term ID, and a table connecting the IDs and terms is displayed on the right side. You can hide the table by setting the argument *table.legend* to FALSE. If you want to display the term description instead, set the argument *ID* to FALSE. Not all circles are labeled, due to the limited space and the overlap of the circles. A threshold for the labeling is set (default = 5) based on the negative logarithm of the adjusted p-value.
 95 | 
 96 | 
 97 | ```{r GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'}
 98 | # Generate the bubble plot with a label threshold of 3
 99 | GOBubble(circ, labels = 3)
100 | ```
101 | 
102 | ![GOBubble1.](GOBubble1.png)
103 | 
104 | To add a title, change the colour of the circles, facet the plot and to change the label threshold use the following arguments:
105 | 
106 | ```{r GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'}
107 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold
108 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3)
109 | ```
110 | 
111 | ![GOBubble2.](GOBubble2.png)
112 | 
113 | For the facet plot it is also possible to colour the background of the panels according to the displayed category by setting *bg.col* to TRUE.
114 | 
115 | ```{r GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'}
116 | # Colour the background according to the category
117 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = T, labels = 3)
118 | ```
119 | 
120 | ![GOBubble3.](GOBubble3.png)
121 | 
122 | A new function, *reduce_overlap*, was included in the updated version of the package to reduce the number of redundant terms. So far, the implemented method is very simple and slow, and it needs further refinement. Nevertheless, reducing the number of redundant terms significantly improves the readability of plots such as the bubble plot. The function deletes all terms that have a gene overlap greater than or equal to a set threshold. It keeps one term per group as a representative, without taking the GO hierarchy into consideration.
123 | 
124 | ```{r GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE}
125 | # Reduce redundant terms with a gene overlap >= 0.75...
126 | reduced_circ <- reduce_overlap(circ, overlap = 0.75)
127 | # ...and plot it
128 | GOBubble(reduced_circ, labels = 2.8)
129 | ```
130 | 
131 | ![GOBubble4.](GOBubble4.png)
132 | 
133 | ### Circular visualization of the results of gene- annotation enrichment analysis (GOCircle)
134 | The overview plots should help us decide which terms are the most interesting. Of course, this decision also depends on the hypothesis and ideas you want to confirm with your data; the most significant terms are not always the ones you are interested in. So, after manually selecting a set of valuable terms (EC$process), the next figure should provide more details on these specific terms. One of the major issues we noticed when presenting the plots was that the information provided by the z-score was sometimes difficult to interpret, since the measure is not that common. As shown above, it is simply the number of up-regulated genes minus the number of down-regulated genes, divided by the square root of the count. The *GOCircle* plot emphasizes this fact.
135 | 
136 | ```{r GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'}
137 | # Generate a circular visualization of the results of gene- annotation enrichment analysis
138 | GOCircle(circ)
139 | ```
140 | 
141 | ![Circle plot.](GOCirc.png)
142 | 
143 | The outer circle shows, for each term, a scatter plot of the logFC of the assigned genes. By default, red circles display up-regulation and blue ones down-regulation. The colours can be changed with the argument *lfc.col*. This display makes it easier to understand why, in some cases, highly significant terms have a z-score close to zero. A z-score of zero does not mean that the term is unimportant, at least not as long as it is significantly enriched. It just shows that the z-score is a crude measure, because the score does not take into account the functional level and activation dependencies of the individual genes within a process.
144 | You can change the layout of the plot with various arguments, see ?GOCircle. The *nsub* argument needs a little more explanation to be used wisely. First of all, it can be a numeric or a character vector. If it is a character vector, it contains the IDs or term descriptions of the processes you want to display (output not shown).
145 | 
146 | ```{r GOCircle2, eval = FALSE}
147 | # Generate a circular visualization of selected terms
148 | IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047')
149 | GOCircle(circ, nsub = IDs)
150 | ```
151 | 
152 | If *nsub* is a numeric vector then the number defines how many terms are displayed. It starts with the first row of the input data frame (output not shown).
153 | 
154 | ```{r GOCircle3, eval = FALSE}
155 | # Generate a circular visualization for 10 terms
156 | GOCircle(circ, nsub = 10)
157 | ```
158 | 
159 | This kind of visualization is only useful for a smaller set of terms; the maximum is around 12. As the number of terms decreases, the amount of displayed information increases.
160 | 
161 | ### Display of the relationship between genes and terms (GOChord)
162 | The *GOChord* plotting function was implemented based on the **[Circos](http://circos.ca/)** plots designed by *[Martin Krzywinski](http://mkweb.bcgsc.ca/)*. It displays the relationship between a list of selected genes and terms, as well as the logFC of the genes. A binary membership matrix is required as input. You can build the matrix on your own or use the implemented function *chord_dat*, which does the job for you. The function takes three arguments: *data*, *genes* and *process*; of the last two, only one is mandatory. To recap: *circle_dat* combined your expression data with the results from the functional analysis. The bar and bubble plots gave you a first impression of your data, and you have now selected a list of genes and processes you think are valuable. *GOCircle* adds a layer to display the expression values of the genes assigned to the terms, but it lacks information on the relationship between the genes and the terms. It is not easy to figure out whether some of the genes are linked to multiple processes. The chord plot fills the void left by *GOCircle*.
163 | 
164 | ```{r GOChord1, warning = FALSE, message = FALSE}
165 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 
166 | # sample contains the data frame of selected genes and their logFC. Have a look...
167 | head(EC$genes)
168 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process)
169 | EC$process
170 | # Now it is time to generate the binary matrix
171 | chord <- chord_dat(circ, EC$genes, EC$process)
172 | head(chord)
173 | ```
174 | 
175 | Rows are genes and columns are terms. A '0' indicates that the gene is not assigned to the term; a '1' indicates that it is. As mentioned before, it is possible to leave out either the *genes* or the *process* argument. If you omit the *process* argument, the binary matrix is built for the list of selected genes and all the processes with at least one assigned gene. On the other hand, if you just provide a set of processes without limiting the list of genes, the binary matrix is generated for all the genes which are assigned to at least one of the processes from your list (output not shown).
176 | 
177 | ```{r GOChord2, eval=FALSE, warning = FALSE, message = FALSE}
178 | # Generate the matrix with a list of selected genes
179 | chord <- chord_dat(data = circ, genes = EC$genes)
180 | # Generate the matrix with selected processes
181 | chord <- chord_dat(data = circ, process = EC$process)
182 | ```
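
If you prefer to build the membership matrix on your own, the following base R sketch shows one possible way. The gene names, term names and logFC values are hypothetical, and the layout (genes in rows, terms in columns, one trailing logFC column) only follows the description given here; it is not claimed to reproduce the exact output of *chord_dat*.

```r
# Hypothetical long-format annotation: one row per gene-term assignment
assignments <- data.frame(
  genes = c('A', 'A', 'B', 'C', 'C', 'C'),
  term  = c('t1', 't2', 't1', 't1', 't2', 't3'),
  stringsAsFactors = FALSE
)
# Hypothetical logFC value for each gene
lfc <- c(A = 1.5, B = -0.7, C = 2.0)

genes <- sort(unique(assignments$genes))
terms <- sort(unique(assignments$term))

# Start with an all-zero matrix: genes in rows, terms in columns
mat <- matrix(0, nrow = length(genes), ncol = length(terms),
              dimnames = list(genes, terms))

# Set a 1 wherever a gene is assigned to a term
# (a two-column character matrix indexes by row and column names)
mat[cbind(assignments$genes, assignments$term)] <- 1

# Append the logFC as the last column
mat <- cbind(mat, logFC = lfc[rownames(mat)])
mat
```

A matrix built this way has a single logFC column, which matches the default *nlfc = 1*.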
183 | 
184 | Be aware that omitting either *genes* or *process* might lead to a large binary matrix, which results in a confusing visualization. The chart was designed for smaller subsets of high-dimensional data.
185 | Like the other plotting functions, *GOChord* provides a lot of arguments to change the layout of the plot, see ?GOChord. Most of the arguments adjust the font size of the labels, the space between them, the colour scale for the logFC and the colour of the ribbons. Besides the aesthetics there are two other arguments: *gene.order* and *nlfc*. The first argument defines the order of the genes, with three possible options: 'logFC', 'alphabetical' and 'none', which are quite self-explanatory. Sometimes you perform the differential expression analysis for multiple conditions and/or batches and therefore want to include more than one logFC value per gene. To handle this situation, use the *nlfc* argument. It is a numeric value that defines the number of logFC columns within your binary membership matrix. The default is '1', assuming that most of the time you have just one contrast and one logFC value per gene.
186 | 
187 | 
188 | ```{r GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'}
189 | # Create the plot
190 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5)
191 | ```
192 | ![Chord1.](GOChord1.png)
193 | 
194 | The *space* argument defines the space between the coloured rectangles representing the logFC. The font size of the gene labels (*gene.size*) and the space (*gene.space*) between them were also changed. The genes were ordered according to their logFC values by setting *gene.order* to 'logFC'.
195 | 
196 | Sometimes the plot gets a bit crowded and you would like to reduce the number of displayed genes or processes. You can do this automatically with the *limit* argument. *limit* is a vector with two cutoff values (default = c(0, 0)). The first value defines the minimum (>=) number of terms a gene has to be assigned to. The second value defines the minimum number of genes that have to be assigned to a selected term. For example, to display only genes which are assigned to at least three processes you would use the following line of code (output not shown):
197 | 
198 | ```{r GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'}
199 | # Display only genes which are assigned to at least three processes
200 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC')
201 | ```
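
To illustrate what the first cutoff does, here is a small base R sketch that applies the same rule by hand to a hypothetical membership matrix without a logFC column. It only demonstrates the idea of the filter and is not the code used inside *GOChord*.

```r
# Hypothetical binary membership matrix: 4 genes (rows) x 3 terms (columns)
mat <- matrix(c(1, 1, 1,
                1, 0, 0,
                1, 1, 0,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c('g1', 'g2', 'g3', 'g4'), c('t1', 't2', 't3')))

# Number of terms each gene is assigned to
n_terms <- rowSums(mat)

# Keep only genes assigned to at least three terms (first cutoff = 3);
# drop = FALSE keeps the matrix structure even if a single row remains
kept <- mat[n_terms >= 3, , drop = FALSE]
kept
```

In this toy matrix only g1 is assigned to three terms, so it is the only gene that remains.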
202 | 
203 | ### Heatmap of genes and terms (GOHeat)
204 | Thanks to a very nice suggestion from *[Maureen Sartor, Ph.D.](http://sartorlab.ccmb.med.umich.edu/)* I implemented *GOHeat*. The *GOHeat* function generates a heatmap of the relationship between genes and terms, similar to *GOChord*. Biological processes are displayed in rows and genes in columns. Each column is divided into smaller rectangles, and the colouring of the tiles depends on the presence or absence of logFC values. In addition, genes are clustered to highlight groups of genes with similar annotated functions. The function has two modes, depending on the *nlfc* argument. If *nlfc = 0*, that is, no logFC values are available, the colour encodes the overall number of processes the respective gene is assigned to. Let's have a look at an example...
205 | 
206 | ```{r GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'}
207 | # First, we use the chord object without logFC column to create the heatmap
208 | GOHeat(chord[,-8], nlfc = 0)
209 | ```
210 | 
211 | ![Heat1.](GOHeat_nolfc.png)
212 | 
213 | In case of *nlfc = 1* the colour corresponds to the logFC of the gene...
214 | 
215 | ```{r GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'}
216 | # Now we use the chord object with the logFC column to create the heatmap
217 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))
218 | ```
219 | 
220 | ![Heat2.](GOHeat_lfc.png)
221 | 
222 | ### Golden eye (GOCluster)
223 | The idea behind the *GOCluster* function is to visualize as much information as possible. Here is an example:
224 | 
225 | ```{r GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'}
226 | GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2)
227 | ```
228 | ![GOCluster.](GOCluster.png)
229 | 
230 | Hierarchical clustering is a popular method for gene expression analysis due to its unsupervised nature, which assures an unbiased result. Genes are grouped together based on their expression patterns, thus clusters are likely to contain sets of co-regulated or functionally related genes. *GOCluster* performs the hierarchical clustering of the gene expression profiles using the *hclust* method in core R. If you want to change the distance metric or the clustering algorithm, use the arguments *metric* and *clust*, respectively. The resulting dendrogram is transformed with the help of *ggdendro* to be suitable for visualization with *ggplot2*. As before, a circular layout was chosen because it is not only effective but also visually appealing. The first ring next to the dendrogram represents the logFC of the genes, which are the leaves of the clustering tree. In case you are interested in more than one contrast, the *nlfc* argument is also available for this function. By default it is set to '1', so only one ring is drawn. As always, the logFC values are colour-coded with a user-definable colour scale (*lfc.col*). The next ring represents the terms assigned to the genes. For aesthetic reasons the terms should be reduced to a reasonable number with the argument *process*. The terms are colour-coded as well, and you can change the default colours with the argument *term.col*. Once again, the plotting function provides a number of arguments to change the layout of the plot; you can check them out on the help page, ?GOCluster. Probably the most important argument of the function is *clust.by*. It expects a character vector specifying whether the clustering should be done by gene expression pattern ('logFC', as in the figure above) or by functional categories ('term', as in the figure below).
231 | 
232 | 
233 | ```{r GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'}
234 | GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1'))
235 | ```
236 | ![GOCluster2.](GOCluster2.png)
237 | 
238 | ### Venn diagram (GOVenn)
239 | In this biological context we implemented a Venn diagram that can be used to detect relations between various lists of differentially expressed genes or to explore the intersection of genes of multiple terms from the functional analysis. The Venn diagram displays not only the number of overlapping genes, but also information about the gene expression patterns (commonly up-regulated, commonly down-regulated or contra-regulated). At the moment, a maximum of three datasets is allowed as input. Each input data frame contains at least two columns: one for the gene names and one for the logFC value.
240 | 
241 | ```{r GOVenn, warning=FALSE, message=FALSE, fig.keep='none'}
242 | l1 <- subset(circ, term == 'heart development', c(genes,logFC))
243 | l2 <- subset(circ, term == 'plasma membrane', c(genes,logFC))
244 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC))
245 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis'))
246 | ```
247 | ![Venn diagram.](GOVenn.png)
248 | 
249 | For example, heart development and tissue morphogenesis share a set of 22 genes, of which 5 are commonly up-regulated and 17 are commonly down-regulated. The important thing to notice is that the pie charts don't display redundant information. Thus, if you compare three datasets, the genes which are shared by all datasets (pie chart in the middle) are not included in the other pie charts.
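
The three expression patterns can be sketched in base R for two hypothetical gene lists. The data frames and the column names `logFC.1`, `logFC.2` and `pattern` are assumptions made for this illustration; the sketch shows the idea behind the classification, not the implementation of *GOVenn*.

```r
# Two hypothetical gene lists with logFC values
l1 <- data.frame(genes = c('A', 'B', 'C', 'D'),
                 logFC = c(1.2, -0.5, 0.8, -2.0),
                 stringsAsFactors = FALSE)
l2 <- data.frame(genes = c('B', 'C', 'D', 'E'),
                 logFC = c(-1.1, -0.3, -0.9, 0.4),
                 stringsAsFactors = FALSE)

# Genes present in both lists, with one logFC column per list
shared <- merge(l1, l2, by = 'genes', suffixes = c('.1', '.2'))

# Classify each shared gene by the direction of regulation in both lists
shared$pattern <- ifelse(shared$logFC.1 > 0 & shared$logFC.2 > 0, 'up',
                  ifelse(shared$logFC.1 < 0 & shared$logFC.2 < 0, 'down',
                         'contra'))
table(shared$pattern)
```

Here the lists share three genes: two are commonly down-regulated and one is contra-regulated.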
250 | The following [link](https://wwalter.shinyapps.io/Venn/) refers to the shinyapp of this tool. The web tool is slightly more interactive, since the circles are area-proportional to the number of genes of the dataset and the small pie charts can be moved with sliders. It also has all the other options of the *GOVenn* function to change the layout of the plot. You can easily download the picture and gene lists.
251 | 


--------------------------------------------------------------------------------
/man/EC.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \docType{data}
 4 | \name{EC}
 5 | \alias{EC}
 6 | \title{Transcriptomic information of endothelial cells.}
 7 | \format{A list containing 5 items}
 8 | \source{
 9 | \url{http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067}
10 | }
11 | \usage{
12 | data(EC)
13 | }
14 | \description{
15 | The data set contains the transcriptomic information of endothelial cells
16 | from two steady state tissues (brain and heart). More detailed information
17 | can be found in the paper by Nolan et al. 2013. The data was normalized and a
18 | statistical analysis was performed to determine differentially expressed
19 | genes. DAVID functional annotation tool was used to perform a gene-
20 | annotation enrichment analysis of the set of differentially expressed genes
21 | (adjusted p-value < 0.05).
22 | }
23 | \keyword{datasets}
24 | 
25 | 


--------------------------------------------------------------------------------
/man/GOBar.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{GOBar}
 4 | \alias{GOBar}
 5 | \title{Z-score coloured barplot.}
 6 | \usage{
 7 | GOBar(data, display, order.by.zscore = T, title, zsc.col)
 8 | }
 9 | \arguments{
10 | \item{data}{A data frame containing at least the term ID and/or term, the 
11 | adjusted p-value and the z-score. A possible input can be generated with 
12 | the \code{circle_dat} function}
13 | 
14 | \item{display}{A character vector indicating whether a single plot ('single')
15 | or a facet plot with panels for each category should be drawn 
16 | (default='single')}
17 | 
18 | \item{order.by.zscore}{Defines the order of the bars. If TRUE the bars are 
19 | ordered according to the z-scores of the processes. Otherwise the bars are 
20 | ordered by the negative logarithm of the adjusted p-value}
21 | 
22 | \item{title}{The title of the plot}
23 | 
24 | \item{zsc.col}{Character vector to define the colour scale for the z-score of 
25 | the form c(high, midpoint,low)}
26 | }
27 | \description{
28 | Z-score coloured barplot of terms ordered alternatively by 
29 |   z-score or the negative logarithm of the adjusted p-value
30 | }
31 | \details{
32 | If \code{display} is used to facet the plot the width of the panels 
33 |   will be proportional to the length of the x scale.
34 | }
35 | \examples{
36 | \dontrun{
37 | #Load the included dataset
38 | data(EC)
39 | 
40 | #Building the circ object
41 | circ<-circle_dat(EC$david, EC$genelist)
42 | 
43 | #Creating the bar plot
44 | GOBar(circ)
45 | 
46 | #Faceting the plot
47 | GOBar(circ, display='multiple')
48 | }
49 | }
50 | 
51 | 


--------------------------------------------------------------------------------
/man/GOBubble.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{GOBubble}
 4 | \alias{GOBubble}
 5 | \title{Bubble plot.}
 6 | \usage{
 7 | GOBubble(data, display, title, colour, labels, ID = T, table.legend = T,
 8 |   table.col = T, bg.col = F)
 9 | }
10 | \arguments{
11 | \item{data}{A data frame with columns for category, GO ID, term, adjusted 
12 | p-value, z-score, count(num of genes)}
13 | 
14 | \item{display}{A character vector. Indicates whether it should be a single 
15 | plot ('single') or a facet plot with panels for each category 
16 | (default='single')}
17 | 
18 | \item{title}{The title (on top) of the plot}
19 | 
20 | \item{colour}{A character vector which defines the colour of the bubbles for 
21 | each category}
22 | 
23 | \item{labels}{Sets a threshold for the displayed labels. The threshold refers
24 | to the -log(adjusted p-value) (default=5)}
25 | 
26 | \item{ID}{If TRUE then labels are IDs else terms}
27 | 
28 | \item{table.legend}{Defines whether a table of GO ID and GO term should be 
29 | displayed on the right side of the plot or not (default = TRUE)}
30 | 
31 | \item{table.col}{If TRUE then the table entries are coloured according to 
32 | their category, if FALSE then entries are black}
33 | 
34 | \item{bg.col}{Should only be used in case of a facet plot. If TRUE then the
35 | panel backgrounds are coloured according to the displayed category}
36 | }
37 | \description{
38 | The function creates a bubble plot of the input \code{data}. The
39 |   input \code{data} can be created with the help of the 
40 |   \code{\link{circle_dat}} function.
41 | }
42 | \details{
43 | The x- axis of the plot represents the z-score. The negative 
44 |   logarithm of the adjusted p-value (corresponding to the significance of the
45 |   term) is displayed on the y-axis. The area of the plotted circles is 
46 |   proportional to the number of genes assigned to the term. Each circle is 
47 |   coloured according to its category and labeled alternatively with the ID or 
48 |   term name. If static is set to FALSE the mouse hover effect will be enabled.
49 | }
50 | \examples{
51 | \dontrun{
52 | #Load the included dataset
53 | data(EC)
54 | 
55 | #Building the circ object
56 | circ <- circle_dat(EC$david, EC$genelist)
57 | 
58 | #Creating the bubble plot colouring the table entries according to the category
59 | GOBubble(circ, table.col = T)
60 | 
61 | #Creating the bubble plot displaying the term instead of the ID and without the table
62 | GOBubble(circ, ID = F, table.legend = F)
63 | 
64 | #Faceting the plot
65 | GOBubble(circ, display = 'multiple')
66 | }
67 | }
68 | 
69 | 


--------------------------------------------------------------------------------
/man/GOChord.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCluster.R
 3 | \name{GOChord}
 4 | \alias{GOChord}
 5 | \title{Displays the relationship between genes and terms.}
 6 | \usage{
 7 | GOChord(data, title, space, gene.order, gene.size, gene.space, nlfc = 1,
 8 |   lfc.col, lfc.min, lfc.max, ribbon.col, border.size, process.label, limit)
 9 | }
10 | \arguments{
11 | \item{data}{The matrix represents the binary relation (1= is related to, 0= 
12 | is not related to) between a set of genes (rows) and processes (columns); a
13 | column for the logFC of the genes is optional}
14 | 
15 | \item{title}{The title (on top) of the plot}
16 | 
17 | \item{space}{The space between the chord segments of the plot}
18 | 
19 | \item{gene.order}{A character vector defining the order of the displayed gene
20 | labels}
21 | 
22 | \item{gene.size}{The size of the gene labels}
23 | 
24 | \item{gene.space}{The space between the gene labels and the segment of the 
25 | logFC}
26 | 
27 | \item{nlfc}{Defines the number of logFC columns (default=1)}
28 | 
29 | \item{lfc.col}{The fill color for the logFC specified in the following form: 
30 | c(color for low values, color for the mid point, color for the high values)}
31 | 
32 | \item{lfc.min}{Specifies the minimum value of the logFC scale (default = -3)}
33 | 
34 | \item{lfc.max}{Specifies the maximum value of the logFC scale (default = 3)}
35 | 
36 | \item{ribbon.col}{The background color of the ribbons}
37 | 
38 | \item{border.size}{Defines the size of the ribbon borders}
39 | 
40 | \item{process.label}{The size of the legend entries}
41 | 
42 | \item{limit}{A vector with two cutoff values (default= c(0,0)). The first 
43 | value defines the minimum number of terms a gene has to be assigned to. The 
44 | second the minimum number of genes assigned to a selected term.}
45 | }
46 | \description{
47 | The GOChord function generates a circularly composited overview 
48 |   of selected/specific genes and their assigned processes or terms. More 
49 |   generally, it joins genes and processes via ribbons in an intersection-like
50 |   graph. The input can be generated with the \code{\link{chord_dat}} 
51 |   function.
52 | }
53 | \details{
54 | The \code{gene.order} argument has three possible options: "logFC", 
55 |   "alphabetical", "none", which are quite self- explanatory.
56 |   
57 |   Maybe the most important argument of the function is \code{nlfc}. If your 
58 |   \code{data} does not contain a column of logFC values you have to set
59 |   \code{nlfc = 0}. Differential expression analysis can be performed for
60 |   multiple conditions and/or batches. Therefore, the data frame might contain
61 |   more than one logFC value per gene. To adjust to this situation the
62 |   \code{nlfc} argument is used as well. It is a numeric value and it defines
63 |   the number of logFC columns of your \code{data}. The default is "1"
64 |   assuming that most of the time only one contrast is considered.
65 |   
66 |   To represent the data more usefully it might be necessary to reduce the 
67 |   dimension of \code{data}. This can be achieved with \code{limit}. The first
68 |   value of the vector defines the threshold for the minimum number of terms a
69 |   gene has to be assigned to in order to be represented in the plot. Most of
70 |   the time it is more meaningful to represent genes with various functions. A
71 |   value of 3 excludes all genes with less than three term assignments. 
72 |   Whereas the second value of the parameter restricts the number of terms 
73 |   according to the number of assigned genes. All terms with a count smaller 
74 |   or equal to the threshold are excluded.
75 | }
76 | \examples{
77 | \dontrun{
78 | # Load the included dataset
79 | data(EC)
80 | # Generating the circ object and the binary matrix
81 | circ <- circle_dat(EC$david, EC$genelist)
82 | chord <- chord_dat(circ, EC$genes, EC$process)
83 | 
84 | # Creating the chord plot
85 | GOChord(chord)
86 | 
87 | # Excluding processes with fewer than 5 assigned genes
88 | GOChord(chord, limit = c(0, 5))
89 | 
90 | # Creating the chord plot with genes ordered by logFC and a different logFC color scale
91 | GOChord(chord, space = 0.02, gene.order = 'logFC', lfc.col = c('red', 'black', 'cyan'))
92 | }
93 | }
94 | \seealso{
95 | \code{\link{chord_dat}}
96 | }
97 | 
98 | 


--------------------------------------------------------------------------------
/man/GOCircle.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{GOCircle}
 4 | \alias{GOCircle}
 5 | \title{Circular visualization of the results of a functional analysis.}
 6 | \usage{
 7 | GOCircle(data, title, nsub, rad1, rad2, table.legend = T, zsc.col, lfc.col,
 8 |   label.size, label.fontface)
 9 | }
10 | \arguments{
11 | \item{data}{A special data frame which should be the result of 
12 | \code{circle_dat}}
13 | 
14 | \item{title}{The title of the plot}
15 | 
16 | \item{nsub}{A numeric or character vector. If it's numeric then the number 
17 | defines how many processes are displayed (starting from the first row of 
18 | \code{data}). If it's a character string of processes then these processes 
19 | are displayed}
20 | 
21 | \item{rad1}{The radius of the inner circle (default=2)}
22 | 
23 | \item{rad2}{The radius of the outer circle (default=3)}
24 | 
25 | \item{table.legend}{Shall a table be displayed or not? (default=TRUE)}
26 | 
27 | \item{zsc.col}{Character vector to define the colour scale for the z-score of 
28 | the form c(high, midpoint, low)}
29 | 
30 | \item{lfc.col}{A character vector specifying the colour for up- and 
31 | down-regulated genes}
32 | 
33 | \item{label.size}{Size of the segment labels (default=5)}
34 | 
35 | \item{label.fontface}{Font style of the segment labels (default='bold')}
36 | }
37 | \description{
38 | The circular plot combines gene expression and gene-annotation 
39 |   enrichment data. A subset of terms is displayed like the \code{GOBar} plot 
40 |   in combination with a scatterplot of the gene expression data. The whole 
41 |   plot is drawn on a specific coordinate system to achieve the circular 
42 |   layout. The segments are labeled with the term ID.
43 | }
44 | \details{
45 | The outer circle shows a scatter plot for each term of the logFC of 
46 |   the assigned genes. The colours can be changed with the argument 
47 |   \code{lfc.col}.
48 |   
49 |   The \code{nsub} argument needs a bit more explanation to be used wisely. First of 
50 |   all, it can be a numeric or a character vector. If it is a character vector
51 |   then it contains the IDs or term descriptions of the displayed processes. If
52 |   \code{nsub} is a numeric vector then the number defines how many terms are 
53 |   displayed, starting with the first row of the input data frame.
54 | }
55 | \examples{
56 | \dontrun{
57 | # Load the included dataset
58 | data(EC)
59 | 
60 | # Building the circ object
61 | circ <- circle_dat(EC$david, EC$genelist)
62 | 
63 | # Creating the circular plot
64 | GOCircle(circ)
65 | 
66 | # Creating the circular plot with a different colour scale for the logFC
67 | GOCircle(circ, lfc.col = c('purple', 'orange'))
68 | 
69 | # Creating the circular plot with a different colour scale for the z-score
70 | GOCircle(circ, zsc.col = c('yellow', 'black', 'cyan'))
71 | 
72 | # Creating the circular plot with different font style
73 | GOCircle(circ, label.size = 5, label.fontface = 'italic')
74 | }
75 | }
76 | \seealso{
77 | \code{\link{circle_dat}}, \code{\link{GOBar}}
78 | }
79 | 
80 | 


--------------------------------------------------------------------------------
/man/GOCluster.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCluster.R
 3 | \name{GOCluster}
 4 | \alias{GOCluster}
 5 | \title{Circular dendrogram.}
 6 | \usage{
 7 | GOCluster(data, process, metric, clust, clust.by, nlfc, lfc.col, lfc.min,
 8 |   lfc.max, lfc.space, lfc.width, term.col, term.space, term.width)
 9 | }
10 | \arguments{
11 | \item{data}{A data frame which should be the result of 
12 | \code{\link{circle_dat}} in case the data contains only one logFC column. 
13 | Otherwise \code{data} is a data frame in which the first column contains the
14 | genes, the second the term and the following columns the logFCs of the 
15 | different contrasts.}
16 | 
17 | \item{process}{A character vector of selected processes (ID or term
18 | description)}
19 | 
20 | \item{metric}{A character vector specifying the distance measure to be used 
21 | (default='euclidean'), see \code{dist}}
22 | 
23 | \item{clust}{A character vector specifying the agglomeration method to be 
24 | used (default='average'), see \code{hclust}}
25 | 
26 | \item{clust.by}{A character vector specifying if the clustering should be 
27 | done for gene expression pattern or functional categories. By default the 
28 | clustering is done based on the functional categories.}
29 | 
30 | \item{nlfc}{If TRUE \code{data} contains multiple logFC columns (default= 
31 | FALSE)}
32 | 
33 | \item{lfc.col}{Character vector to define the color scale for the logFC of 
34 | the form c(high, midpoint, low)}
35 | 
36 | \item{lfc.min}{Specifies the minimum value of the logFC scale (default = -3)}
37 | 
38 | \item{lfc.max}{Specifies the maximum value of the logFC scale (default = 3)}
39 | 
40 | \item{lfc.space}{The space between the leaves of the dendrogram and the ring 
41 | for the logFC}
42 | 
43 | \item{lfc.width}{The width of the logFC ring}
44 | 
45 | \item{term.col}{A character vector specifying the colors of the term bands}
46 | 
47 | \item{term.space}{The space between the logFC ring and the term ring}
48 | 
49 | \item{term.width}{The width of the term ring}
50 | }
51 | \description{
52 | GOCluster generates a circular dendrogram of the \code{data} 
53 |   clustering, using by default euclidean distance and average linkage. The 
54 |   inner ring displays the color-coded logFC while the outer one encodes the
55 |   terms assigned to each gene.
56 | }
57 | \details{
58 | The inner ring can be split into smaller rings to display multiple
59 |   logFC values resulting from various comparisons.
60 | }
61 | \examples{
62 | \dontrun{
63 | #Load the included dataset
64 | data(EC)
65 | 
66 | #Generating the circ object
67 | circ <- circle_dat(EC$david, EC$genelist)
68 | 
69 | #Creating the cluster plot
70 | GOCluster(circ, EC$process)
71 | 
72 | #Cluster the data according to gene expression and assigning a different color scale for the logFC
73 | GOCluster(circ,EC$process,clust.by='logFC',lfc.col=c('darkgoldenrod1','black','cyan1'))
74 | }
75 | }
76 | 
77 | 


--------------------------------------------------------------------------------
/man/GOHeat.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOHeat.R
 3 | \name{GOHeat}
 4 | \alias{GOHeat}
 5 | \title{Displays heatmap of the relationship between genes and terms.}
 6 | \usage{
 7 | GOHeat(data, nlfc, fill.col)
 8 | }
 9 | \arguments{
10 | \item{data}{The matrix represents the binary relation (1= is related to, 0= 
11 | is not related to) between a set of genes (rows) and processes (columns)}
12 | 
13 | \item{nlfc}{Defines the number of logFC columns (default = 0)}
14 | 
15 | \item{fill.col}{Defines the color scale break points}
16 | }
17 | \description{
18 | The GOHeat function generates a heatmap of the relationship 
19 |   between genes and terms. Biological processes are displayed in rows and
20 |   genes in columns. In addition genes are clustered to highlight groups of
21 |   genes with similar annotated functions. The input can be generated with the
22 |   \code{\link{chord_dat}} function.
23 | }
24 | \details{
25 | The heatmap has two modes, which depend on the \code{nlfc}
26 |   argument. If \code{nlfc = 0}, i.e. no logFC values are available, the 
27 |   coloring encodes the overall number of processes the respective gene is
28 |   assigned to. If \code{nlfc = 1}, the color corresponds to the logFC 
29 |   of the gene.
30 | }
31 | \examples{
32 | \dontrun{
33 | # Load the included dataset
34 | data(EC)
35 | 
36 | # Generate the circ object
37 | circ <- circle_dat(EC$david, EC$genelist)
38 | 
39 | # Generate the chord object
40 | chord <- chord_dat(circ, EC$genes, EC$process)
41 | 
42 | # Create the plot with user-defined colors
43 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))
44 | }
45 | }
46 | 
47 | 


--------------------------------------------------------------------------------
/man/GOVenn.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOVenn.R
 3 | \name{GOVenn}
 4 | \alias{GOVenn}
 5 | \title{Venn diagram of differentially expressed genes.}
 6 | \usage{
 7 | GOVenn(data1, data2, data3, title, label, lfc.col, circle.col, plot = T)
 8 | }
 9 | \arguments{
10 | \item{data1}{A data frame consisting of two columns: ID, logFC}
11 | 
12 | \item{data2}{A data frame consisting of two columns: ID, logFC}
13 | 
14 | \item{data3}{A data frame consisting of two columns: ID, logFC}
15 | 
16 | \item{title}{The title of the plot}
17 | 
18 | \item{label}{A character vector to define the legend keys}
19 | 
20 | \item{lfc.col}{A character vector determining the background colors of the 
21 | pie segments representing up- and down-regulated genes}
22 | 
23 | \item{circle.col}{A character vector to assign clockwise colors for the 
24 | circles}
25 | 
26 | \item{plot}{If TRUE only the venn diagram is plotted. Otherwise the function 
27 | returns a list with two items: the actual plot and a list containing the 
28 | overlap entries (default= TRUE)}
29 | }
30 | \description{
31 | The function compares lists of differentially expressed genes 
32 |   and illustrates possible relations. Additionally, it represents the variety 
33 |   of gene expression patterns within the intersection in small pie charts 
34 |   with three segments. Clockwise are shown the number of commonly 
35 |   up-regulated, commonly down-regulated and contra-regulated genes.
36 | }
37 | \details{
38 | The \code{plot} argument can be used to adjust the amount of 
39 |   information that is returned by calling the function. If you are only 
40 |   interested in the actual plot of the venn diagram, \code{plot} should be 
41 |   set to TRUE. Sometimes you also want to know the elements of the 
42 |   intersections. In this case \code{plot} should be set to FALSE and the 
43 |   function call will return a list of two items. The first item, which can be 
44 |   accessed with $plot, contains the plotting information. Additionally, a list
45 |   ($table) will be returned containing the elements of the various overlaps.
46 | }
47 | \examples{
48 | \dontrun{
49 | #Load the included dataset
50 | data(EC)
51 | 
52 | #Generating the circ object
53 | circ <- circle_dat(EC$david, EC$genelist)
54 | 
55 | #Selecting terms of interest
56 | l1<-subset(circ,term=='heart development',c(genes,logFC))
57 | l2<-subset(circ,term=='plasma membrane',c(genes,logFC))
58 | l3<-subset(circ,term=='tissue morphogenesis',c(genes,logFC))
59 | 
60 | GOVenn(l1,l2,l3, label=c('heart development','plasma membrane','tissue morphogenesis'))
61 | }
62 | }
63 | 
64 | 


--------------------------------------------------------------------------------
/man/chord_dat.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{chord_dat}
 4 | \alias{chord_dat}
 5 | \title{Creates a binary matrix.}
 6 | \usage{
 7 | chord_dat(data, genes, process)
 8 | }
 9 | \arguments{
10 | \item{data}{A data frame with at least two columns: GO ID|term and genes. 
11 | Each row contains exactly one GO ID|term and one gene. A column containing
12 | logFC values is optional and might be used if \code{genes} is missing.}
13 | 
14 | \item{genes}{A character vector of selected genes OR data frame with columns
15 | for gene ID and logFC.}
16 | 
17 | \item{process}{A character vector of selected processes}
18 | }
19 | \value{
20 | A binary matrix
21 | }
22 | \description{
23 | The function creates a matrix which represents the binary 
24 |   relation (1= is related to, 0= is not related to) between selected genes 
25 |   (row) and processes (column). The resulting matrix can be visualized with 
26 |   the \code{\link{GOChord}} function.
27 | }
28 | \details{
29 | If more than one logFC value is available for each gene, only one 
30 |   should be used to create the binary matrix. The other values have to be 
31 |   added manually later.
32 | }
33 | \examples{
34 | \dontrun{
35 | # Load the included dataset
36 | data(EC)
37 | 
38 | # Building the circ object
39 | circ <- circle_dat(EC$david, EC$genelist)
40 | 
41 | # Building the binary matrix
42 | chord <- chord_dat(circ, EC$genes, EC$process)
43 | 
44 | }
45 | }
46 | \seealso{
47 | \code{\link{GOChord}}
48 | }
49 | 
50 | 


--------------------------------------------------------------------------------
/man/circle_dat.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{circle_dat}
 4 | \alias{circle_dat}
 5 | \title{Creates a plotting object.}
 6 | \usage{
 7 | circle_dat(terms, genes)
 8 | }
 9 | \arguments{
10 | \item{terms}{A data frame with columns for 'category', 'ID', 'term', adjusted
11 | p-value ('adj_pval') and 'genes'}
12 | 
13 | \item{genes}{A data frame with columns for 'ID', 'logFC'}
14 | }
15 | \description{
16 | The function takes the results from a functional analysis (for 
17 |   example DAVID) and combines it with a list of selected genes and their 
18 |   logFC. The resulting data frame can be used as an input for various plotting
19 |   functions.
20 | }
21 | \details{
22 | Since most gene-annotation enrichment analyses are based on 
23 |   the gene ontology database, the package was built with this structure in 
24 |   mind, but is not restricted to it. Gene ontology is structured as an 
25 |   acyclic graph and it provides terms covering different areas. These terms 
26 |   are grouped into three independent \code{categories}: BP (biological 
27 |   process), CC (cellular component) or MF (molecular function).
28 |   
29 |   The "ID" and "term" columns of the \code{terms} data frame refer to the ID 
30 |   and term description; the ID is optional.
31 |   
32 |   The "ID" column of the \code{genes} data frame can contain any unique 
33 |   identifier. However, the identifier has to be the same as in "genes" 
34 |   from \code{terms}.
35 | }
36 | \examples{
37 | \dontrun{
38 | #Load the included dataset
39 | data(EC)
40 | 
41 | #Building the circ object
42 | circ <- circle_dat(EC$david, EC$genelist)
43 | }
44 | }
45 | 
46 | 


--------------------------------------------------------------------------------
/man/reduce_overlap.Rd:
--------------------------------------------------------------------------------
 1 | % Generated by roxygen2: do not edit by hand
 2 | % Please edit documentation in R/GOCore.R
 3 | \name{reduce_overlap}
 4 | \alias{reduce_overlap}
 5 | \title{Eliminates redundant terms.}
 6 | \usage{
 7 | reduce_overlap(data, overlap)
 8 | }
 9 | \arguments{
10 | \item{data}{A data frame created with \code{circle_dat}.}
11 | 
12 | \item{overlap}{Scalar indicating the threshold for gene overlap (default = 0.75).}
13 | }
14 | \description{
15 | The function eliminates all terms with a gene overlap >= set
16 |   threshold (\code{overlap}). The reduced dataset can be used to improve the
17 |   readability of plots such as \code{GOBubble} and \code{GOBar}.
18 | }
19 | \details{
20 | The function is currently very slow.
21 | }
22 | \examples{
23 | \dontrun{
24 | # Load the included dataset
25 | data(EC)
26 | 
27 | # Building the circ object
28 | circ <- circle_dat(EC$david, EC$genelist)
29 | 
30 | # Eliminate redundant terms
31 | reduced_circ <- reduce_overlap(circ)
32 | 
33 | # Plot reduced data
34 | GOBubble(reduced_circ)
35 | 
36 | }
37 | }
38 | 
39 | 


--------------------------------------------------------------------------------
/vignettes/GOBar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBar.png


--------------------------------------------------------------------------------
/vignettes/GOBubble1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble1.png


--------------------------------------------------------------------------------
/vignettes/GOBubble2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble2.png


--------------------------------------------------------------------------------
/vignettes/GOBubble3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble3.png


--------------------------------------------------------------------------------
/vignettes/GOBubble4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOBubble4.png


--------------------------------------------------------------------------------
/vignettes/GOChord1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOChord1.png


--------------------------------------------------------------------------------
/vignettes/GOCirc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCirc.png


--------------------------------------------------------------------------------
/vignettes/GOCluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCluster.png


--------------------------------------------------------------------------------
/vignettes/GOCluster2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOCluster2.png


--------------------------------------------------------------------------------
/vignettes/GOHeat_lfc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOHeat_lfc.png


--------------------------------------------------------------------------------
/vignettes/GOHeat_nolfc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOHeat_nolfc.png


--------------------------------------------------------------------------------
/vignettes/GOVenn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/GOVenn.png


--------------------------------------------------------------------------------
/vignettes/GOplot.css:
--------------------------------------------------------------------------------
  1 | body {
  2 |   background-color: #fff;
  3 |   margin: 1em auto;
  4 |   max-width: 800px;
  5 |   overflow: visible;
  6 |   padding-left: 2em;
  7 |   padding-right: 2em;
  8 |   font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
  9 |   font-size: 14px;
 10 |   line-height: 20px;
 11 | }
 12 | 
 13 | #header {
 14 |   text-align: center;
 15 |   color: #660000;
 16 | }
 17 | 
 18 | #TOC {
 19 |   clear: both;
 20 |   margin: 0 0 10px 0;
 21 |   padding: 4px;
 22 |   border: 1px solid #CCCCCC;
 23 |   border-radius: 5px;
 24 |   background-color: #f6f6f6;
 25 |   font-size: 13px;
 26 |   line-height: 1.3;
 27 | }
 28 |   #TOC .toctitle {
 29 |     font-weight: bold;
 30 |     font-size: 15px;
 31 |     margin-left: 5px;
 32 |   }
 33 | 
 34 |   #TOC ul {
 35 |     padding-left: 40px;
 36 |     margin-left: -1.5em;
 37 |     margin-top: 5px;
 38 |     margin-bottom: 5px;
 39 |   }
 40 |   #TOC ul ul {
 41 |     margin-left: -2em;
 42 |   }
 43 |   #TOC li {
 44 |     line-height: 16px;
 45 |   }
 46 | 
 47 | table {
 48 |   margin: auto;
 49 |   min-width: 40%;
 50 |   border-width: 1px;
 51 |   border-color: #DDDDDD;
 52 |   border-style: outset;
 53 |   border-collapse: collapse;
 54 | }
 55 | table[summary="R argblock"] {
 56 |   width: 100%;
 57 |   border: none;
 58 | }
 59 | table th {
 60 |   border-width: 2px;
 61 |   padding: 5px;
 62 |   border-style: inset;
 63 | }
 64 | table td {
 65 |   border-width: 1px;
 66 |   border-style: inset;
 67 |   line-height: 18px;
 68 |   padding: 5px 5px;
 69 | }
 70 | table, table th, table td {
 71 |   border-left-style: none;
 72 |   border-right-style: none;
 73 | }
 74 | table tr.odd {
 75 |   background-color: #E5E4E2;
 76 | }
 77 | 
 78 | p {
 79 |   margin: 0.5em 0;
 80 | }
 81 | 
 82 | blockquote {
 83 |   background-color: #f6f6f6;
 84 |   padding: 13px;
 85 |   padding-bottom: 1px;
 86 | }
 87 | 
 88 | hr {
 89 |   border-style: solid;
 90 |   border: none;
 91 |   border-top: 1px solid #777;
 92 |   margin: 28px 0;
 93 |   background-color: darkred;
 94 | }
 95 | 
 96 | dl {
 97 |   margin-left: 0;
 98 | }
 99 |   dl dd {
100 |     margin-bottom: 13px;
101 |     margin-left: 13px;
102 |   }
103 |   dl dt {
104 |     font-weight: bold;
105 |   }
106 | 
107 | ul {
108 |   margin-top: 0;
109 | }
110 |   ul li {
111 |     list-style: circle outside;
112 |   }
113 |   ul ul {
114 |     margin-bottom: 0;
115 |   }
116 | 
117 | pre, code {
118 |   background-color: #f5f5f5;
119 |   border-radius: 3px;
120 |   color: #333;
121 | }
122 | pre {
123 |   overflow-x: auto;
124 |   border-radius: 3px;
125 |   margin: 5px 0px 10px 0px;
126 |   padding: 10px;
127 | }
128 | pre:not([class]) {
129 |   background-color: white;
130 |   border: #f5f5f5 1px solid;
131 | }
132 | pre:not([class]) code {
133 |   color: #444;
134 |   background-color: white;
135 | }
136 | code {
137 |   font-family: monospace;
138 |   font-size: 90%;
139 | }
140 | p > code, li > code {
141 |   padding: 2px 4px;
142 |   color: #d14;
143 |   border: 1px solid #e1e1e8;
144 |   white-space: inherit;
145 | }
146 | div.figure {
147 |   text-align: center;
148 |   width: 100%;
149 |   height: 50%;
150 | }
151 | table > caption, div.figure p.caption {
152 |   font-style: italic;
153 | }
154 | table > caption span, div.figure p.caption span {
155 |   font-style: normal;
156 |   font-weight: bold;
157 | }
158 | p {
159 |   margin: 0 0 10px;
160 | }
161 | table {
162 |   margin: auto auto 10px auto;
163 | }
164 | 
165 | img {
166 |   background-color: #FFFFFF;
167 |   padding: 2px;
168 |   border: 1px solid darkred;
169 |   border-radius: 3px;
170 |   margin-left: auto;
171 |   margin-right: auto;
172 |   max-width: 95%;
173 | }
174 | 
175 | h1 {
176 |   margin-top: 0;
177 |   font-size: 35px;
178 |   line-height: 40px;
179 | }
180 | 
181 | h2 {
182 |   border-bottom: 3px solid darkred;
183 |   padding-top: 10px;
184 |   padding-bottom: 2px;
185 |   font-size: 145%;
186 | }
187 | 
188 | h3 {
189 |   padding-top: 10px;
190 |   font-size: 120%;
191 | }
192 | 
193 | h4 {
194 |   margin-left: 8px;
195 |   font-size: 105%;
196 | }
197 | 
198 | h5, h6 {
199 |   font-size: 105%;
200 | }
201 | 
202 | a {
203 |   color: #0033dd;
204 |   text-decoration: none;
205 | }
206 |   a:hover {
207 |     color: #6666ff; }
208 |   a:visited {
209 |     color: #800080; }
210 |   a:visited:hover {
211 |     color: #BB00BB; }
212 |   a[href^="http:"] {
213 |     text-decoration: underline; }
214 |   a[href^="https:"] {
215 |     text-decoration: underline; }
216 | 
217 | div.r-help-page {
218 |   background-color: #f9f9f9;
219 |   border-bottom: #ddd 1px solid;
220 |   margin-bottom: 10px;
221 |   padding: 10px;
222 | }
223 | div.r-help-page:hover {
224 |   background-color: #f4f4f4;
225 | }
226 | 
227 | /* Class described in https://benjeffrey.com/posts/pandoc-syntax-highlighting-css
228 |    Colours from https://gist.github.com/robsimmons/1172277 */
229 | 
230 | code > span.kw { color: #555; font-weight: bold; } /* Keyword */
231 | code > span.dt { color: black; } /* DataType */
232 | code > span.dv { color: #40a070; } /* DecVal (decimal values) */
233 | /*code > span.bn { color: #d14; }  BaseN */
234 | /*code > span.fl { color: #d14; }  Float */
235 | /*code > span.ch { color: #d14; }  Char */
236 | /*code > span.st { color: #d14; }  String */
237 | code > span.co { color: darkred; font-style: italic; } /* Comment */
238 | /*code > span.ot { color: #007020; }  OtherToken */
239 | /*code > span.al { color: #ff0000; font-weight: bold; }  AlertToken */
240 | /*code > span.fu { color: #900; font-weight: bold; }  Function calls */
241 | /*code > span.er { color: #a61717; background-color: #e3d2d2; }  ErrorTok */
242 | 


--------------------------------------------------------------------------------
/vignettes/GOplot_vignette.Rmd:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: "GOplot 1.0.2"
  3 | author: "Wencke Walter"
  4 | date: "`r Sys.Date()`"
  5 | output: 
  6 |   rmarkdown::html_vignette:
  7 |     css: GOplot.css
  8 | vignette: >
  9 |   %\VignetteIndexEntry{GOplot_0.2}
 10 |   %\VignetteEngine{knitr::rmarkdown}
 11 |   \usepackage[utf8]{inputenc}
 12 | ---
 13 | 
 14 | A manual to exploit the possibilities and limitations of the R package GOplot.
 15 | 
 16 | 
 17 | ##Introduction
 18 | The GOplot package concentrates on the visualization of biological data. More precisely, the package helps to combine and integrate expression data with the results of a functional analysis. The package cannot be used to perform any of these analyses; it is for visualization purposes only. In all scientific fields we visualize information to meet a basic need: to tell a story. Because of space restrictions and a general need to present everything neat and tidy, most of the time it is simply not possible to tell that story in words. Therefore, we use figures to communicate information. A well-designed figure provides the reader with high-dimensional information in a much smaller space than, for example, a table. The idea of the package is to provide the user with functions that allow a quick examination of large amounts of data, expose trends and find patterns and correlations within the data. Effective data visualization is an important tool in the decision-making process and helps to find further pieces of the puzzle that answers your biological question. Based on that you will be able to confirm or falsify your hypotheses. You might even start to look in a different direction, relying on the insight a new visualization provides. The plotting functions of the package were developed with a hierarchical structure in mind: starting with a general overview and closing with specific subsets of selected genes and terms. To explain the idea let us use an example.
 19 | 
 20 | ##The toy example
 21 | GOplot comes with a manually compiled data set. Selected samples were downloaded from the Gene Expression Omnibus (accession number: *[GSE47067](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47067)*). As a brief summary, the data set contains the transcriptomic information of endothelial cells from two steady state tissues (brain and heart). More detailed information can be found in the paper by *[Nolan et al. 2013](http://www.ncbi.nlm.nih.gov/pubmed/23871589)*. The data was normalized and a statistical analysis was performed to determine differentially expressed genes. The *DAVID* functional annotation tool was used to perform a gene-annotation enrichment analysis of the set of differentially expressed genes (adjusted p-value < 0.05). The data set contains the following five items:
 22 | 
 23 | ```{r table1, echo = FALSE, results = 'asis'}
 24 | toy<-data.frame(Name=c('EC$eset','EC$genelist','EC$david','EC$genes','EC$process'),Description=c('Data frame of normalized expression values of brain and heart endothelial cells (3 replicates)','Data frame of differentially expressed genes (adjusted p-value < 0.05)','Data frame of results from a functional analysis of the differentially expressed genes performed with DAVID','Data frame of selected genes with logFC','Character vector of selected enriched biological processes'),Dimension=c('20644 x 7','2039 x 7','174 x 5','37 x 2','7'))
 25 | knitr::kable(toy, col.names=c('Name','Description','Dimension (row, col)'))
 26 | ```
 27 | 
 28 | ##Getting started
 29 | As a first step we want to get an overview of the enriched GO terms of our differentially expressed genes. But before we start plotting we need to bring the data into the right format for the plotting functions. In general, the data object for the plotting functions can be created manually, but the package includes a function that does the job for you. The *circle_dat* function combines the result of the functional analysis with a list of selected genes and their logFC, most likely a list of differentially expressed genes. *circle_dat* takes two data frames as input. The first one contains the results of the functional analysis and should have at least four columns (category, term, genes, adjusted p-value). Additionally, a data frame of the selected genes and their logFC is needed. This data frame can be, for example, the result of a statistical analysis performed with *limma*. Let us have a look at the mentioned data frames.
 30 | 
 31 | ```{r glimpse, warning = FALSE, message = FALSE}
 32 | library(GOplot)
 33 | # Load the dataset
 34 | data(EC)
 35 | # Get a glimpse of the data format of the results of the functional analysis... 
 36 | head(EC$david)
 37 | # ...and of the data frame of selected genes
 38 | head(EC$genelist)
 39 | ```
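
If you want to build the two input data frames by hand, the following base R sketch shows the general shape described above. All values are made up, and the column names (`category`, `ID`, `term`, `genes`, `adj_pval`, `logFC`) are assumptions for illustration; check ?circle_dat for the names the function actually expects. The last lines only illustrate how a comma-separated gene string can be matched to the logFC values. They are not the implementation of *circle_dat*.

```r
# Hypothetical result of a functional analysis: one row per term,
# assigned genes stored as a comma-separated string
terms <- data.frame(
  category = 'BP',
  ID = 'ID_1',
  term = 'example process',
  genes = 'GENE1, GENE2, GENE3',
  adj_pval = 0.001,
  stringsAsFactors = FALSE
)

# Hypothetical list of selected genes and their logFC
genes <- data.frame(
  ID = c('GENE1', 'GENE2', 'GENE4'),
  logFC = c(1.4, -0.8, 2.3),
  stringsAsFactors = FALSE
)

# Split the gene string and attach the logFC of the selected genes
assigned <- strsplit(terms$genes, ', ')[[1]]
long <- merge(
  data.frame(term = terms$term, genes = assigned, stringsAsFactors = FALSE),
  genes, by.x = 'genes', by.y = 'ID'
)
long
```

Only genes that appear in both data frames survive the merge, which is why the choice of the gene list determines what the plots can show.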
 40 | 
 41 | Now that we know what the input data looks like, it's time to use the *circle_dat* function to create the plotting object.
 42 | 
 43 | ```{r circ_object, warning = FALSE, message = FALSE}
 44 | # Generate the plotting object
 45 | circ <- circle_dat(EC$david, EC$genelist)
 46 | ```
 47 | 
 48 | The **circ** object has eight columns with the following names: 
 49 | 
 50 | * category
 51 | * ID
 52 | * term
 53 | * count
 54 | * genes
 55 | * logFC
 56 | * adj_pval
 57 | * zscore
 58 | 
 59 | Since most gene-annotation enrichment analyses are based on the Gene Ontology database, the package was built with this structure in mind, but it is not restricted to it. As explained by *[Ashburner et al.](http://www.ncbi.nlm.nih.gov/pubmed/10802651)* in their paper from the year 2000, Gene Ontology is structured as a directed acyclic graph and provides terms covering different areas. These terms are grouped into three independent categories: BP (biological process), CC (cellular component) or MF (molecular function). The first column of the **circ** object contains this information, which was already given in the input. For more information on the structure of Gene Ontology, have a look at the documentation section of the Gene Ontology Consortium [website](http://geneontology.org/page/ontology-documentation).
 60 | All terms in the Gene Ontology database come with a GO **ID** and a GO **term** description. The **ID** column of the circ object is optional, so if you use a functional analysis tool that is not based on Gene Ontology you won't have an **ID** column. The term column contains just that, a description of the term, and the implemented functions do not depend on any resemblance to Gene Ontology terms. **Count** is the number of genes assigned to a term. **Gene** names and their **logFC** are taken from the input list of selected genes. The significance of a term is indicated by the adjusted p-value (**adj_pval**). Terms with an adjusted p-value < 0.05 are considered significantly enriched and are more likely to provide reliable information. The last column contains the **zscore**, an easy-to-calculate value that hints at whether the biological process (or molecular function or cellular component) is more likely to be decreased (negative value) or increased (positive value).
 61 | 
 62 | $$zscore=\frac{(up-down)}{\sqrt{count}}$$
 63 | 
 64 | Here, *up* and *down* are the numbers of assigned genes that are up-regulated (logFC > 0) or down-regulated (logFC < 0) in the data, respectively.
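
As a small worked example, the z-score of a single term can be computed by hand in base R. The logFC values below are made up for illustration and are not taken from the EC data set.

```r
# Hypothetical logFC values of the nine genes assigned to one term
logFC <- c(1.8, 0.6, -0.4, 2.1, -1.2, 0.9, 0.3, -0.7, 1.1)

up    <- sum(logFC > 0)   # number of up-regulated genes (6)
down  <- sum(logFC < 0)   # number of down-regulated genes (3)
count <- length(logFC)    # number of genes assigned to the term (9)

# (6 - 3) / sqrt(9) = 1
zscore <- (up - down) / sqrt(count)
zscore
```

Note that only the sign of each logFC enters the calculation, not its magnitude.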
 65 | 
 66 | ## The plots
 67 | 
 68 | ### The modified barplot (GOBar)
 69 | Since we do not really know what to expect from our data, the aim of the first figure is to display as many terms as possible without going into too much detail. Nevertheless, the figure should help us pick the interesting and valuable terms, so we need some parameters to quantify their importance. Since the majority of scientific data is plotted using bar charts, a modified version of the standard barplot function, named *GOBar*, is included. The *GOBar* function allows the user to quickly create an appealing barplot.
 70 | 
 71 | ```{r GOBar, warning = FALSE, message = FALSE, fig.width = 8.3, fig.height = 6}
 72 | # Generate a simple barplot
 73 | GOBar(subset(circ, category == 'BP'))
 74 | ```
 75 | 
 76 | The y-axis shows the significance of the terms and the bars are ordered according to their z-score. If you want, you can change the order by setting the argument *order.by.zscore* to FALSE; the bars are then ordered by significance. Additionally, the barplot can easily be faceted according to the categories of the terms using the argument *display* of the plotting function (output not shown).
 77 | 
 78 | ```{r GOBar2, eval = FALSE, warning = FALSE, message = FALSE}
 79 | # Facet the barplot according to the categories of the terms 
 80 | GOBar(circ, display = 'multiple')
 81 | ```
 82 | 
 83 | To add a title use *title* and to change the colour scale of the z-score use the argument *zsc.col* (output not shown).
 84 | 
 85 | ```{r GOBar3, eval = FALSE, warning = FALSE, message = FALSE}
 86 | # Facet the barplot, add a title and change the colour scale for the z-score
 87 | GOBar(circ, display = 'multiple', title = 'Z-score coloured barplot', zsc.col = c('yellow', 'black', 'cyan'))
 88 | ```
 89 | 
 90 | Barplots are common and very easy to read, but they are not always the best solution. Another way to display an overview of high-dimensional data is the bubble plot.
 91 | 
 92 | ### The bubble plot (GOBubble)
 93 | The bubble plot is another way to get an overview of the enriched terms. The z-score is assigned to the x-axis and, as in the barplot, the negative logarithm of the adjusted p-value to the y-axis (the higher, the more significant). The area of the displayed circles is proportional to the number of genes (circ$count) assigned to the term and the colour corresponds to the category.
 94 | The help page of the plotting function (?GOBubble) lists all the arguments to change the layout of the plot. By default the circles are labeled with the term ID, and a table connecting the IDs and terms is displayed on the right side. You can hide it by setting the argument *table.legend* to FALSE. If you want to display the term description instead, set the argument *ID* to FALSE. Not all circles are labeled, due to the limited space and the overlap of the circles. A threshold for the labeling is set (default = 5) based on the negative logarithm of the adjusted p-value.
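
To get a feeling for the label threshold, the following sketch converts a few made-up adjusted p-values to the scale of the y-axis. It assumes a base-10 logarithm and a strict comparison with the threshold; both are assumptions made for illustration, so check ?GOBubble for the exact behaviour.

```r
# Hypothetical adjusted p-values of four terms
adj_pval <- c(0.05, 5e-4, 2e-6, 3e-8)

# Significance as plotted on the y-axis (assuming a base-10 logarithm)
significance <- -log10(adj_pval)
round(significance, 2)

# Terms that would be labeled at a threshold of 3 and at the default of 5
labeled_at_3 <- significance > 3
labeled_at_5 <- significance > 5
```

Under these assumptions, a threshold of 3 labels terms with an adjusted p-value below 0.001, whereas the default of 5 labels only terms below 0.00001.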
 95 | 
 96 | 
 97 | ```{r GOBubble1, warning = FALSE, message = FALSE, fig.keep = 'none'}
 98 | # Generate the bubble plot with a label threshold of 3
 99 | GOBubble(circ, labels = 3)
100 | ```
101 | 
102 | ![GOBubble1.](GOBubble1.png)
103 | 
104 | To add a title, change the colour of the circles, facet the plot and change the label threshold, use the following arguments:
105 | 
106 | ```{r GOBubble2, warning = FALSE, message = FALSE, fig.keep = 'none'}
107 | # Add a title, change the colour of the circles, facet the plot according to the categories and change the label threshold
108 | GOBubble(circ, title = 'Bubble plot', colour = c('orange', 'darkred', 'gold'), display = 'multiple', labels = 3)
109 | ```
110 | 
111 | ![GOBubble2.](GOBubble2.png)
112 | 
113 | For the facet plot it is also possible to colour the background of the panels according to the displayed category by setting *bg.col* to TRUE.
114 | 
115 | ```{r GOBubble3, warning = FALSE, message = FALSE, fig.keep = 'none'}
116 | # Colour the background according to the category
117 | GOBubble(circ, title = 'Bubble plot with background colour', display = 'multiple', bg.col = TRUE, labels = 3)
118 | ```
119 | 
120 | ![GOBubble3.](GOBubble3.png)
121 | 
122 | A new function, *reduce_overlap*, was included in the updated version of the package to reduce the number of redundant terms. So far, the implemented method is very simple and slow, and it needs further refinement. Nevertheless, reducing the number of redundant terms significantly improves the readability of plots such as the bubble plot. The function deletes all terms that have a gene overlap greater than or equal to a set threshold. It keeps one term per group as a representative, without taking the GO hierarchy into consideration.
123 | 
124 | ```{r GOBubble4, warning = FALSE, message = FALSE, fig.keep = 'none', eval = FALSE}
125 | # Reduce redundant terms with a gene overlap >= 0.75...
126 | reduced_circ <- reduce_overlap(circ, overlap = 0.75)
127 | # ...and plot it
128 | GOBubble(reduced_circ, labels = 2.8)
129 | ```
130 | 
131 | ![GOBubble4.](GOBubble4.png)
132 | 
133 | ### Circular visualization of the results of gene-annotation enrichment analysis (GOCircle)
134 | The overview plots should help to decide which terms are the most interesting. Of course, this decision also depends on the hypothesis and ideas you want to confirm with your data; the most significant terms are not always the ones you are interested in. So, after manually selecting a set of valuable terms (EC$process), the next figure should provide more details on these specific terms. One of the major issues we noticed when presenting the plots was that the information provided by the z-score was sometimes difficult to interpret, since the measure is not that common. As shown above, it is simply the number of up-regulated genes minus the number of down-regulated genes divided by the square root of the count. The *GOCircle* plot emphasizes this fact.
135 | 
136 | ```{r GOCircle1, warning = FALSE, message = FALSE, fig.keep = 'none'}
137 | # Generate a circular visualization of the results of gene-annotation enrichment analysis
138 | GOCircle(circ)
139 | ```
140 | 
141 | ![Circle plot.](GOCirc.png)
142 | 
143 | The outer circle shows a scatter plot of the logFC of the assigned genes for each term. By default, red circles display up-regulation and blue ones down-regulation. The colours can be changed with the argument *lfc.col*. This makes it easier to understand why, in some cases, highly significant terms have a z-score close to zero. A z-score of zero does not mean that the term is unimportant, at least not as long as it is significantly enriched. It just shows that the z-score is a crude measure, because the score does not take into account the functional level and activation dependencies of the single genes within a process.
144 | You can change the layout of the plot with various arguments, see ?GOCircle. The *nsub* argument needs a little more explanation to be used wisely. First of all, it can be a numeric or a character vector. If it is a character vector, it contains the IDs or term descriptions of the processes you want to display (output not shown).
145 | 
146 | ```{r GOCircle2, eval = FALSE}
147 | # Generate a circular visualization of selected terms
148 | IDs <- c('GO:0007507', 'GO:0001568', 'GO:0001944', 'GO:0048729', 'GO:0048514', 'GO:0005886', 'GO:0008092', 'GO:0008047')
149 | GOCircle(circ, nsub = IDs)
150 | ```
151 | 
152 | If *nsub* is numeric, the number defines how many terms are displayed, starting with the first row of the input data frame (output not shown).
153 | 
154 | ```{r GOCircle3, eval = FALSE}
155 | # Generate a circular visualization for 10 terms
156 | GOCircle(circ, nsub = 10)
157 | ```
158 | 
159 | This kind of visualization is only useful for a smaller set of terms. The maximum number of terms lies around 12. As the number of terms decreases, the amount of information displayed increases.
160 | 
161 | ### Display of the relationship between genes and terms (GOChord)
162 | The *GOChord* plotting function is based on the **[Circos](http://circos.ca/)** plots designed by *[Martin Krzywinski](http://mkweb.bcgsc.ca/)*. It displays the relationship between a list of selected genes and terms, as well as the logFC of the genes. It requires a binary membership matrix as input. You can build the matrix on your own or use the implemented function *chord_dat*, which does the job for you. The function takes three arguments: *data*, *genes* and *process*, of which only one of the last two is mandatory. To recap: *circle_dat* combined your expression data with the results from the functional analysis. The bar and bubble plots gave you a first impression of your data, and you have now selected a list of genes and processes you think are valuable. *GOCircle* adds a layer to display the expression values of the genes assigned to the terms, but it lacks information on the relationship between the genes and the terms. It is not easy to figure out whether some of the genes are linked to multiple processes. The chord plot fills the void left by *GOCircle*.
163 | 
164 | ```{r GOChord1, warning = FALSE, message = FALSE}
165 | # Define a list of genes which you think are interesting to look at. The item EC$genes of the toy 
166 | # sample contains the data frame of selected genes and their logFC. Have a look...
167 | head(EC$genes)
168 | # Since we have a lot of significantly enriched processes we selected some specific ones (EC$process)
169 | EC$process
170 | # Now it is time to generate the binary matrix
171 | chord <- chord_dat(circ, EC$genes, EC$process)
172 | head(chord)
173 | ```
174 | 
175 | Rows are genes and columns are terms. A '0' indicates that the gene is not assigned to the term; a '1' indicates that it is. As mentioned before, it is possible to leave out either the *genes* or the *process* argument. If you omit the *process* argument, the binary matrix is built for the list of selected genes and all the processes with at least one assigned gene. On the other hand, if you just provide a set of processes without limiting the list of genes, the binary matrix is generated for all the genes which are assigned to at least one of the processes from your list (output not shown).
176 | 
177 | ```{r GOChord2, eval=FALSE, warning = FALSE, message = FALSE}
178 | # Generate the matrix with a list of selected genes
179 | chord <- chord_dat(data = circ, genes = EC$genes)
180 | # Generate the matrix with selected processes
181 | chord <- chord_dat(data = circ, process = EC$process)
182 | ```
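
If you prefer to build the binary membership matrix on your own, a minimal base R sketch could look like the one below. The gene names, term names and logFC values are made up, and the layout (term columns first, logFC as the last column) simply mirrors the description above. It is not the implementation of *chord_dat*.

```r
# Hypothetical long-format annotation: one row per gene/term pair
annotation <- data.frame(
  genes = c('A', 'A', 'B', 'C', 'C', 'C'),
  term  = c('t1', 't2', 't1', 't1', 't2', 't3'),
  stringsAsFactors = FALSE
)
# Hypothetical logFC per gene
logFC <- c(A = 1.5, B = -0.8, C = 2.2)

# Cross-tabulate genes against terms; every pair occurs once,
# so all entries are either 0 or 1
membership <- unclass(table(annotation$genes, annotation$term))

# Append the logFC as the last column, matching genes by row name
membership <- cbind(membership, logFC = logFC[rownames(membership)])
membership
```

Each row then describes one gene: its term memberships followed by its logFC.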
183 | 
184 | Be aware that omitting either *genes* or *process* might lead to a large binary matrix, which results in a confusing visualization. The chart was designed for smaller subsets of high-dimensional data.
185 | Like the other plotting functions, *GOChord* provides a lot of arguments to change the layout of the plot, see ?GOChord. Most of the arguments adjust the font size of the labels, the space between them, the colour scale for the logFC and the colour of the ribbons. Besides the aesthetics there are two other arguments: *gene.order* and *nlfc*. The first argument defines the order of the genes, with three possible options: 'logFC', 'alphabetical' and 'none'. The options are quite self-explanatory. Sometimes you perform the differential expression analysis for multiple conditions and/or batches, and therefore want to include more than one logFC value per gene. For this situation you should use the *nlfc* argument. It is a numeric value that defines the number of logFC columns within your binary membership matrix. The default is '1', assuming that most of the time you have just one contrast and one logFC value per gene.
186 | 
187 | 
188 | ```{r GOChord3, warning = FALSE, message = FALSE, fig.keep = 'none'}
189 | # Create the plot
190 | GOChord(chord, space = 0.02, gene.order = 'logFC', gene.space = 0.25, gene.size = 5)
191 | ```
192 | ![Chord1.](GOChord1.png)
193 | 
194 | The *space* argument defines the space between the coloured rectangles representing the logFC. The font size of the gene labels (*gene.size*) and the space between them (*gene.space*) were also changed. The genes were ordered according to their logFC values by setting *gene.order* to 'logFC'.
195 | 
196 | Sometimes the plot gets a bit crowded and you would like to reduce the number of displayed genes or processes. You can do this automatically with the *limit* argument. *limit* is a vector with two cutoff values (default = c(0, 0)). The first value defines the minimum (>=) number of terms a gene has to be assigned to. The second value defines the minimum number of genes that have to be assigned to a selected term. For example, to display only genes which are assigned to at least three processes you would use the following line of code (output not shown):
197 | 
198 | ```{r GOChord4, warning = FALSE, message = FALSE, fig.keep = 'none'}
199 | # Display only genes which are assigned to at least three processes
200 | GOChord(chord, limit = c(3, 0), gene.order = 'logFC')
201 | ```
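
The effect of the two cutoffs can be reproduced by hand on a small matrix. The sketch below uses a made-up membership matrix and applies the gene cutoff first and the term cutoff second. The order of the two steps is an assumption made for illustration and may differ from what *GOChord* does internally.

```r
# Hypothetical binary membership matrix: 4 genes x 3 terms plus a logFC column
m <- cbind(
  t1 = c(1, 1, 1, 0),
  t2 = c(1, 0, 1, 0),
  t3 = c(1, 0, 0, 1),
  logFC = c(2.0, -1.1, 0.7, -0.3)
)
rownames(m) <- c('A', 'B', 'C', 'D')
term_cols <- colnames(m) != 'logFC'

# First cutoff: keep genes assigned to at least two terms
kept <- m[rowSums(m[, term_cols, drop = FALSE]) >= 2, , drop = FALSE]

# Second cutoff: keep terms with at least two of the remaining genes
term_sums <- colSums(kept[, term_cols, drop = FALSE])
kept <- kept[, c(names(term_sums)[term_sums >= 2], 'logFC'), drop = FALSE]
kept
```

In this toy example genes B and D are dropped by the first cutoff, and term t3 is then dropped by the second one.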
202 | 
203 | ### Heatmap of genes and terms (GOHeat)
204 | Thanks to a very nice suggestion from *[Maureen Sartor, Ph.D.](http://sartorlab.ccmb.med.umich.edu/)* I implemented *GOHeat*. The *GOHeat* function generates a heatmap of the relationship between genes and terms, similar to *GOChord*. Biological processes are displayed in rows and genes in columns. Each column is divided into smaller rectangles and the colouring of the tiles depends on the presence or absence of logFC values. In addition, genes are clustered to highlight groups of genes with similar annotated functions. The function has two modes, depending on the *nlfc* argument. If *nlfc = 0*, meaning that no logFC values are available, the colouring encodes the overall number of processes the respective gene is assigned to. Let's have a look at an example...
205 | 
206 | ```{r GOHeat1, warning = FALSE, message = FALSE, fig.keep = 'none'}
207 | # First, we use the chord object without logFC column to create the heatmap
208 | GOHeat(chord[,-8], nlfc = 0)
209 | ```
210 | 
211 | ![Heat1.](GOHeat_nolfc.png)
212 | 
213 | In case of *nlfc = 1* the colour corresponds to the logFC of the gene...
214 | 
215 | ```{r GOHeat2, warning = FALSE, message = FALSE, fig.keep = 'none'}
216 | # Now, we use the chord object with the logFC column to create the heatmap
217 | GOHeat(chord, nlfc = 1, fill.col = c('red', 'yellow', 'green'))
218 | ```
219 | 
220 | ![Heat2.](GOHeat_lfc.png)
221 | 
222 | ### Golden eye (GOCluster)
223 | The idea behind the *GOCluster* function is to visualize as much information as possible. Here is an example:
224 | 
225 | ```{r GOCluster, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'}
226 | GOCluster(circ, EC$process, clust.by = 'logFC', term.width = 2)
227 | ```
228 | ![GOCluster.](GOCluster.png)
229 | 
230 | Hierarchical clustering is a popular method for gene expression analysis because it is unsupervised and therefore does not rely on a predefined grouping of the genes. Genes are grouped together based on their expression patterns, so clusters are likely to contain sets of co-regulated or functionally related genes. *GOCluster* performs the hierarchical clustering of the gene expression profiles using the *hclust* method in core R. If you want to change the distance metric or the clustering algorithm, use the arguments *metric* and *clust*, respectively. The resulting dendrogram is transformed with the help of *ggdendro* so that it can be visualized with *ggplot2*. As before, a circular layout was chosen, because it is not only effective but also visually appealing. The first ring next to the dendrogram represents the logFC of the genes, which are the leaves of the clustering tree. In case you are interested in more than one contrast, the *nlfc* argument is also available for this function. By default it is set to '1', so only one ring is drawn. As always, the logFC values are colour-coded with a user-definable colour scale (*lfc.col*). The next ring represents the terms assigned to the genes. For aesthetic reasons the terms should be reduced to a reasonable number with the argument *process*. The terms are colour-coded as well and you can change the default colours with the argument *term.col*. Once again, the plotting function provides a number of arguments to change the layout of the plot and you can check them out on the help page, ?GOCluster. Probably the most important argument of the function is *clust.by*. It expects a character vector specifying whether the clustering should be done by gene expression pattern ('logFC', as in the figure above) or by functional categories ('term').
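
The clustering step itself can be tried out in base R, independently of the plot. The sketch below clusters made-up logFC values of six genes with *dist* and *hclust*. The Euclidean distance and average linkage are choices made for this sketch and are not necessarily the defaults of *GOCluster*.

```r
# Hypothetical logFC values of six genes in two contrasts
lfc <- matrix(
  c(2.1, 1.9, -1.5, -1.7, 0.2, 0.1,   # contrast 1
    1.8, 2.2, -1.2, -1.9, 0.0, 0.3),  # contrast 2
  ncol = 2,
  dimnames = list(paste0('gene', 1:6), c('contrast1', 'contrast2'))
)

# Distance metric and linkage method are choices made for this sketch
distance <- dist(lfc, method = 'euclidean')
tree <- hclust(distance, method = 'average')

# Cut the dendrogram into three groups of genes
groups <- cutree(tree, k = 3)
groups
```

Genes with similar logFC profiles end up in the same group, which is what the dendrogram in the centre of the plot visualizes.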
231 | 
232 | 
233 | ```{r GOCluster2, warning=FALSE, eval=FALSE, message=FALSE, fig.keep='none'}
234 | GOCluster(circ, EC$process, clust.by = 'term', lfc.col = c('darkgoldenrod1', 'black', 'cyan1'))
235 | ```
236 | ![GOCluster2.](GOCluster2.png)
237 | 
238 | ### Venn diagram (GOVenn)
239 | In this biological context we implemented a Venn diagram that can be used to detect relations between various lists of differentially expressed genes or to explore the intersection of genes of multiple terms from the functional analysis. The Venn diagram displays not only the number of overlapping genes, but also information about the gene expression patterns (commonly up-regulated, commonly down-regulated or contra-regulated). At the moment, at most three datasets are allowed as input. Each input data frame contains at least two columns: one for the gene names and one for the logFC value.
240 | 
241 | ```{r GOVenn, warning=FALSE, message=FALSE, fig.keep='none'}
242 | l1 <- subset(circ, term == 'heart development', c(genes,logFC))
243 | l2 <- subset(circ, term == 'plasma membrane', c(genes,logFC))
244 | l3 <- subset(circ, term == 'tissue morphogenesis', c(genes,logFC))
245 | GOVenn(l1,l2,l3, label = c('heart development', 'plasma membrane', 'tissue morphogenesis'))
246 | ```
247 | ![Venn diagram.](GOVenn.png)
248 | 
249 | For example, heart development and tissue morphogenesis share a set of 22 genes, of which 5 are commonly up-regulated and 17 are commonly down-regulated. The important thing to notice is that the pie charts don't display redundant information. Thus, if you compare three datasets, the genes which are shared by all datasets (pie chart in the middle) are not included in the other pie charts.
250 | The following [link](https://wwalter.shinyapps.io/Venn/) leads to the Shiny app of this tool. The web tool is slightly more interactive, since the circles are area-proportional to the number of genes in the dataset and the small pie charts can be moved with sliders. It also offers all the other options of the *GOVenn* function to change the layout of the plot. You can easily download the picture and the gene lists.
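
The numbers shown in a pie chart can also be derived by hand. The following base R sketch counts commonly up-regulated, commonly down-regulated and contra-regulated genes for two made-up gene lists. It only illustrates the idea and is not the code used by *GOVenn*.

```r
# Two hypothetical gene lists with logFC values
l1 <- data.frame(genes = c('A', 'B', 'C', 'D'), logFC = c(1.2, -0.5, 2.0, -1.1))
l2 <- data.frame(genes = c('B', 'C', 'D', 'E'), logFC = c(-0.9, 1.4, 0.6, 0.3))

# Genes present in both lists, with one logFC column per list
shared <- merge(l1, l2, by = 'genes', suffixes = c('.1', '.2'))

up     <- sum(shared$logFC.1 > 0 & shared$logFC.2 > 0)  # commonly up-regulated
down   <- sum(shared$logFC.1 < 0 & shared$logFC.2 < 0)  # commonly down-regulated
contra <- nrow(shared) - up - down                      # contra-regulated
c(up = up, down = down, contra = contra)
```

Here the two lists share three genes: C is up-regulated in both, B is down-regulated in both and D changes in opposite directions.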
251 | 


--------------------------------------------------------------------------------
/vignettes/Titel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cran/GOplot/7808bca4bd30c3a3c727052fb7fd0578c7d80d31/vignettes/Titel.png


--------------------------------------------------------------------------------