├── DESCRIPTION ├── NAMESPACE ├── NEWS ├── QUESTIONS ├── R ├── KEGGREST.R ├── parsers.R └── utilities.R ├── README.md ├── inst └── unitTests │ └── test_KEGGREST.R ├── man ├── keggCompounds.Rd ├── keggConv.Rd ├── keggFind.Rd ├── keggGet.Rd ├── keggInfo.Rd ├── keggLink.Rd ├── keggList.Rd ├── listDatabases.Rd └── mark.pathway.by.objects.Rd ├── tests └── KEGGREST_unit_tests.R └── vignettes └── KEGGREST-vignette.Rmd /DESCRIPTION: -------------------------------------------------------------------------------- 1 | Package: KEGGREST 2 | Version: 1.49.0 3 | Title: 4 | Client-side REST access to the Kyoto Encyclopedia of Genes and Genomes (KEGG) 5 | Authors@R: c( 6 | person("Dan", "Tenenbaum", role = "aut"), 7 | person("Bioconductor Package", "Maintainer", role = c("aut", "cre"), 8 | email = "maintainer@bioconductor.org"), 9 | person("Martin", "Morgan", role = "ctb"), 10 | person("Kozo", "Nishida", role = "ctb"), 11 | person("Marcel", "Ramos", role = "ctb"), 12 | person("Kristina", "Riemer", role = "ctb"), 13 | person("Lori", "Shepherd", role = "ctb"), 14 | person("Jeremy", "Volkening", role = "ctb") 15 | ) 16 | Depends: R (>= 3.5.0) 17 | Imports: methods, httr, png, Biostrings 18 | Suggests: RUnit, BiocGenerics, BiocStyle, knitr, markdown 19 | Description: 20 | A package that provides a client interface to the Kyoto 21 | Encyclopedia of Genes and Genomes (KEGG) REST API. Only 22 | for academic use by academic users belonging to academic 23 | institutions (see ). 24 | Note that KEGGREST is based on KEGGSOAP by J. Zhang, R. Gentleman, 25 | and Marc Carlson, and KEGG (python package) by Aurelien Mazurie. 
26 | URL: https://bioconductor.org/packages/KEGGREST 27 | BugReports: https://github.com/Bioconductor/KEGGREST/issues 28 | License: Artistic-2.0 29 | VignetteBuilder: knitr 30 | biocViews: Annotation, Pathways, ThirdPartyClient, KEGG 31 | RoxygenNote: 7.1.1 32 | Date: 2024-06-17 33 | -------------------------------------------------------------------------------- /NAMESPACE: -------------------------------------------------------------------------------- 1 | 2 | importFrom(utils, download.file, head) 3 | importFrom(httr, GET, POST, http_status, content, stop_for_status) 4 | importFrom(png, readPNG, writePNG) 5 | importFrom(Biostrings, readAAStringSet, readDNAStringSet, 6 | DNAStringSet, AAStringSet) 7 | import(methods) 8 | 9 | export( 10 | keggInfo, 11 | keggList, 12 | listDatabases, 13 | keggFind, 14 | keggGet, 15 | keggCompounds, 16 | keggConv, 17 | keggLink, 18 | mark.pathway.by.objects, 19 | color.pathway.by.objects 20 | ) 21 | 22 | -------------------------------------------------------------------------------- /NEWS: -------------------------------------------------------------------------------- 1 | CHANGES IN VERSION 1.46.0 2 | ----------------------- 3 | 4 | BUG FIXES 5 | 6 | o 1.45.1 Fix keggFind URL to use '+' instead of spaces. 7 | 8 | CHANGES IN VERSION 1.42.0 9 | ----------------------- 10 | 11 | SIGNIFICANT USER-VISIBLE CHANGES 12 | 13 | o `keggCompounds` lists compound IDs for a given pathway (@KristinaRiemer, 14 | #6). 15 | 16 | BUG FIXES 17 | 18 | o Update URL path in `.get.kegg.url` from `tmp` to `kegg` subfolder. 19 | 20 | CHANGES IN VERSION 1.37.0 21 | ----------------------- 22 | 23 | BUG CORRECTION 24 | 25 | o 1.37.1 Fixes new endpoint 26 | o 1.37.2 http to https fixes windows error 27 | 28 | CHANGES IN VERSION 1.0.0 29 | ----------------------- 30 | 31 | SIGNIFICANT USER-VISIBLE CHANGES 32 | 33 | o Package introduced. 34 | 35 | NEW FEATURES 36 | 37 | o Package introduced. 
38 | -------------------------------------------------------------------------------- /QUESTIONS: -------------------------------------------------------------------------------- 1 | Questions for the KEGG team. 2 | 3 | 4 | ** No apparent replacement found for old APIs 5 | 6 | Is there a new programmatic way to call the old SOAP api "get_motifs_by_gene"? 7 | I know I can do it manually with a request like this: 8 | 9 | http://www.kegg.jp/ssdb-bin/ssdb_motif?kid=eco%3Ab0002&lib=pfam 10 | 11 | But then I have to scrape the page. 12 | 13 | I have a similar question about "get_genes_by_motifs", which I can also do 14 | manually from http://www.kegg.jp/kegg/ssdb/. Also, the SOAP API had 15 | "start" and "max_results" arguments; is there an equivalent? 16 | 17 | About the SOAP APIs "get_best_neighbors_by_gene" and 18 | "get_best_best_neighbors_by_gene"; it looks like similar 19 | functionality is provided by http://www.kegg.jp/kegg/ssdb/ but again, 20 | is there a more programmatic way to do it? 21 | 22 | I’d also like to replace the old API "get_paralogs_by_gene" 23 | which it seems like I can also do from the 24 | http://www.kegg.jp/kegg/ssdb/ page. 25 | With all of these SSDB functions, is there support for the 26 | "start" and "max_results" arguments? 27 | 28 | Although I can search compounds by mass, 29 | (example: http://rest.kegg.jp/find/compound/174.05/exact_mass/), 30 | I can’t seem to search glycans by mass as I could with the SOAP API 31 | "search_glycans_by_mass". Is there an equivalent function in the REST API? 32 | 33 | There does not seem to be an equivalent to the SOAP API 34 | "search_compounds_by_subcomp". Is there a replacement? 35 | 36 | What about "search_glycans_by_kcam" and the more general-purpose 37 | "bget"? Is there a way in the REST api to return flat-file records, 38 | similar to what "bget" did? 39 | 40 | Two other "missing" functions seem to be "get_ko_by_ko_class" 41 | and "get_genes_by_ko_class". Are there replacements for these? 
42 | 43 | Is the SOAP function get_html_of_colored_pathway_by_elements 44 | any different from e.g. 45 | http://www.kegg.jp/kegg-bin/show_pathway?eco00260/b0002%09%23ff0000,%2300ff00/c00263%09%23ffff00,yellow 46 | ? 47 | 48 | Is there a REST implementation of the SOAP functions 49 | get_element_relations_by_pathway and get_elements_by_pathway? 50 | 51 | 52 | ** Results of new APIs differ from old 53 | 54 | Calling the SOAP api "get_enzymes_by_pathway" with 55 | "pathway_id" equal to "path:eco00020" yields a result of 14 56 | enzymes; what seems to be the equivalent in REST, 57 | http://rest.kegg.jp/link/enzyme/path:eco00020 returns nothing. 58 | Is this expected? Have the data changed? 59 | 60 | Similarly, calling "get_compounds_by_pathway" with a 61 | "pathway_id" argument of "path:eco00020" returns 20 compounds 62 | in SOAP, but the REST equivalent (?), 63 | http://rest.kegg.jp/link/compound/path%3aeco00020 returns nothing. 64 | 65 | Calling this REST api with a different argument: 66 | http://rest.kegg.jp/link/compound/path:map00010 67 | does return some results. 68 | 69 | Calling the SOAP api "get_kos_by_pathway" with a "pathway_id" 70 | argument of "path:hsa00010" returns 36 results, but the seeming REST 71 | equivalent, http://rest.kegg.jp/link/ko/path%3ahsa00010 returns nothing. 72 | 73 | 74 | ** Missing Arguments 75 | 76 | The SOAP API "get_genes_by_organism" has "start" and "max_results" 77 | arguments. The REST equivalent, which seems to be, for example 78 | http://rest.kegg.jp/list/hsa does not appear to have such arguments. 79 | Is there a way to paginate results that come back from the REST server? 80 | 81 | It looks like I can call an equivalent of the old 82 | "get_genes_by_ko" API, as follows: 83 | http://rest.kegg.jp/link/genes/ko:K12524 84 | But in the old API, I could filter by organism (e.g. "eco"). 
85 | Also, the SOAP api would return annotations, for example, for 86 | eco:b0002 it would return 87 | "thrA; fused aspartokinase I and homoserine dehydrogenase I 88 | (EC:2.7.2.4 1.1.1.3); K12524 bifunctional aspartokinase / homoserine 89 | dehydrogenase 1 [EC:2.7.2.4 1.1.1.3] ". 90 | 91 | Is there a way I can get these annotations back in a REST query? 92 | It seems I only get two columns back, the ko ID and gene ID. 93 | 94 | Similarly, the SOAP API "get_pathways_by_kos" had an "org" 95 | argument to filter by "organism". The REST equivalent, for 96 | example http://rest.kegg.jp/link/pathway/ko:K00016+ko:K00382 97 | does not seem to have this option, and the results do not allow 98 | me to do my own filtering (that is, they do not have three-letter 99 | organism codes). 100 | 101 | Also, the results differ between SOAP and REST. 102 | In SOAP, calling "get_pathways_by_kos" with a "ko_id_list" 103 | argument of ko:K00016 and ko:K00382, and an "org" argument 104 | of "hsa", returns path:hsa00010 and path:hsa00620, but the results of 105 | http://rest.kegg.jp/link/pathway/ko:K00016+ko:K00382 do not 106 | include these items. 
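The link queries discussed in these questions join several identifiers with "+", and gene identifiers such as eco:b0002 carry their organism code as a prefix. The following is only a rough sketch of how such a URL might be assembled and how gene IDs might be filtered by organism on the client side. Both helpers (`build_link_url`, `filter_by_organism`) are hypothetical and not part of KEGGREST, the root URL is an assumed default, and the sample IDs are made up for illustration.

```r
## Hypothetical helpers for illustration only; they are not part of KEGGREST.

## Build a KEGG REST "link" URL, joining several source IDs with "+".
## The default root URL is an assumption.
build_link_url <- function(target, source, root = "https://rest.kegg.jp") {
    sprintf("%s/link/%s/%s", root, target, paste(source, collapse = "+"))
}

## Keep only the gene IDs of one organism, relying on the "org:" prefix.
filter_by_organism <- function(gene_ids, org) {
    gene_ids[startsWith(gene_ids, paste0(org, ":"))]
}

## URL for the two KO identifiers used in the questions above.
build_link_url("pathway", c("ko:K00016", "ko:K00382"))

## Made-up sample of gene IDs in the "org:gene" form.
ids <- c("eco:b0002", "ecj:JW0001", "eco:b3940")
filter_by_organism(ids, "eco")
```

This kind of prefix filtering only works where the returned IDs carry an organism code. It does not help with the pathway results described above, which lack one.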
107 | 108 | -------------------------------------------------------------------------------- /R/KEGGREST.R: -------------------------------------------------------------------------------- 1 | keggInfo <- function(database) 2 | { 3 | ## FIXME return an object instead of a character vector 4 | url <- sprintf("%s/info/%s", .getRootUrl(), database) 5 | .getUrl(url, .textParser) 6 | } 7 | 8 | 9 | keggList <- function(database, organism) 10 | { 11 | database <- paste(database, collapse="+") 12 | if (missing(organism)) 13 | url <- sprintf("%s/list/%s", .getRootUrl(), database) 14 | else 15 | url <- sprintf("%s/list/%s/%s", .getRootUrl(), database, organism) 16 | if (database == "organism") 17 | return(.organismListParser(url)) 18 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2) 19 | } 20 | 21 | keggFind <- function(database, query, 22 | option=c("formula", "exact_mass", "mol_weight")) 23 | { 24 | if(missing(database)) 25 | stop("'database' argument is required") 26 | if (!missing(option)) 27 | option <- match.arg(option) 28 | if (is.integer(query) && length(query) > 1) 29 | query <- sprintf("%s-%s", min(query), max(query)) 30 | query <- gsub("\\s", "+", query) 31 | query <- paste(query, collapse="+") 32 | url <- sprintf("%s/find/%s/%s", .getRootUrl(), database, query) 33 | if (!missing(option)) 34 | url <- sprintf("%s/%s", url, option) 35 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2) 36 | } 37 | 38 | 39 | keggGet <- function(dbentries, 40 | option=c("aaseq", "ntseq", "mol", "kcf", "image", "kgml")) 41 | { 42 | if (length(dbentries) > 10) 43 | warning(paste("More than 10 inputs supplied, only the first", 44 | "10 results will be returned.")) 45 | dbentries <- paste(dbentries, collapse="+") 46 | url <- sprintf("%s/get/%s", .getRootUrl(), dbentries) 47 | if (!missing(option)) 48 | { 49 | url <- sprintf("%s/%s", url, option) 50 | 51 | if (option == "image") 52 | return(content(GET(url), type="image/png")) 53 | if (option %in% c("aaseq", "ntseq")) 54 | { 55 
| t <- tempfile() 56 | cat(.getUrl(url, .textParser), file=t) 57 | if (option == "aaseq") 58 | return(readAAStringSet(t)) 59 | else if (option == "ntseq") 60 | return(readDNAStringSet(t)) 61 | } 62 | if (option %in% c("mol", "kcf", "kgml")) 63 | return(.getUrl(url, .textParser)) 64 | } 65 | if (grepl("^br:", dbentries[1])) 66 | return(.getUrl(url, .textParser)) 67 | .getUrl(url, .flatFileParser) 68 | } 69 | 70 | keggCompounds <- function(pathwayID) 71 | { 72 | url <- sprintf("%s/link/cpd/%s", .getRootUrl(), pathwayID) 73 | .getUrl(url, .compoundParser) 74 | } 75 | 76 | .keggConv <- function(target, source) 77 | { 78 | query <-paste(source, collapse = "+") 79 | url <- sprintf("%s/conv/%s/%s", .getRootUrl(), target, query) 80 | .getUrl(url, .listParser, nameColumn = 1, valueColumn = 2) 81 | } 82 | 83 | keggConv <- function (target, source, querySize = 100) 84 | { 85 | groups <- .splitInGroups(source, querySize) 86 | answer <- lapply(groups, .keggConv, target = target) 87 | as(unlist(answer), "character") 88 | } 89 | 90 | keggLink <- function(target, source) 91 | { 92 | if (missing(source)) 93 | { 94 | url <- sprintf("%s/link/%s", 95 | .getGenomeUrl(), target) 96 | .getUrl(url, .matrixParser, ncol=3) 97 | } else { 98 | url <- sprintf("%s/link/%s/%s", 99 | .getRootUrl(), target, paste(source, collapse="+")) 100 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2) 101 | 102 | } 103 | ## FIXME?? keggLink("pathway",c("hsa:10458", "ece:Z5100")) 104 | ## returns a list with duplicate names 105 | } 106 | 107 | 108 | listDatabases <- function() 109 | { 110 | c("pathway", "brite", "module", "ko", "genome", "vg", "ag", "compound", 111 | "glycan", "reaction", "rclass", "enzyme", "disease", "drug", 112 | "dgroup", "environ", "genes", "ligand", "kegg") 113 | } 114 | 115 | ## This is not strictly speaking an API supported by the KEGG REST 116 | ## server, but it seems useful, and does not use SOAP, so I'm leaving it in. 
117 | mark.pathway.by.objects <- function(pathway.id, object.id.list) 118 | { 119 | ## example: http://www.kegg.jp/pathway/eco00260+b0002+c00263 120 | pathway.id <- sub("^path:", "", pathway.id) 121 | if (!missing(object.id.list)) { 122 | object.id.list <- paste(object.id.list, collapse="+") 123 | pathway.id <- sprintf("%s+%s", pathway.id, object.id.list) 124 | } 125 | url <- sprintf("https://www.kegg.jp/pathway/%s", pathway.id) 126 | .get.kegg.url(url) 127 | } 128 | 129 | ## This is not strictly speaking an API supported by the KEGG REST 130 | ## server, but it seems useful, and does not use SOAP, so I'm leaving it in. 131 | color.pathway.by.objects <- function(pathway.id, object.id.list, 132 | fg.color.list, bg.color.list) 133 | { 134 | ## example: http://www.kegg.jp/kegg-bin/show_pathway?eco00260/b0002%09%23ff0000,%2300ff00/c00263%09%23ffff00,yellow 135 | ## also works to include organism code in gene IDs 136 | ## (but don't include path: in pathway id) 137 | ## documentation here: http://www.kegg.jp/kegg/rest/weblink.html 138 | ## and here: http://www.kegg.jp/kegg/tool/map_pathway2.html 139 | 140 | ## Nov 2020: refactored to use form POST due to issues with long URLs when 141 | ## large identifier lists are passed. 
142 | 143 | pathway.id <- sub("^path:", "", pathway.id) 144 | if (!(length(object.id.list)==length(fg.color.list) && 145 | length(fg.color.list) == length(bg.color.list))) { 146 | stop(paste("object.id.list, fg.color.list, and bg.color.list must", 147 | "all be the same length.")) 148 | } 149 | 150 | # format identifier/color list as expected by server 151 | payload <- paste( 152 | c("#ids", object.id.list), 153 | c("cols", paste(bg.color.list, fg.color.list, sep=',')), 154 | sep="\t", 155 | collapse="\n" 156 | ) 157 | 158 | # fetch KEGG page from server, via a 302 redirect handled by httr 159 | # transparently 160 | res <- POST( 161 | url = "https://www.kegg.jp/kegg-bin/show_pathway", 162 | body = list( 163 | map = pathway.id, 164 | multi_query = payload, 165 | mode = 'color' 166 | ), 167 | encode="multipart" 168 | ) 169 | res <- content(res, "text") 170 | 171 | # extract image URL from page 172 | img_matches <- regexpr( 173 | "(?<= 1) { 185 | stop( 186 | "'color.pathway.by.objects()' ", 187 | "unexpectedly matched multiple KEGG image paths in response." 
188 | ) 189 | } 190 | sprintf("https://www.kegg.jp%s", img_url) 191 | 192 | } 193 | 194 | -------------------------------------------------------------------------------- /R/parsers.R: -------------------------------------------------------------------------------- 1 | 2 | .matrixParser <- function(txt, ncol) 3 | { 4 | lines <- strsplit(txt, "\n")[[1]] 5 | split <- strsplit(lines, "\t") 6 | u <- unlist(split) 7 | matrix(u, ncol=ncol, byrow=TRUE) 8 | } 9 | 10 | 11 | .organismListParser <- function(url) 12 | { 13 | lines <- readLines(url) 14 | split <- strsplit(lines, "\t") 15 | u <- unlist(split) 16 | m <- matrix(u, ncol=4, byrow=TRUE) 17 | colnames(m) <- c("T.number", "organism", "species", "phylogeny") 18 | m 19 | } 20 | 21 | .get_parser_NAME <- function(entry) 22 | { 23 | ret <- list() 24 | for (value in names(entry)) 25 | { 26 | ret[[value]] <- gsub("^;|;$", "", entry[[value]]) 27 | } 28 | ret 29 | } 30 | 31 | .get_parser_ENTRY <- function(entry) 32 | { 33 | segs <- strsplit(unlist(entry[[1]]), " +")[[1]] 34 | ret <- c(segs[1]) 35 | names(ret) <- segs[2] 36 | ret 37 | } 38 | 39 | 40 | .get_parser_REFERENCE <- function(refs) 41 | { 42 | ret <- list() 43 | thisref <- list() 44 | for (i in 1:length(refs)) { 45 | #sapply(refs, function(item) { 46 | item <- refs[[i]] 47 | if (item$refField == "REFERENCE") 48 | { 49 | if (length(thisref) > 0) 50 | ret <- c(ret, list(thisref)) 51 | thisref <- list(id=item$value) 52 | } else { 53 | if (is.null(thisref[[item$refField]])) 54 | thisref[[item$refField]] <- list() 55 | thisref[[item$refField]] <- c(thisref[[item$refField]], 56 | item$value) 57 | } 58 | #}) 59 | } 60 | ret <- c(ret, list(thisref)) 61 | ret 62 | } 63 | 64 | 65 | .get_parser_key_value <- function(entry) 66 | { 67 | content <- c() 68 | names <- c() 69 | lines <- unlist(strsplit(unname(unlist(entry)), "\n", fixed=TRUE)) 70 | for (line in lines) 71 | { 72 | tmp <- strsplit(line, " ", fixed=TRUE)[[1]] 73 | key <- tmp[1] 74 | value <- paste(tmp[2:length(tmp)], 
collapse=" ") 75 | if (is.na(value)) 76 | value <- "" 77 | content <- c(content, .strip(value)) 78 | names <- c(names, .strip(key)) 79 | } 80 | names(content) <- names 81 | content 82 | } 83 | 84 | .get_parser_list <- function(entry) 85 | { 86 | unname(unlist(strsplit(unlist(entry), " {2,}"))) 87 | } 88 | 89 | .get_parser_list_or_key_value <- function(entry) 90 | { 91 | x <- unlist(entry) 92 | if (any(grepl(" {2,}", x))) 93 | .get_parser_key_value(entry) 94 | else 95 | .get_parser_list(entry) 96 | ## unlist(unname(sapply(entry, strsplit, " "))) 97 | } 98 | 99 | 100 | .get_parser_biostring <- function(entry, type) 101 | { 102 | ntseq <- unname(unlist(entry)) 103 | tmp <- ntseq[2:length(ntseq)] 104 | seq <- paste(tmp, collapse="") 105 | if (type=="AAStringSet") 106 | AAStringSet(seq) 107 | else if (type == "DNAStringSet") 108 | DNAStringSet(seq) 109 | } 110 | 111 | 112 | .flatFileParser <- function(txt) 113 | { 114 | entry <- list() 115 | refs <- list() 116 | allEntries <- c() 117 | last_field <- NULL 118 | lines <- strsplit(.strip(txt), "\n", fixed=TRUE)[[1]] 119 | ffrec <- flatFileRecordGen() 120 | for (line in lines) 121 | { 122 | if (line == "///") 123 | { 124 | ffrec$flush() 125 | for (name in ffrec$names()) 126 | { 127 | item <- ffrec$get(name) 128 | if (name == "ENTRY") 129 | ffrec$set("ENTRY", .get_parser_ENTRY(item)) 130 | if (name %in% c("ENZYME", "MARKER", "ALL_REAC", 131 | "RELATEDPAIR", "DBLINKS", "DRUG", "GENE")) 132 | ffrec$set(name, .get_parser_list(item)) 133 | if (name %in% c("PATHWAY", "ORTHOLOGY", "PATHWAY_MAP", "MODULE", 134 | "DISEASE", "REL_PATHWAY", "COMPOUND", 135 | "REACTION", "ORGANISM")) 136 | { 137 | ffrec$set(name, .get_parser_key_value(item)) 138 | } 139 | if (name %in% c("REACTION")) 140 | { 141 | ffrec$set(name, .get_parser_list_or_key_value(item)) 142 | } 143 | item <- ffrec$get(name) 144 | if(length(item) == 1 && "list" %in% class(item)) 145 | { 146 | item <- unlist(item) 147 | item <- unname(item) 148 | ffrec$set(name, item) 149 | 
} 150 | } 151 | if ("NTSEQ" %in% ffrec$names()) 152 | { 153 | ffrec$set("NTSEQ", 154 | .get_parser_biostring(ffrec$get("NTSEQ"), "DNAStringSet")) 155 | } 156 | if ("AASEQ" %in% ffrec$names()) 157 | { 158 | ffrec$set("AASEQ", 159 | .get_parser_biostring(ffrec$get("AASEQ"), "AAStringSet")) 160 | } 161 | 162 | ## dreaded copy-and-append pattern 163 | allEntries <- c(allEntries, list(ffrec$getFields())) 164 | ffrec <- flatFileRecordGen() 165 | } else { 166 | subfield <- NULL 167 | tmp <- strsplit(line, "", fixed=TRUE)[[1]] 168 | fs <- tmp[1:12] 169 | fs <- fs[!is.na(fs)] 170 | first12 <- .strip(paste(fs, collapse="")) 171 | if(is.na(tmp[13])) 172 | value <- "" 173 | else 174 | value <- .rstrip(paste(tmp[13:length(tmp)], collapse="")) 175 | if (!grepl("^ ", line)) 176 | { 177 | field <- strsplit(line, " ", fixed=TRUE)[[1]][1] 178 | ffrec$setField(field) 179 | } else { 180 | if (first12 != "") 181 | { 182 | subfield <- first12 183 | ffrec$setSubfield(first12) 184 | } 185 | } 186 | ffrec$setBody(value) 187 | } 188 | } 189 | allEntries 190 | } 191 | 192 | .listParser <- function(txt, valueColumn, nameColumn) 193 | { 194 | lines <- strsplit(txt, "\n", fixed=TRUE)[[1]] 195 | splits <- strsplit(lines, "\t", fixed=TRUE) 196 | len <- lengths(splits) 197 | ret <- character(length(len)) 198 | idx <- len >= valueColumn 199 | ret[idx] <- sapply(splits[idx], "[[", valueColumn) 200 | if (!missing(nameColumn)) { 201 | idx <- len >= nameColumn 202 | nms <- character(length(len)) 203 | nms[idx] <- sapply(splits[idx], "[[", nameColumn) 204 | names(ret) <- nms 205 | } 206 | ret 207 | } 208 | 209 | 210 | .textParser <- function(txt) 211 | { 212 | txt 213 | } 214 | 215 | 216 | flatFileRecordGen <- setRefClass("KEGGFlatFileRecord", 217 | fields=list("fields"="list", 218 | lastField="character", 219 | lastSubfield="character", 220 | lastReference="list", 221 | references="list"), 222 | methods=list( 223 | initialize=function() 224 | { 225 | .self$fields <- list() 226 | .self$references <- 
list() 227 | .self$lastField <- character(0) 228 | .self$lastSubfield <- character(0) 229 | .self$lastReference <- list() 230 | }, 231 | setField=function(field) 232 | { 233 | .self$flush() 234 | .self$lastField <- field 235 | .self$lastSubfield <- character(0) 236 | .self 237 | }, 238 | setSubfield=function(subfield) 239 | { 240 | .self$lastSubfield <- subfield 241 | .self 242 | }, 243 | setBody=function(body) 244 | { 245 | if (!is.null(.self$lastField) && !is.na(.self$lastField) && .self$lastField == "REFERENCE") 246 | { 247 | if(length(.self$lastSubfield)) 248 | { 249 | if(is.null(.self$lastReference[[.self$lastSubfield]])) 250 | .self$lastReference[[.self$lastSubfield]] <- c() 251 | .self$lastReference[[.self$lastSubfield]] <- c( 252 | .self$lastReference[[.self$lastSubfield]], 253 | body) 254 | } else { 255 | if(is.null(.self$lastReference[[.self$lastField]])) 256 | .self$lastReference[[.self$lastField]] <- c() 257 | .self$lastReference[[.self$lastField]] <- c( 258 | .self$lastReference[[.self$lastField]], 259 | body) 260 | } 261 | } else{ 262 | if (is.null(.self$fields[[.self$lastField]])) 263 | .self$fields[[.self$lastField]] <- list() 264 | 265 | if(length(.self$lastSubfield)) 266 | { 267 | if(is.null(.self$fields[[.self$lastField]][[.self$lastSubfield]])) 268 | .self$fields[[.self$lastField]][[.self$lastSubfield]] <- c() 269 | .self$fields[[.self$lastField]][[.self$lastSubfield]] <- c( 270 | .self$fields[[.self$lastField]][[.self$lastSubfield]], 271 | body 272 | ) 273 | } else { 274 | if (is.null(.self$fields[[.self$lastField]][[.self$lastField]])) 275 | .self$fields[[.self$lastField]][[.self$lastField]] <- c() 276 | if (!is.null(.self$lastField) && !is.na(.self$lastField)) 277 | .self$fields[[.self$lastField]][[.self$lastField]] <- c( 278 | .self$fields[[.self$lastField]][[.self$lastField]], body) 279 | } 280 | } 281 | .self 282 | }, 283 | flush = function() 284 | { 285 | .self$fields[["///"]] <- NULL 286 | if (length(.self$lastReference)) 287 | { 288 | 
.self$references[[length(.self$references)+1]] <- .self$lastReference 289 | .self$lastReference <- list() 290 | } 291 | .self 292 | }, 293 | names = function() 294 | { 295 | nms <- base::names(.self$fields) 296 | if (length(.self$references)) 297 | nms <-c(nms, "REFERENCE") 298 | nms 299 | }, 300 | get = function(name) 301 | { 302 | if (name == "REFERENCE") 303 | return(.self$references) 304 | return(.self$fields[[name]]) 305 | }, 306 | set = function(name, value) 307 | { 308 | .self$fields[[name]] <- value 309 | .self 310 | }, getFields = function() 311 | { 312 | f <- .self$fields 313 | if (length(.self$references)) 314 | f[["REFERENCE"]] <- .self$references 315 | f 316 | } 317 | ) 318 | ) 319 | 320 | .compoundParser <- function(txt) 321 | { 322 | cmptxt <- unlist(txt) 323 | lines <- strsplit(cmptxt, "\n") 324 | cmps <- gsub(".*cpd:", "", unlist(lines)) 325 | cmps 326 | } 327 | -------------------------------------------------------------------------------- /R/utilities.R: -------------------------------------------------------------------------------- 1 | 2 | .getRootUrl <- function() 3 | { 4 | getOption("KEGG_REST_URL", "https://rest.kegg.jp") 5 | } 6 | 7 | .getGenomeUrl <- function() 8 | { 9 | getOption("KEGG_GENOME_URL", "http://rest.genome.jp") 10 | } 11 | 12 | .printf <- function(...) message(noquote(sprintf(...))) 13 | 14 | .cleanUrl <- function(url) 15 | { 16 | url <- gsub(" ", "%20", url, fixed=TRUE) 17 | url <- gsub("#", "%23", url, fixed=TRUE) 18 | url <- gsub(":", "%3a", url, fixed=TRUE) 19 | sub("http(s)*%3a//", "http\\1://", url) 20 | } 21 | 22 | .getUrl <- function(url, parser, ...) 
23 | { 24 | url <- .cleanUrl(url) 25 | debug <- getOption("KEGGREST_DEBUG", FALSE) 26 | if (debug) 27 | .printf("url == %s", url) 28 | response <- GET(url) 29 | stop_for_status(response) 30 | content <- .strip(content(response, "text")) 31 | if (nchar(content) == 0) 32 | return(character(0)) 33 | do.call(parser, list(content, ...)) 34 | } 35 | 36 | .strip <- function(str) 37 | { 38 | gsub("^\\s+|\\s+$", "", str) 39 | } 40 | 41 | .rstrip <- function(str) 42 | { 43 | gsub("\\s+$", "", str) 44 | } 45 | 46 | .lstrip <- function(str) 47 | { 48 | gsub("^\\s+", "", str) 49 | } 50 | 51 | .get.kegg.url <- function(url) 52 | { 53 | res <- GET(url) 54 | stop_for_status(res, "GET KEGG pathway URL") 55 | content <- content(res, type="text", encoding = "UTF-8") 56 | lines <- strsplit(content, "\n", fixed=TRUE)[[1]] 57 | urlLine <- grep("](https://bioconductor.org/) 2 | 3 | **KEGGREST** is an R/Bioconductor package that provides a client interface to the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. 4 | 5 | See https://bioconductor.org/packages/KEGGREST for more information including how to install the release version of the package (please refrain from installing directly from GitHub). 
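A minimal sketch of installing the release version and running a first query, assuming the standard Bioconductor installer (`BiocManager`) and a working network connection:

```r
## Install the release version from Bioconductor (not from GitHub).
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("KEGGREST")

library(KEGGREST)

## Databases the package knows about (no network access needed).
listDatabases()

## Human pathways as a character vector; this queries the KEGG REST server.
pathways <- keggList("pathway", "hsa")
head(pathways)
```

`keggList()` and the other query functions contact the KEGG server on every call, so it may be worth storing results in a variable instead of repeating the same request.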
6 | 7 | -------------------------------------------------------------------------------- /inst/unitTests/test_KEGGREST.R: -------------------------------------------------------------------------------- 1 | library(KEGGREST) 2 | library(RUnit) 3 | 4 | ## checker helper 5 | .checkLOL <- function(res) 6 | { 7 | all(checkTrue(class(res)=="list"), 8 | checkTrue(class(res[[1]])=="list"), 9 | checkTrue(length(res) > 0)) 10 | } 11 | 12 | .checkCharVec <- function(res) 13 | { 14 | all(checkTrue(class(res)=="character"), 15 | checkTrue(length(res) > 0)) 16 | } 17 | 18 | .checkPlainText <- function(res) 19 | { 20 | all(checkTrue(class(res)=="character"), 21 | checkTrue(length(res) == 1)) 22 | } 23 | 24 | .checkNamedCharVec <- function(res) 25 | { 26 | .checkCharVec(res) && 27 | checkTrue(length(names(res)) > 0) 28 | } 29 | 30 | .checkUnnamedCharVec <- function(res) 31 | { 32 | .checkCharVec(res) && 33 | is.null(names(res)) 34 | } 35 | 36 | test_keggInfo <- function() 37 | { 38 | res <- keggInfo("kegg") 39 | .checkPlainText(res) 40 | res <- keggInfo("pathway") 41 | .checkPlainText(res) 42 | res <- keggInfo("hsa") 43 | .checkPlainText(res) 44 | 45 | } 46 | 47 | test_keggList <- function() 48 | { 49 | res <- keggList("pathway") 50 | .checkCharVec(res) 51 | res <- keggList("pathway", "hsa") 52 | .checkCharVec(res) 53 | res <- keggList("organism") 54 | checkTrue("matrix" %in% class(res)) 55 | checkTrue("hsa" %in% res[, "organism"]) 56 | res <- keggList("hsa") 57 | .checkCharVec(res) 58 | res <- keggList("T01001") 59 | .checkCharVec(res) 60 | res <- keggList(c("hsa:10458", "ece:Z5100")) 61 | .checkCharVec(res) 62 | res <- keggList(c("cpd:C01290","gl:G00092")) 63 | .checkCharVec(res) 64 | res <- keggList(c("C01290+G00092")) 65 | .checkCharVec(res) 66 | } 67 | 68 | ## The thorough thing to do would be to hit /list/x for each 69 | ## x in listDatabases, but that might slam KEGG too hard and 70 | ## make them mad. Instead we hit /info. 
KEGG does not like 71 | ## /info/organism for some reason so we will test /list/organism. 72 | ## NOTE: rpair (RP ids) was discontinued in 2016. 73 | test_listDatabases <- function() 74 | { 75 | dbs <- listDatabases() 76 | for (db in dbs) 77 | { 78 | if (all(db != c("organism", "rpair", "environ"))) # environ by vince may 5 2021 79 | { 80 | res <- keggInfo(db) 81 | .checkPlainText(res) 82 | } 83 | } 84 | res <- keggList("organism") 85 | checkTrue("matrix" %in% class(res)) 86 | } 87 | 88 | 89 | test_keggFind <- function() 90 | { 91 | res <- keggFind("genes", c("shiga", "toxin")) 92 | .checkCharVec(res) 93 | res <- keggFind("genes", "shiga toxin") 94 | .checkCharVec(res) 95 | res <- keggFind("compound", "C7H10O5", "formula") 96 | .checkCharVec(res) 97 | res <- keggFind("compound", "O5C7", "formula") 98 | .checkCharVec(res) 99 | res <- keggFind("compound", 174.05, "exact_mass") 100 | .checkCharVec(res) 101 | res <- keggFind("compound", 300:310, "mol_weight") 102 | .checkCharVec(res) 103 | } 104 | 105 | test_keggGet <- function() 106 | { 107 | res <- keggGet(c("cpd:C01290", "gl:G00092")) 108 | .checkLOL(res) 109 | res <- keggGet(c("C01290", "G00092")) 110 | .checkLOL(res) 111 | res <- keggGet(c("hsa:10458", "ece:Z5100")) 112 | .checkLOL(res) 113 | res <- keggGet("ec:1.1.1.1") 114 | .checkLOL(res) 115 | .checkLOL(res[[1]]$REFERENCE) 116 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") 117 | checkTrue("AAStringSet" %in% class(res)) 118 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "ntseq") 119 | checkTrue("DNAStringSet" %in% class(res)) 120 | png <- keggGet("hsa05130", "image") 121 | checkTrue("array" %in% class(png)) 122 | } 123 | 124 | test_keggGet_2 <- function() 125 | { 126 | res <- keggGet("br:br08901") 127 | .checkCharVec(res) 128 | res <- keggGet(c("br:br08901", "ece:Z5100")) 129 | .checkCharVec(res) 130 | res <- keggGet(c("ece:Z5100", "br:br08901")) 131 | .checkLOL(res) 132 | res <- keggGet("path:map00010") 133 | res <- res[[1]] 134 | # 
.checkNamedCharVec(res$DISEASE) 135 | res <- keggGet("md:M00001") 136 | .checkNamedCharVec(res[[1]]$REACTION) 137 | .checkNamedCharVec(res[[1]]$ORTHOLOGY) 138 | res <- keggGet("ds:H00001") 139 | .checkLOL(res) 140 | .checkUnnamedCharVec(res[[1]]$GENE) 141 | res <- keggGet("dr:D00001") 142 | x <- res[[1]]$PRODUCT 143 | checkTrue(all(names(x) == c("PRODUCT","GENERIC"))) 144 | checkTrue(grepl("^ ", res[[1]]$BRITE[2])) 145 | # res <- keggGet("ev:E00001") 146 | #[1] "http://rest.kegg.jp/get/ev:E00001" 147 | #Browse[1]> zz = GET(url) 148 | #Browse[1]> httr::content(zz) 149 | #NULL 150 | #Browse[1]> zz 151 | #Response [http://rest.kegg.jp/get/ev:E00001] 152 | # Date: 2021-05-05 12:33 153 | # Status: 404 154 | # Content-Type: text/plain 155 | # 156 | # 157 | # .checkCharVec(res[[1]]$CATEGORY) 158 | res <- keggGet("ko:K00001") 159 | checkTrue(names(res[[1]]$ENTRY) == "KO") 160 | ## DBLINK parser? 161 | res <- keggGet("genome:T00001") 162 | x <- res[[1]]$CHROMOSOME 163 | checkTrue(all(names(x) == c("CHROMOSOME", "SEQUENCE", "LENGTH"))) 164 | x <- res[[1]]$TAXONOMY 165 | checkTrue(all(names(x) == c("TAXONOMY", "LINEAGE"))) 166 | res <- keggGet("mgnm:T30001") 167 | ## metagenome has multiple TAXONOMY sections! fixme 168 | .checkCharVec(res[[1]]$ANNOTATION) 169 | ## Changed from hsa:645954; that one doesn't seem to exist! 170 | res <- keggGet("hsa:10460") 171 | .checkNamedCharVec(res[[1]]$ORGANISM) 172 | ## IS DNAStringSet the best object for a nucleotide sequence? 
fixme 173 | checkTrue(class(res[[1]]$NTSEQ) %in% "DNAStringSet") 174 | res <-keggGet("cpd:C00001") 175 | .checkUnnamedCharVec(res[[1]]$REACTION) 176 | checkTrue(length(res[[1]]$REACTION)> 300) 177 | res <- keggGet("gl:G00001") 178 | checkTrue("COMPOSITION" %in% names(res[[1]])) 179 | res <- keggGet("rn:R00001") 180 | checkTrue("EQUATION" %in% names(res[[1]])) 181 | res <- keggGet("rc:RC00001") 182 | .checkUnnamedCharVec(res[[1]]$REACTION) 183 | res <- keggGet("ec:1.1.1.1") 184 | .checkUnnamedCharVec(res[[1]]$REACTION) 185 | .checkUnnamedCharVec(res[[1]]$ALL_REAC) ## not ideal fixme (?) 186 | #res <- keggGet("vgnm:NC_018104") 187 | #checkTrue(is.na(names(res[[1]]$ENTRY))) # not ideal fixme 188 | res <- keggGet("hsa:10458") 189 | checkTrue("AAStringSet" %in% class(res[[1]]$AASEQ)) 190 | checkTrue("DNAStringSet" %in% class(res[[1]]$NTSEQ)) 191 | # fixme do something with CODON_USAGE? 192 | 193 | 194 | } 195 | 196 | test_splitInGroups <- function() 197 | { 198 | .splitInGroups <- KEGGREST:::.splitInGroups 199 | checkIdentical(.splitInGroups(character(), 3), list()) 200 | checkIdentical(.splitInGroups(1:5, 3), list(1:3, 4:5)) 201 | checkIdentical(.splitInGroups(1:6, 3), list(1:3, 4:6)) 202 | checkIdentical(.splitInGroups(1:7, 3), list(1:3, 4:6, 7L)) 203 | } 204 | 205 | test_keggConv <- function() 206 | { 207 | res <- keggConv("eco", "ncbi-geneid") 208 | .checkCharVec(res) 209 | res <- keggConv("ncbi-geneid", "eco") 210 | .checkCharVec(res) 211 | res <- keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100")) 212 | .checkCharVec(res) 213 | } 214 | 215 | test_keggLink <- function() 216 | { 217 | res <- keggLink("pathway", "hsa") 218 | .checkCharVec(res) 219 | res <- keggLink("hsa", "pathway") 220 | .checkCharVec(res) 221 | res <- keggLink("pathway", c("hsa:10458", "ece:Z5100")) 222 | .checkCharVec(res) 223 | } 224 | 225 | test_mark_and_color_pathways_by_objects <- function(){ 226 | url <- mark.pathway.by.objects("path:eco00260", 227 | c("eco:b0002", "eco:c00263")) 228 | 
.checkCharVec(url) 229 | checkTrue(grep("https://", url)==1) 230 | res <- httr::GET(url) 231 | checkTrue( httr::http_type(res) == 'image/png' ) 232 | url <- color.pathway.by.objects("path:eco00260", 233 | c("eco:b0002", "eco:c00263"), 234 | c("#ff0000", "#00ff00"), 235 | c("#ffff00", "yellow")) 236 | .checkCharVec(url) 237 | checkTrue(grep("https://", url)==1) 238 | res <- httr::GET(url) 239 | checkTrue( httr::http_type(res) == 'image/png' ) 240 | } 241 | 242 | 243 | test_reference_parser <- function() 244 | { 245 | res <- keggGet("path:map00010")[[1]] 246 | refs <- res$REFERENCE[[1]] 247 | checkTrue(length(refs) > 0) 248 | } 249 | 250 | test_keggCompounds <- function() { 251 | result <- c( 252 | "C00011", "C00042", "C00090", "C00146", "C00160", "C00530", 253 | "C00682", "C01407", "C02124", "C02222", "C02375", "C02575", "C02625", 254 | "C02814", "C02933", "C03434", "C03572", "C03585", "C03664", "C03918", 255 | "C04091", "C04431", "C04522", "C04706", "C04729", "C05618", "C06328", 256 | "C06329", "C06594", "C06596", "C06597", "C06598", "C06599", "C06600", 257 | "C06601", "C06602", "C06603", "C06755", "C06988", "C06989", "C06990", 258 | "C07075", "C07088", "C07089", "C07090", "C07091", "C07092", "C07093", 259 | "C07094", "C07095", "C07096", "C07097", "C07098", "C07099", "C07100", 260 | "C07101", "C07102", "C07103", "C11352", "C12831", "C12832", "C12833", 261 | "C12834", "C12835", "C12836", "C12837", "C12838", "C14419", "C14450", 262 | "C16181", "C16182", "C16266", "C18236", "C18238", "C18240", "C18241", 263 | "C18242", "C18243", "C18244", "C18933", "C21103", "C21104", "C21105" 264 | ) 265 | checkTrue( 266 | all( 267 | result %in% keggCompounds("map00361") 268 | ) 269 | ) 270 | } 271 | 272 | -------------------------------------------------------------------------------- /man/keggCompounds.Rd: -------------------------------------------------------------------------------- 1 | \name{keggCompounds} 2 | \alias{keggCompounds} 3 | \title{ 4 | Get list of compounds IDs for 
pathway 5 | } 6 | \description{ 7 | Get list of compound IDs for a pathway. 8 | } 9 | \usage{ 10 | keggCompounds(pathwayID) 11 | } 12 | \arguments{ 13 | \item{pathwayID}{ 14 | A KEGG pathway identifier with the prefix \code{map} and a 5-digit number. 15 | } 16 | 17 | } 18 | \value{ 19 | A list of KEGG compound identifiers 20 | } 21 | \references{ 22 | \url{https://www.genome.jp/kegg/pathway.html} 23 | } 24 | \author{ 25 | Dan Tenenbaum, Kristina Riemer 26 | } 27 | \examples{ 28 | keggCompounds("map00361") 29 | } 30 | \keyword{ compounds } 31 | -------------------------------------------------------------------------------- /man/keggConv.Rd: -------------------------------------------------------------------------------- 1 | \name{keggConv} 2 | \alias{keggConv} 3 | \alias{conv} 4 | \alias{bconv} 5 | \title{ 6 | Convert KEGG identifiers to/from outside identifiers 7 | } 8 | \description{ 9 | Convert KEGG identifiers to/from outside identifiers. 10 | } 11 | \usage{ 12 | keggConv(target, source, querySize = 100) 13 | } 14 | \arguments{ 15 | \item{target}{ 16 | A KEGG organism code (), T number, or one of the external 17 | databases \code{ncbi-gi}, \code{ncbi-geneid}, \code{ncbi-proteinid}, 18 | \code{uniprot}, or 19 | (for chemical substance identifiers) 20 | \code{drug}, \code{compound}, \code{glycan}, \code{pubchem}, 21 | or \code{chebi}. 22 | } 23 | 24 | \item{source}{ 25 | Same as \code{target}, but may also be a list of KEGG identifiers 26 | representing internal or external names. 27 | } 28 | 29 | \item{querySize}{ 30 | Empirically, KEGG limits queries to 100 source identifiers per query. 31 | This argument enables larger queries by dividing \code{source} into 32 | sub-queries of no more than \code{querySize} identifiers. 33 | } 34 | 35 | } 36 | \value{ 37 | A named character vector. 
38 | } 39 | \references{ 40 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 41 | } 42 | \author{ 43 | Dan Tenenbaum 44 | } 45 | \examples{ 46 | ## conversion from NCBI GeneID to KEGG ID for E. coli genes 47 | head(keggConv("eco", "ncbi-geneid")) 48 | head(keggConv("ncbi-geneid", "eco")) ## opposite direction 49 | 50 | ## conversion from KEGG ID to NCBI Protein ID 51 | head(keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100"))) 52 | 53 | ## conversion from NCBI GeneID to KEGG ID when the organism code is not known: 54 | head(keggConv("genes", "ncbi-geneid:3113320")) 55 | } 56 | \keyword{ conv } 57 | -------------------------------------------------------------------------------- /man/keggFind.Rd: -------------------------------------------------------------------------------- 1 | \name{keggFind} 2 | \alias{keggFind} 3 | \title{ 4 | Finds entries with matching query keywords or other query data in a given 5 | database 6 | } 7 | \description{ 8 | Finds entries with matching query keywords or other query data in a given 9 | database. 10 | } 11 | \usage{ 12 | keggFind(database, query, option = c("formula", "exact_mass", 13 | "mol_weight")) 14 | } 15 | \arguments{ 16 | \item{database}{ 17 | Either the name of a single KEGG database (list available via 18 | \code{\link{listDatabases}()}), a "T number" genome identifier, 19 | or a KEGG organism code (lists of both available via 20 | \code{keggList("organism")}). 21 | } 22 | \item{query}{ 23 | One or more keywords, or a range of integers representing 24 | molecular weights. 25 | If \code{query} includes identifiers not known to KEGG, 26 | the results will not contain any information about those identifiers. 27 | } 28 | \item{option}{ 29 | \code{Optional.} If \code{database} is \code{compound} or \code{drug}, 30 | \code{option} can be \code{formula}, \code{exact_mass}, or 31 | \code{mol_weight}. 32 | Chemical formula search is a partial match irrespective of the 33 | order of atoms given. 
34 | The exact mass (or molecular weight) is checked by rounding off to the 35 | same decimal place as the query data. 36 | } 37 | } 38 | \value{ 39 | A named character vector. 40 | } 41 | \references{ 42 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 43 | } 44 | \author{ 45 | Dan Tenenbaum 46 | } 47 | 48 | 49 | \examples{ 50 | res <- 51 | keggFind("genes", c("shiga", "toxin")) ## for keywords "shiga" and "toxin" 52 | length(res) 53 | head(res) 54 | res <- keggFind("genes", "shiga toxin") ## for keywords "shiga toxin" 55 | length(res) 56 | head(res) 57 | keggFind("compound", "C7H10O5", "formula") ## for chemical formula "C7H10O5" 58 | res <- keggFind("compound", "O5C7", "formula") ## for chemical formula 59 | ## containing "O5" and "C7" 60 | length(res) 61 | head(res) 62 | keggFind("compound", 174.05, "exact_mass") ## for 174.045 63 | ## =< exact mass < 174.055 64 | res <- keggFind("compound", 300:310, "mol_weight") ## for 300 =< 65 | ## molecular weight =< 310 66 | length(res) 67 | head(res) 68 | } 69 | \keyword{ find } 70 | -------------------------------------------------------------------------------- /man/keggGet.Rd: -------------------------------------------------------------------------------- 1 | \name{keggGet} 2 | \alias{keggGet} 3 | \title{ 4 | Retrieves given database entries 5 | } 6 | \description{ 7 | Retrieves given database entries. 8 | } 9 | \usage{ 10 | keggGet(dbentries, option = c("aaseq", "ntseq", "mol", "kcf", 11 | "image", "kgml")) 12 | } 13 | %- maybe also 'usage' for other objects documented here. 14 | \arguments{ 15 | \item{dbentries}{ 16 | One or more (up to a maximum of 10) KEGG identifiers. 17 | } 18 | \item{option}{ 19 | \code{Optional.} Option governing the format of the output. 20 | \code{aaseq} is an amino acid sequence, \code{ntseq} is a nucleotide 21 | sequence. \code{image} returns an object which can be written 22 | to a PNG file, \code{kgml} returns a KGML document. 
23 | } 24 | } 25 | \details{ 26 | Retrieves all entries from the KEGG database for a set of KEGG identifiers. 27 | 28 | \code{keggGet()} can only return 10 result sets at once (this limitation 29 | is on the server side). If you supply more than 10 inputs to \code{keggGet()}, 30 | \code{KEGGREST} will warn that only the first 10 results will be returned. 31 | } 32 | \value{ 33 | A list wrapping a KEGG flat file. 34 | If \code{option} is \code{aaseq}, an \code{AAStringSet} object. 35 | If \code{option} is \code{ntseq}, a \code{DNAStringSet} object. 36 | If \code{option} is \code{image}, an object which can be written 37 | to a PNG file. 38 | If \code{option} is \code{kgml}, a KGML document. 39 | } 40 | \references{ 41 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 42 | } 43 | \author{ 44 | Dan Tenenbaum 45 | } 46 | \examples{ 47 | res <- keggGet(c("cpd:C01290", "gl:G00092")) ## retrieves a compound entry 48 | ## and a glycan entry 49 | str(res) 50 | res <- keggGet(c("C01290", "G00092")) ## same as above, without prefixes 51 | str(res) 52 | res <- keggGet(c("hsa:10458", "ece:Z5100")) ## retrieves a human gene entry 53 | ## and an E.coli O157 gene entry 54 | str(res) 55 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino 56 | ## acid sequences of a human gene and an 57 | ## E.coli O157 gene 58 | png <- keggGet("hsa05130", "image") ## retrieves the image file of a 59 | ## pathway map 60 | t <- tempfile() 61 | library(png) 62 | writePNG(png, t) 63 | res <- keggGet("hsa05130", "kgml") 64 | str(res) 65 | } 66 | \keyword{ get } 67 | -------------------------------------------------------------------------------- /man/keggInfo.Rd: -------------------------------------------------------------------------------- 1 | \name{keggInfo} 2 | \alias{keggInfo} 3 | \alias{info} 4 | \title{ 5 | Displays the current statistics of a given database 6 | } 7 | \description{ 8 | Displays statistics of a given database, such as number of 9 | entries, version, release 
date, and source. 10 | } 11 | \usage{ 12 | keggInfo(database) 13 | } 14 | \arguments{ 15 | \item{database}{ 16 | Either a KEGG database (list available via \code{\link{listDatabases}()}), 17 | a KEGG organism code (list available by calling \code{\link{keggList}()} 18 | with the \code{organism} argument), or a T number (list available by 19 | calling \code{\link{keggList}()} with the \code{genome} argument). 20 | 21 | } 22 | } 23 | \value{ 24 | A character vector containing statistics about \code{database}. 25 | } 26 | \references{ 27 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 28 | } 29 | \author{ 30 | Dan Tenenbaum 31 | } 32 | \examples{ 33 | res <- keggInfo("kegg") ## displays the current statistics of the KEGG database 34 | cat(res) 35 | res <- keggInfo("pathway") ## displays the number of pathway entries including both 36 | ## the reference and organism-specific pathways 37 | cat(res) 38 | res <- keggInfo("hsa") ## displays the number of gene entries for the 39 | ## KEGG organism Homo sapiens 40 | cat(res) 41 | } 42 | \keyword{ info } 43 | \keyword{ metadata } 44 | -------------------------------------------------------------------------------- /man/keggLink.Rd: -------------------------------------------------------------------------------- 1 | \name{keggLink} 2 | \alias{keggLink} 3 | \alias{link} 4 | \title{ 5 | Find related entries by using database cross-references. 6 | } 7 | \description{ 8 | Find related entries by using database cross-references. 9 | } 10 | \usage{ 11 | keggLink(target, source) 12 | } 13 | \arguments{ 14 | \item{target}{ 15 | Either the name of a single KEGG database (list available via 16 | \code{\link{listDatabases}()}), a "T number" genome identifier, 17 | or a KEGG organism code (lists of both available via 18 | \code{keggList("organism")}). 19 | } 20 | \item{source}{ 21 | The same as \code{target}, but may also be one or more 22 | KEGG identifiers. 
23 | } 24 | } 25 | \details{ 26 | Many of the old KEGGSOAP functions whose names 27 | started with 'get', such as \code{get.pathways.by.genes} and 28 | \code{get.pathways.by.reactions}, 29 | are replaced by using \code{keggLink} (see examples). 30 | 31 | 32 | 33 | } 34 | \value{ 35 | A named character vector. 36 | } 37 | \references{ 38 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 39 | } 40 | \author{ 41 | Dan Tenenbaum 42 | } 43 | \examples{ 44 | res <- keggLink("pathway", "hsa") ## KEGG pathways linked from each of 45 | ## the human genes equivalent to 'get.genes.by.pathway' in KEGGSOAP 46 | length(res) 47 | head(res) 48 | res <- keggLink("hsa", "pathway") ## human genes linked from each of the 49 | ## KEGG pathways equivalent to 'get.pathways.by.genes' in KEGGSOAP 50 | keggLink("pathway", c("hsa:10458", "ece:Z5100")) ## KEGG pathways 51 | ## linked from a human gene and an E. coli O157 gene 52 | res <- keggLink("hsa:126") ## LinkDB search shows all KEGG 53 | ## resources related to hsa:126 54 | head(res) 55 | } 56 | \keyword{ link } 57 | -------------------------------------------------------------------------------- /man/keggList.Rd: -------------------------------------------------------------------------------- 1 | \name{keggList} 2 | \alias{keggList} 3 | \title{ 4 | Returns a list of entry identifiers and associated definition for a given 5 | database or a given set of database entries. 6 | %% ~~function to do ... ~~ 7 | } 8 | \description{ 9 | Returns a list of entry identifiers and associated definition for a given 10 | database or a given set of database entries. 11 | } 12 | \usage{ 13 | keggList(database, organism) 14 | } 15 | %- maybe also 'usage' for other objects documented here. 
16 | \arguments{ 17 | \item{database}{ 18 | %% ~~Describe \code{x} here~~ 19 | Either a KEGG database (list available via \code{\link{listDatabases}()}), 20 | a KEGG organism code (list available via \code{\link{keggList}()} with the 21 | \code{organism} argument), a T number (list available via 22 | \code{\link{keggList}()} with the \code{genome} argument), or a character 23 | vector of KEGG identifiers. 24 | } 25 | \item{organism}{ 26 | \code{Optional.} A KEGG organism identifier (list available via 27 | \code{\link{keggList}()} with the \code{organism} argument). 28 | } 29 | } 30 | \value{ 31 | A named character vector containing entry identifiers and 32 | associated definitions. 33 | } 34 | \references{ 35 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 36 | } 37 | \author{ 38 | Dan Tenenbaum 39 | } 40 | \examples{ 41 | res <- keggList("pathway") ## returns the list of reference pathways 42 | length(res) 43 | head(res) 44 | res <- keggList("pathway", "hsa") ## returns the list of human pathways 45 | length(res) 46 | head(res) 47 | res <- keggList("organism") ## returns the list of KEGG organisms with 48 | ## taxonomic classification 49 | nrow(res) 50 | head(res) 51 | res <- keggList("hsa") ## returns the entire list of human genes 52 | length(res) 53 | head(res) 54 | ## keggList("T01001") ## same as above 55 | keggList(c("hsa:10458", "ece:Z5100")) ## returns the list of a human gene 56 | ## and an E.coli O157 gene 57 | keggList(c("cpd:C01290","gl:G00092")) ## returns the list of a compound entry 58 | ## and a glycan entry 59 | keggList(c("C01290+G00092")) ## same as above (prefixes are not necessary) 60 | } 61 | \keyword{ list } 62 | -------------------------------------------------------------------------------- /man/listDatabases.Rd: -------------------------------------------------------------------------------- 1 | \name{listDatabases} 2 | \alias{listDatabases} 3 | \title{ 4 | Lists the KEGG databases which may be searched. 
5 | } 6 | \description{ 7 | Lists the KEGG databases which may be searched. In most cases, 8 | you can also use a KEGG organism name or T number (genome identifier) 9 | as a database name. 10 | } 11 | \usage{ 12 | listDatabases() 13 | } 14 | \value{ 15 | A character vector of database names. 16 | } 17 | \references{ 18 | \url{https://www.kegg.jp/kegg/docs/keggapi.html} 19 | } 20 | \author{ 21 | Dan Tenenbaum 22 | } 23 | \seealso{ 24 | \code{\link{keggList}} 25 | } 26 | \examples{ 27 | listDatabases() 28 | res <- keggList("organism") ## list all organisms 29 | nrow(res) 30 | head(res) 31 | res <- keggList("hsa") ## list all human genes 32 | length(res) 33 | head(res) 34 | ## keggList("T01001") ## list all human genes 35 | res <- keggList("genome") ## list all genome identifiers 36 | length(res) 37 | head(res) 38 | } 39 | \keyword{ database } 40 | \keyword{ databases } 41 | -------------------------------------------------------------------------------- /man/mark.pathway.by.objects.Rd: -------------------------------------------------------------------------------- 1 | \name{mark.pathway.by.objects} 2 | \alias{mark.pathway.by.objects} 3 | \alias{color.pathway.by.objects} 4 | 5 | \title{Client-side interface to obtain an url for a KEGG pathway diagram 6 | with a given set of genes marked} 7 | \description{ 8 | Given a KEGG pathway id and a set of KEGG gene ids, the functions 9 | return the URL of a KEGG pathway diagram with the elements 10 | corresponding to the genes marked by red or specified color 11 | } 12 | \usage{ 13 | mark.pathway.by.objects(pathway.id, object.id.list) 14 | color.pathway.by.objects(pathway.id, object.id.list, 15 | fg.color.list, bg.color.list) 16 | } 17 | 18 | \arguments{ 19 | \item{pathway.id}{\code{pathway.id} a character string for a KEGG 20 | pathway id. KEGG pathway ids consist of the string path followed by 21 | a colon, a three-letter code for the organism of concern, and then 22 | a number (e. g. "path:eco00020"). 
The three-letter organism code 23 | consists of the first letter of the genus name and the first two 24 | letters of the species name of the scientific name of the organism 25 | of concern} 26 | \item{object.id.list}{\code{object.id.list} a vector of character 27 | strings for KEGG gene ids. KEGG gene ids normally consist of 28 | three letters followed by a colon and then several 29 | digits. The three letters are from the first letter of the genus 30 | name and the first two letters of the species name of the scientific 31 | name of the organism of concern (e.g. hsa:111 for Homo sapiens)} 32 | \item{fg.color.list}{\code{fg.color.list} a vector of two character 33 | strings to indicate the color for the text and border, respectively, 34 | of the objects in a pathway diagram. The strings can either be a 35 | color code like #ff0000 or a color name like yellow} 36 | \item{bg.color.list}{\code{bg.color.list} a vector of character 37 | strings of the same length as \code{object.id.list} to indicate the 38 | background color of the objects in a pathway diagram. The strings 39 | can either be a color code like #ff0000 or a color name like yellow} 40 | } 41 | \details{ 42 | This function only returns the URL of the KEGG pathway diagram. Use 43 | the function \code{\link{browseURL}} to view the diagram. 44 | 45 | These functions are not part of the KEGG REST API; they are provided 46 | because they existed in \code{KEGGSOAP} and an alternative implementation 47 | was possible. 
48 | } 49 | \value{ 50 | This function returns a character string for the url 51 | } 52 | \references{\url{https://www.kegg.jp/kegg/docs/keggapi.html}} 53 | \author{Jianhua Zhang} 54 | 55 | \seealso{\code{\link{browseURL}}} 56 | \examples{ 57 | url <- mark.pathway.by.objects( 58 | "path:eco00260", c("eco:b0002", "eco:c00263") 59 | ) 60 | if(interactive()){ 61 | browseURL(url) 62 | } 63 | url <- color.pathway.by.objects( 64 | "path:eco00260", c("eco:b0002", "eco:c00263"), 65 | c("#ff0000", "#00ff00"), 66 | c("#ffff00", "yellow") 67 | ) 68 | } 69 | \keyword{ datasets } 70 | 71 | -------------------------------------------------------------------------------- /tests/KEGGREST_unit_tests.R: -------------------------------------------------------------------------------- 1 | BiocGenerics:::testPackage("KEGGREST") 2 | -------------------------------------------------------------------------------- /vignettes/KEGGREST-vignette.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Accessing the KEGG REST API" 3 | date: "`r format(Sys.Date(), '%B %d, %Y')`" 4 | vignette: > 5 | %\VignetteIndexEntry{Accessing the KEGG REST API} 6 | %\VignetteEngine{knitr::rmarkdown} 7 | %\VignetteEncoding{UTF-8} 8 | output: 9 | BiocStyle::html_document: 10 | toc: true 11 | --- 12 | 13 | ```{r setup, echo=FALSE} 14 | library(knitr) 15 | options(width=80) 16 | ``` 17 | ```{r wrap-hook, echo=FALSE} 18 | hook_output = knit_hooks$get('output') 19 | knit_hooks$set(output = function(x, options) { 20 | # this hook is used only when the linewidth option is not NULL 21 | if (!is.null(n <- options$linewidth)) { 22 | x = knitr:::split_lines(x) 23 | # any lines wider than n should be wrapped 24 | if (any(nchar(x) > n)) x = strwrap(x, width = n) 25 | x = paste(x, collapse = '\n') 26 | } 27 | hook_output(x, options) 28 | }) 29 | ``` 30 | 31 | # KEGGREST 32 | 33 | [KEGG](https://www.kegg.jp/kegg/) 34 | is a database resource for understanding high-level 
functions 35 | and utilities of the biological system, such as the cell, the organism 36 | and the ecosystem, from molecular-level information, especially 37 | large-scale molecular datasets generated by genome sequencing and 38 | other high-throughput experimental technologies. 39 | 40 | `KEGGREST` allows access to the 41 | [KEGG REST API](https://www.kegg.jp/kegg/docs/keggapi.html). Since 42 | KEGG disabled the KEGG SOAP server 43 | on December 31, 2012 (which means the `KEGGSOAP` package will no 44 | longer work), `KEGGREST` serves as a replacement. 45 | 46 | The interface to `KEGGREST` is simpler and in some ways more 47 | powerful than `KEGGSOAP`; however, not all the functionality 48 | that was available through the SOAP API has been exposed 49 | in the REST API. If and when more functionality is exposed 50 | on the server side, this package will be updated to take 51 | advantage of it. 52 | 53 | **Restriction: The KEGG API is provided for academic use by academic 54 | users belonging to academic institutions. See https://www.kegg.jp/kegg/rest/ 55 | for more information.** 56 | 57 | ## Installation 58 | 59 | You can install `KEGGREST` from Bioconductor with: 60 | 61 | ```{r install,eval=FALSE} 62 | if (!require("BiocManager", quietly=TRUE)) 63 | install.packages("BiocManager") 64 | 65 | BiocManager::install("KEGGREST") 66 | ``` 67 | 68 | ## Overview 69 | 70 | The KEGG REST API is built on some simple operations: 71 | `info`, `list`, `find`, `get`, `conv`, and `link`. 72 | The corresponding `R` functions in `KEGGREST` are: 73 | `keggInfo()`, `keggList()`, `keggFind()`, `keggGet()`, 74 | `keggConv()`, and `keggLink()`. 75 | 76 | 77 | # Exploring KEGG Resources with `keggList()` 78 | 79 | KEGG exposes a number of databases. To get an idea of 80 | what is available, run `listDatabases()`: 81 | 82 | ```{r listDatabases} 83 | library(KEGGREST) 84 | listDatabases() 85 | ``` 86 | You can use these databases in further queries. 
Note that in many 87 | cases you can also use a three-letter KEGG organism code or a 88 | "T number" (genome identifier) in the same place you would use 89 | one of these database names. 90 | 91 | You can obtain the list of organisms available in KEGG with 92 | the `keggList()` function: 93 | 94 | ```{r get_organisms} 95 | org <- keggList("organism") 96 | head(org) 97 | ``` 98 | 99 | From `KEGGREST`'s point of view, you've just asked KEGG 100 | to show you the name of every entry in the "organism" database. 101 | 102 | Therefore, the complete list of entities that can be 103 | queried with `KEGGREST` can be obtained as follows: 104 | 105 | ```{r list_queryables} 106 | queryables <- c(listDatabases(), org[,1], org[,2]) 107 | ``` 108 | 109 | You could also ask for every entry in the "hsa" (_Homo sapiens_) 110 | database as follows: 111 | 112 | ```{r query_hsa, eval=FALSE} 113 | keggList("hsa") 114 | ``` 115 | 116 | # Get specific entries with `keggGet()` 117 | 118 | Once you have a list of specific KEGG identifiers, use 119 | `keggGet()` to get more information about them. Here we look up 120 | a human gene and an E. 
coli O157 gene: 121 | 122 | ```{r keggGet} 123 | query <- keggGet(c("hsa:10458", "ece:Z5100")) 124 | ``` 125 | 126 | As expected, this returns two items: 127 | 128 | ```{r querylength} 129 | length(query) 130 | ``` 131 | 132 | Behind the scenes, `KEGGREST` downloaded and parsed a KEGG 133 | [flat file](https://www.kegg.jp/kegg/rest/dbentry.html), which you 134 | can now explore: 135 | 136 | ```{r explore} 137 | names(query[[1]]) 138 | query[[1]]$ENTRY 139 | query[[1]]$DBLINKS 140 | ``` 141 | 142 | `keggGet()` can also return amino acid sequences as `AAStringSet` objects 143 | (from the `Biostrings` package): 144 | 145 | ```{r aaseq} 146 | keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino acid sequences 147 | ``` 148 | 149 | ...or `DNAStringSet` objects if `option` is `ntseq`: 150 | 151 | ```{r ntseq} 152 | keggGet(c("hsa:10458", "ece:Z5100"), "ntseq") ## retrieves nucleotide sequences 153 | ``` 154 | 155 | 156 | 157 | `keggGet()` can also return images: 158 | ```{r png} 159 | png <- keggGet("hsa05130", "image") 160 | t <- tempfile() 161 | library(png) 162 | writePNG(png, t) 163 | if (interactive()) browseURL(t) 164 | ``` 165 | 166 | __NOTE__: `keggGet()` can only return 10 result sets at once (this limitation 167 | is on the server side). If you supply more than 10 inputs to `keggGet()`, 168 | `KEGGREST` will warn that only the first 10 results will be returned. 
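
One way to work within the 10-entry limit is to split a longer vector of identifiers into batches and call `keggGet()` once per batch. The sketch below is a minimal illustration, not part of `KEGGREST`: the helper name `get_in_batches()` and its `fetch` and `batch_size` arguments are hypothetical, and it assumes that each call returns a list with one element per entry.

```r
# Hypothetical helper (not part of KEGGREST): retrieve any number of KEGG
# entries by splitting the identifiers into batches of at most `batch_size`.
get_in_batches <- function(ids, fetch = KEGGREST::keggGet, batch_size = 10L) {
    if (length(ids) == 0L)
        return(list())
    # Batch index for each identifier: 1 for the first `batch_size` ids,
    # 2 for the next `batch_size`, and so on.
    batch <- ceiling(seq_along(ids) / batch_size)
    batches <- unname(split(ids, batch))
    # One request per batch; each call is assumed to return a list of entries.
    results <- lapply(batches, fetch)
    # Concatenate the per-batch lists into a single list of entries.
    do.call(c, results)
}
```

With the default `fetch`, a call such as `get_in_batches(ids)` would issue `ceiling(length(ids) / 10)` requests to KEGG, so it may be worth pausing between batches for very long identifier vectors. The `fetch` argument is exposed mainly so the batching logic can be tried with a stand-in function, without network access.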
169 | 170 | # Search by keywords with `keggFind()` 171 | 172 | You can search for two separate keywords ("shiga" and "toxin" in this case): 173 | 174 | ```{r separate_keywords, linewidth=80} 175 | head(keggFind("genes", c("shiga", "toxin"))) 176 | ``` 177 | 178 | Or search for the two words together: 179 | 180 | ```{r keyphrase, linewidth=80} 181 | head(keggFind("genes", "shiga toxin")) 182 | ``` 183 | 184 | Search for a chemical formula: 185 | ```{r formula} 186 | head(keggFind("compound", "C7H10O5", "formula")) 187 | ``` 188 | Search for a chemical formula containing "O5" and "C7": 189 | ```{r formula2} 190 | head(keggFind("compound", "O5C7", "formula")) 191 | ``` 192 | 193 | You can search for compounds with a particular exact mass: 194 | 195 | ```{r exact_mass} 196 | keggFind("compound", 174.05, "exact_mass") 197 | ``` 198 | 199 | Because we've supplied a number with two decimal digits of precision, 200 | KEGG will find all compounds with exact mass between 174.045 and 174.055. 201 | 202 | Integer ranges can be used to find compounds by molecular weight: 203 | 204 | ```{r mol_weight} 205 | head(keggFind("compound", 300:310, "mol_weight")) 206 | ``` 207 | 208 | # Convert identifiers with `keggConv()` 209 | 210 | Convert between KEGG identifiers and outside identifiers. 
211 | 212 | You can either specify fully qualified identifiers: 213 | 214 | ```{r conv_with_ids} 215 | keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100")) 216 | ``` 217 | 218 | ...or get the mapping for an entire species: 219 | 220 | ```{r conv_species_kegg_to_geneid} 221 | head(keggConv("eco", "ncbi-geneid")) 222 | ``` 223 | 224 | Reversing the arguments does the opposite mapping: 225 | 226 | ```{r conv_species_geneid_to_kegg} 227 | head(keggConv("ncbi-geneid", "eco")) 228 | 229 | ``` 230 | 231 | # Link across databases with `keggLink()` 232 | 233 | Most of the `KEGGSOAP` functions whose names started with 234 | "get", for example `get.pathways.by.genes()`, can be replaced 235 | with the `keggLink()` function. Here we query all pathways 236 | for human: 237 | 238 | ```{r keggLink} 239 | head(keggLink("pathway", "hsa")) 240 | ``` 241 | 242 | ...but you can also specify one or more genes (from multiple species): 243 | ```{r keggLink2} 244 | keggLink("pathway", c("hsa:10458", "ece:Z5100")) 245 | ``` 246 | --------------------------------------------------------------------------------