├── DESCRIPTION
├── NAMESPACE
├── NEWS
├── QUESTIONS
├── R
│   ├── KEGGREST.R
│   ├── parsers.R
│   └── utilities.R
├── README.md
├── inst
│   └── unitTests
│       └── test_KEGGREST.R
├── man
│   ├── keggCompounds.Rd
│   ├── keggConv.Rd
│   ├── keggFind.Rd
│   ├── keggGet.Rd
│   ├── keggInfo.Rd
│   ├── keggLink.Rd
│   ├── keggList.Rd
│   ├── listDatabases.Rd
│   └── mark.pathway.by.objects.Rd
├── tests
│   └── KEGGREST_unit_tests.R
└── vignettes
    └── KEGGREST-vignette.Rmd
/DESCRIPTION:
--------------------------------------------------------------------------------
1 | Package: KEGGREST
2 | Version: 1.49.0
3 | Title:
4 | Client-side REST access to the Kyoto Encyclopedia of Genes and Genomes (KEGG)
5 | Authors@R: c(
6 | person("Dan", "Tenenbaum", role = "aut"),
7 | person("Bioconductor Package", "Maintainer", role = c("aut", "cre"),
8 | email = "maintainer@bioconductor.org"),
9 | person("Martin", "Morgan", role = "ctb"),
10 | person("Kozo", "Nishida", role = "ctb"),
11 | person("Marcel", "Ramos", role = "ctb"),
12 | person("Kristina", "Riemer", role = "ctb"),
13 | person("Lori", "Shepherd", role = "ctb"),
14 | person("Jeremy", "Volkening", role = "ctb")
15 | )
16 | Depends: R (>= 3.5.0)
17 | Imports: methods, httr, png, Biostrings
18 | Suggests: RUnit, BiocGenerics, BiocStyle, knitr, markdown
19 | Description:
20 | A package that provides a client interface to the Kyoto
21 | Encyclopedia of Genes and Genomes (KEGG) REST API. Only
22 | for academic use by academic users belonging to academic
23 | institutions (see ).
24 | Note that KEGGREST is based on KEGGSOAP by J. Zhang, R. Gentleman,
25 | and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.
26 | URL: https://bioconductor.org/packages/KEGGREST
27 | BugReports: https://github.com/Bioconductor/KEGGREST/issues
28 | License: Artistic-2.0
29 | VignetteBuilder: knitr
30 | biocViews: Annotation, Pathways, ThirdPartyClient, KEGG
31 | RoxygenNote: 7.1.1
32 | Date: 2024-06-17
33 |
--------------------------------------------------------------------------------
/NAMESPACE:
--------------------------------------------------------------------------------
1 |
2 | importFrom(utils, download.file, head)
3 | importFrom(httr, GET, POST, http_status, content, stop_for_status)
4 | importFrom(png, readPNG, writePNG)
5 | importFrom(Biostrings, readAAStringSet, readDNAStringSet,
6 | DNAStringSet, AAStringSet)
7 | import(methods)
8 |
9 | export(
10 | keggInfo,
11 | keggList,
12 | listDatabases,
13 | keggFind,
14 | keggGet,
15 | keggCompounds,
16 | keggConv,
17 | keggLink,
18 | mark.pathway.by.objects,
19 | color.pathway.by.objects
20 | )
21 |
22 |
--------------------------------------------------------------------------------
/NEWS:
--------------------------------------------------------------------------------
1 | CHANGES IN VERSION 1.46.0
2 | -----------------------
3 |
4 | BUG FIXES
5 |
6 | o 1.45.1 Fix keggFind URL to use '+' instead of spaces.
7 |
8 | CHANGES IN VERSION 1.42.0
9 | -----------------------
10 |
11 | SIGNIFICANT USER-VISIBLE CHANGES
12 |
13 | o `keggCompounds` lists compound IDs for a given pathway (@KristinaRiemer,
14 | #6).
15 |
16 | BUG FIXES
17 |
18 | o Update URL path in `.get.kegg.url` from `tmp` to `kegg` subfolder.
19 |
20 | CHANGES IN VERSION 1.37.0
21 | -----------------------
22 |
23 | BUG CORRECTION
24 |
25 | o 1.37.1 Fixes new endpoint
26 | o 1.37.2 http to https fixes windows error
27 |
28 | CHANGES IN VERSION 1.0.0
29 | -----------------------
30 |
31 | SIGNIFICANT USER-VISIBLE CHANGES
32 |
33 | o Package introduced.
34 |
35 | NEW FEATURES
36 |
37 | o Package introduced.
38 |
--------------------------------------------------------------------------------
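The 1.45.1 fix listed in NEWS concerns how `keggFind` turns a query into a URL path segment. The following is a minimal sketch of that step, assuming a hypothetical helper `build_find_url()` (not part of the package) and the default root URL `https://rest.kegg.jp` from `R/utilities.R`. It only builds a string and performs no network access; the package additionally passes the URL through `.cleanUrl` before the request.

```r
# Hypothetical helper mirroring the query handling in keggFind();
# it only builds the URL string and never contacts the KEGG server.
build_find_url <- function(database, query, option = NULL,
                           root = "https://rest.kegg.jp")
{
    # An integer vector such as 300:310 becomes the range "300-310".
    if (is.integer(query) && length(query) > 1)
        query <- sprintf("%s-%s", min(query), max(query))
    # Whitespace inside a term and gaps between terms both become "+".
    query <- gsub("\\s", "+", query)
    query <- paste(query, collapse = "+")
    url <- sprintf("%s/find/%s/%s", root, database, query)
    if (!is.null(option))
        url <- sprintf("%s/%s", url, option)
    url
}

build_find_url("genes", "shiga toxin")
build_find_url("genes", c("shiga", "toxin"))
build_find_url("compound", 300:310, "mol_weight")
```

The first two calls produce the same URL, which is the behaviour the unit tests for `keggFind` exercise with `"shiga toxin"` and `c("shiga", "toxin")`.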
/QUESTIONS:
--------------------------------------------------------------------------------
1 | Questions for the KEGG team.
2 |
3 |
4 | ** No apparent replacement found for old APIs
5 |
6 | Is there a new programmatic way to call the old SOAP api "get_motifs_by_gene"?
7 | I know I can do it manually with a request like this:
8 |
9 | http://www.kegg.jp/ssdb-bin/ssdb_motif?kid=eco%3Ab0002&lib=pfam
10 |
11 | But then I have to scrape the page.
12 |
13 | I have a similar question about "get_genes_by_motifs", which I can also do
14 | manually from http://www.kegg.jp/kegg/ssdb/. Also, the SOAP API had
15 | "start" and "max_results" arguments; is there an equivalent?
16 |
17 | About the SOAP APIs "get_best_neighbors_by_gene" and
18 | "get_best_best_neighbors_by_gene"; it looks like similar
19 | functionality is provided by http://www.kegg.jp/kegg/ssdb/ but again,
20 | is there a more programmatic way to do it?
21 |
22 | I’d also like to replace the old API "get_paralogs_by_gene"
23 | which it seems like I can also do from the
24 | http://www.kegg.jp/kegg/ssdb/ page.
25 | With all of these SSDB functions, is there support for the
26 | "start" and "max_results" arguments?
27 |
28 | Although I can search compounds by mass,
29 | (example: http://rest.kegg.jp/find/compound/174.05/exact_mass/),
30 | I can’t seem to search glycans by mass as I could with the SOAP API
31 | "search_glycans_by_mass". Is there an equivalent function in the REST API?
32 |
33 | There does not seem to be an equivalent to the SOAP API
34 | "search_compounds_by_subcomp". Is there a replacement?
35 |
36 | What about "search_glycans_by_kcam" and the more general-purpose
37 | "bget"? Is there a way in the REST api to return flat-file records,
38 | similar to what "bget" did?
39 |
40 | Two other "missing" functions seem to be "get_ko_by_ko_class"
41 | and "get_genes_by_ko_class". Are there replacements for these?
42 |
43 | Is the SOAP function get_html_of_colored_pathway_by_elements
44 | any different from e.g.
45 | http://www.kegg.jp/kegg-bin/show_pathway?eco00260/b0002%09%23ff0000,%2300ff00/c00263%09%23ffff00,yellow
46 | ?
47 |
48 | Is there a REST implementation of the SOAP functions
49 | get_element_relations_by_pathway and get_elements_by_pathway?
50 |
51 |
52 | ** Results of new APIs differ from old
53 |
54 | Calling the SOAP api "get_enzymes_by_pathway" with
55 | "pathway_id" equal to "path:eco00020" yields a result of 14
56 | enzymes; what seems to be the equivalent in REST,
57 | http://rest.kegg.jp/link/enzyme/path:eco00020 returns nothing.
58 | Is this expected? Have the data changed?
59 |
60 | Similarly, calling "get_compounds_by_pathway" with a
61 | "pathway_id" argument of "path:eco00020" returns 20 compounds
62 | in SOAP, but the REST equivalent (?),
63 | http://rest.kegg.jp/link/compound/path%3aeco00020 returns nothing.
64 |
65 | Calling this REST api with a different argument:
66 | http://rest.kegg.jp/link/compound/path:map00010
67 | does return some results.
68 |
69 | Calling the SOAP api "get_kos_by_pathway" with a "pathway_id"
70 | argument of "path:hsa00010" returns 36 results, but the seeming REST
71 | equivalent, http://rest.kegg.jp/link/ko/path%3ahsa00010 returns nothing.
72 |
73 |
74 | ** Missing Arguments
75 |
76 | The SOAP API "get_genes_by_organism" has "start" and "max_results"
77 | arguments. The REST equivalent, which seems to be, for example
78 | http://rest.kegg.jp/list/hsa does not appear to have such arguments.
79 | Is there a way to paginate results that come back from the REST server?
80 |
81 | It looks like I can call an equivalent of the old
82 | "get_genes_by_ko" API, as follows:
83 | http://rest.kegg.jp/link/genes/ko:K12524
84 | But in the old API, I could filter by organism (e.g. "eco").
85 | Also, the SOAP api would return annotations, for example, for
86 | eco:b0002 it would return
87 | "thrA; fused aspartokinase I and homoserine dehydrogenase I
88 | (EC:2.7.2.4 1.1.1.3); K12524 bifunctional aspartokinase / homoserine
89 | dehydrogenase 1 [EC:2.7.2.4 1.1.1.3] ".
90 |
91 | Is there a way I can get these annotations back in a REST query?
92 | It seems I only get two columns back, the ko ID and gene ID.
93 |
94 | Similarly, the SOAP API "get_pathways_by_kos" had an "org"
95 | argument to filter by "organism". The REST equivalent, for
96 | example http://rest.kegg.jp/link/pathway/ko:K00016+ko:K00382
97 | does not seem to have this option, and the results do not allow
98 | me to do my own filtering (that is, they do not have three-letter
99 | organism codes).
100 |
101 | Also, the results differ between SOAP and REST.
102 | In SOAP, calling "get_pathways_by_kos" with a "ko_id_list"
103 | argument of ko:K00016 and ko:K00382, and an "org" argument
104 | of "hsa", returns path:hsa00010 and path:hsa00620, but the results of
105 | http://rest.kegg.jp/link/pathway/ko:K00016+ko:K00382 do not
106 | include these items.
107 |
108 |
--------------------------------------------------------------------------------
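One of the questions above notes that a gene lookup such as `/link/genes/ko:K12524` cannot be restricted to one organism on the server. Because KEGG gene identifiers carry the organism code as a prefix (for example `eco:b0002`), filtering on the client is one possible workaround. The sketch below assumes a hypothetical helper `filter_by_organism()`; the input vector is hand-written from identifiers that appear elsewhere in this repository and is not real server output.

```r
# Hypothetical helper: keep gene identifiers that belong to one organism.
filter_by_organism <- function(ids, organism)
{
    # The organism code is everything before the first ":".
    prefix <- sub(":.*$", "", ids)
    ids[prefix == organism]
}

# Illustrative input only, not a response from the KEGG server.
ids <- c("eco:b0002", "hsa:10458", "ece:Z5100")
filter_by_organism(ids, "eco")
```

This does not address the pathway case raised in the same section, where the returned identifiers carry no organism code to filter on.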
/R/KEGGREST.R:
--------------------------------------------------------------------------------
1 | keggInfo <- function(database)
2 | {
3 | ## FIXME return an object instead of a character vector
4 | url <- sprintf("%s/info/%s", .getRootUrl(), database)
5 | .getUrl(url, .textParser)
6 | }
7 |
8 |
9 | keggList <- function(database, organism)
10 | {
11 | database <- paste(database, collapse="+")
12 | if (missing(organism))
13 | url <- sprintf("%s/list/%s", .getRootUrl(), database)
14 | else
15 | url <- sprintf("%s/list/%s/%s", .getRootUrl(), database, organism)
16 | if (database == "organism")
17 | return(.organismListParser(url))
18 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2)
19 | }
20 |
21 | keggFind <- function(database, query,
22 | option=c("formula", "exact_mass", "mol_weight"))
23 | {
24 | if(missing(database))
25 | stop("'database' argument is required")
26 | if (!missing(option))
27 | option <- match.arg(option)
28 | if (is.integer(query) && length(query) > 1)
29 | query <- sprintf("%s-%s", min(query), max(query))
30 | query <- gsub("\\s", "+", query)
31 | query <- paste(query, collapse="+")
32 | url <- sprintf("%s/find/%s/%s", .getRootUrl(), database, query)
33 | if (!missing(option))
34 | url <- sprintf("%s/%s", url, option)
35 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2)
36 | }
37 |
38 |
39 | keggGet <- function(dbentries,
40 | option=c("aaseq", "ntseq", "mol", "kcf", "image", "kgml"))
41 | {
42 | if (length(dbentries) > 10)
43 | warning(paste("More than 10 inputs supplied, only the first",
44 | "10 results will be returned."))
45 | dbentries <- paste(dbentries, collapse="+")
46 | url <- sprintf("%s/get/%s", .getRootUrl(), dbentries)
47 | if (!missing(option))
48 | {
49 | url <- sprintf("%s/%s", url, option)
50 |
51 | if (option == "image")
52 | return(content(GET(url), type="image/png"))
53 | if (option %in% c("aaseq", "ntseq"))
54 | {
55 | t <- tempfile()
56 | cat(.getUrl(url, .textParser), file=t)
57 | if (option == "aaseq")
58 | return(readAAStringSet(t))
59 | else if (option == "ntseq")
60 | return(readDNAStringSet(t))
61 | }
62 | if (option %in% c("mol", "kcf", "kgml"))
63 | return(.getUrl(url, .textParser))
64 | }
65 | if (grepl("^br:", dbentries[1]))
66 | return(.getUrl(url, .textParser))
67 | .getUrl(url, .flatFileParser)
68 | }
69 |
70 | keggCompounds <- function(pathwayID)
71 | {
72 | url <- sprintf("%s/link/cpd/%s", .getRootUrl(), pathwayID)
73 | .getUrl(url, .compoundParser)
74 | }
75 |
76 | .keggConv <- function(target, source)
77 | {
78 | query <-paste(source, collapse = "+")
79 | url <- sprintf("%s/conv/%s/%s", .getRootUrl(), target, query)
80 | .getUrl(url, .listParser, nameColumn = 1, valueColumn = 2)
81 | }
82 |
83 | keggConv <- function (target, source, querySize = 100)
84 | {
85 | groups <- .splitInGroups(source, querySize)
86 | answer <- lapply(groups, .keggConv, target = target)
87 | as(unlist(answer), "character")
88 | }
89 |
90 | keggLink <- function(target, source)
91 | {
92 | if (missing(source))
93 | {
94 | url <- sprintf("%s/link/%s",
95 | .getGenomeUrl(), target)
96 | .getUrl(url, .matrixParser, ncol=3)
97 | } else {
98 | url <- sprintf("%s/link/%s/%s",
99 | .getRootUrl(), target, paste(source, collapse="+"))
100 | .getUrl(url, .listParser, nameColumn=1, valueColumn=2)
101 |
102 | }
103 | ## FIXME?? keggLink("pathway",c("hsa:10458", "ece:Z5100"))
104 | ## returns a list with duplicate names
105 | }
106 |
107 |
108 | listDatabases <- function()
109 | {
110 | c("pathway", "brite", "module", "ko", "genome", "vg", "ag", "compound",
111 | "glycan", "reaction", "rclass", "enzyme", "disease", "drug",
112 | "dgroup", "environ", "genes", "ligand", "kegg")
113 | }
114 |
115 | ## This is not strictly speaking an API supported by the KEGG REST
116 | ## server, but it seems useful, and does not use SOAP, so I'm leaving it in.
117 | mark.pathway.by.objects <- function(pathway.id, object.id.list)
118 | {
119 | ## example: http://www.kegg.jp/pathway/eco00260+b0002+c00263
120 | pathway.id <- sub("^path:", "", pathway.id)
121 | if (!missing(object.id.list)) {
122 | object.id.list <- paste(object.id.list, collapse="+")
123 | pathway.id <- sprintf("%s+%s", pathway.id, object.id.list)
124 | }
125 | url <- sprintf("https://www.kegg.jp/pathway/%s", pathway.id)
126 | .get.kegg.url(url)
127 | }
128 |
129 | ## This is not strictly speaking an API supported by the KEGG REST
130 | ## server, but it seems useful, and does not use SOAP, so I'm leaving it in.
131 | color.pathway.by.objects <- function(pathway.id, object.id.list,
132 | fg.color.list, bg.color.list)
133 | {
134 | ## example: http://www.kegg.jp/kegg-bin/show_pathway?eco00260/b0002%09%23ff0000,%2300ff00/c00263%09%23ffff00,yellow
135 | ## also works to include organism code in gene IDs
136 | ## (but don't include path: in pathway id)
137 | ## documentation here: http://www.kegg.jp/kegg/rest/weblink.html
138 | ## and here: http://www.kegg.jp/kegg/tool/map_pathway2.html
139 |
140 | ## Nov 2020: refactored to use form POST due to issues with long URLs when
141 | ## large identifier lists are passed.
142 |
143 | pathway.id <- sub("^path:", "", pathway.id)
144 | if (!(length(object.id.list)==length(fg.color.list) &&
145 | length(fg.color.list) == length(bg.color.list))) {
146 | stop(paste("object.id.list, fg.color.list, and bg.color.list must",
147 | "all be the same length."))
148 | }
149 |
150 | # format identifier/color list as expected by server
151 | payload <- paste(
152 | c("#ids", object.id.list),
153 | c("cols", paste(bg.color.list, fg.color.list, sep=',')),
154 | sep="\t",
155 | collapse="\n"
156 | )
157 |
158 | # fetch KEGG page from server, via a 302 redirect handled by httr
159 | # transparently
160 | res <- POST(
161 | url = "https://www.kegg.jp/kegg-bin/show_pathway",
162 | body = list(
163 | map = pathway.id,
164 | multi_query = payload,
165 | mode = 'color'
166 | ),
167 | encode="multipart"
168 | )
169 | res <- content(res, "text")
170 |
171 | # extract image URL from page
172 | img_matches <- regexpr(
173 | "(?<=
1) {
185 | stop(
186 | "'color.pathway.by.objects()' ",
187 | "unexpectedly matched multiple KEGG image paths in response."
188 | )
189 | }
190 | sprintf("https://www.kegg.jp%s", img_url)
191 |
192 | }
193 |
194 |
--------------------------------------------------------------------------------
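`keggConv` splits `source` into groups of at most `querySize` identifiers and issues one request per group. The package's own `.splitInGroups` does not appear in this listing, so the following is only a sketch of one way such a helper could behave, written to agree with the expectations in `test_splitInGroups`. The name `split_in_groups` is hypothetical.

```r
# Hypothetical chunking helper; not the package's .splitInGroups.
split_in_groups <- function(x, size)
{
    if (length(x) == 0L)
        return(list())
    # Elements 1..size fall in group 1, the next `size` elements in group 2, ...
    group <- ceiling(seq_along(x) / size)
    # split() names the pieces after the group labels; drop those names.
    unname(split(x, group))
}

split_in_groups(1:7, 3)
```

Each element of the returned list can then be passed to a single request, as `keggConv` does with `lapply`.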
/R/parsers.R:
--------------------------------------------------------------------------------
1 |
2 | .matrixParser <- function(txt, ncol)
3 | {
4 | lines <- strsplit(txt, "\n")[[1]]
5 | split <- strsplit(lines, "\t")
6 | u <- unlist(split)
7 | matrix(u, ncol=ncol, byrow=TRUE)
8 | }
9 |
10 |
11 | .organismListParser <- function(url)
12 | {
13 | lines <- readLines(url)
14 | split <- strsplit(lines, "\t")
15 | u <- unlist(split)
16 | m <- matrix(u, ncol=4, byrow=TRUE)
17 | colnames(m) <- c("T.number", "organism", "species", "phylogeny")
18 | m
19 | }
20 |
21 | .get_parser_NAME <- function(entry)
22 | {
23 | ret <- list()
24 | for (value in names(entry))
25 | {
26 | ret[[value]] <- gsub("^;|;$", "", entry[[value]])
27 | }
28 | ret
29 | }
30 |
31 | .get_parser_ENTRY <- function(entry)
32 | {
33 | segs <- strsplit(unlist(entry[[1]]), " +")[[1]]
34 | ret <- c(segs[1])
35 | names(ret) <- segs[2]
36 | ret
37 | }
38 |
39 |
40 | .get_parser_REFERENCE <- function(refs)
41 | {
42 | ret <- list()
43 | thisref <- list()
44 | for (i in seq_along(refs)) {
45 | #sapply(refs, function(item) {
46 | item <- refs[[i]]
47 | if (item$refField == "REFERENCE")
48 | {
49 | if (length(thisref) > 0)
50 | ret <- c(ret, list(thisref))
51 | thisref <- list(id=item$value)
52 | } else {
53 | if (is.null(thisref[[item$refField]]))
54 | thisref[[item$refField]] <- list()
55 | thisref[[item$refField]] <- c(thisref[[item$refField]],
56 | item$value)
57 | }
58 | #})
59 | }
60 | ret <- c(ret, list(thisref))
61 | ret
62 | }
63 |
64 |
65 | .get_parser_key_value <- function(entry)
66 | {
67 | content <- c()
68 | names <- c()
69 | lines <- unlist(strsplit(unname(unlist(entry)), "\n", fixed=TRUE))
70 | for (line in lines)
71 | {
72 | tmp <- strsplit(line, " ", fixed=TRUE)[[1]]
73 | key <- tmp[1]
74 | value <- paste(tmp[2:length(tmp)], collapse=" ")
75 | if (is.na(value))
76 | value <- ""
77 | content <- c(content, .strip(value))
78 | names <- c(names, .strip(key))
79 | }
80 | names(content) <- names
81 | content
82 | }
83 |
84 | .get_parser_list <- function(entry)
85 | {
86 | unname(unlist(strsplit(unlist(entry), " {2,}")))
87 | }
88 |
89 | .get_parser_list_or_key_value <- function(entry)
90 | {
91 | x <- unlist(entry)
92 | if (any(grepl(" {2,}", x)))
93 | .get_parser_key_value(entry)
94 | else
95 | .get_parser_list(entry)
96 | ## unlist(unname(sapply(entry, strsplit, " ")))
97 | }
98 |
99 |
100 | .get_parser_biostring <- function(entry, type)
101 | {
102 | ntseq <- unname(unlist(entry))
103 | tmp <- ntseq[2:length(ntseq)]
104 | seq <- paste(tmp, collapse="")
105 | if (type=="AAStringSet")
106 | AAStringSet(seq)
107 | else if (type == "DNAStringSet")
108 | DNAStringSet(seq)
109 | }
110 |
111 |
112 | .flatFileParser <- function(txt)
113 | {
114 | entry <- list()
115 | refs <- list()
116 | allEntries <- c()
117 | last_field <- NULL
118 | lines <- strsplit(.strip(txt), "\n", fixed=TRUE)[[1]]
119 | ffrec <- flatFileRecordGen()
120 | for (line in lines)
121 | {
122 | if (line == "///")
123 | {
124 | ffrec$flush()
125 | for (name in ffrec$names())
126 | {
127 | item <- ffrec$get(name)
128 | if (name == "ENTRY")
129 | ffrec$set("ENTRY", .get_parser_ENTRY(item))
130 | if (name %in% c("ENZYME", "MARKER", "ALL_REAC",
131 | "RELATEDPAIR", "DBLINKS", "DRUG", "GENE"))
132 | ffrec$set(name, .get_parser_list(item))
133 | if (name %in% c("PATHWAY", "ORTHOLOGY", "PATHWAY_MAP", "MODULE",
134 | "DISEASE", "REL_PATHWAY", "COMPOUND",
135 | "REACTION", "ORGANISM"))
136 | {
137 | ffrec$set(name, .get_parser_key_value(item))
138 | }
139 | if (name %in% c("REACTION"))
140 | {
141 | ffrec$set(name, .get_parser_list_or_key_value(item))
142 | }
143 | item <- ffrec$get(name)
144 | if(length(item) == 1 && "list" %in% class(item))
145 | {
146 | item <- unlist(item)
147 | item <- unname(item)
148 | ffrec$set(name, item)
149 | }
150 | }
151 | if ("NTSEQ" %in% ffrec$names())
152 | {
153 | ffrec$set("NTSEQ",
154 | .get_parser_biostring(ffrec$get("NTSEQ"), "DNAStringSet"))
155 | }
156 | if ("AASEQ" %in% ffrec$names())
157 | {
158 | ffrec$set("AASEQ",
159 | .get_parser_biostring(ffrec$get("AASEQ"), "AAStringSet"))
160 | }
161 |
162 | ## dreaded copy-and-append pattern
163 | allEntries <- c(allEntries, list(ffrec$getFields()))
164 | ffrec <- flatFileRecordGen()
165 | } else {
166 | subfield <- NULL
167 | tmp <- strsplit(line, "", fixed=TRUE)[[1]]
168 | fs <- tmp[1:12]
169 | fs <- fs[!is.na(fs)]
170 | first12 <- .strip(paste(fs, collapse=""))
171 | if(is.na(tmp[13]))
172 | value <- ""
173 | else
174 | value <- .rstrip(paste(tmp[13:length(tmp)], collapse=""))
175 | if (!grepl("^ ", line))
176 | {
177 | field <- strsplit(line, " ", fixed=TRUE)[[1]][1]
178 | ffrec$setField(field)
179 | } else {
180 | if (first12 != "")
181 | {
182 | subfield <- first12
183 | ffrec$setSubfield(first12)
184 | }
185 | }
186 | ffrec$setBody(value)
187 | }
188 | }
189 | allEntries
190 | }
191 |
192 | .listParser <- function(txt, valueColumn, nameColumn)
193 | {
194 | lines <- strsplit(txt, "\n", fixed=TRUE)[[1]]
195 | splits <- strsplit(lines, "\t", fixed=TRUE)
196 | len <- lengths(splits)
197 | ret <- character(length(len))
198 | idx <- len >= valueColumn
199 | ret[idx] <- sapply(splits[idx], "[[", valueColumn)
200 | if (!missing(nameColumn)) {
201 | idx <- len >= nameColumn
202 | nms <- character(length(len))
203 | nms[idx] <- sapply(splits[idx], "[[", nameColumn)
204 | names(ret) <- nms
205 | }
206 | ret
207 | }
208 |
209 |
210 | .textParser <- function(txt)
211 | {
212 | txt
213 | }
214 |
215 |
216 | flatFileRecordGen <- setRefClass("KEGGFlatFileRecord",
217 | fields=list("fields"="list",
218 | lastField="character",
219 | lastSubfield="character",
220 | lastReference="list",
221 | references="list"),
222 | methods=list(
223 | initialize=function()
224 | {
225 | .self$fields <- list()
226 | .self$references <- list()
227 | .self$lastField <- character(0)
228 | .self$lastSubfield <- character(0)
229 | .self$lastReference <- list()
230 | },
231 | setField=function(field)
232 | {
233 | .self$flush()
234 | .self$lastField <- field
235 | .self$lastSubfield <- character(0)
236 | .self
237 | },
238 | setSubfield=function(subfield)
239 | {
240 | .self$lastSubfield <- subfield
241 | .self
242 | },
243 | setBody=function(body)
244 | {
245 | if (!is.null(.self$lastField) && !is.na(.self$lastField) && .self$lastField == "REFERENCE")
246 | {
247 | if(length(.self$lastSubfield))
248 | {
249 | if(is.null(.self$lastReference[[.self$lastSubfield]]))
250 | .self$lastReference[[.self$lastSubfield]] <- c()
251 | .self$lastReference[[.self$lastSubfield]] <- c(
252 | .self$lastReference[[.self$lastSubfield]],
253 | body)
254 | } else {
255 | if(is.null(.self$lastReference[[.self$lastField]]))
256 | .self$lastReference[[.self$lastField]] <- c()
257 | .self$lastReference[[.self$lastField]] <- c(
258 | .self$lastReference[[.self$lastField]],
259 | body)
260 | }
261 | } else{
262 | if (is.null(.self$fields[[.self$lastField]]))
263 | .self$fields[[.self$lastField]] <- list()
264 |
265 | if(length(.self$lastSubfield))
266 | {
267 | if(is.null(.self$fields[[.self$lastField]][[.self$lastSubfield]]))
268 | .self$fields[[.self$lastField]][[.self$lastSubfield]] <- c()
269 | .self$fields[[.self$lastField]][[.self$lastSubfield]] <- c(
270 | .self$fields[[.self$lastField]][[.self$lastSubfield]],
271 | body
272 | )
273 | } else {
274 | if (is.null(.self$fields[[.self$lastField]][[.self$lastField]]))
275 | .self$fields[[.self$lastField]][[.self$lastField]] <- c()
276 | if (!is.null(.self$lastField) && !is.na(.self$lastField))
277 | .self$fields[[.self$lastField]][[.self$lastField]] <- c(
278 | .self$fields[[.self$lastField]][[.self$lastField]], body)
279 | }
280 | }
281 | .self
282 | },
283 | flush = function()
284 | {
285 | .self$fields[["///"]] <- NULL
286 | if (length(.self$lastReference))
287 | {
288 | .self$references[[length(.self$references)+1]] <- .self$lastReference
289 | .self$lastReference <- list()
290 | }
291 | .self
292 | },
293 | names = function()
294 | {
295 | nms <- base::names(.self$fields)
296 | if (length(.self$references))
297 | nms <-c(nms, "REFERENCE")
298 | nms
299 | },
300 | get = function(name)
301 | {
302 | if (name == "REFERENCE")
303 | return(.self$references)
304 | return(.self$fields[[name]])
305 | },
306 | set = function(name, value)
307 | {
308 | .self$fields[[name]] <- value
309 | .self
310 | }, getFields = function()
311 | {
312 | f <- .self$fields
313 | if (length(.self$references))
314 | f[["REFERENCE"]] <- .self$references
315 | f
316 | }
317 | )
318 | )
319 |
320 | .compoundParser <- function(txt)
321 | {
322 | cmptxt <- unlist(txt)
323 | lines <- strsplit(cmptxt, "\n")
324 | cmps <- gsub(".*cpd:", "", unlist(lines))
325 | cmps
326 | }
327 |
--------------------------------------------------------------------------------
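`.flatFileParser` treats the first 12 characters of each line as the field or subfield name and the remainder as the value. The following is a minimal sketch of that split using `substr()` and `substring()` instead of the character-by-character approach in the package. The helper name `split_flat_line` and the sample lines are illustrative assumptions, not package code or server output.

```r
# Hypothetical helper: split one flat-file line into its 12-column key
# and the value that follows it.
split_flat_line <- function(line)
{
    list(
        key   = trimws(substr(line, 1, 12)),
        # substring() returns "" when the line has fewer than 13 characters.
        value = sub("\\s+$", "", substring(line, 13))
    )
}

# Illustrative lines: a field line and a continuation line with a blank key.
entry <- sprintf("%-12s%s", "ENTRY", "K00001  KO")
cont  <- sprintf("%-12s%s", "", "continued value")
split_flat_line(entry)
split_flat_line(cont)
```

A blank key marks a continuation line, which is how the parser decides to append the value to the current field rather than start a new one.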
/R/utilities.R:
--------------------------------------------------------------------------------
1 |
2 | .getRootUrl <- function()
3 | {
4 | getOption("KEGG_REST_URL", "https://rest.kegg.jp")
5 | }
6 |
7 | .getGenomeUrl <- function()
8 | {
9 | getOption("KEGG_GENOME_URL", "http://rest.genome.jp")
10 | }
11 |
12 | .printf <- function(...) message(noquote(sprintf(...)))
13 |
14 | .cleanUrl <- function(url)
15 | {
16 | url <- gsub(" ", "%20", url, fixed=TRUE)
17 | url <- gsub("#", "%23", url, fixed=TRUE)
18 | url <- gsub(":", "%3a", url, fixed=TRUE)
19 | sub("http(s)*%3a//", "http\\1://", url)
20 | }
21 |
22 | .getUrl <- function(url, parser, ...)
23 | {
24 | url <- .cleanUrl(url)
25 | debug <- getOption("KEGGREST_DEBUG", FALSE)
26 | if (debug)
27 | .printf("url == %s", url)
28 | response <- GET(url)
29 | stop_for_status(response)
30 | content <- .strip(content(response, "text"))
31 | if (nchar(content) == 0)
32 | return(character(0))
33 | do.call(parser, list(content, ...))
34 | }
35 |
36 | .strip <- function(str)
37 | {
38 | gsub("^\\s+|\\s+$", "", str)
39 | }
40 |
41 | .rstrip <- function(str)
42 | {
43 | gsub("\\s+$", "", str)
44 | }
45 |
46 | .lstrip <- function(str)
47 | {
48 | gsub("^\\s+", "", str)
49 | }
50 |
51 | .get.kegg.url <- function(url)
52 | {
53 | res <- GET(url)
54 | stop_for_status(res, "GET KEGG pathway URL")
55 | content <- content(res, type="text", encoding = "UTF-8")
56 | lines <- strsplit(content, "\n", fixed=TRUE)[[1]]
57 | urlLine <- grep("
](https://bioconductor.org/)
2 |
3 | **KEGGREST** is an R/Bioconductor package that provides a client interface to the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API.
4 |
5 | See https://bioconductor.org/packages/KEGGREST for more information including how to install the release version of the package (please refrain from installing directly from GitHub).
6 |
7 |
--------------------------------------------------------------------------------
/inst/unitTests/test_KEGGREST.R:
--------------------------------------------------------------------------------
1 | library(KEGGREST)
2 | library(RUnit)
3 |
4 | ## checker helper
5 | .checkLOL <- function(res)
6 | {
7 | all(checkTrue(class(res)=="list"),
8 | checkTrue(class(res[[1]])=="list"),
9 | checkTrue(length(res) > 0))
10 | }
11 |
12 | .checkCharVec <- function(res)
13 | {
14 | all(checkTrue(class(res)=="character"),
15 | checkTrue(length(res) > 0))
16 | }
17 |
18 | .checkPlainText <- function(res)
19 | {
20 | all(checkTrue(class(res)=="character"),
21 | checkTrue(length(res) == 1))
22 | }
23 |
24 | .checkNamedCharVec <- function(res)
25 | {
26 | .checkCharVec(res) &&
27 | checkTrue(length(names(res)) > 0)
28 | }
29 |
30 | .checkUnnamedCharVec <- function(res)
31 | {
32 | .checkCharVec(res) &&
33 | is.null(names(res))
34 | }
35 |
36 | test_keggInfo <- function()
37 | {
38 | res <- keggInfo("kegg")
39 | .checkPlainText(res)
40 | res <- keggInfo("pathway")
41 | .checkPlainText(res)
42 | res <- keggInfo("hsa")
43 | .checkPlainText(res)
44 |
45 | }
46 |
47 | test_keggList <- function()
48 | {
49 | res <- keggList("pathway")
50 | .checkCharVec(res)
51 | res <- keggList("pathway", "hsa")
52 | .checkCharVec(res)
53 | res <- keggList("organism")
54 | checkTrue("matrix" %in% class(res))
55 | checkTrue("hsa" %in% res[, "organism"])
56 | res <- keggList("hsa")
57 | .checkCharVec(res)
58 | res <- keggList("T01001")
59 | .checkCharVec(res)
60 | res <- keggList(c("hsa:10458", "ece:Z5100"))
61 | .checkCharVec(res)
62 | res <- keggList(c("cpd:C01290","gl:G00092"))
63 | .checkCharVec(res)
64 | res <- keggList(c("C01290+G00092"))
65 | .checkCharVec(res)
66 | }
67 |
68 | ## The thorough thing to do would be to hit /list/x for each
69 | ## x in listDatabases, but that might slam KEGG too hard and
70 | ## make them mad. Instead we hit /info. KEGG does not like
71 | ## /info/organism for some reason so we will test /list/organism.
72 | ## NOTE: rpair (RP ids) was discontinued in 2016.
73 | test_listDatabases <- function()
74 | {
75 | dbs <- listDatabases()
76 | for (db in dbs)
77 | {
78 | if (all(db != c("organism", "rpair", "environ"))) # environ by vince may 5 2021
79 | {
80 | res <- keggInfo(db)
81 | .checkPlainText(res)
82 | }
83 | }
84 | res <- keggList("organism")
85 | checkTrue("matrix" %in% class(res))
86 | }
87 |
88 |
89 | test_keggFind <- function()
90 | {
91 | res <- keggFind("genes", c("shiga", "toxin"))
92 | .checkCharVec(res)
93 | res <- keggFind("genes", "shiga toxin")
94 | .checkCharVec(res)
95 | res <- keggFind("compound", "C7H10O5", "formula")
96 | .checkCharVec(res)
97 | res <- keggFind("compound", "O5C7", "formula")
98 | .checkCharVec(res)
99 | res <- keggFind("compound", 174.05, "exact_mass")
100 | .checkCharVec(res)
101 | res <- keggFind("compound", 300:310, "mol_weight")
102 | .checkCharVec(res)
103 | }
104 |
105 | test_keggGet <- function()
106 | {
107 | res <- keggGet(c("cpd:C01290", "gl:G00092"))
108 | .checkLOL(res)
109 | res <- keggGet(c("C01290", "G00092"))
110 | .checkLOL(res)
111 | res <- keggGet(c("hsa:10458", "ece:Z5100"))
112 | .checkLOL(res)
113 | res <- keggGet("ec:1.1.1.1")
114 | .checkLOL(res)
115 | .checkLOL(res[[1]]$REFERENCE)
116 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq")
117 | checkTrue("AAStringSet" %in% class(res))
118 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "ntseq")
119 | checkTrue("DNAStringSet" %in% class(res))
120 | png <- keggGet("hsa05130", "image")
121 | checkTrue("array" %in% class(png))
122 | }
123 |
124 | test_keggGet_2 <- function()
125 | {
126 | res <- keggGet("br:br08901")
127 | .checkCharVec(res)
128 | res <- keggGet(c("br:br08901", "ece:Z5100"))
129 | .checkCharVec(res)
130 | res <- keggGet(c("ece:Z5100", "br:br08901"))
131 | .checkLOL(res)
132 | res <- keggGet("path:map00010")
133 | res <- res[[1]]
134 | # .checkNamedCharVec(res$DISEASE)
135 | res <- keggGet("md:M00001")
136 | .checkNamedCharVec(res[[1]]$REACTION)
137 | .checkNamedCharVec(res[[1]]$ORTHOLOGY)
138 | res <- keggGet("ds:H00001")
139 | .checkLOL(res)
140 | .checkUnnamedCharVec(res[[1]]$GENE)
141 | res <- keggGet("dr:D00001")
142 | x <- res[[1]]$PRODUCT
143 | checkTrue(all(names(x) == c("PRODUCT","GENERIC")))
144 | checkTrue(grepl("^ ", res[[1]]$BRITE[2]))
145 | # res <- keggGet("ev:E00001")
146 | #[1] "http://rest.kegg.jp/get/ev:E00001"
147 | #Browse[1]> zz = GET(url)
148 | #Browse[1]> httr::content(zz)
149 | #NULL
150 | #Browse[1]> zz
151 | #Response [http://rest.kegg.jp/get/ev:E00001]
152 | # Date: 2021-05-05 12:33
153 | # Status: 404
154 | # Content-Type: text/plain
155 | #
156 | #
157 | # .checkCharVec(res[[1]]$CATEGORY)
158 | res <- keggGet("ko:K00001")
159 | checkTrue(names(res[[1]]$ENTRY) == "KO")
160 | ## DBLINK parser?
161 | res <- keggGet("genome:T00001")
162 | x <- res[[1]]$CHROMOSOME
163 | checkTrue(all(names(x) == c("CHROMOSOME", "SEQUENCE", "LENGTH")))
164 | x <- res[[1]]$TAXONOMY
165 | checkTrue(all(names(x) == c("TAXONOMY", "LINEAGE")))
166 | res <- keggGet("mgnm:T30001")
167 | ## metagenome has multiple TAXONOMY sections! fixme
168 | .checkCharVec(res[[1]]$ANNOTATION)
169 | ## Changed from hsa:645954; that one doesn't seem to exist!
170 | res <- keggGet("hsa:10460")
171 | .checkNamedCharVec(res[[1]]$ORGANISM)
172 | ## IS DNAStringSet the best object for a nucleotide sequence? fixme
173 | checkTrue(class(res[[1]]$NTSEQ) %in% "DNAStringSet")
174 | res <-keggGet("cpd:C00001")
175 | .checkUnnamedCharVec(res[[1]]$REACTION)
176 | checkTrue(length(res[[1]]$REACTION)> 300)
177 | res <- keggGet("gl:G00001")
178 | checkTrue("COMPOSITION" %in% names(res[[1]]))
179 | res <- keggGet("rn:R00001")
180 | checkTrue("EQUATION" %in% names(res[[1]]))
181 | res <- keggGet("rc:RC00001")
182 | .checkUnnamedCharVec(res[[1]]$REACTION)
183 | res <- keggGet("ec:1.1.1.1")
184 | .checkUnnamedCharVec(res[[1]]$REACTION)
185 | .checkUnnamedCharVec(res[[1]]$ALL_REAC) ## not ideal fixme (?)
186 | #res <- keggGet("vgnm:NC_018104")
187 | #checkTrue(is.na(names(res[[1]]$ENTRY))) # not ideal fixme
188 | res <- keggGet("hsa:10458")
189 | checkTrue("AAStringSet" %in% class(res[[1]]$AASEQ))
190 | checkTrue("DNAStringSet" %in% class(res[[1]]$NTSEQ))
191 | # fixme do something with CODON_USAGE?
192 |
193 |
194 | }
195 |
196 | test_splitInGroups <- function()
197 | {
198 | .splitInGroups <- KEGGREST:::.splitInGroups
199 | checkIdentical(.splitInGroups(character(), 3), list())
200 | checkIdentical(.splitInGroups(1:5, 3), list(1:3, 4:5))
201 | checkIdentical(.splitInGroups(1:6, 3), list(1:3, 4:6))
202 | checkIdentical(.splitInGroups(1:7, 3), list(1:3, 4:6, 7L))
203 | }
204 |
205 | test_keggConv <- function()
206 | {
207 | res <- keggConv("eco", "ncbi-geneid")
208 | .checkCharVec(res)
209 | res <- keggConv("ncbi-geneid", "eco")
210 | .checkCharVec(res)
211 | res <- keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100"))
212 | .checkCharVec(res)
213 | }
214 |
215 | test_keggLink <- function()
216 | {
217 | res <- keggLink("pathway", "hsa")
218 | .checkCharVec(res)
219 | res <- keggLink("hsa", "pathway")
220 | .checkCharVec(res)
221 | res <- keggLink("pathway", c("hsa:10458", "ece:Z5100"))
222 | .checkCharVec(res)
223 | }
224 |
225 | test_mark_and_color_pathways_by_objects <- function(){
226 | url <- mark.pathway.by.objects("path:eco00260",
227 | c("eco:b0002", "eco:c00263"))
228 | .checkCharVec(url)
229 | checkTrue(grep("https://", url)==1)
230 | res <- httr::GET(url)
231 | checkTrue( httr::http_type(res) == 'image/png' )
232 | url <- color.pathway.by.objects("path:eco00260",
233 | c("eco:b0002", "eco:c00263"),
234 | c("#ff0000", "#00ff00"),
235 | c("#ffff00", "yellow"))
236 | .checkCharVec(url)
237 | checkTrue(grep("https://", url)==1)
238 | res <- httr::GET(url)
239 | checkTrue( httr::http_type(res) == 'image/png' )
240 | }
241 |
242 |
243 | test_reference_parser <- function()
244 | {
245 | res <- keggGet("path:map00010")[[1]]
246 | refs <- res$REFERENCE[[1]]
247 | checkTrue(length(refs) > 0)
248 | }
249 |
250 | test_keggCompounds <- function() {
251 | result <- c(
252 | "C00011", "C00042", "C00090", "C00146", "C00160", "C00530",
253 | "C00682", "C01407", "C02124", "C02222", "C02375", "C02575", "C02625",
254 | "C02814", "C02933", "C03434", "C03572", "C03585", "C03664", "C03918",
255 | "C04091", "C04431", "C04522", "C04706", "C04729", "C05618", "C06328",
256 | "C06329", "C06594", "C06596", "C06597", "C06598", "C06599", "C06600",
257 | "C06601", "C06602", "C06603", "C06755", "C06988", "C06989", "C06990",
258 | "C07075", "C07088", "C07089", "C07090", "C07091", "C07092", "C07093",
259 | "C07094", "C07095", "C07096", "C07097", "C07098", "C07099", "C07100",
260 | "C07101", "C07102", "C07103", "C11352", "C12831", "C12832", "C12833",
261 | "C12834", "C12835", "C12836", "C12837", "C12838", "C14419", "C14450",
262 | "C16181", "C16182", "C16266", "C18236", "C18238", "C18240", "C18241",
263 | "C18242", "C18243", "C18244", "C18933", "C21103", "C21104", "C21105"
264 | )
265 | checkTrue(
266 | all(
267 | result %in% keggCompounds("map00361")
268 | )
269 | )
270 | }
271 |
272 |
--------------------------------------------------------------------------------
/man/keggCompounds.Rd:
--------------------------------------------------------------------------------
1 | \name{keggCompounds}
2 | \alias{keggCompounds}
3 | \title{
4 | Get the list of compound IDs for a pathway
5 | }
6 | \description{
7 | Get the list of compound IDs for a pathway.
8 | }
9 | \usage{
10 | keggCompounds(pathwayID)
11 | }
12 | \arguments{
13 | \item{pathwayID}{
14 | A KEGG pathway identifier consisting of the prefix \code{map} followed by a 5-digit number.
15 | }
16 |
17 | }
18 | \value{
19 | A list of KEGG compound identifiers
20 | }
21 | \references{
22 | \url{https://www.genome.jp/kegg/pathway.html}
23 | }
24 | \author{
25 | Dan Tenenbaum, Kristina Riemer
26 | }
27 | \examples{
28 | keggCompounds("map00361")
29 | }
30 | \keyword{ compounds }
31 |
--------------------------------------------------------------------------------
/man/keggConv.Rd:
--------------------------------------------------------------------------------
1 | \name{keggConv}
2 | \alias{keggConv}
3 | \alias{conv}
4 | \alias{bconv}
5 | \title{
6 | Convert KEGG identifiers to/from outside identifiers
7 | }
8 | \description{
9 | Convert KEGG identifiers to/from outside identifiers.
10 | }
11 | \usage{
12 | keggConv(target, source, querySize = 100)
13 | }
14 | \arguments{
15 | \item{target}{
16 | A KEGG organism code (), T number, or one of the external
17 | databases \code{ncbi-gi}, \code{ncbi-geneid}, \code{ncbi-proteinid},
18 | \code{uniprot}, or
19 | (for chemical substance identifiers)
20 | \code{drug}, \code{compound}, or \code{glycan}, \code{pubchem},
21 | or \code{chebi}.
22 | }
23 |
24 | \item{source}{
25 | Same as \code{target}, but may also be a list of KEGG identifiers
26 | representing internal or external names.
27 | }
28 |
29 | \item{querySize}{
30 | Empirically, KEGG limits queries to 100 source identifiers per query.
31 | This argument enables larger queries by dividing \code{source} into
32 | sub-queries of no more than \code{querySize} identifiers.
33 | }
34 |
35 | }
36 | \value{
37 | A named character vector.
38 | }
39 | \references{
40 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
41 | }
42 | \author{
43 | Dan Tenenbaum
44 | }
45 | \examples{
46 | ## conversion from NCBI GeneID to KEGG ID for E. coli genes
47 | head(keggConv("eco", "ncbi-geneid"))
48 | head(keggConv("ncbi-geneid", "eco")) ## opposite direction
49 |
50 | ## conversion from KEGG ID to NCBI protein ID
51 | head(keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100")))
52 |
53 | ## conversion from NCBI GeneID to KEGG ID when the organism code is not known:
54 | head(keggConv("genes", "ncbi-geneid:3113320"))
55 | }
56 | \keyword{ conv }
57 |
--------------------------------------------------------------------------------
/man/keggFind.Rd:
--------------------------------------------------------------------------------
1 | \name{keggFind}
2 | \alias{keggFind}
3 | \title{
4 | Finds entries with matching query keywords or other query data in a given
5 | database
6 | }
7 | \description{
8 | Finds entries with matching query keywords or other query data in a given
9 | database.
10 | }
11 | \usage{
12 | keggFind(database, query, option = c("formula", "exact_mass",
13 | "mol_weight"))
14 | }
15 | \arguments{
16 | \item{database}{
17 | Either the name of a single KEGG database (list available via
18 | \code{\link{listDatabases}()}), a "T number" genome identifier,
19 | or a KEGG organism code (lists of both available via
20 | \code{keggList("organism")}).
21 | }
22 | \item{query}{
23 | One or more keywords, or a range of integers representing
24 | molecular weights.
25 | If \code{query} includes identifiers not known to KEGG,
26 | the results will not contain any information about those identifiers.
27 | }
28 | \item{option}{
29 | \code{Optional.} If \code{database} is \code{compound} or \code{drug},
30 | \code{option} can be \code{formula}, \code{exact_mass}, or
31 | \code{mol_weight}.
32 | Chemical formula search is a partial match irrespective of the
33 | order of atoms given.
34 | The exact mass (or molecular weight) is checked by rounding off to the
35 | same decimal place as the query data.
36 | }
37 | }
38 | \value{
39 | A named character vector.
40 | }
41 | \references{
42 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
43 | }
44 | \author{
45 | Dan Tenenbaum
46 | }
47 |
48 |
49 | \examples{
50 | res <-
51 | keggFind("genes", c("shiga", "toxin")) ## for keywords "shiga" and "toxin"
52 | length(res)
53 | head(res)
54 | res <- keggFind("genes", "shiga toxin") ## for keywords "shiga toxin"
55 | length(res)
56 | head(res)
57 | keggFind("compound", "C7H10O5", "formula") ## for chemical formula "C7H10O5"
58 | res <- keggFind("compound", "O5C7", "formula") ## for chemical formula
59 | ## containing "O5" and "C7"
60 | length(res)
61 | head(res)
62 | keggFind("compound", 174.05, "exact_mass") ## for 174.045
63 | ## <= exact mass < 174.055
64 | res <- keggFind("compound", 300:310, "mol_weight") ## for 300 <=
65 | ## molecular weight <= 310
66 | length(res)
67 | head(res)
68 | }
69 | \keyword{ find }
70 |
--------------------------------------------------------------------------------
/man/keggGet.Rd:
--------------------------------------------------------------------------------
1 | \name{keggGet}
2 | \alias{keggGet}
3 | \title{
4 | Retrieves given database entries
5 | }
6 | \description{
7 | Retrieves given database entries.
8 | }
9 | \usage{
10 | keggGet(dbentries, option = c("aaseq", "ntseq", "mol", "kcf",
11 | "image", "kgml"))
12 | }
13 | %- maybe also 'usage' for other objects documented here.
14 | \arguments{
15 | \item{dbentries}{
16 | One or more (up to a maximum of 10) KEGG identifiers.
17 | }
18 | \item{option}{
19 | \code{Optional.} Option governing the format of the output.
20 | \code{aaseq} is an amino acid sequence, \code{ntseq} is a nucleotide
21 | sequence. \code{image} returns an object which can be written
22 | to a PNG file, \code{kgml} returns a KGML document.
23 | }
24 | }
25 | \details{
26 | Retrieves all entries from the KEGG database for a set of KEGG identifiers.
27 |
28 | \code{keggGet()} can only return 10 result sets at once (this limitation
29 | is on the server side). If you supply more than 10 inputs to \code{keggGet()},
30 | \code{KEGGREST} will warn that only the first 10 results will be returned.
31 | }
32 | \value{
33 | A list wrapping a KEGG flat file.
34 | If \code{option} is \code{aaseq}, an \code{AAStringSet} object.
35 | If \code{option} is \code{ntseq}, a \code{DNAStringSet} object.
36 | If \code{option} is \code{image}, an object which can be written
37 | to a PNG file.
38 | If \code{option} is \code{kgml}, a KGML document.
39 | }
40 | \references{
41 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
42 | }
43 | \author{
44 | Dan Tenenbaum
45 | }
46 | \examples{
47 | res <- keggGet(c("cpd:C01290", "gl:G00092")) ## retrieves a compound entry
48 | ## and a glycan entry
49 | str(res)
50 | res <- keggGet(c("C01290", "G00092")) ## same as above, without prefixes
51 | str(res)
52 | res <- keggGet(c("hsa:10458", "ece:Z5100")) ## retrieves a human gene entry
53 | ## and an E.coli O157 gene entry
54 | str(res)
55 | res <- keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino
56 | ## acid sequences of a human gene and an
57 | ## E.coli O157 gene
58 | png <- keggGet("hsa05130", "image") ## retrieves the image file of a
59 | ## pathway map
60 | t <- tempfile()
61 | library(png)
62 | writePNG(png, t)
63 | res <- keggGet("hsa05130", "kgml")
64 | str(res)
65 | }
66 | \keyword{ get }
67 |
--------------------------------------------------------------------------------
/man/keggInfo.Rd:
--------------------------------------------------------------------------------
1 | \name{keggInfo}
2 | \alias{keggInfo}
3 | \alias{info}
4 | \title{
5 | Displays the current statistics of a given database
6 | }
7 | \description{
8 | Displays statistics of a given database, such as number of
9 | entries, version, release date, and source.
10 | }
11 | \usage{
12 | keggInfo(database)
13 | }
14 | \arguments{
15 | \item{database}{
16 | Either a KEGG database (list available via \code{\link{listDatabases}()}),
17 | a KEGG organism code (list available by calling \code{\link{keggList}()}
18 | with the \code{organism} argument), or a T number (list available by
19 | calling \code{\link{keggList}()} with the \code{genome} argument).
20 |
21 | }
22 | }
23 | \value{
24 | A character vector containing statistics about \code{database}.
25 | }
26 | \references{
27 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
28 | }
29 | \author{
30 | Dan Tenenbaum
31 | }
32 | \examples{
33 | res <- keggInfo("kegg") ## displays the current statistics of the KEGG database
34 | cat(res)
35 | res <- keggInfo("pathway") ## displays the number of pathway entries including both
36 | ## the reference and organism-specific pathways
37 | cat(res)
38 | res <- keggInfo("hsa") ## displays the number of gene entries for the
39 | ## KEGG organism Homo sapiens
40 | cat(res)
41 | }
42 | \keyword{ info }
43 | \keyword{ metadata }
44 |
--------------------------------------------------------------------------------
/man/keggLink.Rd:
--------------------------------------------------------------------------------
1 | \name{keggLink}
2 | \alias{keggLink}
3 | \alias{link}
4 | \title{
5 | Find related entries by using database cross-references.
6 | }
7 | \description{
8 | Find related entries by using database cross-references.
9 | }
10 | \usage{
11 | keggLink(target, source)
12 | }
13 | \arguments{
14 | \item{target}{
15 | Either the name of a single KEGG database (list available via
16 | \code{\link{listDatabases}()}), a "T number" genome identifier,
17 | or a KEGG organism code (lists of both available via
18 | \code{keggList("organism")}).
19 | }
20 | \item{source}{
21 | The same as \code{target}, but may also be one or more
22 | KEGG identifiers.
23 | }
24 | }
25 | \details{
26 | Many of the old KEGGSOAP functions whose names
27 | started with 'get', such as \code{get.pathways.by.genes} and
28 | \code{get.pathways.by.reactions},
29 | are replaced by using \code{keggLink} (see examples).
30 |
31 |
32 |
33 | }
34 | \value{
35 | A named character vector.
36 | }
37 | \references{
38 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
39 | }
40 | \author{
41 | Dan Tenenbaum
42 | }
43 | \examples{
44 | res <- keggLink("pathway", "hsa") ## KEGG pathways linked from each of
45 | ## the human genes; equivalent to 'get.pathways.by.genes' in KEGGSOAP
46 | length(res)
47 | head(res)
48 | res <- keggLink("hsa", "pathway") ## human genes linked from each of the
49 | ## KEGG pathways; equivalent to 'get.genes.by.pathway' in KEGGSOAP
50 | keggLink("pathway", c("hsa:10458", "ece:Z5100")) ## KEGG pathways
51 | ## linked from a human gene and an E. coli O157 gene
52 | res <- keggLink("hsa:126") ## LinkDB search shows all KEGG
53 | ## resources related to hsa:126
54 | head(res)
55 | }
56 | \keyword{ link }
57 |
--------------------------------------------------------------------------------
/man/keggList.Rd:
--------------------------------------------------------------------------------
1 | \name{keggList}
2 | \alias{keggList}
3 | \title{
4 | Returns a list of entry identifiers and their associated definitions for a given
5 | database or a given set of database entries.
6 | %% ~~function to do ... ~~
7 | }
8 | \description{
9 | Returns a list of entry identifiers and their associated definitions for a given
10 | database or a given set of database entries.
11 | }
12 | \usage{
13 | keggList(database, organism)
14 | }
15 | %- maybe also 'usage' for other objects documented here.
16 | \arguments{
17 | \item{database}{
18 | %% ~~Describe \code{x} here~~
19 | Either a KEGG database (list available via \code{\link{listDatabases}()}),
20 | a KEGG organism code (list available via \code{\link{keggList}()} with the
21 | \code{organism} argument), a T number (list available via
22 | \code{\link{keggList}()} with the \code{genome} argument), or a character
23 | vector of KEGG identifiers.
24 | }
25 | \item{organism}{
26 | \code{Optional.} A KEGG organism identifier (list available via
27 | \code{\link{keggList}()} with the \code{organism} argument).
28 | }
29 | }
30 | \value{
31 | A named character vector containing entry identifiers and
32 | their associated definitions.
33 | }
34 | \references{
35 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
36 | }
37 | \author{
38 | Dan Tenenbaum
39 | }
40 | \examples{
41 | res <- keggList("pathway") ## returns the list of reference pathways
42 | length(res)
43 | head(res)
44 | res <- keggList("pathway", "hsa") ## returns the list of human pathways
45 | length(res)
46 | head(res)
47 | res <- keggList("organism") ## returns the list of KEGG organisms with
48 | ## taxonomic classification
49 | nrow(res)
50 | head(res)
51 | res <- keggList("hsa") ## returns the entire list of human genes
52 | length(res)
53 | head(res)
54 | ## keggList("T01001") ## same as above
55 | keggList(c("hsa:10458", "ece:Z5100")) ## returns the list of a human gene
56 | ## and an E.coli O157 gene
57 | keggList(c("cpd:C01290","gl:G00092")) ## returns the list of a compound entry
58 | ## and a glycan entry
59 | keggList(c("C01290+G00092")) ## same as above (prefixes are not necessary)
60 | }
61 | \keyword{ list }
62 |
--------------------------------------------------------------------------------
/man/listDatabases.Rd:
--------------------------------------------------------------------------------
1 | \name{listDatabases}
2 | \alias{listDatabases}
3 | \title{
4 | Lists the KEGG databases which may be searched.
5 | }
6 | \description{
7 | Lists the KEGG databases which may be searched. In most cases,
8 | you can also use a KEGG organism name or T number (genome identifier)
9 | as a database name.
10 | }
11 | \usage{
12 | listDatabases()
13 | }
14 | \value{
15 | A character vector of database names.
16 | }
17 | \references{
18 | \url{https://www.kegg.jp/kegg/docs/keggapi.html}
19 | }
20 | \author{
21 | Dan Tenenbaum
22 | }
23 | \seealso{
24 | \code{\link{keggList}}
25 | }
26 | \examples{
27 | listDatabases()
28 | res <- keggList("organism") ## list all organisms
29 | nrow(res)
30 | head(res)
31 | res <- keggList("hsa") ## list all human genes
32 | length(res)
33 | head(res)
34 | ## keggList("T01001") ## list all human genes
35 | res <- keggList("genome") ## list all genome identifiers
36 | length(res)
37 | head(res)
38 | }
39 | \keyword{ database }
40 | \keyword{ databases }
41 |
--------------------------------------------------------------------------------
/man/mark.pathway.by.objects.Rd:
--------------------------------------------------------------------------------
1 | \name{mark.pathway.by.objects}
2 | \alias{mark.pathway.by.objects}
3 | \alias{color.pathway.by.objects}
4 |
5 | \title{Client-side interface to obtain a URL for a KEGG pathway diagram
6 | with a given set of genes marked}
7 | \description{
8 | Given a KEGG pathway id and a set of KEGG gene ids, the functions
9 | return the URL of a KEGG pathway diagram with the elements
10 | corresponding to the genes marked in red or in the specified colors.
11 | }
12 | \usage{
13 | mark.pathway.by.objects(pathway.id, object.id.list)
14 | color.pathway.by.objects(pathway.id, object.id.list,
15 | fg.color.list, bg.color.list)
16 | }
17 |
18 | \arguments{
19 | \item{pathway.id}{\code{pathway.id} a character string for a KEGG
20 | pathway id. KEGG pathway ids consist of the string path followed by
21 | a colon, a three-letter code for the organism of concern, and then
22 | a number (e. g. "path:eco00020"). The three-letter organism code
23 | consists of the first letter of the genus name and the first two
24 | letters of the species name of the scientific name of the organism
25 | of concern}
26 | \item{object.id.list}{\code{object.id.list} a vector of character
27 | strings for KEGG gene ids. KEGG gene ids normally consist of
28 | three letters followed by a colon and then a number.
29 | The three letters are the first letter of the genus
30 | name and the first two letters of the species name of the scientific
31 | name of the organism of concern (e. g. hsa:111 for Homo sapiens)}
32 | \item{fg.color.list}{\code{fg.color.list} a vector of two character
33 | strings to indicate the color for the text and border, respectively,
34 | of the objects in a pathway diagram. The strings can either be a
35 | color code like #ff0000 or a color name like yellow}
36 | \item{bg.color.list}{\code{bg.color.list} a vector of character
37 | strings of the same length as \code{object.id.list} to indicate the
38 | background color of the objects in a pathway diagram. The strings
39 | can either be a color code like #ff0000 or a color name like yellow}
40 | }
41 | \details{
42 | This function only returns the URL of the KEGG pathway diagram. Use
43 | the function \code{\link{browseURL}} to view the diagram.
44 |
45 | These functions are not part of the KEGG REST API; they are provided
46 | because they existed in \code{KEGGSOAP} and an alternative implementation
47 | was possible.
48 | }
49 | \value{
50 | This function returns a character string containing the URL.
51 | }
52 | \references{\url{https://www.kegg.jp/kegg/docs/keggapi.html}}
53 | \author{Jianhua Zhang}
54 |
55 | \seealso{\code{\link{browseURL}}}
56 | \examples{
57 | url <- mark.pathway.by.objects(
58 | "path:eco00260", c("eco:b0002", "eco:c00263")
59 | )
60 | if(interactive()){
61 | browseURL(url)
62 | }
63 | url <- color.pathway.by.objects(
64 | "path:eco00260", c("eco:b0002", "eco:c00263"),
65 | c("#ff0000", "#00ff00"),
66 | c("#ffff00", "yellow")
67 | )
68 | }
69 | \keyword{ datasets }
70 |
71 |
--------------------------------------------------------------------------------
/tests/KEGGREST_unit_tests.R:
--------------------------------------------------------------------------------
1 | BiocGenerics:::testPackage("KEGGREST")
2 |
--------------------------------------------------------------------------------
/vignettes/KEGGREST-vignette.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Accessing the KEGG REST API"
3 | date: "`r format(Sys.Date(), '%B %d, %Y')`"
4 | vignette: >
5 | %\VignetteIndexEntry{Accessing the KEGG REST API}
6 | %\VignetteEngine{knitr::rmarkdown}
7 | %\VignetteEncoding{UTF-8}
8 | output:
9 | BiocStyle::html_document:
10 | toc: true
11 | ---
12 |
13 | ```{r setup, echo=FALSE}
14 | library(knitr)
15 | options(width=80)
16 | ```
17 | ```{r wrap-hook, echo=FALSE}
18 | hook_output = knit_hooks$get('output')
19 | knit_hooks$set(output = function(x, options) {
20 | # this hook is used only when the linewidth option is not NULL
21 | if (!is.null(n <- options$linewidth)) {
22 | x = knitr:::split_lines(x)
23 | # any lines wider than n should be wrapped
24 | if (any(nchar(x) > n)) x = strwrap(x, width = n)
25 | x = paste(x, collapse = '\n')
26 | }
27 | hook_output(x, options)
28 | })
29 | ```
30 |
31 | # KEGGREST
32 |
33 | [KEGG](https://www.kegg.jp/kegg/)
34 | is a database resource for understanding high-level functions
35 | and utilities of the biological system, such as the cell, the organism
36 | and the ecosystem, from molecular-level information, especially
37 | large-scale molecular datasets generated by genome sequencing and
38 | other high-throughput experimental technologies.
39 |
40 | `KEGGREST` allows access to the
41 | [KEGG REST API](https://www.kegg.jp/kegg/docs/keggapi.html). Since
42 | KEGG disabled the KEGG SOAP server
43 | on December 31, 2012 (which means the `KEGGSOAP` package will no
44 | longer work), `KEGGREST` serves as a replacement.
45 |
46 | The interface to `KEGGREST` is simpler and in some ways more
47 | powerful than `KEGGSOAP`; however, not all the functionality
48 | that was available through the SOAP API has been exposed
49 | in the REST API. If and when more functionality is exposed
50 | on the server side, this package will be updated to take
51 | advantage of it.
52 |
53 | **Restriction: The KEGG API is provided for academic use by academic
54 | users belonging to academic institutions. See https://www.kegg.jp/kegg/rest/
55 | for more information.**
56 |
57 | ## Installation
58 |
59 | You can install `KEGGREST` from Bioconductor with:
60 |
61 | ```{r install,eval=FALSE}
62 | if (!require("BiocManager", quietly=TRUE))
63 | install.packages("BiocManager")
64 |
65 | BiocManager::install("KEGGREST")
66 | ```
67 |
68 | ## Overview
69 |
70 | The KEGG REST API is built on some simple operations:
71 | `info`, `list`, `find`, `get`, `conv`, and `link`.
72 | The corresponding `R` functions in `KEGGREST` are:
73 | `keggInfo()`, `keggList()`, `keggFind()`, `keggGet()`,
74 | `keggConv()`, and `keggLink()`.
75 |
76 |
77 | # Exploring KEGG Resources with `keggList()`
78 |
79 | KEGG exposes a number of databases. To get an idea of
80 | what is available, run `listDatabases()`:
81 |
82 | ```{r listDatabases}
83 | library(KEGGREST)
84 | listDatabases()
85 | ```
86 | You can use these databases in further queries. Note that in many
87 | cases you can also use a three- or four-letter KEGG organism code or a
88 | "T number" (genome identifier) in the same place you would use
89 | one of these database names.
90 |
91 | You can obtain the list of organisms available in KEGG with
92 | the `keggList()` function:
93 |
94 | ```{r get_organisms}
95 | org <- keggList("organism")
96 | head(org)
97 | ```
98 |
99 | From `KEGGREST`'s point of view, you've just asked KEGG
100 | to show you the name of every entry in the "organism" database.
101 |
102 | Therefore, the complete list of entities that can be
103 | queried with `KEGGREST` can be obtained as follows:
104 |
105 | ```{r list_queryables}
106 | queryables <- c(listDatabases(), org[,1], org[,2])
107 | ```
108 |
109 | You could also ask for every entry in the "hsa" (_Homo sapiens_)
110 | database as follows:
111 |
112 | ```{r query_hsa, eval=FALSE}
113 | keggList("hsa")
114 | ```
115 |
116 | # Get specific entries with `keggGet()`
117 |
118 | Once you have a list of specific KEGG identifiers, use
119 | `keggGet()` to get more information about them. Here we look up
120 | a human gene and an E. coli O157 gene:
121 |
122 | ```{r keggGet}
123 | query <- keggGet(c("hsa:10458", "ece:Z5100"))
124 | ```
125 |
126 | As expected, this returns two items:
127 |
128 | ```{r querylength}
129 | length(query)
130 | ```
131 |
132 | Behind the scenes, `KEGGREST` downloaded and parsed a KEGG
133 | [flat file](https://www.kegg.jp/kegg/rest/dbentry.html), which you
134 | can now explore:
135 |
136 | ```{r explore}
137 | names(query[[1]])
138 | query[[1]]$ENTRY
139 | query[[1]]$DBLINKS
140 | ```
141 |
142 | `keggGet()` can also return amino acid sequences as `AAStringSet` objects
143 | (from the `Biostrings` package):
144 |
145 | ```{r aaseq}
146 | keggGet(c("hsa:10458", "ece:Z5100"), "aaseq") ## retrieves amino acid sequences
147 | ```
148 |
149 | ...or `DNAStringSet` objects if `option` is `ntseq`:
150 |
151 | ```{r ntseq}
152 | keggGet(c("hsa:10458", "ece:Z5100"), "ntseq") ## retrieves nucleotide sequences
153 | ```
154 |
155 |
156 |
157 | `keggGet()` can also return images:
158 | ```{r png}
159 | png <- keggGet("hsa05130", "image")
160 | t <- tempfile()
161 | library(png)
162 | writePNG(png, t)
163 | if (interactive()) browseURL(t)
164 | ```
165 |
166 | __NOTE__: `keggGet()` can only return 10 result sets at once (this limitation
167 | is on the server side). If you supply more than 10 inputs to `keggGet()`,
168 | `KEGGREST` will warn that only the first 10 results will be returned.
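
One way to work within the 10-entry limit described above is to split a longer identifier vector into batches and call `keggGet()` once per batch. The following is a minimal sketch under stated assumptions: `chunk_ids()` is a hypothetical helper that is not part of `KEGGREST`, and the compound identifiers are generated only to have 25 inputs to split.

```r
# Hypothetical helper (not part of KEGGREST): split a vector of identifiers
# into consecutive batches of at most `size` elements.
chunk_ids <- function(ids, size = 10L) {
  if (length(ids) == 0L) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Illustrative input only: 25 compound-style identifiers.
ids <- sprintf("cpd:C%05d", 1:25)
batches <- chunk_ids(ids)
lengths(batches)

# The network calls are guarded so the sketch also runs offline.
if (interactive() && requireNamespace("KEGGREST", quietly = TRUE)) {
  # Each keggGet() call returns a list of entries; flatten one level.
  entries <- unlist(lapply(batches, KEGGREST::keggGet), recursive = FALSE)
  length(entries)
}
```

Each batch stays at or below the server-side limit, so no results should be dropped. Adding a short pause between batches (for example with `Sys.sleep()`) may be considerate toward the KEGG server.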
169 |
170 | # Search by keywords with `keggFind()`
171 |
172 | You can search for two separate keywords ("shiga" and "toxin" in this case):
173 |
174 | ```{r separate_keywords, linewidth=80}
175 | head(keggFind("genes", c("shiga", "toxin")))
176 | ```
177 |
178 | Or search for the two words together:
179 |
180 | ```{r keyphrase, linewidth=80}
181 | head(keggFind("genes", "shiga toxin"))
182 | ```
183 |
184 | Search for a chemical formula:
185 | ```{r formula}
186 | head(keggFind("compound", "C7H10O5", "formula"))
187 | ```
188 | Search for a chemical formula containing "O5" and "C7":
189 | ```{r formula2}
190 | head(keggFind("compound", "O5C7", "formula"))
191 | ```
192 |
193 | You can search for compounds with a particular exact mass:
194 |
195 | ```{r exact_mass}
196 | keggFind("compound", 174.05, "exact_mass")
197 | ```
198 |
199 | Because we've supplied a number with two decimal digits of precision,
200 | KEGG will find all compounds with exact mass between 174.045 and 174.055.
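
The rounding rule above can be made concrete with a few lines of R. This is only an illustration of the arithmetic, not KEGG's server-side implementation; `mass_window()` is a hypothetical helper introduced here.

```r
# Hypothetical helper (not part of KEGGREST): the interval of exact masses
# that round to `query` at the number of decimal places written in `query`.
mass_window <- function(query) {
  query <- as.character(query)
  # Count the digits after the decimal point; 0 if there is none.
  decimals <- nchar(sub("^[^.]*\\.?", "", query))
  half_step <- 0.5 * 10^(-decimals)
  c(lower = as.numeric(query) - half_step,
    upper = as.numeric(query) + half_step)
}

mass_window("174.05")
```

Passing the query as a string preserves trailing zeros, so `"174.050"` would describe a narrower window than `"174.05"`.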
201 |
202 | Integer ranges can be used to find compounds by molecular weight:
203 |
204 | ```{r mol_weight}
205 | head(keggFind("compound", 300:310, "mol_weight"))
206 | ```
207 |
208 | # Convert identifiers with `keggConv()`
209 |
210 | Convert between KEGG identifiers and outside identifiers.
211 |
212 | You can either specify fully qualified identifiers:
213 |
214 | ```{r conv_with_ids}
215 | keggConv("ncbi-proteinid", c("hsa:10458", "ece:Z5100"))
216 | ```
217 |
218 | ...or get the mapping for an entire species:
219 |
220 | ```{r conv_species_kegg_to_geneid}
221 | head(keggConv("eco", "ncbi-geneid"))
222 | ```
223 |
224 | Reversing the arguments does the opposite mapping:
225 |
226 | ```{r conv_species_geneid_to_kegg}
227 | head(keggConv("ncbi-geneid", "eco"))
228 |
229 | ```
230 |
231 | # Link across databases with `keggLink()`
232 |
233 | Most of the `KEGGSOAP` functions whose names started with
234 | "get", for example `get.pathways.by.genes()`, can be replaced
235 | with the `keggLink()` function. Here we query all pathways
236 | for human:
237 |
238 | ```{r keggLink}
239 | head(keggLink("pathway", "hsa"))
240 | ```
241 |
242 | ...but you can also specify one or more genes (from multiple species):
243 | ```{r keggLink2}
244 | keggLink("pathway", c("hsa:10458", "ece:Z5100"))
245 | ```
246 |
--------------------------------------------------------------------------------