├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── project.clj
├── src
│   └── edn_ld
│       ├── common.clj
│       ├── core.clj
│       ├── jena.clj
│       └── rdfxml.clj
├── test-resources
│   └── books.tsv
└── test
    └── edn_ld
        ├── core_test.clj
        ├── jena_test.clj
        └── readme_test.clj
/.gitignore:
--------------------------------------------------------------------------------
1 | pom.xml
2 | pom.xml.asc
3 | *jar
4 | /lib/
5 | /classes/
6 | /target/
7 | /checkouts/
8 | .lein-deps-sum
9 | .lein-repl-history
10 | .lein-plugins/
11 | .lein-failures
12 | .lein-env
13 | .nrepl-port
14 | .DS_Store
15 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: clojure
2 | jdk:
3 | - oraclejdk8
4 | - oraclejdk7
5 | - openjdk7
6 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, James A. Overton
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 |
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 |
14 | * Neither the name of edn-ld nor the names of its
15 | contributors may be used to endorse or promote products derived from
16 | this software without specific prior written permission.
17 |
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 |
29 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # EDN-LD
2 |
3 | [Build Status](https://travis-ci.org/ontodev/edn-ld)
4 |
5 | EDN-LD is a set of conventions and a library for working with [Linked Data (LD)](http://linkeddata.org) using [Extensible Data Notation (EDN)](https://github.com/edn-format/edn) and the [Clojure programming language](http://clojure.org). EDN-LD builds on EDN and [JSON-LD](http://json-ld.org), but is not otherwise affiliated with those projects.
6 |
7 | **[Try EDN-LD online!](http://try.edn-ld.com)**
8 |
9 | This project is in early development!
10 |
11 |
12 | ## Linked Data
13 |
14 | Linked data is an approach to working with data on the Web:
15 |
16 | - instead of tables we have graphs -- networks of data
17 | - instead of rows we have resources -- nodes in the graph
18 | - the values in our cells are also nodes -- either resources or literals: strings, numbers, dates
19 | - and instead of columns we have named relations that link nodes to form the graph
20 |
21 | Just think of your tables as big sets of row-column-cell "triples". By switching from rigid tables to flexible graphs, we can easily merge data from across the web.
22 |
23 | Linked data is simple. The tools for working with it are powerful: big Java libraries such as [Jena](https://jena.apache.org), [Sesame](http://rdf4j.org), [OWLAPI](http://owlapi.sourceforge.net), etc. Unfortunately, most of the tools are not simple.
24 |
25 | EDN-LD is a simple linked data tool.
26 |
27 |
28 | ## Install
29 |
30 | EDN-LD is a Clojure library. The easiest way to get started is to use [Leiningen](http://leiningen.org) and add this to your `project.clj` dependencies:
31 |
32 | [edn-ld "0.3.0"]
33 |
34 |
35 | ## Tutorial
36 |
37 | Try out EDN-LD with our [interactive online tutorial](http://try.edn-ld.com), or by cloning this project and starting a REPL:
38 |
39 | $ git clone https://github.com/ontodev/edn-ld.git
40 | $ cd edn-ld
41 | $ lein repl
42 | nREPL server started ...
43 | user=> (use 'edn-ld.core 'edn-ld.common)
44 | nil
45 | user=> (require '[clojure.string :as string])
46 | nil
47 | user=> "Ready!"
48 | Ready!
49 |
50 | Say we have a (very small) table of books and their authors called `books.tsv`:
51 |
52 | Title | Author
53 | ----------|-------
54 | The Iliad | Homer
55 |
56 | A common way to represent this in Clojure is as a list of maps, with the column names as the keys. We can `slurp` and split the data until we get what we want:
57 |
58 | user=> (defn split-row [row] (string/split row #"\t"))
59 | #'user/split-row
60 | user=> (defn read-tsv [path] (->> path slurp string/split-lines (drop 1) (mapv split-row)))
61 | #'user/read-tsv
62 | user=> (def rows (read-tsv "test-resources/books.tsv"))
63 | #'user/rows
64 | user=> rows
65 | [["The Iliad" "Homer"]]
66 |
67 | Now we use `zipmap` to associate keys with values:
68 |
69 | user=> (def data (mapv (partial zipmap [:title :author]) rows))
70 | #'user/data
71 | user=> data
72 | [{:title "The Iliad", :author "Homer"}]
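
The column names in the `zipmap` call are hard-coded. As a minimal sketch, assuming the first line of the TSV is a header row, the keys could be derived from that row instead. The function name `tsv->maps` and the inline sample text are hypothetical and not part of EDN-LD.

```clojure
(require '[clojure.string :as string])

(defn tsv->maps
  "Parse TSV text whose first line is a header row into a vector of maps.
  Header cells are lower-cased and turned into keywords."
  [text]
  (let [[header & rows] (map #(string/split % #"\t") (string/split-lines text))
        ks (map (comp keyword string/lower-case) header)]
    (mapv #(zipmap ks %) rows)))

(tsv->maps "Title\tAuthor\nThe Iliad\tHomer")
;; => [{:title "The Iliad", :author "Homer"}]
```

Deriving the keys from the header keeps the code in step with the file if columns are added or reordered.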
73 |
74 | We have the data in a convenient shape, but what does it mean? Well, there's some resource that has "The Iliad" as its title, and some guy named "Homer" who is the author of that resource. We also know from the context that it's a book.
75 |
76 | The first thing to do is give names to our resources. Linked data names are [IRIs](https://en.wikipedia.org/wiki/Internationalized_resource_identifier): globally unique identifiers that generalize the familiar URL you see in your browser's location bar. We can use some standard names for our relations from the [Dublin Core](http://dublincore.org) metadata standard, and we'll make up some more.
77 |
78 | Name | IRI
79 | ----------|-----------------------------------------
80 | title | `http://purl.org/dc/elements/1.1/title`
81 | author | `http://purl.org/dc/elements/1.1/author`
82 | The Iliad | `http://example.com/the-iliad`
83 | Homer | `http://example.com/Homer`
84 | book | `http://example.com/book`
85 |
86 | IRIs can be long and cumbersome, so let's define some prefixes that we can use to shorten them:
87 |
88 | Prefix | IRI
89 | -------|-----------------------------------
90 | `dc` | `http://purl.org/dc/elements/1.1/`
91 | `ex` | `http://example.com/`
92 |
93 | The `ex` prefix will be our default. We use strings for full IRIs and keywords when we're using some sort of contraction.
94 |
95 | IRI | Contraction
96 | -----------------------------------------|------------
97 | `http://purl.org/dc/elements/1.1/title` | `:dc:title`
98 | `http://purl.org/dc/elements/1.1/author` | `:dc:author`
99 | `http://example.com/the-iliad` | `:the-iliad`
100 | `http://example.com/Homer` | `:Homer`
101 | `http://example.com/book` | `:book`
102 |
103 | We'll put this naming information in a *context* map:
104 |
105 | user=> (def context {:dc "http://purl.org/dc/elements/1.1/", :ex "http://example.com/", nil :ex, :title :dc:title, :author :dc:author})
106 | #'user/context
107 |
108 | The `nil` key indicates the default prefix `:ex`. Now we can use the context to expand contractions and to contract IRIs:
109 |
110 | user=> (expand context :title)
111 | http://purl.org/dc/elements/1.1/title
112 | user=> (expand context :Homer)
113 | http://example.com/Homer
114 | user=> (contract context "http://purl.org/dc/elements/1.1/title")
115 | :title
116 | user=> (contract context "http://purl.org/dc/elements/1.1/foo")
117 | :dc:foo
118 | user=> (expand-all context data)
119 | [{"http://purl.org/dc/elements/1.1/title" "The Iliad", "http://purl.org/dc/elements/1.1/author" "Homer"}]
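
To make the lookup order concrete, here is a simplified, standalone sketch of how expansion could work. It is not the library's `expand`: the name `expand-sketch` is hypothetical, it accepts only strings and keywords, and it has no protection against a context that refers to itself.

```clojure
(require '[clojure.string :as string])

(def context
  {:dc     "http://purl.org/dc/elements/1.1/"
   :ex     "http://example.com/"
   nil     :ex
   :title  :dc:title
   :author :dc:author})

(defn expand-sketch
  "Follow context entries until a string is reached.
  Prefixed keywords such as :dc:title are split on the first colon;
  anything else falls back to the default prefix stored under nil."
  [context input]
  (cond
    ;; Strings are treated as full IRIs and returned unchanged.
    (string? input) input
    ;; A direct entry in the context is followed recursively.
    (contains? context input) (expand-sketch context (get context input))
    :else
    (let [^String n (name input)]
      (if (.contains n ":")
        (let [[prefix local] (string/split n #":" 2)]
          (str (get context (keyword prefix)) local))
        (str (expand-sketch context (get context nil)) n)))))

(expand-sketch context :title)
;; => "http://purl.org/dc/elements/1.1/title"
(expand-sketch context :Homer)
;; => "http://example.com/Homer"
```

Here `:title` resolves in two steps (`:title` to `:dc:title` to the full IRI), while `:Homer` has no entry and no prefix, so it uses the default prefix.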
120 |
121 | Sometimes we also want to *resolve* a name to an IRI. We can define a resources map from strings to IRIs or contractions:
122 |
123 | user=> (def resources {"Homer" :Homer, "The Iliad" :the-iliad})
124 | #'user/resources
125 |
126 | We should include this information in our data by assigning a special `:subject-iri` to each of our maps. We can do this one at a time with `assoc`:
127 |
128 | user=> (def book (assoc (first data) :subject-iri :the-iliad))
129 | #'user/book
130 | user=> book
131 | {:title "The Iliad", :author "Homer", :subject-iri :the-iliad}
132 |
133 | Or we can use a higher-order function to look up each title in the resources map:
134 |
135 | user=> (def books (mapv #(assoc % :subject-iri (get resources (:title %))) data))
136 | #'user/books
137 | user=> books
138 | [{:title "The Iliad", :author "Homer", :subject-iri :the-iliad}]
139 |
140 | Now it's time to convert our book data to "triples", i.e. statements about things to put in our graph. A triple consists of a subject, a predicate, and an object:
141 |
142 | - the subject is the name of a resource: an IRI
143 | - the predicate is the name of a relation: also an IRI
144 | - the object can either be an IRI or literal data.
145 |
146 | We represent an IRI with a string, or a contracted IRI with a keyword. We represent literal data as a map with special keys:
147 |
148 | - `:value` is the string value ("lexical value") of the data, e.g. "The Iliad", "100.31"
149 | - `:type` is the IRI of a data type, with `xsd:string` as the default
150 | - `:lang` is an optional language code, e.g. "en", "en-uk"
151 |
152 | The `literal` function is a convenient way to create a literal map:
153 |
154 | user=> (literal "The Iliad")
155 | {:value "The Iliad"}
156 | user=> (literal 100.31)
157 | {:value "100.31", :type :xsd:float}
158 |
159 | The `objectify` function takes a resource map and a value, and determines whether to convert the value to an IRI or a literal:
160 |
161 | user=> (objectify resources "Some string")
162 | {:value "Some string"}
163 | user=> (objectify resources "Homer")
164 | :Homer
165 |
166 | Now we can treat each map as a set of statements about a resource, and `triplify` it. Each triple is a vector with slots for subject, predicate, and object.
167 |
168 | The `triplify` function takes our resource map and a map of data that includes a `:subject-iri` key. It returns a lazy sequence of triples.
169 |
170 | user=> (def triples (triplify resources book))
171 | #'user/triples
172 | user=> (vec triples)
173 | [[:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer]]
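
Note that the title stays a literal even though "The Iliad" appears in the resources map: otherwise the book would point at itself. A rough standalone sketch of that behaviour follows. The name `triplify-sketch` is hypothetical, and unlike the library it turns every non-resource value into a plain string literal without a datatype.

```clojure
(def resources {"Homer" :Homer, "The Iliad" :the-iliad})

(defn triplify-sketch
  "Turn a data map with a :subject-iri key into [subject predicate object] triples.
  Values found in the resource map become resources, unless that would point
  the subject at itself; everything else becomes a plain literal map."
  [resources data]
  (let [subject (:subject-iri data)]
    (for [[predicate value] (dissoc data :subject-iri)
          :let [resource (get resources value)]]
      [subject
       predicate
       (if (and resource (not= resource subject))
         resource
         {:value (str value)})])))

(set (triplify-sketch resources
                      {:title "The Iliad" :author "Homer" :subject-iri :the-iliad}))
;; => #{[:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer]}
```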
174 |
175 | You'll notice that the subject `:the-iliad` is repeated here. With a larger set of triples the redundancy will be greater. Instead we can use a nested data structure:
176 |
177 | user=> (def subjects (subjectify triples))
178 | #'user/subjects
179 | user=> subjects
180 | {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}
181 |
182 | From the inside out, it works like this:
183 |
184 | - object-set: the set of objects with the same subject and predicate
185 | - predicate-map: a map from predicate IRIs to object-sets
186 | - subject-map: a map from subject IRIs to predicate-maps
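
The nesting can be sketched with plain Clojure. This is an illustration only: the function name `subject-map` and the `:the-odyssey` triple are made up for the example.

```clojure
(def triples
  [[:the-iliad :title {:value "The Iliad"}]
   [:the-iliad :author :Homer]
   [:the-odyssey :author :Homer]])

(defn subject-map
  "Group triples into {subject {predicate #{objects}}}."
  [triples]
  (reduce
   (fn [acc [subject predicate object]]
     ;; fnil supplies an empty set the first time a subject/predicate pair is seen.
     (update-in acc [subject predicate] (fnil conj #{}) object))
   {}
   triples))

(subject-map triples)
;; => {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}},
;;     :the-odyssey {:author #{:Homer}}}

;; All objects for one subject and predicate:
(get-in (subject-map triples) [:the-iliad :author])
;; => #{:Homer}
```

Because the objects are kept in a set, adding the same triple twice leaves the structure unchanged.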
187 |
188 | We work with these data structures like any other Clojure data, using `merge`, `assoc`, `update`, and the rest of the standard Clojure toolkit:
189 |
190 | user=> (def context+ (merge default-context context))
191 | #'user/context+
192 | user=> (def subjects+ (assoc-in subjects [:the-iliad :rdf:type] #{:book}))
193 | #'user/subjects+
194 | user=> (def triples+ (conj triples [:the-iliad :rdf:type :book]))
195 | #'user/triples+
196 |
197 | Now, we can write to standard linked data formats, such as Turtle:
198 |
199 | user=> (def prefixes (assoc (get-prefixes context) :rdf rdf :xsd xsd))
200 | #'user/prefixes
201 | user=> (def expanded-triples (map #(expand-all context+ %) triples+))
202 | #'user/expanded-triples
203 | user=> (edn-ld.jena/write-triple-string prefixes expanded-triples)
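204 | @prefix ex: <http://example.com/> .
205 | @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
206 | @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
207 | @prefix dc: <http://purl.org/dc/elements/1.1/> .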
208 |
209 | ex:the-iliad a ex:book ;
210 | dc:author ex:Homer ;
211 | dc:title "The Iliad"^^xsd:string .
212 |
213 | One more thing before we're done: *named graphs*. A graph is just a set of triples. When we want to talk about a particular graph, we give it a name: an IRI, of course. Then we can talk about sets of named graphs when we want to compare them, merge them, etc. The official name for a set of graphs is an "[RDF dataset](http://www.w3.org/TR/rdf11-concepts/#section-dataset)". A dataset includes a "default graph" with no name.
214 |
215 | By adding the name of a graph, our *triples* become *quads* ("quadruples"). We define a quad and some new functions to handle them.
216 |
217 | user=> (def library [(assoc book :graph-iri :library)])
218 | #'user/library
219 | user=> library
220 | [{:title "The Iliad", :author "Homer", :subject-iri :the-iliad, :graph-iri :library}]
221 | user=> (def quads (quadruplify-all resources library))
222 | #'user/quads
223 | user=> (vec quads)
224 | [[:library :the-iliad :title {:value "The Iliad"}] [:library :the-iliad :author :Homer]]
225 | user=> (graphify quads)
226 | {:library {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}}
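
Since a quad is a triple with a graph name in front, pulling the triples of one graph back out is a simple filter. The sketch below assumes the default graph is represented by a `nil` graph name; the function `triples-in-graph`, the `:name` predicate, and the third quad are hypothetical and only serve the example.

```clojure
(def quads
  [[:library :the-iliad :title {:value "The Iliad"}]
   [:library :the-iliad :author :Homer]
   [nil :Homer :name {:value "Homer"}]])

(defn triples-in-graph
  "Return the triples of the graph named `graph`; nil selects the default graph."
  [quads graph]
  (for [[g subject predicate object] quads
        :when (= g graph)]
    [subject predicate object]))

(vec (triples-in-graph quads :library))
;; => [[:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer]]
(vec (triples-in-graph quads nil))
;; => [[:Homer :name {:value "Homer"}]]
```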
227 |
228 |
229 | ## More
230 |
231 | - Conference paper about EDN-LD ([PDF](https://github.com/ontodev/icbo2015-edn-ld/blob/master/edn_ld.pdf), [source](https://github.com/ontodev/icbo2015-edn-ld))
232 |
233 |
234 | ## Change Log
235 |
236 | - 0.3.0
237 | - update to Jena 3.0.1
238 | - 0.2.2
239 | - fix bug in blank node handling
240 | - 0.2.1
241 | - fix bug in edn-ld.jena/make-node
242 | - 0.2.0
243 | - use Apache Jena for reading and writing
244 | - fix `triplify` functions to use `:subject-iri` key
245 | - add `quadruplify` and `graphify` functions, using `:graph-iri` key
246 | - rename `squash` functions to `flatten`
247 | - fix `flatten` functions
248 | - many more unit tests
249 | - prefer Triples to FlatTriples
250 | - 0.1.0
251 | - first release
252 |
253 |
254 | ## To Do
255 |
256 | - finish streaming RDFXML reader and writer
257 | - ClojureScript support? Would require different libraries for reading and writing
258 |
259 |
260 | ## License
261 |
262 | Copyright © 2015 James A. Overton
263 |
264 | Distributed under the BSD 3-Clause License.
265 |
--------------------------------------------------------------------------------
/project.clj:
--------------------------------------------------------------------------------
1 | (defproject edn-ld "0.3.0"
2 | :description "A simple linked data tool"
3 | :url "https://github.com/ontodev/edn-ld"
4 | :license {:name "BSD 3-Clause License"
5 | :url "http://opensource.org/licenses/BSD-3-Clause"}
6 | :dependencies [[org.clojure/clojure "1.7.0-beta3"]
7 | [prismatic/schema "0.4.2"]
8 | [org.apache.jena/jena-arq "3.0.1"]
9 | [org.codehaus.woodstox/woodstox-core-asl "4.3.0"]]
10 | :plugins [[lein-cljfmt "0.1.10"]])
11 |
--------------------------------------------------------------------------------
/src/edn_ld/common.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.common)
2 |
3 | (def rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
4 |
5 | (def rdfs "http://www.w3.org/2000/01/rdf-schema#")
6 |
7 | (def xsd "http://www.w3.org/2001/XMLSchema#")
8 |
9 | (def owl "http://www.w3.org/2002/07/owl#")
10 |
11 | (def default-prefixes {:rdf rdf :rdfs rdfs :xsd xsd :owl owl})
12 |
13 | (def default-context default-prefixes)
14 |
--------------------------------------------------------------------------------
/src/edn_ld/core.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.core
2 | (:require [clojure.string :as string] [clojure.walk] ; clojure.walk/prewalk is used by expand-all
3 | [clojure.test :refer :all]
4 | [schema.core :as s]
5 | [edn-ld.common :refer [rdf xsd]]))
6 |
7 | ;; EDN-LD uses [Prismatic Schema](https://github.com/Prismatic/schema)
8 | ;; to specify and validate the data structures that we use.
9 | ;; Clojure is not a statically typed language like Java.
10 | ;; Instead we build our data structures from a rich set of primitives,
11 | ;; and use schemas to ensure that our data has the right shape.
12 |
13 |
14 | ;; # Identifiers
15 |
16 | ;; Linked data consists of a network of links between resources
17 | ;; named by Internationalized Resource Identifiers (IRIs).
18 | ;; IRIs extend the more familiar URL and URI to use Unicode characters.
19 | ;; [RFC3987](http://tools.ietf.org/html/rfc3987)
20 | ;; provides a grammar for parsing IRIs,
21 | ;; but for now we will cut corners and allow any String.
22 |
23 | (def IRI s/Str)
24 |
25 | ;; IRIs provide explicit, globally unique names for things.
26 | ;; Anything that we want to talk about should have an IRI.
27 | ;; But sometimes all we need is a local, implicit link, without a global name.
28 | ;; In this case we can use a blank node.
29 | ;; We'll use the [Turtle](http://www.w3.org/TR/turtle/#BNodes) syntax
30 | ;; and say that a blank node is a string that starts with "_:"
31 |
32 | (def BlankNode #"^_:.*$")
33 |
34 | ;; IRIs can be long and cumbersome to work with,
35 | ;; so we'll define a Contraction to be a keyword that we can expand to an IRI.
36 |
37 | (def Contraction s/Keyword)
38 |
39 | ;; To move between IRIs and Contractions we'll use a PrefixMap or a Context.
40 | ;; A PrefixMap is just a map from prefix keywords to IRIs.
41 |
42 | (def PrefixMap {(s/maybe s/Keyword) IRI})
43 |
44 | ;; A Context is just a map from Contractions to Contractions or IRIs.
45 |
46 | (def Context {(s/maybe Contraction) (s/either Contraction IRI)})
47 |
48 | ;; Contexts can be recursive, so they're more convenient to use,
49 | ;; but when specifying output formats we'll need to use a PrefixMap.
50 | ;; To move from a Context to a PrefixMap, this function is usually sufficient:
51 |
52 | (defn get-prefixes
53 | "Given a Context, return a PrefixMap
54 | by removing pairs where the value is not a string."
55 | [context]
56 | (->> context
57 | (filter #(string? (val %)))
58 | (into {})))
59 |
60 | ;; We'll expand a contracted IRI in a Context by recursively looking up keys.
61 | ;; Since the recursion should not be very deep, we won't bother using `loop`.
62 | ;; Since Contractions are all keywords, we just return any other type of input.
63 | ;; We'll ignore some special keywords used later for Literals.
64 | ;; If the Contraction contains a colon (:),
65 | ;; then we'll split it and look up the prefix part.
66 | ;; E.g. `:rdfs:label` will resolve the prefix `:rdfs` and then append `label`.
67 | ;; Be careful not to build a loop into your Context!
68 | ;; For example: `(expand {:foo :foo} :foo)`
69 |
70 | (def reserved-keywords #{:value :type :lang :subject-iri :graph-iri})
71 |
72 | (defn reserved?
73 | "Return true if the input is a reserved keyword."
74 | [input]
75 | (contains? reserved-keywords input))
76 |
77 | (defn expand
78 | "Given a Context and some input (usually a Contraction),
79 | try to return an IRI string.
80 | If the input is not a keyword then just return it;
81 | if the input is a key in the Context then return the expanded value;
82 | if the input has a prefix in the Context then return the joined value;
83 | otherwise use the default prefix."
84 | [context input]
85 | (try
86 | (cond
87 | (not (keyword? input))
88 | input
89 | (reserved? input)
90 | input
91 | (find context input)
92 | (expand context (get context input))
93 | (.contains (name input) ":")
94 | (let [[prefix local] (string/split (name input) #":" 2)]
95 | (str (get context (keyword prefix)) local))
96 | :else
97 | (str (expand context (get context nil)) (name input)))
98 | (catch StackOverflowError e input)))
99 |
100 | (defn expand-all
101 | "Given an optional Context and some collection,
102 | try to expand all Contractions in the collection,
103 | and return the updated collection."
104 | ([coll]
105 | (expand-all
106 | {:rdf:langString (str rdf "langString")
107 | :xsd:string (str xsd "string")}
108 | coll))
109 | ([context coll]
110 | (clojure.walk/prewalk (partial expand context) coll)))
111 |
112 | ;; Contracting an IRI is a little trickier.
113 | ;; First we consider the case where the IRI is exactly a value in the Context.
114 | ;; So we expand and reverse the context to map from IRIs to Contractions,
115 | ;; then look up keys in the reversed Context.
116 |
117 | ;; Since these functions are likely to be called a lot on small inputs,
118 | ;; we'll use `memoize` to trade space for time.
119 |
120 | (defn reverse-context
121 | "Given a context map from prefixes to IRIs,
122 | return a map from IRIs to prefixes."
123 | [context]
124 | (->> context
125 | (map (juxt #(expand context (val %)) key))
126 | (into {})))
127 |
128 | (def memoized-reverse-context (memoize reverse-context))
129 |
130 | ;; Second we consider the case where the IRI starts with a prefix.
131 | ;; We'll use the longest prefix we can find,
132 | ;; which requires us to sort them alphanumerically and then by length.
133 |
134 | (defn sort-prefixes
135 | "Given a Context,
136 | return a sequence of (IRI prefix) pairs, from longest IRI to shortest."
137 | [context]
138 | (->> context
139 | (map (juxt #(expand context (val %)) key))
140 | sort
141 | (sort-by (comp count first) >)))
142 |
143 | (def memoized-sort-prefixes (memoize sort-prefixes))
144 |
145 | ;; The `get-prefixed` function uses lazy sequences,
146 | ;; so it avoids mapping and filtering more entries than it needs.
147 |
148 | (defn get-prefixed
149 | "Given a Context and an input (usually an IRI string),
150 | try to return a Contraction using the longest prefix."
151 | [context input]
152 | (->> context
153 | memoized-sort-prefixes
154 | (filter #(.startsWith input (first %)))
155 | (map
156 | (fn [[uri prefix]]
157 | (string/replace-first
158 | input
159 | uri
160 | (if prefix (str (name prefix) ":") ""))))
161 | (map keyword)
162 | first))
163 |
164 | ;; Now we define our `contract` function to handle both cases.
165 |
166 | (defn contract
167 | "Given a Context and an input (usually an IRI string),
168 | try to return a Contraction.
169 | If the input is not a string, just return it;
170 | if the input exactly matches a value in the context map, return the key;
171 | otherwise try to use the longest matching prefix."
172 | [context input]
173 | (cond
174 | (not (string? input))
175 | input
176 | (find (memoized-reverse-context context) input)
177 | (get (memoized-reverse-context context) input)
178 | (get-prefixed context input)
179 | (get-prefixed context input)
180 | :else
181 | input))
182 |
183 |
184 | ;; # Literals
185 |
186 | ;; We can also link things to literal data, such as strings and numbers.
187 | ;; We represent literals as a map with special keys.
188 | ;;
189 | ;; - :value is the lexical value of the data, and must be present
190 | ;; - :type is an IRI specifying the type of data
191 | ;; - :lang is a code for the language of the data, which must conform to
192 | ;; [BCP 47](http://tools.ietf.org/html/bcp47#section-2.2.9)
193 | ;;
194 | ;; Again, we'll cut corners for now and allow any string to be a language tag.
195 | ;; If the :type is xsd:string we won't include it.
196 | ;; If the :lang key is present, then the :type must be rdf:langString.
197 | ;; So we have three cases:
198 |
199 | (def Lexical s/Str)
200 |
201 | (def Datatype IRI)
202 |
203 | (def Lang s/Str)
204 |
205 | (def DefaultLiteral {:value Lexical}) ; implicit :type :xsd:string
206 |
207 | (def TypedLiteral {:value Lexical :type Datatype})
208 |
209 | (def LangLiteral
210 | {:value Lexical
211 | (s/optional-key :type) (s/enum (str rdf "langString") :rdf:langString)
212 | :lang Lang})
213 |
214 | (def Literal (s/either DefaultLiteral LangLiteral TypedLiteral))
215 |
216 | ;; For convenience, we'll define a multimethod that takes a Clojure value
217 | ;; and returns its datatype IRI.
218 | ;; The default value is xsd:string.
219 | ;; You can extend this multimethod as desired: http://clojure.org/multimethods
220 |
221 | (defmulti get-type
222 | "Given a value, return a best guess at its RDF datatype."
223 | class)
224 |
225 | (defmethod get-type :default [_] :xsd:string)
226 |
227 | (defmethod get-type String [_] :xsd:string)
228 |
229 | (defmethod get-type Integer [_] :xsd:integer)
230 |
231 | (defmethod get-type Long [_] :xsd:integer)
232 |
233 | (defmethod get-type Float [_] :xsd:float)
234 |
235 | (defmethod get-type Double [_] :xsd:float)
236 |
237 | ;; Now we define a convenience function to create a Literal
238 | ;; with an explicit or implicit type.
239 |
240 | (defn literal
241 | "Given a value and an optional type or language tag, return a Literal.
242 | If the second argument starts with '@', consider it a language tag,
243 | otherwise consider it a type IRI."
244 | ([value] (literal (str value) (get-type value)))
245 | ([value type-or-lang]
246 | (if (.startsWith (str type-or-lang) "@")
247 | (literal value nil (.substring (str type-or-lang) 1))
248 | (literal value type-or-lang nil)))
249 | ([value type lang]
250 | (cond
251 | lang
252 | {:value value
253 | :lang lang}
254 | (= type :xsd:string)
255 | {:value value}
256 | type
257 | {:value value
258 | :type type}
259 | :else
260 | (throw (Exception. (format "'%s' is not a valid type" type))))))
261 |
262 |
263 | ;; # Triples
264 |
265 | ;; A triple contains a Subject, a Predicate, and an Object.
266 | ;; It's a statement that asserts that the Subject stands in a relationship
267 | ;; to the Object as specified by the Predicate.
268 | ;; A triple is also a directed edge in a graph,
269 | ;; forming a link from the Subject to the Object.
270 |
271 | ;; A Subject must be a resource, either named with an IRI or anonymous
272 | ;; with a BlankNode.
273 |
274 | (def ExpandedSubject (s/either IRI BlankNode))
275 |
276 | (def ContractedSubject (s/either IRI BlankNode Contraction))
277 |
278 | ;; A Predicate must be a named resource, an IRI.
279 |
280 | (def ExpandedPredicate IRI)
281 |
282 | (def ContractedPredicate (s/either IRI Contraction))
283 |
284 | ;; An Object can either be a resource (IRI or BlankNode) or a Literal.
285 |
286 | (def ExpandedObject (s/either IRI BlankNode Literal))
287 |
288 | (def ContractedObject (s/either IRI BlankNode Contraction Literal))
289 |
290 | ;; Since an Object can be any one of these types,
291 | ;; we define a ResourceMap as a map from any value to an IRI or Contraction,
292 | ;; and then an `objectify` function that looks the input up in the ResourceMap
293 | ;; and returns a Literal if the lookup fails.
294 |
295 | (def ResourceMap {s/Any (s/either IRI Contraction)})
296 |
297 | (defn objectify
298 | "Given an optional ResourceMap and an input value,
299 | return the resource if possible, otherwise a Literal."
300 | ([input]
301 | (objectify nil input))
302 | ([resource-map input]
303 | (cond
304 | (nil? input)
305 | nil
306 | (keyword? input)
307 | input
308 | (find resource-map input)
309 | (get resource-map input)
310 | :else
311 | (literal input))))
312 |
313 | ;; Now we can define triples:
314 |
315 | (def ExpandedTriple [ExpandedSubject ExpandedPredicate ExpandedObject])
316 | (def ExpandedTriples [ExpandedTriple])
317 |
318 | (def ContractedTriple [ContractedSubject ContractedPredicate ContractedObject])
319 | (def ContractedTriples [ContractedTriple])
320 |
321 | (def Triple ContractedTriple)
322 | (def Triples ContractedTriples)
323 |
324 | ;; If we want to write our triples to a table of strings
325 | ;; we can use the FlatTriple format,
326 | ;; which is just a sequence with three, four, or five values:
327 | ;;
328 | ;; - three values when the object is not a Literal
329 | ;; - four values when the object is a TypedLiteral
330 | ;; (to avoid ambiguity, we always specify the type, even if it's xsd:string)
331 | ;; - five values when the object is a LangLiteral
332 | ;; (in which case the type must be rdf:langString)
333 | ;;
334 | ;; FlatTriples are convenient for streaming and working with lazy sequences.
335 |
336 | (def FlatTriple
337 | (s/either
338 | [ContractedSubject ContractedPredicate ContractedSubject]
339 | [ContractedSubject ContractedPredicate Lexical Datatype]
340 | [ContractedSubject ContractedPredicate Lexical Datatype Lang]))
341 |
342 | (def FlatTriples [FlatTriple])
343 |
344 | ;; The most interesting part of EDN-LD
345 | ;; is converting from general EDN data to triples.
346 | ;; Now we define some functions to make that easy.
347 | ;; We rely on another convention: the special `:subject-iri` key.
348 | ;; Our `triplify` and `triplify-all` functions
349 | ;; expect their input maps to contain a `:subject-iri` key
350 | ;; that will be used to specify the subject of the triple.
351 | ;; The `:subject-iri` key is not treated as a predicate.
352 |
353 | (defn flatten-literal
354 | "Given a Literal map, return a flat vector: [value type] (type defaults to :xsd:string),
355 | or [value :rdf:langString lang] when the map has a :lang."
356 | [{:keys [value type lang] :as literal}]
357 | (cond
358 | lang
359 | [value :rdf:langString lang]
360 | type
361 | [value type]
362 | value
363 | [value :xsd:string]
364 | :else
365 | (throw (Exception. "Literal map must have a :value."))))
366 |
367 | (defmulti flatten-triples
368 | "Given a Subject, a Predicate, and an Object,
369 | return a sequence of FlatTriples (usually containing just one FlatTriple)."
370 | (fn [subject predicate object] (class object)))
371 |
372 | (defmethod flatten-triples String
373 | [subject predicate object]
374 | [[subject predicate object]])
375 |
376 | (defmethod flatten-triples clojure.lang.Keyword
377 | [subject predicate object]
378 | [[subject predicate object]])
379 |
380 | (defmethod flatten-triples java.util.Map
381 | [subject predicate object]
382 | [(into [subject predicate] (flatten-literal object))])
383 |
384 | (defn triplify-one
385 | "Given an optional ResourceMap, a Subject, a Predicate, and an Object,
386 | return a Triple.
387 | Tries to avoid circular references where subject and object are the same."
388 | ([subject predicate object]
389 | (triplify-one nil subject predicate object))
390 | ([resources subject predicate object]
391 | (if (= subject (objectify resources object))
392 | [subject predicate (objectify nil object)]
393 | [subject predicate (objectify resources object)])))
394 |
395 | (defn triplify
396 | "Given an optional ResourceMap and a map of data
397 | that has a :subject-iri key,
398 | return a lazy sequence of Triples."
399 | ([input-map]
400 | (triplify nil input-map))
401 | ([resources input-map]
402 | (->> input-map
403 | (map (juxt (constantly (:subject-iri input-map)) key val))
404 | ; remove special keys :subject-iri and :graph-iri
405 | (remove #(contains? #{:subject-iri :graph-iri} (second %)))
406 | (map (partial apply triplify-one resources)))))
407 |
408 | (defn triplify-all
409 | "Given an optional ResourceMap and a sequence of input maps
410 | where each map has a :subject-iri key,
411 | return a lazy sequence of Triples."
412 | ([input-maps]
413 | (triplify-all nil input-maps))
414 | ([resources input-maps]
415 | (mapcat (partial triplify resources) input-maps)))
416 |
417 | ;; Triples are convenient for streaming, but not for everything.
418 | ;; They can be quite redundant.
419 | ;; We also define a nested data structure called a SubjectMap.
420 | ;; From the inside out, it works like this:
421 | ;;
422 | ;; - ObjectSet: the set of objects with the same subject and predicate
423 | ;; - PredicateMap: a map from predicate IRIs to ObjectSets
424 | ;; - SubjectMap: a map from subject IRIs to PredicateMaps
425 |
426 | (def ObjectSet #{ContractedObject})
427 |
428 | (def PredicateMap {ContractedPredicate ObjectSet})
429 |
430 | (def SubjectMap {ContractedSubject PredicateMap})
431 |
432 | ;; The `subjectify` function rolls a sequence of Triples into a SubjectMap.
433 |
434 | (defn subjectify
435 | "Given a sequence of Triples, return a SubjectMap."
436 | [triples]
437 | (reduce
438 | (fn [coll [subject predicate object datatype lang]]
439 | (update-in
440 | coll
441 | [subject predicate]
442 | (fnil conj #{})
443 | (if datatype
444 | (literal object datatype lang)
445 | object)))
446 | nil
447 | triples))
448 |
449 | ;; We can also go the other way, from SubjectMap to Triples.
450 |
451 | (defn flatten-subjects
452 | "Given a SubjectMap, return a lazy sequence of Triples."
453 | [subject-map]
454 | (apply
455 | concat
456 | (for [[subject predicate-map] subject-map
457 | [predicate object-set] predicate-map
458 | object object-set]
459 | (flatten-triples subject predicate object))))
460 |
461 | ;; # Quads
462 |
463 | ;; A graph is a set of triples.
464 | ;; If we want to talk about a graph, we give it a name: an IRI, of course.
465 | ;; Unlike other names, we allow a GraphName to be nil.
466 |
467 | (def ExpandedGraphName (s/maybe (s/either IRI BlankNode)))
468 |
469 | (def ContractedGraphName (s/maybe (s/either IRI BlankNode Contraction)))
470 |
471 | ;; When we add the name of a graph to a triple we get a "quad".
472 |
473 | (def ExpandedQuad
474 | [ExpandedGraphName ExpandedSubject ExpandedPredicate ExpandedObject])
475 |
476 | (def ContractedQuad
477 | [ContractedGraphName ContractedSubject ContractedPredicate ContractedObject])
478 |
479 | ;; We also define FlatQuads.
480 | ;; WARNING: A FlatQuad can have the same length as a FlatTriple!
481 | ;; We suggest that you stick to either Triples or Quads to avoid ambiguity.
482 |
483 | (def FlatQuad
484 | (s/either
485 | [ContractedGraphName ContractedSubject ContractedPredicate
486 | ContractedSubject]
487 | [ContractedGraphName ContractedSubject ContractedPredicate Lexical Datatype]
488 | [ContractedGraphName ContractedSubject ContractedPredicate Lexical Datatype
489 | Lang]))
490 |
491 | (def FlatQuads [FlatQuad])
492 |
493 | ;; Now we define `quadruplify` functions,
494 | ;; adding a new special key: `:graph-iri`.
495 |
496 | (defn quadruplify-one
497 | "Given an optional ResourceMap, a GraphName, a Subject, a Predicate,
498 | and an Object, return a Quad.
499 | Tries to avoid circular references where subject and object are the same."
500 | ([graph subject predicate object]
501 | (quadruplify-one nil graph subject predicate object))
502 | ([resources graph subject predicate object]
503 | (into [graph] (triplify-one resources subject predicate object))))
504 |
505 | (defn quadruplify
506 | "Given an optional ResourceMap and a map of data
507 | that includes :subject-iri and :graph-iri keys,
508 | return a lazy sequence of Quads."
509 | ([input-map]
510 | (quadruplify nil input-map))
511 | ([resources input-map]
512 | (->> input-map
513 | (map (juxt (constantly (:graph-iri input-map))
514 | (constantly (:subject-iri input-map))
515 | key
516 | val))
517 | ; remove special keys :subject-iri and :graph-iri
518 | (remove #(contains? #{:subject-iri :graph-iri} (nth % 2)))
519 | (map (partial apply quadruplify-one resources)))))
520 |
521 | (defn quadruplify-all
522 | "Given an optional ResourceMap and a sequence of input maps
523 | where each map has :subject-iri and :graph-iri keys,
524 | return a lazy sequence of Quads."
525 | ([input-maps]
526 | (quadruplify-all nil input-maps))
527 | ([resources input-maps]
528 | (mapcat (partial quadruplify resources) input-maps)))
529 |
530 |
531 | ;; We represent a collection of named graphs as one more layer of maps
532 | ;; with GraphNames as keys and SubjectMaps as values.
533 | ;; The "nil" key indicates the default graph.
534 |
535 | (def GraphMap {ContractedGraphName SubjectMap})
536 |
537 | (defn graphify
538 | "Given a sequence of Quads, return a GraphMap."
539 | [quads]
540 | (reduce
541 | (fn [coll [graph subject predicate object datatype lang]]
542 | (update-in
543 | coll
544 | [graph subject predicate]
545 | (fnil conj #{})
546 | (if datatype
547 | (literal object datatype lang)
548 | object)))
549 | nil
550 | quads))
551 |
552 | (defn flatten-graphs
553 | "Given a GraphMap, return a lazy sequence of Quads."
554 | [graph-map]
555 | (apply
556 | concat
557 | (for [[graph subject-map] graph-map
558 | [subject predicate-map] subject-map
559 | [predicate object-set] predicate-map
560 | object object-set]
561 | (map (partial concat [graph])
562 | (flatten-triples subject predicate object)))))
563 |
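564 | ;; The following is a minimal usage sketch, not part of the library API.
565 | ;; It chains the functions defined above. The expected values are taken from
566 | ;; the assertions in test/edn_ld/core_test.clj; the keywords :graph, :subject,
567 | ;; :predicate, and :object are placeholder names chosen for illustration.
568 | ;; The forms sit inside `comment`, so nothing runs when this file is loaded.
569 |
570 | (comment
571 |   ;; An input map becomes a sequence of Triples.
572 |   (triplify {:subject-iri :subject, :predicate :object})
573 |   ;; => ([:subject :predicate :object])
574 |
575 |   ;; FlatTriples roll up into a SubjectMap.
576 |   (subjectify [[:subject :predicate :object]
577 |                [:subject :predicate "Object" :xsd:string]])
578 |   ;; => {:subject {:predicate #{:object {:value "Object"}}}}
579 |
580 |   ;; With a :graph-iri key, the same input map becomes a sequence of Quads.
581 |   (quadruplify {:subject-iri :subject, :graph-iri :graph, :predicate :object})
582 |   ;; => ([:graph :subject :predicate :object])
583 |
584 |   ;; FlatQuads roll up into a GraphMap.
585 |   (graphify [[:graph :subject :predicate :object]
586 |              [:graph :subject :predicate "Object" :xsd:string]])
587 |   ;; => {:graph {:subject {:predicate #{:object {:value "Object"}}}}}
588 |   )
589 |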
--------------------------------------------------------------------------------
/src/edn_ld/jena.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.jena
2 | (:require [clojure.string :as string]
3 | [clojure.java.io :as io]
4 | [edn-ld.common :refer [xsd default-prefixes]])
5 | (:import (java.io StringReader StringWriter)
6 | (org.apache.jena.graph Triple Node_URI Node_Blank Node_Literal)
7 | (org.apache.jena.sparql.core Quad)
8 | (org.apache.jena.riot.system StreamRDF)
9 | (org.apache.jena.rdf.model ModelFactory RDFNode AnonId)
10 | (org.apache.jena.query DatasetFactory)
11 | (org.apache.jena.datatypes BaseDatatype)
12 | (org.apache.jena.riot RDFDataMgr RDFLanguages Lang)))
13 |
14 |
15 | ;; # Apache Jena
16 |
17 | (defmulti read-node
18 | "Given a Jena Node, return an EDN-LD node."
19 | class)
20 |
21 | (defmethod read-node :default
22 | [node]
23 | nil)
24 |
25 | (defmethod read-node Node_URI
26 | [node]
27 | (.getURI node))
28 |
29 | (defmethod read-node Node_Blank
30 | [node]
31 | (str "_:" (.getLabelString (.getBlankNodeId node))))
32 |
33 | (defmethod read-node Node_Literal
34 | [node]
35 | (let [value (.getLiteralLexicalForm node)
36 | type (.getLiteralDatatype node)
37 | type (when type (.getURI type))
38 | lang (.getLiteralLanguage node)]
39 | (merge
40 | {:value value}
41 | (when type
42 | (when (not= type (str xsd "string"))
43 | {:type type}))
44 | (when-not (string/blank? lang)
45 | {:lang lang}))))
46 |
47 | (defmethod read-node RDFNode
48 | [node]
49 | (cond
50 | (.isURIResource node)
51 | (.getURI node)
52 | (.isAnon node)
53 | (str "_:" (.getLabelString (.getId (.asResource node))))
54 | (.isLiteral node)
55 | (let [value (.getLiteralLexicalForm node)
56 | type (.getLiteralDatatype node)
57 | type (when type (.getURI type))
58 | lang (.getLiteralLanguage node)]
59 | (merge
60 | {:value value}
61 | (when type
62 | (when (not= type (str xsd "string"))
63 | {:type type}))
64 | (when-not (string/blank? lang)
65 | {:lang lang})))))
66 |
67 | (defmulti make-node
68 | "Given a model and an EDN-LD node, return a Jena Node."
69 | (fn [model node] (class node)))
70 |
71 | (defmethod make-node String
72 | [model node]
73 | (if (.startsWith node "_:")
74 | (.createResource model (AnonId. (.substring node 2)))
75 | (.createResource model node)))
76 |
77 | (defmethod make-node java.util.Map
78 | [model {:keys [value type lang] :as node}]
79 | (cond
80 | lang
81 | (.createLiteral model value lang)
82 | type
83 | (.createTypedLiteral model value (BaseDatatype. type))
84 | :else
85 | (.createTypedLiteral model value (BaseDatatype. (str xsd "string")))))
86 |
87 | (defn get-model
88 | "Given a PrefixMap and ExpandedTriples,
89 | return a model with the namespace prefixes and triples added."
90 | [prefixes triples]
91 | (let [model (ModelFactory/createDefaultModel)]
92 | (doseq [[prefix iri] (filter #(string? (val %)) prefixes)]
93 | (.setNsPrefix model (name prefix) iri))
94 | (doseq [[subject predicate object] triples]
95 | (.add
96 | model
97 | (make-node model subject)
98 | (.createProperty model predicate)
99 | (make-node model object)))
100 | model))
101 |
102 | (defn get-triple-map
103 | "Given Quads, return a map from GraphNames to vectors of Triples."
104 | [quads]
105 | (reduce
106 | (fn [coll quad]
107 | (update-in
108 | coll
109 | [(first quad)]
110 | (fnil conj [])
111 | (->> quad (drop 1) vec)))
112 | {}
113 | quads))
114 |
115 | (defn get-model-map
116 | "Given a PrefixMap and a map from graph names to Triples,
117 | return a map from graph names to Models."
118 | [prefixes triple-map]
119 | (->> triple-map
120 | (map (juxt key #(get-model prefixes (val %))))
121 | (into {})))
122 |
123 | (defn get-dataset
124 | "Given a PrefixMap and ExpandedQuads,
125 | return a dataset with models for each of the graphs."
126 | [prefixes quads]
127 | (let [dataset (DatasetFactory/createMem)
128 | models (->> quads get-triple-map (get-model-map prefixes))]
129 | (doseq [[graph model] (filter key models)]
130 | (.addNamedModel dataset graph model))
131 | (if (find models nil)
132 | (.setDefaultModel dataset (get models nil))
133 | (.setDefaultModel dataset (get-model prefixes nil)))
134 | dataset))
135 |
136 | (defn get-format
137 | "Given a Lang, a format string, a content type, or a filename,
138 | try to return an RDF Lang (file format)."
139 | [format]
140 | (or (when (instance? Lang format) format)
141 | (when (string? format) (RDFLanguages/nameToLang format))
142 | (when (string? format) (RDFLanguages/contentTypeToLang format))
143 | (when (string? format) (RDFLanguages/filenameToLang format))
144 | (throw (Exception. (str "Could not determine format: " format)))))
145 |
146 | (defn get-output
147 | "Given a StringWriter or potential output stream,
148 | return either a StringWriter or an OutputStream."
149 | [output]
150 | (if (instance? StringWriter output)
151 | output
152 | (io/output-stream output)))
153 |
154 |
155 | ;; # Read Triples
156 |
157 | (defn stream-triples
158 | "Given atoms for a PrefixMap and a sequence for ExpandedTriples,
159 | return an instance of StreamRDF for collecting triples.
160 | Quads are ignored."
161 | [prefixes triples]
162 | (reify StreamRDF
163 | (^void start [_])
164 | (^void triple [_ ^Triple triple]
165 | (swap!
166 | triples
167 | conj
168 | [(read-node (.getSubject triple))
169 | (read-node (.getPredicate triple))
170 | (read-node (.getObject triple))]))
171 | (^void quad [_ ^Quad quad])
172 | (^void base [_ ^String base]
173 | (swap! prefixes assoc :base-iri base)) ; TODO: handle base IRI
174 | (^void prefix [_ ^String prefix ^String iri]
175 | (swap!
176 | prefixes
177 | assoc
178 | (if (string/blank? prefix) nil (keyword prefix))
179 | iri))
180 | (^void finish [_])))
181 |
182 | (defn read-triples
183 | "Given a source path, reader, or input stream,
184 | an optional format name, and an optional base IRI,
185 | return the pair of a PrefixMap and ExpandedTriples."
186 | ([source]
187 | (let [prefixes (atom {})
188 | triples (atom [])]
189 | (RDFDataMgr/parse (stream-triples prefixes triples)
190 | source)
191 | [@prefixes @triples]))
192 | ([source format]
193 | (let [prefixes (atom {})
194 | triples (atom [])]
195 | (RDFDataMgr/parse (stream-triples prefixes triples)
196 | source
197 | (get-format format))
198 | [@prefixes @triples]))
199 | ([source base format]
200 | (let [prefixes (atom {})
201 | triples (atom [])]
202 | (RDFDataMgr/parse (stream-triples prefixes triples)
203 | source
204 | base
205 | (get-format format))
206 | [@prefixes @triples])))
207 |
208 | (defn read-triple-string
209 | "Given an input string, an optional base IRI, and a format name,
210 | return the pair of a PrefixMap and ExpandedTriples."
211 | ([input format]
212 | (read-triple-string input "http://example.com/" format))
213 | ([input base format]
214 | (read-triples (StringReader. input) base format)))
215 |
216 |
217 | ;; # Write Triples
218 |
219 | (defn write-triples
220 | "Given a destination that can be used as an OutputStream or StringWriter,
221 | an optional format, a PrefixMap, and ExpandedTriples,
222 | write the triples to the destination and return them unchanged."
223 | ([dest prefixes triples]
224 | (write-triples
225 | dest
226 | (RDFLanguages/filenameToLang dest)
227 | prefixes
228 | triples))
229 | ([dest format prefixes triples]
230 | (with-open [output (get-output dest)]
231 | (RDFDataMgr/write
232 | output
233 | (get-model prefixes triples)
234 | (get-format format)))
235 | triples))
236 |
237 | (defn write-triple-string
238 | "Given an optional format (defaults to Turtle),
239 | a PrefixMap, and ExpandedTriples,
240 | return a string representation in that format."
241 | ([prefixes triples]
242 | (with-open [writer (StringWriter.)]
243 | (write-triples writer (get-format "ttl") prefixes triples)
244 | (str writer)))
245 | ([format prefixes triples]
246 | (with-open [writer (StringWriter.)]
247 | (write-triples writer format prefixes triples)
248 | (str writer))))
249 |
250 |
251 | ;; # Read Quads
252 |
253 | (defn stream-quads
254 | "Given atoms for a PrefixMap and a sequence for ExpandedQuads,
255 | return an instance of StreamRDF for collecting quads.
256 | Triples are ignored."
257 | [prefixes quads]
258 | (reify StreamRDF
259 | (^void start [_])
260 | (^void triple [_ ^Triple triple])
261 | (^void quad [_ ^Quad quad]
262 | (swap!
263 | quads
264 | conj
265 | [(read-node (.getGraph quad))
266 | (read-node (.getSubject quad))
267 | (read-node (.getPredicate quad))
268 | (read-node (.getObject quad))]))
269 | (^void base [_ ^String base]
270 | (swap! prefixes assoc :base-iri base)) ; TODO: handle base IRI
271 | (^void prefix [_ ^String prefix ^String iri]
272 | (swap!
273 | prefixes
274 | assoc
275 | (if (string/blank? prefix) nil (keyword prefix))
276 | iri))
277 | (^void finish [_])))
278 |
279 | (defn read-quads
280 | "Given a source path, reader, or input stream,
281 | an optional format name, and an optional base IRI,
282 | return the pair of a PrefixMap and ExpandedQuads."
283 | ([source]
284 | (let [prefixes (atom {})
285 | quads (atom [])]
286 | (RDFDataMgr/parse (stream-quads prefixes quads)
287 | source)
288 | [@prefixes @quads]))
289 | ([source format]
290 | (let [prefixes (atom {})
291 | quads (atom [])]
292 | (RDFDataMgr/parse (stream-quads prefixes quads)
293 | source
294 | (get-format format))
295 | [@prefixes @quads]))
296 | ([source base format]
297 | (let [prefixes (atom {})
298 | quads (atom [])]
299 | (RDFDataMgr/parse (stream-quads prefixes quads)
300 | source
301 | base
302 | (get-format format))
303 | [@prefixes @quads])))
304 |
305 | (defn read-quad-string
306 | "Given an input string, an optional base IRI, and a format name,
307 | return the pair of a PrefixMap and ExpandedQuads."
308 | ([input format]
309 | (read-quad-string input "http://example.com/" format))
310 | ([input base format]
311 | (read-quads (StringReader. input) base format)))
312 |
313 |
314 | ;; # Write Quads
315 |
316 | (defn write-quads
317 | "Given a destination that can be used as an OutputStream or StringWriter,
318 | an optional format, an optional PrefixMap, and ExpandedQuads,
319 | write the quads to the destination and return them unchanged."
320 | ([dest quads]
321 | (write-quads dest default-prefixes quads))
322 | ([dest prefixes quads]
323 | (write-quads
324 | dest
325 | (RDFLanguages/filenameToLang dest)
326 | prefixes
327 | quads))
328 | ([dest format prefixes quads]
329 | (with-open [output (get-output dest)]
330 | (RDFDataMgr/write
331 | output
332 | (get-dataset prefixes quads)
333 | (get-format format)))
334 | quads))
335 |
336 | (defn write-quad-string
337 | "Given an optional format (defaults to TriG), an optional PrefixMap,
338 | and ExpandedQuads, return a string representation in that format."
339 | ([quads]
340 | (write-quad-string default-prefixes quads))
341 | ([prefixes quads]
342 | (with-open [writer (StringWriter.)]
343 | (write-quads writer (get-format "trig") prefixes quads)
344 | (str writer)))
345 | ([format prefixes quads]
346 | (with-open [writer (StringWriter.)]
347 | (write-quads writer format prefixes quads)
348 | (str writer))))
349 |
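350 | ;; A minimal round-trip sketch, added for illustration only. It follows the
351 | ;; calls made in test/edn_ld/jena_test.clj. The `ex` prefix, the example.com
352 | ;; IRIs, and the local name `turtle-input` are placeholders, not part of
353 | ;; this namespace. The forms sit inside `comment`, so nothing runs on load.
354 |
355 | (comment
356 |   (let [ex           "http://example.com/"
357 |         turtle-input (str "@prefix ex: <" ex "> .\n"
358 |                           "ex:subject ex:predicate ex:object .")
359 |         ;; Parse the Turtle string into a [PrefixMap ExpandedTriples] pair.
360 |         [prefixes triples] (read-triple-string turtle-input "turtle")]
361 |     ;; Serialize the triples back to a string; with no format argument,
362 |     ;; write-triple-string uses Turtle.
363 |     (write-triple-string {:ex ex} triples))
364 |   )
365 |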
--------------------------------------------------------------------------------
/src/edn_ld/rdfxml.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.rdfxml
2 | (:require [clojure.string :as string]
3 | [clojure.java.io :as io]
4 | [edn-ld.common :refer [rdf xsd default-context]])
5 | (:import (org.codehaus.stax2 XMLInputFactory2 XMLOutputFactory2
6 | XMLStreamReader2 XMLStreamWriter2)))
7 |
8 | ;; WARNING: This code is work in progress!
9 | ;; Does not handle all of RDFXML, has not been optimized,
10 | ;; and has not been properly documented.
11 |
12 |
13 | ;; ## Clojure Type Hints
14 | ;;
15 | ;; Java is a statically typed programming language, in which the type of every
16 | ;; variable and method is explicitly declared. Clojure is a dynamically typed
17 | ;; language. When the compiler cannot determine the type of an interop target,
18 | ;; it falls back to runtime "reflection", and this tends to slow down Clojure code.
19 | ;;
20 | ;; Clojure can be almost as fast as native Java code, but we have to avoid
21 | ;; reflection. We avoid it by adding "type hints" to tell the Clojure compiler
22 | ;; what types to expect for input and return values. Metadata annotations on
23 | ;; functions and values provide the hints: `^QName`, `^XMLInputFactory2`, etc.
24 |
25 | ;; The code in this file is optimized for speed, so we tell the Clojure compiler
26 | ;; to `warn-on-reflection` and add type hints until those warnings go away.
27 |
28 | ;(set! *warn-on-reflection* true)
29 |
30 | ;; ## Factories
31 | ;;
32 | ;; The following functions are used to create factories, readers, writers,
33 | ;; and filters from the Woodstox library.
34 |
35 | (defn create-input-factory
36 | "Create and return Woodstox XMLInputFactory2, with configuration."
37 | ^XMLInputFactory2 []
38 | (XMLInputFactory2/newInstance))
39 |
40 | (defn create-output-factory
41 | "Create and return Woodstox XMLOutputFactory2."
42 | ^XMLOutputFactory2 []
43 | (XMLOutputFactory2/newInstance))
44 |
45 | ;; By default we will use the same shared input and output factories.
46 |
47 | (def ^XMLInputFactory2 shared-input-factory (create-input-factory))
48 | (def ^XMLOutputFactory2 shared-output-factory (create-output-factory))
49 |
50 | (defn create-stream-reader
51 | "Create and return a Woodstox XMLStreamReader2 for a given path.
52 | An XMLInputFactory is optional."
53 | (^XMLStreamReader2 [path]
54 | (create-stream-reader shared-input-factory path))
55 | (^XMLStreamReader2 [^XMLInputFactory2 input-factory path]
56 | (.createXMLStreamReader input-factory (io/file path))))
57 |
58 | (defn create-stream-writer
59 | "Create and return a Woodstox XMLStreamWriter2 for a given path.
60 | An XMLOutputFactory is optional."
61 | (^XMLStreamWriter2 [path]
62 | (create-stream-writer shared-output-factory path))
63 | (^XMLStreamWriter2 [^XMLOutputFactory2 output-factory path]
64 | (.createXMLStreamWriter output-factory (io/writer path))))
65 |
66 | ; To read an XML file lazily
67 | ; we create a wrapped function using lazy-seq.
68 | ; If the (.hasNext reader) fails, the
69 | ; http://stackoverflow.com/a/19656800
70 |
71 | #_(defn lazy-read-ok
72 | [csv-file]
73 | (with-open [in-file (io/reader csv-file)]
74 | (frequencies (map #(nth % 2) (csv/read-csv in-file)))))
75 |
76 | (defn lazy-read
77 | [path]
78 | (let [reader (create-stream-reader path)
79 | lazy (fn lazy [wrapped]
80 | (lazy-seq
81 | (if (.hasNext reader)
82 | (do (.next reader)
83 | (cons (.getEventType reader) (lazy reader)))
84 | (.close reader))))]
85 | (lazy reader)))
86 |
87 | (defn advance
88 | [reader]
89 | (while (and (.hasNext reader) (not (.isStartElement reader)))
90 | (.next reader)))
91 |
92 | (defn get-context
93 | [reader]
94 | (advance reader)
95 | (when (and (.isStartElement reader)
96 | (= (.getLocalName reader) "RDF"))
97 | (->> (range 0 (.getNamespaceCount reader))
98 | (map
99 | (fn [i]
100 | [(when-not (string/blank? (.getNamespacePrefix reader i))
101 | (.getNamespacePrefix reader i))
102 | (.getNamespaceURI reader i)]))
103 | (into {}))))
104 |
105 | (defn get-element-iri
106 | [context reader]
107 | (when (.isStartElement reader)
108 | (str (get context (.getPrefix reader))
109 | (.getLocalName reader))))
110 |
111 | (defn get-text
112 | [reader]
113 | (while (and (.hasNext reader) (not (.hasText reader)))
114 | (.next reader))
115 | (.getText reader))
116 |
117 | (defn get-attribute-map
118 | "Given a reader at a start element,
119 | return a map from attribute IRIs to their values."
120 | [context reader]
121 | (when (.isStartElement reader)
122 | (->> (range 0 (.getAttributeCount reader))
123 | (map
124 | (fn [i]
125 | [(str (get context (.getAttributePrefix reader i))
126 | (.getAttributeLocalName reader i))
127 | (.getAttributeValue reader i)]))
128 | (into {}))))
129 |
130 | (defn read-triple
131 | "Read the triples attached to a given element."
132 | [context subject reader]
133 | (when (.isStartElement reader)
134 | (let [element (get-element-iri context reader)
135 | attrs (get-attribute-map context reader)
136 | about (get attrs (str rdf "about"))
137 | resource (get attrs (str rdf "resource"))
138 | datatype (get attrs (str rdf "datatype") (str xsd "string"))
139 | lang (get attrs "lang")]
140 | (cond
141 | ; RDF root element: return nothing
142 | (= element (str rdf "RDF"))
143 | [nil nil]
144 | ; Description element
145 | ; TODO: does not handle non-RDF attributes
146 | (and about (= element (str rdf "Description")))
147 | [about nil]
148 | ; type declaration element
149 | ; TODO: does not handle non-RDF attributes
150 | about
151 | [about [about (str rdf "type") element nil nil]]
152 | ; predicate resource assertion
153 | resource
154 | [subject [subject element resource nil nil]]
155 | ; predicate langString literal assertion
156 | lang
157 | [subject
158 | [subject element (get-text reader) (str rdf "langString") lang]]
159 | ; predicate typed literal assertion
160 | :else
161 | [subject [subject element (get-text reader) datatype nil]]))))
162 |
163 | (defn myread
164 | [path]
165 | (with-open [reader (create-stream-reader path)]
166 | (let [context (get-context reader)]
167 | (println context)
168 | (loop [triples []
169 | subject nil]
170 | (if (.hasNext reader)
171 | (let [[subject triple] (read-triple context subject reader)]
172 | (.next reader)
173 | (advance reader)
174 | (recur (conj triples triple) subject))
175 | triples)))))
176 |
177 | (defn myshow
178 | [path]
179 | (->> path
180 | myread
181 | (remove nil?)
182 | (map println)
183 | doall))
184 |
185 | (defn compact
186 | "Given a context map and an IRI,
187 | return a triple of the matched prefix, the namespace IRI, and the localname."
188 | [context iri]
189 | (if (string/blank? iri)
190 | ["" (get context nil) iri]
191 | (->> context
192 | (filter #(string? (val %)))
193 | (map (juxt val (comp name key)))
194 | (sort-by (comp count first) >)
195 | (filter #(.startsWith (str iri) (first %)))
196 | first
197 | ((fn [[uri prefix]]
198 | [prefix uri (string/replace-first (str iri) (str uri) "")])))))
199 |
200 | (defn write-object
201 | "Given a single Object (IRI or Literal), write it to RDFXML."
202 | [writer prefix uri local object]
203 | (.writeCharacters writer "\n ")
204 | (.writeStartElement writer prefix local uri)
205 | (when (string? object)
206 | (.writeAttribute writer "rdf" rdf "resource" object))
207 | (when (:type object)
208 | (when (and (not= (:type object) (str xsd "string"))
209 | (not= (:type object) (str rdf "langString")))
210 | (.writeAttribute writer "rdf" rdf "datatype" (:type object))))
211 | (when (:lang object)
212 | (.writeAttribute writer "xml" nil "lang" (:lang object)))
213 | (when (:value object)
214 | (.writeCharacters writer (:value object)))
215 | (.writeEndElement writer))
216 |
217 | (defn write-predicate
218 | "Given a predicate and an object set, write all of the objects."
219 | [writer context predicate object-set]
220 | (let [[prefix uri local] (compact context predicate)]
221 | (doseq [object object-set]
222 | (write-object writer prefix uri local object))))
223 |
224 | (defn pick-first-type
225 | "Given a predicate-map, pick out the first rdf:type,
226 | and return the pair of that type and a predicate map
227 | with that type removed."
228 | [predicate-map]
229 | (let [types (seq (get predicate-map (str rdf "type")))]
230 | [(first types)
231 | (if (seq (rest types))
232 | (assoc predicate-map
233 | (str rdf "type")
234 | (set (rest types)))
235 | (dissoc predicate-map (str rdf "type")))]))
236 |
237 | (defn write-subject
238 | "Given a subject and predicate map,
239 | write an element for this subject,
240 | with children for all predicate-object pairs."
241 | [writer context subject predicate-map]
242 | (.writeCharacters writer " ")
243 | (let [[type predicate-map] (pick-first-type predicate-map)]
244 | (if type
245 | (let [[prefix uri local] (compact context type)]
246 | (.writeStartElement writer prefix local uri))
247 | (.writeStartElement writer rdf "Description"))
248 | (.writeAttribute writer "rdf" rdf "about" subject)
249 | (when (seq predicate-map)
250 | (doseq [[predicate object-set]
251 | (seq (dissoc predicate-map (str rdf "type")))]
252 | (write-predicate writer context predicate object-set))
253 | (.writeCharacters writer "\n ")))
254 | (.writeEndElement writer))
255 |
256 | (defn write-subjects
257 | [writer context subject-map]
258 | (.writeStartDocument writer)
259 | (.writeCharacters writer "\n")
260 | (.writeStartElement writer "rdf" "RDF" rdf)
261 | (doseq [[prefix uri] (seq context)]
262 | (when (and (nil? prefix) (string? uri))
263 | (.writeDefaultNamespace writer uri))
264 | (when (and prefix (string? uri))
265 | (.writeNamespace writer (name prefix) uri)))
266 | (.writeCharacters writer "\n")
267 | (doseq [subject (butlast (keys subject-map))]
268 | (write-subject writer context subject (get subject-map subject))
269 | (.writeCharacters writer "\n\n"))
270 | (when (last (keys subject-map))
271 | (let [subject (last (keys subject-map))]
272 | (write-subject writer context subject (get subject-map subject))
273 | (.writeCharacters writer "\n")))
274 | (.writeEndElement writer)
275 | (.writeEndDocument writer)
276 | (.flush writer)
277 | (.close writer))
278 |
279 | (defn write-file
280 | ([path subject-map]
281 | (write-file path default-context subject-map))
282 | ([path context subject-map]
283 | (write-subjects
284 | (create-stream-writer path)
285 | context
286 | subject-map)))
287 |
288 | (defn write-string
289 | "Given an optional context and a subject map,
290 | return a string with the RDFXML representation."
291 | ([subject-map]
292 | (write-string default-context subject-map))
293 | ([context subject-map]
294 | (let [writer (java.io.StringWriter.)]
295 | (write-subjects
296 | (.createXMLStreamWriter shared-output-factory writer)
297 | context
298 | subject-map)
299 | (.toString writer))))
300 |
301 | (defn mytest
302 | []
303 | (->> (myread "simple.owl")
304 | (remove nil?)
305 | edn-ld.core/subjectify
306 | (write-file "output.owl" default-context)))
307 |
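308 | ;; A minimal sketch of how `compact` resolves an IRI against a context, added
309 | ;; for illustration only. The :ex and :exv prefixes and the example.com IRIs
310 | ;; are hypothetical. Candidate namespaces are sorted longest first, so the
311 | ;; most specific matching namespace is chosen. Every key in this context is
312 | ;; a keyword, because `compact` calls `name` on the key of each
313 | ;; string-valued entry.
314 |
315 | (comment
316 |   (compact {:ex  "http://example.com/"
317 |             :exv "http://example.com/vocab/"}
318 |            "http://example.com/vocab/name")
319 |   ;; => ["exv" "http://example.com/vocab/" "name"]
320 |   )
321 |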
--------------------------------------------------------------------------------
/test-resources/books.tsv:
--------------------------------------------------------------------------------
1 | Title Author
2 | The Iliad Homer
3 |
--------------------------------------------------------------------------------
/test/edn_ld/core_test.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.core-test
2 | (:require [clojure.test :refer :all]
3 | [clojure.string :as string]
4 | [schema.core :as s]
5 | [edn-ld.core :refer :all]
6 | [edn-ld.common :refer [rdf xsd]]))
7 |
8 | ; These macros are borrowed from
9 | ; https://github.com/Prismatic/schema/blob/master/test/clj/schema/test_macros.clj
10 | (defmacro valid!
11 | "Assert that x satisfies schema s, and the walked value is equal to the original."
12 | [s x]
13 | `(let [x# ~x] (~'is (= x# ((s/start-walker s/walker ~s) x#)))))
14 |
15 | (defmacro invalid!
16 | "Assert that x does not satisfy schema s, optionally checking the stringified return value"
17 | ([s x]
18 | `(~'is (s/check ~s ~x)))
19 | ([s x expected]
20 | `(do (invalid! ~s ~x)
21 | (sm/if-cljs nil (~'is (= ~expected (pr-str (s/check ~s ~x))))))))
22 |
23 | ;; For now, any string is a valid IRI.
24 |
25 | (deftest test-iris
26 | (valid! IRI "http://example.com")
27 | (valid! IRI "foo")
28 | (invalid! IRI 123)
29 | (invalid! IRI :foo))
30 |
31 | (def context
32 | {:ex "http://example.com/"
33 | nil :ex
34 | :foo :ex:foo})
35 |
36 | (deftest test-context
37 | (valid! Context context))
38 |
39 | (deftest test-expand
40 | (are [x y] (= (expand context x) y)
41 | nil nil
42 | 123 123
43 | :foo "http://example.com/foo"
44 | :ex "http://example.com/"
45 | :ex:bar "http://example.com/bar"
46 | :baz "http://example.com/baz"))
47 |
48 | (deftest test-contract
49 | (are [x y] (= (contract context x) y)
50 | nil nil
51 | 123 123
52 | "foo" "foo"
53 | "http://example.com/foo" :foo
54 | "http://example.com/bar" :bar))
55 |
56 | (deftest test-literals
57 | (invalid! Literal nil)
58 | (invalid! Literal "foo")
59 | (invalid! Literal 123)
60 | (invalid! Literal {})
61 | (invalid! Literal [])
62 | (invalid! Literal {:value 123})
63 | (invalid! Literal {:value "foo" :type 123})
64 | (invalid! Literal {:value "foo" :type "bar" :lang "en"})
65 | (valid! Literal {:value "foo"})
66 | (valid! Literal {:value "foo" :type "bar"})
67 | (valid! Literal {:value "foo" :lang "en"})
68 | (is (= (literal "foo")
69 | {:value "foo"}))
70 | (is (= (literal 123)
71 | {:value "123" :type :xsd:integer}))
72 | (is (= (literal "foo" "bar")
73 | {:value "foo" :type "bar"}))
74 | (is (= (literal "foo" "@bar")
75 | {:value "foo" :lang "bar"})))
76 |
77 | (deftest test-flatten-triples
78 | (is (= (flatten-triples :s :p :o)
79 | [[:s :p :o]]))
80 | (is (= (flatten-triples :s :p {:value "o"})
81 | [[:s :p "o" :xsd:string]]))
82 | (is (= (flatten-triples :s :p {:value "o" :type :foo})
83 | [[:s :p "o" :foo]]))
84 | (is (= (flatten-triples :s :p {:value "o" :lang "en"})
85 | [[:s :p "o" :rdf:langString "en"]])))
86 |
87 | (deftest test-objectify
88 | (are [x y] (= (objectify x) y)
89 | nil nil
90 | :foo :foo
91 | "foo" {:value "foo"}
92 | 123 {:value "123" :type :xsd:integer})
93 | (is (= (objectify {"foo" :foo} "foo") :foo)))
94 |
95 | (deftest test-triplify
96 | (is (= (triplify {:subject-iri :subject
97 | :predicate :object})
98 | [[:subject :predicate :object]]))
99 | (is (= (triplify {:subject-iri :subject
100 | :predicate "Object"})
101 | [[:subject :predicate {:value "Object"}]]))
102 | (is (= (triplify {"Object" :object}
103 | {:subject-iri :subject
104 | :predicate "Object"})
105 | [[:subject :predicate :object]])))
106 |
107 | (deftest test-quadruplify
108 | (is (= (quadruplify {:subject-iri :subject
109 | :graph-iri :graph
110 | :predicate :object})
111 | [[:graph :subject :predicate :object]]))
112 | (is (= (quadruplify {:subject-iri :subject
113 | :graph-iri :graph
114 | :predicate "Object"})
115 | [[:graph :subject :predicate {:value "Object"}]]))
116 | (is (= (quadruplify {"Object" :object}
117 | {:subject-iri :subject
118 | :graph-iri :graph
119 | :predicate "Object"})
120 | [[:graph :subject :predicate :object]])))
121 |
122 | (deftest test-subjectify
123 | (is (= (subjectify
124 | [[:subject :predicate :object]
125 | [:subject :predicate "Object" :xsd:string]])
126 | {:subject {:predicate #{:object {:value "Object"}}}})))
127 |
128 | (deftest test-graphify
129 | (is (= (graphify
130 | [[:graph :subject :predicate :object]
131 | [:graph :subject :predicate "Object" :xsd:string]])
132 | {:graph {:subject {:predicate #{:object {:value "Object"}}}}})))
133 |
134 | (deftest test-flatten-subjects
135 | (is (= (set (flatten-subjects
136 | {:subject {:predicate #{:object {:value "Object"}}}}))
137 | #{[:subject :predicate :object]
138 | [:subject :predicate "Object" :xsd:string]})))
139 |
140 | (deftest test-flatten-graphs
141 | (is (= (set (flatten-graphs
142 | {:graph {:subject {:predicate #{:object {:value "Object"}}}}}))
143 | #{[:graph :subject :predicate :object]
144 | [:graph :subject :predicate "Object" :xsd:string]})))
145 |
--------------------------------------------------------------------------------
/test/edn_ld/jena_test.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.jena-test
2 | (:require [clojure.test :refer :all]
3 | [clojure.string :as string]
4 | [edn-ld.jena :refer :all]
5 | [edn-ld.common :refer [rdf xsd]])
6 | (:import (org.apache.jena.riot Lang)
7 | (org.apache.jena.rdf.model ModelFactory)))
8 |
9 | (deftest test-formats
10 | (are [x y] (= (get-format x) y)
11 | "turtle" Lang/TURTLE
12 | "foo.ttl" Lang/TURTLE
13 | "application/turtle" Lang/TURTLE
14 | Lang/TURTLE Lang/TURTLE))
15 |
16 | (def ex "http://example.com/")
17 | (def test1-turtle
18 | "@prefix ex: <http://example.com/> .
19 | ex:subject ex:predicate \"Object\"@en ;
20 | ex:predicate ex:object .")
21 |
22 | (def test1-edn
23 | [[(str ex "subject") (str ex "predicate") (str ex "object")]
24 | [(str ex "subject") (str ex "predicate") {:value "Object" :type "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" :lang "en"}]])
25 |
26 | (defn clean
27 | [s]
28 | (-> s
29 | (string/replace #"(?m)\s+" " ")
30 | string/trim))
31 |
32 | (deftest test-triples
33 | (let [[prefixes triples] (read-triple-string test1-turtle "turtle")]
34 | (is (= prefixes {:base-iri ex :ex ex}))
35 | (is (= (set triples) (set test1-edn))))
36 | (is (= (clean (write-triple-string {:ex ex} test1-edn))
37 | (clean test1-turtle))))
38 |
39 | (def test2-trig
40 | "@prefix ex: <http://example.com/> .
41 | ex:graph {
42 | ex:subject ex:predicate \"Object\"@en ;
43 | ex:predicate ex:object .
44 | }")
45 |
46 | (def test2-edn
47 | [[(str ex "graph") (str ex "subject") (str ex "predicate") (str ex "object")]
48 | [(str ex "graph") (str ex "subject") (str ex "predicate")
49 | {:value "Object" :type "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" :lang "en"}]])
50 |
51 | (deftest test-quads
52 | (let [[prefixes quads] (read-quad-string test2-trig "trig")]
53 | (is (= prefixes {:base-iri ex :ex ex}))
54 | (is (= (set quads) (set test2-edn))))
55 | (is (= (clean (write-quad-string {:ex ex} test2-edn))
56 | (clean test2-trig))))
57 |
58 | (deftest test-blank
59 | (let [model (ModelFactory/createDefaultModel)
60 | node (make-node model "_:foo")]
61 | (is (= (read-node node) "_:foo"))))
62 |
--------------------------------------------------------------------------------
/test/edn_ld/readme_test.clj:
--------------------------------------------------------------------------------
1 | (ns edn-ld.readme-test
2 | (:require [clojure.test :refer :all]
3 | [clojure.string :as string]
4 | [edn-ld.core :refer :all]
5 | [edn-ld.common :refer :all]
6 | [edn-ld.rdfxml]))
7 |
8 | ;; Parse the README.md file for indented code blocks,
9 | ;; execute anything marked "user=>",
10 | ;; and compare it to the expected output.
11 |
12 | (defn clean
13 | [s]
14 | (-> s
15 | (string/replace #"(?m)\s+" " ")
16 | string/trim))
17 |
18 | (def prompt "user=> ")
19 |
20 | (defn run-test
21 | [user expected]
22 | (let [command (string/replace user prompt "")
23 | actual (-> command
24 | read-string
25 | eval
26 | str
27 | (string/replace #"^#'edn-ld.readme-test" "#'user"))]
28 | ;(println "C" command)
29 | ;(println "E" expected)
30 | ;(println "A" actual)
31 | ;(println (= actual expected))
32 | (is (= (clean expected) (clean actual)))))
33 |
34 | (->> "README.md"
35 | slurp
36 | string/split-lines
37 | (filter #(.startsWith % "    "))
38 | (map #(string/replace % #"^    " ""))
39 | (remove #(re-find #"^(\-|\+|\*) " %)) ; remove nested unordered list items
40 | (remove #(re-find #"^(\$|;)" %)) ; remove shell prompts and comments
41 | (drop-while #(not (.startsWith % prompt)))
42 | (drop 4) ; ignore 'use' and 'require'
43 | (partition-by #(.startsWith % prompt))
44 | (map (partial string/join "\n"))
45 | (partition 2)
46 | (map (partial apply run-test))
47 | doall
48 | (apply = true))
49 |
--------------------------------------------------------------------------------