├── .gitignore
├── .travis.yml
├── LICENSE
├── README.md
├── project.clj
├── src
│   └── edn_ld
│       ├── common.clj
│       ├── core.clj
│       ├── jena.clj
│       └── rdfxml.clj
├── test-resources
│   └── books.tsv
└── test
    └── edn_ld
        ├── core_test.clj
        ├── jena_test.clj
        └── readme_test.clj

/.gitignore:
--------------------------------------------------------------------------------
pom.xml
pom.xml.asc
*jar
/lib/
/classes/
/target/
/checkouts/
.lein-deps-sum
.lein-repl-history
.lein-plugins/
.lein-failures
.lein-env
.nrepl-port
.DS_Store

--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
language: clojure
jdk:
  - oraclejdk8
  - oraclejdk7
  - openjdk7

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2015, James A. Overton
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of edn-ld nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# EDN-LD

[![Build Status](https://travis-ci.org/ontodev/edn-ld.svg?branch=master)](https://travis-ci.org/ontodev/edn-ld)

EDN-LD is a set of conventions and a library for working with [Linked Data (LD)](http://linkeddata.org) using [Extensible Data Notation (EDN)](https://github.com/edn-format/edn) and the [Clojure programming language](http://clojure.org). EDN-LD builds on EDN and [JSON-LD](http://json-ld.org), but is not otherwise affiliated with those projects.

**[Try EDN-LD online!](http://try.edn-ld.com)**

This project is in early development!


## Linked Data

Linked data is an approach to working with data on the Web:

- instead of tables we have graphs -- networks of data
- instead of rows we have resources -- nodes in the graph
- the values in our cells are also nodes -- either resources or literals: strings, numbers, dates
- and instead of columns we have named relations that link nodes to form the graph

Just think of your tables as big sets of row-column-cell "triples". By switching from rigid tables to flexible graphs, we can easily merge data from across the web.

Linked data is simple. The tools for working with it are powerful: big Java libraries such as [Jena](https://jena.apache.org), [Sesame](http://rdf4j.org), [OWLAPI](http://owlapi.sourceforge.net), etc. Unfortunately, most of the tools are not simple.

EDN-LD is a simple linked data tool.


## Install

EDN-LD is a Clojure library. The easiest way to get started is to use [Leiningen](http://leiningen.org) and add this to your `project.clj` dependencies:

    [edn-ld "0.3.0"]


## Tutorial

Try out EDN-LD with our [interactive online tutorial](http://try.edn-ld.com), or by cloning this project and starting a REPL:

    $ git clone https://github.com/ontodev/edn-ld.git
    $ cd edn-ld
    $ lein repl
    nREPL server started ...
    user=> (use 'edn-ld.core 'edn-ld.common)
    nil
    user=> (require '[clojure.string :as string])
    nil
    user=> "Ready!"
    Ready!

Say we have a (very small) table of books and their authors called `books.tsv`:

Title     | Author
----------|-------
The Iliad | Homer

A common way to represent this in Clojure is as a list of maps, with the column names as the keys.
We can `slurp` and split the data until we get what we want:

    user=> (defn split-row [row] (string/split row #"\t"))
    #'user/split-row
    user=> (defn read-tsv [path] (->> path slurp string/split-lines (drop 1) (mapv split-row)))
    #'user/read-tsv
    user=> (def rows (read-tsv "test-resources/books.tsv"))
    #'user/rows
    user=> rows
    [["The Iliad" "Homer"]]

Now we use `zipmap` to associate keys with values:

    user=> (def data (mapv (partial zipmap [:title :author]) rows))
    #'user/data
    user=> data
    [{:title "The Iliad", :author "Homer"}]

We have the data in a convenient shape, but what does it mean? Well, there's some resource that has "The Iliad" as its title, and some guy named "Homer" who is the author of that resource. We also know from the context that it's a book.

The first thing to do is give names to our resources. Linked data names are [IRIs](https://en.wikipedia.org/wiki/Internationalized_resource_identifier): globally unique identifiers that generalize the familiar URL you see in your browser's location bar. We can use some standard names for our relations from the [Dublin Core](http://dublincore.org) metadata standard, and we'll make up some more.

Name      | IRI
----------|-----------------------------------------
title     | `http://purl.org/dc/elements/1.1/title`
author    | `http://purl.org/dc/elements/1.1/author`
The Iliad | `http://example.com/the-iliad`
Homer     | `http://example.com/Homer`
book      | `http://example.com/book`

IRIs can be long and cumbersome, so let's define some prefixes that we can use to shorten them:

Prefix | IRI
-------|-----------------------------------
`dc`   | `http://purl.org/dc/elements/1.1/`
`ex`   | `http://example.com/`

The `ex` prefix will be our default. We use strings for full IRIs and keywords for contracted IRIs, which we call *contractions*.

IRI                                      | Contraction
-----------------------------------------|------------
`http://purl.org/dc/elements/1.1/title`  | `:dc:title`
`http://purl.org/dc/elements/1.1/author` | `:dc:author`
`http://example.com/the-iliad`           | `:the-iliad`
`http://example.com/Homer`               | `:Homer`
`http://example.com/book`                | `:book`

We'll put this naming information in a *context* map:

    user=> (def context {:dc "http://purl.org/dc/elements/1.1/", :ex "http://example.com/", nil :ex, :title :dc:title, :author :dc:author})
    #'user/context

The `nil` key indicates the default prefix `:ex`. Now we can use the context to expand contractions and to contract IRIs:

    user=> (expand context :title)
    http://purl.org/dc/elements/1.1/title
    user=> (expand context :Homer)
    http://example.com/Homer
    user=> (contract context "http://purl.org/dc/elements/1.1/title")
    :title
    user=> (contract context "http://purl.org/dc/elements/1.1/foo")
    :dc:foo
    user=> (expand-all context data)
    [{"http://purl.org/dc/elements/1.1/title" "The Iliad", "http://purl.org/dc/elements/1.1/author" "Homer"}]

Sometimes we also want to *resolve* a name to an IRI. We can define a resources map from strings to IRIs or contractions:

    user=> (def resources {"Homer" :Homer, "The Iliad" :the-iliad})
    #'user/resources

We record each resource's IRI in our data by assigning a special `:subject-iri` key to each of our maps.
We can do this one at a time with `assoc`:

    user=> (def book (assoc (first data) :subject-iri :the-iliad))
    #'user/book
    user=> book
    {:title "The Iliad", :author "Homer", :subject-iri :the-iliad}

Or we can map over all the rows, looking up each book's IRI by its title in the resources map:

    user=> (def books (mapv #(assoc % :subject-iri (get resources (:title %))) data))
    #'user/books
    user=> books
    [{:title "The Iliad", :author "Homer", :subject-iri :the-iliad}]

Now it's time to convert our book data to "triples", i.e. statements about things to put in our graph. A triple consists of a subject, a predicate, and an object:

- the subject is the name of a resource: an IRI
- the predicate is the name of a relation: also an IRI
- the object can be either an IRI or literal data.

We represent an IRI with a string, or a contracted IRI with a keyword. We represent literal data as a map with special keys:

- `:value` is the string value ("lexical value") of the data, e.g. "The Iliad", "100.31"
- `:type` is the IRI of a data type, with `xsd:string` as the default
- `:lang` is an optional language code, e.g. "en", "en-GB"

The `literal` function is a convenient way to create a literal map:

    user=> (literal "The Iliad")
    {:value "The Iliad"}
    user=> (literal 100.31)
    {:value "100.31", :type :xsd:float}

The `objectify` function takes a resource map and a value, and determines whether to convert the value to an IRI or a literal:

    user=> (objectify resources "Some string")
    {:value "Some string"}
    user=> (objectify resources "Homer")
    :Homer

Now we can treat each map as a set of statements about a resource, and `triplify` it to a lazy sequence of triples. Each triple is a vector of subject, predicate, and object, where the object is either a resource name or a literal map.
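
Conceptually, `triplify` turns every entry of the map, apart from the special `:subject-iri` and `:graph-iri` keys, into a `[subject predicate object]` vector. The plain-Clojure sketch below illustrates the idea under some simplifying assumptions. `sketch-triplify` and `special-keys` are hypothetical names and not part of EDN-LD. The sketch also always builds untyped `{:value ...}` literals instead of calling `literal`. Like the library's `triplify-one`, it keeps a value as a literal whenever resolving it would make the object the same as the subject. That is why "The Iliad" stays a literal here.

```clojure
;; Hypothetical sketch of the idea behind `triplify`; not EDN-LD's code.
;; Simplification: literals are always plain {:value "..."} maps, no :type.
(def special-keys #{:subject-iri :graph-iri})

(defn sketch-triplify
  "Turn a map with a :subject-iri key into [subject predicate object] vectors."
  [resources {:keys [subject-iri] :as input-map}]
  (for [[predicate value] input-map
        :when (not (contains? special-keys predicate))]
    (let [resource (get resources value)]
      [subject-iri
       predicate
       (cond
         ;; keywords are already (contracted) IRIs
         (keyword? value) value
         ;; known names become resources, unless that would make
         ;; the object identical to the subject
         (and resource (not= resource subject-iri)) resource
         ;; everything else becomes a literal map
         :else {:value (str value)})])))

(sketch-triplify {"Homer" :Homer, "The Iliad" :the-iliad}
                 {:title "The Iliad", :author "Homer", :subject-iri :the-iliad})
;; => ([:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer])
```

The real `triplify` delegates these decisions to `objectify` and `literal`. As a result, a number such as `100.31` becomes a typed literal like `{:value "100.31", :type :xsd:float}` rather than a plain string value.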

The `triplify` function takes our resource map and a map of data that includes a `:subject-iri` key. It returns a lazy sequence of triples.

    user=> (def triples (triplify resources book))
    #'user/triples
    user=> (vec triples)
    [[:the-iliad :title {:value "The Iliad"}] [:the-iliad :author :Homer]]

You'll notice that the subject `:the-iliad` is repeated here. With a larger set of triples the redundancy will be greater. Instead we can use a nested data structure:

    user=> (def subjects (subjectify triples))
    #'user/subjects
    user=> subjects
    {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}

From the inside out, it works like this:

- object-set: the set of objects with the same subject and predicate
- predicate-map: a map from predicate IRIs to object sets
- subject-map: a map from subject IRIs to predicate maps

We work with these data structures like any other Clojure data, using `merge`, `assoc`, `update`, and the rest of the standard Clojure toolkit:

    user=> (def context+ (merge default-context context))
    #'user/context+
    user=> (def subjects+ (assoc-in subjects [:the-iliad :rdf:type] #{:book}))
    #'user/subjects+
    user=> (def triples+ (conj triples [:the-iliad :rdf:type :book]))
    #'user/triples+

Now we can write to standard linked data formats, such as Turtle:

    user=> (def prefixes (assoc (get-prefixes context) :rdf rdf :xsd xsd))
    #'user/prefixes
    user=> (def expanded-triples (map #(expand-all context+ %) triples+))
    #'user/expanded-triples
    user=> (require 'edn-ld.jena)
    nil
    user=> (edn-ld.jena/write-triple-string prefixes expanded-triples)
    @prefix ex: <http://example.com/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    ex:the-iliad a ex:book ;
            dc:author ex:Homer ;
            dc:title "The Iliad"^^xsd:string .

One more thing before we're done: *named graphs*.
A graph is just a set of triples. When we want to talk about a particular graph, we give it a name: an IRI, of course. Then we can talk about sets of named graphs when we want to compare them, merge them, etc. The official name for a set of graphs is an "[RDF dataset](http://www.w3.org/TR/rdf11-concepts/#section-dataset)". A dataset includes a "default graph" with no name.

By adding the name of a graph, our *triples* become *quads* ("quadruples"). We define a quad and some new functions to handle them.

    user=> (def library [(assoc book :graph-iri :library)])
    #'user/library
    user=> library
    [{:title "The Iliad", :author "Homer", :subject-iri :the-iliad, :graph-iri :library}]
    user=> (def quads (quadruplify-all resources library))
    #'user/quads
    user=> (vec quads)
    [[:library :the-iliad :title {:value "The Iliad"}] [:library :the-iliad :author :Homer]]
    user=> (graphify quads)
    {:library {:the-iliad {:title #{{:value "The Iliad"}}, :author #{:Homer}}}}


## More

- Conference paper about EDN-LD ([PDF](https://github.com/ontodev/icbo2015-edn-ld/blob/master/edn_ld.pdf), [source](https://github.com/ontodev/icbo2015-edn-ld))


## Change Log

- 0.3.0
    - update to Jena 3.0.1
- 0.2.2
    - fix bug in blank node handling
- 0.2.1
    - fix bug in edn-ld.jena/make-node
- 0.2.0
    - use Apache Jena for reading and writing
    - fix `triplify` functions to use `:subject-iri` key
    - add `quadruplify` and `graphify` functions, using `:graph-iri` key
    - rename `squash` functions to `flatten`
    - fix `flatten` functions
    - many more unit tests
    - prefer Triples to FlatTriples
- 0.1.0
    - first release


## To Do

- finish streaming RDFXML reader and writer
- ClojureScript support?
  Would require different libraries for reading and writing.


## License

Copyright © 2015 James A. Overton

Distributed under the BSD 3-Clause License.

--------------------------------------------------------------------------------
/project.clj:
--------------------------------------------------------------------------------
(defproject edn-ld "0.3.0"
  :description "A simple linked data tool"
  :url "https://github.com/ontodev/edn-ld"
  :license {:name "BSD 3-Clause License"
            :url "http://opensource.org/licenses/BSD-3-Clause"}
  :dependencies [[org.clojure/clojure "1.7.0-beta3"]
                 [prismatic/schema "0.4.2"]
                 [org.apache.jena/jena-arq "3.0.1"]
                 [org.codehaus.woodstox/woodstox-core-asl "4.3.0"]]
  :plugins [[lein-cljfmt "0.1.10"]])

--------------------------------------------------------------------------------
/src/edn_ld/common.clj:
--------------------------------------------------------------------------------
(ns edn-ld.common)

(def rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#")

(def rdfs "http://www.w3.org/2000/01/rdf-schema#")

(def xsd "http://www.w3.org/2001/XMLSchema#")

(def owl "http://www.w3.org/2002/07/owl#")

(def default-prefixes {:rdf rdf :rdfs rdfs :xsd xsd :owl owl})

(def default-context default-prefixes)

--------------------------------------------------------------------------------
/src/edn_ld/core.clj:
--------------------------------------------------------------------------------
(ns edn-ld.core
  (:require [clojure.string :as string]
            [clojure.test :refer :all]
            [clojure.walk]
            [schema.core :as s]
            [edn-ld.common :refer [rdf xsd]]))

;; EDN-LD uses [Prismatic Schema](https://github.com/Prismatic/schema)
;; to specify and validate the data structures that we use.
;; Clojure does not have a static type system like Java's.
;; Instead we build our data structures from a rich set of primitives,
;; and use schemas to ensure that our data has the right shape.


;; # Identifiers

;; Linked data consists of a network of links between resources
;; named by Internationalized Resource Identifiers (IRIs).
;; IRIs extend the more familiar URLs and URIs to allow Unicode characters.
;; [RFC3987](http://tools.ietf.org/html/rfc3987)
;; provides a grammar for parsing IRIs,
;; but for now we will cut corners and allow any String.

(def IRI s/Str)

;; IRIs provide explicit, globally unique names for things.
;; Anything that we want to talk about should have an IRI.
;; But sometimes all we need is a local, implicit link, without a global name.
;; In this case we can use a blank node.
;; We'll use the [Turtle](http://www.w3.org/TR/turtle/#BNodes) syntax
;; and say that a blank node is a string that starts with "_:".

(def BlankNode #"^_:.*$")

;; IRIs can be long and cumbersome to work with,
;; so we'll define a Contraction to be a keyword that we can expand to an IRI.

(def Contraction s/Keyword)

;; To move between IRIs and Contractions we'll use a PrefixMap or a Context.
;; A PrefixMap is just a map from prefix keywords to IRIs.

(def PrefixMap {(s/maybe s/Keyword) IRI})

;; A Context is just a map from Contractions to Contractions or IRIs.

(def Context {(s/maybe Contraction) (s/either Contraction IRI)})

;; Contexts can be recursive, so they're more convenient to use,
;; but when specifying output formats we'll need to use a PrefixMap.
;; To move from a Context to a PrefixMap, this function is usually sufficient:

(defn get-prefixes
  "Given a Context, return a PrefixMap
  by removing pairs where the value is not a string."
  [context]
  (->> context
       (filter #(string?
                 (val %)))
       (into {})))

;; We'll expand a contracted IRI in a Context by recursively looking up keys.
;; Since the recursion should not be very deep, we won't bother using `loop`.
;; Since Contractions are all keywords, we just return any other type of input.
;; We also leave the reserved keywords unchanged: the Literal keys
;; (:value, :type, :lang) and the special :subject-iri and :graph-iri keys.
;; If the Contraction contains a colon (:),
;; then we'll split it and look up the prefix part.
;; E.g. `:rdfs:label` will resolve the prefix `:rdfs` and then append `label`.
;; Be careful not to build a loop into your Context!
;; For example: `(expand {:foo :foo} :foo)`

(def reserved-keywords #{:value :type :lang :subject-iri :graph-iri})

(defn reserved?
  "Return true if the input is a reserved keyword."
  [input]
  (contains? reserved-keywords input))

(defn expand
  "Given a Context and some input (usually a Contraction),
  try to return an IRI string.
  If the input is not a keyword then just return it;
  if the input is a key in the Context then return the expanded value;
  if the input has a prefix in the Context then return the joined value;
  otherwise use the default prefix."
  [context input]
  (try
    (cond
      (not (keyword? input))
      input
      (reserved? input)
      input
      (find context input)
      (expand context (get context input))
      (.contains (name input) ":")
      (let [[prefix local] (string/split (name input) #":" 2)]
        (str (get context (keyword prefix)) local))
      :else
      (str (expand context (get context nil)) (name input)))
    (catch StackOverflowError e input)))

(defn expand-all
  "Given an optional Context and some collection,
  try to expand all Contractions in the collection,
  and return the updated collection."
  ([coll]
   (expand-all
    {:rdf:langString (str rdf "langString")
     :xsd:string (str xsd "string")}
    coll))
  ([context coll]
   (clojure.walk/prewalk (partial expand context) coll)))

;; Contracting an IRI is a little trickier.
;; First we consider the case where the IRI is exactly a value in the Context.
;; So we expand and reverse the context to map from IRIs to Contractions,
;; then look up keys in the reversed Context.

;; Since these functions are likely to be called a lot on small inputs,
;; we'll use `memoize` to trade space for time.

(defn reverse-context
  "Given a context map from prefixes to IRIs,
  return a map from IRIs to prefixes."
  [context]
  (->> context
       (map (juxt #(expand context (val %)) key))
       (into {})))

(def memoized-reverse-context (memoize reverse-context))

;; Second we consider the case where the IRI starts with a prefix.
;; We'll use the longest prefix we can find,
;; which requires us to sort the prefixes alphanumerically and then by length.

(defn sort-prefixes
  "Given a Context,
  return a sequence of (IRI prefix) pairs, from longest IRI to shortest."
  [context]
  (->> context
       (map (juxt #(expand context (val %)) key))
       sort
       (sort-by (comp count first) >)))

(def memoized-sort-prefixes (memoize sort-prefixes))

;; The `get-prefixed` function uses lazy sequences,
;; so it only does as much mapping and filtering as it needs to.

(defn get-prefixed
  "Given a Context and an input (usually an IRI string),
  try to return a Contraction using the longest prefix."
  [context input]
  (->> context
       memoized-sort-prefixes
       (filter #(.startsWith input (first %)))
       (map
        (fn [[uri prefix]]
          (string/replace-first
           input
           uri
           (if prefix (str (name prefix) ":") ""))))
       (map keyword)
       first))

;; Now we define our `contract` function to handle both cases.

(defn contract
  "Given a Context and an input (usually an IRI string),
  try to return a Contraction.
  If the input is not a string, just return it;
  if the input exactly matches a value in the context map, return the key;
  otherwise try to use the longest matching prefix."
  [context input]
  (cond
    (not (string? input))
    input
    (find (memoized-reverse-context context) input)
    (get (memoized-reverse-context context) input)
    (get-prefixed context input)
    (get-prefixed context input)
    :else
    input))


;; # Literals

;; We can also link things to literal data, such as strings and numbers.
;; We represent literals as a map with special keys.
;;
;; - :value is the lexical value of the data, and must be present
;; - :type is an IRI specifying the type of data
;; - :lang is a code for the language of the data, which must conform to
;;   [BCP 47](http://tools.ietf.org/html/bcp47#section-2.2.9)
;;
;; Again, we'll cut corners for now and allow any string to be a language tag.
;; If the :type is xsd:string we won't include it.
;; If the :lang key is present, then the :type must be rdf:langString.
;; So we have three cases:

(def Lexical s/Str)

(def Datatype IRI)

(def Lang s/Str)

(def DefaultLiteral {:value Lexical}) ; implicit :type :xsd:string

(def TypedLiteral {:value Lexical :type Datatype})

(def LangLiteral
  {:value Lexical
   (s/optional-key :type) (s/enum (str rdf "langString") :rdf:langString)
   :lang Lang})

(def Literal (s/either DefaultLiteral LangLiteral TypedLiteral))

;; For convenience, we'll define a multimethod that takes a Clojure value
;; and returns its datatype as a Contraction, e.g. :xsd:integer.
;; The default is :xsd:string.
;; You can extend this multimethod as desired: http://clojure.org/multimethods

(defmulti get-type
  "Given a value, return a best guess at its RDF datatype."
  class)

(defmethod get-type :default [_] :xsd:string)

(defmethod get-type String [_] :xsd:string)

(defmethod get-type Integer [_] :xsd:integer)

(defmethod get-type Long [_] :xsd:integer)

(defmethod get-type Float [_] :xsd:float)

(defmethod get-type Double [_] :xsd:float)

;; Now we define a convenience function to create a Literal
;; with an explicit or implicit type.

(defn literal
  "Given a value and an optional type or language tag, return a Literal.
  If the second argument starts with '@', consider it a language tag,
  otherwise consider it a type IRI."
  ([value] (literal (str value) (get-type value)))
  ([value type-or-lang]
   (if (.startsWith (str type-or-lang) "@")
     (literal value nil (.substring (str type-or-lang) 1))
     (literal value type-or-lang nil)))
  ([value type lang]
   (cond
     lang
     {:value value
      :lang lang}
     (= type :xsd:string)
     {:value value}
     type
     {:value value
      :type type}
     :else
     (throw (Exception.
             (format "'%s' is not a valid type" type))))))


;; # Triples

;; A triple contains a Subject, a Predicate, and an Object.
;; It's a statement that asserts that the Subject stands in a relationship
;; to the Object as specified by the Predicate.
;; A triple is also a directed edge in a graph,
;; forming a link from the Subject to the Object.

;; A Subject must be a resource, either named with an IRI or anonymous
;; with a BlankNode.

(def ExpandedSubject (s/either IRI BlankNode))

(def ContractedSubject (s/either IRI BlankNode Contraction))

;; A Predicate must be a named resource, an IRI.

(def ExpandedPredicate IRI)

(def ContractedPredicate (s/either IRI Contraction))

;; An Object can be either a resource (IRI or BlankNode) or a Literal.

(def ExpandedObject (s/either IRI BlankNode Literal))

(def ContractedObject (s/either IRI BlankNode Contraction Literal))

;; Since an Object can be any one of these types,
;; we define a ResourceMap as a map from any value to an IRI or Contraction,
;; and then an `objectify` function that tries to use the ResourceMap
;; and returns a Literal if it fails.

(def ResourceMap {s/Any (s/either IRI Contraction)})

(defn objectify
  "Given an optional ResourceMap and an input value,
  return nil for nil, the input itself if it is a keyword,
  the resource if the input is a key in the ResourceMap,
  and otherwise a Literal."
  ([input]
   (objectify nil input))
  ([resource-map input]
   (cond
     (nil? input)
     nil
     (keyword?
      input)
     input
     (find resource-map input)
     (get resource-map input)
     :else
     (literal input))))

;; Now we can define triples:

(def ExpandedTriple [ExpandedSubject ExpandedPredicate ExpandedObject])
(def ExpandedTriples [ExpandedTriple])

(def ContractedTriple [ContractedSubject ContractedPredicate ContractedObject])
(def ContractedTriples [ContractedTriple])

(def Triple ContractedTriple)
(def Triples ContractedTriples)

;; If we want to write our triples to a table of strings
;; we can use the FlatTriple format,
;; which is just a sequence with three, four, or five values:
;;
;; - three values when the object is not a Literal
;; - four values when the object is a TypedLiteral
;;   (to avoid ambiguity, we always specify the type, even if it's xsd:string)
;; - five values when the object is a LangLiteral
;;   (in which case the type must be rdf:langString)
;;
;; FlatTriples are convenient for streaming and working with lazy sequences.

(def FlatTriple
  (s/either
   [ContractedSubject ContractedPredicate ContractedSubject]
   [ContractedSubject ContractedPredicate Lexical Datatype]
   [ContractedSubject ContractedPredicate Lexical Datatype Lang]))

(def FlatTriples [FlatTriple])

;; The most interesting part of EDN-LD
;; is converting from general EDN data to triples.
;; Now we define some functions to make that easy.
;; We rely on another convention: the special `:subject-iri` key.
;; Our `triplify` and `triplify-all` functions
;; expect their input maps to contain a `:subject-iri` key
;; that will be used to specify the subject of the triple.
;; The `:subject-iri` key is not treated as a predicate.

(defn flatten-literal
  "Given a literal map, return a vector representation:
  [value type], or [value :rdf:langString lang] for a LangLiteral."
  [{:keys [value type lang] :as literal}]
  (cond
    lang
    [value :rdf:langString lang]
    type
    [value type]
    value
    [value :xsd:string]
    :else
    (throw (Exception. "Literal map must have a :value."))))

(defmulti flatten-triples
  "Given a Subject, a Predicate, and an Object,
  return a sequence of FlatTriples (usually containing just one FlatTriple)."
  (fn [subject predicate object] (class object)))

(defmethod flatten-triples String
  [subject predicate object]
  [[subject predicate object]])

(defmethod flatten-triples clojure.lang.Keyword
  [subject predicate object]
  [[subject predicate object]])

(defmethod flatten-triples java.util.Map
  [subject predicate object]
  [(into [subject predicate] (flatten-literal object))])

(defn triplify-one
  "Given an optional ResourceMap, a Subject, a Predicate, and an Object,
  return a Triple.
  Tries to avoid circular references where subject and object are the same."
  ([subject predicate object]
   (triplify-one nil subject predicate object))
  ([resources subject predicate object]
   (if (= subject (objectify resources object))
     [subject predicate (objectify nil object)]
     [subject predicate (objectify resources object)])))

(defn triplify
  "Given an optional ResourceMap and a map of data
  that has a :subject-iri key,
  return a lazy sequence of Triples."
  ([input-map]
   (triplify nil input-map))
  ([resources input-map]
   (->> input-map
        (map (juxt (constantly (:subject-iri input-map)) key val))
        ; remove special keys :subject-iri and :graph-iri
        (remove #(contains?
                  #{:subject-iri :graph-iri} (second %)))
        (map (partial apply triplify-one resources)))))

(defn triplify-all
  "Given an optional ResourceMap and a sequence of input maps
  where each map has a :subject-iri key,
  return a lazy sequence of Triples."
  ([input-maps]
   (triplify-all nil input-maps))
  ([resources input-maps]
   (mapcat (partial triplify resources) input-maps)))

;; Triples are convenient for streaming, but not for everything.
;; They can be quite redundant.
;; We also define a nested data structure called a SubjectMap.
;; From the inside out, it works like this:
;;
;; - ObjectSet: the set of objects with the same subject and predicate
;; - PredicateMap: a map from predicate IRIs to ObjectSets
;; - SubjectMap: a map from subject IRIs to PredicateMaps

(def ObjectSet #{ContractedObject})

(def PredicateMap {ContractedPredicate ObjectSet})

(def SubjectMap {ContractedSubject PredicateMap})

;; The `subjectify` function rolls a sequence of Triples into a SubjectMap.

(defn subjectify
  "Given a sequence of Triples or FlatTriples, return a SubjectMap."
  [triples]
  (reduce
   (fn [coll [subject predicate object datatype lang]]
     (update-in
      coll
      [subject predicate]
      (fnil conj #{})
      (if datatype
        (literal object datatype lang)
        object)))
   nil
   triples))

;; We can also go the other way, from a SubjectMap to FlatTriples.

(defn flatten-subjects
  "Given a SubjectMap, return a lazy sequence of FlatTriples."
  [subject-map]
  (apply
   concat
   (for [[subject predicate-map] subject-map
         [predicate object-set] predicate-map
         object object-set]
     (flatten-triples subject predicate object))))

;; # Quads

;; A graph is a set of triples.
464 | ;; If we want to talk about a graph, we give it a name: an IRI, of course. 465 | ;; Unlike other names, we allow a GraphName to be nil. 466 | 467 | (def ExpandedGraphName (s/maybe (s/either IRI BlankNode))) 468 | 469 | (def ContractedGraphName (s/maybe (s/either IRI BlankNode Contraction))) 470 | 471 | ;; When we add the name of a graph to a triple we get a "quad". 472 | 473 | (def ExpandedQuad 474 | [ExpandedGraphName ExpandedSubject ExpandedPredicate ExpandedObject]) 475 | 476 | (def ContractedQuad 477 | [ContractedGraphName ContractedSubject ContractedPredicate ContractedObject]) 478 | 479 | ;; We also define FlatQuads. 480 | ;; WARNING: A FlatQuad can have the same length as a FlatTriple! 481 | ;; We suggest that you stick to either Triples or Quads to avoid ambiguity. 482 | 483 | (def FlatQuad 484 | (s/either 485 | [ContractedGraphName ContractedSubject ContractedPredicate 486 | ContractedSubject] 487 | [ContractedGraphName ContractedSubject ContractedPredicate Lexical Datatype] 488 | [ContractedGraphName ContractedSubject ContractedPredicate Lexical Datatype 489 | Lang])) 490 | 491 | (def FlatQuads [FlatQuad]) 492 | 493 | ;; Now we define `quadruplify` functions, 494 | ;; adding a new special key: `:graph-iri`. 495 | 496 | (defn quadruplify-one 497 | "Given an optional ResourceMap, a GraphName, a Subject, a Predicate, 498 | and an Object, return a Quad. 499 | Tries to avoid circular references where subject and object are the same." 500 | ([graph subject predicate object] 501 | (quadruplify-one nil graph subject predicate object)) 502 | ([resources graph subject predicate object] 503 | (into [graph] (triplify-one resources subject predicate object)))) 504 | 505 | (defn quadruplify 506 | "Given an optional ResourceMap and a map of data 507 | that includes :subject-iri and :graph-iri keys, 508 | return a lazy sequence of Quads."
509 | ([input-map] 510 | (quadruplify nil input-map)) 511 | ([resources input-map] 512 | (->> input-map 513 | (map (juxt (constantly (:graph-iri input-map)) 514 | (constantly (:subject-iri input-map)) 515 | key 516 | val)) 517 | ; remove special keys :subject-iri and :graph-iri 518 | (remove #(contains? #{:subject-iri :graph-iri} (nth % 2))) 519 | (map (partial apply quadruplify-one resources))))) 520 | 521 | (defn quadruplify-all 522 | "Given an optional ResourceMap and a sequence of input maps 523 | where each map has :subject-iri and :graph-iri keys, 524 | return a lazy sequence of Quads." 525 | ([input-maps] 526 | (quadruplify-all nil input-maps)) 527 | ([resources input-maps] 528 | (mapcat (partial quadruplify resources) input-maps))) 529 | 530 | 531 | ;; We represent a collection of named graphs as one more layer of maps 532 | ;; with GraphNames as keys and SubjectMaps as values. 533 | ;; The `nil` key indicates the default graph. 534 | 535 | (def GraphMap {ContractedGraphName SubjectMap}) 536 | 537 | (defn graphify 538 | "Given a sequence of Quads, return a GraphMap." 539 | [quads] 540 | (reduce 541 | (fn [coll [graph subject predicate object datatype lang]] 542 | (update-in 543 | coll 544 | [graph subject predicate] 545 | (fnil conj #{}) 546 | (if datatype 547 | (literal object datatype lang) 548 | object))) 549 | nil 550 | quads)) 551 | 552 | (defn flatten-graphs 553 | "Given a GraphMap, return a lazy sequence of Quads."
554 | [graph-map] 555 | (apply 556 | concat 557 | (for [[graph subject-map] graph-map 558 | [subject predicate-map] subject-map 559 | [predicate object-set] predicate-map 560 | object object-set] 561 | (map (partial concat [graph]) 562 | (flatten-triples subject predicate object))))) 563 | -------------------------------------------------------------------------------- /src/edn_ld/jena.clj: -------------------------------------------------------------------------------- 1 | (ns edn-ld.jena 2 | (:require [clojure.string :as string] 3 | [clojure.java.io :as io] 4 | [edn-ld.common :refer [xsd default-prefixes]]) 5 | (:import (java.io StringReader StringWriter) 6 | (org.apache.jena.graph Triple Node_URI Node_Blank Node_Literal) 7 | (org.apache.jena.sparql.core Quad) 8 | (org.apache.jena.riot.system StreamRDF) 9 | (org.apache.jena.rdf.model ModelFactory RDFNode AnonId) 10 | (org.apache.jena.query DatasetFactory) 11 | (org.apache.jena.datatypes BaseDatatype) 12 | (org.apache.jena.riot RDFDataMgr RDFLanguages Lang))) 13 | 14 | 15 | ;; # Apache Jena 16 | 17 | (defmulti read-node 18 | "Given a Jena Node, return an EDN-LD node." 19 | class) 20 | 21 | (defmethod read-node :default 22 | [node] 23 | nil) 24 | 25 | (defmethod read-node Node_URI 26 | [node] 27 | (.getURI node)) 28 | 29 | (defmethod read-node Node_Blank 30 | [node] 31 | (str "_:" (.getLabelString (.getBlankNodeId node)))) 32 | 33 | (defmethod read-node Node_Literal 34 | [node] 35 | (let [value (.getLiteralLexicalForm node) 36 | type (.getLiteralDatatype node) 37 | type (when type (.getURI type)) 38 | lang (.getLiteralLanguage node)] 39 | (merge 40 | {:value value} 41 | (when type 42 | (when (not= type (str xsd "string")) 43 | {:type type})) 44 | (when-not (string/blank? 
lang) 45 | {:lang lang})))) 46 | 47 | (defmethod read-node RDFNode 48 | [node] 49 | (cond 50 | (.isURIResource node) 51 | (.getURI node) 52 | (.isAnon node) 53 | (str "_:" (.getLabelString (.getId (.asResource node)))) 54 | (.isLiteral node) 55 | (let [value (.getLiteralLexicalForm node) 56 | type (.getLiteralDatatype node) 57 | type (when type (.getURI type)) 58 | lang (.getLiteralLanguage node)] 59 | (merge 60 | {:value value} 61 | (when type 62 | (when (not= type (str xsd "string")) 63 | {:type type})) 64 | (when-not (string/blank? lang) 65 | {:lang lang}))))) 66 | 67 | (defmulti make-node 68 | "Given a model and an EDN-LD node, return a Jena RDFNode (a Resource or a Literal)." 69 | (fn [model node] (class node))) 70 | 71 | (defmethod make-node String 72 | [model node] 73 | (if (.startsWith node "_:") 74 | (.createResource model (AnonId. (.substring node 2))) 75 | (.createResource model node))) 76 | 77 | (defmethod make-node java.util.Map 78 | [model {:keys [value type lang] :as node}] 79 | (cond 80 | lang 81 | (.createLiteral model value lang) 82 | type 83 | (.createTypedLiteral model value (BaseDatatype. type)) 84 | :else 85 | (.createTypedLiteral model value (BaseDatatype. (str xsd "string"))))) 86 | 87 | (defn get-model 88 | "Given a PrefixMap and ExpandedTriples, 89 | return a model with the namespace prefixes and triples added." 90 | [prefixes triples] 91 | (let [model (ModelFactory/createDefaultModel)] 92 | (doseq [[prefix iri] (filter #(string? (val %)) prefixes)] 93 | (.setNsPrefix model (name prefix) iri)) 94 | (doseq [[subject predicate object] triples] 95 | (.add 96 | model 97 | (make-node model subject) 98 | (.createProperty model predicate) 99 | (make-node model object))) 100 | model)) 101 | 102 | (defn get-triple-map 103 | "Given Quads, return a map from GraphNames to vectors of Triples."
104 | [quads] 105 | (reduce 106 | (fn [coll quad] 107 | (update-in 108 | coll 109 | [(first quad)] 110 | (fnil conj []) 111 | (->> quad (drop 1) vec))) 112 | {} 113 | quads)) 114 | 115 | (defn get-model-map 116 | "Given a PrefixMap and a map from graph names to Triples, 117 | return a map from graph names to Models." 118 | [prefixes triple-map] 119 | (->> triple-map 120 | (map (juxt key #(get-model prefixes (val %)))) 121 | (into {}))) 122 | 123 | (defn get-dataset 124 | "Given a PrefixMap and ExpandedQuads, 125 | return a dataset with models for each of the graphs." 126 | [prefixes quads] 127 | (let [dataset (DatasetFactory/createMem) 128 | models (->> quads get-triple-map (get-model-map prefixes))] 129 | (doseq [[graph model] (filter key models)] 130 | (.addNamedModel dataset graph model)) 131 | (if (find models nil) 132 | (.setDefaultModel dataset (get models nil)) 133 | (.setDefaultModel dataset (get-model prefixes nil))) 134 | dataset)) 135 | 136 | (defn get-format 137 | "Given a Lang, a format string, a content type, or a filename, 138 | try to return an RDF Lang (file format)." 139 | [format] 140 | (or (when (instance? Lang format) format) 141 | (when (string? format) (RDFLanguages/nameToLang format)) 142 | (when (string? format) (RDFLanguages/contentTypeToLang format)) 143 | (when (string? format) (RDFLanguages/filenameToLang format)) 144 | (throw (Exception. (str "Could not determine format: " format))))) 145 | 146 | (defn get-output 147 | "Given a StringWriter or potential output stream, 148 | return either a StringWriter or an OutputStream." 149 | [output] 150 | (if (instance? StringWriter output) 151 | output 152 | (io/output-stream output))) 153 | 154 | 155 | ;; # Read Triples 156 | 157 | (defn stream-triples 158 | "Given atoms for a PrefixMap and a sequence for ExpandedTriples, 159 | return an instance of StreamRDF for collecting triples. 160 | Quads are ignored." 
161 | [prefixes triples] 162 | (reify StreamRDF 163 | (^void start [_]) 164 | (^void triple [_ ^Triple triple] 165 | (swap! 166 | triples 167 | conj 168 | [(read-node (.getSubject triple)) 169 | (read-node (.getPredicate triple)) 170 | (read-node (.getObject triple))])) 171 | (^void quad [_ ^Quad quad]) 172 | (^void base [_ ^String base] 173 | (swap! prefixes assoc :base-iri base)) ; TODO: handle base IRI 174 | (^void prefix [_ ^String prefix ^String iri] 175 | (swap! 176 | prefixes 177 | assoc 178 | (if (string/blank? prefix) nil (keyword prefix)) 179 | iri)) 180 | (^void finish [_]))) 181 | 182 | (defn read-triples 183 | "Given a source path, reader, or input stream, 184 | an optional format name, and an optional base IRI, 185 | return the pair of a PrefixMap and ExpandedTriples." 186 | ([source] 187 | (let [prefixes (atom {}) 188 | triples (atom [])] 189 | (RDFDataMgr/parse (stream-triples prefixes triples) 190 | source) 191 | [@prefixes @triples])) 192 | ([source format] 193 | (let [prefixes (atom {}) 194 | triples (atom [])] 195 | (RDFDataMgr/parse (stream-triples prefixes triples) 196 | source 197 | (get-format format)) 198 | [@prefixes @triples])) 199 | ([source base format] 200 | (let [prefixes (atom {}) 201 | triples (atom [])] 202 | (RDFDataMgr/parse (stream-triples prefixes triples) 203 | source 204 | base 205 | (get-format format)) 206 | [@prefixes @triples]))) 207 | 208 | (defn read-triple-string 209 | "Given an input string, an optional base IRI, and a format name, 210 | return the pair of a PrefixMap and ExpandedTriples." 211 | ([input format] 212 | (read-triple-string input "http://example.com/" format)) 213 | ([input base format] 214 | (read-triples (StringReader. 
input) base format))) 215 | 216 | 217 | ;; # Write Triples 218 | 219 | (defn write-triples 220 | "Given a destination that can be used as an OutputStream or StringWriter, 221 | an optional format, a PrefixMap, and ExpandedTriples, 222 | write the triples to the destination and return them unchanged." 223 | ([dest prefixes triples] 224 | (write-triples 225 | dest 226 | (RDFLanguages/filenameToLang dest) 227 | prefixes 228 | triples)) 229 | ([dest format prefixes triples] 230 | (with-open [output (get-output dest)] 231 | (RDFDataMgr/write 232 | output 233 | (get-model prefixes triples) 234 | (get-format format))) 235 | triples)) 236 | 237 | (defn write-triple-string 238 | "Given an optional format (defaults to Turtle), 239 | a PrefixMap, and ExpandedTriples, 240 | return a string representation in that format." 241 | ([prefixes triples] 242 | (with-open [writer (StringWriter.)] 243 | (write-triples writer (get-format "ttl") prefixes triples) 244 | (str writer))) 245 | ([format prefixes triples] 246 | (with-open [writer (StringWriter.)] 247 | (write-triples writer format prefixes triples) 248 | (str writer)))) 249 | 250 | 251 | ;; # Read Quads 252 | 253 | (defn stream-quads 254 | "Given atoms for a PrefixMap and a sequence for ExpandedQuads, 255 | return an instance of StreamRDF for collecting quads. 256 | Triples are ignored." 257 | [prefixes quads] 258 | (reify StreamRDF 259 | (^void start [_]) 260 | (^void triple [_ ^Triple triple]) 261 | (^void quad [_ ^Quad quad] 262 | (swap! 263 | quads 264 | conj 265 | [(read-node (.getGraph quad)) 266 | (read-node (.getSubject quad)) 267 | (read-node (.getPredicate quad)) 268 | (read-node (.getObject quad))])) 269 | (^void base [_ ^String base] 270 | (swap! prefixes assoc :base-iri base)) ; TODO: handle base IRI 271 | (^void prefix [_ ^String prefix ^String iri] 272 | (swap! 273 | prefixes 274 | assoc 275 | (if (string/blank? 
prefix) nil (keyword prefix)) 276 | iri)) 277 | (^void finish [_]))) 278 | 279 | (defn read-quads 280 | "Given a source path, reader, or input stream, 281 | an optional format name, and an optional base IRI, 282 | return the pair of a PrefixMap and ExpandedQuads." 283 | ([source] 284 | (let [prefixes (atom {}) 285 | quads (atom [])] 286 | (RDFDataMgr/parse (stream-quads prefixes quads) 287 | source) 288 | [@prefixes @quads])) 289 | ([source format] 290 | (let [prefixes (atom {}) 291 | quads (atom [])] 292 | (RDFDataMgr/parse (stream-quads prefixes quads) 293 | source 294 | (get-format format)) 295 | [@prefixes @quads])) 296 | ([source base format] 297 | (let [prefixes (atom {}) 298 | quads (atom [])] 299 | (RDFDataMgr/parse (stream-quads prefixes quads) 300 | source 301 | base 302 | (get-format format)) 303 | [@prefixes @quads]))) 304 | 305 | (defn read-quad-string 306 | "Given an input string, an optional base IRI, and a format name, 307 | return the pair of a PrefixMap and ExpandedQuads." 308 | ([input format] 309 | (read-quad-string input "http://example.com/" format)) 310 | ([input base format] 311 | (read-quads (StringReader. input) base format))) 312 | 313 | 314 | ;; # Write Quads 315 | 316 | (defn write-quads 317 | "Given a destination that can be used as an OutputStream or StringWriter, 318 | an optional PrefixMap, an optional format, and ExpandedQuads, 319 | write the quads to the destination and return them unchanged." 
320 | ([dest quads] 321 | (write-quads dest default-prefixes quads)) 322 | ([dest prefixes quads] 323 | (write-quads 324 | dest 325 | (RDFLanguages/filenameToLang dest) 326 | prefixes 327 | quads)) 328 | ([dest format prefixes quads] 329 | (with-open [output (get-output dest)] 330 | (RDFDataMgr/write 331 | output 332 | (get-dataset prefixes quads) 333 | (get-format format))) 334 | quads)) 335 | 336 | (defn write-quad-string 337 | "Given an optional format (defaults to TriG), an optional PrefixMap, 338 | and ExpandedQuads, return a string representation in that format." 339 | ([quads] 340 | (write-quad-string default-prefixes quads)) 341 | ([prefixes quads] 342 | (with-open [writer (StringWriter.)] 343 | (write-quads writer (get-format "trig") prefixes quads) 344 | (str writer))) 345 | ([format prefixes quads] 346 | (with-open [writer (StringWriter.)] 347 | (write-quads writer format prefixes quads) 348 | (str writer)))) 349 | -------------------------------------------------------------------------------- /src/edn_ld/rdfxml.clj: -------------------------------------------------------------------------------- 1 | (ns edn-ld.rdfxml 2 | (:require [clojure.string :as string] 3 | [clojure.java.io :as io] 4 | [edn-ld.common :refer [rdf xsd default-context]]) 5 | (:import (org.codehaus.stax2 XMLInputFactory2 XMLOutputFactory2 6 | XMLStreamReader2 XMLStreamWriter2))) 7 | 8 | ;; WARNING: This code is a work in progress! 9 | ;; Does not handle all of RDFXML, has not been optimized, 10 | ;; and has not been properly documented. 11 | 12 | 13 | ;; ## Clojure Type Hints 14 | ;; 15 | ;; Java is a strongly typed programming language, in which the type of every 16 | ;; variable and method is explicitly declared. Clojure is a dynamic language 17 | ;; in which types are usually inferred. The type inference process often 18 | ;; involves "reflection" into classes, and this tends to slow down Clojure code.
19 | ;; 20 | ;; Clojure can be almost as fast as native Java code, but we have to avoid 21 | ;; reflection. We avoid it by adding "type hints" to tell the Clojure compiler 22 | ;; what types to expect for input and return values. Metadata annotations on 23 | ;; functions and values provide the hints: `^QName`, `^XMLInputFactory2`, etc. 24 | 25 | ;; To make the code in this file fast, we can tell the Clojure compiler 26 | ;; to `warn-on-reflection` (currently commented out below) and add type hints until those warnings go away. 27 | 28 | ;(set! *warn-on-reflection* true) 29 | 30 | ;; ## Factories 31 | ;; 32 | ;; The following functions are used to create factories, readers, 33 | ;; and writers from the Woodstox library. 34 | 35 | (defn create-input-factory 36 | "Create and return a Woodstox XMLInputFactory2." 37 | ^XMLInputFactory2 [] 38 | (XMLInputFactory2/newInstance)) 39 | 40 | (defn create-output-factory 41 | "Create and return a Woodstox XMLOutputFactory2." 42 | ^XMLOutputFactory2 [] 43 | (XMLOutputFactory2/newInstance)) 44 | 45 | ;; By default we will use the same shared input and output factories. 46 | 47 | (def ^XMLInputFactory2 shared-input-factory (create-input-factory)) 48 | (def ^XMLOutputFactory2 shared-output-factory (create-output-factory)) 49 | 50 | (defn create-stream-reader 51 | "Create and return a Woodstox XMLStreamReader2 for a given path. 52 | An XMLInputFactory is optional." 53 | (^XMLStreamReader2 [path] 54 | (create-stream-reader shared-input-factory path)) 55 | (^XMLStreamReader2 [^XMLInputFactory2 input-factory path] 56 | (.createXMLStreamReader input-factory (io/file path)))) 57 | 58 | (defn create-stream-writer 59 | "Create and return a Woodstox XMLStreamWriter2 for a given path. 60 | An XMLOutputFactory is optional."
61 | (^XMLStreamWriter2 [path] 62 | (create-stream-writer shared-output-factory path)) 63 | (^XMLStreamWriter2 [^XMLOutputFactory2 output-factory path] 64 | (.createXMLStreamWriter output-factory (clojure.java.io/writer path)))) 65 | 66 | ; To read an XML file lazily, 67 | ; we wrap the stream reader in a recursive function that uses lazy-seq. 68 | ; When (.hasNext reader) returns false, the reader is closed and the sequence ends. 69 | ; http://stackoverflow.com/a/19656800 70 | 71 | #_(defn lazy-read-ok 72 | [csv-file] 73 | (with-open [in-file (io/reader csv-file)] 74 | (frequencies (map #(nth % 2) (csv/read-csv in-file))))) 75 | 76 | (defn lazy-read 77 | [path] 78 | (let [reader (create-stream-reader path) 79 | lazy (fn lazy [wrapped] 80 | (lazy-seq 81 | (if (.hasNext reader) 82 | (do (.next reader) 83 | (cons (.getEventType reader) (lazy reader))) 84 | (.close reader))))] 85 | (lazy reader))) 86 | 87 | (defn advance 88 | [reader] 89 | (while (and (.hasNext reader) (not (.isStartElement reader))) 90 | (.next reader))) 91 | 92 | (defn get-context 93 | [reader] 94 | (advance reader) 95 | (when (and (.isStartElement reader) 96 | (= (.getLocalName reader) "RDF")) 97 | (->> (range 0 (.getNamespaceCount reader)) 98 | (map 99 | (fn [i] 100 | [(when-not (string/blank? (.getNamespacePrefix reader i)) 101 | (.getNamespacePrefix reader i)) 102 | (.getNamespaceURI reader i)])) 103 | (into {})))) 104 | 105 | (defn get-element-iri 106 | [context reader] 107 | (when (.isStartElement reader) 108 | (str (get context (.getPrefix reader)) 109 | (.getLocalName reader)))) 110 | 111 | (defn get-text 112 | [reader] 113 | (while (and (.hasNext reader) (not (.hasText reader))) 114 | (.next reader)) 115 | (.getText reader)) 116 | 117 | (defn get-attribute-map 118 | "Given a reader at a start element, 119 | return a map from attribute IRIs to their values."
120 | [context reader] 121 | (when (.isStartElement reader) 122 | (->> (range 0 (.getAttributeCount reader)) 123 | (map 124 | (fn [i] 125 | [(str (get context (.getAttributePrefix reader i)) 126 | (.getAttributeLocalName reader i)) 127 | (.getAttributeValue reader i)])) 128 | (into {})))) 129 | 130 | (defn read-triple 131 | "Given a context, the current subject, and a reader at a start element, return a pair of the (possibly new) current subject and the triple read from that element (nil when the element yields no triple)." 132 | [context subject reader] 133 | (when (.isStartElement reader) 134 | (let [element (get-element-iri context reader) 135 | attrs (get-attribute-map context reader) 136 | about (get attrs (str rdf "about")) 137 | resource (get attrs (str rdf "resource")) 138 | datatype (get attrs (str rdf "datatype") (str xsd "string")) 139 | lang (get attrs "lang")] 140 | (cond 141 | ; RDF root element: return nothing 142 | (= element (str rdf "RDF")) 143 | [nil nil] 144 | ; Description element 145 | ; TODO: does not handle non-RDF attributes 146 | (and about (= element (str rdf "Description"))) 147 | [about nil] 148 | ; type declaration element 149 | ; TODO: does not handle non-RDF attributes 150 | about 151 | [about [about (str rdf "type") element nil nil]] 152 | ; predicate resource assertion 153 | resource 154 | [subject [subject element resource nil nil]] 155 | ; predicate langString literal assertion 156 | lang 157 | [subject 158 | [subject element (get-text reader) (str rdf "langString") lang]] 159 | ; predicate typed literal assertion 160 | :else 161 | [subject [subject element (get-text reader) datatype nil]])))) 162 | 163 | (defn myread 164 | [path] 165 | (with-open [reader (create-stream-reader path)] 166 | (let [context (get-context reader)] 167 | (println context) 168 | (loop [triples [] 169 | subject nil] 170 | (if (.hasNext reader) 171 | (let [[subject triple] (read-triple context subject reader)] 172 | (.next reader) 173 | (advance reader) 174 | (recur (conj triples triple) subject)) 175 | triples))))) 176 | 177 | (defn myshow 178 | [path] 179 | (->> path 180 | myread 181 | (remove nil?)
182 | (map println) 183 | doall)) 184 | 185 | (defn compact 186 | "Given a context map and an IRI, 187 | return a triple of the matched prefix, the namespace IRI, and the localname." 188 | [context iri] 189 | (if (string/blank? iri) 190 | ["" (get context nil) iri] 191 | (->> context 192 | (filter #(string? (val %))) 193 | (map (juxt val (comp name key))) 194 | (sort-by (comp count first) >) 195 | (filter #(.startsWith (str iri) (first %))) 196 | first 197 | ((fn [[uri prefix]] 198 | [prefix uri (string/replace-first (str iri) (str uri) "")]))))) 199 | 200 | (defn write-object 201 | "Given a single Object (IRI or Literal), write it to RDFXML." 202 | [writer prefix uri local object] 203 | (.writeCharacters writer "\n ") 204 | (.writeStartElement writer prefix local uri) 205 | (when (string? object) 206 | (.writeAttribute writer "rdf" rdf "resource" object)) 207 | (when (:type object) 208 | (when (and (not= (:type object) (str xsd "string")) 209 | (not= (:type object) (str rdf "langString"))) 210 | (.writeAttribute writer "rdf" rdf "datatype" (:type object)))) 211 | (when (:lang object) 212 | (.writeAttribute writer "xml" nil "lang" (:lang object))) 213 | (when (:value object) 214 | (.writeCharacters writer (:value object))) 215 | (.writeEndElement writer)) 216 | 217 | (defn write-predicate 218 | "Given a predicate and an object set, write all of the objects." 219 | [writer context predicate object-set] 220 | (let [[prefix uri local] (compact context predicate)] 221 | (doseq [object object-set] 222 | (write-object writer prefix uri local object)))) 223 | 224 | (defn pick-first-type 225 | "Given a predicate-map, pick out the first rdf:type, 226 | and return the pair of that type and a predicate map 227 | with that type removed." 
228 | [predicate-map] 229 | (let [types (seq (get predicate-map (str rdf "type")))] 230 | [(first types) 231 | (if (seq (rest types)) 232 | (assoc predicate-map 233 | (str rdf "type") 234 | (set (rest types))) 235 | (dissoc predicate-map (str rdf "type")))])) 236 | 237 | (defn write-subject 238 | "Given a subject and predicate map, 239 | write an element for this subject, 240 | with children for all predicate-object pairs." 241 | [writer context subject predicate-map] 242 | (.writeCharacters writer " ") 243 | (let [[type predicate-map] (pick-first-type predicate-map)] 244 | (if type 245 | (let [[prefix uri local] (compact context type)] 246 | (.writeStartElement writer prefix local uri)) 247 | (.writeStartElement writer rdf "Description")) 248 | (.writeAttribute writer "rdf" rdf "about" subject) 249 | (when (seq predicate-map) 250 | (doseq [[predicate object-set] 251 | (seq (dissoc predicate-map (str rdf "type")))] 252 | (write-predicate writer context predicate object-set)) 253 | (.writeCharacters writer "\n "))) 254 | (.writeEndElement writer)) 255 | 256 | (defn write-subjects 257 | [writer context subject-map] 258 | (.writeStartDocument writer) 259 | (.writeCharacters writer "\n") 260 | (.writeStartElement writer "rdf" "RDF" rdf) 261 | (doseq [[prefix uri] (seq context)] 262 | (when (and (nil? prefix) (string? uri)) 263 | (.writeDefaultNamespace writer uri)) 264 | (when (and prefix (string? 
uri)) 265 | (.writeNamespace writer (name prefix) uri))) 266 | (.writeCharacters writer "\n") 267 | (doseq [subject (butlast (keys subject-map))] 268 | (write-subject writer context subject (get subject-map subject)) 269 | (.writeCharacters writer "\n\n")) 270 | (when (last (keys subject-map)) 271 | (let [subject (last (keys subject-map))] 272 | (write-subject writer context subject (get subject-map subject)) 273 | (.writeCharacters writer "\n"))) 274 | (.writeEndElement writer) 275 | (.writeEndDocument writer) 276 | (.flush writer) 277 | (.close writer)) 278 | 279 | (defn write-file 280 | ([path subject-map] 281 | (write-file path default-context subject-map)) 282 | ([path context subject-map] 283 | (write-subjects 284 | (create-stream-writer path) 285 | context 286 | subject-map))) 287 | 288 | (defn write-string 289 | "Given an optional context and a subject map, 290 | return a string with the RDFXML representation." 291 | ([subject-map] 292 | (write-string default-context subject-map)) 293 | ([context subject-map] 294 | (let [writer (java.io.StringWriter.)] 295 | (write-subjects 296 | (.createXMLStreamWriter shared-output-factory writer) 297 | context 298 | subject-map) 299 | (.toString writer)))) 300 | 301 | (defn mytest 302 | [] 303 | (->> (myread "simple.owl") 304 | (remove nil?) 
305 | edn-ld.core/subjectify 306 | (write-file "output.owl" default-context))) 307 | -------------------------------------------------------------------------------- /test-resources/books.tsv: -------------------------------------------------------------------------------- 1 | Title Author 2 | The Iliad Homer 3 | -------------------------------------------------------------------------------- /test/edn_ld/core_test.clj: -------------------------------------------------------------------------------- 1 | (ns edn-ld.core-test 2 | (:require [clojure.test :refer :all] 3 | [clojure.string :as string] 4 | [schema.core :as s] 5 | [edn-ld.core :refer :all] 6 | [edn-ld.common :refer [rdf xsd]])) 7 | 8 | ; These macros are borrowed from 9 | ; https://github.com/Prismatic/schema/blob/master/test/clj/schema/test_macros.clj 10 | (defmacro valid! 11 | "Assert that x satisfies schema s, and the walked value is equal to the original." 12 | [s x] 13 | `(let [x# ~x] (~'is (= x# ((s/start-walker s/walker ~s) x#))))) 14 | 15 | (defmacro invalid! 16 | "Assert that x does not satisfy schema s, optionally checking the stringified return value" 17 | ([s x] 18 | `(~'is (s/check ~s ~x))) 19 | ([s x expected] 20 | `(do (invalid! ~s ~x) 21 | (sm/if-cljs nil (~'is (= ~expected (pr-str (s/check ~s ~x)))))))) 22 | 23 | ;; For now, any string is a valid IRI. 24 | 25 | (deftest test-iris 26 | (valid! IRI "http://example.com") 27 | (valid! IRI "foo") 28 | (invalid! IRI 123) 29 | (invalid! IRI :foo)) 30 | 31 | (def context 32 | {:ex "http://example.com/" 33 | nil :ex 34 | :foo :ex:foo}) 35 | 36 | (deftest test-context 37 | (valid! 
Context context)) 38 | 39 | (deftest test-expand 40 | (are [x y] (= (expand context x) y) 41 | nil nil 42 | 123 123 43 | :foo "http://example.com/foo" 44 | :ex "http://example.com/" 45 | :ex:bar "http://example.com/bar" 46 | :baz "http://example.com/baz")) 47 | 48 | (deftest test-contract 49 | (are [x y] (= (contract context x) y) 50 | nil nil 51 | 123 123 52 | "foo" "foo" 53 | "http://example.com/foo" :foo 54 | "http://example.com/bar" :bar)) 55 | 56 | (deftest test-literals 57 | (invalid! Literal nil) 58 | (invalid! Literal "foo") 59 | (invalid! Literal 123) 60 | (invalid! Literal {}) 61 | (invalid! Literal []) 62 | (invalid! Literal {:value 123}) 63 | (invalid! Literal {:value "foo" :type 123}) 64 | (invalid! Literal {:value "foo" :type "bar" :lang "en"}) 65 | (valid! Literal {:value "foo"}) 66 | (valid! Literal {:value "foo" :type "bar"}) 67 | (valid! Literal {:value "foo" :lang "en"}) 68 | (is (= (literal "foo") 69 | {:value "foo"})) 70 | (is (= (literal 123) 71 | {:value "123" :type :xsd:integer})) 72 | (is (= (literal "foo" "bar") 73 | {:value "foo" :type "bar"})) 74 | (is (= (literal "foo" "@bar") 75 | {:value "foo" :lang "bar"}))) 76 | 77 | (deftest test-flatten-triples 78 | (is (= (flatten-triples :s :p :o) 79 | [[:s :p :o]])) 80 | (is (= (flatten-triples :s :p {:value "o"}) 81 | [[:s :p "o" :xsd:string]])) 82 | (is (= (flatten-triples :s :p {:value "o" :type :foo}) 83 | [[:s :p "o" :foo]])) 84 | (is (= (flatten-triples :s :p {:value "o" :lang "en"}) 85 | [[:s :p "o" :rdf:langString "en"]]))) 86 | 87 | (deftest test-objectify 88 | (are [x y] (= (objectify x) y) 89 | nil nil 90 | :foo :foo 91 | "foo" {:value "foo"} 92 | 123 {:value "123" :type :xsd:integer}) 93 | (is (= (objectify {"foo" :foo} "foo") :foo))) 94 | 95 | (deftest test-triplify 96 | (is (= (triplify {:subject-iri :subject 97 | :predicate :object}) 98 | [[:subject :predicate :object]])) 99 | (is (= (triplify {:subject-iri :subject 100 | :predicate "Object"}) 101 | [[:subject :predicate {:value 
"Object"}]])) 102 | (is (= (triplify {"Object" :object} 103 | {:subject-iri :subject 104 | :predicate "Object"}) 105 | [[:subject :predicate :object]]))) 106 | 107 | (deftest test-quadruplify 108 | (is (= (quadruplify {:subject-iri :subject 109 | :graph-iri :graph 110 | :predicate :object}) 111 | [[:graph :subject :predicate :object]])) 112 | (is (= (quadruplify {:subject-iri :subject 113 | :graph-iri :graph 114 | :predicate "Object"}) 115 | [[:graph :subject :predicate {:value "Object"}]])) 116 | (is (= (quadruplify {"Object" :object} 117 | {:subject-iri :subject 118 | :graph-iri :graph 119 | :predicate "Object"}) 120 | [[:graph :subject :predicate :object]]))) 121 | 122 | (deftest test-subjectify 123 | (is (= (subjectify 124 | [[:subject :predicate :object] 125 | [:subject :predicate "Object" :xsd:string]]) 126 | {:subject {:predicate #{:object {:value "Object"}}}}))) 127 | 128 | (deftest test-graphify 129 | (is (= (graphify 130 | [[:graph :subject :predicate :object] 131 | [:graph :subject :predicate "Object" :xsd:string]]) 132 | {:graph {:subject {:predicate #{:object {:value "Object"}}}}}))) 133 | 134 | (deftest test-flatten-subjects 135 | (is (= (set (flatten-subjects 136 | {:subject {:predicate #{:object {:value "Object"}}}})) 137 | #{[:subject :predicate :object] 138 | [:subject :predicate "Object" :xsd:string]}))) 139 | 140 | (deftest test-flatten-graphs 141 | (is (= (set (flatten-graphs 142 | {:graph {:subject {:predicate #{:object {:value "Object"}}}}})) 143 | #{[:graph :subject :predicate :object] 144 | [:graph :subject :predicate "Object" :xsd:string]}))) 145 | -------------------------------------------------------------------------------- /test/edn_ld/jena_test.clj: -------------------------------------------------------------------------------- 1 | (ns edn-ld.jena-test 2 | (:require [clojure.test :refer :all] 3 | [clojure.string :as string] 4 | [edn-ld.jena :refer :all] 5 | [edn-ld.common :refer [rdf xsd]]) 6 | (:import (org.apache.jena.riot Lang) 7 
(org.apache.jena.rdf.model ModelFactory))) 8 | 9 | (deftest test-formats 10 | (are [x y] (= (get-format x) y) 11 | "turtle" Lang/TURTLE 12 | "foo.ttl" Lang/TURTLE 13 | "application/turtle" Lang/TURTLE 14 | Lang/TURTLE Lang/TURTLE)) 15 | 16 | (def ex "http://example.com/") 17 | (def test1-turtle 18 | "@prefix ex: <http://example.com/> . 19 | ex:subject ex:predicate \"Object\"@en ; 20 | ex:predicate ex:object .") 21 | 22 | (def test1-edn 23 | [[(str ex "subject") (str ex "predicate") (str ex "object")] 24 | [(str ex "subject") (str ex "predicate") {:value "Object" :type "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" :lang "en"}]]) 25 | 26 | (defn clean 27 | [s] 28 | (-> s 29 | (string/replace #"(?m)\s+" " ") 30 | string/trim)) 31 | 32 | (deftest test-triples 33 | (let [[prefixes triples] (read-triple-string test1-turtle "turtle")] 34 | (is (= prefixes {:base-iri ex :ex ex})) 35 | (is (= (set triples) (set test1-edn)))) 36 | (is (= (clean (write-triple-string {:ex ex} test1-edn)) 37 | (clean test1-turtle)))) 38 | 39 | (def test2-trig 40 | "@prefix ex: <http://example.com/> . 41 | ex:graph { 42 | ex:subject ex:predicate \"Object\"@en ; 43 | ex:predicate ex:object .
}")

(def test2-edn
  [[(str ex "graph") (str ex "subject") (str ex "predicate") (str ex "object")]
   [(str ex "graph") (str ex "subject") (str ex "predicate")
    {:value "Object"
     :type "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
     :lang "en"}]])

(deftest test-quads
  (let [[prefixes quads] (read-quad-string test2-trig "trig")]
    (is (= prefixes {:base-iri ex :ex ex}))
    (is (= (set quads) (set test2-edn))))
  (is (= (clean (write-quad-string {:ex ex} test2-edn))
         (clean test2-trig))))

(deftest test-blank
  (let [model (ModelFactory/createDefaultModel)
        node  (make-node model "_:foo")]
    (is (= (read-node node) "_:foo"))))

--------------------------------------------------------------------------------
/test/edn_ld/readme_test.clj:
--------------------------------------------------------------------------------
(ns edn-ld.readme-test
  (:require [clojure.test :refer :all]
            [clojure.string :as string]
            [edn-ld.core :refer :all]
            [edn-ld.common :refer :all]
            [edn-ld.rdfxml]))

;; Parse the README.md file for indented code blocks,
;; execute anything marked "user=>",
;; and compare it to the expected output.

(defn clean
  [s]
  (-> s
      (string/replace #"(?m)\s+" " ")
      string/trim))

(def prompt "user=> ")

(defn run-test
  [user expected]
  (let [command (string/replace user prompt "")
        actual  (-> command
                    read-string
                    eval
                    str
                    (string/replace #"^#'edn-ld.readme-test" "#'user"))]
    ;(println "C" command)
    ;(println "E" expected)
    ;(println "A" actual)
    ;(println (= actual expected))
    (is (= (clean expected) (clean actual)))))

(->> "README.md"
     slurp
     string/split-lines
     (filter #(.startsWith % "    "))          ; keep lines of indented code blocks
     (map #(string/replace % #"^    " ""))      ; strip the four-space indent
     (remove #(re-find #"^(\-|\+|\*) " %))      ; remove nested unordered list items
     (remove #(re-find #"^(\$|;)" %))           ; remove shell prompts and comments
     (drop-while #(not (.startsWith % prompt)))
     (drop 4)                                   ; ignore 'use' and 'require'
     (partition-by #(.startsWith % prompt))
     (map (partial string/join "\n"))
     (partition 2)
     (map (partial apply run-test))
     doall
     (apply = true))

--------------------------------------------------------------------------------
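The threading pipeline at the end of `readme_test.clj` does several jobs at once: it selects lines from indented code blocks, discards nested list items, shell prompts and comments, and then pairs each `user=>` command with the output lines that follow it. A minimal sketch of just the pairing step, pulled into a pure function so it can be tried at a REPL on a small string, might look like the following. The namespace `readme-pairs-sketch`, the function `readme-pairs`, and the sample text are hypothetical; the sketch assumes Markdown code blocks indented by four spaces and deliberately leaves out the `(drop 4)` step and the evaluation performed by `run-test`.

```clojure
(ns readme-pairs-sketch
  (:require [clojure.string :as string]))

(def prompt "user=> ")

(defn readme-pairs
  "Hypothetical helper: return [command expected] string pairs found in
  the four-space indented code blocks of the Markdown string `md`."
  [md]
  (->> (string/split-lines md)
       ;; keep only indented code-block lines, then drop the indent
       (filter #(.startsWith % "    "))
       (map #(string/replace % #"^    " ""))
       ;; skip nested list items, shell prompts, and comments
       (remove #(re-find #"^(\-|\+|\*) " %))
       (remove #(re-find #"^(\$|;)" %))
       ;; start at the first REPL prompt
       (drop-while #(not (.startsWith % prompt)))
       ;; alternate runs of prompt lines and output lines
       (partition-by #(.startsWith % prompt))
       (map (partial string/join "\n"))
       ;; pair each command run with the output run after it
       (partition 2)
       (map (fn [[command expected]]
              [(string/replace command prompt "") expected]))))

;; Hypothetical sample README fragment.
(def sample
  (string/join "\n"
               ["# Example"
                ""
                "    user=> (+ 1 2)"
                "    3"
                ""
                "    $ lein test"
                "    user=> (keyword \"ex\" \"subject\")"
                "    :ex/subject"]))

(comment
  (readme-pairs sample)
  ;; => (["(+ 1 2)" "3"] ["(keyword \"ex\" \"subject\")" ":ex/subject"])
  )
```

Keeping the grouping separate from evaluation is one way to inspect exactly which command/expected pairs a README yields before any of them are passed to `eval`.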