├── .gitignore ├── DRAFTS ├── 1G │ ├── LPGs_many_implementation_choices.md │ ├── README.md │ └── a.ttl ├── Lipofuscin │ ├── README.md │ └── media │ │ ├── book.png │ │ └── global_search.png └── lisp_sparql_visidata │ └── README.md ├── README.md ├── SPARQL_value_functions ├── README.md └── aa.csv ├── a_random_permutation └── README.md ├── add_abcl_to_springboot ├── README.md └── media │ ├── back_to_stdout_on_jvm.png │ ├── jvm_starting.png │ ├── now_to_slimv.png │ └── swank_started.png ├── blend_google_sheet_with_wikidata └── README.md ├── building-open-source-projects └── media │ ├── calmly.webp │ └── simply.png ├── cepl ├── README.md ├── a.lisp └── render.png ├── design_implications └── README.md ├── dynamic_pagination_with_sparql_anything └── README.md ├── fused_edges ├── README.md └── media │ ├── articulating_edges.png │ └── fused_edges.png ├── git_repo_as_rdf ├── README.md └── media │ ├── bodge.png │ ├── curl_tweet.png │ ├── exploded_diagram.jpg │ ├── first.png │ ├── maybe.gif │ └── ora_slide.jpg ├── iDE ├── README.md ├── example │ ├── bbc.ttl │ ├── creativework.ttl │ ├── dublin_core_terms.ttl │ ├── owl.ttl │ ├── rdf.ttl │ └── rdfs.ttl └── media │ └── async-execution.gif ├── interfaces_and_personalities ├── README.md └── media │ └── diagram.png ├── intuitive_graph_viz ├── README.md └── media │ ├── blobby.png │ ├── mnemonic_graph.png │ └── standard_graph.png ├── json-ld └── README.md ├── reason-over ├── README.md └── media │ ├── ldow2013-paper-08.pdf │ └── paper.png ├── relational_as_graph └── README.md ├── scraping_with_sparql ├── README.md └── media │ ├── inspector_open.png │ ├── inspector_request_headers.png │ ├── inspector_results.png │ ├── jira_loading.png │ └── screenie.png ├── semantic_messages ├── README.md ├── media │ └── tweet.png └── mo.trig ├── software_in_rdf ├── readme.md └── ytdl.ttl ├── sparql-gotcha ├── README.md └── some.dg ├── using_apl ├── README.md └── media │ ├── APL_logo.png │ └── ethanol.png └── work_on_engineered_artifacts ├── README.md └── media └── vmstat.png /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.swo 3 | -------------------------------------------------------------------------------- /DRAFTS/1G/LPGs_many_implementation_choices.md: -------------------------------------------------------------------------------- 1 | ## LPG's implementation choices 2 | 3 | As a metamodel, LPG (labeled property graph) has too many options on where to say things. 4 | That means to use an LPG representation you must make implementation choices that aren't domain modeling choices. 5 | 6 | The implementaion choices I am refering to are the choices about where to put data. 7 | I call these "implementation" choices because a storage location is often an implementation choice. 8 | 9 | In RDF you don't have those implementation choices to make. 10 | In RDF, there is a single way to say things: as a triple. 11 | 12 | For example, let's say we want to talk about [this paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/) and its sections. 13 | I'll use the LPG representation found in [this project](https://github.com/covidgraph/data_cord19). 
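For contrast, in RDF each of those facts would be stated the same single way: as a triple. A minimal Turtle sketch (the prefixes and IRIs here are mine, illustrative only, not taken from the project):

```
:paper1 :type        :Paper ;
        :title       "Clinical features of culture-proven Mycoplasma pneumoniae infections..." ;
        :publishedIn :BMCInfectDis .
```

Now for the LPG version.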
14 | 15 | 16 | The following Cypher query (against a Neo4j instance): 17 | ``` 18 | match (s0)-[p0]->(s {id:"c6c001198a1ca7136a0c476b7723d9bf"})-[p:BODYTEXTCOLLECTION_HAS_BODYTEXT]->(o) 19 | return s0,labels(s0),p0,type(p0),s,labels(s),p,type(p),o,labels(o) order by p.position 20 | ``` 21 | 22 | yields: 23 | ``` 24 | ╒══════════════════════════════════════════════════════════════════════╤════════════╤════╤══════════════════════════════╤═════════════════════════════════════════╤══════════════════════╤═══════════════╤═════════════════════════════════╤══════════════════════════════════════════════════════════════════════╤════════════╕ 25 | │"s0" │"labels(s0)"│"p0"│"type(p0)" │"s" │"labels(s)" │"p" │"type(p)" │"o" │"labels(o)" │ 26 | ╞══════════════════════════════════════════════════════════════════════╪════════════╪════╪══════════════════════════════╪═════════════════════════════════════════╪══════════════════════╪═══════════════╪═════════════════════════════════╪══════════════════════════════════════════════════════════════════════╪════════════╡ 27 | │{"cord_uid":"ug7v899j","cord19-fulltext_hash":"d1aafb70c066a2068b02786│["Paper"] │{} │"PAPER_HAS_BODYTEXTCOLLECTION"│{"id":"c6c001198a1ca7136a0c476b7723d9bf"}│["BodyTextCollection"]│{"position":0} │"BODYTEXTCOLLECTION_HAS_BODYTEXT"│{"section":"Introduction","_hash_id":"d32453131e57e46e02884893b9c039ae│["BodyText"]│ 28 | │f8929fd9c900897fb","journal":"BMC Infect Dis","publish_time":"2001-07-│ │ │ │ │ │ │ │","text":"Mycoplasma pneumoniae is a common cause of upper and lower r│ │ 29 | │04","source":"PMC","title":"Clinical features of culture-proven Mycopl│ │ │ │ │ │ │ │espiratory tract infections. It remains one of the most frequent cause│ │ 30 | │asma pneumoniae infections at King Abdulaziz University Hospital, Jedd│ │ │ │ │ │ │ │s of atypical pneumonia particu-larly among young adults. │ │ 31 | │ah, Saudi Arabia","_hash_id":"84d7ffe49e6bde194fc995223bac848b","url":│ │ │ │ │ │ │ │..." } │ │ 32 | │"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/"} │ │ │ │ │ │ │ │ │ │ 33 | │ │ │ │ │ │ │ │ │ │ │ 34 | ├──────────────────────────────────────────────────────────────────────┼────────────┼────┼──────────────────────────────┼─────────────────────────────────────────┼──────────────────────┼───────────────┼─────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼────────────┤ 35 | │{"cord_uid":"ug7v899j","cord19-fulltext_hash":"d1aafb70c066a2068b02786│["Paper"] │{} │"PAPER_HAS_BODYTEXTCOLLECTION"│{"id":"c6c001198a1ca7136a0c476b7723d9bf"}│["BodyTextCollection"]│{"position":1} │"BODYTEXTCOLLECTION_HAS_BODYTEXT"│{"section":"Institution and patient population","_hash_id":"bfbe0ce7b5│["BodyText"]│ 36 | │f8929fd9c900897fb","journal":"BMC Infect Dis","publish_time":"2001-07-│ │ │ │ │ │ │ │2ce5eafe590b0697cf7fb4","text":"KAUH is a tertiary care teaching hospi│ │ 37 | │04","source":"PMC","title":"Clinical features of culture-proven Mycopl│ │ │ │ │ │ │ │tal with a bed capacity of 265 beds and annual admissions of 18000 to │ │ 38 | │asma pneumoniae infections at King Abdulaziz University Hospital, Jedd│ │ │ │ │ │ │ │19000 patients. Patients with M. 
pneumoniae positive cultures from res│ │ 39 | │ah, Saudi Arabia","_hash_id":"84d7ffe49e6bde194fc995223bac848b","url":│ │ │ │ │ │ │ │piratory specimens were identified over a 24-months" period from Janua│ │ 40 | │"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35282/"} │ │ │ │ │ │ │ │ry, 1997 through December, 1998 for this review."} │ │ 41 | ├──────────────────────────────────────────────────────────────────────┼────────────┼────┼──────────────────────────────┼─────────────────────────────────────────┼──────────────────────┼───────────────┼─────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼────────────┤ 42 | ... 43 | ``` 44 | 45 | So the results show us that (in the metamodel language (LPG)): 46 | - The paper (s0) 47 | - has some properties (key/value pairs) including the paper's title, the journal name it was published in, etc. 48 | - has the label "Paper" 49 | - has a relationship (p0) to some body text collection (s) 50 | - the relationship (p0) is of type "PAPER_HAS_BODYTEXTCOLLECTION" 51 | - The body text collection (s) 52 | - has a property: an identifier 53 | - has the label "BodyTextCollection" 54 | - has a relationship (p) to some body text (o) 55 | - the relationship (p) 56 | - is of type "BODYTEXTCOLLECTION_HAS_BODYTEXT" 57 | - has a property: a position (in the paper) 58 | - The body text (o) 59 | - has some properties including the section name, and the actual text 60 | 61 | 62 | TODO is it more clear to talk about choices or options? or a choice with options? 63 | 64 | 65 | With LPG you can put data in: 66 | - a node property (any key, any value) 67 | - a node label (single key "label", any value) 68 | - a relationship property (any key, any value) 69 | - a relationship type (single key "type", any value) 70 | 71 | That is 4 options and I think that is 3 options too many. 72 | 73 | If you want your data to participate in the "extended graph" (which includes any other graph you might care about later) then you want most of your choices to be data modeling choices not implementation choices. 74 | As your graph participates in the "extended graph" you don't want to your data modeling choice (saying the paper is of type "Paper") to be undermined by the fact that you chose to implement that assertion with a node label while somewhere else in the extended graph a similar assertion was implemented with a node property even though the data modeling agreed on the type "Paper." 75 | 76 | 77 | When you make the same data modeling choice but a difference implementation choice you lose query uniformity. 78 | Let's hold the data modeling choice at: "label" means "type of" and "Paper" is the kind of thing we are talking about and step through the implementation options' corresponding Cypher query: 79 | ``` 80 | match (s:Paper)-[p]-(o) return s,p,o 81 | match (s {label: "Paper"})-[p]-(o) return s,p,o 82 | match (s)-[p:label]-(o:Paper) return s,p,o 83 | match (s)-[p:label]-(o {label: "Paper"}) return s,p,o 84 | 85 | ``` 86 | 87 | 88 | 89 | If want your graph to align with any other graph they must agree in the options they selected for data modeling choices and in the options they chose for implementation choices. 90 | The data modeling choices are hard enough so why complicate things by adding implementation choices? 91 | You can prevent implementation choices from impairing data integration by not making any (implementation choices) by using RDF. 92 | You can't do that with LPG; you must make implementation choices. 
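As a sketch of why that matters for integration (IRIs illustrative, not from any real dataset): two RDF graphs that agree on the data modeling can be integrated by simple union, because neither had an implementation choice to make.

```
# graph A (one source)
:paper1 :type :Paper .

# graph B (an independent source that made the same modeling choice)
:paper2 :type :Paper .

# the merged graph answers { ?s :type :Paper } for both --
# there are no label-vs-property mismatches to reconcile
```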
93 | 94 | 95 | Now let's look at the implementation choices that were made in [this](https://github.com/covidgraph/data_cord19) project and see what those choices cost. 96 | 97 | 1) Assertion: The paper is of type "Paper" 98 | - Implementation: node label "Paper" 99 | 100 | 2) Assertion: The paper was published in a specific journal 101 | - Implementation: node property key/value pair: "journal", "BMC Infect Dis" 102 | 103 | 3) Assertion: The paper has a collection of body text 104 | - Implementation: relationship type "PAPER_HAS_BODYTEXTCOLLECTION" with node label "BodyTextCollection" 105 | 106 | 4) Assertion: The collection of body text has body text in the 0th (and 1st, and 2nd, etc.) position in the paper 107 | - Implementation: relationship property key/value pair: "position", 0 (1, 2, etc.) 108 | 109 | 5) Assertion: The body text in position 0 has the literal text "Mycoplasma pneumoniae is a common cause..." 110 | - Implementation: node property key/value pair: "text", "Mycoplasma pneumoniae is a common cause..." 111 | 112 | 113 | Because these four different implementation choices were made, the cost is that in order to query the graph you must first discover which choice was made for each assertion. 114 | 115 | 116 | But notice how, in natural language, each assertion has the same structure: subject, predicate, object. 117 | 118 | ``` 119 | --------------------------------------------------------------------------------- 120 | | | subject | predicate | object 121 | |-------------------------------------------------------------------------------- 122 | |1 | the paper | is of type | "Paper" 123 | |2 | the paper | was published in | "BMC Infect Dis" 124 | |3 | the paper | has a collection | the body text collection 125 | |4 | the body text collection | has text in position | 0 126 | |5 | text in position 0 | has literal text | "Mycoplasma pneumoniae is a common cause..." 127 | --------------------------------------------------------------------------------- 128 | ``` 129 | 130 | 131 | If you use a single implementation option (subject, predicate, object), then you only have to make data modeling choices. 132 | Also, with a single implementation option, the graph query writer wouldn't need to discover which implementation choices were made. 133 | 134 | This single implementation option is what RDF does.
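Rendering the table above directly as triples gives something like the following sketch (prefixes and IRIs are mine, illustrative only, not the project's; row 4 gets an intermediate node for the text-at-a-position):

```
:paper1      :type          :Paper .                                       # 1
:paper1      :publishedIn   :BMCInfectDis .                                # 2
:paper1      :hasCollection :collection1 .                                 # 3
:collection1 :hasText       :text0 .                                       # 4
:text0       :position      0 .                                            # 4 (cont.)
:text0       :text          "Mycoplasma pneumoniae is a common cause..." . # 5
```

Every assertion lands in the same storage location: a triple.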
135 | With RDF there is only one query flavor to find things that are of type "Paper": 136 | ``` 137 | select * where {?s :type :Paper} 138 | ``` 139 | 140 | With RDF, if we allow different data modeling options then query content changes but not structure: 141 | ``` 142 | select * where {?s :type :Paper} 143 | select * where {?s :typeOf :Paper} 144 | select * where {?s :type :JournalArticle} 145 | select * where {?s :typeOf :JournalArticle} 146 | ``` 147 | 148 | But with LPG, if we allow different data modeling options then query content changes *and* the structure can change too (because there are different implementation options): 149 | ``` 150 | match (s:Paper)-[p]-(o) return s,p,o 151 | match (s {type: "Paper"})-[p]-(o) return s,p,o 152 | match (s)-[p:type]-(o:Paper) return s,p,o 153 | match (s)-[p:type]-(o {type: "Paper"}) return s,p,o 154 | 155 | match (s:Paper)-[p]-(o) return s,p,o 156 | match (s {typeOf: "Paper"})-[p]-(o) return s,p,o 157 | match (s)-[p:typeOf]-(o:Paper) return s,p,o 158 | match (s)-[p:typeOf]-(o {typeOf: "Paper"}) return s,p,o 159 | 160 | match (s:JournalArticle)-[p]-(o) return s,p,o 161 | match (s {type: "JournalArticle"})-[p]-(o) return s,p,o 162 | match (s)-[p:type]-(o:JournalArticle) return s,p,o 163 | match (s)-[p:type]-(o {type: "JournalArticle"}) return s,p,o 164 | 165 | match (s:JournalArticle)-[p]-(o) return s,p,o 166 | match (s {typeOf: "JournalArticle"})-[p]-(o) return s,p,o 167 | match (s)-[p:typeOf]-(o:JournalArticle) return s,p,o 168 | match (s)-[p:typeOf]-(o {typeOf: "JournalArticle"}) return s,p,o 169 | ``` 170 | -------------------------------------------------------------------------------- /DRAFTS/1G/README.md: -------------------------------------------------------------------------------- 1 | # Graph? Yes! Which one? RDF! 2 | 3 | 4 | AWS Neptune supports LPG and RDF but they are separate sides of Neptune -- that is, LPG and RDF aren't interoperable. 5 | Some key AWS Neptune team members are [thinking](https://www.lassila.org/publications/2021/scg2021-lassila+etal.pdf) about LPG and RDF graph interoperability in what they call "one graph" (1G). 6 | The [idea](https://www.lassila.org/publications/2021/scg2021-lassila+etal-preso.pdf) is that 1G is comprehensive enough to support LPG and RDF metamodels at the same time. 7 | 8 | You can understand why they'd want to do that. 9 | If they can appeal to all the customers that can't easily decide between LPG and RDF, they'd appeal to many more customers. 10 | 11 | Deciding between LPG and RDF is a matter of deciding between today and many tomorrows, respectively. 12 | 13 | 14 | 15 | ## Why Graphs 16 | 17 | When a data scientist is looking at some tabular data she gets a semantic network (a graph) in her head. 18 | ``` 19 | ---------------------------------------------------------------- 20 | | where | who | what 21 | ------------------------------------------------------------------- 22 | | "library" | "Colonel Mustard" | "candlestick" 23 | | "conservatory" | "Mrs. Peacock" | "lead pipe" 24 | ``` 25 | 26 | She doesn't just mentally interact with the string "library." 27 | She mentally interacts with the thing [library](https://www.wikidata.org/wiki/Q29843656). 28 | The thing library is a specific kind of room. 29 | It is used to store books. 30 | It is related to [library](https://www.wikidata.org/wiki/Q7075) the institution. 31 | 32 | So graphs aren't magical. 33 | They are just representations of what happens in our heads when we interact with data.
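Made explicit, that mental graph might look like this Turtle sketch (the predicate names are mine, illustrative only; only the two Wikidata IRIs are real):

```
:clue1 :who   :ColonelMustard ;
       :what  :candlestick ;
       :where wd:Q29843656 .        # the room

wd:Q29843656 :usedFor   :storingBooks ;
             :relatedTo wd:Q7075 .  # the institution
```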
34 | And if you put (explicitly) that cool stuff that happens in our heads inside some computers you can use that cool stuff programmatically. 35 | Clearly we need graphs to do cool stuff. 36 | 37 | 38 | 39 | 40 | ## today's needs (LPG) # vs. many of tomorrow's needs (RDF) (i need this section somewhere ) 41 | 42 | I find myself looking for reasons to solve [the general problem](https://xkcd.com/974/) but not everyone does. 43 | I'm glad we have a mixture of reason seekers. 44 | 45 | > Making a good choice between the two technology stacks is complex and requires a balanced consideration of data modeling aspects, query language features, and their adequacy for current and future use cases. 46 | 47 | When it comes to future use cases, in my experience, all roads lead to (data) integration. 48 | RDF/SPARQL is specifically designed with data integration in mind (global URIs, federated queries, a single representation pattern (the triple), etc.). 49 | 50 | > ... we often see information architects prefer the features of the RDF model because of a good fit with use cases for data alignment, master data management, and data exchange. 51 | 52 | To put it not so carefully, I would say that LPG seems to be specifically designed with the following in mind: 53 | - introducing graphs to software developers (who know json) 54 | - and not strongly encouraging them to model their domain thoughtfully 55 | - supporting path traversal well 56 | 57 | In other words: LPG was designed to support "[point solutions](https://allegrograph.com/why-young-developers-dont-get-knowledge-graphs/)." 58 | 59 | 60 | > Software developers often choose an LPG language because they find it more natural and more "compatible" with their programming paradigm. 61 | 62 | Making a decision based on what your team is comfortable with (LPG) is about today. 63 | 64 | 65 | ### domain modeling 66 | 67 | We have a finite number of choices we can make per day. 68 | 69 | Kurt Cagle [notes](https://www.bbntimes.com/technology/the-pros-and-cons-of-rdf-star-and-sparql-star) "That RDF is not used as much tends to come down to the fact that most developers prefer to model their domain as little as possible." 70 | 71 | Software developers certainly model their logical and physical schema as needed, but modeling of the conceptual schema (the domain) is given minimal attention. 72 | 73 | That developers prefer to model their domain as little as possible causes things like this: 74 | 75 | > Note that the choice of LPG can also happen when RDF is dismissed out of hand because it is viewed as complex and "academic". 76 | 77 | The use of LPG makes it more natural to skip thoughtful domain modeling (which is the "academic" part) . 78 | I say that because with LPG there are many [implementation choices](TODO link to section) to make and domain modeling choices. 79 | Whereas, with RDF there are no implementation choices (there are only domain modeling choices). 80 | Because LPG allows software developers to spend their daily budget of decisions on implementation choices the domain modeling can get the attention scraps. 81 | 82 | 83 | > Regardless of what the reasons, we believe that the (forced) choice of graph models slows the adoption of graphs because it creates confusion and segmentation in the graph database space. 
84 | 85 | I agree with that but I am not sure if the optimal way to un-segment the graph database space is to work hard making a model (the "1G" model) to accommodate a metamodel that makes it more natural to skip thoughtful domain modeling. 86 | After AWS Neptune lands all the indecisive customers, the customers will still eventually have to get onto the important and tricky business of domain modeling so why not just have the customers start now. 87 | 88 | 89 | 90 | 91 | ## Statements About Statements 92 | 93 | 94 | A big part of LPG is the ability to make statements about statements (with relationship properties). 95 | But the ability to make statements about statements encourages you to skip more thoughtful domain modeling. 96 | And it is the thoughtful domain modeling that enables data integration and it allows query writers to explore generalizations such as analogy. 97 | 98 | Let's look at what thoughtful domain modeling looks like. 99 | 100 | An example from the 1G paper is: 101 | ``` 102 | << :Alice :knows :Bob >> :since "2020-01-01"^^xsd:date . 103 | ``` 104 | Which is here represented in RDF-star but which is also easy to represent LPG (for example with a relationship property). 105 | 106 | Maybe you think that statement is easy to read and work with. 107 | It seems to say something like: "Alice has known Bob since 2020." 108 | 109 | But my ontologist colleagues say something like "I probably wouldn't reify that state of affairs like that." 110 | 111 | The RDF representation I'll show is a little more wordy than the rdf-star version but it offers data integration advantages. 112 | The data integration advantages of RDF with thoughtful domain modeling arise from the fact that relationships (like `:knows`) don't have to be merely a single edge type, instead you can decompose "knowing" into the structure of the RDF graph like the following: 113 | 114 | ``` 115 | # alice knows bob and we know how (a conversation) and since when (day granularity) 116 | 117 | :event-045 a :Conversation , :Introduction ; 118 | gist:hasActualStart "2020-01-01"^^xsd:date ; 119 | :hasParticipation [ a :Participation ; 120 | :hasRole :Interlocutor ; 121 | :hasParticipant :Bob ] ; 122 | :hasParticipation [ a :Participation ; 123 | :hasRole :Interlocutor ; 124 | :hasParticipant :Alice ] . 125 | ``` 126 | That is, knowing (as defined here) consists of: 127 | - awareness of another (`:Introduction`) 128 | - some historical event (`has:hasActualStart`) such as a (`:Conversation`) 129 | - acts of participation (`:Participation`) 130 | - with roles (such as `:Interlocutor`) 131 | - and participants 132 | 133 | (see [a.ttl](./a.ttl) for more context) 134 | 135 | 136 | By putting information into the structure of the graph and by using a Pareto vocabulary (ontology) you can do the following: 137 | 138 | A) Represent secondary or tertiary facts about previously stated facts. 139 | 140 | ``` 141 | # Fred witnessed Alice meeting Bob 142 | 143 | :event-045 :hasParticipation [ a :Participation ; 144 | :hasRole :Observer ; 145 | :hasParticipant :Fred ] . 146 | ``` 147 | Also see [a.ttl](./a.ttl) for tertiary fact representation. 148 | 149 | 150 | B) Generalize to other events with the same vocabulary. 151 | 152 | See [a.ttl](./a.ttl) for representations of film productions and rocket launches using the same vocabulary. 153 | You'll notice how representations don't just bottom out in strings. 154 | "Things not strings" is a common refrain in the RDF world. 
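One payoff in query form: a hedged SPARQL sketch (using the same illustrative vocabulary) that finds the observers of *any* kind of event, be it an introduction, a film production, or a presentation:

```
select ?event ?observer where {
  ?event :hasParticipation ?participation .
  ?participation :hasRole :Observer ;
                 :hasParticipant ?observer .
}
```

If each special case had instead been tucked into its own relationship property, each event type would need its own query shape.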
155 | 156 | 157 | 158 | Put another [way](https://www.bbntimes.com/technology/the-pros-and-cons-of-rdf-star-and-sparql-star): "RDF-star [(statements about statements)] should not be used to solve [domain] modeling deficiencies." 159 | Kurt Cagle also notes "do you need RDF-star? From the annotational standpoint, quite possibly, as it provides a means of tracking volatile property and relationship value changes over time." 160 | While you could, I think I might instead use something like a triplestore with immutable state (e.g. [asami](https://github.com/threatgrid/asami)) when I have a volatile named graph. 161 | In general, however, I would keep named graphs read-only as often as possible. 162 | No, nevermind on that. ? 163 | Most data is read-only-flavored anyway because as things move from the present into the past their volatility is over. 164 | 165 | 166 | ## implementation choices 167 | 168 | With LPG you can put representations in: 169 | - a node property (any key, any value) 170 | - a node label (single key "label", any value) 171 | - a relationship property (any key, any value) 172 | - a relationship type (single key "type", any value) 173 | 174 | That is 4 options and I think that is 3 options too many. 175 | See [LPGs_many_implementation_choices](./LPGs_many_implementation_choices.md) for an expansion on that. 176 | 177 | -------------------------------------------------------------------------------- /DRAFTS/1G/a.ttl: -------------------------------------------------------------------------------- 1 | @prefix gist: . 2 | @prefix owl: . 3 | @prefix rdf: . 4 | @prefix rdfs: . 5 | @prefix skos: . 6 | @prefix xml: . 7 | @prefix xsd: . 8 | @prefix : . 9 | 10 | ############################################################### 11 | # alice knows bob and we know how (a conversation) and since when (day granularity) 12 | 13 | :Alice a :Human . 14 | :Bob a :Human . 15 | 16 | :event-045 a :Conversation , :Introduction ; 17 | gist:hasActualStart "2020-01-01"^^xsd:date ; 18 | :hasParticipation [ a :Participation ; 19 | :hasRole :Interlocutor ; 20 | :hasParticipant :Bob ] ; 21 | :hasParticipation [ a :Participation ; 22 | :hasRole :Interlocutor ; 23 | :hasParticipant :Alice ] . 24 | # NOTE :Interlocutor implies the introduction was symmetric, right ? 25 | # 26 | # maybe there could be a rule like: if there is an introduction where 27 | # each act of participation :hasRole something that implies symmetry 28 | # then the whole introduction was symmetric 29 | # -> all participants know each other 30 | ############################################################### 31 | 32 | 33 | ############################################################### 34 | # harry knows sally and/or sally knows harry but we don't know how or since when 35 | # 36 | # i don't think we can know if the knowing is symmetric because we don't know 37 | # the role each played in the participation 38 | 39 | :Harry a :Human . 40 | :Sally a :Human . 41 | 42 | :event-0248 a :Introduction ; 43 | :hasParticipation [ a :Participation ; 44 | :hasParticipant :Harry ] ; 45 | :hasParticipation [ a :Participation ; 46 | :hasParticipant :Sally ] . 47 | ############################################################### 48 | 49 | # :hasParticipant is gist:hasParticipant ? 50 | 51 | 52 | 53 | 54 | # << :event-045 :hasParticipant :Bob >> :objectHasRole :Interlocutuor . 55 | # vs 56 | # 57 | # :event-045 :hasParticipation [ a :Participation ; 58 | # :hasRole :Interlocutuor ; 59 | # :hasParticipant :Bob ] . 
60 | 61 | 62 | :Human rdfs:subClassOf :Agent . 63 | 64 | :Participation a rdfs:Class ; 65 | rdfs:comment "An act of participation " . 66 | 67 | :hasParticipation a owl:ObjectProperty ; # TODO 68 | rdfs:label "has act of participation" . 69 | 70 | :Introduction rdfs:subClassOf gist:Event ; 71 | rdfs:comment "An Event that involves an explicit brininging together of Things ... not previously brought together?" 72 | # ^ TODO something about awareness for agents and causal proximity for non-agents 73 | 74 | :Conversation rdfs:subClassOf gist:Event ; # or activity? 75 | rdfs:comment "An Event that involves communication among Agents." 76 | 77 | 78 | # P710 79 | # :knows is symmetric 80 | # :knowsOf is not 81 | 82 | # TODO look at triples for "object has role" qualifier in WD. 83 | 84 | 85 | 86 | ############################################################### 87 | # Fred witnessed Alice meeting Bob 88 | 89 | :event-045 :hasParticipation [ a :Participation ; 90 | :hasRole :Observer ; 91 | :hasParticipant :Fred ] . 92 | 93 | ############################################################### 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | :event-555 a gist:Event ; 103 | rdfs:comment "SpaceX SN-8 launch" ; 104 | :hasPart :event-0834 . 105 | 106 | :event-0834 a :ChemicalReaction ; 107 | gist:hasActualStart "2020-12-08"^^xsd:date ; 108 | :hasParticipation [ a :Participation ; 109 | :hasRole :Oxidizer ; 110 | :hasParticipant [ a :Oxygen ] ] ; 111 | :hasParticipation [ a :Participation ; 112 | :hasRole :Fuel ; 113 | :hasParticipant [ a :Methane ] ] . 114 | 115 | # :Fuel :closeMatch :Combustable . # ? 116 | 117 | 118 | 119 | 120 | :Casablanca a :Film ; 121 | gist:name "Casablanca" . 122 | 123 | :event-120009 a :Event ; 124 | gist:produces :Casablanca ; 125 | :hasParticipation [ a :Participation ; 126 | :hasRole :Actor ; 127 | :hasParticipant :HumphreyBogart ; 128 | :hasCharacterRole :RickBlaine ] ; 129 | :hasParticipation :Participation-5998 . 130 | 131 | :Participation-5998 a :Participation ; 132 | :hasRole :Director ; 133 | :hasParticipant :MichaelCurtiz ; 134 | 135 | 136 | 137 | ############################################################### 138 | # Michael Curtiz was paid $73,400 for directing Casablanca. 139 | # according to 140 | # https://www.memorabletv.com/features/casablanca-the-complete-budget-breakdown/ 141 | # TODO make a named graph 142 | :Balance-00347 gist:goesToAgent :MichaelCurtiz . 143 | :Balance-00347 a gist:Balance ; 144 | gist:isBasedOn :Participation-5998 ; 145 | TODO:hasMagnitude [ rdf:value 73400 ; 146 | TODO:unit gist:_USDollar ] . 147 | ############################################################### 148 | 149 | 150 | 151 | 152 | # Some computer scientists attend The Mother of All Demos where Douglas 153 | # Englebart is the presentor 154 | :event-111003 a :Presentation ; 155 | rdfs:label "The Mother of All Demos" ; 156 | gist:hasActualStart "1968-12-08"^^xsd:date ; 157 | :hasParticipation [ a :Participation ; 158 | :hasRole :Presentor ; 159 | :hasParticipant :DouglasEnglebart ] ; 160 | :hasParticipation [ a :Participation ; 161 | :hasRole :Observer ; 162 | :hasParticipant :AlanKay ] ; 163 | :hasParticipation [ a :Participation ; 164 | :hasRole :Observer ; 165 | :hasParticipant :AndriesVanDam] ; 166 | :hasParticipation [ a :Participation ; 167 | :hasRole :Observer ; 168 | :hasParticipant :CharlesIrby] ; 169 | :hasParticipation [ a :Participation ; 170 | :hasRole :Observer ; 171 | :hasParticipant :BobSproull] . 
172 | 173 | 174 | # Douglas Englebart and Alan Kay attend the anniversary 175 | :event-221010 a :Presentation ; 176 | rdfs:label "The Mother of All Demos 30th Anniversary" ; 177 | gist:hasActualStart "1998-12-08"^^xsd:date ; 178 | :hasParticipation :Participation-8573344 ; 179 | :hasParticipation [ a :Participation ; 180 | :hasRole :Observer ; 181 | :hasParticipant :AlanKay ] ; 182 | 183 | :Participation-8573344 a :Participation ; 184 | :hasRole :Observer ; 185 | :hasParticipant :DouglasEnglebart . 186 | 187 | # Alan Kay observes that Douglas Englebart is at the anniversary as an observer 188 | :Participation-8573344 :hasParticipation :Participation-5553344 . 189 | :Participation-5553344 a :Participation ; 190 | :hasRole :Observer ; 191 | :hasParticipant :AlanKay ] . 192 | 193 | # Douglas Englebart observes that Alan Kay observes that Douglas Englebart is at the anniversary as an observer 194 | :Participation-5553344 :hasParticipation _:b1 . 195 | _:b1 a :Participation ; 196 | :hasRole :Observer ; 197 | :hasParticipant :DouglasEnglebart . 198 | 199 | # TODO how would we say that: 200 | # Douglas Englebart does not observe that Alan Kay observes that Douglas Englebart is at the anniversary as an observer 201 | # do we need any owl negative fact axioms? 202 | # i think like this... yuck 203 | [ a owl:NegativePropertyAssertion ; 204 | owl:sourceIndividual :Participation-5553344 ; 205 | owl:assertionProperty :hasParticipation ; 206 | owl:targetIndividual _:b1 ] 207 | # https://www.w3.org/2007/OWL/wiki/Quick_Reference_Guide 208 | # 209 | # i think we could say no one witnessed something like: 210 | # :hasRole :Nothing ; 211 | # or maybe we need to close the world with a closure axiom? 212 | -------------------------------------------------------------------------------- /DRAFTS/Lipofuscin/README.md: -------------------------------------------------------------------------------- 1 | # Communication Breakdown > Lack of Data 2 | 3 | ### On the unique name assumption spectrum 4 | 5 | NOTE: I am still working on this. 6 | 7 | 8 | I was reading De Grey's "[Ending Aging](https://archive.org/details/endingagingrejuv00degr)" and I came across this: 9 | 10 | > Researchers tend to get holed up in their narrowly specialized fields of study, and conse¬ quently they, too, rarely compare notes and observe the confluence of ob¬ servations in different fields of science (or even subfields within those fields) ... [facts are] being ob¬ scured by the use of specialist jargon ... 11 | 12 | It stood out to me beacuse it is something I also encounter in my work with linked data / the semantic web / knowledge graphs. 13 | 14 | ![book](media/book.png) 15 | 16 | ## Quick Background 17 | 18 | The context for that quote is that De Grey is describing how [lysosome](https://en.wikipedia.org/wiki/Lysosome) failure accounts for a range of diseases. 19 | 20 | 21 | ## Garbage Catastrophe theory of aging 22 | 23 | Let's just let the book describe it: 24 | 25 | > With his collaborator Alex Terman, Brunk outlined a “garbage catas¬ trophe” theory of aging, in which accumulating lipofuscin inside the lyso¬ some dilutes the organelle’s acidity and supply of enzymes. In this model, lipofuscin also wastes a lot of the enzymes that the cell body produces, by sucking them up without making effective use of them, thereby diverting them away from the other, still-functional lysosomal contents against which they could be put to effective use. 
26 | 27 | 28 | ## Name #1 29 | 30 | "lipofuscin" 31 | 32 | > Lipofuscin is actually not a single, specific compound, but a catch-all term for the mixture of stubborn waste products that refuse to be broken down after they've been sent to the lysosome for degradation—materials so chemically convoluted that the normal complement of lysosomal enzymes just doesn't know how to deal with them. 33 | 34 | 35 | 36 | Since De Grey was giving this theory a try, he thought he should look for lipofuscin references in the scientific literature. 37 | 38 | 39 | > But I wasn't yet convinced that lysosomal failure was truly a significant contributor to aging, because if the theory were right you would expect to find evidence connecting lipofuscin to actual age-related disease, and no such evidence initially turned up when I went looking for it. 40 | 41 | > I quickly learned, however, that **this seeming lack of data was more of a communication breakdown than an information vacuum.** Researchers tend to get holed up in their narrowly specialized fields of study, and consequently they, too, rarely compare notes and observe the confluence of observations in different fields of science (or even subfields within those fields). **I soon found that if I stopped specifically talking about "lipofuscin" and began asking researchers about the importance of lysosomal dysfunction in the diseases that they studied, I was suddenly inundated with evidence that the accumulation of junk that should be processed in the lysosome was at the heart of the matter**—but that this fact was being obscured by the use of specialist jargon in referring to those wastes. 42 | 43 | (emphasis mine) 44 | 45 | 46 | 47 | ## Name #2 48 | 49 | "foam cells" 50 | 51 | > As I quickly learned, researchers had been placing lysosomal failure at the core of the molecular events that underlie the formation of atherosclerotic plaques for years before I began looking into the issue—and they did so without ever mentioning "lipofuscin." 52 | 53 | 54 | 55 | 56 | ## Name #3 57 | 58 | "A2E" (when it pertains to Macular Degeneration) 59 | 60 | > But again, because of the specialist terminology in use (A2E, rather than "lipofuscin"), the role of lysosomal inadequacy has been—and you will pardon the unfortunate pun!—obscured. 61 | 62 | 63 | ## TODO 64 | 65 | NFT (neurofibrillary tangles) and Lewy Bodies are not lipofuscin. (see screenshot) 66 | > people specifically looking for a connection with "lipofuscin" can miss these data... 67 | 68 | ## Wikipedia 69 | 70 | So that's three distinct names... 71 | 72 | I didn't look around for sources that were around in 2008 (when the book was published) but the Wikipedia page today for [Lipofuscin](https://en.wikipedia.org/wiki/Lipofuscin) does mention macular degeneration, A2E, and sclerotic arteries. 73 | 74 | ## Wikidata 75 | 76 | Since De Grey notes the equivalence (or at least the connection) of these terms, I wanted to see if the connections were [machine-readably](https://en.wikipedia.org/wiki/Resource_Description_Framework) specified, since they are human-readably specified in the Wikipedia page on Lipofuscin. So I did a little SPARQL querying on Wikidata.
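The lookups started with queries of this shape against the public endpoint at https://query.wikidata.org/ (where `wd:` is the standard Wikidata entity prefix, predeclared by the service):

```
select ?p ?o where { wd:Q217740 ?p ?o }   # Q217740 is Lipofuscin
```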
77 | 78 | TODO add the triples 79 | 80 | I found wd:Q217740 81 | "label": "Lipofuscin", 82 | schema.org description pigment jaune brun 83 | a subclass of the class of chemical compounds 84 | hasPart carbon 85 | 86 | 87 | and I found wd:Q27139841 88 | "description": "chemical compound", 89 | "text": "A2E" 90 | A2E is an altLabel, so is di-retinoid-pyridinium-ethanolamine 91 | label service calls it: N-retinylidene-N-retinylethanolamine 92 | 93 | 94 | 95 | I did an iterative deepening search looking for a sequence of forward edges connecting Lipofuscin to A2E. 96 | No results up to about 6 edges but then the queries started timing out. 97 | 98 | Then I did an iterative deepening search looking for a sequence of forward edges connecting A2E to Lipofuscin. 99 | No results up to about 6 edges but then the queries started timing out. 100 | 101 | Notice that I didn't do an iterative deepening search looking for a sequence of forward and/or backward edges connecting A2E to Lipofuscin. That would be a good idea since not all edges in Wikidata have their entailed inverse. 102 | e.g. 103 | 104 | this triple is in the Wikidata graph: 105 | 106 | . 107 | 108 | which says "Aubrey De Grey resides in San Francisco." 109 | 110 | but this triple is not in the graph: 111 | 112 | . 113 | 114 | which says "San Francisco has resident Aubrey De Grey." [1] 115 | 116 | 117 | Also I found 118 | "id": "Q38115664", 119 | "label": "Foam cells in atherosclerosis.", 120 | which is an instance of "scholarly article" 121 | but I didn't try to look for connections. 122 | 123 | 124 | 125 | There are people doing experiments and recording results with equipment I don't have access to and with techniques I do not know about. But I do know about graph query langauges and I like to integrate things. If those results were recorded using RDF (making using of existing ontologies and extending them when necessary) then it would allow people like me to integrate results across communication boundaries. 126 | "id": "Q50288904", 127 | "label": "A2E is phagocytosed", 128 | 129 | 130 | 131 | ## What is the next step for me? 132 | 133 | I've already spent a couple hours getting familar with Wikidata's [data model](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model) and vocabulary prefixes and searching for terms related to lipofuscin. 134 | I'd like to figure out where and in what format I could make the machine-readable statements connecting A2E, lipofuscin, NLT, Lewy Bodies (perhaps with a few intermediate nodes) if they don't already exist somewhere... 135 | As it would be nice to contribute to the global KG rather than just the global WWW (actually I think it is fair to say this information is already in the global WWW but only in a human-readable format). 136 | 137 | 138 | 139 | ## Background searching 140 | 141 | Neo's search program, which continues while he is asleep, looks like it might have been a World Wide Web (one-way linked human-readable documents) search. 142 | 143 | ![global search](media/global_search.png) 144 | 145 | But a SPARQL (RDF) search would have been much more interesting. Jumping from service to service, traversing named graphs, finding ontologies, aligning them heursitically, and applying inference... 146 | Maybe it would be nice to make SPARQL endpoints of curated SPARQL endpoints and named graphs? We could even apply SHACL over the graphs and score them according to the validation results. 
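(For reference, the iterative deepening searches described above were chains of forward edges of increasing length; the depth-3 probe from Lipofuscin toward A2E would look like the following, with deeper versions adding one hop at a time:

```
select * where {
  wd:Q217740 ?p1 ?x1 .            # Lipofuscin
  ?x1        ?p2 ?x2 .
  ?x2        ?p3 wd:Q27139841 .   # A2E
}
```
)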
147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | --- 157 | 158 | [1] Which makes me think that it might be a good idea to see if there is an ontology that notes that wdt:P551 is owl:inverseOf wdt:P466 and for all the other cases. The ontology doesn't contain the `wdt:` properties. 159 | -------------------------------------------------------------------------------- /DRAFTS/Lipofuscin/media/book.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/DRAFTS/Lipofuscin/media/book.png -------------------------------------------------------------------------------- /DRAFTS/Lipofuscin/media/global_search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/DRAFTS/Lipofuscin/media/global_search.png -------------------------------------------------------------------------------- /DRAFTS/lisp_sparql_visidata/README.md: -------------------------------------------------------------------------------- 1 | # [lisp](https://en.wikipedia.org/wiki/Common_Lisp), [SPARQL](https://en.wikipedia.org/wiki/SPARQL), and [Visidata](https://www.visidata.org/) 2 | 3 | 4 | I often run SPARQL queries against [triplestores](https://en.wikipedia.org/wiki/Triplestore) at a Common Lisp REPL and I like to get the results into Visidata. 5 | I use [slimv_box]() to run [ABCL]() and I use [a Common Lisp wrapper]() around Apache Jena. 6 | 7 | 8 | TODO 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Justin's Weblog 2 | 3 | ## [Intuitive Graph Visualization](intuitive_graph_viz) 4 | ### Aug 31 2024 5 | 6 | Knowledge Graph visualization can be so much more useful 7 | 8 | ## [An iDE: integrating development environment](iDE) 9 | ### May 17 2024 10 | 11 | Using escape hatches to make the development environment you need 12 | 13 | ## [Design Implications](design_implications) 14 | ### Jan 09 2024 15 | 16 | An ounce of thoughtful design is worth a pound of clever implementation 17 | 18 | ## [Relational Databases as Graphs](relational_as_graph) 19 | ### May 23 2023 20 | 21 | Querying a Relational Database with a Graph Query Language 22 | 23 | ## [Semantic Messages](semantic_messages) 24 | ### Nov 3 2022 25 | 26 | Too much specificity can prevent things from working 27 | 28 | ## [Git Repositories as RDF Graphs](git_repo_as_rdf) 29 | 30 | Transforming git repos into RDF graphs 31 | 32 | ## [Fused Edges, Please Don't](fused_edges) 33 | 34 | What is a Fused Edge and why you shouldn't do it 35 | 36 | ## [How it feels to use APL](using_apl) 37 | 38 | Using APL feels different than using other languages I know 39 | 40 | ## [A Random Permutation](a_random_permutation) 41 | 42 | Generating random permutations with APL 43 | 44 | ## [Scraping Webpages with SPARQL](scraping_with_sparql) 45 | 46 | Using SPARQL to extract data from webpages 47 | 48 | ## [SPARQL Value Functions](SPARQL_value_functions) 49 | 50 | Using Java libraries to do things in your SPARQL queries 51 | 52 | ## [Blend Google Sheet with Wikidata](blend_google_sheet_with_wikidata) 53 | 54 | Write a SPARQL query that blends RDF with non-RDF (in a Google Sheet) 55 | 56 | ## [Dynamic Pagination with SPARQL Anything](dynamic_pagination_with_sparql_anything) 57 | 58 | Write a SPARQL query that 
talks to a REST API (that does not produce RDF) and handles pagination 59 | 60 | ## [Work on Engineered Artifacts](work_on_engineered_artifacts) 61 | 62 | Comparing work on F16s to work on software systems 63 | 64 | ## [Reason Over](reason-over) 65 | 66 | Reasoning over Wikidata using RDFS 67 | 68 | ## [Software in RDF](software_in_rdf) 69 | 70 | Discovering candidate solutions with queries ([Hoogle](https://hoogle.haskell.org/) but for everything) 71 | 72 | ## [SPARQL Gotcha](sparql-gotcha) 73 | 74 | Looking for paths between President Obama and Paul Simon with SPARQL 75 | 76 | ## [JSON-LD](json-ld) 77 | 78 | Super brief JSON-LD orientation 79 | 80 | ## [Add ABCL to Spring Boot](add_abcl_to_springboot) 81 | 82 | Adding a Lisp interpreter to a (Java) Spring Boot application for REPL development 83 | -------------------------------------------------------------------------------- /SPARQL_value_functions/README.md: -------------------------------------------------------------------------------- 1 | # SPARQL Value Functions 2 | 3 | When I read Bob DuCharme's [blog on using custom JavaScript functions](https://www.bobdc.com/blog/arqjavascript/) in SPARQL queries I knew I needed to try it out. 4 | What I really wanted was to be able to call functions from some [npm](https://www.npmjs.com/) libraries. 5 | It turns out I wasn't able to figure out how to do that. 6 | It might not be a simple thing to do because Apache Jena doesn't bundle a Node.js runtime... also, I didn't even look at how Apache Jena (ARQ specifically) evaluates this custom JavaScript. Still, the ability to call vanilla JavaScript in a SPARQL query is nice. 7 | 8 | But I still wanted to call someone else's library functions, so I moved on to another option: SPARQL Value Functions. 9 | At least that is what Apache Jena [calls this custom function pluggability](https://jena.apache.org/documentation/query/writing_functions.html). 10 | 11 | Here is what I wanted to do... 12 | 13 | Take some non-RDF data with messy string date representations and canonicalize them (to xsd:dateTime, since SPARQL respects that format). 14 | 15 | Here is the csv example we'll work with: 16 | 17 | |classification\_one |classification\_two |classification\_three|company\_system\_of\_record|company\_id |company\_name |company\_inception | 18 | |--------------------|--------------------|--------------------|--------------------|------------|--------------------|-------------------------| 19 | |Energy, chemicals and utilities| | |CompanyX\_DeptY\_SystemA|1 |ABT Inc\. |Last day of August 2004 | 20 | |Energy, chemicals and utilities|Chemicals | |CompanyX\_DeptY\_SystemM|2 | | | 21 | |Energy, chemicals and utilities|Electricity \- nuclear| |CompanyX\_DeptY\_SystemZ|3 | |OCT 1989 | 22 | |Energy, chemicals and utilities|Electricity \- renewables| |CompanyX\_DeptY\_SystemZ|4 | |DEC 2011 | 23 | |Energy, chemicals and utilities|Electricity \- renewables|Biomass |CompanyX\_DeptY\_SystemM|5 |Fred's Renewables LLC| | 24 | |Energy, chemicals and utilities|Electricity \- renewables|Hydro |CompanyX\_DeptY\_SystemP|6 |WAT Inc\. |9 January 2000 | 25 | |Energy, chemicals and utilities|thermal | |CompanyX\_DeptY\_SystemQ|7 | |5 Mar 1992 | 26 | |Energy, chemicals and utilities|thermal |coal |CompanyX\_DeptY\_SystemQ|8 | |19 Jun 2020 | 27 | |Financial institutions| | |CompanyX\_DeptY\_SystemM|9 |Pet E\. Cash Co\.
| | 28 | |Financial institutions|Asset / investment management and funds| |CompanyX\_DeptY\_SystemM|10 | | | 29 | 30 | 31 | (I started with a .csv I got [here](https://github.com/kg-construct/rml-questions/discussions/3) and added some columns to it.) 32 | 33 | You'll notice there is some taxonomy information in there as well as some company information. 34 | Also notice how each source system uses different date representations. 35 | I didn't want to just do a regex for each style. 36 | I was looking for an 80/20 rule approach and a general way to add functionality to SPARQL queries. 37 | 38 | 39 | Here are the triples my construct query produced: 40 | ``` 41 | @prefix datething: . 42 | @prefix ex: . 43 | @prefix fx: . 44 | @prefix ns: . 45 | @prefix rdf: . 46 | @prefix skos: . 47 | @prefix xsd: . 48 | @prefix xyz: . 49 | 50 | [ rdf:type ex:Company ; 51 | ex:categorizedBy ex:thermal , ; 52 | ex:hasActualStart "1992-03-05T00:00:00.000Z"^^xsd:dateTime ; 53 | ex:hasID [ rdf:type ex:Identifier ; 54 | ex:identifiedBy "7" ; 55 | ex:inSystem 56 | ] 57 | ] . 58 | 59 | ex:thermal rdf:type skos:Concept ; 60 | skos:broader . 61 | 62 | ex:Hydro rdf:type skos:Concept ; 63 | skos:broader . 64 | 65 | 66 | rdf:type skos:Concept ; 67 | skos:broader . 68 | 69 | [ rdf:type ex:Company ; 70 | ex:categorizedBy ; 71 | ex:hasID [ rdf:type ex:Identifier ; 72 | ex:identifiedBy "9" ; 73 | ex:inSystem 74 | ] ; 75 | ex:name "Pet E. Cash Co." 76 | ] . 77 | 78 | 79 | rdf:type skos:Concept ; 80 | skos:broader . 81 | 82 | ex:Chemicals rdf:type skos:Concept ; 83 | skos:broader . 84 | 85 | [ rdf:type ex:Company ; 86 | ex:categorizedBy , ; 87 | ex:hasID [ rdf:type ex:Identifier ; 88 | ex:identifiedBy "10" ; 89 | ex:inSystem 90 | ] 91 | ] . 92 | 93 | 94 | rdf:type skos:Concept ; 95 | skos:broader . 96 | 97 | 98 | rdf:type skos:Concept . 99 | 100 | ex:Biomass rdf:type skos:Concept ; 101 | skos:broader . 102 | 103 | [ rdf:type ex:Company ; 104 | ex:categorizedBy ex:Hydro , , ; 105 | ex:hasActualStart "2000-01-09T00:00:00.000Z"^^xsd:dateTime ; 106 | ex:hasID [ rdf:type ex:Identifier ; 107 | ex:identifiedBy "6" ; 108 | ex:inSystem 109 | ] ; 110 | ex:name "WAT Inc." 111 | ] . 112 | 113 | [ rdf:type ex:Company ; 114 | ex:categorizedBy , ; 115 | ex:hasActualStart "1989-10-01T00:00:00.000Z"^^xsd:dateTime ; 116 | ex:hasID [ rdf:type ex:Identifier ; 117 | ex:identifiedBy "3" ; 118 | ex:inSystem 119 | ] 120 | ] . 121 | 122 | [ rdf:type ex:Company ; 123 | ex:categorizedBy , ; 124 | ex:hasActualStart "2011-12-01T00:00:00.000Z"^^xsd:dateTime ; 125 | ex:hasID [ rdf:type ex:Identifier ; 126 | ex:identifiedBy "4" ; 127 | ex:inSystem 128 | ] 129 | ] . 130 | 131 | ex:coal rdf:type skos:Concept ; 132 | skos:broader ex:thermal . 133 | 134 | [ rdf:type ex:Company ; 135 | ex:categorizedBy ex:coal , ex:thermal , ; 136 | ex:hasActualStart "2020-06-19T00:00:00.000Z"^^xsd:dateTime ; 137 | ex:hasID [ rdf:type ex:Identifier ; 138 | ex:identifiedBy "8" ; 139 | ex:inSystem 140 | ] 141 | ] . 142 | 143 | [ rdf:type ex:Company ; 144 | ex:categorizedBy ex:Chemicals , ; 145 | ex:hasID [ rdf:type ex:Identifier ; 146 | ex:identifiedBy "2" ; 147 | ex:inSystem 148 | ] 149 | ] . 150 | 151 | [ rdf:type ex:Company ; 152 | ex:categorizedBy ; 153 | ex:hasActualStart "2004-08-31T00:00:00.000Z"^^xsd:dateTime ; 154 | ex:hasID [ rdf:type ex:Identifier ; 155 | ex:identifiedBy "1" ; 156 | ex:inSystem 157 | ] ; 158 | ex:name "ABT Inc." 159 | ] . 
160 | 161 | [ rdf:type ex:Company ; 162 | ex:categorizedBy ex:Biomass , , ; 163 | ex:hasID [ rdf:type ex:Identifier ; 164 | ex:identifiedBy "5" ; 165 | ex:inSystem 166 | ] ; 167 | ex:name "Fred's Renewables LLC" 168 | ] . 169 | 170 | 171 | rdf:type skos:Concept . 172 | 173 | ``` 174 | (NOTE: I wouldn't recommend using blank nodes for the companies like I did here but I am not showing off domain modeling in RDF in this post.) 175 | 176 | Here is the query that produced those triples (in a bash command for ease of replication): 177 | ``` 178 | curl --silent 'http://localhost:3000/sparql.anything' \ 179 | --data-urlencode 'query= 180 | PREFIX xyz: 181 | PREFIX ns: 182 | PREFIX rdf: 183 | PREFIX fx: 184 | prefix skos: 185 | prefix ex: 186 | PREFIX xsd: 187 | PREFIX datething: 188 | construct {?concept_one a skos:Concept . 189 | ?concept_two a skos:Concept . 190 | ?concept_three a skos:Concept . 191 | ?concept_three skos:broader ?concept_two . 192 | ?concept_two skos:broader ?concept_one . 193 | ?company a ex:Company . 194 | ?company ex:hasActualStart ?when . 195 | ?company ex:name ?company_name . 196 | ?company ex:hasID ?company_identifier . 197 | ?company ex:categorizedBy ?concept_one . 198 | ?company ex:categorizedBy ?concept_two . 199 | ?company ex:categorizedBy ?concept_three . 200 | ?company_identifier a ex:Identifier . 201 | ?company_identifier ex:identifiedBy ?company_id . 202 | ?company_identifier ex:inSystem ?source_system . 203 | } 204 | WHERE { 205 | service { 206 | fx:properties fx:csv.null-string "" . 207 | ?root a ns:root ; 208 | ?slotp ?row . 209 | optional {?row xyz:classification_one ?one_string 210 | bind(uri(concat(str(ex:),encode_for_uri(?one_string))) as ?concept_one)} 211 | optional {?row xyz:classification_two ?two_string 212 | bind(uri(concat(str(ex:),encode_for_uri(?two_string))) as ?concept_two)} 213 | optional {?row xyz:classification_three ?three_string 214 | bind(uri(concat(str(ex:),encode_for_uri(?three_string))) as ?concept_three)} 215 | { 216 | ?row xyz:company_id ?company_id . 217 | ?row xyz:company_system_of_record ?source_system_string . 218 | bind(uri(concat(str(ex:),"system/",encode_for_uri(?source_system_string))) as ?source_system) 219 | optional{ ?row xyz:company_name ?company_name } 220 | bind(bnode() as ?company) 221 | bind(bnode() as ?company_identifier) 222 | } 223 | optional { ?row xyz:company_inception ?when_string 224 | bind(strdt(datething:parse(?when_string),xsd:dateTime) as ?when) 225 | } 226 | } 227 | }' 228 | 229 | ``` 230 | 231 | 232 | 233 | These 2 lines do the custom function invocation. 234 | 235 | ``` 236 | PREFIX datething: 237 | bind(strdt(datething:parse(?when_string),xsd:dateTime) as ?when) 238 | ``` 239 | 240 | The function `datething:parse` takes a variable (bound to a string) as an argument and returns an xsd:dateTime string representation then we cast it to an `xsd:dateTime` with `strdt`. 241 | In order to allow that to happen you have to put a .jar on Apache Jena Fuseki's classpath. 242 | I am running SPARQL Anything (which is Fuseki but with some added functionality to allow it to treat non-RDF data as RDF). 243 | 244 | Here are the steps to invoke this messy string date parsing function: 245 | 246 | 1) Build the .jar file for the `datething:parse` functionality by following the instructions [here](https://github.com/justin2004/datething) under the "how" section. 247 | 248 | 249 | 2) `git clone` [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything). 
250 | 251 | 3) Copy the .jar you produced in (1) to directory you created in (2). 252 | 253 | 4) Run SPARQL Anything in a docker container (listening on port 3000) by following [these](https://github.com/SPARQL-Anything/sparql.anything/blob/v0.5-DEV/BROWSER.md) instructions. (At this point Fuseki's classpath should have that .jar file on it.) 254 | 255 | 5) Download `aa.csv` (the .csv example) from this git repo and copy it into the SPARQL Anything directory. 256 | 257 | 6) Run the bash command above (with the SPARQL query) and you should have your constructed triples! 258 | 259 | 260 | So with this `datething:parse` functionality you can triplify, say, a spreadsheet that has a variety of date representations in it and likely get most of the dates canonicalized so you can sort, filter, etc. on them in a SPARQL query. 261 | ([Here](https://github.com/justin2004/weblog/tree/master/blend_google_sheet_with_wikidata) is my post on using SPARQL Anything to extract triples from a public google sheet.) 262 | 263 | This .jar should also work with [CSV2RDF](https://github.com/AtomGraph/CSV2RDF) as it also uses Apache Jena and the SPARQL construct approach to triplification. 264 | 265 | Happy triplifying! 266 | -------------------------------------------------------------------------------- /SPARQL_value_functions/aa.csv: -------------------------------------------------------------------------------- 1 | classification_one,classification_two,classification_three,company_system_of_record,company_id,company_name,company_inception 2 | "Energy, chemicals and utilities",,,CompanyX_DeptY_SystemA,1,ABT Inc.,Last day of August 2004 3 | "Energy, chemicals and utilities",Chemicals,,CompanyX_DeptY_SystemM,2,, 4 | "Energy, chemicals and utilities",Electricity - nuclear,,CompanyX_DeptY_SystemZ,3,,OCT 1989 5 | "Energy, chemicals and utilities",Electricity - renewables,,CompanyX_DeptY_SystemZ,4,,DEC 2011 6 | "Energy, chemicals and utilities",Electricity - renewables,Biomass,CompanyX_DeptY_SystemM,5,Fred's Renewables LLC, 7 | "Energy, chemicals and utilities",Electricity - renewables,Hydro,CompanyX_DeptY_SystemP,6,WAT Inc.,9 January 2000 8 | "Energy, chemicals and utilities",thermal,,CompanyX_DeptY_SystemQ,7,,5 Mar 1992 9 | "Energy, chemicals and utilities",thermal,coal,CompanyX_DeptY_SystemQ,8,,19 Jun 2020 10 | Financial institutions,,,CompanyX_DeptY_SystemM,9,Pet E. Cash Co., 11 | Financial institutions,Asset / investment management and funds,,CompanyX_DeptY_SystemM,10,, 12 | -------------------------------------------------------------------------------- /a_random_permutation/README.md: -------------------------------------------------------------------------------- 1 | # A Random Permutation 2 | 3 | The other day I needed to pick a random order for my team to present our work in. 4 | Usually one of us just makes up an order but this time I decided to do it programmatically. 5 | 6 | With my team on the call, I fired up an [APL REPL](https://tryapl.org/) and entered an expression like: 7 | 8 | ```apl 9 | 'alice' 'bob' 'yazeed' 'zach'[4?4] 10 | ┌─────┬────┬──────┬───┐ 11 | │alice│zach│yazeed│bob│ 12 | └─────┴────┴──────┴───┘ 13 | ``` 14 | 15 | So with 5 characters, `[4?4]`, I had the business end of an expression to select a random permutation. 16 | But in that expression I have to manually count the number of people on my team (4) and type it two times. 17 | Let's see if we can eliminate that so I can add people and have the expression handle it. 
18 | 19 | ```apl 20 | {⍵[?⍨≢⍵]} 'alice' 'bob' 'yazeed' 'zach' 21 | ┌────┬──────┬─────┬───┐ 22 | │zach│yazeed│alice│bob│ 23 | └────┴──────┴─────┴───┘ 24 | ``` 25 | 26 | So with 9 characters I had a function to handle a group of any size. 27 | 28 | If some NASA Apollo program contributors join my team I can handle them: 29 | ```apl 30 | {⍵[?⍨≢⍵]} 'alice' 'bob' 'yazeed' 'zach' 'margaret' 'fred' 'jim' 'jack' 31 | ┌────┬──────┬───┬───┬────┬────┬─────┬────────┐ 32 | │zach│yazeed│bob│jim│jack│fred│alice│margaret│ 33 | └────┴──────┴───┴───┴────┴────┴─────┴────────┘ 34 | ``` 35 | 36 | But that expression explicitly references the parameter (⍵) two times. 37 | There is another way to do this that does not involve explicitly referencing the parameters: [tacit style](https://en.wikipedia.org/wiki/Tacit_programming) 38 | 39 | ```apl 40 | (⊂⍤?⍨∘≢⌷⊢) 'alice' 'bob' 'yazeed' 'zach' 41 | ┌──────┬───┬────┬─────┐ 42 | │yazeed│bob│zach│alice│ 43 | └──────┴───┴────┴─────┘ 44 | ``` 45 | 46 | Notice that that expression does not reference its parameters explicitly (⍵ and ⍺ do not occur in the expression). 47 | It is only 8 characters (if you don't count the parens which are only there because I put the argument (the vector of character vectors) next to it.) 48 | 49 | I don't expect that many other languages can express this in 8 characters or less (especially without a library)! 50 | 51 | I've only been using APL recreationally ([contributing to April](https://github.com/phantomics/april)) for about a year but I was able to arrive at the first two expressions quickly. 52 | The tacit style took me at least 10 minutes and I had help from [The APLcart](https://aplcart.info/). 53 | (I didn't make my team watch me try to write this tacit style expression; I did it after work.) 54 | 55 | I knew I wanted a permutation of a vector of character vectors so I searched "permutation" in APLcart and I found the following (which is the basis of the tacit expression I wrote): 56 | ``` 57 | Iv⌷⍨∘⊂⍨Y Permute: Reorder major cells of Y according tot permutation vector Iv 58 | ``` 59 | 60 | APLcart tells you that Iv is an integer vector and Y is any array. 61 | I won't describe the 10 minutes in detail but I'll list the things I needed to remember/review/discover in addition to the language primitives: 62 | 63 | 64 | 65 | [Function Trains](https://help.dyalog.com/18.2/index.htm#Language/Introduction/Trains.htm?Highlight=train). 66 | Specifically the monadic fork. 67 | 68 | Dyalog APL's expression tree. 69 | I actually don't know what they call it but if you type a tacit expression in the REPL it prints a cute tree representation depicting how the functions and operators get glued together to form a derived function. 70 | 71 | ``` 72 | ⌷⍨∘⊂⍨ 73 | ⍨ 74 | ┌─┘ 75 | ∘ 76 | ┌┴┐ 77 | ⍨ ⊂ 78 | ┌─┘ 79 | ⌷ 80 | ``` 81 | 82 | If APL looks interesting to you I can recommend the [Dyalog APL Tutor](https://tutorial.dyalog.com/) to get more comfortable with the language. 83 | If you want to see a famous use of APL you can watch [Conway's Game Of Life in APL](https://www.youtube.com/watch?v=a9xAKttWgP4). 84 | 85 | Ok, have fun making random orders to present your team's work in. 
:) 86 | -------------------------------------------------------------------------------- /add_abcl_to_springboot/README.md: -------------------------------------------------------------------------------- 1 | ## Adding a Lisp Interpreter To a Spring Boot Application 2 | 3 | ABCL ([Armed Bear Common Lisp](https://common-lisp.net/project/armedbear/)) is an implementation of Common Lisp that runs on the JVM. I really like the [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop) style development so I wanted to bring that to the Java application (using Spring Boot) I am working on. 4 | 5 | This is nice because it allows one to explore a Java library without waiting on compilation and JVM startup. 6 | 7 | 8 | ### The first problem I encountered: the classloader 9 | 10 | "The Spring Boot Maven and Gradle plugins both package our application as executable JARs – such a file can't be used in another project since class files are put into BOOT-INF/classes. This is not a bug, but a feature." [0](https://www.baeldung.com/spring-boot-dependency) 11 | 12 | 13 | Once you are at the ABCL REPL you can look up classes like: 14 | ``` 15 | CL-USER> (jss:japropos "String") 16 | ... 17 | sun.swing.StringUIClientPropertyKey: Java Class 18 | sun.text.normalizer.ReplaceableString: Java Class 19 | ... 20 | ``` 21 | 22 | 23 | You can get a Java class designator like: 24 | ``` 25 | CL-USER> (jclass "java.lang.String") 26 | # 27 | ``` 28 | 29 | Or you can just instantiate a new object: 30 | ``` 31 | CL-USER> (jss:new "java.lang.String" 32 | "hello there") 33 | # 34 | ``` 35 | 36 | 37 | But if you've made your jar file using the Spring Boot Maven plugin you'll see something like: 38 | ``` 39 | CL-USER> (jss:japropos "Application") 40 | BOOT-INF.classes.com.khoubyari.example.Application: Java Class 41 | ``` 42 | 43 | Which you can not get a designator for using the default classloader: 44 | ``` 45 | CL-USER> (jclass "BOOT-INF.classes.com.khoubyari.example.Application") 46 | ; Evaluation aborted on NIL 47 | ``` 48 | 49 | ``` 50 | (jclass "com.khoubyari.example.Application") 51 | Class not found: com.khoubyari.example.Application 52 | [Condition of type ERROR] 53 | ``` 54 | 55 | 56 | The default classloader won't load them. 57 | I briefly tried to instantiate and use an "org.springframework.boot.loader.LaunchedURLClassLoader" (like Spring does) but I didn't get it to work (although in principle it should be possible, I think). 58 | 59 | The way I got around this problem was to make a pom.xml with all the dependencies I want to be able to use from Lisp and from Java and then I put that resultant jar file on the classpath. 
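For reference, here is a rough sketch of what that shared-dependencies `pom.xml` could look like. This is a minimal sketch, not the exact file I used: the `groupId` and the Gson dependency are hypothetical placeholders, but `maven-assembly-plugin` with the `jar-with-dependencies` descriptor (run via `mvn package assembly:single`) is the standard way to produce a single fat jar like the `shared-things-1.0.0-jar-with-dependencies.jar` referenced below:

```xml
<!-- a minimal sketch; the groupId and the example dependency are hypothetical -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>shared-things</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <!-- every library you want callable from both Java and Lisp goes here -->
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.8.9</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <!-- produces shared-things-1.0.0-jar-with-dependencies.jar -->
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```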
60 | 
61 | The key is to not start the application using the `-jar` flag but instead to put all the jars on the classpath and then use flags to tell Spring Boot what the main class is:
62 | ```
63 | CLASSPATH="target/spring-boot-rest-example-0.5.0.jar"
64 | CLASSPATH="$CLASSPATH:/root/abcl-bin-1.6.0/abcl.jar"
65 | CLASSPATH="$CLASSPATH:/mnt/shared-dependencies/target/shared-things-1.0.0-jar-with-dependencies.jar"
66 | 
67 | java -cp "$CLASSPATH" \
68 | -Dspring.profiles.active=test \
69 | -Dloader.main=com.khoubyari.example.Application \
70 | org.springframework.boot.loader.PropertiesLauncher
71 | ```
72 | 
73 | To demonstrate this I started with an existing Spring Boot application and on my [fork](https://github.com/justin2004/spring-boot-rest-example) (see the last few commits) I added a Lisp interpreter and loaded swank so I could explore and make changes to the live application from vim.
74 | 
75 | 
76 | If you have [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/) installed you can:
77 | 
78 | 
79 | First, clone my [fork](https://github.com/justin2004/spring-boot-rest-example)
80 | 
81 | 
82 | Next, run `docker-compose up`
83 | 
84 | Maven will build the jars, then the JVM will be started:
85 | 
86 | ![jvm_starting](./media/jvm_starting.png)
87 | 
88 | In the main method a Lisp interpreter is created and it will read in a .lisp source file and start swank (the server side of [slime](https://common-lisp.net/project/slime/)):
89 | 
90 | ![swank_started](./media/swank_started.png)
91 | 
92 | 
93 | Finally, use a slime client (here I use [slimv](https://github.com/kovisoft/slimv) also running in a docker [container](https://github.com/justin2004/slimv_box)) to connect to the swank server running on the JVM in the docker container. (I won't describe how to connect a slime client here as I am using the abcl branch of my slimv_box repository which still requires some manual steps when building the docker image.)
94 | 
95 | Now you can type Lisp expressions in vim and they will be evaluated by the ABCL interpreter running in the remote JVM:
96 | 
97 | ![slimv](./media/now_to_slimv.png)
98 | 
99 | Above I got the reference to an ArrayList (a static field in the Interloper class), checked its size, added a string to it, then checked its size again.
100 | 
101 | ![back_to_jvm](./media/back_to_stdout_on_jvm.png)
102 | 
103 | The Java code was able to see the change made by the Lisp interpreter.
104 | 
105 | 
106 | 
107 | ### The next few problems I encountered include:
108 | - The need to call methods on "java.lang.reflect.Field"
109 | - Inner classes
110 | 
111 | I'll write about those in another installment.
112 | Also I am keeping helper functions [here](https://github.com/justin2004/abcl_repl_helpers) as I write them.
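For reference, embedding the interpreter itself takes only a few lines of ABCL's Java API. This is a minimal sketch, not the exact code from my fork, and the name of the swank-starting .lisp file is hypothetical:

```java
import org.armedbear.lisp.Interpreter;

public class LispEnabledApplication {
    public static void main(String[] args) {
        // create the embedded ABCL interpreter
        Interpreter interpreter = Interpreter.createInstance();
        // load a Lisp source file that loads and starts swank
        // ("start-swank.lisp" is a hypothetical file name)
        interpreter.eval("(load \"start-swank.lisp\")");
        // ... then proceed with the normal Spring Boot startup
    }
}
```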
113 | -------------------------------------------------------------------------------- /add_abcl_to_springboot/media/back_to_stdout_on_jvm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/add_abcl_to_springboot/media/back_to_stdout_on_jvm.png -------------------------------------------------------------------------------- /add_abcl_to_springboot/media/jvm_starting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/add_abcl_to_springboot/media/jvm_starting.png -------------------------------------------------------------------------------- /add_abcl_to_springboot/media/now_to_slimv.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/add_abcl_to_springboot/media/now_to_slimv.png -------------------------------------------------------------------------------- /add_abcl_to_springboot/media/swank_started.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/add_abcl_to_springboot/media/swank_started.png -------------------------------------------------------------------------------- /blend_google_sheet_with_wikidata/README.md: -------------------------------------------------------------------------------- 1 | # Blending a Google Sheet with Wikidata 2 | 3 | 4 | There is [interest](https://phabricator.wikimedia.org/T181319) in the Wikidata community in accessing external tabular data in a [SPARQL](https://en.wikipedia.org/wiki/SPARQL) query. While that development looks like it is ongoing you can already access tabular data in a SPARQL query with [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything). 5 | 6 | 7 | Here is the Google Sheet we'll use in this example: 8 | 9 | 10 | |item\_name |item\_Q |item\_note | 11 | |-------------|----------|--------------------------------| 12 | |hair brush |Q1642980 |can be used to remover tangles | 13 | |tooth brush | |soft bristles are the best | 14 | |tweezers |Q192504 |some wee hands | 15 | |diode |Q11656 | | 16 | 17 | 18 | Which lives here: 19 | 20 | `https://docs.google.com/spreadsheets/d/1ZE5SGutY1-_O4OFj-W9YPcXObhAfUaCX73SKpSlTRLk/` 21 | 22 | 23 | First you have to run SPARQL Anything. There are instruction in the project's README and [here](https://github.com/SPARQL-Anything/sparql.anything/blob/v0.3-DEV/BROWSER.md) for a docker deployment (which is how I run it for this example). 24 | 25 | 26 | The query (wrapped in a curl call so you can run it in a bash shell) is below. 27 | It uses the Wikidata Q number in the sheet to enrich the rows in the sheet with information about the type of thing each item is and the use of each item. 28 | 29 | The query: 30 | 31 | ``` 32 | curl --silent 'http://localhost:3000/sparql.anything' \ 33 | --header "Accept: text/csv" \ 34 | --data-urlencode 'query= 35 | PREFIX xyz: 36 | PREFIX rdf: 37 | PREFIX fx: 38 | PREFIX wd: # Wikibase entity - item or property. 39 | PREFIX wdt: # Truthy assertions about the data, links entity to value directly. 
40 | PREFIX wikibase: 41 | PREFIX bd: 42 | PREFIX rdfs: 43 | SELECT ?item_name ?item_q ?item_note ?item_use_label ?item_superclass ?item_superclass_label 44 | WHERE { 45 | SERVICE { 46 | fx:properties fx:csv.null-string "" . 47 | ?item xyz:item_name ?item_name . 48 | optional{?item xyz:item_Q ?item_q } . 49 | optional{?item xyz:item_note ?item_note } . 50 | bind(if(bound(?item_q),iri(concat(str(wd:),?item_q)),"this will not match anything in the subject position") as ?item_q_uri) . 51 | } 52 | optional { 53 | service { 54 | ?item_q_uri wdt:P279 ?item_superclass . 55 | optional { 56 | ?item_superclass rdfs:label ?item_superclass_label . 57 | filter(lang(?item_superclass_label) = "en") 58 | } 59 | optional { 60 | ?item_q_uri wdt:P366 ?item_use . 61 | ?item_use rdfs:label ?item_use_label . 62 | filter(lang(?item_use_label) = "en") 63 | } 64 | } 65 | } 66 | }' 67 | ``` 68 | 69 | 70 | The output: 71 | 72 | 73 | 74 | 75 | |item\_name |item\_q |item\_note |item\_use\_label |item\_superclass |item\_superclass\_label| 76 | |-------------|----------|--------------------|--------------------|--------------------|-----------------------| 77 | |diode |Q11656 | |electrical resistance|http://www\.wikidata\.org/entity/Q11653|electronic component | 78 | |hair brush |Q1642980 |can be used to remover tangles|hair care |http://www\.wikidata\.org/entity/Q10528974|personal hygiene item | 79 | |hair brush |Q1642980 |can be used to remover tangles|hairdressing |http://www\.wikidata\.org/entity/Q10528974|personal hygiene item | 80 | |hair brush |Q1642980 |can be used to remover tangles|hair care |http://www\.wikidata\.org/entity/Q5639584|hairstyling tool | 81 | |hair brush |Q1642980 |can be used to remover tangles|hairdressing |http://www\.wikidata\.org/entity/Q5639584|hairstyling tool | 82 | |hair brush |Q1642980 |can be used to remover tangles|hair care |http://www\.wikidata\.org/entity/Q614467|brush | 83 | |hair brush |Q1642980 |can be used to remover tangles|hairdressing |http://www\.wikidata\.org/entity/Q614467|brush | 84 | |tweezers |Q192504 |some wee hands |motion |http://www\.wikidata\.org/entity/Q1378235|forceps | 85 | |tweezers |Q192504 |some wee hands |motion |http://www\.wikidata\.org/entity/Q1074814|surgical instrument | 86 | |tweezers |Q192504 |some wee hands |motion |http://www\.wikidata\.org/entity/Q834028|laboratory equipment | 87 | |tweezers |Q192504 |some wee hands |motion |http://www\.wikidata\.org/entity/Q2578402|hand tool | 88 | |tooth brush | |soft bristles are the best| | | | 89 | 90 | 91 | 92 | SPARQL Anything supports more than just blending of tabular data (see its README). 93 | While you could certainly do something like this example with your favorite programming language I think it is quite convenient to have the ability to access non-graph sources of data in a SPARQL query. 
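If you want to sanity-check the Wikidata half of that query on its own (no SPARQL Anything involved), you can send the inner pattern straight to the Wikidata endpoint. A minimal sketch, using the hair brush item (Q1642980) from the sheet above:

```bash
curl --silent 'https://query.wikidata.org/sparql' \
  --header "Accept: text/csv" \
  --data-urlencode 'query=
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item_superclass ?item_superclass_label
WHERE {
  wd:Q1642980 wdt:P279 ?item_superclass .
  ?item_superclass rdfs:label ?item_superclass_label .
  filter(lang(?item_superclass_label) = "en")
}'
```

If that returns the brush and hairstyling-tool superclasses by itself, any remaining trouble is on the SPARQL Anything side of the federation.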
94 | 
--------------------------------------------------------------------------------
/building-open-source-projects/media/calmly.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/building-open-source-projects/media/calmly.webp
--------------------------------------------------------------------------------
/building-open-source-projects/media/simply.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/building-open-source-projects/media/simply.png
--------------------------------------------------------------------------------
/cepl/README.md:
--------------------------------------------------------------------------------
1 | ## CEPL
2 | 
3 | CEPL is a very cool project by baggers.
4 | 
5 | https://github.com/cbaggers/cepl
6 | 
7 | But I was having trouble getting the examples to run.
8 | 
9 | https://github.com/cbaggers/cepl.examples
10 | 
11 | 
12 | I'm not sure if I just don't understand packages enough but I had to edit an example, adding some package qualifications, to get it to work.
13 | So I thought I'd share what I used to get CEPL running in case someone else wants to try CEPL but is having trouble.
14 | 
15 | Baggers uses emacs so if you are an emacs user maybe it will be clear to you how to use CEPL by watching him use it:
16 | 
17 | https://www.youtube.com/channel/UCMV8p6Lb-bd6UZtTc_QD4zA
18 | 
19 | 
20 | 
21 | I'd say the following is mostly for vim users.
22 | 
23 | 0) Get slimv_box from https://github.com/justin2004/slimv_box
24 | 
25 | 0) Download a.lisp from this git repo
26 | 
27 | 0) cd to the directory containing your downloaded a.lisp file
28 | 
29 | 0) run `vvc` (see README at https://github.com/justin2004/slimv_box)
30 | 
31 | 0) follow the instructions in a.lisp
32 | 
33 | 
34 | [![asciicast](https://asciinema.org/a/271453.svg)](https://asciinema.org/a/271453)
35 | 
36 | 
37 | 
38 | ![](render.png)
39 | 
40 | 
--------------------------------------------------------------------------------
/cepl/a.lisp:
--------------------------------------------------------------------------------
1 | ; edit this file using slimv_box started with "vvc" (https://github.com/justin2004/slimv_box)
2 | ; press ,b to evaluate this whole vim buffer
3 | ; then jump to the bottom of this document
4 | 
5 | (ql:quickload :cepl.sdl2)
6 | (ql:quickload :quickproject)
7 | (cepl:make-project "your-proj")
8 | (ql:quickload "your-proj")
9 | (in-package :your-proj)
10 | 
11 | 
12 | 
13 | ;;;;;;;;;; begin baggers' example
14 | 
15 | ;; This gives us a simple moving triangle
16 | 
17 | (defparameter *vertex-stream* nil)
18 | (defparameter *array* nil)
19 | (defparameter *loop* 0.0)
20 | 
21 | ;; note the use of implicit uniform capture with *loop*
22 | ;; special vars in scope can be used inline. During compilation
23 | ;; cepl will try work out the type. It does this by seeing if the
24 | ;; symbol is bound to a value, and if it is it checks the type of
25 | ;; the value for a suitable matching varjo type
26 | (cepl:defun-g calc-pos ((v-pos :vec4) (id :float))
27 | (let ((pos (rtg-math:v! (* (rtg-math:s~ v-pos :xyz) 0.3) 1.0)))
28 | (+ pos (let ((i (/ (+ (float id)) 2)))
29 | (rtg-math:v!
(sin (+ i *loop*)) 30 | (cos (* 3 (+ (tan i) *loop*))) 31 | 0.0 0.0))))) 32 | 33 | ;; Also showing that we can use gpu-lambdas inline in defpipeline-g 34 | ;; It's not usually done as reusable functions are generally nicer 35 | ;; but I wanted to show that it was possible :) 36 | (cepl:defpipeline-g prog-1 () 37 | (cepl:lambda-g ((position :vec4) &uniform (i :float)) 38 | (calc-pos position i)) 39 | (cepl:lambda-g () 40 | (rtg-math:v! (cos *loop*) (sin *loop*) 0.4 1.0))) 41 | 42 | (defun step-demo () 43 | (sleep 0.01) 44 | (cepl:step-host) 45 | (livesupport:update-repl-link) 46 | (setf *loop* (+ 0.011 *loop*)) 47 | (cepl:clear) 48 | (loop :for i :below 100 :do 49 | (let ((i (/ i 2.0))) 50 | (cepl:map-g #'prog-1 *vertex-stream* :i i))) 51 | (cepl:swap)) 52 | 53 | (let ((running nil)) 54 | (defun run-loop () 55 | (setf running t) 56 | (setf *array* (cepl:make-gpu-array (list (rtg-math:v! 0.0 0.2 0.0 1.0) 57 | (rtg-math:v! -0.2 -0.2 0.0 1.0) 58 | (rtg-math:v! 0.2 -0.2 0.0 1.0)) 59 | :element-type :vec4 60 | :dimensions 3)) 61 | (setf *vertex-stream* (cepl:make-buffer-stream *array*)) 62 | (loop :while (and running (not (cepl:shutting-down-p))) :do 63 | (livesupport:continuable (step-demo)))) 64 | (defun stop-loop () (setf running nil))) 65 | 66 | 67 | ;;;;;;;;;; end baggers' example 68 | 69 | 70 | 71 | ; then uncomment and put your cursor in each form and evaluate with ,e 72 | ; (cepl:repl) 73 | ; (run-loop) 74 | ;; (stop-loop) 75 | 76 | -------------------------------------------------------------------------------- /cepl/render.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/cepl/render.png -------------------------------------------------------------------------------- /design_implications/README.md: -------------------------------------------------------------------------------- 1 | ## Design Implications (Python & APL) 2 | 3 | Today I was using Python and I needed to concatenate some sequences. 4 | I tried to just put a `,` in between them but that wasn't it. 5 | Then I remembered how the design of Python, in its pursuit of readability and expressiveness, overloads `+` as numeric addition and sequence concatenation. 6 | It does so at a cost and I don't think it is cute anymore (like I did when I first saw it). 7 | 8 | A super thoughtful design pays for itself in ways that are hard to quantify and Python's design isn't super thoughtful with respect to overloading `+`. 9 | 10 | 11 | ## Python 12 | 13 | In Python `+` is overloaded. 14 | It can do arithmetic addition (for numeric operands) or concatenation (for sequence operands). 15 | 16 | When operands are numbers we get arithmetic addition: 17 | ```python 18 | 1 + 1 19 | 2 20 | ``` 21 | 22 | When operands are sequences we get concatenation: 23 | ```python 24 | [1, 2] + [3, 4] 25 | [1, 2, 3, 4] 26 | ``` 27 | 28 | But what if you have a sequences of numbers and you want to add the numbers in the sequences? 29 | If you want item wise addition of sequences (with numeric elements) you need more than just `+`. 30 | 31 | You need **four** pieces of machinery (a function `zip`, a list comprehension, a `for` loop, and the `+`): 32 | 33 | ```python 34 | [x + y for x, y in zip([1,2,3], [1,2,3])] 35 | [2, 4, 6] 36 | ``` 37 | 38 | I'm talking about the core language here not libraries. 39 | 40 | ## APL 41 | 42 | In other parts of the world (Hello, Canada!) we have [APL](https://tryapl.org/). 
43 | 
44 | APL was designed simultaneously as a mathematical notation *and* as an executable language.
45 | Ken Iverson, the designer of APL: ["the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation."](https://www.jsoftware.com/papers/tot.htm)
46 | 
47 | In APL `+` isn't overloaded based on the types of operands like it is in Python.
48 | `+` can handle operands that are arrays (almost everything is an array in APL).
49 | And arrays can have zero or many dimensions.
50 | 
51 | When both operands are arrays of zero dimensions (scalars):
52 | ```apl
53 | 1+1
54 | 2
55 | ```
56 | 
57 | When one operand is an array of zero dimensions (a scalar) and the other is an array of one dimension, we get [broadcasting/extension](https://aplwiki.com/wiki/Scalar_extension):
58 | ```apl
59 | 1+1 2 3 4
60 | 2 3 4 5
61 | ```
62 | 
63 | When both operands are arrays of one dimension, we get item wise addition:
64 | ```apl
65 | 1 2 3 + 1 2 3
66 | 2 4 6
67 | ```
68 | 
69 | Note that in APL, item wise addition of arrays (sequences in Python speak) only requires **one** piece of machinery: `+`.
70 | 
71 | And concatenation is a different glyph: `,`.
72 | 
73 | ```apl
74 | 1 2 3 , 4 5 6
75 | 1 2 3 4 5 6
76 | ```
77 | 
78 | If you try to concatenate two character arrays with `+` in APL you get an error:
79 | ```apl
80 | 'hello'+'goodb'
81 | DOMAIN ERROR
82 | 'hello'+'goodb'
83 | ∧
84 | ```
85 | 
86 | That's because arithmetic addition isn't defined on characters -- you want concatenation, not addition.
87 | ```apl
88 | 'hello','goodbye'
89 | hellogoodbye
90 | ```
91 | 
92 | ## The End
93 | 
94 | Sure, APL has an additional glyph to learn (`,`) if you need to concatenate.
95 | But by introducing that glyph (at design time) and using `+` with two operands to always mean arithmetic addition, APL was able to do with 1 piece of machinery what Python needs 4 to do.
96 | 
97 | That is generally the case when comparing APL and Python: ["each one of these single-character glyphs in APL when translated equates to anywhere from roughly 1 to 50 lines of Python!"](https://www.reddit.com/r/Python/comments/z7doen/i_spent_the_last_2_months_converting_apl/)
98 | 
99 | 
100 | As I use Python I'll continue to wish I was able to sprinkle in APL expressions to taste.
101 | 
102 | If you want to get an idea of what it feels like when I use APL, check [this](https://github.com/justin2004/weblog/tree/master/using_apl) out.
103 | 
104 | If you want to learn APL, a search for "learn APL" works but I found [this](https://tutorial.dyalog.com/) to be useful.
105 | 
106 | The [APL Wiki](https://aplwiki.com/wiki/Main_Page) is good too.
107 | 
--------------------------------------------------------------------------------
/dynamic_pagination_with_sparql_anything/README.md:
--------------------------------------------------------------------------------
1 | # Dynamic Pagination with SPARQL Anything
2 | 
3 | [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything) can talk to REST APIs.
4 | Some REST APIs employ pagination to reduce the number of results shown and to deliver results incrementally to the client.
5 | An example of such an API is the [arxiv API](https://arxiv.org/help/api/user-manual#Quickstart).
6 | 7 | For example, this curl command does a search for papers written by Seth Lloyd where the abstract contains the word "coherent": 8 | 9 | ``curl 'http://export.arxiv.org/api/query?search_query=abs:coherent+AND+au:seth%20lloyd&max_results=10&start=0'`` 10 | 11 | It returns XML and part of that XML looks like: 12 | ``` 13 | 153 14 | 1 15 | 10 16 | ``` 17 | 18 | So there are 16 "pages" of results with 10 items per page. 19 | Some REST APIs use page numbers but this one uses offsets (or start index). 20 | 21 | Page number 0 has offset 0, page number 1 has offset 10, etc. 22 | Finally, page number 15 has offset 150. 23 | 24 | 25 | You can handle pagination and step through pages in any application code but you can also step through those pages in a single SPARQL query. 26 | 27 | First you have to run SPARQL Anything. 28 | There are instructions on the project's README and for this example I am using the docker image described [here](https://github.com/justin2004/sparql.anything/blob/fuseki-docker/BROWSER.md). 29 | 30 | 31 | Here is the query (which you can run in a bash shell): 32 | 33 | 34 | ```sparql 35 | curl --silent 'http://localhost:3000/sparql.anything' \ 36 | --header "Accept: application/turtle" \ 37 | --data-urlencode 'query= 38 | PREFIX xyz: 39 | PREFIX rdf: 40 | PREFIX fx: 41 | prefix xhtml: 42 | prefix what: 43 | PREFIX xsd: 44 | prefix a9: 45 | construct {?s ?p ?o} 46 | WHERE { 47 | bind(encode_for_uri("coherent") as ?string_in_abstract) 48 | bind(encode_for_uri("seth lloyd") as ?author) 49 | bind(encode_for_uri("10") as ?results_per_page) 50 | bind(concat("x-sparql-anything:media-type=application/xml,location=http://export.arxiv.org/api/query?search_query=abs:", 51 | ?string_in_abstract,"+AND+au:",?author,"&max_results=",?results_per_page) as ?service_string) . 52 | bind(iri(concat(?service_string,"&start=0")) as ?service) 53 | service ?service { # this first service just gets the number of results and discards the first page of actual results 54 | values ?allowed_page { 0 1 2 3 } . # pages are 0-based and these are the pages you want (if they exist) 55 | [] a fx:root ; 56 | ?slot_p [ a a9:totalResults ; 57 | rdf:_1 ?total_results ] ; 58 | ?slot_p1 [ a a9:itemsPerPage ; 59 | rdf:_1 ?items_per_page ] ; 60 | ?slot_p2 [ a a9:startIndex ; 61 | rdf:_1 ?start_index ] . 62 | bind(xsd:integer(ceil(xsd:integer(?total_results) / xsd:integer(?items_per_page))) as ?total_pages) . 63 | filter(xsd:integer(?allowed_page) < ?total_pages) . 64 | bind(str(xsd:integer(?allowed_page) * xsd:integer(?items_per_page)) as ?offset) . 65 | } 66 | bind(iri(concat(?service_string,"&start=",coalesce(?offset),"")) as ?service_step) . 67 | service ?service_step { # this service steps through each allowed page of results. 68 | # the search results of the previous service are discarded to allow all binding to happen in a single 69 | # service (this one) for simplicity. 70 | ?s ?p ?o . 71 | } 72 | }' 73 | ``` 74 | 75 | 76 | Which will produce triples something like: 77 | ``` 78 | [ a fx:root , ; 79 | rdf:_1 [ a 80 | ... 81 | ] 82 | 83 | [ a fx:root , ; 84 | rdf:_1 [ a 85 | ... 86 | ] 87 | 88 | [ a fx:root , ; 89 | rdf:_1 [ a 90 | ... 91 | ] 92 | 93 | [ a fx:root , ; 94 | rdf:_1 [ a 95 | ... 96 | ] 97 | 98 | ``` 99 | 100 | You'll see 4 subjects at the top level. 101 | One for each page we allowed (0, 1, 2, and 3). 102 | 103 | Each subject represents a page of results that you can project values of interest out of (with a SPARQL select). 
104 | You can even transform them into thoughtfully modeled triples (with a SPARQL construct). 105 | -------------------------------------------------------------------------------- /fused_edges/README.md: -------------------------------------------------------------------------------- 1 | # Fused Edges 2 | 3 | If you are doing domain modeling and using a graph database you might be tempted to use fused edges. 4 | You see them around the semantic web. 5 | But you should resist the temptation. 6 | 7 | ## What 8 | 9 | In a graph database a fused edge occurs when a domain modeler uses a single edge where a node and two edges would be more thoughtful. 10 | To me a fused edge feels like running an interstate through an area of interest and not putting an exit nearby. 11 | It also feels like putting a cast on a joint that normally articulates. 12 | 13 | Here is an example of a fused edge: 14 | 15 | ![fused edges](media/fused_edges.png) 16 | 17 | And here is what that fused edge looks like in turtle (a popular RDF graph serialization): 18 | ```ttl 19 | :event01 :venueName "Olive Garden" . 20 | ``` 21 | 22 | You can see the fusion of edges in the name of the edge usually: there is a "venue" and there is a "name." 23 | 24 | Here is a more thoughtful representation 25 | ![articulating edges](media/articulating_edges.png) 26 | 27 | 28 | with an additional point of articulation: the venue. 29 | ```ttl 30 | :event01 :occursIn :venue01 . 31 | :venue01 :name "Olive Garden" . 32 | ``` 33 | 34 | Here is another common fused edge: 35 | 36 | ```ttl 37 | :person02 :mothersMaidenName "Smith" . 38 | ``` 39 | 40 | vs. 41 | 42 | ```ttl 43 | :person02 :hasMother :person01 . 44 | :person01 :maidenName "Smith" . 45 | ``` 46 | 47 | 48 | ## Why 49 | I can think of three (two I heard other people say) reasons why fused edges might be used. 50 | Let's use the event and venue example. 51 | 52 | 1) [Your source data may not have details about the venue other than its name.](https://twitter.com/valexiev1/status/1509176909741109258?s=20&t=SBnKJ9_TXmVwgRgvfz2aLg) 53 | 54 | 2) ["you get better #findability with dedicated properties"](https://twitter.com/salgo60/status/1516753692728471559?s=20&t=sYoBxBlyLxBg0XWUqQ9fNQ) 55 | 56 | 3) Fewer nodes in a graph likely means fewer hardware resources are required. 57 | 58 | Let me attempt to persuade you that you should mostly ignore those reasons to use fused edges. 59 | 60 | (1) 61 | 62 | One of the ideas of the semantic web is AAA: Anyone can say Anything about Any topic. 63 | 64 | It is hard for someone to say something about the venue (perhaps its address, current owner, hours of operation, other events that occur there, etc.) if no node exists in the graph for it. 65 | With the fused edge, if someone does come along later and they want to express the venue's address it is not a straight forward update. 66 | You'd have to make a new venue node, find the event node in the graph, find all the edges expressing facts about the venue and move them to the new venue node, then connect the event to the new venue node. 67 | Finding all the edges hanging off of the event that express facts about the venue will likely be a manual effort -- there probably won't be clever data for the machine to use that says `:venueName` is not a direct attribute of the event but rather it is a direct attribute of the venue not yet represented in the graph. 68 | 69 | Also, fused edges encourage the use of additional fused edges. 
70 | If you don't have a node to reference then a modeler might make more fused edges in order to express additional information. 71 | 72 | (2) 73 | 74 | Giving a shortcut a name can be valuable, yes. 75 | 76 | But I think if you use a shortcut the details that the shortcut hides should also be available. 77 | If you use fused edges those details are not available; there is only the shortcut. 78 | 79 | There are ways to have dedicated properties without sacrificing the details. 80 | 81 | In SPARQL you can use shortcuts: property paths. 82 | In OWL you can define those shortcuts: property chains. 83 | 84 | In a SPARQL query you could just do 85 | ```sparql 86 | ?event :occursIn/:name ?venue_name . 87 | ``` 88 | 89 | Or you could define that in OWL 90 | ```ttl 91 | :venueName owl:propertyChainAxiom ( :occursIn :name ) . 92 | ``` 93 | And if you have an OWL 2 reasoner active you can just query using the shortcut you just defined 94 | ```sparql 95 | ?event :venueName ?venue_name . 96 | ``` 97 | 98 | (3) 99 | 100 | Ok, using fused edges does reduce the number of triples in your graph. 101 | I can put a billion triples in a triplestore on my laptop and query durations will probably be acceptable. 102 | If I put 100 billion triples on my laptop query durations might not be acceptable. 103 | Still I think I would rather consider partitioning the data and using SPARQL query federation rather than fusing edges together to reduce resource requirements. 104 | I say that because I reach for semantic web technologies when I think radical data interoperability and serendipity would be valuable. 105 | 106 | Fused edges and radical data interoperability don't go together. 107 | Fused edges are about the use cases you currently know about and the data you currently have. 108 | Graphs with thoughtful points of articulation are about the use cases you know about, those you discover tomorrow, and about potential data. 109 | Points of articulation in a graph suggest enrichment opportunities and new questions. 110 | 111 | 112 | 113 | 114 | ## Schema.org 115 | 116 | [Schema.org](https://github.com/schemaorg/schemaorg) is a well known ontology that unfortunately has lots of fused edges. 117 | 118 | If you run this SPARQL query against `schema.ttl` you'll see some examples. 119 | ```sparql 120 | PREFIX schema: 121 | PREFIX rdfs: 122 | SELECT ?s ?com 123 | WHERE 124 | { graph ?g { 125 | ?s rdfs:comment ?com . 126 | { 127 | GRAPH ?g 128 | { ?s schema:rangeIncludes schema:URL 129 | MINUS 130 | { ?s schema:rangeIncludes ?o 131 | FILTER ( ?o != schema:URL ) 132 | } 133 | } 134 | } 135 | } 136 | } 137 | ``` 138 | 139 | That query finds properties that are intended to have only instances of schema:URL in the object position. 140 | 141 | You get these bindings: 142 | 143 | 144 | |s |com | 145 | |--------------------|-------------------------------------------------------------------------------------------------------------------------------------------| 146 | |https://schema\.org/sameAs|URL of a reference Web page that unambiguously indicates the item's identity\. E\.g\. the URL of the item's Wikipedia page, Wikidata entry, or official website\.| 147 | |https://schema\.org/additionalType|An additional type for the item, typically used for adding more specific types from external vocabularies in microdata syntax\. This is a relationship between something and a class that the thing is in\. In RDFa syntax, it is better to use the native RDFa syntax \- the 'typeof' attribute \- for multiple types\. 
Schema\.org tools may have only weaker understanding of extra types, in particular those defined externally\.| 148 | |https://schema\.org/codeRepository|Link to the repository where the un\-compiled, human readable code and related code is located \(SVN, github, CodePlex\)\. | 149 | |https://schema\.org/contentUrl|Actual bytes of the media object, for example the image file or video file\. | 150 | |https://schema\.org/discussionUrl|A link to the page containing the comments of the CreativeWork\. | 151 | |https://schema\.org/downloadUrl|If the file can be downloaded, URL to download the binary\. | 152 | |https://schema\.org/embedUrl|A URL pointing to a player for a specific video\. In general, this is the information in the \`\`\`src\`\`\` element of an \`\`\`embed\`\`\` tag and should not be the same as the content of the \`\`\`loc\`\`\` tag\.| 153 | |https://schema\.org/installUrl|URL at which the app may be installed, if different from the URL of the item\. | 154 | |https://schema\.org/map|A URL to a map of the place\. | 155 | |https://schema\.org/maps|A URL to a map of the place\. | 156 | |https://schema\.org/paymentUrl|The URL for sending a payment\. | 157 | |https://schema\.org/relatedLink|A link related to this web page, for example to other related web pages\. | 158 | |https://schema\.org/replyToUrl|The URL at which a reply may be posted to the specified UserComment\. | 159 | |https://schema\.org/serviceUrl|The website to access the service\. | 160 | |https://schema\.org/significantLinks|The most significant URLs on the page\. Typically, these are the non\-navigation links that are clicked on the most\. | 161 | |https://schema\.org/significantLink|One of the more significant URLs on the page\. Typically, these are the non\-navigation links that are clicked on the most\. | 162 | |https://schema\.org/targetUrl|The URL of a node in an established educational framework\. | 163 | |https://schema\.org/thumbnailUrl|A thumbnail image relevant to the Thing\. | 164 | |https://schema\.org/trackingUrl|Tracking url for the parcel delivery\. | 165 | |https://schema\.org/url|URL of the item\. | 166 | 167 | 168 | 169 | You can see that most of those object properties are fused edges. 170 | 171 | e.g. 172 | 173 | schema:paymentUrl fuses together `hasPayment` and `url` 174 | 175 | schema:trackingUrl fuses together `hasTracking` and `url` 176 | 177 | schema:codeRepository fuses together `hasCodeRepository` and `url` 178 | 179 | etc. 180 | 181 | I think each of those named shortcuts would be fine if they were built up from primitives like 182 | ```ttl 183 | :codeRepositoryURL owl:propertyChainAxiom ( :hasCodeRepository :url ) . 184 | ``` 185 | but I might not put them in core Schema.org because then what stops people from thinking all their favorite named shortcuts belong in core Schema.org? 186 | 187 | 188 | Also if you run that same query with `schema:Place` (instead of `schema:URL`) you can see many more fused properties. 189 | Maybe I'll do another post where I catalog all the fused properties in Schema.org. 190 | 191 | ## Wrap it up 192 | 193 | If you find yourself in the position of building an ontology (the T-box) then remember that the object properties you create will shape the way domain modelers think about decomposing their data. 194 | An ontology with composable object/data properties, such as [Gist](https://github.com/semanticarts/gist), encourages domain modelers to use points of articulation in their graphs. 
195 | You can always later define object properties that build upon the more primitive and composable object properties but once you start fusing edges it could be hard to reel it in. 196 | 197 | Please consider not using fused edges and instead use an ontology that encourages the thoughtful use of points (nodes) of articulation. 198 | I don't see how [the semantic web can turn down any stereo's volume when you get a phone call](https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf) without thoughtful points of articulation. 199 | 200 | ## Final Appeal 201 | 202 | If you believe you must use an edge like `:venueName` then please put something like this in your Tbox: 203 | `:venueName owl:propertyChainAxiom ( :occursIn :name ) .` 204 | 205 | 206 | 207 | ## Appendix 208 | 209 | 210 | 211 | schema.org way (fused edges) 212 | ```ttl 213 | [ a schema:CreativeWork ; 214 | a wd:Q1886349 ; # Logo 215 | schema:url "https://i.imgur.com/46JjPLl.jpg" ; 216 | rdfs:label "Shipwreck Cafe Logo" ; 217 | schema:discussionUrl "https://gist.github.com/justin2004/183add3d617105cc9cc7cee013d44198" ] 218 | 219 | ``` 220 | 221 | points of articulation way 222 | ```ttl 223 | [ a schema:UserComments ; 224 | schema:url "https://gist.github.com/justin2004/183add3d617105cc9cc7cee013d44198" ; 225 | schema:discusses [ a schema:CreativeWork ; 226 | a wd:Q1886349 ; # Logo 227 | rdfs:label "Shipwreck Cafe Logo" ; 228 | schema:url "https://i.imgur.com/46JjPLl.jpg" 229 | ] 230 | ] 231 | wd:Q113149564 schema:logo "https://i.imgur.com/46JjPLl.jpg" . 232 | ``` 233 | 234 | `schema:discussionUrl` is really a shorthand for the property path: `(^schema:discusses)/schema:url`. 235 | So it is 2 edges fused together in such a way that you can't reference the node in the middle: the discussion itself. 236 | If you can't reference the node in the middle (the discussion itself) you can't say when it started, when it ended, 237 | who the participants were, etc. 238 | 239 | 240 | Oh, I think the reason Schema.org has so many fused edges is that it is designed as a way to add semantics to webpages. 241 | A webpage is a document... which is often a bag of information. 242 | So a fused edge leaving a bag of information doesn't seem like such a sin. 243 | But, personally, that makes me want to do more than attempt to hang semantics off of a bag of information. 
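One last practical note: if you inherit a graph that already has fused edges, the mechanical part of un-fusing them is a small SPARQL UPDATE. Here is a sketch using this post's event/venue example (the manual part, as discussed above, is deciding which *other* fused attributes of the event really belong on the new venue node):

```sparql
# a sketch: un-fuse :venueName into :occursIn + :name
DELETE { ?event :venueName ?name }
INSERT { ?event :occursIn ?venue .
         ?venue :name ?name }
WHERE  { ?event :venueName ?name .
         bind(bnode() as ?venue) }
```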
244 | -------------------------------------------------------------------------------- /fused_edges/media/articulating_edges.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/fused_edges/media/articulating_edges.png -------------------------------------------------------------------------------- /fused_edges/media/fused_edges.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/fused_edges/media/fused_edges.png -------------------------------------------------------------------------------- /git_repo_as_rdf/media/bodge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/bodge.png -------------------------------------------------------------------------------- /git_repo_as_rdf/media/curl_tweet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/curl_tweet.png -------------------------------------------------------------------------------- /git_repo_as_rdf/media/exploded_diagram.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/exploded_diagram.jpg -------------------------------------------------------------------------------- /git_repo_as_rdf/media/first.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/first.png -------------------------------------------------------------------------------- /git_repo_as_rdf/media/maybe.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/maybe.gif -------------------------------------------------------------------------------- /git_repo_as_rdf/media/ora_slide.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/git_repo_as_rdf/media/ora_slide.jpg -------------------------------------------------------------------------------- /iDE/README.md: -------------------------------------------------------------------------------- 1 | # iDE - Integrat*ing* Development Environment 2 | 3 | 4 | ## An iDE? 5 | 6 | If you are around software development you've heard of IDEs (integrated development environment). 7 | You probably use one or know someone who does. 8 | 9 | I'd like to contrast IDEs (integrat*ed*) with iDEs (integrat*ing*). 10 | 11 | "Integrated" connotes finality. 12 | As in "it has already been integrated for you -- you just use it now." 13 | 14 | Of course IDEs are extensible but I only know one person that has made an extension/plug-in for an IDE. 15 | On the other hand, I know many vim and emacs users that have made extensions for their iDE. 16 | No, they don't call their thing an iDE but they do use it like an iDE. 
17 | 
18 | The spirit of an iDE is: using [escape hatches](https://wiki.c2.com/?EscapeHatch) to weave together tools that each do their thing well.
19 | That probably explains why you might not hear about iDEs... maybe they don't deserve a name but I needed to give this blog post a title.
20 | 
21 | 
22 | 
23 | It might just be that the people who select an IDE do so because it is already integrated (and they just want to work within it) while vim and emacs users are inclined to shave the yak a little.
24 | 
25 | The first IDE I used was Eclipse.
26 | I didn't choose it though.
27 | That was the IDE my team was using to develop and maintain some software and all the setup guides for our codebase were Eclipse oriented.
28 | So I used it, as I wasn't prepared to roll my own approach yet.
29 | 
30 | Long story short, I never enjoyed using it.
31 | I did what I had to in it and used the command line (custom scripts, pipelines, etc.) for everything else.
32 | 
33 | Fast forward to today -- I now adapt my iDE to whatever project/codebase I need to work on.
34 | For my day job one of the things I get to do is develop and maintain knowledge graphs.
35 | That requires that I modify RDF and SPARQL files.
36 | 
37 | I want to show you my iDE for the semantic web stack (SPARQL/RDF).
38 | 
39 | 
40 | ## My RDF/SPARQL workflow
41 | 
42 | Since my iDE is mostly a patchwork of tools, I've found that it is easiest to get those tools to work together when they can communicate using text.
43 | That can be in the form of files on the filesystem, command line options, keyboard input, or text-based network protocols (like REST).
44 | 
45 | 
46 | 
47 | 
48 | 
49 | 
50 | 
51 | 
52 | 
53 | In order to demonstrate this workflow I've set up some RDF files in the `example/` directory.
54 | 
55 | 
56 | ## Demonstration
57 | 
58 | Here are the elements of my semantic web iDE:
59 | 
60 | - Named terminal sessions
61 |   - [terminal multiplexer (tmux)](https://github.com/tmux/tmux/wiki)
62 |   - I typically have a session for editing RDF files, a session for editing and running SPARQL queries, and I usually keep a session for each service I am running (Apache Jena Fuseki, [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything), Protege, etc.)
63 | 
64 | - Editing, viewing, syntax highlighting/checking, execution
65 |   - [vim](https://en.wikipedia.org/wiki/Vim_(text_editor))
66 |   - Note that I get asynchronous SPARQL query execution in vim so I can keep editing queries while running a big query
67 |   - [Apache Jena's riot command](https://jena.apache.org/documentation/tools/index.html)
68 | 
69 | - RDF (e.g. turtle file) formatting
70 |   - [rdf-toolkit](https://github.com/edmcouncil/rdf-toolkit)
71 |   - This has the benefit that everyone on your team can use their favorite RDF editor and you can still have minimal diffs (when using version control)
72 | 
73 | - SPARQL query formatting and execution
74 |   - [Apache Jena](https://jena.apache.org/)
75 |   - Note: I've done the same thing with other triple stores (RDFox, GraphDB, Stardog, etc.)
76 |   - Even if I am not using Jena as the triplestore I still always use the Jena java library and the command line utilities.
77 | 
78 | - RDF navigation (jumping to RDF resources)
79 |   - [ctags](https://en.wikipedia.org/wiki/Ctags)
80 |   - It's been around a long time and is supported by many tools.
81 | 
82 | - Analyzing/manipulating query results
83 |   - [VisiData](https://www.visidata.org/)
84 |   - I highly recommend learning VisiData -- I used [this tutorial](https://jsvine.github.io/intro-to-visidata/index.html).
85 | 
86 | 
87 | First the demonstration, then I'll describe the setup below.
88 | 
89 | I thought it would be easier to show the workflow with a video and narration.
90 | 
91 | [Here is the video](https://www.youtube.com/watch?v=mJpgZGddOMs).
92 | 
93 | 
94 | 
95 | ## Areas of future improvement?
96 | 
97 | 
98 | 
99 | When I want to see a tree view of a class hierarchy or taxonomy I often use Protege, but for SKOS-based taxonomies I first have to transform it to a `rdfs:subClassOf` style hierarchy.
100 | I [started on a VisiData-based tree viewer](https://github.com/saulpw/visidata/discussions/2282#discussioncomment-8633738) but I need to finish it.
101 | 
102 | When I need to produce a pictorial graph I often use my former colleague's [Turtle Editor Viewer](https://www.semantechs.co.uk/turtle-editor-viewer/).
103 | It might be nice to incorporate that or something like it into my workflow.
104 | 
105 | I use [LSP](https://langserver.org/) for some languages.
106 | It would be interesting to think about adding LSP support for RDF and SPARQL files.
107 | 
108 | Also, the plugin I use for RDF file syntax highlighting has support for completion (using [prefix.cc](https://prefix.cc/)) but it didn't work out of the box so I need to tinker with it.
109 | In the meantime I just use vim's built-in tab completion.
110 | 
111 | 
112 | 
113 | ## Setup
114 | 
115 | 
116 | At the center of it all: [the command line](https://web.stanford.edu/class/cs81n/command.txt), of course.
117 | I assume you are using `bash` as the interpreter but I think the commands will still work if `zsh` was selected for you.
118 | 
119 | Also, although I use vim I have a colleague who uses emacs in a similar manner.
120 | I bet you could get a similar iDE with several other text editors provided they have sufficient escape hatch capability.
121 | 
122 | 
123 | ### Packages
124 | 
125 | First we need to install some packages (I am assuming a Debian-based distro):
126 | 
127 | ```bash
128 | sudo apt install tmux git vim-common default-jre universal-ctags python3
129 | ```
130 | 
131 | Also, I recommend that you [install Docker](https://docs.docker.com/engine/install/debian/#install-using-the-repository).
132 | This is how I package [the SPARQL query pretty printer](https://github.com/justin2004/sparql_pretty) and it is useful for running things like instances of Apache Jena Fuseki, SPARQL Anything, etc.
133 | Without Docker, you have to think "do I have the runtime it requires? all the dependencies? the config files? what is this thing's path?"
134 | 
135 | 
136 | ### VIM Plugins
137 | 
138 | 
139 | Set up the vim plugin manager: [Vundle](https://github.com/VundleVim/Vundle.vim).
140 | There are instructions in its readme.
141 | 
142 | And put this into your `~/.vimrc` in the section designated for Vundle:
143 | 
144 | ```vim
145 | " for syntax highlighting
146 | Plugin 'niklasl/vim-rdf'
147 | " for running commands asynchronously in vim
148 | Plugin 'https://github.com/skywind3000/asyncrun.vim'
149 | " for quickly commenting on/off blocks of code (in this case in RDF and SPARQL)
150 | Plugin 'tpope/vim-commentary'
151 | ```
152 | 
153 | In vim, run `:PluginInstall` as Vundle requires for new packages.
154 | 
155 | 
156 | ### RDF File Serializer
157 | 
158 | Download an rdf-toolkit release jarfile.
159 | 
160 | The current version is [here](https://github.com/edmcouncil/rdf-toolkit/releases/tag/v2.0).
161 | 
162 | In order to make navigation around RDF files possible we need to format (serialize) turtle files in a standard way.
163 | rdf-toolkit works well as a serializer.
164 | 
165 | Here is how I sometimes run it over all the turtle files in my current directory:
166 | 
167 | ```bash
168 | ls -1 *ttl | while read IN ; do java -jar ~/Downloads/rdf-toolkit.jar -sdt explicit -dtd -ibn -s $IN > ${IN}_formatted.ttl ; done
169 | ```
170 | 
171 | Later I'll show you a way to do this directly from vim if you prefer.
172 | 
173 | 
174 | ### RDF File Navigation
175 | 
176 | Since we are serializing using rdf-toolkit we can use the widely supported `ctags` to jump to URIs (in the same and different files).
177 | 
178 | `cd` to the `example/` directory in this repository and run the following to build the index file:
179 | 
180 | ```bash
181 | ctags --langdef=turtle --langmap=turtle:.ttl --regex-turtle='/^([<][^>]+[>])/\1/T,urinoprefix/' --regex-turtle='/^([a-z]+:[0-9a-zA-Z_.]+).*/\1/T,uri,a uri/' -V -R .
182 | ```
183 | 
184 | Then you should see a `tags` file in your current directory.
185 | As long as you start vim from that directory the tags index will be used.
186 | 
187 | Also, you need to tell vim about the lexical syntax of keywords in turtle files.
188 | e.g. URI Qnames have the `:` character, full URIs are surrounded by `<` and `>`, etc.
189 | 
190 | While we are at it, let's set up a few other useful settings for RDF and SPARQL files.
191 | Put this into your `~/.vimrc`:
192 | 
193 | (**NOTE**: you'll need to look at the raw markdown file to copy this vim configuration from.)
194 | 
195 | ```vim
196 | " set the filetype to 'sparql' for .sparql and .rq
197 | au BufRead,BufNewFile *.{sparql,rq} setfiletype sparql
198 | au BufRead,BufNewFile *.{turtle,ttl} setfiletype turtle
199 | 
200 | " set up comment characters
201 | au FileType turtle set commentstring=#%s
202 | au FileType sparql set commentstring=#%s
203 | 
204 | " allow URIs (full and qnames) to be keywords in vim
205 | au FileType turtle set iskeyword=@,48-57,_,192-255,:,-,<,>,/,.
206 | 
207 | " allow RDF file formatting/syntax checking in vim
208 | " NOTE you'll need the path to where you downloaded and unzipped the Jena utilities
209 | au BufRead *.ttl let &l:equalprg='~/Downloads/apache-jena-5.0.0/bin/riot --formatted=turtle --syntax=turtle - '
210 | 
211 | 
212 | " define a macro for running the SPARQL query pretty printer
213 | au FileType sparql let @p = '?query= j0v/''/-1/ !docker run --rm -i justin2004/sparql_pretty :noh '
214 | 
215 | " define a macro for executing a SPARQL query and putting the results into visidata
216 | au FileType sparql let @q = 'mm1Gvap"cy''mvap"Cy:split +put\ c buf0 :set buftype=nofile :file! `mktemp` :w! /tmp/lala1.sh :q ''mzz :AsyncRun tmux split-window bash -l -c "source /tmp/lala1.sh > /tmp/lala1.csv ; vd /tmp/lala1.csv" '
217 | ```
218 | 
219 | Now start vim and edit one of the .ttl files in `example/`.
220 | Put your cursor on a URI and press `ctrl` `]` to jump to the definition.
221 | 
222 | You can get more help on jumping to keywords in vim by running:
223 | 
224 | ```vim
225 | :help ^]
226 | ```
227 | 
228 | or
229 | 
230 | ```vim
231 | :help tags
232 | ```
233 | 
234 | After you jump to several keywords (URIs) you can jump back through the URIs you came through with `ctrl` `t`.
235 | 236 | 237 | ### RDF File Formatting/Validation 238 | 239 | In vim, you can format (and syntax check) a RDF (turtle) file by going to the first line `GG` and then pressing `=G`. 240 | That will invoke the `equalprg` we defined in `.vimrc` on the text in the file. 241 | 242 | e.g. 243 | 244 | If I do that on a file called `some.ttl` with this content: 245 | 246 | ```turtle 247 | PREFIX skos: 248 | PREFIX ex: 249 | ex:thing55 skos:broader ex:thing34 . 250 | ex:thing57 skos:broader [ skos:prefLabel "apple" . 251 | ``` 252 | 253 | You'll get: 254 | 255 | ``` 256 | 13:45:02 ERROR riot :: [line: 4, col: 51] Triples not terminated properly in []-list 257 | ``` 258 | 259 | Since we forgot the closing `]` at the end of the blank node on line 4. 260 | 261 | 262 | 263 | 264 | ### Terminal Multiplexing 265 | 266 | Maybe first [find a quick tutorial on tmux](https://letmegooglethat.com/?q=tmux+tutorial). 267 | I started with GNU Screen long ago and then upgraded to tmux so I don't have one to recommend but there are several out there. 268 | 269 | Create a tmux session called "ontology". 270 | In that session, using vim, open one of the .ttl files in the `example/` directory. 271 | 272 | Now create a tmux session called "query" 273 | 274 | Using vim, create a file (e.g. `sparql.sh`) for your SPARQL queries. 275 | 276 | At the top of the file put this: 277 | 278 | ```bash 279 | # vim: filetype=sparql 280 | # 281 | # optionally you can include variables like this if you need them: 282 | USER=justin 283 | PASSWORD=your_password_here 284 | # 285 | # the execution macro will prepend these variables definitions to your SPARQL query so you can do variable substitution if you need. 286 | 287 | 288 | # and here is a query to get you started 289 | # 290 | # NOTE that the macro that executes that depends on seeing `query=` and the ending `'` on a line of its own 291 | 292 | curl --silent 'https://query.wikidata.org/sparql' \ 293 | --header "Accept: text/csv" \ 294 | --data-urlencode 'query= 295 | select * 296 | where {?s ?p ?o} 297 | limit 5 298 | ' 299 | 300 | ``` 301 | 302 | ### VisiData Configuration 303 | 304 | Before we use the execution macro let's get VisiData setup. 305 | 306 | VisiData has installation instructions [here](https://github.com/saulpw/visidata). 307 | 308 | You can test to see if it is working by running `vd .` to open VisiData on your current directory. 

After you do that put this into `~/.visidatarc`:

```python
import subprocess

def get_qname(uri):
    # TODO perhaps build this from a .ttl file
    PREFIXES={
        'http://www.bbc.co.uk/ontologies/bbc/':'bbc',
        'http://www.bbc.co.uk/ontologies/coreconcepts/':'core',
        'http://www.bbc.co.uk/ontologies/creativework/':'cwork',
        'http://purl.org/dc/elements/1.1/':'dc',
        'http://purl.org/dc/terms/':'dcterms',
        'http://www.w3.org/2002/07/owl#':'owl',
        'http://www.bbc.co.uk/ontologies/provenance/':'provenance',
        'http://www.w3.org/1999/02/22-rdf-syntax-ns#':'rdf',
        'http://www.w3.org/2000/01/rdf-schema#':'rdfs',
        'http://www.bbc.co.uk/ontologies/tagging/':'tagging',
        'http://www.w3.org/2001/XMLSchema#':'xsd'
    }
    # shorten the URI to a qname if we know its namespace
    for prefix in PREFIXES.keys():
        if uri.startswith(prefix):
            return uri.replace(prefix, PREFIXES[prefix] + ":")
    return uri

def to_uri():
    # switch to the "ontology" tmux session and look up the URI
    # under the VisiData cursor with vim's :tselect
    TMUX_RDF_SESSION='ontology'
    subprocess.run(['tmux','switch-client','-t',TMUX_RDF_SESSION])
    uri = get_qname(vd.sheet.cursorCell.value)
    subprocess.run(['tmux','send-keys','-t',TMUX_RDF_SESSION,':tsel ' + uri , 'Enter', '1','Enter'])

BaseSheet.addCommand('3', 'go-to-uri', 'to_uri()')
```

That defines a python function, makes a VisiData command that invokes it, and binds the key `3` to the command.
You can change the key of course.


### SPARQL Query Execution

Now, put your cursor anywhere on the body of the SPARQL query we put into `sparql.sh`.
To pretty print the query press `@p`.

Note that the pretty printing adds an empty line between the prefixes and the `select` line.
Unfortunately you'll have to delete that empty line before you execute the query since the macro depends on the query being wholly within the vim paragraph text object.
That also means you can't have other empty lines in your SPARQL queries.
If I want whitespace I just use a leading `#` on those lines.

In vim run this if you want more details on text objects:

```vim
:help text-objects
```

Now to execute the SPARQL query.

With your cursor anywhere in the SPARQL query body, press `@q`: the query gets executed and VisiData opens in another tmux pane with the results.

In VisiData, when your cursor is on a cell with a URI in it, you can press `3` to jump to the tmux "ontology" session and find that keyword.

## Need Help?

I hope you find this useful.
Perhaps it can inspire you to build an iDE for yourself.

These tools (the command line, tmux, and vim/emacs) and techniques (escape hatchery) can serve you well even if you don't work on the semantic web stack.

Feel free to open an issue or start a discussion on this repository if you need help getting any of this to work or if you have any ideas on how to extend it!
376 | 377 | --- 378 | 379 | ### RDF file downloads used in `example/`: 380 | 381 | - https://www.bbc.co.uk/ontologies/documents/creativework.ttl 382 | - https://www.bbc.co.uk/ontologies/documents/bbc.ttl 383 | - https://www.dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms.ttl 384 | - http://www.w3.org/2002/07/owl# 385 | - http://www.w3.org/1999/02/22-rdf-syntax-ns# 386 | - http://www.w3.org/2000/01/rdf-schema# 387 | -------------------------------------------------------------------------------- /iDE/example/rdf.ttl: -------------------------------------------------------------------------------- 1 | @prefix dc: . 2 | @prefix owl: . 3 | @prefix rdf: . 4 | @prefix rdfs: . 5 | @prefix xs: . 6 | 7 | 8 | a owl:Ontology ; 9 | dc:date "2019-12-16"^^xs:string ; 10 | dc:description "This is the RDF Schema for the RDF vocabulary terms in the RDF Namespace, defined in RDF 1.1 Concepts."^^xs:string ; 11 | dc:title "The RDF Concepts Vocabulary (RDF)"^^xs:string ; 12 | . 13 | 14 | rdf:Alt 15 | a rdfs:Class ; 16 | rdfs:subClassOf rdfs:Container ; 17 | rdfs:label "Alt"^^xs:string ; 18 | rdfs:comment "The class of containers of alternatives."^^xs:string ; 19 | rdfs:isDefinedBy ; 20 | . 21 | 22 | rdf:Bag 23 | a rdfs:Class ; 24 | rdfs:subClassOf rdfs:Container ; 25 | rdfs:label "Bag"^^xs:string ; 26 | rdfs:comment "The class of unordered containers."^^xs:string ; 27 | rdfs:isDefinedBy ; 28 | . 29 | 30 | rdf:CompoundLiteral 31 | a rdfs:Class ; 32 | rdfs:subClassOf rdfs:Resource ; 33 | rdfs:label "CompoundLiteral"^^xs:string ; 34 | rdfs:comment "A class representing a compound literal."^^xs:string ; 35 | rdfs:isDefinedBy ; 36 | rdfs:seeAlso ; 37 | . 38 | 39 | rdf:HTML 40 | a rdfs:Datatype ; 41 | rdfs:subClassOf rdfs:Literal ; 42 | rdfs:label "HTML"^^xs:string ; 43 | rdfs:comment "The datatype of RDF literals storing fragments of HTML content"^^xs:string ; 44 | rdfs:isDefinedBy ; 45 | rdfs:seeAlso ; 46 | . 47 | 48 | rdf:JSON 49 | a rdfs:Datatype ; 50 | rdfs:subClassOf rdfs:Literal ; 51 | rdfs:label "JSON"^^xs:string ; 52 | rdfs:comment "The datatype of RDF literals storing JSON content."^^xs:string ; 53 | rdfs:isDefinedBy ; 54 | rdfs:seeAlso ; 55 | . 56 | 57 | rdf:List 58 | a rdfs:Class ; 59 | rdfs:subClassOf rdfs:Resource ; 60 | rdfs:label "List"^^xs:string ; 61 | rdfs:comment "The class of RDF Lists."^^xs:string ; 62 | rdfs:isDefinedBy ; 63 | . 64 | 65 | rdf:PlainLiteral 66 | a rdfs:Datatype ; 67 | rdfs:subClassOf rdfs:Literal ; 68 | rdfs:label "PlainLiteral"^^xs:string ; 69 | rdfs:comment "The class of plain (i.e. untyped) literal values, as used in RIF and OWL 2"^^xs:string ; 70 | rdfs:isDefinedBy ; 71 | rdfs:seeAlso ; 72 | . 73 | 74 | rdf:Property 75 | a rdfs:Class ; 76 | rdfs:subClassOf rdfs:Resource ; 77 | rdfs:label "Property"^^xs:string ; 78 | rdfs:comment "The class of RDF properties."^^xs:string ; 79 | rdfs:isDefinedBy ; 80 | . 81 | 82 | rdf:Seq 83 | a rdfs:Class ; 84 | rdfs:subClassOf rdfs:Container ; 85 | rdfs:label "Seq"^^xs:string ; 86 | rdfs:comment "The class of ordered containers."^^xs:string ; 87 | rdfs:isDefinedBy ; 88 | . 89 | 90 | rdf:Statement 91 | a rdfs:Class ; 92 | rdfs:subClassOf rdfs:Resource ; 93 | rdfs:label "Statement"^^xs:string ; 94 | rdfs:comment "The class of RDF statements."^^xs:string ; 95 | rdfs:isDefinedBy ; 96 | . 97 | 98 | rdf:XMLLiteral 99 | a rdfs:Datatype ; 100 | rdfs:subClassOf rdfs:Literal ; 101 | rdfs:label "XMLLiteral"^^xs:string ; 102 | rdfs:comment "The datatype of XML literal values."^^xs:string ; 103 | rdfs:isDefinedBy ; 104 | . 
105 | 106 | rdf:direction 107 | a rdf:Property ; 108 | rdfs:label "direction"^^xs:string ; 109 | rdfs:comment "The base direction component of a CompoundLiteral."^^xs:string ; 110 | rdfs:domain rdf:CompoundLiteral ; 111 | rdfs:isDefinedBy ; 112 | rdfs:seeAlso ; 113 | . 114 | 115 | rdf:first 116 | a rdf:Property ; 117 | rdfs:label "first"^^xs:string ; 118 | rdfs:comment "The first item in the subject RDF list."^^xs:string ; 119 | rdfs:domain rdf:List ; 120 | rdfs:isDefinedBy ; 121 | rdfs:range rdfs:Resource ; 122 | . 123 | 124 | rdf:langString 125 | a rdfs:Datatype ; 126 | rdfs:subClassOf rdfs:Literal ; 127 | rdfs:label "langString"^^xs:string ; 128 | rdfs:comment "The datatype of language-tagged string values"^^xs:string ; 129 | rdfs:isDefinedBy ; 130 | rdfs:seeAlso ; 131 | . 132 | 133 | rdf:language 134 | a rdf:Property ; 135 | rdfs:label "language"^^xs:string ; 136 | rdfs:comment "The language component of a CompoundLiteral."^^xs:string ; 137 | rdfs:domain rdf:CompoundLiteral ; 138 | rdfs:isDefinedBy ; 139 | rdfs:seeAlso ; 140 | . 141 | 142 | rdf:nil 143 | a rdf:List ; 144 | rdfs:label "nil"^^xs:string ; 145 | rdfs:comment "The empty list, with no items in it. If the rest of a list is nil then the list has no more items in it."^^xs:string ; 146 | rdfs:isDefinedBy ; 147 | . 148 | 149 | rdf:object 150 | a rdf:Property ; 151 | rdfs:label "object"^^xs:string ; 152 | rdfs:comment "The object of the subject RDF statement."^^xs:string ; 153 | rdfs:domain rdf:Statement ; 154 | rdfs:isDefinedBy ; 155 | rdfs:range rdfs:Resource ; 156 | . 157 | 158 | rdf:predicate 159 | a rdf:Property ; 160 | rdfs:label "predicate"^^xs:string ; 161 | rdfs:comment "The predicate of the subject RDF statement."^^xs:string ; 162 | rdfs:domain rdf:Statement ; 163 | rdfs:isDefinedBy ; 164 | rdfs:range rdfs:Resource ; 165 | . 166 | 167 | rdf:rest 168 | a rdf:Property ; 169 | rdfs:label "rest"^^xs:string ; 170 | rdfs:comment "The rest of the subject RDF list after the first item."^^xs:string ; 171 | rdfs:domain rdf:List ; 172 | rdfs:isDefinedBy ; 173 | rdfs:range rdf:List ; 174 | . 175 | 176 | rdf:subject 177 | a rdf:Property ; 178 | rdfs:label "subject"^^xs:string ; 179 | rdfs:comment "The subject of the subject RDF statement."^^xs:string ; 180 | rdfs:domain rdf:Statement ; 181 | rdfs:isDefinedBy ; 182 | rdfs:range rdfs:Resource ; 183 | . 184 | 185 | rdf:type 186 | a rdf:Property ; 187 | rdfs:label "type"^^xs:string ; 188 | rdfs:comment "The subject is an instance of a class."^^xs:string ; 189 | rdfs:domain rdfs:Resource ; 190 | rdfs:isDefinedBy ; 191 | rdfs:range rdfs:Class ; 192 | . 193 | 194 | rdf:value 195 | a rdf:Property ; 196 | rdfs:label "value"^^xs:string ; 197 | rdfs:comment "Idiomatic property used for structured values."^^xs:string ; 198 | rdfs:domain rdfs:Resource ; 199 | rdfs:isDefinedBy ; 200 | rdfs:range rdfs:Resource ; 201 | . 202 | 203 | -------------------------------------------------------------------------------- /iDE/example/rdfs.ttl: -------------------------------------------------------------------------------- 1 | @prefix dc: . 2 | @prefix owl: . 3 | @prefix rdf: . 4 | @prefix rdfs: . 5 | @prefix xs: . 6 | 7 | 8 | a owl:Ontology ; 9 | dc:title "The RDF Schema vocabulary (RDFS)"^^xs:string ; 10 | rdfs:seeAlso ; 11 | . 12 | 13 | rdfs:Class 14 | a rdfs:Class ; 15 | rdfs:subClassOf rdfs:Resource ; 16 | rdfs:label "Class"^^xs:string ; 17 | rdfs:comment "The class of classes."^^xs:string ; 18 | rdfs:isDefinedBy ; 19 | . 
20 | 21 | rdfs:Container 22 | a rdfs:Class ; 23 | rdfs:subClassOf rdfs:Resource ; 24 | rdfs:label "Container"^^xs:string ; 25 | rdfs:comment "The class of RDF containers."^^xs:string ; 26 | rdfs:isDefinedBy ; 27 | . 28 | 29 | rdfs:ContainerMembershipProperty 30 | a rdfs:Class ; 31 | rdfs:subClassOf rdf:Property ; 32 | rdfs:label "ContainerMembershipProperty"^^xs:string ; 33 | rdfs:comment """The class of container membership properties, rdf:_1, rdf:_2, ..., 34 | all of which are sub-properties of 'member'."""^^xs:string ; 35 | rdfs:isDefinedBy ; 36 | . 37 | 38 | rdfs:Datatype 39 | a rdfs:Class ; 40 | rdfs:subClassOf rdfs:Class ; 41 | rdfs:label "Datatype"^^xs:string ; 42 | rdfs:comment "The class of RDF datatypes."^^xs:string ; 43 | rdfs:isDefinedBy ; 44 | . 45 | 46 | rdfs:Literal 47 | a rdfs:Class ; 48 | rdfs:subClassOf rdfs:Resource ; 49 | rdfs:label "Literal"^^xs:string ; 50 | rdfs:comment "The class of literal values, eg. textual strings and integers."^^xs:string ; 51 | rdfs:isDefinedBy ; 52 | . 53 | 54 | rdfs:Resource 55 | a rdfs:Class ; 56 | rdfs:label "Resource"^^xs:string ; 57 | rdfs:comment "The class resource, everything."^^xs:string ; 58 | rdfs:isDefinedBy ; 59 | . 60 | 61 | rdfs:comment 62 | a rdf:Property ; 63 | rdfs:label "comment"^^xs:string ; 64 | rdfs:comment "A description of the subject resource."^^xs:string ; 65 | rdfs:domain rdfs:Resource ; 66 | rdfs:isDefinedBy ; 67 | rdfs:range rdfs:Literal ; 68 | . 69 | 70 | rdfs:domain 71 | a rdf:Property ; 72 | rdfs:label "domain"^^xs:string ; 73 | rdfs:comment "A domain of the subject property."^^xs:string ; 74 | rdfs:domain rdf:Property ; 75 | rdfs:isDefinedBy ; 76 | rdfs:range rdfs:Class ; 77 | . 78 | 79 | rdfs:isDefinedBy 80 | a rdf:Property ; 81 | rdfs:subPropertyOf rdfs:seeAlso ; 82 | rdfs:label "isDefinedBy"^^xs:string ; 83 | rdfs:comment "The defininition of the subject resource."^^xs:string ; 84 | rdfs:domain rdfs:Resource ; 85 | rdfs:isDefinedBy ; 86 | rdfs:range rdfs:Resource ; 87 | . 88 | 89 | rdfs:label 90 | a rdf:Property ; 91 | rdfs:label "label"^^xs:string ; 92 | rdfs:comment "A human-readable name for the subject."^^xs:string ; 93 | rdfs:domain rdfs:Resource ; 94 | rdfs:isDefinedBy ; 95 | rdfs:range rdfs:Literal ; 96 | . 97 | 98 | rdfs:member 99 | a rdf:Property ; 100 | rdfs:label "member"^^xs:string ; 101 | rdfs:comment "A member of the subject resource."^^xs:string ; 102 | rdfs:domain rdfs:Resource ; 103 | rdfs:isDefinedBy ; 104 | rdfs:range rdfs:Resource ; 105 | . 106 | 107 | rdfs:range 108 | a rdf:Property ; 109 | rdfs:label "range"^^xs:string ; 110 | rdfs:comment "A range of the subject property."^^xs:string ; 111 | rdfs:domain rdf:Property ; 112 | rdfs:isDefinedBy ; 113 | rdfs:range rdfs:Class ; 114 | . 115 | 116 | rdfs:seeAlso 117 | a rdf:Property ; 118 | rdfs:label "seeAlso"^^xs:string ; 119 | rdfs:comment "Further information about the subject resource."^^xs:string ; 120 | rdfs:domain rdfs:Resource ; 121 | rdfs:isDefinedBy ; 122 | rdfs:range rdfs:Resource ; 123 | . 124 | 125 | rdfs:subClassOf 126 | a rdf:Property ; 127 | rdfs:label "subClassOf"^^xs:string ; 128 | rdfs:comment "The subject is a subclass of a class."^^xs:string ; 129 | rdfs:domain rdfs:Class ; 130 | rdfs:isDefinedBy ; 131 | rdfs:range rdfs:Class ; 132 | . 133 | 134 | rdfs:subPropertyOf 135 | a rdf:Property ; 136 | rdfs:label "subPropertyOf"^^xs:string ; 137 | rdfs:comment "The subject is a subproperty of a property."^^xs:string ; 138 | rdfs:domain rdf:Property ; 139 | rdfs:isDefinedBy ; 140 | rdfs:range rdf:Property ; 141 | . 

--------------------------------------------------------------------------------
/iDE/media/async-execution.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/iDE/media/async-execution.gif
--------------------------------------------------------------------------------
/interfaces_and_personalities/README.md:
--------------------------------------------------------------------------------

##### DRAFT

# Interfaces and Personalities

I work in the software development space and I have been noticing patterns between the personalities involved in the development process and the resultant software product.

The patterns are evident between (a) the personalities assigned to a development project and (b) the maneuverability of the final product.

Interfaces are, of course, points where two things meet and interact.
And I see the personalities as: tendencies to focus on a particular interface.

In software development the things interacting are:

- People
- Knowledge
- Needs/Wants
- Specifications
- Applications
- Execution Environments
- Serialized Data
  - Explicit
  - Implicit

We could slice up the space differently but I think this set of things gives rise to the interfaces that are most worthy of discussion.

There are interfaces between each of those things and all the other things, but I think only a handful of those interfaces account for most of the product quality variation between software implementations.

I mostly want to compare (a) the interfaces between applications and (b) the interfaces between knowledge and serialized data.
I'll also enumerate some of the nearby interfaces just to situate the comparison.
I don't have names for these interfaces, but for each one I can name a personality that tends to focus on it:

![](media/diagram.png)

By varying the amount of attention each of these personalities puts into a software product you can vary some important operational characteristics of the software.

It isn't always the case that you get a person or role for each personality type.
Sometimes one person represents several of the personalities, but I don't think I've ever come across a person who represents all of them.

Let's look at each of them in turn.

## Between Application and Application

The interface between applications is a focus of the personality "Software Engineer."

Software engineers focus on more than just the interface between applications, but their focus on this is what makes the resultant software seem more "developer friendly."
This personality puts work into interfaces that other applications can use (APIs).
Allowing this personality to have the majority vote in design often results in software that this personality likes to develop against.

-- some circularity here? --

The software engineer personality designs APIs with a mind for only their application's needs (not the needs of the ecosystem around it).

This personality almost always gets overrepresented.
Because of that, software that is developed in this manner is lopsided (more application-centric than data-centric), which results in the [Software Wasteland](https://www.amazon.com/Software-Wasteland-Application-Centric-Hobbling-Enterprises/dp/1634623169).

## Between Application and Execution Environment
The interface between applications and execution environments: "DevOps Engineer"

Without some attention to this interface you'll likely get an application that won't easily run in other execution environments.

## Between Application and Serialized Data
The interface between Applications and Serialized Data is a focus of the personality: "Data Engineer"

When you think of enterprise software, that almost always entails some kind of ETL (extracting, transforming, and loading data).
This is the domain of the data engineer personality.

## Between Serialized Data (explicit) and Serialized Data (implicit)

Serialized Data (explicit) and Serialized Data (implicit): "Data Scientist"

This is the personality that is doing AI/ML.
This work is about taking some input (serialized data), doing some computations, and producing some output (serialized data).
Most often the output data is implicit, in that it was derivable from the input but the input did not explicitly state it.

e.g.
From the input "Owen has 2 apples. He gives 1 to his dad." you can derive "Owen now has 1 apple" as it is an implicit fact based on the input and the background worldview (including the axiom that apples are [rivalrous goods](https://en.wikipedia.org/wiki/Rivalry_(economics))).

To do useful derivations (that aren't easily obtained) you often need multiple different input data sources.
You can use any mechanism to produce these derivations: algebra, regression analysis, deductive logic, artificial neural networks, and LLMs.

This personality has historically been told to "produce insights."
And these days this personality is told to "take loosey-goosey input and do the right thing."
I've also written (pre-ChatGPT) about such loosey-goosey input [here](https://github.com/justin2004/weblog/tree/master/semantic_messages).


## Between Knowledge and Serialized Data

Knowledge and Serialized Data: "Applied Ontologist"

This personality is the most underrepresented in software development, and that underrepresentation results in systems that work for narrow purposes *but* do not adapt well to variations on those purposes or to new purposes.

When this personality is underrepresented you get software that:
- has data schemata for its narrow purposes
- takes "stay in your lane" to an extreme
- entangles the meaning of serialized data with the code that handles it
  - such that it takes significant effort to migrate data out of one system and into another
- assumes someone else will be the one doing the ETL to get data in and out

The work of this personality is: expressing knowledge and situations using the terminology of a particular worldview.

Some of the key principles of this personality are thoughtfully summarized [here](https://datacentricmanifesto.org/).

## Case Study

Let's consider how a software engineer and an applied ontologist would approach the design of an inventory system.

What would the software engineer personality prioritize while developing an inventory system?
They'd prioritize the design of a back-end to do CRUD (create, read, update, delete) via an API.


Contrast that with what an applied ontologist would prioritize:
they'd prioritize building a representation of the knowledge about the relevant details (and adjacent details -- enough to contextualize the particulars of this application) of inventory within a particular worldview.
--------------------------------------------------------------------------------
/interfaces_and_personalities/media/diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/interfaces_and_personalities/media/diagram.png
--------------------------------------------------------------------------------
/intuitive_graph_viz/README.md:
--------------------------------------------------------------------------------

# Knowledge Graph Viz

## What We Currently Have

Knowledge Graph (KG) visualization is disappointing.
At least it is in all the products I've used.

I think KG viz today is disappointing because:

- (1) they mostly use circles and lines
- (2) the layout algorithms (radial, circular, force-directed, etc.) are unaware of what the edges and node types mean
- (3) they don't have a way to downsample (when rendering a large graph) while preserving the spirit of the meaning of the data

Regarding (2), KG viz today doesn't make use of the information contained in the ontology used by the KG.
One of the reasons for this is that many KGs in the wild employ schemata but not ontologies (with axiomatically defined primitives).
[gist](https://github.com/semanticarts/gist) is an example of such an ontology.

Unfortunately in the context of KGs, often when you hear people say "ontology" they merely mean "schema."

In KG viz today, when you get more than 100 things to render they all look about the same:

![](media/blobby.png)

They look like abstract art.

I showed some of these to my seven-year-old son and he said "tons of random dots, what does that mean?"

When they are smaller (like a few dozen things) they might be more useful, but you have to resort to reading them like text:

![](media/standard_graph.png)

## What We Could Do Instead

I think what we are missing in KG viz are renderings that convey meaning intuitively.
Street signs with no words are a good example of this.
You don't need much experience in the world to have a pretty good idea of what they mean.
Also, the meaning is conveyed quickly -- you don't have to read text and engage a linguistic part of your mind.

I would like to see an option to render graphs more like this:

![](media/mnemonic_graph.png)

One thing to note about this approach is that there are only a few dozen concepts in the `gist` ontology that need to have associated icons (or visual motifs).
Imagine if a good designer took a pass at this.


This is just a first step.

This approach only involves iconic symbols for types of things and relationships between those things.
I again showed this to my seven-year-old son and he talked through what he thought it was representing, and he got many of the ideas right.

I think subsequent rendering steps would involve making use of arrangement, orientation, proximity, and size.

Orientation/direction:

For example, perhaps all the causal-flavored edges in gist (`gist:triggers`, `gist:isAffectedBy`, `gist:produces`, etc.) could be rendered from left to right.
This would give the graph viz a causal left-to-right character that could help draw your eye to common causal dependencies.

Size:

For example, perhaps when the number of outgoing `gist:produces` edges gets larger the size of the source icon could get larger (a query for this is sketched at the end of this post).

## Intuitive KG Viz Goals

The main goal of these subsequent steps would be to allow a large portion of the graph to be rendered while downsampling the resolution of the meaning (yet maintaining its spirit) so as not to produce a pixel overload that is incomprehensible.

An ideal KG viz should allow impressions (of the meaning of the graph) to be almost a reflex.

When you look out of the window from an airplane you don't see an incomprehensible pile of jiggling pixels.
Instead you see a downsampled scene of the world beneath you.
If you have enough altitude you can't see individual cars or buildings anymore, but you can recognize aggregations of them.

In the same manner, imagine if you zoomed out on a graph viz like that and lots of individual instances of `gist:Content` became rendered in aggregate (perhaps like reams of paper).


## Let's Do This

If you are a designer you could pick an ontology (maybe start with [gist](https://github.com/semanticarts/gist) or [CCO](https://github.com/CommonCoreOntology/CommonCoreOntologies)) and decide how to render each of the primitives (classes and object properties).

If you are a UI implementor perhaps you could implement my simple design and then we could see how it scales up to larger graphs.

Let me know if you want to work on this together!

Perhaps these first few implementations wouldn't be useful for non-toy datasets but they might help us think better about how to proceed with graph viz.


---

Note: the graph data (RDF) I used in this example comes from [this post](https://github.com/justin2004/weblog/tree/master/semantic_messages).
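
As promised above, here is a minimal sketch of the kind of query a renderer might run to derive icon sizes from `gist:produces` fan-out. I'm assuming the `https://ontologies.semanticarts.com/gist/` namespace, and the scaling formula in the comment is only an illustration:

```sparql
PREFIX gist: <https://ontologies.semanticarts.com/gist/>
# count the outgoing gist:produces edges for each source node
SELECT ?source (COUNT(?produced) AS ?fanout)
WHERE { ?source gist:produces ?produced }
GROUP BY ?source
ORDER BY DESC(?fanout)
# a renderer could then scale each ?source icon by, say, log(1 + ?fanout)
```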

--------------------------------------------------------------------------------
/intuitive_graph_viz/media/blobby.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/intuitive_graph_viz/media/blobby.png
--------------------------------------------------------------------------------
/intuitive_graph_viz/media/mnemonic_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/intuitive_graph_viz/media/mnemonic_graph.png
--------------------------------------------------------------------------------
/intuitive_graph_viz/media/standard_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/intuitive_graph_viz/media/standard_graph.png
--------------------------------------------------------------------------------
/json-ld/README.md:
--------------------------------------------------------------------------------

# json-ld

I had a hard time understanding json-ld until I used turtle files a lot.
Now as I look at json-ld it makes sense.
When using triples (subject predicate object) the main thing to remember is that you must fully qualify everything.
That is what all the `@context` and `@id` stuff is about in json-ld.


### example

This example below comes from https://json-ld.org/playground/

Here is a turtle file (containing triples).

```
_:b0 <http://schema.org/jobTitle> "Professor" .
_:b0 <http://schema.org/name> "Jane Doe" .
_:b0 <http://schema.org/telephone> "(425) 123-4567" .
_:b0 <http://schema.org/url> <http://www.janedoe.com> .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
```

The file talks about something with a jobTitle, name, etc.
The `_:b0` just means that we don't know the URI for the subject but, nevertheless, we are going to say some things about some subject.



We could make this file easier for a person to edit with prefixes:

```
@prefix sc: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

_:b0 sc:jobTitle "Professor" .
_:b0 sc:name "Jane Doe" .
_:b0 sc:telephone "(425) 123-4567" .
_:b0 sc:url <http://www.janedoe.com> .
_:b0 rdf:type sc:Person .
```

But note that everything is still fully qualified. We just have a shortcut notation now.

Now json-ld:

```
{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}
```


See how the value of `@context` is just the prefix for all the raw keys: name, jobTitle, telephone, and url.
Just like in the turtle file with prefixes defined.

And `rdf:type` gets some special treatment in json-ld as `@type`.
Often in turtle files you'll see `rdf:type` abbreviated as `a`.

This example does not use `@id`, but `@id` just means "this value is a URI" as opposed to some other data type (like string or integer).



### efficiency

If you use terms from many vocabularies/ontologies then you might not want to keep transmitting a long list of `@context` values.
In that case it [looks](https://niem.github.io/json/reference/json-ld/context/) like you can refer to some resource that contains your context.

`A key "@context" may have, as a value, a URI, which is a name for a JSON-LD context object.`

```
{
    "@context" : "https://example.org/my/context.json",
    "nc:PersonPreferredName": "Morty"
}
```

--------------------------------------------------------------------------------
/reason-over/README.md:
--------------------------------------------------------------------------------

# Reasoning over Wikidata


## Wouldn't it be nice

I posted [this question](https://www.reddit.com/r/semanticweb/comments/lp0iey/reasoning_over_service/) a few days ago. The idea is that it would be nice to be able to query a remote SPARQL endpoint and specify an ontology to be used with a reasoner over the triples from the remote endpoint. For example, I know that in the ontology I bring I would like this triple:

`wdt:P31 rdfs:subPropertyOf rdf:type .`

wdt:P31 is Wikidata's way of saying "is an instance of" and that is what rdf:type says too.


I naively thought this just might work if I query a triplestore that has reasoning enabled and I specify a service to get some triples from [Wikidata](https://www.wikidata.org).

```
select * where {
  service <https://query.wikidata.org/sparql> {
    ?s ?p ?o .
    filter(?s=wd:Q23) .
    filter(?o=wd:Q5) .
  }
}
```


But I couldn't get the local triplestore to reason over the triples that satisfy my service clause.


A little searching led me to [this](http://ceur-ws.org/Vol-996/papers/ldow2013-paper-08.pdf) academic paper.
![paper](media/paper.png)


It looked promising.

> ... the sparql endpoint provider decides which inference rules are used for its entailment regimes. In this paper, we propose an extension to the sparql query language to support remote reasoning, in which the data consumer can define the inference rules.

I was excited.

But I was reminded that, often, engineers can't easily spend academic currency.
Publishing a paper (or attempting to) and proposing a standard (or an extension to a standard) might win academic points but it doesn't help me leverage Wikidata today.

> For the proof-of-concept, we have extended Apache Jena ARQ query engine to support remote reasoning

The paper mentions that code but does not make it available.


I think saying that [The Scientific Paper Is Obsolete](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/) sounds harsh, but I would have much preferred a "computational essay" that actually executes and produces some result over a paper whose writers would like to get a [standard](https://www.w3.org/TR/sparql11-query/) extended.

Side note:

Someone in academia might think "proposed an extension to a standard" looks good on a CV but wouldn't "wrote a tool that became the cURL of remote-ish SPARQL reasoning" look better?
I don't even need to link to cURL. You know what it is and how useful it is.

## Make something that works today

I had recently read Ritchie and Thompson's "The UNIX Time-Sharing System" [paper](https://archive.org/details/UNIX-Time-Sharing-System) from 1974.
Their software composability ideas were mingling with my desire to have reasoning over remote SPARQL endpoints.

> The most important role of UNIX is to provide a file system.

That is partly because files are a great way to allow interprocess communication to be blended with human interaction.

I got excited again thinking that I could implement something like remote reasoning using UNIX thinking, a.k.a. [Small, Sharp tools](https://www.brandur.org/small-sharp-tools).

> 6.2 Filters

> ...

> A sequence of commands separated by vertical bars causes the Shell to execute all the commands simultaneously and to arrange that the standard output of each command be delivered to the standard input of the next command in the sequence.

The sequence I had in mind goes like this:

0) Find or craft some triples to serve as your [Tbox](https://en.wikipedia.org/wiki/Tbox) (ontology) triples.

0) Query the remote SPARQL endpoint to obtain some triples that you want to reason over.

0) Apply a reasoner (using your Tbox triples) to the triples you want to reason over and produce some additional derived triples.

0) Run a final SPARQL query against all the resultant (derived and original) triples.


## The result

The project I made ended up [here](https://github.com/justin2004/wikidata_reasoning).

It presents the illusion of remote reasoning (since reasoning actually happens locally) but it does not require any standard extension -- it simply glues together existing tools and uses existing standards.
It also doesn't tease someone with a desire to reason over a remote SPARQL endpoint: it runs and produces results.

I still consider the project to be remote-ish reasoning because, from the perspective of an application (or person) that wants to consume the output, it is not necessary to build the reasoning into your application (if you use this project or something like it).

I could have made this repo a single command pipeline, but the Apache Jena command line utilities sometimes need to see a file (with an extension) so they can guess what RDF serialization format is in use.
But it is also nice to store intermediate results in files because then you can change queries or the ontology file and let `make` decide what needs to be executed to update your output file.


See the [project page](https://github.com/justin2004/wikidata_reasoning) for a demonstration.
--------------------------------------------------------------------------------
/reason-over/media/ldow2013-paper-08.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/reason-over/media/ldow2013-paper-08.pdf
--------------------------------------------------------------------------------
/reason-over/media/paper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/reason-over/media/paper.png
--------------------------------------------------------------------------------
/relational_as_graph/README.md:
--------------------------------------------------------------------------------

# Querying a Relational Database with a Graph Query Language


## Introduction

If you'd like to produce a graph (RDF) from a relational database then this post might be useful for you.

In this post we'll use:
- [PostgreSQL](https://www.postgresql.org/)
  - A Relational Database
- [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything)
  - A tool built atop the Apache Jena project that presents structured data as RDF


## Examples

I've loaded up an instance of Postgres with a few tables.

Let's list the tables using the command line interface to Postgres:

```bash
echo "\d" | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin
```

Produces:
```csv
Schema,Name,Type,Owner
public,freightrates,table,postgres
public,orderlist,table,postgres
public,plantports,table,postgres
public,productsperplant,table,postgres
public,vmicustomers,table,postgres
public,whcapacities,table,postgres
public,whcosts,table,postgres
```

Let's also peek at one of the tables by running a SQL select query using the command line interface:

```bash
echo "select * from whcosts limit 3" | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin
```

Produces:
```csv
WH,Cost/unit
PLANT15,1.42
PLANT17,0.43
PLANT18,2.04
```

The tables happen to be about a supply chain but the content doesn't matter for this post.

We've recently added a feature to SPARQL Anything that allows us to more easily transform these tables into RDF.
As of writing you'll need to use the branch `v0.9-DEV` to run these queries.
I'll update this post once this feature gets into the next release.

First, let's just enumerate the table names using a SPARQL query:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
SELECT *
WHERE
  { SERVICE <x-sparql-anything:>
      { fx:properties
                  fx:command      "echo \\\\d | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin" ;
                  fx:media-type   "text/csv" ;
                  fx:csv.headers  "true" .
        []        xyz:Name        ?table_name
      }
  }
```

The Postgres (psql) command to list the tables is `\d`.
Notice we had to think carefully about escaping that backslash. :)

Other than that, the command text is exactly what you'd type in the shell.
We just happen to be embedding that command text into a SPARQL query.

The SPARQL Anything engine, upon seeing a `SERVICE` clause using its protocol `x-sparql-anything:`, springs to life and runs our shell command, then it interprets the output of that command as csv (with a header row).
We just want the column called "Name" (which SPARQL Anything makes a URI for, called `xyz:Name`).

And here are the results:

|table\_name       |
|------------------|
|plantports        |
|vmicustomers      |
|whcosts           |
|orderlist         |
|productsperplant  |
|whcapacities      |
|freightrates      |

SPARQL _select_ queries produce tables, like that one.
But SPARQL _construct_ queries produce RDF (graphs).

Now let's just focus on one table and transform it, with a SPARQL construct query, into some modeled RDF.

```sparql
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX ex: <http://example.com/>
PREFIX gist: <https://ontologies.semanticarts.com/gist/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT
  {
    ?warehouse_iri ex:somePredicateGoesHere ?something .
    ?warehouse_iri a gist:Building .
    ?something gist:hasMagnitude ?mag .
    ?mag gist:hasUnitOfMeasure gist:_USDollar .
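    # (comment added for clarity) ?mag is a magnitude node pairing
    # a unit of measure (above) with a numeric value (next line)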
    ?mag gist:numericValue ?cost_double .
  }
WHERE
  { SERVICE <x-sparql-anything:>
      { BIND(concat("echo \"select * from whcosts\" | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin") AS ?cmd)
        fx:properties
                  fx:command      ?cmd ;
                  fx:media-type   "text/csv" ;
                  fx:csv.headers  "true" .
        []        xyz:costperunit ?cost ;
                  xyz:wh          ?warehouse
        BIND(IRI(concat(str(ex:), "Warehouse_", ?warehouse)) AS ?warehouse_iri)
        BIND(bnode() AS ?mag)
        BIND(bnode() AS ?something)
        BIND(xsd:double(?cost) AS ?cost_double)
      }
  }
```

Which produces this graph:
```ttl
@prefix ex:   <http://example.com/> .
@prefix fx:   <http://sparql.xyz/facade-x/ns/> .
@prefix gist: <https://ontologies.semanticarts.com/gist/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix xyz:  <http://sparql.xyz/facade-x/data/> .

ex:Warehouse_PLANT06 a gist:Building ;
    ex:somePredicateGoesHere [ gist:hasMagnitude [ gist:hasUnitOfMeasure gist:_USDollar ;
                                                   gist:numericValue "0.55"^^xsd:double
                                                 ]
                             ] .

ex:Warehouse_PLANT19 a gist:Building ;
    ex:somePredicateGoesHere [ gist:hasMagnitude [ gist:hasUnitOfMeasure gist:_USDollar ;
                                                   gist:numericValue "0.64"^^xsd:double
                                                 ]
                             ] .

ex:Warehouse_PLANT13 a gist:Building ;
    ex:somePredicateGoesHere [ gist:hasMagnitude [ gist:hasUnitOfMeasure gist:_USDollar ;
                                                   gist:numericValue "0.47"^^xsd:double
                                                 ]
                             ] .

...

```

Note that I spent only 38 seconds "modeling" the situation where each warehouse has associated with it a certain cost per unit for sending units through it.
I wouldn't model the situation that way but this post isn't about modeling.
I just want to demonstrate the technical moves here.

Also note that I hardcoded the string "select * from whcosts" but you could `BIND` variables and do something more programmatic.

For example, here is one way to iterate over all the tables and generate the resultant naive RDF:

```sparql
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX ex: <http://example.com/>
CONSTRUCT
  {
    GRAPH ?g
      { ?s ?p ?o .}
  }
WHERE
  { SERVICE <x-sparql-anything:>
      { { { SELECT *
            WHERE
              { SERVICE <x-sparql-anything:>
                  { fx:properties
                              fx:command      "echo \\\\d | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin" ;
                              fx:media-type   "text/csv" ;
                              fx:csv.headers  "true" .
                    []        xyz:Name        ?table_name
                  }
              }
          }
          BIND(concat("echo \"select * from ", ?table_name, " limit 30\" | PGPASSWORD=mysecretpassword psql -h 172.17.0.1 -p 5432 -U postgres --csv -f /dev/stdin") AS ?cmd)
          BIND(IRI(concat(str(ex:), "NamedGraph_", ?table_name)) AS ?g)
        }
        fx:properties
                  fx:command      ?cmd ;
                  fx:media-type   "text/csv" ;
                  fx:csv.headers  "true" .
        ?s ?p ?o
      }
  }
```

Notice this SPARQL query doesn't hardcode table names so it can accommodate any changes to the database of tables.

It produces this RDF (I've removed some triples for brevity):
```ttl
@prefix ex:  <http://example.com/> .
@prefix fx:  <http://sparql.xyz/facade-x/ns/> .
@prefix xyz: <http://sparql.xyz/facade-x/data/> .

ex:NamedGraph_vmicustomers {
    [ a fx:root ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1>
              [ xyz:customers  "V55555555_9" ;
                xyz:plant_code "PLANT02"
              ] ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2>
              [ xyz:customers  "V55555_10" ;
                xyz:plant_code "PLANT02"
              ] ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3>
              [ xyz:customers  "V555555555555555_18" ;
                xyz:plant_code "PLANT06"
              ]
    ] .
}

ex:NamedGraph_freightrates {
    [ a fx:root ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1>
              [ xyz:carrier      "V444_6" ;
                xyz:carrier_type "V88888888_0" ;
                xyz:dest_port_cd "PORT09" ;
                xyz:max_wgh_qty  "4.99" ;
                xyz:minimum_cost " $43.23 " ;
                xyz:minm_wgh_qty "0" ;
                xyz:mode_dsc     "AIR " ;
                xyz:orig_port_cd "PORT08" ;
                xyz:rate         " $1.83 " ;
                xyz:svc_cd       "DTD" ;
                xyz:tpt_day_cnt  "2"
              ] ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2>
              [ xyz:carrier      "V444_6" ;
                xyz:carrier_type "V88888888_0" ;
                xyz:dest_port_cd "PORT09" ;
                xyz:max_wgh_qty  "9.99" ;
                xyz:minimum_cost " $43.23 " ;
                xyz:minm_wgh_qty "5" ;
                xyz:mode_dsc     "AIR " ;
                xyz:orig_port_cd "PORT08" ;
                xyz:rate         " $1.83 " ;
                xyz:svc_cd       "DTD" ;
                xyz:tpt_day_cnt  "2"
              ] ;
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3>
              [ xyz:carrier      "V444_6" ;
                xyz:carrier_type "V88888888_0" ;
                xyz:dest_port_cd "PORT09" ;
                xyz:max_wgh_qty  "99999.99" ;
                xyz:minimum_cost " $43.23 " ;
                xyz:minm_wgh_qty "2000" ;
                xyz:mode_dsc     "AIR " ;
                xyz:orig_port_cd "PORT08" ;
                xyz:rate         " $0.64 " ;
                xyz:svc_cd       "DTD" ;
                xyz:tpt_day_cnt  "2"
              ]
    ] .
}

...

```

Notice this time we are producing quads (triples in a named graph).
We asked for one named graph per database table.

Also notice that the RDF is in what the SPARQL Anything project calls [Facade-X](https://github.com/SPARQL-Anything/sparql.anything#facade-x): a subset of RDF for representing data from diverse sources using containers and slots.

We rarely use this container-slot representation as a final form; it is almost always transformed with a SPARQL construct query like we did above.

## Final Thoughts

Although this demonstration uses PostgreSQL, there are equivalent command line invocations for Oracle, MySQL, MSSQL, etc.

This kind of "support" for relational database sources is stringy and not first class.
SPARQL Anything recognizes csv as a first class data input and we are just bridging the gap between the database and csv by using the `fx:command` property.

But I do like that for small to medium sized relational source systems we could quickly start modeling and transforming into RDF without much tooling.
We don't need much verbosity either:
SPARQL construct queries are a very efficient way (that is, not much text and few primitives) to express transformations on RDF compared to other methods in use.

Have fun turning tables into graphs!
--------------------------------------------------------------------------------
/scraping_with_sparql/README.md:
--------------------------------------------------------------------------------

# Scraping Webpages with SPARQL

## Intro

Sometimes you want some data that does not sit behind an application-friendly interface.
The friendliest API is a SPARQL endpoint.
Using [SPARQL Anything](https://github.com/SPARQL-Anything/sparql.anything) you can view many APIs as approximations of SPARQL endpoints.
You can even view webpages as approximations of SPARQL endpoints.

In this post I am going to be [scraping](https://en.wikipedia.org/wiki/Web_scraping) some data from Apache's JIRA.
I created an issue for Apache Jena a while back and I am going to pretend I want to be reminded of my activity on that issue.
Let's also pretend that Apache's JIRA doesn't have a friendlier REST API (because sometimes, even if one exists, you may not have access to use it).

[This](https://issues.apache.org/jira/secure/ViewProfile.jspa) is the URL we'll be scraping.
It is my user profile.

Things to note:

- You have to be logged into this page to see your activity stream
- To see your activity stream you need javascript to interpret the webpage
- Web Browsers (I am using Firefox) emit events as pages load
  - ["The DOMContentLoaded event fires when the initial HTML document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading."](https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event)
  - ["A different event, load, should be used only to detect a fully-loaded page."](https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event)
- Sometimes content is still loading into a webpage even after the "load" event fires so you might need to be able to wait to scrape the content you want

SPARQL Anything can handle all of that with your assistance.

I'll break the process down a little bit (assuming you've never scraped a webpage before).

## Login

We are just going to do the cheap approach: manually log in with our browser then manually get the [cookie](https://en.wikipedia.org/wiki/HTTP_cookie).

Log in with your credentials and navigate to the page you want to scrape.

---

Open your Inspector and go to the Network tab.
![](media/inspector_open.png)

---

Refresh the page, then in the Inspector scroll up to the first request and click it.
![](media/inspector_results.png)

---

In the headers tab you should see Request Headers (I've obscured my cookie value).
![](media/inspector_request_headers.png)

---

Usually Request Headers have all the information you need to send to the website in order to scrape.
Copy information into your SPARQL query as needed.
It often requires some experimentation to figure out the minimum set of headers you actually need.

## Write a hello world SPARQL (construct) query to see if it works

I've got SPARQL Anything running in its Docker container.
The SPARQL query is wrapped in a bash command:

```sparql
curl --silent 'http://localhost:3000/sparql.anything' \
--data-urlencode 'query=
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX ns: <http://sparql.xyz/facade-x/ns/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix what: <https://html.spec.whatwg.org/#>
prefix xhtml: <http://www.w3.org/1999/xhtml#>
prefix ex: <http://example.com/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
construct {?s ?p ?o}
WHERE {
    service <x-sparql-anything:> {
        fx:properties fx:location "https://issues.apache.org/jira/secure/ViewProfile.jspa" .
        fx:properties fx:media-type "text/html" .
        fx:properties fx:html.browser "firefox" .
        fx:properties fx:html.browser.screenshot "file:///app/screenie.png" .
        fx:properties fx:html.browser.wait "5" .
        fx:properties fx:http.header.User-Agent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0" .
        fx:properties fx:http.header.Cookie "atlassian.xsrf.token=BLAHBLAH..." .
        ?s ?p ?o .
    }
}'
```

Which yields triples (in turtle format):
```turtle
@prefix ex:    <http://example.com/> .
@prefix fx:    <http://sparql.xyz/facade-x/ns/> .
@prefix ns:    <http://sparql.xyz/facade-x/ns/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix what:  <https://html.spec.whatwg.org/#> .
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix xyz:   <http://sparql.xyz/facade-x/data/> .

[ rdf:type xhtml:html , ns:root ;
  rdf:_1 [ rdf:type xhtml:head ;
           rdf:_1 [ rdf:type xhtml:meta ;
                    xhtml:charset "utf-8"
                  ] ;
...
```

You'll notice in my query that I said:

The URL is:

`fx:properties fx:location "https://issues.apache.org/jira/secure/ViewProfile.jspa" .`

Expect html content:

`fx:properties fx:media-type "text/html" .`

Use a headless browser (firefox):

`fx:properties fx:html.browser "firefox" .`

Take a screenshot (in case we need to troubleshoot) and save it here:

`fx:properties fx:html.browser.screenshot "file:///app/screenie.png" .`

After the "load" event is emitted wait 5 seconds:

`fx:properties fx:html.browser.wait "5" .`

Send the following two HTTP headers (that we copied from the Inspector earlier):

```
fx:properties fx:http.header.User-Agent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0" .
fx:properties fx:http.header.Cookie "atlassian.xsrf.token=BLAHBLAH..." .
```

---

In this case, if you don't wait a few seconds your content won't be loaded:

![](media/jira_loading.png)

---

When you do wait a few seconds you get (as seen in file:///app/screenie.png):

![](media/screenie.png)

---

## Pick the HTML elements you need

Let's just pretend I need a few details of my activity.

After some exploration of the triples I arrive at the following query:

```sparql
curl --silent 'http://localhost:3000/sparql.anything' \
-H 'Accept: text/csv' \
--data-urlencode 'query=
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX ns: <http://sparql.xyz/facade-x/ns/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix what: <https://html.spec.whatwg.org/#>
prefix xhtml: <http://www.w3.org/1999/xhtml#>
prefix ex: <http://example.com/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?username ?action_string ?issue_string ?issue_label ?issue_type ?when_string
WHERE {
    service <x-sparql-anything:> {
        fx:properties fx:location "https://issues.apache.org/jira/secure/ViewProfile.jspa" .
        fx:properties fx:media-type "text/html" .
        fx:properties fx:html.browser "firefox" .
        fx:properties fx:html.browser.screenshot "file:///app/screenie.png" .
        fx:properties fx:html.browser.wait "5" .
        fx:properties fx:http.header.User-Agent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0" .
        fx:properties fx:http.header.Cookie "atlassian.xsrf.token=BLAH" .
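        # (comments added for clarity) the nested pattern below mirrors the HTML
        # container/slot nesting; the ?slot* variables bind rdf:_N slot predicates
        # so the fx:next() filters can require the matched elements be adjacent siblings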
        [ ?slotA [ ?slot1 [ xhtml:class "activity-item-user activity-item-author" ;
                            what:innerText ?username ] ;
                   ?slot2 ?action_string ;
                   ?slot3 [ rdf:_1 [ rdf:_1 ?issue_string ] ;
                            what:innerText ?issue_label ] ] ;
          ?slotB [ rdf:_1 [ xhtml:alt ?issue_type ] ;
                   rdf:_2 [ rdf:_1 ?when_string ] ] ]
        filter(fx:next(?slot1) = ?slot2)
        filter(fx:next(?slot2) = ?slot3)
        filter(fx:next(?slotA) = ?slotB)
    }
}'
```

Which yields:

|username  |action\_string              |issue\_string |issue\_label        |issue\_type |when\_string     |
|----------|----------------------------|--------------|--------------------|------------|-----------------|
|Justin    |created                     |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 00:49  |
|Justin    |updated the Description of  |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 23:33  |
|Justin    |updated the Description of  |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 00:53  |
|Justin    |updated the Description of  |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 00:52  |
|Justin    |attached 2 files to         |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 00:54  |
|Justin    |commented on                |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |05/Oct/21 13:14  |
|Justin    |commented on                |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |07/Oct/21 12:48  |
|Justin    |commented on                |JENA\-2176    |JENA\-2176 \- TDB2 queries can execute quadpatterns with a literal in the subject position JENA\-2176|Question    |07/Oct/21 12:50  |


Also note that I requested `text/csv` but you can request the data in a different format:

`text/tab-separated-values`

`application/sparql-results+xml`

`application/sparql-results+json`


## Comments on writing the query

To write a query against triplified HTML it is pretty much essential to have the raw triples (from the hello world query above) open in an editor for reference.

I use folding (in my editor) to focus on the nodes of interest.

This part of the query (triple patterns and filters) does all the extraction:
```
[ ?slotA [ ?slot1 [ xhtml:class "activity-item-user activity-item-author" ;
                    what:innerText ?username ] ;
           ?slot2 ?action_string ;
           ?slot3 [ rdf:_1 [ rdf:_1 ?issue_string ] ;
                    what:innerText ?issue_label ] ] ;
  ?slotB [ rdf:_1 [ xhtml:alt ?issue_type ] ;
           rdf:_2 [ rdf:_1 ?when_string ] ] ]
filter(fx:next(?slot1) = ?slot2)
filter(fx:next(?slot2) = ?slot3)
filter(fx:next(?slotA) = ?slotB)
```

It is vanilla SPARQL but with a function that is defined in SPARQL Anything: `fx:next()`, which is described [here](https://github.com/SPARQL-Anything/sparql.anything#magic-properties-and-functions).

I find that the part that takes the most time is making the extraction triple patterns.
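
When I'm still exploring, a quick throwaway query to survey which CSS classes appear in the triplified page helps me find anchor points for those patterns. A minimal sketch along those lines (same SPARQL Anything setup as above; I've omitted the browser and header properties for brevity, though this particular page would need them):

```sparql
curl --silent 'http://localhost:3000/sparql.anything' \
-H 'Accept: text/csv' \
--data-urlencode 'query=
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
prefix xhtml: <http://www.w3.org/1999/xhtml#>
select ?class (count(*) as ?n)
WHERE {
    service <x-sparql-anything:> {
        fx:properties fx:location "https://issues.apache.org/jira/secure/ViewProfile.jspa" .
        fx:properties fx:media-type "text/html" .
        # count how many elements carry each class attribute value
        [] xhtml:class ?class .
    }
}
group by ?class
order by desc(?n)'
```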
One thing that I think would help a lot would be a plugin for my text editor to show the path from the root node to the node my cursor is currently in.



## Closing

Why would you want to use SPARQL to scrape a webpage?

Of course you can scrape webpages (that render content with javascript) with general purpose languages using libraries like [Puppeteer](https://github.com/puppeteer/puppeteer/) or [Playwright](https://github.com/microsoft/playwright-java) (which is what SPARQL Anything uses under the hood).

What I like about this approach is that SPARQL is the only language that everyone on my team knows very well.
We already implement non-RDF to RDF transformations using SPARQL constructs and deliver answers to questions using SPARQL.

Also I think it would be easier to teach a non-programmer to scrape using this method.
Plus it would greatly benefit many non-programmers to learn some SPARQL and start querying [Wikidata](https://www.wikidata.org).
I recommend [this tutorial](https://www.youtube.com/watch?v=kJph4q0Im98) on SPARQL querying Wikidata.

One thing I did not demonstrate in this post is the ability to, in this single SPARQL query, integrate with:
- SPARQL endpoints
- most REST APIs
  - including other webpages
- files in your filesystem

So you could scrape from this page, iterate over referenced webpages (scraping them), bind some strings from the referenced webpages, then do a lookup of those strings using a REST API, then do a final lookup using a SPARQL endpoint.
If you are interested in such a thing I have a [blog post](/blend_google_sheet_with_wikidata) on using SPARQL Anything to blend a Google Sheet with Wikidata.

In general, I think using SPARQL/RDF encourages you to lay your data down such that it wears its meaning on its sleeve (because you can't as [easily](/SPARQL_value_functions) specify arbitrary processes in SPARQL).
By "wears its meaning on its sleeve" I mean: data that doesn't require each query to express an unpacking process, and data that uses a common vocabulary/ontology across domains.

Example of unpacking (see the P.S. below for a sketch):

If you store a range like "32-45" then each query will need to apply some regex or some function to enumerate the integers in the range if it needs to match a single integer.

Example of using a common vocabulary/ontology:

If you have a relational database with the tables "Customer" and "Supplier" and each has a column "name" (or a reference to a column that eventually leads to a column called "name") those have the same meaning (casual name) but you can't use the same predicate to obtain the names.
If you have to write a query that uses "Customer.name" and "Supplier.name" curiosity won't lead you to write a query that uses "Customer.name," "Supplier.name," "TruckDriver.name," "Mechanic.name," "Administrator.name," etc. but curiosity will lead you to a query like "?s [gist:name](https://github.com/semanticarts/gist/blob/develop/gistCore.ttl#L3757) ?name" that will look for any subject that has a casual name.

Ok, have fun scraping with SPARQL!
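
P.S. To make the "unpacking" example concrete, here is the kind of boilerplate every query ends up repeating when a range is stored as an opaque string like "32-45" (standard SPARQL 1.1 string functions; `?range` is a hypothetical variable bound to such a string):

```sparql
# unpack "32-45" so we can test whether 40 falls inside it
BIND(xsd:integer(STRBEFORE(?range, "-")) AS ?low)
BIND(xsd:integer(STRAFTER(?range, "-")) AS ?high)
FILTER(?low <= 40 && 40 <= ?high)
```

If the data stored the two integers directly, none of that would be needed.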
280 | 281 | -------------------------------------------------------------------------------- /scraping_with_sparql/media/inspector_open.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/scraping_with_sparql/media/inspector_open.png -------------------------------------------------------------------------------- /scraping_with_sparql/media/inspector_request_headers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/scraping_with_sparql/media/inspector_request_headers.png -------------------------------------------------------------------------------- /scraping_with_sparql/media/inspector_results.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/scraping_with_sparql/media/inspector_results.png -------------------------------------------------------------------------------- /scraping_with_sparql/media/jira_loading.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/scraping_with_sparql/media/jira_loading.png -------------------------------------------------------------------------------- /scraping_with_sparql/media/screenie.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/scraping_with_sparql/media/screenie.png -------------------------------------------------------------------------------- /semantic_messages/README.md: -------------------------------------------------------------------------------- 1 | # Interfaces and Interactions 2 | 3 | ## Too Much Specificity And Not Enough Play 4 | 5 | I recently saw [this](https://twitter.com/satnam6502/status/1586398234326446080?s=20&t=mWqKsR_2LdG5WeHE37y5Ow) tweet and it reminded me about something I've wanted to think and talk about. 6 | 7 | ![book](media/tweet.png) 8 | 9 | Satnam continues 10 | 11 | > configuration management has not had the attention enjoyed by academic research for languages and networking, as well as language and networking innovations in industry. 12 | 13 | > I don’t think a “configuration language” is the solution, nor is a domain specific language / library (DSL). 14 | 15 | I tend to agree. 16 | I think perhaps we should explore more loosey-goosey, declarative approaches. 17 | That is, I'd like to explore systems with more play (as in "scope or freedom to act or operate"). 18 | 19 | I'd like to see more semantic messages that convey the spirit rather than the letter. 20 | When you can't foresee all the consequences of the letter then that's when the spirit can help. 21 | 22 | That's what I'd like to think about in this post. 23 | 24 | Let's see an example of such a loosey-goosey semantic message. 25 | 26 | ## Semantic Messages 27 | 28 | I'm writing another blog post on what "semantic" means in the semantic web. 29 | I'll put a link here once I am done but in the mean time think of "semantic" as getting different things (different people, different machines, people and machines, etc.) to see eye to eye. 30 | Yes, a tall order, but I'm optimistic about it. 
31 | 
32 | The hypothetical situation is that I have an instance of Apache Jena Fuseki (a database for RDF) running on my local machine.
33 | There is a software agent (semantic web style) running on my local machine that knows how to interact with Apache Jena Fuseki.
34 | I am running a software agent (semantic web style) to whom I make requests.
35 | 
36 | I have a file on my machine that I want to load into a dataset on the Apache Jena Fuseki instance.
37 | I type this request to my agent "load /mnt/toys/gifts.ttl into Apache Jena Fuseki listening on port 3030 at dataset 'gifts' on 25 Dec early in the morning."
38 | 
39 | My agent produces the following RDF (or I do by some other means) in TriG serialization:
40 | 
41 | ```turtle
42 | @prefix : <http://example.com/> .
43 | @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
44 | @prefix owl: <http://www.w3.org/2002/07/owl#> .
45 | @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
46 | @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
47 | @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
48 | @prefix xml: <http://www.w3.org/XML/1998/namespace> .
49 | @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
50 | @prefix schema: <http://schema.org/> .
51 | 
52 | :message0 a gist:Message ;
53 |     gist:comesFromAgent [ gist:name "Justin Dowdy" ;
54 |                           gist:hasAddress [ gist:containedText "justin2004@hotmail.com" ] ] ;
55 |     gist:isAbout :message0content .
56 | 
57 | :message0content a gist:Content, :NamedGraph ;
58 |     rdfs:comment "this named graph is the content of the message" .
59 | 
60 | :message0content {
61 |     :message0content gist:hasGoal :goal0 .
62 |     :goal0 a gist:Goal ;
63 |         rdfs:comment "this is the goal specified in the content of the message" ;
64 |         gist:isAbout :goal0content .
65 | }
66 | 
67 | :goal0content a gist:Content , :NamedGraph ;
68 |     rdfs:comment "this named graph is the content of the goal" .
69 | 
70 | :goal0content {
71 |     [ a gist:Event ;
72 |       gist:produces [ a gist:Content ;
73 |           gist:isBasedOn [ a gist:FormattedContent ;
74 |               gist:hasAddress [ gist:containedText "file:///mnt/toys/gifts.ttl" ] ;
75 |               gist:isExpressedIn [ a gist:MediaType ;
76 |                   schema:encodingFormat "text/turtle" ] ] ;
77 |           gist:isPartOf [ a gist:Content ;
78 |               gist:name "gifts" ;
79 |               rdfs:comment 'the dataset called "gifts"' ;
80 |               gist:isPartOf [ a gist:System ;
81 |                   gist:hasAddress [ gist:containedText "http://127.0.0.1:3030" ] ;
82 |                   gist:name "Apache Jena Fuseki" ] ] ] ;
83 |       gist:plannedStartDateTime "2022-12-25T01:00:00Z"^^xsd:dateTime ]
84 | }
85 | ```
86 | 
87 | ### Side Note
88 | 
89 | You might notice that I've used the URI of an RDF named graph in the place where a resource would typically be expected.
90 | With this blog post I am also thinking about using named graphs to represent the content of goals (`gist:Goal`).
91 | Really a named graph could represent the content of many different types of things.
92 | 
93 | 
94 | 
95 | 
96 | 
97 | ## Back to the Semantic Message Example
98 | 
99 | My agent then puts that RDF onto the semantic message bus (the bus where agents listen for and send RDF) on my local machine.
100 | The agent that governs Apache Jena Fuseki sees the RDF and recognizes that it knows how to handle the request.
101 | 
102 | The Fuseki agent that interprets that RDF needs to know some things.
103 | 
104 | The Fuseki agent needs to know things like:
105 | - that it is capable of and allowed to handle requests to load data into the Apache Jena Fuseki running on localhost at port 3030
106 | - how to use [GSP](https://www.w3.org/TR/sparql11-http-rdf-update/) or some other programmatic method to load data into Fuseki
107 | - how to reference a dataset or optionally create one if the desired one does not exist
108 | - how to delay the execution of this (since the `gist:plannedStartDateTime` is in the future)
109 | 
110 | My agent needs to know things like:
111 | - it is allowed to make assumptions
112 |   - e.g. if I leave off the year when I reference a date in a goal, then I probably mean the next year in which that date occurs
113 | - it can look in my existing graphs (perhaps my "personal knowledge graphs") to gather information
114 | 
115 | Fuseki's agent can't be too finicky about interpreting the RDF.
116 | The RDF isn't really a request conforming to a contract; it is more the spirit of a request.
117 | 
118 | If you are familiar with RDF and [gist](https://github.com/semanticarts/gist), the spirit of the RDF is pretty clear: "early in the morning on December 25th find the file /mnt/toys/gifts.ttl and load it into the dataset 'gifts' on the Apache Jena Fuseki server running on localhost at port 3030."
119 | 
120 | If the agent saw this message (or a similar one) but knew the message content wasn't sufficient for it to do anything, then it would reply, by putting RDF onto the semantic message bus, with the content of another goal as if to say "did you mean this?"
121 | There could be a back-and-forth between my agent and the agent governing Apache Jena Fuseki as my agent figures out how to schedule the ingestion of that data.
122 | 
123 | But this time Fuseki's agent knew what to do.
124 | It runs the following command:
125 | 
126 | ```bash
127 | at 1am dec 25 <<~
128 | curl -X POST 'http://127.0.0.1:3030/gifts/data' -H 'Content-type: text/turtle' --data-binary @/mnt/toys/gifts.ttl
129 | ~
130 | ```
131 | 
132 | My agent receives some confirmation RDF and the gifts should be available, via SPARQL, before the kids wake up on Christmas morning.
133 | 
134 | ## The Article
135 | 
136 | In this post I'm mostly sketching out some of the consequences of the ideas presented in [this](https://www-sop.inria.fr/acacia/cours/essi2006/Scientific%20American_%20Feature%20Article_%20The%20Semantic%20Web_%20May%202001.pdf) 2001 Scientific American article.
137 | 
138 | > Standardization can only go so far, because we can't anticipate all possible future needs.
139 | 
140 | Right on.
141 | 
142 | > The Semantic Web, in contrast, is more flexible. The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion.
143 | 
144 | I'm less optimistic that we'll sort out useful ontology exchange anytime soon.
145 | In the mean time I think picking a single upper ontology that is squishy in the right ways is a path forward.
146 | 
147 | > Semantics also makes it easier to take advantage of a service that only partially matches a request.
148 | 
149 | I think for semantics to work in this way we have to accept that our systems will get more adaptive at the cost of sometimes being wrong -- that is, they will become less brittle in the following sense:
150 | 
151 | Brittle:
152 | - by design, shouldn't ever be wrong
153 | - when it sees something unexpected it stops or breaks
154 | 
155 | Adaptive:
156 | - by design, could be wrong
157 | - when it sees something unexpected it tries to figure it out
158 | 
159 | That might be hard for people to accept.
160 | Perhaps that is why we haven't progressed much on these kinds of agents since the 2001 article.
161 | 
162 | 
163 | ## Closing
164 | 
165 | I haven't sketched everything out.
166 | For example, what if the command fails on 25 Dec because the file is missing?
167 | I'd expect the Fuseki agent to tell my agent.
168 | Also maybe my agent could periodically check that the file is accessible and report back to me if it isn't.
169 | 
170 | Anyway, I imagine you get the idea.
171 | 
172 | I do think a requirement of semantic message buses is that all agents must have the same world view and speak the same language.
173 | Ontologies set the world view and language.
174 | I used the [gist upper ontology](https://github.com/semanticarts/gist) for my example.
175 | 
176 | Maybe make an agent!
177 | Or let me know what you think about this stuff.
178 | 
--------------------------------------------------------------------------------
/semantic_messages/media/tweet.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/semantic_messages/media/tweet.png
--------------------------------------------------------------------------------
/semantic_messages/mo.trig:
--------------------------------------------------------------------------------
1 | @prefix : <http://example.com/> .
2 | @prefix gist: <https://w3id.org/semanticarts/ns/ontology/gist/> .
3 | @prefix owl: <http://www.w3.org/2002/07/owl#> .
4 | @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
5 | @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
6 | @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
7 | @prefix xml: <http://www.w3.org/XML/1998/namespace> .
8 | @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
9 | @prefix schema: <http://schema.org/> .
10 | 
11 | :message0 a gist:Message ;
12 |     gist:comesFromAgent [ gist:name "Justin Dowdy" ;
13 |                           gist:hasAddress [ gist:containedText "justin2004@hotmail.com" ] ] ;
14 |     gist:isAbout :message0content .
15 | 
16 | :message0content a gist:Content, :NamedGraph ;
17 |     rdfs:comment "this named graph is the content of the message" .
18 | 
19 | :message0content {
20 |     :message0content gist:hasGoal :goal0 .
21 |     :goal0 a gist:Goal ;
22 |         rdfs:comment "this is the goal specified in the content of the message" ;
23 |         gist:isAbout :goal0content .
24 | }
25 | 
26 | :goal0content a gist:Content , :NamedGraph ;
27 |     rdfs:comment "this named graph is the content of the goal" .
28 | 
29 | :goal0content {
30 |     [ a gist:Event ;
31 |       gist:produces [ a gist:Content ;
32 |           gist:isBasedOn [ a gist:FormattedContent ;
33 |               gist:hasAddress [ gist:containedText "file:///mnt/toys/gifts.ttl" ] ;
34 |               gist:isExpressedIn [ a gist:MediaType ;
35 |                   schema:encodingFormat "text/turtle" ] ] ;
36 |           gist:isPartOf [ a gist:Content ;
37 |               gist:name "gifts" ;
38 |               rdfs:comment 'the dataset called "gifts"' ;
39 |               gist:isPartOf [ a gist:System ;
40 |                   gist:hasAddress [ gist:containedText "http://127.0.0.1:3030" ] ;
41 |                   gist:name "Apache Jena Fuseki" ] ] ] ;
42 |       gist:plannedStartDateTime "2022-12-25T01:00:00Z"^^xsd:dateTime ]
43 | }
44 | 
--------------------------------------------------------------------------------
/software_in_rdf/readme.md:
--------------------------------------------------------------------------------
1 | # thinking about representing software usage as a sequence of processes in RDF
2 | 
3 | 
4 | My family took a car trip recently and we wanted our son to be able to watch some episodes of a show he likes along the way.
5 | We have an iPad 2 and I found some Daniel Tiger episodes on YouTube.
6 | I just needed to find some software to traverse the "path" between the YouTube video and the playback of video on the iPad.
7 | 
8 | This time I did that traversal manually but wouldn't it have been neat if I could have done something like:
9 | 
10 | ```
11 | prefix obi: <http://purl.obolibrary.org/obo/obi#>
12 | prefix iao: <http://purl.obolibrary.org/obo/iao#>
13 | prefix ro: <http://purl.obolibrary.org/obo/ro#>
14 | prefix pretend: <http://example.com/pretend#>
15 | 
16 | select ?consumer where {
17 |     ?input (iao:is_specified_input_of/iao:has_specified_output_of)+ ?output .
18 |     ?output iao:is_specified_input_of ?consumer .
19 |     ?consumer a obi:display_function .
20 |     ?consumer ro:located_in pretend:iPad2 .
21 |     filter(?input = <https://www.youtube.com/watch?v=...>) .
22 | }
23 | ```
24 | 
25 | Which is a [SPARQL](https://en.wikipedia.org/wiki/SPARQL) query.
26 | 
27 | 
28 | `(iao:is_specified_input_of/iao:has_specified_output_of)+`
29 | 
30 | is shorthand for something like a chain of processes:
31 | 
32 | ```
33 | ?input0 iao:is_specified_input_of ?process0 .
34 | ?process0 iao:has_specified_output_of ?output0 .
35 | 
36 | ?output0 iao:is_specified_input_of ?process1 .
37 | ?process1 iao:has_specified_output_of ?output1 .
38 | ```
39 | 
40 | etc.
41 | 
42 | 
43 | 
44 | I started to make an ontology that could represent the steps involved.
45 | - using software to:
46 |   - download files from the web
47 |   - convert between video container files and transcode
48 |   - transfer files between locally networked devices
49 |   - play audio/video files on an iPad2
50 | 
51 | but it seems like there are some existing ontologies that will work.
52 | Those include:
53 | 
54 | - Ontology for Biomedical Investigations
55 | - Information Artifact Ontology
56 | - Relation Ontology
57 | - Software Ontology
58 | 
59 | All of which can be found at http://www.obofoundry.org/
60 | 
61 | 
62 | 
63 | 
64 | ### possibility
65 | If we were to encode all program (software) capabilities into RDF then SPARQL queries could tell you if something you want to do with software is possible.
66 | 
67 | 
68 | ### program selection
69 | If you modified the query to output all the processes along the way then you could see the sequence of programs you would need to carry out your desired task. I don't think SPARQL has paths as a first class thing yet but Stardog has an extension for that.
70 | 
71 | Perhaps my SPARQL query would have returned something like:
72 | 
73 | `youtube-dl, ffmpeg, python -m http.server, FileBrowser (Apple App Store)`
74 | 
75 | 
76 | ### program configuration
77 | If each transformation function that software could perform (swo:is_executed_in) came with annotations containing the command line arguments that make the program perform that function, then you'd have some great hints about what to type (at a command shell).
78 | 
79 | 
80 | ### program generation
81 | And if the Software Ontology had Classes for command line arguments we would be getting closer to SPARQL queries that could output shell scripts.
82 | 
83 | ---
84 | 
85 | 
86 | Note that I've taken some shortcuts for readability.
87 | 
88 | e.g. ro:located_in is really <http://purl.obolibrary.org/obo/RO_0001025>
89 | 
90 | 
91 | 
92 | 
93 | 
94 | 
--------------------------------------------------------------------------------
/software_in_rdf/ytdl.ttl:
--------------------------------------------------------------------------------
1 | # scratch pad for thinking about the functions of the program "youtube-dl"
2 | 
3 | ytdl a obi:software_script ;
4 |     ro:has_part ytdl-as ;
5 |     ro:has_part ytdl-os .
6 | 
7 | 
8 | # action spec
9 | ytdl-as a iao:action_specification ;
10 |     obi:is_about [ a iao:action_specification ;
11 |         a bfo:realizable_entity ;
12 |         a bfo:specifically_dep_continuant ;
13 |         rdfs:comment "downloading remote content" ] .
14 | 
15 | ytdl-os a iao:objective_specification ;
16 |     obi:is_about [ a bfo:realizable_entity ;
17 |         rdfs:comment "downloading remote content?" ] .
18 | 
19 | 
20 | # pretending downloading is a transformation
21 | ytdl-run a obi:data_transformation ;
22 |     a obi:planned_process .
23 | 
24 | 
25 | # swo has swo:REST_service which ytdl uses to output a mp4 file
26 | 
27 | # are params input to ytdl or configuration?
28 | 
29 | # swo needs a copy class that is a subclass of obi:data_transformation
30 | # moving from one media (location) to another
31 | # actually swo:data_storage might work
32 | 
33 | # TODO there is swo:has_specified_data_input why is it not a subclass of obi:has_specified_input?
34 | 
35 | ytdl-run obi:has_specified_input [ a swo:REST_service ] .
36 | 
37 | ytdl-run obi:has_specified_output [ a swo:data_format_specification ;
38 |     a swo:mp4 ; # TODO add to swo
39 |     rdfs:comment "m4a or webm av container" ] .
--------------------------------------------------------------------------------
/sparql-gotcha/README.md:
--------------------------------------------------------------------------------
1 | # SPARQL Gotcha
2 | 
3 | 
4 | Using [Wikidata](https://www.wikidata.org) let's look for a connection between President Obama (Q76) and Paul Simon (Q4028).
5 | 
6 | It is simple -- should be very easy, right?
7 | 
8 | 
9 | ## Are they directly connected?
10 | 
11 | It would be ideal, for an exploratory query session, if we could just run this query:
12 | 
13 | ```
14 | select * {
15 | ?s ?p ?o .
16 | filter(?s=wd:Q76 ).
17 | filter(?o=wd:Q4028).
18 | }
19 | ```
20 | [1]
21 | 
22 | But in RDF edges are directed so that query only looks for triples whose subject is President Obama and whose object is Paul Simon.
23 | It would not find triples whose subject is Paul Simon and whose object is President Obama.
24 | (There is a way to make this ideal query do what we'd like it to do but I'll describe that at the end of this post.)
25 | 
26 | In order to not care about the direction of the edge we'd need to run this:
27 | ```
28 | select * {
29 | {?s ?p ?o .}
30 | union
31 | {?o ?p ?s .}
32 | filter(?s=wd:Q76 ).
33 | filter(?o=wd:Q4028).
34 | }
35 | ```
36 | [2]
37 | 
38 | 
39 | That query looks for these 2 triple patterns:
40 | ```
41 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
42 | ' President Obama ' --> ' some predicate ' --> '   Paul Simon    '
43 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
44 | 
45 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
46 | '   Paul Simon    ' --> ' some predicate ' --> ' President Obama '
47 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
48 | ```
49 | 
50 | 
51 | There are no results.
52 | 
53 | 
54 | 
55 | ## Are they indirectly connected?
56 | #### Do they have one node between them?
57 | 
58 | ```
59 | select * {
60 | ?s ?p ?o .
61 | ?o ?p1 ?o1 .
62 | filter(?s=wd:Q76 ).
63 | filter(?o1=wd:Q4028).
64 | }
65 | ```
66 | 
67 | No results -- but notice that the query only looks for *these* two back-to-back triple patterns:
68 | ```
69 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - +     + - - - - - - - -+     + - - - - - -+
70 | ' President Obama ' --> ' some predicate ' --> ' some object ' --> ' some predicate ' --> ' Paul Simon '
71 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - +     + - - - - - - - -+     + - - - - - -+
72 | ```
73 | 
74 | 
75 | That query will not find *these* two back-to-back triple patterns:
76 | ```
77 | + - - - - - -+     + - - - - - - - -+     + - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
78 | ' Paul Simon ' --> ' some predicate ' --> ' some object ' --> ' some predicate ' --> ' President Obama '
79 | + - - - - - -+     + - - - - - - - -+     + - - - - - - +     + - - - - - - - -+     + - - - - - - - - +
80 | ```
81 | 
82 | We could update the filter to allow ?s and ?o1 to be Q76 or Q4028 (but not the same) but we'd still be missing a possible connection.
83 | 
84 | That possible connection we wouldn't find is this:
85 | ```
86 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - +
87 | ' President Obama ' --> ' some predicate ' --> ' some object '
88 | + - - - - - - - - +     + - - - - - - - -+     + - - - - - - +
89 |                                                      ^
90 |                                                      |
91 |                                                      |
92 | + - - - - - - - - +     + - - - - - - - -+           |
93 | '   Paul Simon    ' --> ' some predicate ' ----------+
94 | + - - - - - - - - +     + - - - - - - - -+
95 | ```
96 | [3]
97 | 
98 | 
99 | And it is that very connection that is in the graph.
100 | Next let's see how we could find a connection, in general, without worrying about the specific sequence of directed edges.
101 | 
102 | 
103 | 
104 | 
105 | Do President Obama and Paul Simon have one node between them (trying all possible directed edge orders)?
106 | ```
107 | select * {
108 | ?s ((<>|!<>)|^(<>|!<>))/((<>|!<>)|^(<>|!<>)) ?o .
109 | filter(?s=wd:Q76 ).
110 | filter(?o=wd:Q4028).
111 | }
112 | ```
113 | 
114 | Yes they do!
115 | ``` 116 | s | o ‖ 117 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 118 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 119 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 120 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 121 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 122 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 123 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 124 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 125 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q4028 ‖ 126 | ``` 127 | There are 9 paths between them. 128 | That query makes use of [property paths](https://www.w3.org/TR/sparql11-property-paths/#path-language). 129 | But it does not show what the paths are. 130 | 131 | 132 | ## What are the paths between them? 133 | 134 | 135 | This query will at least show you what the node in the middle is: 136 | ``` 137 | select * { 138 | ?s ((<>|!<>)|^(<>|!<>)) ?node . 139 | ?node ((<>|!<>)|^(<>|!<>)) ?o . 140 | filter(?s=wd:Q76 ). 141 | filter(?o=wd:Q4028). 142 | } 143 | ``` 144 | 145 | Results: 146 | ``` 147 | s | node | o ‖ 148 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q5 | http://www.wikidata.org/entity/Q4028 ‖ 149 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q30 | http://www.wikidata.org/entity/Q4028 ‖ 150 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q6581097 | http://www.wikidata.org/entity/Q4028 ‖ 151 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/L485 | http://www.wikidata.org/entity/Q4028 ‖ 152 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q1860 | http://www.wikidata.org/entity/Q4028 ‖ 153 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q1860 | http://www.wikidata.org/entity/Q4028 ‖ 154 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q1860 | http://www.wikidata.org/entity/Q4028 ‖ 155 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q463303 | http://www.wikidata.org/entity/Q4028 ‖ 156 | http://www.wikidata.org/entity/Q76 | http://www.wikidata.org/entity/Q67311526 | http://www.wikidata.org/entity/Q4028 ‖ 157 | ``` 158 | 159 | This is an improvement. For example wd:Q5 is "human." So it looks like one way President Obama and Paul Simon are connected is by the fact that they are both human. 160 | 161 | 162 | 163 | If you want to see what the predicates are in those paths you have to add some additional patterns. 164 | And after some exploration you arrive at the right sequence of patterns: 165 | 166 | ``` 167 | select ?sLabel ?pEntityLabel ?oLabel ?p1EntityLabel ?s1Label { 168 | ?s ?p ?o . 169 | filter(?s = wd:Q76 ) . 170 | ?s1 ?p1 ?o . 171 | filter (isUri(?o)) . 172 | filter (?s1=wd:Q4028) . 173 | optional {?p ^(wikibase:claim|wikibase:directClaim|wikibase:statementProperty) ?pEntity} . 174 | optional {?p1 ^(wikibase:claim|wikibase:directClaim|wikibase:statementProperty) ?p1Entity} . 175 | SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } . 
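# (note: the wikibase:label SERVICE is a Wikidata Query Service extension; for each variable ?x in the results it binds a corresponding ?xLabel)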
176 | }
177 | ```
178 | 
179 | We can see that Paul Simon and President Obama share some common attributes:
180 | ```
181 | ||sLabel ||pEntityLabel ||oLabel ||p1EntityLabel ||s1Label |
182 | |Barack Obama |member of |American Academy of Arts and Sciences |member of |Paul Simon |
183 | |Barack Obama |described by source |Obalky knih.cz |described by source |Paul Simon |
184 | |Barack Obama |languages spoken, written or signed |English |languages spoken, written or signed |Paul Simon |
185 | |Barack Obama |writing language |English |languages spoken, written or signed |Paul Simon |
186 | |Barack Obama |native language |English |languages spoken, written or signed |Paul Simon |
187 | |Barack Obama |personal pronoun |L485 |personal pronoun |Paul Simon |
188 | |Barack Obama |sex or gender |male |sex or gender |Paul Simon |
189 | |Barack Obama |instance of |human |instance of |Paul Simon |
190 | |Barack Obama |country of citizenship |United States of America |country of citizenship |Paul Simon |
191 | ```
192 | 
193 | 
194 | ## Closing thoughts
195 | 
196 | Since SPARQL does not have first class support for paths you do have to put more effort into the query than you would with [Cypher](https://neo4j.com/developer/cypher/), which supports paths as a first class thing.
197 | 
198 | Also, because SPARQL doesn't let you bind variables on a "backwards edge" like Cypher does, in SPARQL you have to resort to using a property path like `((<>|!<>)|^(<>|!<>))`. And that property path doesn't let you bind the matching edge (forward or backward) so it is only good for finding *if* there is a connection -- not for finding *what* the connection is.
199 | 
200 | I do know that [Stardog](https://www.stardog.com/blog/a-path-of-our-own/) has an extension to SPARQL that treats paths as first class but I haven't tried it yet.
201 | 
202 | 
203 | I still prefer the simplicity of the RDF model and its powerful query language (SPARQL) over LPG models (like Neo4j). SPARQL's inability to bind a variable on a "backwards edge" and its lack of first class support for paths only make it harder to use for exploratory querying. If you embed SPARQL in some application code you will already know what your needs are and you'll have time to make the appropriate triple and graph patterns. If you need to find paths you can do some [iterative deepening](https://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search).
204 | 
205 | 
206 | ## Quick final note
207 | #### Making the "ideal query" work
208 | 
209 | At the beginning I said that this was an ideal query for looking for a direct connection between President Obama and Paul Simon.
210 | 
211 | ```
212 | select * {
213 | ?s ?p ?o .
214 | filter(?s=wd:Q76 ).
215 | filter(?o=wd:Q4028).
216 | }
217 | ```
218 | I think it is ideal for the query writer because it can be expressed/typed quickly.
219 | 
220 | In order to allow this query to work as we desire, in general, we'd need to mandate that all predicates have their inverse represented. For example, if we have a triple that corresponds to this statement:
221 | 
222 | `Paul Simon saw President Obama`
223 | 
224 | Upon seeing that triple we'd need to derive this triple:
225 | 
226 | `President Obama was seen by Paul Simon`
227 | 
228 | With a [reasoner](https://en.wikipedia.org/wiki/Semantic_reasoner) it is easy to set up the conditions for this to happen.
229 | You just need reasoning enabled and these additional triples in your data:
230 | ```
231 | :wasSeenBy a owl:ObjectProperty .
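# the owl:inverseOf declaration just below is what lets the reasoner derive a :wasSeenBy triple from every :saw triple (and vice versa)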
232 | 
233 | :saw a owl:ObjectProperty ;
234 |     owl:inverseOf :wasSeenBy .
235 | ```
236 | 
237 | But we don't just care about `:saw` and `:wasSeenBy` so we'd need to do this for every predicate (object property, specifically) in all the ontologies our data is using. That might be a pain to do by hand but I think we could write some SPARQL to do this pretty easily (see the sketch after the footnotes below).
238 | 
239 | Maybe the real hurdle is that using an OWL reasoner on lots of triples is tricky business.
240 | 
241 | And if we pretend that is now easy and fast *then* we'll likely have the need or desire to reason on data in someone else's graph using an ontology we bring (perhaps with all inverse predicates defined). I am working on something now to hopefully make that easier. Stay tuned!
242 | 
243 | 
244 | 
245 | ---
246 | [1] There are some variations on this simple query. Two more I can think of right now are 1) using [VALUES](https://www.w3.org/TR/sparql11-query/#inline-data) and 2) using a literal instead of a variable (which you filter on).
247 | 
248 | [2] In order to run this query yourself you can use this bash command:
249 | 
250 | ```
251 | curl --silent -H 'Accept: text/csv' 'https://query.wikidata.org/sparql' \
252 | --data-urlencode 'query=
253 | PREFIX wikibase: <http://wikiba.se/ontology#>
254 | PREFIX wd: <http://www.wikidata.org/entity/>         # Wikibase entity - item or property.
255 | PREFIX wdt: <http://www.wikidata.org/prop/direct/>   # Truthy assertions about the data, links entity to value directly.
256 | PREFIX p: <http://www.wikidata.org/prop/>            # Links entity to statement
257 | PREFIX ps: <http://www.wikidata.org/prop/statement/> # Links value to statement
258 | PREFIX pq: <http://www.wikidata.org/prop/qualifier/> # Links qualifier to statement node
259 | PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
260 | PREFIX bd: <http://www.bigdata.com/rdf#>
261 | prefix sch: <http://schema.org/>
262 | select * {
263 | {?s ?p ?o .}
264 | union
265 | {?o ?p ?s .}
266 | filter(?s=wd:Q76 ).
267 | filter(?o=wd:Q4028).
268 | }
269 | limit 300'
270 | ```
271 | 
272 | 
273 | [3] If you want to make ASCII graphs like this you can use [graph-easy-box](https://github.com/justin2004/graph-easy-box).
274 | See `some.dg` in this directory.
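
P.S. Here is the kind of SPARQL I mean for generating the inverse declarations. This is only a sketch: it mints each inverse property IRI with a made-up naming convention (appending "_inverse"), and it assumes your ontologies are loaded where the update runs:

```
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# for every object property with no inverse declared (in either direction),
# mint a new property IRI and declare it as the inverse
INSERT {
  ?inv a owl:ObjectProperty ;
       owl:inverseOf ?p .
}
WHERE {
  ?p a owl:ObjectProperty .
  FILTER NOT EXISTS { ?p owl:inverseOf|^owl:inverseOf ?other }
  BIND(IRI(CONCAT(STR(?p), "_inverse")) AS ?inv)
}
```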
275 | -------------------------------------------------------------------------------- /sparql-gotcha/some.dg: -------------------------------------------------------------------------------- 1 | digraph F { 2 | rankdir=LR; 3 | b->p->s ; 4 | ss->pp->bb; 5 | b [label="President Obama";style=dashed]; 6 | bb [label="President Obama";style=dashed]; 7 | s [label="Paul Simon";style=dashed]; 8 | ss [label="Paul Simon";style=dashed]; 9 | p [label="some predicate";style=dashed]; 10 | pp [label="some predicate";style=dashed]; 11 | } 12 | 13 | /* digraph G { */ 14 | /* rankdir=LR; */ 15 | /* b->p->ss ; */ 16 | /* s->pp->bb; */ 17 | /* b [label="President Obama";style=dashed]; */ 18 | /* bb [label="President Obama";style=dashed]; */ 19 | /* s [label="Paul Simon";style=dashed]; */ 20 | /* ss [label="Paul Simon";style=dashed]; */ 21 | /* p [label="some predicate";style=dashed]; */ 22 | /* pp [label="some predicate";style=dashed]; */ 23 | /* } */ 24 | 25 | 26 | 27 | /* digraph H { */ 28 | /* rankdir=LR; */ 29 | /* b->p->o ; */ 30 | /* o->pp->s; */ 31 | /* b [label="President Obama";style=dashed]; */ 32 | /* s [label="Paul Simon";style=dashed]; */ 33 | /* o [label="some object";style=dashed]; */ 34 | /* p [label="some predicate";style=dashed]; */ 35 | /* pp [label="some predicate";style=dashed]; */ 36 | /* } */ 37 | 38 | /* digraph H1 { */ 39 | /* rankdir=LR; */ 40 | /* s->p->o ; */ 41 | /* o->pp->b; */ 42 | /* b [label="President Obama";style=dashed]; */ 43 | /* s [label="Paul Simon";style=dashed]; */ 44 | /* o [label="some object";style=dashed]; */ 45 | /* p [label="some predicate";style=dashed]; */ 46 | /* pp [label="some predicate";style=dashed]; */ 47 | /* } */ 48 | 49 | 50 | /* digraph I { */ 51 | /* rankdir=LR; */ 52 | /* b->p->o ; */ 53 | /* s->pp->o; */ 54 | /* b [label="President Obama";style=dashed]; */ 55 | /* s [label="Paul Simon";style=dashed]; */ 56 | /* o [label="some object";style=dashed]; */ 57 | /* p [label="some predicate";style=dashed]; */ 58 | /* pp [label="some predicate";style=dashed]; */ 59 | /* } */ 60 | -------------------------------------------------------------------------------- /using_apl/README.md: -------------------------------------------------------------------------------- 1 | # How it Feels to Use APL 2 | 3 | 4 | ## APL 5 | 6 | APL 7 | 8 | APL is A Programming Language. 9 | 10 | It works quite well when you are working with data that is or can be viewed as a rectangular collection of elements. 11 | 12 | I've used it to [teach image processing](https://github.com/justin2004/image-processing#image-processing-with-apl) to students. 13 | 14 | In APL the number of primitives (functions and operators) you need to know is pretty small. 15 | Each primitive consists of a single character. 16 | 17 | Here they are: 18 | ``` 19 | ! * < = > ? | ~ ¨ ¯ × ÷ ← ↑ → ↓ ∆ ∇ ∊ ∘ 20 | ∧ ∨ ∩ ∪ ≠ ≡ ≢ ≤ ≥ ⊂ ⊃ ⊆ ⊖ ⊢ ⊣ ⊤ ⊥ ⋄ ⌈ ⌊ 21 | ⌷ ⌸ ⌹ ⌺ ⌽ ⌿ ⍀ ⍉ ⍋ ⍎ ⍒ ⍕ ⍙ ⍝ ⍞ ⍟ ⍠ ⍣ ⍤ ⍥ 22 | ⍨ ⍪ ⍫ ⍬ ⍱ ⍲ ⍳ ⍴ ⍵ ⍷ ⍸ ⍺ ⎕ ○ 23 | ``` 24 | 25 | Entering the characters on a standard keyboard is [not a problem](https://aplwiki.com/wiki/Typing_glyphs#By_platform). 26 | 27 | ## What Is APL Though? 28 | 29 | ["Is [APL] just a set of well-chosen matrix operators?"](https://news.ycombinator.com/item?id=17186378) 30 | 31 | APL does have a thoughtfully chosen set of array primitives. 32 | The primitive coverage feels like it approaches a periodic table of computational process behavior: everything you need to efficiently express any computation. 
33 | You can do anything with it and you can often do things with a [surprisingly small number of primitives](https://www.youtube.com/watch?v=a9xAKttWgP4).
34 | 
35 | [But there is more to this language.](https://news.ycombinator.com/item?id=17186470)
36 | 
37 | 
38 | 
39 | ## How it Feels to Use it
40 | 
41 | To me, programming in APL feels like designing molecules.
42 | I like Clojure a lot but programming in Clojure doesn't feel like that.
43 | I've never designed a molecule so this blog post is meant to be evocative and is based on my subjective impressions and associations.
44 | 
45 | 
46 | If you want to invoke or reference a function in most programming languages you have to spell the name of the function.
47 | Often APL programmers talk of "the spelling of a function in APL" but they mean something different.
48 | 
49 | APL programmers might say "arithmetic mean (or average) is spelled `+/÷≢` in APL".
50 | 
51 | I suppose an APL expression is like a spelling in that there are typographical items that you string together.
52 | But APL expressions (specifically [trains](https://help.dyalog.com/18.2/index.htm#Language/Introduction/Trains.htm)) feel like condensed [structural formulae](https://en.wikipedia.org/wiki/Structural_formula) of computational processes.
53 | A structural formula of a molecule shows how the constituent atoms (primitives) are bonded together.
54 | 
55 | 
56 | CH3CH2OH
57 | 
58 | ![ethanol maybe](media/ethanol.png)
59 | 
60 | An APL expression is a condensed formula that indicates how the language primitives should be bonded together.
61 | 
62 | In fact, if you express a function in a [Dyalog APL REPL](https://tryapl.org/), you'll see a tree rendering of the derived function.
63 | 
64 | ```
65 | +/÷≢
66 | ┌─┼─┐
67 | / ÷ ≢
68 | ┌─┘
69 | +
70 | ```
71 | 
72 | The tree diagrams of derived functions are one way Dyalog APL renders the bonding of primitives in what it calls trains.
73 | Those tree diagrams and structural formulas are helpful for visualizing how the molecule's parts contribute to the whole, which will interact with its surroundings (arguments or chemical entities, respectively).
74 | 
75 | ## Example Time
76 | 
77 | Let's look at an example.
78 | Let's say you want to put a comma between each item in a sequence.
79 | In Clojure we can use `interpose` to do that.
80 | The Clojure core has already assigned a function to that name.
81 | 
82 | ```clojure
83 | (interpose "," (range 1 10))
84 | => (1 "," 2 "," 3 "," 4 "," 5 "," 6 "," 7 "," 8 "," 9)
85 | ```
86 | 
87 | In Clojure you spell the name of the function ("interpose" in this example).
88 | But `interpose` is just a name and interpose itself is really [elsewhere](https://github.com/clojure/clojure/blob/35bd89f05f8dc4aec47001ca10fe9163abc02ea6/src/clj/clojure/core.clj#L5231).
89 | 
90 | It isn't really convenient to break Clojure's interpose into pieces and re-mix the parts to do something different.
91 | It is intended that interpose is one of your primitives.
92 | 
93 | In APL you reference interpose behavior directly (without going through a multi-character name):
94 | 
95 | ```apl
96 | 1↓,',',⍪⍳9
97 | 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9
98 | ```
99 | All the parts of APL's interpose are exposed and the re-mixing potential is immediate.
100 | 
101 | Let's step through that APL expression (in APL, evaluation is right to left).
102 | 103 | [iota](https://help.dyalog.com/18.2/Content/Language/Symbols/Iota.htm) 9 104 | 105 | ```apl 106 | ⍳9 107 | 1 2 3 4 5 6 7 8 9 108 | ``` 109 | 110 | Let's think of that as the argument and what we do next as the interpose behavior. 111 | 112 | [table](https://help.dyalog.com/18.2/Content/Language/Symbols/Comma%20Bar.htm) it 113 | 114 | ```apl 115 | ⍪⍳9 116 | 1 117 | 2 118 | 3 119 | 4 120 | 5 121 | 6 122 | 7 123 | 8 124 | 9 125 | ``` 126 | 127 | [catenate](https://help.dyalog.com/18.2/Content/Language/Symbols/Comma.htm#kanchor3327) the character `,` onto the matrix (with [scalar extension](https://aplwiki.com/wiki/Scalar_extension)) 128 | 129 | ```apl 130 | ',',⍪⍳9 131 | , 1 132 | , 2 133 | , 3 134 | , 4 135 | , 5 136 | , 6 137 | , 7 138 | , 8 139 | , 9 140 | ``` 141 | 142 | [ravel](https://help.dyalog.com/18.2/Content/Language/Symbols/Comma.htm#kanchor3325) it 143 | 144 | ```apl 145 | ,',',⍪⍳9 146 | , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 147 | ``` 148 | 149 | 1 [drop](https://help.dyalog.com/18.2/Content/Language/Symbols/Down%20Arrow.htm#kanchor1111) to remove the unwanted leading comma 150 | 151 | ```apl 152 | 1↓,',',⍪⍳9 153 | 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 154 | ``` 155 | 156 | Sometimes you just build an expression up like this and you're done. 157 | 158 | But the resultant expression isn't a function so you can't name it, pass actual arguments to it, etc. 159 | ```apl 160 | 1↓,',',⍪ 161 | SYNTAX ERROR: Missing right argument 162 | ``` 163 | 164 | But you could turn that expression into a function (in this case by using ∘ [jot](https://help.dyalog.com/18.2/Content/Language/Symbols/Jot.htm)). 165 | 166 | ```apl 167 | 1∘↓∘,','∘,∘⍪ 168 | ┌─┴─┐ 169 | ∘ ∘ 170 | ┌┴┐ ┌┴┐ 171 | ∘ , ∘ ⍪ 172 | ┌┴┐ ┌┴┐ 173 | 1 ↓ , , 174 | ``` 175 | 176 | 177 | Also here is another way to formulate interpose with an APL function: 178 | 179 | ```apl 180 | (⊣,',',⊢)/ 181 | / 182 | ┌─┘ 183 | ┌─┼───┐ 184 | ⊣ , ┌─┼─┐ 185 | , , ⊢ 186 | ``` 187 | 188 | 189 | ## Summary of How it Feels to Use APL 190 | 191 | In APL it feels like you are making molecules with atoms (language primitives) and bonds (combinators/trains). 192 | The spelling of the name of the molecule _is_ the molecule. 193 | The name isn't a layer of indirection; it is directly the entity. 194 | 195 | 196 | ## If the Molecule Needs More Work 197 | 198 | Instead of 1,2,3,4,5,6,7,8,9 what if I want pairs partitioned like: 1 2 , 3 4 , 5 6 , 7 8 199 | 200 | In Clojure you could reference, by name, another function: `partition`. 201 | 202 | 203 | ``` 204 | (interpose "," (partition 2 (range 1 10))) 205 | => ((1 2) "," (3 4) "," (5 6) "," (7 8)) 206 | ``` 207 | 208 | Close enough. 209 | 210 | In APL you can get partition behavior by first reshaping the vector into a matrix. 211 | Below we'll use a train to compute the desired shape of the matrix. 212 | You'll notice that in trains you don't reference arguments explicitly. 213 | Trains are a form of [tacit programming](https://en.wikipedia.org/wiki/Tacit_programming). 214 | 215 | Here is the train to compute the desired shape: 216 | ```apl 217 | (,∘2)(⌊÷∘2) 218 | ┌─┴─┐ 219 | ∘ ┌┴┐ 220 | ┌┴┐ ⌊ ∘ 221 | , 2 ┌┴┐ 222 | ÷ 2 223 | ``` 224 | 225 | Using it looks like: 226 | ```apl 227 | (,∘2)(⌊÷∘2) 9 228 | 4 2 229 | ``` 230 | We'll use the result, the vector `4 2`, as the desired shape of the matrix. 231 | That is, we want to [reshape](https://help.dyalog.com/18.2/Content/Language/Symbols/Rho.htm#kanchor2859) (with ⍴) the argument (a vector) into a matrix with 4 rows 2 columns. 
232 | 
233 | 
234 | ```apl
235 | ⍳9
236 | 1 2 3 4 5 6 7 8 9
237 | 4 2⍴⍳9
238 | 1 2
239 | 3 4
240 | 5 6
241 | 7 8
242 | ```
243 | 
244 | 
245 | And then we embed that train into another train:
246 | ```apl
247 | ((,∘2)(⌊÷∘2))⍴⍳
248 | ┌───┼─┐
249 | ┌─┴─┐ ⍴ ⍳
250 | ∘ ┌┴┐
251 | ┌┴┐ ⌊ ∘
252 | , 2 ┌┴┐
253 | ÷ 2
254 | ```
255 | 
256 | Then we just need to follow up with the expression (that we used a moment ago):
257 | ```apl
258 | 1↓,',',
259 | ```
260 | 
261 | Which will:
262 | 
263 | - catenate (left and right argument given to `,`): concatenate the comma onto each row of the matrix
264 | - ravel (right argument given to `,`): turn the matrix into a vector
265 | - drop (right and left argument given to `↓`): drop the leading comma
266 | 
267 | 
268 | All together:
269 | ```apl
270 | 1↓,',',(((,∘2)(⌊÷∘2))⍴⍳) 9
271 | 1 2 , 3 4 , 5 6 , 7 8
272 | ```
273 | 
274 | 
275 | ## Is All This Chemistry Mostly Because of APL's Single-Character Primitives?
276 | 
277 | Maybe.
278 | 
279 | Single-character primitives mean you have less to overcome to express something that can stand alone.
280 | As you type `i` `n` `t` `e` `r` `p` `o` `s` `e` none of that stands alone until you finish the final character.
281 | In APL every primitive (well, except the operators) can stand alone.
282 | 
283 | My 5-year-old son does almost any action he wants to do quickly and with cheerfulness.
284 | He has very little to overcome to perform an action.
285 | He doesn't believe the action might be of little value and therefore not worth the effort.
286 | He has enough energy to do the action.
287 | 
288 | Adults, on the other hand, [need to hear reasons](https://youtu.be/7jVr0-ghGWU?t=26) before they get out of their chairs.
289 | 
290 | "At my age if I'm sitting down and somebody tells me I need to get up and go in another room I need to be told all the information why first." - Louis CK
291 | 
292 | ```c
293 | for(i=0;i<⍵;i++)
294 | ```
295 | feels tedious, like getting up out of my chair and going into another room.
296 | 
297 | I bet Louis would be much more willing to just move his eyes to look at something upon request.
298 | 
299 | ```apl
300 | ⍳
301 | ```
302 | feels atomic, light, and almost reflexive, like moving only my eyes to look at something.
303 | 
304 | 
305 | 
306 | ```clojure
307 | (range)
308 | ```
309 | feels like something in between... maybe like scooting down a seat so someone can sit next to me.
310 | 
311 | ## Conclusion
312 | 
313 | I'm not sure if the chemistry analogy was necessary but it seemed fun, which was enough to motivate me to type all this out.
314 | -------------------------------------------------------------------------------- /using_apl/media/APL_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/using_apl/media/APL_logo.png -------------------------------------------------------------------------------- /using_apl/media/ethanol.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/using_apl/media/ethanol.png -------------------------------------------------------------------------------- /work_on_engineered_artifacts/media/vmstat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/justin2004/weblog/1bc768fde7b3e81f6ac08c1ffa238a328b3eb590/work_on_engineered_artifacts/media/vmstat.png --------------------------------------------------------------------------------