├── sem_layers.png
├── sem_layers.graffle
├── Use Cases
│   └── README.md
├── Requirements
│   └── README.md
├── Tools
│   └── README.md
├── LICENSE
├── .gitignore
└── README.md


/sem_layers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ceteri/pkg/HEAD/sem_layers.png
--------------------------------------------------------------------------------


/sem_layers.graffle:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ceteri/pkg/HEAD/sem_layers.graffle
--------------------------------------------------------------------------------


/Use Cases/README.md:
--------------------------------------------------------------------------------
## Current Use Cases for the Group

* Personal knowledge collection
* Sharing links to add context to group discussion
* Topical conversation on KG-related subjects
* Collaborating on specific projects
* Publishing tutorials

--------------------------------------------------------------------------------


/Requirements/README.md:
--------------------------------------------------------------------------------
## What are the PKG group’s requirements/needs from a tool stack?

* Asynchronous project chat
* Shared document for note-taking
* Creating semantic markup
* Shared storage
* Versioning
* Publishing on the web
* Knowledge graph

--------------------------------------------------------------------------------


/Tools/README.md:
--------------------------------------------------------------------------------
## Some tools that can be useful for PKGs

**Points to think about:**
* Properties
* Where on the stack each tool fits.

**List**
* [Athens Research](https://github.com/athensresearch/athens)
* [Dendron](https://dendron.so/)
* [Dokieli](https://dokie.li/)
* [Foam](https://foambubble.github.io/foam/)
* [logseq](https://logseq.com/)
* [Obsidian](https://obsidian.md/)
* [org-roam](https://www.orgroam.com/)
* [Remnote](https://www.remnote.io/)
* [Roam Research](https://roamresearch.com/)
* [tiddlyroam](https://tiddlyroam.org/)

See also [this list](https://www.notion.so/Artificial-Brain-Networked-with-linear-notebook-app-a131b468fc6f43218fb8105430304709)

--------------------------------------------------------------------------------


/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 Paco Nathan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------


/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

--------------------------------------------------------------------------------


/README.md:
--------------------------------------------------------------------------------
# Personal Knowledge Graph

Collaboration on descriptions used in personal knowledge graph (PKG) practices.


## Semantic Layers

The following layers are found among processes for developing *knowledge graphs*.
Ostensibly these descriptions can apply across the range of available tools.

**Discussion about these layers and iteration on their descriptions is needed.**

![semantic layers](https://raw.githubusercontent.com/ceteri/pkg/main/sem_layers.png)

### Layer 1: remote storage

For example, the popular *storage grids* such as Amazon S3, Azure Storage, Google Cloud Storage, etc., are at **Layer 1**.
These are amazingly robust and cost-effective, although relatively "raw" in the sense that they are neither file systems nor databases.
Also, they are mostly designed for programmers (or applications) to use.

Use of *remote storage* distinguishes a PKG use case from the trivial case of one person merely using *local storage* on their own computer.
Its use implies capabilities for collaboration, publishing, disaster recovery, etc.

### Layer 2: versioning

Services such as GitHub, GitLab, etc., are at **Layer 2**.
These typically bundle the versioning semantics of a tool such as *git* along with a storage grid, then provide ways to publish (e.g., jumping all the way up to **Layer 9**).
Graph-based data and metadata can be difficult to version: they tend to require specialized methods, which *git* doesn't necessarily understand.

This work is by definition *transactional* in nature.

### Layer 3: markdown

The *markdown* at **Layer 3** is one among many popular formats.
It has the benefit of being relatively human-readable, even in its raw form.
It's also native in Jupyter notebooks, one of the most popular formats for documenting open source projects, and increasingly used by technical publishers as well.

### Layer 4: semantic markup

The *semantic markup* at **Layer 4** begins to add some semantic properties to content formatted in markdown, for example, means for adding links and other metadata.
Services such as [Obsidian](https://obsidian.md/) and [Roam](https://roamresearch.com/) are largely at this layer.

### Layer 5: shared editing

**Layer 5** shared editing is what people commonly associate with Google Docs or Box.
There's a programming technique called *append-only logs* which makes collaborative editing feasible to manage online.
These services typically bundle **Layer 1** storage, along with some aspects of **Layer 2** versioning.
Generally these services have little awareness of markdown formats specifically, and tend to be MS Word lookalikes.

This work is by definition *transactional* in nature.
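
The *append-only log* idea can be sketched in a few lines of Python. This is a toy model, assuming edits are simple character-offset inserts and deletes applied in one fixed order; the `Op` and `AppendOnlyLog` names are hypothetical, and real shared-editing services that reconcile concurrent edits rely on far more involved techniques (e.g., operational transformation or CRDTs).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    """A single edit in a toy, hypothetical operation format."""
    author: str
    kind: str        # "insert" or "delete"
    pos: int         # character offset where the edit applies
    text: str = ""   # text to insert (insert only)
    length: int = 0  # number of characters to remove (delete only)


class AppendOnlyLog:
    """Edits are only ever appended; the document is derived by replaying them."""

    def __init__(self) -> None:
        self._ops: List[Op] = []

    def append(self, op: Op) -> int:
        """Record an edit and return its sequence number."""
        self._ops.append(op)
        return len(self._ops) - 1

    def replay(self, upto: Optional[int] = None) -> str:
        """Rebuild the document from the first `upto` edits (all edits if None)."""
        doc = ""
        for op in self._ops[:upto]:
            if op.kind == "insert":
                doc = doc[:op.pos] + op.text + doc[op.pos:]
            elif op.kind == "delete":
                doc = doc[:op.pos] + doc[op.pos + op.length:]
            else:
                raise ValueError(f"unknown edit kind: {op.kind!r}")
        return doc


log = AppendOnlyLog()
log.append(Op("alice", "insert", 0, text="Hello world"))
log.append(Op("bob", "insert", 5, text=","))
log.append(Op("alice", "delete", 6, length=6))

print(log.replay())   # Hello,
print(log.replay(2))  # Hello, world  (an earlier version, from a prefix of the log)
```

Because entries are never overwritten, any earlier state can be recovered by replaying a prefix of the log, which hints at how these services can offer some aspects of **Layer 2** versioning.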

### Layer 6: shared vocabulary

The *shared vocabulary* in **Layer 6** is where a project attempts to harmonize its semantic markup with commonly used *controlled vocabularies*, such that the metadata references shared definitions.
Examples include [DCMI](https://dublincore.org/specifications/dublin-core/dcmi-terms/#) and [Schema.org](https://schema.org/) among many others.

In aggregate, this is where *ontology* gets described.

### Layer 7: persistent identifiers

**Layer 7** uses *persistent identifiers* to "populate" the semantic markup such that content can be referenced globally using unique identifiers.
Each class of persistent identifier will have some "authority" backing it.
For example, there are *ISBN* for books, *ISSN* for periodicals, *ORCID* for researchers, *ROR* for organizations, *DOI* for articles, etc.
Using a *URL* is perhaps the simplest case.

Alternatively, a given organization can be its own "authority", i.e., it may construct and publish its own identifiers specific to its context.
This can be done using [URNs](https://en.wikipedia.org/wiki/Uniform_Resource_Name) composed of some local identifiers.
For instance, each article posted on LinkedIn has a URN specific to LinkedIn, which gets exposed in public as part of that article's URL (web link).
In the link `https://www.linkedin.com/feed/update/urn:li:activity:6774890201442582528/` the URN portion is `urn:li:activity:6774890201442582528`, where the trailing numeric ID plays a role similar to a [*uuid*](https://en.wikipedia.org/wiki/Universally_unique_identifier), being unique within the context of LinkedIn.
The other elements composing the URN's structure help clarify the semantics about its domain and interpretation.
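
The URN anatomy described above can be made concrete with a short Python sketch that finds the URN inside the LinkedIn link and splits it apart. It assumes only the general `urn:<NID>:<NSS>` shape of URNs; reading the first segment of the namespace-specific part as an entity type and the remainder as a local ID is an assumption drawn from this single example, not a documented LinkedIn format, and the `find_urn` / `parse_urn` names are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ParsedURN(NamedTuple):
    nid: str          # namespace identifier, per the urn:<NID>:<NSS> shape
    entity_type: str  # assumption: first segment of the namespace-specific string
    local_id: str     # assumption: everything after that first segment


def find_urn(url: str) -> Optional[str]:
    """Return the first path segment of `url` that starts with "urn:", if any."""
    for segment in urlparse(url).path.split("/"):
        if segment.lower().startswith("urn:"):
            return segment
    return None


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN into its namespace and (assumed) type and local ID parts."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    entity_type, _, local_id = nss.partition(":")
    return ParsedURN(nid=nid, entity_type=entity_type, local_id=local_id)


url = "https://www.linkedin.com/feed/update/urn:li:activity:6774890201442582528/"
urn = find_urn(url)
if urn is not None:
    print(urn)             # urn:li:activity:6774890201442582528
    print(parse_urn(urn))  # ParsedURN(nid='li', entity_type='activity', local_id='6774890201442582528')
```

Keeping `nid` and `entity_type` alongside the bare number is what lets the same digits be interpreted unambiguously, which is the sense in which the URN's other elements clarify its domain.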

While semantic conventions such as [`dct:identifier`](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#http://purl.org/dc/terms/identifier) may be used as "catch-alls" for representing persistent identifiers, a KG that does not represent each class of identifier distinctly probably won't be able to support effective queries, inference, validation, embedding, search, etc.
In other words, a "good enough" representation does not guarantee effective inference later on in the KG use cases.

### Layer 8: knowledge graph

**Layer 8** is where the *knowledge graph* work happens.
Of course, this has its own internal layering: RDF for triples/quads, then RDF Schema (RDFS) for defining classes and properties, then OWL for machine interpretability, and so on.

Both the **Layer 6** ontology and the **Layer 7** identifiers must exist as "overlays" atop the content (in other words, as semantic annotation) for the **Layer 8** knowledge graph usage to make any sense.
Sometimes this is described using the relatively dated terms [*TBox*](https://en.wikipedia.org/wiki/Tbox) and [*ABox*](https://en.wikipedia.org/wiki/Abox) respectively, although these may introduce some distorted interpretation.

### Layer 9: publishing

Finally, there's a *presentation* layer at the top (roughly similar to network layer models) where *publishing* the KGs occurs.
Since the W3C standards emerged from the *World Wide Web*, many of their notions (Solid, LDP, etc.) tend to fixate on this layer, without being especially mindful of practical details for some of the underlying foundations.

This layer is where many KG use cases provide features for public access.
Publishing may be a matter of:

* web-based rendering
* search and query capabilities
* API access

### Misc. Notes

Although the marketing departments of technology vendors tend to promise "all things to all people", in reality few if any commercial offerings provide support across the entire stack of these layers.

Effective practices in industry tend to:

* leverage a [*middle-out* strategy](https://answers.knowledgegraph.tech/t/whats-the-difference-between-a-bottom-up-and-a-top-down-ontology-modeling-approach/5064) in lieu of *top-down* EKG practices, where PKG practices may evolve into major components of *middle-out* projects
* integrate multiple libraries, tools, and services to provide coverage across the stack, depending on the needs of their use cases

Notably, the *graph database* vendors tend to focus on **Layer 1** and on the *query* and search aspects (mixed into either **Layer 8** or **Layer 9**), while not providing especially effective solutions for the other layers.

---

## What are the PKG group’s requirements/needs from a tool stack?

Check the [Requirements section](https://github.com/ceteri/pkg/tree/main/Requirements)

## Tools

Check the [Tools section](https://github.com/ceteri/pkg/tree/main/Tools)


## Current Use Cases for the Group

Check the [Use Cases section](https://github.com/ceteri/pkg/tree/main/Use%20Cases)

--------------------------------------------------------------------------------