├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo
│   └── index.html
├── img
│   ├── coj.png
│   └── geojson_optimize.png
└── spec
    ├── header_example.json
    ├── header_schema.json
    └── readme.md
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributions
2 |
3 | All ideas, questions, comments and suggested changes are welcome and can be submitted by opening an issue in this repository. Simple fixes such as typos, additional explanations and clarifications are welcome directly in pull requests.
4 |
5 | Please understand that any contributions will be under the [MIT License](./LICENSE).
6 |
7 |
8 |
9 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 Boundless
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # COGJ: Cloud Optimized GeoJSON
2 |
3 | ## What?
4 | COGJ is a derivative of GeoJSON that is optimized for network storage and efficient data access over HTTP using range reads. It was inspired by the [Cloud Optimized GeoTIFF](https://www.cogeo.org/) spec and is envisioned as its vector counterpart.
5 |
6 | The goals are to be:
7 | - Simple --> HTTP(S)
8 | - Flexible --> just enough spec with room to move
9 | - Democratic --> open, human readable, and easy to implement
10 |
11 | ## Why?
12 |
13 | Having opaque files on remote services requires you to download the entire file before being able to make use of it. Just as COG changed this for raster, COGJ aims to do the same for vector.
14 |
15 | The problem exists in all common vector formats:
16 | - SQLite-based (MBTiles, GeoPackage)
17 | - Shapefiles (multiple files, often zipped)
18 | - GeoJSON and GML require custom parsing to avoid XML or JSON encoding issues with partial reads
19 | - Vector services (databases or WFS) are great for subsetting data but require running a service or lambda
20 |
21 | ## Benefits
22 | - **Reduced infrastructure requirements:** efficient data access without so much as a lambda (serverless)
23 | - **Reduced bandwidth consumption:** load just the parts you need from even the most massive files
24 | - **Reduced client load:** allows browsers and disadvantaged clients to work with datasets that would previously overwhelm them
25 | - **Simplification:** single, self describing file
26 | - **Universal ease:** read efficiently from an HTTP(S) connection or filesystem
27 |
28 | ## Drawbacks (to be fair)
29 | - **Read optimized:** can be authored in a text editor, but it isn't easy. Realistically, files should be written by software.
30 | - **YAGF:** yet another geo format
31 | - **Range reads:** are not supported everywhere
32 |
33 | ## Sweet spots
34 | - Large, read-only, public datasets (e.g. governments, NGOs)
35 | - Disadvantaged users, slow connections or mobile devices (e.g. Disaster relief)
36 | - Aggregating datasets into a portable web friendly package
37 | - Batch processing/viewing very large amounts of data (e.g. ML input or output)
38 |
39 | # How does it work?
40 |
41 | The concept is simple: a traditional GeoJSON file is broken into *n* feature collections, each of which is independently a valid GeoJSON document. The collections can be built using any sorting or ordering scheme that makes sense for the given data (temporal, spatial, etc.).
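As a rough illustration of the splitting step, the sketch below cuts one FeatureCollection into at most *n* smaller ones, each a valid GeoJSON document on its own. The helper name `splitFeatureCollection` and the sample data are hypothetical and not part of the spec; features keep their existing order, so any spatial or temporal sorting is assumed to happen beforehand.

```javascript
// Hypothetical helper (not part of the COGJ spec): split one FeatureCollection
// into at most `n` smaller FeatureCollections, preserving feature order.
function splitFeatureCollection(geojson, n) {
  const features = geojson.features;
  // Number of features per collection; at least 1 to keep the loop advancing.
  const size = Math.max(1, Math.ceil(features.length / n));
  const collections = [];
  for (let i = 0; i < features.length; i += size) {
    collections.push({
      type: 'FeatureCollection',
      features: features.slice(i, i + size),
    });
  }
  return collections;
}

// Illustrative input: five point features with made-up coordinates.
const example = {
  type: 'FeatureCollection',
  features: [1, 2, 3, 4, 5].map((id) => ({
    type: 'Feature',
    properties: { id },
    geometry: { type: 'Point', coordinates: [-76.3 + id * 0.01, 39.5] },
  })),
};

const chunks = splitFeatureCollection(example, 2);
console.log(chunks.map((c) => c.features.length));
```

Each element of `chunks` can be serialized with `JSON.stringify` and parsed independently by any GeoJSON tool.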
42 |
43 | 
44 |
45 | The collections are arranged back-to-back in a single file, with the first 10k of the file reserved for metadata. The [metadata header](./spec/readme.md) contains metadata about the file as a whole, as well as an array of per-collection metadata.
46 |
47 | In practice, a networked client would:
48 |
49 | 1. Perform an HTTP GET range request for the first 10k of the file
50 | 2. Parse the header and examine the metadata to determine which collections to load
51 | 3. Use collection metadata to build another HTTP GET range request for the desired collections
52 | 4. Pass the response to any tool that handles GeoJSON
53 |
54 | The same flow would apply to the file on a disk -- just replace HTTP GET range requests with `seek` and `read` commands.
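For the on-disk case, a minimal Node.js sketch of the `seek` and `read` step might look like the following. The function name `readRange` is hypothetical; it takes an inclusive byte range, mirroring the HTTP `Range` header, and the `start` and `end` values are assumed to come from the collection metadata in the header.

```javascript
const fs = require('fs');

// Hypothetical helper: read bytes [start, end] (inclusive, like an HTTP Range
// header) from a local file and return them as a UTF-8 string.
function readRange(path, start, end) {
  const length = end - start + 1;
  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(path, 'r');
  try {
    // The last argument is the file position to read from (the "seek").
    const bytesRead = fs.readSync(fd, buffer, 0, length, start);
    // Fewer bytes may come back if the range runs past the end of the file.
    return buffer.subarray(0, bytesRead).toString('utf8');
  } finally {
    fs.closeSync(fd);
  }
}

// Usage sketch (the file path is illustrative):
// const header = JSON.parse(readRange('Cadastral.geojson.coj', 0, 9999));
// const collection = JSON.parse(readRange('Cadastral.geojson.coj', start, end));
```

The header call assumes, as the `fetch` example in this README implies, that the reserved first 10,000 bytes parse as a JSON document.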
55 |
56 |
57 | ## Demo
58 |
59 | This demo uses two files on S3, [Cadastral.geojson](https://s3.amazonaws.com/cogeojson/Cadastral.geojson) and [Cadastral.geojson.coj](https://s3.amazonaws.com/cogeojson/Cadastral.geojson.coj), which contain the same cadastral data for Harford County, Maryland, in different formats. Both are about 160 MB.
60 |
61 | [Live Demo](https://s3.amazonaws.com/cogeojson/index.html)
62 |
63 | The demo is a simple OpenLayers app that lets you load either file with the click of a button. It is run in Chrome with the network throttled to the Fast 3G setting to emphasize the point -- and because that's the reality for many users.
64 |
65 | [](https://www.youtube.com/watch?v=YMM2sGZHgoA)
66 |
67 | ### How hard is this to implement?
68 |
69 | Reading the header:
70 | ```javascript
71 | fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj',{headers: {"Range":"bytes=0-9999"}})
72 | .then(response=>{return response.json();})
73 | ```
74 | Reading collections of features:
75 | ```javascript
76 | fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj', {headers: {"Range": "bytes=" + start + "-" + end}})
77 |   .then(response => response.json())
78 |   .then(function (json) {
79 |     // pass the GeoJSON object to your mapping library
80 |   });
81 | ```
82 |
83 | ### What about the overhead?
84 |
85 | There is certainly some overhead from the metadata and the additional feature collection boilerplate, but the impact is negligible. Although the COGJ version of this test data has a few extra curly braces, the actual file is smaller because extra whitespace was removed. In other words, the whitespace in the original costs more than subdividing the file does. For our test data, the COGJ is about 10 MB smaller.
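The whitespace effect is easy to see on a small scale. The sketch below serializes the same object with and without indentation and compares byte counts; it assumes Node.js (for `Buffer`), and the single feature is made up for illustration rather than taken from the test data.

```javascript
// Illustrative only: one made-up feature, not the cadastral test data.
const collection = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      properties: { id: 1, name: 'example parcel' },
      geometry: { type: 'Point', coordinates: [-76.3, 39.5] },
    },
  ],
};

// Same content, two serializations: indented vs. no extra whitespace.
const pretty = JSON.stringify(collection, null, 2);
const compact = JSON.stringify(collection);

console.log('pretty bytes: ', Buffer.byteLength(pretty, 'utf8'));
console.log('compact bytes:', Buffer.byteLength(compact, 'utf8'));
```

Both strings parse back to the same object, so dropping the indentation changes only the size, not the data.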
86 |
87 |
--------------------------------------------------------------------------------
/demo/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |