├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo
    └── index.html
├── img
    ├── coj.png
    └── geojson_optimize.png
└── spec
    ├── header_example.json
    ├── header_schema.json
    └── readme.md


/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributions

All ideas, questions, comments and suggested changes are welcome and can be submitted by opening an issue in this repository. Simple fixes such as typos, additional explanations and clarifications are welcome directly in pull requests.

Please understand that any contributions will be under the [MIT License](./LICENSE).


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Boundless

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# COGJ: Cloud Optimized GeoJSON

## What?
COGJ is a derivative of GeoJSON that is optimized for network storage and efficient data access over HTTP using range reads. It was inspired by the [Cloud Optimized GeoTIFF](https://www.cogeo.org/) spec and is envisioned to be a vector counterpart.

The goals are to be:
- Simple --> HTTP(S)
- Flexible --> just enough spec with room to move
- Democratic --> open, human readable, and easy to implement

## Why?

Having opaque files on remote services requires you to download the entire file before being able to make use of it. Just as COG changed this for raster, COGJ aims to do the same for vector.

The problem exists in all common vector formats:
- SQLite based (MBTiles, GeoPackage)
- Shapefiles (multiple files, often zipped)
- GeoJSON and GML require custom parsing to avoid XML or JSON encoding issues with partial reads
- Vector services (databases or WFS) are great for subsetting data but require running a service or lambda

## Benefits
- **Reduced infrastructure requirements:** efficient data access without so much as a lambda (serverless)
- **Reduced bandwidth consumption:** load just the parts you need from even the most massive files
- **Reduced client load:** allows browsers and disadvantaged clients to work with datasets that would previously overwhelm them
- **Simplification:** single, self-describing file
- **Universal ease:** read efficiently from an HTTP(S) connection or filesystem

## Drawbacks (to be fair)
- **Read Optimized:** can be authored in a text editor, but isn't easy. Realistically should be written by software.
- **YAGF:** yet another geo format
- **Range reads:** are not supported everywhere

## Sweet spots
- Large, read-only, public datasets (e.g. governments, NGOs)
- Disadvantaged users, slow connections or mobile devices (e.g. disaster relief)
- Aggregating datasets into a portable, web-friendly package
- Batch processing/viewing very large amounts of data (e.g. ML input or output)

# How does it work?

The concept is simple: a traditional GeoJSON file is broken into *n* feature collections, each of which is independently a valid GeoJSON document. The collections can be made using any sorting or ordering algorithm that makes sense for the given data (temporal, spatial, etc.).

![COGJ vs GeoJSON](/img/geojson_optimize.png)

The collections are arranged back-to-back in a single file with the first 10k of the file reserved for metadata. The [metadata header](./spec/readme.md) contains metadata about the file as a whole, as well as an array of collection metadata.

In practice, a networked client would:

1. Perform an HTTP GET range request for the first 10k of the file
2. Parse the header and examine the metadata to determine which collections to load
3. Use collection metadata to build another HTTP GET range request for the desired collections
4. Pass the response to any tool that handles GeoJSON

The same flow would apply to the file on a disk -- just replace HTTP GET range requests with `seek` and `read` commands.


## Demo

This demo shows 2 files on S3, [Cadastral.geojson](https://s3.amazonaws.com/cogeojson/Cadastral.geojson) and [Cadastral.geojson.coj](https://s3.amazonaws.com/cogeojson/Cadastral.geojson.coj), which hold the same data in different formats. Both are about 160 MB and contain the same cadastral data for Harford County, Maryland.
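
As a rough sketch of the four client steps listed under "How does it work?", the snippet below selects collections by bounding box and builds the matching range requests. It assumes the header layout shown in `spec/header_example.json` (a `features` array whose entries carry a `bbox` plus `properties.start` and `properties.size`) and a runtime with a global `fetch`. The function names are hypothetical and not part of the spec.

```javascript
// Hypothetical helpers; field names follow spec/header_example.json.

// True when two [left, bottom, right, top] boxes overlap.
function bboxIntersects(a, b) {
  return a[0] <= b[2] && a[2] >= b[0] && a[1] <= b[3] && a[3] >= b[1];
}

// Step 2: keep the header entries whose bbox overlaps the area of interest.
function selectCollections(header, bbox) {
  return header.features.filter((f) => bboxIntersects(f.bbox, bbox));
}

// Step 3: HTTP byte ranges are inclusive, so the last byte is start + size - 1.
function rangeHeader(start, size) {
  return { Range: `bytes=${start}-${start + size - 1}` };
}

// Steps 1-4 combined. Assumes the server honours Range requests.
async function loadCollections(url, bbox) {
  const headerResponse = await fetch(url, { headers: rangeHeader(0, 10000) });
  const header = await headerResponse.json();
  const wanted = selectCollections(header, bbox);
  return Promise.all(wanted.map(async (f) => {
    const { start, size } = f.properties;
    const response = await fetch(url, { headers: rangeHeader(start, size) });
    const text = await response.text();
    // Drop a leading record separator (0x1E) in case the range includes one.
    return JSON.parse(text.replace(/^\u001e/, ''));
  }));
}
```

Calling `loadCollections(url, [left, bottom, right, top])` would resolve to an array of GeoJSON objects that can be handed to a mapping library. Whether `start` and `size` cover the record separator is not stated in the spec, which is why the sketch strips it defensively.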

[Live Demo](https://s3.amazonaws.com/cogeojson/index.html)

The demo shows a simple OpenLayers app which lets you load either file with the click of a button. The video below shows it running in Chrome with the network throttled to the Fast 3G setting to emphasize the point -- and because that's the reality for many.

[![Demo Video](/img/coj.png)](https://www.youtube.com/watch?v=YMM2sGZHgoA)

### How hard is this to implement?

Reading the header:
```javascript
// Request only the header: bytes 0-9999 of the file
fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj', {headers: {"Range": "bytes=0-9999"}})
  .then(response => response.json())
```
Reading collections of features:
```javascript
// start and end are byte offsets derived from the collection metadata in the header
fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj', {headers: {"Range": "bytes=" + start + "-" + end}})
  .then(response => response.json())
  .then(function(json) {
    // pass geojson object to your mapping library
    // ...
  })
```

### What about the overhead?

While there is certainly some overhead related to the metadata and additional feature collection boilerplate, the impact is negligible. The COGJ version of this test data does have a few extra curly braces, but the actual file size is smaller because extra whitespace was removed. In other words, the whitespace in the original file costs more than subdividing it efficiently. For our test data, the COGJ file is about 10 MB smaller.


--------------------------------------------------------------------------------
/demo/index.html:
--------------------------------------------------------------------------------
COGJ viewer


--------------------------------------------------------------------------------
/img/coj.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/planetfederal/cogj-spec/47d0a5b5fdfb8cfe6b5fac415c054b3d4b6bb73e/img/coj.png


--------------------------------------------------------------------------------
/img/geojson_optimize.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/planetfederal/cogj-spec/47d0a5b5fdfb8cfe6b5fac415c054b3d4b6bb73e/img/geojson_optimize.png


--------------------------------------------------------------------------------
/spec/header_example.json:
--------------------------------------------------------------------------------
{
  "type": "FeatureCollection",
  "size": 102429060,
  "features": 124672,
  "name": "Buildings",
  "description": "Building data from Harford County, MD",
  "version": "1.0.0",
  "published": "2018-09-13T15:05:16+00:00",
  "extended_metadata": {
    "start": 52462,
    "size": 14575
  },
  "bbox": [
    -76.57139851406845,
    39.39485141810742,
    -76.08362813450836,
    39.72564001999409
  ],
  "features": [
    {
      "type": "Feature",
      "bbox": [
        -76.14459943195337,
        39.47754856857909,
        -76.08362813450836,
        39.518897143814925
      ],
      "geometry": null,
      "properties": {
        "start": 10240,
        "size": 327634,
        "name": "Majors Choice",
        "description": "Buildings from the majors choice subdivision",
        "features": 329
      }
    },
    {
      "type": "Feature",
      "bbox": [
        -76.14459943195337,
        39.518897143814925,
        -76.08362813450836,
        39.560245719050755
      ],
      "geometry": null,
      "properties": {
        "start": 337875,
        "size": 5200289,
        "name": "Aberdeen, MD",
        "description": "buildings within the city limits of Aberdeen, MD in Harford County",
        "features": 5826
      }
    }
  ]
}


--------------------------------------------------------------------------------
/spec/header_schema.json:
--------------------------------------------------------------------------------
{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://github.com/boundlessgeo/coj-spec/spec/header_schema.json",
  "type": "object",
  "title": "Cloud Optimized GeoJSON schema",
  "additionalProperties": true,
  "required": [
    "size",
    "features",
    "bbox",
    "collections"
  ],
  "properties": {
    "size": {
      "$id": "#/properties/size",
      "type": "integer",
      "title": "The Size Schema",
      "default": 0,
      "examples": [
        102429060
      ]
    },
    "features": {
      "$id": "#/properties/features",
      "type": "integer",
      "title": "The Features Schema",
      "default": 0,
      "examples": [
        124672
      ]
    },
    "name": {
      "$id": "#/properties/name",
      "type": "string",
      "title": "The Name Schema",
      "default": "",
      "examples": [
        "Buildings"
      ],
      "pattern": "^(.*)$"
    },
    "description": {
      "$id": "#/properties/description",
      "type": "string",
      "title": "The Description Schema",
      "default": "",
      "examples": [
        "Building data from Harford County, MD"
      ],
      "pattern": "^(.*)$"
    },
    "version": {
      "$id": "#/properties/version",
      "type": "string",
      "title": "The Version Schema",
      "default": "",
      "examples": [
        "1.0.0"
      ],
      "pattern": "^(.*)$"
    },
    "published": {
      "$id": "#/properties/published",
      "type": "string",
      "title": "The Published Schema",
      "default": "",
      "examples": [
        "2018-09-13T15:05:16+00:00"
      ],
      "pattern": "^(.*)$"
    },
    "extended_metadata": {
      "$id": "#/properties/extended_metadata",
      "type": "object",
      "title": "The Extended_metadata Schema",
      "default": null,
      "required": [
        "start",
        "size"
      ],
      "properties": {
        "start": {
          "$id": "#/properties/extended_metadata/properties/start",
          "type": "integer",
          "title": "The Start Schema",
          "default": 0,
          "examples": [
            52462
          ]
        },
        "size": {
          "$id": "#/properties/extended_metadata/properties/size",
          "type": "integer",
          "title": "The Size Schema",
          "default": 0,
          "examples": [
            14575
          ]
        }
      }
    },
    "bbox": {
      "$id": "#/properties/bbox",
      "type": "array",
      "title": "The Bbox Schema",
      "minItems": 4,
      "maxItems": 4,
      "additionalItems": false,
      "items": {
        "$id": "#/properties/bbox/items",
        "type": "number",
        "title": "The Items Schema",
        "default": 0.0,
        "examples": [
          -76.57139851406845,
          39.39485141810742,
          -76.08362813450836,
          39.72564001999409
        ]
      }
    },
    "collections": {
      "$id": "#/properties/collections",
      "type": "array",
      "title": "The Collections Schema",
      "items": {
        "$id": "#/properties/collections/items",
        "type": "object",
        "description": "a list of collections contained in this file",
        "title": "The Items Schema",
        "required": [
          "start",
          "size",
          "bbox",
          "features"
        ],
        "properties": {
          "start": {
            "$id": "#/properties/collections/items/properties/start",
            "type": "integer",
            "title": "The Start Schema",
            "default": 0,
            "description": "the byte in the file where the collection starts",
            "examples": [
              10240
            ]
          },
          "size": {
            "$id": "#/properties/collections/items/properties/size",
            "type": "integer",
            "description": "the size of the collection in bytes",
            "title": "The Size Schema",
            "default": 0,
            "examples": [
              327634
            ]
          },
          "bbox": {
            "$id": "#/properties/collections/items/properties/bbox",
            "type": "array",
            "title": "The Bbox Schema",
            "minItems": 4,
            "maxItems": 4,
            "additionalItems": false,
            "items": {
              "$id": "#/properties/collections/items/properties/bbox/items",
              "type": "number",
              "title": "The Items Schema",
              "description": "the WGS84 bounding box of the feature collection",
              "default": 0.0,
              "examples": [
                -76.14459943195337,
                39.47754856857909,
                -76.08362813450836,
                39.518897143814925
              ]
            }
          },
          "name": {
            "$id": "#/properties/collections/items/properties/name",
            "type": "string",
            "title": "The Name Schema",
            "default": "",
            "examples": [
              "Buildings"
            ],
            "pattern": "^(.*)$"
          },
          "description": {
            "$id": "#/properties/collections/items/properties/description",
            "type": "string",
            "title": "The Description Schema",
            "default": "",
            "examples": [
              "A subset of data from Harford County, MD -- between Main Street and Broadway"
            ],
            "pattern": "^(.*)$"
          },
          "features": {
            "$id": "#/properties/collections/items/properties/features",
            "type": "integer",
            "title": "The Features Schema",
            "description": "the number of features in the collection",
            "default": 0,
            "examples": [
              329
            ]
          }
        }
      }
    }
  }
}

--------------------------------------------------------------------------------
/spec/readme.md:
--------------------------------------------------------------------------------
# COGJ Spec

COGJ defines a header included before an unbounded list of GeoJSON FeatureCollections. Therefore this spec only defines the header and file structure -- not the FeatureCollection or sub structures of it. For information on those, please refer to the official [IETF doc](https://tools.ietf.org/html/rfc7946).


## COGJ Header

The following describes the structure and meaning of properties in the COGJ header. The header must start at byte 0 of the COGJ file and must not extend beyond byte 9999 (0 indexed). The header must be valid JSON and must adhere to the JSON schema defined [here](./header_schema.json).

COGJ headers are a GeoJSON FeatureCollection with extended properties.

### Header Fields

The header contains the following properties; those marked `no` in the Optional column are mandatory:

| Name | Optional | Type | Meaning |
|:------:|:-----------:|:------:|:--------------:|
| `size` | no | integer | the size in bytes of the entire file |
| `features` | no | integer | the total number of features contained in the entire file |
| `bbox` | no | array | the bounding box of the entire dataset as an array of 4 coordinates specified in decimal degrees (WGS84), ordered as left, bottom, right, top |
| `collections` | no | an array of `collection` objects | each object describes a FeatureCollection contained in this file |
| `name` | yes | string | a short name for this dataset |
| `description` | yes | string | a textual description of this dataset |
| `version` | yes | string | the version of this dataset, ideally compliant with [semantic versioning](https://semver.org/) |
| `published` | yes | string | the publication timestamp in [ISO8601 format](https://en.wikipedia.org/wiki/ISO_8601) |
| `extended_metadata` | yes | object | an object that points to a range of bytes, which allows this header to extend beyond the 10k byte size |


The `features` array contains GeoJSON features with the following properties:

| Name | Optional | Type | Meaning |
|:------:|:-----------:|:------:|:--------------:|
| `start` | no | integer | the first byte of the collection in the file |
| `size` | no | integer | the size of the collection in bytes |
| `bbox` | no | array | the bounding box of the collection as an array of 4 coordinates specified in decimal degrees (WGS84), ordered as left, bottom, right, top |
| `features` | no | integer | the number of features in this collection |
| `name` | yes | string | a short name for this collection |
| `description` | yes | string | a textual description of this collection |

The `extended_metadata` object contains the following properties:

| Name | Optional | Type | Meaning |
|:------:|:-----------:|:------:|:--------------:|
| `start` | no | integer | the first byte of the `extended_metadata` in the file |
| `size` | no | integer | the size of the `extended_metadata` in bytes |

The `geometry` member of each feature in the header must be null.

## Additional Properties

Both the top level `header` and the `collection` objects allow `additionalProperties` in the JSON schema definition. This means that arbitrary properties may be inserted to enrich the file or collection metadata.

## COGJ Extended header

When very large datasets have been divided into large numbers of collections, the header may overflow the 10k limit. In such cases, the `extended_metadata` object allows the publisher to specify an additional byte range containing the full metadata header.

The content of the byte range specified by the `extended_metadata` object should be valid JSON and adhere to the same header schema.


## Header Padding

As the header JSON will rarely be exactly 10,000 bytes, the remainder of the length should be padded using spaces (UTF8 `0x0020`) so that the size is exact.
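
A writer could implement the padding rule roughly as follows. This is a minimal sketch for Node.js, assuming a 10,000-byte header (bytes 0-9999, as stated in the header section); `padHeader` and `HEADER_SIZE` are hypothetical names, not part of the spec.

```javascript
// Assumption: the header occupies exactly bytes 0-9999 of the file.
const HEADER_SIZE = 10000;

// Serialise the header object and pad it with spaces (0x20) to HEADER_SIZE bytes.
function padHeader(header) {
  const json = Buffer.from(JSON.stringify(header), 'utf8');
  if (json.length > HEADER_SIZE) {
    // The spec points to extended_metadata for headers that do not fit.
    throw new RangeError('header exceeds ' + HEADER_SIZE + ' bytes');
  }
  const padding = Buffer.alloc(HEADER_SIZE - json.length, 0x20);
  return Buffer.concat([json, padding]);
}
```

Because trailing whitespace is legal after a JSON value, a reader can pass the padded bytes straight to `JSON.parse` without trimming.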

## Separating Feature Collections

Each `FeatureCollection` in the `collection` section should be prefixed with the record separator symbol (UTF8 `0x001E`) and terminated with the line feed symbol (UTF8 `0x000A`).

--------------------------------------------------------------------------------
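
The framing described under "Separating Feature Collections" could be produced along these lines. The sketch targets Node.js; `frameCollections` is a hypothetical helper, and it assumes that `start` and `size` describe only the JSON bytes, excluding the separator and the line feed, which the spec does not state explicitly.

```javascript
const RS = 0x1e; // record separator
const LF = 0x0a; // line feed

// Frame each FeatureCollection as RS + JSON + LF and record where the JSON lands.
// offset is the byte position in the file where the first frame begins.
function frameCollections(collections, offset) {
  const chunks = [];
  const index = [];
  let position = offset;
  for (const fc of collections) {
    const json = Buffer.from(JSON.stringify(fc), 'utf8');
    chunks.push(Buffer.from([RS]), json, Buffer.from([LF]));
    // Assumption: start/size cover the JSON only, not the RS or LF bytes.
    index.push({ start: position + 1, size: json.length });
    position += json.length + 2;
  }
  return { body: Buffer.concat(chunks), index };
}
```

The returned `index` entries could then be copied into the header metadata, and `body` appended after the padded header.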