├── README.md
└── geoparquet-overture.md

/README.md:
--------------------------------------------------------------------------------

# DuckDB and GeoParquet Tutorial

This is a quick tutorial on how you can use DuckDB to easily access the [cloud-native version](https://beta.source.coop/cholmes/google-open-buildings) of the [Google Open Buildings](https://sites.research.google/open-buildings/) dataset from [source.coop](https://beta.source.coop/) and transform it into your favorite GIS format. A big thanks to [Mark Litwintschik's post on DuckDB's Spatial Extension](https://tech.marksblogg.com/duckdb-gis-spatial-extension.html) for lots of the key information; it's highly recommended.

## About DuckDB

[DuckDB](https://duckdb.org/) is an awesome new tool for working with data. In some ways it's a next-generation 'SQLite' (which is behind GeoPackage in the geo world): it runs in-process with no server to manage, but it is fundamentally designed for analysis workflows, with a columnar engine built for fast scans and aggregations rather than transactional updates.

To install it just follow the instructions at https://duckdb.org/docs/installation/index. This tutorial uses the command-line version.

There are a couple of awesome extensions that make it very easy to work with Parquet files on the cloud. [httpfs](https://duckdb.org/docs/extensions/httpfs.html) enables you to pull in S3 files directly within DuckDB, and [spatial](https://duckdb.org/docs/extensions/spatial) gives you geometries with a number of operations and lets you write out to over 50 different formats.

## Setting up DuckDB

Once you've installed it, getting started is easy. Just type `duckdb` from the command line. If you want to persist the tables you create you can supply a name, like `duckdb buildings.db`, but it's not necessary for this tutorial.
After you're in the DuckDB interface you'll need to install and load the two extensions (you only need to install once, so you can skip that in the future):

```
INSTALL spatial;
LOAD spatial;
INSTALL httpfs;
LOAD httpfs;
```

The DuckDB docs say that you should set your S3 region, but it doesn't seem to be necessary. For this dataset it'd be:

```
SET s3_region='us-west-2';
```

## Full country into GeoParquet

We'll start with how to get a country-wide GeoParquet file from the [geoparquet-admin1](https://beta.source.coop/cholmes/google-open-buildings/browse/geoparquet-admin1) directory, which partitions the dataset into directories for each country and files for each admin level 1 region.

For this we don't actually even need the spatial extension - we'll just use DuckDB's great S3 selection interface to easily export to Parquet and then use another tool to turn it into official GeoParquet.

The following call selects all Parquet files from the [`country=SSD`](https://beta.source.coop/cholmes/google-open-buildings/browse/geoparquet-admin1/country=SSD) (South Sudan) directory:

```
COPY (SELECT * FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=SSD/*.parquet')
TO 'south_sudan.parquet' (FORMAT PARQUET);
```

This streams directly out to a local Parquet file. Below we'll explore how to save the data into DuckDB and work with it there. The output will not be GeoParquet, but a Parquet file with Well Known Binary (WKB) geometries. Hopefully DuckDB will support native GeoParquet output soon, so we won't need the conversion at the end. As long as you name the geometry column 'geometry', you can use the [gpq](https://github.com/planetlabs/gpq) `convert` command to turn the Parquet file with WKB into a valid GeoParquet file.
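Those WKB blobs are easy to sanity-check by hand if you're curious what's in them: the first five bytes encode the byte order and the geometry type. A stdlib-Python sketch (separate from the tutorial's toolchain, just for illustration):

```python
import struct

def wkb_header(wkb: bytes) -> str:
    """Decode the byte-order flag and geometry type from a WKB blob."""
    # Byte 0: 0 = big-endian, 1 = little-endian
    endian = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4: geometry type code (1=Point, 2=LineString, 3=Polygon, ...)
    (geom_type,) = struct.unpack(endian + "I", wkb[1:5])
    names = {1: "Point", 2: "LineString", 3: "Polygon",
             4: "MultiPoint", 5: "MultiLineString", 6: "MultiPolygon"}
    return names.get(geom_type, f"type {geom_type}")

# The building footprints in this dataset start with \x01\x03..., i.e.
# little-endian polygons:
print(wkb_header(b"\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00"))  # Polygon
```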
To set up gpq just use the [installation docs](https://github.com/planetlabs/gpq#installation). And when it's set up you can just say:

```
gpq convert south_sudan.parquet south_sudan-geo.parquet
```

The `south_sudan-geo.parquet` file should be valid GeoParquet. You can then change it into any other format using GDAL's [`ogr2ogr`](https://gdal.org/programs/ogr2ogr.html):

```
ogr2ogr south_sudan.fgb south_sudan-geo.parquet
```

Note that the above file is only about 3 megabytes. Larger countries can be hundreds of megabytes or even gigabytes, so be sure you have a fast connection or patience. DuckDB should give some updates on progress as it works, but it doesn't seem to be super accurate with remote files.

## Using DuckDB Spatial with GeoParquet

Now we'll get into working with DuckDB a bit more, mostly to transform the data into different output formats to start. We'll start small, which should work with most connections. But once we get to the bigger requests they may take awhile if you have a slower connection. DuckDB can still be useful, but you'll probably want to save the files as tables and persist them locally.

### Working with one file

We'll start with just working with a single parquet file.

#### Count one file

You can get the count of a file; just put in the S3 URL to the parquet file:

```
SELECT count(*) FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/Attapeu.parquet';
```

Results in:

```
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│        98454 │
└──────────────┘
```

#### Select all from one file

And you can see everything in it:

```
SELECT * FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/Attapeu.parquet';
```

Which should get you a response like:

```
┌────────────────┬────────────┬────────────────┬────────────────────────────────────────────────┬───────────┬─────────┬─────────┐
│ area_in_meters │ confidence │ full_plus_code │                    geometry                    │    id     │ country │ admin_1 │
│     double     │   double   │    varchar     │                      blob                      │   int64   │ varchar │ varchar │
├────────────────┼────────────┼────────────────┼────────────────────────────────────────────────┼───────────┼─────────┼─────────┤
│        86.0079 │     0.6857 │ 7P68VCWH+4962  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 632004767 │ LAO     │ Attapeu │
│        35.1024 │     0.6889 │ 7P68VCWH+4H65  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 632004768 │ LAO     │ Attapeu │
│        40.6071 │     0.6593 │ 7P68VCWH+53J5  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 632004769 │ LAO     │ Attapeu │
│          ·     │       ·    │       ·        │                      ·                         │     ·     │  ·      │    ·    │
│          ·     │       ·    │       ·        │                      ·                         │     ·     │  ·      │    ·    │
│          ·     │       ·    │       ·        │                      ·                         │     ·     │  ·      │    ·    │
│        59.2047 │     0.6885 │ 7P7976C9+J5X3  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 641629547 │ LAO     │ Attapeu │
│        13.8254 │     0.6183 │ 7P7976C9+M48G  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 641629548 │ LAO     │ Attapeu │
│       183.8289 │     0.7697 │ 7P7976H4+VHGV  │ \x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00…  │ 641629614 │ LAO     │ Attapeu │
├────────────────┴────────────┴────────────────┴────────────────────────────────────────────────┴───────────┴─────────┴─────────┤
│ 98454 rows (40 shown)                                                                                                7 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

#### Filter one file

From there you can easily filter, for example to only show the largest buildings:

```
SELECT * FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/Attapeu.parquet'
WHERE area_in_meters > 1000;
```

#### Get one file into your local DuckDB

If you've got a fast connection you can easily just keep running your SQL queries against the Parquet files that are sitting on S3. But you can also easily pull the data into a table and then work with it locally:

```
CREATE TABLE attapeu AS SELECT * EXCLUDE geometry, ST_GEOMFROMWKB(geometry) AS geometry FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/Attapeu.parquet';
```

This creates a true 'geometry' type from the Well Known Binary 'geometry' field, which can then be used in spatial operations. Note this also shows one of DuckDB's [friendlier SQL](https://duckdb.org/2022/05/04/friendlier-sql.html) additions with `EXCLUDE`.

#### Write out a DuckDB table

You can then write this out in common geospatial formats:

```
COPY (SELECT * EXCLUDE geometry, ST_AsWKB(geometry) AS geometry FROM attapeu)
TO 'attapeu-1.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```

The DuckDB output does not seem to consistently set the spatial reference system (hopefully someone will point out how to do this consistently, or it will be improved in the future).
You can clean this up with `ogr2ogr`:

```
ogr2ogr -a_srs EPSG:4326 attapeu.fgb attapeu-1.fgb
```

#### Directly streaming output

If your connection is fast you also don't have to instantiate the table in DuckDB; you can just do:

```
COPY (SELECT * EXCLUDE geometry, ST_AsWKB(ST_GEOMFROMWKB(geometry)) AS geometry FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/Attapeu.parquet')
TO 'attapeu-2.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```

This output also needs to be cleaned up with a projection using ogr2ogr.

### Working with a whole country

Using DuckDB with Parquet on S3 starts to really shine when you want to work with lots of data. There are lots of easy options to just download a file and then transform it. But bulk downloading from S3 and then getting the data formatted as you want can be a pain - configuring your S3 client, getting the names of everything to download, etc.

With DuckDB and these GeoParquet files you can just use various `*` patterns to select multiple files and treat them as a single one:

```
SELECT count(*) FROM 's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/*.parquet';
```

The above query gets you all the buildings in Laos. If your connection is quite fast you can do all these calls directly on the Parquet files. But for most it's easiest to load the data locally:

```
CREATE TABLE laos AS SELECT * EXCLUDE geometry, ST_GEOMFROMWKB(geometry) AS geometry FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=LAO/*.parquet';
```

From there it's easy to add more data.
Let's rename our table to `se_asia` and then pull down the data from Thailand as well:

```
ALTER TABLE laos RENAME TO se_asia;
INSERT INTO se_asia SELECT * EXCLUDE geometry, ST_GEOMFROMWKB(geometry) AS geometry FROM
's3://us-west-2.opendata.source.coop/google-research-open-buildings/geoparquet-admin1/country=THA/*.parquet';
```

This will take a bit longer, as Thailand has about ten times the number of buildings as Laos.

You can continue to add other countries to the `se_asia` table, and then write it out to a number of GIS formats just like above.

--------------------------------------------------------------------------------

/geoparquet-overture.md:
--------------------------------------------------------------------------------

## About

These are some quick notes on how to get proper GeoParquet files (and other formats) from the new [Overture Maps release](https://github.com/OvertureMaps/data).

I'm finding DuckDB to be a great tool for this, and love that the Overture team put up a couple of sample queries at https://github.com/OvertureMaps/data/tree/main/duckdb_queries

I tried using gpq and ogr2ogr directly. gpq had a problem with a byte array; hoping it's an easy fix at https://github.com/planetlabs/gpq/issues/57. ogr2ogr seems to have trouble with nested data structures, perhaps? I get:

```
Warning 1: Field names of unhandled type map<string, list<map<string, string>>> ('names') ignored
Warning 1: Field sources of unhandled type list<map<string, string>> ignored
ERROR 1: ICreateFeature: NULL geometry not supported with spatial index
ERROR 1: Unable to write feature 0 from layer 20230725_211237_00132_5p54t_25816df1-b864-49c0-a9a3-a13da4f37a90.
ERROR 1: Terminating translation prematurely after failed
translation of layer 20230725_211237_00132_5p54t_25816df1-b864-49c0-a9a3-a13da4f37a90 (use -skipfailures to skip errors)
```

So DuckDB with their sample queries seems to be ideal. But the readme just shows GeoJSON output. What about GeoParquet?

#### Countries to GeoParquet

That's pretty easy; we can adapt their example to do Parquet output:

```
LOAD spatial;
LOAD httpfs;
COPY (
    SELECT
        type,
        subType,
        localityType,
        adminLevel,
        isoCountryCodeAlpha2,
        JSON(names) AS names,
        JSON(sources) AS sources,
        geometry
    FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
    WHERE adminLevel = 2
    AND ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON','MULTIPOLYGON')
) TO 'countries-tmp.parquet'
WITH (FORMAT Parquet);
```

(Be sure to have spatial and httpfs installed.)

Then once you have your Parquet output, use gpq or ogr2ogr to finish the conversion:

```
gpq convert countries-tmp.parquet countries.parquet
```

#### Countries to GeoPackage

What about output to GeoPackage, Shapefile or FlatGeobuf?

```
LOAD spatial;
LOAD httpfs;
COPY (
    SELECT
        type,
        subType,
        localityType,
        adminLevel,
        isoCountryCodeAlpha2,
        JSON(names) AS names,
        JSON(sources) AS sources,
        ST_GeomFromWKB(geometry) AS geometry
    FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
    WHERE adminLevel = 2
    AND ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON','MULTIPOLYGON')
) TO 'countries.gpkg'
WITH (FORMAT GDAL, DRIVER 'GPKG');
```

##### Correct SRS

Note that DuckDB's spatial support does not yet write out the projection information. I've heard this is coming soon, but until then you need to convert the output using ogr2ogr:

```
ogr2ogr -a_srs EPSG:4326 countries-tmp.gpkg countries.gpkg
mv countries-tmp.gpkg countries.gpkg
```

##### Shapefile and FlatGeobuf

Note that GeoPackage output from DuckDB is particularly slow (like 3-4x slower than FlatGeobuf or Shapefile, and 10x slower than Parquet). To use another format just change the file suffix and the driver: Shapefile is `DRIVER 'ESRI Shapefile'` and FlatGeobuf is `DRIVER 'FlatGeobuf'`.

Note that Shapefile conversion will raise warnings like:

```
Warning 6: Normalized/laundered field name: 'localitytype' to 'localityty'
Warning 6: Normalized/laundered field name: 'isocountrycodealpha2' to 'isocountry'
Warning 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1. This warning will not be emitted anymore.
```

So if you have QGIS just use FlatGeobuf or GeoParquet, and if you use ESRI software, GeoPackage.

### Saving Data to DuckDB

Note that all the commands above pull the Parquet files from the cloud on every call. If you're going to put the data into different formats or want to explore it locally, then I recommend putting it into DuckDB.
You can do this in a temporary table by just typing `duckdb` and running the following commands. That will keep the data local while you're in DuckDB, but it'll go away when you close it. Most convenient is to just make a DuckDB database: you can pick any file name, but ending in .db or .duckdb makes it easy to remember what it is:

```
duckdb overture.duckdb
```

Then when you start DuckDB you create a table instead of writing it out immediately. If you want the exact fields you'll write out then say:

```
LOAD spatial;
LOAD httpfs;
CREATE TABLE countries AS (
    SELECT
        type,
        subType,
        localityType,
        adminLevel,
        isoCountryCodeAlpha2,
        JSON(names) AS names,
        JSON(sources) AS sources,
        ST_GeomFromWKB(geometry) AS geometry
    FROM read_parquet('s3://overturemaps-us-west-2/release/2023-07-26-alpha.0/theme=admins/type=*/*', filename=true, hive_partitioning=1)
    WHERE adminLevel = 2
    AND ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON','MULTIPOLYGON')
);
```

(Note you don't need to load spatial and httpfs for every call - just every time you open a DuckDB session. I put them in for copy-and-paste ease, as it doesn't hurt to have them, and the error messages when they aren't loaded are often a bit tricky to figure out.)

Then you can do normal SQL commands:

```
SELECT * FROM countries;
```

```
SELECT count(*) FROM countries;
```

Then if you want to write this out to a format you can do:

```
COPY (SELECT * FROM countries) TO 'countries.fgb'
WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```

(Remember you'll need to convert the output with ogr2ogr to get the SRS, as [above](#correct-srs).)

State (admin level 4) notes:

```
COPY (SELECT type, subType, localityType, adminLevel, isoCountryCodeAlpha2, JSON(names) AS names, JSON(sources) AS sources, ST_GeomFromWKB(geometry) AS geometry FROM locals WHERE adminLevel = 4 AND ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON','MULTIPOLYGON')) TO 'states.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```

### Extracting JSON

The JSON blobs aren't super useful as-is, and it can be a pain to do the JSON extraction in a geospatial program. So we can also use DuckDB to extract out key values.

To see the ISO code and then the name in the local language you can do:

```
SELECT isocountrycodealpha2, names->>'$.common[0].value', geometry FROM countries;
```

(I'm assuming that 'local' is always first, but that may be a bad assumption. It'd be better to look for the 'language' value to be 'local', but I'm not sure how to do that - suggestions welcome.)

### Buildings

(Just notes for myself here for now.)

After download: `create table buildings as (select * from read_parquet('202*'));` for the directory after an aws s3 cp or sync call (that has some other stuff in it), when duckdb is started in that directory.

```
COPY (SELECT * FROM buildings) TO 'buildings-tmp.fgb'
WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```

Write out just the values that aren't complicated (to parse these out better later):

```
COPY (SELECT id, updatetime, geometry FROM buildings) TO 'buildings-t.parquet' WITH (FORMAT PARQUET);
```

Then `gpq convert buildings-t.parquet buildings.parquet`

Buildings data structure:

```
CREATE TABLE buildings(
    id VARCHAR,
    updatetime VARCHAR,
    "version" INTEGER,
    "names" MAP(VARCHAR, MAP(VARCHAR, VARCHAR)[]),
    "level" INTEGER,
    height DOUBLE,
    numfloors INTEGER,
    "class" VARCHAR,
    sources MAP(VARCHAR, VARCHAR)[],
    bbox STRUCT(minx DOUBLE, maxx DOUBLE, miny DOUBLE, maxy DOUBLE),
    geometry BLOB);
```

Join to get the country:

```
ALTER TABLE buildings
ADD COLUMN country_iso VARCHAR;

UPDATE buildings
SET country_iso = countries.isocountrycodealpha2
FROM countries
WHERE ST_Intersects(countries.geometry, ST_GeomFromWKB(buildings.geometry));
```

Initial ORDER BY attempt:

```
COPY (SELECT * FROM buildings WHERE country_iso = 'US' ORDER BY ST_X(ST_Centroid(ST_GeomFromWKB(geometry))),
ST_Y(ST_Centroid(ST_GeomFromWKB(geometry)))) TO 'buildings-us.parquet'
```

Code to add quadkeys is at https://github.com/opengeos/open-buildings/blob/e0825d25423b50c89754bd5b0975118db2cf3c68/open_buildings/overture-buildings-parquet-add-columns.py#L46

Statement to make a view with a smaller quadkey:

```
CREATE VIEW buildings_l5 AS SELECT id, updatetime, "version", "names", "level", height, numfloors, "class", sources,
bbox, geometry, quadkey, SUBSTR(quadkey, 1, 5) AS quadkey5, country_iso FROM buildings;
```

The copy command I used is (used in a script, but just swap out for a specific
quadkey):

```
copy_cmd = f"COPY (SELECT * FROM buildings_l5 WHERE country_iso = 'US' AND quadkey5 = '{quad5}' ORDER BY quadkey) TO '{quad5}_temp.parquet' WITH (FORMAT PARQUET);"
```

### Getting data from source.coop

```
CREATE TABLE gg_park AS (SELECT ST_GeomFromWKB(geometry) AS geometry, JSON(names) AS names, JSON(height) AS height FROM
read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country/country_iso=US/*.parquet', hive_partitioning=1) WHERE
bbox.minX > -122.5103 AND
bbox.maxX < -122.4543 AND
bbox.minY > 37.7658 AND
bbox.maxY < 37.7715);
```

```
CREATE TABLE gg_park AS (SELECT * FROM
read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country/country_iso=US/20321.parquet', hive_partitioning=1) WHERE
bbox.minX > -122.5103 AND
bbox.maxX < -122.4543 AND
bbox.minY > 37.7658 AND
bbox.maxY < 37.7715);
```

```
SELECT count(*) FROM
read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country/country_iso=US/20321.parquet', hive_partitioning=1) WHERE
bbox.minX > -122.5103 AND
bbox.maxX < -122.4543 AND
bbox.minY > 37.7658 AND
bbox.maxY < 37.7715;
```

```
SELECT ST_GeomFromWKB(geometry) AS geometry, JSON(names) AS names, JSON(height) AS height FROM '*.parquet' WHERE
bbox.minX > -122.5103 AND
bbox.maxX < -122.4543 AND
bbox.minY > 37.7658 AND
bbox.maxY < 37.7715
```

```
SELECT * FROM
read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/country_iso=US/20321_20000.parquet', hive_partitioning=1) WHERE
bbox.minX > -122.5103 AND
bbox.maxX < -122.4543 AND
bbox.minY > 37.7658 AND
bbox.maxY < 37.7715;
```

```
SELECT * FROM
read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/20321_5k.parquet') WHERE
```

```
SELECT * FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/country_iso=US/*.parquet') WHERE quadkey LIKE '2032120231%';
```

```
SELECT * FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/country_iso=US/*.parquet')
WHERE quadkey LIKE '2032120231%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON((-122.5103 37.7658, -122.4543 37.7658, -122.4543 37.7715, -122.5103 37.7715, -122.5103 37.7658))'));
```

```
LOAD spatial;
SELECT * FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/*.parquet')
WHERE quadkey LIKE '030230%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-76.272307 45.616715, -76.110399 45.462413, -76.053333 45.643628, -75.965744 45.403742, -75.848958 45.456828, -75.943183 45.294619, -76.378476 45.321687, -76.124997 45.375782, -76.342644 45.494982, -76.320083 45.552629, -76.320083 45.552629, -76.272307 45.616715))'));
```

```
LOAD spatial;
SELECT * FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-country-2/*.parquet')
WHERE quadkey LIKE '12002332%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((9.446443 56.207488, 9.432342 56.178091, 9.535293 56.16731, 9.467509 56.188492, 9.543958 56.190004, 9.446443 56.207488))'));
```

```
LOAD spatial;
SELECT * FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-3/*/*.parquet')
WHERE quadkey LIKE '0213231%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-103.440312 44.12233, -103.319886 44.12233, -103.319886 44.045527, -103.440312 44.045527, -103.440312
44.12233))'));
```

```
LOAD spatial;
COPY (SELECT id, updatetime, ST_AsWKB(ST_GeomFromWKB(geometry)) AS geometry, country_iso FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-3/*.parquet')
WHERE quadkey LIKE '0213231%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-103.440312 44.12233, -103.319886 44.12233, -103.319886 44.045527, -103.440312 44.045527, -103.440312 44.12233))'))) TO 'buildings.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON');
```

```
COPY (SELECT id, country_iso, ST_AsWKB(ST_GeomFromWKB(geometry)) AS geometry FROM read_parquet('s3://us-west-2.opendata.source.coop/cholmes/overture/geoparquet-3/*.parquet')
WHERE quadkey LIKE '2103213003%' AND
ST_Within(ST_GeomFromWKB(geometry), ST_GeomFromText('POLYGON ((-58.53445 -34.64447, -58.570879 -34.660286, -58.597863 -34.615051, -58.514887 -34.599226, -58.542546 -34.627541, -58.542546 -34.627541, -58.53445 -34.64447))'))) TO 'buildings.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');
```
--------------------------------------------------------------------------------