PostgreSQL & PostGIS Cheatsheet
===============================
This is a collection of notes on PostgreSQL and PostGIS, covering what I tend to use most often.

## TOC
- [Installing Postgres & PostGIS](#installation)
- [Using Postgres on the command line: PSQL](#psql)
- [Importing Data into Postgres](#importing-data)
- [Exporting Data from Postgres](#exporting-data)
- [Joining Tables](#joining-tables-using-a-shared-key)
- [Upgrading Postgres](#upgrading-postgres)
- [PostGIS common commands](#postgis-1)
- [Common PostGIS spatial queries](#common-spatial-queries)
- [Spatial Indexing](#spatial-indexing)
- [Importing spatial data into PostGIS](#importing-spatial-data-to-postgis)
- [Exporting spatial data from PostGIS](#exporting-spatial-data-from-postgis)
- [Other Methods of Interacting With Postgres/PostGIS](#other-methods-of-interacting-with-postgrespostgis)

## Installation
### Postgres
- to install on Ubuntu do: `apt-get install postgresql`

- to install on Mac OS X first install [homebrew](http://brew.sh/) and then do `brew install postgresql`

- to install on Windows...

Note that for OS X and Ubuntu you may need to run the above commands as a super user / using `sudo`.

#### Set Up
On Ubuntu you typically need to log in as the postgres user and do some admin things:

- log in as postgres: `sudo -i -u postgres`
- create a new user: `createuser --interactive`
  - type the name of the new user (no spaces!), typically the same name as your non-root Linux user. You can add a new Linux user by doing `adduser username`.
  - typically you want the user to have super-user privileges, so type `y` when asked.
- create a new database that has the same name as the new user: `createdb username`

For Mac OS X you can skip the above if you install with homebrew.

For Windows....


#### Starting the Postgres Database
On Mac OS X:

- to start the Postgres server do: `postgres -D /usr/local/var/postgres`

- or do `pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start` to start and `pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log stop` to stop

- to have Postgres start every time you boot your Mac do: `ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents` (you may also need to load it with `launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`), then to check that it's working after booting do: `ps ax | grep sql`

### PostGIS
- On Ubuntu do `apt-get install postgis`

- On Mac OS X the easiest method is via homebrew: `brew install postgis`
(note that if you don't have Postgres or GDAL installed already it will automatically install these first).

- to install on Windows...

## psql
psql is the interactive command-line tool for working with Postgres/PostGIS.
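
For orientation, a minimal psql session might look something like the sketch below (the `nyc_noise` database name is just a placeholder): connect from the shell, check the server version, list the tables, inspect one table's columns, then quit.

```
$ psql -d nyc_noise
nyc_noise=# SELECT version();
nyc_noise=# \dt
nyc_noise=# \d table_name
nyc_noise=# \q
```
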

### Common Commands
- log in / connect to a database by doing `psql -d db_name`

- for doing admin-type things such as managing db users, connect to the `postgres` database: `psql postgres`

- to create a database: `CREATE DATABASE database_name;`

- to connect to a database: `\c database_name`

- to delete a database: `DROP DATABASE database_name;`

- to connect when starting psql use the `-d` flag like: `psql -d nyc_noise`

- to list all databases: `\l`

- to quit psql: `\q`

- to grant privileges to a user (requires logging in as `postgres`):

`GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;`

- to enable the hstore extension (for key/value pairs, useful when working with OpenStreetMap data) do: `CREATE EXTENSION hstore;`

- to view the columns of a table: `\d table_name`

- to list all columns in a table (helpful when you have a lot of columns!):
`select column_name from information_schema.columns where table_name = 'my_table' order by column_name asc;`

- to rename a column:
`alter table noise.hoods rename column noise_sqkm to complaints_sqkm;`

- to change a column's data type:
`alter table noise.hoods alter column noise_area type float;`

- to compute values from two columns and assign them to another column: `update noise.hoods set noise_area = noise/(area/1000);`

- to search by wildcard use the `like` (case sensitive) or `ilike` (case insensitive) keyword:
`SELECT count(*) from violations where inspection_date::text ilike '2014%';`

- to insert data into a table:

```
INSERT INTO table_name (column1, column2)
VALUES
(value1, value2);
```

- to insert data from another table:

```
INSERT INTO table_name (column1, column2)
SELECT column1, column2
FROM other_table_name;
```


- to remove rows using a where clause:
`DELETE FROM table_name WHERE some_column = some_value;`


- **list all column names from a table in alphabetical order:**

```
select column_name
from information_schema.columns
where table_schema = 'public'
and table_name = 'bk_pluto'
order by column_name;
```

- **List data from a column as a single row, comma separated:**
  1. `SELECT array_to_string( array( SELECT id FROM table ), ',' )`
  2. `SELECT string_agg(id::text, ',') FROM table`

- **rename an existing table:**
`ALTER TABLE table_name RENAME TO table_name_new;`

- **rename an existing column** of a table:
`ALTER TABLE table_name RENAME COLUMN column_name TO column_new_name;`

- **Find duplicate rows** in a table based on values from two fields:

```
select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY merchant_Id, url ORDER BY id asc) AS Row
  FROM Photos
) dups
where
dups.Row > 1
```
credit: [MatthewJ on Stack Overflow](http://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql)

- **Bulk Queries** are efficient when doing multiple inserts or updates of different values:

```
UPDATE election_results o
SET votes=n.votes, pro=n.pro
FROM (VALUES (1,11,9),
             (2,44,28),
             (3,25,4)
     ) n(county_id,votes,pro)
WHERE o.county_id = n.county_id;
```

```
INSERT INTO election_results (county_id,votes,pro)
VALUES (1,11,8),
       (12,21,10),
       (78,31,27);
```
```
WITH
-- write the new values
n(ip,visits,clicks) AS (
  VALUES ('192.168.1.1',2,12),
         ('192.168.1.2',6,18),
         ('192.168.1.3',3,4)
),
-- update existing rows
upsert AS (
  UPDATE page_views o
  SET visits=n.visits, clicks=n.clicks
  FROM n WHERE o.ip = n.ip
  RETURNING o.ip
)
-- insert missing rows
INSERT INTO page_views (ip,visits,clicks)
SELECT n.ip, n.visits, n.clicks FROM n
WHERE n.ip NOT IN (
  SELECT ip FROM upsert
);
```
### Importing Data
- import data from a CSV file using the COPY command:

```
COPY noise.locations (name, complaint, descript, boro, lat, lon)
FROM '/Users/chrislhenrick/tutorials/postgresql/data/noise.csv' WITH CSV HEADER;
```
- import a CSV file "as is" using csvkit's `csvsql` (requires Python, pip, csvkit, and psycopg2):

```
csvsql --db postgresql:///nyc_pluto --insert 2012_DHCR_Bldg.csv
```

### Exporting Data
- export data as a CSV with headers using COPY:

```
COPY dob_jobs_2014 TO '/Users/chrislhenrick/development/nyc_dob_jobs/data/2014/dob_jobs_2014.csv' DELIMITER ',' CSV HEADER;
```

- to standard output, without saving to a file:

```
COPY (SELECT foo FROM bar) TO STDOUT CSV HEADER;
```

- from the command line, without opening an interactive psql session:

```
psql -d dbname -t -A -F"," -c "select * from table_name" > output.csv
```


### Joining Tables Using a Shared Key
From CartoDB's tutorial [Join data from two tables using SQL](http://docs.cartodb.com/tutorials/joining_data.html)

- Join two tables that share a key using an `INNER JOIN` (PostgreSQL's default join type):

```
SELECT table_1.the_geom, table_1.iso_code, table_2.population
FROM table_1, table_2
WHERE table_1.iso_code = table_2.iso
```

- To update a table's data based on that of a join:

```
UPDATE table_1 as t1
SET population = (
  SELECT population
  FROM table_2
  WHERE iso = t1.iso_code
  LIMIT 1
)
```

- aggregate data on a join (if table 2 has multiple rows for a unique identifier):

```
SELECT
  table_1.the_geom,
  table_1.iso_code,
  SUM(table_2.total) as total
FROM table_1, table_2
WHERE table_1.iso_code = table_2.iso
GROUP BY table_1.the_geom, table_1.iso_code, table_2.iso
```
- update the value of a column based on the aggregate join:

```
UPDATE table_1 as t1
SET total = (
  SELECT SUM(total)
  FROM table_2
  WHERE iso = t1.iso_code
  GROUP BY iso
)
```

### Upgrading Postgres
[This tutorial](http://blog.55minutes.com/2013/09/postgresql-93-brew-upgrade/) was very helpful for upgrading on Mac OS X via homebrew.

**_WARNING:_** **Back up your data before doing this in case you screw up like I did!**

Basically the steps are:

1. Shut down PostgreSQL:
`launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`

2. Create a new PostgreSQL 9.x data directory:
`initdb /usr/local/var/postgres9.4 -E utf8`

3. Run the pg_upgrade command:

```
pg_upgrade \
  -d /usr/local/var/postgres \
  -D /usr/local/var/postgres9.4 \
  -b /usr/local/Cellar/postgresql/9.3.5_1/bin/ \
  -B /usr/local/Cellar/postgresql/9.4.0/bin/ \
  -v
```
4. Change kernel settings if necessary:

```
sudo sysctl -w kern.sysv.shmall=65536
sudo sysctl -w kern.sysv.shmmax=16777216
```
- I also ran `sudo vi /etc/sysctl.conf` and entered the same values:

```
kern.sysv.shmall=65536
kern.sysv.shmmax=16777216
```
- re-run the `pg_upgrade` command in step 3

5. Move the new data directory into place:

```
cd /usr/local/var
mv postgres postgres9.3
mv postgres9.4 postgres
```
6. Start the new version of PostgreSQL:
`launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`
- check to make sure it worked:

```
psql postgres -c "select version()"
psql -l
```

7. Cleanup:
- `vacuumdb --all --analyze-only`
- `analyze_new_cluster.sh`*
- `delete_old_cluster.sh`*
- `brew cleanup postgresql`

(* these scripts are generated in the same directory where `pg_upgrade` was run)


## PostGIS
PostGIS is the extension that adds geometry data types and GIS operations to Postgres.
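
Once the extension is enabled (see the commands just below), a quick way to confirm everything is wired up is to check the PostGIS version and build a throwaway geometry — a minimal sanity-check sketch, not tied to any particular table:

```
-- confirm PostGIS is installed and see its version
SELECT postgis_full_version();

-- build a point in WGS 84 (EPSG:4326) and print it as well-known text
SELECT ST_AsText(ST_SetSRID(ST_MakePoint(-73.98, 40.72), 4326));
```
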

### Common Commands

- to enable PostGIS in a Postgres database do: `CREATE EXTENSION postgis;`

- to enable PostGIS topology do: `CREATE EXTENSION postgis_topology;`

- to support OSM tags do: `CREATE EXTENSION hstore;`

- create a new table for data from a CSV that has lat and lon columns:

```
create table noise.locations
(
  name varchar(100),
  complaint varchar(100),
  descript varchar(100),
  boro varchar(50),
  lat float8,
  lon float8,
  geom geometry(POINT, 4326)
);
```

- populating the geometry column after loading data from a CSV:
`update noise.locations set geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326);`

- adding a geometry column to a non-spatial table:
`select AddGeometryColumn('table_name', 'geom', 4326, 'POINT', 2);`

- calculating area (in square meters) for data stored in EPSG:4326, by casting the geometry to geography:
`update noise.hoods set area = ST_Area(geom::geography);`


### Common Spatial Queries
You may view more of these in [my intro to Visualizing Geospatial Data with CartoDB](https://github.com/clhenrick/cartodb-tutorial/tree/master/sql).

**Find all polygons from dataset A that intersect points from dataset B:**

```
SELECT a.*
FROM table_a_polygons a, table_b_points b
WHERE ST_Intersects(a.the_geom, b.the_geom);
```

**Find all rows in a polygon dataset that intersect a given point:**

```
-- note: geometry for the point must be in the order lon, lat (x, y)
SELECT * FROM nyc_tenants_rights_service_areas
where
  ST_Intersects(
    ST_GeomFromText(
      'Point(-73.982557 40.724435)', 4326
    ),
    nyc_tenants_rights_service_areas.the_geom
  );
```

Or using `ST_Contains`:

```
SELECT * FROM nyc_tenants_rights_service_areas
where
  st_contains(
    nyc_tenants_rights_service_areas.the_geom,
    ST_GeomFromText(
      'Point(-73.917104 40.694827)', 4326
    )
  );
```

**Counting points inside a polygon:**

With `ST_Contains()`:

```
SELECT us_counties.the_geom_webmercator, us_counties.cartodb_id,
  count(quakes.the_geom) AS total
FROM us_counties JOIN quakes
ON st_contains(us_counties.the_geom, quakes.the_geom)
GROUP BY us_counties.cartodb_id;
```

To update a column from table A with the number of points from table B that intersect table A's polygons:

```
update noise.hoods set num_complaints = (
  select count(*)
  from noise.locations
  where
    ST_Intersects(
      noise.locations.geom,
      noise.hoods.geom
    )
);
```
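
Building on the snippet above, a hedged follow-up sketch that turns those raw counts into a complaints-per-square-kilometer rate (it assumes the `complaints_sqkm` column from the rename example earlier; the geometry is cast to `geography` so `ST_Area` returns square meters):

```
-- divide the point count by the polygon's area in km²
UPDATE noise.hoods
SET complaints_sqkm = num_complaints / (ST_Area(geom::geography) / 1000000.0);
```
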

**Select data within a bounding box**
Using [`ST_MakeEnvelope`](http://postgis.refractions.net/docs/ST_MakeEnvelope.html)

HINT: You can use [bboxfinder.com](http://bboxfinder.com/) to easily grab the coordinates
of a bounding box for a given area.

```
SELECT * FROM some_table
where geom && ST_MakeEnvelope(-73.913891, 40.873781, -73.907229, 40.878251, 4326)
```

**Make a line from a series of points**

```
SELECT ST_MakeLine(the_geom ORDER BY id ASC) AS the_geom, route
FROM points_table
GROUP BY route;
```

**Order points in a table by distance to a given lat lon**
This one uses CartoDB's built-in function `CDB_LatLng(lat, lon)`, which is shorthand for `ST_SetSRID(ST_MakePoint(lon, lat), 4326)`.

```
SELECT * FROM table
ORDER BY the_geom <->
CDB_LatLng(42.5, -73) LIMIT 10;
```

**Access the previous row of data and get the difference in a value (time, number, etc.)**

```
WITH calc_duration AS (
  SELECT
    cartodb_id,
    extract(epoch FROM (date_time - lag(date_time,1) OVER(ORDER BY date_time))) AS duration_in_seconds
  FROM tracking_eric
  ORDER BY date_time
)
UPDATE tracking_eric
SET duration_in_seconds = calc_duration.duration_in_seconds
FROM calc_duration
WHERE calc_duration.cartodb_id = tracking_eric.cartodb_id
```

**Select population density by county**

In this one we cast the geometry data type to the geography data type so that `ST_Area` returns square meters.

```
SELECT pop_sqkm,
  round( pop / (ST_Area(the_geom::geography)/1000000)) as psqkm
FROM us_counties
```


### Spatial Indexing
Makes queries hella fast. [OSGeo](http://revenant.ca/www/postgis/workshop/indexing.html) has a good tutorial.

- Basically the steps are:
`CREATE INDEX table_name_gix ON table_name USING GIST (geom);`
`VACUUM ANALYZE table_name;`
`CLUSTER table_name USING table_name_gix;`

**Do this every time after making changes to your dataset or importing new data.**

### Importing Spatial Data to PostGIS
#### Using shp2pgsql
1. Do:
`shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods > noise.sql`
Or, to use the geography data type, do:
`shp2pgsql -G -I nyc-pediacities-hoods-v3-edit.shp noise.nyc-pediacities-hoods-v3-edit_geographic > nyc_pediacities-hoods-v3-edit.sql`

2. Do:
`psql -d nyc_noise -f noise.sql`
Or for the geography type above:
`psql -d nyc_noise -f nyc_pediacities-hoods-v3-edit.sql`

#### Using osm2pgsql
To import an OpenStreetMap extract in PBF format do:
`osm2pgsql -H localhost --hstore-all -d nyc_from_osm ~/Downloads/newyorkcity.osm.pbf`

#### Using ogr2ogr
Example importing a GeoJSON file into a database called nyc_pluto:

```
ogr2ogr -f PostgreSQL \
PG:"host='localhost' user='chrislhenrick' port='5432' \
dbname='nyc_pluto' password=''" \
bk_map_pluto_4326.json -nln bk_pluto
```
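
After any of the imports above, a quick hedged sanity check is to confirm the feature count, geometry type, and SRID came through as expected — here using the `noise.hoods` table from the shp2pgsql example as a stand-in, and assuming its geometry column is named `geom`:

```
-- how many features loaded, what geometry type are they, and what SRID do they carry?
SELECT count(*), GeometryType(geom), ST_SRID(geom)
FROM noise.hoods
GROUP BY GeometryType(geom), ST_SRID(geom);
```
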

### Exporting Spatial Data from PostGIS
The two main tools for exporting spatial data with geometries more complex than points from Postgres/PostGIS are `pgsql2shp` and `ogr2ogr`.

#### Using pgsql2shp
`pgsql2shp` is a tool that comes installed with PostGIS and exports data from a PostGIS database to a shapefile. To use it you specify a file path for the output shapefile (just stating the basename with no extension will output to the current working directory), a host name (usually this is `localhost`), a user name, a password for the user, a database name, and an SQL query:

```
pgsql2shp -f <output_file> -h <host> -u <user> -P <password> <database_name> "<query>"
```

A sample export of a shapefile called `my_data` from a database called `my_db` looks like this:

```
pgsql2shp -f my_data -h localhost -u clhenrick -P 'mypassword' my_db "SELECT * FROM my_data"
```

#### Using ogr2ogr
**Note:** You may need to set the `GDAL_DATA` path if you get this error:

```
ERROR 4: Unable to open EPSG support file gcs.csv.
Try setting the GDAL_DATA environment variable to point to the
directory containing EPSG csv files.
```
If on Linux / Mac OS do this: `export GDAL_DATA=/usr/local/share/gdal`
If on Windows do this: `C:\> set GDAL_DATA=C:\GDAL\data`

**To Export Data**
Use ogr2ogr as follows to export a table (in this case a table called `dob_jobs_2014`) to a GeoJSON file (in this case a file called `dob_jobs_2014_geocoded.geojson`):

```
ogr2ogr -f GeoJSON -t_srs EPSG:4326 dob_jobs_2014_geocoded.geojson \
PG:"host='localhost' dbname='dob_jobs' user='chrislhenrick' password='' port='5432'" \
-sql "SELECT bbl, house, streetname, borough, jobtype, jobstatus, existheight, proposedheight, \
existoccupancy, proposedoccupany, horizontalenlrgmt, verticalenlrgmt, ownerbusinessname, \
ownerhousestreet, ownercitystatezip, ownerphone, jobdescription, geom \
FROM dob_jobs_2014 WHERE geom IS NOT NULL"
```

- **note:** you must select the column containing the geometry (usually `geom` or `wkb_geometry`) for your exported layer to have geometry data.

## Other Methods of Interacting With Postgres/PostGIS
to do...
### PGAdmin

### Python

### Node JS