PostgreSQL & PostGIS Cheatsheet
===============================
This is a collection of notes on PostgreSQL and PostGIS, covering what I tend to use most often.

## TOC
- [Installing Postgres & PostGIS](#installation)
- [Using Postgres on the command line: PSQL](#psql)
- [Importing Data into Postgres](#importing-data)
- [Exporting Data from Postgres](#exporting-data)
- [Joining Tables](#joining-tables-using-a-shared-key)
- [Upgrading Postgres](#upgrading-postgres)
- [PostGIS common commands](#postgis-1)
- [Common PostGIS spatial queries](#common-spatial-queries)
- [Spatial Indexing](#spatial-indexing)
- [Importing spatial data into PostGIS](#importing-spatial-data-to-postgis)
- [Exporting spatial data from PostGIS](#exporting-spatial-data-from-postgis)
- [Other Methods of Interacting With Postgres/PostGIS](#other-methods-of-interacting-with-postgrespostgis)

## Installation
### Postgres
- to install on Ubuntu do: `apt-get install postgresql`

- to install on Mac OS X first install [homebrew](http://brew.sh/) and then do `brew install postgresql`

- to install on Windows...

Note that for OS X and Ubuntu you may need to run the above commands as a super user / using `sudo`.

#### Set Up
On Ubuntu you typically need to log in as the postgres user and do some admin things:

- log in as postgres: `sudo -i -u postgres`
- create a new user: `createuser --interactive`
  - type the name of the new user (no spaces!), typically the same name as your non-root Linux user. You can add a new Linux user by doing `adduser username`.
  - typically you want the user to have super-user privileges, so type `y` when asked.
- create a new database that has the same name as the new user: `createdb username`

For Mac OS X you can skip the above if you install with homebrew.

For Windows....


#### Starting the Postgres Database
On Mac OS X:

- to start the Postgres server do: `postgres -D /usr/local/var/postgres`

- or do `pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start` to start and `pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log stop` to stop

- to have Postgres start every time you boot your Mac do: `ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents` (you may also need to load it with `launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`), then to check that it's working after booting do: `ps ax | grep sql`

### PostGIS
- On Ubuntu do `apt-get install postgis`

- On Mac OS X the easiest method is via homebrew: `brew install postgis`
(note that if you don't have Postgres or GDAL installed already it will automatically install these first).

- to install on Windows...

## psql
psql is the interactive command-line tool for working with Postgres/PostGIS.
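
For orientation, a minimal psql session might look something like the sketch below (the `nyc_noise` database name is just a placeholder): connect from the shell, check the server version, list the tables, inspect one table's columns, then quit.

```
$ psql -d nyc_noise
nyc_noise=# SELECT version();
nyc_noise=# \dt
nyc_noise=# \d table_name
nyc_noise=# \q
```
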

### Common Commands
- log in / connect to a database by doing `psql -d db_name`

- for doing admin-type things such as managing db users, connect to the `postgres` database: `psql postgres`

- to create a database: `CREATE DATABASE database_name;`

- to connect to a database: `\c database_name`

- to delete a database: `DROP DATABASE database_name;`

- to connect when starting psql use the `-d` flag like: `psql -d nyc_noise`

- to list all databases: `\l`

- to quit psql: `\q`

- to grant privileges to a user (requires logging in as `postgres`):

`GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;`

- to enable the hstore extension (for key/value pairs, useful when working with OpenStreetMap data) do: `CREATE EXTENSION hstore;`

- to view the columns of a table: `\d table_name`

- to list all columns in a table (helpful when you have a lot of columns!):
`select column_name from information_schema.columns where table_name = 'my_table' order by column_name asc;`

- to rename a column:
`alter table noise.hoods rename column noise_sqkm to complaints_sqkm;`

- to change a column's data type:
`alter table noise.hoods alter column noise_area type float;`

- to compute values from two columns and assign them to another column: `update noise.hoods set noise_area = noise/(area/1000);`

- to search by wildcard use the `like` (case sensitive) or `ilike` (case insensitive) keyword:
`SELECT count(*) from violations where inspection_date::text ilike '2014%';`

- to insert data into a table:

```
INSERT INTO table_name (column1, column2)
VALUES
(value1, value2);
```

- to insert data from another table:

```
INSERT INTO table_name (column1, column2)
SELECT column1, column2
FROM other_table_name;
```


- to remove rows using a where clause:
`DELETE FROM table_name WHERE some_column = some_value;`


- **list all column names from a table in alphabetical order:**

```
select column_name
from information_schema.columns
where table_schema = 'public'
and table_name = 'bk_pluto'
order by column_name;
```

- **List data from a column as a single row, comma separated:**
  1. `SELECT array_to_string( array( SELECT id FROM table ), ',' )`
  2. `SELECT string_agg(id::text, ',') FROM table`

- **rename an existing table:**
`ALTER TABLE table_name RENAME TO table_name_new;`

- **rename an existing column** of a table:
`ALTER TABLE table_name RENAME COLUMN column_name TO column_new_name;`

- **Find duplicate rows** in a table based on values from two fields:

```
select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY merchant_Id, url ORDER BY id asc) AS Row
  FROM Photos
) dups
where
dups.Row > 1
```
credit: [MatthewJ on Stack Overflow](http://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql)

- **Bulk Queries** are efficient when doing multiple inserts or updates of different values:

```
UPDATE election_results o
SET votes=n.votes, pro=n.pro
FROM (VALUES (1,11,9),
             (2,44,28),
             (3,25,4)
     ) n(county_id,votes,pro)
WHERE o.county_id = n.county_id;
```

```
INSERT INTO election_results (county_id,votes,pro)
VALUES (1,11,8),
       (12,21,10),
       (78,31,27);
```
```
WITH
-- write the new values
n(ip,visits,clicks) AS (
  VALUES ('192.168.1.1',2,12),
         ('192.168.1.2',6,18),
         ('192.168.1.3',3,4)
),
-- update existing rows
upsert AS (
  UPDATE page_views o
  SET visits=n.visits, clicks=n.clicks
  FROM n WHERE o.ip = n.ip
  RETURNING o.ip
)
-- insert missing rows
INSERT INTO page_views (ip,visits,clicks)
SELECT n.ip, n.visits, n.clicks FROM n
WHERE n.ip NOT IN (
  SELECT ip FROM upsert
);
```
### Importing Data
- import data from a CSV file using the COPY command:

```
COPY noise.locations (name, complaint, descript, boro, lat, lon)
FROM '/Users/chrislhenrick/tutorials/postgresql/data/noise.csv' WITH CSV HEADER;
```
- import a CSV file "as is" using csvkit's `csvsql` (requires Python, pip, csvkit, and psycopg2):

```
csvsql --db postgresql:///nyc_pluto --insert 2012_DHCR_Bldg.csv
```

### Exporting Data
- export data as a CSV with headers using COPY:

```
COPY dob_jobs_2014 TO '/Users/chrislhenrick/development/nyc_dob_jobs/data/2014/dob_jobs_2014.csv' DELIMITER ',' CSV HEADER;
```

- to standard output, without saving to a file:

```
COPY (SELECT foo FROM bar) TO STDOUT CSV HEADER;
```

- from the command line, without opening an interactive psql session:

```
psql -d dbname -t -A -F"," -c "select * from table_name" > output.csv
```


### Joining Tables Using a Shared Key
From CartoDB's tutorial [Join data from two tables using SQL](http://docs.cartodb.com/tutorials/joining_data.html)

- Join two tables that share a key using an `INNER JOIN` (PostgreSQL's default join type):

```
SELECT table_1.the_geom, table_1.iso_code, table_2.population
FROM table_1, table_2
WHERE table_1.iso_code = table_2.iso
```

- To update a table's data based on that of a join:

```
UPDATE table_1 as t1
SET population = (
  SELECT population
  FROM table_2
  WHERE iso = t1.iso_code
  LIMIT 1
)
```

- aggregate data on a join (if table 2 has multiple rows for a unique identifier):

```
SELECT
  table_1.the_geom,
  table_1.iso_code,
  SUM(table_2.total) as total
FROM table_1, table_2
WHERE table_1.iso_code = table_2.iso
GROUP BY table_1.the_geom, table_1.iso_code, table_2.iso
```
- update the value of a column based on the aggregate join:

```
UPDATE table_1 as t1
SET total = (
  SELECT SUM(total)
  FROM table_2
  WHERE iso = t1.iso_code
  GROUP BY iso
)
```

### Upgrading Postgres
[This tutorial](http://blog.55minutes.com/2013/09/postgresql-93-brew-upgrade/) was very helpful for upgrading on Mac OS X via homebrew.

**_WARNING:_** **Back up your data before doing this in case you screw up like I did!**

Basically the steps are:

1. Shut down PostgreSQL:
`launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`

2. Create a new PostgreSQL 9.x data directory:
`initdb /usr/local/var/postgres9.4 -E utf8`

3. Run the pg_upgrade command:

```
pg_upgrade \
  -d /usr/local/var/postgres \
  -D /usr/local/var/postgres9.4 \
  -b /usr/local/Cellar/postgresql/9.3.5_1/bin/ \
  -B /usr/local/Cellar/postgresql/9.4.0/bin/ \
  -v
```
4. Change kernel settings if necessary:

```
sudo sysctl -w kern.sysv.shmall=65536
sudo sysctl -w kern.sysv.shmmax=16777216
```
- I also ran `sudo vi /etc/sysctl.conf` and entered the same values:

```
kern.sysv.shmall=65536
kern.sysv.shmmax=16777216
```
- re-run the `pg_upgrade` command in step 3

5. Move the new data directory into place:

```
cd /usr/local/var
mv postgres postgres9.3
mv postgres9.4 postgres
```
6. Start the new version of PostgreSQL:
`launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist`
- check to make sure it worked:

```
psql postgres -c "select version()"
psql -l
```

7. Cleanup:
- `vacuumdb --all --analyze-only`
- `analyze_new_cluster.sh`*
- `delete_old_cluster.sh`*
- `brew cleanup postgresql`

(* these scripts are generated in the same directory where `pg_upgrade` was run)


## PostGIS
PostGIS is the extension that adds geometry data types and GIS operations to Postgres.
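
Once the extension is enabled (see the commands just below), a quick way to confirm everything is wired up is to check the PostGIS version and build a throwaway geometry — a minimal sanity-check sketch, not tied to any particular table:

```
-- confirm PostGIS is installed and see its version
SELECT postgis_full_version();

-- build a point in WGS 84 (EPSG:4326) and print it as well-known text
SELECT ST_AsText(ST_SetSRID(ST_MakePoint(-73.98, 40.72), 4326));
```
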

### Common Commands

- to enable PostGIS in a Postgres database do: `CREATE EXTENSION postgis;`

- to enable PostGIS topology do: `CREATE EXTENSION postgis_topology;`

- to support OSM tags do: `CREATE EXTENSION hstore;`

- create a new table for data from a CSV that has lat and lon columns:

```
create table noise.locations
(
  name varchar(100),
  complaint varchar(100),
  descript varchar(100),
  boro varchar(50),
  lat float8,
  lon float8,
  geom geometry(POINT, 4326)
);
```

- populating the geometry column after loading data from a CSV:
`update noise.locations set geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326);`

- adding a geometry column to a non-spatial table:
`select AddGeometryColumn('table_name', 'geom', 4326, 'POINT', 2);`

- calculating area (in square meters) for data stored in EPSG:4326, by casting the geometry to geography:
`update noise.hoods set area = ST_Area(geom::geography);`


### Common Spatial Queries
You may view more of these in [my intro to Visualizing Geospatial Data with CartoDB](https://github.com/clhenrick/cartodb-tutorial/tree/master/sql).

**Find all polygons from dataset A that intersect points from dataset B:**

```
SELECT a.*
FROM table_a_polygons a, table_b_points b
WHERE ST_Intersects(a.the_geom, b.the_geom);
```

**Find all rows in a polygon dataset that intersect a given point:**

```
-- note: geometry for the point must be in the order lon, lat (x, y)
SELECT * FROM nyc_tenants_rights_service_areas
where
  ST_Intersects(
    ST_GeomFromText(
      'Point(-73.982557 40.724435)', 4326
    ),
    nyc_tenants_rights_service_areas.the_geom
  );
```

Or using `ST_Contains`:

```
SELECT * FROM nyc_tenants_rights_service_areas
where
  st_contains(
    nyc_tenants_rights_service_areas.the_geom,
    ST_GeomFromText(
      'Point(-73.917104 40.694827)', 4326
    )
  );
```

**Counting points inside a polygon:**

With `ST_Contains()`:

```
SELECT us_counties.the_geom_webmercator, us_counties.cartodb_id,
  count(quakes.the_geom) AS total
FROM us_counties JOIN quakes
ON st_contains(us_counties.the_geom, quakes.the_geom)
GROUP BY us_counties.cartodb_id;
```

To update a column from table A with the number of points from table B that intersect table A's polygons:

```
update noise.hoods set num_complaints = (
  select count(*)
  from noise.locations
  where
    ST_Intersects(
      noise.locations.geom,
      noise.hoods.geom
    )
);
```
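
Building on the snippet above, a hedged follow-up sketch that turns those raw counts into a complaints-per-square-kilometer rate (it assumes the `complaints_sqkm` column from the rename example earlier; the geometry is cast to `geography` so `ST_Area` returns square meters):

```
-- divide the point count by the polygon's area in km²
UPDATE noise.hoods
SET complaints_sqkm = num_complaints / (ST_Area(geom::geography) / 1000000.0);
```
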

**Select data within a bounding box**
Using [`ST_MakeEnvelope`](http://postgis.refractions.net/docs/ST_MakeEnvelope.html)

HINT: You can use [bboxfinder.com](http://bboxfinder.com/) to easily grab the coordinates
of a bounding box for a given area.

```
SELECT * FROM some_table
where geom && ST_MakeEnvelope(-73.913891, 40.873781, -73.907229, 40.878251, 4326)
```

**Make a line from a series of points**

```
SELECT ST_MakeLine(the_geom ORDER BY id ASC) AS the_geom, route
FROM points_table
GROUP BY route;
```

**Order points in a table by distance to a given lat lon**
This one uses CartoDB's built-in function `CDB_LatLng(lat, lon)`, which is shorthand for `ST_SetSRID(ST_MakePoint(lon, lat), 4326)`.

```
SELECT * FROM table
ORDER BY the_geom <->
CDB_LatLng(42.5, -73) LIMIT 10;
```

**Access the previous row of data and get the difference in a value (time, number, etc.)**

```
WITH calc_duration AS (
  SELECT
    cartodb_id,
    extract(epoch FROM (date_time - lag(date_time,1) OVER(ORDER BY date_time))) AS duration_in_seconds
  FROM tracking_eric
  ORDER BY date_time
)
UPDATE tracking_eric
SET duration_in_seconds = calc_duration.duration_in_seconds
FROM calc_duration
WHERE calc_duration.cartodb_id = tracking_eric.cartodb_id
```

**Select population density by county**

In this one we cast the geometry data type to the geography data type so that `ST_Area` returns square meters.

```
SELECT pop_sqkm,
  round( pop / (ST_Area(the_geom::geography)/1000000)) as psqkm
FROM us_counties
```


### Spatial Indexing
Makes queries hella fast. [OSGeo](http://revenant.ca/www/postgis/workshop/indexing.html) has a good tutorial.

- Basically the steps are:
`CREATE INDEX table_name_gix ON table_name USING GIST (geom);`
`VACUUM ANALYZE table_name;`
`CLUSTER table_name USING table_name_gix;`

**Do this every time after making changes to your dataset or importing new data.**

### Importing Spatial Data to PostGIS
#### Using shp2pgsql
1. Do:
`shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods > noise.sql`
Or, to use the geography data type, do:
`shp2pgsql -G -I nyc-pediacities-hoods-v3-edit.shp noise.nyc-pediacities-hoods-v3-edit_geographic > nyc_pediacities-hoods-v3-edit.sql`

2. Do:
`psql -d nyc_noise -f noise.sql`
Or for the geography type above:
`psql -d nyc_noise -f nyc_pediacities-hoods-v3-edit.sql`

#### Using osm2pgsql
To import an OpenStreetMap extract in PBF format do:
`osm2pgsql -H localhost --hstore-all -d nyc_from_osm ~/Downloads/newyorkcity.osm.pbf`

#### Using ogr2ogr
Example importing a GeoJSON file into a database called nyc_pluto:

```
ogr2ogr -f PostgreSQL \
PG:"host='localhost' user='chrislhenrick' port='5432' \
dbname='nyc_pluto' password=''" \
bk_map_pluto_4326.json -nln bk_pluto
```
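
After any of the imports above, a quick hedged sanity check is to confirm the feature count, geometry type, and SRID came through as expected — here using the `noise.hoods` table from the shp2pgsql example as a stand-in, and assuming its geometry column is named `geom`:

```
-- how many features loaded, what geometry type are they, and what SRID do they carry?
SELECT count(*), GeometryType(geom), ST_SRID(geom)
FROM noise.hoods
GROUP BY GeometryType(geom), ST_SRID(geom);
```
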

### Exporting Spatial Data from PostGIS
The two main tools for exporting spatial data with geometries more complex than points from Postgres/PostGIS are `pgsql2shp` and `ogr2ogr`.

#### Using pgsql2shp
`pgsql2shp` is a tool that comes installed with PostGIS and exports data from a PostGIS database to a shapefile. To use it you specify a file path for the output shapefile (just stating the basename with no extension will output to the current working directory), a host name (usually this is `localhost`), a user name, a password for the user, a database name, and an SQL query:

```
pgsql2shp -f <output_file> -h <host> -u <user> -P <password> <database_name> "<query>"
```

A sample export of a shapefile called `my_data` from a database called `my_db` looks like this:

```
pgsql2shp -f my_data -h localhost -u clhenrick -P 'mypassword' my_db "SELECT * FROM my_data"
```

#### Using ogr2ogr
**Note:** You may need to set the `GDAL_DATA` path if you get this error:

```
ERROR 4: Unable to open EPSG support file gcs.csv.
Try setting the GDAL_DATA environment variable to point to the
directory containing EPSG csv files.
```
If on Linux / Mac OS do this: `export GDAL_DATA=/usr/local/share/gdal`
If on Windows do this: `C:\> set GDAL_DATA=C:\GDAL\data`

**To Export Data**
Use ogr2ogr as follows to export a table (in this case a table called `dob_jobs_2014`) to a GeoJSON file (in this case a file called `dob_jobs_2014_geocoded.geojson`):

```
ogr2ogr -f GeoJSON -t_srs EPSG:4326 dob_jobs_2014_geocoded.geojson \
PG:"host='localhost' dbname='dob_jobs' user='chrislhenrick' password='' port='5432'" \
-sql "SELECT bbl, house, streetname, borough, jobtype, jobstatus, existheight, proposedheight, \
existoccupancy, proposedoccupany, horizontalenlrgmt, verticalenlrgmt, ownerbusinessname, \
ownerhousestreet, ownercitystatezip, ownerphone, jobdescription, geom \
FROM dob_jobs_2014 WHERE geom IS NOT NULL"
```

- **note:** you must select the column containing the geometry (usually `geom` or `wkb_geometry`) for your exported layer to have geometry data.

## Other Methods of Interacting With Postgres/PostGIS
to do...
### PGAdmin

### Python

### Node JS