├── .gitignore ├── Glossary.md ├── README.md ├── data ├── census-tracts │ ├── census-tracts.cpg │ ├── census-tracts.dbf │ ├── census-tracts.prj │ ├── census-tracts.shp │ └── census-tracts.shx ├── school-districts │ ├── school-districts.cpg │ ├── school-districts.dbf │ ├── school-districts.prj │ ├── school-districts.qpj │ ├── school-districts.shp │ └── school-districts.shx └── schools │ ├── data-dictionary.pdf │ ├── schools.cpg │ ├── schools.dbf │ ├── schools.prj │ ├── schools.qpj │ ├── schools.shp │ └── schools.shx └── images ├── st_contains_false.png └── st_contains_true.png /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | *.sql 3 | -------------------------------------------------------------------------------- /Glossary.md: -------------------------------------------------------------------------------- 1 | # Glossary 2 | 3 | * **Spatial reference system** (SRS) or **Coordinate Reference System** (CRS): coordinate based system used to locate geographical features. Can be global, regional, or local level. This will define the map projection. 4 | * **SRID** (spatial reference identifier): an identifier associated with a spatial reference system. 5 | * **EPSG** (European Petroleum Survey Group): creators of EPSG Geodetic Parameter Set, which is a database of coordinate systems. 6 | * **WGS** (World Geodetic System): standard coordinate system for Earth. The latest revision is WGS 1984 (WGS 84). 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Using SQL for GIS analysis with PostGIS 2 | 3 | ## Contents 4 | 1. [Importing shapefiles into PostGIS](https://github.com/arianagiorgi/postgis-intro#importing-shapefiles-into-postgis) 5 | 2. [Viewing tables in QGIS](https://github.com/arianagiorgi/postgis-intro#viewing-tables-in-qgis) 6 | 3. [Spatial queries](https://github.com/arianagiorgi/postgis-intro#spatial-queries) 7 | 4. [Exporting query results](https://github.com/arianagiorgi/postgis-intro#exporting-query-results) 8 | 9 | 10 | ## Requirements 11 | * PostgreSQL 12 | * PostGIS 13 | * Postico 14 | * QGIS 3 15 | * ogr2ogr (for exporting, if applicable) 16 | 17 | ## Data included 18 | * [Texas schools](https://schoolsdata2-tea-texas.opendata.arcgis.com/datasets/059432fd0dcb4a208974c235e837c94f_0) from Texas Education Agency 19 | * [Texas school districts](http://schoolsdata2-tea-texas.opendata.arcgis.com/datasets/e115fed14c0f4ca5b942dc3323626b1c_0) from Texas Education Agency 20 | * Census Tracts: [Shapefiles](https://www.census.gov/geo/maps-data/data/tiger-line.html) and [ACS data](https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml) from Census Bureau 21 | 22 | 23 | ## Importing shapefiles into PostGIS 24 | 25 | 1. Create database and enable PostGIS in your database. From the terminal, run: 26 | ``` 27 | $ createdb postgis_intro 28 | $ psql --dbname postgis_intro --command "CREATE EXTENSION postgis;" 29 | ``` 30 | 31 | 2. Determine your project's SRID (see our glossary). 32 | 33 | The most popular SRID is usually EPSG:4326 (WGS 84). You can also look up your shapefile's SRID [here](http://prj2epsg.org/search) by uploading your `.prj` file (which is where the projection information lives). 34 | 35 | Our data files are all EPSG:4326. 36 | 37 | 3. Use `shp2pgsql` command to write a SQL file that will contain table and data information for PostgreSQL to upload: 38 | ``` 39 | $ shp2pgsql -s -I .shp \ 40 | > .sql 41 | ``` 42 | So our commands to upload all tables will look like: 43 | ``` 44 | $ shp2pgsql -s 4326 -I data/schools/schools.shp \ 45 | schools > schools.sql 46 | ``` 47 | ``` 48 | $ shp2pgsql -s 4326 -I data/school-districts/school-districts.shp \ 49 | districts > school-districts.sql 50 | ``` 51 | ``` 52 | $ shp2pgsql -s 4326 -I data/census-tracts/census-tracts.shp \ 53 | tracts > census-tracts.sql 54 | ``` 55 | Then we'll upload the sql file to the database we created: 56 | ``` 57 | $ psql --dbname --file .sql 58 | ``` 59 | Our commands will look like: 60 | ``` 61 | $ psql --dbname postgis_intro --file schools.sql 62 | ``` 63 | ``` 64 | $ psql --dbname postgis_intro --file school-districts.sql 65 | ``` 66 | ``` 67 | $ psql --dbname postgis_intro --file census-tracts.sql 68 | ``` 69 | 70 | ## Viewing tables in QGIS 71 | 72 | 1. In the Browser Panel (along left side), right click "PostGIS" and select "New Connection...". 73 | 2. Enter connection information and then click "OK". 74 | 3. You can now access the database's tables in the database manager, which is accessible by clicking "Database" in the menu and choosing "DB Manager...". 75 | 4. In the DB manager you now have two choices: 76 | - You can add the entire table, as-is, by double-clicking it. 77 | - You can query the database using SQL and add the query result by opening the "SQL window". To write a query, click the second icon in the DB manager toolbar (it looks like a piece of paper with a wrench on it). Enter your query in that window and use the options at the bottom of the pane to add it to your project. 78 | 79 | ## Spatial queries 80 | 81 | PostGIS adds almost 300 new functions and operators to PostgreSQL that can do everything from basic spatial calculations to complex aggregations. It's easy to recognize these new functions because they all use the `ST_` prefix (for spatial type), as you'll see below: 82 | 83 | ### Basic calculations 84 | 85 | Many functions perform simple calculations on a single geometry or pair of geometries. For example, we can see the area of each of our census tracts using the `ST_Area` function: 86 | 87 | ```sql 88 | SELECT name, ST_Area(geom) 89 | FROM tracts 90 | ORDER BY id; 91 | ``` 92 | 93 | [`ST_Area`](https://postgis.net/docs/ST_Area.html) and many functions like it that output measurements report them in the coordinate system of the geometries you're using in your query. In our case, that's WGS84, which tells us that the first census tract in our results has an area of _0.046862144283_. That's because our coordinate system is in degrees. 94 | 95 | If we had been using, say, [NAD83 / Texas South Central](https://epsg.io/2278), which is in feet, we would've gotten our answer in feet. We can see that in action if we transform our geometries to that coordinate system with [`ST_Transform`](https://postgis.net/docs/ST_Transform.html): 96 | 97 | ```sql 98 | SELECT name, ST_Area(ST_Transform(geom, 2278)) 99 | FROM tracts 100 | ORDER BY id; 101 | ``` 102 | 103 | We now get an answer of _5296730440.84863_ (square feet). 104 | 105 | We can also solve this problem using [`Geography`s](https://postgis.net/docs/using_postgis_dbmanagement.html#PostGIS_Geography). Geographies, unlike geometries, operate on spherical coordinates like the latitude and longitude values we're using here. Most common functions, when passed geographies instead of geometries, will return values in meters: 106 | 107 | ```sql 108 | SELECT name, ST_Area(geom::GEOGRAPHY) 109 | FROM tracts 110 | ORDER BY id; 111 | ``` 112 | 113 | We now get an answer of _491172828.83503_ (square meters). 114 | 115 | ### Deriving new geometries 116 | 117 | Some functions can calculate new geometries based on your existing ones. For example, we can get the centroid of all of our census tracts using: 118 | 119 | ```sql 120 | SELECT name, ST_Centroid(geom) 121 | FROM tracts 122 | ORDER BY id; 123 | ``` 124 | 125 | While the original column contains `POLYGON` data, the value returned from [`ST_Centroid`](https://postgis.net/docs/ST_Centroid.html) will be a `POINT`. 126 | 127 | Similarly, we can create buffers around our school locations using [`ST_Buffer`](http://www.postgis.net/docs/ST_Buffer.html), which will derive polygons from the points: 128 | 129 | ```sql 130 | SELECT campname, ST_Buffer(geom, 1) 131 | FROM schools 132 | ORDER BY campname; 133 | ``` 134 | 135 | The size of the buffer (the second function argument) in the example above is in the units of the coordinate system for the geometry. So the above would create a one _degree_ buffer. To buffer by a kilometer, we can cast to a geography just like we did with `ST_Area`: 136 | 137 | ```sql 138 | SELECT campname, ST_Buffer(geom::GEOGRAPHY, 1000) 139 | FROM schools 140 | ORDER BY campname; 141 | ``` 142 | 143 | 144 | ### Spatial relationships 145 | 146 | We really start to see the power of spatial queries when we start to use PostGIS to join and filter out tables based on spatial relationship. For example, we can perform a point-in-polygon query to determine the district that each of our schools is in using the `ST_Contains` function: 147 | 148 | ```sql 149 | SELECT schools.campname, districts.name as district_name 150 | FROM schools 151 | LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom); 152 | ``` 153 | 154 | Combined with a `GROUP BY`, we can now easily count the number of schools in each district: 155 | 156 | ```sql 157 | SELECT districts.name, COUNT(*) AS num_schools 158 | FROM schools 159 | LEFT JOIN districts ON ST_Contains(districts.geom, schools.geom) 160 | GROUP BY districts.name 161 | ORDER BY num_schools DESC; 162 | ``` 163 | 164 | `ST_Contains` is just one of several functions that can be used to characterize the spatial relationship between two geometries. To determine the correct function to use, you should consult the function's documentation, which contains helpful images that show how the function evaluates. 165 | 166 | For example, the [`ST_Contains` documentation](https://postgis.net/docs/ST_Contains.html) says the below geometries would return `TRUE`: 167 | 168 | ![ST_Contains TRUE](images/st_contains_true.png?raw=true) 169 | 170 | And the below would return `FALSE`: 171 | 172 | ![ST_Contains FALSE](images/st_contains_false.png?raw=true) 173 | 174 | We can really see the difference between the spatial functions when we try to use them with two polygon layers - our census tract table and our school districts table. For example, this `ST_Contains` query tells us that there are 2,731 census tracts that are fully contained by our districts: 175 | 176 | ```sql 177 | SELECT COUNT(*) 178 | FROM tracts 179 | INNER JOIN districts ON ST_Contains(districts.geom, tracts.geom); 180 | ``` 181 | 182 | But using [`ST_Overlaps`](http://postgis.net/docs/ST_Overlaps.html) we can see that there are more than 8,000 times where a tract is partially, but not wholly contained by, a district: 183 | 184 | ```sql 185 | SELECT COUNT(*) 186 | FROM tracts 187 | INNER JOIN districts ON ST_Overlaps(districts.geom, tracts.geom); 188 | ``` 189 | 190 | And there are 11,000 times, according to [`ST_Intersects`](https://postgis.net/docs/ST_Intersects.html) that a tract "shares any portion of space". This basically combines the results of `ST_Contains` and `ST_Overlaps`: 191 | 192 | ```sql 193 | SELECT COUNT(*) 194 | FROM tracts 195 | INNER JOIN districts ON ST_Intersects(districts.geom, tracts.geom); 196 | ``` 197 | 198 | Note that there are only about 5,000 census tracts in the state. But the above queries are capturing each pair of overlaps between our districts and census tracts. 199 | 200 | _See [here](http://postgis.net/workshops/postgis-intro/spatial_relationships.html) for a complete list of the relationship functions._ 201 | 202 | ### Aggregations 203 | 204 | We can take what we've learned above about spatial relationships and use it to relate data about our census tracts to our school districts. For example, we can take tract-level data and use it to calculate district-level data. 205 | 206 | The below adds a column to our `ST_Intersects` query from above that includes the percentage overlap between the district and the tract: 207 | 208 | ```sql 209 | SELECT 210 | districts.name, 211 | tracts.id, 212 | ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) AS overlap_pct 213 | FROM districts 214 | INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom); 215 | ``` 216 | 217 | We can then allocate that percentage of the under-18-year-olds in the census table to the school district: 218 | 219 | ```sql 220 | SELECT 221 | districts.name, 222 | tracts.id, 223 | ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18 AS under_18_pop 224 | FROM districts 225 | INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom); 226 | ``` 227 | 228 | And by adding a `GROUP BY` we can get a district-wide estimate of the under-18 population: 229 | 230 | ```sql 231 | SELECT 232 | districts.name, 233 | SUM(ST_Area(ST_Intersection(districts.geom, tracts.geom)) / ST_Area(tracts.geom) * tracts.age_undr18) AS under_18_pop 234 | FROM districts 235 | INNER JOIN tracts ON ST_Intersects(districts.geom, tracts.geom) 236 | GROUP BY districts.name; 237 | ``` 238 | 239 | ## Exporting query results 240 | 241 | ### Exporting in QGIS 242 | 1. Right click layer in Layer Panel. Select "Export" > "Save Features As..." 243 | 2. Under "Format" you can chose from shapefile, geojson, CSV, etc. Make sure the CRS is populated with what you want, or you can change it if you'd like. 244 | 245 | ### Exporting with pgsql2shp 246 | The pgsql2shp command to call from the terminal will look like: 247 | ``` 248 | $ pgsql2shp -f -g "" 249 | ``` 250 | Here is an example of what ours might look like, if we wanted only Dallas and Forth Worth school districts from the `districts` table: 251 | ``` 252 | $ pgsql2shp -f filter-query.shp -g geom postgis_intro \ 253 | "select * from districts where name in ('Dallas ISD', 'Fort Worth ISD')" 254 | ``` 255 | 256 | ### Exporting with ogr2ogr 257 | This is a good option if you want to go straight from PostGIS to a file other than a shapefile (think GeoJSON for a D3 viz or to load into Mapbox). 258 | 259 | Here is a typical command: 260 | ``` 261 | $ ogr2ogr -f "" \ 262 | PG:"host= dbname=''" \ 263 | -sql "";" 264 | ``` 265 | 266 | If we wanted to export the entire `districts` table: 267 | ``` 268 | $ ogr2ogr -f "GeoJSON" school-districts.json \ 269 | PG:"host=localhost dbname='postgis_intro'" \ 270 | -sql "select * from districts;" 271 | ``` 272 | -------------------------------------------------------------------------------- /data/census-tracts/census-tracts.cpg: -------------------------------------------------------------------------------- 1 | UTF-8 -------------------------------------------------------------------------------- /data/census-tracts/census-tracts.dbf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/census-tracts/census-tracts.dbf -------------------------------------------------------------------------------- /data/census-tracts/census-tracts.prj: -------------------------------------------------------------------------------- 1 | GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]] -------------------------------------------------------------------------------- /data/census-tracts/census-tracts.shp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/census-tracts/census-tracts.shp -------------------------------------------------------------------------------- /data/census-tracts/census-tracts.shx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/census-tracts/census-tracts.shx -------------------------------------------------------------------------------- /data/school-districts/school-districts.cpg: -------------------------------------------------------------------------------- 1 | UTF-8 -------------------------------------------------------------------------------- /data/school-districts/school-districts.dbf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/school-districts/school-districts.dbf -------------------------------------------------------------------------------- /data/school-districts/school-districts.prj: -------------------------------------------------------------------------------- 1 | GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]] -------------------------------------------------------------------------------- /data/school-districts/school-districts.qpj: -------------------------------------------------------------------------------- 1 | GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]] 2 | -------------------------------------------------------------------------------- /data/school-districts/school-districts.shp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/school-districts/school-districts.shp -------------------------------------------------------------------------------- /data/school-districts/school-districts.shx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/school-districts/school-districts.shx -------------------------------------------------------------------------------- /data/schools/data-dictionary.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/schools/data-dictionary.pdf -------------------------------------------------------------------------------- /data/schools/schools.cpg: -------------------------------------------------------------------------------- 1 | UTF-8 -------------------------------------------------------------------------------- /data/schools/schools.dbf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/schools/schools.dbf -------------------------------------------------------------------------------- /data/schools/schools.prj: -------------------------------------------------------------------------------- 1 | GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]] -------------------------------------------------------------------------------- /data/schools/schools.qpj: -------------------------------------------------------------------------------- 1 | GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]] 2 | -------------------------------------------------------------------------------- /data/schools/schools.shp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/schools/schools.shp -------------------------------------------------------------------------------- /data/schools/schools.shx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/data/schools/schools.shx -------------------------------------------------------------------------------- /images/st_contains_false.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/images/st_contains_false.png -------------------------------------------------------------------------------- /images/st_contains_true.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/arianagiorgi/postgis-intro/087c7c595f229669fe16faa35a51f3fb115e88fe/images/st_contains_true.png --------------------------------------------------------------------------------