├── .gitignore
├── README.md
├── Vagrantfile
├── provisioning
│   ├── tigergeocoder.yml
│   └── vars
│       ├── tiger-geocoder-postgres.yml
│       └── tiger-mounted-drive.yml
├── setup
│   ├── ansible-ubuntu-setup.sh
│   └── fetch-tiger-geocoder-role.sh
└── tiger-local-vm.json


/.gitignore:
--------------------------------------------------------------------------------
provisioning/roles/*
.vagrant
tmp/
.pwd

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# THIS PROJECT IS NO LONGER OFFICIALLY SUPPORTED BY ENIGMA

# Ansible Playbook for the PostGIS TIGER Geocoder

This Ansible playbook provisions and installs the basic TIGER Geocoder/PostGIS setup, which automates the build of a PostgreSQL/PostGIS database that includes geography columns for U.S. States and Territories at the following summary levels:

* census block (tabblock)
* census block group (bg)
* census tract (tract)
* zipcode (zcta5)
* census county subdivision (cousub)
* county (county)
* census place (place)
* state (state)

If you are launching a fresh AWS EC2 instance, you can just use our [Amazon Machine Image](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) (id: ami-fbd57790).

Since the AMI is the simplest solution, this playbook is perhaps most useful for someone who wants to run a local geocoder on a virtual machine, or someone who wants to add the geocoder to a pre-existing server. There are instructions for doing both of these things below.

## Features

* Install PostgreSQL
* Install PostGIS
* Mount a data drive (optional, recommended!)
* Install `postgis_tiger_geocoder` extension
* Install nation-level geographies and specified state-level geographies

## Requirements

Provisioning tested on:
* Vagrant 1.6.5
* Ansible 1.7.2

Provisioned box tested with:
* Ubuntu 14.04
* PostgreSQL 9.3.6
* PostGIS 2.1

You also need an environment variable called `TIGER_DB_PASSWORD` that holds the password for your PostgreSQL instance.

## Pre-installation

* [Ansible's official installation guide](http://docs.ansible.com/intro_installation.html).
* [Vagrant's official installation guide](http://docs.vagrantup.com/v2/installation/).

If you are using Vagrant, you'll also need to download [VirtualBox](https://www.virtualbox.org/).

## Running the playbook

Ansible is "the simplest way to automate apps and IT infrastructure". Vagrant enables one to "create and configure lightweight, reproducible, and portable development environments." Using one or both of them with this playbook will allow you to launch the TIGER geocoder.

### Local Vagrant Install

For a **local** virtual machine, setup depends on your local system. Make sure you've installed Vagrant, Ansible and VirtualBox (see above links). Then:
```
git clone https://github.com/enigma-io/ansible-tiger-geocoder-playbook.git
cd ansible-tiger-geocoder-playbook
# This is intended to be run from the main repo directory
sh setup/fetch-tiger-geocoder-role.sh
# This uses the Vagrantfile included in the root directory of this repo.
vagrant up tiger
```

### AWS Install

For a **remote** AWS instance, a few things to be aware of:

* You'll probably want to use a mounted drive since the TIGER dataset will far exceed the default disk size of an instance.

Once you've provisioned a new AWS box and ssh'd in, get the playbook repo.

```
sudo apt-get update
sudo apt-get install git
git clone https://github.com/enigma-io/ansible-tiger-geocoder-playbook.git
cd ansible-tiger-geocoder-playbook
```

Then run a script that sets up the Ansible role in the proper folder:
```
sudo chmod +x ./setup/fetch-tiger-geocoder-role.sh
sudo sh ./setup/fetch-tiger-geocoder-role.sh
```

Then run a script that installs Ansible:

```
sudo chmod +x ./setup/ansible-ubuntu-setup.sh
sudo sh ./setup/ansible-ubuntu-setup.sh
```

Store the password for your postgres database:

```
echo 'export TIGER_DB_PASSWORD=changeme' >> ~/.bashrc
source ~/.bashrc
```

Then open a screen. Running the playbook in a screen will make running this ~24 hour process a lot less annoying!

```
screen -S load_tiger
```

Then, execute the ansible-playbook command:

```
ansible-playbook -i localhost, -vv \
    /home/ubuntu/ansible-tiger-geocoder-playbook/provisioning/tigergeocoder.yml \
    --extra-vars="tiger_local_vm=false tiger_mounted_drive_path=/dev/xvdb" \
    --connection=local
```

Now you can let the playbook run. Consult `man screen` for more details, but to detach from the screen safely, press `ctrl-a ctrl-d`; to re-attach and see how it's progressing, type `screen -r load_tiger`.

## How It Works

The playbook runs a number of `pre_tasks` steps that are not included in our official `tiger-geocoder` role. These steps make it easy to spin up a fresh local or remote instance with all the requirements to get a geocoder running: they install Postgres and PostGIS and mount a data drive.
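
As a rough orientation, the `pre_tasks` correspond approximately to the manual commands printed by the sketch below. It is a dry run: the hypothetical helper `print_pretask_commands` only echoes commands and executes nothing. The package names, the `/gisdata` paths, the `census` database name and the PostgreSQL 9.3 binary path are taken from this repo's playbook and vars files. The list is abridged (directory creation and the `postgresql.conf` edit are left out), and the plain `mount` call is a simplification of what Ansible's `mount` module does.

```shell
#!/bin/sh
# Dry-run sketch: prints, but does not execute, rough manual equivalents
# of the playbook's pre_tasks. print_pretask_commands is a hypothetical
# helper for illustration, not part of the playbook or the role.
print_pretask_commands() {
  drive="$1"               # e.g. /dev/xvdb; pass "" to skip the drive steps
  mount_point="/gisdata"   # tiger_mount_point in vars/tiger-mounted-drive.yml
  data_dir="/gisdata/pg"   # tiger_pg_data_directory
  db_name="census"         # tiger_pg_db_name

  echo "apt-get install -y postgresql postgresql-contrib postgis postgresql-9.3-postgis-2.1 unzip wget"
  echo "service postgresql stop"
  if [ -n "$drive" ]; then
    # The playbook formats the whole device, which destroys existing data.
    echo "mkfs.ext4 -F $drive"
    echo "mount -t ext4 $drive $mount_point"
  fi
  echo "sudo -u postgres /usr/lib/postgresql/9.3/bin/initdb -D $data_dir"
  echo "service postgresql start"
  echo "sudo -u postgres psql -d postgres -c 'CREATE DATABASE $db_name'"
  echo "sudo -u postgres psql -d $db_name -c 'CREATE EXTENSION postgis'"
}

print_pretask_commands /dev/xvdb
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; the playbook remains the place where the steps are actually applied.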

### Choosing States/Territories to include

All possible two-letter abbreviations to download and load into the geocoder are listed in `provisioning/roles/tiger-geocoder/defaults/main.yml` in the variable `tiger_geos`.

Comment out those you're not interested in including.

**Warning**: The role comes with ALL possible states and territories uncommented!

### DB Password

You must store the password to your database as a local environment variable named `TIGER_DB_PASSWORD`.

### Storage

The playbook is pre-set to assume you will be using a mounted drive, but you can turn this functionality off if you want.

A mounted drive accommodates the size requirement of installing the geocoder for all possible U.S. States and Territories, which amounts to nearly a hundred gigabytes. If you want to download just a part of the data (the state of Wisconsin, perhaps), then you'll have less need of a mounted drive.

##### Turn off mounted drive option

**Locally:**
Remove the line that starts with `tiger_mounted_drive_path` in the `Vagrantfile` in the root directory of this repo.

**On AWS:**
Remove the `tiger_mounted_drive_path` arg from the `--extra-vars` option of the `ansible-playbook` command in the directions above.

###### Local mounted drive
You can run a local geocoder and host the data itself on a local mounted drive.

In order to do this, specify the value for the `path` key located in the `tiger_vb_mount` field in `tiger-local-vm.json` in the root directory of this repo.

The default is currently set to `./tmp/tiger_mounted_drive.vdi`, but it could be changed to something like `/Volumes/your_mounted_4TB/geocoder.vdi`.

### Provisioning Time

Provisioning the entire TIGER dataset will take a long time!
If you plan to include every State and Territory, plan to either have your computer running for upwards of 24 hours, or run the playbook in a screen on a remote host.

### Logging in

After running the provision script, you should be able to log in to your box with the command `vagrant ssh`, and then log in to postgres via the `psql` CLI like:

`psql -d yourdb -U postgres -h localhost`

Replace `yourdb` with the name of your database (the playbook's `tiger_pg_db_name`, which is set to `census` in `provisioning/vars/tiger-geocoder-postgres.yml`). You'll be prompted for your password. You should have set this with your local environment variable `TIGER_DB_PASSWORD` (see the 'DB Password' section).

Once logged in, you can run a query like:

```
geocoder=# select * from geocode('1600 Pennsylvania Avenue Northwest, Washington, DC 20500');
```

with results:
```
                        addy                        |                      geomout                       | rating
----------------------------------------------------+----------------------------------------------------+--------
 (1600,,Pennsylvania,Ave,NW,,Washington,DC,20502,t) | 0101000020AD100000FF3316523F4253C0101234A607734340 |      2
```

Keep in mind that you now have access to PostGIS functions along with the suite of functions that the TIGER Geocoder offers. Documentation for both can be found at [http://postgis.net/docs/](http://postgis.net/docs/)

## Additional TIGER Data

As mentioned, this playbook builds the base-level TIGER geocoder from scripts generated by the loader-generation functions (such as `loader_generate_script`) built into the `postgis_tiger_geocoder` PostGIS extension.
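
To make the loader step more concrete, here is a minimal sketch of how such a loader script could be requested by hand. It assumes the PostGIS function `loader_generate_script(text[], text)` from the `postgis_tiger_geocoder` extension is what the role relies on (the role's source is not shown here). The helper name `tiger_loader_sql` is hypothetical, and the helper only prints SQL without connecting to a database.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL that asks PostGIS to generate a
# shell loader script for the given two-letter state abbreviations.
tiger_loader_sql() {
  states=""
  for st in "$@"; do
    # Build a comma-separated list of quoted abbreviations: 'WI','DC'
    states="${states:+$states,}'$st'"
  done
  printf "SELECT loader_generate_script(ARRAY[%s], 'sh');\n" "$states"
}

tiger_loader_sql WI DC

# To generate a script for real, the SQL could be piped into psql, e.g.
# (database name taken from this repo's vars file):
#   tiger_loader_sql WI | psql -d census -U postgres -h localhost -tA > load_wi.sh
```

The `-tA` flags ask `psql` for unaligned, tuples-only output so that the generated script is not wrapped in table formatting.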

If you want to load other summary levels, you can do so by running a command of the form:

`shp2pgsql -c -s 4269 -g the_geom -W "latin1" tl_2013_us_cbsa.dbf tiger.cbsa | psql`

and then improve the speed of your queries by indexing the geometry column in the table, like:

`create index tiger_cbsa_the_geom_gist ON tiger.cbsa USING gist (the_geom);`

where you replace 'cbsa' with the census summary level you're interested in adding to the database.

--------------------------------------------------------------------------------
/Vagrantfile:
--------------------------------------------------------------------------------
require 'json'
require 'fileutils'
# read service definitions
service = File.read('./tiger-local-vm.json')
svc = JSON.parse(service)

Vagrant.configure("2") do |config|

  config.vm.box = "trusty64"
  config.vm.box_url = "https://vagrantcloud.com/ubuntu/boxes/trusty64/versions/14.04/providers/virtualbox.box"

  config.vm.define "tiger" do |machine|
    machine.vm.provider :virtualbox do |v|
      if svc["ram"]
        v.memory = svc["ram"]
      else
        v.memory = 8096
      end
      if svc["cpu"]
        v.cpus = svc["cpu"]
      else
        v.cpus = 4
      end
      # include a mounted drive
      if svc["tiger_vb_mount"]
        d = svc["tiger_vb_mount"]
        # create the disk image only if it does not exist yet,
        # since createhd fails when the file is already present
        unless File.exist?(d["path"])
          v.customize [
            "createhd",
            "--filename", d["path"],
            "--size", d["size"]
          ]
        end
        v.customize [
          "storageattach", :id,
          "--storagectl", d["storagectl"],
          "--port", d["port"],
          "--device", d["device"],
          "--type", d["type"],
          "--medium", d["path"]
        ]
      end
    end

    machine.vm.provision "ansible" do |ansible|
      ansible.playbook = "./provisioning/tigergeocoder.yml"
      ansible.verbose = "vvvv"
      # extra_vars is assigned once; the original assigned it twice,
      # so the second hash silently replaced the first
      ansible.extra_vars = {
        ansible_ssh_user: "vagrant",
        hostname: "tiger",
        tiger_local_vm: true,
        tiger_mounted_drive_path: "/dev/sdb"
      }
    end
  end
end
--------------------------------------------------------------------------------
/provisioning/tigergeocoder.yml:
--------------------------------------------------------------------------------
---
- hosts: all
  gather_facts: yes
  sudo: yes

  vars_files:
    - vars/tiger-geocoder-postgres.yml
    - vars/tiger-mounted-drive.yml

  # install postgres and postgis, mount a disk, change the data directory
  pre_tasks:
    - name: install postgres + dependencies
      apt: "pkg={{item}} state=installed update_cache=yes"
      with_items:
        - postgresql
        - postgresql-contrib
        - postgis
        - postgresql-9.3-postgis-2.1
        - unzip
        - wget

    - name: ensure the tiger mount directory exists
      file: "state=directory path={{tiger_mount_point}} mode=0700 owner=postgres group=postgres"

    - name: stop postgres
      action: service name=postgresql state=stopped

    - name: Create an ext4 filesystem if a mounted drive is specified
      # Note: Using the 'force' parameter failed for local vms during testing.
      # Alternatively creating a partition on the mounted disk requires interaction with the terminal,
      # so running the command to format the entire mounted disk is the current behavior.
      #filesystem: "fstype=ext4 dev={{tiger_mounted_drive_path}} force=yes"
      command: "mkfs.ext4 -F {{tiger_mounted_drive_path}}"
      when: tiger_mounted_drive_path is defined

    - name: mount a drive if specified
      mount: "name={{tiger_mount_point}} src={{tiger_mounted_drive_path}} fstype=ext4 state=mounted"
      when: tiger_mounted_drive_path is defined

    - name: ensure the tiger directory exists
      file: "state=directory path={{tiger_data_directory}} mode=0700 owner=postgres group=postgres"

    - name: ensure the postgres data directory exists
      file: "state=directory path={{tiger_pg_data_directory}} mode=0700 owner=postgres group=postgres"

    - name: change the data directory to a mounted drive in pgconf
      lineinfile: dest={{tiger_pg_conf_file}} state=present regexp=^data_directory line=data_directory=\'{{tiger_pg_data_directory}}\'
      when: tiger_mounted_drive

    - name: reconfigure the db with the new data directory
      command: "/usr/lib/postgresql/9.3/bin/initdb -D {{tiger_pg_data_directory}}"
      sudo: yes
      sudo_user: postgres

    - name: restart postgres
      action: service name=postgresql state=started

    - name: create tiger db
      command: "psql -d postgres -U postgres -c 'CREATE DATABASE {{tiger_pg_db_name}}'"
      sudo: yes
      sudo_user: postgres

    - name: install postgis extension
      command: "psql -d {{tiger_pg_db_name}} -U postgres -c 'CREATE EXTENSION postgis'"
      sudo: yes
      sudo_user: postgres

    - name: create extension postgis_topology
      command: "psql -d {{tiger_pg_db_name}} -U postgres -c 'CREATE EXTENSION IF NOT EXISTS postgis_topology'"
      sudo: yes
      sudo_user: postgres

  roles:
    - tiger-geocoder

--------------------------------------------------------------------------------
/provisioning/vars/tiger-geocoder-postgres.yml:
--------------------------------------------------------------------------------
tiger_pg_conf_file: /etc/postgresql/9.3/main/postgresql.conf
tiger_pg_data_directory: /gisdata/pg
tiger_pg_db_name: census
tiger_pg_password: "{{ lookup('env', 'TIGER_DB_PASSWORD')}}"
--------------------------------------------------------------------------------
/provisioning/vars/tiger-mounted-drive.yml:
--------------------------------------------------------------------------------
tiger_mounted_drive: true
tiger_data_directory: /gisdata
tiger_mount_point: /gisdata
tiger_mounted_drive_path: "{{ lookup('env', 'tiger_mounted_drive_path')}}"
--------------------------------------------------------------------------------
/setup/ansible-ubuntu-setup.sh:
--------------------------------------------------------------------------------
#!/bin/sh

# Set up ansible on Ubuntu 12.04 and later
apt-get update && apt-get install software-properties-common -y
apt-add-repository ppa:ansible/ansible -y
apt-get update && apt-get install ansible -y
apt-get install git -y
--------------------------------------------------------------------------------
/setup/fetch-tiger-geocoder-role.sh:
--------------------------------------------------------------------------------
#!/bin/sh

git clone https://github.com/enigma-io/tiger-geocoder ./provisioning/roles/tiger-geocoder
--------------------------------------------------------------------------------
/tiger-local-vm.json:
--------------------------------------------------------------------------------
{
    "ram": 8096,
    "cpu": 4,
    "tiger_vb_mount": {
        "path": "./tmp/tiger_mounted_drive.vdi",
        "size": 524288,
        "storagectl": "SATAController",
        "port": 1,
        "device": 0,
        "type": "hdd"
    }
}

--------------------------------------------------------------------------------