├── .gitignore
├── README.md
├── data
    ├── county_centers.csv
    ├── nearest_hei.csv
    ├── neighborcounties.csv
    └── stcrosswalk.csv
└── scripts
    ├── nearesthei.r
    ├── neighborcounties.py
    └── popcenters.r


/.gitignore:
--------------------------------------------------------------------------------
1 | *.dbf
2 | *.shp
3 | *.shx
4 | *.Rhistory
5 | *.DS_Store
6 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | This directory contains spatial data files and the scripts that produce them. Each data file is linked both to its location in the GitHub repository and to a raw version for easy download. The scripts that produce the datasets are in the `./scripts` subdirectory of the repository.
 2 | 
 3 | ## Data
 4 | 
 5 | | File name | Script | Language | Description|
 6 | |:----------|:-----|:-------|:-----------|
 7 | |[`neighborcounties.csv`](https://github.com/btskinner/spatial/blob/master/data/neighborcounties.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/neighborcounties.csv)|`neighborcounties.py`|Python 3.5|Long data file that lists all adjacent counties (2010)|
 8 | |[`county_centers.csv`](https://github.com/btskinner/spatial/blob/master/data/county_centers.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/county_centers.csv)|`popcenters.r`|R 3.2.3|Geocoordinates for geographic and population-weighted centers in all counties (2000 and 2010)|
 9 | |[`nearest_hei.csv`](https://github.com/btskinner/spatial/blob/master/data/nearest_hei.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/nearest_hei.csv)|`nearesthei.r`|R 3.2.3|Nearest higher education institution (HEI) to every county, by sector (2010-2014)|
10 | 
11 | ## Detail
12 | 
13 | ### [`neighborcounties.csv`](https://github.com/btskinner/spatial/blob/master/data/neighborcounties.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/neighborcounties.csv)
14 |   
15 | This long file links every county in the United States (as of the 2010 Census) with all of its contiguous counties. Five-digit county-level [FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards) codes are used to identify the counties. These codes uniquely identify each county and can be used to link with other datasets such as the [American Community Survey](https://www.census.gov/programs-surveys/acs/).
16 | 
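Because FIPS codes carry leading zeros (Alabama is state `01`), they are easy to mangle when a CSV reader parses them as integers. The sketch below is a hypothetical pair of helpers, not part of this repository, that pads a code back to five characters and splits it into state and county parts before linking to another dataset.

```python
def normalize_fips(value):
    """Return a five-character, zero-padded county FIPS string."""
    # Accept ints (1001) or strings ("1001", "01001"); reject anything else.
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)


def split_fips(value):
    """Split a county FIPS code into (state, county) parts."""
    fips = normalize_fips(value)
    return fips[:2], fips[2:]


# Autauga County, Alabama: a reader that drops the leading zero yields 1001.
print(normalize_fips(1001))   # 01001
print(split_fips("01001"))    # ('01', '001')
```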
17 | ##### COLUMNS
18 | 
19 | | Name | Description|
20 | |:-----|:-----------|
21 | |`orgfips`|Origin county FIPS code|
22 | |`adjfips`|Adjacent county FIPS code|
23 | |`instate`|==1 if adjacent county is in the same state|
24 | 
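One way to use the long format is to collapse it into a lookup from each origin county to its neighbors. The following is a minimal sketch using only the Python standard library; the sample rows are invented placeholders in the file's column layout (state codes `98` and `99` are not real), not values copied from `neighborcounties.csv`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the layout of neighborcounties.csv (not real counties).
SAMPLE = """orgfips,adjfips,instate
98001,98003,1
98001,99001,0
98003,98001,1
"""


def load_neighbors(handle, instate_only=False):
    """Map each origin FIPS code to a sorted list of adjacent FIPS codes."""
    neighbors = defaultdict(list)
    for row in csv.DictReader(handle):
        # Codes stay as strings so leading zeros survive.
        if instate_only and row["instate"] != "1":
            continue
        neighbors[row["orgfips"]].append(row["adjfips"])
    return {org: sorted(adj) for org, adj in neighbors.items()}


all_neighbors = load_neighbors(io.StringIO(SAMPLE))
same_state = load_neighbors(io.StringIO(SAMPLE), instate_only=True)
print(all_neighbors["98001"])  # ['98003', '99001']
print(same_state["98001"])     # ['98003']
```

To try it against the real file, pass `open('data/neighborcounties.csv', newline='')` in place of the `io.StringIO` object.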
25 | ### [`county_centers.csv`](https://github.com/btskinner/spatial/blob/master/data/county_centers.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/county_centers.csv)  
26 | 
27 | This wide file gives the latitude and longitude for the spatial and population centers of every county in the United States for the Census years 2000 and 2010. These coordinates are given by the U.S. Census; this file simply collects them in a single easy-to-use file. Five-digit county-level [FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards) codes are used to identify the counties. 
28 | 
29 | ##### COLUMNS
30 | 
31 | | Name | Description|
32 | |:-----|:-----------|
33 | |`fips`|Unique county-level five-digit FIPS code|
34 | |`clon00`|Longitude of spatial center, 2000|
35 | |`clat00`|Latitude of spatial center, 2000|
36 | |`clon10`|Longitude of spatial center, 2010|
37 | |`clat10`|Latitude of spatial center, 2010|
38 | |`pclon00`|Longitude of population-weighted center, 2000|
39 | |`pclat00`|Latitude of population-weighted center, 2000|
40 | |`pclon10`|Longitude of population-weighted center, 2010|
41 | |`pclat10`|Latitude of population-weighted center, 2010|
42 | 
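A simple use of the wide layout is to measure how far a county's population-weighted center sits from its spatial center. The sketch below uses the haversine great-circle formula with a mean Earth radius of 3958.8 miles. Both the choice of formula and the coordinates in the example row are assumptions for illustration, not values from `county_centers.csv`.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (degrees)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Hypothetical row using the file's column names.
row = {"fips": "98001", "clon10": -86.60, "clat10": 32.50,
       "pclon10": -86.50, "pclat10": 32.45}

shift = haversine_miles(row["clon10"], row["clat10"],
                        row["pclon10"], row["pclat10"])
print(f"{row['fips']}: centers are {shift:.1f} miles apart")
```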
43 | ### [`nearest_hei.csv`](https://github.com/btskinner/spatial/blob/master/data/nearest_hei.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/nearest_hei.csv)  
44 | 
45 | This long file gives the nearest higher education institution (HEI) to each county population center across a number of years and higher education sectors. Each row gives the nearest institution's [IPEDS](http://nces.ed.gov/ipeds/datacenter/Default.aspx) unique `unitid`, the distance in miles, and indicators for the year and subset of included schools (*e.g.,* nearest public four-year, nearest public two-year, etc.).
46 | 
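The in-state restriction can be pictured as a mask over a county-by-institution distance matrix: cells for out-of-state institutions are treated as infinitely far away before each row's minimum is taken, which is the approach `scripts/nearesthei.r` takes in R. The sketch below restates that idea in plain Python. The distances, identifiers, and state codes are made up, and the meters-to-miles constant is the same rounded factor the R script uses.

```python
import math

M_TO_MILES = 0.0006214  # rounded meters-to-miles factor


def nearest(dist_m, county_states, hei_ids, hei_states, instate=False):
    """Return (unitid, miles) for each county row of a distance matrix.

    dist_m[i][j] is the distance in meters from county i to institution j.
    With instate=True, institutions in other states are ignored; a county
    with no eligible institution gets (None, inf).
    """
    results = []
    for i, row in enumerate(dist_m):
        best_id, best_m = None, math.inf
        for j, meters in enumerate(row):
            if instate and hei_states[j] != county_states[i]:
                continue  # same effect as masking the cell with infinity
            if meters < best_m:
                best_id, best_m = hei_ids[j], meters
        results.append((best_id, round(best_m * M_TO_MILES, 2)))
    return results


# Two hypothetical counties (states 98 and 99) and three institutions.
dist_m = [[50000.0, 12000.0, 30000.0],
          [8000.0, 40000.0, 90000.0]]
county_states = [98, 99]
hei_ids = [100001, 100002, 100003]
hei_states = [98, 99, 98]

across = nearest(dist_m, county_states, hei_ids, hei_states)
within = nearest(dist_m, county_states, hei_ids, hei_states, instate=True)
print(across)
print(within)
```

Rows whose distance stays infinite correspond to counties in states with no institution of the requested type, so they can be dropped afterwards.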
47 | ##### COLUMNS
48 | 
49 | | Name | Description|
50 | |:-----|:-----------|
51 | |`fips`|Unique county-level five-digit FIPS code|
52 | |`unitid`|Unique IPEDS identifier for nearest HEI|
53 | |`miles`|Distance in miles between county population center and nearest HEI|
54 | |`limit_instate`|==1 if sample of schools is limited to those in same state as county|
55 | |`year`|Year of match|
56 | |`any`|==1 if any type of HEI is included in sample|
57 | |`limit_fouryr`|==1 if only four-year HEIs are included in sample|
58 | |`limit_twoyr`|==1 if only two-year HEIs are included in sample|
59 | |`limit_pub`|==1 if only public HEIs are included in sample|
60 | |`limit_pnp`|==1 if only private, non-profit HEIs are included in sample|
61 | |`limit_pfp`|==1 if only private, for-profit HEIs are included in sample|
62 | 
63 | ##### EXAMPLES
64 | 
65 | *Absolute nearest HEI (regardless of sector and crossing state lines)*
66 | 
67 | * Rows in which `limit_instate == 0` and `any == 1`
68 | 
69 | *Nearest instate public four-year HEI*
70 | 
71 | * Rows in which `limit_instate == 1` and `limit_fouryr == 1` and `limit_pub == 1`
72 | 
73 | *Nearest instate private, for-profit two-year HEI*
74 | 
75 | * Rows in which `limit_instate == 1` and `limit_twoyr == 1` and `limit_pfp == 1`
76 | 
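The three examples above translate directly into row filters. The sketch below expresses them with the Python standard library's `csv` module. The `select` helper and the sample rows are hypothetical and only mimic the file's columns.

```python
import csv
import io

# Hypothetical rows in the layout of nearest_hei.csv (identifiers are made up).
SAMPLE = """fips,unitid,miles,limit_instate,year,any,limit_fouryr,limit_twoyr,limit_pub,limit_pnp,limit_pfp
98001,100001,4.2,0,2010,1,0,0,0,0,0
98001,100002,9.8,1,2010,0,1,0,1,0,0
98001,100003,15.1,1,2010,0,0,1,0,0,1
"""


def select(rows, **conditions):
    """Keep rows whose named indicator columns equal the given 0/1 values."""
    return [r for r in rows
            if all(int(r[col]) == val for col, val in conditions.items())]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

absolute_nearest = select(rows, limit_instate=0, any=1)
instate_public_4yr = select(rows, limit_instate=1, limit_fouryr=1, limit_pub=1)
instate_pfp_2yr = select(rows, limit_instate=1, limit_twoyr=1, limit_pfp=1)

for label, subset in [("absolute", absolute_nearest),
                      ("public four-year", instate_public_4yr),
                      ("for-profit two-year", instate_pfp_2yr)]:
    print(label, [(r["fips"], r["unitid"], r["miles"]) for r in subset])
```

With the real file, adding a `year` condition (for example `year=2010`) narrows each selection to one row per county.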


--------------------------------------------------------------------------------
/data/stcrosswalk.csv:
--------------------------------------------------------------------------------
 1 | st,stname,stfips,region,division
 2 | AL,Alabama,01,3,6
 3 | AK,Alaska,02,4,9
 4 | AZ,Arizona,04,4,8
 5 | AR,Arkansas,05,3,7
 6 | CA,California,06,4,9
 7 | CO,Colorado,08,4,8
 8 | CT,Connecticut,09,1,1
 9 | DE,Delaware,10,3,5
10 | DC,District of Columbia,11,3,5
11 | FL,Florida,12,3,5
12 | GA,Georgia,13,3,5
13 | HI,Hawaii,15,4,9
14 | ID,Idaho,16,4,8
15 | IL,Illinois,17,2,3
16 | IN,Indiana,18,2,3
17 | IA,Iowa,19,2,4
18 | KS,Kansas,20,2,4
19 | KY,Kentucky,21,3,6
20 | LA,Louisiana,22,3,7
21 | ME,Maine,23,1,1
22 | MD,Maryland,24,3,5
23 | MA,Massachusetts,25,1,1
24 | MI,Michigan,26,2,3
25 | MN,Minnesota,27,2,4
26 | MS,Mississippi,28,3,6
27 | MO,Missouri,29,2,4
28 | MT,Montana,30,4,8
29 | NE,Nebraska,31,2,4
30 | NV,Nevada,32,4,8
31 | NH,New Hampshire,33,1,1
32 | NJ,New Jersey,34,1,2
33 | NM,New Mexico,35,4,8
34 | NY,New York,36,1,2
35 | NC,North Carolina,37,3,5
36 | ND,North Dakota,38,2,4
37 | OH,Ohio,39,2,3
38 | OK,Oklahoma,40,3,6
39 | OR,Oregon,41,4,9
40 | PA,Pennsylvania,42,1,2
41 | RI,Rhode Island,44,1,1
42 | SC,South Carolina,45,3,5
43 | SD,South Dakota,46,2,4
44 | TN,Tennessee,47,3,6
45 | TX,Texas,48,3,7
46 | UT,Utah,49,4,8
47 | VT,Vermont,50,1,1
48 | VA,Virginia,51,3,5
49 | WA,Washington,53,4,9
50 | WV,West Virginia,54,3,5
51 | WI,Wisconsin,55,2,3
52 | WY,Wyoming,56,4,8


--------------------------------------------------------------------------------
/scripts/nearesthei.r:
--------------------------------------------------------------------------------
  1 | ################################################################################
  2 | ##
  3 | ## PROJ: Nearest higher education institution to county population center
  4 | ## FILE: nearesthei.r
  5 | ## AUTH: Benjamin Skinner
  6 | ## INIT: 29 December 2015
  7 | ## REVN: 8 February 2018
  8 | ##
  9 | ################################################################################
 10 | 
 11 | ## PURPOSE #####################################################################
 12 | ##
 13 | ## This file is used to find the nearest higher education institution to
 14 | ## every county population center in the United States.
 15 | ##
 16 | ## Latitude and longitude data on colleges come from the IPEDS database.
 17 | ## Population center data come from the United States Census Bureau, as
 18 | ## assembled by the <popcenters.r> script.
 19 | ##
 20 | ################################################################################
 21 | 
 22 | ## clear memory
 23 | rm(list=ls())
 24 | 
 25 | ## required libraries
 26 | libs <- c('dplyr','geosphere','readr')
 27 | lapply(libs, require, character.only=TRUE)
 28 | 
 29 | ## directory paths
 30 | ddir <- '../data/'
 31 | 
 32 | ## formula (meters to miles)
 33 | m2miles <- 0.0006214
 34 | 
 35 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 36 | ## Functions
 37 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 38 | 
 39 | getIpeds <- function(year) {
 40 | 
 41 |     ## --------------------------------------
 42 |     ## This function downloads and subsets HD
 43 |     ## IPEDS files
 44 |     ## --------------------------------------
 45 | 
 46 |     ## file to retrieve
 47 |     f <- paste0('HD',year)
 48 | 
 49 |     ## download file
 50 |     url <- paste0('https://nces.ed.gov/ipeds/datacenter/data/',f,'.zip')
 51 |     temp <- tempfile()
 52 |     download.file(url,temp)
 53 | 
 54 |     ## read file; lower names
 55 |     df <- read_csv(unz(temp,paste0(tolower(f),'.csv')))
 56 |     names(df) <- tolower(names(df))
 57 | 
 58 |     ## subset
 59 |     df <- df %>%
 60 |         select(unitid,countycd,longitud,latitude,sector) %>%
 61 |         mutate(fouryr = as.integer(sector %in% c(1,2,3)),
 62 |                twoyr = as.integer(sector %in% c(4,5,6)),
 63 |                pub = as.integer(sector %in% c(1,4,7)),
 64 |                pnp = as.integer(sector %in% c(2,5,8)),
 65 |                pfp = as.integer(sector %in% c(3,6,9)),
 66 |                fips = countycd,
 67 |                stfips = floor(fips/1000)) %>%
 68 |         filter(!is.na(longitud),
 69 |                !is.na(latitude)) %>%
 70 |         filter(fouryr == 1 | twoyr == 1) %>%
 71 |         select(-c(countycd,sector))
 72 | 
 73 |     ## return df dataframe
 74 |     return(df)
 75 | }
 76 | 
 77 | nearestHei <- function(hei_df,county_df) {
 78 | 
 79 |     ## --------------------------------------
 80 |     ## This function computes the distances
 81 |     ## between each county population
 82 |     ## centroid and HEI and returns a data
 83 |     ## frame with the nearest HEI to each
 84 |     ## county centroid.
 85 |     ## --------------------------------------
 86 | 
 87 |     ## sort dataframes
 88 |     hei_df <- hei_df %>% arrange(unitid)
 89 |     county_df <- county_df %>% arrange(fips)
 90 | 
 91 |     ## grab vectors of unitid and county fips
 92 |     fips <- county_df$fips
 93 |     unitid <- hei_df$unitid
 94 | 
 95 |     ## matrix of county lon/lat
 96 |     cmat <- data.matrix(county_df %>% select(pclon10,pclat10))
 97 | 
 98 |     ## matrix of hei lon/lat
 99 |     hmat <- data.matrix(hei_df %>% select(longitud,latitude))
100 | 
101 |     ## calculate distances (may take a minute)
102 |     dist <- distm(cmat,hmat)
103 | 
104 |     ## add row and column names
105 |     rownames(dist) <- fips
106 |     colnames(dist) <- unitid
107 | 
108 |     ## --------------------------------------
109 |     ## Across states
110 |     ## --------------------------------------
111 | 
112 |     ## get nearest unitid and distance for each county
113 |     nearest <- apply(dist, 1, FUN=function(x){
114 |         index <- which.min(x)
115 |         return(cbind(names(x[index]),x[index]*m2miles))
116 |     })
117 | 
118 |     ## transpose and save as dataframe
119 |     nearest <- data.frame(t(nearest),stringsAsFactors=FALSE)
120 | 
121 |     ## clean up
122 |     all <- nearest %>%
123 |         mutate(fips = rownames(nearest),
124 |                unitid = as.integer(X1),
125 |                miles = round(as.numeric(X2),2),
126 |                limit_instate = 0) %>%
127 |         select(fips,unitid,miles,limit_instate)
128 | 
129 |     ## --------------------------------------
130 |     ## Instate only
131 |     ## --------------------------------------
132 | 
133 |     ## number of observations for each dataframe
134 |     ncols <- nrow(hei_df)
135 |     nrows <- nrow(county_df)
136 | 
137 |     ## build matrices of state fips; transpose 2nd for overlay
138 |     countyst <- matrix(rep(county_df$stfips,ncols),ncol=ncols)
139 |     heist <- t(matrix(rep(hei_df$stfips,nrows),ncol=nrows))
140 | 
141 |     ## mask: TRUE==same state, FALSE==different
142 |     mask <- countyst == heist
143 | 
144 |     ## where FALSE, make Inf (we want smallest number later)
145 |     dist[!mask] <- Inf
146 | 
147 |     ## get nearest instate unitid and distance for each county
148 |     nearest <- apply(dist, 1, FUN=function(x){
149 |         index <- which.min(x)
150 |         return(cbind(names(x[index]),x[index]*m2miles))
151 |     })
152 | 
153 |     ## transpose and save as dataframe
154 |     nearest <- data.frame(t(nearest),stringsAsFactors=FALSE)
155 | 
156 |     ## clean up
157 |     ins <- nearest %>%
158 |         mutate(fips = rownames(nearest),
159 |                unitid = as.integer(X1),
160 |                miles = round(as.numeric(X2),2),
161 |                limit_instate = 1) %>%
162 |         select(fips,unitid,miles,limit_instate)
163 | 
164 |     ## combine and arrange
165 |     nearest <- data.frame(rbind(all,ins)) %>%
166 |         mutate(fips = as.integer(fips)) %>%
167 |         arrange(fips)
168 | 
169 |     ## return
170 |     return(nearest)
171 | }
172 | 
173 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
174 | ## Read in population center data
175 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
176 | 
177 | ## read data
178 | popcen <- read_csv(paste0(ddir,'county_centers.csv'))
179 | 
180 | ## subset to 2010 population centers
181 | popcen <- popcen %>%
182 |     mutate(fips = as.integer(fips),
183 |            stfips = floor(fips/1000)) %>%
184 |     filter(!is.na(pclon10),
185 |            !is.na(pclat10)) %>%
186 |     select(fips,stfips,pclon10,pclat10)
187 | 
188 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
189 | ## Run
190 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
191 | 
192 | ## IPEDS years to use
193 | years <- c(2010:2016)
194 | 
195 | ## init list
196 | yearlist <- list()
197 | 
198 | ## loop through years
199 | for(y in years) {
200 | 
201 |     ## get appropriate IPEDS file
202 |     hei <- getIpeds(y)
203 | 
204 |     ## Combinations
205 |     ##
206 |     ## (1) Public four-year
207 |     ## (2) Public two-year
208 |     ## (3) Private four-year, non-profit
209 |     ## (4) Private two-year, non-profit
210 |     ## (5) Private four-year, for-profit
211 |     ## (6) Private two-year, for-profit
212 |     ## (7) Any institution
213 | 
214 |     ## set up combination vectors
215 |     combo <- list(c(0,1,0,1,0,0),       # (1)
216 |                   c(0,0,1,1,0,0),       # (2)
217 |                   c(0,1,0,0,1,0),       # (3)
218 |                   c(0,0,1,0,1,0),       # (4)
219 |                   c(0,1,0,0,0,1),       # (5)
220 |                   c(0,0,1,0,0,1),       # (6)
221 |                   c(1,0,0,0,0,0))       # (7)
222 | 
223 |     ## init list
224 |     dflist <- list()
225 | 
226 |     for(c in 1:length(combo)) {
227 | 
228 |         message(paste0('\nCombination ',c))
229 | 
230 |         if(c != length(combo)) {
231 |             ## subset hei data
232 |             hei_sub <- hei %>%
233 |                 filter(fouryr == combo[[c]][2],
234 |                        twoyr == combo[[c]][3],
235 |                        pub == combo[[c]][4],
236 |                        pnp == combo[[c]][5],
237 |                        pfp == combo[[c]][6])
238 |         } else {
239 |             ## no subset
240 |             hei_sub <- hei
241 |         }
242 | 
243 |         ## get nearest
244 |         message('\nComputing nearest HEIs')
245 |         df <- nearestHei(hei_sub, popcen)
246 | 
247 |         ## add indicator variables
248 |         df <- df %>%
249 |             mutate(year = y,
250 |                    any = combo[[c]][1],
251 |                    limit_fouryr = combo[[c]][2],
252 |                    limit_twoyr = combo[[c]][3],
253 |                    limit_pub = combo[[c]][4],
254 |                    limit_pnp = combo[[c]][5],
255 |                    limit_pfp = combo[[c]][6])
256 | 
257 |         ## add df to dflist
258 |         dflist[[c]] <- df
259 |     }
260 | 
261 |     ## collapse list
262 |     message('\nCollapsing list into single dataframe')
263 |     out <- bind_rows(dflist)
264 | 
265 |     ## arrange
266 |     yearlist[[as.character(y)]] <- out %>% arrange(fips,year)
267 | }
268 | 
269 | ## collapse year list into single dataframe
270 | df <- bind_rows(yearlist)
271 | 
272 | ## some states don't have all types of institutions; drop rows where miles is Inf
273 | df <- df %>% filter(!is.infinite(miles))
274 | 
275 | ## arrange
276 | df <- df %>% arrange(fips,year)
277 | 
278 | ## write to disk
279 | write_csv(df,paste0(ddir,'nearest_hei.csv'))
280 | 
281 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
282 | ## END FILE
283 | ## =============================================================================
284 | 


--------------------------------------------------------------------------------
/scripts/neighborcounties.py:
--------------------------------------------------------------------------------
  1 | # ==============================================================================
  2 | #
  3 | # FILE: neighborcounties.py
  4 | # AUTH: Benjamin Skinner
  5 | # INIT: 3 July 2015
  6 | #
  7 | # ==============================================================================
  8 | 
  9 | # libraries
 10 | import pysal as ps
 11 | import pandas as pd
 12 | import numpy as np
 13 | 
 14 | # data dirs
 15 | shp = '../data/tl_2010_us_county10.shp'
 16 | dbf = '../data/tl_2010_us_county10.dbf'
 17 | 
 18 | # --------------------------------------------------------------------
 19 | # Store neighboring counties
 20 | # --------------------------------------------------------------------
 21 | 
 22 | # message
 23 | print('\nFinding adjacent counties.\n')
 24 | 
 25 | # read in data, finding counties that share borders
 26 | counties = ps.rook_from_shapefile(shp)
 27 | 
 28 | # store neighbors dictionary
 29 | neighbors = counties.neighbors
 30 | 
 31 | # convert dict to dataframe
 32 | neighbors = pd.DataFrame.from_dict(neighbors, orient='index')
 33 | 
 34 | # id value is the index value
 35 | neighbors['id'] = neighbors.index
 36 | 
 37 | # convert from wide to long
 38 | neighbors = pd.melt(neighbors, id_vars='id', value_name='adjid')
 39 | 
 40 | # drop melt's 'variable' column (position of the neighbor in the wide row)
 41 | neighbors = neighbors.drop('variable', axis=1)
 42 | 
 43 | # drop values with NaN (flotsam from melt)
 44 | neighbors = neighbors[np.isfinite(neighbors['adjid'])]
 45 | 
 46 | # sort by ids
 47 | neighbors = neighbors.sort_values(['id','adjid'])
 48 | 
 49 | # --------------------------------------------------------------------
 50 | # Create concordance dataframe
 51 | # --------------------------------------------------------------------
 52 | 
 53 | # message
 54 | print('\nCreating concordance dataframe.\n')
 55 | 
 56 | # get accompanying database information
 57 | db = ps.open(dbf)
 58 | 
 59 | # select fips column
 60 | concordance = pd.DataFrame(db.by_col_array(['GEOID10']), columns=['fips'])
 61 | 
 62 | # create id for merge
 63 | concordance['id'] = concordance.index
 64 | 
 65 | # --------------------------------------------------------------------
 66 | # Merge to convert ids to fips codes
 67 | # --------------------------------------------------------------------
 68 | 
 69 | # message
 70 | print('\nConverting IDs to FIPS codes.\n')
 71 | 
 72 | # merge to get origin fips values
 73 | df = pd.merge(neighbors, concordance, how='left', left_on='id', right_on='id')
 74 | df = df.rename(columns = {'fips': 'orgfips'})
 75 | 
 76 | # merge to get adjacent fips values
 77 | df = pd.merge(df, concordance, how='left', left_on='adjid', right_on='id')
 78 | df = df.rename(columns = {'fips': 'adjfips'})
 79 | 
 80 | # subset to just origin and adjacent fips values; sort
 81 | df = df[['orgfips','adjfips']]
 82 | df = df.sort_values(['orgfips', 'adjfips'])
 83 | 
 84 | # --------------------------------------------------------------------
 85 | # Create indicator for same state counties
 86 | # --------------------------------------------------------------------
 87 | 
 88 | # message
 89 | print('\nCreating indicator variable for same state counties.\n')
 90 | 
 91 | # convert to floats
 92 | df = df[['orgfips', 'adjfips']].astype(float)
 93 | 
 94 | # add indicator for county in same state
 95 | df['instate'] = (np.floor(df['orgfips']/1000)==np.floor((df['adjfips']/1000)))
 96 | df['instate'] = df['instate'].astype(int)
 97 | 
 98 | # convert back to string, adding leading zeros
 99 | df['orgfips'] = df['orgfips'].astype(int).astype(str).str.zfill(5)
100 | df['adjfips'] = df['adjfips'].astype(int).astype(str).str.zfill(5)
101 | 
102 | # --------------------------------------------------------------------
103 | # Write to disk
104 | # --------------------------------------------------------------------
105 | 
106 | # message
107 | print('\nWriting to disk.\n')
108 | 
109 | # final sort
110 | df = df.sort_values(['orgfips', 'adjfips'])
111 | 
112 | # write to csv
113 | df.to_csv('../data/neighborcounties.csv', index=False)
114 | 
115 | # --------------------------------------------------------------------
116 | # End file
117 | # ====================================================================
118 | 
119 | 


--------------------------------------------------------------------------------
/scripts/popcenters.r:
--------------------------------------------------------------------------------
  1 | ################################################################################
  2 | ##
  3 | ## PROJ: Population Centers
  4 | ## FILE: popcenters.r
  5 | ## AUTH: Benjamin Skinner
  6 | ## INIT: 26 October 2014
  7 | ##
  8 | ################################################################################
  9 | 
 10 | ## PURPOSE #####################################################################
 11 | ##
 12 | ## This file is used to create a matrix that gives the population centers for
 13 | ## each county in 2000 and 2010. The data are already collected by the U.S.
 14 | ## Census Bureau; this script just puts the files together.
 15 | ##
 16 | ## Raw data files come from U.S. Census files for 2000 and 2010:
 17 | ##
 18 | ## 2000: ftp://ftp.census.gov/geo/docs/reference/cenpop2000/county
 19 | ## 2010: ftp://ftp.census.gov/geo/docs/reference/cenpop2010/county
 20 | ##
 21 | ################################################################################
 22 | 
 23 | ## clear memory
 24 | rm(list=ls())
 25 | 
 26 | ## libraries
 27 | libs <- c('dplyr','RCurl','readr')
 28 | lapply(libs, require, character.only=TRUE)
 29 | 
 30 | ## directories
 31 | ddir <- '../data/'
 32 | 
 33 | ################################################################################
 34 | ## CENTERS: 2000 AND 2010
 35 | ################################################################################
 36 | 
 37 | ## raw file directory; files
 38 | urldir <- 'ftp://ftp.census.gov/geo/docs/maps-data/data/gazetteer/'
 39 | url1 <- paste0(urldir, 'county2k.zip')
 40 | url2 <- paste0(urldir, 'Gaz_counties_national.zip')
 41 | 
 42 | ## set up temp folders and download
 43 | temp1 <- tempfile(); download.file(url1, temp1)
 44 | temp2 <- tempfile(); download.file(url2, temp2)
 45 | 
 46 | ## read; fixed width for 2000; tab delimited for 2010
 47 | cen00 <- read_fwf(unz(temp1, 'county2k.txt', open='rb'),
 48 |                   fwf_widths(c(72,8,9,14,14,12,12,10,11)))
 49 | cen10 <- read_delim(unz(temp2, 'Gaz_counties_national.txt', open='rb'),
 50 |                     delim='\t')
 51 | 
 52 | ## clean
 53 | cen00 <- cen00 %>%
 54 |     mutate(fips = substr(cen00$X1,3,7)) %>%
 55 |     select(fips, X9, X8) %>%
 56 |     rename(clon00 = X9,
 57 |            clat00 = X8)
 58 | 
 59 | cen10 <- cen10 %>%
 60 |     select(GEOID, INTPTLONG, INTPTLAT) %>%
 61 |     rename(fips = GEOID,
 62 |            clon10 = INTPTLONG,
 63 |            clat10 = INTPTLAT)
 64 | 
 65 | ## join
 66 | cen <- cen00 %>% full_join(cen10, by='fips')
 67 | 
 68 | ################################################################################
 69 | ## POPCENTERS: 2000 and 2010
 70 | ################################################################################
 71 | 
 72 | ## need to get list of separated state files (2000)
 73 | url <- paste0('ftp://ftp.census.gov/geo/docs/reference/cenpop2000/county/')
 74 | fn <- unlist(strsplit(getURL(url, dirlistonly = TRUE), '\n'))
 75 | 
 76 | ## download each in turn and store in list (will take a sec...ignore warnings)
 77 | stlist <- lapply(fn, FUN = function(x){read_csv(paste0(url,x),col_names=FALSE)})
 78 | 
 79 | ## collapse list of dataframes into single dataframe
 80 | cp00 <- do.call(rbind, stlist)
 81 | 
 82 | ## download raw file (2010)
 83 | url <- paste0('ftp://ftp.census.gov/geo/docs/reference/cenpop2010/county/',
 84 |               'CenPop2010_Mean_CO.txt')
 85 | 
 86 | ## download/read file; lower names
 87 | cp10 <- read_csv(url)
 88 | names(cp10) <- tolower(names(cp10))
 89 | 
 90 | ## ## merge state and county fips
 91 | ## cp00$fips <- paste0(cp00$V1, cp00$V2)
 92 | 
 93 | ## clean
 94 | cp00 <- cp00 %>%
 95 |     mutate(fips = paste0(cp00$X1, cp00$X2)) %>%
 96 |     select(fips, X6, X5) %>%
 97 |     rename(pclon00 = X6,
 98 |            pclat00 = X5)
 99 | 
100 | ## subset table based on what is needed; rename; make numeric
101 | ## cp00 <- cbind(cp00$fips, cp00$V6, cp00$V5)
102 | ## colnames(cp00) <- c('fips','pclon00','pclat00')
103 | ## cp00 <- apply(cp00, 2, FUN = function(x){as.numeric(x)})
104 | 
105 | ## clean
106 | cp10 <- cp10 %>%
107 |     mutate(fips = paste0(cp10$statefp, cp10$countyfp)) %>%
108 |     select(fips, longitude, latitude) %>%
109 |     rename(pclon10 = longitude,
110 |            pclat10 = latitude)
111 | 
112 | ## ## merge state and county fips
113 | ## cp10$fips <- paste0(cp10$statefp, cp10$countyfp)
114 | 
115 | ## ## subset table based on what is needed; rename; make numeric
116 | ## cp10 <- cbind(cp10$fips, cp10$longitude, cp10$latitude)
117 | ## colnames(cp10) <- c('fips','pclon10','pclat10')
118 | ## cp10 <- apply(cp10, 2, FUN = function(x){as.numeric(x)})
119 | 
120 | ## merge
121 | popcen <- cp00 %>% full_join(cp10, by='fips')
122 | 
123 | ################################################################################
124 | ## MERGE ALL
125 | ################################################################################
126 | 
127 | ## merge
128 | centroids <- cen %>% full_join(popcen, by='fips')
129 | 
130 | ## clean and sort
131 | centroids <- centroids %>%
132 |     filter(fips != 'NANA',
133 |            fips != '6985',
134 |            as.numeric(fips) <= 57000) %>%
135 |     arrange(fips)
136 | 
137 | ################################################################################
138 | ## OUTPUT
139 | ################################################################################
140 | 
141 | write_csv(centroids, paste0(ddir, 'county_centers.csv'))
142 | 
143 | ## -----------------------------------------------------------------------------
144 | ## END FILE
145 | ################################################################################
146 | 


--------------------------------------------------------------------------------