├── .gitignore ├── README.md ├── data ├── county_centers.csv ├── nearest_hei.csv ├── neighborcounties.csv └── stcrosswalk.csv └── scripts ├── nearesthei.r ├── neighborcounties.py └── popcenters.r /.gitignore: -------------------------------------------------------------------------------- 1 | *.dbf 2 | *.shp 3 | *.shx 4 | *.Rhistory 5 | *.DS_Store 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | This directory contains spatial data files and the scripts that produce them. The data files are linked both to their GitHub repository location and a raw version to facilitate easy download. Scripts to produce the datasets can be found in the `./scripts` subdirectory of the repository. 2 | 3 | ## Data 4 | 5 | | File name | Script | Language | Description| 6 | |:----------|:-----|:-------|:-----------| 7 | |[`neighborcounties.csv`](https://github.com/btskinner/spatial/blob/master/data/neighborcounties.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/neighborcounties.csv)|`neighborcounties.py`|Python 3.5|Long data file that lists all adjacent counties (2010)| 8 | |[`county_centers.csv`](https://github.com/btskinner/spatial/blob/master/data/county_centers.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/county_centers.csv)|`popcenters.r`|R 3.2.3|Geocoordinates for geographic and population-weighted centers in all counties (2000 and 2010)| 9 | |[`nearest_hei.csv`](https://github.com/btskinner/spatial/blob/master/data/nearest_hei.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/nearest_hei.csv)|`nearesthei.r`|R 3.2.3|Nearest higher education institution (HEI) to every county, by sector (2010-2014)| 10 | 11 | ## Detail 12 | 13 | ### [`neighborcounties.csv`](https://github.com/btskinner/spatial/blob/master/data/neighborcounties.csv) 
[[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/neighborcounties.csv) 14 | 15 | This long file links every county in the United States (as of the 2010 Census) with all of its contiguous counties. Five-digit county-level [FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards) codes are used to identify the counties. These codes uniquely identify each county and can be used to link with other datasets such as the [American Community Survey](https://www.census.gov/programs-surveys/acs/). 16 | 17 | ##### COLUMNS 18 | 19 | | Name | Description| 20 | |:-----|:-----------| 21 | |`orgfips`|Origin county FIPS code| 22 | |`adjfips`|Adjacent county FIPS code| 23 | |`instate`|==1 if adjacent county is in the same state| 24 | 25 | ### [`county_centers.csv`](https://github.com/btskinner/spatial/blob/master/data/county_centers.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/county_centers.csv) 26 | 27 | This wide file gives the latitude and longitude for the spatial and population centers of every county in the United States for the Census years 2000 and 2010. These coordinates are given by the U.S. Census; this file simply collects them in a single easy-to-use file. Five-digit county-level [FIPS](https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards) codes are used to identify the counties. 
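Because county FIPS codes are five-character identifiers, codes in states such as Alabama (`01`) begin with a zero that is lost whenever the column is parsed as a number. The following is a minimal sketch of restoring the padding, assuming the file is read with Python's standard `csv` module. The helper names `normalize_fips` and `state_fips` and the sample rows are hypothetical and not part of this repository.

```python
import csv
import io


def normalize_fips(value):
    """Return a five-character, zero-padded county FIPS string."""
    return str(int(value)).zfill(5)


def state_fips(fips):
    """Return the two-character state prefix of a county FIPS code."""
    return normalize_fips(fips)[:2]


# Made-up sample standing in for a file with a `fips` column in which
# one code lost its leading zero during an earlier numeric conversion.
sample = io.StringIO("fips\n1001\n06037\n")
rows = list(csv.DictReader(sample))

codes = [normalize_fips(row["fips"]) for row in rows]
states = [state_fips(code) for code in codes]
print(codes, states)
```

Keeping the codes as strings also makes it straightforward to join against other files keyed on five-digit FIPS codes.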
28 | 29 | ##### COLUMNS 30 | 31 | | Name | Description| 32 | |:-----|:-----------| 33 | |`fips`|Unique county-level five-digit FIPS code| 34 | |`clon00`|Longitude of spatial center, 2000| 35 | |`clat00`|Latitude of spatial center, 2000| 36 | |`clon10`|Longitude of spatial center, 2010| 37 | |`clat10`|Latitude of spatial center, 2010| 38 | |`pclon00`|Longitude of population-weighted center, 2000| 39 | |`pclat00`|Latitude of population-weighted center, 2000| 40 | |`pclon10`|Longitude of population-weighted center, 2010| 41 | |`pclat10`|Latitude of population-weighted center, 2010| 42 | 43 | ### [`nearest_hei.csv`](https://github.com/btskinner/spatial/blob/master/data/nearest_hei.csv) [[Raw]](https://raw.githubusercontent.com/btskinner/spatial/master/data/nearest_hei.csv) 44 | 45 | This long file gives the nearest higher education institution (HEI) to each county population center across a number of years and higher education sectors. Each row gives the nearest institution's [IPEDS](http://nces.ed.gov/ipeds/datacenter/Default.aspx) unique `unitid`, the distance in miles, and indicators for the year and subset of included schools (*e.g.,* nearest public four-year or nearest public two-year).
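Since each county appears once per year and per subset of schools, a specific "nearest" definition is selected by filtering on the indicator columns. The following is a rough sketch using only Python's standard library. The helper `select_rows` is hypothetical, and the three sample rows (including their `unitid` and `miles` values) are made up for illustration; only the column names come from `nearest_hei.csv`.

```python
import csv
import io

# Made-up rows for illustration only; real values come from nearest_hei.csv.
sample = io.StringIO(
    "fips,unitid,miles,limit_instate,year,any,"
    "limit_fouryr,limit_twoyr,limit_pub,limit_pnp,limit_pfp\n"
    "01001,100001,12.5,0,2010,1,0,0,0,0,0\n"
    "01001,100002,30.1,1,2010,0,1,0,1,0,0\n"
    "01001,100003,18.7,1,2010,0,0,1,0,0,1\n"
)


def select_rows(rows, **flags):
    """Keep rows whose indicator columns equal the requested 0/1 values."""
    return [
        row for row in rows
        if all(int(row[name]) == value for name, value in flags.items())
    ]


rows = list(csv.DictReader(sample))

# Nearest HEI of any type, allowing matches across state lines.
absolute = select_rows(rows, limit_instate=0, any=1)

# Nearest in-state public four-year HEI.
public_4yr = select_rows(rows, limit_instate=1, limit_fouryr=1, limit_pub=1)

print(absolute[0]["unitid"], public_4yr[0]["unitid"])
```

In practice a filter on `year` would usually be added as well, since the same county and subset repeat across years.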
46 | 47 | ##### COLUMNS 48 | 49 | | Name | Description| 50 | |:-----|:-----------| 51 | |`fips`|Unique county-level five-digit FIPS code| 52 | |`unitid`|Unique IPEDS identifier for nearest HEI| 53 | |`miles`|Distance in miles between county population center and nearest HEI| 54 | |`limit_instate`|==1 if sample of schools is limited to those in same state as county| 55 | |`year`|Year of match| 56 | |`any`|==1 if any type of HEI is included in sample| 57 | |`limit_fouryr`|==1 if only four-year HEIs are included in sample| 58 | |`limit_twoyr`|==1 if only two-year HEIs are included in sample| 59 | |`limit_pub`|==1 if only public HEIs are included in sample| 60 | |`limit_pnp`|==1 if only private, non-profit HEIs are included in sample| 61 | |`limit_pfp`|==1 if only private, for-profit HEIs are included in sample| 62 | 63 | ##### EXAMPLES 64 | 65 | *Absolute nearest HEI (regardless of sector and crossing state lines)* 66 | 67 | * Rows in which `limit_instate == 0` and `any == 1` 68 | 69 | *Nearest instate public four-year HEI* 70 | 71 | * Rows in which `limit_instate == 1` and `limit_fouryr == 1` and `limit_pub == 1` 72 | 73 | *Nearest instate private, for-profit two-year HEI* 74 | 75 | * Rows in which `limit_instate == 1` and `limit_twoyr == 1` and `limit_pfp == 1` 76 | -------------------------------------------------------------------------------- /data/stcrosswalk.csv: -------------------------------------------------------------------------------- 1 | st,stname,stfips,region,division AL,Alabama,01,3,6 AK,Alaska,02,4,9 AZ,Arizona,04,4,8 AR,Arkansas,05,3,7 CA,California,06,4,9 CO,Colorado,08,4,8 CT,Connecticut,09,1,1 DE,Delaware,10,3,5 DC,District of Columbia,11,3,5 FL,Florida,12,3,5 GA,Georgia,13,3,5 HI,Hawaii,15,4,9 ID,Idaho,16,4,8 IL,Illinois,17,2,3 IN,Indiana,18,2,3 IA,Iowa,19,2,4 KS,Kansas,20,2,4 KY,Kentucky,21,3,6 LA,Louisiana,22,3,7 ME,Maine,23,1,1 MD,Maryland,24,3,5 MA,Massachusetts,25,1,1 MI,Michigan,26,2,3 MN,Minnesota,27,2,4 MS,Mississippi,28,3,6
MO,Missouri,29,2,4 MT,Montana,30,4,8 NE,Nebraska,31,2,4 NV,Nevada,32,4,8 NH,New Hampshire,33,1,1 NJ,New Jersey,34,1,2 NM,New Mexico,35,4,8 NY,New York,36,1,2 NC,North Carolina,37,3,5 ND,North Dakota,38,2,4 OH,Ohio,39,2,3 OK,Oklahoma,40,3,6 OR,Oregon,41,4,9 PA,Pennsylvania,42,1,2 RI,Rhode Island,44,1,1 SC,South Carolina,45,3,5 SD,South Dakota,46,2,4 TN,Tennessee,47,3,6 TX,Texas,48,3,7 UT,Utah,49,4,8 VT,Vermont,50,1,1 VA,Virginia,51,3,5 WA,Washington,53,4,9 WV,West Virginia,54,3,5 WI,Wisconsin,55,2,3 WY,Wyoming,56,4,8 -------------------------------------------------------------------------------- /scripts/nearesthei.r: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ## 3 | ## PROJ: Nearest higher education institution to county population center 4 | ## FILE: nearesthei.r 5 | ## AUTH: Benjamin Skinner 6 | ## INIT: 29 December 2015 7 | ## REVN: 8 February 2018 8 | ## 9 | ################################################################################ 10 | 11 | ## PURPOSE ##################################################################### 12 | ## 13 | ## This file is used to find the nearest higher education institution to 14 | ## every county population center in the United States. 15 | ## 16 | ## Latitude and longitude data on colleges come from the IPEDS database. 17 | ## Population center data comes from the United States Census Bureau as put 18 | ## together by the script. 
19 | ## 20 | ################################################################################ 21 | 22 | ## clear memory 23 | rm(list=ls()) 24 | 25 | ## required library 26 | libs <- c('dplyr','geosphere','readr') 27 | lapply(libs, require, character.only=TRUE) 28 | 29 | ## directory paths 30 | ddir <- '../data/' 31 | 32 | ## formula (meters to miles) 33 | m2miles <- 0.0006214 34 | 35 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 36 | ## Functions 37 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 38 | 39 | getIpeds <- function(year) { 40 | 41 | ## -------------------------------------- 42 | ## This function downloads and subsets HD 43 | ## IPEDS files 44 | ## -------------------------------------- 45 | 46 | ## file to retrieve 47 | f <- paste0('HD',year) 48 | 49 | ## download file 50 | url <- paste0('https://nces.ed.gov/ipeds/datacenter/data/',f,'.zip') 51 | temp <- tempfile() 52 | download.file(url,temp) 53 | 54 | ## read file; lower names 55 | df <- read_csv(unz(temp,paste0(tolower(f),'.csv'))) 56 | names(df) <- tolower(names(df)) 57 | 58 | ## subset 59 | df <- df %>% 60 | select(unitid,countycd,longitud,latitude,sector) %>% 61 | mutate(fouryr = as.integer(sector %in% c(1,2,3)), 62 | twoyr = as.integer(sector %in% c(4,5,6)), 63 | pub = as.integer(sector %in% c(1,4,7)), 64 | pnp = as.integer(sector %in% c(2,5,8)), 65 | pfp = as.integer(sector %in% c(3,6,9)), 66 | fips = countycd, 67 | stfips = floor(fips/1000)) %>% 68 | filter(!is.na(longitud), 69 | !is.na(latitude)) %>% 70 | filter(fouryr == 1 | twoyr == 1) %>% 71 | select(-c(countycd,sector)) 72 | 73 | ## return df dataframe 74 | return(df) 75 | } 76 | 77 | nearestHei <- function(hei_df,county_df) { 78 | 79 | ## -------------------------------------- 80 | ## This function computes the distances 81 | ## between each county population 82 | ## centroid and HEI and returns a data 83 | ## frame with the nearest HEI to each 84 | ## county centroid. 
85 | ## -------------------------------------- 86 | 87 | ## sort dataframes 88 | hei_df <- hei_df %>% arrange(unitid) 89 | county_df <- county_df %>% arrange(fips) 90 | 91 | ## grab vectors of unitid and county fips 92 | fips <- county_df$fips 93 | unitid <- hei_df$unitid 94 | 95 | ## matrix of county lon/lat 96 | cmat <- data.matrix(county_df %>% select(pclon10,pclat10)) 97 | 98 | ## matrix of hei lon/lat 99 | hmat <- data.matrix(hei_df %>% select(longitud,latitude)) 100 | 101 | ## calculate distances (may take a minute) 102 | dist <- distm(cmat,hmat) 103 | 104 | ## add row and column names 105 | rownames(dist) <- fips 106 | colnames(dist) <- unitid 107 | 108 | ## -------------------------------------- 109 | ## Across states 110 | ## -------------------------------------- 111 | 112 | ## get nearest unitid and distance for each county 113 | nearest <- apply(dist, 1, FUN=function(x){ 114 | index <- which.min(x) 115 | return(cbind(names(x[index]),x[index]*m2miles)) 116 | }) 117 | 118 | ## transpose and save as dataframe 119 | nearest <- data.frame(t(nearest),stringsAsFactors=FALSE) 120 | 121 | ## clean up 122 | all <- nearest %>% 123 | mutate(fips = rownames(nearest), 124 | unitid = as.integer(X1), 125 | miles = round(as.numeric(X2),2), 126 | limit_instate = 0) %>% 127 | select(fips,unitid,miles,limit_instate) 128 | 129 | ## -------------------------------------- 130 | ## Instate only 131 | ## -------------------------------------- 132 | 133 | ## number of observations for each dataframe 134 | ncols <- nrow(hei_df) 135 | nrows <- nrow(county_df) 136 | 137 | ## build matrices of state fips; transpose 2nd for overlay 138 | countyst <- matrix(rep(county_df$stfips,ncols),ncol=ncols) 139 | heist <- t(matrix(rep(hei_df$stfips,nrows),ncol=nrows)) 140 | 141 | ## mask: 1==same state, 0==different 142 | mask <- ifelse(countyst == heist, TRUE, FALSE) 143 | 144 | ## where FALSE, make Inf (we want smallest number later) 145 | dist[!mask] <- Inf 146 | 147 | ## get nearest unitid 
and distance for each county 148 | nearest <- apply(dist, 1, FUN=function(x){ 149 | index <- which.min(x) 150 | return(cbind(names(x[index]),x[index]*m2miles)) 151 | }) 152 | 153 | ## transpose and save as dataframe 154 | nearest <- data.frame(t(nearest),stringsAsFactors=FALSE) 155 | 156 | ## clean up 157 | ins <- nearest %>% 158 | mutate(fips = rownames(nearest), 159 | unitid = as.integer(X1), 160 | miles = round(as.numeric(X2),2), 161 | limit_instate = 1) %>% 162 | select(fips,unitid,miles,limit_instate) 163 | 164 | ## combine and arrange 165 | nearest <- data.frame(rbind(all,ins)) %>% 166 | mutate(fips = as.integer(fips)) %>% 167 | arrange(fips) 168 | 169 | ## return 170 | return(nearest) 171 | } 172 | 173 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 174 | ## Read in population center data 175 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 176 | 177 | ## read data 178 | popcen <- read_csv(paste0(ddir,'county_centers.csv')) 179 | 180 | ## subset to 2010 population centers 181 | popcen <- popcen %>% 182 | mutate(fips = as.integer(fips), 183 | stfips = floor(fips/1000)) %>% 184 | filter(!is.na(pclon10), 185 | !is.na(pclat10)) %>% 186 | select(fips,stfips,pclon10,pclat10) 187 | 188 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 189 | ## Run 190 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 191 | 192 | ## IPEDS years to use 193 | years <- c(2010:2016) 194 | 195 | ## init list 196 | yearlist <- list() 197 | 198 | ## loop through years 199 | for(y in years) { 200 | 201 | ## get appropriate IPEDS file 202 | hei <- getIpeds(y) 203 | 204 | ## Combinations 205 | ## 206 | ## (1) Public four-year 207 | ## (2) Public two-year 208 | ## (3) Private four-year, non-profit 209 | ## (4) Private two-year, non-profit 210 | ## (5) Private four-year, for-profit 211 | ## (6) Private two-year, for-profit 212 | ## (7) Any institution 213 | 214 | ## set up combination vectors 215 | combo <- list(c(0,1,0,1,0,0), # (1) 216 | 
c(0,0,1,1,0,0), # (2) 217 | c(0,1,0,0,1,0), # (3) 218 | c(0,0,1,0,1,0), # (4) 219 | c(0,1,0,0,0,1), # (5) 220 | c(0,0,1,0,0,1), # (6) 221 | c(1,0,0,0,0,0)) # (7) 222 | 223 | ## init list 224 | dflist <- list() 225 | 226 | for(c in 1:length(combo)) { 227 | 228 | message(paste0('\nCombination ',c)) 229 | 230 | if(c != length(combo)) { 231 | ## subset hei data 232 | hei_sub <- hei %>% 233 | filter(fouryr == combo[[c]][2], 234 | twoyr == combo[[c]][3], 235 | pub == combo[[c]][4], 236 | pnp == combo[[c]][5], 237 | pfp == combo[[c]][6]) 238 | } else { 239 | ## no subset 240 | hei_sub <- hei 241 | } 242 | 243 | ## get nearest 244 | message('\nComputing nearest HEIs') 245 | df <- nearestHei(hei_sub, popcen) 246 | 247 | ## add indicator variables 248 | df <- df %>% 249 | mutate(year = y, 250 | any = combo[[c]][1], 251 | limit_fouryr = combo[[c]][2], 252 | limit_twoyr = combo[[c]][3], 253 | limit_pub = combo[[c]][4], 254 | limit_pnp = combo[[c]][5], 255 | limit_pfp = combo[[c]][6]) 256 | 257 | ## add df to dflist 258 | dflist[[c]] <- df 259 | } 260 | 261 | ## collapse list 262 | message('\nCollapsing list into single dataframe') 263 | out <- bind_rows(dflist) 264 | 265 | ## arrange 266 | yearlist[[as.character(y)]] <- out %>% arrange(fips,year) 267 | } 268 | 269 | ## collapse year list into single dataframe 270 | df <- bind_rows(yearlist) 271 | 272 | ## some states don't have all types of institutions; drop if mile is Inf 273 | df <- df %>% filter(!is.infinite(miles)) 274 | 275 | ## arrange 276 | df <- df %>% arrange(fips,year) 277 | 278 | ## write to disk 279 | write_csv(df,paste0(ddir,'nearest_hei.csv')) 280 | 281 | ## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 282 | ## END FILE 283 | ## ============================================================================= 284 | -------------------------------------------------------------------------------- /scripts/neighborcounties.py: 
-------------------------------------------------------------------------------- 1 | # ============================================================================== 2 | # 3 | # FILE: neighborcounties.py 4 | # AUTH: Benjamin Skinner 5 | # INIT: 3 July 2015 6 | # 7 | # ============================================================================== 8 | 9 | # libraries 10 | import pysal as ps 11 | import pandas as pd 12 | import numpy as np 13 | 14 | # data dirs 15 | shp = '../data/tl_2010_us_county10.shp' 16 | dbf = '../data/tl_2010_us_county10.dbf' 17 | 18 | # -------------------------------------------------------------------- 19 | # Store neighboring counties 20 | # -------------------------------------------------------------------- 21 | 22 | # message 23 | print('\nFinding adjacent counties.\n') 24 | 25 | # read in data, finding counties that share borders 26 | counties = ps.rook_from_shapefile(shp) 27 | 28 | # store neighbors dictionary 29 | neighbors = counties.neighbors 30 | 31 | # convert dict to dataframe 32 | neighbors = pd.DataFrame.from_dict(neighbors, orient='index') 33 | 34 | # id value is the index value 35 | neighbors['id'] = neighbors.index 36 | 37 | # convert from wide to long 38 | neighbors = pd.melt(neighbors, id_vars='id', value_name='adjid') 39 | 40 | # drop number of neighboring counties 41 | neighbors = neighbors.drop('variable', axis=1) 42 | 43 | # drop values with NaN (flotsam from melt) 44 | neighbors = neighbors[np.isfinite(neighbors['adjid'])] 45 | 46 | # sort by ids 47 | neighbors = neighbors.sort_values(['id','adjid']) 48 | 49 | # -------------------------------------------------------------------- 50 | # Create concordance dataframe 51 | # -------------------------------------------------------------------- 52 | 53 | # message 54 | print('\nCreating concordance dataframe.\n') 55 | 56 | # get accompanying database information 57 | db = ps.open(dbf) 58 | 59 | # select fips column 60 | concordance = 
pd.DataFrame(db.by_col_array(['GEOID10']), columns=['fips']) 61 | 62 | # create id for merge 63 | concordance['id'] = concordance.index 64 | 65 | # -------------------------------------------------------------------- 66 | # Merge to convert ids to fips codes 67 | # -------------------------------------------------------------------- 68 | 69 | # message 70 | print('\nConverting IDs to FIPS codes.\n') 71 | 72 | # merge to get origin fips values 73 | df = pd.merge(neighbors, concordance, how='left', left_on='id', right_on='id') 74 | df = df.rename(columns = {'fips': 'orgfips'}) 75 | 76 | # merge to get adjacent fips values 77 | df = pd.merge(df, concordance, how='left', left_on='adjid', right_on='id') 78 | df = df.rename(columns = {'fips': 'adjfips'}) 79 | 80 | # subset to just origin and adjacent fips values; sort 81 | df = df[['orgfips','adjfips']] 82 | df = df.sort_values(['orgfips', 'adjfips']) 83 | 84 | # -------------------------------------------------------------------- 85 | # Create indicator for same state counties 86 | # -------------------------------------------------------------------- 87 | 88 | # message 89 | print('\nCreating indicator variable for same state counties.\n') 90 | 91 | # convert to floats 92 | df = df[['orgfips', 'adjfips']].astype(float) 93 | 94 | # add indicator for county in same state 95 | df['instate'] = (np.floor(df['orgfips']/1000)==np.floor((df['adjfips']/1000))) 96 | df['instate'] = df['instate'].astype(int) 97 | 98 | # convert back to string, adding leading zeros 99 | df['orgfips'] = df['orgfips'].astype(int).astype(str).str.zfill(5) 100 | df['adjfips'] = df['adjfips'].astype(int).astype(str).str.zfill(5) 101 | 102 | # -------------------------------------------------------------------- 103 | # Write to disk 104 | # -------------------------------------------------------------------- 105 | 106 | # message 107 | print('\nWriting to disk.\n') 108 | 109 | # final sort 110 | df = df.sort_values(['orgfips', 'adjfips']) 111 | 112 | # 
write to csv 113 | df.to_csv('../data/neighborcounties.csv', index=False) 114 | 115 | # -------------------------------------------------------------------- 116 | # End file 117 | # ==================================================================== 118 | 119 | -------------------------------------------------------------------------------- /scripts/popcenters.r: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ## 3 | ## PROJ: Population Centers 4 | ## FILE: popcenters.r 5 | ## AUTH: Benjamin Skinner 6 | ## INIT: 26 October 2014 7 | ## 8 | ################################################################################ 9 | 10 | ## PURPOSE ##################################################################### 11 | ## 12 | ## This file is used to create a matrix that gives the population centers for 13 | ## each county in 2000 and 2010. The data are already collected by the U.S. 14 | ## Census Bureau; this script just puts the files together. 15 | ## 16 | ## Raw data files come from U.S. 
Census files for 2000 and 2010: 17 | ## 18 | ## 2000: ftp://ftp.census.gov/geo/docs/reference/cenpop2000/county 19 | ## 2010: ftp://ftp.census.gov/geo/docs/reference/cenpop2010/county 20 | ## 21 | ################################################################################ 22 | 23 | ## clear memory 24 | rm(list=ls()) 25 | 26 | ## libraries 27 | libs <- c('dplyr','RCurl','readr') 28 | lapply(libs, require, character.only=TRUE) 29 | 30 | ## directories 31 | ddir <- '../data/' 32 | 33 | ################################################################################ 34 | ## CENTERS: 2000 AND 2010 35 | ################################################################################ 36 | 37 | ## raw file directory; files 38 | urldir <- 'ftp://ftp.census.gov/geo/docs/maps-data/data/gazetteer/' 39 | url1 <- paste0(urldir, 'county2k.zip'); 40 | url2 <- paste0(urldir, 'Gaz_counties_national.zip') 41 | 42 | ## set up temp folders and download 43 | temp1 <- tempfile(); download.file(url1, temp1) 44 | temp2 <- tempfile(); download.file(url2, temp2) 45 | 46 | ## read; fixed width for 2000; tab delimited for 2010 47 | cen00 <- read_fwf(unz(temp1, 'county2k.txt', open='rb'), 48 | fwf_widths(c(72,8,9,14,14,12,12,10,11))) 49 | cen10 <- read_delim(unz(temp2, 'Gaz_counties_national.txt', open='rb'), 50 | delim='\t') 51 | 52 | ## clean 53 | cen00 <- cen00 %>% 54 | mutate(fips = substr(cen00$X1,3,7)) %>% 55 | select(fips, X9, X8) %>% 56 | rename(clon00 = X9, 57 | clat00 = X8) 58 | 59 | cen10 <- cen10 %>% 60 | select(GEOID, INTPTLONG, INTPTLAT) %>% 61 | rename(fips = GEOID, 62 | clon10 = INTPTLONG, 63 | clat10 = INTPTLAT) 64 | 65 | ## join 66 | cen <- cen00 %>% full_join(cen10, by='fips') 67 | 68 | ################################################################################ 69 | ## POPCENTERS: 2000 and 2010 70 | ################################################################################ 71 | 72 | ## need to get list of separated state files (2000) 73 | url <- 
paste0('ftp://ftp.census.gov/geo/docs/reference/cenpop2000/county/') 74 | fn <- unlist(strsplit(getURL(url, dirlistonly = TRUE), '\n')) 75 | 76 | ## download each in turn and store in list (will take a sec...ignore warnings) 77 | stlist <- lapply(fn, FUN = function(x){read_csv(paste0(url,x),col_names=FALSE)}) 78 | 79 | ## collapse list of dataframes into single dataframe 80 | cp00 <- do.call(rbind, stlist) 81 | 82 | ## download raw file (2010) 83 | url <- paste0('ftp://ftp.census.gov/geo/docs/reference/cenpop2010/county/', 84 | 'CenPop2010_Mean_CO.txt') 85 | 86 | ## download/read file; lower names 87 | cp10 <- read_csv(url) 88 | names(cp10) <- tolower(names(cp10)) 89 | 90 | ## ## merge state and county fips 91 | ## cp00$fips <- paste0(cp00$V1, cp00$V2) 92 | 93 | ## clean 94 | cp00 <- cp00 %>% 95 | mutate(fips = paste0(cp00$X1, cp00$X2)) %>% 96 | select(fips, X6, X5) %>% 97 | rename(pclon00 = X6, 98 | pclat00 = X5) 99 | 100 | ## subset table based on what is needed; rename; make numeric 101 | ## cp00 <- cbind(cp00$fips, cp00$V6, cp00$V5) 102 | ## colnames(cp00) <- c('fips','pclon00','pclat00') 103 | ## cp00 <- apply(cp00, 2, FUN = function(x){as.numeric(x)}) 104 | 105 | ## clean 106 | cp10 <- cp10 %>% 107 | mutate(fips = paste0(cp10$statefp, cp10$countyfp)) %>% 108 | select(fips, longitude, latitude) %>% 109 | rename(pclon10 = longitude, 110 | pclat10 = latitude) 111 | 112 | ## ## merge state and county fips 113 | ## cp10$fips <- paste0(cp10$statefp, cp10$countyfp) 114 | 115 | ## ## subset table based on what is needed; rename; make numeric 116 | ## cp10 <- cbind(cp10$fips, cp10$longitude, cp10$latitude) 117 | ## colnames(cp10) <- c('fips','pclon10','pclat10') 118 | ## cp10 <- apply(cp10, 2, FUN = function(x){as.numeric(x)}) 119 | 120 | ## merge 121 | popcen <- cp00 %>% full_join(cp10, by='fips') 122 | 123 | ################################################################################ 124 | ## MERGE ALL 125 |
################################################################################ 126 | 127 | ## merge 128 | centroids <- cen %>% full_join(popcen, by='fips') 129 | 130 | ## clean and sort 131 | centroids <- centroids %>% 132 | filter(fips != 'NANA', 133 | fips != '6985', 134 | as.numeric(fips) <= 57000) %>% 135 | arrange(fips) 136 | 137 | ################################################################################ 138 | ## OUTPUT 139 | ################################################################################ 140 | 141 | write_csv(centroids, paste0(ddir, 'county_centers.csv')) 142 | 143 | ## ----------------------------------------------------------------------------- 144 | ## END FILE 145 | ################################################################################ 146 | --------------------------------------------------------------------------------
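As a rough illustration of the distance step in `scripts/nearesthei.r`, which computes distances in meters between county population centers and institutions and converts them to miles with `m2miles`, the sketch below finds the nearest institution to a single point. It assumes a haversine great-circle distance on a sphere of radius 6,378,137 m; that assumption, the function names `haversine_miles` and `nearest`, and the coordinates are illustrative and not taken from the repository. Only the 0.0006214 conversion factor comes from the script.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius in meters
M2MILES = 0.0006214         # meters-to-miles factor used in nearesthei.r


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a)) * M2MILES


def nearest(center, institutions):
    """Return (unitid, miles) for the institution closest to `center`."""
    lon, lat = center
    return min(
        ((unitid, haversine_miles(lon, lat, hlon, hlat))
         for unitid, (hlon, hlat) in institutions.items()),
        key=lambda pair: pair[1],
    )


# Made-up identifiers and coordinates for illustration only.
institutions = {"A": (-86.0, 33.0), "B": (-90.0, 35.0)}
unitid, miles = nearest((-86.1, 33.1), institutions)
print(unitid, round(miles, 2))
```

The R script computes the full county-by-institution distance matrix at once; this sketch handles one center at a time to keep the logic easy to follow.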