├── .gitignore ├── README.md ├── __init__.py ├── bootstrap_geonames.sh ├── process_geonames_us.py ├── test.py ├── tweet_us_city_geocoder.py ├── tweet_us_state_geocoder.py ├── us.cities.json ├── us.states.json.gz ├── us_cities_geocode.csv ├── us_geocode.csv ├── us_geocode.csv.gz └── util.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Created by https://www.gitignore.io 2 | 3 | .python-version 4 | 5 | ### Windows ### 6 | # Windows image file caches 7 | Thumbs.db 8 | ehthumbs.db 9 | 10 | # Folder config file 11 | Desktop.ini 12 | 13 | # Recycle Bin used on file shares 14 | $RECYCLE.BIN/ 15 | 16 | # Windows Installer files 17 | *.cab 18 | *.msi 19 | *.msm 20 | *.msp 21 | 22 | # Windows shortcuts 23 | *.lnk 24 | 25 | 26 | ### OSX ### 27 | .DS_Store 28 | .AppleDouble 29 | .LSOverride 30 | 31 | # Icon must end with two \r 32 | Icon 33 | 34 | 35 | # Thumbnails 36 | ._* 37 | 38 | # Files that might appear on external disk 39 | .Spotlight-V100 40 | .Trashes 41 | 42 | # Directories potentially created on remote AFP share 43 | .AppleDB 44 | .AppleDesktop 45 | Network Trash Folder 46 | Temporary Items 47 | .apdisk 48 | 49 | 50 | ### Linux ### 51 | *~ 52 | 53 | # KDE directory preferences 54 | .directory 55 | 56 | 57 | ### Intellij ### 58 | # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm 59 | 60 | *.iml 61 | 62 | ## Directory-based project format: 63 | .idea/ 64 | # if you remove the above rule, at least ignore the following: 65 | 66 | # User-specific stuff: 67 | # .idea/workspace.xml 68 | # .idea/tasks.xml 69 | # .idea/dictionaries 70 | 71 | # Sensitive or high-churn files: 72 | # .idea/dataSources.ids 73 | # .idea/dataSources.xml 74 | # .idea/sqlDataSources.xml 75 | # .idea/dynamic.xml 76 | # .idea/uiDesigner.xml 77 | 78 | # Gradle: 79 | # .idea/gradle.xml 80 | # .idea/libraries 81 | 82 | # Mongo Explorer plugin: 83 | # .idea/mongoSettings.xml 84 | 85 | ## File-based project format: 86 | *.ipr 87 | 
*.iws 88 | 89 | ## Plugin-specific files: 90 | 91 | # IntelliJ 92 | out/ 93 | 94 | # mpeltonen/sbt-idea plugin 95 | .idea_modules/ 96 | 97 | # JIRA plugin 98 | atlassian-ide-plugin.xml 99 | 100 | # Crashlytics plugin (for Android Studio and IntelliJ) 101 | com_crashlytics_export_strings.xml 102 | crashlytics.properties 103 | crashlytics-build.properties 104 | 105 | 106 | ### SublimeText ### 107 | # cache files for sublime text 108 | *.tmlanguage.cache 109 | *.tmPreferences.cache 110 | *.stTheme.cache 111 | 112 | # workspace files are user-specific 113 | *.sublime-workspace 114 | 115 | # project files should be checked into the repository, unless a significant 116 | # proportion of contributors will probably not be using SublimeText 117 | # *.sublime-project 118 | 119 | # sftp configuration file 120 | sftp-config.json 121 | 122 | 123 | ### MicrosoftOffice ### 124 | *.tmp 125 | 126 | # Word temporary 127 | ~$*.doc* 128 | 129 | # Excel temporary 130 | ~$*.xls* 131 | 132 | # Excel Backup File 133 | *.xlk 134 | 135 | 136 | ### LaTeX ### 137 | *.acn 138 | *.acr 139 | *.alg 140 | *.aux 141 | *.bbl 142 | *.blg 143 | *.dvi 144 | *.fdb_latexmk 145 | *.glg 146 | *.glo 147 | *.gls 148 | *.idx 149 | *.ilg 150 | *.ind 151 | *.ist 152 | *.lof 153 | *.log 154 | *.lot 155 | *.maf 156 | *.mtc 157 | *.mtc0 158 | *.nav 159 | *.nlo 160 | *.out 161 | *.pdfsync 162 | *.ps 163 | *.snm 164 | *.synctex.gz 165 | *.toc 166 | *.vrb 167 | *.xdy 168 | *.tdo 169 | 170 | ### Python ### 171 | # Byte-compiled / optimized / DLL files 172 | __pycache__/ 173 | *.py[cod] 174 | 175 | # C extensions 176 | *.so 177 | 178 | # Distribution / packaging 179 | .Python 180 | env/ 181 | build/ 182 | develop-eggs/ 183 | dist/ 184 | downloads/ 185 | eggs/ 186 | lib/ 187 | lib64/ 188 | parts/ 189 | sdist/ 190 | var/ 191 | *.egg-info/ 192 | .installed.cfg 193 | *.egg 194 | 195 | # PyInstaller 196 | # Usually these files are written by a python script from a template 197 | # before PyInstaller builds the exe, so as to 
inject date/other infos into it. 198 | *.manifest 199 | *.spec 200 | 201 | # Installer logs 202 | pip-log.txt 203 | pip-delete-this-directory.txt 204 | 205 | # Unit test / coverage reports 206 | htmlcov/ 207 | .tox/ 208 | .coverage 209 | .cache 210 | nosetests.xml 211 | coverage.xml 212 | 213 | # Translations 214 | *.mo 215 | *.pot 216 | 217 | # Django stuff: 218 | *.log 219 | 220 | # Sphinx documentation 221 | docs/_build/ 222 | 223 | # PyBuilder 224 | target/ 225 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | Twitter User Geocoder 3 | ========== 4 | 5 | A utility to help you map Twitter users to specific states (or cities, though cities are more problematic and the solution here isn't that reliable). 6 | 7 | The name `geocoder` is a little misleading, as I wasn't attempting to resolve the location string to a set of geocodes (i.e., latitude and longitude). 8 | 9 | The following information is adapted from my paper (based on other papers that I have read, and my own experience). In some of our projects, we are trying to study regional differences (e.g., sentiments, public perceptions, etc.) based on Twitter data. For example, we studied how people use [(trans-)gender identification terminology](http://bianjiang.github.io/twitter-language-on-transgender/) differently across different regions; and tested whether the gender identification terms used by the LGBT community differ from those used by the general public (it's obviously true, but we need data to support this conclusion). 10 | 11 | But anyway, back to the topic of geocoding. For geotagging, we extracted the ‘location’ field, part of a user’s profile, and attempted to assign a U.S. state to each tweet accordingly.
Specifically, we searched each location field for a number of lexical patterns indicating the location of the user, such as the name of a state (e.g., Arkansas or Florida), or a city name in combination with a state name or state abbreviation in various possible formats (e.g., “——, fl” or “——, florida” or “——, fl, usa”). Self-reported locations are often nonsensical (e.g., “wonder land” or “up in the sky”), but strict patterns produced good matches and helped to reduce the number of false positives. 12 | 13 | Notably, Twitter also provides the ability to attach geocodes (i.e., latitude and longitude) to a user’s profile and to each tweet. Yet, since geolocation needs to be enabled explicitly by the user and requires a device that is capable of capturing geocodes (e.g., a mobile phone with GPS turned on), very few tweets we have collected have this information. This is consistent with findings from previous studies. If the ‘location’ field was missing in a user’s profile, but the ‘geo’ attribute was available (or the geocodes were embedded in the `location` field, which used to be very common for third-party mobile apps that post tweets on a user's behalf), we attempted to resolve the location of the user through reverse geocoding via the publicly available GeoNames geographical database. However, we did not use the geocodes attached to each individual tweet since it is possible that a user was traveling away from their home state, in which case the geocodes attached to the tweets would be different from those on their profile. For our study, we geotagged the tweets based on where the user is from, not where the user is traveling temporarily. However, we do consider the scenario where a user permanently moved from one state to another, reflected as a change in the ‘location’ field of the user’s profile. 14 | 15 | Installation 16 | ------------ 17 | 18 | None... just clone this and start using it. It's not complicated enough yet to warrant a setup.py.
19 | 20 | git clone git://github.com/bianjiang/twitter-user-geocoder.git 21 | 22 | Dependencies 23 | ------------ 24 | I think this is Python 3 only. At least I haven't tested it on Python 2 yet. 25 | 26 | How to use 27 | ------------ 28 | It's basically trying to resolve the `location` string to a U.S. state or city based on lexical patterns. So, the first thing this tool does is construct a dictionary based on the [GeoNames](http://www.geonames.org/) database, which is done by running: 29 | 30 | $ ./bootstrap_geonames.sh 31 | 32 | It downloads the relevant databases from GeoNames and runs `process_geonames_us.py` to generate the lexical dictionaries. 33 | 34 | **Note that `bootstrap_geonames.sh` uses `aria2c` to download the files, and I use [pyenv](https://github.com/yyuu/pyenv) to manage my Python versions.** 35 | 36 | **I included the dictionaries I have generated in this repo. The `us.states.json` is too big, so I had to compress it (with `gzip`). You have to unzip it before you can use it.** 37 | 38 | **You should regenerate the dictionaries, since the GeoNames databases are still growing. But there are a few caveats (see below), so I had to manually fix a few lexical patterns.** 39 | 40 | The code is split into two scripts, `tweet_us_city_geocoder.py` and `tweet_us_state_geocoder.py`, and the functionality of each is obvious from its name. At the end of each script you will see a test case. 41 | 42 | For `tweet_us_state_geocoder.py`, the output is the two-letter state abbreviation, or `None` if the geocodes are outside of the US. 43 | 44 | ```python 45 | 46 | logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') 47 | tug = TweetUSStateGeocoder() 48 | 49 | logger.info(tug.get_state('xxx: (-37.81, 144.96)')) # output None, geocodes outside of US 50 | logger.info(tug.get_state('Little Rock, AR')) # output 'ar' 51 | 52 | ``` 53 | 54 | For `tweet_us_city_geocoder.py`, the output is a `dict` object.
55 | 56 | ```python 57 | logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') 58 | tug = TweetUSCityGeocoder() 59 | 60 | logger.info(tug.get_city_state('xxx: (39.76838,-86.15804)')) # output {'city': 'indianapolis', 'state': 'in'} 61 | ``` 62 | 63 | The output is: 64 | 65 | ```python 66 | {'city': 'indianapolis', 'state': 'in'} 67 | ``` 68 | 69 | 70 | Caveats 71 | ------------ 72 | There are a number of issues, e.g., the data in GeoNames isn't complete. But the biggest issue is duplicated city names, so I had to make a choice between, e.g., hollywood, ca and hollywood, fl when the `location` string is just `hollywood`. Obviously you will choose `hollywood, ca`, but you have to realize that no tweets will be categorized as from `hollywood, fl` any more. It's a trade-off that you have to make. 73 | 74 | For the dictionaries included in this repo, I manually fixed a lot of these based on my own use-case. 75 | 76 | #### Cities with duplicate names: 77 | 78 | # INFO: ['hollywood', 'kansas city', 'peoria', 'columbia', 'springfield', 'columbus', 'glendale', 'aurora'] 79 | 80 | hollywood, ca vs. hollywood, fl 81 | ```` 82 | { 83 | "names": ["hollywood"], 84 | "state": "fl", 85 | "city": "hollywood" 86 | } 87 | ```` 88 | 89 | kansas city, mo vs. kansas city, ks (They are the same, a border city) 90 | 91 | peoria, il vs. north peoria, il vs. peoria, az (Treat it as peoria, il) 92 | { 93 | "names": ["peoria"], 94 | "state": "az", 95 | "city": "peoria" 96 | } 97 | 98 | los angeles vs. east los angeles (Take los angeles) 99 | 100 | las vegas vs. north las vegas (Take las vegas) 101 | { 102 | "names": ["north las vegas"], 103 | "state": "nv", 104 | "city": "north las vegas" 105 | } 106 | 107 | columbia, sc vs. columbia, mo (Take columbia, sc, but removed since it's confusing with district of columbia) 108 | { 109 | "names": ["columbia"], 110 | "state": "mo", 111 | "city": "columbia" 112 | } 113 | 114 | springfield, mo vs.
springfield, ma vs. springfield, il (Take springfield, mo) 115 | 116 | chattanooga, tn vs. east chattanooga, tn (Take chattanooga, tn) 117 | 118 | columbus, oh vs. columbus, ga (Take columbus, oh) 119 | 120 | glendale, ca vs. north glendale, ca vs. glendale, az (Take glendale, ca) 121 | 122 | aurora, co vs. aurora, il (Take aurora, co) 123 | 124 | independence, mo vs. east independence, mo (Take independence, mo) 125 | 126 | boston, ma vs. south boston, ma (Take boston, ma) 127 | 128 | memphis, tn vs. new south memphis, tn (Take memphis, tn) 129 | 130 | raleigh, nc vs. west raleigh, nc (Take raleigh, nc) 131 | 132 | ### License 133 | ------------ 134 | 135 | The MIT License (MIT) 136 | Copyright (c) 2015 Jiang Bian (ji0ng.bi0n@gmail.com) 137 | 138 | Permission is hereby granted, free of charge, to any person obtaining a copy of 139 | this software and associated documentation files (the "Software"), to deal in 140 | the Software without restriction, including without limitation the rights to 141 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 142 | the Software, and to permit persons to whom the Software is furnished to do so, 143 | subject to the following conditions: 144 | 145 | The above copyright notice and this permission notice shall be included in all 146 | copies or substantial portions of the Software. 147 | 148 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 149 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 150 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 151 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 152 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 153 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
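
If you just want a feel for the lexical lookup that `tweet_us_state_geocoder.py` performs, without downloading the GeoNames dumps, here is a minimal, self-contained sketch. It is not the repo's implementation: the tiny `STATE_MAP` and the `resolve_state` helper are hypothetical stand-ins for the generated `us.states.json` dictionaries and the real `get_state` method; only the normalization steps (strip, collapse `', '` to `','`, drop non-alpha characters, lowercase) mirror what the actual scripts do.

```python
# A minimal sketch of the lexical-pattern approach described above.
# STATE_MAP is a tiny hypothetical stand-in for the dictionaries that
# process_geonames_us.py generates from GeoNames; resolve_state() is a
# stand-in for TweetUSStateGeocoder.get_state().
import re

# hypothetical place -> state dictionary (the real one is built from GeoNames)
STATE_MAP = {
    'ar': 'ar', 'arkansas': 'ar',
    'fl': 'fl', 'florida': 'fl',
    'little rock,ar': 'ar',
    'miami,fl': 'fl', 'miami,florida': 'fl',
}

# same character filter the scripts use: keep letters, spaces, periods, commas
KEEP_ALPHA = re.compile(r'[^a-zA-Z\s\.,]')

def resolve_state(location):
    """Map a self-reported `location` string to a state code, or None."""
    s = location.strip().replace(', ', ',')   # 'Little Rock, AR' -> 'Little Rock,AR'
    s = KEEP_ALPHA.sub('', s).lower()         # drop digits, emoji, etc., then lowercase
    return STATE_MAP.get(s)                   # strict pattern: exact dictionary hit only

print(resolve_state('Little Rock, AR'))  # 'ar'
print(resolve_state('Miami, Florida'))   # 'fl'
print(resolve_state('up in the sky'))    # None (nonsensical self-reported location)
```

The strict exact-match lookup is what keeps false positives down; the real `get_state` additionally falls back to KD-tree reverse geocoding when the string contains raw coordinates, as described above.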
154 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bianjiang/twitter-user-geocoder/5c080e5e9f54303538c14469fdb7ce82760f89b5/__init__.py -------------------------------------------------------------------------------- /bootstrap_geonames.sh: -------------------------------------------------------------------------------- 1 | aria2c -s 5 -x 5 http://download.geonames.org/export/zip/US.zip 2 | mv US.zip zip_US.zip 3 | unzip zip_US.zip 4 | mv US.txt zip_US.txt 5 | rm -rf readme.txt 6 | aria2c -s 5 -x 5 http://download.geonames.org/export/dump/US.zip 7 | unzip US.zip 8 | rm -rf readme.txt 9 | aria2c -s 5 -x 5 http://download.geonames.org/export/dump/allCountries.zip 10 | unzip allCountries.zip 11 | rm -rf readme.txt 12 | aria2c -s 5 -x 5 http://download.geonames.org/export/dump/cities1000.zip 13 | unzip cities1000.zip 14 | rm readme.txt 15 | pyenv local 3.4.3 16 | python process_geonames_us.py 17 | rm *.zip 18 | rm *.txt 19 | -------------------------------------------------------------------------------- /process_geonames_us.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import logging 6 | import logging.handlers 7 | 8 | logger = logging.getLogger('ProcessGeoname') 9 | logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s') 10 | logging.getLogger("requests").setLevel(logging.WARNING) 11 | 12 | 13 | import csv, codecs, sys, json, os, util 14 | 15 | csv.field_size_limit(sys.maxsize) 16 | 17 | def prepare_us_places_to_state_mapping(): 18 | 19 | def build_non_us_places(): 20 | non_us_places = [] 21 | with codecs.getreader('utf-8')(open('allCountries.txt', 'rb')) as f: 22 | reader = csv.reader(f, delimiter='\t') 23 | cnt = 0 24 | for row in reader: 25 | country = row[8].lower().strip() 26 
| state = row[10].lower().strip() 27 | name = row[1].lower().strip() 28 | 29 | if country != 'us': 30 | non_us_places.append(name) 31 | 32 | return non_us_places 33 | 34 | non_us_places = build_non_us_places() 35 | non_us_places.extend(util.bad_locations) # noticed bad location names 36 | 37 | non_us_places = set(non_us_places) 38 | 39 | features = { 40 | 0: {}, #state 41 | 1: {}, #city 42 | 2 : {} #county 43 | } 44 | 45 | state_lookup = {} 46 | 47 | country_alternate_names = ['us', 'u.s.', 'u.s', 'u.s.a.', 'u.s.a', 'usa', 'united states']#, 'united states of america', 'america'] 48 | country = 'us' 49 | #build state lookup 50 | with codecs.getreader('utf-8')(open('zip_US.txt', 'rb')) as f: 51 | reader = csv.reader(f, delimiter='\t') 52 | 53 | for row in reader: 54 | state_name = row[3].lower().strip() 55 | 56 | if (not state_name): 57 | continue 58 | state = row[4].lower().strip() 59 | 60 | place = row[2].lower().strip() 61 | 62 | features[0][state] = state 63 | features[0][state_name] = state 64 | for country in country_alternate_names: 65 | features[0]['%s,%s'%(state, country)] = state 66 | features[0]['%s %s'%(state, country)] = state 67 | features[0]['%s,%s'%(state_name, country)] = state 68 | features[0]['%s %s'%(state_name, country)] = state 69 | 70 | state_lookup[state] = state_name 71 | 72 | if (place not in non_us_places): 73 | features[1][place] = state 74 | features[1]['%s,%s'%(place, state)] = state 75 | features[1]['%s,%s'%(place, state_name)] = state 76 | for country in country_alternate_names: 77 | features[1]['%s,%s'%(place, country)] = state 78 | features[1]['%s,%s,%s'%(place, state, country)] = state 79 | features[1]['%s,%s,%s'%(place, state_name, country)] = state 80 | 81 | features[1]['%s %s'%(place, state)] = state 82 | features[1]['%s %s'%(place, state_name)] = state 83 | for country in country_alternate_names: 84 | features[1]['%s %s'%(place, country)] = state 85 | features[1]['%s %s %s'%(place, state, country)] = state 86 | features[1]['%s 
%s %s'%(place, state_name, country)] = state 87 | 88 | with codecs.getreader('utf-8')(open('US.txt', 'rb')) as f: 89 | reader = csv.reader(f, delimiter='\t') 90 | cnt = 0 91 | for row in reader: 92 | 93 | country = row[8].lower().strip() 94 | state = row[10].lower().strip() 95 | name = row[1].lower().strip() 96 | 97 | if country != 'us': 98 | continue 99 | 100 | #feature class see http://www.geonames.org/export/codes.html, char(1) 101 | if (row[6].strip() not in ('A', 'P')): 102 | continue 103 | 104 | featureClassCode = row[7].strip() 105 | 106 | # admin1 - admin4 107 | #logger.info('%s, %s, %s, %s'%(row[10], row[11], row[12], row[13])) 108 | #ADM1 -> state 109 | #ADM2 -> County 110 | #ADM3 -> Village 111 | #ADM4 -> Doesn't exist 112 | 113 | # state already processed 114 | # if(featureClassCode == 'ADM1'): 115 | # features[0][name] = state 116 | # for c in country_alternate_names: 117 | # features[0]['%s,%s'%(name, c)] = state 118 | # features[0]['%s %s'%(name, c)] = state 119 | # features[0][state] = state 120 | # cnt += 1 121 | 122 | # if(name == 'kentucky'): 123 | # logger.info(row) 124 | # # county 125 | if(featureClassCode == 'ADM2'): 126 | if (name not in non_us_places): 127 | features[2][name] = state 128 | features[2]['%s,%s'%(name, state)] = state 129 | features[2]['%s,%s'%(name, state_lookup[state])] = state 130 | for country in country_alternate_names: 131 | features[2]['%s,%s'%(name, country)] = state 132 | features[2]['%s,%s,%s'%(name, state, country)] = state 133 | features[2]['%s,%s,%s'%(name, state_lookup[state], country)] = state 134 | 135 | features[2]['%s %s'%(name, state)] = state 136 | features[2]['%s %s'%(name, state_lookup[state])] = state 137 | for country in country_alternate_names: 138 | features[2]['%s %s'%(name, country)] = state 139 | features[2]['%s %s %s'%(name, state, country)] = state 140 | features[2]['%s %s %s'%(name, state_lookup[state], country)] = state 141 | 142 | cnt += 1 143 | elif(featureClassCode == 'PPL'): 144 | if (name 
not in non_us_places): 145 | features[1][name] = state 146 | features[1]['%s,%s'%(name, state)] = state 147 | features[1]['%s,%s'%(name, state_lookup[state])] = state 148 | for country in country_alternate_names: 149 | features[1]['%s,%s'%(name, country)] = state 150 | features[1]['%s,%s,%s'%(name, state, country)] = state 151 | features[1]['%s,%s,%s'%(name, state_lookup[state], country)] = state 152 | 153 | features[1]['%s %s'%(name, state)] = state 154 | features[1]['%s %s'%(name, state_lookup[state])] = state 155 | for country in country_alternate_names: 156 | features[1]['%s %s'%(name, country)] = state 157 | features[1]['%s %s %s'%(name, state, country)] = state 158 | features[1]['%s %s %s'%(name, state_lookup[state], country)] = state 159 | 160 | cnt += 1 161 | 162 | 163 | if (cnt % 10000 == 0): 164 | logger.info('%d'%(cnt)) 165 | 166 | logger.info(cnt) 167 | 168 | feature_file = os.path.abspath('us.states.json') 169 | 170 | with open(feature_file, 'w') as wf: 171 | json.dump(features, wf) 172 | 173 | def prepare_reverse_gecoding_data_by_zip_us(): 174 | 175 | with codecs.getreader('utf-8')(open('zip_US.txt', 'rb')) as f, open('us_geocode.csv', 'w') as of: 176 | reader = csv.reader(f, delimiter='\t') 177 | writer = csv.writer(of) 178 | for row in reader: 179 | state_name = row[3].lower().strip() 180 | 181 | if (not state_name): 182 | continue 183 | state = row[4].lower().strip() 184 | 185 | place = row[2].lower().strip() 186 | 187 | latitude, longitude = row[9:11] 188 | 189 | line = latitude, longitude, state, place 190 | writer.writerow(line) 191 | 192 | def read_us_cities(): 193 | cities = [] 194 | with codecs.getreader('utf-8')(open('cities1000.txt', 'rb')) as rf: 195 | reader = csv.reader(rf, delimiter='\t') 196 | for row in reader: 197 | 198 | country = row[8].lower().strip() 199 | 200 | if (country != 'us'): 201 | continue 202 | 203 | state = row[10].lower().strip() 204 | 205 | population = int(row[14].lower().strip()) 206 | 207 | name = 
row[1].lower().strip() 208 | 209 | alternate_names = row[3].lower().strip() 210 | 211 | latitude, longitude = row[4:6] 212 | 213 | #logger.info(row) 214 | #logger.info("[%s], [%s], [%s]: [%d], (%s, %s)"%(name, state, country, population, latitude, longitude)) 215 | 216 | cities.append({ 217 | "name": name, 218 | "alternate_names": alternate_names, 219 | "latitude": latitude, 220 | "longitude": longitude, 221 | "state": state, 222 | "country": country, 223 | "population": population 224 | }) 225 | 226 | return cities 227 | 228 | def prepare_us_city_mapping(): 229 | cities = read_us_cities() 230 | 231 | cities = sorted(cities, key=lambda k: k['population'], reverse=True) 232 | 233 | output = [] 234 | for city in cities: 235 | 236 | population = int(city['population']) 237 | if (population < 100000): 238 | continue 239 | 240 | names = [city['name']] 241 | #alternate_names = city['alternate_names'].split(',') 242 | 243 | # for alternate_name in alternate_names: 244 | # 245 | # alternate_name = alternate_name.strip() 246 | # 247 | # try: 248 | # lang = detect(alternate_name) 249 | # 250 | # if (lang == 'en'): 251 | # names.append(alternate_name) 252 | # except: 253 | # pass 254 | 255 | output.append({ 256 | 'city': city['name'], 257 | 'state': city['state'], 258 | 'names': names 259 | }) 260 | 261 | logger.info("total cities (population > 100k): %d"%(len(output))) 262 | us_cities_file = os.path.abspath('us.cities.json') 263 | 264 | with open(us_cities_file, 'w') as wf: 265 | json.dump(output, wf) 266 | 267 | 268 | def prepare_reverse_gecoding_data_by_cities_us(): 269 | 270 | cities = read_us_cities() 271 | 272 | cities = sorted(cities, key=lambda k: k['population'], reverse=True) 273 | 274 | with open('us_cities_geocode.csv', 'w') as of: 275 | writer = csv.writer(of) 276 | 277 | for city in cities: 278 | population = int(city['population']) 279 | if (population < 100000): 280 | continue 281 | 282 | line = city['latitude'], city['longitude'], city['state'], city['name'] 
283 | writer.writerow(line) 284 | 285 | 286 | if __name__ == "__main__": 287 | logger.info(sys.version) 288 | 289 | # prepare_reverse_gecoding_data_by_cities_us() 290 | # prepare_us_city_mapping() 291 | 292 | prepare_reverse_gecoding_data_by_zip_us() 293 | prepare_us_places_to_state_mapping() 294 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import logging 6 | import logging.handlers 7 | 8 | logger = logging.getLogger('ProcessGeoname') 9 | logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s') 10 | logging.getLogger("requests").setLevel(logging.WARNING) 11 | 12 | import csv, codecs, sys, json, os, copy 13 | import collections 14 | 15 | def check_duplicates(): 16 | us_cities_file = os.path.abspath('./geocoding/us.cities.json') 17 | cities = [] 18 | with open(us_cities_file, 'r') as rf: 19 | for city in json.load(rf): 20 | cities.append(city['city']) 21 | 22 | logger.info([x for x, y in collections.Counter(cities).items() if y > 1]) 23 | 24 | logger.info(len(set(cities))) 25 | 26 | # INFO: ['hollywood', 'kansas city', 'peoria', 'columbia', 'springfield', 'columbus', 'glendale', 'aurora'] 27 | 28 | if __name__ == "__main__": 29 | 30 | logger.info(sys.version) 31 | 32 | check_duplicates() 33 | -------------------------------------------------------------------------------- /tweet_us_city_geocoder.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import logging 6 | 7 | logger = logging.getLogger('GeocodingTweets') 8 | 9 | import os, json, sys, re, csv, codecs 10 | from scipy.spatial import cKDTree as KDTree 11 | from math import sin, cos, sqrt, atan2, radians, isinf 12 | 13 | def singleton(cls): 14 | """Singleton pattern to avoid loading class multiple times 
15 | """ 16 | instances = {} 17 | def getinstance(): 18 | if cls not in instances: 19 | instances[cls] = cls() 20 | return instances[cls] 21 | return getinstance 22 | 23 | 24 | @singleton 25 | class TweetUSCityGeocoder: 26 | 27 | def __init__(self, geocode_filename='us_cities_geocode.csv', us_cities='us.cities.json'): 28 | coordinates, self.locations = self.extract_coordinates_and_locations(rel_path(geocode_filename)) 29 | self.tree = KDTree(coordinates) 30 | 31 | self.us_cities = self.load_us_cities(rel_path(us_cities)) 32 | 33 | # keep only alpha, space, period and comma 34 | self.keep_alpha_p = re.compile(r'[^a-zA-Z\s\.,]') 35 | 36 | self.geomap = {} 37 | 38 | def load_us_cities(self, local_filename): 39 | if os.path.exists(local_filename): 40 | with open(local_filename, 'r') as rf: 41 | return json.load(rf) 42 | else: 43 | logger.error("missing us_cities file: [%s]"%(local_filename)) 44 | sys.exit(1) 45 | 46 | def extract_coordinates_and_locations(self, local_filename): 47 | """Extract geocode data from zip 48 | """ 49 | if os.path.exists(local_filename): 50 | # open compact CSV 51 | rows = csv.reader(codecs.getreader('utf-8')(open(local_filename, 'rb'))) 52 | else: 53 | logger.error("missing geocode file: [%s]"%(local_filename)) 54 | sys.exit(1) 55 | 56 | # load a list of known coordinates and corresponding locations 57 | coordinates, locations = [], [] 58 | for latitude, longitude, state, place in rows: 59 | coordinates.append((latitude, longitude)) 60 | locations.append(dict(state=state, city=place, latitude=latitude, longitude=longitude)) 61 | return coordinates, locations 62 | 63 | def query_coordinates(self, coordinates): 64 | """Find closest match to this list of coordinates 65 | """ 66 | try: 67 | distances, indices = self.tree.query(coordinates, k=1) #, distance_upper_bound=0.1 68 | except ValueError as e: 69 | logger.error('Unable to parse coordinates: %s', coordinates) 70 | raise e 71 | else: 72 | results = [] 73 | for distance, index in zip(distances,
indices): 74 | if not isinf(distance): 75 | result = self.locations[index] 76 | result['distance'] = distance 77 | 78 | results.append(result) 79 | 80 | return results 81 | 82 | def distance(self, coordinate_1, coordinate_2): 83 | 84 | R = 6373.0 85 | 86 | lat1, lon1 = coordinate_1 87 | lat2, lon2 = coordinate_2 88 | 89 | lat1 = radians(float(lat1)) 90 | lon1 = radians(float(lon1)) 91 | lat2 = radians(float(lat2)) 92 | lon2 = radians(float(lon2)) 93 | 94 | dlon = lon2 - lon1 95 | dlat = lat2 - lat1 96 | a = (sin(dlat/2))**2 + cos(lat1) * cos(lat2) * (sin(dlon/2))**2 97 | c = 2 * atan2(sqrt(a), sqrt(1-a)) 98 | distance = R * c 99 | 100 | return distance * 0.621371 101 | 102 | def get_by_coordinate(self, coordinate): 103 | """Search for closest known location to this coordinate 104 | """ 105 | tug = TweetUSCityGeocoder() 106 | results = tug.query_coordinates([coordinate]) 107 | return results[0] if results else None 108 | 109 | def search_by_coordinates(self, coordinates): 110 | """Search for closest known locations to these coordinates 111 | """ 112 | tug = TweetUSCityGeocoder() 113 | return tug.query_coordinates(coordinates) 114 | 115 | def get_city_state(self, address): 116 | 117 | address = address.strip() 118 | 119 | city_state = None 120 | 121 | if address not in self.geomap: 122 | 123 | p = re.findall(r'.*?([-+]?\d*\.\d+),([-+]?\d*\.\d+)', address) 124 | 125 | if (len(p) > 0): 126 | coordinate = p.pop() 127 | nearest = self.get_by_coordinate(coordinate) 128 | 129 | if nearest: 130 | c2 = nearest['latitude'], nearest['longitude'] 131 | d = self.distance(coordinate, c2) 132 | if (d < 20): # less than 20 miles 133 | 134 | city_state = { 135 | 'city': nearest['city'], 136 | 'state': nearest['state'] 137 | } 138 | self.geomap[address] = city_state 139 | 140 | return city_state 141 | else: 142 | 143 | address_ = address.replace(', ', ',') 144 | address_ = re.sub(self.keep_alpha_p, '', address_) 145 | address_ = address_.lower() 146 | 147 | for cs in self.us_cities:
148 | for name in cs['names']: 149 | m = re.findall(r'(?:^|\s)(%s)(?=\s|$)'%(name), address_) 150 | if (m): 151 | city_state = cs 152 | self.geomap[address] = city_state 153 | return city_state 154 | 155 | else: 156 | city_state = self.geomap[address] 157 | 158 | return city_state 159 | 160 | 161 | def rel_path(filename): 162 | """Return the path of this filename relative to the current script 163 | """ 164 | return os.path.join(os.getcwd(), os.path.dirname(__file__), filename) 165 | 166 | # def distance(coordinate_1, coordinate_2): 167 | # tug = TweetUSGeocoder() 168 | # return tug.distance(coordinate_1, coordinate_2) 169 | 170 | if __name__=="__main__": 171 | 172 | logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') 173 | tug = TweetUSCityGeocoder() 174 | 175 | logger.info(tug.get_city_state('xxx: (39.76838,-86.15804)')) 176 | #logger.info(tug.get_state('Little Rock, AR')) 177 | 178 | # test some coordinate lookups 179 | #city1 = (-37.81, 144.96) 180 | # city1 = (34.7240049,-92.3379275) 181 | # city2 = (35.7240049,-92.3379275) 182 | # logger.info(tug.distance(city1, city2)) 183 | # city1 = (54.143,-165.7854) 184 | # #city2 = (31.76, 35.21) 185 | # nearest = tug.get_by_coordinate(city1) 186 | # logger.info(nearest) 187 | # if (nearest): 188 | # nearest_city = nearest['latitude'], nearest['longitude'] 189 | 190 | # logger.info(tug.distance(city1, nearest_city)) 191 | # #print(search([city1, city2])) 192 | -------------------------------------------------------------------------------- /tweet_us_state_geocoder.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import logging 6 | 7 | logger = logging.getLogger('GeocodingTweets') 8 | 9 | import os, json, sys, re, csv, codecs 10 | from scipy.spatial import cKDTree as KDTree 11 | from math import sin, cos, sqrt, atan2, radians, isinf 12 | 13 | def singleton(cls): 14 | """Singleton pattern to avoid 
loading class multiple times 15 | """ 16 | instances = {} 17 | def getinstance(): 18 | if cls not in instances: 19 | instances[cls] = cls() 20 | return instances[cls] 21 | return getinstance 22 | 23 | 24 | @singleton 25 | class TweetUSStateGeocoder: 26 | 27 | def __init__(self, geocode_filename='us_geocode.csv', us_places_to_state_mapping_filename='us.states.json'): 28 | coordinates, self.locations = self.extract_coordinates_and_locations(rel_path(geocode_filename)) 29 | self.tree = KDTree(coordinates) 30 | 31 | self.us_places_to_state_map = self.load_us_places_to_state_mapping_file(rel_path(us_places_to_state_mapping_filename)) 32 | 33 | # keep only alpha, space, period and comma 34 | self.keep_alpha_p = re.compile(r'[^a-zA-Z\s\.,]') 35 | 36 | self.geomap = {} 37 | 38 | def load_us_places_to_state_mapping_file(self, local_filename): 39 | if os.path.exists(local_filename): 40 | with open(local_filename, 'r') as rf: 41 | return json.load(rf) 42 | else: 43 | logger.error("missing us_places_to_state_mapping file: [%s]"%(local_filename)) 44 | sys.exit(1) 45 | 46 | def extract_coordinates_and_locations(self, local_filename): 47 | """Extract geocode data from zip 48 | """ 49 | if os.path.exists(local_filename): 50 | # open compact CSV 51 | rows = csv.reader(codecs.getreader('utf-8')(open(local_filename, 'rb'))) 52 | else: 53 | logger.error("missing geocode file: [%s]"%(local_filename)) 54 | sys.exit(1) 55 | 56 | # load a list of known coordinates and corresponding locations 57 | coordinates, locations = [], [] 58 | for latitude, longitude, state, place in rows: 59 | coordinates.append((latitude, longitude)) 60 | locations.append(dict(state=state, city=place, latitude=latitude, longitude=longitude)) 61 | return coordinates, locations 62 | 63 | def query_coordinates(self, coordinates): 64 | """Find closest match to this list of coordinates 65 | """ 66 | try: 67 | distances, indices = self.tree.query(coordinates, k=1) #, distance_upper_bound=0.1 68 | except ValueError as e: 
69 | logger.error('Unable to parse coordinates: %s', coordinates) 70 | raise e 71 | else: 72 | results = [] 73 | for distance, index in zip(distances, indices): 74 | if not isinf(distance): 75 | result = dict(self.locations[index]) # copy so the cached location is not mutated 76 | result['distance'] = distance 77 | 78 | results.append(result) 79 | 80 | return results 81 | 82 | def distance(self, coordinate_1, coordinate_2): 83 | 84 | R = 6373.0 # approximate Earth radius in km 85 | 86 | lat1, lon1 = coordinate_1 87 | lat2, lon2 = coordinate_2 88 | 89 | lat1 = radians(float(lat1)) 90 | lon1 = radians(float(lon1)) 91 | lat2 = radians(float(lat2)) 92 | lon2 = radians(float(lon2)) 93 | 94 | dlon = lon2 - lon1 95 | dlat = lat2 - lat1 96 | a = (sin(dlat/2))**2 + cos(lat1) * cos(lat2) * (sin(dlon/2))**2 97 | c = 2 * atan2(sqrt(a), sqrt(1-a)) 98 | distance = R * c 99 | 100 | return distance * 0.621371 # convert km to miles 101 | 102 | def get_by_coordinate(self, coordinate): 103 | """Search for closest known location to this coordinate 104 | """ 105 | # @singleton makes self the shared instance; no need to re-instantiate 106 | results = self.query_coordinates([coordinate]) 107 | return results[0] if results else None 108 | 109 | def search_by_coordinates(self, coordinates): 110 | """Search for closest known locations to these coordinates 111 | """ 112 | return self.query_coordinates(coordinates) 113 | 114 | 115 | def get_state(self, address): 116 | 117 | address = address.strip() 118 | 119 | state = None 120 | 121 | if address not in self.geomap: 122 | 123 | p = re.findall(r'.*?([-+]?\d*\.\d+),([-+]?\d*\.\d+)', address) 124 | 125 | if p: 126 | coordinate = p.pop() 127 | nearest = self.get_by_coordinate(coordinate) 128 | 129 | if nearest: 130 | c2 = nearest['latitude'], nearest['longitude'] 131 | d = self.distance(coordinate, c2) 132 | if (d < 20): # within 20 miles 133 | state = nearest['state'] 134 | self.geomap[address] = state 135 | 136 | else: 137 | 138 | address_ = address.replace(', ', ',') 139 | address_ = re.sub(self.keep_alpha_p, '', address_) 140 | address_ =
address_.lower() 141 | 142 | for i in range(3): 143 | #state = us_places_to_state_map[address] if address in us_places_to_state_map else None 144 | if address_ in self.us_places_to_state_map['%s'%i]: 145 | state = self.us_places_to_state_map['%s'%i][address_] 146 | self.geomap[address] = state 147 | break 148 | # logger.info('[%s]->%s'%(address, state)) 149 | else: 150 | state = self.geomap[address] 151 | 152 | return state 153 | 154 | 155 | def rel_path(filename): 156 | """Return the path of this filename relative to this script's directory 157 | """ 158 | return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename) 159 | 160 | # def distance(coordinate_1, coordinate_2): 161 | # tug = TweetUSGeocoder() 162 | # return tug.distance(coordinate_1, coordinate_2) 163 | 164 | if __name__ == "__main__": 165 | 166 | logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') 167 | tug = TweetUSStateGeocoder() 168 | 169 | logger.info(tug.get_state('xxx: (-37.81, 144.96)')) 170 | logger.info(tug.get_state('Little Rock, AR')) 171 | 172 | # test some coordinate lookups 173 | #city1 = (-37.81, 144.96) 174 | # city1 = (34.7240049,-92.3379275) 175 | # city2 = (35.7240049,-92.3379275) 176 | # logger.info(tug.distance(city1, city2)) 177 | # city1 = (54.143,-165.7854) 178 | # #city2 = (31.76, 35.21) 179 | # nearest = tug.get_by_coordinate(city1) 180 | # logger.info(nearest) 181 | # if (nearest): 182 | # nearest_city = nearest['latitude'], nearest['longitude'] 183 | 184 | # logger.info(tug.distance(city1, nearest_city)) 185 | # #print(search([city1, city2])) 186 | -------------------------------------------------------------------------------- /us.cities.json: -------------------------------------------------------------------------------- 1 | [{ 2 | "names": ["los angeles"], 3 | "state": "ca", 4 | "city": "los angeles" 5 | }, { 6 | "names": ["chicago"], 7 | "state": "il", 8 | "city": "chicago" 9 | }, { 10 | "names": ["houston"], 11 | "state": "tx", 12 | "city":
"houston" 13 | }, { 14 | "names": ["philadelphia"], 15 | "state": "pa", 16 | "city": "philadelphia" 17 | }, { 18 | "names": ["phoenix"], 19 | "state": "az", 20 | "city": "phoenix" 21 | }, { 22 | "names": ["san diego"], 23 | "state": "ca", 24 | "city": "san diego" 25 | }, { 26 | "names": ["dallas"], 27 | "state": "tx", 28 | "city": "dallas" 29 | }, { 30 | "names": ["san jose"], 31 | "state": "ca", 32 | "city": "san jose" 33 | }, { 34 | "names": ["indianapolis"], 35 | "state": "in", 36 | "city": "indianapolis" 37 | }, { 38 | "names": ["jacksonville"], 39 | "state": "fl", 40 | "city": "jacksonville" 41 | }, { 42 | "names": ["san francisco"], 43 | "state": "ca", 44 | "city": "san francisco" 45 | }, { 46 | "names": ["austin"], 47 | "state": "tx", 48 | "city": "austin" 49 | }, { 50 | "names": ["columbus"], 51 | "state": "oh", 52 | "city": "columbus" 53 | }, { 54 | "names": ["fort worth"], 55 | "state": "tx", 56 | "city": "fort worth" 57 | }, { 58 | "names": ["charlotte"], 59 | "state": "nc", 60 | "city": "charlotte" 61 | }, { 62 | "names": ["el paso"], 63 | "state": "tx", 64 | "city": "el paso" 65 | }, { 66 | "names": ["memphis"], 67 | "state": "tn", 68 | "city": "memphis" 69 | }, { 70 | "names": ["baltimore"], 71 | "state": "md", 72 | "city": "baltimore" 73 | }, { 74 | "names": ["boston"], 75 | "state": "ma", 76 | "city": "boston" 77 | }, { 78 | "names": ["seattle"], 79 | "state": "wa", 80 | "city": "seattle" 81 | }, { 82 | "names": ["washington, d.c."], 83 | "state": "dc", 84 | "city": "washington, d.c." 
85 | }, { 86 | "names": ["denver"], 87 | "state": "co", 88 | "city": "denver" 89 | }, { 90 | "names": ["milwaukee"], 91 | "state": "wi", 92 | "city": "milwaukee" 93 | }, { 94 | "names": ["portland"], 95 | "state": "or", 96 | "city": "portland" 97 | }, { 98 | "names": ["las vegas"], 99 | "state": "nv", 100 | "city": "las vegas" 101 | }, { 102 | "names": ["oklahoma city"], 103 | "state": "ok", 104 | "city": "oklahoma city" 105 | }, { 106 | "names": ["albuquerque"], 107 | "state": "nm", 108 | "city": "albuquerque" 109 | }, { 110 | "names": ["nashville"], 111 | "state": "tn", 112 | "city": "nashville" 113 | }, { 114 | "names": ["tucson"], 115 | "state": "az", 116 | "city": "tucson" 117 | }, { 118 | "names": ["fresno"], 119 | "state": "ca", 120 | "city": "fresno" 121 | }, { 122 | "names": ["sacramento"], 123 | "state": "ca", 124 | "city": "sacramento" 125 | }, { 126 | "names": ["long beach"], 127 | "state": "ca", 128 | "city": "long beach" 129 | }, { 130 | "names": ["kansas city"], 131 | "state": "ks", 132 | "city": "kansas city" 133 | }, { 134 | "names": ["mesa"], 135 | "state": "az", 136 | "city": "mesa" 137 | }, { 138 | "names": ["atlanta"], 139 | "state": "ga", 140 | "city": "atlanta" 141 | }, { 142 | "names": ["colorado springs"], 143 | "state": "co", 144 | "city": "colorado springs" 145 | }, { 146 | "names": ["raleigh"], 147 | "state": "nc", 148 | "city": "raleigh" 149 | }, { 150 | "names": ["miami"], 151 | "state": "fl", 152 | "city": "miami" 153 | }, { 154 | "names": ["tulsa"], 155 | "state": "ok", 156 | "city": "tulsa" 157 | }, { 158 | "names": ["oakland"], 159 | "state": "ca", 160 | "city": "oakland" 161 | }, { 162 | "names": ["wichita"], 163 | "state": "ks", 164 | "city": "wichita" 165 | }, { 166 | "names": ["honolulu"], 167 | "state": "hi", 168 | "city": "honolulu" 169 | }, { 170 | "names": ["arlington"], 171 | "state": "tx", 172 | "city": "arlington" 173 | }, { 174 | "names": ["bakersfield"], 175 | "state": "ca", 176 | "city": "bakersfield" 177 | }, { 178 | 
"names": ["new orleans"], 179 | "state": "la", 180 | "city": "new orleans" 181 | }, { 182 | "names": ["anaheim"], 183 | "state": "ca", 184 | "city": "anaheim" 185 | }, { 186 | "names": ["tampa"], 187 | "state": "fl", 188 | "city": "tampa" 189 | }, { 190 | "names": ["aurora"], 191 | "state": "co", 192 | "city": "aurora" 193 | }, { 194 | "names": ["santa ana"], 195 | "state": "ca", 196 | "city": "santa ana" 197 | }, { 198 | "names": ["st. louis"], 199 | "state": "mo", 200 | "city": "st. louis" 201 | }, { 202 | "names": ["pittsburgh"], 203 | "state": "pa", 204 | "city": "pittsburgh" 205 | }, { 206 | "names": ["corpus christi"], 207 | "state": "tx", 208 | "city": "corpus christi" 209 | }, { 210 | "names": ["riverside"], 211 | "state": "ca", 212 | "city": "riverside" 213 | }, { 214 | "names": ["cincinnati"], 215 | "state": "oh", 216 | "city": "cincinnati" 217 | }, { 218 | "names": ["lexington-fayette"], 219 | "state": "ky", 220 | "city": "lexington-fayette" 221 | }, { 222 | "names": ["anchorage"], 223 | "state": "ak", 224 | "city": "anchorage" 225 | }, { 226 | "names": ["stockton"], 227 | "state": "ca", 228 | "city": "stockton" 229 | }, { 230 | "names": ["ironville"], 231 | "state": "ky", 232 | "city": "ironville" 233 | }, { 234 | "names": ["meads"], 235 | "state": "ky", 236 | "city": "meads" 237 | }, { 238 | "names": ["greensboro"], 239 | "state": "nc", 240 | "city": "greensboro" 241 | }, { 242 | "names": ["henderson"], 243 | "state": "nv", 244 | "city": "henderson" 245 | }, { 246 | "names": ["fort wayne"], 247 | "state": "in", 248 | "city": "fort wayne" 249 | }, { 250 | "names": ["saint petersburg"], 251 | "state": "fl", 252 | "city": "saint petersburg" 253 | }, { 254 | "names": ["chula vista"], 255 | "state": "ca", 256 | "city": "chula vista" 257 | }, { 258 | "names": ["louisville"], 259 | "state": "ky", 260 | "city": "louisville" 261 | }, { 262 | "names": ["orlando"], 263 | "state": "fl", 264 | "city": "orlando" 265 | }, { 266 | "names": ["chandler"], 267 | "state": 
"az", 268 | "city": "chandler" 269 | }, { 270 | "names": ["madison"], 271 | "state": "wi", 272 | "city": "madison" 273 | }, { 274 | "names": ["winston-salem"], 275 | "state": "nc", 276 | "city": "winston-salem" 277 | }, { 278 | "names": ["lubbock"], 279 | "state": "tx", 280 | "city": "lubbock" 281 | }, { 282 | "names": ["baton rouge"], 283 | "state": "la", 284 | "city": "baton rouge" 285 | }, { 286 | "names": ["durham"], 287 | "state": "nc", 288 | "city": "durham" 289 | }, { 290 | "names": ["garland"], 291 | "state": "tx", 292 | "city": "garland" 293 | }, { 294 | "names": ["lexington"], 295 | "state": "ky", 296 | "city": "lexington" 297 | }, { 298 | "names": ["reno"], 299 | "state": "nv", 300 | "city": "reno" 301 | }, { 302 | "names": ["hialeah"], 303 | "state": "fl", 304 | "city": "hialeah" 305 | }, { 306 | "names": ["paradise"], 307 | "state": "nv", 308 | "city": "paradise" 309 | }, { 310 | "names": ["scottsdale"], 311 | "state": "az", 312 | "city": "scottsdale" 313 | }, { 314 | "names": ["fremont"], 315 | "state": "ca", 316 | "city": "fremont" 317 | }, { 318 | "names": ["birmingham"], 319 | "state": "al", 320 | "city": "birmingham" 321 | }, { 322 | "names": ["san bernardino"], 323 | "state": "ca", 324 | "city": "san bernardino" 325 | }, { 326 | "names": ["spokane"], 327 | "state": "wa", 328 | "city": "spokane" 329 | }, { 330 | "names": ["gilbert"], 331 | "state": "az", 332 | "city": "gilbert" 333 | }, { 334 | "names": ["montgomery"], 335 | "state": "al", 336 | "city": "montgomery" 337 | }, { 338 | "names": ["des moines"], 339 | "state": "ia", 340 | "city": "des moines" 341 | }, { 342 | "names": ["modesto"], 343 | "state": "ca", 344 | "city": "modesto" 345 | }, { 346 | "names": ["fayetteville"], 347 | "state": "nc", 348 | "city": "fayetteville" 349 | }, { 350 | "names": ["shreveport"], 351 | "state": "la", 352 | "city": "shreveport" 353 | }, { 354 | "names": ["tacoma"], 355 | "state": "wa", 356 | "city": "tacoma" 357 | }, { 358 | "names": ["oxnard"], 359 | 
"state": "ca", 360 | "city": "oxnard" 361 | }, { 362 | "names": ["fontana"], 363 | "state": "ca", 364 | "city": "fontana" 365 | }, { 366 | "names": ["mobile"], 367 | "state": "al", 368 | "city": "mobile" 369 | }, { 370 | "names": ["little rock"], 371 | "state": "ar", 372 | "city": "little rock" 373 | }, { 374 | "names": ["moreno valley"], 375 | "state": "ca", 376 | "city": "moreno valley" 377 | }, { 378 | "names": ["glendale"], 379 | "state": "ca", 380 | "city": "glendale" 381 | }, { 382 | "names": ["amarillo"], 383 | "state": "tx", 384 | "city": "amarillo" 385 | }, { 386 | "names": ["huntington beach"], 387 | "state": "ca", 388 | "city": "huntington beach" 389 | }, { 390 | "names": ["sunrise manor"], 391 | "state": "nv", 392 | "city": "sunrise manor" 393 | }, { 394 | "names": ["oxnard shores"], 395 | "state": "ca", 396 | "city": "oxnard shores" 397 | }, { 398 | "names": ["salt lake city"], 399 | "state": "ut", 400 | "city": "salt lake city" 401 | }, { 402 | "names": ["tallahassee"], 403 | "state": "fl", 404 | "city": "tallahassee" 405 | }, { 406 | "names": ["huntsville"], 407 | "state": "al", 408 | "city": "huntsville" 409 | }, { 410 | "names": ["knoxville"], 411 | "state": "tn", 412 | "city": "knoxville" 413 | }, { 414 | "names": ["spring valley"], 415 | "state": "nv", 416 | "city": "spring valley" 417 | }, { 418 | "names": ["providence"], 419 | "state": "ri", 420 | "city": "providence" 421 | }, { 422 | "names": ["santa clarita"], 423 | "state": "ca", 424 | "city": "santa clarita" 425 | }, { 426 | "names": ["grand prairie"], 427 | "state": "tx", 428 | "city": "grand prairie" 429 | }, { 430 | "names": ["brownsville"], 431 | "state": "tx", 432 | "city": "brownsville" 433 | }, { 434 | "names": ["jackson"], 435 | "state": "ms", 436 | "city": "jackson" 437 | }, { 438 | "names": ["overland park"], 439 | "state": "ks", 440 | "city": "overland park" 441 | }, { 442 | "names": ["garden grove"], 443 | "state": "ca", 444 | "city": "garden grove" 445 | }, { 446 | "names": 
["santa rosa"], 447 | "state": "ca", 448 | "city": "santa rosa" 449 | }, { 450 | "names": ["chattanooga"], 451 | "state": "tn", 452 | "city": "chattanooga" 453 | }, { 454 | "names": ["hollywood"], 455 | "state": "ca", 456 | "city": "hollywood" 457 | }, { 458 | "names": ["oceanside"], 459 | "state": "ca", 460 | "city": "oceanside" 461 | }, { 462 | "names": ["fort lauderdale"], 463 | "state": "fl", 464 | "city": "fort lauderdale" 465 | }, { 466 | "names": ["rancho cucamonga"], 467 | "state": "ca", 468 | "city": "rancho cucamonga" 469 | }, { 470 | "names": ["port saint lucie"], 471 | "state": "fl", 472 | "city": "port saint lucie" 473 | }, { 474 | "names": ["ontario"], 475 | "state": "ca", 476 | "city": "ontario" 477 | }, { 478 | "names": ["vancouver"], 479 | "state": "wa", 480 | "city": "vancouver" 481 | }, { 482 | "names": ["tempe"], 483 | "state": "az", 484 | "city": "tempe" 485 | }, { 486 | "names": ["springfield"], 487 | "state": "mo", 488 | "city": "springfield" 489 | }, { 490 | "names": ["tempe junction"], 491 | "state": "az", 492 | "city": "tempe junction" 493 | }, { 494 | "names": ["lancaster"], 495 | "state": "ca", 496 | "city": "lancaster" 497 | }, { 498 | "names": ["eugene"], 499 | "state": "or", 500 | "city": "eugene" 501 | }, { 502 | "names": ["pembroke pines"], 503 | "state": "fl", 504 | "city": "pembroke pines" 505 | }, { 506 | "names": ["salem"], 507 | "state": "or", 508 | "city": "salem" 509 | }, { 510 | "names": ["cape coral"], 511 | "state": "fl", 512 | "city": "cape coral" 513 | }, { 514 | "names": ["sioux falls"], 515 | "state": "sd", 516 | "city": "sioux falls" 517 | }, { 518 | "names": ["elk grove"], 519 | "state": "ca", 520 | "city": "elk grove" 521 | }, { 522 | "names": ["rockford"], 523 | "state": "il", 524 | "city": "rockford" 525 | }, { 526 | "names": ["palmdale"], 527 | "state": "ca", 528 | "city": "palmdale" 529 | }, { 530 | "names": ["corona"], 531 | "state": "ca", 532 | "city": "corona" 533 | }, { 534 | "names": ["salinas"], 535 | 
"state": "ca", 536 | "city": "salinas" 537 | }, { 538 | "names": ["pomona"], 539 | "state": "ca", 540 | "city": "pomona" 541 | }, { 542 | "names": ["joliet"], 543 | "state": "il", 544 | "city": "joliet" 545 | }, { 546 | "names": ["boise"], 547 | "state": "id", 548 | "city": "boise" 549 | }, { 550 | "names": ["torrance"], 551 | "state": "ca", 552 | "city": "torrance" 553 | }, { 554 | "names": ["bridgeport"], 555 | "state": "ct", 556 | "city": "bridgeport" 557 | }, { 558 | "names": ["hayward"], 559 | "state": "ca", 560 | "city": "hayward" 561 | }, { 562 | "names": ["fort collins"], 563 | "state": "co", 564 | "city": "fort collins" 565 | }, { 566 | "names": ["escondido"], 567 | "state": "ca", 568 | "city": "escondido" 569 | }, { 570 | "names": ["lakewood"], 571 | "state": "co", 572 | "city": "lakewood" 573 | }, { 574 | "names": ["metairie terrace"], 575 | "state": "la", 576 | "city": "metairie terrace" 577 | }, { 578 | "names": ["naperville"], 579 | "state": "il", 580 | "city": "naperville" 581 | }, { 582 | "names": ["dayton"], 583 | "state": "oh", 584 | "city": "dayton" 585 | }, { 586 | "names": ["sunnyvale"], 587 | "state": "ca", 588 | "city": "sunnyvale" 589 | }, { 590 | "names": ["metairie"], 591 | "state": "la", 592 | "city": "metairie" 593 | }, { 594 | "names": ["pasadena"], 595 | "state": "ca", 596 | "city": "pasadena" 597 | }, { 598 | "names": ["orange"], 599 | "state": "ca", 600 | "city": "orange" 601 | }, { 602 | "names": ["savannah"], 603 | "state": "ga", 604 | "city": "savannah" 605 | }, { 606 | "names": ["cary"], 607 | "state": "nc", 608 | "city": "cary" 609 | }, { 610 | "names": ["fullerton"], 611 | "state": "ca", 612 | "city": "fullerton" 613 | }, { 614 | "names": ["clarksville"], 615 | "state": "tn", 616 | "city": "clarksville" 617 | }, { 618 | "names": ["west valley city"], 619 | "state": "ut", 620 | "city": "west valley city" 621 | }, { 622 | "names": ["topeka"], 623 | "state": "ks", 624 | "city": "topeka" 625 | }, { 626 | "names": ["thousand oaks"], 
627 | "state": "ca", 628 | "city": "thousand oaks" 629 | }, { 630 | "names": ["cedar rapids"], 631 | "state": "ia", 632 | "city": "cedar rapids" 633 | }, { 634 | "names": ["olathe"], 635 | "state": "ks", 636 | "city": "olathe" 637 | }, { 638 | "names": ["gainesville"], 639 | "state": "fl", 640 | "city": "gainesville" 641 | }, { 642 | "names": ["simi valley"], 643 | "state": "ca", 644 | "city": "simi valley" 645 | }, { 646 | "names": ["bellevue"], 647 | "state": "wa", 648 | "city": "bellevue" 649 | }, { 650 | "names": ["concord"], 651 | "state": "ca", 652 | "city": "concord" 653 | }, { 654 | "names": ["miramar"], 655 | "state": "fl", 656 | "city": "miramar" 657 | }, { 658 | "names": ["coral springs"], 659 | "state": "fl", 660 | "city": "coral springs" 661 | }, { 662 | "names": ["lafayette"], 663 | "state": "la", 664 | "city": "lafayette" 665 | }, { 666 | "names": ["charleston"], 667 | "state": "sc", 668 | "city": "charleston" 669 | }, { 670 | "names": ["carrollton"], 671 | "state": "tx", 672 | "city": "carrollton" 673 | }, { 674 | "names": ["roseville"], 675 | "state": "ca", 676 | "city": "roseville" 677 | }, { 678 | "names": ["thornton"], 679 | "state": "co", 680 | "city": "thornton" 681 | }, { 682 | "names": ["beaumont"], 683 | "state": "tx", 684 | "city": "beaumont" 685 | }, { 686 | "names": ["surprise"], 687 | "state": "az", 688 | "city": "surprise" 689 | }, { 690 | "names": ["evansville"], 691 | "state": "in", 692 | "city": "evansville" 693 | }, { 694 | "names": ["abilene"], 695 | "state": "tx", 696 | "city": "abilene" 697 | }, { 698 | "names": ["frisco"], 699 | "state": "tx", 700 | "city": "frisco" 701 | }, { 702 | "names": ["independence"], 703 | "state": "mo", 704 | "city": "independence" 705 | }, { 706 | "names": ["athens"], 707 | "state": "ga", 708 | "city": "athens" 709 | }, { 710 | "names": ["santa clara"], 711 | "state": "ca", 712 | "city": "santa clara" 713 | }, { 714 | "names": ["peoria"], 715 | "state": "il", 716 | "city": "peoria" 717 | }, { 718 | 
"names": ["el monte"], 719 | "state": "ca", 720 | "city": "el monte" 721 | }, { 722 | "names": ["denton"], 723 | "state": "tx", 724 | "city": "denton" 725 | }, { 726 | "names": ["berkeley"], 727 | "state": "ca", 728 | "city": "berkeley" 729 | }, { 730 | "names": ["provo"], 731 | "state": "ut", 732 | "city": "provo" 733 | }, { 734 | "names": ["downey"], 735 | "state": "ca", 736 | "city": "downey" 737 | }, { 738 | "names": ["midland"], 739 | "state": "tx", 740 | "city": "midland" 741 | }, { 742 | "names": ["norman"], 743 | "state": "ok", 744 | "city": "norman" 745 | }, { 746 | "names": ["irving"], 747 | "state": "ct", 748 | "city": "irving" 749 | }, { 750 | "names": ["costa mesa"], 751 | "state": "ca", 752 | "city": "costa mesa" 753 | }, { 754 | "names": ["inglewood"], 755 | "state": "ca", 756 | "city": "inglewood" 757 | }, { 758 | "names": ["murfreesboro"], 759 | "state": "tn", 760 | "city": "murfreesboro" 761 | }, { 762 | "names": ["enterprise"], 763 | "state": "nv", 764 | "city": "enterprise" 765 | }, { 766 | "names": ["elgin"], 767 | "state": "il", 768 | "city": "elgin" 769 | }, { 770 | "names": ["clearwater"], 771 | "state": "fl", 772 | "city": "clearwater" 773 | }, { 774 | "names": ["miami gardens"], 775 | "state": "fl", 776 | "city": "miami gardens" 777 | }, { 778 | "names": ["pueblo"], 779 | "state": "co", 780 | "city": "pueblo" 781 | }, { 782 | "names": ["lowell"], 783 | "state": "ma", 784 | "city": "lowell" 785 | }, { 786 | "names": ["wilmington"], 787 | "state": "nc", 788 | "city": "wilmington" 789 | }, { 790 | "names": ["arvada"], 791 | "state": "co", 792 | "city": "arvada" 793 | }, { 794 | "names": ["westminster"], 795 | "state": "co", 796 | "city": "westminster" 797 | }, { 798 | "names": ["west covina"], 799 | "state": "ca", 800 | "city": "west covina" 801 | }, { 802 | "names": ["gresham"], 803 | "state": "or", 804 | "city": "gresham" 805 | }, { 806 | "names": ["norwalk"], 807 | "state": "ca", 808 | "city": "norwalk" 809 | }, { 810 | "names": 
["carlsbad"], 811 | "state": "ca", 812 | "city": "carlsbad" 813 | }, { 814 | "names": ["fairfield"], 815 | "state": "ca", 816 | "city": "fairfield" 817 | }, { 818 | "names": ["cambridge"], 819 | "state": "ma", 820 | "city": "cambridge" 821 | }, { 822 | "names": ["universal city"], 823 | "state": "ca", 824 | "city": "universal city" 825 | }, { 826 | "names": ["high point"], 827 | "state": "nc", 828 | "city": "high point" 829 | }, { 830 | "names": ["billings"], 831 | "state": "mt", 832 | "city": "billings" 833 | }, { 834 | "names": ["green bay"], 835 | "state": "wi", 836 | "city": "green bay" 837 | }, { 838 | "names": ["west jordan"], 839 | "state": "ut", 840 | "city": "west jordan" 841 | }, { 842 | "names": ["richmond"], 843 | "state": "ca", 844 | "city": "richmond" 845 | }, { 846 | "names": ["brandon"], 847 | "state": "fl", 848 | "city": "brandon" 849 | }, { 850 | "names": ["murrieta"], 851 | "state": "ca", 852 | "city": "murrieta" 853 | }, { 854 | "names": ["burbank"], 855 | "state": "ca", 856 | "city": "burbank" 857 | }, { 858 | "names": ["palm bay"], 859 | "state": "fl", 860 | "city": "palm bay" 861 | }, { 862 | "names": ["everett"], 863 | "state": "wa", 864 | "city": "everett" 865 | }, { 866 | "names": ["antioch"], 867 | "state": "ca", 868 | "city": "antioch" 869 | }, { 870 | "names": ["south bend"], 871 | "state": "in", 872 | "city": "south bend" 873 | }, { 874 | "names": ["daly city"], 875 | "state": "ca", 876 | "city": "daly city" 877 | }, { 878 | "names": ["centennial"], 879 | "state": "co", 880 | "city": "centennial" 881 | }, { 882 | "names": ["temecula"], 883 | "state": "ca", 884 | "city": "temecula" 885 | }] 886 | -------------------------------------------------------------------------------- /us.states.json.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bianjiang/twitter-user-geocoder/5c080e5e9f54303538c14469fdb7ce82760f89b5/us.states.json.gz 
-------------------------------------------------------------------------------- /us_cities_geocode.csv: -------------------------------------------------------------------------------- 1 | 34.05223,-118.24368,ca,los angeles 2 | 41.85003,-87.65005,il,chicago 3 | 29.76328,-95.36327,tx,houston 4 | 39.95233,-75.16379,pa,philadelphia 5 | 33.44838,-112.07404,az,phoenix 6 | 32.71533,-117.15726,ca,san diego 7 | 32.78306,-96.80667,tx,dallas 8 | 37.33939,-121.89496,ca,san jose 9 | 39.76838,-86.15804,in,indianapolis 10 | 30.33218,-81.65565,fl,jacksonville 11 | 37.77493,-122.41942,ca,san francisco 12 | 30.26715,-97.74306,tx,austin 13 | 39.96118,-82.99879,oh,columbus 14 | 32.72541,-97.32085,tx,fort worth 15 | 35.22709,-80.84313,nc,charlotte 16 | 31.75872,-106.48693,tx,el paso 17 | 35.14953,-90.04898,tn,memphis 18 | 39.29038,-76.61219,md,baltimore 19 | 42.35843,-71.05977,ma,boston 20 | 47.60621,-122.33207,wa,seattle 21 | 38.89511,-77.03637,dc,"washington, d.c." 22 | 39.73915,-104.9847,co,denver 23 | 43.0389,-87.90647,wi,milwaukee 24 | 45.52345,-122.67621,or,portland 25 | 36.17497,-115.13722,nv,las vegas 26 | 35.46756,-97.51643,ok,oklahoma city 27 | 35.08449,-106.65114,nm,albuquerque 28 | 36.16589,-86.78444,tn,nashville 29 | 32.22174,-110.92648,az,tucson 30 | 36.74773,-119.77237,ca,fresno 31 | 38.58157,-121.4944,ca,sacramento 32 | 33.76696,-118.18923,ca,long beach 33 | 33.42227,-111.82264,az,mesa 34 | 33.749,-84.38798,ga,atlanta 35 | 38.83388,-104.82136,co,colorado springs 36 | 35.7721,-78.63861,nc,raleigh 37 | 25.77427,-80.19366,fl,miami 38 | 36.15398,-95.99277,ok,tulsa 39 | 37.80437,-122.2708,ca,oakland 40 | 37.69224,-97.33754,ks,wichita 41 | 21.30694,-157.85833,hi,honolulu 42 | 32.73569,-97.10807,tx,arlington 43 | 35.37329,-119.01871,ca,bakersfield 44 | 29.95465,-90.07507,la,new orleans 45 | 33.83529,-117.9145,ca,anaheim 46 | 27.94752,-82.45843,fl,tampa 47 | 39.72943,-104.83192,co,aurora 48 | 33.74557,-117.86783,ca,santa ana 49 | 38.62727,-90.19789,mo,st. 
louis 50 | 40.44062,-79.99589,pa,pittsburgh 51 | 27.80058,-97.39638,tx,corpus christi 52 | 33.95335,-117.39616,ca,riverside 53 | 39.162,-84.45689,oh,cincinnati 54 | 38.0498,-84.45855,ky,lexington-fayette 55 | 61.21806,-149.90028,ak,anchorage 56 | 37.9577,-121.29078,ca,stockton 57 | 38.45647,-82.69238,ky,ironville 58 | 38.41258,-82.70905,ky,meads 59 | 36.07264,-79.79198,nc,greensboro 60 | 36.0397,-114.98194,nv,henderson 61 | 41.1306,-85.12886,in,fort wayne 62 | 27.77086,-82.67927,fl,saint petersburg 63 | 32.64005,-117.0842,ca,chula vista 64 | 38.25424,-85.75941,ky,louisville 65 | 28.53834,-81.37924,fl,orlando 66 | 33.30616,-111.84125,az,chandler 67 | 43.07305,-89.40123,wi,madison 68 | 36.09986,-80.24422,nc,winston-salem 69 | 33.57786,-101.85517,tx,lubbock 70 | 30.45075,-91.15455,la,baton rouge 71 | 35.99403,-78.89862,nc,durham 72 | 32.91262,-96.63888,tx,garland 73 | 37.98869,-84.47772,ky,lexington 74 | 39.52963,-119.8138,nv,reno 75 | 25.8576,-80.27811,fl,hialeah 76 | 36.09719,-115.14666,nv,paradise 77 | 33.50921,-111.89903,az,scottsdale 78 | 37.54827,-121.98857,ca,fremont 79 | 33.52066,-86.80249,al,birmingham 80 | 34.10834,-117.28977,ca,san bernardino 81 | 47.65966,-117.42908,wa,spokane 82 | 33.35283,-111.78903,az,gilbert 83 | 32.36681,-86.29997,al,montgomery 84 | 41.60054,-93.60911,ia,des moines 85 | 37.6391,-120.99688,ca,modesto 86 | 35.05266,-78.87836,nc,fayetteville 87 | 32.52515,-93.75018,la,shreveport 88 | 47.25288,-122.44429,wa,tacoma 89 | 34.1975,-119.17705,ca,oxnard 90 | 34.09223,-117.43505,ca,fontana 91 | 30.69436,-88.04305,al,mobile 92 | 34.74648,-92.28959,ar,little rock 93 | 33.93752,-117.23059,ca,moreno valley 94 | 34.14251,-118.25508,ca,glendale 95 | 35.222,-101.8313,tx,amarillo 96 | 33.6603,-117.99923,ca,huntington beach 97 | 36.21108,-115.07306,nv,sunrise manor 98 | 34.19084,-119.2415,ca,oxnard shores 99 | 40.76078,-111.89105,ut,salt lake city 100 | 30.43826,-84.28073,fl,tallahassee 101 | 34.73037,-86.5861,al,huntsville 102 | 
35.96064,-83.92074,tn,knoxville 103 | 36.10803,-115.245,nv,spring valley 104 | 41.82399,-71.41283,ri,providence 105 | 34.39166,-118.54259,ca,santa clarita 106 | 32.74596,-96.99778,tx,grand prairie 107 | 25.90175,-97.49748,tx,brownsville 108 | 32.29876,-90.18481,ms,jackson 109 | 38.98223,-94.67079,ks,overland park 110 | 33.77391,-117.94145,ca,garden grove 111 | 38.44047,-122.71443,ca,santa rosa 112 | 35.04563,-85.30968,tn,chattanooga 113 | 34.09834,-118.32674,ca,hollywood 114 | 33.19587,-117.37948,ca,oceanside 115 | 26.12231,-80.14338,fl,fort lauderdale 116 | 34.1064,-117.59311,ca,rancho cucamonga 117 | 27.29393,-80.35033,fl,port saint lucie 118 | 34.06334,-117.65089,ca,ontario 119 | 45.63873,-122.66149,wa,vancouver 120 | 33.41477,-111.90931,az,tempe 121 | 37.21533,-93.29824,mo,springfield 122 | 33.41421,-111.94348,az,tempe junction 123 | 34.69804,-118.13674,ca,lancaster 124 | 44.05207,-123.08675,or,eugene 125 | 26.00315,-80.22394,fl,pembroke pines 126 | 44.9429,-123.0351,or,salem 127 | 26.56285,-81.94953,fl,cape coral 128 | 43.54997,-96.70033,sd,sioux falls 129 | 38.4088,-121.37162,ca,elk grove 130 | 42.27113,-89.094,il,rockford 131 | 34.57943,-118.11646,ca,palmdale 132 | 33.87529,-117.56644,ca,corona 133 | 36.67774,-121.6555,ca,salinas 134 | 34.05529,-117.75228,ca,pomona 135 | 41.52519,-88.0834,il,joliet 136 | 43.6135,-116.20345,id,boise 137 | 39.11417,-94.62746,ks,kansas city 138 | 33.83585,-118.34063,ca,torrance 139 | 41.16704,-73.20483,ct,bridgeport 140 | 37.66882,-122.0808,ca,hayward 141 | 40.58526,-105.08442,co,fort collins 142 | 33.11921,-117.08642,ca,escondido 143 | 39.70471,-105.08137,co,lakewood 144 | 29.97854,-90.16396,la,metairie terrace 145 | 41.78586,-88.14729,il,naperville 146 | 39.75895,-84.19161,oh,dayton 147 | 37.36883,-122.03635,ca,sunnyvale 148 | 29.98409,-90.15285,la,metairie 149 | 34.14778,-118.14452,ca,pasadena 150 | 33.78779,-117.85311,ca,orange 151 | 32.08354,-81.09983,ga,savannah 152 | 35.79154,-78.78112,nc,cary 153 | 
33.87029,-117.92534,ca,fullerton 154 | 36.52977,-87.35945,tn,clarksville 155 | 40.69161,-112.00105,ut,west valley city 156 | 39.04833,-95.67804,ks,topeka 157 | 34.17056,-118.83759,ca,thousand oaks 158 | 42.00833,-91.64407,ia,cedar rapids 159 | 38.8814,-94.81913,ks,olathe 160 | 29.65163,-82.32483,fl,gainesville 161 | 34.26945,-118.78148,ca,simi valley 162 | 47.61038,-122.20068,wa,bellevue 163 | 37.97798,-122.03107,ca,concord 164 | 25.98731,-80.23227,fl,miramar 165 | 26.27119,-80.2706,fl,coral springs 166 | 30.22409,-92.01984,la,lafayette 167 | 32.77657,-79.93092,sc,charleston 168 | 32.95373,-96.89028,tx,carrollton 169 | 38.75212,-121.28801,ca,roseville 170 | 39.86804,-104.97192,co,thornton 171 | 30.08605,-94.10185,tx,beaumont 172 | 33.63059,-112.33322,az,surprise 173 | 37.97476,-87.55585,in,evansville 174 | 32.44874,-99.73314,tx,abilene 175 | 33.15067,-96.82361,tx,frisco 176 | 39.09112,-94.41551,mo,independence 177 | 33.96095,-83.37794,ga,athens 178 | 37.35411,-121.95524,ca,santa clara 179 | 40.69365,-89.58899,il,peoria 180 | 34.06862,-118.02757,ca,el monte 181 | 33.21484,-97.13307,tx,denton 182 | 37.87159,-122.27275,ca,berkeley 183 | 40.23384,-111.65853,ut,provo 184 | 33.94001,-118.13257,ca,downey 185 | 31.99735,-102.07791,tx,midland 186 | 35.22257,-97.43948,ok,norman 187 | 41.55815,-73.0515,ct,irving 188 | 33.64113,-117.91867,ca,costa mesa 189 | 33.96168,-118.35313,ca,inglewood 190 | 35.84562,-86.39027,tn,murfreesboro 191 | 36.02525,-115.24194,nv,enterprise 192 | 42.03725,-88.28119,il,elgin 193 | 27.96585,-82.8001,fl,clearwater 194 | 25.94204,-80.2456,fl,miami gardens 195 | 38.25445,-104.60914,co,pueblo 196 | 42.63342,-71.31617,ma,lowell 197 | 34.22573,-77.94471,nc,wilmington 198 | 39.80276,-105.08748,co,arvada 199 | 39.83665,-105.0372,co,westminster 200 | 34.06862,-117.93895,ca,west covina 201 | 45.49818,-122.43148,or,gresham 202 | 33.90224,-118.08173,ca,norwalk 203 | 33.15809,-117.35059,ca,carlsbad 204 | 38.24936,-122.03997,ca,fairfield 205 | 
42.3751,-71.10561,ma,cambridge 206 | 34.1389,-118.35341,ca,universal city 207 | 35.95569,-80.00532,nc,high point 208 | 45.78329,-108.50069,mt,billings 209 | 44.51916,-88.01983,wi,green bay 210 | 40.60967,-111.9391,ut,west jordan 211 | 37.93576,-122.34775,ca,richmond 212 | 27.9378,-82.28592,fl,brandon 213 | 33.55391,-117.21392,ca,murrieta 214 | 34.18084,-118.30897,ca,burbank 215 | 28.03446,-80.58866,fl,palm bay 216 | 47.97898,-122.20208,wa,everett 217 | 38.00492,-121.80579,ca,antioch 218 | 41.68338,-86.25001,in,south bend 219 | 37.70577,-122.46192,ca,daly city 220 | 39.57916,-104.87692,co,centennial 221 | 33.49364,-117.14836,ca,temecula 222 | -------------------------------------------------------------------------------- /us_geocode.csv.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bianjiang/twitter-user-geocoder/5c080e5e9f54303538c14469fdb7ce82760f89b5/us_geocode.csv.gz -------------------------------------------------------------------------------- /util.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | import logging 6 | 7 | logger = logging.getLogger('GeocodingTweets') 8 | 9 | bad_locations = ['spain', 'nowhere'] 10 | 11 | def bad_location(location): 12 | """Return True if the location string contains a known bad-location marker 13 | """ 14 | location = location.lower() 15 | 16 | # 'bad' avoids shadowing the enclosing function name with the loop variable 17 | return any(bad in location for bad in bad_locations) --------------------------------------------------------------------------------
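The `distance()` method in tweet_us_state_geocoder.py is a textbook haversine computation: `R = 6373.0` is an approximate Earth radius in kilometers, and the trailing `0.621371` factor converts the result to miles, which is why `get_state` treats `d < 20` as a 20-mile radius. A minimal standalone sketch of the same formula, useful for sanity-checking the bundled coordinates (the function name `haversine_miles` is ours, not the repo's):

```python
from math import atan2, cos, radians, sin, sqrt

def haversine_miles(coordinate_1, coordinate_2):
    """Great-circle distance in miles, mirroring TweetUSStateGeocoder.distance()."""
    R = 6373.0  # approximate Earth radius in kilometers
    lat1, lon1 = (radians(float(v)) for v in coordinate_1)
    lat2, lon2 = (radians(float(v)) for v in coordinate_2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c * 0.621371  # convert kilometers to miles

# Los Angeles and San Francisco, taken from rows of us_cities_geocode.csv
print(haversine_miles((34.05223, -118.24368), (37.77493, -122.41942)))
```

For these two rows the result comes out to roughly 350 miles, consistent with the known great-circle distance between the two cities.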