├── .gitignore
├── BLOG_POST.mkd
├── README
├── blockr.py
├── build_neighborhood.sh
├── dump
│   ├── LICENSE.txt
│   └── README.txt
├── geocrawlr.py
├── junk
│   ├── OsmApi.py
│   ├── README
│   ├── fetch_osm.py
│   ├── pull_photos.py
│   ├── samplr.py
│   └── up_one_level.py
├── leaves_from_woeid.py
├── mapnik_render.py
├── outliers.py
└── util
    ├── consolidate_geojson.py
    ├── geoplanet.py
    └── upload_photos.py
/.gitignore: -------------------------------------------------------------------------------- 1 | data 2 | *.pyc 3 | -------------------------------------------------------------------------------- /BLOG_POST.mkd: -------------------------------------------------------------------------------- 1 | # It's a beautiful day in the neighborhood! 2 | #####by Schuyler Erle 3 | ######8/5/2011 1:00pm 4 | ######San Francisco, CA 5 | 6 | A large part of our job at SimpleGeo consists of listening closely to our users, and trying to understand what kinds of geo-related tools will make their lives easier and their apps more awesome. One thing we hear about pretty regularly is the lack of freely available neighborhood boundaries for international cities. 7 | 8 | Now, SimpleGeo Context has had neighborhood boundaries for most major US cities ever since we launched the product. We’ve been asked about neighborhoods in cities outside the US, but, when we started looking, we didn’t immediately find a source that was available under a license that we could encourage you to freely reuse. So we decided to make our own! 9 | 10 | We’re pleased to announce the availability in SimpleGeo Context of neighborhood boundaries for the following twelve cities: 11 | 12 | * Amsterdam 13 | * Barcelona 14 | * Beijing 15 | * Berlin 16 | * Florence 17 | * London 18 | * Paris 19 | * Rome 20 | * Shanghai 21 | * Sydney 22 | * Tokyo 23 | * Vienna 24 | 25 | Additionally, we now have approximate boundaries for Paris’s arrondissements and Berlin’s ortsteils. Check out the Eiffel Tower in our Context demo – scroll down to see the map, and click “Features” on the right – or perhaps Westminster Abbey. You can also see some visualizations in [our Flickr stream](http://www.flickr.com/photos/simplegeo/sets/72157627358066594/). 26 | 27 | Now, neighborhoods are, in many ways, a unique form of geography. Some geographies are physical by nature: A park has boundaries, a road has a center line, et cetera. Most non-physical geographies have some legal existence, like a post code or a city or a province, where a statute or a treaty defines the boundaries of the geography. As an informal division of a city, a neighborhood’s boundaries are often both invisible and lacking in precise definition. Often, the conventionally accepted boundaries of a neighborhood ebb and flow over time, as the economics or demographics of the region change. Neighborhood boundaries are usually fuzzy, and frequently overlap in practice, in ways that other kinds of geography do not. 28 | 29 | So, we’ll be totally candid – our new international neighborhood dataset is definitely a work in progress. There are some evident issues with the new dataset, but we thought it better to release and then iterate, rather than wait indefinitely on impossible perfection. We hope to continue to refine and improve the data, as well as add lots of new cities. 30 | Due to the data sources we combined to produce them, all of the new neighborhood data in Context is licensed under the [Open Database License (ODbL)](http://opendatacommons.org/licenses/odbl/).
You can find the new neighborhoods in SimpleGeo Context, and you can also download the [whole data set](http://s3.amazonaws.com/simplegeo-public/neighborhoods_dump_20110804.zip). We hope you do awesome things with it! 31 | 32 | Read on for the technical details! 33 | 34 | Generating neighborhood boundaries for new cities actually turned out to be a pretty good trick. We didn’t have any source for boundaries themselves, but Flickr’s body of geotagged photos represents a pool of samples of neighborhood locations, because photos taken in cities often have a machine tag containing the Where On Earth ID (or “WoE ID”) for the corresponding neighborhood. We used the freely available [Yahoo! GeoPlanet data dumps](http://developer.yahoo.com/geo/geoplanet/data/) to identify the WoE IDs of neighborhoods — “Suburb” or “LocalAdmin” in the parlance of GeoPlanet — in the cities in which we were interested. We then used the Flickr API to draw a sample of geotagged photo locations for that WoE ID to establish a kind of “cloud” of points that roughly represent that neighborhood. 35 | 36 | At first, we tried generating a [Voronoi diagram](http://en.wikipedia.org/wiki/Voronoi_diagram) over the entire area of the city, and then merging the resulting shapes by WoE ID. This yielded “boundaries” that were very organic, and kind of weird looking. They didn’t correspond to our intuitions about how neighborhoods are structured in the minds of residents and visitors. In our experience, neighborhood boundaries in large cities often conform to the physical geography, such as the lines of roads and waterways, rather than cutting across city blocks, and even buildings. 37 | 38 | We turned to [OpenStreetMap](http://openstreetmap.org/) as a source for the physical geography of roads, railroads, and waterways, because OSM turns out to be a pretty good source for this sort of data in most of the world’s largest cities. After loading the [entire world of OSM](http://wiki.openstreetmap.org/wiki/Planet.osm) into a [PostGIS](http://postgis.refractions.net/) database, we take the linework for each city, and, treating it as a set of polygon boundaries, use [GRASS](http://osgeo.org/grass/) to clean up the data and generate a polygon for each “city block” in our area of interest. Using OSM, of course, means that the results need to be licensed ODbL, in order to respect the desire of the community that derivative works be shared alike. 39 | 40 | The rest of the work gets done in Python, using the excellent [Shapely](http://trac.gispython.org/lab/wiki/Shapely) library. First, we group the geotagged photo locations by neighborhood, and then filter them by [median absolute deviation](http://en.wikipedia.org/wiki/Median_absolute_deviation) to remove mistagged outliers. Next, we iterate over each city block, and tally up the weighted inverse distances of the n nearest geotagged photos to decide which neighborhood the block “belongs” to. After all the blocks are assigned, we extract the largest polygon for each neighborhood as its “core”, reassign blocks that are detached from their core to other nearby neighborhoods, and do a bit of cleanup. This surprisingly simple local-then-global approach yields pretty convincing results. 41 | 42 | We’ve considered two possible improvements for the future. We’ve experimented with an additional step that focuses on swapping blocks at the edges of neighborhoods to improve “compactness”, which intuitively feels like an important property of neighborhood boundaries, and we hope to revisit this soon.
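To make the outlier-filtering step mentioned above a bit more concrete, here is a minimal sketch of a median-absolute-deviation filter. It is purely illustrative: the function name, the `k` cutoff, and the sample coordinates are invented for this example, and the repository's `outliers.py` (which reads the tab-separated points files produced by the crawler) may differ in the details.

    # Illustrative sketch only: keep points within k MADs of the median location.
    def discard_outliers_simple(points, k=5.0):
        """points is a list of (lon, lat) tuples; returns the plausible ones."""
        lons = sorted(lon for lon, lat in points)
        lats = sorted(lat for lon, lat in points)
        median = (lons[len(lons) // 2], lats[len(lats) // 2])
        dev = [abs(lon - median[0]) + abs(lat - median[1]) for lon, lat in points]
        mad = sorted(dev)[len(dev) // 2] or 1e-9   # guard against a zero MAD
        return [p for p, d in zip(points, dev) if d <= k * mad]

    # One mistagged photo (Tokyo) in a cluster near the Eiffel Tower gets dropped:
    pts = [(2.2945, 48.8584), (2.2950, 48.8581), (2.2939, 48.8590),
           (2.2960, 48.8575), (139.6917, 35.6895)]
    print(discard_outliers_simple(pts))

Only the points that survive a filter like this are fed into the block-scoring step.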
43 | 44 | The other possible “improvement” has to do with the fact that, for the time being, the boundaries we’re providing are sharply defined. We felt that this might be easier for developers to work with, versus a set of neighborhood boundaries with variable overlap, but we’d really like your feedback about how this works out for you in practice. 45 | 46 | You can find the code on Github, if you’re interested: [http://github.com/simplegeo/betashapes/](http://github.com/simplegeo/betashapes/). The name of the code repository is left as an exercise for the reader. -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | Betashapes 2 | ========== 3 | created by Melissa Santos and Schuyler Erle 4 | (c) 2011 SimpleGeo, Inc. 5 | 6 | What is this? 7 | ------------- 8 | 9 | It's the code used by SimpleGeo to generate its international neighborhood 10 | dataset. 11 | 12 | See the blog post for an explanation: 13 | 14 | http://blog.simplegeo.com/2011/08/05/its-a-beautiful-day-in-the-neighborhood/ 15 | (The blog post content is now in the BLOG_POST.mkd file of this repo) 16 | 17 | Why's it here? 18 | -------------- 19 | 20 | We had fun writing it. We like giving stuff away. Maybe you'll find it useful. 21 | Maybe you'll improve it and send us a pull request! We provide no warranty, and 22 | no support. If it breaks, you get to keep the pieces. 23 | 24 | How's it work? 25 | -------------- 26 | 27 | Well, it helps if you download Yahoo's GeoPlanet dump, and load both it and all 28 | or some subset of Planet.osm into PostGIS. 29 | 30 | You'll need to create a data/ directory, and dump a mapping of WoE ID -> Name 31 | into a file called `data/names.txt`, and another mapping of Parent ID, Name, 32 | Type -> WoE ID into another file called `data/suburbs.txt`. This is stupid and 33 | could be done a lot more cleanly. 34 | 35 | Here is a sample of the names.txt we're using: 36 | 37 | 29372661 San Francisco Javier 38 | 772864 San Francisco de Paula 39 | 108040 Villa de San Francisco 40 | 142610 San Francisco Culhuacán 41 | 349422 San Francisco de Limache 42 | 12521721 San Francisco International Airport 43 | 44 | Here's a sample of the suburbs.txt: 45 | 46 | 44418 Streatham Common Suburb 20089509 47 | 44418 Upper Walthamstow Suburb 20089365 48 | 44418 Castelnau Suburb 20089570 49 | 44418 Harold Hill Suburb 22483 50 | 44418 Blackfriars Road Suburb 20094299 51 | 44418 Lampton Suburb 44314 52 | 44418 Lower Place Suburb 20089447 53 | 44418 Furzedown Suburb 20089510 54 | 44418 Crofton Suburb 20089334 55 | 44418 Collier's Wood Suburb 20089517 56 | 57 | Running build_neighborhood.sh takes over from there. 58 | 59 | What's in it? 60 | ------------- 61 | 62 | build_neighborhood.sh 63 | 64 | This shell script makes the magic happen. Depends on PostgreSQL and GRASS, 65 | in addition to all the other stuff in here. 66 | 67 | blockr.py 68 | 69 | The main neighborhood generation script. Takes a name file 70 | (tab-separated, mapping WoE ID to name), a GeoJSON FeatureCollection 71 | containing the block polygons to be assigned, and a points file (as 72 | generated by geocrawlr.py). 73 | 74 | Requires Shapely. 75 | 76 | outliers.py 77 | 78 | A module for reading points.txt files and discarding outlying points based 79 | on median absolute distance. If run as a script, prints the bounding box of 80 | the points after outliers are discarded. 81 | 82 | geocrawlr.py <woe_id> [<woe_id> ...]
83 | 84 | A script that crawls the Flickr API looking for geotagged photo records 85 | associated with the given woe_ids. Writes line-by-line, tab-separated 86 | values to stdout consisting of: Photo ID, WoE ID, Longitude, Latitude. 87 | Uses Flickr.API. You must have your FLICKR_KEY and FLICKR_SECRET set in the 88 | environment. 89 | 90 | geoplanet.py 91 | 92 | A utility script to query Y! GeoPlanet. Takes names, one per line, on stdin, 93 | queries GeoPlanet, and outputs the first WoE ID and name returned on stdout. 94 | Set YAHOO_APPID in your environment. 95 | 96 | mapnik_render.py 97 | 98 | A Mapnik script to visualize the neighborhood.json and blocks.json data 99 | together. 100 | 101 | leaves_from_woeid.py 102 | 103 | Walks a table of GeoPlanet data in PostgreSQL and fetches all the leaves 104 | descending from a given WoE ID. 105 | 106 | What's a "betashape"? 107 | --------------------- 108 | 109 | See: 110 | 111 | http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ 112 | 113 | also see: 114 | 115 | http://code.flickr.com/blog/2009/01/12/living-in-the-donut-hole/ 116 | 117 | and for good measure: 118 | 119 | http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/ 120 | 121 | Propers to Aaron Straup Cope for his ideas and encouragement. 122 | 123 | License 124 | ------- 125 | 126 | Copyright (c) 2011, SimpleGeo, Inc. 127 | All rights reserved. 128 | 129 | Redistribution and use in source and binary forms, with or without 130 | modification, are permitted provided that the following conditions are met: 131 | 132 | * Redistributions of source code must retain the above copyright 133 | notice, this list of conditions and the following disclaimer. 134 | * Redistributions in binary form must reproduce the above copyright 135 | notice, this list of conditions and the following disclaimer in the 136 | documentation and/or other materials provided with the distribution. 137 | * Neither the name of the SimpleGeo, Inc. nor the 138 | names of its contributors may be used to endorse or promote products 139 | derived from this software without specific prior written permission. 140 | 141 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 142 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 143 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 144 | DISCLAIMED. IN NO EVENT SHALL SIMPLEGEO, INC. BE LIABLE FOR ANY 145 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 146 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 147 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 148 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 149 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 150 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
151 | -------------------------------------------------------------------------------- /blockr.py: -------------------------------------------------------------------------------- 1 | from shapely.geometry import Point, Polygon, MultiPolygon, asShape 2 | from shapely.geometry.polygon import LinearRing 3 | from shapely.ops import cascaded_union, polygonize 4 | from shapely.prepared import prep 5 | from rtree import Rtree 6 | from outliers import load_points, discard_outliers 7 | import sys, json, math, pickle, os, geojson 8 | 9 | SAMPLE_SIZE = 20 10 | SCALE_FACTOR = 111111.0 # meters per degree latitude 11 | #ACTION_THRESHOLD = 2.0/math.sqrt(1000.0) # 1 point closer than 1km 12 | ACTION_THRESHOLD = 20.0/math.sqrt(1000.0) # 1 point closer than 1km 13 | AREA_BOUND = 0.001 14 | TARGET_ASSIGN_LEVEL = 0.75 15 | 16 | name_file, line_file, point_file = sys.argv[1:4] 17 | 18 | places = {} 19 | names = {} 20 | blocks = {} 21 | if os.path.exists(point_file + '.cache'): 22 | print >>sys.stderr, "Reading from %s cache..." % point_file 23 | names, blocks, places = pickle.load(file(point_file + ".cache")) 24 | blocks = map(asShape, blocks) 25 | else: 26 | all_names = {} 27 | count = 0 28 | for line in file(name_file): 29 | place_id, name = line.strip().split(None, 1) 30 | all_names[int(place_id)] = name 31 | count += 1 32 | if count % 1000 == 0: 33 | print >>sys.stderr, "\rRead %d names from %s." % (count, name_file), 34 | print >>sys.stderr, "\rRead %d names from %s." % (count, name_file) 35 | 36 | places = load_points(point_file) 37 | for place_id in places: 38 | names[place_id] = all_names.get(place_id, "") 39 | places = discard_outliers(places) 40 | 41 | lines = [] 42 | do_polygonize = False 43 | print >>sys.stderr, "Reading lines from %s..." % line_file, 44 | for feature in geojson.loads(file(line_file).read()): 45 | if feature.geometry.type in ('LineString', 'MultiLineString'): 46 | do_polygonize = True 47 | lines.append(asShape(feature.geometry.to_dict())) 48 | print >>sys.stderr, "%d lines read." % len(lines) 49 | if do_polygonize: 50 | print >>sys.stderr, "Polygonizing %d lines..." % (len(lines)), 51 | blocks = [poly.__geo_interface__ for poly in polygonize(lines)] 52 | print >>sys.stderr, "%d blocks formed." % len(blocks) 53 | else: 54 | blocks = [poly.__geo_interface__ for poly in lines] 55 | 56 | if not os.path.exists(point_file + '.cache'): 57 | print >>sys.stderr, "Caching points, blocks, and names ..." 58 | pickle.dump((names, blocks, places), file(point_file + ".cache", "w"), -1) 59 | blocks = map(asShape, blocks) 60 | 61 | points = [] 62 | place_list = set() 63 | count = 0 64 | for place_id, pts in places.items(): 65 | count += 1 66 | print >>sys.stderr, "\rPreparing %d of %d places..." % (count, len(places)), 67 | for pt in pts: 68 | place_list.add((len(points), pt+pt, None)) 69 | points.append((place_id, Point(pt))) 70 | print >>sys.stderr, "Indexing...", 71 | index = Rtree(place_list) 72 | print >>sys.stderr, "Done." 
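# score_block: rank candidate neighborhoods for one block polygon. For the
# SAMPLE_SIZE photo points nearest the block's centroid, the photo's place
# earns 1/sqrt(distance from the block to the photo), with degrees converted
# to metres via SCALE_FACTOR and floored at 1 m. Returns (score, place_id)
# pairs sorted best-first.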
73 | 74 | def score_block(polygon): 75 | centroid = polygon.centroid 76 | #prepared = prep(polygon) 77 | score = {} 78 | outside_samples = 0 79 | for item in index.nearest((centroid.x, centroid.y), num_results=SAMPLE_SIZE): 80 | place_id, point = points[item] 81 | score.setdefault(place_id, 0.0) 82 | #if prepared.contains(point): 83 | # score[place_id] += 1.0 84 | #else: 85 | score[place_id] += 1.0 / math.sqrt(max(polygon.distance(point)*SCALE_FACTOR, 1.0)) 86 | outside_samples += 1 87 | return list(reversed(sorted((sc, place_id) for place_id, sc in score.items()))) 88 | 89 | count = 0 90 | assigned_blocks = {} 91 | assigned_ct = 0 92 | unassigned = {} #keyed on the polygon's index in blocks 93 | for count in range(len(blocks)): 94 | polygon = blocks[count] 95 | print >>sys.stderr, "\rScoring %d of %d blocks..." % ((count+1), len(blocks)), 96 | if not polygon.is_valid: 97 | try: 98 | polygon = polygon.buffer(0) 99 | blocks[count] = polygon 100 | except: 101 | pass 102 | if not polygon.is_valid: 103 | continue 104 | if polygon.is_empty: continue 105 | if polygon.area > AREA_BOUND: continue 106 | 107 | scores = score_block(polygon) 108 | best, winner = scores[0] 109 | if best > ACTION_THRESHOLD: 110 | assigned_ct += 1 111 | assigned_blocks.setdefault(winner, []) 112 | assigned_blocks[winner].append(polygon) 113 | else: 114 | # if the block wasn't assigned hang onto the info about the winning nbhd 115 | unassigned[count] = (best, winner) 116 | print >>sys.stderr, "Done, assigned %d of %d blocks" % (assigned_ct, len(blocks)) 117 | 118 | new_threshold = ACTION_THRESHOLD 119 | while float(assigned_ct)/len(blocks) < TARGET_ASSIGN_LEVEL and len(unassigned) > 0: 120 | new_threshold -= 0.1 121 | print >>sys.stderr, "\rDropping threshold to %f1.3... " % new_threshold 122 | for blockindex in unassigned.keys(): 123 | best, winner = unassigned[blockindex] 124 | #if blocks[blockindex].is_empty: del(unassigned[blockindex]) 125 | if best > new_threshold: 126 | assigned_ct += 1 127 | assigned_blocks.setdefault(winner, []) 128 | assigned_blocks[winner].append(blocks[blockindex]) 129 | del unassigned[blockindex] 130 | print >>sys.stderr, "Done, assigned %d of %d blocks" % (assigned_ct, len(blocks)) 131 | 132 | 133 | polygons = {} 134 | count = 0 135 | for place_id in places.keys(): 136 | count += 1 137 | print >>sys.stderr, "\rMerging %d of %d boundaries..." % (count, len(places)), 138 | if place_id not in assigned_blocks: continue 139 | polygons[place_id] = cascaded_union(assigned_blocks[place_id]) 140 | print >>sys.stderr, "Done." 141 | 142 | count = 0 143 | orphans = [] 144 | for place_id, multipolygon in polygons.items(): 145 | count += 1 146 | print >>sys.stderr, "\rRemoving %d orphans from %d of %d polygons..." % (len(orphans), count, len(polygons)), 147 | if type(multipolygon) is not MultiPolygon: continue 148 | polygon_count = [0] * len(multipolygon) 149 | for i, polygon in enumerate(multipolygon.geoms): 150 | prepared = prep(polygon) 151 | for item in index.intersection(polygon.bounds): 152 | item_id, point = points[item] 153 | if item_id == place_id and prepared.intersects(point): 154 | polygon_count[i] += 1 155 | winner = max((c, i) for (i, c) in enumerate(polygon_count))[1] 156 | polygons[place_id] = multipolygon.geoms[winner] 157 | orphans.extend((place_id, p) for i, p in enumerate(multipolygon.geoms) if i != winner) 158 | print >>sys.stderr, "Done." 
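# Reassign the orphaned fragments: walk each orphan's score_block() ranking
# and merge it into the first already-built neighborhood (other than the one
# it came from) that it actually touches; sweep repeatedly until a pass makes
# no progress. Any leftovers are handed back to their original place below.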
159 | 160 | count = 0 161 | total = len(orphans) 162 | retries = 0 163 | unassigned = None 164 | while orphans: 165 | unassigned = [] 166 | for origin_id, orphan in orphans: 167 | count += 1 168 | changed = False 169 | print >>sys.stderr, "\rReassigning %d of %d orphans..." % (count-retries, total), 170 | for score, place_id in score_block(orphan): 171 | if place_id not in polygons: 172 | # Turns out we just wind up assigning tiny, inappropriate places 173 | #polygons[place_id] = orphan 174 | #changed = True 175 | continue 176 | elif place_id != origin_id and orphan.intersects(polygons[place_id]): 177 | polygons[place_id] = polygons[place_id].union(orphan) 178 | changed = True 179 | if changed: 180 | break 181 | if not changed: 182 | unassigned.append((origin_id, orphan)) 183 | retries += 1 184 | if len(unassigned) == len(orphans): 185 | # give up 186 | break 187 | orphans = unassigned 188 | print >>sys.stderr, "%d retried, %d unassigned." % (retries, len(unassigned)) 189 | 190 | print >>sys.stderr, "Returning remaining orphans to original places." 191 | for origin_id, orphan in orphans: 192 | if orphan.intersects(polygons[origin_id]): 193 | polygons[origin_id] = polygons[origin_id].union(orphan) 194 | 195 | print >>sys.stderr, "Try to assign the holes to neighboring neighborhoods." 196 | #merge the nbhds 197 | city = cascaded_union(polygons.values()) 198 | 199 | #pull out any holes in the resulting Polygon/Multipolygon 200 | if type(city) is Polygon: 201 | over = [city] 202 | elif type(city) is MultiPolygon: 203 | over = city.geoms 204 | else: 205 | print >>sys.stderr, "\rcity is of type %s, wtf." % (type(city)) 206 | 207 | holes = [] 208 | for poly in over: 209 | holes.extend((Polygon(LinearRing(interior.coords)) for interior in poly.interiors)) 210 | 211 | count = 0 212 | total = len(holes) 213 | retries = 0 214 | unassigned = None 215 | while holes: 216 | unassigned = [] 217 | for hole in holes: 218 | count += 1 219 | changed = False 220 | print >>sys.stderr, "\rReassigning %d of %d holes..." % (count-retries, total), 221 | for score, place_id in score_block(hole): 222 | if place_id not in polygons: 223 | # Turns out we just wind up assigning tiny, inappropriate places 224 | #nbhds[place_id] = hole 225 | #changed = True 226 | continue 227 | elif hole.intersects(polygons[place_id]): 228 | polygons[place_id] = polygons[place_id].union(hole) 229 | changed = True 230 | if changed: 231 | break 232 | if not changed: 233 | unassigned.append(hole) 234 | retries += 1 235 | if len(unassigned) == len(holes): 236 | # give up 237 | break 238 | holes = unassigned 239 | print >>sys.stderr, "%d retried, %d unassigned." % (retries, len(unassigned)) 240 | 241 | print >>sys.stderr, "Buffering polygons." 242 | for place_id, polygon in polygons.items(): 243 | if type(polygon) is Polygon: 244 | polygon = Polygon(polygon.exterior.coords) 245 | else: 246 | bits = [] 247 | for p in polygon.geoms: 248 | if type(p) is Polygon: 249 | bits.append(Polygon(p.exterior.coords)) 250 | polygon = MultiPolygon(bits) 251 | polygons[place_id] = polygon.buffer(0) 252 | 253 | 254 | print >>sys.stderr, "Writing output." 
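# Emit the result as a GeoJSON FeatureCollection on stdout: one Feature per
# place, carrying its WoE ID and name in the properties.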
255 | features = [] 256 | for place_id, poly in polygons.items(): 257 | features.append({ 258 | "type": "Feature", 259 | "id": place_id, 260 | "geometry": poly.__geo_interface__, 261 | "properties": {"woe_id": place_id, "name": names.get(place_id, "")} 262 | }) 263 | 264 | collection = { 265 | "type": "FeatureCollection", 266 | "features": features 267 | } 268 | 269 | print json.dumps(collection) 270 | 271 | -------------------------------------------------------------------------------- /build_neighborhood.sh: -------------------------------------------------------------------------------- 1 | NAME=$1 2 | WOEID=$2 3 | DBNAME=osm # you need to have planet.osm (or some relevant portion) imported 4 | DBPORT=5433 5 | #GRASS_LOCATION=/home/sderle/grass/Global/PERMANENT 6 | GRASS_LOCATION=/mnt/places/melissa/grass/Global/PERMANENT 7 | export GRASS_BATCH_JOB=$GRASS_LOCATION/neighborhood.$$ 8 | 9 | if [ ! -r data/names.txt -o ! -r data/suburbs.txt ]; then 10 | echo "data/names.txt (tab separated file mapping woe_id to name) is missing, or" 11 | echo "data/suburbs.txt (tab separated file mapping parent_id, name, type, woe_id) is missing" 12 | exit 1 13 | fi 14 | 15 | if [ ! -r data/photos_$WOEID.txt ]; then 16 | grep ^$WOEID data/suburbs.txt | cut -f4 | xargs python geocrawlr.py >data/photos_$WOEID.txt 17 | fi 18 | 19 | BBOX=`python outliers.py data/photos_$WOEID.txt` 20 | 21 | if [ ! -r data/blocks_$WOEID.json ]; then 22 | pgsql2shp -f tmp$WOEID.shp -p $DBPORT $DBNAME \ 23 | "select osm_id, way from planet_osm_line where way && 'BOX($BBOX)'::box2d and (highway is not null or waterway is not null)" \ 24 | || exit 1 25 | 26 | sed -e "s/WOEID/$WOEID/g" >$GRASS_BATCH_JOB < data/$NAME.json 37 | -------------------------------------------------------------------------------- /dump/LICENSE.txt: -------------------------------------------------------------------------------- 1 | ## ODC Open Database License (ODbL) 2 | 3 | ### Preamble 4 | 5 | The Open Database License (ODbL) is a license agreement intended to 6 | allow users to freely share, modify, and use this Database while 7 | maintaining this same freedom for others. Many databases are covered by 8 | copyright, and therefore this document licenses these rights. Some 9 | jurisdictions, mainly in the European Union, have specific rights that 10 | cover databases, and so the ODbL addresses these rights, too. Finally, 11 | the ODbL is also an agreement in contract for users of this Database to 12 | act in certain ways in return for accessing this Database. 13 | 14 | Databases can contain a wide variety of types of content (images, 15 | audiovisual material, and sounds all in the same database, for example), 16 | and so the ODbL only governs the rights over the Database, and not the 17 | contents of the Database individually. Licensors should use the ODbL 18 | together with another license for the contents, if the contents have a 19 | single set of rights that uniformly covers all of the contents. If the 20 | contents have multiple sets of different rights, Licensors should 21 | describe what rights govern what contents together in the individual 22 | record or in some other way that clarifies what rights apply. 
23 | 24 | Sometimes the contents of a database, or the database itself, can be 25 | covered by other rights not addressed here (such as private contracts, 26 | trade mark over the name, or privacy rights / data protection rights 27 | over information in the contents), and so you are advised that you may 28 | have to consult other documents or clear other rights before doing 29 | activities not covered by this License. 30 | 31 | ------ 32 | 33 | The Licensor (as defined below) 34 | 35 | and 36 | 37 | You (as defined below) 38 | 39 | agree as follows: 40 | 41 | ### 1.0 Definitions of Capitalised Words 42 | 43 | "Collective Database" – Means this Database in unmodified form as part 44 | of a collection of independent databases in themselves that together are 45 | assembled into a collective whole. A work that constitutes a Collective 46 | Database will not be considered a Derivative Database. 47 | 48 | "Convey" – As a verb, means Using the Database, a Derivative Database, 49 | or the Database as part of a Collective Database in any way that enables 50 | a Person to make or receive copies of the Database or a Derivative 51 | Database. Conveying does not include interaction with a user through a 52 | computer network, or creating and Using a Produced Work, where no 53 | transfer of a copy of the Database or a Derivative Database occurs. 54 | "Contents" – The contents of this Database, which includes the 55 | information, independent works, or other material collected into the 56 | Database. For example, the contents of the Database could be factual 57 | data or works such as images, audiovisual material, text, or sounds. 58 | 59 | "Database" – A collection of material (the Contents) arranged in a 60 | systematic or methodical way and individually accessible by electronic 61 | or other means offered under the terms of this License. 62 | 63 | "Database Directive" – Means Directive 96/9/EC of the European 64 | Parliament and of the Council of 11 March 1996 on the legal protection 65 | of databases, as amended or succeeded. 66 | 67 | "Database Right" – Means rights resulting from the Chapter III ("sui 68 | generis") rights in the Database Directive (as amended and as transposed 69 | by member states), which includes the Extraction and Re-utilisation of 70 | the whole or a Substantial part of the Contents, as well as any similar 71 | rights available in the relevant jurisdiction under Section 10.4. 72 | 73 | "Derivative Database" – Means a database based upon the Database, and 74 | includes any translation, adaptation, arrangement, modification, or any 75 | other alteration of the Database or of a Substantial part of the 76 | Contents. This includes, but is not limited to, Extracting or 77 | Re-utilising the whole or a Substantial part of the Contents in a new 78 | Database. 79 | 80 | "Extraction" – Means the permanent or temporary transfer of all or a 81 | Substantial part of the Contents to another medium by any means or in 82 | any form. 83 | 84 | "License" – Means this license agreement and is both a license of rights 85 | such as copyright and Database Rights and an agreement in contract. 86 | 87 | "Licensor" – Means the Person that offers the Database under the terms 88 | of this License. 89 | 90 | "Person" – Means a natural or legal person or a body of persons 91 | corporate or incorporate. 
92 | 93 | "Produced Work" – a work (such as an image, audiovisual material, text, 94 | or sounds) resulting from using the whole or a Substantial part of the 95 | Contents (via a search or other query) from this Database, a Derivative 96 | Database, or this Database as part of a Collective Database. 97 | 98 | "Publicly" – means to Persons other than You or under Your control by 99 | either more than 50% ownership or by the power to direct their 100 | activities (such as contracting with an independent consultant). 101 | 102 | "Re-utilisation" – means any form of making available to the public all 103 | or a Substantial part of the Contents by the distribution of copies, by 104 | renting, by online or other forms of transmission. 105 | 106 | "Substantial" – Means substantial in terms of quantity or quality or a 107 | combination of both. The repeated and systematic Extraction or 108 | Re-utilisation of insubstantial parts of the Contents may amount to the 109 | Extraction or Re-utilisation of a Substantial part of the Contents. 110 | 111 | "Use" – As a verb, means doing any act that is restricted by copyright 112 | or Database Rights whether in the original medium or any other; and 113 | includes without limitation distributing, copying, publicly performing, 114 | publicly displaying, and preparing derivative works of the Database, as 115 | well as modifying the Database as may be technically necessary to use it 116 | in a different mode or format. 117 | 118 | "You" – Means a Person exercising rights under this License who has not 119 | previously violated the terms of this License with respect to the 120 | Database, or who has received express permission from the Licensor to 121 | exercise rights under this License despite a previous violation. 122 | 123 | Words in the singular include the plural and vice versa. 124 | 125 | ### 2.0 What this License covers 126 | 127 | 2.1. Legal effect of this document. This License is: 128 | 129 | a. A license of applicable copyright and neighbouring rights; 130 | 131 | b. A license of the Database Right; and 132 | 133 | c. An agreement in contract between You and the Licensor. 134 | 135 | 2.2 Legal rights covered. This License covers the legal rights in the 136 | Database, including: 137 | 138 | a. Copyright. Any copyright or neighbouring rights in the Database. 139 | The copyright licensed includes any individual elements of the 140 | Database, but does not cover the copyright over the Contents 141 | independent of this Database. See Section 2.4 for details. Copyright 142 | law varies between jurisdictions, but is likely to cover: the Database 143 | model or schema, which is the structure, arrangement, and organisation 144 | of the Database, and can also include the Database tables and table 145 | indexes; the data entry and output sheets; and the Field names of 146 | Contents stored in the Database; 147 | 148 | b. Database Rights. Database Rights only extend to the Extraction and 149 | Re-utilisation of the whole or a Substantial part of the Contents. 150 | Database Rights can apply even when there is no copyright over the 151 | Database. Database Rights can also apply when the Contents are removed 152 | from the Database and are selected and arranged in a way that would 153 | not infringe any applicable copyright; and 154 | 155 | c. Contract. This is an agreement between You and the Licensor for 156 | access to the Database. In return you agree to certain conditions of 157 | use on this access as outlined in this License. 
158 | 159 | 2.3 Rights not covered. 160 | 161 | a. This License does not apply to computer programs used in the making 162 | or operation of the Database; 163 | 164 | b. This License does not cover any patents over the Contents or the 165 | Database; and 166 | 167 | c. This License does not cover any trademarks associated with the 168 | Database. 169 | 170 | 2.4 Relationship to Contents in the Database. The individual items of 171 | the Contents contained in this Database may be covered by other rights, 172 | including copyright, patent, data protection, privacy, or personality 173 | rights, and this License does not cover any rights (other than Database 174 | Rights or in contract) in individual Contents contained in the Database. 175 | For example, if used on a Database of images (the Contents), this 176 | License would not apply to copyright over individual images, which could 177 | have their own separate licenses, or one single license covering all of 178 | the rights over the images. 179 | 180 | ### 3.0 Rights granted 181 | 182 | 3.1 Subject to the terms and conditions of this License, the Licensor 183 | grants to You a worldwide, royalty-free, non-exclusive, terminable (but 184 | only under Section 9) license to Use the Database for the duration of 185 | any applicable copyright and Database Rights. These rights explicitly 186 | include commercial use, and do not exclude any field of endeavour. To 187 | the extent possible in the relevant jurisdiction, these rights may be 188 | exercised in all media and formats whether now known or created in the 189 | future. 190 | 191 | The rights granted cover, for example: 192 | 193 | a. Extraction and Re-utilisation of the whole or a Substantial part of 194 | the Contents; 195 | 196 | b. Creation of Derivative Databases; 197 | 198 | c. Creation of Collective Databases; 199 | 200 | d. Creation of temporary or permanent reproductions by any means and 201 | in any form, in whole or in part, including of any Derivative 202 | Databases or as a part of Collective Databases; and 203 | 204 | e. Distribution, communication, display, lending, making available, or 205 | performance to the public by any means and in any form, in whole or in 206 | part, including of any Derivative Database or as a part of Collective 207 | Databases. 208 | 209 | 3.2 Compulsory license schemes. For the avoidance of doubt: 210 | 211 | a. Non-waivable compulsory license schemes. In those jurisdictions in 212 | which the right to collect royalties through any statutory or 213 | compulsory licensing scheme cannot be waived, the Licensor reserves 214 | the exclusive right to collect such royalties for any exercise by You 215 | of the rights granted under this License; 216 | 217 | b. Waivable compulsory license schemes. In those jurisdictions in 218 | which the right to collect royalties through any statutory or 219 | compulsory licensing scheme can be waived, the Licensor waives the 220 | exclusive right to collect such royalties for any exercise by You of 221 | the rights granted under this License; and, 222 | 223 | c. Voluntary license schemes. The Licensor waives the right to collect 224 | royalties, whether individually or, in the event that the Licensor is 225 | a member of a collecting society that administers voluntary licensing 226 | schemes, via that society, from any exercise by You of the rights 227 | granted under this License. 
228 | 229 | 3.3 The right to release the Database under different terms, or to stop 230 | distributing or making available the Database, is reserved. Note that 231 | this Database may be multiple-licensed, and so You may have the choice 232 | of using alternative licenses for this Database. Subject to Section 233 | 10.4, all other rights not expressly granted by Licensor are reserved. 234 | 235 | ### 4.0 Conditions of Use 236 | 237 | 4.1 The rights granted in Section 3 above are expressly made subject to 238 | Your complying with the following conditions of use. These are important 239 | conditions of this License, and if You fail to follow them, You will be 240 | in material breach of its terms. 241 | 242 | 4.2 Notices. If You Publicly Convey this Database, any Derivative 243 | Database, or the Database as part of a Collective Database, then You 244 | must: 245 | 246 | a. Do so only under the terms of this License or another license 247 | permitted under Section 4.4; 248 | 249 | b. Include a copy of this License (or, as applicable, a license 250 | permitted under Section 4.4) or its Uniform Resource Identifier (URI) 251 | with the Database or Derivative Database, including both in the 252 | Database or Derivative Database and in any relevant documentation; and 253 | 254 | c. Keep intact any copyright or Database Right notices and notices 255 | that refer to this License. 256 | 257 | d. If it is not possible to put the required notices in a particular 258 | file due to its structure, then You must include the notices in a 259 | location (such as a relevant directory) where users would be likely to 260 | look for it. 261 | 262 | 4.3 Notice for using output (Contents). Creating and Using a Produced 263 | Work does not require the notice in Section 4.2. However, if you 264 | Publicly Use a Produced Work, You must include a notice associated with 265 | the Produced Work reasonably calculated to make any Person that uses, 266 | views, accesses, interacts with, or is otherwise exposed to the Produced 267 | Work aware that Content was obtained from the Database, Derivative 268 | Database, or the Database as part of a Collective Database, and that it 269 | is available under this License. 270 | 271 | a. Example notice. The following text will satisfy notice under 272 | Section 4.3: 273 | 274 | Contains information from DATABASE NAME, which is made available 275 | here under the Open Database License (ODbL). 276 | 277 | DATABASE NAME should be replaced with the name of the Database and a 278 | hyperlink to the URI of the Database. "Open Database License" should 279 | contain a hyperlink to the URI of the text of this License. If 280 | hyperlinks are not possible, You should include the plain text of the 281 | required URI's with the above notice. 282 | 283 | 4.4 Share alike. 284 | 285 | a. Any Derivative Database that You Publicly Use must be only under 286 | the terms of: 287 | 288 | i. This License; 289 | 290 | ii. A later version of this License similar in spirit to this 291 | License; or 292 | 293 | iii. A compatible license. 294 | 295 | If You license the Derivative Database under one of the licenses 296 | mentioned in (iii), You must comply with the terms of that license. 297 | 298 | b. For the avoidance of doubt, Extraction or Re-utilisation of the 299 | whole or a Substantial part of the Contents into a new database is a 300 | Derivative Database and must comply with Section 4.4. 301 | 302 | c. Derivative Databases and Produced Works. 
A Derivative Database is 303 | Publicly Used and so must comply with Section 4.4. if a Produced Work 304 | created from the Derivative Database is Publicly Used. 305 | 306 | d. Share Alike and additional Contents. For the avoidance of doubt, 307 | You must not add Contents to Derivative Databases under Section 4.4 a 308 | that are incompatible with the rights granted under this License. 309 | 310 | e. Compatible licenses. Licensors may authorise a proxy to determine 311 | compatible licenses under Section 4.4 a iii. If they do so, the 312 | authorised proxy's public statement of acceptance of a compatible 313 | license grants You permission to use the compatible license. 314 | 315 | 316 | 4.5 Limits of Share Alike. The requirements of Section 4.4 do not apply 317 | in the following: 318 | 319 | a. For the avoidance of doubt, You are not required to license 320 | Collective Databases under this License if You incorporate this 321 | Database or a Derivative Database in the collection, but this License 322 | still applies to this Database or a Derivative Database as a part of 323 | the Collective Database; 324 | 325 | b. Using this Database, a Derivative Database, or this Database as 326 | part of a Collective Database to create a Produced Work does not 327 | create a Derivative Database for purposes of Section 4.4; and 328 | 329 | c. Use of a Derivative Database internally within an organisation is 330 | not to the public and therefore does not fall under the requirements 331 | of Section 4.4. 332 | 333 | 4.6 Access to Derivative Databases. If You Publicly Use a Derivative 334 | Database or a Produced Work from a Derivative Database, You must also 335 | offer to recipients of the Derivative Database or Produced Work a copy 336 | in a machine readable form of: 337 | 338 | a. The entire Derivative Database; or 339 | 340 | b. A file containing all of the alterations made to the Database or 341 | the method of making the alterations to the Database (such as an 342 | algorithm), including any additional Contents, that make up all the 343 | differences between the Database and the Derivative Database. 344 | 345 | The Derivative Database (under a.) or alteration file (under b.) must be 346 | available at no more than a reasonable production cost for physical 347 | distributions and free of charge if distributed over the internet. 348 | 349 | 4.7 Technological measures and additional terms 350 | 351 | a. This License does not allow You to impose (except subject to 352 | Section 4.7 b.) any terms or any technological measures on the 353 | Database, a Derivative Database, or the whole or a Substantial part of 354 | the Contents that alter or restrict the terms of this License, or any 355 | rights granted under it, or have the effect or intent of restricting 356 | the ability of any person to exercise those rights. 357 | 358 | b. Parallel distribution. You may impose terms or technological 359 | measures on the Database, a Derivative Database, or the whole or a 360 | Substantial part of the Contents (a "Restricted Database") in 361 | contravention of Section 4.74 a. only if You also make a copy of the 362 | Database or a Derivative Database available to the recipient of the 363 | Restricted Database: 364 | 365 | i. That is available without additional fee; 366 | 367 | ii. 
That is available in a medium that does not alter or restrict 368 | the terms of this License, or any rights granted under it, or have 369 | the effect or intent of restricting the ability of any person to 370 | exercise those rights (an "Unrestricted Database"); and 371 | 372 | iii. The Unrestricted Database is at least as accessible to the 373 | recipient as a practical matter as the Restricted Database. 374 | 375 | c. For the avoidance of doubt, You may place this Database or a 376 | Derivative Database in an authenticated environment, behind a 377 | password, or within a similar access control scheme provided that You 378 | do not alter or restrict the terms of this License or any rights 379 | granted under it or have the effect or intent of restricting the 380 | ability of any person to exercise those rights. 381 | 382 | 4.8 Licensing of others. You may not sublicense the Database. Each time 383 | You communicate the Database, the whole or Substantial part of the 384 | Contents, or any Derivative Database to anyone else in any way, the 385 | Licensor offers to the recipient a license to the Database on the same 386 | terms and conditions as this License. You are not responsible for 387 | enforcing compliance by third parties with this License, but You may 388 | enforce any rights that You have over a Derivative Database. You are 389 | solely responsible for any modifications of a Derivative Database made 390 | by You or another Person at Your direction. You may not impose any 391 | further restrictions on the exercise of the rights granted or affirmed 392 | under this License. 393 | 394 | ### 5.0 Moral rights 395 | 396 | 5.1 Moral rights. This section covers moral rights, including any rights 397 | to be identified as the author of the Database or to object to treatment 398 | that would otherwise prejudice the author's honour and reputation, or 399 | any other derogatory treatment: 400 | 401 | a. For jurisdictions allowing waiver of moral rights, Licensor waives 402 | all moral rights that Licensor may have in the Database to the fullest 403 | extent possible by the law of the relevant jurisdiction under Section 404 | 10.4; 405 | 406 | b. If waiver of moral rights under Section 5.1 a in the relevant 407 | jurisdiction is not possible, Licensor agrees not to assert any moral 408 | rights over the Database and waives all claims in moral rights to the 409 | fullest extent possible by the law of the relevant jurisdiction under 410 | Section 10.4; and 411 | 412 | c. For jurisdictions not allowing waiver or an agreement not to assert 413 | moral rights under Section 5.1 a and b, the author may retain their 414 | moral rights over certain aspects of the Database. 415 | 416 | Please note that some jurisdictions do not allow for the waiver of moral 417 | rights, and so moral rights may still subsist over the Database in some 418 | jurisdictions. 419 | 420 | ### 6.0 Fair dealing, Database exceptions, and other rights not affected 421 | 422 | 6.1 This License does not affect any rights that You or anyone else may 423 | independently have under any applicable law to make any use of this 424 | Database, including without limitation: 425 | 426 | a. Exceptions to the Database Right including: Extraction of Contents 427 | from non-electronic Databases for private purposes, Extraction for 428 | purposes of illustration for teaching or scientific research, and 429 | Extraction or Re-utilisation for public security or an administrative 430 | or judicial procedure. 431 | 432 | b. 
Fair dealing, fair use, or any other legally recognised limitation 433 | or exception to infringement of copyright or other applicable laws. 434 | 435 | 6.2 This License does not affect any rights of lawful users to Extract 436 | and Re-utilise insubstantial parts of the Contents, evaluated 437 | quantitatively or qualitatively, for any purposes whatsoever, including 438 | creating a Derivative Database (subject to other rights over the 439 | Contents, see Section 2.4). The repeated and systematic Extraction or 440 | Re-utilisation of insubstantial parts of the Contents may however amount 441 | to the Extraction or Re-utilisation of a Substantial part of the 442 | Contents. 443 | 444 | ### 7.0 Warranties and Disclaimer 445 | 446 | 7.1 The Database is licensed by the Licensor "as is" and without any 447 | warranty of any kind, either express, implied, or arising by statute, 448 | custom, course of dealing, or trade usage. Licensor specifically 449 | disclaims any and all implied warranties or conditions of title, 450 | non-infringement, accuracy or completeness, the presence or absence of 451 | errors, fitness for a particular purpose, merchantability, or otherwise. 452 | Some jurisdictions do not allow the exclusion of implied warranties, so 453 | this exclusion may not apply to You. 454 | 455 | ### 8.0 Limitation of liability 456 | 457 | 8.1 Subject to any liability that may not be excluded or limited by law, 458 | the Licensor is not liable for, and expressly excludes, all liability 459 | for loss or damage however and whenever caused to anyone by any use 460 | under this License, whether by You or by anyone else, and whether caused 461 | by any fault on the part of the Licensor or not. This exclusion of 462 | liability includes, but is not limited to, any special, incidental, 463 | consequential, punitive, or exemplary damages such as loss of revenue, 464 | data, anticipated profits, and lost business. This exclusion applies 465 | even if the Licensor has been advised of the possibility of such 466 | damages. 467 | 468 | 8.2 If liability may not be excluded by law, it is limited to actual and 469 | direct financial loss to the extent it is caused by proved negligence on 470 | the part of the Licensor. 471 | 472 | ### 9.0 Termination of Your rights under this License 473 | 474 | 9.1 Any breach by You of the terms and conditions of this License 475 | automatically terminates this License with immediate effect and without 476 | notice to You. For the avoidance of doubt, Persons who have received the 477 | Database, the whole or a Substantial part of the Contents, Derivative 478 | Databases, or the Database as part of a Collective Database from You 479 | under this License will not have their licenses terminated provided 480 | their use is in full compliance with this License or a license granted 481 | under Section 4.8 of this License. Sections 1, 2, 7, 8, 9 and 10 will 482 | survive any termination of this License. 483 | 484 | 9.2 If You are not in breach of the terms of this License, the Licensor 485 | will not terminate Your rights under it. 486 | 487 | 9.3 Unless terminated under Section 9.1, this License is granted to You 488 | for the duration of applicable rights in the Database. 489 | 490 | 9.4 Reinstatement of rights. If you cease any breach of the terms and 491 | conditions of this License, then your full rights under this License 492 | will be reinstated: 493 | 494 | a. 
Provisionally and subject to permanent termination until the 60th 495 | day after cessation of breach; 496 | 497 | b. Permanently on the 60th day after cessation of breach unless 498 | otherwise reasonably notified by the Licensor; or 499 | 500 | c. Permanently if reasonably notified by the Licensor of the 501 | violation, this is the first time You have received notice of 502 | violation of this License from the Licensor, and You cure the 503 | violation prior to 30 days after your receipt of the notice. 504 | 505 | Persons subject to permanent termination of rights are not eligible to 506 | be a recipient and receive a license under Section 4.8. 507 | 508 | 9.5 Notwithstanding the above, Licensor reserves the right to release 509 | the Database under different license terms or to stop distributing or 510 | making available the Database. Releasing the Database under different 511 | license terms or stopping the distribution of the Database will not 512 | withdraw this License (or any other license that has been, or is 513 | required to be, granted under the terms of this License), and this 514 | License will continue in full force and effect unless terminated as 515 | stated above. 516 | 517 | ### 10.0 General 518 | 519 | 10.1 If any provision of this License is held to be invalid or 520 | unenforceable, that must not affect the validity or enforceability of 521 | the remainder of the terms and conditions of this License and each 522 | remaining provision of this License shall be valid and enforced to the 523 | fullest extent permitted by law. 524 | 525 | 10.2 This License is the entire agreement between the parties with 526 | respect to the rights granted here over the Database. It replaces any 527 | earlier understandings, agreements or representations with respect to 528 | the Database. 529 | 530 | 10.3 If You are in breach of the terms of this License, You will not be 531 | entitled to rely on the terms of this License or to complain of any 532 | breach by the Licensor. 533 | 534 | 10.4 Choice of law. This License takes effect in and will be governed by 535 | the laws of the relevant jurisdiction in which the License terms are 536 | sought to be enforced. If the standard suite of rights granted under 537 | applicable copyright law and Database Rights in the relevant 538 | jurisdiction includes additional rights not granted under this License, 539 | these additional rights are granted in this License in order to meet the 540 | terms of this License. 541 | 542 | -------------------------------------------------------------------------------- /dump/README.txt: -------------------------------------------------------------------------------- 1 | -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 2 | SimpleGeo International Neighborhoods Dump 3 | -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- 4 | 5 | This archive contains the SimpleGeo International Neighborhoods dataset from 04 6 | August 2011 in various formats. 7 | 8 | See the blog post for more details on the dataset: 9 | 10 | http://blog.simplegeo.com/2011/08/05/its-a-beautiful-day-in-the-neighborhood/ 11 | 12 | ------------- 13 | What's Inside 14 | ------------- 15 | 16 | geojson/ GeoJSON, one per city 17 | kml/ KML, one per city 18 | shp/ A single ESRI Shapefile of the entire dataset 19 | 20 | Each record in the Shapefile also includes the name of the city, the WoE ID of 21 | the city (parent_id), and the WoE feature type (either "Suburb" in the case of 22 | informal neighborhoods, or "LocalAdmin" in the case of formal city divisions). 
23 | 24 | Otherwise, the three formats provided in this archive are functionally 25 | identical. 26 | 27 | ------------------- 28 | Where It Comes From 29 | ------------------- 30 | 31 | SimpleGeo produced this dataset using Open Source software published at 32 | http://github.com/simplegeo/betashapes/. 33 | 34 | This dataset is derived from: 35 | 36 | * Yahoo! GeoPlanet (http://developer.yahoo.com/geo/geoplanet/data/) 37 | * OpenStreetMap (http://openstreetmap.org/) 38 | * the public Flickr API (http://www.flickr.com/services/api/) 39 | 40 | Because the dataset is based on OSM, we make it available to you under the ODC 41 | Open Database License 1.0. Please see the license summary and disclaimer below, 42 | and the full license text as given in LICENSE.txt. 43 | 44 | SimpleGeo makes it easy for developers to build location-aware applications. 45 | Find out more at http://simplegeo.com/! 46 | 47 | --------------------------------- 48 | What You're Welcome to Do With It 49 | --------------------------------- 50 | 51 | You are free: 52 | 53 | To Share: To copy, distribute and use the database. 54 | To Create: To produce works from the database. 55 | To Adapt: To modify, transform and build upon the database. 56 | 57 | As long as you: 58 | 59 | Attribute: You must attribute any public use of the database, or works 60 | produced from the database, in the manner specified in the ODbL. For any 61 | use or redistribution of the database, or works produced from it, you must 62 | make clear to others the license of the database and keep intact any 63 | notices on the original database. 64 | 65 | Share-Alike: If you publicly use any adapted version of this database, or 66 | works produced from an adapted database, you must also offer that adapted 67 | database under the ODbL. 68 | 69 | Keep open: If you redistribute the database, or an adapted version of it, 70 | then you may use technological measures that restrict the work (such as 71 | DRM) as long as you also redistribute a version without such measures. 72 | 73 | Disclaimer: 74 | 75 | The above summary is not the license text. It is simply a handy reference 76 | for understanding the ODbL 1.0 — it is a human-readable expression of some 77 | of its key terms. This summary has no legal value, and its contents do not 78 | appear in the actual license. Read the full ODbL 1.0 license text for the 79 | exact terms that apply. 80 | 81 | THIS DATABASE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 82 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 83 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 84 | DISCLAIMED. IN NO EVENT SHALL SIMPLEGEO, INC. BE LIABLE FOR ANY DIRECT, 85 | INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 86 | BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 87 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 88 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE 89 | OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATABASE, EVEN IF 90 | ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 91 | 92 | The full text of the license is given in the LICENSE.txt file. It can also be found at 93 | http://opendatacommons.org/licenses/odbl/1-0/. 
94 | 95 | =30= 96 | -------------------------------------------------------------------------------- /geocrawlr.py: -------------------------------------------------------------------------------- 1 | import Flickr.API 2 | import json, time, sys, os 3 | 4 | FLICKR_KEY = os.environ["FLICKR_KEY"] 5 | FLICKR_SECRET = os.environ["FLICKR_SECRET"] 6 | 7 | START_PAGE = 1 8 | END_PAGE = 10 9 | 10 | api = Flickr.API.API(FLICKR_KEY, FLICKR_SECRET) 11 | 12 | for woe_id in map(int, sys.argv[1:]): 13 | print >>sys.stderr, "WOEID:", woe_id 14 | page = total_pages = START_PAGE 15 | 16 | while page <= total_pages: 17 | print >>sys.stderr, ">>> Reading %d of %d... " % (page, total_pages), 18 | request = Flickr.API.Request( 19 | method="flickr.photos.search", 20 | format="json", 21 | nojsoncallback=1, 22 | sort="interestingness-desc", 23 | page=page, 24 | woe_id=woe_id, 25 | extras="geo", 26 | min_date_taken="2007-01-01 00:00:00" 27 | ) 28 | start = time.time() 29 | response = None 30 | while response is None: 31 | try: 32 | response = api.execute_request(request).read() 33 | except Exception, e: 34 | print >>sys.stderr, "Retrying due to:", e 35 | try: 36 | result = json.loads(response) 37 | result = result["photos"] 38 | print >>sys.stderr, "%d results, %.1fs elapsed." % (len(result["photo"]),time.time()-start) 39 | for item in result["photo"]: 40 | try: 41 | print "\t".join(str(item[k]) for k in ("id","woeid","longitude","latitude")) 42 | except Exception, e: 43 | print >>sys.stderr, e 44 | total_pages = min(int(result["pages"]), END_PAGE) 45 | #time.sleep(1.0) 46 | except Exception, e: 47 | print >>sys.stderr, e 48 | page += 1 49 | -------------------------------------------------------------------------------- /junk/OsmApi.py: -------------------------------------------------------------------------------- 1 | #-*- coding: utf-8 -*- 2 | 3 | ########################################################################### 4 | ## ## 5 | ## Copyrights Etienne Chové 2009-2010 ## 6 | ## ## 7 | ## This program is free software: you can redistribute it and/or modify ## 8 | ## it under the terms of the GNU General Public License as published by ## 9 | ## the Free Software Foundation, either version 3 of the License, or ## 10 | ## (at your option) any later version. ## 11 | ## ## 12 | ## This program is distributed in the hope that it will be useful, ## 13 | ## but WITHOUT ANY WARRANTY; without even the implied warranty of ## 14 | ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ## 15 | ## GNU General Public License for more details. ## 16 | ## ## 17 | ## You should have received a copy of the GNU General Public License ## 18 | ## along with this program. If not, see . 
## 19 | ## ## 20 | ########################################################################### 21 | 22 | ## HomePage : http://wiki.openstreetmap.org/wiki/PythonOsmApi 23 | 24 | ########################################################################### 25 | ## History ## 26 | ########################################################################### 27 | ## 0.2.19 2010-05-24 Add debug message on ApiError ## 28 | ## 0.2.18 2010-04-20 Fix ChangesetClose and _http_request ## 29 | ## 0.2.17 2010-01-02 Capabilities implementation ## 30 | ## 0.2.16 2010-01-02 ChangesetsGet by Alexander Rampp ## 31 | ## 0.2.15 2009-12-16 xml encoding error for < and > ## 32 | ## 0.2.14 2009-11-20 changesetautomulti parameter ## 33 | ## 0.2.13 2009-11-16 modify instead update for osc ## 34 | ## 0.2.12 2009-11-14 raise ApiError on 4xx errors -- Xoff ## 35 | ## 0.2.11 2009-10-14 unicode error on ChangesetUpload ## 36 | ## 0.2.10 2009-10-14 RelationFullRecur definition ## 37 | ## 0.2.9 2009-10-13 automatic changeset management ## 38 | ## ChangesetUpload implementation ## 39 | ## 0.2.8 2009-10-13 *(Create|Update|Delete) use not unique _do method ## 40 | ## 0.2.7 2009-10-09 implement all missing fonctions except ## 41 | ## ChangesetsGet and GetCapabilities ## 42 | ## 0.2.6 2009-10-09 encoding clean-up ## 43 | ## 0.2.5 2009-10-09 implements NodesGet, WaysGet, RelationsGet ## 44 | ## ParseOsm, ParseOsc ## 45 | ## 0.2.4 2009-10-06 clean-up ## 46 | ## 0.2.3 2009-09-09 keep http connection alive for multiple request ## 47 | ## (Node|Way|Relation)Get return None when object ## 48 | ## have been deleted (raising error before) ## 49 | ## 0.2.2 2009-07-13 can identify applications built on top of the lib ## 50 | ## 0.2.1 2009-05-05 some changes in constructor -- chove@crans.org ## 51 | ## 0.2 2009-05-01 initial import ## 52 | ########################################################################### 53 | 54 | __version__ = '0.2.19' 55 | 56 | import httplib, base64, xml.dom.minidom, time, sys, urllib 57 | 58 | class ApiError(Exception): 59 | 60 | def __init__(self, status, reason, payload): 61 | self.status = status 62 | self.reason = reason 63 | self.payload = payload 64 | 65 | def __str__(self): 66 | return "Request failed: " + str(self.status) + " - " + self.reason + " - " + self.payload 67 | 68 | ########################################################################### 69 | ## Main class ## 70 | 71 | class OsmApi: 72 | 73 | def __init__(self, 74 | username = None, 75 | password = None, 76 | passwordfile = None, 77 | appid = "", 78 | created_by = "PythonOsmApi/"+__version__, 79 | api = "www.openstreetmap.org", 80 | changesetauto = False, 81 | changesetautotags = {}, 82 | changesetautosize = 500, 83 | changesetautomulti = 1, 84 | debug = False 85 | ): 86 | 87 | # debug 88 | self._debug = debug 89 | 90 | # Get username 91 | if username: 92 | self._username = username 93 | elif passwordfile: 94 | self._username = open(passwordfile).readline().split(":")[0].strip() 95 | 96 | # Get password 97 | if password: 98 | self._password = password 99 | elif passwordfile: 100 | for l in open(passwordfile).readlines(): 101 | l = l.strip().split(":") 102 | if l[0] == self._username: 103 | self._password = l[1] 104 | 105 | # Changest informations 106 | self._changesetauto = changesetauto # auto create and close changesets 107 | self._changesetautotags = changesetautotags # tags for automatic created changesets 108 | self._changesetautosize = changesetautosize # change count for auto changeset 109 | self._changesetautosize = 
changesetautosize # change count for auto changeset 110 | self._changesetautomulti = changesetautomulti # close a changeset every # upload 111 | self._changesetautocpt = 0 112 | self._changesetautodata = [] # data to upload for auto group 113 | 114 | # Get API 115 | self._api = api 116 | self._xapi = True if "xapi" in api else False 117 | 118 | # Get created_by 119 | if not appid: 120 | self._created_by = created_by 121 | else: 122 | self._created_by = appid + " (" + created_by + ")" 123 | 124 | # Initialisation 125 | self._CurrentChangesetId = 0 126 | 127 | # Http connection 128 | self._conn = httplib.HTTPConnection(self._api, 80) 129 | 130 | def __del__(self): 131 | if self._changesetauto: 132 | self._changesetautoflush(True) 133 | return None 134 | 135 | ####################################################################### 136 | # Capabilities # 137 | ####################################################################### 138 | 139 | def Capabilities(self): 140 | """ Returns ApiCapabilities. """ 141 | uri = "/api/capabilities" 142 | data = self._get(uri) 143 | data = xml.dom.minidom.parseString(data) 144 | print data.getElementsByTagName("osm") 145 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("api")[0] 146 | result = {} 147 | for elem in data.childNodes: 148 | if elem.nodeType <> elem.ELEMENT_NODE: 149 | continue 150 | result[elem.nodeName] = {} 151 | print elem.nodeName 152 | for k, v in elem.attributes.items(): 153 | try: 154 | result[elem.nodeName][k] = float(v) 155 | except: 156 | result[elem.nodeName][k] = v 157 | return result 158 | 159 | ####################################################################### 160 | # Node # 161 | ####################################################################### 162 | 163 | def NodeGet(self, NodeId, NodeVersion = -1): 164 | """ Returns NodeData for node #NodeId. """ 165 | uri = "/api/0.6/node/"+str(NodeId) 166 | if NodeVersion <> -1: uri += "/"+str(NodeVersion) 167 | data = self._get(uri) 168 | if not data: return data 169 | data = xml.dom.minidom.parseString(data) 170 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("node")[0] 171 | return self._DomParseNode(data) 172 | 173 | def NodeCreate(self, NodeData): 174 | """ Creates a node. Returns updated NodeData (without timestamp). """ 175 | return self._do("create", "node", NodeData) 176 | 177 | def NodeUpdate(self, NodeData): 178 | """ Updates node with NodeData. Returns updated NodeData (without timestamp). """ 179 | return self._do("modify", "node", NodeData) 180 | 181 | def NodeDelete(self, NodeData): 182 | """ Delete node with NodeData. Returns updated NodeData (without timestamp). """ 183 | return self._do("delete", "node", NodeData) 184 | 185 | def NodeHistory(self, NodeId): 186 | """ Returns dict(NodeVerrsion: NodeData). """ 187 | uri = "/api/0.6/node/"+str(NodeId)+"/history" 188 | data = self._get(uri) 189 | data = xml.dom.minidom.parseString(data) 190 | result = {} 191 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("node"): 192 | data = self._DomParseNode(data) 193 | result[data[u"version"]] = data 194 | return result 195 | 196 | def NodeWays(self, NodeId): 197 | """ Returns [WayData, ... ] containing node #NodeId. 
""" 198 | uri = "/api/0.6/node/%d/ways"%NodeId 199 | data = self._get(uri) 200 | data = xml.dom.minidom.parseString(data) 201 | result = [] 202 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("way"): 203 | data = self._DomParseRelation(data) 204 | result.append(data) 205 | return result 206 | 207 | def NodeRelations(self, NodeId): 208 | """ Returns [RelationData, ... ] containing node #NodeId. """ 209 | uri = "/api/0.6/node/%d/relations"%NodeId 210 | data = self._get(uri) 211 | data = xml.dom.minidom.parseString(data) 212 | result = [] 213 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("relation"): 214 | data = self._DomParseRelation(data) 215 | result.append(data) 216 | return result 217 | 218 | def NodesGet(self, NodeIdList): 219 | """ Returns dict(NodeId: NodeData) for each node in NodeIdList """ 220 | uri = "/api/0.6/nodes?nodes=" + ",".join([str(x) for x in NodeIdList]) 221 | data = self._get(uri) 222 | data = xml.dom.minidom.parseString(data) 223 | result = {} 224 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("node"): 225 | data = self._DomParseNode(data) 226 | result[data[u"id"]] = data 227 | return result 228 | 229 | ####################################################################### 230 | # Way # 231 | ####################################################################### 232 | 233 | def WayGet(self, WayId, WayVersion = -1): 234 | """ Returns WayData for way #WayId. """ 235 | uri = "/api/0.6/way/"+str(WayId) 236 | if WayVersion <> -1: uri += "/"+str(WayVersion) 237 | data = self._get(uri) 238 | if not data: return data 239 | data = xml.dom.minidom.parseString(data) 240 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("way")[0] 241 | return self._DomParseWay(data) 242 | 243 | def WayCreate(self, WayData): 244 | """ Creates a way. Returns updated WayData (without timestamp). """ 245 | return self._do("create", "way", WayData) 246 | 247 | def WayUpdate(self, WayData): 248 | """ Updates way with WayData. Returns updated WayData (without timestamp). """ 249 | return self._do("modify", "way", WayData) 250 | 251 | def WayDelete(self, WayData): 252 | """ Delete way with WayData. Returns updated WayData (without timestamp). """ 253 | return self._do("delete", "way", WayData) 254 | 255 | def WayHistory(self, WayId): 256 | """ Returns dict(WayVerrsion: WayData). """ 257 | uri = "/api/0.6/way/"+str(WayId)+"/history" 258 | data = self._get(uri) 259 | data = xml.dom.minidom.parseString(data) 260 | result = {} 261 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("way"): 262 | data = self._DomParseWay(data) 263 | result[data[u"version"]] = data 264 | return result 265 | 266 | def WayRelations(self, WayId): 267 | """ Returns [RelationData, ...] containing way #WayId. """ 268 | uri = "/api/0.6/way/%d/relations"%WayId 269 | data = self._get(uri) 270 | data = xml.dom.minidom.parseString(data) 271 | result = [] 272 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("relation"): 273 | data = self._DomParseRelation(data) 274 | result.append(data) 275 | return result 276 | 277 | def WayFull(self, WayId): 278 | """ Return full data for way WayId as list of {type: node|way|relation, data: {}}. 
""" 279 | uri = "/api/0.6/way/"+str(WayId)+"/full" 280 | data = self._get(uri) 281 | return self.ParseOsm(data) 282 | 283 | def WaysGet(self, WayIdList): 284 | """ Returns dict(WayId: WayData) for each way in WayIdList """ 285 | uri = "/api/0.6/ways?ways=" + ",".join([str(x) for x in WayIdList]) 286 | data = self._get(uri) 287 | data = xml.dom.minidom.parseString(data) 288 | result = {} 289 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("way"): 290 | data = self._DomParseWay(data) 291 | result[data[u"id"]] = data 292 | return result 293 | 294 | ####################################################################### 295 | # Relation # 296 | ####################################################################### 297 | 298 | def RelationGet(self, RelationId, RelationVersion = -1): 299 | """ Returns RelationData for relation #RelationId. """ 300 | uri = "/api/0.6/relation/"+str(RelationId) 301 | if RelationVersion <> -1: uri += "/"+str(RelationVersion) 302 | data = self._get(uri) 303 | if not data: return data 304 | data = xml.dom.minidom.parseString(data) 305 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("relation")[0] 306 | return self._DomParseRelation(data) 307 | 308 | def RelationCreate(self, RelationData): 309 | """ Creates a relation. Returns updated RelationData (without timestamp). """ 310 | return self._do("create", "relation", RelationData) 311 | 312 | def RelationUpdate(self, RelationData): 313 | """ Updates relation with RelationData. Returns updated RelationData (without timestamp). """ 314 | return self._do("modify", "relation", RelationData) 315 | 316 | def RelationDelete(self, RelationData): 317 | """ Delete relation with RelationData. Returns updated RelationData (without timestamp). """ 318 | return self._do("delete", "relation", RelationData) 319 | 320 | def RelationHistory(self, RelationId): 321 | """ Returns dict(RelationVerrsion: RelationData). """ 322 | uri = "/api/0.6/relation/"+str(RelationId)+"/history" 323 | data = self._get(uri) 324 | data = xml.dom.minidom.parseString(data) 325 | result = {} 326 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("relation"): 327 | data = self._DomParseRelation(data) 328 | result[data[u"version"]] = data 329 | return result 330 | 331 | def RelationRelations(self, RelationId): 332 | """ Returns list of RelationData containing relation #RelationId. """ 333 | uri = "/api/0.6/relation/%d/relations"%RelationId 334 | data = self._get(uri) 335 | data = xml.dom.minidom.parseString(data) 336 | result = [] 337 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("relation"): 338 | data = self._DomParseRelation(data) 339 | result.append(data) 340 | return result 341 | 342 | def RelationFullRecur(self, RelationId): 343 | """ Return full data for relation RelationId. Recurisve version relation of relations. """ 344 | data = [] 345 | todo = [RelationId] 346 | done = [] 347 | while todo: 348 | rid = todo.pop(0) 349 | done.append(rid) 350 | temp = self.RelationFull(rid) 351 | for item in temp: 352 | if item["type"] <> "relation": 353 | continue 354 | if item["data"]["id"] in done: 355 | continue 356 | todo.append(item["data"]["id"]) 357 | data += temp 358 | return data 359 | 360 | def RelationFull(self, RelationId): 361 | """ Return full data for relation RelationId as list of {type: node|way|relation, data: {}}. 
""" 362 | uri = "/api/0.6/relation/"+str(RelationId)+"/full" 363 | data = self._get(uri) 364 | return self.ParseOsm(data) 365 | 366 | def RelationsGet(self, RelationIdList): 367 | """ Returns dict(RelationId: RelationData) for each relation in RelationIdList """ 368 | uri = "/api/0.6/relations?relations=" + ",".join([str(x) for x in RelationIdList]) 369 | data = self._get(uri) 370 | data = xml.dom.minidom.parseString(data) 371 | result = {} 372 | for data in data.getElementsByTagName("osm")[0].getElementsByTagName("relation"): 373 | data = self._DomParseRelation(data) 374 | result[data[u"id"]] = data 375 | return result 376 | 377 | ####################################################################### 378 | # Changeset # 379 | ####################################################################### 380 | 381 | def ChangesetGet(self, ChangesetId): 382 | """ Returns ChangesetData for changeset #ChangesetId. """ 383 | data = self._get("/api/0.6/changeset/"+str(ChangesetId)) 384 | data = xml.dom.minidom.parseString(data) 385 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("changeset")[0] 386 | return self._DomParseChangeset(data) 387 | 388 | def ChangesetUpdate(self, ChangesetTags = {}): 389 | """ Updates current changeset with ChangesetTags. """ 390 | if self._CurrentChangesetId == -1: 391 | raise Exception, "No changeset currently opened" 392 | if u"created_by" not in ChangesetTags: 393 | ChangesetTags[u"created_by"] = self._created_by 394 | result = self._put("/api/0.6/changeset/"+str(self._CurrentChangesetId), self._XmlBuild("changeset", {u"tag": ChangesetTags})) 395 | return self._CurrentChangesetId 396 | 397 | def ChangesetCreate(self, ChangesetTags = {}): 398 | """ Opens a changeset. Returns #ChangesetId. """ 399 | if self._CurrentChangesetId: 400 | raise Exception, "Changeset alreadey opened" 401 | if u"created_by" not in ChangesetTags: 402 | ChangesetTags[u"created_by"] = self._created_by 403 | result = self._put("/api/0.6/changeset/create", self._XmlBuild("changeset", {u"tag": ChangesetTags})) 404 | self._CurrentChangesetId = int(result) 405 | return self._CurrentChangesetId 406 | 407 | def ChangesetClose(self): 408 | """ Closes current changeset. Returns #ChangesetId. """ 409 | if not self._CurrentChangesetId: 410 | raise Exception, "No changeset currently opened" 411 | result = self._put("/api/0.6/changeset/"+str(self._CurrentChangesetId)+"/close", u"") 412 | CurrentChangesetId = self._CurrentChangesetId 413 | self._CurrentChangesetId = 0 414 | return CurrentChangesetId 415 | 416 | def ChangesetUpload(self, ChangesData): 417 | """ Upload data. ChangesData is a list of dict {type: node|way|relation, action: create|delete|modify, data: {}}. Returns list with updated ids. 
""" 418 | data = "" 419 | data += u"\n" 420 | data += u"\n" 421 | for change in ChangesData: 422 | data += u"<"+change["action"]+">\n" 423 | change["data"]["changeset"] = self._CurrentChangesetId 424 | data += self._XmlBuild(change["type"], change["data"], False).decode("utf-8") 425 | data += u"\n" 426 | data += u"" 427 | data = self._http("POST", "/api/0.6/changeset/"+str(self._CurrentChangesetId)+"/upload", True, data.encode("utf-8")) 428 | data = xml.dom.minidom.parseString(data) 429 | data = data.getElementsByTagName("diffResult")[0] 430 | data = [x for x in data.childNodes if x.nodeType == x.ELEMENT_NODE] 431 | for i in range(len(ChangesData)): 432 | if ChangesData[i]["action"] == "delete": 433 | ChangesData[i]["data"].pop("version") 434 | else: 435 | ChangesData[i]["data"]["version"] = int(data[i].getAttribute("new_id")) 436 | return ChangesData 437 | 438 | def ChangesetDownload(self, ChangesetId): 439 | """ Download data from a changeset. Returns list of dict {type: node|way|relation, action: create|delete|modify, data: {}}. """ 440 | uri = "/api/0.6/changeset/"+str(ChangesetId)+"/download" 441 | data = self._get(uri) 442 | return self.ParseOsc(data) 443 | 444 | def ChangesetsGet(self, min_lon=None, min_lat=None, max_lon=None, max_lat=None, 445 | userid=None, username=None, 446 | closed_after=None, created_before=None, 447 | only_open=False, only_closed=False): 448 | """ Returns dict(ChangsetId: ChangesetData) matching all criteria. """ 449 | 450 | uri = "/api/0.6/changesets" 451 | params = {} 452 | if min_lon or min_lat or max_lon or max_lat: 453 | params["bbox"] = ",".join([str(min_lon),str(min_lat),str(max_lon),str(max_lat)]) 454 | if userid: 455 | params["user"] = userid 456 | if username: 457 | params["display_name"] = username 458 | if closed_after and not created_before: 459 | params["time"] = closed_after 460 | if created_before: 461 | if not closed_after: 462 | closed_after = "1970-01-01T00:00:00Z" 463 | params["time"] = closed_after + "," + created_before 464 | if only_open: 465 | params["open"] = 1 466 | if only_closed: 467 | params["closed"] = 1 468 | 469 | if params: 470 | uri += "?" + urllib.urlencode(params) 471 | 472 | data = self._get(uri) 473 | data = xml.dom.minidom.parseString(data) 474 | data = data.getElementsByTagName("osm")[0].getElementsByTagName("changeset") 475 | result = {} 476 | for curChangeset in data: 477 | tmpCS = self._DomParseChangeset(curChangeset) 478 | result[tmpCS["id"]] = tmpCS 479 | return result 480 | 481 | ####################################################################### 482 | # Other # 483 | ####################################################################### 484 | 485 | def Map(self, min_lon, min_lat, max_lon, max_lat, **kwargs): 486 | """ Download data in bounding box. Returns list of dict {type: node|way|relation, data: {}}. """ 487 | if False: #self._xapi: 488 | kwargs["bbox"] = "bbox=%f,%f,%f,%f" % (min_lon, min_lat, max_lon, max_lat) 489 | args = ["[%s=%s]" % item for item in kwargs.items()] 490 | uri = "/api/0.6/*" + "".join(args) 491 | else: 492 | uri = "/api/0.6/map?bbox=%f,%f,%f,%f"%(min_lon, min_lat, max_lon, max_lat) 493 | 494 | data = self._get(uri) 495 | return self.ParseOsm(data) 496 | 497 | ####################################################################### 498 | # Data parser # 499 | ####################################################################### 500 | 501 | def ParseOsm(self, data): 502 | """ Parse osm data. Returns list of dict {type: node|way|relation, data: {}}. 
""" 503 | data = xml.dom.minidom.parseString(data) 504 | data = data.getElementsByTagName("osm")[0] 505 | result = [] 506 | for elem in data.childNodes: 507 | if elem.nodeName == u"node": 508 | result.append({u"type": elem.nodeName, u"data": self._DomParseNode(elem)}) 509 | elif elem.nodeName == u"way": 510 | result.append({u"type": elem.nodeName, u"data": self._DomParseWay(elem)}) 511 | elif elem.nodeName == u"relation": 512 | result.append({u"type": elem.nodeName, u"data": self._DomParseRelation(elem)}) 513 | return result 514 | 515 | def ParseOsc(self, data): 516 | """ Parse osc data. Returns list of dict {type: node|way|relation, action: create|delete|modify, data: {}}. """ 517 | data = xml.dom.minidom.parseString(data) 518 | data = data.getElementsByTagName("osmChange")[0] 519 | result = [] 520 | for action in data.childNodes: 521 | if action.nodeName == u"#text": continue 522 | for elem in action.childNodes: 523 | if elem.nodeName == u"node": 524 | result.append({u"action":action.nodeName, u"type": elem.nodeName, u"data": self._DomParseNode(elem)}) 525 | elif elem.nodeName == u"way": 526 | result.append({u"action":action.nodeName, u"type": elem.nodeName, u"data": self._DomParseWay(elem)}) 527 | elif elem.nodeName == u"relation": 528 | result.append({u"action":action.nodeName, u"type": elem.nodeName, u"data": self._DomParseRelation(elem)}) 529 | return result 530 | 531 | ####################################################################### 532 | # Internal http function # 533 | ####################################################################### 534 | 535 | def _do(self, action, OsmType, OsmData): 536 | if self._changesetauto: 537 | self._changesetautodata.append({"action":action, "type":OsmType, "data":OsmData}) 538 | self._changesetautoflush() 539 | return None 540 | else: 541 | return self._do_manu(action, OsmType, OsmData) 542 | 543 | def _do_manu(self, action, OsmType, OsmData): 544 | if not self._CurrentChangesetId: 545 | raise Exception, "You need to open a changeset before uploading data" 546 | if u"timestamp" in OsmData: 547 | OsmData.pop(u"timestamp") 548 | OsmData[u"changeset"] = self._CurrentChangesetId 549 | if action == "create": 550 | if OsmData.get(u"id", -1) > 0: 551 | raise Exception, "This "+OsmType+" already exists" 552 | result = self._put("/api/0.6/"+OsmType+"/create", self._XmlBuild(OsmType, OsmData)) 553 | OsmData[u"id"] = int(result.strip()) 554 | OsmData[u"version"] = 1 555 | return OsmData 556 | elif action == "modify": 557 | result = self._put("/api/0.6/"+OsmType+"/"+str(OsmData[u"id"]), self._XmlBuild(OsmType, OsmData)) 558 | OsmData[u"version"] = int(result.strip()) 559 | return OsmData 560 | elif action =="delete": 561 | result = self._delete("/api/0.6/"+OsmType+"/"+str(OsmData[u"id"]), self._XmlBuild(OsmType, OsmData)) 562 | OsmData[u"version"] = int(result.strip()) 563 | OsmData[u"visible"] = False 564 | return OsmData 565 | 566 | def flush(self): 567 | return self._changesetautoflush(True) 568 | 569 | def _changesetautoflush(self, force = False): 570 | while (len(self._changesetautodata) >= self._changesetautosize) or (force and self._changesetautodata): 571 | if self._changesetautocpt == 0: 572 | self.ChangesetCreate(self._changesetautotags) 573 | self.ChangesetUpload(self._changesetautodata[:self._changesetautosize]) 574 | self._changesetautodata = self._changesetautodata[self._changesetautosize:] 575 | self._changesetautocpt += 1 576 | if self._changesetautocpt == self._changesetautomulti: 577 | self.ChangesetClose() 578 | 
self._changesetautocpt = 0 579 | if self._changesetautocpt and force: 580 | self.ChangesetClose() 581 | self._changesetautocpt = 0 582 | return None 583 | 584 | def _http_request(self, cmd, path, auth, send): 585 | if self._debug: 586 | path2 = path 587 | if len(path2) > 50: 588 | path2 = path2[:50]+"[...]" 589 | print >>sys.stderr, "%s %s %s"%(time.strftime("%Y-%m-%d %H:%M:%S"),cmd,path2) 590 | self._conn.putrequest(cmd, path) 591 | self._conn.putheader('User-Agent', self._created_by) 592 | if auth: 593 | self._conn.putheader('Authorization', 'Basic ' + base64.encodestring(self._username + ':' + self._password).strip()) 594 | if send <> None: 595 | self._conn.putheader('Content-Length', len(send)) 596 | self._conn.endheaders() 597 | if send: 598 | self._conn.send(send) 599 | response = self._conn.getresponse() 600 | if response.status <> 200: 601 | payload = response.read().strip() 602 | if response.status == 410: 603 | return None 604 | raise ApiError(response.status, response.reason, payload) 605 | if self._debug: 606 | print >>sys.stderr, "%s %s %s done"%(time.strftime("%Y-%m-%d %H:%M:%S"),cmd,path2) 607 | return response.read() 608 | 609 | def _http(self, cmd, path, auth, send): 610 | i = 0 611 | while True: 612 | i += 1 613 | try: 614 | return self._http_request(cmd, path, auth, send) 615 | except ApiError, e: 616 | if e.status >= 500: 617 | if i == 5: raise 618 | if i <> 1: time.sleep(5) 619 | self._conn = httplib.HTTPConnection(self._api, 80) 620 | else: raise 621 | except Exception: 622 | if i == 5: raise 623 | if i <> 1: time.sleep(5) 624 | self._conn = httplib.HTTPConnection(self._api, 80) 625 | 626 | def _get(self, path): 627 | return self._http('GET', path, False, None) 628 | 629 | def _put(self, path, data): 630 | return self._http('PUT', path, True, data) 631 | 632 | def _delete(self, path, data): 633 | return self._http('DELETE', path, True, data) 634 | 635 | ####################################################################### 636 | # Internal dom function # 637 | ####################################################################### 638 | 639 | def _DomGetAttributes(self, DomElement): 640 | """ Returns a formated dictionnary of attributes of a DomElement. """ 641 | result = {} 642 | for k, v in DomElement.attributes.items(): 643 | if k == u"uid" : v = int(v) 644 | elif k == u"changeset" : v = int(v) 645 | elif k == u"version" : v = int(v) 646 | elif k == u"id" : v = int(v) 647 | elif k == u"lat" : v = float(v) 648 | elif k == u"lon" : v = float(v) 649 | elif k == u"open" : v = v=="true" 650 | elif k == u"visible" : v = v=="true" 651 | elif k == u"ref" : v = int(v) 652 | result[k] = v 653 | return result 654 | 655 | def _DomGetTag(self, DomElement): 656 | """ Returns the dictionnary of tags of a DomElement. """ 657 | result = {} 658 | for t in DomElement.getElementsByTagName("tag"): 659 | k = t.attributes["k"].value 660 | v = t.attributes["v"].value 661 | result[k] = v 662 | return result 663 | 664 | def _DomGetNd(self, DomElement): 665 | """ Returns the list of nodes of a DomElement. """ 666 | result = [] 667 | for t in DomElement.getElementsByTagName("nd"): 668 | result.append(int(int(t.attributes["ref"].value))) 669 | return result 670 | 671 | def _DomGetMember(self, DomElement): 672 | """ Returns a list of relation members. """ 673 | result = [] 674 | for m in DomElement.getElementsByTagName("member"): 675 | result.append(self._DomGetAttributes(m)) 676 | return result 677 | 678 | def _DomParseNode(self, DomElement): 679 | """ Returns NodeData for the node. 
""" 680 | result = self._DomGetAttributes(DomElement) 681 | result[u"tag"] = self._DomGetTag(DomElement) 682 | return result 683 | 684 | def _DomParseWay(self, DomElement): 685 | """ Returns WayData for the way. """ 686 | result = self._DomGetAttributes(DomElement) 687 | result[u"tag"] = self._DomGetTag(DomElement) 688 | result[u"nd"] = self._DomGetNd(DomElement) 689 | return result 690 | 691 | def _DomParseRelation(self, DomElement): 692 | """ Returns RelationData for the relation. """ 693 | result = self._DomGetAttributes(DomElement) 694 | result[u"tag"] = self._DomGetTag(DomElement) 695 | result[u"member"] = self._DomGetMember(DomElement) 696 | return result 697 | 698 | def _DomParseChangeset(self, DomElement): 699 | """ Returns ChangesetData for the changeset. """ 700 | result = self._DomGetAttributes(DomElement) 701 | result[u"tag"] = self._DomGetTag(DomElement) 702 | return result 703 | 704 | ####################################################################### 705 | # Internal xml builder # 706 | ####################################################################### 707 | 708 | def _XmlBuild(self, ElementType, ElementData, WithHeaders = True): 709 | 710 | xml = u"" 711 | if WithHeaders: 712 | xml += u"\n" 713 | xml += u"\n" 714 | 715 | # 716 | xml += u" <" + ElementType 717 | if u"id" in ElementData: 718 | xml += u" id=\"" + str(ElementData[u"id"]) + u"\"" 719 | if u"lat" in ElementData: 720 | xml += u" lat=\"" + str(ElementData[u"lat"]) + u"\"" 721 | if u"lon" in ElementData: 722 | xml += u" lon=\"" + str(ElementData[u"lon"]) + u"\"" 723 | if u"version" in ElementData: 724 | xml += u" version=\"" + str(ElementData[u"version"]) + u"\"" 725 | xml += u" visible=\"" + str(ElementData.get(u"visible", True)).lower() + u"\"" 726 | if ElementType in [u"node", u"way", u"relation"]: 727 | xml += u" changeset=\"" + str(self._CurrentChangesetId) + u"\"" 728 | xml += u">\n" 729 | 730 | # 731 | for k, v in ElementData.get(u"tag", {}).items(): 732 | xml += u" \n" 733 | 734 | # 735 | for member in ElementData.get(u"member", []): 736 | xml += u" \n" 737 | 738 | # 739 | for ref in ElementData.get(u"nd", []): 740 | xml += u" \n" 741 | 742 | # 743 | xml += u" \n" 744 | 745 | if WithHeaders: 746 | xml += u"\n" 747 | 748 | return xml.encode("utf8") 749 | 750 | def _XmlEncode(self, text): 751 | return text.replace("&", "&").replace("\"", """).replace("<","<").replace(">",">") 752 | 753 | ## End of main class ## 754 | ########################################################################### 755 | -------------------------------------------------------------------------------- /junk/README: -------------------------------------------------------------------------------- 1 | This directory contains the detritus of our efforts. Fortunately, bits are cheap! 
2 | -------------------------------------------------------------------------------- /junk/fetch_osm.py: -------------------------------------------------------------------------------- 1 | from OsmApi import OsmApi 2 | from outliers import load_points, discard_outliers, get_bbox_for_points 3 | import sys, geojson, time 4 | 5 | DEFAULT_TAGS = ("highway", "waterway") 6 | 7 | osm = OsmApi() #api="http://open.mapquestapi.com/xapi" 8 | 9 | def get_osm_ways(bbox, wanted_tags=DEFAULT_TAGS): 10 | nodes = {} 11 | left, bottom, right, top = bbox 12 | step = .025 13 | scale = 100000.0 14 | count = 0 15 | iterations = int(((right-left)/step+1)*((top-bottom)/step+1)) 16 | for x in range(int(left*scale), int(right*scale), int(step*scale)): 17 | for y in range(int(bottom*scale), int(top*scale), int(step*scale)): 18 | count += 1 19 | ways = {} 20 | request = (x/scale, y/scale, min(x/scale+step,right), min(y/scale+step, top)) 21 | start= time.time() 22 | print >>sys.stderr, "\rRequesting %.4f,%.4f,%.4f,%.4f from OSM (%d of %d)..." % (request+(count, iterations)), 23 | for item in osm.Map(*request): 24 | data = item["data"] 25 | if item["type"] == "node": 26 | nodes[int(data["id"])] = map(float, (data["lon"],data["lat"])) 27 | elif item["type"] == "way" and any(t for t in wanted_tags if t in data["tag"]): 28 | ways[int(data["id"])] = ( dict((k, data["tag"].get(k, "")) for k in wanted_tags), data["nd"] ) 29 | print >>sys.stderr, "%d nodes, %d ways found (%.2f elapsed)" % (len(nodes),len(ways),time.time()-start) 30 | for way_id, (tags, node_ids) in ways.items(): 31 | feature = geojson.Feature() 32 | feature.geometry = geojson.LineString(coordinates=(nodes[ref] for ref in node_ids if ref in nodes)) 33 | feature.properties = tags 34 | feature.id = way_id 35 | yield feature 36 | time.sleep(0.5) 37 | 38 | def main(points_file): 39 | places = load_points(points_file) 40 | #random_place = dict([places.popitem()]) 41 | random_place = discard_outliers(places) 42 | bbox = get_bbox_for_points(places) 43 | for obj in get_osm_ways(bbox): 44 | print obj.to_dict() 45 | 46 | if __name__ == "__main__": 47 | main(sys.argv[1]) 48 | 49 | -------------------------------------------------------------------------------- /junk/pull_photos.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | import sys 3 | import csv 4 | 5 | #first arg: input file, csv. column woe_id should be the list of woe_ids we want to pull out of photos.txt 6 | #second arg: output file, txt subset of photos.txt (also remove photoid. 
samplr not expecting it) 7 | 8 | def main(): 9 | infile = sys.argv[1] 10 | 11 | outfile = sys.argv[2] 12 | 13 | photofile = "photos.txt" 14 | 15 | woes = [] 16 | ireader = csv.DictReader(open(infile, 'r')) 17 | for line in ireader: 18 | woes.append(line['woe_id']) 19 | 20 | 21 | pfh = open(photofile, 'r') 22 | ofh = open(outfile, 'w') 23 | 24 | outstr = "%s\t%s\t%s\n" 25 | 26 | for row in pfh: 27 | photoid, placeid, lon, lat = row.strip().split() 28 | if placeid in woes: 29 | out = outstr % (placeid, lon, lat) 30 | ofh.write(out) 31 | 32 | if __name__ == "__main__": 33 | sys.exit(main()) 34 | 35 | -------------------------------------------------------------------------------- /junk/samplr.py: -------------------------------------------------------------------------------- 1 | from shapely.geometry import Point, MultiPoint, Polygon, MultiPolygon 2 | from shapely.ops import cascaded_union, polygonize 3 | from shapely.prepared import prep 4 | from rtree import Rtree 5 | import sys, random, json, numpy, math, pickle, os 6 | 7 | SAMPLE_ITERATIONS = 200 8 | SAMPLE_SIZE = 5 9 | MEDIAN_THRESHOLD = 5.0 10 | 11 | median_distance_cache = {} 12 | def median_distances(pts, aggregate=numpy.median): 13 | key = tuple(sorted(pts)) 14 | if key in median_distance_cache: return median_distance_cache[key] 15 | median = (numpy.median([pt[0] for pt in pts]), 16 | numpy.median([pt[1] for pt in pts])) 17 | distances = [] 18 | for pt in pts: 19 | dist = math.sqrt(((median[0]-pt[0])*math.cos(median[1]*math.pi/180.0))**2+(median[1]-pt[1])**2) 20 | distances.append((dist, pt)) 21 | 22 | median_dist = aggregate([dist for dist, pt in distances]) 23 | median_distance_cache[key] = (median_dist, distances) 24 | return (median_dist, distances) 25 | 26 | def mean_distances(pts): 27 | return median_distances(pts, numpy.mean) 28 | 29 | name_file, point_file = sys.argv[1:3] 30 | 31 | places = {} 32 | names = {} 33 | if os.path.exists(point_file + '.cache'): 34 | print >>sys.stderr, "Reading from %s cache..." % point_file 35 | names, places = pickle.load(file(point_file + ".cache")) 36 | else: 37 | all_names = {} 38 | count = 0 39 | for line in file(name_file): 40 | place_id, name = line.strip().split(None, 1) 41 | all_names[int(place_id)] = name 42 | count += 1 43 | if count % 1000 == 0: 44 | print >>sys.stderr, "\rRead %d names from %s." % (count, name_file), 45 | print >>sys.stderr, "\rRead %d names from %s." % (count, name_file) 46 | 47 | count = 0 48 | for line in file(point_file): 49 | place_id, lon, lat = line.strip().split() 50 | place_id = int(place_id) 51 | names[place_id] = all_names.get(place_id, "") 52 | point = (float(lon), float(lat)) 53 | pts = places.setdefault(place_id, set()) 54 | pts.add(point) 55 | count += 1 56 | if count % 1000 == 0: 57 | print >>sys.stderr, "\rRead %d points in %d places." % (count, len(places)), 58 | print >>sys.stderr, "\rRead %d points in %d places." % (count, len(places)) 59 | 60 | count = 0 61 | discarded = 0 62 | for place_id, pts in places.items(): 63 | count += 1 64 | print >>sys.stderr, "\rComputing outliers for %d of %d places..." % (count, len(places)), 65 | median_dist, distances = median_distances(pts) 66 | keep = [pt for dist, pt in distances if dist < median_dist * MEDIAN_THRESHOLD] 67 | discarded += len(pts) - len(keep) 68 | places[place_id] = keep 69 | 70 | print >>sys.stderr, "%d points discarded." % discarded 71 | 72 | if not os.path.exists(point_file + '.cache'): 73 | print >>sys.stderr, "Caching points..." 
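    # Persist the parsed names and the outlier-filtered points next to the
    # input file, so later runs can load them from the ".cache" pickle
    # instead of re-reading the raw point file.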
74 | pickle.dump((names, places), file(point_file + ".cache", "w"), -1) 75 | 76 | print >>sys.stderr, "Indexing..." 77 | points = [] 78 | place_list = set() 79 | for place_id, pts in places.items(): 80 | for pt in pts: 81 | place_list.add((len(points), pt+pt, None)) 82 | points.append((place_id, Point(pt))) 83 | index = Rtree(place_list) 84 | 85 | """ 86 | 87 | REASSIGNMENT_PASSES = 10 88 | iterations = 0 89 | count = 0 90 | queue = places.keys() + [None] 91 | while len(queue) > 1: 92 | place_id = queue.pop(0) 93 | if place_id is None: 94 | count = 0 95 | iterations += 1 96 | queue.append(None) 97 | place_id = queue.pop(0) 98 | if not places[place_id]: 99 | del places[place_id] 100 | continue 101 | pts = places[place_id] 102 | count += 1 103 | print >>sys.stderr, "\rIteration #%d of reassignment: %d of %d places..." % (iterations, count, len(queue)), 104 | if iterations > len(pts) / 10.0: continue 105 | old_source_mean, distances = mean_distances(pts) 106 | _, outlier = max(distances) 107 | best = (None, 0.0) 108 | print >>sys.stderr, "" 109 | for nearest in index.nearest(outlier.bounds, 3, objects=True): 110 | #print >>sys.stderr, " -> %s (%d) versus %s (%d)" % (outlier, place_id, Point(nearest.bbox[0:2]), nearest.id) 111 | if nearest.id == place_id: continue 112 | old_target_mean, _ = mean_distances(places[nearest.id]) 113 | source = list(pts) 114 | source.remove(outlier) 115 | target = list(places[nearest.id]) + [outlier] 116 | #print >>sys.stderr, " source: new=%d items, old=%d items" % (len(source), len(pts)) 117 | new_source_mean, _ = mean_distances(source) 118 | new_target_mean, _ = mean_distances(target) 119 | print >>sys.stderr, " source mean: new=%.6f, old=%.6f" % (old_source_mean, new_source_mean) 120 | print >>sys.stderr, " target mean: new=%.6f, old=%.6f" % (old_target_mean, new_target_mean) 121 | if new_source_mean < old_source_mean and \ 122 | new_target_mean < old_target_mean: 123 | improvement = (old_source_mean - new_source_mean) \ 124 | + (old_target_mean - new_target_mean) 125 | if improvement > best[1]: 126 | best = (nearest.id, improvement) 127 | if best[1] > 0: 128 | pts.remove(outlier) 129 | places[best[0]].append(outlier) 130 | queue.append(place_id) 131 | print >>sys.stderr, "%s moved from %d to %d." % (outlier, place_id, best[0]) 132 | 133 | print >>sys.stderr, "Done." 134 | 135 | """ 136 | 137 | sample_hulls = {} 138 | count = 0 139 | for place_id, pts in places.items(): 140 | hulls = [] 141 | if len(pts) < 3: 142 | print >>sys.stderr, "\n ... discarding place #%d" % place_id 143 | continue 144 | for i in range(min(pts,SAMPLE_ITERATIONS)): 145 | multipoint = MultiPoint(random.sample(pts, min(SAMPLE_SIZE, len(pts)))) 146 | hull = multipoint.convex_hull 147 | if isinstance(hull, Polygon) and not hull.is_empty: hulls.append(hull) 148 | try: 149 | sample_hulls[place_id] = cascaded_union(hulls) 150 | except: 151 | print >>sys.stderr, hulls 152 | sys.exit() 153 | if hasattr(sample_hulls[place_id], "geoms"): 154 | sample_hulls[place_id] = cascaded_union([hull for hull in sample_hulls[place_id] if type(hull) is Polygon]) 155 | count += SAMPLE_ITERATIONS 156 | print >>sys.stderr, "\rComputing %d of %d hulls..." % (count, (len(places) * SAMPLE_ITERATIONS)), 157 | 158 | print >>sys.stderr, "\nCombining hull boundaries..." 159 | boundaries = cascaded_union([hull.boundary for hull in sample_hulls.values()]) 160 | 161 | print >>sys.stderr, "Polygonizing %d boundaries..." 
% len(boundaries) 162 | rings = list(polygonize(boundaries)) 163 | 164 | for i, ring in enumerate(rings): 165 | print >>sys.stderr, "\rBuffering %d of %d polygons..." % (i, len(rings)), 166 | size = math.sqrt(ring.area)*0.1 167 | rings[i] = ring.buffer(size) 168 | print >>sys.stderr, "Done." 169 | 170 | polygons = {} 171 | count = 0 172 | for polygon in rings: 173 | if polygon.is_empty: continue 174 | place_count = dict((place_id, 0) for place_id in places) 175 | prepared = prep(polygon) 176 | for item in index.intersection(polygon.bounds): 177 | place_id, point = points[item] 178 | if prepared.intersects(point): 179 | place_count[place_id] += 1 180 | pt_count, place_id = max((c, i) for (i, c) in place_count.items()) 181 | polys = polygons.setdefault(place_id, []) 182 | polys.append(polygon) 183 | count += 1 184 | print >>sys.stderr, "\rAssigning %d of %d polygons..." % (count,len(rings)), 185 | print >>sys.stderr, "Done." 186 | 187 | count = 0 188 | for place_id, polys in polygons.items(): 189 | polygons[place_id] = cascaded_union(polys) 190 | count += 1 191 | print >>sys.stderr, "\rUnifying %d of %d polygons..." % (count,len(polygons)), 192 | print >>sys.stderr, "Done." 193 | 194 | count = 0 195 | orphans = [] 196 | for place_id, multipolygon in polygons.items(): 197 | count += 1 198 | print >>sys.stderr, "\rRemoving %d orphans from %d of %d polygons..." % (len(orphans), count, len(polygons)), 199 | if type(multipolygon) is not MultiPolygon: continue 200 | polygon_count = [0] * len(multipolygon) 201 | for i, polygon in enumerate(multipolygon.geoms): 202 | prepared = prep(polygon) 203 | for item in index.intersection(polygon.bounds): 204 | item_id, point = points[item] 205 | if item_id == place_id and prepared.intersects(point): 206 | polygon_count[i] += 1 207 | winner = max((c, i) for (i, c) in enumerate(polygon_count))[1] 208 | polygons[place_id] = multipolygon.geoms[winner] 209 | orphans.extend(p for i, p in enumerate(multipolygon.geoms) if i != winner) 210 | print >>sys.stderr, "Done." 211 | 212 | orphans = [] 213 | count = 0 214 | changed = True 215 | while changed and orphans: 216 | orphan = orphans.pop(0) 217 | changed = False 218 | count += 1 219 | print >>sys.stderr, "\rReassigning %d of %d orphans..." % (count, len(orphans)), 220 | place_count = dict((place_id, 0) for place_id in places) 221 | total_count = 0.0 222 | prepared = prep(orphan) 223 | for item in index.intersection(orphan.bounds): 224 | item_id, point = points[item] 225 | if prepared.intersects(point): 226 | place_count[item_id] += 1 227 | total_count += 1 228 | for place_id, ct in place_count.items(): 229 | if total_count > 0 and float(ct)/total_count > 1/3.0: 230 | polygons[place_id] = polygons[place_id].union(orphan) 231 | changed = True 232 | if not changed: 233 | orphans.append(orphan) 234 | 235 | print >>sys.stderr, "Done." 236 | 237 | print >>sys.stderr, "\nWriting output." 
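# Assemble the surviving polygons into a GeoJSON FeatureCollection, one
# Feature per place with its woe_id and name as properties, and print the
# result to stdout.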
238 | features = [] 239 | for place_id, poly in polygons.items(): 240 | features.append({ 241 | "type": "Feature", 242 | "id": place_id, 243 | "geometry": poly.__geo_interface__, 244 | "properties": {"woe_id": place_id, "name": names.get(place_id, "")} 245 | }) 246 | 247 | collection = { 248 | "type": "FeatureCollection", 249 | "features": features 250 | } 251 | 252 | print json.dumps(collection) 253 | 254 | -------------------------------------------------------------------------------- /junk/up_one_level.py: -------------------------------------------------------------------------------- 1 | from shapely.geometry import Polygon, MultiPolygon, shape 2 | from shapely.ops import cascaded_union 3 | import sys, json 4 | import psycopg2 5 | import psycopg2.extras 6 | 7 | #read in a GeoJSON featurecollection 8 | #translate those geoms into shapely land 9 | #lookup the parent woe ids for each geom 10 | #cascaded union the geoms to make geoms for the parents. 11 | #output new GeoJSON featurecollection 12 | 13 | town, townwoeid = sys.argv[1:3] 14 | 15 | json_file = "data/%s.json" % town 16 | 17 | infh = open(json_file, 'r') 18 | injson = json.loads(infh.next()) 19 | nbhds = {} 20 | 21 | print >>sys.stderr, "Reading in nbhds." 22 | 23 | for feature in injson['features']: 24 | nbhd = shape(feature['geometry']) 25 | nbhds[feature['id']] = nbhd 26 | 27 | print >>sys.stderr, "Looking up parents." 28 | family = {} 29 | 30 | pquery = """select parent_id 31 | from woe_places 32 | where woe_id = %s""" 33 | 34 | iquery = """select name 35 | from woe_places 36 | where woe_id = %s""" 37 | 38 | conn_string = "dbname='hood'" 39 | conn = psycopg2.connect(conn_string) 40 | cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) 41 | 42 | for woeid, nbhd in nbhds.items(): 43 | pq = pquery % woeid 44 | cursor.execute(pq) 45 | rs = cursor.fetchone() 46 | parent = rs["parent_id"] 47 | if parent == townwoeid: 48 | print >>sys.stderr, "Nbhd %s has the town for a parent" % woeid 49 | continue 50 | if parent not in family: 51 | iq = iquery % parent 52 | cursor.execute(iq) 53 | rs = cursor.fetchone() 54 | family[parent] = {} 55 | family[parent]["name"] = rs["name"] 56 | family[parent]["children"] = [woeid] 57 | else: 58 | family[parent]["children"].append(woeid) 59 | 60 | print >>sys.stderr, "Merging %s stems" % len(family.keys()) 61 | for parent in family.keys(): 62 | family[parent]['geom'] = cascaded_union([nbhds[child] for child in family[parent]['children']]) 63 | 64 | 65 | print >>sys.stderr, "Buffering stems." 66 | for parent, feature in family.items(): 67 | polygon = feature['geom'] 68 | #print >>sys.stderr, "\r%s has shape of type %s" %(place_id, type(polygon)) 69 | if type(polygon) is Polygon: 70 | polygon = Polygon(polygon.exterior.coords) 71 | else: 72 | polygon = MultiPolygon([Polygon(p.exterior.coords)for p in polygon.geoms]) 73 | family[parent]['geom'] = polygon.buffer(0) 74 | 75 | print >>sys.stderr, "Writing output." 
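# Build the output FeatureCollection: one Feature per parent WoE ID, carrying
# the merged, buffered geometry of its child neighborhoods and the parent name
# looked up from the woe_places table.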
76 | features = [] 77 | for place_id, feature in family.items(): 78 | features.append({ 79 | "type": "Feature", 80 | "id": place_id, 81 | "geometry": feature['geom'].__geo_interface__, 82 | "properties": {"woe_id": place_id, "name": feature['name']} 83 | }) 84 | 85 | collection = { 86 | "type": "FeatureCollection", 87 | "features": features 88 | } 89 | 90 | print json.dumps(collection) 91 | 92 | -------------------------------------------------------------------------------- /leaves_from_woeid.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | import sys 3 | import psycopg2 4 | import psycopg2.extras 5 | import csv 6 | import copy 7 | #take in a woe_id 8 | #find all the children of that woe_id that are local_admins or suburbs 9 | #for each of the children that are local admins, get all their children that are local admins or suburbs 10 | #repeat until have list of descendents that have no children 11 | #print list as name, woe_id 12 | leaftypes = ('LocalAdmin',"Suburb") 13 | 14 | #owriter = csv.writer(sys.stdout) 15 | #owriter.writerow(["parent_id","name","type","woe_id"]) 16 | 17 | def main(): 18 | for woeid in sys.argv[1:]: 19 | print >>sys.stderr, woeid, 20 | 21 | childq = """select * from woe_places 22 | where parent_id = %s 23 | and placetype in ('County','LocalAdmin','Suburb')""" 24 | 25 | conn_string = "dbname='hood'" 26 | # get a connection, if a connect cannot be made an exception will be raised here 27 | conn = psycopg2.connect(conn_string) 28 | cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) 29 | 30 | search = set([woeid]) 31 | leaves = set() 32 | names = {} 33 | types = {} 34 | while len(search) > 0: 35 | print >>sys.stderr, ".", 36 | curr_search = copy.copy(search) 37 | for woe in curr_search: 38 | search.remove(woe) 39 | qry = childq % woe 40 | cursor.execute(qry) 41 | if cursor.rowcount == 0: 42 | if woe not in types: 43 | break 44 | if types[woe] in leaftypes: 45 | leaves.add((woeid,names[woe],types[woe],woe)) 46 | for line in cursor: 47 | names[line['woe_id']] = line['name'] 48 | types[line['woe_id']] = line['placetype'] 49 | search.add(line['woe_id']) 50 | 51 | conn.close() 52 | print >>sys.stderr, "" 53 | 54 | for leaf in leaves: 55 | #owriter.writerow(leaf) 56 | print "\t".join(map(str,leaf)) 57 | 58 | if __name__ == "__main__": 59 | sys.exit(main()) 60 | 61 | -------------------------------------------------------------------------------- /mapnik_render.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from mapnik import * 4 | import sys, random 5 | 6 | width, height = 2048, 2048 7 | rgbs = ["80", "a2", "ab"] 8 | base = "data/results" 9 | city = sys.argv[1] 10 | 11 | woe_id = None 12 | # intl_cities.txt is a tab-separated file mapping woe_id -> name 13 | for line in file(base+"/intl_cities.txt"): 14 | woe_id, name = line.strip().split(None,1) 15 | if city == name: break 16 | if woe_id is None: raise Exception("Couldn't find the city '%s'" % city) 17 | 18 | m = Map(width, height, "+proj=latlong +datum=WGS84") 19 | m.background = Color('white') 20 | 21 | if city == "Tokyo": 22 | register_fonts("/usr/share/fonts/truetype/takao") 23 | font = "TakaoMincho Regular" 24 | else: 25 | font = "DejaVu Sans Bold" 26 | 27 | def append_style(name, *symbols): 28 | s = Style() 29 | r = Rule() 30 | for symbol in symbols: 31 | r.symbols.append(symbol) 32 | s.rules.append(r) 33 | m.append_style(name,s) 34 | 35 | random.shuffle(rgbs) 36 | fill 
= Color('#%s%s%s' % tuple(rgbs)) 37 | hood = Layer('hood', "+proj=latlong +datum=WGS84") 38 | hood.datasource = Ogr(base=base,file=city+".json",layer="OGRGeoJSON") 39 | append_style("hood", PolygonSymbolizer(fill)) 40 | hood.styles.append("hood") 41 | m.layers.append(hood) 42 | 43 | blocks = Layer('blocks',"+proj=latlong +datum=WGS84") 44 | blocks.datasource = Ogr(base=base,file="blocks_"+woe_id+".json",layer='OGRGeoJSON') 45 | append_style('blocks', LineSymbolizer(Color('rgb(50%,50%,50%)'),1.0)) 46 | blocks.styles.append('blocks') 47 | m.layers.append(blocks) 48 | 49 | bounds = Layer('bounds', "+proj=latlong +datum=WGS84") 50 | bounds.datasource = Ogr(base=base,file=city+".json",layer="OGRGeoJSON") 51 | append_style("bounds", LineSymbolizer(Color('#222222'), 2.0)) 52 | text = TextSymbolizer("name", font, 12, Color("black")) 53 | text.allow_overlap = False 54 | text.avoid_edges = True 55 | text.wrap_width = 15 56 | halo_fill = [min(x+32, 255) for x in (fill.r, fill.g, fill.b)] 57 | text.halo_fill = Color(*halo_fill) 58 | text.halo_radius = 1 59 | append_style("bounds_label", text) 60 | # 61 | bounds.styles.append("bounds") 62 | bounds.styles.append("bounds_label") 63 | m.layers.append(bounds) 64 | 65 | m.zoom_to_box(hood.envelope()) 66 | render_to_file(m,city+'.png', 'png') 67 | -------------------------------------------------------------------------------- /outliers.py: -------------------------------------------------------------------------------- 1 | import numpy 2 | import sys 3 | import math 4 | 5 | MEDIAN_THRESHOLD = 5.0 6 | 7 | median_distance_cache = {} 8 | def median_distances(pts, aggregate=numpy.median): 9 | key = tuple(sorted(pts)) 10 | if key in median_distance_cache: return median_distance_cache[key] 11 | median = (numpy.median([pt[0] for pt in pts]), 12 | numpy.median([pt[1] for pt in pts])) 13 | distances = [] 14 | for pt in pts: 15 | dist = math.sqrt(((median[0]-pt[0])*math.cos(median[1]*math.pi/180.0))**2+(median[1]-pt[1])**2) 16 | distances.append((dist, pt)) 17 | 18 | median_dist = aggregate([dist for dist, pt in distances]) 19 | median_distance_cache[key] = (median_dist, distances) 20 | return (median_dist, distances) 21 | 22 | def mean_distances(pts): 23 | return median_distances(pts, numpy.mean) 24 | 25 | def load_points(point_file): 26 | places = {} 27 | count = 0 28 | for line in file(point_file): 29 | data = line.strip().split() 30 | place_id, lon, lat = data if len(data) == 3 else data[1:] 31 | place_id = int(place_id) 32 | point = (float(lon), float(lat)) 33 | pts = places.setdefault(place_id, set()) 34 | pts.add(point) 35 | count += 1 36 | if count % 1000 == 0: 37 | print >>sys.stderr, "\rRead %d points in %d places." % (count, len(places)), 38 | print >>sys.stderr, "\rRead %d points in %d places." % (count, len(places)) 39 | return places 40 | 41 | def discard_outliers(places, threshold=MEDIAN_THRESHOLD): 42 | count = 0 43 | discarded = 0 44 | result = {} 45 | for place_id, pts in places.items(): 46 | count += 1 47 | print >>sys.stderr, "\rComputing outliers for %d of %d places..." % (count, len(places)), 48 | median_dist, distances = median_distances(pts) 49 | keep = [pt for dist, pt in distances if dist < median_dist * threshold] 50 | discarded += len(pts) - len(keep) 51 | result[place_id] = keep 52 | print >>sys.stderr, "%d points discarded." 
% discarded 53 | return result 54 | 55 | def get_bbox_for_points(places): 56 | bbox = [180, 90, -180, -90] 57 | for pid, pts in places.items(): 58 | for pt in pts: 59 | for i in range(4): 60 | bbox[i] = min(bbox[i], pt[i%2]) if i<2 else max(bbox[i], pt[i%2]) 61 | return bbox 62 | 63 | def main(filename): 64 | places = load_points(filename) 65 | places = discard_outliers(places) 66 | bbox = get_bbox_for_points(places) 67 | #print ",".join(map(str, bbox)) 68 | print "%s %s, %s %s" % (bbox[0], bbox[1], bbox[2], bbox[3]) 69 | 70 | if __name__ == "__main__": 71 | main(sys.argv[1]) 72 | 73 | -------------------------------------------------------------------------------- /util/consolidate_geojson.py: -------------------------------------------------------------------------------- 1 | import json, sys, os.path 2 | 3 | woe = {} 4 | for line in file(sys.argv[1]): 5 | woe_id, name = line.strip().split(None,1) 6 | name = name.split("_")[0] 7 | woe[name] = int(woe_id) 8 | 9 | features = [] 10 | for fname in sys.argv[2:]: 11 | print >>sys.stderr, "-", fname 12 | name = os.path.basename(fname).split(".")[0].split("_")[0] 13 | collection = json.load(file(fname)) 14 | for record in collection["features"]: 15 | record["properties"]["city"] = name 16 | record["properties"]["parent_id"] = woe[name] 17 | record["properties"]["woe_type"] = "Suburb" if "_" not in fname else "LocalAdmin" 18 | features.append(record) 19 | 20 | json.dump( 21 | { "type": "FeatureCollection", "features": features }, 22 | sys.stdout) 23 | 24 | -------------------------------------------------------------------------------- /util/geoplanet.py: -------------------------------------------------------------------------------- 1 | 2 | import urllib, json, sys 3 | 4 | APPID = os.environ["YAHOO_APPID"] 5 | url = 'http://where.yahooapis.com/v1/places.q(%s)?select=long&format=json&appid=' 6 | 7 | for line in sys.stdin: 8 | query = url % line.strip() 9 | result = urllib.urlopen(query).read() 10 | result = json.loads(result) 11 | place = result['places']['place'][0] 12 | print place['woeid'], "\t", place["name"] 13 | -------------------------------------------------------------------------------- /util/upload_photos.py: -------------------------------------------------------------------------------- 1 | import Flickr.API 2 | import os, os.path, json, sys 3 | import xml.etree.ElementTree 4 | 5 | key, secret = os.environ["FLICKR_KEY"], os.environ["FLICKR_SECRET"] 6 | 7 | # flickr.test.echo: 8 | api = Flickr.API.API(key, secret) 9 | token = None 10 | 11 | # flickr.auth.getFrob: 12 | frob_request = Flickr.API.Request(method='flickr.auth.getFrob') 13 | frob_rsp = api.execute_request(frob_request) 14 | if frob_rsp.code == 200: 15 | frob_rsp_et = xml.etree.ElementTree.parse(frob_rsp) 16 | if frob_rsp_et.getroot().get('stat') == 'ok': 17 | frob = frob_rsp_et.findtext('frob') 18 | 19 | # get the desktop authentication url 20 | auth_url = api.get_authurl('write', frob=frob) 21 | 22 | # ask the user to authorize your app now using that url 23 | print "auth me: %s" % (auth_url,) 24 | input = raw_input("done [y]: ") 25 | if input.lower() not in ('', 'y', 'yes'): 26 | sys.exit() 27 | 28 | # flickr.auth.getToken: 29 | token_rsp = api.execute_request(Flickr.API.Request(method='flickr.auth.getToken', frob=frob, format='json', nojsoncallback=1)) 30 | if token_rsp.code == 200: 31 | token_rsp_json = json.load(token_rsp) 32 | if token_rsp_json['stat'] == 'ok': 33 | token = str(token_rsp_json['auth']['token']['_content']) 34 | 35 | for filename in sys.argv[1:]: 36 | 
photo = file(filename, "rb") 37 | filename = os.path.basename(filename) 38 | #upload_response = api.execute_upload(filename=filename, args={'auth_token':token, 'title':title, 'photo':photo}) 39 | 40 | upload_request = Flickr.API.Request("http://api.flickr.com/services/upload", auth_token=token, title=filename, photo=photo) 41 | upload_response = api.execute_request(upload_request, sign=True, encode=Flickr.API.encode_multipart_formdata) 42 | print upload_response 43 | --------------------------------------------------------------------------------