├── .gitignore
├── LICENSE.md
├── README.md
├── examples
    ├── 1-Overview-OpenStreetMap-data.ipynb
    ├── 2-Spatial-indices-Land-use-mix.ipynb
    ├── 3-Spatial-indices-Urban-sprawl.ipynb
    ├── 4-Spatial-indices-Granularity.ipynb
    ├── 5-Batch-mode-Urban-sprawl.ipynb
    ├── 6-Disaggregated-population-estimates-Pre-requisites.ipynb
    ├── 7-Disaggregated-population-estimates-Residential-surface-approach.ipynb
    ├── 8-Disaggregated-population-estimates-Neural-networks-approach.ipynb
    └── images
    │   ├── Grenoble_GPW_simulation.png
    │   ├── Grenoble_INSEE.png
    │   ├── Lyon_Accessibility.png
    │   ├── Lyon_Buildings.png
    │   ├── Lyon_Dispersion.png
    │   ├── Lyon_Landusemix.png
    │   ├── Lyon_POIs.png
    │   ├── Lyon_activities_densities.png
    │   ├── Lyon_densities.png
    │   └── Lyon_graph.png
├── setup.py
└── urbansprawl
    ├── __init__.py
    ├── osm
        ├── __init__.py
        ├── classification.py
        ├── core.py
        ├── overpass.py
        ├── surface.py
        ├── tags.py
        └── utils.py
    ├── population
        ├── __init__.py
        ├── core.py
        ├── data_extract.py
        ├── downscaling.py
        ├── urban_features.py
        └── utils.py
    ├── settings.py
    └── sprawl
        ├── __init__.py
        ├── accessibility.py
        ├── accessibility_parallel.py
        ├── core.py
        ├── dispersion.py
        ├── landusemix.py
        └── utils.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Folders
 2 | __pycache__/
 3 | cache/
 4 | data/
 5 | logs/
 6 | .ipynb_checkpoints/
 7 | 
 8 | # Logs
 9 | *.log
10 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Luciano Gervasoni
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Urbansprawl
  2 | 
  3 | The urbansprawl project provides an open source framework for assessing urban sprawl using open data. It uses OpenStreetMap (OSM) data to calculate its sprawling indices, divided in Accessibility, Land use mix, and Dispersion.
  4 | 
  5 | Locations of residential and activity (e.g. shop, commerce, office, among others) units are used to measure mixed use development and built-up dispersion, whereas the street network is used to measure the accessibility between different land uses. The output consists of spatial indices, which can be easily integrated with GIS platforms.
  6 | 
  7 | Additionally, a method to perform dissagregated population estimates at building level is provided. Our goal is to estimate the number of people living at the fine level of individual households by using open urban data (OpenStreetMap) and coarse-scaled population data (census tract).
  8 | 
  9 | **Motivation:**
 10 | 
 11 | Urban sprawl has been related to numerous negative environmental and socioeconomic impacts. Meanwhile, the number of people living in cities has been increasing considerably since 1950, from 746 million to 3.9 billion in 2014. More than 66% of the world's population are projected to live in urban areas by 2050, against 30% in 1950 [(United Nations, 2014)](https://esa.un.org/unpd/wup/publications/files/wup2014-highlights.pdf). The fact that urban areas have been growing at increasing rates urges for assessing urban sprawl towards sustainable development. However, sprawl is an elusive term and different approaches to measure it have lead to heterogeneous results. 
 12 | 
 13 | Moreover, most studies rely on private/commercial data-sets and their software is rarely made public, impeding research reproducibility and comparability. Furthermore, many works give as result a unique value for a region of analysis, dismissing spatial information which is vital for urban planners and policy makers. 
 14 | 
 15 | This situation brings new challenges on how to conceive cities that host such amounts of population in a sustainable way. Thus, this sustainability question should address several aspects, ranging from economical to social and environmental matters among others. Urbansprawl provides an open framework to aid in the process of calculating sprawling indices.
 16 | 
 17 | **Framework characteristics:**
 18 | 
 19 | * Open data: we rely solely on open data in order to ensure replicability.
 20 | * Open source: users are free to use the framework for any purpose.
 21 | * World-wide coverage: the analysis can be applied to any city in the world, as long as sufficient data exists.
 22 | * Data homogeneity: a set of statistical tools are applied to homogeneous and well-defined [map features](https://wiki.openstreetmap.org/wiki/Map_Features) data.
 23 | * Geo-localized data: precise location of features allow to cope with the [Modifiable Areal Unit Problem](https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem) (avoid using gridded data, e.g. Land Use Land Cover data).
 24 | * Crowd-sourced data: rapid updates given an ever-increasing community.
 25 | * GIS output: easy to integrate with other GIS frameworks.
 26 | * Potential missing data: still few data exist for some regions in the world.
 27 | 
 28 | **Disclaimer:** This package is no longer maintained.
 29 | 
 30 | **For more details, refer to:**
 31 | * Gervasoni Luciano, 2018. "[Contributions to the formalization and implementation of spatial urban indices using open data : application to urban sprawl studies](https://tel.archives-ouvertes.fr/tel-02077356)." Computers and Society [cs.CY]. Université Grenoble Alpes, 2018.
 32 | * Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2016. "[A framework for evaluating urban land use mix from crowd-sourcing data](https://hal.inria.fr/hal-01396792)." 2nd International Workshop on Big Data for Sustainable Development (IEEE Big Data 2016).
 33 | * Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "[LUM_OSM: une plateforme pour l'évaluation de la mixité urbaine à partir de données participatives](https://hal.inria.fr/hal-01548341)." GAST Workshop, Conférence Extraction et Gestion de Connaissances (EGC 2017).
 34 | * Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "[Calculating spatial urban sprawl indices using open data](https://hal.inria.fr/hal-01535469)." 15th International Conference on Computers in Urban Planning and Urban Management (CUPUM 2017).
 35 | * Gervasoni Luciano, Fenet Serge, and Sturm Peter. 2018. "[Une méthode pour l’estimation désagrégée de données de population à l’aide de données ouvertes](https://hal.inria.fr/hal-01667975)." Conférence Internationale sur l'Extraction et la Gestion des Connaissances (EGC 2018).
 36 | * Gervasoni Luciano, Fenet Serge, Perrier Régis and Sturm Peter. 2018. "[Convolutional neural networks for disaggregated population mapping using open data](https://hal.inria.fr/hal-01852585)." IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018).
 37 | 
 38 | ## Installation
 39 | 
 40 | The urbansprawl framework works with Python 2+3.
 41 | 
 42 | - Python dependencies:
 43 | ```sh
 44 | osmnx scikit-learn psutil tensorflow keras jupyter
 45 | ```
 46 | 
 47 | ### Using pip
 48 | - Install the ```spatialindex``` library. Using apt-get (Linux):
 49 | ```sh
 50 | sudo apt-get install libspatialindex-dev
 51 | ```
 52 | - Install the dependencies using *pip*
 53 | ```sh
 54 | pip install osmnx scikit-learn psutil tensorflow keras jupyter
 55 | ```
 56 | 
 57 | ### Using Miniconda
 58 | - Install [Miniconda](https://conda.io/miniconda.html)
 59 | - [Optional] Create a [conda virtual environment](http://conda.pydata.org/docs/using/envs.html) 
 60 | ```
 61 | conda create --name urbansprawl-env
 62 | source activate urbansprawl-env
 63 | ```
 64 | 
 65 | - Install the dependencies using the conda package manager and the conda-forge channel
 66 | ```sh
 67 | conda install -c conda-forge libspatialindex osmnx scikit-learn psutil tensorflow keras jupyter
 68 | ```
 69 | 
 70 | ### Using Anaconda
 71 | - Install [Anaconda](https://www.anaconda.com/download)
 72 | - [Optional] Create a [conda virtual environment](http://conda.pydata.org/docs/using/envs.html) 
 73 | ```
 74 | conda create --name urbansprawl-env
 75 | source activate urbansprawl-env
 76 | ```
 77 | 
 78 | - Install the dependencies using the conda package manager and the conda-forge channel
 79 | ```sh
 80 | conda update -c conda-forge --all
 81 | conda install -c conda-forge osmnx scikit-learn psutil tensorflow keras jupyter
 82 | ```
 83 | 
 84 | ## Usage
 85 | The framework is presented through different [examples](https://github.com/lgervasoni/urbansprawl/tree/master/examples) in the form of notebooks. As well, the computational running times involved in each procedure are shown for each example. To this end, a _r5.large_ [AWS EC2](https://aws.amazon.com/ec2/) instance was employed (2 vCPU and 16GiB memory) to run the notebooks.
 86 | 
 87 | Please note that the different procedures can be both memory and time consuming, according to the size of the chosen region of interest. In order to run the different notebooks, type in a terminal:
 88 | ```sh
 89 | jupyter notebook
 90 | ```
 91 | 
 92 | ## Example: Urban sprawl
 93 | 
 94 | OpenStreetMap data is retrieved using the Overpass API.
 95 | An input region of interest can be extracted by:
 96 | 
 97 | * Place + result number: The name of the city/region, and the resulting number to retrieve (as seen in OpenStreetMap result order)
 98 | * Polygon: A polygon with the coordinates delimitating the desired region of interest
 99 | * Bounding box: Using northing, southing, easting, and westing coordinates
100 | * Point + distance (meters): Use the (latitude, longitude) central point plus an input distance around it
101 | * Address + distance (meters): Set the address as central point and an input distance around it
102 | 
103 | Additionally, the state of the data-base can be retrieved for a specific data. 
104 | This allows for comparisons across time and keeping track of a city's evolution.
105 | 
106 | Results are depicted for the city of **Lyon, France**:
107 | 
108 | - Locations of residential and activity land uses are retrieved
109 | 
110 | * Buildings with defined land use:
111 | 	* Blue: Residential use
112 | 	* Red: Activity use
113 | 	* Green: Mixed use
114 | 
115 | ![Buildings](examples/images/Lyon_Buildings.png?raw=true)
116 | 
117 | * Points of interest (POIs) with defined land use:
118 | 
119 | ![POI](examples/images/Lyon_POIs.png?raw=true)
120 | 
121 | - Densities for each land use are estimated:
122 | 
123 | 	* Probability density function estimated using Kernel Density Estimation (KDE)
124 | 
125 | ![Densit](examples/images/Lyon_densities.png?raw=true)
126 | 
127 | * Activity uses can be further classified using the OSM wiki:
128 | 	* Leisure and amenity
129 | 	* Shop
130 | 	* Commercial and industrial
131 | 
132 | ![Activ](examples/images/Lyon_activities_densities.png?raw=true)
133 | 
134 | - Street network:
135 | 
136 | ![SN](examples/images/Lyon_graph.png?raw=true)
137 | 
138 | **Sprawling indices:**
139 | 
140 | - Land use mix indices: Degree of co-occurence of differing land uses within 'walkable' distances.
141 | 
142 | ![LUM](examples/images/Lyon_Landusemix.png?raw=true)
143 | 
144 | - Accessibility indices: Denotes the degree of accessibility to differing land uses (from residential to activity uses).
145 | 
146 | 	* Fixed activities: Represents the distance needed to travel in order to reach a certain number of activity land uses
147 | 
148 | 	* Fixed distance: Denotes the cumulative number of activity opportunities found within a certain travel distance
149 | 
150 | ![Acc](examples/images/Lyon_Accessibility.png?raw=true)
151 | 
152 | - Dispersion indices: Denotes the degree of scatteredness of the built-up area.
153 | 
154 | 	* "A landscape suffers from urban sprawl if it is permeated by urban development or solitary buildings [...]. The more area built over and the more dispersed the built-up area, [...] the higher the degree of urban sprawl" [(Jaeger and Schwick 2014)](http://www.sciencedirect.com/science/article/pii/S1470160X13004858)
155 | 
156 | ![Disp](examples/images/Lyon_Dispersion.png?raw=true)
157 | 
158 | ## Example: Population densities
159 | 
160 | Gridded population data is used in the context of population densities downscaling:
161 | 
162 | * A fine scale description of residential land use (surface) per building is built exploiting OpenStreetMap.
163 | 
164 | * Using coarse-scale gridded population data, we perform the down-scaling for each household given their containing area for residential usage
165 | 
166 | * The evaluation is carried out using fine-grained census block data (INSEE) for cities in France as ground-truth.
167 | 
168 | Population count images are depicted for the city of **Grenoble, France**:
169 | 
170 | - Population densities (INSEE census data):
171 | 
172 | ![INSEE](examples/images/Grenoble_INSEE.png?raw=true)
173 | 
174 | 
175 | - Population densities (INSEE census data, Gridded Population World resolution):
176 | 
177 | ![GPW](examples/images/Grenoble_GPW_simulation.png?raw=true)
178 | 
179 | 


--------------------------------------------------------------------------------
/examples/images/Grenoble_GPW_simulation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Grenoble_GPW_simulation.png


--------------------------------------------------------------------------------
/examples/images/Grenoble_INSEE.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Grenoble_INSEE.png


--------------------------------------------------------------------------------
/examples/images/Lyon_Accessibility.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Accessibility.png


--------------------------------------------------------------------------------
/examples/images/Lyon_Buildings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Buildings.png


--------------------------------------------------------------------------------
/examples/images/Lyon_Dispersion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Dispersion.png


--------------------------------------------------------------------------------
/examples/images/Lyon_Landusemix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Landusemix.png


--------------------------------------------------------------------------------
/examples/images/Lyon_POIs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_POIs.png


--------------------------------------------------------------------------------
/examples/images/Lyon_activities_densities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_activities_densities.png


--------------------------------------------------------------------------------
/examples/images/Lyon_densities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_densities.png


--------------------------------------------------------------------------------
/examples/images/Lyon_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_graph.png


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import find_packages, setup
 2 | 
 3 | with open('urbansprawl/__init__.py', 'r') as f:
 4 |     for line in f:
 5 |         if line.startswith('__version__'):
 6 |             version = line.strip().split('=')[1].strip(' \'"')
 7 |             break
 8 |     else:
 9 |         version = '0.0.1'
10 | 
11 | with open('README.md', 'rb') as f:
12 |     readme = f.read().decode('utf-8')
13 | 
14 | install_requires = [
15 |     'psutil',
16 |     'numpy<=1.14.1',
17 |     'pandas',
18 |     'matplotlib',
19 |     'shapely',
20 |     'geopandas',
21 |     'scikit-learn',
22 |     'tensorflow<=1.10.0',
23 |     'keras',
24 |     'networkx',
25 |     'osmnx',
26 |     'jupyter'
27 | ]
28 | 
29 | setup(
30 |     name='urbansprawl',
31 |     keywords=['urbansprawl', 'land use mix', 'gis', 'spatial analysis', 'machine learning', 'openstreetmap', 'population density', 'population downscaling', 'neural networks'],
32 |     version=version,
33 |     description='The urbansprawl project provides an open source framework for assessing urban sprawl using open data',
34 |     long_description=readme,
35 |     author='Luciano Gervasoni',
36 |     author_email='gervasoni.luc@gmail.com',
37 |     maintainer='Luciano Gervasoni',
38 |     maintainer_email='gervasoni.luc@gmail.com',
39 |     license='MIT',
40 |     url='https://github.com/lgervasoni/urbansprawl',
41 |     entry_points={ },
42 |     classifiers=[
43 |         'Development Status :: 4 - Beta',
44 |         'Intended Audience :: Developers',
45 |         'Intended Audience :: Science/Research',
46 |         'Topic :: Scientific/Engineering :: GIS',
47 |         'Topic :: Scientific/Engineering :: Visualization',
48 |         'Topic :: Scientific/Engineering :: Physics',
49 |         'Topic :: Scientific/Engineering :: Mathematics',
50 |         'Topic :: Scientific/Engineering :: Information Analysis',
51 |         'Topic :: Scientific/Engineering :: Artificial Intelligence',
52 |         'Operating System :: OS Independent',
53 |         'License :: OSI Approved :: MIT License',
54 |         'Programming Language :: Python :: 2.7',
55 |         'Programming Language :: Python :: 3.5',
56 |         'Programming Language :: Python :: 3.6',
57 |         'Programming Language :: Python :: Implementation :: CPython',
58 |     ],
59 |     install_requires=install_requires,
60 |     # pip install -e .[dev]
61 |     extras_require={'dev': ['pytest', 'flake8', 'ipython', 'ipdb']},
62 |     packages=find_packages(exclude=['examples']),
63 | )
64 | 


--------------------------------------------------------------------------------
/urbansprawl/__init__.py:
--------------------------------------------------------------------------------
 1 | """urbansprawl package
 2 | """
 3 | 
 4 | # OpenStreetMap data
 5 | from .osm.core import get_route_graph, get_processed_osm_data
 6 | 
 7 | # Spatial urban sprawl indices
 8 | from .sprawl.core import compute_grid_landusemix, compute_grid_accessibility, compute_grid_dispersion
 9 | from .sprawl.core import get_indices_grid, process_spatial_indices
10 | 
11 | # Disaggrated population estimates
12 | from .population.core import get_extract_population_data, compute_full_urban_features, get_training_testing_data, get_Y_X_features_population_data
13 | from .population.core import get_aggregated_squares, proportional_population_downscaling, population_downscaling_validation
14 | 
15 | 
16 | __version__ = '1.1'
17 | 


--------------------------------------------------------------------------------
/urbansprawl/osm/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/osm/__init__.py


--------------------------------------------------------------------------------
/urbansprawl/osm/classification.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import osmnx as ox
  7 | import pandas as pd
  8 | import geopandas as gpd
  9 | import numpy as np
 10 | from scipy import spatial
 11 | 
 12 | from osmnx import log
 13 | 
 14 | from .tags import key_classification, landuse_classification, activity_classification
 15 | 
 16 | ####################################################################################
 17 | # Under uncertainty: Residential assumption?
 18 | RESIDENTIAL_ASSUMPTION_UNCERTAINTY = True
 19 | ####################################################################################
 20 | 
 21 | ############################################
 22 | ### Tag land use classification
 23 | ############################################
 24 | 
 25 | def aggregate_classification(classification_list):
 26 | 	""" 
 27 | 	Aggregate into a unique classification given an input list of classifications
 28 | 
 29 | 	Parameters
 30 | 	----------
 31 | 	classification_list : list
 32 | 		list with the input land use classifications
 33 | 	
 34 | 	Returns
 35 | 	----------
 36 | 	string
 37 | 		returns the aggregated classification
 38 | 	"""
 39 | 	if ("other" in classification_list): # other tag -> Non-interesting building
 40 | 		classification = None
 41 | 	elif ( ("activity" in classification_list) and ("residential" in classification_list) ): # Mixed
 42 | 		classification = "mixed"
 43 | 	elif ( "mixed" in classification_list ): # Mixed
 44 | 		classification = "mixed"
 45 | 	elif ("activity" in classification_list): # Activity
 46 | 		classification = "activity"
 47 | 	elif ("residential" in classification_list): # Residential
 48 | 		classification = "residential"
 49 | 	elif ("infer" in  classification_list): # To infer
 50 | 		classification = "infer"
 51 | 	else: # No valuable classification
 52 | 		classification = None
 53 | 
 54 | 	return classification	
 55 | 
 56 | def classify_tag(tags, return_key_value=True):
 57 | 	""" 
 58 | 	Classify the land use of input OSM tag in `activity`, `residential`, `mixed`, None, or `infer` (to infer later)
 59 | 
 60 | 	Parameters
 61 | 	----------
 62 | 	tags : dict
 63 | 		OpenStreetMap tags
 64 | 	
 65 | 	Returns
 66 | 	----------
 67 | 	string, dict
 68 | 		returns the classification, and a dict relating `key`:`value` defining its classification
 69 | 	"""
 70 | 	# key_value: Dictionary of osm key : osm value
 71 | 	classification, key_value = [], {}
 72 | 
 73 | 	for key, value in key_classification.items():
 74 | 		# Get the corresponding key tag (without its land use)
 75 | 		key_tag = key.replace("activity_","").replace("residential_","").replace("other_","").replace("infer_","")
 76 | 
 77 | 		if tags.get(key_tag) in value:
 78 | 			# First part of key defines the land use
 79 | 			new_classification = key.split("_")[0]
 80 | 			# Add the new classification
 81 | 			classification.append( new_classification )
 82 | 			# Associate the key-value
 83 | 			key_value[key_tag] = tags.get(key_tag)
 84 | 
 85 | 	classification = aggregate_classification(classification)
 86 | 
 87 | 	if (return_key_value):
 88 | 		return classification, key_value
 89 | 	else:
 90 | 		return classification
 91 | 
 92 | ############################################
 93 | ### Land use inference
 94 | ############################################
 95 | 
 96 | def classify_landuse_inference(land_use):
 97 | 	""" 
 98 | 	Classify input land use into a defined category: `other`, `activity`, `residential`, or None
 99 | 
100 | 	Parameters
101 | 	----------
102 | 	land_use : string
103 | 		input land use tag
104 | 	
105 | 	Returns
106 | 	----------
107 | 	string
108 | 		returns the land use classification
109 | 	"""
110 | 	for key, value in landuse_classification.items():
111 | 		# key: Classification ; value: keys contained in the classification
112 | 		if (land_use in value):
113 | 			return key
114 | 	# Uncertain case
115 | 	if (RESIDENTIAL_ASSUMPTION_UNCERTAINTY): # Undefined use. Assumption: Residential
116 | 		return "residential"
117 | 	else:
118 | 		return None # No tag
119 | 
120 | def compute_landuse_inference(df_buildings, df_landuse):
121 | 	""" 
122 | 	Compute land use inference for building polygons with no information
123 | 	The inference is done using polygons with defined land use
124 | 	A building polygon's land use is inferred by means of adopting the land use of the smallest encompassing polygon with defined land use
125 | 
126 | 	Parameters
127 | 	----------
128 | 	df_buildings : geopandas.GeoDataFrame
129 | 		input buildings
130 | 	df_landuse : geopandas.GeoDataFrame
131 | 		land use polygons to aid inference procedure
132 | 
133 | 	Returns
134 | 	----------
135 | 	
136 | 	"""
137 | 	# Get those indices which need to be inferred, and keep geometry column only
138 | 	df_buildings_to_infer = df_buildings.loc[ df_buildings['classification'] == 'infer', ["geometry"] ]
139 | 	# Add land use polygon's area
140 | 	df_landuse['area'] = df_landuse.apply(lambda x: x.geometry.area, axis=1)
141 | 
142 | 	# Get geometries to infer within Land use polygons matching
143 | 	sjoin = gpd.sjoin(df_buildings_to_infer, df_landuse, op='within')
144 | 
145 | 	# Add index column to sort values
146 | 	sjoin['index'] = sjoin.index
147 | 	# Sort values by index, then by area
148 | 	sjoin.sort_values(by=['index','area'], inplace=True)
149 | 	# Drop duplicates. Keep first (minimum computing area)
150 | 	sjoin.drop_duplicates(subset=['index'], keep='first', inplace=True)
151 | 
152 | 	##### Set key:value and classification
153 | 	# Set default value: inferred:None
154 | 	df_buildings.loc[ df_buildings_to_infer.index, "key_value" ] = df_buildings.loc[ df_buildings_to_infer.index].apply(lambda x: {"inferred":None} , axis=1)
155 | 	# Set land use for those buildings within a defined land use polygon
156 | 	df_buildings.loc[ sjoin.index, "key_value" ] = sjoin.apply(lambda x: {'inferred':x.landuse}, axis=1)
157 | 
158 | 	# Set classification
159 | 	df_buildings.loc[ df_buildings_to_infer.index, "classification" ] = df_buildings.loc[ df_buildings_to_infer.index, "key_value" ].apply(lambda x: classify_landuse_inference(x.get("inferred")) )
160 | 
161 | 	# Remove useless rows
162 | 	df_buildings.drop( df_buildings[ df_buildings.classification.isin([None,"other"]) ].index, inplace=True)
163 | 	df_buildings.reset_index(inplace=True,drop=True)
164 | 	assert( len( df_buildings[df_buildings.classification.isnull()] ) == 0 )
165 | 
166 | ############################################
167 | ### Activity type classification
168 | ############################################
169 | 
170 | def value_activity_category(x):
171 | 	""" 
172 | 	Classify the activity of input activity value
173 | 
174 | 	Parameters
175 | 	----------
176 | 	x : string
177 | 		activity value
178 | 	
179 | 	Returns
180 | 	----------
181 | 	string
182 | 		returns the activity classification
183 | 	"""
184 | 	for key, value in activity_classification.items():
185 | 		if x in value:
186 | 			return key
187 | 	return None
188 | 
189 | def key_value_activity_category(key, value):
190 | 	""" 
191 | 	Classify the activity of input pair key:value
192 | 
193 | 	Parameters
194 | 	----------
195 | 	key : string
196 | 		key dict
197 | 	value : string
198 | 		value dict
199 | 	
200 | 	Returns
201 | 	----------
202 | 	string
203 | 		returns the activity classification
204 | 	"""
205 | 	# Note that some values repeat for different keys (e.g. shop=fuel and amenity=fuel), but they do not belong to the same activity classification
206 | 	return {
207 | 		'shop': 'shop',
208 | 		'leisure': 'leisure/amenity',
209 | 		'amenity': 'leisure/amenity',
210 | 		'man_made' : 'commercial/industrial',
211 | 		'industrial' : 'commercial/industrial',
212 | 		'landuse' : value_activity_category(value),
213 | 		'inferred' : value_activity_category(value), # Inferred cases adopted land use values
214 | 		'building' : value_activity_category(value),
215 | 		'building:use' : value_activity_category(value),
216 | 		'building:part' : value_activity_category(value)
217 | 	}.get(key, None)
218 | 
219 | def classify_activity_category(key_values):
220 | 	""" 
221 | 	Classify input activity category into `commercial/industrial`, `leisure/amenity`, or `shop`
222 | 
223 | 	Parameters
224 | 	----------
225 | 	key_values : dict
226 | 		contain pairs of key:value relating to its usage
227 | 	
228 | 	Returns
229 | 	----------
230 | 	string
231 | 		returns the activity classification
232 | 	"""
233 | 	####################
234 | 	### Categories: commercial/industrial, leisure/amenity, shop
235 | 	####################
236 | 	categories = set( [ key_value_activity_category(key,value) for key,value in key_values.items() ] )
237 | 	categories.discard(None)
238 | 	return list(categories)
239 | 


--------------------------------------------------------------------------------
/urbansprawl/osm/core.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import osmnx as ox
  7 | import pandas as pd
  8 | import numpy as np
  9 | import time
 10 | import os.path
 11 | from osmnx import log
 12 | 
 13 | from .overpass import create_landuse_gdf, create_pois_gdf, create_building_parts_gdf, create_buildings_gdf_from_input, retrieve_route_graph
 14 | from .tags import columns_osm_tag, height_tags, building_parts_to_filter
 15 | from .classification import compute_landuse_inference, classify_tag, classify_activity_category
 16 | from .surface import compute_landuses_m2
 17 | from .utils import load_geodataframe, store_geodataframe, get_dataframes_filenames, associate_structures, sanity_check_height_tags
 18 | 
 19 | def get_route_graph(city_ref, date="", polygon=None, north=None, south=None, east=None, west=None, force_crs=None):
 20 | 	""" 
 21 | 	Wrapper to retrieve city's street network
 22 | 	Loads the data if stored locally
 23 | 	Otherwise, it retrieves the graph from OpenStreetMap using the osmnx package
 24 | 	Input polygon or bounding box coordinates determine the region of interest
 25 | 
 26 | 	Parameters
 27 | 	----------
 28 | 	city_ref : string
 29 | 		name of the city
 30 | 	polygon : shapely.Polygon
 31 | 		polygon shape of input city
 32 | 	north : float
 33 | 		northern latitude of bounding box
 34 | 	south : float
 35 | 		southern latitude of bounding box
 36 | 	east : float
 37 | 		eastern longitude of bounding box
 38 | 	west : float
 39 | 		western longitude of bounding box
 40 | 	force_crs : dict
 41 | 		graph will be projected to input crs
 42 | 
 43 | 	Returns
 44 | 	----------
 45 | 	networkx.multidigraph
 46 | 		projected graph
 47 | 	"""
 48 | 	return retrieve_route_graph(city_ref, date, polygon, north, south, east, west, force_crs)
 49 | 
 50 | def get_processed_osm_data(city_ref=None, region_args={"polygon":None, "place":None, "which_result":1, "point":None, "address":None, "distance":None, "north":None, "south":None, "east":None, "west":None},
 51 | 			kwargs={"retrieve_graph":True, "default_height":3, "meters_per_level":3, "associate_landuses_m2":True, "mixed_building_first_floor_activity":True, "minimum_m2_building_area":9, "date":None}):
 52 | 	"""
 53 | 	Retrieves buildings, building parts, and Points of Interest associated with a residential/activity land use from OpenStreetMap data for input city
 54 | 	If a name for input city is given, the data will be loaded (if it was previously stored)
 55 | 	If no stored files exist, it will query and process the data and store it under the city name
 56 | 	Queries data for input region (polygon, place, point/address and distance around, or bounding box coordinates)
 57 | 	Additional arguments will drive the overall process
 58 | 
 59 | 	Parameters
 60 | 	----------
 61 | 	city_ref : str
 62 | 		Name of input city / region
 63 | 	region_args : dict
 64 | 		contains the information to retrieve the region of interest as the following:
 65 | 			polygon : shapely Polygon or MultiPolygon
 66 | 				geographic shape to fetch the landuse footprints within
 67 | 			place : string or dict
 68 | 				query string or structured query dict to geocode/download
 69 | 			which_result : int
 70 | 				result number to retrieve from geocode/download when using query string 
 71 | 			point : tuple
 72 | 				the (lat, lon) central point around which to construct the graph
 73 | 			address : string
 74 | 				the address to geocode and use as the central point around which to construct the graph
 75 | 			distance : int
 76 | 				retain only those nodes within this many meters of the center of the graph
 77 | 			north : float
 78 | 				northern latitude of bounding box
 79 | 			south : float
 80 | 				southern latitude of bounding box
 81 | 			east : float
 82 | 				eastern longitude of bounding box
 83 | 			west : float
 84 | 				western longitude of bounding box
 85 | 	kwargs : dict
 86 | 		additional arguments to drive the process:
 87 | 			retrieve_graph : boolean
 88 | 				that determines if the street network for input city has to be retrieved and stored
 89 | 			default_height : float
 90 | 				height of buildings under missing data
 91 | 			meters_per_level : float
 92 | 				buildings number of levels assumed under missing data
 93 | 			associate_landuses_m2 : boolean
 94 | 				compute the total square meter for each land use
 95 | 			mixed_building_first_floor_activity : Boolean
 96 | 				if True: Associates building's first floor to activity uses and the rest to residential uses
 97 | 				if False: Associates half of the building's area to each land use (Activity and Residential)
 98 | 			minimum_m2_building_area : float
 99 | 				minimum area to be considered a building (otherwise filtered)
100 | 			date : datetime.datetime
101 | 				query the database at a certain time-stamp
102 | 
103 | 	Returns
104 | 	----------
105 | 	[ gpd.GeoDataFrame, gpd.GeoDataFrame, gpd.GeoDataFrame ]
106 | 		returns the output geo dataframe containing all buildings, building parts, and points associated to a residential or activity land usage
107 | 	
108 | 	"""
109 | 	log("OSM data requested for city: " + str(city_ref) )
110 | 
111 | 	start_time = time.time()
112 | 
113 | 	if (city_ref):
114 | 		geo_poly_file, geo_poly_parts_file, geo_point_file = get_dataframes_filenames(city_ref)
115 | 
116 | 		##########################
117 | 		### Stored file ?
118 | 		##########################
119 | 		if ( os.path.isfile(geo_poly_file) ): # File exists
120 | 			log("Found stored files for city " + city_ref)
121 | 			# Load local GeoDataFrames
122 | 			return load_geodataframe(geo_poly_file), load_geodataframe(geo_poly_parts_file), load_geodataframe(geo_point_file)
123 | 
124 | 	# Get keyword arguments for input region of interest
125 | 	polygon, place, which_result, point, address, distance, north, south, east, west = region_args.get("polygon"), region_args.get("place"), region_args.get("which_result"), region_args.get("point"), region_args.get("address"), region_args.get("distance"), region_args.get("north"), region_args.get("south"), region_args.get("east"), region_args.get("west")
126 | 
127 | 	### Valid input?
128 | 	if not( any( [not (polygon is None), place, point, address, north, south, east, west] ) ):
129 | 		log("Error: Must provide at least one type of input")
130 | 		return None, None, None
131 | 
132 | 	if ( kwargs.get("date") ): # Non-null date
133 | 		date_ = kwargs.get("date").strftime("%Y-%m-%dT%H:%M:%SZ")
134 | 		log("Requesting OSM database at time-stamp: " + date_)
135 | 		# e.g.: [date:"2004-05-06T00:00:00Z"]
136 | 		date_query = '[date:"'+date_+'"]'
137 | 	else:
138 | 		date_query = ""
139 | 
140 | 	##########################
141 | 	### Overpass query: Buildings
142 | 	##########################
143 | 	# Query and update bounding box / polygon
144 | 	df_osm_built, polygon, north, south, east, west = create_buildings_gdf_from_input(date=date_query, polygon=polygon, place=place, which_result=which_result, point=point, address=address, distance=distance, north=north, south=south, east=east, west=west)
145 | 	df_osm_built["osm_id"] = df_osm_built.index
146 | 	df_osm_built.reset_index(drop=True, inplace=True)
147 | 	df_osm_built.gdf_name = str(city_ref) + '_buildings' if not city_ref is None else 'buildings'
148 | 	##########################
149 | 	### Overpass query: Land use polygons. Aid to perform buildings land use inference
150 | 	##########################
151 | 	df_osm_lu = create_landuse_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west)
152 | 	df_osm_lu["osm_id"] = df_osm_lu.index
153 | 	# Drop useless columns
154 | 	columns_of_interest = ["osm_id", "geometry", "landuse"]
155 | 	df_osm_lu.drop( [ col for col in list( df_osm_lu.columns ) if not col in columns_of_interest ], axis=1, inplace=True )
156 | 	df_osm_lu.reset_index(drop=True, inplace=True)
157 | 	df_osm_lu.gdf_name = str(city_ref) + '_landuse' if not city_ref is None else 'landuse'
158 | 	##########################
159 | 	### Overpass query: POIs
160 | 	##########################
161 | 	df_osm_pois = create_pois_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west)
162 | 	df_osm_pois["osm_id"] = df_osm_pois.index
163 | 	df_osm_pois.reset_index(drop=True, inplace=True)
164 | 	df_osm_pois.gdf_name = str(city_ref) + '_points' if not city_ref is None else 'points'
165 | 	##########
166 | 	### Overpass query: Building parts. Allow to calculate the real amount of M^2 for each building
167 | 	##########
168 | 	df_osm_building_parts = create_building_parts_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west)	
169 | 	# Filter: 1) rows not needed (roof, etc) and 2) building that already exists in `buildings` extract
170 | 	if ("building" in df_osm_building_parts.columns):
171 | 		df_osm_building_parts = df_osm_building_parts[ (~ df_osm_building_parts["building:part"].isin(building_parts_to_filter) ) & (~ df_osm_building_parts["building:part"].isnull() ) & (df_osm_building_parts["building"].isnull()) ]
172 | 	else:
173 | 		df_osm_building_parts = df_osm_building_parts[ (~ df_osm_building_parts["building:part"].isin(building_parts_to_filter) ) & (~ df_osm_building_parts["building:part"].isnull() ) ]
174 | 	df_osm_building_parts["osm_id"] = df_osm_building_parts.index
175 | 	df_osm_building_parts.reset_index(drop=True, inplace=True)
176 | 	df_osm_building_parts.gdf_name = str(city_ref) + '_building_parts' if not city_ref is None else 'building_parts'
177 | 	
178 | 	log("Done: OSM data requests. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
179 | 
180 | 	####################################################
181 | 	### Sanity check of height tags
182 | 	####################################################
183 | 	start_time = time.time()
184 | 
185 | 	sanity_check_height_tags(df_osm_built)
186 | 	sanity_check_height_tags(df_osm_building_parts)
187 | 
188 | 	def remove_nan_dict(x): # Remove entries with NaN values
189 | 		return { k:v for k, v in x.items() if pd.notnull(v) }
190 | 
191 | 	df_osm_built['height_tags'] = df_osm_built[ [ c for c in height_tags if c in df_osm_built.columns ] ].apply(lambda x: remove_nan_dict(x.to_dict() ), axis=1)
192 | 	df_osm_building_parts['height_tags'] = df_osm_building_parts[ [ c for c in height_tags if c in df_osm_building_parts.columns ] ].apply(lambda x: remove_nan_dict(x.to_dict() ), axis=1)
193 | 
194 | 	###########
195 | 	### Remove columns which do not provide valuable information
196 | 	###########	
197 | 	columns_of_interest = columns_osm_tag + ["osm_id", "geometry", "height_tags"]
198 | 	df_osm_built.drop( [ col for col in list( df_osm_built.columns ) if not col in columns_of_interest ], axis=1, inplace=True )
199 | 	df_osm_building_parts.drop( [ col for col in list( df_osm_building_parts.columns ) if not col in columns_of_interest ], axis=1, inplace=True)
200 | 
201 | 	columns_of_interest = columns_osm_tag + ["osm_id", "geometry"]
202 | 	df_osm_pois.drop( [ col for col in list( df_osm_pois.columns ) if not col in columns_of_interest ], axis=1, inplace=True )	
203 | 
204 | 	
205 | 	log('Done: Height tags sanity check and unnecessary columns have been dropped. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
206 | 
207 | 	###########
208 | 	### Classification
209 | 	###########
210 | 	start_time = time.time()
211 | 
212 | 	df_osm_built['classification'], df_osm_built['key_value'] = list( zip(*df_osm_built.apply( classify_tag, axis=1) ) )
213 | 	df_osm_pois['classification'], df_osm_pois['key_value'] = list( zip(*df_osm_pois.apply( classify_tag, axis=1) ) )
214 | 	df_osm_building_parts['classification'], df_osm_building_parts['key_value'] = list( zip(*df_osm_building_parts.apply( classify_tag, axis=1) ) )
215 | 
216 | 	# Remove unnecessary buildings
217 | 	df_osm_built.drop( df_osm_built[ df_osm_built.classification.isnull() ].index, inplace=True )
218 | 	df_osm_built.reset_index(inplace=True, drop=True)
219 | 	# Remove unnecessary POIs
220 | 	df_osm_pois.drop( df_osm_pois[ df_osm_pois.classification.isin(["infer","other"]) | df_osm_pois.classification.isnull() ].index, inplace=True )
221 | 	df_osm_pois.reset_index(inplace=True, drop=True)
222 | 	# Building parts will acquire its containing building land use if it is not available
223 | 	df_osm_building_parts.loc[ df_osm_building_parts.classification.isin(["infer","other"]), "classification" ] = None
224 | 
225 | 	log('Done: OSM tags classification. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
226 | 	
227 | 	###########
228 | 	### Remove already used tags
229 | 	###########
230 | 	start_time = time.time()
231 | 
232 | 	df_osm_built.drop( [ c for c in columns_osm_tag if c in df_osm_built.columns ], axis=1, inplace=True )
233 | 	df_osm_pois.drop( [ c for c in columns_osm_tag if c in df_osm_pois.columns ], axis=1, inplace=True )
234 | 	df_osm_building_parts.drop( [ c for c in columns_osm_tag if c in df_osm_building_parts.columns ], axis=1, inplace=True)
235 | 
236 | 	###########
237 | 	### Project, drop small buildings and reset indices
238 | 	###########
239 | 	### Project to UTM coordinates within the same zone
240 | 	df_osm_built = ox.project_gdf(df_osm_built)
241 | 	df_osm_lu = ox.project_gdf(df_osm_lu, to_crs=df_osm_built.crs)
242 | 	df_osm_pois = ox.project_gdf(df_osm_pois, to_crs=df_osm_built.crs)
243 | 	df_osm_building_parts = ox.project_gdf(df_osm_building_parts, to_crs=df_osm_built.crs)
244 | 
245 | 	# Drop buildings with an area lower than a threshold
246 | 	df_osm_built.drop( df_osm_built[ df_osm_built.geometry.area < kwargs["minimum_m2_building_area"] ].index, inplace=True )
247 | 
248 | 	log('Done: Geometries re-projection. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
249 | 
250 | 	####################################################
251 | 	### Infer buildings land use (under uncertainty)
252 | 	####################################################
253 | 	start_time = time.time()
254 | 
255 | 	compute_landuse_inference(df_osm_built, df_osm_lu)
256 | 	# Free space
257 | 	del df_osm_lu
258 | 
259 | 	assert( len( df_osm_built[df_osm_built.key_value =={"inferred":"other"} ] ) == 0 )
260 | 	assert( len( df_osm_built[df_osm_built.classification.isnull()] ) == 0 )
261 | 	assert( len( df_osm_pois[df_osm_pois.classification.isnull()] ) == 0 )
262 | 
263 | 	log('Done: Land use deduction. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
264 | 
265 | 	####################################################
266 | 	### Associate for each building, its containing building parts and Points of interest
267 | 	####################################################
268 | 	start_time = time.time()
269 | 
270 | 	associate_structures(df_osm_built, df_osm_building_parts, operation='contains', column='containing_parts')
271 | 	associate_structures(df_osm_built, df_osm_pois, operation='intersects', column='containing_poi')
272 | 
273 | 	# Classify activity types
274 | 	df_osm_built['activity_category'] = df_osm_built.apply(lambda x: classify_activity_category(x.key_value), axis=1)
275 | 	df_osm_pois['activity_category'] = df_osm_pois.apply(lambda x: classify_activity_category(x.key_value), axis=1)
276 | 	df_osm_building_parts['activity_category'] = df_osm_building_parts.apply(lambda x: classify_activity_category(x.key_value), axis=1)
277 | 
278 | 	log('Done: Building parts association and activity categorization. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
279 | 
280 | 	####################################################
281 | 	### Associate effective number of levels, and measure the surface dedicated to each land use per building
282 | 	####################################################
283 | 	if (kwargs["associate_landuses_m2"]):
284 | 		start_time = time.time()
285 | 
286 | 		default_height = kwargs["default_height"]
287 | 		meters_per_level = kwargs["meters_per_level"]
288 | 		mixed_building_first_floor_activity = kwargs["mixed_building_first_floor_activity"]
289 | 		compute_landuses_m2(df_osm_built, df_osm_building_parts, df_osm_pois, default_height=default_height, meters_per_level=meters_per_level, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
290 | 
291 | 		# Set the composed classification given, for each building, its containing Points of Interest and building parts classification
292 | 		df_osm_built.loc[ df_osm_built.apply(lambda x: x.landuses_m2["activity"]>0 and x.landuses_m2["residential"]>0, axis=1 ), "classification" ] = "mixed"
293 | 
294 | 		log('Done: Land uses surface association. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
295 | 
296 | 	df_osm_built.loc[ df_osm_built.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan
297 | 	df_osm_pois.loc[ df_osm_pois.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan
298 | 	df_osm_building_parts.loc[ df_osm_building_parts.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan		
299 | 
300 | 	##########################
301 | 	### Overpass query: Street network graph
302 | 	##########################
303 | 	if (kwargs["retrieve_graph"]): # Save graph for input city shape
304 | 		start_time = time.time()
305 | 
306 | 		get_route_graph(city_ref, date=date_query, polygon=polygon, north=north, south=south, east=east, west=west, force_crs=df_osm_built.crs)
307 | 
308 | 		log('Done: Street network graph retrieval. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) )
309 | 
310 | 	##########################
311 | 	### Store file ?
312 | 	##########################
313 | 	if ( city_ref ): # File exists
314 | 		# Save GeoDataFrames
315 | 		store_geodataframe(df_osm_built, geo_poly_file)
316 | 		store_geodataframe(df_osm_building_parts, geo_poly_parts_file)
317 | 		store_geodataframe(df_osm_pois, geo_point_file)
318 | 		log("Stored OSM data files for city: "+city_ref)
319 | 
320 | 	return df_osm_built, df_osm_building_parts, df_osm_pois


--------------------------------------------------------------------------------
/urbansprawl/osm/overpass.py:
--------------------------------------------------------------------------------
   1 | ###################################################################################################
   2 | # Repository: https://github.com/lgervasoni/urbansprawl
   3 | # MIT License
   4 | ###################################################################################################
   5 | 
   6 | import time
   7 | import geopandas as gpd
   8 | from shapely.geometry import Point
   9 | from shapely.geometry import Polygon
  10 | from shapely.geometry import MultiPolygon
  11 | 
  12 | from osmnx import log
  13 | import logging as lg
  14 | import osmnx as ox
  15 | 
  16 | #######################################################################
  17 | ### Buildings
  18 | #######################################################################
  19 | 
  20 | def create_buildings_gdf_from_input(date="", polygon=None, place=None, which_result=1, point=None, address=None, distance=None, north=None, south=None, east=None, west=None):
  21 | 	""" 
  22 | 	Retrieve OSM buildings according to input data
  23 | 	Queries data for input region (polygon, place, point/address and distance around, or bounding box coordinates)
  24 | 	Updates the used polygon/bounding box to determine the region of interest	
  25 | 
  26 | 	Parameters
  27 | 	----------
  28 | 	date : string
  29 | 		query the database at a certain timestamp
  30 | 	polygon : shapely Polygon or MultiPolygon
  31 | 		geographic shape to fetch the landuse footprints within
  32 | 	place : string or dict
  33 | 		query string or structured query dict to geocode/download
  34 | 	which_result : int
  35 | 		result number to retrieve from geocode/download when using query string 
  36 | 	point : tuple
  37 | 		the (lat, lon) central point around which to construct the graph
  38 | 	address : string
  39 | 		the address to geocode and use as the central point around which to construct the graph
  40 | 	distance : int
  41 | 		retain only those nodes within this many meters of the center of the graph
  42 | 	north : float
  43 | 		northern latitude of bounding box
  44 | 	south : float
  45 | 		southern latitude of bounding box
  46 | 	east : float
  47 | 		eastern longitude of bounding box
  48 | 	west : float
  49 | 		western longitude of bounding box
  50 | 
  51 | 	Returns
  52 | 	----------
  53 | 	[ geopandas.GeoDataFrame, shapely.Polygon, float, float, float, float ]
  54 | 		retrieved buildings, region of interest polygon, and region of interest bounding box
  55 | 	"""
  56 | 	##########################
  57 | 	### Osmnx query: Buildings
  58 | 	##########################
  59 | 	if (not polygon is None):  # Polygon
  60 | 		log("Input type: Polygon")
  61 | 		# If input geo data frame, extract polygon shape
  62 | 		if ( type(polygon) is gpd.GeoDataFrame ):
  63 | 			assert( polygon.shape[0] == 1 )
  64 | 			polygon = polygon.geometry[0]
  65 | 		df_osm_built = buildings_from_polygon(date, polygon)
  66 | 	
  67 | 	elif ( all( [point,distance] ) ):  # Point + distance
  68 | 		log("Input type: Point")
  69 | 		df_osm_built = buildings_from_point(date, point, distance=distance)
  70 | 		# Get bounding box
  71 | 		west, south, east, north = df_osm_built.total_bounds
  72 | 	
  73 | 	elif ( all( [address,distance] ) ):  # Address
  74 | 		log("Input type: Address")
  75 | 		df_osm_built = buildings_from_address(date, address, distance=distance)
  76 | 		# Get bounding box
  77 | 		west, south, east, north = df_osm_built.total_bounds
  78 | 
  79 | 	elif (place):  # Place
  80 | 		log("Input type: Place")
  81 | 		if (which_result is None): which_result = 1
  82 | 		df_osm_built = buildings_from_place(date, place, which_result=which_result)
  83 | 		# Get encompassing polygon
  84 | 		poly_gdf = ox.gdf_from_place(place, which_result=which_result)
  85 | 		polygon = poly_gdf.geometry[0]
  86 | 	
  87 | 	elif ( all( [north,south,east,west] ) ): # Bounding box
  88 | 		log("Input type: Bounding box")
  89 | 		# Create points in specific order
  90 | 		p1 = (east,north)
  91 | 		p2 = (west,north)
  92 | 		p3 = (west,south)
  93 | 		p4 = (east,south)	
  94 | 		polygon = Polygon( [p1,p2,p3,p4] )
  95 | 		df_osm_built = buildings_from_polygon(date, polygon)	
  96 | 	else:
  97 | 		log("Error: Must provide at least one input")
  98 | 		return
  99 | 	return df_osm_built, polygon, north, south, east, west
 100 | 
 101 | def osm_bldg_download(date="", polygon=None, north=None, south=None, east=None, west=None,
 102 | 					  timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
 103 | 	"""
 104 | 	Download OpenStreetMap building footprint data.
 105 | 	Parameters
 106 | 	----------
 107 | 	date : string
 108 | 		query the database at a certain timestamp
 109 | 	polygon : shapely Polygon or MultiPolygon
 110 | 		geographic shape to fetch the building footprints within
 111 | 	north : float
 112 | 		northern latitude of bounding box
 113 | 	south : float
 114 | 		southern latitude of bounding box
 115 | 	east : float
 116 | 		eastern longitude of bounding box
 117 | 	west : float
 118 | 		western longitude of bounding box
 119 | 	timeout : int
 120 | 		the timeout interval for requests and to pass to API
 121 | 	memory : int
 122 | 		server memory allocation size for the query, in bytes. If none, server
 123 | 		will use its default allocation size
 124 | 	max_query_area_size : float
 125 | 		max area for any part of the geometry, in the units the geometry is in:
 126 | 		any polygon bigger will get divided up for multiple queries to API
 127 | 		(default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
 128 | 		meters))
 129 | 	Returns
 130 | 	-------
 131 | 	list
 132 | 		list of response_json dicts
 133 | 	"""
 134 | 
 135 | 	# check if we're querying by polygon or by bounding box based on which
 136 | 	# argument(s) where passed into this function
 137 | 	by_poly = polygon is not None
 138 | 	by_bbox = not (north is None or south is None or east is None or west is None)
 139 | 	if not (by_poly or by_bbox):
 140 | 		raise ValueError('You must pass a polygon or north, south, east, and west')
 141 | 
 142 | 	response_jsons = []
 143 | 
 144 | 	# pass server memory allocation in bytes for the query to the API
 145 | 	# if None, pass nothing so the server will use its default allocation size
 146 | 	# otherwise, define the query's maxsize parameter value as whatever the
 147 | 	# caller passed in
 148 | 	if memory is None:
 149 | 		maxsize = ''
 150 | 	else:
 151 | 		maxsize = '[maxsize:{}]'.format(memory)
 152 | 
 153 | 	# define the query to send the API
 154 | 	if by_bbox:
 155 | 		# turn bbox into a polygon and project to local UTM
 156 | 		polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
 157 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 158 | 
 159 | 		# subdivide it if it exceeds the max area size (in meters), then project
 160 | 		# back to lat-long
 161 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 162 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 163 | 		log('Requesting building footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
 164 | 		start_time = time.time()
 165 | 
 166 | 		# loop through each polygon rectangle in the geometry (there will only
 167 | 		# be one if original bbox didn't exceed max area size)
 168 | 		for poly in geometry:
 169 | 			# represent bbox as south,west,north,east and round lat-longs to 8
 170 | 			# decimal places (ie, within 1 mm) so URL strings aren't different
 171 | 			# due to float rounding issues (for consistent caching)
 172 | 			west, south, east, north = poly.bounds
 173 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["building"]({south:.8f},'
 174 | 							  '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["building"]'
 175 | 							  '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;')
 176 | 			query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
 177 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 178 | 			response_jsons.append(response_json)
 179 | 		msg = ('Got all building footprints data within bounding box from '
 180 | 			   'API in {:,} request(s) and {:,.2f} seconds')
 181 | 		log(msg.format(len(geometry), time.time()-start_time))
 182 | 
 183 | 	elif by_poly:
 184 | 		# project to utm, divide polygon up into sub-polygons if area exceeds a
 185 | 		# max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates
 186 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 187 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 188 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 189 | 		polygon_coord_strs = ox.get_polygons_coordinates(geometry)
 190 | 		log('Requesting building footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
 191 | 		start_time = time.time()
 192 | 
 193 | 		# pass each polygon exterior coordinates in the list to the API, one at
 194 | 		# a time
 195 | 		for polygon_coord_str in polygon_coord_strs:
 196 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way'
 197 | 							  '(poly:"{polygon}")["building"];(._;>;);relation'
 198 | 							  '(poly:"{polygon}")["building"];(._;>;););out;')
 199 | 			query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
 200 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 201 | 			response_jsons.append(response_json)
 202 | 		msg = ('Got all building footprints data within polygon from API in '
 203 | 			   '{:,} request(s) and {:,.2f} seconds')
 204 | 		log(msg.format(len(polygon_coord_strs), time.time()-start_time))
 205 | 
 206 | 	return response_jsons
 207 | 
 208 | 
 209 | def create_buildings_gdf(date="", polygon=None, north=None, south=None, east=None,
 210 | 						 west=None, retain_invalid=False):
 211 | 	"""
 212 | 	Get building footprint data from OSM then assemble it into a GeoDataFrame.
 213 | 	Parameters
 214 | 	----------
 215 | 	date : string
 216 | 		query the database at a certain timestamp
 217 | 	polygon : shapely Polygon or MultiPolygon
 218 | 		geographic shape to fetch the building footprints within
 219 | 	north : float
 220 | 		northern latitude of bounding box
 221 | 	south : float
 222 | 		southern latitude of bounding box
 223 | 	east : float
 224 | 		eastern longitude of bounding box
 225 | 	west : float
 226 | 		western longitude of bounding box
 227 | 	retain_invalid : bool
 228 | 		if False discard any building footprints with an invalid geometry
 229 | 	Returns
 230 | 	-------
 231 | 	GeoDataFrame
 232 | 	"""
 233 | 
 234 | 	responses = osm_bldg_download(date, polygon, north, south, east, west)
 235 | 
 236 | 	vertices = {}
 237 | 	for response in responses:
 238 | 		for result in response['elements']:
 239 | 			if 'type' in result and result['type']=='node':
 240 | 				vertices[result['id']] = {'lat' : result['lat'],
 241 | 										  'lon' : result['lon']}
 242 | 
 243 | 	buildings = {}
 244 | 	for response in responses:
 245 | 		for result in response['elements']:
 246 | 			if 'type' in result and result['type']=='way':
 247 | 				nodes = result['nodes']
 248 | 				try:
 249 | 					polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes])
 250 | 				except Exception:
 251 | 					log('Polygon has invalid geometry: {}'.format(nodes))
 252 | 				building = {'nodes' : nodes,
 253 | 							'geometry' : polygon}
 254 | 
 255 | 				if 'tags' in result:
 256 | 					for tag in result['tags']:
 257 | 						building[tag] = result['tags'][tag]
 258 | 
 259 | 				buildings[result['id']] = building
 260 | 
 261 | 	gdf = gpd.GeoDataFrame(buildings).T
 262 | 	gdf.crs = {'init':'epsg:4326'}
 263 | 
 264 | 	if not retain_invalid:
 265 | 		# drop all invalid geometries
 266 | 		gdf = gdf[gdf['geometry'].is_valid]
 267 | 
 268 | 	return gdf
 269 | 
 270 | 
 271 | def buildings_from_point(date, point, distance, retain_invalid=False):
 272 | 	"""
 273 | 	Get building footprints within some distance north, south, east, and west of
 274 | 	a lat-long point.
 275 | 	Parameters
 276 | 	----------
 277 | 	date : string
 278 | 		query the database at a certain timestamp
 279 | 	point : tuple
 280 | 		a lat-long point
 281 | 	distance : numeric
 282 | 		distance in meters
 283 | 	retain_invalid : bool
 284 | 		if False discard any building footprints with an invalid geometry
 285 | 	Returns
 286 | 	-------
 287 | 	GeoDataFrame
 288 | 	"""
 289 | 
 290 | 	bbox = ox.bbox_from_point(point=point, distance=distance)
 291 | 	north, south, east, west = bbox
 292 | 	return create_buildings_gdf(date=date, north=north, south=south, east=east, west=west, retain_invalid=retain_invalid)
 293 | 
 294 | 
 295 | def buildings_from_address(date, address, distance, retain_invalid=False):
 296 | 	"""
 297 | 	Get building footprints within some distance north, south, east, and west of
 298 | 	an address.
 299 | 	Parameters
 300 | 	----------
 301 | 	date : string
 302 | 		query the database at a certain timestamp
 303 | 	address : string
 304 | 		the address to geocode to a lat-long point
 305 | 	distance : numeric
 306 | 		distance in meters
 307 | 	retain_invalid : bool
 308 | 		if False discard any building footprints with an invalid geometry
 309 | 	Returns
 310 | 	-------
 311 | 	GeoDataFrame
 312 | 	"""
 313 | 
 314 | 	# geocode the address string to a (lat, lon) point
 315 | 	point = ox.geocode(query=address)
 316 | 
 317 | 	# get buildings within distance of this point
 318 | 	return buildings_from_point(date, point, distance, retain_invalid=retain_invalid)
 319 | 
 320 | 
 321 | def buildings_from_polygon(date, polygon, retain_invalid=False):
 322 | 	"""
 323 | 	Get building footprints within some polygon.
 324 | 	Parameters
 325 | 	----------
 326 | 	date : string
 327 | 		query the database at a certain timestamp
 328 | 	polygon : Polygon
 329 | 	retain_invalid : bool
 330 | 		if False discard any building footprints with an invalid geometry
 331 | 	Returns
 332 | 	-------
 333 | 	GeoDataFrame
 334 | 	"""
 335 | 
 336 | 	return create_buildings_gdf(date=date, polygon=polygon, retain_invalid=retain_invalid)
 337 | 
 338 | 
 339 | def buildings_from_place(date, place, which_result=1, retain_invalid=False):
 340 | 	"""
 341 | 	Get building footprints within the boundaries of some place.
 342 | 	Parameters
 343 | 	----------
 344 | 	date : string
 345 | 		query the database at a certain timestamp
 346 | 	place : string
 347 | 		the query to geocode to get geojson boundary polygon
 348 | 	which_result : int
 349 | 		result number to retrieve from geocode/download when using query string 
 350 | 	retain_invalid : bool
 351 | 		if False discard any building footprints with an invalid geometry
 352 | 	Returns
 353 | 	-------
 354 | 	GeoDataFrame
 355 | 	"""
 356 | 	city = ox.gdf_from_place(place, which_result=which_result)
 357 | 	polygon = city['geometry'].iloc[0]
 358 | 	return create_buildings_gdf(date=date, polygon=polygon, retain_invalid=retain_invalid)
 359 | 
 360 | #######################################################################
 361 | ### Street network graph
 362 | #######################################################################
 363 | 
 364 | def retrieve_route_graph(city_ref, date="", polygon=None, north=None, south=None, east=None, west=None, force_crs=None):
 365 | 	""" 
 366 | 	Retrieves street network graph for given `city_ref`
 367 | 	Loads the data if stored locally
 368 | 	Otherwise, it retrieves the graph from OpenStreetMap using the osmnx package
 369 | 	Input polygon or bounding box coordinates determine the region of interest
 370 | 
 371 | 	Parameters
 372 | 	----------
 373 | 	city_ref : string
 374 | 		name of the city
 375 | 	date : string
 376 | 		query the database at a certain timestamp
 377 | 	polygon : shapely.Polygon
 378 | 		polygon shape of input city
 379 | 	north : float
 380 | 		northern latitude of bounding box
 381 | 	south : float
 382 | 		southern latitude of bounding box
 383 | 	east : float
 384 | 		eastern longitude of bounding box
 385 | 	west : float
 386 | 		western longitude of bounding box
 387 | 	force_crs : dict
 388 | 		graph will be projected to input crs
 389 | 
 390 | 	Returns
 391 | 	----------
 392 | 	networkx.multidigraph
 393 | 		projected graph
 394 | 	"""
 395 | 	try:
 396 | 		G = ox.load_graphml(city_ref+'_network.graphml')
 397 | 		log( "Found graph for `"+city_ref+"` stored locally" )
 398 | 	except:
 399 | 		try:
 400 | 			if (not polygon is None):
 401 | 				G = graph_from_polygon(polygon, network_type='drive_service', date=date)
 402 | 			elif ( all( [north,south,east,west] ) ):
 403 | 				G = graph_from_bbox(north, south, east, west, network_type='drive_service', date=date)
 404 | 			else: # No inputs
 405 | 				log("Need an input to retrieve graph")
 406 | 				assert(False)
 407 | 
 408 | 			# Set graph name
 409 | 			G.graph['name'] = str(city_ref) + '_street_network' if not city_ref is None else 'street_network'
 410 | 
 411 | 			# Project graph
 412 | 			G = ox.project_graph(G, to_crs=force_crs)
 413 | 			
 414 | 			# Save street network as GraphML file
 415 | 			ox.save_graphml(G, filename=city_ref+'_network.graphml')
 416 | 			log( "Graph for `"+city_ref+"` has been retrieved and stored" )
 417 | 		except Exception as e:
 418 | 			log( "Osmnx graph could not be retrieved."+str(e), level=lg.ERROR )
 419 | 			return None
 420 | 	return G
 421 | 
 422 | def graph_from_polygon(polygon, network_type='all_private', simplify=True,
 423 | 					   retain_all=False, truncate_by_edge=False, name='unnamed',
 424 | 					   timeout=180, memory=None, date="",
 425 | 					   max_query_area_size=50*1000*50*1000,
 426 | 					   clean_periphery=True, infrastructure='way["highway"]'):
 427 | 	"""
 428 | 	Create a networkx graph from OSM data within the spatial boundaries of the
 429 | 	passed-in shapely polygon.
 430 | 	Parameters
 431 | 	----------
 432 | 	polygon : shapely Polygon or MultiPolygon
 433 | 		the shape to get network data within. coordinates should be in units of
 434 | 		latitude-longitude degrees.
 435 | 	network_type : string
 436 | 		what type of street network to get
 437 | 	simplify : bool
 438 | 		if true, simplify the graph topology
 439 | 	retain_all : bool
 440 | 		if True, return the entire graph even if it is not connected
 441 | 	truncate_by_edge : bool
 442 | 		if True retain node if it's outside bbox but at least one of node's
 443 | 		neighbors are within bbox
 444 | 	name : string
 445 | 		the name of the graph
 446 | 	timeout : int
 447 | 		the timeout interval for requests and to pass to API
 448 | 	memory : int
 449 | 		server memory allocation size for the query, in bytes. If none, server
 450 | 		will use its default allocation size
 451 | 	date : string
 452 | 		query the database at a certain timestamp
 453 | 	max_query_area_size : float
 454 | 		max size for any part of the geometry, in square degrees: any polygon
 455 | 		bigger will get divided up for multiple queries to API
 456 | 	clean_periphery : bool
 457 | 		if True (and simplify=True), buffer 0.5km to get a graph larger than
 458 | 		requested, then simplify, then truncate it to requested spatial extent
 459 | 	infrastructure : string
 460 | 		download infrastructure of given type (default is streets (ie, 'way["highway"]') but other
 461 | 		infrastructures may be selected like power grids (ie, 'way["power"~"line"]'))
 462 | 	Returns
 463 | 	-------
 464 | 	networkx multidigraph
 465 | 	"""
 466 | 
 467 | 	# verify that the geometry is valid and is a shapely Polygon/MultiPolygon
 468 | 	# before proceeding
 469 | 	if not polygon.is_valid:
 470 | 		raise ValueError('Shape does not have a valid geometry')
 471 | 	if not isinstance(polygon, (Polygon, MultiPolygon)):
 472 | 		raise ValueError('Geometry must be a shapely Polygon or MultiPolygon')
 473 | 
 474 | 	if clean_periphery and simplify:
 475 | 		# create a new buffered polygon 0.5km around the desired one
 476 | 		buffer_dist = 500
 477 | 		polygon_utm, crs_utm = ox.project_geometry(geometry=polygon)
 478 | 		polygon_proj_buff = polygon_utm.buffer(buffer_dist)
 479 | 		polygon_buffered, _ = ox.project_geometry(geometry=polygon_proj_buff, crs=crs_utm, to_latlong=True)
 480 | 
 481 | 		# get the network data from OSM,  create the buffered graph, then
 482 | 		# truncate it to the buffered polygon
 483 | 		response_jsons = osm_net_download(polygon=polygon_buffered, network_type=network_type,
 484 | 										  timeout=timeout, memory=memory,
 485 | 										  max_query_area_size=max_query_area_size,
 486 | 										  infrastructure=infrastructure)
 487 | 		G_buffered = ox.create_graph(response_jsons, name=name, retain_all=True, network_type=network_type)
 488 | 		G_buffered = ox.truncate_graph_polygon(G_buffered, polygon_buffered, retain_all=True, truncate_by_edge=truncate_by_edge)
 489 | 
 490 | 		# simplify the graph topology
 491 | 		G_buffered = ox.simplify_graph(G_buffered)
 492 | 
 493 | 		# truncate graph by polygon to return the graph within the polygon that
 494 | 		# caller wants. don't simplify again - this allows us to retain
 495 | 		# intersections along the street that may now only connect 2 street
 496 | 		# segments in the network, but in reality also connect to an
 497 | 		# intersection just outside the polygon
 498 | 		G = ox.truncate_graph_polygon(G_buffered, polygon, retain_all=retain_all, truncate_by_edge=truncate_by_edge)
 499 | 
 500 | 		# count how many street segments in buffered graph emanate from each
 501 | 		# intersection in un-buffered graph, to retain true counts for each
 502 | 		# intersection, even if some of its neighbors are outside the polygon
 503 | 		G.graph['streets_per_node'] = ox.count_streets_per_node(G_buffered, nodes=G.nodes())
 504 | 
 505 | 	else:
 506 | 		# download a list of API responses for the polygon/multipolygon
 507 | 		response_jsons = osm_net_download(polygon=polygon, network_type=network_type,
 508 | 										  timeout=timeout, memory=memory,
 509 | 										  max_query_area_size=max_query_area_size,
 510 | 										  infrastructure=infrastructure)
 511 | 
 512 | 		# create the graph from the downloaded data
 513 | 		G = ox.create_graph(response_jsons, name=name, retain_all=True, network_type=network_type)
 514 | 
 515 | 		# truncate the graph to the extent of the polygon
 516 | 		G = ox.truncate_graph_polygon(G, polygon, retain_all=retain_all, truncate_by_edge=truncate_by_edge)
 517 | 
 518 | 		# simplify the graph topology as the last step. don't truncate after
 519 | 		# simplifying or you may have simplified out to an endpoint beyond the
 520 | 		# truncation distance, in which case you will then strip out your entire
 521 | 		# edge
 522 | 		if simplify:
 523 | 			G = ox.simplify_graph(G)
 524 | 
 525 | 	log('graph_from_polygon() returning graph with {:,} nodes and {:,} edges'.format(len(list(G.nodes())), len(list(G.edges()))))
 526 | 	return G
 527 | 
 528 | def graph_from_bbox(north, south, east, west, network_type='all_private',
 529 | 					simplify=True, retain_all=False, truncate_by_edge=False,
 530 | 					name='unnamed', timeout=180, memory=None, date="",
 531 | 					max_query_area_size=50*1000*50*1000, clean_periphery=True,
 532 | 					infrastructure='way["highway"]'):
 533 | 	"""
 534 | 	Create a networkx graph from OSM data within some bounding box.
 535 | 	Parameters
 536 | 	----------
 537 | 	north : float
 538 | 		northern latitude of bounding box
 539 | 	south : float
 540 | 		southern latitude of bounding box
 541 | 	east : float
 542 | 		eastern longitude of bounding box
 543 | 	west : float
 544 | 		western longitude of bounding box
 545 | 	network_type : string
 546 | 		what type of street network to get
 547 | 	simplify : bool
 548 | 		if true, simplify the graph topology
 549 | 	retain_all : bool
 550 | 		if True, return the entire graph even if it is not connected
 551 | 	truncate_by_edge : bool
 552 | 		if True retain node if it's outside bbox but at least one of node's
 553 | 		neighbors are within bbox
 554 | 	name : string
 555 | 		the name of the graph
 556 | 	timeout : int
 557 | 		the timeout interval for requests and to pass to API
 558 | 	memory : int
 559 | 		server memory allocation size for the query, in bytes. If none, server
 560 | 		will use its default allocation size
 561 | 	date : string
 562 | 		query the database at a certain timestamp
 563 | 	max_query_area_size : float
 564 | 		max size for any part of the geometry, in square degrees: any polygon
 565 | 		bigger will get divided up for multiple queries to API
 566 | 	clean_periphery : bool
 567 | 		if True (and simplify=True), buffer 0.5km to get a graph larger than
 568 | 		requested, then simplify, then truncate it to requested spatial extent
 569 | 	infrastructure : string
 570 | 		download infrastructure of given type (default is streets (ie, 'way["highway"]') but other
 571 | 		infrastructures may be selected like power grids (ie, 'way["power"~"line"]'))
 572 | 	Returns
 573 | 	-------
 574 | 	networkx multidigraph
 575 | 	"""
 576 | 
 577 | 	if clean_periphery and simplify:
 578 | 		# create a new buffered bbox 0.5km around the desired one
 579 | 		buffer_dist = 500
 580 | 		polygon = Polygon([(west, north), (west, south), (east, south), (east, north)])
 581 | 		polygon_utm, crs_utm = ox.project_geometry(geometry=polygon)
 582 | 		polygon_proj_buff = polygon_utm.buffer(buffer_dist)
 583 | 		polygon_buff, _ = ox.project_geometry(geometry=polygon_proj_buff, crs=crs_utm, to_latlong=True)
 584 | 		west_buffered, south_buffered, east_buffered, north_buffered = polygon_buff.bounds
 585 | 
 586 | 		# get the network data from OSM then create the graph
 587 | 		response_jsons = osm_net_download(north=north_buffered, south=south_buffered,
 588 | 										  east=east_buffered, west=west_buffered,
 589 | 										  network_type=network_type, timeout=timeout,
 590 | 										  memory=memory, date=date,
 591 | 										  max_query_area_size=max_query_area_size,
 592 | 										  infrastructure=infrastructure)
 593 | 		G_buffered = ox.create_graph(response_jsons, name=name, retain_all=retain_all, network_type=network_type)
 594 | 		G = ox.truncate_graph_bbox(G_buffered, north, south, east, west, retain_all=True, truncate_by_edge=truncate_by_edge)
 595 | 
 596 | 		# simplify the graph topology
 597 | 		G_buffered = ox.simplify_graph(G_buffered)
 598 | 
 599 | 		# truncate graph by desired bbox to return the graph within the bbox
 600 | 		# caller wants
 601 | 		G = ox.truncate_graph_bbox(G_buffered, north, south, east, west, retain_all=retain_all, truncate_by_edge=truncate_by_edge)
 602 | 
 603 | 		# count how many street segments in buffered graph emanate from each
 604 | 		# intersection in un-buffered graph, to retain true counts for each
 605 | 		# intersection, even if some of its neighbors are outside the bbox
 606 | 		G.graph['streets_per_node'] = ox.count_streets_per_node(G_buffered, nodes=G.nodes())
 607 | 
 608 | 	else:
 609 | 		# get the network data from OSM
 610 | 		response_jsons = osm_net_download(north=north, south=south, east=east,
 611 | 										  west=west, network_type=network_type,
 612 | 										  timeout=timeout, memory=memory, date=date,
 613 | 										  max_query_area_size=max_query_area_size,
 614 | 										  infrastructure=infrastructure)
 615 | 
 616 | 		# create the graph, then truncate to the bounding box
 617 | 		G = ox.create_graph(response_jsons, name=name, retain_all=retain_all, network_type=network_type)
 618 | 		G = ox.truncate_graph_bbox(G, north, south, east, west, retain_all=retain_all, truncate_by_edge=truncate_by_edge)
 619 | 
 620 | 		# simplify the graph topology as the last step. don't truncate after
 621 | 		# simplifying or you may have simplified out to an endpoint
 622 | 		# beyond the truncation distance, in which case you will then strip out
 623 | 		# your entire edge
 624 | 		if simplify:
 625 | 			G = ox.simplify_graph(G)
 626 | 
 627 | 	log('graph_from_bbox() returning graph with {:,} nodes and {:,} edges'.format(len(list(G.nodes())), len(list(G.edges()))))
 628 | 	return  G
 629 | 
 630 | def osm_net_download(polygon=None, north=None, south=None, east=None, west=None,
 631 | 					 network_type='all_private', timeout=180, memory=None, date="",
 632 | 					 max_query_area_size=50*1000*50*1000, infrastructure='way["highway"]'):
 633 | 	"""
 634 | 	Download OSM ways and nodes within some bounding box from the Overpass API.
 635 | 	Parameters
 636 | 	----------
 637 | 	polygon : shapely Polygon or MultiPolygon
 638 | 		geographic shape to fetch the street network within
 639 | 	north : float
 640 | 		northern latitude of bounding box
 641 | 	south : float
 642 | 		southern latitude of bounding box
 643 | 	east : float
 644 | 		eastern longitude of bounding box
 645 | 	west : float
 646 | 		western longitude of bounding box
 647 | 	network_type : string
 648 | 		{'walk', 'bike', 'drive', 'drive_service', 'all', 'all_private'} what
 649 | 		type of street network to get
 650 | 	timeout : int
 651 | 		the timeout interval for requests and to pass to API
 652 | 	memory : int
 653 | 		server memory allocation size for the query, in bytes. If none, server
 654 | 		will use its default allocation size
 655 | 	date : string
 656 | 		query the database at a certain timestamp
 657 | 	max_query_area_size : float
 658 | 		max area for any part of the geometry, in the units the geometry is in:
 659 | 		any polygon bigger will get divided up for multiple queries to API
 660 | 		(default is 50,000 * 50,000 units [ie, 50km x 50km in area, if units are
 661 | 		meters])
 662 | 	infrastructure : string
 663 | 		download infrastructure of given type. default is streets, ie,
 664 | 		'way["highway"]') but other infrastructures may be selected like power
 665 | 		grids, ie, 'way["power"~"line"]'
 666 | 	Returns
 667 | 	-------
 668 | 	response_jsons : list
 669 | 	"""
 670 | 
 671 | 	# check if we're querying by polygon or by bounding box based on which
 672 | 	# argument(s) where passed into this function
 673 | 	by_poly = polygon is not None
 674 | 	by_bbox = not (north is None or south is None or east is None or west is None)
 675 | 	if not (by_poly or by_bbox):
 676 | 		raise ValueError('You must pass a polygon or north, south, east, and west')
 677 | 
 678 | 	# create a filter to exclude certain kinds of ways based on the requested
 679 | 	# network_type
 680 | 	osm_filter = ox.get_osm_filter(network_type)
 681 | 	response_jsons = []
 682 | 
 683 | 	# pass server memory allocation in bytes for the query to the API
 684 | 	# if None, pass nothing so the server will use its default allocation size
 685 | 	# otherwise, define the query's maxsize parameter value as whatever the
 686 | 	# caller passed in
 687 | 	if memory is None:
 688 | 		maxsize = ''
 689 | 	else:
 690 | 		maxsize = '[maxsize:{}]'.format(memory)
 691 | 
 692 | 	# define the query to send the API
 693 | 	# specifying way["highway"] means that all ways returned must have a highway
 694 | 	# key. the {filters} then remove ways by key/value. the '>' makes it recurse
 695 | 	# so we get ways and way nodes. maxsize is in bytes.
 696 | 	if by_bbox:
 697 | 		# turn bbox into a polygon and project to local UTM
 698 | 		polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
 699 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 700 | 
 701 | 		# subdivide it if it exceeds the max area size (in meters), then project
 702 | 		# back to lat-long
 703 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 704 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 705 | 		log('Requesting network data within bounding box from API in {:,} request(s)'.format(len(geometry)))
 706 | 		start_time = time.time()
 707 | 
 708 | 		# loop through each polygon rectangle in the geometry (there will only
 709 | 		# be one if original bbox didn't exceed max area size)
 710 | 		for poly in geometry:
 711 | 			# represent bbox as south,west,north,east and round lat-longs to 8
 712 | 			# decimal places (ie, within 1 mm) so URL strings aren't different
 713 | 			# due to float rounding issues (for consistent caching)
 714 | 			west, south, east, north = poly.bounds
 715 | 			query_template = date+'[out:json][timeout:{timeout}]{maxsize};({infrastructure}{filters}({south:.8f},{west:.8f},{north:.8f},{east:.8f});>;);out;'
 716 | 			query_str = query_template.format(north=north, south=south,
 717 | 											  east=east, west=west,
 718 | 											  infrastructure=infrastructure,
 719 | 											  filters=osm_filter,
 720 | 											  timeout=timeout, maxsize=maxsize)
 721 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 722 | 			response_jsons.append(response_json)
 723 | 		log('Got all network data within bounding box from API in {:,} request(s) and {:,.2f} seconds'.format(len(geometry), time.time()-start_time))
 724 | 
 725 | 	elif by_poly:
 726 | 		# project to utm, divide polygon up into sub-polygons if area exceeds a
 727 | 		# max size (in meters), project back to lat-long, then get a list of
 728 | 		# polygon(s) exterior coordinates
 729 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 730 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 731 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 732 | 		polygon_coord_strs = ox.get_polygons_coordinates(geometry)
 733 | 		log('Requesting network data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
 734 | 		start_time = time.time()
 735 | 
 736 | 		# pass each polygon exterior coordinates in the list to the API, one at
 737 | 		# a time
 738 | 		for polygon_coord_str in polygon_coord_strs:
 739 | 			query_template = date+'[out:json][timeout:{timeout}]{maxsize};({infrastructure}{filters}(poly:"{polygon}");>;);out;'
 740 | 			query_str = query_template.format(polygon=polygon_coord_str, infrastructure=infrastructure, filters=osm_filter, timeout=timeout, maxsize=maxsize)
 741 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 742 | 			response_jsons.append(response_json)
 743 | 		log('Got all network data within polygon from API in {:,} request(s) and {:,.2f} seconds'.format(len(polygon_coord_strs), time.time()-start_time))
 744 | 
 745 | 	return response_jsons
 746 | 
 747 | #######################################################################
 748 | ### Land use
 749 | #######################################################################
 750 | 
 751 | def osm_landuse_download(date="", polygon=None, north=None, south=None, east=None, west=None,
 752 | 					  timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
 753 | 	"""
 754 | 	Download OpenStreetMap landuse footprint data.
 755 | 	Parameters
 756 | 	----------
 757 | 	date : string
 758 | 		query the database at a certain timestamp
 759 | 	polygon : shapely Polygon or MultiPolygon
 760 | 		geographic shape to fetch the landuse footprints within
 761 | 	north : float
 762 | 		northern latitude of bounding box
 763 | 	south : float
 764 | 		southern latitude of bounding box
 765 | 	east : float
 766 | 		eastern longitude of bounding box
 767 | 	west : float
 768 | 		western longitude of bounding box
 769 | 	timeout : int
 770 | 		the timeout interval for requests and to pass to API
 771 | 	memory : int
 772 | 		server memory allocation size for the query, in bytes. If none, server
 773 | 		will use its default allocation size
 774 | 	max_query_area_size : float
 775 | 		max area for any part of the geometry, in the units the geometry is in:
 776 | 		any polygon bigger will get divided up for multiple queries to API
 777 | 		(default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
 778 | 		meters))
 779 | 	Returns
 780 | 	-------
 781 | 	list
 782 | 		list of response_json dicts
 783 | 	"""
 784 | 
 785 | 	# check if we're querying by polygon or by bounding box based on which
 786 | 	# argument(s) where passed into this function
 787 | 	by_poly = polygon is not None
 788 | 	by_bbox = not (north is None or south is None or east is None or west is None)
 789 | 	if not (by_poly or by_bbox):
 790 | 		raise ValueError('You must pass a polygon or north, south, east, and west')
 791 | 
 792 | 	response_jsons = []
 793 | 
 794 | 	# pass server memory allocation in bytes for the query to the API
 795 | 	# if None, pass nothing so the server will use its default allocation size
 796 | 	# otherwise, define the query's maxsize parameter value as whatever the
 797 | 	# caller passed in
 798 | 	if memory is None:
 799 | 		maxsize = ''
 800 | 	else:
 801 | 		maxsize = '[maxsize:{}]'.format(memory)
 802 | 
 803 | 	# define the query to send the API
 804 | 	if by_bbox:
 805 | 		# turn bbox into a polygon and project to local UTM
 806 | 		polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
 807 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 808 | 
 809 | 		# subdivide it if it exceeds the max area size (in meters), then project
 810 | 		# back to lat-long
 811 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 812 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 813 | 		log('Requesting landuse footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
 814 | 		start_time = time.time()
 815 | 
 816 | 		# loop through each polygon rectangle in the geometry (there will only
 817 | 		# be one if original bbox didn't exceed max area size)
 818 | 		for poly in geometry:
 819 | 			# represent bbox as south,west,north,east and round lat-longs to 8
 820 | 			# decimal places (ie, within 1 mm) so URL strings aren't different
 821 | 			# due to float rounding issues (for consistent caching)
 822 | 			west, south, east, north = poly.bounds
 823 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["landuse"]({south:.8f},'
 824 | 							  '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["landuse"]'
 825 | 							  '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;')
 826 | 			query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
 827 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 828 | 			response_jsons.append(response_json)
 829 | 		msg = ('Got all landuse footprints data within bounding box from '
 830 | 			   'API in {:,} request(s) and {:,.2f} seconds')
 831 | 		log(msg.format(len(geometry), time.time()-start_time))
 832 | 
 833 | 	elif by_poly:
 834 | 		# project to utm, divide polygon up into sub-polygons if area exceeds a
 835 | 		# max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates
 836 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 837 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 838 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 839 | 		polygon_coord_strs = ox.get_polygons_coordinates(geometry)
 840 | 		log('Requesting landuse footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
 841 | 		start_time = time.time()
 842 | 
 843 | 		# pass each polygon exterior coordinates in the list to the API, one at
 844 | 		# a time
 845 | 		for polygon_coord_str in polygon_coord_strs:
 846 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way'
 847 | 							  '(poly:"{polygon}")["landuse"];(._;>;);relation'
 848 | 							  '(poly:"{polygon}")["landuse"];(._;>;););out;')
 849 | 			query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
 850 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
 851 | 			response_jsons.append(response_json)
 852 | 		msg = ('Got all landuse footprints data within polygon from API in '
 853 | 			   '{:,} request(s) and {:,.2f} seconds')
 854 | 		log(msg.format(len(polygon_coord_strs), time.time()-start_time))
 855 | 
 856 | 	return response_jsons
 857 | 
 858 | def create_landuse_gdf(date="", polygon=None, north=None, south=None, east=None,
 859 | 						 west=None, retain_invalid=False):
 860 | 	"""
 861 | 	Get landuse footprint data from OSM then assemble it into a GeoDataFrame.
 862 | 	Parameters
 863 | 	----------
 864 | 	date : string
 865 | 		query the database at a certain timestamp
 866 | 	polygon : shapely Polygon or MultiPolygon
 867 | 		geographic shape to fetch the landuse footprints within
 868 | 	north : float
 869 | 		northern latitude of bounding box
 870 | 	south : float
 871 | 		southern latitude of bounding box
 872 | 	east : float
 873 | 		eastern longitude of bounding box
 874 | 	west : float
 875 | 		western longitude of bounding box
 876 | 	retain_invalid : bool
 877 | 		if False discard any landuse footprints with an invalid geometry
 878 | 	Returns
 879 | 	-------
 880 | 	GeoDataFrame
 881 | 	"""
 882 | 
 883 | 	responses = osm_landuse_download(date, polygon, north, south, east, west)
 884 | 
 885 | 	vertices = {}
 886 | 	for response in responses:
 887 | 		for result in response['elements']:
 888 | 			if 'type' in result and result['type']=='node':
 889 | 				vertices[result['id']] = {'lat' : result['lat'],
 890 | 										  'lon' : result['lon']}
 891 | 
 892 | 	landuses = {}
 893 | 	for response in responses:
 894 | 		for result in response['elements']:
 895 | 			if 'type' in result and result['type']=='way':
 896 | 				nodes = result['nodes']
 897 | 				try:
 898 | 					polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes])
 899 | 				except Exception:
 900 | 					log('Polygon has invalid geometry: {}'.format(nodes))
 901 | 				landuse = {'nodes' : nodes,
 902 | 							'geometry' : polygon}
 903 | 
 904 | 				if 'tags' in result:
 905 | 					for tag in result['tags']:
 906 | 						landuse[tag] = result['tags'][tag]
 907 | 
 908 | 				landuses[result['id']] = landuse
 909 | 
 910 | 	gdf = gpd.GeoDataFrame(landuses).T
 911 | 	gdf.crs = {'init':'epsg:4326'}
 912 | 
 913 | 	if not retain_invalid:
 914 | 		# drop all invalid geometries
 915 | 		gdf = gdf[gdf['geometry'].is_valid]
 916 | 
 917 | 	return gdf
 918 | 
 919 | #######################################################################
 920 | ### Points of interest
 921 | #######################################################################
 922 | 
 923 | def osm_pois_download(date="", polygon=None, north=None, south=None, east=None, west=None,
 924 | 					  timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
 925 | 	"""
 926 | 	Download OpenStreetMap POIs footprint data.
 927 | 	Parameters
 928 | 	----------
 929 | 	date : string
 930 | 		query the database at a certain timestamp
 931 | 	polygon : shapely Polygon or MultiPolygon
 932 | 		geographic shape to fetch the POIs footprints within
 933 | 	north : float
 934 | 		northern latitude of bounding box
 935 | 	south : float
 936 | 		southern latitude of bounding box
 937 | 	east : float
 938 | 		eastern longitude of bounding box
 939 | 	west : float
 940 | 		western longitude of bounding box
 941 | 	timeout : int
 942 | 		the timeout interval for requests and to pass to API
 943 | 	memory : int
 944 | 		server memory allocation size for the query, in bytes. If none, server
 945 | 		will use its default allocation size
 946 | 	max_query_area_size : float
 947 | 		max area for any part of the geometry, in the units the geometry is in:
 948 | 		any polygon bigger will get divided up for multiple queries to API
 949 | 		(default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
 950 | 		meters))
 951 | 	Returns
 952 | 	-------
 953 | 	list
 954 | 		list of response_json dicts
 955 | 	"""
 956 | 
 957 | 	# check if we're querying by polygon or by bounding box based on which
 958 | 	# argument(s) where passed into this function
 959 | 	by_poly = polygon is not None
 960 | 	by_bbox = not (north is None or south is None or east is None or west is None)
 961 | 	if not (by_poly or by_bbox):
 962 | 		raise ValueError('You must pass a polygon or north, south, east, and west')
 963 | 
 964 | 	response_jsons = []
 965 | 
 966 | 	# pass server memory allocation in bytes for the query to the API
 967 | 	# if None, pass nothing so the server will use its default allocation size
 968 | 	# otherwise, define the query's maxsize parameter value as whatever the
 969 | 	# caller passed in
 970 | 	if memory is None:
 971 | 		maxsize = ''
 972 | 	else:
 973 | 		maxsize = '[maxsize:{}]'.format(memory)
 974 | 
 975 | 	# define the query to send the API
 976 | 	if by_bbox:
 977 | 		# turn bbox into a polygon and project to local UTM
 978 | 		polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
 979 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
 980 | 
 981 | 		# subdivide it if it exceeds the max area size (in meters), then project
 982 | 		# back to lat-long
 983 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
 984 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
 985 | 		log('Requesting POIs footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
 986 | 		start_time = time.time()
 987 | 
 988 | 		# loop through each polygon rectangle in the geometry (there will only
 989 | 		# be one if original bbox didn't exceed max area size)
 990 | 		for poly in geometry:
 991 | 			# represent bbox as south,west,north,east and round lat-longs to 8
 992 | 			# decimal places (ie, within 1 mm) so URL strings aren't different
 993 | 			# due to float rounding issues (for consistent caching)
 994 | 			west, south, east, north = poly.bounds
 995 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((node["amenity"]({south:.8f},'
 996 | 				'{west:.8f},{north:.8f},{east:.8f}););(node["leisure"]({south:.8f},'
 997 | 				'{west:.8f},{north:.8f},{east:.8f}););(node["office"]({south:.8f},'
 998 | 				'{west:.8f},{north:.8f},{east:.8f}););(node["shop"]({south:.8f},'
 999 | 				'{west:.8f},{north:.8f},{east:.8f}););(node["sport"]({south:.8f},'
1000 | 				'{west:.8f},{north:.8f},{east:.8f}););(node["building"]({south:.8f},'
1001 | 				'{west:.8f},{north:.8f},{east:.8f});););out;')
1002 | 			query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
1003 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1004 | 			response_jsons.append(response_json)
1005 | 		msg = ('Got all POIs footprints data within bounding box from '
1006 | 			   'API in {:,} request(s) and {:,.2f} seconds')
1007 | 		log(msg.format(len(geometry), time.time()-start_time))
1008 | 
1009 | 	elif by_poly:
1010 | 		# project to utm, divide polygon up into sub-polygons if area exceeds a
1011 | 		# max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates
1012 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
1013 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
1014 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
1015 | 		polygon_coord_strs = ox.get_polygons_coordinates(geometry)
1016 | 		log('Requesting POIs footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
1017 | 		start_time = time.time()
1018 | 
1019 | 		# pass each polygon exterior coordinates in the list to the API, one at
1020 | 		# a time
1021 | 		for polygon_coord_str in polygon_coord_strs:
1022 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};('
1023 | 				'(node["amenity"](poly:"{polygon}"););'
1024 | 				'(node["leisure"](poly:"{polygon}"););'
1025 | 				'(node["office"](poly:"{polygon}"););'
1026 | 				'(node["shop"](poly:"{polygon}"););'
1027 | 				'(node["sport"](poly:"{polygon}"););'
1028 | 				'(node["building"](poly:"{polygon}");););out;')
1029 | 			query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
1030 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1031 | 			response_jsons.append(response_json)
1032 | 		msg = ('Got all POIs footprints data within polygon from API in '
1033 | 			   '{:,} request(s) and {:,.2f} seconds')
1034 | 		log(msg.format(len(polygon_coord_strs), time.time()-start_time))
1035 | 
1036 | 	return response_jsons
1037 | 
1038 | def create_pois_gdf(date="", polygon=None, north=None, south=None, east=None,
1039 | 						 west=None, retain_invalid=False):
1040 | 	"""
1041 | 	Get POIs footprint data from OSM then assemble it into a GeoDataFrame.
1042 | 	Parameters
1043 | 	----------
1044 | 	date : string
1045 | 		query the database at a certain timestamp
1046 | 	polygon : shapely Polygon or MultiPolygon
1047 | 		geographic shape to fetch the POIs footprints within
1048 | 	north : float
1049 | 		northern latitude of bounding box
1050 | 	south : float
1051 | 		southern latitude of bounding box
1052 | 	east : float
1053 | 		eastern longitude of bounding box
1054 | 	west : float
1055 | 		western longitude of bounding box
1056 | 	retain_invalid : bool
1057 | 		if False discard any POIs footprints with an invalid geometry
1058 | 	Returns
1059 | 	-------
1060 | 	GeoDataFrame
1061 | 	"""
1062 | 
1063 | 	responses = osm_pois_download(date, polygon, north, south, east, west)
1064 | 
1065 | 	vertices = {}
1066 | 	for response in responses:
1067 | 		for result in response['elements']:
1068 | 			if 'type' in result and result['type']=='node':
1069 | 
1070 | 				point = Point( result['lon'], result['lat'] )
1071 | 
1072 | 				POI = {'geometry' : point}
1073 | 
1074 | 				if 'tags' in result:
1075 | 					for tag in result['tags']:
1076 | 						POI[tag] = result['tags'][tag]
1077 | 
1078 | 				vertices[result['id']] = POI
1079 | 
1080 | 	gdf = gpd.GeoDataFrame(vertices).T
1081 | 	gdf.crs = {'init':'epsg:4326'}
1082 | 
1083 | 	if not retain_invalid:
1084 | 		try:
1085 | 			# drop all invalid geometries
1086 | 			gdf = gdf[gdf['geometry'].is_valid]
1087 | 		except: # Empty data frame
1088 | 			# Create a one-row data frame with null information (avoid later Spatial-Join crash)
1089 | 			if (polygon is not None): # Polygon given
1090 | 				point = polygon.centroid
1091 | 			else: # Bounding box
1092 | 				point = Point( (east+west)/2. , (north+south)/2. )
1093 | 			data = {"geometry":[point], "osm_id":[0]}
1094 | 			gdf = gpd.GeoDataFrame(data, crs={'init': 'epsg:4326'})
1095 | 
1096 | 	return gdf
1097 | 
1098 | #######################################################################
1099 | ### OSM Building parts
1100 | #######################################################################
1101 | 
1102 | def osm_bldg_part_download(date="", polygon=None, north=None, south=None, east=None, west=None,
1103 | 					  timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
1104 | 	"""
1105 | 	Download OpenStreetMap building parts footprint data.
1106 | 	Parameters
1107 | 	----------
1108 | 	date : string
1109 | 		query the database at a certain timestamp
1110 | 	polygon : shapely Polygon or MultiPolygon
1111 | 		geographic shape to fetch the building footprints within
1112 | 	north : float
1113 | 		northern latitude of bounding box
1114 | 	south : float
1115 | 		southern latitude of bounding box
1116 | 	east : float
1117 | 		eastern longitude of bounding box
1118 | 	west : float
1119 | 		western longitude of bounding box
1120 | 	timeout : int
1121 | 		the timeout interval for requests and to pass to API
1122 | 	memory : int
1123 | 		server memory allocation size for the query, in bytes. If none, server
1124 | 		will use its default allocation size
1125 | 	max_query_area_size : float
1126 | 		max area for any part of the geometry, in the units the geometry is in:
1127 | 		any polygon bigger will get divided up for multiple queries to API
1128 | 		(default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
1129 | 		meters))
1130 | 	Returns
1131 | 	-------
1132 | 	list
1133 | 		list of response_json dicts
1134 | 	"""
1135 | 
1136 | 	# check if we're querying by polygon or by bounding box based on which
1137 | 	# argument(s) where passed into this function
1138 | 	by_poly = polygon is not None
1139 | 	by_bbox = not (north is None or south is None or east is None or west is None)
1140 | 	if not (by_poly or by_bbox):
1141 | 		raise ValueError('You must pass a polygon or north, south, east, and west')
1142 | 
1143 | 	response_jsons = []
1144 | 
1145 | 	# pass server memory allocation in bytes for the query to the API
1146 | 	# if None, pass nothing so the server will use its default allocation size
1147 | 	# otherwise, define the query's maxsize parameter value as whatever the
1148 | 	# caller passed in
1149 | 	if memory is None:
1150 | 		maxsize = ''
1151 | 	else:
1152 | 		maxsize = '[maxsize:{}]'.format(memory)
1153 | 
1154 | 	# define the query to send the API
1155 | 	if by_bbox:
1156 | 		# turn bbox into a polygon and project to local UTM
1157 | 		polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
1158 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
1159 | 
1160 | 		# subdivide it if it exceeds the max area size (in meters), then project
1161 | 		# back to lat-long
1162 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
1163 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
1164 | 		log('Requesting building part footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
1165 | 		start_time = time.time()
1166 | 
1167 | 		# loop through each polygon rectangle in the geometry (there will only
1168 | 		# be one if original bbox didn't exceed max area size)
1169 | 		for poly in geometry:
1170 | 			# represent bbox as south,west,north,east and round lat-longs to 8
1171 | 			# decimal places (ie, within 1 mm) so URL strings aren't different
1172 | 			# due to float rounding issues (for consistent caching)
1173 | 			west, south, east, north = poly.bounds
1174 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["building:part"]({south:.8f},'
1175 | 							  '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["building:part"]'
1176 | 							  '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;')
1177 | 			query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
1178 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1179 | 			response_jsons.append(response_json)
1180 | 		msg = ('Got all building part footprints data within bounding box from '
1181 | 			   'API in {:,} request(s) and {:,.2f} seconds')
1182 | 		log(msg.format(len(geometry), time.time()-start_time))
1183 | 
1184 | 	elif by_poly:
1185 | 		# project to utm, divide polygon up into sub-polygons if area exceeds a
1186 | 		# max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates
1187 | 		geometry_proj, crs_proj = ox.project_geometry(polygon)
1188 | 		geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
1189 | 		geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
1190 | 		polygon_coord_strs = ox.get_polygons_coordinates(geometry)
1191 | 		log('Requesting building part footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
1192 | 		start_time = time.time()
1193 | 
1194 | 		# pass each polygon exterior coordinates in the list to the API, one at
1195 | 		# a time
1196 | 		for polygon_coord_str in polygon_coord_strs:
1197 | 			query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way'
1198 | 							  '(poly:"{polygon}")["building:part"];(._;>;);relation'
1199 | 							  '(poly:"{polygon}")["building:part"];(._;>;););out;')
1200 | 			query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
1201 | 			response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1202 | 			response_jsons.append(response_json)
1203 | 		msg = ('Got all building part footprints data within polygon from API in '
1204 | 			   '{:,} request(s) and {:,.2f} seconds')
1205 | 		log(msg.format(len(polygon_coord_strs), time.time()-start_time))
1206 | 
1207 | 	return response_jsons
1208 | 
1209 | 
1210 | 
1211 | def create_building_parts_gdf(date="", polygon=None, north=None, south=None, east=None,
1212 | 						 west=None, retain_invalid=False):
1213 | 	"""
1214 | 	Get building footprint data from OSM then assemble it into a GeoDataFrame.
1215 | 	If no building parts are retrieved, a default (null-data) point located at the centroid of the region of interest is created
1216 | 
1217 | 	Parameters
1218 | 	----------
1219 | 	date : string
1220 | 		query the database at a certain timestamp
1221 | 	polygon : shapely Polygon or MultiPolygon
1222 | 		geographic shape to fetch the building footprints within
1223 | 	north : float
1224 | 		northern latitude of bounding box
1225 | 	south : float
1226 | 		southern latitude of bounding box
1227 | 	east : float
1228 | 		eastern longitude of bounding box
1229 | 	west : float
1230 | 		western longitude of bounding box
1231 | 	retain_invalid : bool
1232 | 		if False discard any building footprints with an invalid geometry
1233 | 	Returns
1234 | 	-------
1235 | 	GeoDataFrame
1236 | 	"""
1237 | 
1238 | 	responses = osm_bldg_part_download(date, polygon, north, south, east, west)
1239 | 
1240 | 	vertices = {}
1241 | 	for response in responses:
1242 | 		for result in response['elements']:
1243 | 			if 'type' in result and result['type']=='node':
1244 | 				vertices[result['id']] = {'lat' : result['lat'],
1245 | 										  'lon' : result['lon']}
1246 | 
1247 | 	buildings = {}
1248 | 	for response in responses:
1249 | 		for result in response['elements']:
1250 | 			if 'type' in result and result['type']=='way':
1251 | 				nodes = result['nodes']
1252 | 				try:
1253 | 					polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes])
1254 | 				except Exception:
1255 | 					log('Polygon has invalid geometry: {}'.format(nodes))
1256 | 				building = {'nodes' : nodes,
1257 | 							'geometry' : polygon}
1258 | 
1259 | 				if 'tags' in result:
1260 | 					for tag in result['tags']:
1261 | 						building[tag] = result['tags'][tag]
1262 | 
1263 | 				buildings[result['id']] = building
1264 | 
1265 | 	gdf = gpd.GeoDataFrame(buildings).T
1266 | 	gdf.crs = {'init':'epsg:4326'}
1267 | 
1268 | 	if not retain_invalid:
1269 | 		try:
1270 | 			# drop all invalid geometries
1271 | 			gdf = gdf[gdf['geometry'].is_valid]
1272 | 		except: # Empty data frame
1273 | 			# Create a one-row data frame with null information (avoid later Spatial-Join crash)
1274 | 			if (polygon is not None): # Polygon given
1275 | 				point = polygon.centroid
1276 | 			else: # Bounding box
1277 | 				point = Point( (east+west)/2. , (north+south)/2. )
1278 | 			# Data as records
1279 | 			data = {"geometry":[point], "osm_id":[0], "building:part":["yes"], "height":[""]}
1280 | 			gdf = gpd.GeoDataFrame(data, crs={'init': 'epsg:4326'})
1281 | 
1282 | 	return gdf


--------------------------------------------------------------------------------
/urbansprawl/osm/surface.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import osmnx as ox
  7 | import pandas as pd
  8 | import geopandas as gpd
  9 | import numpy as np
 10 | from shapely.geometry import Polygon
 11 | 
 12 | from osmnx import log
 13 | 
 14 | from .tags import height_tags, activity_classification
 15 | from .classification import aggregate_classification
 16 | 
 17 | ############################################
 18 | ### Land uses surface association
 19 | ############################################
 20 | 
 21 | def get_composed_classification(building, df_pois):
 22 | 	""" 
 23 | 	Retrieve the composed classification given the building's containing Points of Interest
 24 | 
 25 | 	Parameters
 26 | 	----------
 27 | 	building : geopandas.GeoSeries
 28 | 		input building
 29 | 	df_pois : geopandas.GeoDataFrame
 30 | 		Points of Interest contained in the building
 31 | 
 32 | 	Returns
 33 | 	----------
 34 | 	geopandas.GeoSeries
 35 | 		returns a composed classification building
 36 | 	"""
 37 | 	# POIs aggregated classification
 38 | 	pois_classification = aggregate_classification( df_pois.classification.values )
 39 | 	# Composed building-POIs classification
 40 | 	composed_classification = aggregate_classification( [building.classification, pois_classification] )
 41 | 	# Composed activity categories
 42 | 	try:
 43 | 		composed_activity_category = list( set( [element for list_ in df_pois.activity_category for element in list_] + building.activity_category ) )
 44 | 	except: # df_pois.activity_category.isnull().all() Returns True
 45 | 		composed_activity_category = building.activity_category
 46 | 	# Create a Series for a new row with composed classification
 47 | 	composed_row = pd.Series( [building.geometry,composed_classification,composed_activity_category,building.building_levels], index=["geometry","classification","activity_category","building_levels"])
 48 | 	return composed_row
 49 | 
 50 | def sum_landuses(x, landuses_m2, default_classification = None, mixed_building_first_floor_activity=True):
 51 | 	""" 
 52 | 	Associate to each land use its correspondent surface use for input building
 53 | 	Mixed-uses building Option 1:
 54 | 		First floor: Activity use
 55 | 		Rest: residential use
 56 | 	Mixed-uses building Option 2:
 57 | 		Half used for Activity uses, the other half Residential use
 58 | 
 59 | 
 60 | 	Parameters
 61 | 	----------
 62 | 	x : geopandas.GeoSeries
 63 | 		input building
 64 | 	landuses_m2 : dict
 65 | 		squared meter surface associated to each land use
 66 | 	default_classification : pandas.Series
 67 | 		main building land use classification and included activity types
 68 | 	mixed_building_first_floor_activity : Boolean
 69 | 		if True: Associates building's first floor to activity uses and the rest to residential uses
 70 | 		if False: Associates half of the building's area to each land use (Activity and Residential)
 71 | 
 72 | 	Returns
 73 | 	----------
 74 | 
 75 | 	"""
 76 | 	# Empty
 77 | 	if ( not x.get("geometry")): return
 78 | 	# Mixed building assumption: First level for activity uses, the rest residential use	
 79 | 	if (x["classification"] is "activity"): # Sum activity use
 80 | 		landuses_m2["activity"] += x["geometry"].area * x["building_levels"]
 81 | 		# Sum activity category m2
 82 | 		area_per_activity_category = x["geometry"].area * x["building_levels"] / len( x["activity_category"] )
 83 | 		for activity_type in x["activity_category"]:
 84 | 			landuses_m2[activity_type] += area_per_activity_category		
 85 | 	elif (x["classification"] is "mixed"): # Sum activity and residential use
 86 | 
 87 | 		if (x["building_levels"] > 1) and (mixed_building_first_floor_activity): # More than one level
 88 | 			# First floor
 89 | 			landuses_m2["residential"] += x["geometry"].area * ( x["building_levels"] - 1 )
 90 | 			# Rest of the building
 91 | 			landuses_m2["activity"] += x["geometry"].area
 92 | 			area_per_activity_category = x["geometry"].area
 93 | 		
 94 | 		else: # One level building
 95 | 			landuses_m2["residential"] += x["geometry"].area * x["building_levels"] / 2.
 96 | 			landuses_m2["activity"] += x["geometry"].area * x["building_levels"] / 2.
 97 | 			area_per_activity_category = ( x["geometry"].area * x["building_levels"] / 2. ) / len( x["activity_category"] )
 98 | 		
 99 | 		# Sum activity category m2		
100 | 		for activity_type in x["activity_category"]:
101 | 			landuses_m2[activity_type] += area_per_activity_category			
102 | 	elif (x["classification"] == "residential"): # Sum residential use
103 | 		landuses_m2["residential"] += x["geometry"].area * x["building_levels"]
104 | 	else: 
105 | 		# Row does not contain a classification, use given default classification creating a new dict
106 | 		dict_x = {"classification":default_classification.classification, "geometry":x.geometry, "building_levels":x.building_levels, "activity_category":default_classification.activity_category}
107 | 		# Recursive call
108 | 		sum_landuses(dict_x, landuses_m2, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
109 | 
110 | 
111 | def calculate_landuse_m2(building, mixed_building_first_floor_activity=True):
112 | 	""" 
113 | 	Calculate the total squared meters associated to residential and activity uses for input building
114 | 	In addition, surface usage for each activity types is performed
115 | 
116 | 	Parameters
117 | 	----------
118 | 	building : geopandas.GeoSeries
119 | 		input building
120 | 	mixed_building_first_floor_activity : Boolean
121 | 		if True: Associates building's first floor to activity uses and the rest to residential uses
122 | 		if False: Associates half of the building's area to each land use (Activity and Residential)
123 | 
124 | 	Returns
125 | 	----------
126 | 	dict
127 | 		contains the total associated surface to each land use key
128 | 	"""
129 | 	# Initialize
130 | 	landuse_m2 = {}
131 | 	landuse_m2["activity"] = 0
132 | 	landuse_m2["residential"] = 0
133 | 	for activity_type in list( activity_classification.keys() ):
134 | 		landuse_m2[activity_type] = 0
135 | 	
136 | 	# Get the composed classification from input building + containing POIs
137 | 	building_composed_classification = get_composed_classification(building, building.pois_full_parts)
138 | 	
139 | 	def no_min_level_geometry(building_parts):
140 | 		"""
141 | 		Returns building parts with no min. level associated
142 | 		"""
143 | 		def no_min_level_tag(x): # Buildings starts from a specific num level?
144 | 			if ( x.get("building:min_level") or x.get("min_level") or x.get("building:min_height") or x.get("min_height") ):
145 | 				return True
146 | 			else:
147 | 				return False
148 | 		# Get the geometries of the contained buildings with no height/level tags available
149 | 		geometries = building_parts.loc[ building_parts.height_tags.apply(lambda x: no_min_level_tag(x) ) ].geometry
150 | 		
151 | 		# Create the union of those geometries
152 | 		no_min_level_geom = Polygon()
153 | 		for shape in geometries.values:
154 | 			no_min_level_geom = no_min_level_geom.union(shape)
155 | 
156 | 		# Return the final shape
157 | 		return no_min_level_geom
158 | 	
159 | 	# Remove from the main building geometry, those building parts geometries that do not contain a minimum level/height: Avoid duplicating first level surface
160 | 	building_composed_classification.geometry = building_composed_classification.geometry.difference( no_min_level_geometry(building.full_parts) )
161 | 
162 | 	# Sum land uses for main building
163 | 	sum_landuses(building_composed_classification, landuse_m2, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
164 | 	
165 | 	# Sum land uses for building parts. If no classification given, use the building's land use
166 | 	building.full_parts.apply(lambda x: sum_landuses(x, landuse_m2, building_composed_classification[["classification","activity_category"]], mixed_building_first_floor_activity=mixed_building_first_floor_activity), axis=1)
167 | 	
168 | 	return landuse_m2
169 | 
170 | def associate_levels(df_osm, default_height, meters_per_level):
171 | 	""" 
172 | 	Calculate the effective number of levels for each input building
173 | 	Under missing tag data, default values are used
174 | 	A column ['building_levels'] is added to the data frame
175 | 
176 | 	Parameters
177 | 	----------
178 | 	df_osm : geopandas.GeoDataFrame
179 | 		input data frame
180 | 	default_height : float
181 | 		default building height in meters
182 | 	meters_per_level : float
183 | 		default meters per level
184 | 
185 | 	Returns
186 | 	----------
187 | 
188 | 	"""
189 | 	def levels_from_height(height, meters_per_level):
190 | 		"""
191 | 		Returns estimated number of levels given input height (meters)
192 | 		"""
193 | 		levels = abs( round( height / meters_per_level ) )
194 | 		if (levels >= 1):
195 | 			return levels
196 | 		else: # By default: 1 level
197 | 			assert( height > 0 )
198 | 			return 1
199 | 
200 | 	def associate_level(x):
201 | 		"""
202 | 		Associates the number of levels to input building given its height tags information
203 | 		Returns the absolute value in order to consider the cases of underground levels
204 | 		"""
205 | 		
206 | 		# No height tags available
207 | 		if not ( type(x) == dict ): 
208 | 			# Default values
209 | 			number_levels = levels_from_height(default_height, meters_per_level)
210 | 			return number_levels
211 | 
212 | 		# Buildings starts from a specific num level?
213 | 		if ( x.get("building:min_level") ): # building:min_level
214 | 			min_level = x["building:min_level"]
215 | 		elif ( x.get("min_level") ): # min_level
216 | 			min_level = x["min_level"]
217 | 		######################### Height based
218 | 		elif ( x.get("building:min_height") ): # min_level
219 | 			min_level = levels_from_height( x["building:min_height"], meters_per_level )
220 | 		elif ( x.get("min_height") ): # min_level
221 | 			min_level = levels_from_height( x["min_height"], meters_per_level )
222 | 		else:
223 | 			min_level = 0
224 | 
225 | 		######################### Levels based
226 | 		if ( x.get("building:levels") ): # Number of building:levels given
227 | 			number_levels = abs( x["building:levels"] - min_level )
228 | 		elif ( x.get("levels") ): # Number of levels given
229 | 			number_levels = abs( x["levels"] - min_level )
230 | 		######################### Height based
231 | 		elif ( x.get("building:height") ): # building:height given
232 | 			number_levels = abs( levels_from_height( x["building:height"], meters_per_level ) - min_level )
233 | 		elif ( x.get("height") ): # height given
234 | 			number_levels = abs( levels_from_height( x["height"], meters_per_level ) - min_level )
235 | 		else: # No information given
236 | 			number_levels = levels_from_height(default_height, meters_per_level)
237 | 
238 | 		assert( number_levels >= 0 )
239 | 		if (number_levels == 0): # By default at least 1 level 
240 | 			number_levels = 1
241 | 		return number_levels
242 | 
243 | 	df_osm["building_levels"] = df_osm.height_tags.apply( lambda x: associate_level(x) )
244 | 
245 | def classification_sanity_check(building):
246 | 	"""
247 | 	Performs a sanity check in order to achieve coherence between the building's classification and the amount of M^2 associated to each land use
248 | 	Example: A building's classification could be 'residential', but contains its building parts (occupying 100% of the area, then all land uses M^2 associated to this land use) contain an acitivty use
249 | 	This would impose a problem of coherence between the classification and its surface land use
250 | 
251 | 	Parameters
252 | 	----------
253 | 	building : geopandas.GeoSeries
254 | 		one row denoting the building's information
255 | 
256 | 	Returns
257 | 	----------
258 | 	string
259 | 		returns the coherent classification
260 | 	"""
261 | 	if ( building.landuses_m2["residential"] > 0 ):
262 | 		if ( building.landuses_m2["activity"] > 0 ): # Mixed use
263 | 			return "mixed" 
264 | 		else: # Residential use
265 | 			return "residential"
266 | 	else: # Activity use
267 | 		return "activity"
268 | 
269 | def compute_landuses_m2(df_osm_built, df_osm_building_parts, df_osm_pois, default_height=6, meters_per_level=3, mixed_building_first_floor_activity=True):
270 | 	"""
271 | 	Determine the effective number of levels per building or building parts
272 | 	Calculate the amount of squared meters associated to residential and activity uses per building
273 | 	In addition, surface usage for each activity types is performed
274 | 
275 | 	Parameters
276 | 	----------
277 | 	df_osm_built : geopandas.GeoDataFrame
278 | 		OSM Buildings
279 | 	df_osm_building_parts : geopandas.GeoDataFrame
280 | 		OSM building parts
281 | 	df_osm_pois : geopandas.GeoDataFrame
282 | 		OSM Points of interest
283 | 	default_height : float
284 | 		default building height in meters
285 | 	meters_per_level : float
286 | 		default meters per level
287 | 	mixed_building_first_floor_activity : Boolean
288 | 		if True: Associates building's first floor to activity uses and the rest to residential uses
289 | 		if False: Associates half of the building's area to each land use (Activity and Residential)
290 | 
291 | 	Returns
292 | 	----------
293 | 
294 | 	"""
295 | 	# Associate the number of levels to each building / building part
296 | 	associate_levels(df_osm_built, default_height=default_height, meters_per_level=meters_per_level)
297 | 	associate_levels(df_osm_building_parts, default_height=default_height, meters_per_level=meters_per_level)
298 | 
299 | 	##################
300 | 	# Calculate for each building, the M^2 associated to each land usage considering building parts (area calculated given UTM coordinates projection assumption)
301 | 	##################
302 | 	
303 | 	# Associate the complete data frame of containing building parts
304 | 	col_interest = ["geometry","activity_category", "classification", "key_value", "height_tags", "building_levels"]
305 | 	df_osm_built["full_parts"] = df_osm_built.containing_parts.apply(lambda x: df_osm_building_parts.loc[x, col_interest ] if isinstance(x, list) else df_osm_building_parts.loc[ [], col_interest ] )
306 | 	
307 | 	# Associate the complete POIs contained in buildings
308 | 	col_interest = ["geometry","activity_category", "classification", "key_value"]
309 | 	df_osm_built["pois_full_parts"] = df_osm_built.containing_poi.apply(lambda x: df_osm_pois.loc[x, col_interest] if isinstance(x, list) else df_osm_pois.loc[ [], col_interest] )
310 | 
311 | 	# Calculate m2's for each land use, plus for each activity category
312 | 	df_osm_built["landuses_m2"] = df_osm_built.apply(lambda x: calculate_landuse_m2(x, mixed_building_first_floor_activity=mixed_building_first_floor_activity), axis=1 )
313 | 
314 | 	# Drop added full parts
315 | 	df_osm_built.drop( ["full_parts"], axis=1, inplace=True )
316 | 	df_osm_built.drop( ["pois_full_parts"], axis=1, inplace=True )
317 | 
318 | 	# Sanity check: For each building land use classification, its M^2 associated to these land uses must be greater than 1
319 | 	df_osm_built["classification"] = df_osm_built.apply(lambda x: classification_sanity_check(x), axis=1 )
320 | 


--------------------------------------------------------------------------------
/urbansprawl/osm/tags.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | # Height tags
  7 | height_tags = [ "min_height", "height", "min_level", "levels", "building:min_height", "building:height", "building:min_level", "building:levels", "building:levels:underground" ]
  8 | 
  9 | # Columns of interest corresponding to OSM keys 
 10 | columns_osm_tag = [ "amenity", "landuse", "leisure", "shop", "man_made", "building", "building:use", "building:part" ]
 11 | 
 12 | # Building parts which need to be filtered
 13 | building_parts_to_filter = ["no", "roof"]
 14 | 
 15 | #################################################################
 16 | ### Classify uses according to OpenStreetMap wiki
 17 | #################################################################
 18 | """
 19 | Possible tags classification:
 20 | 	residential: Defines a residential land use
 21 | 	activity: Defines any activity land use
 22 | 	other: Defines any non-residential and non-activity use
 23 | 	infer: Defines any condition where an inference needs to be done (using land use polygons containing them)
 24 | 
 25 | Possible activities classifications: 
 26 | 	shop, leisure/amenity, commercial/industrial
 27 | """
 28 | 
 29 | #################################################################
 30 | # Key:Value classification
 31 | key_classification = {}
 32 | # Land unse key:value classification
 33 | landuse_classification = {}
 34 | # Specific activity category classification
 35 | activity_classification = {"leisure/amenity":[], "shop":[], "commercial/industrial":[]}
 36 | 
 37 | #############
 38 | ### Amenity
 39 | #############
 40 | amenities_sustenance = ['bar','pub','restaurant','biergarten','cafe','fast_food','food_court','ice_cream']
 41 | amenities_education = ['college','kindergarten','library','public_bookcase','school','music_school','driving_school','language_school','university']
 42 | amenities_transportation = ['fuel','bicycle_rental','bus_station','car_rental','taxi','car_wash','ferry_terminal']
 43 | amenities_financial = ['atm','bank','bureau_de_change']
 44 | amenities_healthcare = ['baby_hatch','clinic','dentist','doctors','hospital','nursing_home','pharmacy','social_facility','veterinary']
 45 | amenities_entertainment = ['arts_centre','brothel','casino','cinema','community_centre','fountain','gambling','nightclub','planetarium','social_centre','stripclub','studio','swingerclub','theatre']
 46 | amenities_others = ['animal_boarding','animal_shelter','courthouse','coworking_space','crematorium','dive_centre','dojo','embassy','fire_station','gym','internet_cafe','marketplace','police','post_office','townhall']
 47 | 
 48 | amenities_activities = amenities_sustenance + amenities_education + amenities_transportation + amenities_financial + amenities_healthcare + amenities_entertainment + amenities_others
 49 | 
 50 | key_classification["activity_amenity"] = amenities_activities
 51 | activity_classification["leisure/amenity"] += amenities_activities
 52 | 
 53 | #############
 54 | ### Shop
 55 | #############
 56 | shop_other = ["bookmaker","copyshop","dry_cleaning","e-cigarette","funeral_directors","laundry","money_lender","pawnbroker","pet","pyrotechnics","religion","tobacco","toys","travel_agency","vacant","weapons","user defined"]
 57 | shop_gifts = ["anime","books","gift","lottery","newsagent","stationery","ticket"]
 58 | shop_art = ["art","collector","craft","frame","games","model","music","musical_instrument","photo","camera","trophy","video","video_games"]
 59 | shop_sports = ["bicycle","car","car_repair","car_parts","fuel","fishing","free_flying","hunting","motorcycle","outdoor","scuba_diving","sports","swimming_pool","tyres"]
 60 | shop_electronics = ["computer","electronics","hifi","mobile_phone","radiotechnics","vacuum_cleaner"]
 61 | shop_furniture = ["antiques","bed","candles","carpet","curtain","furniture","interior_decoration","kitchen","lamps","tiles","window_blind"]
 62 | shop_household = ["agrarian","bathroom_furnishing","doityourself","electrical","energy","fireplace","florist","garden_centre","garden_furniture","gas","glaziery","hardware","houseware","locksmith","paint","security","trade"]
 63 | shop_health = ["beauty","chemist","cosmetics","drugstore","erotic","hairdresser","hairdresser_supply","hearing_aids","herbalist","massage","medical_supply","nutrition_supplements","optician","perfumery","tattoo"]
 64 | shop_charity = ["charity","second_hand","variety_store"]
 65 | shop_clothing = ["baby_goods","bag","boutique","clothes","fabric","fashion","jewelry","leather","shoes","tailor","watches"]
 66 | shop_mall = ["department_store","general","kiosk","mall","supermarket"]
 67 | shop_food = ["alcohol","bakery","beverages","brewing_supplies","butcher","cheese","chocolate","coffee","confectionery","convenience","deli","dairy","farm","greengrocer","ice_cream","organic","pasta","pastry","seafood","spices","tea","wine"]
 68 | 
 69 | shop_activities = shop_other + shop_gifts + shop_art + shop_sports + shop_electronics + shop_furniture + shop_household + shop_health + shop_charity + shop_clothing + shop_mall + shop_food + ['shop']
 70 | 
 71 | key_classification["activity_shop"] = shop_activities
 72 | activity_classification["shop"] += shop_activities
 73 | 
 74 | #############
 75 | ### Leisure
 76 | #############
 77 | #Not tagged as activity: dog_park, bird_hide, bandstand, firepit, fishing, garden, golf_course, marina, nature_reserve, park, playground, slipway, track, wildlife_hide
 78 | leisure_activies = ['adult_gaming_centre','amusement_arcade','beach_resort','dance','escape_game','fitness_centre','hackerspace','horse_riding','ice_rink','miniature_golf','pitch','sauna','sports_centre','stadium','summer_camp','swimming_area','swimming_pool','water_park']
 79 | 
 80 | key_classification["activity_leisure"] = leisure_activies
 81 | activity_classification["leisure/amenity"] += leisure_activies
 82 | 
 83 | #############
 84 | ### Man made
 85 | #############
 86 | man_made_activities = ["offshore_platform", "works", "wastewater_plant", "water_works", "kiln", "monitoring_station", "observatory"]
 87 | man_made_other = ['adit', 'beacon', 'breakwater', 'bridge', 'bunker_silo', 'campanile', 'chimney', 'communications_tower', 'crane', 'cross', 'cutline', 'clearcut', 'embankment', 'dovecote', 'dyke', '	flagpole', 'gasometer', 'groyne', 'lighthouse', 'mast', 'mineshaft', 'obelisk', 'petroleum_well', 'pier', 'pipeline', 'pumping_station', 'reservoir_covered', 'silo', 'snow_fence', 'snow_net', 'storage_tank', 'street_cabinet', 'surveillance', 'survey_point', 'telscope', 'tower', 'watermill', 'water_tower', 'water_well', 'water_tap', 'wildlife_crossing', 'windmill']
 88 | 
 89 | key_classification["activity_man_made"] = man_made_activities
 90 | key_classification["other_man_made"] = man_made_other
 91 | activity_classification["commercial/industrial"] += man_made_activities
 92 | 
 93 | #############
 94 | ### Building
 95 | #############
 96 | building_infer = ['yes']
 97 | building_other = ['barn','bridge','bunker','cabin','cowshed','digester','garage','garages','farm_auxiliary','greenhouse','hut','roof','shed','stable','sty','transformer_tower','service','ruins']
 98 | building_related_activities = ['hangar', 'stable', 'cowshed', 'digester', 'construction'] # From building_other related to activities
 99 | building_shop = ['shop','kiosk']
100 | building_commercial = ['commercial','office','industrial','retail','warehouse'] + ['port']
101 | building_civic_amenity = ['cathedral','chapel','church','mosque','temple','synagogue','shrine','civic','hospital','school','stadium','train_station','transportation','university','public']
102 | 
103 | building_activities = building_commercial + building_civic_amenity + building_related_activities + building_shop
104 | building_residential = ['hotel','farm','apartment','apartments','dormitory','house','residential','retirement_home','terrace','houseboat','bungalow','static_caravan','detached']
105 | 
106 | key_classification["activity_building"] = building_activities
107 | key_classification["residential_building"] = building_residential
108 | key_classification["infer_building"] = building_infer
109 | key_classification["other_building"] = building_other
110 | activity_classification["commercial/industrial"] += building_commercial + building_related_activities
111 | activity_classification["leisure/amenity"] += building_civic_amenity
112 | activity_classification["shop"] += building_shop
113 | 
114 | #############
115 | ### Building:use
116 | #############
117 | key_classification["activity_building:use"] = building_activities
118 | key_classification["residential_building:use"] = building_residential
119 | 
120 | #############
121 | ### Building:part
122 | #############
123 | key_classification["activity_building:part"] = building_activities
124 | key_classification["residential_building:part"] = building_residential
125 | 
126 | #############
127 | ### Land use
128 | #############
129 | landuse_activities = building_activities + shop_activities + amenities_activities + leisure_activies + ['quarry','salt_pond','military']
130 | landuse_residential = ['residential']
131 | 
132 | # Land usage not related to residential or activity uses
133 | landuse_other_related = ['cemetery', 'landfill', 'railway']
134 | landuse_water = ['water', 'reservoir', 'basin']
135 | landuse_green = ['allotments','conservation', 'farmland', 'farmyard','forest','grass', 'greenfield', 'greenhouse_horticulture','meadow','orchard','pasture','peat_cutting','plant_nursery','recreation_ground','village_green','vineyard']
136 | landuse_other = landuse_other_related + landuse_water + landuse_green
137 | 
138 | landuse_classification["activity"] = landuse_activities
139 | landuse_classification["residential"] = landuse_residential
140 | landuse_classification["other"] = landuse_other
141 | 
142 | activity_classification["commercial/industrial"] += ['quarry','salt_pond','military']
143 | ####################################################################################


--------------------------------------------------------------------------------
/urbansprawl/osm/utils.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import osmnx as ox
  7 | import pandas as pd
  8 | import geopandas as gpd
  9 | import numpy as np
 10 | 
 11 | from .tags import height_tags
 12 | 
 13 | from ..settings import storage_folder
 14 | 
 15 | # Format for load/save the geo-data ['geojson','shp']
 16 | geo_format = 'geojson' # 'shp'
 17 | geo_driver = 'GeoJSON' # 'ESRI Shapefile'
 18 | 
 19 | ###################################################
 20 | ### I/O utils
 21 | ###################################################
 22 | 
 23 | def get_dataframes_filenames(city_ref_file):
 24 | 	"""
 25 | 	Get data frame file names for input city
 26 | 
 27 | 	Parameters
 28 | 	----------
 29 | 	city_ref_file : string
 30 | 		name of input city
 31 | 
 32 | 	Returns
 33 | 	----------
 34 | 	[ string, string, string ]
 35 | 		returns filenames for buildings, building parts, and points of interest
 36 | 	
 37 | 	"""
 38 | 	import os
 39 | 	if not(os.path.isdir(storage_folder)): 
 40 | 		os.makedirs(storage_folder)
 41 | 	geo_poly_file = storage_folder+"/"+city_ref_file+"_buildings."+geo_format
 42 | 	geo_poly_parts_file = storage_folder+"/"+city_ref_file+"_building_parts."+geo_format
 43 | 	geo_point_file = storage_folder+"/"+city_ref_file+"_poi."+geo_format
 44 | 	return geo_poly_file, geo_poly_parts_file, geo_point_file
 45 | 
 46 | def load_geodataframe(geo_filename):
 47 | 	""" 
 48 | 	Load input GeoDataFrame
 49 | 
 50 | 	Parameters
 51 | 	----------
 52 | 	geo_filename : string
 53 | 		input GeoDataFrame filename
 54 | 
 55 | 	Returns
 56 | 	----------
 57 | 	geopandas.GeoDataFrame
 58 | 		loaded data
 59 | 
 60 | 	"""
 61 | 	# Load using geopandas
 62 | 	df_osm_data = gpd.read_file(geo_filename)
 63 | 	# Set None as NaN
 64 | 	df_osm_data.fillna(value=np.nan, inplace=True)
 65 | 	# Replace empty string (Json NULL sometimes read as '') for NaN
 66 | 	df_osm_data.replace('', np.nan, inplace=True)
 67 | 	
 68 | 	def list_int_from_string(x): # List of integers given input in string format
 69 | 		return [ int(id_) for id_ in x.split(",") ]
 70 | 	def list_str_from_string(x): # List of strings given input in string format
 71 | 		return x.split(",")
 72 | 
 73 | 	# Recover list
 74 | 	if ( "activity_category" in df_osm_data.columns): 
 75 | 		df_osm_data[ "activity_category" ] = df_osm_data.activity_category.apply(lambda x: list_str_from_string(x) if pd.notnull(x) else np.nan )
 76 | 	if ( "containing_parts" in df_osm_data.columns): 
 77 | 		df_osm_data[ "containing_parts" ] = df_osm_data.containing_parts.apply( lambda x: list_int_from_string(x) if pd.notnull(x) else np.nan )
 78 | 	if ( "containing_poi" in df_osm_data.columns): 
 79 | 		df_osm_data[ "containing_poi" ] = df_osm_data.containing_poi.apply( lambda x: list_int_from_string(x) if pd.notnull(x) else np.nan )
 80 | 	
 81 | 	# To UTM coordinates
 82 | 	return ox.project_gdf( df_osm_data )
 83 | 
 84 | def store_geodataframe(df_osm_data, geo_filename):
 85 | 	""" 
 86 | 	Store input GeoDataFrame
 87 | 
 88 | 	Parameters
 89 | 	----------
 90 | 	df_osm_data : geopandas.GeoDataFrame
 91 | 		input OSM data frame
 92 | 	geo_filename : string
 93 | 		filename for GeoDataFrame storage
 94 | 
 95 | 	Returns
 96 | 	----------
 97 | 	
 98 | 	"""
 99 | 	# To EPSG 4326 (GeoJSON does not store projection information)
100 | 	df_osm_data = ox.project_gdf(df_osm_data, to_latlong=True)
101 | 	
102 | 	# Lists to string (needed to save GeoJSON files)
103 | 	if ( "activity_category" in df_osm_data.columns): 
104 | 		df_osm_data.activity_category = df_osm_data.activity_category.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan )	
105 | 	if ( "containing_parts" in df_osm_data.columns):
106 | 		df_osm_data.containing_parts = df_osm_data.containing_parts.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan )	
107 | 	if ( "containing_poi" in df_osm_data.columns):
108 | 		df_osm_data.containing_poi = df_osm_data.containing_poi.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan )	
109 | 	
110 | 	# Save to file
111 | 	df_osm_data.to_file(geo_filename, driver=geo_driver)
112 | 
113 | ###################################################
114 | ### GeoDataFrame processing utils
115 | ###################################################
116 | 
117 | def sanity_check_height_tags(df_osm):
118 | 	""" 
119 | 	Compute a sanity check for all height tags
120 | 	If incorrectly tagged, try to replace with the correct tag
121 | 	Any meter or level related string are replaced, and heights using the imperial units are converted to the metric system
122 | 
123 | 	Parameters
124 | 	----------
125 | 	df_osm : geopandas.GeoDataFrame
126 | 		input OSM data frame
127 | 
128 | 	Returns
129 | 	----------
130 | 	
131 | 	"""
132 | 	def sanity_check(value):
133 | 		### Sanity check for height tags (sometimes wrongly-tagged)
134 | 		if not( (value is np.nan) or (value is None) ): # Non-null value
135 | 			try: # Can be read as float?
136 | 				return float(value)
137 | 			except: 
138 | 				try: # Try removing incorrectly tagged information: meters/levels 
139 | 					return float( value.replace('meters','').replace('meter','').replace('m','').replace('levels','').replace('level','').replace('l','') )
140 | 				except:					
141 | 					try: # Feet and inch values? e.g.: 4'7''
142 | 						split_value = value.split("'")
143 | 						feet, inches = split_value[0], split_value[1]
144 | 						if (inches is ''): # Non existent inches
145 | 							inches = '0'
146 | 						tot_inches = float(feet)*12 + float(inches)
147 | 						# Return meters equivalent
148 | 						return tot_inches * 0.0254
149 | 					except: # None. Incorrect tag
150 | 						return None
151 | 		return value
152 | 
153 | 	# Available height tags
154 | 	available_height_tags = [ col for col in height_tags if col in df_osm.columns ]
155 | 	# Apply-map sanity check
156 | 	df_osm[ available_height_tags ] = df_osm[ available_height_tags ].applymap(sanity_check)
157 | 
158 | def associate_structures(df_osm_encompassing_structures, df_osm_structures, operation='contains', column='containing_'):
159 | 	""" 
160 | 	Associate input structure geometries to its encompassing structures
161 | 	Structures are associated using the operation 'contains' or 'intersects'
162 | 	A new column in the encompassing data frame is added, incorporating the indices of the containing structures
163 | 
164 | 	Parameters
165 | 	----------
166 | 	df_osm_encompassing_structures : geopandas.GeoDataFrame
167 | 		encompassing data frame
168 | 	df_osm_structures : geopandas.GeoDataFrame
169 | 		structures data frame
170 | 	operation : string
171 | 		spatial join operation to associate structures
172 | 	column : string
173 | 		name of the column to add in encompassing data frame
174 | 
175 | 	Returns
176 | 	----------
177 | 	
178 | 	"""
179 | 	# Find, for each geometry, all containing structures
180 | 	sjoin = gpd.sjoin(df_osm_encompassing_structures[['geometry']], df_osm_structures[['geometry']], op=operation, rsuffix='cont')
181 | 	# Group by: polygon_index -> list of containing points indices
182 | 	group_indices = sjoin.groupby( sjoin.index, as_index=True )['index_cont'].apply(list)
183 | 	# Create new column
184 | 	df_osm_encompassing_structures.loc[ group_indices.index, column ] = group_indices.values
185 | 	# Reset indices
186 | 	df_osm_encompassing_structures.index.rename('',inplace=True)
187 | 	df_osm_structures.index.rename('',inplace=True)


--------------------------------------------------------------------------------
/urbansprawl/population/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/population/__init__.py


--------------------------------------------------------------------------------
/urbansprawl/population/core.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | from .data_extract import get_extract_population_data
7 | from .downscaling import proportional_population_downscaling
8 | from .urban_features import compute_full_urban_features, get_training_testing_data, get_Y_X_features_population_data
9 | from .utils import get_aggregated_squares, population_downscaling_validation


--------------------------------------------------------------------------------
/urbansprawl/population/data_extract.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | from shapely.geometry import Polygon, GeometryCollection
  7 | import geopandas as gpd
  8 | import pandas as pd
  9 | import os
 10 | import numpy as np
 11 | import osmnx as ox
 12 | 
 13 | from osmnx import log
 14 | 
 15 | from .utils import get_population_extract_filename
 16 | 
 17 | DATA_SOURCES = ['insee','gpw']
 18 | 
 19 | ##############################
 20 | ### I/O for population data
 21 | ##############################
 22 | 
 23 | def get_df_extract(df_data, poly_gdf, operation = "within"):
 24 | 	"""
 25 | 	Indexes input geo-data frame within an input region of interest
 26 | 	If the region of interest is given as a polygon, its bounding box is indexed
 27 | 
 28 | 	Parameters
 29 | 	----------
 30 | 	df_data : geopandas.GeoDataFrame
 31 | 		input data frame to index
 32 | 	poly_gdf : geopandas.GeoDataFrame
 33 | 		geodataframe containing the region of interest in form of polygon
 34 | 	operation : string
 35 | 		the desired spatial join operation: 'within' or 'intersects'
 36 | 
 37 | 	Returns
 38 | 	----------
 39 | 	geopandas.GeoDataFrame
 40 | 		returns the population data frame indexed within the region of interest
 41 | 	"""
 42 | 	# Project to same system coordinates
 43 | 	poly_gdf = ox.project_gdf(poly_gdf, to_crs=df_data.crs)
 44 | 	# Spatial join
 45 | 	df_extract = gpd.sjoin(df_data, poly_gdf, op=operation)
 46 | 	# Keep original columns
 47 | 	df_extract = df_extract[ df_data.columns ]
 48 | 	return df_extract
 49 | 
 50 | def get_population_df(pop_shapefile, pop_data_file, data_source, to_crs, poly_gdf):
 51 | 	"""
 52 | 	Read the population shapefile from input filename/s
 53 | 	Index the data within the bounding box
 54 | 	Project to desired CRS
 55 | 
 56 | 	Parameters
 57 | 	----------
 58 | 	pop_shapefile : string
 59 | 		population count shapefile
 60 | 	pop_data_file : string
 61 | 		population data additional file (required for INSEE format)
 62 | 	data_source : string
 63 | 		desired population data source
 64 | 	to_crs : dict
 65 | 		desired coordinate reference system
 66 | 	poly_gdf : geopandas.GeoDataFrame
 67 | 		geodataframe containing the region of interest in form of polygon
 68 | 
 69 | 	Returns
 70 | 	----------
 71 | 	geopandas.GeoDataFrame
 72 | 		returns the indexed and projected population data frame
 73 | 	"""
 74 | 	#######################################
 75 | 	### Load GPW/INSEE population data
 76 | 	#######################################
 77 | 	# Read population data
 78 | 	df_pop = gpd.read_file(pop_shapefile)
 79 | 		
 80 | 	### Extract region of interest (EPSG 4326)
 81 | 	# Filter geometries not contained in bounding box
 82 | 	df_pop = get_df_extract(df_pop, poly_gdf)
 83 | 
 84 | 	if (data_source is 'insee'):
 85 | 		#######################################
 86 | 		### Additional step for INSEE data
 87 | 		#######################################	
 88 | 		# Read dbf files
 89 | 		data_pop = gpd.read_file(pop_data_file)
 90 | 		# Get columns of interest
 91 | 		data_pop = data_pop[["idINSPIRE","ind_c"]]
 92 | 		df_pop = df_pop[["geometry","idINSPIRE"]]
 93 | 		# Inner join to obtain population count data associated to each geometry
 94 | 		df_pop = pd.merge(df_pop, data_pop, how='inner', on='idINSPIRE')
 95 | 	
 96 | 	# Rename population count column
 97 | 	df_pop.rename(columns={"ind_c":"pop_count", "DN":"pop_count"}, inplace=True)
 98 | 
 99 | 	return ox.project_gdf(df_pop, to_crs=to_crs)
100 | 
101 | def get_extract_population_data(city_ref, data_source, pop_shapefile=None, pop_data_file=None, to_crs={'init': 'epsg:4326'}, polygons_gdf=None):
102 | 	"""
103 | 	Get data population extract of desired data source for input city, calculating the convex hull of input buildings geodataframe
104 | 	The population data frame is projected to the desired coordinate reference system
105 | 	Stores the extracted shapefile
106 | 	Returns the stored population data for input 'data source' and 'city reference' if it was previously stored
107 | 
108 | 	Parameters
109 | 	----------
110 | 	city_ref : string
111 | 		name of input city
112 | 	data_source : string
113 | 		desired population data source
114 | 	pop_shapefile : string
115 | 		path of population count shapefile
116 | 	pop_data_file : string
117 | 		path of population data additional file (required for INSEE format)
118 | 	to_crs : dict
119 | 		desired coordinate reference system
120 | 	polygons_gdf : geopandas.GeoDataFrame
121 | 		polygons (e.g. buildings) for input region of interest which will determine the shape to extract
122 | 
123 | 	Returns
124 | 	----------
125 | 	geopandas.GeoDataFrame
126 | 		returns the extracted population data
127 | 	"""
128 | 	# Input data source type given?
129 | 	assert( data_source in DATA_SOURCES )
130 | 
131 | 	# Population extract exists?
132 | 	if ( os.path.exists( get_population_extract_filename(city_ref, data_source) ) ):
133 | 		log("Population extract exists for input city: "+city_ref)
134 | 		return gpd.read_file( get_population_extract_filename(city_ref, data_source) )
135 | 
136 | 	# Input shape given?
137 | 	assert( not ( np.all(polygons_gdf is None ) ) )
138 | 	# Input population shapefile given?
139 | 	assert( not pop_shapefile is None )
140 | 	# All input files given?
141 | 	assert( not ( (data_source == 'insee') and (pop_data_file is None) ) )
142 | 
143 | 	# Get buildings convex hull
144 | 	polygon = GeometryCollection( polygons_gdf.geometry.values.tolist() ).convex_hull
145 | 	# Convert to geo-dataframe with defined CRS
146 | 	poly_gdf = gpd.GeoDataFrame([polygon], columns=["geometry"], crs=polygons_gdf.crs)
147 | 	
148 | 	# Compute extract
149 | 	df_pop = get_population_df(pop_shapefile, pop_data_file, data_source, to_crs, poly_gdf)
150 | 	
151 | 	# Save to shapefile
152 | 	df_pop.to_file( get_population_extract_filename(city_ref, data_source), driver='ESRI Shapefile' )
153 | 	return df_pop	
154 | 
155 | 


--------------------------------------------------------------------------------
/urbansprawl/population/downscaling.py:
--------------------------------------------------------------------------------
 1 | ###################################################################################################
 2 | # Repository: https://github.com/lgervasoni/urbansprawl
 3 | # MIT License
 4 | ###################################################################################################
 5 | 
 6 | import geopandas as gpd
 7 | import osmnx as ox
 8 | 
 9 | def proportional_population_downscaling(df_osm_built, df_insee):
10 | 	"""
11 | 	Performs a proportional population downscaling considering the surface dedicated to residential land use
12 | 	Associates the estimated population to each building in column 'population'
13 | 
14 | 	Parameters
15 | 	----------
16 | 	df_osm_built : geopandas.GeoDataFrame
17 | 		input buildings with computed residential surface
18 | 	df_insee : geopandas.GeoDataFrame
19 | 		INSEE population data
20 | 
21 | 	Returns
22 | 	----------
23 | 
24 | 	"""
25 | 	if (df_insee.crs != df_osm_built.crs): # If projections do not match
26 | 		# First project to Lat-Long coordinates, then project to UTM coordinates
27 | 		df_insee = ox.project_gdf( ox.project_gdf(df_insee, to_latlong=True) )
28 | 
29 | 		# OSM Building geometries are already projected
30 | 		assert(df_insee.crs == df_osm_built.crs)
31 | 
32 | 	df_osm_built['geom'] = df_osm_built.geometry
33 | 	df_osm_built_residential = df_osm_built[ df_osm_built.apply(lambda x: x.landuses_m2['residential'] > 0, axis = 1) ]
34 | 
35 | 	# Loading/saving using geopandas loses the 'ellps' key
36 | 	df_insee.crs = df_osm_built_residential.crs
37 | 
38 | 	# Intersecting gridded population - buildings
39 | 	sjoin = gpd.sjoin( df_insee, df_osm_built_residential, op='intersects')
40 | 	# Calculate area within square (percentage of building with the square)
41 | 	sjoin['residential_m2_within'] = sjoin.apply(lambda x: x.landuses_m2['residential'] * (x.geom.intersection(x.geometry).area / x.geom.area), axis=1 )
42 | 	# Initialize
43 | 	df_insee['residential_m2_within'] = 0
44 | 	# Sum residential area within square
45 | 	sum_m2_per_square = sjoin.groupby(sjoin.index)['residential_m2_within'].sum()
46 | 	# Assign total residential area within each square
47 | 	df_insee.loc[ sum_m2_per_square.index, "residential_m2_within" ] = sum_m2_per_square.values
48 | 	# Get number of M^2 / person
49 | 	df_insee[ "m2_per_person" ] = df_insee.apply(lambda x: x.residential_m2_within / x.pop_count, axis=1)
50 | 
51 | 	def population_building(x, df_insee):
52 | 		# Sum of: For each square: M2 of building within square / M2 per person
53 | 		return ( x.get('m2',[]) / df_insee.loc[ x.get('idx',[]) ].m2_per_person ).sum()
54 | 	# Index: Buildings , Values: idx:Indices of gridded square population, m2: M2 within that square
55 | 	buildings_square_m2_association = sjoin.groupby('index_right').apply(lambda x: {'idx':list(x.index), 'm2':list(x.residential_m2_within)} )
56 | 	# Associate
57 | 	df_osm_built.loc[ buildings_square_m2_association.index, "population" ] = buildings_square_m2_association.apply(lambda x: population_building(x,df_insee) )
58 | 	# Drop unnecessary column
59 | 	df_osm_built.drop('geom', axis=1, inplace=True)
60 | 
61 | 


--------------------------------------------------------------------------------
/urbansprawl/population/urban_features.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import geopandas as gpd
  7 | import pandas as pd
  8 | import numpy as np
  9 | import osmnx as ox
 10 | import os.path
 11 | import time
 12 | 
 13 | from osmnx import log
 14 | 
 15 | from .utils import get_aggregated_squares, get_population_df_filled_empty_squares
 16 | # Filenames
 17 | from .utils import get_population_urban_features_filename, get_population_training_validating_filename
 18 | 
 19 | from .data_extract import get_extract_population_data
 20 | 
 21 | # Sprawl indices
 22 | from ..sprawl.dispersion import compute_grid_dispersion
 23 | from ..sprawl.landusemix import compute_grid_landusemix
 24 | 
 25 | from shapely.geometry import Polygon
 26 | 
 27 | def compute_full_urban_features(city_ref, df_osm_built=None, df_osm_pois=None, df_insee=None, data_source=None, kwargs={"max_dispersion":15}):
 28 | 	"""
 29 | 	Computes a set of urban features for each square where population count data exists
 30 | 
 31 | 	Parameters
 32 | 	----------
 33 | 	city_ref : string
 34 | 		city reference name
 35 | 	df_osm_built : geopandas.GeoDataFrame
 36 | 		input buildings
 37 | 	df_osm_pois : geopandas.GeoDataFrame
 38 | 		input points of interest
 39 | 	df_insee : geopandas.GeoDataFrame
 40 | 		grid-cells with population count where urban features will be calculated
 41 | 	data_source : str
 42 | 		define the type of population data for its retrieval in case it was stored
 43 | 	kwargs : dict
 44 | 		keyword arguments to guide the process
 45 | 
 46 | 	Returns
 47 | 	----------
 48 | 	geopandas.GeoDataFrame
 49 | 		geometry with updated urban features
 50 | 	"""
 51 | 
 52 | 	# Population extract exists?
 53 | 	if ( os.path.exists( get_population_urban_features_filename(city_ref, data_source) ) ):
 54 | 		log("Urban features from population gridded data exist for input city: "+city_ref)
 55 | 		# Read from GeoJSON (default projection coordinates)
 56 | 		df_insee_urban_features_4326 = gpd.read_file( get_population_urban_features_filename(city_ref, data_source) )
 57 | 		# Project to UTM coordinates
 58 | 		return ox.project_gdf(df_insee_urban_features_4326)
 59 | 
 60 | 	# Required arguments
 61 | 	assert( not df_osm_built is None )
 62 | 	assert( not df_osm_pois is None )
 63 | 	assert( not df_insee is None )
 64 | 
 65 | 	# Get population count data with filled empty squares (null population)
 66 | 	df_insee_urban_features = get_population_df_filled_empty_squares(df_insee)
 67 | 	# Set crs
 68 | 	crs_proj = df_insee.crs
 69 | 	df_insee_urban_features.crs = crs_proj
 70 | 
 71 | 	##################
 72 | 	### Urban features
 73 | 	##################
 74 | 	# Compute the urban features for each square
 75 | 	log("Calculating urban features")
 76 | 	start = time.time()
 77 | 
 78 | 	# Conserve building geometries
 79 | 	df_osm_built['geom_building'] = df_osm_built['geometry']
 80 | 
 81 | 	# Spatial join: grid-cell i - building j for all intersections
 82 | 	df_insee_urban_features = gpd.sjoin( df_insee_urban_features, df_osm_built, op='intersects', how='left')
 83 | 
 84 | 	# When a grid-cell i does not intersect any building: NaN values
 85 | 	null_idx = df_insee_urban_features.loc[ df_insee_urban_features['geom_building'].isnull() ].index
 86 | 	# Replace NaN for urban features calculation
 87 | 	min_polygon = Polygon([(0,0), (0,np.finfo(float).eps), (np.finfo(float).eps,np.finfo(float).eps)])
 88 | 	df_insee_urban_features.loc[null_idx, 'geom_building'] = df_insee_urban_features.loc[null_idx, 'geom_building'].apply(lambda x: min_polygon)
 89 | 	df_insee_urban_features.loc[null_idx, 'landuses_m2' ] = len( null_idx ) * [{'residential':0, 'activity':0}]
 90 | 	df_insee_urban_features.loc[null_idx, 'building_levels'] = len(null_idx) * [0]
 91 | 
 92 | 	### Pre-calculation of urban features
 93 | 	
 94 | 	# Apply percentage of building presence within square: 1 if fully contained, 0.5 if half the building contained, ...
 95 | 	df_insee_urban_features['building_ratio'] = df_insee_urban_features.apply( lambda x: x.geom_building.intersection(x.geometry).area / x.geom_building.area, axis=1 )
 96 | 
 97 | 	df_insee_urban_features['m2_total_residential'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.landuses_m2['residential'], axis=1 )
 98 | 	df_insee_urban_features['m2_total_activity'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.landuses_m2['activity'], axis=1 )
 99 | 
100 | 	df_insee_urban_features['m2_footprint_residential'] = 0
101 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']), 'm2_footprint_residential' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 )
102 | 	df_insee_urban_features['m2_footprint_activity'] = 0
103 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']), 'm2_footprint_activity' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 )
104 | 	df_insee_urban_features['m2_footprint_mixed'] = 0
105 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']), 'm2_footprint_mixed' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 )
106 | 
107 | 	df_insee_urban_features['num_built_activity'] = 0
108 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']), 'num_built_activity' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']) ].building_ratio
109 | 	df_insee_urban_features['num_built_residential'] = 0
110 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']), 'num_built_residential' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']) ].building_ratio
111 | 	df_insee_urban_features['num_built_mixed'] = 0
112 | 	df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']), 'num_built_mixed' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']) ].building_ratio
113 | 
114 | 	df_insee_urban_features['num_levels'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.building_levels, axis=1 )
115 | 	df_insee_urban_features['num_buildings'] = df_insee_urban_features['building_ratio']
116 | 
117 | 	df_insee_urban_features['built_up_m2'] = df_insee_urban_features.apply( lambda x: x.geom_building.area * x.building_ratio , axis=1 )
118 | 
119 | 
120 | 	### Urban features aggregation functions
121 | 	urban_features_aggregation = {}
122 | 	urban_features_aggregation['idINSPIRE'] = lambda x: x.head(1)
123 | 	urban_features_aggregation['pop_count'] = lambda x: x.head(1)
124 | 	urban_features_aggregation['geometry'] = lambda x: x.head(1)
125 | 
126 | 	urban_features_aggregation['m2_total_residential'] = 'sum'
127 | 	urban_features_aggregation['m2_total_activity'] = 'sum'
128 | 
129 | 	urban_features_aggregation['m2_footprint_residential'] = 'sum'
130 | 	urban_features_aggregation['m2_footprint_activity'] = 'sum'
131 | 	urban_features_aggregation['m2_footprint_mixed'] = 'sum'
132 | 
133 | 	urban_features_aggregation['num_built_activity'] = 'sum'
134 | 	urban_features_aggregation['num_built_residential'] = 'sum'
135 | 	urban_features_aggregation['num_built_mixed'] = 'sum'
136 | 
137 | 	urban_features_aggregation['num_levels'] = 'sum'
138 | 	urban_features_aggregation['num_buildings'] = 'sum'
139 | 
140 | 	urban_features_aggregation['built_up_m2'] = 'sum'
141 | 
142 | 	# Apply aggregate functions
143 | 	df_insee_urban_features = df_insee_urban_features.groupby( df_insee_urban_features.index ).agg( urban_features_aggregation )
144 | 
145 | 	# Calculate built up relation (relative to the area of the grid-cell geometry)
146 | 	df_insee_urban_features['built_up_relation'] = df_insee_urban_features.apply(lambda x: x.built_up_m2 / x.geometry.area, axis=1)
147 | 	df_insee_urban_features.drop('built_up_m2', axis=1, inplace=True)
148 | 	
149 | 	# To geopandas.GeoDataFrame and set crs
150 | 	df_insee_urban_features = gpd.GeoDataFrame(df_insee_urban_features)
151 | 	df_insee_urban_features.crs = crs_proj
152 | 	
153 | 	# POIs
154 | 	df_osm_pois_selection = df_osm_pois[ df_osm_pois.classification.isin(["activity","mixed"]) ]
155 | 	gpd_intersection_pois = gpd.sjoin( df_insee_urban_features, df_osm_pois_selection, op='intersects', how='left')
156 | 	# Number of activity/mixed POIs
157 | 	df_insee_urban_features['num_activity_pois'] = gpd_intersection_pois.groupby( gpd_intersection_pois.index ).agg({'osm_id':'count'})
158 | 
159 | 
160 | 	##################
161 | 	### Sprawling indices
162 | 	##################
163 | 	df_insee_urban_features['geometry_squares'] = df_insee_urban_features.geometry
164 | 	df_insee_urban_features['geometry'] = df_insee_urban_features.geometry.centroid
165 | 
166 | 	'''
167 | 	compute_grid_accessibility(df_insee_urban_features, graph, df_osm_built, df_osm_pois)
168 | 	'''
169 | 
170 | 	# Compute land uses mix + densities estimation
171 | 	compute_grid_landusemix(df_insee_urban_features, df_osm_built, df_osm_pois)
172 | 	# Dispersion indices
173 | 	compute_grid_dispersion(df_insee_urban_features, df_osm_built)
174 | 
175 | 	if (kwargs.get("max_dispersion")): # Set max bounds for dispersion values
176 | 		df_insee_urban_features.loc[ df_insee_urban_features.dispersion > kwargs.get("max_dispersion"), "dispersion" ] = kwargs.get("max_dispersion")
177 | 
178 | 	# Set back original geometries
179 | 	df_insee_urban_features['geometry'] = df_insee_urban_features.geometry_squares
180 | 	df_insee_urban_features.drop('geometry_squares', axis=1, inplace=True)
181 | 	
182 | 	# Fill NaN sprawl indices with 0
183 | 	df_insee_urban_features.fillna(0, inplace=True)
184 | 
185 | 	# Save to GeoJSON file (no projection conserved, then use EPSG 4326)
186 | 	ox.project_gdf(df_insee_urban_features, to_latlong=True).to_file( get_population_urban_features_filename(city_ref, data_source), driver='GeoJSON' )
187 | 
188 | 	elapsed_time = int(time.time() - start)
189 | 	log("Done: Urban features calculation. Elapsed time (H:M:S): " + '{:02d}:{:02d}:{:02d}'.format(elapsed_time // 3600, (elapsed_time % 3600 // 60), elapsed_time % 60) )
190 | 
191 | 	return df_insee_urban_features
192 | 
193 | def get_training_testing_data(city_ref, df_insee_urban_features=None):
194 | 	"""
195 | 	Returns the Y and X arrays for training/testing population downscaling estimates.
196 | 
197 | 	Y contains vectors with the correspondent population densities
198 | 	X contains vectors with normalized urban features
199 | 	X_columns columns referring to X values
200 | 	Numpy arrays are stored locally
201 | 
202 | 	Parameters
203 | 	----------
204 | 	city_ref : string
205 | 		city reference name
206 | 	df_insee_urban_features : geopandas.GeoDataFrame
207 | 		grid-cells with population count data and calculated urban features
208 | 
209 | 	Returns
210 | 	----------
211 | 	np.array, np.array, np.array
212 | 		Y vector, X vector, X column names vector
213 | 	"""
214 | 	# Population extract exists?
215 | 	if ( os.path.exists( get_population_training_validating_filename(city_ref) ) ):
216 | 		log("Urban population training+validation data/features exist for input city: " + city_ref)
217 | 		# Read from Numpy.Arrays
218 | 		data = np.load( get_population_training_validating_filename(city_ref) )
219 | 		# Project to UTM coordinates
220 | 		return data["Y"], data["X"], data["X_columns"]
221 | 
222 | 	log("Calculating urban training+validation data/features for city: " + city_ref)
223 | 	start = time.time()
224 | 
225 | 	# Select columns to normalize
226 | 	columns_to_normalise = [col for col in df_insee_urban_features.columns if "num_" in col or "m2_" in col or "dispersion" in col or "accessibility" in col]
227 | 	# Normalize selected columns
228 | 	df_insee_urban_features.loc[:,columns_to_normalise] = df_insee_urban_features.loc[:,columns_to_normalise].apply(lambda x: x / x.max() , axis=0)
229 | 
230 | 	# By default, idINSPIRE for created squares (0 population count) is 0: Change for 'CRS' string: Coherent with squares aggregation procedure (string matching)
231 | 	df_insee_urban_features.loc[ df_insee_urban_features.idINSPIRE == 0, "idINSPIRE" ] = "CRS"
232 | 
233 | 	# Aggregate 5x5 squares: Get all possible aggregations (step of 200 meters = length of individual square)
234 | 	aggregated_df_insee_urban_features = get_aggregated_squares(ox.project_gdf(df_insee_urban_features, to_crs="+init=epsg:3035"), step=200., conserve_squares_info=True)
235 | 
236 | 	# X values: Vector <x1,x2, ... , xn> with normalized urban features
237 | 	X_values = []
238 | 	# Y values: Vector <y1, y2, ... , ym> with normalized population densities. m=25
239 | 	Y_values = []
240 | 
241 | 	# For each <Indices> combination, create a X and Y vector
242 | 	for idx in aggregated_df_insee_urban_features.indices:
243 | 		# Extract the urban features in the given 'indices' order (Fill to 0 for non-existent squares)
244 | 		square_info = df_insee_urban_features.reindex( idx ).fillna(0)
245 | 		# Y input (Ground truth): Population densities
246 | 		population_densities = (square_info["pop_count"] / square_info["pop_count"].sum() ).values
247 | 
248 | 		if (all (pd.isna(population_densities)) ): # If sum of population count is 0, remove (NaN values)
249 | 			continue
250 | 
251 | 		# X input: Normalized urban features
252 | 		urban_features = square_info[ [col for col in square_info.columns if col not in ['idINSPIRE','geometry','pop_count']] ].values
253 | 
254 | 		# Append X, Y
255 | 		X_values.append(urban_features)
256 | 		Y_values.append(population_densities)
257 | 
258 | 	# Get the columns order referenced in each X vector
259 | 	X_values_columns = df_insee_urban_features[  [col for col in square_info.columns if col not in ['idINSPIRE','geometry','pop_count']]  ].columns
260 | 	X_values_columns = np.array(X_values_columns)
261 | 
262 | 	# To Numpy Array
263 | 	X_values = np.array(X_values)
264 | 	Y_values = np.array(Y_values)
265 | 
266 | 	# Save to file
267 | 	np.savez( get_population_training_validating_filename(city_ref), X=X_values, Y=Y_values, X_columns=X_values_columns)
268 | 	
269 | 	log("Done: urban training+validation data/features. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
270 | 
271 | 	return Y_values, X_values, X_values_columns
272 | 
273 | def get_Y_X_features_population_data(cities_selection=None, cities_skip=None):
274 | 	"""
275 | 	Returns the Y and X arrays for training/testing population downscaling estimates.
276 | 	It gathers either a selection of cities or all stored cities but a selected list to skip
277 | 
278 | 	Y contains vectors with the correspondent population densities
279 | 	X contains vectors with normalized urban features
280 | 	X_columns columns referring to X values
281 | 	Numpy arrays are previously stored
282 | 
283 | 	Parameters
284 | 	----------
285 | 	cities_selection : string
286 | 		list of cities to select
287 | 	cities_skip : string
288 | 		list of cities to skip (retrieve the rest)
289 | 
290 | 	Returns
291 | 	----------
292 | 	np.array, np.array, np.array
293 | 		Y vector, X vector, X column names vector
294 | 	"""
295 | 	arr_X, arr_Y = [], []
296 | 	
297 | 	# Get the complete training-testig dataset
298 | 	for Y_X_data_city in os.listdir("data/training"):
299 | 		# Only if it contains a valid extension
300 | 		if ('.npz' not in Y_X_data_city): continue
301 | 		
302 | 		# Get city's name
303 | 		city_ref = Y_X_data_city.replace('_X_Y.npz','')
304 | 		
305 | 		# Only retrieve data from cities_selection (if ever given)
306 | 		if ( (cities_selection is not None) and (city_ref not in cities_selection) ): 
307 | 			log('Skipping city: ' + str(city_ref) )
308 | 			continue
309 | 			
310 | 		# Skip cities data from from cities_skip (if ever given)
311 | 		if ( (cities_skip is not None) and (city_ref in cities_skip) ): 
312 | 			log('Skipping city:', city_ref)
313 | 			continue
314 | 		
315 | 		log('Retrieving data for city: ' + str(city_ref) )
316 | 		
317 | 		# Get stored data
318 | 		city_Y, city_X, city_X_cols = get_training_testing_data(city_ref)
319 | 		# Append values
320 | 		arr_Y.append(city_Y)
321 | 		arr_X.append(city_X)
322 | 		
323 | 	# Assumption: All generated testing-training data contain the same X columns
324 | 	return np.concatenate(arr_Y), np.concatenate(arr_X), city_X_cols
325 | 


--------------------------------------------------------------------------------
/urbansprawl/population/utils.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import geopandas as gpd
  7 | import pandas as pd
  8 | import osmnx as ox
  9 | import numpy as np
 10 | 
 11 | from shapely.geometry import Polygon
 12 | from shapely.geometry import Point
 13 | 
 14 | from ..settings import storage_folder
 15 | 
 16 | # Format for load/save the geo-data ['geojson','shp']
 17 | geo_format = 'geojson' # 'shp'
 18 | 
 19 | 
 20 | def get_population_extract_filename(city_ref_file, data_source):
 21 | 	"""
 22 | 	Get data population extract filename for input city
 23 | 
 24 | 	Parameters
 25 | 	----------
 26 | 	city_ref_file : string
 27 | 		name of input city
 28 | 	data_source : string
 29 | 		desired population data source
 30 | 
 31 | 	Returns
 32 | 	----------
 33 | 	string
 34 | 		returns the population extract filename
 35 | 	
 36 | 	"""
 37 | 	# Folder exists?
 38 | 	import os
 39 | 	if not(os.path.isdir(storage_folder + "/" + data_source)): 
 40 | 		os.makedirs(storage_folder + "/" + data_source)
 41 | 	return storage_folder + "/" + data_source + "/" + city_ref_file + "_population.shp"
 42 | 
 43 | def get_population_urban_features_filename(city_ref_file, data_source):
 44 | 	"""
 45 | 	Get population urban features extract filename for input city
 46 | 	Force GeoJSON format: Shapefiles truncate column names
 47 | 
 48 | 	Parameters
 49 | 	----------
 50 | 	city_ref_file : string
 51 | 		name of input city
 52 | 	data_source : string
 53 | 		desired population data source
 54 | 
 55 | 	Returns
 56 | 	----------
 57 | 	string
 58 | 		returns the population extract filename
 59 | 	
 60 | 	"""
 61 | 	# Folder exists?
 62 | 	import os
 63 | 	if not(os.path.isdir(storage_folder + "/" + data_source)): 
 64 | 		os.makedirs(storage_folder + "/" + data_source)
 65 | 	return storage_folder + "/" + data_source + "/" + city_ref_file + "_urban_features." + geo_format
 66 | 
 67 | def get_population_training_validating_filename(city_ref_file, data_source="training"):
 68 | 	"""
 69 | 	Get population normalized urban features extract and population densities filename for input city
 70 | 	Stored in Numpy.Arrays
 71 | 
 72 | 	Parameters
 73 | 	----------
 74 | 	city_ref_file : string
 75 | 		name of input city
 76 | 
 77 | 	Returns
 78 | 	----------
 79 | 	string
 80 | 		returns the numpy stored/storing filename
 81 | 	
 82 | 	"""
 83 | 	# Folder exists?
 84 | 	import os
 85 | 	if not(os.path.isdir(storage_folder + "/" + data_source)): 
 86 | 		os.makedirs(storage_folder + "/" + data_source)
 87 | 	return storage_folder + "/" + data_source + "/" + city_ref_file + "_X_Y.npz"
 88 | 
 89 | #################################################################
 90 | 
 91 | def get_aggregated_squares(df_insee, step=1000., conserve_squares_info=False):
 92 | 	"""
 93 | 	Aggregates input population data in squares of 5x5
 94 | 	Assumption: Input squares 200m by 200m
 95 | 	INSEE data contains column 'idINSPIRE' which denotes in EPSG:3035 the northing/easting coordinates of the south-west box endpoint
 96 | 	If conserve squares information is True, the information relative to each original square is kept
 97 | 	Output: Aggregated squares of 1km by 1km
 98 | 
 99 | 	Parameters
100 | 	----------
101 | 	df_insee : geopandas.GeoDataFrame
102 | 		INSEE population data
103 | 	step : float
104 | 		sampling step (default of 1 kilometer)
105 | 	conserve_squares_info : bool
106 | 		determines if each aggregated square conserves the information of each smaller composing square
107 | 
108 | 	Returns
109 | 	----------
110 | 	geopandas.GeoDataFrame
111 | 		returns the aggregated population data
112 | 	"""
113 | 	def get_northing_easting(x): # Extract northing and easting coordinates
114 | 		try:
115 | 			north, east = x.idINSPIRE.split("N")[1].split("E")
116 | 			x["north"] = int(north)
117 | 			x["east"] = int(east)
118 | 		except:
119 | 			x["north"], x["east"] = np.nan, np.nan
120 | 		return x
121 | 
122 | 	def index_square(x, df_insee, offset_index):
123 | 		squares = df_insee.cx[ x.geometry.x - offset_index: x.geometry.x + offset_index , x.geometry.y - offset_index: x.geometry.y + offset_index]
124 | 		aggregated_polygon = Polygon()
125 | 		for geom in squares.geometry:
126 | 			aggregated_polygon = aggregated_polygon.union(geom)
127 | 		x["square_geometry"] = aggregated_polygon
128 | 		x["pop_count"] = squares.pop_count.sum()
129 | 		return x
130 | 
131 | 	def index_square_conservative(x, df_insee, offset_index):
132 | 		squares = df_insee.cx[ x.geometry.x - offset_index: x.geometry.x + offset_index , x.geometry.y - offset_index: x.geometry.y + offset_index]
133 | 		aggregated_polygon = Polygon()
134 | 		for geom in squares.geometry:
135 | 			aggregated_polygon = aggregated_polygon.union(geom)
136 | 		x["square_geometry"] = aggregated_polygon
137 | 		x["pop_count"] = squares.pop_count.sum()
138 | 		
139 | 		indices = []
140 | 		for y_diff in [-400, -200, 0, 200, 400]: 
141 | 			for x_diff in [-400, -200, 0, 200, 400]: # First iterate over Easting
142 | 				# - 100 on each coordinate: Due to Northing Easting representing south-west point of box
143 | 				coords_match = "N" + str( int(x.NE.y) + y_diff - 100) + "E" + str( int(x.NE.x) + x_diff - 100)
144 | 				values = df_insee[ df_insee.idINSPIRE.str.contains(coords_match) ].index.values
145 | 				if (len(values) == 0):
146 | 					indices += [None]
147 | 				else: # Cocatenate index value
148 | 					indices += list( values )
149 | 				
150 | 		x["indices"] = indices
151 | 		return x
152 | 
153 | 	# Get northing and easting coordinates
154 | 	coordinates = df_insee.apply(lambda x: get_northing_easting(x), axis=1 )[["north","east"]]
155 | 
156 | 	if (conserve_squares_info): # +100 meters to obtain the centroid of each box
157 | 		coords_offset = 100.
158 | 	else: # +500 meters to obtain the centroid of the 5x5 squared-box
159 | 		coords_offset = 500.
160 | 
161 | 	# North, east coordinates denote the south-west box endpoint: 
162 | 	north_min, north_max = coordinates.north.min() + coords_offset, coordinates.north.max() + coords_offset
163 | 	east_min, east_max = coordinates.east.min() + coords_offset, coordinates.east.max() + coords_offset
164 | 
165 | 	# Create mesh grid: One point for each square's centroid: Each square has an extent of 1km by 1km
166 | 	xv, yv = np.meshgrid( np.arange(east_min, east_max, step), np.arange(north_min, north_max, step) )
167 | 	points = [ Point(x,y) for x,y in zip( xv.ravel(), yv.ravel() ) ]
168 | 	# Initialize GeoDataFrame
169 | 	df_squares = gpd.GeoDataFrame( points, columns=[ "geometry" ], crs="+init=epsg:3035" )
170 | 	
171 | 	# Project
172 | 	df_squares = ox.project_gdf(df_squares, to_crs = df_insee.crs)
173 | 
174 | 	# Save Northing-Easting original coordinates for its later reference
175 | 	df_squares["NE"] = points
176 | 
177 | 	if (conserve_squares_info):
178 | 		index_function = index_square_conservative
179 | 	else:
180 | 		index_function = index_square
181 | 
182 | 	# Index, for each square centroid, +- 400 meters to achieve squares of 5 by 5
183 | 	df_squares = df_squares.apply(lambda x: index_function( x, df_insee, offset_index=400 ), axis=1 )
184 | 	# Update geometry
185 | 	df_squares['geometry'] = df_squares.square_geometry
186 | 
187 | 	# Drop useless columns
188 | 	df_squares.drop( ['square_geometry','NE'], axis=1, inplace=True )
189 | 	# Drop empty squares (rows)
190 | 	df_squares.drop(df_squares[ df_squares.geometry.area == 0 ].index, axis=0, inplace=True)
191 | 	# Reset index
192 | 	df_squares.reset_index(drop=True, inplace=True)
193 | 	# Set CRS key-words
194 | 	df_squares.crs = df_insee.crs
195 | 
196 | 	return df_squares
197 | 
198 | 
199 | def population_downscaling_validation(df_osm_built, df_insee):
200 | 	"""
201 | 	Validates the population downscaling estimation by means of aggregating the sum of buildings estimated population lying within each population square
202 | 	Allows to compare the real population count with the estimated population lying within each square
203 | 	Updates new column 'pop_estimation' for each square in the population data frame
204 | 
205 | 	Parameters
206 | 	----------
207 | 	df_osm_built : geopandas.GeoDataFrame
208 | 		input buildings with computed population count
209 | 	df_insee : geopandas.GeoDataFrame
210 | 		INSEE population data
211 | 
212 | 	Returns
213 | 	----------
214 | 
215 | 	"""
216 | 	df_osm_built['geom'] = df_osm_built.geometry
217 | 	df_osm_built_residential = df_osm_built[ df_osm_built.apply(lambda x: x.landuses_m2['residential'] > 0, axis = 1) ]
218 | 	df_insee.crs = df_osm_built_residential.crs
219 | 
220 | 	# Intersecting gridded population - buildings
221 | 	sjoin = gpd.sjoin( df_insee, df_osm_built_residential, op='intersects')
222 | 	# Calculate area within square (percentage of building with the square)
223 | 	sjoin['pop_estimation'] = sjoin.apply(lambda x: x.population * (x.geom.intersection(x.geometry).area / x.geom.area), axis=1 )
224 | 	
225 | 	# Initialize
226 | 	df_insee['pop_estimation'] = np.nan
227 | 	sum_pop_per_square = sjoin.groupby(sjoin.index)['pop_estimation'].sum()
228 | 	
229 | 	df_insee.loc[ sum_pop_per_square.index, "pop_estimation" ] = sum_pop_per_square.values
230 | 	# Drop unnecessary column
231 | 	df_osm_built.drop('geom', axis=1, inplace=True)
232 | 	# Set to 0 nan values
233 | 	df_insee.loc[ df_insee.pop_estimation.isnull(), "pop_estimation" ] = 0
234 | 
235 | 	# Compute absolute and relative error
236 | 	df_insee["absolute_error"] = df_insee.apply(lambda x: abs(x.pop_count - x.pop_estimation), axis=1)
237 | 	df_insee["relative_error"] = df_insee.apply(lambda x: abs(x.pop_count - x.pop_estimation) / x.pop_count, axis=1)
238 | 
239 | 
240 | def get_population_df_filled_empty_squares(df_insee):
241 | 	""" 
242 | 	Add empty squares as 0-population box-squares
243 | 
244 | 	Parameters
245 | 	----------
246 | 	df_insee : geopandas.GeoDataFrame
247 | 		INSEE population data
248 | 
249 | 	Returns
250 | 	----------
251 | 
252 | 	"""
253 | 	def get_northing_easting(x): # Extract northing and easting coordinates
254 | 		north, east = x.idINSPIRE.split("N")[1].split("E")
255 | 		x["north"] = int(north)
256 | 		x["east"] = int(east)
257 | 		return x
258 | 
259 | 	# Project data to its original projection coordinates
260 | 	df_insee_3035 = ox.project_gdf(df_insee, to_crs="+init=epsg:3035")
261 | 	
262 | 	# Get northing and easting coordinates
263 | 	coordinates = df_insee.apply(lambda x: get_northing_easting(x), axis=1 )[["north","east"]]
264 | 
265 | 	# +100 meters to obtain the centroid of each box
266 | 	coords_offset = 100
267 | 	# Input data step
268 | 	step = 200.
269 | 
270 | 	# North, east coordinates denote the south-west box endpoint: 
271 | 	north_min, north_max = coordinates.north.min() + coords_offset, coordinates.north.max() + coords_offset
272 | 	east_min, east_max = coordinates.east.min() + coords_offset, coordinates.east.max() + coords_offset
273 | 	
274 | 	# Create mesh grid: One point for each square's centroid: Each square has an extent of 1km by 1km
275 | 	xv, yv = np.meshgrid( np.arange(east_min, east_max, step), np.arange(north_min, north_max, step) )
276 | 	
277 | 	# For every given coordinate, if a box is not created (no population), make it with an initial population of 0
278 | 	empty_population_box = []
279 | 
280 | 	for E, N in zip( xv.ravel(), yv.ravel() ): # Center-point
281 | 		point_df = gpd.GeoDataFrame( [Point(E,N)], columns=[ "geometry" ], crs="+init=epsg:3035" )
282 | 		if ( gpd.sjoin( point_df, df_insee_3035 ).empty ): # Does not intersect any existing square-box
283 | 			# Create new square
284 | 			empty_population_box.append( Polygon([ (E - 100., N - 100.), (E - 100., N + 100. ), (E + 100., N + 100. ), (E + 100., N - 100. ), (E - 100., N - 100. ) ]) )
285 | 
286 | 	# Concatenate original data frame + Empty squares
287 | 	gdf_concat = pd.concat( [df_insee_3035, gpd.GeoDataFrame( {'geometry':empty_population_box, 'pop_count':[0]*len(empty_population_box) }, crs="+init=epsg:3035" ) ], ignore_index=True, sort=False )
288 | 
289 | 	# Remove added grid-cells outside the convex hull of the population data frame
290 | 	df_insee_convex_hull_3035 = df_insee_3035.unary_union.convex_hull
291 | 	gdf_concat = gdf_concat[ gdf_concat.apply(lambda x: df_insee_convex_hull_3035.intersects(x.geometry), axis=1 ) ]
292 | 	gdf_concat.reset_index(drop=True, inplace=True)	
293 | 
294 | 	# Project (First project to latitude-longitude due to GeoPandas issue)
295 | 	return ox.project_gdf( ox.project_gdf(gdf_concat, to_latlong=True) )


--------------------------------------------------------------------------------
/urbansprawl/settings.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | storage_folder = 'data'
7 | images_folder = 'images'
8 | 


--------------------------------------------------------------------------------
/urbansprawl/sprawl/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/sprawl/__init__.py


--------------------------------------------------------------------------------
/urbansprawl/sprawl/accessibility.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | from scipy import spatial
  7 | import numpy as np
  8 | import pandas as pd
  9 | import geopandas as gpd
 10 | import networkx as nx
 11 | import time
 12 | import os
 13 | import subprocess
 14 | import shutil
 15 | 
 16 | from multiprocessing import cpu_count
 17 | 
 18 | from osmnx import log
 19 | from .utils import divide_long_edges_graph
 20 | 
 21 | ##############################################################
 22 | ### Compute accessibility grid
 23 | ##############################################################
 24 | 
 25 | def compute_grid_accessibility(df_indices, G, df_osm_built, df_osm_pois, kw_args={'fixed_distance':True,'fixed_activities':False,'max_edge_length':200,'max_node_distance':250,
 26 | 				'fixed_distance_max_travel_distance':2000, 'fixed_distance_max_num_activities':250, 'fixed_activities_min_number': 20, 'fixed_activities_max_travel_distance':5000} ):
 27 | 	""" 
 28 | 	Calculate accessibility values at point_ref
 29 | 
 30 | 	Parameters
 31 | 	----------
 32 | 	df_indices : geopandas.GeoDataFrame
 33 | 		data frame containing the (x,y) reference points to calculate indices
 34 | 	G : networkx multidigraph
 35 | 		input graph to calculate accessibility
 36 | 	df_osm_built : geopandas.GeoDataFrame
 37 | 		data frame containing the building's geometries and corresponding land uses
 38 | 	df_osm_pois : geopandas.GeoDataFrame
 39 | 		data frame containing the points' of interest geometries
 40 | 	kw_args: dict
 41 | 		additional keyword arguments for the indices calculation
 42 | 			fixed_distance : bool
 43 | 				denotes the cumulative opportunities access to activity land uses given a fixed maximum distance to travel
 44 | 			fixed_activities : bool
 45 | 				represents the distance needed to travel in order to reach a certain number of activity land uses
 46 | 			max_edge_length: int
 47 | 				maximum length, in meters, to tolerate an edge in a graph (otherwise, divide edge)
 48 | 			max_node_distance: int
 49 | 				maximum distance tolerated from input point to closest graph node in order to calculate accessibility values
 50 | 			fixed_distance_max_travel_distance: int
 51 | 				(fixed distance) maximum distance tolerated (cut&branch) when searching for the activities
 52 | 			fixed_distance_max_num_activities: int
 53 | 				(fixed distance) cut iteration if the number of activities exceeds a threshold
 54 | 			fixed_activities_min_number: int
 55 | 				(fixed activities) minimum number of activities required
 56 | 			fixed_activities_max_travel_distance : int
 57 | 				(fixed activities) maximum distance tolerated (cut&branch) when searching for the activities
 58 | 
 59 | 
 60 | 	Returns
 61 | 	----------
 62 | 	int
 63 | 		number of activities found within a radius distance using the street network
 64 | 	"""
 65 | 	log("Calculating accessibility indices")
 66 | 	start = time.time()
 67 | 
 68 | 	# Assert that only one option is set
 69 | 	assert( kw_args["fixed_distance"] ^ kw_args["fixed_activities"] )
 70 | 
 71 | 	# Arguments to pandas.Series
 72 | 	kw_arguments = pd.Series(kw_args)
 73 | 
 74 | 	##############
 75 | 	### Prepare input data for indices calculation in parallel call
 76 | 	##############
 77 | 	# Temporary folder to pickle data
 78 | 	if ( not os.path.exists("temp") ): os.makedirs("temp")
 79 | 	# Number of CPU cores on your system
 80 | 	num_cores = cpu_count() 
 81 | 	# Prepare input data: As many chunks of data as cores
 82 | 	prepare_data(G, df_osm_built, df_osm_pois, df_indices, num_cores, kw_arguments )
 83 | 
 84 | 	#This command could have multiple commands separated by a new line \n
 85 | 	parallel_code = os.path.realpath(__file__).replace(".py","_parallel.py")
 86 | 	command_call = "python " + parallel_code + " temp/graph.gpickle temp/points_NUM_CHUNK.pkl temp/arguments.pkl"
 87 | 	
 88 | 	##############
 89 | 	### Verify amount of memory used per subprocess
 90 | 	##############
 91 | 	p = subprocess.Popen(command_call.replace("NUM_CHUNK",str(0)) + " memory_test", stdout=subprocess.PIPE, shell=True)
 92 | 	output, err = p.communicate()
 93 | 	p.wait()
 94 | 
 95 | 	# Max number of subprocess allocations given its memory consumption
 96 | 	numbers = [ numb for numb in str(output) if numb in ["0","1","2","3","4","5","6","7","8","9"] ]
 97 | 	max_processes = int( ''.join(numbers) )
 98 | 	log("Maximum number of processes to allocate (considering memory availability): " + str(max_processes) )
 99 | 	log("Number of available cores: " + str(num_cores) )
100 | 
101 | 	##############
102 | 	### Set chunks to run in parallel: If more core than allowed processes, divide chunks to run at most X processes
103 | 	##############
104 | 	if (num_cores > max_processes):	# Run parallel-chunks at a splitted pace, to avoid memory swap		
105 | 		chunks_run = np.array_split( list( range(num_cores) ), max_processes )
106 | 	else: # Run all chunks in parallel
107 | 		chunks_run = [ list( range(num_cores) ) ]
108 | 
109 | 
110 | 	# Parallel implementation
111 | 	for chunk in chunks_run: # Run full chunk
112 | 		Ps_i = []
113 | 		for i in chunk: # Run each index
114 | 			p = subprocess.Popen(command_call.replace("NUM_CHUNK",str(i)), stdout=subprocess.PIPE, shell=True)
115 | 			Ps_i.append( p )
116 | 		
117 | 		# Get the output
118 | 		Output_errs = [ p.communicate() for p in Ps_i ]
119 | 		
120 | 		# This makes the wait possible
121 | 		Ps_status = [ p.wait() for p in Ps_i ]
122 | 
123 | 		# Output for chunk
124 | 		for output, err in Output_errs:
125 | 			log ( str(output) )
126 | 
127 | 	# Associate data by getting the chunk results concatenated
128 | 	index_column = "accessibility"
129 | 	df_indices[index_column] = pd.concat( [ pd.read_pickle("temp/indices_NUM_CHUNK.pkl".replace("NUM_CHUNK",str(i)) ) for i in range(num_cores) ], ignore_index=True ).accessibility
130 | 
131 | 	# Delete temporary folder
132 | 	shutil.rmtree('temp')
133 | 
134 | 	log("Done: Accessibility indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
135 | 
136 | ###############
137 | # Utils
138 | ###############
139 | 
140 | def prepare_data(G, df_osm_built, df_osm_pois, df_indices, num_processes, kw_arguments):
141 | 	""" 
142 | 	Pickles data to a temporary folder in order to achieve parallel accessibility calculation
143 | 	A new subprocess will be created in order to minimize memory requirements
144 | 
145 | 	Parameters
146 | 	----------
147 | 	G : networkx multidigraph
148 | 		input graph to calculate accessibility
149 | 	df_osm_built : geopandas.GeoDataFrame
150 | 		buildings data
151 | 	df_osm_pois : geopandas.GeoDataFrame
152 | 		buildings data
153 | 	df_indices : geopandas.GeoDataFrame
154 | 		data frame where indices will be calculated
155 | 	num_processes : int
156 | 		number of data chunks to create
157 | 	kw_arguments : pandas.Series
158 | 		additional keyword arguments
159 | 
160 | 	Returns
161 | 	----------
162 | 
163 | 	"""
164 | 	# Divide long edges
165 | 	divide_long_edges_graph(G, kw_arguments.max_edge_length )
166 | 	log("Graph long edges shortened")
167 | 
168 | 	# Get activities
169 | 	df_built_activ = df_osm_built[ df_osm_built.classification.isin(["activity","mixed"]) ]
170 | 	df_pois_activ = df_osm_pois[ df_osm_pois.classification.isin(["activity","mixed"]) ]
171 | 	
172 | 	# Associate them to its closest node in the graph
173 | 	associate_activities_closest_node(G, df_built_activ, df_pois_activ )
174 | 	log("Activities associated to graph nodes")
175 | 
176 | 	# Nodes dict
177 | 	for n, data in G.nodes.data(data=True):
178 | 		# Remove useless keys
179 | 		keys_ = list( data.keys() )
180 | 		[ data.pop(k) for k in keys_ if k not in ["x","y","num_activities"] ]
181 | 	
182 | 	# Edges dict
183 | 	for u, v, data in G.edges.data(data=True, keys=False):
184 | 		# Remove useless keys
185 | 		keys_ = list( data.keys() )
186 | 		[ data.pop(k) for k in keys_ if k not in ["length","key"] ]
187 | 
188 | 	try:
189 | 		G.graph.pop("streets_per_node")
190 | 	except:
191 | 		pass
192 | 	# Pickle graph
193 | 	nx.write_gpickle(G, "temp/graph.gpickle")
194 | 
195 | 	# Prepare input indices points
196 | 	data_split = np.array_split(df_indices, num_processes)
197 | 	for i in range(num_processes):
198 | 		data_split[i].to_pickle("temp/points_"+str(i)+".pkl")
199 | 	# Pickle arguments
200 | 	kw_arguments.to_pickle("temp/arguments.pkl")
201 | 
202 | def associate_activities_closest_node(G, df_activities_built, df_activities_pois ):
203 | 	""" 
204 | 	Associates the number of existing activities to their closest nodes in the graph
205 | 
206 | 	Parameters
207 | 	----------
208 | 	G : networkx multidigraph
209 | 		input graph to calculate accessibility
210 | 	df_activities_built : pandas.DataFrame
211 | 		data selection of buildings with activity uses
212 | 	df_activities_pois : pandas.DataFrame
213 | 		data selection of points of interest with activity uses
214 | 
215 | 	Returns
216 | 	----------
217 | 
218 | 	"""
219 | 	# Initialize number of activity references
220 | 	for u, data in G.nodes(data=True):
221 | 		data["num_activities"] = 0
222 | 
223 | 	# Initialize KDTree of graph nodes
224 | 	coords = np.array([[node, data['x'], data['y']] for node, data in G.nodes(data=True)])
225 | 	df_nodes = pd.DataFrame(coords, columns=['node', 'x', 'y'])
226 | 	# zip coordinates
227 | 	data = list(zip( df_nodes["x"].ravel(), df_nodes["y"].ravel() ))
228 | 	# Create input tree
229 | 	tree = spatial.KDTree( data )
230 | 
231 | 	def associate_to_node(tree, point, G):
232 | 		distance, idx_node = tree.query( (point.x,point.y) )
233 | 		G.node[ df_nodes.loc[ idx_node, "node"] ]["num_activities"] += 1
234 | 	
235 | 	# Associate each activity to its closest node
236 | 	df_activities_built.apply(lambda x: associate_to_node(tree, x.geometry.centroid, G) , axis=1)
237 | 	df_activities_pois.apply(lambda x: associate_to_node(tree, x.geometry.centroid, G) , axis=1)


--------------------------------------------------------------------------------
/urbansprawl/sprawl/accessibility_parallel.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import sys
  7 | import numpy as np
  8 | import pandas as pd
  9 | import networkx as nx
 10 | from bisect import bisect
 11 | import time
 12 | 
 13 | def get_nearest_node_utm(G, point, return_dist=False):
 14 | 	"""
 15 | 	Return the nearest graph node to some specified point in UTM coordinates
 16 | 	
 17 | 	Parameters
 18 | 	----------
 19 | 	G : networkx multidigraph
 20 | 		input graph
 21 | 	point : tuple
 22 | 		the (x, y) point for which we will find the nearest node in the graph
 23 | 	return_dist : bool
 24 | 		optionally also return the distance between the point and the nearest node
 25 | 	
 26 | 	Returns
 27 | 	-------
 28 | 	int or tuple
 29 | 		corresponding node or tuple (int node, float distance)
 30 | 	""" 
 31 | 	# dump graph node coordinates into a pandas dataframe indexed by node id with x and y columns
 32 | 	coords = np.array([[node, data['x'], data['y']] for node, data in G.nodes(data=True)])
 33 | 	df = pd.DataFrame(coords, columns=['node', 'x', 'y']).set_index('node')
 34 | 	# Point coordinates
 35 | 	p_x, p_y = point
 36 | 	#print(df.loc[124550])
 37 | 	distances = df.apply(lambda x: np.sqrt( ( x.x - p_x)**2 + ( x.y- p_y)**2), axis=1 )
 38 | 	
 39 | 	# nearest node's ID is the index label of the minimum distance
 40 | 	nearest_node = distances.idxmin()
 41 | 	
 42 | 	# if caller requested return_dist, return distance between the point and the nearest node as well
 43 | 	if return_dist:
 44 | 		return int(nearest_node), distances.loc[nearest_node]
 45 | 	else:
 46 | 		return int(nearest_node)
 47 | 
 48 | ##############################################
 49 | ### Accessibility indices calculation
 50 | ##############################################
 51 | 
 52 | def get_count_activities_fixed_distance(G, point_ref, arguments):
 53 | 	""" 
 54 | 	Calculate accessibility value at point_ref according to chosen metric
 55 | 	If no graph node exists nearby input point reference, NaN is set
 56 | 	Based on counting the number of (activity) opportunities given a fixed maximum distance to travel
 57 | 
 58 | 	Parameters
 59 | 	----------
 60 | 	G : networkx multidigraph
 61 | 		input graph to calculate accessibility
 62 | 	point_ref: shapely.Point
 63 | 		reference point to calculate accesisibility
 64 | 
 65 | 	Returns
 66 | 	----------
 67 | 	int
 68 | 		returns the number of reached activities
 69 | 	"""
 70 | 	# Find closest node to point_ref
 71 | 	N0, distance = get_nearest_node_utm(G, point_ref.coords[0], return_dist=True)
 72 | 	# Distance to closest node too high?
 73 | 	if (distance > arguments.max_node_distance): return np.nan
 74 | 
 75 | 	# Initialize data structures
 76 | 	visited_nodes = set()
 77 | 	neighboring_nodes_id = []
 78 | 	neighboring_nodes_cost = []
 79 | 	num_activities_travelled = 0
 80 | 	N_visit = N0
 81 | 
 82 | 	# Pre-compute the shortest path length from source node N0 to other nodes; using lengths of roads as weight
 83 | 	shortest_path_length_N0_ = nx.single_source_dijkstra_path_length(G, source=N0, cutoff=arguments.fixed_distance_max_travel_distance, weight="length")
 84 | 
 85 | 	while ( True ):
 86 | 		# Store visited node
 87 | 		visited_nodes.add(N_visit)
 88 | 
 89 | 		# Update traveled activities
 90 | 		num_activities_travelled += G.node[N_visit]["num_activities"]
 91 | 
 92 | 		# Reached sufficient number of activities
 93 | 		if ( num_activities_travelled >= arguments.fixed_distance_max_num_activities ): return arguments.fixed_distance_max_num_activities
 94 | 		
 95 | 		# Add to neighboring_nodes the neighbors of visited node
 96 | 		for N_i in G.neighbors(N_visit):
 97 | 			if ( (not N_i in neighboring_nodes_id) and (not N_i in visited_nodes) ): # Not stored/visited already
 98 | 				# Store neighboring nodes, ordered by their distance cost
 99 | 				cost = shortest_path_length_N0_.get(N_i)
100 | 
101 | 				if (cost): # If path within MAX_DISTANCE_TO_TRAVEL exists, add neighboring node
102 | 					idx_to_insert = bisect( neighboring_nodes_cost, cost )
103 | 					# Insert in ordered list
104 | 					neighboring_nodes_id.insert(idx_to_insert, N_i)
105 | 					neighboring_nodes_cost.insert(idx_to_insert, cost)
106 | 		
107 | 		if (neighboring_nodes_id): # If neighborings nodes exist: Continue iteration
108 | 			# Update next node to visit
109 | 			N_visit = neighboring_nodes_id.pop(0)
110 | 			# Pop cost associated to N_visit
111 | 			neighboring_nodes_cost.pop(0)
112 | 		else: # If no neighboring nodes: Reached maximum distance tolerated, cut the iteration
113 | 			return num_activities_travelled
114 | 	
115 | 	return np.nan
116 | 
117 | 
118 | 
119 | def get_minimum_cost_activities_travel(G, point_ref, arguments):
120 | 	""" 
121 | 	Calculate accessibility value at point_ref according to chosen metric
122 | 	If no graph node exists nearby input point reference, NaN is set
123 | 	Based on the minimum radius travel cost to accomplish a certain quantity of activities
124 | 
125 | 	Parameters
126 | 	----------
127 | 	G : networkx multidigraph
128 | 		input graph to calculate accessibility
129 | 	point_ref: shapely.Point
130 | 		reference point to calculate accessibility
131 | 
132 | 	Returns
133 | 	----------
134 | 	float
135 | 		returns the computed radius cost length
136 | 	"""
137 | 	# Find closest node to point_ref
138 | 	N0, distance = get_nearest_node_utm(G, point_ref.coords[0], return_dist=True)
139 | 	# Distance to closest node too high?
140 | 	if (distance > arguments.max_node_distance): return np.nan
141 | 
142 | 	# Initialize data structures
143 | 	visited_nodes = []
144 | 	neighboring_nodes_id = []
145 | 	neighboring_nodes_cost = []
146 | 	activities_travelled = 0
147 | 	
148 | 	N_visit = N0
149 | 
150 | 	while ( not activities_travelled >= arguments.fixed_activities_min_number ):
151 | 		# Store visited node
152 | 		visited_nodes.append(N_visit)
153 | 
154 | 		# Update traveled activities
155 | 		activities_travelled += G.node[N_visit]["num_activities"]
156 | 
157 | 		# Add to neighboring_nodes the neighbors of visited node
158 | 		for N_i in G.neighbors(N_visit):
159 | 			if ( (not N_i in neighboring_nodes_id) and (not N_i in visited_nodes) ): # Not stored/visited already
160 | 				# Store neighboring nodes, ordered by their distance cost
161 | 				cost = nx.shortest_path_length(G,N0,N_i,weight="length")
162 | 				idx_to_insert = bisect( neighboring_nodes_cost, cost )
163 | 				# Insert in ordered list
164 | 				neighboring_nodes_id.insert(idx_to_insert, N_i)
165 | 				neighboring_nodes_cost.insert(idx_to_insert, cost)
166 | 		
167 | 		if (neighboring_nodes_id): # If not empty
168 | 			# Update next node to visit
169 | 			N_visit = neighboring_nodes_id.pop(0)
170 | 			cost_travel = neighboring_nodes_cost.pop(0)
171 | 
172 | 			# Reached maximum distance tolerated. Cut iteration
173 | 			if (cost_travel > arguments.fixed_activities_max_travel_distance):
174 | 				return arguments.fixed_activities_max_travel_distance
175 | 		else: # Empty neighbors
176 | 			return np.nan
177 | 	
178 | 	# Accomplished. End node: visited_nodes[-1]
179 | 	return nx.shortest_path_length(G,N0,visited_nodes[-1],weight="length")
180 | 
181 | 
182 | def main(argv):
183 | 	""" 
184 | 	Main program to drive the accessibility indices calculation
185 | 
186 | 	Parameters
187 | 	----------
188 | 	argv : array
189 | 		arguments to drive the calculation
190 | 
191 | 	Returns
192 | 	----------
193 | 
194 | 	"""
195 | 	start = time.time()
196 | 
197 | 	# Load graph
198 | 	G = nx.read_gpickle(argv[1])
199 | 
200 | 	# Load indices points
201 | 	indices = pd.read_pickle( argv[2] )
202 | 
203 | 	# Load indices calculation arguments
204 | 	arguments = pd.read_pickle( argv[3] )
205 | 
206 | 	if ( ( len(argv) > 4 ) and (argv[4] == "memory_test" ) ): # Test memory used for current subprocess
207 | 		import os
208 | 		import psutil
209 | 		process = psutil.Process(os.getpid())
210 | 		Allocated_process_MB = process.memory_info().rss / 1000 / 1000
211 | 		Free_system_MB = psutil.virtual_memory().available / 1000 / 1000
212 | 		#print( MB_total )
213 | 		#print( Free_system_MB )
214 | 		max_processes = int( Free_system_MB / Allocated_process_MB )
215 | 		print(max_processes)
216 | 		return
217 | 
218 | 
219 | 	if (arguments.fixed_activities):
220 | 		_calculate_accessibility = get_minimum_cost_activities_travel
221 | 	elif (arguments.fixed_distance):
222 | 		_calculate_accessibility = get_count_activities_fixed_distance
223 | 	else:
224 | 		assert(False)
225 | 
226 | 	# Calculate accessibility
227 | 	indices["accessibility"] = indices.geometry.apply(lambda x: _calculate_accessibility(G, x, arguments) )
228 | 
229 | 	# Store results
230 | 	indices.to_pickle( argv[2].replace('points','indices') )
231 | 
232 | 	end = time.time()
233 | 	print( "Time:",str(end-start) )
234 | 
235 | 
236 | if __name__ == "__main__":
237 | 	main(sys.argv)


--------------------------------------------------------------------------------
/urbansprawl/sprawl/core.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import pandas as pd
  7 | import geopandas as gpd
  8 | import numpy as np
  9 | from osmnx import log
 10 | from shapely.geometry import Point
 11 | 
 12 | from ..osm.core import get_route_graph, get_processed_osm_data
 13 | from .landusemix import compute_grid_landusemix
 14 | from .accessibility import compute_grid_accessibility
 15 | from .dispersion import compute_grid_dispersion
 16 | 
 17 | def get_indices_grid(df_osm_built, df_osm_building_parts, df_osm_pois, step=100):
 18 | 	""" 
 19 | 	Creates an input geodataframe with points sampled in a regular grid
 20 | 
 21 | 	Parameters
 22 | 	----------
 23 | 	df_osm_built : geopandas.GeoDataFrame
 24 | 		OSM processed buildings
 25 | 	df_osm_building_parts : geopandas.GeoDataFrame
 26 | 		OSM processed building parts
 27 | 	df_osm_pois : geopandas.GeoDataFrame
 28 | 		OSM processed points of interest
 29 | 	step : int
 30 | 		step to sample the regular grid in meters
 31 | 
 32 | 	Returns
 33 | 	----------
 34 | 	geopandas.GeoDataFrame
 35 | 		regular grid
 36 | 	"""
 37 | 	# Get bounding box
 38 | 	west, south, east, north = pd.concat( [ df_osm_built, df_osm_building_parts, df_osm_pois ], sort=False ).total_bounds
 39 | 	# Create indices
 40 | 	df_indices = gpd.GeoDataFrame( [ Point(i,j) for i in np.arange(west, east, step) for j in np.arange(south, north, step) ], columns=["geometry"] )
 41 | 	# Set projection
 42 | 	df_indices.crs = df_osm_built.crs
 43 | 	return df_indices
 44 | 
 45 | def process_spatial_indices(city_ref=None, region_args={"polygon":None, "place":None, "which_result":1, "point":None, "address":None, "distance":None, "north":None, "south":None, "east":None, "west":None},
 46 | 			grid_step = 100,
 47 | 			process_osm_args = {"retrieve_graph":True, "default_height":3, "meters_per_level":3, "associate_landuses_m2":True, "minimum_m2_building_area":9, "date":None},
 48 | 			dispersion_args = {'radius_search': 750, 'use_median': False, 'K_nearest': 50},
 49 | 			landusemix_args = {'walkable_distance': 600, 'compute_activity_types_kde': True, 'weighted_kde': True, 'pois_weight': 9, 'log_weighted': True},
 50 | 			accessibility_args = {'fixed_distance': True, 'fixed_activities': False, 'max_edge_length': 200, 'max_node_distance': 250, 
 51 | 				'fixed_distance_max_travel_distance': 2000, 'fixed_distance_max_num_activities': 250, 'fixed_activities_min_number': 20},
 52 | 			indices_computation = {"dispersion":True, "landusemix":True, "accessibility":True} ):
 53 | 	"""
 54 | 	Process sprawling indices for an input region of interest
 55 | 	1) OSM data is retrieved and processed. 
 56 | 		If the city name has already been processed, locally stored data will be loaded
 57 | 	2) A regular grid is created where indices will be calculated
 58 | 	3) Sprawling indices are calculated and returned
 59 | 
 60 | 	Parameters
 61 | 	----------
 62 | 	city_ref : str
 63 | 		Name of input city / region
 64 | 	grid_step : int
 65 | 		step to sample the regular grid in meters
 66 | 	region_args : dict
 67 | 		contains the information to retrieve the region of interest as the following:
 68 | 			polygon : shapely Polygon or MultiPolygon
 69 | 				geographic shape to fetch the land use footprints within
 70 | 			place : string or dict
 71 | 				query string or structured query dict to geocode/download
 72 | 			which_result : int
 73 | 				result number to retrieve from geocode/download when using query string 
 74 | 			point : tuple
 75 | 				the (lat, lon) central point around which to construct the region
 76 | 			address : string
 77 | 				the address to geocode and use as the central point around which to construct the region
 78 | 			distance : int
 79 | 				retain only those nodes within this many meters of the center of the region
 80 | 			north : float
 81 | 				northern latitude of bounding box
 82 | 			south : float
 83 | 				southern latitude of bounding box
 84 | 			east : float
 85 | 				eastern longitude of bounding box
 86 | 			west : float
 87 | 				western longitude of bounding box
 88 | 	process_osm_args : dict
 89 | 		additional arguments to drive the OSM data extraction process:
 90 | 			retrieve_graph : boolean
 91 | 				that determines if the street network for input city has to be retrieved and stored
 92 | 			default_height : float
 93 | 				height of buildings under missing data
 94 | 			meters_per_level : float
 95 | 				buildings number of levels assumed under missing data
 96 | 			associate_landuses_m2 : boolean
 97 | 				compute the total square meter for each land use
 98 | 			minimum_m2_building_area : float
 99 | 				minimum area to be considered a building (otherwise filtered)
100 | 			date : datetime.datetime
101 | 				query the database at a certain timestamp
102 | 	dispersion_args : dict
103 | 		arguments to drive the dispersion indices calculation
104 | 			radius_search: int
105 | 				circle radius to consider the dispersion calculation at a local point
106 | 			use_median : bool
107 | 				denotes whether the median or mean should be used to calculate the indices
108 | 			K_nearest : int
109 | 				number of neighboring buildings to consider in evaluation
110 | 	landusemix_args : dict
111 | 		arguments to drive the land use mix indices calculation
112 | 			walkable_distance : int
113 | 				the bandwidth assumption for Kernel Density Estimation calculations (meters)
114 | 			compute_activity_types_kde : bool
115 | 				determines if the densities for each activity type should be computed
116 | 			weighted_kde : bool
117 | 				use Weighted Kernel Density Estimation or classic version
118 | 			pois_weight : int
119 | 				Points of interest weight equivalence with buildings (squared meter)
120 | 			log_weighted : bool
121 | 				apply natural logarithmic function to surface weights
122 | 	accessibility_args : dict
123 | 		arguments to drive the accessibility indices calculation
124 | 			fixed_distance : bool
125 | 				denotes the cumulative opportunities access to activity land uses given a fixed maximum distance to travel
126 | 			fixed_activities : bool
127 | 				represents the distance needed to travel in order to reach a certain number of activity land uses
128 | 			max_edge_length: int
129 | 				maximum length, in meters, to tolerate an edge in a graph (otherwise, divide edge)
130 | 			max_node_distance: int
131 | 				maximum distance tolerated from input point to closest graph node in order to calculate accessibility values
132 | 			fixed_distance_max_travel_distance: int
133 | 				(fixed distance) maximum distance tolerated (cut&branch) when searching for the activities
134 | 			fixed_distance_max_num_activities: int
135 | 				(fixed distance) cut iteration if the number of activities exceeds a threshold
136 | 			fixed_activities_min_number: int
137 | 				(fixed activities) minimum number of activities required
138 | 	indices_computation : dict
139 | 		determines what sprawling indices should be computed
140 | 
141 | 	Returns
142 | 	----------
143 | 	gpd.GeoDataFrame
144 | 		returns the regular grid with the indicated sprawling indices	
145 | 	"""
146 | 	try:
147 | 		# Process OSM data
148 | 		df_osm_built, df_osm_building_parts, df_osm_pois = get_processed_osm_data(city_ref=city_ref, region_args=region_args, kwargs=process_osm_args)
149 | 		# Get route graph
150 | 		G = get_route_graph(city_ref)
151 | 
152 | 		if (not ( indices_computation.get("accessibility") or indices_computation.get("landusemix") or indices_computation.get("dispersion") ) ):
153 | 			log("Not computing any spatial indices")
154 | 			return None
155 | 		
156 | 		# Get indices grid
157 | 		df_indices = get_indices_grid(df_osm_built, df_osm_building_parts, df_osm_pois, grid_step)
158 | 		
159 | 		# Compute sprawling indices
160 | 		if (indices_computation.get("accessibility")):
161 | 			compute_grid_accessibility(df_indices, G, df_osm_built, df_osm_pois, accessibility_args)
162 | 		if (indices_computation.get("landusemix")):
163 | 			compute_grid_landusemix(df_indices, df_osm_built, df_osm_pois, landusemix_args)
164 | 		if (indices_computation.get("dispersion")):
165 | 			compute_grid_dispersion(df_indices, df_osm_built, dispersion_args)
166 | 		
167 | 		return df_indices
168 | 
169 | 	except Exception as e:
170 | 		log("Could not compute the spatial indices. An exception occurred: " + str(e))
171 | 		return None
172 | 


--------------------------------------------------------------------------------
/urbansprawl/sprawl/dispersion.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | from scipy import spatial
  7 | import numpy as np
  8 | import pandas as pd
  9 | import time
 10 | 
 11 | from osmnx import log
 12 | 
 13 | ##############################################################
 14 | ### Dispersion indices methods
 15 | ##############################################################
 16 | 
 17 | def closest_building_distance_median( point_ref, tree, df_closest_d, radius_search ):
 18 | 	""" 
 19 | 	Dispersion metric at point_ref
 20 | 	Computes the median of the closest distance to another building for each building within a radius search
 21 | 	Uses the input KDTree to accelerate calculations
 22 | 
 23 | 	Parameters
 24 | 	----------
 25 | 	point_ref : shapely.Point
 26 | 		calculate index at input point
 27 | 	tree : scipy.spatial.KDTree
 28 | 		KDTree of buildings centroid
 29 | 	df : pandas.DataFrame 
 30 | 		data frame of buildings with closest distance calculation
 31 | 	radius_search : float
 32 | 		circle radius to consider the dispersion calculation at a local point
 33 | 
 34 | 	Returns
 35 | 	----------
 36 | 	float
 37 | 		value of dispersion at input point
 38 | 	"""
 39 | 	# Query buildings within radius search
 40 | 	indices = tree.query_ball_point( point_ref, radius_search )
 41 | 	# No dispersion value
 42 | 	if (len(indices) == 0): return np.NaN
 43 | 	# Calculate median of closest distance values. If no information is available, NaN is set
 44 | 	return df_closest_d.loc[ indices ].median()
 45 | 
 46 | def closest_building_distance_average( point_ref, tree, df_closest_d, radius_search ):
 47 | 	""" 
 48 | 	Dispersion metric at point_ref
 49 | 	Computes the mean of the closest distance to another building for each building within a radius search
 50 | 	Uses the input KDTree to accelerate calculations
 51 | 
 52 | 	Parameters
 53 | 	----------
 54 | 	point_ref : shapely.Point
 55 | 		calculate index at input point
 56 | 	tree : scipy.spatial.KDTree
 57 | 		KDTree of buildings centroid
 58 | 	df : pandas.DataFrame 
 59 | 		data frame of buildings with closest distance calculation
 60 | 	radius_search : int
 61 | 		circle radius to consider the dispersion calculation at a local point
 62 | 
 63 | 	Returns
 64 | 	----------
 65 | 	float
 66 | 		value of dispersion at input point
 67 | 	"""
 68 | 	# Query buildings within radius search
 69 | 	indices = tree.query_ball_point( point_ref, radius_search )
 70 | 	# No dispersion value
 71 | 	if (len(indices) == 0): return np.NaN
 72 | 	# Calculate mean of closest distance values. If no information is available, NaN is set
 73 | 	return df_closest_d.loc[ indices ].mean()
 74 | 
 75 | 
 76 | ##############################################################
 77 | ### Dispersion indices calculation
 78 | ##############################################################
 79 | 
 80 | def compute_grid_dispersion(df_indices, df_osm_built, kwargs={"radius_search":750, "use_median":True, "K_nearest":50} ):
 81 | 	""" 
 82 | 	Creates grid and calculates dispersion indices.
 83 | 
 84 | 	Parameters
 85 | 	----------
 86 | 	df_indices : geopandas.GeoDataFrame
 87 | 		data frame containing the (x,y) reference points to calculate indices
 88 | 	df_osm_built : geopandas.GeoDataFrame
 89 | 		data frame containing the building's geometries
 90 | 	kw_args: dict
 91 | 		additional keyword arguments for the indices calculation
 92 | 			radius_search: int
 93 | 				circle radius to consider the dispersion calculation at a local point
 94 | 			use_median : bool
 95 | 				denotes whether the median or mean should be used to calculate the indices
 96 | 			K_nearest : int
 97 | 				number of neighboring buildings to consider in evaluation
 98 | 
 99 | 	Returns
100 | 	----------
101 | 	geopandas.GeoDataFrame
102 | 		data frame with the added column for dispersion indices
103 | 	"""
104 | 	log("Calculating dispersion indices")
105 | 	start = time.time()
106 | 
107 | 	# Get radius search: circle radius to consider the dispersion calculation at a local point
108 | 	radius_search = kwargs["radius_search"]
109 | 	# Use the median or mean computation ?
110 | 	use_median = kwargs["use_median"]
111 | 
112 | 	# Assign dispersion calculation method
113 | 	if (kwargs["use_median"]):
114 | 		_calculate_dispersion = closest_building_distance_median
115 | 	else:
116 | 		_calculate_dispersion = closest_building_distance_average
117 | 	
118 | 	# Calculate the closest distance for each building within K_nearest centroid buildings
119 | 	_apply_polygon_closest_distance_neighbor(df_osm_built, K_nearest = kwargs["K_nearest"] )
120 | 	
121 | 	# For dispersion calculation approximation, create KDTree with buildings centroid
122 | 	coords_data = [ point.coords[0] for point in df_osm_built.loc[ df_osm_built.closest_d.notnull() ].geometry.apply(lambda x: x.centroid) ]
123 | 	# Create KDTree
124 | 	tree = spatial.KDTree( coords_data )
125 | 	
126 | 	# Compute dispersion indices
127 | 	index_column = "dispersion"
128 | 	df_indices[index_column] = df_indices.geometry.apply(lambda x: _calculate_dispersion(x, tree, df_osm_built.closest_d, radius_search ) )
129 | 	
130 | 	# Remove added column
131 | 	df_osm_built.drop('closest_d', axis=1, inplace=True)
132 | 
133 | 	log("Done: Dispersion indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
134 | 	
135 | 
136 | def _apply_polygon_closest_distance_neighbor(df_osm_built, K_nearest = 50):
137 | 	""" 
138 | 	Computes for each polygon, the distance to the (approximated) nearest neighboring polygon
139 | 	Approximation is done using distance between centroids to K nearest neighboring polygons, then evaluating the real polygon distance
140 | 	A column `closest_d` is added in the data frame
141 | 
142 | 	Parameters
143 | 	----------
144 | 	df_osm_built: geopandas.GeoDataFrame
145 | 		data frame containing the building's geometries
146 | 	K_nearest: int
147 | 		number of neighboring polygons to evaluate
148 | 
149 | 	Returns
150 | 	----------
151 | 
152 | 	"""
153 | 	def get_closest_indices(tree, x, K_nearest):
154 | 		# Query the closest buidings considering their centroid
155 | 		return tree.query( x.centroid.coords[0] , k=K_nearest+1)[1][1:]
156 | 	def compute_closest_distance(x, buildings):
157 | 		# Minimum distance of all distances between reference building 'x' and the other buildings
158 | 		return (buildings.apply(lambda b: x.distance(b) ) ).min()
159 | 
160 | 	# Use all elements to get the exact closest neighbor?
161 | 	if ( (K_nearest == -1) or (K_nearest >= len(df_osm_built)) ): K_nearest = len(df_osm_built)-1
162 | 
163 | 	# Get separate list for coordinates
164 | 	coords_data = [ geom.centroid.coords[0] for geom in df_osm_built.geometry ]
165 | 	# Create KD Tree using polygon's centroid
166 | 	tree = spatial.KDTree( coords_data )
167 | 
168 | 	# Get the closest buildings indices
169 | 	df_osm_built['closest_buildings'] = df_osm_built.geometry.apply(lambda x: get_closest_indices(tree, x, K_nearest) )
170 | 	# Compute the minimum real distance for the closest buildings
171 | 	df_osm_built['closest_d'] = df_osm_built.apply(lambda x: compute_closest_distance(x.geometry,df_osm_built.geometry.loc[x.closest_buildings]), axis=1 )
172 | 	# Drop unnecessary column
173 | 	df_osm_built.drop('closest_buildings', axis=1, inplace=True)


--------------------------------------------------------------------------------
/urbansprawl/sprawl/landusemix.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import math
  7 | import numpy as np
  8 | import pandas as pd
  9 | import time
 10 | 
 11 | from sklearn.neighbors.kde import KernelDensity
 12 | from .utils import WeightedKernelDensityEstimation
 13 | 
 14 | from osmnx import log
 15 | 
 16 | ##############################################################
 17 | ### Land use mix indices methods
 18 | ##############################################################
 19 | 
 20 | def metric_phi_entropy(x,y):
 21 | 	""" 
 22 | 	Shannon's entropy metric
 23 | 	Based on article "Comparing measures of urban land use mix, 2013"
 24 | 
 25 | 	Parameters
 26 | 	----------
 27 | 	x : float
 28 | 		probability related to land use X
 29 | 	y : float
 30 | 		probability related to land use Y
 31 | 
 32 | 	Returns
 33 | 	----------
 34 | 	float
 35 | 		entropy value
 36 | 	"""
 37 | 	# Undefined for negative values
 38 | 	if (x<0 or y<0): return np.nan
 39 | 	# Entropy Index not defined for any input value equal to zero (due to logarithm)
 40 | 	if (x == 0 or y == 0): return 0
 41 | 	# Sum = 1
 42 | 	x_,y_ = x/(x+y), y/(x+y)
 43 | 	phi_value = - ( ( x_*math.log(x_) ) + ( y_*math.log(y_) ) ) / math.log(2)
 44 | 	return phi_value
 45 | 
 46 | #### Assign land use mix method
 47 | _land_use_mix = metric_phi_entropy
 48 | 
 49 | ##############################################################
 50 | ### Land use mix indices calculation
 51 | ##############################################################
 52 | 
 53 | def compute_grid_landusemix(df_indices, df_osm_built, df_osm_pois, kw_args={'walkable_distance':600,'compute_activity_types_kde':True,'weighted_kde':True,'pois_weight':9,'log_weighted':True} ):
 54 | 	""" 
 55 | 	Calculate land use mix indices on input grid
 56 | 
 57 | 	Parameters
 58 | 	----------
 59 | 	df_indices : geopandas.GeoDataFrame
 60 | 		data frame containing the (x,y) reference points to calculate indices
 61 | 	df_osm_built : geopandas.GeoDataFrame
 62 | 		data frame containing the building's geometries
 63 | 	df_osm_pois : geopandas.GeoDataFrame
 64 | 		data frame containing the points' of interest geometries
 65 | 	kw_args: dict
 66 | 		additional keyword arguments for the indices calculation
 67 | 			walkable_distance : int
 68 | 				the bandwidth assumption for Kernel Density Estimation calculations (meters)
 69 | 			compute_activity_types_kde : bool
 70 | 				determines if the densities for each activity type should be computed
 71 | 			weighted_kde : bool
 72 | 				use Weighted Kernel Density Estimation or classic version
 73 | 			pois_weight : int
 74 | 				Points of interest weight equivalence with buildings (squared meter)
 75 | 			log_weighted : bool
 76 | 				apply natural logarithmic function to surface weights
 77 | 
 78 | 	Returns
 79 | 	----------
 80 | 	pandas.DataFrame
 81 | 		land use mix indices
 82 | 	"""
 83 | 	log("Calculating land use mix indices")
 84 | 	start = time.time()
 85 | 
 86 | 	# Get the bandwidth, related to 'walkable distances'
 87 | 	bandwidth = kw_args["walkable_distance"]
 88 | 	# Compute a weighted KDE?
 89 | 	weighted_kde = kw_args["weighted_kde"]
 90 | 	X_weights = None
 91 | 
 92 | 	# Get full list of contained POIs
 93 | 	contained_pois = list(set([element for list_ in df_osm_built.containing_poi[ df_osm_built.containing_poi.notnull() ] for element in list_]))
 94 | 	# Get the POIs not contained by any building
 95 | 	df_osm_pois_not_contained = df_osm_pois[ ~ df_osm_pois.index.isin( contained_pois) ]
 96 | 
 97 | 	############
 98 | 	### Calculate land use density estimations
 99 | 	############
100 | 
101 | 	####
102 | 	# Residential
103 | 	####
104 | 	df_osm_built_indexed = df_osm_built[ df_osm_built.classification.isin(["residential","mixed"]) ]
105 | 	if (weighted_kde): X_weights = df_osm_built_indexed.landuses_m2.apply(lambda x: x["residential"] )
106 | 
107 | 	df_indices["residential_pdf"] = calculate_kde(df_indices.geometry, df_osm_built_indexed, None, bandwidth, X_weights, kw_args["pois_weight"], kw_args["log_weighted"] )
108 | 	log("Residential density estimation done")
109 | 
110 | 	####
111 | 	# Activities
112 | 	####
113 | 	df_osm_built_indexed = df_osm_built[ df_osm_built.classification.isin(["activity","mixed"]) ]
114 | 	df_osm_pois_not_cont_indexed = df_osm_pois_not_contained[ df_osm_pois_not_contained.classification.isin(["activity","mixed"]) ]
115 | 	if (weighted_kde): X_weights = df_osm_built_indexed.landuses_m2.apply(lambda x: x["activity"] )
116 | 	
117 | 	df_indices["activity_pdf"] = calculate_kde(df_indices.geometry, df_osm_built_indexed, df_osm_pois_not_cont_indexed, bandwidth, X_weights, kw_args["pois_weight"], kw_args["log_weighted"] )
118 | 	log("Activity density estimation done")
119 | 	
120 | 	####
121 | 	# Compute activity types densities
122 | 	####
123 | 	if ( kw_args["compute_activity_types_kde"] ):
124 | 		assert('activity_category' in df_osm_built.columns)
125 | 
126 | 		# Get unique category values
127 | 		unique_categories_built = [list(x) for x in set(tuple(x) for x in df_osm_built.activity_category.values if isinstance(x,list) ) ]
128 | 		unique_categories_pois = [list(x) for x in set(tuple(x) for x in df_osm_pois_not_cont_indexed.activity_category.values if isinstance(x,list) ) ]
129 | 		flat_list = [item for sublist in unique_categories_built + unique_categories_pois for item in sublist]
130 | 		categories = list( set(flat_list) )
131 | 
132 | 		for cat in categories: # Get data frame selection of input category
133 | 			# Buildings and POIs within that category
134 | 			df_built_category = df_osm_built_indexed[ df_osm_built_indexed.activity_category.apply(lambda x: (isinstance(x,list)) and (cat in x) ) ]
135 | 			df_pois_category = df_osm_pois_not_cont_indexed[ df_osm_pois_not_cont_indexed.activity_category.apply(lambda x: (isinstance(x,list)) and (cat in x) ) ]
136 | 			if (weighted_kde): X_weights = df_built_category.landuses_m2.apply(lambda x: x[ cat ] )
137 | 			
138 | 			df_indices[ cat + "_pdf" ] = calculate_kde( df_indices.geometry, df_built_category, df_pois_category, bandwidth, X_weights, kw_args["pois_weight"], kw_args["log_weighted"] )
139 | 		
140 | 		log("Activity grouped by types density estimation done")
141 | 	
142 | 
143 | 	# Compute land use mix indices
144 | 	index_column = "landusemix"
145 | 	df_indices[index_column] = df_indices.apply(lambda x: _land_use_mix(x.activity_pdf, x.residential_pdf), axis=1 )
146 | 	df_indices["landuse_intensity"] = df_indices.apply(lambda x: (x.activity_pdf + x.residential_pdf)/2., axis=1 )
147 | 	
148 | 	log("Done: Land use mix indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
149 | 	
150 | ####
151 | 
152 | def calculate_kde(points, df_osm_built, df_osm_pois=None, bandwidth=400, X_weights=None, pois_weight=9, log_weight=True):
153 | 	"""
154 | 	Evaluate the probability density function using Kernel Density Estimation of input geo-localized data
155 | 	KDE's bandwidth stands for walkable distances
156 | 	If input weights are given, a Weighted Kernel Density Estimation is carried out
157 | 
158 | 	Parameters
159 | 	----------
160 | 	points : geopandas.GeoSeries
161 | 		reference points to calculate indices
162 | 	df_osm_built : geopandas.GeoDataFrame
163 | 		data frame containing the building's geometries
164 | 	df_osm_pois : geopandas.GeoDataFrame
165 | 		data frame containing the points' of interest geometries
166 | 	bandwidth: int
167 | 		bandwidth value to be employed on the Kernel Density Estimation
168 | 	X_weights : pandas.Series
169 | 		indicates the weight for each input building (e.g. surface)
170 | 	pois_weight : int
171 | 		weight assigned to points of interest
172 | 	log_weight : bool
173 | 		if indicated, applies a log transformation to input weight values
174 | 
175 | 	Returns
176 | 	----------
177 | 	pandas.Series
178 | 		
179 | 	"""
180 | 	# X_b : Buildings array
181 | 	X_b = [ [p.x,p.y] for p in df_osm_built.geometry.centroid.values ]
182 | 	
183 | 	# X_p : Points array
184 | 	if (df_osm_pois is None): X_p = []
185 | 	else: X_p = [ [p.x,p.y] for p in df_osm_pois.geometry.centroid.values ]
186 | 	
187 | 	# X : Full array
188 | 	X = np.array( X_b + X_p )
189 | 
190 | 	# Points where the probability density function will be evaluated
191 | 	Y = np.array( [ [p.x,p.y] for p in points.values ] )
192 | 
193 | 	if (not (X_weights is None) ): # Weighted Kernel Density Estimation
194 | 		# Building's weight + POIs weight
195 | 		X_W = np.concatenate( [X_weights.values, np.repeat( [pois_weight], len(X_p) )] )
196 | 
197 | 		if (log_weight): # Apply logarithm
198 | 			X_W = np.log( X_W )
199 | 
200 | 		PDF = WeightedKernelDensityEstimation(X, X_W, bandwidth, Y)		
201 | 		return pd.Series( PDF / PDF.max() )		
202 | 	else: # Kernel Density Estimation
203 | 		# Sklearn 
204 | 		kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(X)		
205 | 		# Sklearn returns the results in the form log(density)
206 | 		PDF = np.exp(kde.score_samples(Y))
207 | 		return pd.Series( PDF / PDF.max() )


--------------------------------------------------------------------------------
/urbansprawl/sprawl/utils.py:
--------------------------------------------------------------------------------
  1 | ###################################################################################################
  2 | # Repository: https://github.com/lgervasoni/urbansprawl
  3 | # MIT License
  4 | ###################################################################################################
  5 | 
  6 | import numpy as np
  7 | import pandas as pd
  8 | import networkx as nx
  9 | import math
 10 | from shapely.geometry import LineString
 11 | from scipy.spatial.distance import cdist
 12 | 
 13 | 
 14 | def WeightedKernelDensityEstimation(X, Weights, bandwidth, Y, max_mb_per_chunk = 1000):
 15 |     """ 
 16 |     Computes a Weighted Kernel Density Estimation
 17 | 
 18 |     Parameters
 19 |     ----------
 20 |     X : array
 21 |         input points
 22 |     Weights : array
 23 |         array of weights associated to points
 24 |     bandwidth : float
 25 |         bandwidth for kernel density estimation
 26 |     Y : array
 27 |         points where density estimations will be performed
 28 | 
 29 |     Returns
 30 |     ----------
 31 |     pd.Series
 32 |         returns an array of the estimated densities rescaled between [0;1]
 33 |     """
 34 |     def get_megabytes_pairwise_distances_allocation(X, Y):
 35 |     	# Calculate MB needed to allocate pairwise distances
 36 |     	return len(X) * len(Y) * 8 * 1e-6
 37 |     
 38 |     # During this procedure, pairwise euclidean distances are computed between inputs points X and points to estimate Y
 39 |     # For this reason, Y is divided in chunks to avoid big memory allocations. At most, X megabytes per chunk are allocated for pairwise distances
 40 |     Y_split = np.array_split( Y, math.ceil( get_megabytes_pairwise_distances_allocation(X,Y) / max_mb_per_chunk ) )
 41 |     
 42 |     """
 43 |     ### Step by step
 44 |     # Weighed KDE: Sum{ Weight_i * K( (X-Xi) / h) }
 45 |     W_norm = np.array( Weights / np.sum(Weights) )
 46 |     cdist_values = cdist( Y, X, 'euclidean') / bandwidth
 47 |     Ks = np.exp( -.5 * ( cdist_values ) ** 2  )
 48 |     PDF = np.sum( Ks * W_norm, axis=1)
 49 |     """
 50 |     """
 51 |     ### Complete version. Memory consuming
 52 |     PDF = np.sum( np.exp( -.5 * ( cdist( Y, X, 'euclidean') / bandwidth ) ** 2  ) * ( np.array( Weights / np.sum(Weights) ) ), axis=1)
 53 |     """
 54 | 
 55 |     ### Divide Y in chunks to avoid big memory allocations
 56 |     PDF = np.concatenate( [ np.sum( np.exp( -.5 * ( cdist( Y_i, X, 'euclidean') / bandwidth ) ** 2  ) * ( np.array( Weights / np.sum(Weights) ) ), axis=1) for Y_i in Y_split ] )
 57 |     # Rescale
 58 |     return pd.Series( PDF / PDF.sum() )
 59 | 
 60 | 
 61 | def cut_in_two(line):
 62 | 	"""
 63 | 	Cuts input line into two lines of equal length
 64 | 
 65 | 	Parameters
 66 | 	----------
 67 | 	line : shapely.LineString
 68 | 		input line
 69 | 
 70 | 	Returns
 71 | 	----------
 72 | 	list (LineString, LineString, Point)
 73 | 		two lines and the middle point cutting input line
 74 | 	"""
 75 | 	from shapely.geometry import Point, LineString
 76 | 	# Get final distance value
 77 | 	distance = line.length/2
 78 | 	# Cuts a line in two at a distance from its starting point
 79 | 	if distance <= 0.0 or distance >= line.length:
 80 | 		return [LineString(line)]
 81 | 	coords = list(line.coords)
 82 | 	for i, p in enumerate(coords):
 83 | 		pd = line.project(Point(p))
 84 | 		if pd == distance:
 85 | 			return [LineString(coords[:i+1]), LineString(coords[i:]), pd]
 86 | 		if pd > distance:
 87 | 			cp = line.interpolate(distance)
 88 | 			return [ LineString(coords[:i] + [(cp.x, cp.y)]), LineString([(cp.x, cp.y)] + coords[i:]), cp]
 89 | 
 90 | class NodeCounter:
 91 | 	"""
 92 | 	Node negative counter. Utils for node osmid creation. Start on -1 and it auto decrements
 93 | 	"""
 94 | 	def __init__(self):
 95 | 		self._num = 0
 96 | 	def get_num(self):
 97 | 		self._num -= 1
 98 | 		return self._num
 99 | 
100 | def verify_divide_edge(G, u, v, key, data, node_creation_counter, max_edge_length):
101 | 	"""
102 | 	Verify if edge(u,v)[key] length is higher than a certain threshold
103 | 	In this case, divide edge(u,v) in two edges of equal length
104 | 	Assign negative values to the edges new osm id
105 | 	Call recursively to continue dividing each of the lines if necessary
106 | 
107 | 	Parameters
108 | 	----------
109 | 	G : networkx multidigraph
110 | 		input graph
111 | 	u : node
112 | 		origin node
113 | 	v : node
114 | 		destination node
115 | 	key : int
116 | 		(u,v) arc identifier
117 | 	data : dict
118 | 		arc data
119 | 	node_creation_counter : NodeCounter
120 | 		node identifier creation
121 | 	max_edge_length : float
122 | 		maximum tolerated edge length
123 | 
124 | 	Returns
125 | 	----------
126 | 
127 | 	"""
128 | 	# Input: Two communicated nodes (u, v)
129 | 	if ( data["length"] <= max_edge_length ): # Already satisfy condition?
130 | 		return
131 | 	
132 | 	# Get geometry connecting (u,v)
133 | 	if ( data.get("geometry",None) ): # Geometry exists
134 | 		geometry = data["geometry"]
135 | 	else: # Real geometry is a straight line between the two nodes
136 | 		P_U = G.node[u]["x"], G.node[u]["y"]
137 | 		P_V = G.node[v]["x"], G.node[v]["y"]
138 | 		geometry = LineString( (P_U, P_V) )
139 | 	
140 | 	# Get geometries for edge(u,middle), edge(middle,v) and node(middle)
141 | 	line1, line2, middle_point = cut_in_two(geometry)
142 | 	
143 | 	# Copy edge(u,v) data to conserve attributes. Modify its length
144 | 	data_e1 = data.copy()
145 | 	data_e2 = data.copy()
146 | 	# Associate correct length
147 | 	data_e1["length"] = line1.length
148 | 	data_e2["length"] = line2.length
149 | 	# Assign geometries
150 | 	data_e1["geometry"] = line1
151 | 	data_e2["geometry"] = line2
152 | 	
153 | 	# Create new node: Middle distance of edge
154 | 	x,y = list(middle_point.coords)[0]
155 | 	# Set a new unique osmid: Negative (as in OSM2PGSQL, created objects contain negative osmid)
156 | 	node_osmid = node_creation_counter.get_num()
157 | 	node_data = {'osmid':node_osmid, 'x':x, 'y':y}
158 | 	
159 | 	# Add middle node with its corresponding data
160 | 	G.add_node(node_osmid)
161 | 	nx.set_node_attributes(G, {node_osmid : node_data } )
162 | 	
163 | 	# Add edges (u,middle) and (middle,v)
164 | 	G.add_edge(u, node_osmid)
165 | 	nx.set_edge_attributes(G, { (u, node_osmid, 0): data_e1 } )
166 | 	G.add_edge(node_osmid, v)
167 | 	nx.set_edge_attributes(G, { (node_osmid, v, 0): data_e2 } )
168 | 	
169 | 	# Remove edge (u,v)
170 | 	G.remove_edge(u,v,key=key)
171 | 	
172 | 	# Recursively verify created edges and divide if necessary. Use last added key to identify the edge
173 | 	last_key = len( G[u][node_osmid] ) -1
174 | 	verify_divide_edge(G, u, node_osmid, last_key, data_e1, node_creation_counter, max_edge_length)
175 | 	last_key = len( G[node_osmid][v] ) -1
176 | 	verify_divide_edge(G, node_osmid, v, last_key, data_e2, node_creation_counter, max_edge_length)
177 | 	
178 | 
179 | def divide_long_edges_graph(G, max_edge_length):
180 | 	"""
181 | 	Divide all edges with a higher length than input threshold by means of dividing the arcs and creating new nodes
182 | 
183 | 	Parameters
184 | 	----------
185 | 	G : networkx multidigraph
186 | 		input graph
187 | 	max_edge_length : float
188 | 		maximum tolerated edge length
189 | 
190 | 	Returns
191 | 	----------
192 | 
193 | 	"""
194 | 	# Negative osm_id indicate created nodes
195 | 	node_creation_counter = NodeCounter()
196 | 
197 | 	for u, v, key, data in list( G.edges(data=True, keys=True) ):
198 | 		if ( data["length"] > max_edge_length ):
199 | 			# Divide the edge (u,v) recursively
200 | 			verify_divide_edge(G, u, v, key, data, node_creation_counter, max_edge_length)


--------------------------------------------------------------------------------