├── .gitignore
├── LICENSE.md
├── README.md
├── examples
│   ├── 1-Overview-OpenStreetMap-data.ipynb
│   ├── 2-Spatial-indices-Land-use-mix.ipynb
│   ├── 3-Spatial-indices-Urban-sprawl.ipynb
│   ├── 4-Spatial-indices-Granularity.ipynb
│   ├── 5-Batch-mode-Urban-sprawl.ipynb
│   ├── 6-Disaggregated-population-estimates-Pre-requisites.ipynb
│   ├── 7-Disaggregated-population-estimates-Residential-surface-approach.ipynb
│   ├── 8-Disaggregated-population-estimates-Neural-networks-approach.ipynb
│   └── images
│       ├── Grenoble_GPW_simulation.png
│       ├── Grenoble_INSEE.png
│       ├── Lyon_Accessibility.png
│       ├── Lyon_Buildings.png
│       ├── Lyon_Dispersion.png
│       ├── Lyon_Landusemix.png
│       ├── Lyon_POIs.png
│       ├── Lyon_activities_densities.png
│       ├── Lyon_densities.png
│       └── Lyon_graph.png
├── setup.py
└── urbansprawl
    ├── __init__.py
    ├── osm
    │   ├── __init__.py
    │   ├── classification.py
    │   ├── core.py
    │   ├── overpass.py
    │   ├── surface.py
    │   ├── tags.py
    │   └── utils.py
    ├── population
    │   ├── __init__.py
    │   ├── core.py
    │   ├── data_extract.py
    │   ├── downscaling.py
    │   ├── urban_features.py
    │   └── utils.py
    ├── settings.py
    └── sprawl
        ├── __init__.py
        ├── accessibility.py
        ├── accessibility_parallel.py
        ├── core.py
        ├── dispersion.py
        ├── landusemix.py
        └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
# Folders
__pycache__/
cache/
data/
logs/
.ipynb_checkpoints/

# Logs
*.log

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Luciano Gervasoni

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Urbansprawl

The urbansprawl project provides an open source framework for assessing urban sprawl using open data. It uses OpenStreetMap (OSM) data to calculate its sprawl indices, divided into Accessibility, Land use mix, and Dispersion.

Locations of residential and activity units (e.g. shops, commerce, and offices) are used to measure mixed use development and built-up dispersion, whereas the street network is used to measure accessibility between different land uses. The output consists of spatial indices, which can easily be integrated with GIS platforms.
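In practice, a full analysis chains the OSM retrieval step with the index computations. The sketch below uses the names re-exported by `urbansprawl/__init__.py` (shown at the end of this dump); the signature of `get_processed_osm_data` appears in `urbansprawl/osm/core.py`, while the exact arguments of `get_indices_grid` and the `compute_grid_*` calls are not shown in this dump, so those calls are illustrative assumptions rather than the definitive API:

```python
from urbansprawl import (get_processed_osm_data, get_indices_grid,
                         compute_grid_landusemix, compute_grid_accessibility,
                         compute_grid_dispersion)

# Retrieve and classify buildings, building parts, and POIs for a region
df_built, df_parts, df_pois = get_processed_osm_data(
    city_ref="Lyon_France",
    region_args={"place": "Lyon, France", "which_result": 1},
)

# Hypothetical calls: lay a regular grid over the region of interest, then
# compute the land use mix, accessibility, and dispersion indices on its cells
grid = get_indices_grid(df_built, df_parts, df_pois)
compute_grid_landusemix(grid, df_built, df_pois)
compute_grid_accessibility(grid, df_built, df_pois)
compute_grid_dispersion(grid, df_built)
```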

Additionally, a method to perform disaggregated population estimates at the building level is provided. Our goal is to estimate the number of people living at the fine scale of individual households, using open urban data (OpenStreetMap) and coarse-scale population data (census tracts).

**Motivation:**

Urban sprawl has been linked to numerous negative environmental and socioeconomic impacts. Meanwhile, the number of people living in cities has grown considerably since 1950, from 746 million to 3.9 billion in 2014, and more than 66% of the world's population is projected to live in urban areas by 2050, against 30% in 1950 [(United Nations, 2014)](https://esa.un.org/unpd/wup/publications/files/wup2014-highlights.pdf). The fact that urban areas have been growing at increasing rates makes assessing urban sprawl an urgent step towards sustainable development. However, sprawl is an elusive term, and the different approaches used to measure it have led to heterogeneous results.

Moreover, most studies rely on private or commercial data sets, and their software is rarely made public, impeding research reproducibility and comparability. Furthermore, many works produce a single value for the whole region of analysis, discarding the spatial information that is vital for urban planners and policy makers.

This situation brings new challenges in how to conceive cities that host such amounts of population in a sustainable way, a question that spans economic, social, and environmental matters, among others. Urbansprawl provides an open framework to aid in the process of calculating sprawl indices.

**Framework characteristics:**

* Open data: we rely solely on open data in order to ensure replicability.
* Open source: users are free to use the framework for any purpose.
* World-wide coverage: the analysis can be applied to any city in the world, as long as sufficient data exist.
* Data homogeneity: a set of statistical tools is applied to homogeneous and well-defined [map features](https://wiki.openstreetmap.org/wiki/Map_Features) data.
* Geo-localized data: the precise location of features helps to cope with the [Modifiable Areal Unit Problem](https://en.wikipedia.org/wiki/Modifiable_areal_unit_problem) (avoiding the use of gridded data, e.g. Land Use Land Cover data).
* Crowd-sourced data: rapid updates thanks to an ever-growing community.
* GIS output: easy to integrate with other GIS frameworks (see the export sketch after this list).
* Potential missing data: few data still exist for some regions of the world.
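As a sketch of the GIS integration mentioned in the list above: the framework's outputs are geopandas GeoDataFrames (as can be seen throughout the source files below), so they can be written to standard GIS formats in one line. The column names, coordinates, and the `crs` string below are illustrative assumptions:

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy stand-in for an indices grid produced by the framework
grid = gpd.GeoDataFrame(
    {"landusemix": [0.71], "dispersion": [0.24]},
    geometry=[Point(4.8357, 45.7640)],
    crs="EPSG:4326",
)
grid.to_file("indices.geojson", driver="GeoJSON")  # loads directly in QGIS/ArcGIS
```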

**Disclaimer:** This package is no longer maintained.

**For more details, refer to:**
* Gervasoni Luciano, 2018. "[Contributions to the formalization and implementation of spatial urban indices using open data: application to urban sprawl studies](https://tel.archives-ouvertes.fr/tel-02077356)." Computers and Society [cs.CY]. Université Grenoble Alpes, 2018.
* Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2016. "[A framework for evaluating urban land use mix from crowd-sourcing data](https://hal.inria.fr/hal-01396792)." 2nd International Workshop on Big Data for Sustainable Development (IEEE Big Data 2016).
* Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "[LUM_OSM: une plateforme pour l'évaluation de la mixité urbaine à partir de données participatives](https://hal.inria.fr/hal-01548341)." GAST Workshop, Conférence Extraction et Gestion de Connaissances (EGC 2017).
* Gervasoni Luciano, Bosch Martí, Fenet Serge, and Sturm Peter. 2017. "[Calculating spatial urban sprawl indices using open data](https://hal.inria.fr/hal-01535469)." 15th International Conference on Computers in Urban Planning and Urban Management (CUPUM 2017).
* Gervasoni Luciano, Fenet Serge, and Sturm Peter. 2018. "[Une méthode pour l'estimation désagrégée de données de population à l'aide de données ouvertes](https://hal.inria.fr/hal-01667975)." Conférence Internationale sur l'Extraction et la Gestion des Connaissances (EGC 2018).
* Gervasoni Luciano, Fenet Serge, Perrier Régis, and Sturm Peter. 2018. "[Convolutional neural networks for disaggregated population mapping using open data](https://hal.inria.fr/hal-01852585)." IEEE International Conference on Data Science and Advanced Analytics (DSAA 2018).

## Installation

The urbansprawl framework works with both Python 2 and Python 3.

- Python dependencies:
```sh
osmnx scikit-learn psutil tensorflow keras jupyter
```

### Using pip
- Install the `spatialindex` library, e.g. using apt-get (Linux):
```sh
sudo apt-get install libspatialindex-dev
```
- Install the dependencies using *pip*:
```sh
pip install osmnx scikit-learn psutil tensorflow keras jupyter
```

### Using Miniconda
- Install [Miniconda](https://conda.io/miniconda.html)
- [Optional] Create a [conda virtual environment](http://conda.pydata.org/docs/using/envs.html)
```
conda create --name urbansprawl-env
source activate urbansprawl-env
```

- Install the dependencies using the conda package manager and the conda-forge channel:
```sh
conda install -c conda-forge libspatialindex osmnx scikit-learn psutil tensorflow keras jupyter
```

### Using Anaconda
- Install [Anaconda](https://www.anaconda.com/download)
- [Optional] Create a [conda virtual environment](http://conda.pydata.org/docs/using/envs.html)
```
conda create --name urbansprawl-env
source activate urbansprawl-env
```

- Install the dependencies using the conda package manager and the conda-forge channel:
```sh
conda update -c conda-forge --all
conda install -c conda-forge osmnx scikit-learn psutil tensorflow keras jupyter
```

## Usage
The framework is presented through different [examples](https://github.com/lgervasoni/urbansprawl/tree/master/examples) in the form of notebooks. The computational running times involved in each procedure are also shown in each example; to this end, an _r5.large_ [AWS EC2](https://aws.amazon.com/ec2/) instance (2 vCPUs and 16 GiB of memory) was employed to run the notebooks.

Please note that the different procedures can be both memory and time consuming, depending on the size of the chosen region of interest. In order to run the notebooks, type in a terminal:
```sh
jupyter notebook
```

## Example: Urban sprawl

OpenStreetMap data is retrieved using the Overpass API.
An input region of interest can be extracted in any of the following ways (see the sketch below):

* Place + result number: The name of the city/region, and the result number to retrieve (following the OpenStreetMap result order)
* Polygon: A polygon whose coordinates delimit the desired region of interest
* Bounding box: Using north, south, east, and west coordinates
* Point + distance (meters): Use a (latitude, longitude) central point, plus an input distance around it
* Address + distance (meters): Set the address as the central point, plus an input distance around it

Additionally, the state of the database can be retrieved for a specific date.
This allows for comparisons across time, and for keeping track of a city's evolution.
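The options above map onto the `region_args` and `kwargs` parameters of `get_processed_osm_data`, whose full signature is visible in `urbansprawl/osm/core.py` further below. The city reference, coordinates, and date in this sketch are made-up examples; note that `core.py` indexes several `kwargs` keys directly (e.g. `kwargs["minimum_m2_building_area"]`), so the full dictionary is passed:

```python
import datetime
from urbansprawl import get_processed_osm_data

region = {"place": "Lyon, France", "which_result": 1}                    # place + result number
# region = {"north": 45.80, "south": 45.70, "east": 4.90, "west": 4.77}  # bounding box
# region = {"point": (45.7640, 4.8357), "distance": 5000}                # point + distance (meters)
# region = {"address": "Place Bellecour, Lyon", "distance": 5000}        # address + distance (meters)

df_built, df_parts, df_pois = get_processed_osm_data(
    city_ref="Lyon_France",
    region_args=region,
    kwargs={"retrieve_graph": True, "default_height": 3, "meters_per_level": 3,
            "associate_landuses_m2": True, "mixed_building_first_floor_activity": True,
            "minimum_m2_building_area": 9,
            "date": datetime.datetime(2017, 1, 1)},  # query the database state at this date
)
```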

Results are depicted for the city of **Lyon, France**:

- Locations of residential and activity land uses are retrieved

    * Buildings with defined land use:
        * Blue: Residential use
        * Red: Activity use
        * Green: Mixed use

![Buildings](examples/images/Lyon_Buildings.png?raw=true)

    * Points of interest (POIs) with defined land use:

![POI](examples/images/Lyon_POIs.png?raw=true)

- Densities for each land use are estimated:

    * Probability density function estimated using Kernel Density Estimation (KDE)

![Densit](examples/images/Lyon_densities.png?raw=true)

    * Activity uses can be further classified following the OSM wiki:
        * Leisure and amenity
        * Shop
        * Commercial and industrial

![Activ](examples/images/Lyon_activities_densities.png?raw=true)

- Street network:

![SN](examples/images/Lyon_graph.png?raw=true)

**Sprawl indices:**

- Land use mix indices: Degree of co-occurrence of differing land uses within 'walkable' distances.

![LUM](examples/images/Lyon_Landusemix.png?raw=true)

- Accessibility indices: Denote the degree of accessibility to differing land uses (from residential to activity uses).

    * Fixed activities: The distance one needs to travel in order to reach a certain number of activity land uses

    * Fixed distance: The cumulative number of activity opportunities found within a certain travel distance

![Acc](examples/images/Lyon_Accessibility.png?raw=true)

- Dispersion indices: Denote the degree of scatteredness of the built-up area.

    * "A landscape suffers from urban sprawl if it is permeated by urban development or solitary buildings [...]. The more area built over and the more dispersed the built-up area, [...] the higher the degree of urban sprawl" [(Jaeger and Schwick 2014)](http://www.sciencedirect.com/science/article/pii/S1470160X13004858)

![Disp](examples/images/Lyon_Dispersion.png?raw=true)

## Example: Population densities

Gridded population data are used in the context of population density downscaling:

* A fine-scale description of residential land use (surface) per building is built by exploiting OpenStreetMap data.

* Using coarse-scale gridded population data, the population counts are downscaled to each household in proportion to its residential surface (see the sketch after this list).

* The evaluation is carried out using fine-grained census block data (INSEE) for French cities as ground truth.
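The corresponding entry points are re-exported in `urbansprawl/__init__.py` (shown later in this dump). Since the `population` module itself is not included here, every argument below is an assumption; the sketch only illustrates the intended order of the calls for the residential-surface approach:

```python
from urbansprawl import (get_processed_osm_data, get_extract_population_data,
                         proportional_population_downscaling,
                         population_downscaling_validation)

# Buildings with their estimated residential surface (landuses_m2)
df_built, df_parts, df_pois = get_processed_osm_data(
    city_ref="Grenoble_France",
    region_args={"place": "Grenoble, France", "which_result": 1},
)

# Hypothetical arguments: load a coarse gridded population extract, distribute
# each cell's count over the residential square meters of its buildings, and
# validate against fine-grained INSEE census counts
df_population = get_extract_population_data(city_ref="Grenoble_France")
proportional_population_downscaling(df_built, df_population)
population_downscaling_validation(df_built, df_population)
```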

Population count images are depicted for the city of **Grenoble, France**:

- Population densities (INSEE census data):

![INSEE](examples/images/Grenoble_INSEE.png?raw=true)

- Population densities (INSEE census data, at the Gridded Population of the World (GPW) resolution):

![GPW](examples/images/Grenoble_GPW_simulation.png?raw=true)

--------------------------------------------------------------------------------
/examples/images/Grenoble_GPW_simulation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Grenoble_GPW_simulation.png
--------------------------------------------------------------------------------
/examples/images/Grenoble_INSEE.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Grenoble_INSEE.png
--------------------------------------------------------------------------------
/examples/images/Lyon_Accessibility.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Accessibility.png
--------------------------------------------------------------------------------
/examples/images/Lyon_Buildings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Buildings.png
--------------------------------------------------------------------------------
/examples/images/Lyon_Dispersion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Dispersion.png
--------------------------------------------------------------------------------
/examples/images/Lyon_Landusemix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_Landusemix.png
--------------------------------------------------------------------------------
/examples/images/Lyon_POIs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_POIs.png
--------------------------------------------------------------------------------
/examples/images/Lyon_activities_densities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_activities_densities.png
--------------------------------------------------------------------------------
/examples/images/Lyon_densities.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_densities.png
--------------------------------------------------------------------------------
/examples/images/Lyon_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/examples/images/Lyon_graph.png
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import find_packages, setup

with open('urbansprawl/__init__.py', 'r') as f:
    for line in f:
        if line.startswith('__version__'):
            version = line.strip().split('=')[1].strip(' \'"')
            break
    else:
        version = '0.0.1'

with open('README.md', 'rb') as f:
    readme = f.read().decode('utf-8')

install_requires = [
    'psutil',
    'numpy<=1.14.1',
    'pandas',
    'matplotlib',
    'shapely',
    'geopandas',
    'scikit-learn',
    'tensorflow<=1.10.0',
    'keras',
    'networkx',
    'osmnx',
    'jupyter'
]

setup(
    name='urbansprawl',
    keywords=['urbansprawl', 'land use mix', 'gis', 'spatial analysis', 'machine learning', 'openstreetmap', 'population density', 'population downscaling', 'neural networks'],
    version=version,
    description='The urbansprawl project provides an open source framework for assessing urban sprawl using open data',
    long_description=readme,
    author='Luciano Gervasoni',
    author_email='gervasoni.luc@gmail.com',
    maintainer='Luciano Gervasoni',
    maintainer_email='gervasoni.luc@gmail.com',
    license='MIT',
    url='https://github.com/lgervasoni/urbansprawl',
    entry_points={},
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'Intended Audience :: Science/Research',
        'Topic :: Scientific/Engineering :: GIS',
        'Topic :: Scientific/Engineering :: Visualization',
        'Topic :: Scientific/Engineering :: Physics',
        'Topic :: Scientific/Engineering :: Mathematics',
        'Topic :: Scientific/Engineering :: Information Analysis',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
        'Operating System :: OS Independent',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: Implementation :: CPython',
    ],
    install_requires=install_requires,
    # pip install -e .[dev]
    extras_require={'dev': ['pytest', 'flake8', 'ipython', 'ipdb']},
    packages=find_packages(exclude=['examples']),
)

--------------------------------------------------------------------------------
/urbansprawl/__init__.py:
--------------------------------------------------------------------------------
"""urbansprawl package
"""

# OpenStreetMap data
from .osm.core import get_route_graph, get_processed_osm_data

# Spatial urban sprawl indices
from .sprawl.core import compute_grid_landusemix, compute_grid_accessibility, compute_grid_dispersion
from .sprawl.core import get_indices_grid, process_spatial_indices

# Disaggregated population estimates
from .population.core import get_extract_population_data, compute_full_urban_features, get_training_testing_data, get_Y_X_features_population_data
from .population.core import get_aggregated_squares, proportional_population_downscaling, population_downscaling_validation


__version__ = '1.1'
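A side note on the version detection in `setup.py` above: it relies on Python's `for`/`else` idiom, where the `else` branch runs only when the loop finishes without hitting `break`, i.e. when no `__version__` line exists in the `__init__.py` just shown, so the version cleanly falls back to `'0.0.1'`. A self-contained illustration (the input lines are made up):

```python
lines = ['"""urbansprawl package', '"""', "__version__ = '1.1'"]

for line in lines:
    if line.startswith('__version__'):
        # "__version__ = '1.1'" -> split on '=' -> " '1.1'" -> strip spaces/quotes
        version = line.strip().split('=')[1].strip(' \'"')
        break
else:
    version = '0.0.1'  # reached only if the loop never breaks

print(version)  # prints: 1.1
```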
-------------------------------------------------------------------------------- /urbansprawl/osm/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/osm/__init__.py -------------------------------------------------------------------------------- /urbansprawl/osm/classification.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import osmnx as ox 7 | import pandas as pd 8 | import geopandas as gpd 9 | import numpy as np 10 | from scipy import spatial 11 | 12 | from osmnx import log 13 | 14 | from .tags import key_classification, landuse_classification, activity_classification 15 | 16 | #################################################################################### 17 | # Under uncertainty: Residential assumption? 18 | RESIDENTIAL_ASSUMPTION_UNCERTAINTY = True 19 | #################################################################################### 20 | 21 | ############################################ 22 | ### Tag land use classification 23 | ############################################ 24 | 25 | def aggregate_classification(classification_list): 26 | """ 27 | Aggregate into a unique classification given an input list of classifications 28 | 29 | Parameters 30 | ---------- 31 | classification_list : list 32 | list with the input land use classifications 33 | 34 | Returns 35 | ---------- 36 | string 37 | returns the aggregated classification 38 | """ 39 | if ("other" in classification_list): # other tag -> Non-interesting building 40 | classification = None 41 | elif ( ("activity" in classification_list) and ("residential" in classification_list) ): # Mixed 42 | classification = "mixed" 43 | elif ( "mixed" in classification_list ): # Mixed 44 | classification = "mixed" 45 | elif ("activity" in classification_list): # Activity 46 | classification = "activity" 47 | elif ("residential" in classification_list): # Residential 48 | classification = "residential" 49 | elif ("infer" in classification_list): # To infer 50 | classification = "infer" 51 | else: # No valuable classification 52 | classification = None 53 | 54 | return classification 55 | 56 | def classify_tag(tags, return_key_value=True): 57 | """ 58 | Classify the land use of input OSM tag in `activity`, `residential`, `mixed`, None, or `infer` (to infer later) 59 | 60 | Parameters 61 | ---------- 62 | tags : dict 63 | OpenStreetMap tags 64 | 65 | Returns 66 | ---------- 67 | string, dict 68 | returns the classification, and a dict relating `key`:`value` defining its classification 69 | """ 70 | # key_value: Dictionary of osm key : osm value 71 | classification, key_value = [], {} 72 | 73 | for key, value in key_classification.items(): 74 | # Get the corresponding key tag (without its land use) 75 | key_tag = key.replace("activity_","").replace("residential_","").replace("other_","").replace("infer_","") 76 | 77 | if tags.get(key_tag) in value: 78 | # First part of key defines the land use 79 | new_classification = key.split("_")[0] 80 | # Add the new classification 81 | classification.append( new_classification ) 82 | # Associate the key-value 83 | 
key_value[key_tag] = tags.get(key_tag) 84 | 85 | classification = aggregate_classification(classification) 86 | 87 | if (return_key_value): 88 | return classification, key_value 89 | else: 90 | return classification 91 | 92 | ############################################ 93 | ### Land use inference 94 | ############################################ 95 | 96 | def classify_landuse_inference(land_use): 97 | """ 98 | Classify input land use into a defined category: `other`, `activity`, `residential`, or None 99 | 100 | Parameters 101 | ---------- 102 | land_use : string 103 | input land use tag 104 | 105 | Returns 106 | ---------- 107 | string 108 | returns the land use classification 109 | """ 110 | for key, value in landuse_classification.items(): 111 | # key: Classification ; value: keys contained in the classification 112 | if (land_use in value): 113 | return key 114 | # Uncertain case 115 | if (RESIDENTIAL_ASSUMPTION_UNCERTAINTY): # Undefined use. Assumption: Residential 116 | return "residential" 117 | else: 118 | return None # No tag 119 | 120 | def compute_landuse_inference(df_buildings, df_landuse): 121 | """ 122 | Compute land use inference for building polygons with no information 123 | The inference is done using polygons with defined land use 124 | A building polygon's land use is inferred by means of adopting the land use of the smallest encompassing polygon with defined land use 125 | 126 | Parameters 127 | ---------- 128 | df_buildings : geopandas.GeoDataFrame 129 | input buildings 130 | df_landuse : geopandas.GeoDataFrame 131 | land use polygons to aid inference procedure 132 | 133 | Returns 134 | ---------- 135 | 136 | """ 137 | # Get those indices which need to be inferred, and keep geometry column only 138 | df_buildings_to_infer = df_buildings.loc[ df_buildings['classification'] == 'infer', ["geometry"] ] 139 | # Add land use polygon's area 140 | df_landuse['area'] = df_landuse.apply(lambda x: x.geometry.area, axis=1) 141 | 142 | # Get geometries to infer within Land use polygons matching 143 | sjoin = gpd.sjoin(df_buildings_to_infer, df_landuse, op='within') 144 | 145 | # Add index column to sort values 146 | sjoin['index'] = sjoin.index 147 | # Sort values by index, then by area 148 | sjoin.sort_values(by=['index','area'], inplace=True) 149 | # Drop duplicates. 
Keep first (minimum computing area) 150 | sjoin.drop_duplicates(subset=['index'], keep='first', inplace=True) 151 | 152 | ##### Set key:value and classification 153 | # Set default value: inferred:None 154 | df_buildings.loc[ df_buildings_to_infer.index, "key_value" ] = df_buildings.loc[ df_buildings_to_infer.index].apply(lambda x: {"inferred":None} , axis=1) 155 | # Set land use for those buildings within a defined land use polygon 156 | df_buildings.loc[ sjoin.index, "key_value" ] = sjoin.apply(lambda x: {'inferred':x.landuse}, axis=1) 157 | 158 | # Set classification 159 | df_buildings.loc[ df_buildings_to_infer.index, "classification" ] = df_buildings.loc[ df_buildings_to_infer.index, "key_value" ].apply(lambda x: classify_landuse_inference(x.get("inferred")) ) 160 | 161 | # Remove useless rows 162 | df_buildings.drop( df_buildings[ df_buildings.classification.isin([None,"other"]) ].index, inplace=True) 163 | df_buildings.reset_index(inplace=True,drop=True) 164 | assert( len( df_buildings[df_buildings.classification.isnull()] ) == 0 ) 165 | 166 | ############################################ 167 | ### Activity type classification 168 | ############################################ 169 | 170 | def value_activity_category(x): 171 | """ 172 | Classify the activity of input activity value 173 | 174 | Parameters 175 | ---------- 176 | x : string 177 | activity value 178 | 179 | Returns 180 | ---------- 181 | string 182 | returns the activity classification 183 | """ 184 | for key, value in activity_classification.items(): 185 | if x in value: 186 | return key 187 | return None 188 | 189 | def key_value_activity_category(key, value): 190 | """ 191 | Classify the activity of input pair key:value 192 | 193 | Parameters 194 | ---------- 195 | key : string 196 | key dict 197 | value : string 198 | value dict 199 | 200 | Returns 201 | ---------- 202 | string 203 | returns the activity classification 204 | """ 205 | # Note that some values repeat for different keys (e.g. 
shop=fuel and amenity=fuel), but they do not belong to the same activity classification 206 | return { 207 | 'shop': 'shop', 208 | 'leisure': 'leisure/amenity', 209 | 'amenity': 'leisure/amenity', 210 | 'man_made' : 'commercial/industrial', 211 | 'industrial' : 'commercial/industrial', 212 | 'landuse' : value_activity_category(value), 213 | 'inferred' : value_activity_category(value), # Inferred cases adopted land use values 214 | 'building' : value_activity_category(value), 215 | 'building:use' : value_activity_category(value), 216 | 'building:part' : value_activity_category(value) 217 | }.get(key, None) 218 | 219 | def classify_activity_category(key_values): 220 | """ 221 | Classify input activity category into `commercial/industrial`, `leisure/amenity`, or `shop` 222 | 223 | Parameters 224 | ---------- 225 | key_values : dict 226 | contain pairs of key:value relating to its usage 227 | 228 | Returns 229 | ---------- 230 | string 231 | returns the activity classification 232 | """ 233 | #################### 234 | ### Categories: commercial/industrial, leisure/amenity, shop 235 | #################### 236 | categories = set( [ key_value_activity_category(key,value) for key,value in key_values.items() ] ) 237 | categories.discard(None) 238 | return list(categories) 239 | -------------------------------------------------------------------------------- /urbansprawl/osm/core.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import osmnx as ox 7 | import pandas as pd 8 | import numpy as np 9 | import time 10 | import os.path 11 | from osmnx import log 12 | 13 | from .overpass import create_landuse_gdf, create_pois_gdf, create_building_parts_gdf, create_buildings_gdf_from_input, retrieve_route_graph 14 | from .tags import columns_osm_tag, height_tags, building_parts_to_filter 15 | from .classification import compute_landuse_inference, classify_tag, classify_activity_category 16 | from .surface import compute_landuses_m2 17 | from .utils import load_geodataframe, store_geodataframe, get_dataframes_filenames, associate_structures, sanity_check_height_tags 18 | 19 | def get_route_graph(city_ref, date="", polygon=None, north=None, south=None, east=None, west=None, force_crs=None): 20 | """ 21 | Wrapper to retrieve city's street network 22 | Loads the data if stored locally 23 | Otherwise, it retrieves the graph from OpenStreetMap using the osmnx package 24 | Input polygon or bounding box coordinates determine the region of interest 25 | 26 | Parameters 27 | ---------- 28 | city_ref : string 29 | name of the city 30 | polygon : shapely.Polygon 31 | polygon shape of input city 32 | north : float 33 | northern latitude of bounding box 34 | south : float 35 | southern latitude of bounding box 36 | east : float 37 | eastern longitude of bounding box 38 | west : float 39 | western longitude of bounding box 40 | force_crs : dict 41 | graph will be projected to input crs 42 | 43 | Returns 44 | ---------- 45 | networkx.multidigraph 46 | projected graph 47 | """ 48 | return retrieve_route_graph(city_ref, date, polygon, north, south, east, west, force_crs) 49 | 50 | def get_processed_osm_data(city_ref=None, region_args={"polygon":None, "place":None, "which_result":1, "point":None, 
"address":None, "distance":None, "north":None, "south":None, "east":None, "west":None}, 51 | kwargs={"retrieve_graph":True, "default_height":3, "meters_per_level":3, "associate_landuses_m2":True, "mixed_building_first_floor_activity":True, "minimum_m2_building_area":9, "date":None}): 52 | """ 53 | Retrieves buildings, building parts, and Points of Interest associated with a residential/activity land use from OpenStreetMap data for input city 54 | If a name for input city is given, the data will be loaded (if it was previously stored) 55 | If no stored files exist, it will query and process the data and store it under the city name 56 | Queries data for input region (polygon, place, point/address and distance around, or bounding box coordinates) 57 | Additional arguments will drive the overall process 58 | 59 | Parameters 60 | ---------- 61 | city_ref : str 62 | Name of input city / region 63 | region_args : dict 64 | contains the information to retrieve the region of interest as the following: 65 | polygon : shapely Polygon or MultiPolygon 66 | geographic shape to fetch the landuse footprints within 67 | place : string or dict 68 | query string or structured query dict to geocode/download 69 | which_result : int 70 | result number to retrieve from geocode/download when using query string 71 | point : tuple 72 | the (lat, lon) central point around which to construct the graph 73 | address : string 74 | the address to geocode and use as the central point around which to construct the graph 75 | distance : int 76 | retain only those nodes within this many meters of the center of the graph 77 | north : float 78 | northern latitude of bounding box 79 | south : float 80 | southern latitude of bounding box 81 | east : float 82 | eastern longitude of bounding box 83 | west : float 84 | western longitude of bounding box 85 | kwargs : dict 86 | additional arguments to drive the process: 87 | retrieve_graph : boolean 88 | that determines if the street network for input city has to be retrieved and stored 89 | default_height : float 90 | height of buildings under missing data 91 | meters_per_level : float 92 | buildings number of levels assumed under missing data 93 | associate_landuses_m2 : boolean 94 | compute the total square meter for each land use 95 | mixed_building_first_floor_activity : Boolean 96 | if True: Associates building's first floor to activity uses and the rest to residential uses 97 | if False: Associates half of the building's area to each land use (Activity and Residential) 98 | minimum_m2_building_area : float 99 | minimum area to be considered a building (otherwise filtered) 100 | date : datetime.datetime 101 | query the database at a certain time-stamp 102 | 103 | Returns 104 | ---------- 105 | [ gpd.GeoDataFrame, gpd.GeoDataFrame, gpd.GeoDataFrame ] 106 | returns the output geo dataframe containing all buildings, building parts, and points associated to a residential or activity land usage 107 | 108 | """ 109 | log("OSM data requested for city: " + str(city_ref) ) 110 | 111 | start_time = time.time() 112 | 113 | if (city_ref): 114 | geo_poly_file, geo_poly_parts_file, geo_point_file = get_dataframes_filenames(city_ref) 115 | 116 | ########################## 117 | ### Stored file ? 
118 | ########################## 119 | if ( os.path.isfile(geo_poly_file) ): # File exists 120 | log("Found stored files for city " + city_ref) 121 | # Load local GeoDataFrames 122 | return load_geodataframe(geo_poly_file), load_geodataframe(geo_poly_parts_file), load_geodataframe(geo_point_file) 123 | 124 | # Get keyword arguments for input region of interest 125 | polygon, place, which_result, point, address, distance, north, south, east, west = region_args.get("polygon"), region_args.get("place"), region_args.get("which_result"), region_args.get("point"), region_args.get("address"), region_args.get("distance"), region_args.get("north"), region_args.get("south"), region_args.get("east"), region_args.get("west") 126 | 127 | ### Valid input? 128 | if not( any( [not (polygon is None), place, point, address, north, south, east, west] ) ): 129 | log("Error: Must provide at least one type of input") 130 | return None, None, None 131 | 132 | if ( kwargs.get("date") ): # Non-null date 133 | date_ = kwargs.get("date").strftime("%Y-%m-%dT%H:%M:%SZ") 134 | log("Requesting OSM database at time-stamp: " + date_) 135 | # e.g.: [date:"2004-05-06T00:00:00Z"] 136 | date_query = '[date:"'+date_+'"]' 137 | else: 138 | date_query = "" 139 | 140 | ########################## 141 | ### Overpass query: Buildings 142 | ########################## 143 | # Query and update bounding box / polygon 144 | df_osm_built, polygon, north, south, east, west = create_buildings_gdf_from_input(date=date_query, polygon=polygon, place=place, which_result=which_result, point=point, address=address, distance=distance, north=north, south=south, east=east, west=west) 145 | df_osm_built["osm_id"] = df_osm_built.index 146 | df_osm_built.reset_index(drop=True, inplace=True) 147 | df_osm_built.gdf_name = str(city_ref) + '_buildings' if not city_ref is None else 'buildings' 148 | ########################## 149 | ### Overpass query: Land use polygons. Aid to perform buildings land use inference 150 | ########################## 151 | df_osm_lu = create_landuse_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west) 152 | df_osm_lu["osm_id"] = df_osm_lu.index 153 | # Drop useless columns 154 | columns_of_interest = ["osm_id", "geometry", "landuse"] 155 | df_osm_lu.drop( [ col for col in list( df_osm_lu.columns ) if not col in columns_of_interest ], axis=1, inplace=True ) 156 | df_osm_lu.reset_index(drop=True, inplace=True) 157 | df_osm_lu.gdf_name = str(city_ref) + '_landuse' if not city_ref is None else 'landuse' 158 | ########################## 159 | ### Overpass query: POIs 160 | ########################## 161 | df_osm_pois = create_pois_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west) 162 | df_osm_pois["osm_id"] = df_osm_pois.index 163 | df_osm_pois.reset_index(drop=True, inplace=True) 164 | df_osm_pois.gdf_name = str(city_ref) + '_points' if not city_ref is None else 'points' 165 | ########## 166 | ### Overpass query: Building parts. 
Allow to calculate the real amount of M^2 for each building 167 | ########## 168 | df_osm_building_parts = create_building_parts_gdf(date=date_query, polygon=polygon, north=north, south=south, east=east, west=west) 169 | # Filter: 1) rows not needed (roof, etc) and 2) building that already exists in `buildings` extract 170 | if ("building" in df_osm_building_parts.columns): 171 | df_osm_building_parts = df_osm_building_parts[ (~ df_osm_building_parts["building:part"].isin(building_parts_to_filter) ) & (~ df_osm_building_parts["building:part"].isnull() ) & (df_osm_building_parts["building"].isnull()) ] 172 | else: 173 | df_osm_building_parts = df_osm_building_parts[ (~ df_osm_building_parts["building:part"].isin(building_parts_to_filter) ) & (~ df_osm_building_parts["building:part"].isnull() ) ] 174 | df_osm_building_parts["osm_id"] = df_osm_building_parts.index 175 | df_osm_building_parts.reset_index(drop=True, inplace=True) 176 | df_osm_building_parts.gdf_name = str(city_ref) + '_building_parts' if not city_ref is None else 'building_parts' 177 | 178 | log("Done: OSM data requests. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 179 | 180 | #################################################### 181 | ### Sanity check of height tags 182 | #################################################### 183 | start_time = time.time() 184 | 185 | sanity_check_height_tags(df_osm_built) 186 | sanity_check_height_tags(df_osm_building_parts) 187 | 188 | def remove_nan_dict(x): # Remove entries with NaN values 189 | return { k:v for k, v in x.items() if pd.notnull(v) } 190 | 191 | df_osm_built['height_tags'] = df_osm_built[ [ c for c in height_tags if c in df_osm_built.columns ] ].apply(lambda x: remove_nan_dict(x.to_dict() ), axis=1) 192 | df_osm_building_parts['height_tags'] = df_osm_building_parts[ [ c for c in height_tags if c in df_osm_building_parts.columns ] ].apply(lambda x: remove_nan_dict(x.to_dict() ), axis=1) 193 | 194 | ########### 195 | ### Remove columns which do not provide valuable information 196 | ########### 197 | columns_of_interest = columns_osm_tag + ["osm_id", "geometry", "height_tags"] 198 | df_osm_built.drop( [ col for col in list( df_osm_built.columns ) if not col in columns_of_interest ], axis=1, inplace=True ) 199 | df_osm_building_parts.drop( [ col for col in list( df_osm_building_parts.columns ) if not col in columns_of_interest ], axis=1, inplace=True) 200 | 201 | columns_of_interest = columns_osm_tag + ["osm_id", "geometry"] 202 | df_osm_pois.drop( [ col for col in list( df_osm_pois.columns ) if not col in columns_of_interest ], axis=1, inplace=True ) 203 | 204 | 205 | log('Done: Height tags sanity check and unnecessary columns have been dropped. 
Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 206 | 207 | ########### 208 | ### Classification 209 | ########### 210 | start_time = time.time() 211 | 212 | df_osm_built['classification'], df_osm_built['key_value'] = list( zip(*df_osm_built.apply( classify_tag, axis=1) ) ) 213 | df_osm_pois['classification'], df_osm_pois['key_value'] = list( zip(*df_osm_pois.apply( classify_tag, axis=1) ) ) 214 | df_osm_building_parts['classification'], df_osm_building_parts['key_value'] = list( zip(*df_osm_building_parts.apply( classify_tag, axis=1) ) ) 215 | 216 | # Remove unnecessary buildings 217 | df_osm_built.drop( df_osm_built[ df_osm_built.classification.isnull() ].index, inplace=True ) 218 | df_osm_built.reset_index(inplace=True, drop=True) 219 | # Remove unnecessary POIs 220 | df_osm_pois.drop( df_osm_pois[ df_osm_pois.classification.isin(["infer","other"]) | df_osm_pois.classification.isnull() ].index, inplace=True ) 221 | df_osm_pois.reset_index(inplace=True, drop=True) 222 | # Building parts will acquire its containing building land use if it is not available 223 | df_osm_building_parts.loc[ df_osm_building_parts.classification.isin(["infer","other"]), "classification" ] = None 224 | 225 | log('Done: OSM tags classification. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 226 | 227 | ########### 228 | ### Remove already used tags 229 | ########### 230 | start_time = time.time() 231 | 232 | df_osm_built.drop( [ c for c in columns_osm_tag if c in df_osm_built.columns ], axis=1, inplace=True ) 233 | df_osm_pois.drop( [ c for c in columns_osm_tag if c in df_osm_pois.columns ], axis=1, inplace=True ) 234 | df_osm_building_parts.drop( [ c for c in columns_osm_tag if c in df_osm_building_parts.columns ], axis=1, inplace=True) 235 | 236 | ########### 237 | ### Project, drop small buildings and reset indices 238 | ########### 239 | ### Project to UTM coordinates within the same zone 240 | df_osm_built = ox.project_gdf(df_osm_built) 241 | df_osm_lu = ox.project_gdf(df_osm_lu, to_crs=df_osm_built.crs) 242 | df_osm_pois = ox.project_gdf(df_osm_pois, to_crs=df_osm_built.crs) 243 | df_osm_building_parts = ox.project_gdf(df_osm_building_parts, to_crs=df_osm_built.crs) 244 | 245 | # Drop buildings with an area lower than a threshold 246 | df_osm_built.drop( df_osm_built[ df_osm_built.geometry.area < kwargs["minimum_m2_building_area"] ].index, inplace=True ) 247 | 248 | log('Done: Geometries re-projection. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 249 | 250 | #################################################### 251 | ### Infer buildings land use (under uncertainty) 252 | #################################################### 253 | start_time = time.time() 254 | 255 | compute_landuse_inference(df_osm_built, df_osm_lu) 256 | # Free space 257 | del df_osm_lu 258 | 259 | assert( len( df_osm_built[df_osm_built.key_value =={"inferred":"other"} ] ) == 0 ) 260 | assert( len( df_osm_built[df_osm_built.classification.isnull()] ) == 0 ) 261 | assert( len( df_osm_pois[df_osm_pois.classification.isnull()] ) == 0 ) 262 | 263 | log('Done: Land use deduction. 
Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 264 | 265 | #################################################### 266 | ### Associate for each building, its containing building parts and Points of interest 267 | #################################################### 268 | start_time = time.time() 269 | 270 | associate_structures(df_osm_built, df_osm_building_parts, operation='contains', column='containing_parts') 271 | associate_structures(df_osm_built, df_osm_pois, operation='intersects', column='containing_poi') 272 | 273 | # Classify activity types 274 | df_osm_built['activity_category'] = df_osm_built.apply(lambda x: classify_activity_category(x.key_value), axis=1) 275 | df_osm_pois['activity_category'] = df_osm_pois.apply(lambda x: classify_activity_category(x.key_value), axis=1) 276 | df_osm_building_parts['activity_category'] = df_osm_building_parts.apply(lambda x: classify_activity_category(x.key_value), axis=1) 277 | 278 | log('Done: Building parts association and activity categorization. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 279 | 280 | #################################################### 281 | ### Associate effective number of levels, and measure the surface dedicated to each land use per building 282 | #################################################### 283 | if (kwargs["associate_landuses_m2"]): 284 | start_time = time.time() 285 | 286 | default_height = kwargs["default_height"] 287 | meters_per_level = kwargs["meters_per_level"] 288 | mixed_building_first_floor_activity = kwargs["mixed_building_first_floor_activity"] 289 | compute_landuses_m2(df_osm_built, df_osm_building_parts, df_osm_pois, default_height=default_height, meters_per_level=meters_per_level, mixed_building_first_floor_activity=mixed_building_first_floor_activity) 290 | 291 | # Set the composed classification given, for each building, its containing Points of Interest and building parts classification 292 | df_osm_built.loc[ df_osm_built.apply(lambda x: x.landuses_m2["activity"]>0 and x.landuses_m2["residential"]>0, axis=1 ), "classification" ] = "mixed" 293 | 294 | log('Done: Land uses surface association. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 295 | 296 | df_osm_built.loc[ df_osm_built.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan 297 | df_osm_pois.loc[ df_osm_pois.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan 298 | df_osm_building_parts.loc[ df_osm_building_parts.activity_category.apply(lambda x: len(x)==0 ), "activity_category" ] = np.nan 299 | 300 | ########################## 301 | ### Overpass query: Street network graph 302 | ########################## 303 | if (kwargs["retrieve_graph"]): # Save graph for input city shape 304 | start_time = time.time() 305 | 306 | get_route_graph(city_ref, date=date_query, polygon=polygon, north=north, south=south, east=east, west=west, force_crs=df_osm_built.crs) 307 | 308 | log('Done: Street network graph retrieval. Elapsed time (H:M:S): ' + time.strftime("%H:%M:%S", time.gmtime(time.time()-start_time)) ) 309 | 310 | ########################## 311 | ### Store file ? 
312 | ########################## 313 | if ( city_ref ): # File exists 314 | # Save GeoDataFrames 315 | store_geodataframe(df_osm_built, geo_poly_file) 316 | store_geodataframe(df_osm_building_parts, geo_poly_parts_file) 317 | store_geodataframe(df_osm_pois, geo_point_file) 318 | log("Stored OSM data files for city: "+city_ref) 319 | 320 | return df_osm_built, df_osm_building_parts, df_osm_pois -------------------------------------------------------------------------------- /urbansprawl/osm/overpass.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import time 7 | import geopandas as gpd 8 | from shapely.geometry import Point 9 | from shapely.geometry import Polygon 10 | from shapely.geometry import MultiPolygon 11 | 12 | from osmnx import log 13 | import logging as lg 14 | import osmnx as ox 15 | 16 | ####################################################################### 17 | ### Buildings 18 | ####################################################################### 19 | 20 | def create_buildings_gdf_from_input(date="", polygon=None, place=None, which_result=1, point=None, address=None, distance=None, north=None, south=None, east=None, west=None): 21 | """ 22 | Retrieve OSM buildings according to input data 23 | Queries data for input region (polygon, place, point/address and distance around, or bounding box coordinates) 24 | Updates the used polygon/bounding box to determine the region of interest 25 | 26 | Parameters 27 | ---------- 28 | date : string 29 | query the database at a certain timestamp 30 | polygon : shapely Polygon or MultiPolygon 31 | geographic shape to fetch the landuse footprints within 32 | place : string or dict 33 | query string or structured query dict to geocode/download 34 | which_result : int 35 | result number to retrieve from geocode/download when using query string 36 | point : tuple 37 | the (lat, lon) central point around which to construct the graph 38 | address : string 39 | the address to geocode and use as the central point around which to construct the graph 40 | distance : int 41 | retain only those nodes within this many meters of the center of the graph 42 | north : float 43 | northern latitude of bounding box 44 | south : float 45 | southern latitude of bounding box 46 | east : float 47 | eastern longitude of bounding box 48 | west : float 49 | western longitude of bounding box 50 | 51 | Returns 52 | ---------- 53 | [ geopandas.GeoDataFrame, shapely.Polygon, float, float, float, float ] 54 | retrieved buildings, region of interest polygon, and region of interest bounding box 55 | """ 56 | ########################## 57 | ### Osmnx query: Buildings 58 | ########################## 59 | if (not polygon is None): # Polygon 60 | log("Input type: Polygon") 61 | # If input geo data frame, extract polygon shape 62 | if ( type(polygon) is gpd.GeoDataFrame ): 63 | assert( polygon.shape[0] == 1 ) 64 | polygon = polygon.geometry[0] 65 | df_osm_built = buildings_from_polygon(date, polygon) 66 | 67 | elif ( all( [point,distance] ) ): # Point + distance 68 | log("Input type: Point") 69 | df_osm_built = buildings_from_point(date, point, distance=distance) 70 | # Get bounding box 71 | west, south, east, north = df_osm_built.total_bounds 72 | 73 
| elif ( all( [address,distance] ) ): # Address 74 | log("Input type: Address") 75 | df_osm_built = buildings_from_address(date, address, distance=distance) 76 | # Get bounding box 77 | west, south, east, north = df_osm_built.total_bounds 78 | 79 | elif (place): # Place 80 | log("Input type: Place") 81 | if (which_result is None): which_result = 1 82 | df_osm_built = buildings_from_place(date, place, which_result=which_result) 83 | # Get encompassing polygon 84 | poly_gdf = ox.gdf_from_place(place, which_result=which_result) 85 | polygon = poly_gdf.geometry[0] 86 | 87 | elif ( all( [north,south,east,west] ) ): # Bounding box 88 | log("Input type: Bounding box") 89 | # Create points in specific order 90 | p1 = (east,north) 91 | p2 = (west,north) 92 | p3 = (west,south) 93 | p4 = (east,south) 94 | polygon = Polygon( [p1,p2,p3,p4] ) 95 | df_osm_built = buildings_from_polygon(date, polygon) 96 | else: 97 | log("Error: Must provide at least one input") 98 | return 99 | return df_osm_built, polygon, north, south, east, west 100 | 101 | def osm_bldg_download(date="", polygon=None, north=None, south=None, east=None, west=None, 102 | timeout=180, memory=None, max_query_area_size=50*1000*50*1000): 103 | """ 104 | Download OpenStreetMap building footprint data. 105 | Parameters 106 | ---------- 107 | date : string 108 | query the database at a certain timestamp 109 | polygon : shapely Polygon or MultiPolygon 110 | geographic shape to fetch the building footprints within 111 | north : float 112 | northern latitude of bounding box 113 | south : float 114 | southern latitude of bounding box 115 | east : float 116 | eastern longitude of bounding box 117 | west : float 118 | western longitude of bounding box 119 | timeout : int 120 | the timeout interval for requests and to pass to API 121 | memory : int 122 | server memory allocation size for the query, in bytes. 
If none, server 123 | will use its default allocation size 124 | max_query_area_size : float 125 | max area for any part of the geometry, in the units the geometry is in: 126 | any polygon bigger will get divided up for multiple queries to API 127 | (default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are 128 | meters)) 129 | Returns 130 | ------- 131 | list 132 | list of response_json dicts 133 | """ 134 | 135 | # check if we're querying by polygon or by bounding box based on which 136 | # argument(s) where passed into this function 137 | by_poly = polygon is not None 138 | by_bbox = not (north is None or south is None or east is None or west is None) 139 | if not (by_poly or by_bbox): 140 | raise ValueError('You must pass a polygon or north, south, east, and west') 141 | 142 | response_jsons = [] 143 | 144 | # pass server memory allocation in bytes for the query to the API 145 | # if None, pass nothing so the server will use its default allocation size 146 | # otherwise, define the query's maxsize parameter value as whatever the 147 | # caller passed in 148 | if memory is None: 149 | maxsize = '' 150 | else: 151 | maxsize = '[maxsize:{}]'.format(memory) 152 | 153 | # define the query to send the API 154 | if by_bbox: 155 | # turn bbox into a polygon and project to local UTM 156 | polygon = Polygon([(west, south), (east, south), (east, north), (west, north)]) 157 | geometry_proj, crs_proj = ox.project_geometry(polygon) 158 | 159 | # subdivide it if it exceeds the max area size (in meters), then project 160 | # back to lat-long 161 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 162 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 163 | log('Requesting building footprints data within bounding box from API in {:,} request(s)'.format(len(geometry))) 164 | start_time = time.time() 165 | 166 | # loop through each polygon rectangle in the geometry (there will only 167 | # be one if original bbox didn't exceed max area size) 168 | for poly in geometry: 169 | # represent bbox as south,west,north,east and round lat-longs to 8 170 | # decimal places (ie, within 1 mm) so URL strings aren't different 171 | # due to float rounding issues (for consistent caching) 172 | west, south, east, north = poly.bounds 173 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["building"]({south:.8f},' 174 | '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["building"]' 175 | '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;') 176 | query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize) 177 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 178 | response_jsons.append(response_json) 179 | msg = ('Got all building footprints data within bounding box from ' 180 | 'API in {:,} request(s) and {:,.2f} seconds') 181 | log(msg.format(len(geometry), time.time()-start_time)) 182 | 183 | elif by_poly: 184 | # project to utm, divide polygon up into sub-polygons if area exceeds a 185 | # max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates 186 | geometry_proj, crs_proj = ox.project_geometry(polygon) 187 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 188 | geometry, _ = 
ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 189 | polygon_coord_strs = ox.get_polygons_coordinates(geometry) 190 | log('Requesting building footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs))) 191 | start_time = time.time() 192 | 193 | # pass each polygon exterior coordinates in the list to the API, one at 194 | # a time 195 | for polygon_coord_str in polygon_coord_strs: 196 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way' 197 | '(poly:"{polygon}")["building"];(._;>;);relation' 198 | '(poly:"{polygon}")["building"];(._;>;););out;') 199 | query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize) 200 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 201 | response_jsons.append(response_json) 202 | msg = ('Got all building footprints data within polygon from API in ' 203 | '{:,} request(s) and {:,.2f} seconds') 204 | log(msg.format(len(polygon_coord_strs), time.time()-start_time)) 205 | 206 | return response_jsons 207 | 208 | 209 | def create_buildings_gdf(date="", polygon=None, north=None, south=None, east=None, 210 | west=None, retain_invalid=False): 211 | """ 212 | Get building footprint data from OSM then assemble it into a GeoDataFrame. 213 | Parameters 214 | ---------- 215 | date : string 216 | query the database at a certain timestamp 217 | polygon : shapely Polygon or MultiPolygon 218 | geographic shape to fetch the building footprints within 219 | north : float 220 | northern latitude of bounding box 221 | south : float 222 | southern latitude of bounding box 223 | east : float 224 | eastern longitude of bounding box 225 | west : float 226 | western longitude of bounding box 227 | retain_invalid : bool 228 | if False discard any building footprints with an invalid geometry 229 | Returns 230 | ------- 231 | GeoDataFrame 232 | """ 233 | 234 | responses = osm_bldg_download(date, polygon, north, south, east, west) 235 | 236 | vertices = {} 237 | for response in responses: 238 | for result in response['elements']: 239 | if 'type' in result and result['type']=='node': 240 | vertices[result['id']] = {'lat' : result['lat'], 241 | 'lon' : result['lon']} 242 | 243 | buildings = {} 244 | for response in responses: 245 | for result in response['elements']: 246 | if 'type' in result and result['type']=='way': 247 | nodes = result['nodes'] 248 | try: 249 | polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes]) 250 | except Exception: 251 | log('Polygon has invalid geometry: {}'.format(nodes)) 252 | building = {'nodes' : nodes, 253 | 'geometry' : polygon} 254 | 255 | if 'tags' in result: 256 | for tag in result['tags']: 257 | building[tag] = result['tags'][tag] 258 | 259 | buildings[result['id']] = building 260 | 261 | gdf = gpd.GeoDataFrame(buildings).T 262 | gdf.crs = {'init':'epsg:4326'} 263 | 264 | if not retain_invalid: 265 | # drop all invalid geometries 266 | gdf = gdf[gdf['geometry'].is_valid] 267 | 268 | return gdf 269 | 270 | 271 | def buildings_from_point(date, point, distance, retain_invalid=False): 272 | """ 273 | Get building footprints within some distance north, south, east, and west of 274 | a lat-long point. 
275 | Parameters 276 | ---------- 277 | date : string 278 | query the database at a certain timestamp 279 | point : tuple 280 | a lat-long point 281 | distance : numeric 282 | distance in meters 283 | retain_invalid : bool 284 | if False discard any building footprints with an invalid geometry 285 | Returns 286 | ------- 287 | GeoDataFrame 288 | """ 289 | 290 | bbox = ox.bbox_from_point(point=point, distance=distance) 291 | north, south, east, west = bbox 292 | return create_buildings_gdf(date=date, north=north, south=south, east=east, west=west, retain_invalid=retain_invalid) 293 | 294 | 295 | def buildings_from_address(date, address, distance, retain_invalid=False): 296 | """ 297 | Get building footprints within some distance north, south, east, and west of 298 | an address. 299 | Parameters 300 | ---------- 301 | date : string 302 | query the database at a certain timestamp 303 | address : string 304 | the address to geocode to a lat-long point 305 | distance : numeric 306 | distance in meters 307 | retain_invalid : bool 308 | if False discard any building footprints with an invalid geometry 309 | Returns 310 | ------- 311 | GeoDataFrame 312 | """ 313 | 314 | # geocode the address string to a (lat, lon) point 315 | point = ox.geocode(query=address) 316 | 317 | # get buildings within distance of this point 318 | return buildings_from_point(date, point, distance, retain_invalid=retain_invalid) 319 | 320 | 321 | def buildings_from_polygon(date, polygon, retain_invalid=False): 322 | """ 323 | Get building footprints within some polygon. 324 | Parameters 325 | ---------- 326 | date : string 327 | query the database at a certain timestamp 328 | polygon : Polygon 329 | retain_invalid : bool 330 | if False discard any building footprints with an invalid geometry 331 | Returns 332 | ------- 333 | GeoDataFrame 334 | """ 335 | 336 | return create_buildings_gdf(date=date, polygon=polygon, retain_invalid=retain_invalid) 337 | 338 | 339 | def buildings_from_place(date, place, which_result=1, retain_invalid=False): 340 | """ 341 | Get building footprints within the boundaries of some place. 
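The place name is geocoded with ox.gdf_from_place and the boundary polygon of the first matching result (controlled by `which_result`) is used as the query region.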
342 | Parameters
343 | ----------
344 | date : string
345 | query the database at a certain timestamp
346 | place : string
347 | the query to geocode to get geojson boundary polygon
348 | which_result : int
349 | result number to retrieve from geocode/download when using query string
350 | retain_invalid : bool
351 | if False discard any building footprints with an invalid geometry
352 | Returns
353 | -------
354 | GeoDataFrame
355 | """
356 | city = ox.gdf_from_place(place, which_result=which_result)
357 | polygon = city['geometry'].iloc[0]
358 | return create_buildings_gdf(date=date, polygon=polygon, retain_invalid=retain_invalid)
359 | 
360 | #######################################################################
361 | ### Street network graph
362 | #######################################################################
363 | 
364 | def retrieve_route_graph(city_ref, date="", polygon=None, north=None, south=None, east=None, west=None, force_crs=None):
365 | """
366 | Retrieve the street network graph for the given `city_ref`
367 | Loads the data if stored locally
368 | Otherwise, retrieves the graph from OpenStreetMap using the osmnx package
369 | The input polygon or bounding box coordinates determine the region of interest
370 | 
371 | Parameters
372 | ----------
373 | city_ref : string
374 | name of the city
375 | date : string
376 | query the database at a certain timestamp
377 | polygon : shapely.Polygon
378 | polygon shape of input city
379 | north : float
380 | northern latitude of bounding box
381 | south : float
382 | southern latitude of bounding box
383 | east : float
384 | eastern longitude of bounding box
385 | west : float
386 | western longitude of bounding box
387 | force_crs : dict
388 | graph will be projected to input crs
389 | 
390 | Returns
391 | ----------
392 | networkx.multidigraph
393 | projected graph
394 | """
395 | try:
396 | G = ox.load_graphml(city_ref+'_network.graphml')
397 | log( "Found graph for `"+city_ref+"` stored locally" )
398 | except Exception:
399 | try:
400 | if polygon is not None:
401 | G = graph_from_polygon(polygon, network_type='drive_service', date=date)
402 | elif not (north is None or south is None or east is None or west is None): # explicit None checks: 0.0 is a valid coordinate
403 | G = graph_from_bbox(north, south, east, west, network_type='drive_service', date=date)
404 | else: # No inputs
405 | log("Need an input to retrieve graph")
406 | raise ValueError('You must pass a polygon or north, south, east, and west')
407 | 
408 | # Set graph name
409 | G.graph['name'] = str(city_ref) + '_street_network' if city_ref is not None else 'street_network'
410 | 
411 | # Project graph
412 | G = ox.project_graph(G, to_crs=force_crs)
413 | 
414 | # Save street network as GraphML file
415 | ox.save_graphml(G, filename=city_ref+'_network.graphml')
416 | log( "Graph for `"+city_ref+"` has been retrieved and stored" )
417 | except Exception as e:
418 | log( "Osmnx graph could not be retrieved."+str(e), level=lg.ERROR )
419 | return None
420 | return G
421 | 
422 | def graph_from_polygon(polygon, network_type='all_private', simplify=True,
423 | retain_all=False, truncate_by_edge=False, name='unnamed',
424 | timeout=180, memory=None, date="",
425 | max_query_area_size=50*1000*50*1000,
426 | clean_periphery=True, infrastructure='way["highway"]'):
427 | """
428 | Create a networkx graph from OSM data within the spatial boundaries of the
429 | passed-in shapely polygon.
430 | Parameters
431 | ----------
432 | polygon : shapely Polygon or MultiPolygon
433 | the shape to get network data within. coordinates should be in units of
434 | latitude-longitude degrees.
435 | network_type : string 436 | what type of street network to get 437 | simplify : bool 438 | if true, simplify the graph topology 439 | retain_all : bool 440 | if True, return the entire graph even if it is not connected 441 | truncate_by_edge : bool 442 | if True retain node if it's outside bbox but at least one of node's 443 | neighbors are within bbox 444 | name : string 445 | the name of the graph 446 | timeout : int 447 | the timeout interval for requests and to pass to API 448 | memory : int 449 | server memory allocation size for the query, in bytes. If none, server 450 | will use its default allocation size 451 | date : string 452 | query the database at a certain timestamp 453 | max_query_area_size : float 454 | max size for any part of the geometry, in square degrees: any polygon 455 | bigger will get divided up for multiple queries to API 456 | clean_periphery : bool 457 | if True (and simplify=True), buffer 0.5km to get a graph larger than 458 | requested, then simplify, then truncate it to requested spatial extent 459 | infrastructure : string 460 | download infrastructure of given type (default is streets (ie, 'way["highway"]') but other 461 | infrastructures may be selected like power grids (ie, 'way["power"~"line"]')) 462 | Returns 463 | ------- 464 | networkx multidigraph 465 | """ 466 | 467 | # verify that the geometry is valid and is a shapely Polygon/MultiPolygon 468 | # before proceeding 469 | if not polygon.is_valid: 470 | raise ValueError('Shape does not have a valid geometry') 471 | if not isinstance(polygon, (Polygon, MultiPolygon)): 472 | raise ValueError('Geometry must be a shapely Polygon or MultiPolygon') 473 | 474 | if clean_periphery and simplify: 475 | # create a new buffered polygon 0.5km around the desired one 476 | buffer_dist = 500 477 | polygon_utm, crs_utm = ox.project_geometry(geometry=polygon) 478 | polygon_proj_buff = polygon_utm.buffer(buffer_dist) 479 | polygon_buffered, _ = ox.project_geometry(geometry=polygon_proj_buff, crs=crs_utm, to_latlong=True) 480 | 481 | # get the network data from OSM, create the buffered graph, then 482 | # truncate it to the buffered polygon 483 | response_jsons = osm_net_download(polygon=polygon_buffered, network_type=network_type, 484 | timeout=timeout, memory=memory, 485 | max_query_area_size=max_query_area_size, 486 | infrastructure=infrastructure) 487 | G_buffered = ox.create_graph(response_jsons, name=name, retain_all=True, network_type=network_type) 488 | G_buffered = ox.truncate_graph_polygon(G_buffered, polygon_buffered, retain_all=True, truncate_by_edge=truncate_by_edge) 489 | 490 | # simplify the graph topology 491 | G_buffered = ox.simplify_graph(G_buffered) 492 | 493 | # truncate graph by polygon to return the graph within the polygon that 494 | # caller wants. 
don't simplify again - this allows us to retain 495 | # intersections along the street that may now only connect 2 street 496 | # segments in the network, but in reality also connect to an 497 | # intersection just outside the polygon 498 | G = ox.truncate_graph_polygon(G_buffered, polygon, retain_all=retain_all, truncate_by_edge=truncate_by_edge) 499 | 500 | # count how many street segments in buffered graph emanate from each 501 | # intersection in un-buffered graph, to retain true counts for each 502 | # intersection, even if some of its neighbors are outside the polygon 503 | G.graph['streets_per_node'] = ox.count_streets_per_node(G_buffered, nodes=G.nodes()) 504 | 505 | else: 506 | # download a list of API responses for the polygon/multipolygon 507 | response_jsons = osm_net_download(polygon=polygon, network_type=network_type, 508 | timeout=timeout, memory=memory, 509 | max_query_area_size=max_query_area_size, 510 | infrastructure=infrastructure) 511 | 512 | # create the graph from the downloaded data 513 | G = ox.create_graph(response_jsons, name=name, retain_all=True, network_type=network_type) 514 | 515 | # truncate the graph to the extent of the polygon 516 | G = ox.truncate_graph_polygon(G, polygon, retain_all=retain_all, truncate_by_edge=truncate_by_edge) 517 | 518 | # simplify the graph topology as the last step. don't truncate after 519 | # simplifying or you may have simplified out to an endpoint beyond the 520 | # truncation distance, in which case you will then strip out your entire 521 | # edge 522 | if simplify: 523 | G = ox.simplify_graph(G) 524 | 525 | log('graph_from_polygon() returning graph with {:,} nodes and {:,} edges'.format(len(list(G.nodes())), len(list(G.edges())))) 526 | return G 527 | 528 | def graph_from_bbox(north, south, east, west, network_type='all_private', 529 | simplify=True, retain_all=False, truncate_by_edge=False, 530 | name='unnamed', timeout=180, memory=None, date="", 531 | max_query_area_size=50*1000*50*1000, clean_periphery=True, 532 | infrastructure='way["highway"]'): 533 | """ 534 | Create a networkx graph from OSM data within some bounding box. 535 | Parameters 536 | ---------- 537 | north : float 538 | northern latitude of bounding box 539 | south : float 540 | southern latitude of bounding box 541 | east : float 542 | eastern longitude of bounding box 543 | west : float 544 | western longitude of bounding box 545 | network_type : string 546 | what type of street network to get 547 | simplify : bool 548 | if true, simplify the graph topology 549 | retain_all : bool 550 | if True, return the entire graph even if it is not connected 551 | truncate_by_edge : bool 552 | if True retain node if it's outside bbox but at least one of node's 553 | neighbors are within bbox 554 | name : string 555 | the name of the graph 556 | timeout : int 557 | the timeout interval for requests and to pass to API 558 | memory : int 559 | server memory allocation size for the query, in bytes. 
If none, server 560 | will use its default allocation size 561 | date : string 562 | query the database at a certain timestamp 563 | max_query_area_size : float 564 | max size for any part of the geometry, in square degrees: any polygon 565 | bigger will get divided up for multiple queries to API 566 | clean_periphery : bool 567 | if True (and simplify=True), buffer 0.5km to get a graph larger than 568 | requested, then simplify, then truncate it to requested spatial extent 569 | infrastructure : string 570 | download infrastructure of given type (default is streets (ie, 'way["highway"]') but other 571 | infrastructures may be selected like power grids (ie, 'way["power"~"line"]')) 572 | Returns 573 | ------- 574 | networkx multidigraph 575 | """ 576 | 577 | if clean_periphery and simplify: 578 | # create a new buffered bbox 0.5km around the desired one 579 | buffer_dist = 500 580 | polygon = Polygon([(west, north), (west, south), (east, south), (east, north)]) 581 | polygon_utm, crs_utm = ox.project_geometry(geometry=polygon) 582 | polygon_proj_buff = polygon_utm.buffer(buffer_dist) 583 | polygon_buff, _ = ox.project_geometry(geometry=polygon_proj_buff, crs=crs_utm, to_latlong=True) 584 | west_buffered, south_buffered, east_buffered, north_buffered = polygon_buff.bounds 585 | 586 | # get the network data from OSM then create the graph 587 | response_jsons = osm_net_download(north=north_buffered, south=south_buffered, 588 | east=east_buffered, west=west_buffered, 589 | network_type=network_type, timeout=timeout, 590 | memory=memory, date=date, 591 | max_query_area_size=max_query_area_size, 592 | infrastructure=infrastructure) 593 | G_buffered = ox.create_graph(response_jsons, name=name, retain_all=retain_all, network_type=network_type) 594 | G = ox.truncate_graph_bbox(G_buffered, north, south, east, west, retain_all=True, truncate_by_edge=truncate_by_edge) 595 | 596 | # simplify the graph topology 597 | G_buffered = ox.simplify_graph(G_buffered) 598 | 599 | # truncate graph by desired bbox to return the graph within the bbox 600 | # caller wants 601 | G = ox.truncate_graph_bbox(G_buffered, north, south, east, west, retain_all=retain_all, truncate_by_edge=truncate_by_edge) 602 | 603 | # count how many street segments in buffered graph emanate from each 604 | # intersection in un-buffered graph, to retain true counts for each 605 | # intersection, even if some of its neighbors are outside the bbox 606 | G.graph['streets_per_node'] = ox.count_streets_per_node(G_buffered, nodes=G.nodes()) 607 | 608 | else: 609 | # get the network data from OSM 610 | response_jsons = osm_net_download(north=north, south=south, east=east, 611 | west=west, network_type=network_type, 612 | timeout=timeout, memory=memory, date=date, 613 | max_query_area_size=max_query_area_size, 614 | infrastructure=infrastructure) 615 | 616 | # create the graph, then truncate to the bounding box 617 | G = ox.create_graph(response_jsons, name=name, retain_all=retain_all, network_type=network_type) 618 | G = ox.truncate_graph_bbox(G, north, south, east, west, retain_all=retain_all, truncate_by_edge=truncate_by_edge) 619 | 620 | # simplify the graph topology as the last step. 
don't truncate after
621 | # simplifying or you may have simplified out to an endpoint
622 | # beyond the truncation distance, in which case you will then strip out
623 | # your entire edge
624 | if simplify:
625 | G = ox.simplify_graph(G)
626 | 
627 | log('graph_from_bbox() returning graph with {:,} nodes and {:,} edges'.format(len(list(G.nodes())), len(list(G.edges()))))
628 | return G
629 | 
630 | def osm_net_download(polygon=None, north=None, south=None, east=None, west=None,
631 | network_type='all_private', timeout=180, memory=None, date="",
632 | max_query_area_size=50*1000*50*1000, infrastructure='way["highway"]'):
633 | """
634 | Download OSM ways and nodes within some bounding box from the Overpass API.
635 | Parameters
636 | ----------
637 | polygon : shapely Polygon or MultiPolygon
638 | geographic shape to fetch the street network within
639 | north : float
640 | northern latitude of bounding box
641 | south : float
642 | southern latitude of bounding box
643 | east : float
644 | eastern longitude of bounding box
645 | west : float
646 | western longitude of bounding box
647 | network_type : string
648 | {'walk', 'bike', 'drive', 'drive_service', 'all', 'all_private'} what
649 | type of street network to get
650 | timeout : int
651 | the timeout interval for requests and to pass to API
652 | memory : int
653 | server memory allocation size for the query, in bytes. If none, server
654 | will use its default allocation size
655 | date : string
656 | query the database at a certain timestamp
657 | max_query_area_size : float
658 | max area for any part of the geometry, in the units the geometry is in:
659 | any polygon bigger will get divided up for multiple queries to API
660 | (default is 50,000 * 50,000 units [ie, 50km x 50km in area, if units are
661 | meters])
662 | infrastructure : string
663 | download infrastructure of given type (default is streets, ie,
664 | 'way["highway"]') but other infrastructures may be selected like power
665 | grids, ie, 'way["power"~"line"]'
666 | Returns
667 | -------
668 | response_jsons : list
669 | """
670 | 
671 | # check if we're querying by polygon or by bounding box based on which
672 | # argument(s) were passed into this function
673 | by_poly = polygon is not None
674 | by_bbox = not (north is None or south is None or east is None or west is None)
675 | if not (by_poly or by_bbox):
676 | raise ValueError('You must pass a polygon or north, south, east, and west')
677 | 
678 | # create a filter to exclude certain kinds of ways based on the requested
679 | # network_type
680 | osm_filter = ox.get_osm_filter(network_type)
681 | response_jsons = []
682 | 
683 | # pass server memory allocation in bytes for the query to the API
684 | # if None, pass nothing so the server will use its default allocation size
685 | # otherwise, define the query's maxsize parameter value as whatever the
686 | # caller passed in
687 | if memory is None:
688 | maxsize = ''
689 | else:
690 | maxsize = '[maxsize:{}]'.format(memory)
691 | 
692 | # define the query to send the API
693 | # specifying way["highway"] means that all ways returned must have a highway
694 | # key. the {filters} then remove ways by key/value. the '>' makes it recurse
695 | # so we get ways and way nodes. maxsize is in bytes.
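# As a purely illustrative example (coordinates and filter are hypothetical;
# the actual filter string comes from ox.get_osm_filter(network_type)), a
# single bbox request rendered from the template below looks roughly like:
#   [out:json][timeout:180];(way["highway"]
#   ["highway"!~"footway"](45.70000000,4.70000000,45.80000000,4.90000000);>;);out;
# A '[maxsize:...]' setting is appended when `memory` is given, and the `date`
# string (e.g. '[date:"2018-01-01T00:00:00Z"]' in Overpass QL) is prepended
# to query the database at that timestamp.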
696 | if by_bbox: 697 | # turn bbox into a polygon and project to local UTM 698 | polygon = Polygon([(west, south), (east, south), (east, north), (west, north)]) 699 | geometry_proj, crs_proj = ox.project_geometry(polygon) 700 | 701 | # subdivide it if it exceeds the max area size (in meters), then project 702 | # back to lat-long 703 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 704 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 705 | log('Requesting network data within bounding box from API in {:,} request(s)'.format(len(geometry))) 706 | start_time = time.time() 707 | 708 | # loop through each polygon rectangle in the geometry (there will only 709 | # be one if original bbox didn't exceed max area size) 710 | for poly in geometry: 711 | # represent bbox as south,west,north,east and round lat-longs to 8 712 | # decimal places (ie, within 1 mm) so URL strings aren't different 713 | # due to float rounding issues (for consistent caching) 714 | west, south, east, north = poly.bounds 715 | query_template = date+'[out:json][timeout:{timeout}]{maxsize};({infrastructure}{filters}({south:.8f},{west:.8f},{north:.8f},{east:.8f});>;);out;' 716 | query_str = query_template.format(north=north, south=south, 717 | east=east, west=west, 718 | infrastructure=infrastructure, 719 | filters=osm_filter, 720 | timeout=timeout, maxsize=maxsize) 721 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 722 | response_jsons.append(response_json) 723 | log('Got all network data within bounding box from API in {:,} request(s) and {:,.2f} seconds'.format(len(geometry), time.time()-start_time)) 724 | 725 | elif by_poly: 726 | # project to utm, divide polygon up into sub-polygons if area exceeds a 727 | # max size (in meters), project back to lat-long, then get a list of 728 | # polygon(s) exterior coordinates 729 | geometry_proj, crs_proj = ox.project_geometry(polygon) 730 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 731 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 732 | polygon_coord_strs = ox.get_polygons_coordinates(geometry) 733 | log('Requesting network data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs))) 734 | start_time = time.time() 735 | 736 | # pass each polygon exterior coordinates in the list to the API, one at 737 | # a time 738 | for polygon_coord_str in polygon_coord_strs: 739 | query_template = date+'[out:json][timeout:{timeout}]{maxsize};({infrastructure}{filters}(poly:"{polygon}");>;);out;' 740 | query_str = query_template.format(polygon=polygon_coord_str, infrastructure=infrastructure, filters=osm_filter, timeout=timeout, maxsize=maxsize) 741 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 742 | response_jsons.append(response_json) 743 | log('Got all network data within polygon from API in {:,} request(s) and {:,.2f} seconds'.format(len(polygon_coord_strs), time.time()-start_time)) 744 | 745 | return response_jsons 746 | 747 | ####################################################################### 748 | ### Land use 749 | ####################################################################### 750 | 751 | def osm_landuse_download(date="", polygon=None, north=None, south=None, east=None, west=None, 752 | timeout=180, 
memory=None, max_query_area_size=50*1000*50*1000):
753 | """
754 | Download OpenStreetMap landuse footprint data.
755 | Parameters
756 | ----------
757 | date : string
758 | query the database at a certain timestamp
759 | polygon : shapely Polygon or MultiPolygon
760 | geographic shape to fetch the landuse footprints within
761 | north : float
762 | northern latitude of bounding box
763 | south : float
764 | southern latitude of bounding box
765 | east : float
766 | eastern longitude of bounding box
767 | west : float
768 | western longitude of bounding box
769 | timeout : int
770 | the timeout interval for requests and to pass to API
771 | memory : int
772 | server memory allocation size for the query, in bytes. If none, server
773 | will use its default allocation size
774 | max_query_area_size : float
775 | max area for any part of the geometry, in the units the geometry is in:
776 | any polygon bigger will get divided up for multiple queries to API
777 | (default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
778 | meters))
779 | Returns
780 | -------
781 | list
782 | list of response_json dicts
783 | """
784 | 
785 | # check if we're querying by polygon or by bounding box based on which
786 | # argument(s) were passed into this function
787 | by_poly = polygon is not None
788 | by_bbox = not (north is None or south is None or east is None or west is None)
789 | if not (by_poly or by_bbox):
790 | raise ValueError('You must pass a polygon or north, south, east, and west')
791 | 
792 | response_jsons = []
793 | 
794 | # pass server memory allocation in bytes for the query to the API
795 | # if None, pass nothing so the server will use its default allocation size
796 | # otherwise, define the query's maxsize parameter value as whatever the
797 | # caller passed in
798 | if memory is None:
799 | maxsize = ''
800 | else:
801 | maxsize = '[maxsize:{}]'.format(memory)
802 | 
803 | # define the query to send the API
804 | if by_bbox:
805 | # turn bbox into a polygon and project to local UTM
806 | polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
807 | geometry_proj, crs_proj = ox.project_geometry(polygon)
808 | 
809 | # subdivide it if it exceeds the max area size (in meters), then project
810 | # back to lat-long
811 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
812 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
813 | log('Requesting landuse footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
814 | start_time = time.time()
815 | 
816 | # loop through each polygon rectangle in the geometry (there will only
817 | # be one if original bbox didn't exceed max area size)
818 | for poly in geometry:
819 | # represent bbox as south,west,north,east and round lat-longs to 8
820 | # decimal places (ie, within 1 mm) so URL strings aren't different
821 | # due to float rounding issues (for consistent caching)
822 | west, south, east, north = poly.bounds
823 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["landuse"]({south:.8f},'
824 | '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["landuse"]'
825 | '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;')
826 | query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
827 | response_json = ox.overpass_request(data={'data':query_str},
timeout=timeout) 828 | response_jsons.append(response_json) 829 | msg = ('Got all landuse footprints data within bounding box from ' 830 | 'API in {:,} request(s) and {:,.2f} seconds') 831 | log(msg.format(len(geometry), time.time()-start_time)) 832 | 833 | elif by_poly: 834 | # project to utm, divide polygon up into sub-polygons if area exceeds a 835 | # max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates 836 | geometry_proj, crs_proj = ox.project_geometry(polygon) 837 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 838 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 839 | polygon_coord_strs = ox.get_polygons_coordinates(geometry) 840 | log('Requesting landuse footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs))) 841 | start_time = time.time() 842 | 843 | # pass each polygon exterior coordinates in the list to the API, one at 844 | # a time 845 | for polygon_coord_str in polygon_coord_strs: 846 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way' 847 | '(poly:"{polygon}")["landuse"];(._;>;);relation' 848 | '(poly:"{polygon}")["landuse"];(._;>;););out;') 849 | query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize) 850 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 851 | response_jsons.append(response_json) 852 | msg = ('Got all landuse footprints data within polygon from API in ' 853 | '{:,} request(s) and {:,.2f} seconds') 854 | log(msg.format(len(polygon_coord_strs), time.time()-start_time)) 855 | 856 | return response_jsons 857 | 858 | def create_landuse_gdf(date="", polygon=None, north=None, south=None, east=None, 859 | west=None, retain_invalid=False): 860 | """ 861 | Get landuse footprint data from OSM then assemble it into a GeoDataFrame. 
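Ways tagged with a `landuse` value are polygonized from their node coordinates, indexed by OSM id, and returned in EPSG:4326 coordinates.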
862 | Parameters
863 | ----------
864 | date : string
865 | query the database at a certain timestamp
866 | polygon : shapely Polygon or MultiPolygon
867 | geographic shape to fetch the landuse footprints within
868 | north : float
869 | northern latitude of bounding box
870 | south : float
871 | southern latitude of bounding box
872 | east : float
873 | eastern longitude of bounding box
874 | west : float
875 | western longitude of bounding box
876 | retain_invalid : bool
877 | if False discard any landuse footprints with an invalid geometry
878 | Returns
879 | -------
880 | GeoDataFrame
881 | """
882 | 
883 | responses = osm_landuse_download(date, polygon, north, south, east, west)
884 | 
885 | vertices = {}
886 | for response in responses:
887 | for result in response['elements']:
888 | if 'type' in result and result['type']=='node':
889 | vertices[result['id']] = {'lat' : result['lat'],
890 | 'lon' : result['lon']}
891 | 
892 | landuses = {}
893 | for response in responses:
894 | for result in response['elements']:
895 | if 'type' in result and result['type']=='way':
896 | nodes = result['nodes']
897 | try:
898 | polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes])
899 | except Exception:
900 | log('Polygon has invalid geometry: {}'.format(nodes))
continue # skip this way rather than emit a landuse with a stale or undefined geometry
901 | landuse = {'nodes' : nodes,
902 | 'geometry' : polygon}
903 | 
904 | if 'tags' in result:
905 | for tag in result['tags']:
906 | landuse[tag] = result['tags'][tag]
907 | 
908 | landuses[result['id']] = landuse
909 | 
910 | gdf = gpd.GeoDataFrame(landuses).T
911 | gdf.crs = {'init':'epsg:4326'}
912 | 
913 | if not retain_invalid:
914 | # drop all invalid geometries
915 | gdf = gdf[gdf['geometry'].is_valid]
916 | 
917 | return gdf
918 | 
919 | #######################################################################
920 | ### Points of interest
921 | #######################################################################
922 | 
923 | def osm_pois_download(date="", polygon=None, north=None, south=None, east=None, west=None,
924 | timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
925 | """
926 | Download OpenStreetMap POIs footprint data.
927 | Parameters
928 | ----------
929 | date : string
930 | query the database at a certain timestamp
931 | polygon : shapely Polygon or MultiPolygon
932 | geographic shape to fetch the POIs footprints within
933 | north : float
934 | northern latitude of bounding box
935 | south : float
936 | southern latitude of bounding box
937 | east : float
938 | eastern longitude of bounding box
939 | west : float
940 | western longitude of bounding box
941 | timeout : int
942 | the timeout interval for requests and to pass to API
943 | memory : int
944 | server memory allocation size for the query, in bytes. If none, server
945 | will use its default allocation size
946 | max_query_area_size : float
947 | max area for any part of the geometry, in the units the geometry is in:
948 | any polygon bigger will get divided up for multiple queries to API
949 | (default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
950 | meters))
951 | Returns
952 | -------
953 | list
954 | list of response_json dicts
955 | """
956 | 
957 | # check if we're querying by polygon or by bounding box based on which
958 | # argument(s) were passed into this function
959 | by_poly = polygon is not None
960 | by_bbox = not (north is None or south is None or east is None or west is None)
961 | if not (by_poly or by_bbox):
962 | raise ValueError('You must pass a polygon or north, south, east, and west')
963 | 
964 | response_jsons = []
965 | 
966 | # pass server memory allocation in bytes for the query to the API
967 | # if None, pass nothing so the server will use its default allocation size
968 | # otherwise, define the query's maxsize parameter value as whatever the
969 | # caller passed in
970 | if memory is None:
971 | maxsize = ''
972 | else:
973 | maxsize = '[maxsize:{}]'.format(memory)
974 | 
975 | # define the query to send the API
976 | if by_bbox:
977 | # turn bbox into a polygon and project to local UTM
978 | polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
979 | geometry_proj, crs_proj = ox.project_geometry(polygon)
980 | 
981 | # subdivide it if it exceeds the max area size (in meters), then project
982 | # back to lat-long
983 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
984 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
985 | log('Requesting POIs footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
986 | start_time = time.time()
987 | 
988 | # loop through each polygon rectangle in the geometry (there will only
989 | # be one if original bbox didn't exceed max area size)
990 | for poly in geometry:
991 | # represent bbox as south,west,north,east and round lat-longs to 8
992 | # decimal places (ie, within 1 mm) so URL strings aren't different
993 | # due to float rounding issues (for consistent caching)
994 | west, south, east, north = poly.bounds
995 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((node["amenity"]({south:.8f},'
996 | '{west:.8f},{north:.8f},{east:.8f}););(node["leisure"]({south:.8f},'
997 | '{west:.8f},{north:.8f},{east:.8f}););(node["office"]({south:.8f},'
998 | '{west:.8f},{north:.8f},{east:.8f}););(node["shop"]({south:.8f},'
999 | '{west:.8f},{north:.8f},{east:.8f}););(node["sport"]({south:.8f},'
1000 | '{west:.8f},{north:.8f},{east:.8f}););(node["building"]({south:.8f},'
1001 | '{west:.8f},{north:.8f},{east:.8f});););out;')
1002 | query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
1003 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1004 | response_jsons.append(response_json)
1005 | msg = ('Got all POIs footprints data within bounding box from '
1006 | 'API in {:,} request(s) and {:,.2f} seconds')
1007 | log(msg.format(len(geometry), time.time()-start_time))
1008 | 
1009 | elif by_poly:
1010 | # project to utm, divide polygon up into sub-polygons if area exceeds a
1011 | # max size (in meters), project back to lat-long, then get a list of polygon(s) exterior
coordinates 1012 | geometry_proj, crs_proj = ox.project_geometry(polygon) 1013 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size) 1014 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True) 1015 | polygon_coord_strs = ox.get_polygons_coordinates(geometry) 1016 | log('Requesting POIs footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs))) 1017 | start_time = time.time() 1018 | 1019 | # pass each polygon exterior coordinates in the list to the API, one at 1020 | # a time 1021 | for polygon_coord_str in polygon_coord_strs: 1022 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(' 1023 | '(node["amenity"](poly:"{polygon}"););' 1024 | '(node["leisure"](poly:"{polygon}"););' 1025 | '(node["office"](poly:"{polygon}"););' 1026 | '(node["shop"](poly:"{polygon}"););' 1027 | '(node["sport"](poly:"{polygon}"););' 1028 | '(node["building"](poly:"{polygon}");););out;') 1029 | query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize) 1030 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout) 1031 | response_jsons.append(response_json) 1032 | msg = ('Got all POIs footprints data within polygon from API in ' 1033 | '{:,} request(s) and {:,.2f} seconds') 1034 | log(msg.format(len(polygon_coord_strs), time.time()-start_time)) 1035 | 1036 | return response_jsons 1037 | 1038 | def create_pois_gdf(date="", polygon=None, north=None, south=None, east=None, 1039 | west=None, retain_invalid=False): 1040 | """ 1041 | Get POIs footprint data from OSM then assemble it into a GeoDataFrame. 1042 | Parameters 1043 | ---------- 1044 | date : string 1045 | query the database at a certain timestamp 1046 | polygon : shapely Polygon or MultiPolygon 1047 | geographic shape to fetch the POIs footprints within 1048 | north : float 1049 | northern latitude of bounding box 1050 | south : float 1051 | southern latitude of bounding box 1052 | east : float 1053 | eastern longitude of bounding box 1054 | west : float 1055 | western longitude of bounding box 1056 | retain_invalid : bool 1057 | if False discard any POIs footprints with an invalid geometry 1058 | Returns 1059 | ------- 1060 | GeoDataFrame 1061 | """ 1062 | 1063 | responses = osm_pois_download(date, polygon, north, south, east, west) 1064 | 1065 | vertices = {} 1066 | for response in responses: 1067 | for result in response['elements']: 1068 | if 'type' in result and result['type']=='node': 1069 | 1070 | point = Point( result['lon'], result['lat'] ) 1071 | 1072 | POI = {'geometry' : point} 1073 | 1074 | if 'tags' in result: 1075 | for tag in result['tags']: 1076 | POI[tag] = result['tags'][tag] 1077 | 1078 | vertices[result['id']] = POI 1079 | 1080 | gdf = gpd.GeoDataFrame(vertices).T 1081 | gdf.crs = {'init':'epsg:4326'} 1082 | 1083 | if not retain_invalid: 1084 | try: 1085 | # drop all invalid geometries 1086 | gdf = gdf[gdf['geometry'].is_valid] 1087 | except: # Empty data frame 1088 | # Create a one-row data frame with null information (avoid later Spatial-Join crash) 1089 | if (polygon is not None): # Polygon given 1090 | point = polygon.centroid 1091 | else: # Bounding box 1092 | point = Point( (east+west)/2. , (north+south)/2. 
)
1093 | data = {"geometry":[point], "osm_id":[0]}
1094 | gdf = gpd.GeoDataFrame(data, crs={'init': 'epsg:4326'})
1095 | 
1096 | return gdf
1097 | 
1098 | #######################################################################
1099 | ### OSM Building parts
1100 | #######################################################################
1101 | 
1102 | def osm_bldg_part_download(date="", polygon=None, north=None, south=None, east=None, west=None,
1103 | timeout=180, memory=None, max_query_area_size=50*1000*50*1000):
1104 | """
1105 | Download OpenStreetMap building parts footprint data.
1106 | Parameters
1107 | ----------
1108 | date : string
1109 | query the database at a certain timestamp
1110 | polygon : shapely Polygon or MultiPolygon
1111 | geographic shape to fetch the building part footprints within
1112 | north : float
1113 | northern latitude of bounding box
1114 | south : float
1115 | southern latitude of bounding box
1116 | east : float
1117 | eastern longitude of bounding box
1118 | west : float
1119 | western longitude of bounding box
1120 | timeout : int
1121 | the timeout interval for requests and to pass to API
1122 | memory : int
1123 | server memory allocation size for the query, in bytes. If none, server
1124 | will use its default allocation size
1125 | max_query_area_size : float
1126 | max area for any part of the geometry, in the units the geometry is in:
1127 | any polygon bigger will get divided up for multiple queries to API
1128 | (default is 50,000 * 50,000 units (ie, 50km x 50km in area, if units are
1129 | meters))
1130 | Returns
1131 | -------
1132 | list
1133 | list of response_json dicts
1134 | """
1135 | 
1136 | # check if we're querying by polygon or by bounding box based on which
1137 | # argument(s) were passed into this function
1138 | by_poly = polygon is not None
1139 | by_bbox = not (north is None or south is None or east is None or west is None)
1140 | if not (by_poly or by_bbox):
1141 | raise ValueError('You must pass a polygon or north, south, east, and west')
1142 | 
1143 | response_jsons = []
1144 | 
1145 | # pass server memory allocation in bytes for the query to the API
1146 | # if None, pass nothing so the server will use its default allocation size
1147 | # otherwise, define the query's maxsize parameter value as whatever the
1148 | # caller passed in
1149 | if memory is None:
1150 | maxsize = ''
1151 | else:
1152 | maxsize = '[maxsize:{}]'.format(memory)
1153 | 
1154 | # define the query to send the API
1155 | if by_bbox:
1156 | # turn bbox into a polygon and project to local UTM
1157 | polygon = Polygon([(west, south), (east, south), (east, north), (west, north)])
1158 | geometry_proj, crs_proj = ox.project_geometry(polygon)
1159 | 
1160 | # subdivide it if it exceeds the max area size (in meters), then project
1161 | # back to lat-long
1162 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
1163 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
1164 | log('Requesting building part footprints data within bounding box from API in {:,} request(s)'.format(len(geometry)))
1165 | start_time = time.time()
1166 | 
1167 | # loop through each polygon rectangle in the geometry (there will only
1168 | # be one if original bbox didn't exceed max area size)
1169 | for poly in geometry:
1170 | # represent bbox as south,west,north,east and round lat-longs to 8
1171 | # decimal places (ie, within 1 mm) so URL strings aren't different
1172 | # due to float rounding issues (for consistent caching)
1173 | west, south, east, north = poly.bounds
1174 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};((way["building:part"]({south:.8f},'
1175 | '{west:.8f},{north:.8f},{east:.8f});(._;>;););(relation["building:part"]'
1176 | '({south:.8f},{west:.8f},{north:.8f},{east:.8f});(._;>;);););out;')
1177 | query_str = query_template.format(north=north, south=south, east=east, west=west, timeout=timeout, maxsize=maxsize)
1178 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1179 | response_jsons.append(response_json)
1180 | msg = ('Got all building part footprints data within bounding box from '
1181 | 'API in {:,} request(s) and {:,.2f} seconds')
1182 | log(msg.format(len(geometry), time.time()-start_time))
1183 | 
1184 | elif by_poly:
1185 | # project to utm, divide polygon up into sub-polygons if area exceeds a
1186 | # max size (in meters), project back to lat-long, then get a list of polygon(s) exterior coordinates
1187 | geometry_proj, crs_proj = ox.project_geometry(polygon)
1188 | geometry_proj_consolidated_subdivided = ox.consolidate_subdivide_geometry(geometry_proj, max_query_area_size=max_query_area_size)
1189 | geometry, _ = ox.project_geometry(geometry_proj_consolidated_subdivided, crs=crs_proj, to_latlong=True)
1190 | polygon_coord_strs = ox.get_polygons_coordinates(geometry)
1191 | log('Requesting building part footprints data within polygon from API in {:,} request(s)'.format(len(polygon_coord_strs)))
1192 | start_time = time.time()
1193 | 
1194 | # pass each polygon exterior coordinates in the list to the API, one at
1195 | # a time
1196 | for polygon_coord_str in polygon_coord_strs:
1197 | query_template = (date+'[out:json][timeout:{timeout}]{maxsize};(way'
1198 | '(poly:"{polygon}")["building:part"];(._;>;);relation'
1199 | '(poly:"{polygon}")["building:part"];(._;>;););out;')
1200 | query_str = query_template.format(polygon=polygon_coord_str, timeout=timeout, maxsize=maxsize)
1201 | response_json = ox.overpass_request(data={'data':query_str}, timeout=timeout)
1202 | response_jsons.append(response_json)
1203 | msg = ('Got all building part footprints data within polygon from API in '
1204 | '{:,} request(s) and {:,.2f} seconds')
1205 | log(msg.format(len(polygon_coord_strs), time.time()-start_time))
1206 | 
1207 | return response_jsons
1208 | 
1209 | 
1210 | 
1211 | def create_building_parts_gdf(date="", polygon=None, north=None, south=None, east=None,
1212 | west=None, retain_invalid=False):
1213 | """
1214 | Get building part footprint data from OSM then assemble it into a GeoDataFrame.
1215 | If no building parts are retrieved, a default (null-data) point located at the centroid of the region of interest is created
1216 | 
1217 | Parameters
1218 | ----------
1219 | date : string
1220 | query the database at a certain timestamp
1221 | polygon : shapely Polygon or MultiPolygon
1222 | geographic shape to fetch the building footprints within
1223 | north : float
1224 | northern latitude of bounding box
1225 | south : float
1226 | southern latitude of bounding box
1227 | east : float
1228 | eastern longitude of bounding box
1229 | west : float
1230 | western longitude of bounding box
1231 | retain_invalid : bool
1232 | if False discard any building footprints with an invalid geometry
1233 | Returns
1234 | -------
1235 | GeoDataFrame
1236 | """
1237 | 
1238 | responses = osm_bldg_part_download(date, polygon, north, south, east, west)
1239 | 
1240 | vertices = {}
1241 | for response in responses:
1242 | for result in response['elements']:
1243 | if 'type' in result and result['type']=='node':
1244 | vertices[result['id']] = {'lat' : result['lat'],
1245 | 'lon' : result['lon']}
1246 | 
1247 | buildings = {}
1248 | for response in responses:
1249 | for result in response['elements']:
1250 | if 'type' in result and result['type']=='way':
1251 | nodes = result['nodes']
1252 | try:
1253 | polygon = Polygon([(vertices[node]['lon'], vertices[node]['lat']) for node in nodes])
1254 | except Exception:
1255 | log('Polygon has invalid geometry: {}'.format(nodes))
continue # skip this way rather than emit a building part with a stale or undefined geometry
1256 | building = {'nodes' : nodes,
1257 | 'geometry' : polygon}
1258 | 
1259 | if 'tags' in result:
1260 | for tag in result['tags']:
1261 | building[tag] = result['tags'][tag]
1262 | 
1263 | buildings[result['id']] = building
1264 | 
1265 | gdf = gpd.GeoDataFrame(buildings).T
1266 | gdf.crs = {'init':'epsg:4326'}
1267 | 
1268 | if not retain_invalid:
1269 | try:
1270 | # drop all invalid geometries
1271 | gdf = gdf[gdf['geometry'].is_valid]
1272 | except: # Empty data frame
1273 | # Create a one-row data frame with null information (avoid later Spatial-Join crash)
1274 | if (polygon is not None): # Polygon given
1275 | point = polygon.centroid
1276 | else: # Bounding box
1277 | point = Point( (east+west)/2. , (north+south)/2.
) 1278 | # Data as records 1279 | data = {"geometry":[point], "osm_id":[0], "building:part":["yes"], "height":[""]} 1280 | gdf = gpd.GeoDataFrame(data, crs={'init': 'epsg:4326'}) 1281 | 1282 | return gdf -------------------------------------------------------------------------------- /urbansprawl/osm/surface.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import osmnx as ox 7 | import pandas as pd 8 | import geopandas as gpd 9 | import numpy as np 10 | from shapely.geometry import Polygon 11 | 12 | from osmnx import log 13 | 14 | from .tags import height_tags, activity_classification 15 | from .classification import aggregate_classification 16 | 17 | ############################################ 18 | ### Land uses surface association 19 | ############################################ 20 | 21 | def get_composed_classification(building, df_pois): 22 | """ 23 | Retrieve the composed classification given the building's containing Points of Interest 24 | 25 | Parameters 26 | ---------- 27 | building : geopandas.GeoSeries 28 | input building 29 | df_pois : geopandas.GeoDataFrame 30 | Points of Interest contained in the building 31 | 32 | Returns 33 | ---------- 34 | geopandas.GeoSeries 35 | returns a composed classification building 36 | """ 37 | # POIs aggregated classification 38 | pois_classification = aggregate_classification( df_pois.classification.values ) 39 | # Composed building-POIs classification 40 | composed_classification = aggregate_classification( [building.classification, pois_classification] ) 41 | # Composed activity categories 42 | try: 43 | composed_activity_category = list( set( [element for list_ in df_pois.activity_category for element in list_] + building.activity_category ) ) 44 | except: # df_pois.activity_category.isnull().all() Returns True 45 | composed_activity_category = building.activity_category 46 | # Create a Series for a new row with composed classification 47 | composed_row = pd.Series( [building.geometry,composed_classification,composed_activity_category,building.building_levels], index=["geometry","classification","activity_category","building_levels"]) 48 | return composed_row 49 | 50 | def sum_landuses(x, landuses_m2, default_classification = None, mixed_building_first_floor_activity=True): 51 | """ 52 | Associate to each land use its correspondent surface use for input building 53 | Mixed-uses building Option 1: 54 | First floor: Activity use 55 | Rest: residential use 56 | Mixed-uses building Option 2: 57 | Half used for Activity uses, the other half Residential use 58 | 59 | 60 | Parameters 61 | ---------- 62 | x : geopandas.GeoSeries 63 | input building 64 | landuses_m2 : dict 65 | squared meter surface associated to each land use 66 | default_classification : pandas.Series 67 | main building land use classification and included activity types 68 | mixed_building_first_floor_activity : Boolean 69 | if True: Associates building's first floor to activity uses and the rest to residential uses 70 | if False: Associates half of the building's area to each land use (Activity and Residential) 71 | 72 | Returns 73 | ---------- 74 | 75 | """ 76 | # Empty 77 | if ( not x.get("geometry")): return 78 | # Mixed building assumption: First 
level for activity uses, the rest residential use
79 | if (x["classification"] == "activity"): # Sum activity use
80 | landuses_m2["activity"] += x["geometry"].area * x["building_levels"]
81 | # Sum activity category m2
82 | area_per_activity_category = x["geometry"].area * x["building_levels"] / len( x["activity_category"] )
83 | for activity_type in x["activity_category"]:
84 | landuses_m2[activity_type] += area_per_activity_category
85 | elif (x["classification"] == "mixed"): # Sum activity and residential use
86 | 
87 | if (x["building_levels"] > 1) and (mixed_building_first_floor_activity): # More than one level
88 | # Floors above the first: residential use
89 | landuses_m2["residential"] += x["geometry"].area * ( x["building_levels"] - 1 )
90 | # First floor: activity use
91 | landuses_m2["activity"] += x["geometry"].area
92 | area_per_activity_category = x["geometry"].area / len( x["activity_category"] )
93 | 
94 | else: # One level building
95 | landuses_m2["residential"] += x["geometry"].area * x["building_levels"] / 2.
96 | landuses_m2["activity"] += x["geometry"].area * x["building_levels"] / 2.
97 | area_per_activity_category = ( x["geometry"].area * x["building_levels"] / 2. ) / len( x["activity_category"] )
98 | 
99 | # Sum activity category m2
100 | for activity_type in x["activity_category"]:
101 | landuses_m2[activity_type] += area_per_activity_category
102 | elif (x["classification"] == "residential"): # Sum residential use
103 | landuses_m2["residential"] += x["geometry"].area * x["building_levels"]
104 | else:
105 | # Row does not contain a classification: use the given default classification, creating a new dict
106 | dict_x = {"classification":default_classification.classification, "geometry":x.geometry, "building_levels":x.building_levels, "activity_category":default_classification.activity_category}
107 | # Recursive call
108 | sum_landuses(dict_x, landuses_m2, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
109 | 
110 | 
111 | def calculate_landuse_m2(building, mixed_building_first_floor_activity=True):
112 | """
113 | Calculate the total squared meters associated to residential and activity uses for the input building
114 | In addition, the surface usage for each activity type is computed
115 | 
116 | Parameters
117 | ----------
118 | building : geopandas.GeoSeries
119 | input building
120 | mixed_building_first_floor_activity : Boolean
121 | if True: Associates building's first floor to activity uses and the rest to residential uses
122 | if False: Associates half of the building's area to each land use (Activity and Residential)
123 | 
124 | Returns
125 | ----------
126 | dict
127 | contains the total associated surface to each land use key
128 | """
129 | # Initialize
130 | landuse_m2 = {}
131 | landuse_m2["activity"] = 0
132 | landuse_m2["residential"] = 0
133 | for activity_type in list( activity_classification.keys() ):
134 | landuse_m2[activity_type] = 0
135 | 
136 | # Get the composed classification from input building + containing POIs
137 | building_composed_classification = get_composed_classification(building, building.pois_full_parts)
138 | 
139 | def no_min_level_geometry(building_parts):
140 | """
141 | Returns building parts with no min. level associated
142 | """
143 | def no_min_level_tag(x): # True if the part declares no minimum level/height, ie, it starts at ground level
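# Hypothetical example: a part tagged with 'building:min_level' = 2 hangs above
# levels 0-1 and does not occupy the ground floor, so its footprint must not
# be subtracted from the main building geometry below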
144 | if isinstance(x, dict) and ( x.get("building:min_level") or x.get("min_level") or x.get("building:min_height") or x.get("min_height") ):
145 | return False
146 | else:
147 | return True
148 | # Get the geometries of the contained building parts with no min level/height tags available
149 | geometries = building_parts.loc[ building_parts.height_tags.apply(lambda x: no_min_level_tag(x) ) ].geometry
150 | 
151 | # Create the union of those geometries
152 | no_min_level_geom = Polygon()
153 | for shape in geometries.values:
154 | no_min_level_geom = no_min_level_geom.union(shape)
155 | 
156 | # Return the final shape
157 | return no_min_level_geom
158 | 
159 | # Remove from the main building geometry those building part geometries that do not declare a minimum level/height: avoids duplicating the first-level surface
160 | building_composed_classification.geometry = building_composed_classification.geometry.difference( no_min_level_geometry(building.full_parts) )
161 | 
162 | # Sum land uses for main building
163 | sum_landuses(building_composed_classification, landuse_m2, mixed_building_first_floor_activity=mixed_building_first_floor_activity)
164 | 
165 | # Sum land uses for building parts. If no classification given, use the building's land use
166 | building.full_parts.apply(lambda x: sum_landuses(x, landuse_m2, building_composed_classification[["classification","activity_category"]], mixed_building_first_floor_activity=mixed_building_first_floor_activity), axis=1)
167 | 
168 | return landuse_m2
169 | 
170 | def associate_levels(df_osm, default_height, meters_per_level):
171 | """
172 | Calculate the effective number of levels for each input building
173 | Under missing tag data, default values are used
174 | A column ['building_levels'] is added to the data frame
175 | 
176 | Parameters
177 | ----------
178 | df_osm : geopandas.GeoDataFrame
179 | input data frame
180 | default_height : float
181 | default building height in meters
182 | meters_per_level : float
183 | default meters per level
184 | 
185 | Returns
186 | ----------
187 | 
188 | """
189 | def levels_from_height(height, meters_per_level):
190 | """
191 | Returns estimated number of levels given input height (meters)
192 | """
193 | levels = abs( round( height / meters_per_level ) )
194 | if (levels >= 1):
195 | return levels
196 | else: # By default: 1 level
197 | assert( height > 0 )
198 | return 1
199 | 
200 | def associate_level(x):
201 | """
202 | Associates the number of levels to input building given its height tags information
203 | Returns the absolute value in order to consider the cases of underground levels
204 | """
205 | 
206 | # No height tags available
207 | if not isinstance(x, dict):
208 | # Default values
209 | number_levels = levels_from_height(default_height, meters_per_level)
210 | return number_levels
211 | 
212 | # Does the building start from a specific minimum level?
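# Worked example (hypothetical tags): {"building:min_level": 2, "building:levels": 5}
# yields min_level = 2 and number_levels = abs(5 - 2) = 3 effective levels, whereas
# {"height": 15} alone with meters_per_level = 3 yields levels_from_height(15, 3) = 5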
213 | if ( x.get("building:min_level") ): # building:min_level
214 | min_level = x["building:min_level"]
215 | elif ( x.get("min_level") ): # min_level
216 | min_level = x["min_level"]
217 | ######################### Height based
218 | elif ( x.get("building:min_height") ): # building:min_height
219 | min_level = levels_from_height( x["building:min_height"], meters_per_level )
220 | elif ( x.get("min_height") ): # min_height
221 | min_level = levels_from_height( x["min_height"], meters_per_level )
222 | else:
223 | min_level = 0
224 | 
225 | ######################### Levels based
226 | if ( x.get("building:levels") ): # Number of building:levels given
227 | number_levels = abs( x["building:levels"] - min_level )
228 | elif ( x.get("levels") ): # Number of levels given
229 | number_levels = abs( x["levels"] - min_level )
230 | ######################### Height based
231 | elif ( x.get("building:height") ): # building:height given
232 | number_levels = abs( levels_from_height( x["building:height"], meters_per_level ) - min_level )
233 | elif ( x.get("height") ): # height given
234 | number_levels = abs( levels_from_height( x["height"], meters_per_level ) - min_level )
235 | else: # No information given
236 | number_levels = levels_from_height(default_height, meters_per_level)
237 | 
238 | assert( number_levels >= 0 )
239 | if (number_levels == 0): # By default at least 1 level
240 | number_levels = 1
241 | return number_levels
242 | 
243 | df_osm["building_levels"] = df_osm.height_tags.apply( lambda x: associate_level(x) )
244 | 
245 | def classification_sanity_check(building):
246 | """
247 | Performs a sanity check to keep the building's classification coherent with the amount of m2 associated to each land use
248 | Example: a building classified as 'residential' may contain building parts (possibly occupying 100% of its area, and thus accounting for all of its land use m2) that contain an activity use
249 | In that case, the classification is corrected to match the computed surface land uses
250 | 
251 | Parameters
252 | ----------
253 | building : geopandas.GeoSeries
254 | one row denoting the building's information
255 | 
256 | Returns
257 | ----------
258 | string
259 | returns the coherent classification
260 | """
261 | if ( building.landuses_m2["residential"] > 0 ):
262 | if ( building.landuses_m2["activity"] > 0 ): # Mixed use
263 | return "mixed"
264 | else: # Residential use
265 | return "residential"
266 | else: # Activity use
267 | return "activity"
268 | 
269 | def compute_landuses_m2(df_osm_built, df_osm_building_parts, df_osm_pois, default_height=6, meters_per_level=3, mixed_building_first_floor_activity=True):
270 | """
271 | Determine the effective number of levels per building or building part
272 | Calculate the amount of squared meters associated to residential and activity uses per building
273 | In addition, the surface usage for each activity type is computed
274 | 
275 | Parameters
276 | ----------
277 | df_osm_built : geopandas.GeoDataFrame
278 | OSM Buildings
279 | df_osm_building_parts : geopandas.GeoDataFrame
280 | OSM building parts
281 | df_osm_pois : geopandas.GeoDataFrame
282 | OSM Points of interest
283 | default_height : float
284 | default building height in meters
285 | meters_per_level : float
286 | default meters per level
287 | mixed_building_first_floor_activity : Boolean
288 | if True: Associates building's first floor to activity uses and the rest to residential uses
289 | if False: Associates half of the building's area to each land
290 | 
291 | Returns
292 | ----------
293 | 
294 | """
295 | # Associate the number of levels to each building / building part
296 | associate_levels(df_osm_built, default_height=default_height, meters_per_level=meters_per_level)
297 | associate_levels(df_osm_building_parts, default_height=default_height, meters_per_level=meters_per_level)
298 | 
299 | ##################
300 | # Calculate, for each building, the M^2 associated to each land use, considering building parts (areas are calculated under a UTM coordinates projection assumption)
301 | ##################
302 | 
303 | # Associate to each building the full data frame of its contained building parts
304 | col_interest = ["geometry","activity_category", "classification", "key_value", "height_tags", "building_levels"]
305 | df_osm_built["full_parts"] = df_osm_built.containing_parts.apply(lambda x: df_osm_building_parts.loc[x, col_interest ] if isinstance(x, list) else df_osm_building_parts.loc[ [], col_interest ] )
306 | 
307 | # Associate to each building the full data frame of its contained POIs
308 | col_interest = ["geometry","activity_category", "classification", "key_value"]
309 | df_osm_built["pois_full_parts"] = df_osm_built.containing_poi.apply(lambda x: df_osm_pois.loc[x, col_interest] if isinstance(x, list) else df_osm_pois.loc[ [], col_interest] )
310 | 
311 | # Calculate m2's for each land use, plus for each activity category
312 | df_osm_built["landuses_m2"] = df_osm_built.apply(lambda x: calculate_landuse_m2(x, mixed_building_first_floor_activity=mixed_building_first_floor_activity), axis=1 )
313 | 
314 | # Drop added full parts
315 | df_osm_built.drop( ["full_parts"], axis=1, inplace=True )
316 | df_osm_built.drop( ["pois_full_parts"], axis=1, inplace=True )
317 | 
318 | # Sanity check: each building's land use classification must be coherent with the M^2 associated to its land uses
319 | df_osm_built["classification"] = df_osm_built.apply(lambda x: classification_sanity_check(x), axis=1 )
320 | 
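To make the level-inference rule in `associate_level` concrete, here is a minimal, self-contained sketch that mirrors its precedence order (explicit level tags first, then height tags, then the default height). It is an illustration only: `levels_from_height` is a stand-in for the module's helper (whose exact rounding is not shown in this extract), and the simplified `infer_levels` skips the `min_height` branches.

```python
import math

def levels_from_height(height, meters_per_level=3):
    # Illustrative stand-in for the module's helper: convert a height
    # in meters into a number of levels (assumed ceiling rounding).
    return int(math.ceil(float(height) / meters_per_level))

def infer_levels(tags, default_height=6, meters_per_level=3):
    # Mirrors the precedence used above: explicit level tags,
    # then height tags, then a default height.
    min_level = float(tags.get("building:min_level")
                      or tags.get("min_level") or 0)
    if tags.get("building:levels") is not None:
        levels = abs(float(tags["building:levels"]) - min_level)
    elif tags.get("building:height") is not None:
        levels = abs(levels_from_height(tags["building:height"],
                                        meters_per_level) - min_level)
    else:
        levels = levels_from_height(default_height, meters_per_level)
    return max(int(levels), 1)  # by default, at least one level

# A 15 m tall building with no level tags -> ceil(15 / 3) = 5 levels
print(infer_levels({"building:height": 15}))  # 5
# No information at all -> default 6 m height -> 2 levels
print(infer_levels({}))                       # 2
```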
--------------------------------------------------------------------------------
/urbansprawl/osm/tags.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | # Height tags
7 | height_tags = [ "min_height", "height", "min_level", "levels", "building:min_height", "building:height", "building:min_level", "building:levels", "building:levels:underground" ]
8 | 
9 | # Columns of interest corresponding to OSM keys
10 | columns_osm_tag = [ "amenity", "landuse", "leisure", "shop", "man_made", "building", "building:use", "building:part" ]
11 | 
12 | # Building parts which need to be filtered
13 | building_parts_to_filter = ["no", "roof"]
14 | 
15 | #################################################################
16 | ### Classify uses according to OpenStreetMap wiki
17 | #################################################################
18 | """
19 | Possible tags classification:
20 | residential: Defines a residential land use
21 | activity: Defines any activity land use
22 | other: Defines any non-residential and non-activity use
23 | infer: Defines any condition where an inference needs to be done (using land use polygons containing them)
24 | 
25 | Possible activities classifications:
26 | shop, leisure/amenity, commercial/industrial
27 | """
28 | 
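The dictionaries populated below are consumed by the classification step (classification.py, not shown in this extract). As a rough, hypothetical illustration of how a single OSM key/value pair could be resolved against `key_classification` once it is filled in (the helper name and lookup logic here are assumptions, not the module's actual code):

```python
def classify_tag(key, value, key_classification):
    # Hypothetical sketch: entries of key_classification are named
    # '<use>_<osm key>', e.g. 'activity_amenity' or 'residential_building',
    # so classifying a tag reduces to a membership test.
    for classification, values in key_classification.items():
        use, _, osm_key = classification.partition("_")
        if osm_key == key and value in values:
            return use
    return None

# e.g. classify_tag("amenity", "restaurant", key_classification) -> "activity"
```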
["alcohol","bakery","beverages","brewing_supplies","butcher","cheese","chocolate","coffee","confectionery","convenience","deli","dairy","farm","greengrocer","ice_cream","organic","pasta","pastry","seafood","spices","tea","wine"] 68 | 69 | shop_activities = shop_other + shop_gifts + shop_art + shop_sports + shop_electronics + shop_furniture + shop_household + shop_health + shop_charity + shop_clothing + shop_mall + shop_food + ['shop'] 70 | 71 | key_classification["activity_shop"] = shop_activities 72 | activity_classification["shop"] += shop_activities 73 | 74 | ############# 75 | ### Leisure 76 | ############# 77 | #Not tagged as activity: dog_park, bird_hide, bandstand, firepit, fishing, garden, golf_course, marina, nature_reserve, park, playground, slipway, track, wildlife_hide 78 | leisure_activies = ['adult_gaming_centre','amusement_arcade','beach_resort','dance','escape_game','fitness_centre','hackerspace','horse_riding','ice_rink','miniature_golf','pitch','sauna','sports_centre','stadium','summer_camp','swimming_area','swimming_pool','water_park'] 79 | 80 | key_classification["activity_leisure"] = leisure_activies 81 | activity_classification["leisure/amenity"] += leisure_activies 82 | 83 | ############# 84 | ### Man made 85 | ############# 86 | man_made_activities = ["offshore_platform", "works", "wastewater_plant", "water_works", "kiln", "monitoring_station", "observatory"] 87 | man_made_other = ['adit', 'beacon', 'breakwater', 'bridge', 'bunker_silo', 'campanile', 'chimney', 'communications_tower', 'crane', 'cross', 'cutline', 'clearcut', 'embankment', 'dovecote', 'dyke', ' flagpole', 'gasometer', 'groyne', 'lighthouse', 'mast', 'mineshaft', 'obelisk', 'petroleum_well', 'pier', 'pipeline', 'pumping_station', 'reservoir_covered', 'silo', 'snow_fence', 'snow_net', 'storage_tank', 'street_cabinet', 'surveillance', 'survey_point', 'telscope', 'tower', 'watermill', 'water_tower', 'water_well', 'water_tap', 'wildlife_crossing', 'windmill'] 88 | 89 | key_classification["activity_man_made"] = man_made_activities 90 | key_classification["other_man_made"] = man_made_other 91 | activity_classification["commercial/industrial"] += man_made_activities 92 | 93 | ############# 94 | ### Building 95 | ############# 96 | building_infer = ['yes'] 97 | building_other = ['barn','bridge','bunker','cabin','cowshed','digester','garage','garages','farm_auxiliary','greenhouse','hut','roof','shed','stable','sty','transformer_tower','service','ruins'] 98 | building_related_activities = ['hangar', 'stable', 'cowshed', 'digester', 'construction'] # From building_other related to activities 99 | building_shop = ['shop','kiosk'] 100 | building_commercial = ['commercial','office','industrial','retail','warehouse'] + ['port'] 101 | building_civic_amenity = ['cathedral','chapel','church','mosque','temple','synagogue','shrine','civic','hospital','school','stadium','train_station','transportation','university','public'] 102 | 103 | building_activities = building_commercial + building_civic_amenity + building_related_activities + building_shop 104 | building_residential = ['hotel','farm','apartment','apartments','dormitory','house','residential','retirement_home','terrace','houseboat','bungalow','static_caravan','detached'] 105 | 106 | key_classification["activity_building"] = building_activities 107 | key_classification["residential_building"] = building_residential 108 | key_classification["infer_building"] = building_infer 109 | key_classification["other_building"] = building_other 110 | 
activity_classification["commercial/industrial"] += building_commercial + building_related_activities 111 | activity_classification["leisure/amenity"] += building_civic_amenity 112 | activity_classification["shop"] += building_shop 113 | 114 | ############# 115 | ### Building:use 116 | ############# 117 | key_classification["activity_building:use"] = building_activities 118 | key_classification["residential_building:use"] = building_residential 119 | 120 | ############# 121 | ### Building:part 122 | ############# 123 | key_classification["activity_building:part"] = building_activities 124 | key_classification["residential_building:part"] = building_residential 125 | 126 | ############# 127 | ### Land use 128 | ############# 129 | landuse_activities = building_activities + shop_activities + amenities_activities + leisure_activies + ['quarry','salt_pond','military'] 130 | landuse_residential = ['residential'] 131 | 132 | # Land usage not related to residential or activity uses 133 | landuse_other_related = ['cemetery', 'landfill', 'railway'] 134 | landuse_water = ['water', 'reservoir', 'basin'] 135 | landuse_green = ['allotments','conservation', 'farmland', 'farmyard','forest','grass', 'greenfield', 'greenhouse_horticulture','meadow','orchard','pasture','peat_cutting','plant_nursery','recreation_ground','village_green','vineyard'] 136 | landuse_other = landuse_other_related + landuse_water + landuse_green 137 | 138 | landuse_classification["activity"] = landuse_activities 139 | landuse_classification["residential"] = landuse_residential 140 | landuse_classification["other"] = landuse_other 141 | 142 | activity_classification["commercial/industrial"] += ['quarry','salt_pond','military'] 143 | #################################################################################### -------------------------------------------------------------------------------- /urbansprawl/osm/utils.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import osmnx as ox 7 | import pandas as pd 8 | import geopandas as gpd 9 | import numpy as np 10 | 11 | from .tags import height_tags 12 | 13 | from ..settings import storage_folder 14 | 15 | # Format for load/save the geo-data ['geojson','shp'] 16 | geo_format = 'geojson' # 'shp' 17 | geo_driver = 'GeoJSON' # 'ESRI Shapefile' 18 | 19 | ################################################### 20 | ### I/O utils 21 | ################################################### 22 | 23 | def get_dataframes_filenames(city_ref_file): 24 | """ 25 | Get data frame file names for input city 26 | 27 | Parameters 28 | ---------- 29 | city_ref_file : string 30 | name of input city 31 | 32 | Returns 33 | ---------- 34 | [ string, string, string ] 35 | returns filenames for buildings, building parts, and points of interest 36 | 37 | """ 38 | import os 39 | if not(os.path.isdir(storage_folder)): 40 | os.makedirs(storage_folder) 41 | geo_poly_file = storage_folder+"/"+city_ref_file+"_buildings."+geo_format 42 | geo_poly_parts_file = storage_folder+"/"+city_ref_file+"_building_parts."+geo_format 43 | geo_point_file = storage_folder+"/"+city_ref_file+"_poi."+geo_format 44 | return geo_poly_file, geo_poly_parts_file, geo_point_file 45 | 46 | def load_geodataframe(geo_filename): 
47 | """ 48 | Load input GeoDataFrame 49 | 50 | Parameters 51 | ---------- 52 | geo_filename : string 53 | input GeoDataFrame filename 54 | 55 | Returns 56 | ---------- 57 | geopandas.GeoDataFrame 58 | loaded data 59 | 60 | """ 61 | # Load using geopandas 62 | df_osm_data = gpd.read_file(geo_filename) 63 | # Set None as NaN 64 | df_osm_data.fillna(value=np.nan, inplace=True) 65 | # Replace empty string (Json NULL sometimes read as '') for NaN 66 | df_osm_data.replace('', np.nan, inplace=True) 67 | 68 | def list_int_from_string(x): # List of integers given input in string format 69 | return [ int(id_) for id_ in x.split(",") ] 70 | def list_str_from_string(x): # List of strings given input in string format 71 | return x.split(",") 72 | 73 | # Recover list 74 | if ( "activity_category" in df_osm_data.columns): 75 | df_osm_data[ "activity_category" ] = df_osm_data.activity_category.apply(lambda x: list_str_from_string(x) if pd.notnull(x) else np.nan ) 76 | if ( "containing_parts" in df_osm_data.columns): 77 | df_osm_data[ "containing_parts" ] = df_osm_data.containing_parts.apply( lambda x: list_int_from_string(x) if pd.notnull(x) else np.nan ) 78 | if ( "containing_poi" in df_osm_data.columns): 79 | df_osm_data[ "containing_poi" ] = df_osm_data.containing_poi.apply( lambda x: list_int_from_string(x) if pd.notnull(x) else np.nan ) 80 | 81 | # To UTM coordinates 82 | return ox.project_gdf( df_osm_data ) 83 | 84 | def store_geodataframe(df_osm_data, geo_filename): 85 | """ 86 | Store input GeoDataFrame 87 | 88 | Parameters 89 | ---------- 90 | df_osm_data : geopandas.GeoDataFrame 91 | input OSM data frame 92 | geo_filename : string 93 | filename for GeoDataFrame storage 94 | 95 | Returns 96 | ---------- 97 | 98 | """ 99 | # To EPSG 4326 (GeoJSON does not store projection information) 100 | df_osm_data = ox.project_gdf(df_osm_data, to_latlong=True) 101 | 102 | # Lists to string (needed to save GeoJSON files) 103 | if ( "activity_category" in df_osm_data.columns): 104 | df_osm_data.activity_category = df_osm_data.activity_category.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan ) 105 | if ( "containing_parts" in df_osm_data.columns): 106 | df_osm_data.containing_parts = df_osm_data.containing_parts.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan ) 107 | if ( "containing_poi" in df_osm_data.columns): 108 | df_osm_data.containing_poi = df_osm_data.containing_poi.apply( lambda x: ','.join(str(e) for e in x) if isinstance(x,list) else np.nan ) 109 | 110 | # Save to file 111 | df_osm_data.to_file(geo_filename, driver=geo_driver) 112 | 113 | ################################################### 114 | ### GeoDataFrame processing utils 115 | ################################################### 116 | 117 | def sanity_check_height_tags(df_osm): 118 | """ 119 | Compute a sanity check for all height tags 120 | If incorrectly tagged, try to replace with the correct tag 121 | Any meter or level related string are replaced, and heights using the imperial units are converted to the metric system 122 | 123 | Parameters 124 | ---------- 125 | df_osm : geopandas.GeoDataFrame 126 | input OSM data frame 127 | 128 | Returns 129 | ---------- 130 | 131 | """ 132 | def sanity_check(value): 133 | ### Sanity check for height tags (sometimes wrongly-tagged) 134 | if not( (value is np.nan) or (value is None) ): # Non-null value 135 | try: # Can be read as float? 
136 | return float(value)
137 | except:
138 | try: # Try removing incorrectly tagged information: meters/levels
139 | return float( value.replace('meters','').replace('meter','').replace('m','').replace('levels','').replace('level','').replace('l','') )
140 | except:
141 | try: # Feet and inch values? e.g.: 4'7''
142 | split_value = value.split("'")
143 | feet, inches = split_value[0], split_value[1]
144 | if (inches == ''): # Non-existent inches
145 | inches = '0'
146 | tot_inches = float(feet)*12 + float(inches)
147 | # Return meters equivalent
148 | return tot_inches * 0.0254
149 | except: # None. Incorrect tag
150 | return None
151 | return value
152 | 
153 | # Available height tags
154 | available_height_tags = [ col for col in height_tags if col in df_osm.columns ]
155 | # Apply-map sanity check
156 | df_osm[ available_height_tags ] = df_osm[ available_height_tags ].applymap(sanity_check)
157 | 
158 | def associate_structures(df_osm_encompassing_structures, df_osm_structures, operation='contains', column='containing_'):
159 | """
160 | Associate input structure geometries to their encompassing structures
161 | Structures are associated using the operation 'contains' or 'intersects'
162 | A new column is added to the encompassing data frame, incorporating the indices of the contained structures
163 | 
164 | Parameters
165 | ----------
166 | df_osm_encompassing_structures : geopandas.GeoDataFrame
167 | encompassing data frame
168 | df_osm_structures : geopandas.GeoDataFrame
169 | structures data frame
170 | operation : string
171 | spatial join operation to associate structures
172 | column : string
173 | name of the column to add in the encompassing data frame
174 | 
175 | Returns
176 | ----------
177 | 
178 | """
179 | # Find, for each geometry, all contained structures
180 | sjoin = gpd.sjoin(df_osm_encompassing_structures[['geometry']], df_osm_structures[['geometry']], op=operation, rsuffix='cont')
181 | # Group by: polygon_index -> list of contained structures' indices
182 | group_indices = sjoin.groupby( sjoin.index, as_index=True )['index_cont'].apply(list)
183 | # Create new column
184 | df_osm_encompassing_structures.loc[ group_indices.index, column ] = group_indices.values
185 | # Reset indices
186 | df_osm_encompassing_structures.index.rename('',inplace=True)
187 | df_osm_structures.index.rename('',inplace=True)
--------------------------------------------------------------------------------
/urbansprawl/population/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/population/__init__.py
--------------------------------------------------------------------------------
/urbansprawl/population/core.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | from .data_extract import get_extract_population_data
7 | from .downscaling import proportional_population_downscaling
8 | from .urban_features import compute_full_urban_features, get_training_testing_data, get_Y_X_features_population_data
9 | from .utils import get_aggregated_squares, population_downscaling_validation
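A minimal sketch of how the functions re-exported above chain together, assuming `df_osm_built` comes from the OSM processing step (with computed `landuses_m2`) and that the INSEE file paths, which are hypothetical here, point to the downloaded 200 m gridded population data:

```python
# Hypothetical end-to-end population downscaling sketch.
# Assumptions: df_osm_built was produced by the OSM step; the INSEE
# shapefile/dbf paths below are placeholders for locally downloaded data.
from urbansprawl.population.core import (get_extract_population_data,
                                         proportional_population_downscaling,
                                         population_downscaling_validation)

df_insee = get_extract_population_data(
    city_ref="Lyon", data_source="insee",
    pop_shapefile="data/insee/carreaux_200m.shp",   # placeholder path
    pop_data_file="data/insee/carreaux_200m.dbf",   # placeholder path
    polygons_gdf=df_osm_built)  # buildings' convex hull defines the extract

# Distribute each grid-cell's population over its residential buildings
proportional_population_downscaling(df_osm_built, df_insee)
# Compare the estimated vs. census population per grid-cell
population_downscaling_validation(df_osm_built, df_insee)
```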
--------------------------------------------------------------------------------
/urbansprawl/population/data_extract.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | from shapely.geometry import Polygon, GeometryCollection
7 | import geopandas as gpd
8 | import pandas as pd
9 | import os
10 | import numpy as np
11 | import osmnx as ox
12 | 
13 | from osmnx import log
14 | 
15 | from .utils import get_population_extract_filename
16 | 
17 | DATA_SOURCES = ['insee','gpw']
18 | 
19 | ##############################
20 | ### I/O for population data
21 | ##############################
22 | 
23 | def get_df_extract(df_data, poly_gdf, operation = "within"):
24 | """
25 | Indexes input geo-data frame within an input region of interest
26 | If the region of interest is given as a polygon, its bounding box is indexed
27 | 
28 | Parameters
29 | ----------
30 | df_data : geopandas.GeoDataFrame
31 | input data frame to index
32 | poly_gdf : geopandas.GeoDataFrame
33 | geodataframe containing the region of interest in the form of a polygon
34 | operation : string
35 | the desired spatial join operation: 'within' or 'intersects'
36 | 
37 | Returns
38 | ----------
39 | geopandas.GeoDataFrame
40 | returns the population data frame indexed within the region of interest
41 | """
42 | # Project to same system coordinates
43 | poly_gdf = ox.project_gdf(poly_gdf, to_crs=df_data.crs)
44 | # Spatial join
45 | df_extract = gpd.sjoin(df_data, poly_gdf, op=operation)
46 | # Keep original columns
47 | df_extract = df_extract[ df_data.columns ]
48 | return df_extract
49 | 
50 | def get_population_df(pop_shapefile, pop_data_file, data_source, to_crs, poly_gdf):
51 | """
52 | Read the population shapefile from input filename/s
53 | Index the data within the bounding box
54 | Project to desired CRS
55 | 
56 | Parameters
57 | ----------
58 | pop_shapefile : string
59 | population count shapefile
60 | pop_data_file : string
61 | population data additional file (required for INSEE format)
62 | data_source : string
63 | desired population data source
64 | to_crs : dict
65 | desired coordinate reference system
66 | poly_gdf : geopandas.GeoDataFrame
67 | geodataframe containing the region of interest in the form of a polygon
68 | 
69 | Returns
70 | ----------
71 | geopandas.GeoDataFrame
72 | returns the indexed and projected population data frame
73 | """
74 | #######################################
75 | ### Load GPW/INSEE population data
76 | #######################################
77 | # Read population data
78 | df_pop = gpd.read_file(pop_shapefile)
79 | 
80 | ### Extract region of interest (EPSG 4326)
81 | # Filter geometries not contained in bounding box
82 | df_pop = get_df_extract(df_pop, poly_gdf)
83 | 
84 | if (data_source == 'insee'):
85 | #######################################
86 | ### Additional step for INSEE data
87 | #######################################
88 | # Read dbf files
89 | data_pop = gpd.read_file(pop_data_file)
90 | # Get columns of interest
91 | data_pop = data_pop[["idINSPIRE","ind_c"]]
92 | df_pop = df_pop[["geometry","idINSPIRE"]]
93 | # Inner join to obtain population count data associated to each geometry
94 | df_pop = pd.merge(df_pop, data_pop, how='inner', on='idINSPIRE')
95 | 
96 | # Rename population count column
97 | df_pop.rename(columns={"ind_c":"pop_count", "DN":"pop_count"}, inplace=True)
98 | 
99 | return ox.project_gdf(df_pop, to_crs=to_crs)
100 | 
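For the INSEE branch above, the merge links each grid geometry to its population count through the shared idINSPIRE identifier. A toy, self-contained illustration with fabricated records (not real INSEE data):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Toy stand-ins for the two INSEE files (identifier is fabricated)
df_pop = gpd.GeoDataFrame({
    "idINSPIRE": ["CRS3035RES200mN2029800E4254200"],
    "geometry": [Polygon([(0, 0), (200, 0), (200, 200), (0, 200)])]})
data_pop = pd.DataFrame({
    "idINSPIRE": ["CRS3035RES200mN2029800E4254200"],
    "ind_c": [42.0]})  # population count of the 200m x 200m cell

# Inner join on the shared identifier, then rename the count column
merged = pd.merge(df_pop, data_pop, how="inner", on="idINSPIRE")
merged = merged.rename(columns={"ind_c": "pop_count"})
print(merged[["idINSPIRE", "pop_count"]])
```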
101 | def get_extract_population_data(city_ref, data_source, pop_shapefile=None, pop_data_file=None, to_crs={'init': 'epsg:4326'}, polygons_gdf=None):
102 | """
103 | Get the population data extract from the desired data source for the input city, using the convex hull of the input buildings geodataframe as the region of interest
104 | The population data frame is projected to the desired coordinate reference system
105 | Stores the extracted shapefile
106 | If an extract was previously stored for the input 'data source' and 'city reference', it is loaded and returned instead
107 | 
108 | Parameters
109 | ----------
110 | city_ref : string
111 | name of input city
112 | data_source : string
113 | desired population data source
114 | pop_shapefile : string
115 | path of population count shapefile
116 | pop_data_file : string
117 | path of population data additional file (required for INSEE format)
118 | to_crs : dict
119 | desired coordinate reference system
120 | polygons_gdf : geopandas.GeoDataFrame
121 | polygons (e.g. buildings) for input region of interest which will determine the shape to extract
122 | 
123 | Returns
124 | ----------
125 | geopandas.GeoDataFrame
126 | returns the extracted population data
127 | """
128 | # Input data source type given?
129 | assert( data_source in DATA_SOURCES )
130 | 
131 | # Population extract exists?
132 | if ( os.path.exists( get_population_extract_filename(city_ref, data_source) ) ):
133 | log("Population extract exists for input city: "+city_ref)
134 | return gpd.read_file( get_population_extract_filename(city_ref, data_source) )
135 | 
136 | # Input shape given?
137 | assert( not ( np.all(polygons_gdf is None ) ) )
138 | # Input population shapefile given?
139 | assert( not pop_shapefile is None )
140 | # All input files given?
141 | assert( not ( (data_source == 'insee') and (pop_data_file is None) ) )
142 | 
143 | # Get buildings convex hull
144 | polygon = GeometryCollection( polygons_gdf.geometry.values.tolist() ).convex_hull
145 | # Convert to geo-dataframe with defined CRS
146 | poly_gdf = gpd.GeoDataFrame([polygon], columns=["geometry"], crs=polygons_gdf.crs)
147 | 
148 | # Compute extract
149 | df_pop = get_population_df(pop_shapefile, pop_data_file, data_source, to_crs, poly_gdf)
150 | 
151 | # Save to shapefile
152 | df_pop.to_file( get_population_extract_filename(city_ref, data_source), driver='ESRI Shapefile' )
153 | return df_pop
154 | 
155 | 
--------------------------------------------------------------------------------
/urbansprawl/population/downscaling.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | import geopandas as gpd
7 | import osmnx as ox
8 | 
9 | def proportional_population_downscaling(df_osm_built, df_insee):
10 | """
11 | Performs a proportional population downscaling considering the surface dedicated to residential land use
12 | Associates the estimated population to each building in column 'population'
13 | 
14 | Parameters
15 | ----------
16 | df_osm_built : geopandas.GeoDataFrame
17 | input buildings with computed residential surface
18 | df_insee : geopandas.GeoDataFrame
19 | INSEE population data
20 | 
21 | Returns
22 | ----------
23 | 
24 | """
25 | if (df_insee.crs != df_osm_built.crs): # If projections do not match
26 | # First project to Lat-Long coordinates, then project to UTM coordinates
27 | df_insee = ox.project_gdf( ox.project_gdf(df_insee, to_latlong=True) )
28 | 
29 | # OSM Building geometries are already projected
30 | assert(df_insee.crs == df_osm_built.crs)
31 | 
32 | df_osm_built['geom'] = df_osm_built.geometry
33 | df_osm_built_residential = df_osm_built[ df_osm_built.apply(lambda x: x.landuses_m2['residential'] > 0, axis = 1) ]
34 | 
35 | # Loading/saving using geopandas loses the 'ellps' key
36 | df_insee.crs = df_osm_built_residential.crs
37 | 
38 | # Intersecting gridded population - buildings
39 | sjoin = gpd.sjoin( df_insee, df_osm_built_residential, op='intersects')
40 | # Calculate area within square (percentage of the building within the square)
41 | sjoin['residential_m2_within'] = sjoin.apply(lambda x: x.landuses_m2['residential'] * (x.geom.intersection(x.geometry).area / x.geom.area), axis=1 )
42 | # Initialize
43 | df_insee['residential_m2_within'] = 0
44 | # Sum residential area within square
45 | sum_m2_per_square = sjoin.groupby(sjoin.index)['residential_m2_within'].sum()
46 | # Assign total residential area within each square
47 | df_insee.loc[ sum_m2_per_square.index, "residential_m2_within" ] = sum_m2_per_square.values
48 | # Get number of M^2 / person
49 | df_insee[ "m2_per_person" ] = df_insee.apply(lambda x: x.residential_m2_within / x.pop_count, axis=1)
50 | 
51 | def population_building(x, df_insee):
52 | # Sum of: For each square: M2 of building within square / M2 per person
53 | return ( x.get('m2',[]) / df_insee.loc[ x.get('idx',[]) ].m2_per_person ).sum()
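The allocation implemented here is purely proportional: each census square derives an m² per person ratio from the residential floor area it intersects, and each building then receives its within-square m² divided by that ratio, summed over every square it touches. A self-contained numeric sketch with fabricated numbers:

```python
# Fabricated example: one 200m x 200m census square with 100 inhabitants
# intersecting two residential buildings.
pop_count = 100.0
residential_m2_within = {"building_a": 3000.0, "building_b": 1000.0}

# m2 per person within the square: 4000 / 100 = 40
m2_per_person = sum(residential_m2_within.values()) / pop_count

# Each building's estimate: its m2 inside the square / m2 per person
estimated = {b: m2 / m2_per_person for b, m2 in residential_m2_within.items()}
print(estimated)  # {'building_a': 75.0, 'building_b': 25.0} -> sums to 100
```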
54 | # Index: buildings. Values: idx: indices of the intersected gridded-population squares, m2: residential M^2 within each square
55 | buildings_square_m2_association = sjoin.groupby('index_right').apply(lambda x: {'idx':list(x.index), 'm2':list(x.residential_m2_within)} )
56 | # Associate
57 | df_osm_built.loc[ buildings_square_m2_association.index, "population" ] = buildings_square_m2_association.apply(lambda x: population_building(x,df_insee) )
58 | # Drop unnecessary column
59 | df_osm_built.drop('geom', axis=1, inplace=True)
60 | 
61 | 
--------------------------------------------------------------------------------
/urbansprawl/population/urban_features.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | import geopandas as gpd
7 | import pandas as pd
8 | import numpy as np
9 | import osmnx as ox
10 | import os.path
11 | import time
12 | 
13 | from osmnx import log
14 | 
15 | from .utils import get_aggregated_squares, get_population_df_filled_empty_squares
16 | # Filenames
17 | from .utils import get_population_urban_features_filename, get_population_training_validating_filename
18 | 
19 | from .data_extract import get_extract_population_data
20 | 
21 | # Sprawl indices
22 | from ..sprawl.dispersion import compute_grid_dispersion
23 | from ..sprawl.landusemix import compute_grid_landusemix
24 | 
25 | from shapely.geometry import Polygon
26 | 
27 | def compute_full_urban_features(city_ref, df_osm_built=None, df_osm_pois=None, df_insee=None, data_source=None, kwargs={"max_dispersion":15}):
28 | """
29 | Computes a set of urban features for each square where population count data exists
30 | 
31 | Parameters
32 | ----------
33 | city_ref : string
34 | city reference name
35 | df_osm_built : geopandas.GeoDataFrame
36 | input buildings
37 | df_osm_pois : geopandas.GeoDataFrame
38 | input points of interest
39 | df_insee : geopandas.GeoDataFrame
40 | grid-cells with population count where urban features will be calculated
41 | data_source : str
42 | defines the population data source, used to retrieve previously stored results
43 | kwargs : dict
44 | keyword arguments to guide the process
45 | 
46 | Returns
47 | ----------
48 | geopandas.GeoDataFrame
49 | geometry with updated urban features
50 | """
51 | 
52 | # Population extract exists?
53 | if ( os.path.exists( get_population_urban_features_filename(city_ref, data_source) ) ): 54 | log("Urban features from population gridded data exist for input city: "+city_ref) 55 | # Read from GeoJSON (default projection coordinates) 56 | df_insee_urban_features_4326 = gpd.read_file( get_population_urban_features_filename(city_ref, data_source) ) 57 | # Project to UTM coordinates 58 | return ox.project_gdf(df_insee_urban_features_4326) 59 | 60 | # Required arguments 61 | assert( not df_osm_built is None ) 62 | assert( not df_osm_pois is None ) 63 | assert( not df_insee is None ) 64 | 65 | # Get population count data with filled empty squares (null population) 66 | df_insee_urban_features = get_population_df_filled_empty_squares(df_insee) 67 | # Set crs 68 | crs_proj = df_insee.crs 69 | df_insee_urban_features.crs = crs_proj 70 | 71 | ################## 72 | ### Urban features 73 | ################## 74 | # Compute the urban features for each square 75 | log("Calculating urban features") 76 | start = time.time() 77 | 78 | # Conserve building geometries 79 | df_osm_built['geom_building'] = df_osm_built['geometry'] 80 | 81 | # Spatial join: grid-cell i - building j for all intersections 82 | df_insee_urban_features = gpd.sjoin( df_insee_urban_features, df_osm_built, op='intersects', how='left') 83 | 84 | # When a grid-cell i does not intersect any building: NaN values 85 | null_idx = df_insee_urban_features.loc[ df_insee_urban_features['geom_building'].isnull() ].index 86 | # Replace NaN for urban features calculation 87 | min_polygon = Polygon([(0,0), (0,np.finfo(float).eps), (np.finfo(float).eps,np.finfo(float).eps)]) 88 | df_insee_urban_features.loc[null_idx, 'geom_building'] = df_insee_urban_features.loc[null_idx, 'geom_building'].apply(lambda x: min_polygon) 89 | df_insee_urban_features.loc[null_idx, 'landuses_m2' ] = len( null_idx ) * [{'residential':0, 'activity':0}] 90 | df_insee_urban_features.loc[null_idx, 'building_levels'] = len(null_idx) * [0] 91 | 92 | ### Pre-calculation of urban features 93 | 94 | # Apply percentage of building presence within square: 1 if fully contained, 0.5 if half the building contained, ... 
95 | df_insee_urban_features['building_ratio'] = df_insee_urban_features.apply( lambda x: x.geom_building.intersection(x.geometry).area / x.geom_building.area, axis=1 ) 96 | 97 | df_insee_urban_features['m2_total_residential'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.landuses_m2['residential'], axis=1 ) 98 | df_insee_urban_features['m2_total_activity'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.landuses_m2['activity'], axis=1 ) 99 | 100 | df_insee_urban_features['m2_footprint_residential'] = 0 101 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']), 'm2_footprint_residential' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 ) 102 | df_insee_urban_features['m2_footprint_activity'] = 0 103 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']), 'm2_footprint_activity' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 ) 104 | df_insee_urban_features['m2_footprint_mixed'] = 0 105 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']), 'm2_footprint_mixed' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']) ].apply(lambda x: x.building_ratio * x.geom_building.area, axis=1 ) 106 | 107 | df_insee_urban_features['num_built_activity'] = 0 108 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']), 'num_built_activity' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['activity']) ].building_ratio 109 | df_insee_urban_features['num_built_residential'] = 0 110 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']), 'num_built_residential' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['residential']) ].building_ratio 111 | df_insee_urban_features['num_built_mixed'] = 0 112 | df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']), 'num_built_mixed' ] = df_insee_urban_features.loc[ df_insee_urban_features.classification.isin(['mixed']) ].building_ratio 113 | 114 | df_insee_urban_features['num_levels'] = df_insee_urban_features.apply( lambda x: x.building_ratio * x.building_levels, axis=1 ) 115 | df_insee_urban_features['num_buildings'] = df_insee_urban_features['building_ratio'] 116 | 117 | df_insee_urban_features['built_up_m2'] = df_insee_urban_features.apply( lambda x: x.geom_building.area * x.building_ratio , axis=1 ) 118 | 119 | 120 | ### Urban features aggregation functions 121 | urban_features_aggregation = {} 122 | urban_features_aggregation['idINSPIRE'] = lambda x: x.head(1) 123 | urban_features_aggregation['pop_count'] = lambda x: x.head(1) 124 | urban_features_aggregation['geometry'] = lambda x: x.head(1) 125 | 126 | urban_features_aggregation['m2_total_residential'] = 'sum' 127 | urban_features_aggregation['m2_total_activity'] = 'sum' 128 | 129 | urban_features_aggregation['m2_footprint_residential'] = 'sum' 130 | urban_features_aggregation['m2_footprint_activity'] = 'sum' 131 | urban_features_aggregation['m2_footprint_mixed'] = 'sum' 132 | 133 | urban_features_aggregation['num_built_activity'] = 'sum' 134 | urban_features_aggregation['num_built_residential'] = 'sum' 135 | urban_features_aggregation['num_built_mixed'] = 'sum' 136 | 137 | 
urban_features_aggregation['num_levels'] = 'sum' 138 | urban_features_aggregation['num_buildings'] = 'sum' 139 | 140 | urban_features_aggregation['built_up_m2'] = 'sum' 141 | 142 | # Apply aggregate functions 143 | df_insee_urban_features = df_insee_urban_features.groupby( df_insee_urban_features.index ).agg( urban_features_aggregation ) 144 | 145 | # Calculate built up relation (relative to the area of the grid-cell geometry) 146 | df_insee_urban_features['built_up_relation'] = df_insee_urban_features.apply(lambda x: x.built_up_m2 / x.geometry.area, axis=1) 147 | df_insee_urban_features.drop('built_up_m2', axis=1, inplace=True) 148 | 149 | # To geopandas.GeoDataFrame and set crs 150 | df_insee_urban_features = gpd.GeoDataFrame(df_insee_urban_features) 151 | df_insee_urban_features.crs = crs_proj 152 | 153 | # POIs 154 | df_osm_pois_selection = df_osm_pois[ df_osm_pois.classification.isin(["activity","mixed"]) ] 155 | gpd_intersection_pois = gpd.sjoin( df_insee_urban_features, df_osm_pois_selection, op='intersects', how='left') 156 | # Number of activity/mixed POIs 157 | df_insee_urban_features['num_activity_pois'] = gpd_intersection_pois.groupby( gpd_intersection_pois.index ).agg({'osm_id':'count'}) 158 | 159 | 160 | ################## 161 | ### Sprawling indices 162 | ################## 163 | df_insee_urban_features['geometry_squares'] = df_insee_urban_features.geometry 164 | df_insee_urban_features['geometry'] = df_insee_urban_features.geometry.centroid 165 | 166 | ''' 167 | compute_grid_accessibility(df_insee_urban_features, graph, df_osm_built, df_osm_pois) 168 | ''' 169 | 170 | # Compute land uses mix + densities estimation 171 | compute_grid_landusemix(df_insee_urban_features, df_osm_built, df_osm_pois) 172 | # Dispersion indices 173 | compute_grid_dispersion(df_insee_urban_features, df_osm_built) 174 | 175 | if (kwargs.get("max_dispersion")): # Set max bounds for dispersion values 176 | df_insee_urban_features.loc[ df_insee_urban_features.dispersion > kwargs.get("max_dispersion"), "dispersion" ] = kwargs.get("max_dispersion") 177 | 178 | # Set back original geometries 179 | df_insee_urban_features['geometry'] = df_insee_urban_features.geometry_squares 180 | df_insee_urban_features.drop('geometry_squares', axis=1, inplace=True) 181 | 182 | # Fill NaN sprawl indices with 0 183 | df_insee_urban_features.fillna(0, inplace=True) 184 | 185 | # Save to GeoJSON file (no projection conserved, then use EPSG 4326) 186 | ox.project_gdf(df_insee_urban_features, to_latlong=True).to_file( get_population_urban_features_filename(city_ref, data_source), driver='GeoJSON' ) 187 | 188 | elapsed_time = int(time.time() - start) 189 | log("Done: Urban features calculation. Elapsed time (H:M:S): " + '{:02d}:{:02d}:{:02d}'.format(elapsed_time // 3600, (elapsed_time % 3600 // 60), elapsed_time % 60) ) 190 | 191 | return df_insee_urban_features 192 | 193 | def get_training_testing_data(city_ref, df_insee_urban_features=None): 194 | """ 195 | Returns the Y and X arrays for training/testing population downscaling estimates. 
196 | 197 | Y contains vectors with the correspondent population densities 198 | X contains vectors with normalized urban features 199 | X_columns columns referring to X values 200 | Numpy arrays are stored locally 201 | 202 | Parameters 203 | ---------- 204 | city_ref : string 205 | city reference name 206 | df_insee_urban_features : geopandas.GeoDataFrame 207 | grid-cells with population count data and calculated urban features 208 | 209 | Returns 210 | ---------- 211 | np.array, np.array, np.array 212 | Y vector, X vector, X column names vector 213 | """ 214 | # Population extract exists? 215 | if ( os.path.exists( get_population_training_validating_filename(city_ref) ) ): 216 | log("Urban population training+validation data/features exist for input city: " + city_ref) 217 | # Read from Numpy.Arrays 218 | data = np.load( get_population_training_validating_filename(city_ref) ) 219 | # Project to UTM coordinates 220 | return data["Y"], data["X"], data["X_columns"] 221 | 222 | log("Calculating urban training+validation data/features for city: " + city_ref) 223 | start = time.time() 224 | 225 | # Select columns to normalize 226 | columns_to_normalise = [col for col in df_insee_urban_features.columns if "num_" in col or "m2_" in col or "dispersion" in col or "accessibility" in col] 227 | # Normalize selected columns 228 | df_insee_urban_features.loc[:,columns_to_normalise] = df_insee_urban_features.loc[:,columns_to_normalise].apply(lambda x: x / x.max() , axis=0) 229 | 230 | # By default, idINSPIRE for created squares (0 population count) is 0: Change for 'CRS' string: Coherent with squares aggregation procedure (string matching) 231 | df_insee_urban_features.loc[ df_insee_urban_features.idINSPIRE == 0, "idINSPIRE" ] = "CRS" 232 | 233 | # Aggregate 5x5 squares: Get all possible aggregations (step of 200 meters = length of individual square) 234 | aggregated_df_insee_urban_features = get_aggregated_squares(ox.project_gdf(df_insee_urban_features, to_crs="+init=epsg:3035"), step=200., conserve_squares_info=True) 235 | 236 | # X values: Vector with normalized urban features 237 | X_values = [] 238 | # Y values: Vector with normalized population densities. 
m=25 239 | Y_values = [] 240 | 241 | # For each combination, create a X and Y vector 242 | for idx in aggregated_df_insee_urban_features.indices: 243 | # Extract the urban features in the given 'indices' order (Fill to 0 for non-existent squares) 244 | square_info = df_insee_urban_features.reindex( idx ).fillna(0) 245 | # Y input (Ground truth): Population densities 246 | population_densities = (square_info["pop_count"] / square_info["pop_count"].sum() ).values 247 | 248 | if (all (pd.isna(population_densities)) ): # If sum of population count is 0, remove (NaN values) 249 | continue 250 | 251 | # X input: Normalized urban features 252 | urban_features = square_info[ [col for col in square_info.columns if col not in ['idINSPIRE','geometry','pop_count']] ].values 253 | 254 | # Append X, Y 255 | X_values.append(urban_features) 256 | Y_values.append(population_densities) 257 | 258 | # Get the columns order referenced in each X vector 259 | X_values_columns = df_insee_urban_features[ [col for col in square_info.columns if col not in ['idINSPIRE','geometry','pop_count']] ].columns 260 | X_values_columns = np.array(X_values_columns) 261 | 262 | # To Numpy Array 263 | X_values = np.array(X_values) 264 | Y_values = np.array(Y_values) 265 | 266 | # Save to file 267 | np.savez( get_population_training_validating_filename(city_ref), X=X_values, Y=Y_values, X_columns=X_values_columns) 268 | 269 | log("Done: urban training+validation data/features. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) ) 270 | 271 | return Y_values, X_values, X_values_columns 272 | 273 | def get_Y_X_features_population_data(cities_selection=None, cities_skip=None): 274 | """ 275 | Returns the Y and X arrays for training/testing population downscaling estimates. 
276 | It gathers either a selection of cities, or all stored cities except a given list to skip
277 | 
278 | Y contains vectors with the correspondent population densities
279 | X contains vectors with normalized urban features
280 | X_columns columns referring to X values
281 | Numpy arrays must have been previously stored
282 | 
283 | Parameters
284 | ----------
285 | cities_selection : list
286 | list of cities to select
287 | cities_skip : list
288 | list of cities to skip (retrieve the rest)
289 | 
290 | Returns
291 | ----------
292 | np.array, np.array, np.array
293 | Y vector, X vector, X column names vector
294 | """
295 | arr_X, arr_Y = [], []
296 | 
297 | # Get the complete training-testing dataset
298 | for Y_X_data_city in os.listdir("data/training"):
299 | # Only if it contains a valid extension
300 | if ('.npz' not in Y_X_data_city): continue
301 | 
302 | # Get city's name
303 | city_ref = Y_X_data_city.replace('_X_Y.npz','')
304 | 
305 | # Only retrieve data from cities_selection (if ever given)
306 | if ( (cities_selection is not None) and (city_ref not in cities_selection) ):
307 | log('Skipping city: ' + str(city_ref) )
308 | continue
309 | 
310 | # Skip cities from cities_skip (if ever given)
311 | if ( (cities_skip is not None) and (city_ref in cities_skip) ):
312 | log('Skipping city: ' + str(city_ref) )
313 | continue
314 | 
315 | log('Retrieving data for city: ' + str(city_ref) )
316 | 
317 | # Get stored data
318 | city_Y, city_X, city_X_cols = get_training_testing_data(city_ref)
319 | # Append values
320 | arr_Y.append(city_Y)
321 | arr_X.append(city_X)
322 | 
323 | # Assumption: All generated testing-training data contain the same X columns
324 | return np.concatenate(arr_Y), np.concatenate(arr_X), city_X_cols
325 | 
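The Y/X arrays produced by this module feed the population-downscaling estimators of notebooks 7-8. A minimal sketch of consuming them, assuming training data was already generated under data/training/; the scikit-learn regressor is a stand-in chosen for brevity, not the repository's neural-network approach:

```python
# Hedged sketch: assumes <city>_X_Y.npz files exist under data/training/.
from sklearn.ensemble import RandomForestRegressor

from urbansprawl.population.core import get_Y_X_features_population_data

Y, X, X_cols = get_Y_X_features_population_data(cities_selection=["Lyon"])

# Each sample is one 5x5 aggregate (25 squares x n features):
# flatten the per-square features into one vector per sample.
n_samples = X.shape[0]
model = RandomForestRegressor(n_estimators=100)
model.fit(X.reshape(n_samples, -1), Y.reshape(n_samples, -1))
```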
--------------------------------------------------------------------------------
/urbansprawl/population/utils.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | import geopandas as gpd
7 | import pandas as pd
8 | import osmnx as ox
9 | import numpy as np
10 | 
11 | from shapely.geometry import Polygon
12 | from shapely.geometry import Point
13 | 
14 | from ..settings import storage_folder
15 | 
16 | # Format for load/save the geo-data ['geojson','shp']
17 | geo_format = 'geojson' # 'shp'
18 | 
19 | 
20 | def get_population_extract_filename(city_ref_file, data_source):
21 | """
22 | Get the population extract filename for the input city
23 | 
24 | Parameters
25 | ----------
26 | city_ref_file : string
27 | name of input city
28 | data_source : string
29 | desired population data source
30 | 
31 | Returns
32 | ----------
33 | string
34 | returns the population extract filename
35 | 
36 | """
37 | # Folder exists?
38 | import os
39 | if not(os.path.isdir(storage_folder + "/" + data_source)):
40 | os.makedirs(storage_folder + "/" + data_source)
41 | return storage_folder + "/" + data_source + "/" + city_ref_file + "_population.shp"
42 | 
43 | def get_population_urban_features_filename(city_ref_file, data_source):
44 | """
45 | Get the population urban features filename for the input city
46 | Force GeoJSON format: Shapefiles truncate column names
47 | 
48 | Parameters
49 | ----------
50 | city_ref_file : string
51 | name of input city
52 | data_source : string
53 | desired population data source
54 | 
55 | Returns
56 | ----------
57 | string
58 | returns the urban features filename
59 | 
60 | """
61 | # Folder exists?
62 | import os
63 | if not(os.path.isdir(storage_folder + "/" + data_source)):
64 | os.makedirs(storage_folder + "/" + data_source)
65 | return storage_folder + "/" + data_source + "/" + city_ref_file + "_urban_features." + geo_format
66 | 
67 | def get_population_training_validating_filename(city_ref_file, data_source="training"):
68 | """
69 | Get the filename of the normalized urban features and population densities for the input city
70 | Stored as Numpy arrays
71 | 
72 | Parameters
73 | ----------
74 | city_ref_file : string
75 | name of input city
76 | 
77 | Returns
78 | ----------
79 | string
80 | returns the numpy load/store filename
81 | 
82 | """
83 | # Folder exists?
84 | import os
85 | if not(os.path.isdir(storage_folder + "/" + data_source)):
86 | os.makedirs(storage_folder + "/" + data_source)
87 | return storage_folder + "/" + data_source + "/" + city_ref_file + "_X_Y.npz"
88 | 
89 | #################################################################
90 | 
91 | def get_aggregated_squares(df_insee, step=1000., conserve_squares_info=False):
92 | """
93 | Aggregates input population data into squares of 5x5 cells
94 | Assumption: input squares are 200m by 200m
95 | INSEE data contains column 'idINSPIRE' which denotes in EPSG:3035 the northing/easting coordinates of the south-west box endpoint
96 | If conserve_squares_info is True, the information relative to each original square is kept
97 | Output: Aggregated squares of 1km by 1km
98 | 
99 | Parameters
100 | ----------
101 | df_insee : geopandas.GeoDataFrame
102 | INSEE population data
103 | step : float
104 | sampling step (default of 1 kilometer)
105 | conserve_squares_info : bool
106 | determines if each aggregated square conserves the information of each smaller composing square
107 | 
108 | Returns
109 | ----------
110 | geopandas.GeoDataFrame
111 | returns the aggregated population data
112 | """
113 | def get_northing_easting(x): # Extract northing and easting coordinates
114 | try:
115 | north, east = x.idINSPIRE.split("N")[1].split("E")
116 | x["north"] = int(north)
117 | x["east"] = int(east)
118 | except:
119 | x["north"], x["east"] = np.nan, np.nan
120 | return x
121 | 
122 | def index_square(x, df_insee, offset_index):
123 | squares = df_insee.cx[ x.geometry.x - offset_index: x.geometry.x + offset_index , x.geometry.y - offset_index: x.geometry.y + offset_index]
124 | aggregated_polygon = Polygon()
125 | for geom in squares.geometry:
126 | aggregated_polygon = aggregated_polygon.union(geom)
127 | x["square_geometry"] = aggregated_polygon
128 | x["pop_count"] = squares.pop_count.sum()
129 | return x
130 | 
131 | def index_square_conservative(x, df_insee, offset_index):
132 | squares = df_insee.cx[ x.geometry.x - offset_index: x.geometry.x + offset_index , x.geometry.y - offset_index: x.geometry.y + offset_index]
133 | 
aggregated_polygon = Polygon()
134 | for geom in squares.geometry:
135 | aggregated_polygon = aggregated_polygon.union(geom)
136 | x["square_geometry"] = aggregated_polygon
137 | x["pop_count"] = squares.pop_count.sum()
138 | 
139 | indices = []
140 | for y_diff in [-400, -200, 0, 200, 400]:
141 | for x_diff in [-400, -200, 0, 200, 400]: # First iterate over Easting
142 | # -100 on each coordinate: the Northing/Easting pair represents the south-west point of the box
143 | coords_match = "N" + str( int(x.NE.y) + y_diff - 100) + "E" + str( int(x.NE.x) + x_diff - 100)
144 | values = df_insee[ df_insee.idINSPIRE.str.contains(coords_match) ].index.values
145 | if (len(values) == 0):
146 | indices += [None]
147 | else: # Concatenate index values
148 | indices += list( values )
149 | 
150 | x["indices"] = indices
151 | return x
152 | 
153 | # Get northing and easting coordinates
154 | coordinates = df_insee.apply(lambda x: get_northing_easting(x), axis=1 )[["north","east"]]
155 | 
156 | if (conserve_squares_info): # +100 meters to obtain the centroid of each box
157 | coords_offset = 100.
158 | else: # +500 meters to obtain the centroid of the 5x5 squared-box
159 | coords_offset = 500.
160 | 
161 | # North, east coordinates denote the south-west box endpoint:
162 | north_min, north_max = coordinates.north.min() + coords_offset, coordinates.north.max() + coords_offset
163 | east_min, east_max = coordinates.east.min() + coords_offset, coordinates.east.max() + coords_offset
164 | 
165 | # Create mesh grid: one point for each aggregated square's centroid (an extent of step x step meters; 1km by 1km for the default step)
166 | xv, yv = np.meshgrid( np.arange(east_min, east_max, step), np.arange(north_min, north_max, step) )
167 | points = [ Point(x,y) for x,y in zip( xv.ravel(), yv.ravel() ) ]
168 | # Initialize GeoDataFrame
169 | df_squares = gpd.GeoDataFrame( points, columns=[ "geometry" ], crs="+init=epsg:3035" )
170 | 
171 | # Project
172 | df_squares = ox.project_gdf(df_squares, to_crs = df_insee.crs)
173 | 
174 | # Save Northing-Easting original coordinates for later reference
175 | df_squares["NE"] = points
176 | 
177 | if (conserve_squares_info):
178 | index_function = index_square_conservative
179 | else:
180 | index_function = index_square
181 | 
182 | # Index, for each square centroid, +- 400 meters to achieve squares of 5 by 5
183 | df_squares = df_squares.apply(lambda x: index_function( x, df_insee, offset_index=400 ), axis=1 )
184 | # Update geometry
185 | df_squares['geometry'] = df_squares.square_geometry
186 | 
187 | # Drop useless columns
188 | df_squares.drop( ['square_geometry','NE'], axis=1, inplace=True )
189 | # Drop empty squares (rows)
190 | df_squares.drop(df_squares[ df_squares.geometry.area == 0 ].index, axis=0, inplace=True)
191 | # Reset index
192 | df_squares.reset_index(drop=True, inplace=True)
193 | # Set CRS key-words
194 | df_squares.crs = df_insee.crs
195 | 
196 | return df_squares
197 | 
198 | 
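The bookkeeping above hinges on the INSPIRE identifier encoding the south-west corner of each 200 m box in EPSG:3035 coordinates. A small illustration of the parsing convention used throughout this module (the identifier below is fabricated):

```python
# Fabricated INSPIRE-style identifier; the module splits on "N" and "E".
idINSPIRE = "CRS3035RES200mN2029800E4254200"

north, east = idINSPIRE.split("N")[1].split("E")
north, east = int(north), int(east)   # south-west corner, EPSG:3035 meters

centroid = (east + 100, north + 100)  # +100 m reaches the 200m box centre
print(north, east, centroid)
```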
199 | def population_downscaling_validation(df_osm_built, df_insee):
200 | """
201 | Validates the population downscaling estimation by aggregating the estimated population of the buildings lying within each population square
202 | Allows comparing the real population count with the estimated population lying within each square
203 | Adds a new column 'pop_estimation' for each square in the population data frame
204 | 
205 | Parameters
206 | ----------
207 | df_osm_built : geopandas.GeoDataFrame
208 | input buildings with computed population count
209 | df_insee : geopandas.GeoDataFrame
210 | INSEE population data
211 | 
212 | Returns
213 | ----------
214 | 
215 | """
216 | df_osm_built['geom'] = df_osm_built.geometry
217 | df_osm_built_residential = df_osm_built[ df_osm_built.apply(lambda x: x.landuses_m2['residential'] > 0, axis = 1) ]
218 | df_insee.crs = df_osm_built_residential.crs
219 | 
220 | # Intersecting gridded population - buildings
221 | sjoin = gpd.sjoin( df_insee, df_osm_built_residential, op='intersects')
222 | # Calculate area within square (percentage of the building within the square)
223 | sjoin['pop_estimation'] = sjoin.apply(lambda x: x.population * (x.geom.intersection(x.geometry).area / x.geom.area), axis=1 )
224 | 
225 | # Initialize
226 | df_insee['pop_estimation'] = np.nan
227 | sum_pop_per_square = sjoin.groupby(sjoin.index)['pop_estimation'].sum()
228 | 
229 | df_insee.loc[ sum_pop_per_square.index, "pop_estimation" ] = sum_pop_per_square.values
230 | # Drop unnecessary column
231 | df_osm_built.drop('geom', axis=1, inplace=True)
232 | # Set NaN values to 0
233 | df_insee.loc[ df_insee.pop_estimation.isnull(), "pop_estimation" ] = 0
234 | 
235 | # Compute absolute and relative error
236 | df_insee["absolute_error"] = df_insee.apply(lambda x: abs(x.pop_count - x.pop_estimation), axis=1)
237 | df_insee["relative_error"] = df_insee.apply(lambda x: abs(x.pop_count - x.pop_estimation) / x.pop_count, axis=1)
238 | 
239 | 
240 | def get_population_df_filled_empty_squares(df_insee):
241 | """
242 | Add empty squares as 0-population box-squares
243 | 
244 | Parameters
245 | ----------
246 | df_insee : geopandas.GeoDataFrame
247 | INSEE population data
248 | 
249 | Returns
250 | ----------
251 | 
252 | """
253 | def get_northing_easting(x): # Extract northing and easting coordinates
254 | north, east = x.idINSPIRE.split("N")[1].split("E")
255 | x["north"] = int(north)
256 | x["east"] = int(east)
257 | return x
258 | 
259 | # Project data to its original projection coordinates
260 | df_insee_3035 = ox.project_gdf(df_insee, to_crs="+init=epsg:3035")
261 | 
262 | # Get northing and easting coordinates
263 | coordinates = df_insee.apply(lambda x: get_northing_easting(x), axis=1 )[["north","east"]]
264 | 
265 | # +100 meters to obtain the centroid of each box
266 | coords_offset = 100
267 | # Input data step
268 | step = 200.
269 | 
270 | # North, east coordinates denote the south-west box endpoint:
271 | north_min, north_max = coordinates.north.min() + coords_offset, coordinates.north.max() + coords_offset
272 | east_min, east_max = coordinates.east.min() + coords_offset, coordinates.east.max() + coords_offset
273 | 
274 | # Create mesh grid: one point for each square's centroid: each square has an extent of 200m by 200m
275 | xv, yv = np.meshgrid( np.arange(east_min, east_max, step), np.arange(north_min, north_max, step) )
276 | 
277 | # For every given coordinate, if a box is not created (no population), make it with an initial population of 0
278 | empty_population_box = []
279 | 
280 | for E, N in zip( xv.ravel(), yv.ravel() ): # Center-point
281 | point_df = gpd.GeoDataFrame( [Point(E,N)], columns=[ "geometry" ], crs="+init=epsg:3035" )
282 | if ( gpd.sjoin( point_df, df_insee_3035 ).empty ): # Does not intersect any existing square-box
283 | # Create new square
284 | empty_population_box.append( Polygon([ (E - 100., N - 100.), (E - 100., N + 100. ), (E + 100., N + 100. ), (E + 100., N - 100. ), (E - 100., N - 100.
285 | 
286 | # Concatenate original data frame + empty squares
287 | gdf_concat = pd.concat( [df_insee_3035, gpd.GeoDataFrame( {'geometry':empty_population_box, 'pop_count':[0]*len(empty_population_box) }, crs="+init=epsg:3035" ) ], ignore_index=True, sort=False )
288 | 
289 | # Remove added grid-cells outside the convex hull of the population data frame
290 | df_insee_convex_hull_3035 = df_insee_3035.unary_union.convex_hull
291 | gdf_concat = gdf_concat[ gdf_concat.apply(lambda x: df_insee_convex_hull_3035.intersects(x.geometry), axis=1 ) ]
292 | gdf_concat.reset_index(drop=True, inplace=True)
293 | 
294 | # Project (first project to latitude-longitude due to a GeoPandas issue)
295 | return ox.project_gdf( ox.project_gdf(gdf_concat, to_latlong=True) )
--------------------------------------------------------------------------------
/urbansprawl/settings.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | storage_folder = 'data'
7 | images_folder = 'images'
--------------------------------------------------------------------------------
/urbansprawl/sprawl/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/lgervasoni/urbansprawl/b26bdf7889fdba1382259be7c14e7e0d8f535cd9/urbansprawl/sprawl/__init__.py
--------------------------------------------------------------------------------
/urbansprawl/sprawl/accessibility.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | from scipy import spatial
7 | import numpy as np
8 | import pandas as pd
9 | import geopandas as gpd
10 | import networkx as nx
11 | import time
12 | import os
13 | import subprocess
14 | import shutil
15 | 
16 | from multiprocessing import cpu_count
17 | 
18 | from osmnx import log
19 | from .utils import divide_long_edges_graph
20 | 
21 | ##############################################################
22 | ### Compute accessibility grid
23 | ##############################################################
24 | 
25 | def compute_grid_accessibility(df_indices, G, df_osm_built, df_osm_pois, kw_args={'fixed_distance':True,'fixed_activities':False,'max_edge_length':200,'max_node_distance':250,
26 | 'fixed_distance_max_travel_distance':2000, 'fixed_distance_max_num_activities':250, 'fixed_activities_min_number': 20, 'fixed_activities_max_travel_distance':5000} ):
27 | """
28 | Calculate accessibility values at each grid reference point
29 | 
30 | Parameters
31 | ----------
32 | df_indices : geopandas.GeoDataFrame
33 | data frame containing the (x,y) reference points to calculate indices
34 | G : networkx multidigraph
35 | input graph to calculate accessibility
36 | df_osm_built : geopandas.GeoDataFrame
37 | data frame containing the building's geometries and corresponding land uses
38 | df_osm_pois : geopandas.GeoDataFrame
39 | data frame containing the points' of interest geometries
40 | kw_args: dict
41 | additional keyword arguments for the indices calculation
42 | fixed_distance : bool
43 | denotes the cumulative opportunities access to activity land uses given a fixed maximum distance to travel
44 | fixed_activities : bool
45 | represents the distance needed to travel in order to reach a certain number of activity land uses
46 | max_edge_length: int
47 | maximum length, in meters, to tolerate an edge in a graph (otherwise, divide edge)
48 | max_node_distance: int
49 | maximum distance tolerated from input point to closest graph node in order to calculate accessibility values
50 | fixed_distance_max_travel_distance: int
51 | (fixed distance) maximum distance tolerated (cut&branch) when searching for the activities
52 | fixed_distance_max_num_activities: int
53 | (fixed distance) cut iteration if the number of activities exceeds a threshold
54 | fixed_activities_min_number: int
55 | (fixed activities) minimum number of activities required
56 | fixed_activities_max_travel_distance : int
57 | (fixed activities) maximum distance tolerated (cut&branch) when searching for the activities
58 | 
59 | 
60 | Returns
61 | ----------
62 | None
63 | df_indices is updated in place with a new 'accessibility' column
64 | """
65 | log("Calculating accessibility indices")
66 | start = time.time()
67 | 
68 | # Assert that exactly one option is set
69 | assert( kw_args["fixed_distance"] ^ kw_args["fixed_activities"] )
70 | 
71 | # Arguments to pandas.Series
72 | kw_arguments = pd.Series(kw_args)
73 | 
74 | ##############
75 | ### Prepare input data for indices calculation in parallel call
76 | ##############
77 | # Temporary folder to pickle data
78 | if ( not os.path.exists("temp") ): os.makedirs("temp")
79 | # Number of CPU cores on your system
80 | num_cores = cpu_count()
81 | # Prepare input data: As many chunks of data as cores
82 | prepare_data(G, df_osm_built, df_osm_pois, df_indices, num_cores, kw_arguments )
83 | 
84 | # Command that runs the parallel accessibility script; NUM_CHUNK is replaced by each chunk number
85 | parallel_code = os.path.realpath(__file__).replace(".py","_parallel.py")
86 | command_call = "python " + parallel_code + " temp/graph.gpickle temp/points_NUM_CHUNK.pkl temp/arguments.pkl"
87 | 
88 | ##############
89 | ### Verify amount of memory used per subprocess
90 | ##############
91 | p = subprocess.Popen(command_call.replace("NUM_CHUNK",str(0)) + " memory_test", stdout=subprocess.PIPE, shell=True)
92 | output, err = p.communicate()
93 | p.wait()
94 | 
95 | # Max number of subprocess allocations given its memory consumption
96 | numbers = [ numb for numb in str(output) if numb in ["0","1","2","3","4","5","6","7","8","9"] ]
97 | max_processes = int( ''.join(numbers) )
98 | log("Maximum number of processes to allocate (considering memory availability): " + str(max_processes) )
99 | log("Number of available cores: " + str(num_cores) )
100 | 
101 | ##############
102 | ### Set chunks to run in parallel: if there are more cores than allowed processes, divide the chunks to run at most `max_processes` at a time
103 | ##############
104 | if (num_cores > max_processes): # Run the chunks in smaller batches, to avoid memory swapping
105 | chunks_run = np.array_split( list( range(num_cores) ), max_processes )
106 | else: # Run all chunks in parallel
107 | chunks_run = [ list( range(num_cores) ) ]
108 | 
109 | 
110 | # Parallel implementation
111 | for chunk in chunks_run: # Run one batch of chunks
112 | Ps_i = []
113 | for i in chunk: # Launch a subprocess per chunk
114 | p = subprocess.Popen(command_call.replace("NUM_CHUNK",str(i)), stdout=subprocess.PIPE, shell=True)
115 | Ps_i.append( p )
116 | 
117 | # Get the output
118 | Output_errs = [ p.communicate() for p in Ps_i ]
119 | 
120 | # Wait for every subprocess in the batch to finish
121 | Ps_status = [ p.wait() for p in Ps_i ]
122 | 
123 | # Output for chunk
124 | for output, err in Output_errs:
125 | log( str(output) )
126 | 
127 | # Associate data by getting the chunk results concatenated
128 | index_column = "accessibility"
129 | df_indices[index_column] = pd.concat( [ pd.read_pickle("temp/indices_NUM_CHUNK.pkl".replace("NUM_CHUNK",str(i)) ) for i in range(num_cores) ], ignore_index=True ).accessibility
130 | 
131 | # Delete temporary folder
132 | shutil.rmtree('temp')
133 | 
134 | log("Done: Accessibility indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
135 | 
136 | ###############
137 | # Utils
138 | ###############
139 | 
140 | def prepare_data(G, df_osm_built, df_osm_pois, df_indices, num_processes, kw_arguments):
141 | """
142 | Pickles data to a temporary folder in order to achieve parallel accessibility calculation
143 | A new subprocess will be created in order to minimize memory requirements
144 | 
145 | Parameters
146 | ----------
147 | G : networkx multidigraph
148 | input graph to calculate accessibility
149 | df_osm_built : geopandas.GeoDataFrame
150 | buildings data
151 | df_osm_pois : geopandas.GeoDataFrame
152 | points of interest data
153 | df_indices : geopandas.GeoDataFrame
154 | data frame where indices will be calculated
155 | num_processes : int
156 | number of data chunks to create
157 | kw_arguments : pandas.Series
158 | additional keyword arguments
159 | 
160 | Returns
161 | ----------
162 | 
163 | """
164 | # Divide long edges
165 | divide_long_edges_graph(G, kw_arguments.max_edge_length )
166 | log("Graph long edges shortened")
167 | 
168 | # Get activities
169 | df_built_activ = df_osm_built[ df_osm_built.classification.isin(["activity","mixed"]) ]
170 | df_pois_activ = df_osm_pois[ df_osm_pois.classification.isin(["activity","mixed"]) ]
171 | 
172 | # Associate them to their closest node in the graph
173 | associate_activities_closest_node(G, df_built_activ, df_pois_activ )
174 | log("Activities associated to graph nodes")
175 | 
176 | # Nodes dict
177 | for n, data in G.nodes.data(data=True):
178 | # Remove useless keys
179 | keys_ = list( data.keys() )
180 | [ data.pop(k) for k in keys_ if k not in ["x","y","num_activities"] ]
181 | 
182 | # Edges dict
183 | for u, v, data in G.edges.data(data=True, keys=False):
184 | # Remove useless keys
185 | keys_ = list( data.keys() )
186 | [ data.pop(k) for k in keys_ if k not in ["length","key"] ]
187 | 
188 | try:
189 | G.graph.pop("streets_per_node")
190 | except KeyError:
191 | pass
192 | # Pickle graph
193 | nx.write_gpickle(G, "temp/graph.gpickle")
194 | 
195 | # Prepare input indices points
196 | data_split = np.array_split(df_indices, num_processes)
197 | for i in range(num_processes):
198 | data_split[i].to_pickle("temp/points_"+str(i)+".pkl")
199 | # Pickle arguments
200 | kw_arguments.to_pickle("temp/arguments.pkl")
201 | 
202 | def associate_activities_closest_node(G, df_activities_built, df_activities_pois ):
203 | """
204 | Associates the number of existing activities to their closest nodes in the graph
205 | 
206 | Parameters
207 | ----------
208 | G : networkx multidigraph
209 | input graph to calculate accessibility
210 | df_activities_built : pandas.DataFrame
211 | data selection of buildings with activity uses
212 | df_activities_pois : pandas.DataFrame
213 | data selection of points of interest with activity uses
214 | 
215 | Returns
216 | 
----------
217 | 
218 | """
219 | # Initialize number of activity references
220 | for u, data in G.nodes(data=True):
221 | data["num_activities"] = 0
222 | 
223 | # Initialize KDTree of graph nodes
224 | coords = np.array([[node, data['x'], data['y']] for node, data in G.nodes(data=True)])
225 | df_nodes = pd.DataFrame(coords, columns=['node', 'x', 'y'])
226 | # zip coordinates
227 | data = list(zip( df_nodes["x"].ravel(), df_nodes["y"].ravel() ))
228 | # Create input tree
229 | tree = spatial.KDTree( data )
230 | 
231 | def associate_to_node(tree, point, G):
232 | distance, idx_node = tree.query( (point.x,point.y) )
233 | G.node[ df_nodes.loc[ idx_node, "node"] ]["num_activities"] += 1
234 | 
235 | # Associate each activity to its closest node
236 | df_activities_built.apply(lambda x: associate_to_node(tree, x.geometry.centroid, G) , axis=1)
237 | df_activities_pois.apply(lambda x: associate_to_node(tree, x.geometry.centroid, G) , axis=1)
--------------------------------------------------------------------------------
/urbansprawl/sprawl/accessibility_parallel.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | import sys
7 | import numpy as np
8 | import pandas as pd
9 | import networkx as nx
10 | from bisect import bisect
11 | import time
12 | 
13 | def get_nearest_node_utm(G, point, return_dist=False):
14 | """
15 | Return the nearest graph node to some specified point in UTM coordinates
16 | 
17 | Parameters
18 | ----------
19 | G : networkx multidigraph
20 | input graph
21 | point : tuple
22 | the (x, y) point for which we will find the nearest node in the graph
23 | return_dist : bool
24 | optionally also return the distance between the point and the nearest node
25 | 
26 | Returns
27 | -------
28 | int or tuple
29 | corresponding node or tuple (int node, float distance)
30 | """
31 | # Dump graph node coordinates into a pandas dataframe indexed by node id with x and y columns
32 | coords = np.array([[node, data['x'], data['y']] for node, data in G.nodes(data=True)])
33 | df = pd.DataFrame(coords, columns=['node', 'x', 'y']).set_index('node')
34 | # Point coordinates
35 | p_x, p_y = point
36 | # Euclidean distance from the input point to every node
37 | distances = df.apply(lambda x: np.sqrt( (x.x - p_x)**2 + (x.y - p_y)**2 ), axis=1 )
38 | 
39 | # nearest node's ID is the index label of the minimum distance
40 | nearest_node = distances.idxmin()
41 | 
42 | # if caller requested return_dist, return distance between the point and the nearest node as well
43 | if return_dist:
44 | return int(nearest_node), distances.loc[nearest_node]
45 | else:
46 | return int(nearest_node)
47 | 
48 | ##############################################
49 | ### Accessibility indices calculation
50 | ##############################################
51 | 
52 | def get_count_activities_fixed_distance(G, point_ref, arguments):
53 | """
54 | Calculate accessibility value at point_ref according to chosen metric
55 | If no graph node exists nearby input point reference, NaN is set
56 | Based on counting the number of (activity) opportunities given a fixed maximum distance to travel
57 | 
58 | Parameters
59 | ----------
60 | G : networkx multidigraph
61 | input graph to calculate accessibility
62 | point_ref: shapely.Point
63 | reference point to calculate accessibility
64 | 
65 | Returns
66 | ----------
67 | int
68 | returns the number of reached activities
69 | """
70 | # Find closest node to point_ref
71 | N0, distance = get_nearest_node_utm(G, point_ref.coords[0], return_dist=True)
72 | # Distance to closest node too high?
73 | if (distance > arguments.max_node_distance): return np.nan
74 | 
75 | # Initialize data structures
76 | visited_nodes = set()
77 | neighboring_nodes_id = []
78 | neighboring_nodes_cost = []
79 | num_activities_travelled = 0
80 | N_visit = N0
81 | 
82 | # Pre-compute the shortest path length from source node N0 to other nodes; using lengths of roads as weight
83 | shortest_path_length_N0_ = nx.single_source_dijkstra_path_length(G, source=N0, cutoff=arguments.fixed_distance_max_travel_distance, weight="length")
84 | 
85 | while ( True ):
86 | # Store visited node
87 | visited_nodes.add(N_visit)
88 | 
89 | # Update traveled activities
90 | num_activities_travelled += G.node[N_visit]["num_activities"]
91 | 
92 | # Reached sufficient number of activities
93 | if ( num_activities_travelled >= arguments.fixed_distance_max_num_activities ): return arguments.fixed_distance_max_num_activities
94 | 
95 | # Add to neighboring_nodes the neighbors of visited node
96 | for N_i in G.neighbors(N_visit):
97 | if ( (N_i not in neighboring_nodes_id) and (N_i not in visited_nodes) ): # Not stored/visited already
98 | # Store neighboring nodes, ordered by their distance cost
99 | cost = shortest_path_length_N0_.get(N_i)
100 | 
101 | if (cost is not None): # If a path within the maximum travel distance exists, add the neighboring node
102 | idx_to_insert = bisect( neighboring_nodes_cost, cost )
103 | # Insert in ordered list
104 | neighboring_nodes_id.insert(idx_to_insert, N_i)
105 | neighboring_nodes_cost.insert(idx_to_insert, cost)
106 | 
107 | if (neighboring_nodes_id): # If neighboring nodes exist: Continue iteration
108 | # Update next node to visit
109 | N_visit = neighboring_nodes_id.pop(0)
110 | # Pop cost associated to N_visit
111 | neighboring_nodes_cost.pop(0)
112 | else: # If no neighboring nodes: Reached maximum distance tolerated, cut the iteration
113 | return num_activities_travelled
114 | 
115 | return np.nan
116 | 
117 | 
118 | 
119 | def get_minimum_cost_activities_travel(G, point_ref, arguments):
120 | """
121 | Calculate accessibility value at point_ref according to chosen metric
122 | If no graph node exists nearby input point reference, NaN is set
123 | Based on the minimum radius travel cost to accomplish a certain quantity of activities
124 | 
125 | Parameters
126 | ----------
127 | G : networkx multidigraph
128 | input graph to calculate accessibility
129 | point_ref: shapely.Point
130 | reference point to calculate accessibility
131 | 
132 | Returns
133 | ----------
134 | float
135 | returns the computed radius cost length
136 | """
137 | # Find closest node to point_ref
138 | N0, distance = get_nearest_node_utm(G, point_ref.coords[0], return_dist=True)
139 | # Distance to closest node too high?
140 | if (distance > arguments.max_node_distance): return np.nan
141 | 
142 | # Initialize data structures
143 | visited_nodes = []
144 | neighboring_nodes_id = []
145 | neighboring_nodes_cost = []
146 | activities_travelled = 0
147 | 
148 | N_visit = N0
149 | 
150 | while ( activities_travelled < arguments.fixed_activities_min_number ):
151 | # Store visited node
152 | visited_nodes.append(N_visit)
153 | 
154 | # Update traveled activities
155 | activities_travelled += G.node[N_visit]["num_activities"]
156 | 
157 | # Add to neighboring_nodes the neighbors of visited node
158 | for N_i in G.neighbors(N_visit):
159 | if ( (N_i not in neighboring_nodes_id) and (N_i not in visited_nodes) ): # Not stored/visited already
160 | # Store neighboring nodes, ordered by their distance cost
161 | cost = nx.shortest_path_length(G,N0,N_i,weight="length")
162 | idx_to_insert = bisect( neighboring_nodes_cost, cost )
163 | # Insert in ordered list
164 | neighboring_nodes_id.insert(idx_to_insert, N_i)
165 | neighboring_nodes_cost.insert(idx_to_insert, cost)
166 | 
167 | if (neighboring_nodes_id): # If not empty
168 | # Update next node to visit
169 | N_visit = neighboring_nodes_id.pop(0)
170 | cost_travel = neighboring_nodes_cost.pop(0)
171 | 
172 | # Reached maximum distance tolerated. Cut iteration
173 | if (cost_travel > arguments.fixed_activities_max_travel_distance):
174 | return arguments.fixed_activities_max_travel_distance
175 | else: # Empty neighbors
176 | return np.nan
177 | 
178 | # Accomplished. End node: visited_nodes[-1]
179 | return nx.shortest_path_length(G,N0,visited_nodes[-1],weight="length")
180 | 
181 | 
182 | def main(argv):
183 | """
184 | Main program to drive the accessibility indices calculation
185 | 
186 | Parameters
187 | ----------
188 | argv : array
189 | arguments to drive the calculation
190 | 
191 | Returns
192 | ----------
193 | 
194 | """
195 | start = time.time()
196 | 
197 | # Load graph
198 | G = nx.read_gpickle(argv[1])
199 | 
200 | # Load indices points
201 | indices = pd.read_pickle( argv[2] )
202 | 
203 | # Load indices calculation arguments
204 | arguments = pd.read_pickle( argv[3] )
205 | 
206 | if ( ( len(argv) > 4 ) and (argv[4] == "memory_test" ) ): # Test memory used for current subprocess
207 | import os
208 | import psutil
209 | process = psutil.Process(os.getpid())
210 | Allocated_process_MB = process.memory_info().rss / 1000 / 1000
211 | Free_system_MB = psutil.virtual_memory().available / 1000 / 1000
212 | # Print the maximum number of subprocesses that fit in the available memory
213 | 
214 | max_processes = int( Free_system_MB / Allocated_process_MB )
215 | print(max_processes)
216 | return
217 | 
218 | 
219 | if (arguments.fixed_activities):
220 | _calculate_accessibility = get_minimum_cost_activities_travel
221 | elif (arguments.fixed_distance):
222 | _calculate_accessibility = get_count_activities_fixed_distance
223 | else:
224 | assert(False)
225 | 
226 | # Calculate accessibility
227 | indices["accessibility"] = indices.geometry.apply(lambda x: _calculate_accessibility(G, x, arguments) )
228 | 
229 | # Store results
230 | indices.to_pickle( argv[2].replace('points','indices') )
231 | 
232 | end = time.time()
233 | print( "Time:",str(end-start) )
234 | 
235 | 
236 | if __name__ == "__main__":
237 | main(sys.argv)
--------------------------------------------------------------------------------
/urbansprawl/sprawl/core.py:
--------------------------------------------------------------------------------
1 | 
################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import pandas as pd 7 | import geopandas as gpd 8 | import numpy as np 9 | from osmnx import log 10 | from shapely.geometry import Point 11 | 12 | from ..osm.core import get_route_graph, get_processed_osm_data 13 | from .landusemix import compute_grid_landusemix 14 | from .accessibility import compute_grid_accessibility 15 | from .dispersion import compute_grid_dispersion 16 | 17 | def get_indices_grid(df_osm_built, df_osm_building_parts, df_osm_pois, step=100): 18 | """ 19 | Creates an input geodataframe with points sampled in a regular grid 20 | 21 | Parameters 22 | ---------- 23 | df_osm_built : geopandas.GeoDataFrame 24 | OSM processed buildings 25 | df_osm_building_parts : geopandas.GeoDataFrame 26 | OSM processed building parts 27 | df_osm_pois : geopandas.GeoDataFrame 28 | OSM processed points of interest 29 | step : int 30 | step to sample the regular grid in meters 31 | 32 | Returns 33 | ---------- 34 | geopandas.GeoDataFrame 35 | regular grid 36 | """ 37 | # Get bounding box 38 | west, south, east, north = pd.concat( [ df_osm_built, df_osm_building_parts, df_osm_pois ], sort=False ).total_bounds 39 | # Create indices 40 | df_indices = gpd.GeoDataFrame( [ Point(i,j) for i in np.arange(west, east, step) for j in np.arange(south, north, step) ], columns=["geometry"] ) 41 | # Set projection 42 | df_indices.crs = df_osm_built.crs 43 | return df_indices 44 | 45 | def process_spatial_indices(city_ref=None, region_args={"polygon":None, "place":None, "which_result":1, "point":None, "address":None, "distance":None, "north":None, "south":None, "east":None, "west":None}, 46 | grid_step = 100, 47 | process_osm_args = {"retrieve_graph":True, "default_height":3, "meters_per_level":3, "associate_landuses_m2":True, "minimum_m2_building_area":9, "date":None}, 48 | dispersion_args = {'radius_search': 750, 'use_median': False, 'K_nearest': 50}, 49 | landusemix_args = {'walkable_distance': 600, 'compute_activity_types_kde': True, 'weighted_kde': True, 'pois_weight': 9, 'log_weighted': True}, 50 | accessibility_args = {'fixed_distance': True, 'fixed_activities': False, 'max_edge_length': 200, 'max_node_distance': 250, 51 | 'fixed_distance_max_travel_distance': 2000, 'fixed_distance_max_num_activities': 250, 'fixed_activities_min_number': 20}, 52 | indices_computation = {"dispersion":True, "landusemix":True, "accessibility":True} ): 53 | """ 54 | Process sprawling indices for an input region of interest 55 | 1) OSM data is retrieved and processed. 
56 | If the city name has already been processed, locally stored data will be loaded
57 | 2) A regular grid is created where indices will be calculated
58 | 3) Sprawling indices are calculated and returned
59 | 
60 | Parameters
61 | ----------
62 | city_ref : str
63 | Name of input city / region
64 | grid_step : int
65 | step to sample the regular grid in meters
66 | region_args : dict
67 | contains the information to retrieve the region of interest as follows:
68 | polygon : shapely Polygon or MultiPolygon
69 | geographic shape to fetch the land use footprints within
70 | place : string or dict
71 | query string or structured query dict to geocode/download
72 | which_result : int
73 | result number to retrieve from geocode/download when using query string
74 | point : tuple
75 | the (lat, lon) central point around which to construct the region
76 | address : string
77 | the address to geocode and use as the central point around which to construct the region
78 | distance : int
79 | retain only those nodes within this many meters of the center of the region
80 | north : float
81 | northern latitude of bounding box
82 | south : float
83 | southern latitude of bounding box
84 | east : float
85 | eastern longitude of bounding box
86 | west : float
87 | western longitude of bounding box
88 | process_osm_args : dict
89 | additional arguments to drive the OSM data extraction process:
90 | retrieve_graph : boolean
91 | determines whether the street network for the input city has to be retrieved and stored
92 | default_height : float
93 | height of buildings under missing data
94 | meters_per_level : float
95 | meters per building level assumed under missing data (used to infer the number of levels)
96 | associate_landuses_m2 : boolean
97 | compute the total square meters for each land use
98 | minimum_m2_building_area : float
99 | minimum area to be considered a building (otherwise filtered)
100 | date : datetime.datetime
101 | query the database at a certain timestamp
102 | dispersion_args : dict
103 | arguments to drive the dispersion indices calculation
104 | radius_search: int
105 | circle radius to consider the dispersion calculation at a local point
106 | use_median : bool
107 | denotes whether the median or mean should be used to calculate the indices
108 | K_nearest : int
109 | number of neighboring buildings to consider in evaluation
110 | landusemix_args : dict
111 | arguments to drive the land use mix indices calculation
112 | walkable_distance : int
113 | the bandwidth assumption for Kernel Density Estimation calculations (meters)
114 | compute_activity_types_kde : bool
115 | determines if the densities for each activity type should be computed
116 | weighted_kde : bool
117 | use Weighted Kernel Density Estimation or classic version
118 | pois_weight : int
119 | Points of interest weight equivalence with buildings (square meters)
120 | log_weighted : bool
121 | apply natural logarithmic function to surface weights
122 | accessibility_args : dict
123 | arguments to drive the accessibility indices calculation
124 | fixed_distance : bool
125 | denotes the cumulative opportunities access to activity land uses given a fixed maximum distance to travel
126 | fixed_activities : bool
127 | represents the distance needed to travel in order to reach a certain number of activity land uses
128 | max_edge_length: int
129 | maximum length, in meters, to tolerate an edge in a graph (otherwise, divide edge)
130 | max_node_distance: int
131 | maximum distance tolerated from input point to closest graph node in order to calculate accessibility values
132 | fixed_distance_max_travel_distance: int
133 | (fixed distance) maximum distance tolerated (cut&branch) when searching for the activities
134 | fixed_distance_max_num_activities: int
135 | (fixed distance) cut iteration if the number of activities exceeds a threshold
136 | fixed_activities_min_number: int
137 | (fixed activities) minimum number of activities required
138 | indices_computation : dict
139 | determines what sprawling indices should be computed
140 | 
141 | Returns
142 | ----------
143 | gpd.GeoDataFrame
144 | returns the regular grid with the indicated sprawling indices
145 | """
146 | try:
147 | # Process OSM data
148 | df_osm_built, df_osm_building_parts, df_osm_pois = get_processed_osm_data(city_ref=city_ref, region_args=region_args, kwargs=process_osm_args)
149 | # Get route graph
150 | G = get_route_graph(city_ref)
151 | 
152 | if (not ( indices_computation.get("accessibility") or indices_computation.get("landusemix") or indices_computation.get("dispersion") ) ):
153 | log("Not computing any spatial indices")
154 | return None
155 | 
156 | # Get indices grid
157 | df_indices = get_indices_grid(df_osm_built, df_osm_building_parts, df_osm_pois, grid_step)
158 | 
159 | # Compute sprawling indices
160 | if (indices_computation.get("accessibility")):
161 | compute_grid_accessibility(df_indices, G, df_osm_built, df_osm_pois, accessibility_args)
162 | if (indices_computation.get("landusemix")):
163 | compute_grid_landusemix(df_indices, df_osm_built, df_osm_pois, landusemix_args)
164 | if (indices_computation.get("dispersion")):
165 | compute_grid_dispersion(df_indices, df_osm_built, dispersion_args)
166 | 
167 | return df_indices
168 | 
169 | except Exception as e:
170 | log("Could not compute the spatial indices. An exception occurred: " + str(e))
171 | return None
172 | 
--------------------------------------------------------------------------------
/urbansprawl/sprawl/dispersion.py:
--------------------------------------------------------------------------------
1 | ###################################################################################################
2 | # Repository: https://github.com/lgervasoni/urbansprawl
3 | # MIT License
4 | ###################################################################################################
5 | 
6 | from scipy import spatial
7 | import numpy as np
8 | import pandas as pd
9 | import time
10 | 
11 | from osmnx import log
12 | 
13 | ##############################################################
14 | ### Dispersion indices methods
15 | ##############################################################
16 | 
17 | def closest_building_distance_median( point_ref, tree, df_closest_d, radius_search ):
18 | """
19 | Dispersion metric at point_ref
20 | Computes the median of the closest distance to another building for each building within a radius search
21 | Uses the input KDTree to accelerate calculations
22 | 
23 | Parameters
24 | ----------
25 | point_ref : shapely.Point
26 | calculate index at input point
27 | tree : scipy.spatial.KDTree
28 | KDTree of buildings centroid
29 | df_closest_d : pandas.Series
30 | closest distance to another building, for each building (positionally aligned with the KDTree points)
31 | radius_search : float
32 | circle radius to consider the dispersion calculation at a local point
33 | 
34 | Returns
35 | ----------
36 | float
37 | value of dispersion at input point
38 | """
39 | # Query buildings within radius search
40 | indices = tree.query_ball_point( point_ref, radius_search )
41 | # No dispersion value
42 | if (len(indices) == 0): return np.NaN
43 | # Calculate median of closest distance values. If no information is available, NaN is set
44 | return df_closest_d.loc[ indices ].median()
45 | 
46 | def closest_building_distance_average( point_ref, tree, df_closest_d, radius_search ):
47 | """
48 | Dispersion metric at point_ref
49 | Computes the mean of the closest distance to another building for each building within a radius search
50 | Uses the input KDTree to accelerate calculations
51 | 
52 | Parameters
53 | ----------
54 | point_ref : shapely.Point
55 | calculate index at input point
56 | tree : scipy.spatial.KDTree
57 | KDTree of buildings centroid
58 | df_closest_d : pandas.Series
59 | closest distance to another building, for each building (positionally aligned with the KDTree points)
60 | radius_search : int
61 | circle radius to consider the dispersion calculation at a local point
62 | 
63 | Returns
64 | ----------
65 | float
66 | value of dispersion at input point
67 | """
68 | # Query buildings within radius search
69 | indices = tree.query_ball_point( point_ref, radius_search )
70 | # No dispersion value
71 | if (len(indices) == 0): return np.NaN
72 | # Calculate mean of closest distance values. If no information is available, NaN is set
73 | return df_closest_d.loc[ indices ].mean()
74 | 
75 | 
76 | ##############################################################
77 | ### Dispersion indices calculation
78 | ##############################################################
79 | 
80 | def compute_grid_dispersion(df_indices, df_osm_built, kwargs={"radius_search":750, "use_median":True, "K_nearest":50} ):
81 | """
82 | Calculates dispersion indices on the input grid
83 | 
84 | Parameters
85 | ----------
86 | df_indices : geopandas.GeoDataFrame
87 | data frame containing the (x,y) reference points to calculate indices
88 | df_osm_built : geopandas.GeoDataFrame
89 | data frame containing the building's geometries
90 | kwargs : dict
91 | additional keyword arguments for the indices calculation
92 | radius_search: int
93 | circle radius to consider the dispersion calculation at a local point
94 | use_median : bool
95 | denotes whether the median or mean should be used to calculate the indices
96 | K_nearest : int
97 | number of neighboring buildings to consider in evaluation
98 | 
99 | Returns
100 | ----------
101 | geopandas.GeoDataFrame
102 | data frame with the added column for dispersion indices
103 | """
104 | log("Calculating dispersion indices")
105 | start = time.time()
106 | 
107 | # Get radius search: circle radius to consider the dispersion calculation at a local point
108 | radius_search = kwargs["radius_search"]
109 | # Use the median or the mean computation?
110 | use_median = kwargs["use_median"]
111 | 
112 | # Assign dispersion calculation method
113 | if (use_median):
114 | _calculate_dispersion = closest_building_distance_median
115 | else:
116 | _calculate_dispersion = closest_building_distance_average
117 | 
118 | # Calculate the closest distance for each building within K_nearest centroid buildings
119 | _apply_polygon_closest_distance_neighbor(df_osm_built, K_nearest = kwargs["K_nearest"] )
120 | 
121 | # For dispersion calculation approximation, create KDTree with buildings centroid (keep the distance series positionally aligned with the tree)
122 | df_closest_d = df_osm_built.loc[ df_osm_built.closest_d.notnull(), 'closest_d' ].reset_index(drop=True)
123 | coords_data = [ geom.centroid.coords[0] for geom in df_osm_built.loc[ df_osm_built.closest_d.notnull() ].geometry ]
124 | tree = spatial.KDTree( coords_data )
125 | 
126 | # Compute dispersion indices
127 | index_column = "dispersion"
128 | df_indices[index_column] = df_indices.geometry.apply(lambda x: _calculate_dispersion(x, tree, df_closest_d, radius_search ) )
129 | 
130 | # Remove added column
131 | df_osm_built.drop('closest_d', axis=1, inplace=True)
132 | 
133 | log("Done: Dispersion indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) )
134 | 
135 | 
136 | def _apply_polygon_closest_distance_neighbor(df_osm_built, K_nearest = 50):
137 | """
138 | Computes for each polygon, the distance to the (approximated) nearest neighboring polygon
139 | Approximation is done using distance between centroids to K nearest neighboring polygons, then evaluating the real polygon distance
140 | A column `closest_d` is added in the data frame
141 | 
142 | Parameters
143 | ----------
144 | df_osm_built: geopandas.GeoDataFrame
145 | data frame containing the building's geometries
146 | K_nearest: int
147 | number of neighboring polygons to evaluate
148 | 
149 | Returns
150 | ----------
151 | 
152 | """
153 | def get_closest_indices(tree, x, K_nearest):
154 | # Query the closest buildings considering their centroid
155 | return tree.query( x.centroid.coords[0] , k=K_nearest+1)[1][1:]
156 | def compute_closest_distance(x, buildings):
157 | # Minimum distance of all distances between reference building 'x' and the other buildings
158 | return (buildings.apply(lambda b: x.distance(b) ) ).min()
159 | 
160 | # Use all elements to get the exact closest neighbor?
161 | if ( (K_nearest == -1) or (K_nearest >= len(df_osm_built)) ): K_nearest = len(df_osm_built)-1 162 | 163 | # Get separate list for coordinates 164 | coords_data = [ geom.centroid.coords[0] for geom in df_osm_built.geometry ] 165 | # Create KD Tree using polygon's centroid 166 | tree = spatial.KDTree( coords_data ) 167 | 168 | # Get the closest buildings indices 169 | df_osm_built['closest_buildings'] = df_osm_built.geometry.apply(lambda x: get_closest_indices(tree, x, K_nearest) ) 170 | # Compute the minimum real distance for the closest buildings 171 | df_osm_built['closest_d'] = df_osm_built.apply(lambda x: compute_closest_distance(x.geometry,df_osm_built.geometry.loc[x.closest_buildings]), axis=1 ) 172 | # Drop unnecessary column 173 | df_osm_built.drop('closest_buildings', axis=1, inplace=True) -------------------------------------------------------------------------------- /urbansprawl/sprawl/landusemix.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import math 7 | import numpy as np 8 | import pandas as pd 9 | import time 10 | 11 | from sklearn.neighbors.kde import KernelDensity 12 | from .utils import WeightedKernelDensityEstimation 13 | 14 | from osmnx import log 15 | 16 | ############################################################## 17 | ### Land use mix indices methods 18 | ############################################################## 19 | 20 | def metric_phi_entropy(x,y): 21 | """ 22 | Shannon's entropy metric 23 | Based on article "Comparing measures of urban land use mix, 2013" 24 | 25 | Parameters 26 | ---------- 27 | x : float 28 | probability related to land use X 29 | y : float 30 | probability related to land use Y 31 | 32 | Returns 33 | ---------- 34 | float 35 | entropy value 36 | """ 37 | # Undefined for negative values 38 | if (x<0 or y<0): return np.nan 39 | # Entropy Index not defined for any input value equal to zero (due to logarithm) 40 | if (x == 0 or y == 0): return 0 41 | # Sum = 1 42 | x_,y_ = x/(x+y), y/(x+y) 43 | phi_value = - ( ( x_*math.log(x_) ) + ( y_*math.log(y_) ) ) / math.log(2) 44 | return phi_value 45 | 46 | #### Assign land use mix method 47 | _land_use_mix = metric_phi_entropy 48 | 49 | ############################################################## 50 | ### Land use mix indices calculation 51 | ############################################################## 52 | 53 | def compute_grid_landusemix(df_indices, df_osm_built, df_osm_pois, kw_args={'walkable_distance':600,'compute_activity_types_kde':True,'weighted_kde':True,'pois_weight':9,'log_weighted':True} ): 54 | """ 55 | Calculate land use mix indices on input grid 56 | 57 | Parameters 58 | ---------- 59 | df_indices : geopandas.GeoDataFrame 60 | data frame containing the (x,y) reference points to calculate indices 61 | df_osm_built : geopandas.GeoDataFrame 62 | data frame containing the building's geometries 63 | df_osm_pois : geopandas.GeoDataFrame 64 | data frame containing the points' of interest geometries 65 | kw_args: dict 66 | additional keyword arguments for the indices calculation 67 | walkable_distance : int 68 | the bandwidth assumption for Kernel Density Estimation calculations (meters) 69 | compute_activity_types_kde : bool 70 | determines if the 
densities for each activity type should be computed
71 | weighted_kde : bool
72 | use Weighted Kernel Density Estimation or classic version
73 | pois_weight : int
74 | Points of interest weight equivalence with buildings (square meters)
75 | log_weighted : bool
76 | apply natural logarithmic function to surface weights
77 | 
78 | Returns
79 | ----------
80 | pandas.DataFrame
81 | land use mix indices
82 | """
83 | log("Calculating land use mix indices")
84 | start = time.time()
85 | 
86 | # Get the bandwidth, related to 'walkable distances'
87 | bandwidth = kw_args["walkable_distance"]
88 | # Compute a weighted KDE?
89 | weighted_kde = kw_args["weighted_kde"]
90 | X_weights = None
91 | 
92 | # Get full list of contained POIs
93 | contained_pois = list(set([element for list_ in df_osm_built.containing_poi[ df_osm_built.containing_poi.notnull() ] for element in list_]))
94 | # Get the POIs not contained by any building
95 | df_osm_pois_not_contained = df_osm_pois[ ~ df_osm_pois.index.isin( contained_pois) ]
96 | 
97 | ############
98 | ### Calculate land use density estimations
99 | ############
100 | 
101 | ####
102 | # Residential
103 | ####
104 | df_osm_built_indexed = df_osm_built[ df_osm_built.classification.isin(["residential","mixed"]) ]
105 | if (weighted_kde): X_weights = df_osm_built_indexed.landuses_m2.apply(lambda x: x["residential"] )
106 | 
107 | df_indices["residential_pdf"] = calculate_kde(df_indices.geometry, df_osm_built_indexed, None, bandwidth, X_weights, kw_args["pois_weight"], kw_args["log_weighted"] )
108 | log("Residential density estimation done")
109 | 
110 | ####
111 | # Activities
112 | ####
113 | df_osm_built_indexed = df_osm_built[ df_osm_built.classification.isin(["activity","mixed"]) ]
114 | df_osm_pois_not_cont_indexed = df_osm_pois_not_contained[ df_osm_pois_not_contained.classification.isin(["activity","mixed"]) ]
115 | if (weighted_kde): X_weights = df_osm_built_indexed.landuses_m2.apply(lambda x: x["activity"] )
116 | 
117 | df_indices["activity_pdf"] = calculate_kde(df_indices.geometry, df_osm_built_indexed, df_osm_pois_not_cont_indexed, bandwidth, X_weights, kw_args["pois_weight"], kw_args["log_weighted"] )
118 | log("Activity density estimation done")
119 | 
120 | ####
121 | # Compute activity types densities
122 | ####
123 | if ( kw_args["compute_activity_types_kde"] ):
124 | assert('activity_category' in df_osm_built.columns)
125 | 
126 | # Get unique category values
127 | unique_categories_built = [list(x) for x in set(tuple(x) for x in df_osm_built.activity_category.values if isinstance(x,list) ) ]
128 | unique_categories_pois = [list(x) for x in set(tuple(x) for x in df_osm_pois_not_cont_indexed.activity_category.values if isinstance(x,list) ) ]
129 | flat_list = [item for sublist in unique_categories_built + unique_categories_pois for item in sublist]
130 | categories = list( set(flat_list) )
131 | 
132 | for cat in categories: # Get data frame selection of input category
133 | # Buildings and POIs within that category
134 | df_built_category = df_osm_built_indexed[ df_osm_built_indexed.activity_category.apply(lambda x: (isinstance(x,list)) and (cat in x) ) ]
135 | df_pois_category = df_osm_pois_not_cont_indexed[ df_osm_pois_not_cont_indexed.activity_category.apply(lambda x: (isinstance(x,list)) and (cat in x) ) ]
136 | if (weighted_kde): X_weights = df_built_category.landuses_m2.apply(lambda x: x[ cat ] )
137 | 
138 | df_indices[ cat + "_pdf" ] = calculate_kde( df_indices.geometry, df_built_category, df_pois_category, bandwidth, X_weights,
kw_args["pois_weight"], kw_args["log_weighted"] ) 139 | 140 | log("Activity grouped by types density estimation done") 141 | 142 | 143 | # Compute land use mix indices 144 | index_column = "landusemix" 145 | df_indices[index_column] = df_indices.apply(lambda x: _land_use_mix(x.activity_pdf, x.residential_pdf), axis=1 ) 146 | df_indices["landuse_intensity"] = df_indices.apply(lambda x: (x.activity_pdf + x.residential_pdf)/2., axis=1 ) 147 | 148 | log("Done: Land use mix indices. Elapsed time (H:M:S): " + time.strftime("%H:%M:%S", time.gmtime(time.time()-start)) ) 149 | 150 | #### 151 | 152 | def calculate_kde(points, df_osm_built, df_osm_pois=None, bandwidth=400, X_weights=None, pois_weight=9, log_weight=True): 153 | """ 154 | Evaluate the probability density function using Kernel Density Estimation of input geo-localized data 155 | KDE's bandwidth stands for walkable distances 156 | If input weights are given, a Weighted Kernel Density Estimation is carried out 157 | 158 | Parameters 159 | ---------- 160 | points : geopandas.GeoSeries 161 | reference points to calculate indices 162 | df_osm_built : geopandas.GeoDataFrame 163 | data frame containing the building's geometries 164 | df_osm_pois : geopandas.GeoDataFrame 165 | data frame containing the points' of interest geometries 166 | bandwidth: int 167 | bandwidth value to be employed on the Kernel Density Estimation 168 | X_weights : pandas.Series 169 | indicates the weight for each input building (e.g. surface) 170 | pois_weight : int 171 | weight assigned to points of interest 172 | log_weight : bool 173 | if indicated, applies a log transformation to input weight values 174 | 175 | Returns 176 | ---------- 177 | pandas.Series 178 | 179 | """ 180 | # X_b : Buildings array 181 | X_b = [ [p.x,p.y] for p in df_osm_built.geometry.centroid.values ] 182 | 183 | # X_p : Points array 184 | if (df_osm_pois is None): X_p = [] 185 | else: X_p = [ [p.x,p.y] for p in df_osm_pois.geometry.centroid.values ] 186 | 187 | # X : Full array 188 | X = np.array( X_b + X_p ) 189 | 190 | # Points where the probability density function will be evaluated 191 | Y = np.array( [ [p.x,p.y] for p in points.values ] ) 192 | 193 | if (not (X_weights is None) ): # Weighted Kernel Density Estimation 194 | # Building's weight + POIs weight 195 | X_W = np.concatenate( [X_weights.values, np.repeat( [pois_weight], len(X_p) )] ) 196 | 197 | if (log_weight): # Apply logarithm 198 | X_W = np.log( X_W ) 199 | 200 | PDF = WeightedKernelDensityEstimation(X, X_W, bandwidth, Y) 201 | return pd.Series( PDF / PDF.max() ) 202 | else: # Kernel Density Estimation 203 | # Sklearn 204 | kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth).fit(X) 205 | # Sklearn returns the results in the form log(density) 206 | PDF = np.exp(kde.score_samples(Y)) 207 | return pd.Series( PDF / PDF.max() ) -------------------------------------------------------------------------------- /urbansprawl/sprawl/utils.py: -------------------------------------------------------------------------------- 1 | ################################################################################################### 2 | # Repository: https://github.com/lgervasoni/urbansprawl 3 | # MIT License 4 | ################################################################################################### 5 | 6 | import numpy as np 7 | import pandas as pd 8 | import networkx as nx 9 | import math 10 | from shapely.geometry import LineString 11 | from scipy.spatial.distance import cdist 12 | 13 | 14 | def 
WeightedKernelDensityEstimation(X, Weights, bandwidth, Y, max_mb_per_chunk = 1000):
15 | """
16 | Computes a Weighted Kernel Density Estimation
17 | 
18 | Parameters
19 | ----------
20 | X : array
21 | input points
22 | Weights : array
23 | array of weights associated with the points
24 | bandwidth : float
25 | bandwidth for kernel density estimation
26 | Y : array
27 | points where density estimations will be performed
28 | 
29 | Returns
30 | ----------
31 | pd.Series
32 | returns an array of the estimated densities, normalized so that they sum to 1
33 | """
34 | def get_megabytes_pairwise_distances_allocation(X, Y):
35 | # Calculate MB needed to allocate pairwise distances
36 | return len(X) * len(Y) * 8 * 1e-6
37 | 
38 | # During this procedure, pairwise euclidean distances are computed between input points X and points to estimate Y
39 | # For this reason, Y is divided in chunks to avoid big memory allocations: at most `max_mb_per_chunk` megabytes per chunk are allocated for pairwise distances
40 | Y_split = np.array_split( Y, math.ceil( get_megabytes_pairwise_distances_allocation(X,Y) / max_mb_per_chunk ) )
41 | 
42 | """
43 | ### Step by step
44 | # Weighted KDE: Sum{ Weight_i * K( (X-Xi) / h) }
45 | W_norm = np.array( Weights / np.sum(Weights) )
46 | cdist_values = cdist( Y, X, 'euclidean') / bandwidth
47 | Ks = np.exp( -.5 * ( cdist_values ) ** 2 )
48 | PDF = np.sum( Ks * W_norm, axis=1)
49 | """
50 | """
51 | ### Complete version. Memory consuming
52 | PDF = np.sum( np.exp( -.5 * ( cdist( Y, X, 'euclidean') / bandwidth ) ** 2 ) * ( np.array( Weights / np.sum(Weights) ) ), axis=1)
53 | """
54 | 
55 | ### Divide Y in chunks to avoid big memory allocations
56 | PDF = np.concatenate( [ np.sum( np.exp( -.5 * ( cdist( Y_i, X, 'euclidean') / bandwidth ) ** 2 ) * ( np.array( Weights / np.sum(Weights) ) ), axis=1) for Y_i in Y_split ] )
57 | # Normalize
58 | return pd.Series( PDF / PDF.sum() )
59 | 
60 | 
61 | def cut_in_two(line):
62 | """
63 | Cuts input line into two lines of equal length
64 | 
65 | Parameters
66 | ----------
67 | line : shapely.LineString
68 | input line
69 | 
70 | Returns
71 | ----------
72 | list (LineString, LineString, Point)
73 | two lines and the middle point cutting input line
74 | """
75 | from shapely.geometry import Point, LineString
76 | # Get final distance value
77 | distance = line.length/2
78 | # Cuts a line in two at a distance from its starting point
79 | if distance <= 0.0 or distance >= line.length:
80 | return [LineString(line)]
81 | coords = list(line.coords)
82 | for i, p in enumerate(coords):
83 | proj_dist = line.project(Point(p))
84 | if proj_dist == distance:
85 | return [LineString(coords[:i+1]), LineString(coords[i:]), Point(p)]
86 | if proj_dist > distance:
87 | cp = line.interpolate(distance)
88 | return [ LineString(coords[:i] + [(cp.x, cp.y)]), LineString([(cp.x, cp.y)] + coords[i:]), cp]
89 | 
90 | class NodeCounter:
91 | """
92 | Node negative counter. Utils for node osmid creation.
Starts at -1 and auto-decrements
93 | """
94 | def __init__(self):
95 | self._num = 0
96 | def get_num(self):
97 | self._num -= 1
98 | return self._num
99 | 
100 | def verify_divide_edge(G, u, v, key, data, node_creation_counter, max_edge_length):
101 | """
102 | Verify if edge(u,v)[key] length is higher than a certain threshold
103 | If so, divide edge(u,v) into two edges of equal length
104 | Assign negative values to the new edges' osm ids
105 | Call recursively to continue dividing each of the lines if necessary
106 | 
107 | Parameters
108 | ----------
109 | G : networkx multidigraph
110 | input graph
111 | u : node
112 | origin node
113 | v : node
114 | destination node
115 | key : int
116 | (u,v) arc identifier
117 | data : dict
118 | arc data
119 | node_creation_counter : NodeCounter
120 | node identifier creation
121 | max_edge_length : float
122 | maximum tolerated edge length
123 | 
124 | Returns
125 | ----------
126 | 
127 | """
128 | # Input: Two communicated nodes (u, v)
129 | if ( data["length"] <= max_edge_length ): # Already satisfies the condition?
130 | return
131 | 
132 | # Get geometry connecting (u,v)
133 | if ( data.get("geometry",None) ): # Geometry exists
134 | geometry = data["geometry"]
135 | else: # Real geometry is a straight line between the two nodes
136 | P_U = G.node[u]["x"], G.node[u]["y"]
137 | P_V = G.node[v]["x"], G.node[v]["y"]
138 | geometry = LineString( (P_U, P_V) )
139 | 
140 | # Get geometries for edge(u,middle), edge(middle,v) and node(middle)
141 | line1, line2, middle_point = cut_in_two(geometry)
142 | 
143 | # Copy edge(u,v) data to conserve attributes. Modify its length
144 | data_e1 = data.copy()
145 | data_e2 = data.copy()
146 | # Associate correct length
147 | data_e1["length"] = line1.length
148 | data_e2["length"] = line2.length
149 | # Assign geometries
150 | data_e1["geometry"] = line1
151 | data_e2["geometry"] = line2
152 | 
153 | # Create new node: Middle distance of edge
154 | x,y = list(middle_point.coords)[0]
155 | # Set a new unique osmid: Negative (as in OSM2PGSQL, created objects contain negative osmid)
156 | node_osmid = node_creation_counter.get_num()
157 | node_data = {'osmid':node_osmid, 'x':x, 'y':y}
158 | 
159 | # Add middle node with its corresponding data
160 | G.add_node(node_osmid)
161 | nx.set_node_attributes(G, {node_osmid : node_data } )
162 | 
163 | # Add edges (u,middle) and (middle,v)
164 | G.add_edge(u, node_osmid)
165 | nx.set_edge_attributes(G, { (u, node_osmid, 0): data_e1 } )
166 | G.add_edge(node_osmid, v)
167 | nx.set_edge_attributes(G, { (node_osmid, v, 0): data_e2 } )
168 | 
169 | # Remove edge (u,v)
170 | G.remove_edge(u,v,key=key)
171 | 
172 | # Recursively verify created edges and divide if necessary.
Use last added key to identify the edge 173 | last_key = len( G[u][node_osmid] ) -1 174 | verify_divide_edge(G, u, node_osmid, last_key, data_e1, node_creation_counter, max_edge_length) 175 | last_key = len( G[node_osmid][v] ) -1 176 | verify_divide_edge(G, node_osmid, v, last_key, data_e2, node_creation_counter, max_edge_length) 177 | 178 | 179 | def divide_long_edges_graph(G, max_edge_length): 180 | """ 181 | Divide all edges with a higher length than input threshold by means of dividing the arcs and creating new nodes 182 | 183 | Parameters 184 | ---------- 185 | G : networkx multidigraph 186 | input graph 187 | max_edge_length : float 188 | maximum tolerated edge length 189 | 190 | Returns 191 | ---------- 192 | 193 | """ 194 | # Negative osm_id indicate created nodes 195 | node_creation_counter = NodeCounter() 196 | 197 | for u, v, key, data in list( G.edges(data=True, keys=True) ): 198 | if ( data["length"] > max_edge_length ): 199 | # Divide the edge (u,v) recursively 200 | verify_divide_edge(G, u, v, key, data, node_creation_counter, max_edge_length) --------------------------------------------------------------------------------
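A minimal usage sketch for the `process_spatial_indices` entry point defined in `urbansprawl/sprawl/core.py`, which ties the modules above together. The city name, geocoding query and output path below are placeholder assumptions, and the call relies on the osmnx, geopandas, networkx and scikit-learn environment these modules import; treat it as an illustration rather than a tested recipe.

# Hypothetical driver script: compute the three sprawl indices for a
# geocoded region, then export the grid for use in a GIS platform.
# "Lyon, France" and the output filename are examples only.
from urbansprawl.sprawl.core import process_spatial_indices

region = {"polygon": None, "place": "Lyon, France", "which_result": 1,
          "point": None, "address": None, "distance": None,
          "north": None, "south": None, "east": None, "west": None}

# Samples a regular 100-meter grid and computes accessibility,
# land use mix and dispersion at each grid point
df_grid = process_spatial_indices(city_ref="Lyon", region_args=region, grid_step=100)

if df_grid is not None:
    # One row per grid point, with 'accessibility', 'landusemix'
    # and 'dispersion' columns added by the compute_grid_* functions
    df_grid.to_file("Lyon_spatial_indices.gpkg", driver="GPKG")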