├── .gitignore
├── CITATION.cff
├── README.md
├── images
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   ├── 8.png
│   ├── panoramic-noroads.png
│   ├── panoramic-roads.png
│   └── pipeline.png
├── main_script.py
├── mapillaryGVI.yml
├── mapillary_GVI_googlecolab.ipynb
├── modules
│   ├── availability.py
│   ├── osmnx_road_network.py
│   ├── process_data.py
│   └── segmentation_images.py
├── pipeline_step_by_step.ipynb
├── predict_missing_gvi.py
├── pygam
│   └── pygam.py
└── scripts
    ├── get_gvi_gpkg.py
    ├── mean_gvi_street.py
    └── results_metrics.py

/.gitignore:
--------------------------------------------------------------------------------
1 | results
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | cff-version: 1.2.0
2 | message: "If you use this software, please cite it as below."
3 | authors:
4 | - family-names: "Vázquez Sánchez"
5 | given-names: "Ilse Abril"
6 | - family-names: "Labib"
7 | given-names: "SM"
8 | orcid: "https://orcid.org/0000-0002-4127-2075"
9 | title: "Automated Green View Index Modeling Pipeline using Mapillary Street Images and Transformer models"
10 | version: 0.1.0
11 | doi: 10.5281/zenodo.8106479
12 | date-released: 2023-07-03
13 | url: "https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility"
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Automated Green View Index Modeling Pipeline using Mapillary Street Images and Transformer models [![DOI](https://zenodo.org/badge/637342975.svg)](https://zenodo.org/badge/latestdoi/637342975)
2 |
3 |
4 | Open In Colab
5 |
6 |
7 |
8 |
9 | ## Aims and objectives
10 | Urban green spaces provide various benefits, but assessing their visibility is challenging. Traditional methods and Google Street View (GSV) have limitations; therefore, integrating Volunteered Street View Imagery (VSVI) platforms like Mapillary has been proposed. Mapillary offers open data and a large community of contributors, but it has its own limitations in terms of data quality and coverage. For areas with insufficient street image data, the Normalised Difference Vegetation Index (NDVI) can be used as an alternative indicator for quantifying greenery. While some studies have shown the potential of Mapillary for evaluating urban greenness visibility, there is a lack of systematic evaluation and standardised methodologies.
11 |
12 | The primary objective of this project is to develop a scalable and reproducible framework for leveraging Mapillary street-view image data to assess the Green View Index (GVI) in diverse geographical contexts. Additionally, the framework will utilise NDVI to supplement information in areas where data is unavailable.
13 |
14 |
15 | ## Content
16 | - [Setting up the environment](#setting-up-the-environment)
17 | - [Running in Google Colab](#running-in-google-colab)
18 | - [Running in a local environment](#running-in-a-local-environment)
19 | - [Explaining the Pipeline](#explaining-the-pipeline)
20 | - [Step 1. Retrieve street road network and generate sample points](#step-1-retrieve-street-road-network-and-generate-sample-points)
21 | - [Step 2. Assign images to each sample point based on proximity](#step-2-assign-images-to-each-sample-point-based-on-proximity)
22 | - [Step 3. Clean and process data](#step-3-clean-and-process-data)
23 | - [Step 4. 
Calculate GVI](#step-4-calculate-gvi) 24 | - [Step 5 (Optional). Evaluate image availability and image usability of Mapillary Image data](#step-5-optional-evaluate-image-availability-and-image-usability-of-mapillary-image-data) 25 | - [Step 6 (Optional). Model GVI for missing points](#step-6-optional-model-gvi-for-missing-points) 26 | - [Acknowledgements and Contact Information](#acknowledgements-and-contact-information) 27 |

28 | 29 | 30 | ## Setting up the environment 31 | 32 | ### Running in Google Colab 33 | To run the project in Google Colab, you have two options: 34 | 35 |
36 |   1. Download the mapillary_GVI_googlecolab.ipynb notebook and open it on Google Colab.
37 |   2. Alternatively, you can directly access the notebook using this link.
38 |
39 | 40 | Before running the Jupyter Notebook, it is optional but highly recommended to configure Google Colab to use a GPU. Follow these steps: 41 |
42 |   1. Go to the "Runtime" menu at the top.
43 |   2. Select "Change runtime type" from the dropdown menu.
44 |   3. In the "Runtime type" section, choose "Python 3".
45 |   4. In the "Hardware accelerator" section, select "GPU".
46 |   5. In the "GPU type" section, choose "T4" if available.
47 |   6. In the "Runtime shape" section, select "High RAM".
48 |   7. Save the notebook settings.
49 |
50 | 51 | This notebook contains the following code: 52 |
53 |   1. Install Required Libraries: To begin, the notebook ensures that the required libraries are installed, making sure that all the necessary dependencies are available for execution within the Google Colab environment.
54 |
55 | ```python
56 | %pip install transformers==4.29.2
57 | %pip install geopandas==0.12.2
58 | %pip install torch==1.13.1
59 | %pip install vt2geojson==0.2.1
60 | %pip install mercantile==1.2.1
61 | %pip install osmnx==1.3.0
62 | ```
63 |
64 |   2. Mount Google Drive: To facilitate convenient access to files and storage, the notebook proceeds to mount Google Drive. This step allows for the seamless uploading of the project folder, which can then be easily accessed and utilised throughout the notebook.
66 |
67 | ```python
68 | from google.colab import drive
69 |
70 | drive.mount('/content/drive')
71 |
72 | %cd /content/drive/MyDrive
73 | ```
74 |
75 |   3. Clone GitHub Repository (If Needed): To ensure the availability of the required scripts and files from the "StreetView-NatureVisibility" GitHub repository, the notebook first checks if the repository has already been cloned in the Google Drive. If the repository is not found, the notebook proceeds to clone it using the 'git clone' command. This step guarantees that all the necessary components from the repository are accessible and ready for use.
77 |
78 | ```python
79 | import os
80 |
81 | if not os.path.isdir('StreetView-NatureVisibility'):
82 |     !git clone https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility.git
83 |
84 | %cd StreetView-NatureVisibility
85 | ```
86 |
87 |   4. Set Analysis Parameters: To customise the analysis, it is essential to modify the values of the following variables based on your specific requirements.
89 |
90 | ```python
91 | place = 'De Uithof, Utrecht'
92 | distance = 50
93 | cut_by_road_centres = 0
94 | access_token = 'MLY|'
95 | file_name = 'utrecht-gvi'
96 | max_workers = 6
97 | num_sample_images = 10
98 | begin = None
99 | end = None
100 | ```
101 | In this example, the main_script.py file will be executed to analyse the Green View Index for De Uithof, Utrecht (Utrecht Science Park).
102 |
103 | Replace the following parameters with appropriate values:
105 |     - place: name of the place to analyse (e.g. 'Amsterdam, Netherlands')
106 |     - distance: distance between the sample points, in metres
107 |     - cut_by_road_centres: set to 1 to crop panoramic images using the detected road centres, 0 otherwise
108 |     - access_token: your Mapillary access token (it starts with 'MLY|')
109 |     - file_name: name of the CSV file in which the points with their GVI values will be stored
110 |     - max_workers: number of threads to use; the number of cores of your computer is a good starting point
111 |     - num_sample_images: number of randomly selected points whose images will be saved as samples
112 |     - begin and end (optional): start and end indices, to analyse only a subset of the sample points
114 |
115 |
116 |   5. Retrieve Green View Index (GVI) Data: The notebook executes a script ('main_script.py') to retrieve the Green View Index (GVI) data. The script takes the specified analysis parameters as input and performs the data retrieval process.
118 |
119 | ```python
120 | command = f"python main_script.py '{place}' {distance} {cut_by_road_centres} '{access_token}' {file_name} {max_workers} {num_sample_images} {begin if begin is not None else ''} {end if end is not None else ''}"
121 | !{command}
122 | ```
123 |
124 |   6. Generate GeoPackage Files (Optional): After retrieving the GVI data, the notebook executes another script ('get_gvi_gpkg.py') to generate GeoPackage files from the obtained CSV files. The generated GeoPackage files include the road network of the analysed place, sample points, and the CSV file containing GVI values.
126 |
127 | ```python
128 | command = f"python get_gvi_gpkg.py '{place}'"
129 | !{command}
130 | ```
131 |   7. Compute Mean GVI per Street, and Get Availability and Usability Scores (Optional): Additionally, the notebook provides the option to compute the mean Green View Index (GVI) value per street in the road network. Running a script ('mean_gvi_street.py') achieves this computation.
133 |
134 | ```python
135 | command = f"python mean_gvi_street.py '{place}'"
136 | !{command}
137 | ```
138 |
139 | Once this script has been executed, the code to calculate the Image Availability Score and Image Usability Score, along with other quality metrics, can be run.
140 |
141 | ```python
142 | command = f"python results_metrics.py '{place}'"
143 | !{command}
144 | ```
145 |
146 |   8. Estimate missing GVI points with NDVI (Optional): Finally, it is possible to estimate the GVI values for the points that have missing images, using the NDVI value and linear regression. Before proceeding to the next cell, please make sure to follow these steps:
148 |     1. Choose a projection in metres that is suitable for your study area.
149 |     2. Ensure that you have created the required folder structure: StreetView-NatureVisibility/results/{place}/ndvi.
150 |     3. Place the corresponding NDVI file, named ndvi.tif, inside this folder. It is recommended to use an NDVI file that has been consistently generated for the study area over the course of a year. The NDVI file must be in the same projection chosen for your area of study (a reprojection sketch is included after this list).
152 |
153 | Note: the EPSG code specified in the code, which is 32631, is just an example for De Uithof, Netherlands.
155 | ```python
156 | epsg_code = 32631
157 | command = f"python predict_missing_gvi.py '{place}' {epsg_code} {distance}"
158 | !{command}
159 | ```
161 |
162 |   9. Accessing Results: Once the analysis is completed, you can access your Google Drive and navigate to the 'StreetView-NatureVisibility/results/' folder. Inside this folder, you will find a subfolder named after the location that was analysed. This subfolder contains several directories, including:
164 |     - roads: the road network of the analysed place (roads.gpkg)
165 |     - points: the sample points generated along the road network (points.gpkg)
166 |     - gvi: the CSV file(s) containing the points with their GVI values
167 |     - ndvi: the folder holding the NDVI file used in the optional estimation step
169 |
173 | 174 | ### Running in a local environment 175 | To create a Conda environment and run the code using the provided YML file, follow these steps: 176 | 177 |
178 |   1. Cloning GitHub Repository: Open a terminal or command prompt on your computer and navigate to the directory where you want to clone the GitHub repository using the following commands:
181 |     1. Use the cd command to change directories. For example, if you want to clone the repository in the "Documents" folder, you can use the following command:
183 | ```bash
184 | cd Documents
185 | ```
187 |     2. Clone the GitHub repository named "StreetView-NatureVisibility" by executing the following command:
189 | ```bash
190 | git clone https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility.git
191 | ```
192 | This command will download the repository and create a local copy on your computer.
194 |     3. Once the cloning process is complete, navigate to the cloned repository by using the cd command:
196 | ```bash
197 | cd StreetView-NatureVisibility
198 | ```
199 |
202 |   2. Create a Conda environment using the provided YML file: Run the following command to create the Conda environment:
204 | ```bash
205 | conda env create -f mapillaryGVI.yml
206 | ```
207 | This command will read the YML file and start creating the environment with the specified dependencies. The process may take a few minutes to complete.
209 |   3. Activate conda environment: After the environment creation is complete, activate the newly created environment using the following command:
211 | ```bash
212 | conda activate mapillaryGVI
213 | ```
214 |
215 |   4. Compute GVI index: Once the environment is activated, you can start using the project. To run the code and analyse the Green View Index of a specific place, open the terminal and execute the following command:
217 | ```bash
218 | python main_script.py place distance cut_by_road_centres access_token file_name max_workers num_sample_images begin end
219 | ```
221 | Replace the following parameters with appropriate values:
222 |     - place: name of the place to analyse (e.g. 'Amsterdam, Netherlands')
223 |     - distance: distance between the sample points, in metres
224 |     - cut_by_road_centres: set to 1 to crop panoramic images using the detected road centres, 0 otherwise
225 |     - access_token: your Mapillary access token (it starts with 'MLY|')
226 |     - file_name: name of the CSV file in which the points with their GVI values will be stored
227 |     - max_workers: number of threads to use; the number of cores of your computer is a good starting point
228 |     - num_sample_images: number of randomly selected points whose images will be saved as samples
229 |     - begin and end (optional): start and end indices, to analyse only a subset of the sample points
232 |
234 |   5. Generate GeoPackage Files (Optional): After retrieving the GVI data, you have the option to generate GeoPackage files from the obtained CSV files. This step can be executed by running the following command in the terminal:
237 | ```bash
238 | python get_gvi_gpkg.py place
239 | ```
240 |
241 |   6. Compute Mean GVI per Street, and Get Availability and Usability Scores (Optional): Additionally, you can compute the mean Green View Index (GVI) value per street in the road network. To perform this computation, run the following command in the terminal:
244 | ```bash
245 | python mean_gvi_street.py place
246 | ```
248 | Once this script has been executed, the script to calculate the Image Availability Score and Image Usability Score, along with other quality metrics, can be run:
250 | ```bash
251 | python results_metrics.py place
252 | ```
253 |   7. Estimate missing GVI points with NDVI file (Optional): Finally, it is possible to estimate the GVI values for the points that have missing images, using the NDVI value and linear regression. Before running this script, please make sure to follow these steps:
255 |     1. Make sure to use an appropriate projection in metres that is suitable for your study area. For example, you can use the same projection as the one used in the roads.gpkg file.
256 |     2. Ensure that you have created the required folder structure: StreetView-NatureVisibility/results/{place}/ndvi. Place the corresponding NDVI file, named ndvi.tif, inside this folder. It is recommended to use an NDVI file that has been consistently generated for the study area over the course of a year. The NDVI file must be in the same projection chosen for your area of study.
259 | ```shell
260 | python predict_missing_gvi.py {place} {epsg_code} {distance}
261 | ```
263 |
264 |   8. Accessing Results: Once the analysis is completed, you can navigate to the cloned repository directory on your local computer. Inside the repository, you will find a folder named results. Within the results folder, there will be a subfolder named after the location that was analysed. This subfolder contains several directories, including:
266 |     - roads: the road network of the analysed place (roads.gpkg)
267 |     - points: the sample points generated along the road network (points.gpkg)
268 |     - gvi: the CSV file(s) containing the points with their GVI values
269 |     - ndvi: the folder holding the NDVI file used in the optional estimation step
272 |
273 | 274 | ## Explaining the Pipeline
275 |
276 | For this explanation, Utrecht Science Park will be used. Therefore, the command should look like this:
277 |
278 | ```bash
279 | python main_script.py 'De Uithof, Utrecht' 50 0 'MLY|' sample-file 8 10
280 | ```
281 | When executing this command, the code will automatically run from Step 1 to Step 4.
282 |
283 | ![png](images/pipeline.png)
284 |
285 |
286 | ### Step 1. Retrieve street road network and generate sample points
287 |
288 | The first step of the code is to retrieve the road network for a specific place using OpenStreetMap data with the help of the OSMNX library. It begins by fetching the road network graph, focusing on roads that are suitable for driving. One important thing to note is that for bidirectional streets, the osmnx library returns duplicate lines. In this code, we take care to remove these duplicates and keep only the unique road segments to ensure that samples are not taken on the same road multiple times, preventing redundancy in subsequent analysis.
289 |
290 | Following that, the code proceeds to project the graph from its original latitude-longitude coordinates to a local projection in metres. This projection is crucial for achieving accurate measurements in subsequent steps where we need to calculate distances between points. By converting the graph to a local projection, we ensure that our measurements align with the real-world distances on the ground, enabling precise analysis and calculations based on the road network data.
291 |
292 |
293 | ```python
294 | road = get_road_network(place)
295 | ```
296 |
297 | ![png](images/1.png)
298 |
299 | Then, a list of evenly distributed points along the road network, with a specified distance between each point, is generated. This is achieved using a function that takes the road network data and an optional distance parameter N, which is set to a default value of 50 metres.
300 |
301 | The function iterates over each road in the roads dataframe and creates points at regular intervals of the specified distance (N). By doing so, it ensures that the generated points are evenly spaced along the road network.
302 |
303 | To maintain a consistent spatial reference, the function sets the Coordinate Reference System (CRS) of the gdf_points dataframe to match the CRS of the roads dataframe. This ensures that the points and the road network are in the same local projected CRS, measured in metres.
304 |
305 | Furthermore, to avoid duplication and redundancy, the function removes any duplicate points in the gdf_points dataframe based on the geometry column. This ensures that each point in the resulting dataframe is unique and represents a distinct location along the road network.
306 |
307 |
308 | ```python
309 | points = select_points_on_road_network(road, distance)
310 | ```
311 |
|   | geometry |
|---|----------|
| 0 | POINT (649611.194 5772295.371) |
| 1 | POINT (649609.587 5772345.345) |
| 2 | POINT (649607.938 5772395.318) |
| 3 | POINT (649606.112 5772445.285) |
| 4 | POINT (649604.286 5772495.252) |
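The following is a minimal sketch of the sampling logic described above, assuming single-part LineString geometries in a projected CRS in metres; `sample_points_along_roads` is a hypothetical stand-in for the repository's `select_points_on_road_network` function, not its actual implementation.

```python
import geopandas as gpd

def sample_points_along_roads(roads: gpd.GeoDataFrame, distance: float = 50) -> gpd.GeoDataFrame:
    """Drop a point every `distance` metres along each road geometry."""
    points = []
    for line in roads.geometry:  # assumes single-part LineStrings
        d = 0.0
        while d <= line.length:
            points.append(line.interpolate(d))  # point at distance d along the line
            d += distance
    gdf_points = gpd.GeoDataFrame(geometry=points, crs=roads.crs)
    # Remove duplicate points created where road segments share endpoints
    return gdf_points.drop_duplicates(subset="geometry").reset_index(drop=True)
```
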
342 |
343 | ![png](images/2.png)
344 |
345 |
346 | ### Step 2. Assign images to each sample point based on proximity
347 |
348 | The next step in the pipeline focuses on finding the closest features (images) for each point.
349 |
350 | To facilitate this process, the map is divided into smaller sections called tiles. Each tile represents a specific region of the map at a given zoom level. The XYZ tile scheme is employed, where each tile is identified by its zoom level (z), column (x), and row (y) coordinates. In this case, a zoom level of 14 is used, as it aligns with the supported zoom level in the Mapillary API.
351 |
352 | The get_features_on_points function utilises the mercantile.tile function from the mercantile library to determine the tile coordinates for each point in the points dataframe. By providing the latitude and longitude coordinates of a point, this function returns the corresponding tile coordinates (x, y, z) at the specified zoom level.
353 |
354 | Once the points are grouped based on their tile coordinates, the tiles are downloaded in parallel using threads. The get_features_for_tile function constructs a unique URL for each tile and sends a request to the Mapillary API to retrieve the features (images) within that specific tile.
355 |
356 | To calculate the distances between the features and the points, a k-dimensional tree (KDTree) approach is employed using the local projected CRS in metres. The KDTree is built using the geometry coordinates of the feature points. By querying the KDTree, the nearest neighbours of the points in the points dataframe are identified. The closest feature and distance information are then assigned to each point accordingly.
357 |
358 |
359 | ```python
360 | features = get_features_on_points(points, access_token, distance)
361 | ```
362 |
|   | geometry | tile | feature | distance | image_id | is_panoramic | id |
|---|----------|------|---------|----------|----------|--------------|----|
| 0 | POINT (5.18339 52.08099) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 4.750421 | 211521443868382 | False | 0 |
| 1 | POINT (5.18338 52.08144) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 0.852942 | 844492656278272 | False | 1 |
| 2 | POINT (5.18338 52.08189) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 0.787206 | 938764229999108 | False | 2 |
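A compact sketch of the two operations described above — mapping points to XYZ tiles and finding the nearest feature per point — assuming coordinate arrays in a projected CRS in metres. The helper names are illustrative, not the module's actual API.

```python
import mercantile
import numpy as np
from scipy.spatial import cKDTree

ZOOM = 14  # zoom level supported by the Mapillary vector tile API

def tile_for_point(lon: float, lat: float) -> mercantile.Tile:
    # mercantile expects (lng, lat, zoom) and returns Tile(x, y, z)
    return mercantile.tile(lon, lat, ZOOM)

def nearest_features(point_xy: np.ndarray, feature_xy: np.ndarray):
    """Both arrays have shape (n, 2), in a projected CRS in metres."""
    tree = cKDTree(feature_xy)                 # build KDTree on image locations
    distances, indices = tree.query(point_xy)  # nearest image for each sample point
    return distances, indices
```
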
409 | 410 | ### Step 3. Clean and process data
411 |
412 | In this step, the download_images_for_points function is responsible for efficiently downloading and processing images associated with the points in the GeoDataFrame to calculate the Green View Index (GVI). The function performs the following sub-steps:
413 |
414 | 1. Initialisation and Setup: The function initialises the image processing models and prepares the CSV file for storing the results. It also creates a lock object to ensure thread safety during concurrent execution.
415 |
416 | 2. Image Download and Processing: The function iterates over the rows in the GeoDataFrame and submits download tasks to a ThreadPoolExecutor for concurrent execution. Each task downloads the associated image, applies specific processing steps, and calculates the GVI. The processing steps include:
417 |
418 | - Panoramic Image Cropping using Road Centres: If the image is panoramic and destined for cropping using road centres, the following steps are followed:
419 | 1. Crop the bottom 20% band to improve analysis accuracy, focusing on critical features.
420 | 2. Apply semantic segmentation to categorise different regions or objects within the image.
421 | 3. Augment the panorama's width by wrapping the initial 25% of the image around its right edge. This addition enhances the scene's comprehensiveness.
422 | 4. Identify road centres using the segmentation output to establish the base points for cropping.
423 | 5. Crop the image based on the found road centres.
425 |
426 | ![png](images/panoramic-roads.png)
427 |
428 | - Panoramic Image without Road Centre Cropping: When dealing with panoramic images not intended for cropping via road centres, the process unfolds as follows:
429 | 1. Crop the bottom 20% band to improve analysis accuracy.
430 | 2. Apply semantic segmentation to assign labels to different regions or objects in the image.
431 | 3. Divide the image into four equal-width sections.
432 |
433 | ![png](images/panoramic-noroads.png)
434 |
435 | - Non-Panoramic Image
436 | 1. Apply semantic segmentation to assign labels to different regions or objects in the image.
437 | 2. Identify road centres using the segmentation to determine the suitability of the image. This involves ascertaining whether the camera angle captures a valuable portion of the scene for analysis.
438 | 3. If road centres cannot be identified, the image is disregarded and excluded from further analysis.
439 |
440 | ```python
441 | results = download_images_for_points(features_copy, access_token, max_workers, cut_by_road_centres, place, file_name)
442 | ```
443 |
444 |
445 | ### Step 4. Calculate GVI
446 | After each image is cleaned and processed in the previous steps, the Green View Index (GVI), representing the percentage of vegetation visible in the analysed images, is calculated.
447 |
448 | The GVI results, along with the is_panoramic flag and error flags, are collected for each image. The results are written to a CSV file, with each row corresponding to a point in the GeoDataFrame, as soon as a thread finishes its task.
449 |
450 | ![png](images/3.png)
451 |
452 | When the code finishes running, there will be a folder in "results/{place}/gvi" containing a CSV file with the results. We can load it as a GeoDataFrame using the following code.
453 |
454 | ```python
455 | import pandas as pd
456 | import geopandas as gpd
457 | from shapely.geometry import Point
458 |
459 | path = "results/De Uithof, Utrecht/gvi/gvi-points.csv"
460 | results = pd.read_csv(path)
461 |
462 | # Convert the 'geometry' column to valid Point objects
463 | results['geometry'] = results.apply(lambda row: Point(float(row["x"]), float(row["y"])), axis=1)
464 |
465 | # Convert the DataFrame to a GeoDataFrame
466 | gdf = gpd.GeoDataFrame(results, geometry='geometry', crs=4326)
467 | ```
468 |
469 | ### Step 5 (Optional). Evaluate image availability and image usability of Mapillary Image data
470 | After analysing the desired images, the image availability and usability are measured using the following equations:
471 |
472 | ![](https://latex.codecogs.com/svg.image?Image&space;Availability&space;Score&space;(IAS)&space;=&space;\frac{N_{imgassigned}}{N_{total}})
473 |
474 | ![](https://latex.codecogs.com/svg.image?Image&space;Usability&space;Score&space;(IUS)&space;=&space;\frac{N_{imgassigned&space;\land&space;GVIknown}}{N_{imgassigned}})
475 |
476 | Then, to allow comparisons between multiple cities, the adjusted score for each metric is calculated by multiplying the score by the natural logarithm of the total road length.
477 |
478 | ![](https://latex.codecogs.com/svg.image?Adjusted&space;Image&space;Availability&space;Score&space;(AIAS)&space;=&space;\frac{N_{imgassigned}}{N_{total}}\times&space;ln(roadlength))
479 |
480 | ![](https://latex.codecogs.com/svg.image?Adjusted&space;Image&space;Usability&space;Score&space;(AIUS)&space;=&space;\frac{N_{imgassigned&space;\land&space;GVIknown}}{N_{imgassigned}}\times&space;ln(roadlength))
481 |
482 | ```bash
483 | python results_metrics.py "De Uithof, Utrecht"
484 | ```
485 |
486 | To illustrate the types of images considered usable for this analysis, we provide the following examples. As can be seen, images that are centred on the road are deemed suitable for this analysis, whereas images with obstructed or limited visibility have been excluded due to their lack of useful information. This selection was made using the algorithm developed by [Matthew Danish](https://github.com/mrd/vsvi_filter).
487 |
488 | **Suitable images for the analysis**
489 | ![png](images/5.png)
490 | ![png](images/6.png)
491 |
492 | **Unsuitable images for the analysis**
493 | ![png](images/7.png)
494 | ![png](images/8.png)
495 |
496 | ### Step 6 (Optional). Model GVI for missing points
497 | Finally, the analysis employs Linear Regression and Linear Generalised Additive Models (GAM) to extract insights from the GVI points calculated in the previous step. The primary objective here is to estimate the GVI values for points with missing images.
For this purpose, the code incorporates a module developed by [Yúri Grings](https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py), which facilitates the extraction of the NDVI values from a TIF file for a given list of points of interest.
498 |
499 | To successfully execute this step, an NDVI file specific to the study area is needed. For optimal results, it is recommended to use an NDVI file that has been consistently generated for the study area throughout an entire year. Furthermore, ensure that the coordinate reference system (CRS) of the NDVI file is projected, with metres as the unit of measurement.
500 |
501 | ```bash
502 | python predict_missing_gvi.py "De Uithof, Utrecht" 32631 50
503 | ```
504 |
|      | Linear Regression | Linear GAM |
|------|-------------------|------------|
| RMSE | 0.1707            | 0.1640     |
| AIC  | -879.7232         | -899.8143  |
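A sketch of how the two models compared above can be fitted and used to predict GVI from NDVI. The data here is synthetic and the variable handling is illustrative, not the exact code in predict_missing_gvi.py.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from pygam import LinearGAM, s

# Synthetic stand-in for points where both NDVI and GVI are known
rng = np.random.default_rng(0)
ndvi_known = rng.uniform(0.0, 0.8, size=(200, 1))
gvi_known = 0.4 * ndvi_known.ravel() + rng.normal(0.0, 0.05, 200)

linreg = LinearRegression().fit(ndvi_known, gvi_known)  # ordinary least squares
gam = LinearGAM(s(0)).fit(ndvi_known, gvi_known)        # spline term on NDVI

# Predict GVI for points that had no usable street images
ndvi_missing = np.array([[0.2], [0.5], [0.7]])
gvi_pred_linreg = linreg.predict(ndvi_missing)
gvi_pred_gam = gam.predict(ndvi_missing)
```
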
524 | 525 | ![png](images/4.png)
526 |
527 |
528 | ## Acknowledgements and Contact Information
529 | Project made in collaboration with Dr. SM Labib from the Department of Human Geography and Spatial Planning at Utrecht University. This is a project of the Spatial Data Science and Geo-AI Lab, conducted for the Applied Data Science MSc degree.
530 |
531 | Ilse Abril Vázquez Sánchez
532 | i.a.vazquezsanchez@students.uu.nl
533 | GitHub profile: iabrilvzqz
534 |
535 | Dr. S.M. Labib
536 | s.m.labib@uu.nl 537 | -------------------------------------------------------------------------------- /images/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/1.png -------------------------------------------------------------------------------- /images/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/2.png -------------------------------------------------------------------------------- /images/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/3.png -------------------------------------------------------------------------------- /images/4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/4.png -------------------------------------------------------------------------------- /images/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/5.png -------------------------------------------------------------------------------- /images/6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/6.png -------------------------------------------------------------------------------- /images/7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/7.png -------------------------------------------------------------------------------- /images/8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/8.png -------------------------------------------------------------------------------- /images/panoramic-noroads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/panoramic-noroads.png -------------------------------------------------------------------------------- /images/panoramic-roads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/panoramic-roads.png -------------------------------------------------------------------------------- /images/pipeline.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/pipeline.png -------------------------------------------------------------------------------- /main_script.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['USE_PYGEOS'] = '0' 3 | 4 | import modules.process_data as process_data 5 | import modules.osmnx_road_network as road_network 6 | 7 | import geopandas as gpd 8 | from datetime import timedelta 9 | from time import time 10 | import random 11 | import sys 12 | 13 | 14 | if __name__ == "__main__": 15 | # When running the code from the terminal, this is the order in which the parameters should be entered 16 | args = sys.argv 17 | city = args[1] # Name of the city to analyse (e.g. Amsterdam, Netherlands) 18 | distance = int(args[2]) # Distance between the sample points in meters 19 | cut_by_road_centres = int(args[3]) # Determine if panoramic images are going to be cropped using the road centres 20 | access_token = args[4] # Access token for mapillary (e.g. MLY|) 21 | file_name = args[5] # Name of the csv file in which the points with the GVI value are going to be stored 22 | max_workers = int(args[6]) # Number of threads that are going to be used; a good starting point could be the number of cores of the computer 23 | num_sample_images = int(args[7]) # Number of randomly selected points whose images will be saved as samples 24 | begin = int(args[8]) if len(args) > 8 else None 25 | end = int(args[9]) if len(args) > 9 else None 26 | 27 | process_data.prepare_folders(city) 28 | 29 | file_path_features = os.path.join("results", city, "points", "points.gpkg") 30 | file_path_road = os.path.join("results", city, "roads", "roads.gpkg") 31 | 32 | if not os.path.exists(file_path_features): 33 | # Get the sample points and the features assigned to each point 34 | road = road_network.get_road_network(city) 35 | 36 | # Save road in gpkg file 37 | road["index"] = road.index 38 | road["index"] = road["index"].astype(str) 39 | road["highway"] = road["highway"].astype(str) 40 | road["length"] = road["length"].astype(float) 41 | road[["index", "geometry", "length", "highway"]].to_file(file_path_road, driver="GPKG", crs=road.crs) 42 | 43 | points = road_network.select_points_on_road_network(road, distance) 44 | features = road_network.get_features_on_points(points, access_token, distance) 45 | features.to_file(file_path_features, driver="GPKG") 46 | else: 47 | # If the points file already exists, then we use it to continue with the analysis 48 | features = gpd.read_file(file_path_features, layer="points") 49 | 50 | features = features.sort_values(by='id') 51 | 52 | # If begin and end values are provided, the dataframe is sliced and only those points are analysed 53 | if begin is not None and end is not None: 54 | features = features.iloc[begin:end] 55 | 56 | # Get a list of n random row indices 57 | sample_indices = random.sample(range(len(features)), num_sample_images) 58 | # Create a new column 'save_sample' and set it to False for all rows 59 | features["save_sample"] = False 60 | 61 | # Set True for the randomly selected rows 62 | features.loc[sample_indices, "save_sample"] = True 63 | 64 | # Get the initial time 65 | start_time = time() 66 | 67 | results = process_data.download_images_for_points(features, access_token, max_workers, cut_by_road_centres, city, file_name) 68 | 69 | # Get the final time 70 | end_time = time() 71 | 72 | # 
Calculate the elapsed time 73 | elapsed_time = end_time - start_time 74 | 75 | # Format the elapsed time as "hh:mm:ss" 76 | formatted_time = str(timedelta(seconds=elapsed_time)) 77 | 78 | print(f"Running time: {formatted_time}") -------------------------------------------------------------------------------- /mapillaryGVI.yml: -------------------------------------------------------------------------------- 1 | name: mapillaryGVI 2 | channels: 3 | - conda-forge 4 | - defaults 5 | dependencies: 6 | - ipykernel=6.21.1=pyh736e0ef_0 7 | - geopandas=0.12.2=pyhd8ed1ab_0 8 | - pandas=1.5.3=py310hecf8f37_0 9 | - pip=23.0=pyhd8ed1ab_0 10 | - planetary-computer=0.4.9=pyhd8ed1ab_0 11 | - pystac=1.6.1=pyhd8ed1ab_1 12 | - pystac-client=0.6.0=pyhd8ed1ab_0 13 | - python=3.10.9=he7542f4_0_cpython 14 | - rasterio=1.3.4=py310h3600f62_0 15 | - rioxarray=0.13.3=pyhd8ed1ab_0 16 | - scikit-learn=1.2.1=py310hcebe997_0 17 | - seaborn=0.12.2=hd8ed1ab_0 18 | - shapely=2.0.1=py310h4e43f2a_0 19 | - watchdog=2.2.1=py310h389cd99_0 20 | - xarray-spatial=0.3.5=pyhd8ed1ab_0 21 | - pip: 22 | - folium==0.14.0 23 | - numpy==1.23.0 24 | - matplotlib==3.7.1 25 | - huggingface-hub==0.14.1 26 | - mercantile==1.2.1 27 | - numpy==1.23 28 | - odc-geo==0.4.0 29 | - odc-stac==0.3.6 30 | - osmnx==1.3.0 31 | - pygam==0.8.0 32 | - scipy==1.9.0 33 | - torch==2.0.1 34 | - tqdm==4.61.1 35 | - transformers==4.29.2 36 | - vt2geojson==0.2.1 -------------------------------------------------------------------------------- /modules/availability.py: -------------------------------------------------------------------------------- 1 | # Module taken from Yúri Grings' GitHub repository 2 | # https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py 3 | 4 | # Data manipulation and analysis 5 | import numpy as np 6 | import pandas as pd 7 | 8 | # File and directory operations 9 | import os 10 | 11 | # Geospatial data processing and analysis 12 | import geopandas as gpd 13 | import osmnx as ox 14 | import networkx as nx 15 | import rioxarray 16 | import xrspatial 17 | from rasterio.enums import Resampling 18 | import pyproj 19 | import shapely.geometry as sg 20 | from shapely.ops import transform 21 | 22 | # Geospatial data access and catalogs 23 | import pystac_client 24 | import planetary_computer 25 | import odc.stac 26 | 27 | # Date and time manipulation 28 | from datetime import datetime, timedelta 29 | from time import time 30 | 31 | # Progress tracking 32 | from tqdm import tqdm 33 | 34 | # Images 35 | from PIL import Image 36 | import requests 37 | from io import BytesIO 38 | 39 | ##### MAIN FUNCTIONS 40 | def get_mean_NDVI(point_of_interest_file, ndvi_raster_file=None, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None, 41 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, year=datetime.now().year, 42 | write_to_file=True, save_ndvi=True, output_dir=os.getcwd()): 43 | ### Step 1: Read and process user inputs, check conditions 44 | poi = gpd.read_file(point_of_interest_file) 45 | # Verify that locations are either all provided using point geometries or all provided using polygon geometries 46 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'): 47 | geom_type = poi.iloc[0]['geometry'].geom_type 48 | else: 49 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function") 50 | 51 | # Make sure the type of polygon is specified if poi file contains polygon geometries 52 | if geom_type == 
"Polygon": 53 | if polygon_type not in ["neighbourhood", "house"]: 54 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 55 | 56 | # In case of house polygons, transform to centroids 57 | if geom_type == "Polygon": 58 | if polygon_type not in ["neighbourhood", "house"]: 59 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 60 | if polygon_type == "house": 61 | print("Changing geometry type to Point by computing polygon centroids...") 62 | poi['geometry'] = poi['geometry'].centroid 63 | geom_type = poi.iloc[0]['geometry'].geom_type 64 | print("Done \n") 65 | 66 | # Make sure buffer distance and type are set in case of point geometries 67 | if geom_type == "Point": 68 | if buffer_type not in ["euclidean", "network"]: 69 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function") 70 | 71 | # Make sure CRS is projected rather than geographic 72 | if not poi.crs.is_projected: 73 | if crs_epsg is None: 74 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395") 75 | epsg = 3395 76 | poi.to_crs(f"EPSG:{epsg}", inplace=True) 77 | else: 78 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified") 79 | epsg = crs_epsg 80 | poi.to_crs(f"EPSG:{epsg}", inplace=True) 81 | else: 82 | epsg = poi.crs.to_epsg() 83 | 84 | # Create epsg transformer to use planetary computer and OSM 85 | epsg_transformer = pyproj.Transformer.from_crs(f"epsg:{epsg}", "epsg:4326") 86 | 87 | # Make sure poi dataframe contains ID column 88 | if "id" in poi.columns: 89 | if poi['id'].isnull().values.any(): 90 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int) 91 | else: 92 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int) 93 | 94 | # Make sure the buffer_type argument has a valid value if not None 95 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]: 96 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function") 97 | 98 | # If buffer type is set to euclidean, make sure that the buffer distance is set 99 | if buffer_type == "euclidean": 100 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0): 101 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer") 102 | 103 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set 104 | if buffer_type == "network": 105 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0): 106 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)): 107 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers") 108 | else: 109 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time 110 | # Convert km per hour to m per minute 111 | meters_per_minute = travel_speed * 1000 / 60 112 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects 113 | buffer_dist = trip_time * meters_per_minute * 1.25 114 | 
else: 115 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time 116 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0: 117 | raise TypeError("Please make sure that one of the following requirements is met:\ 118 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\ 119 | \n2. If travel_speed and trip_time are set, buffer_dist shoud not be set") 120 | speed_time = False 121 | 122 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified 123 | if buffer_dist is None: 124 | poi_polygon = sg.box(*poi.total_bounds) 125 | else: 126 | poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist) 127 | 128 | # Retrieve NDVI raster, use planetary computer if not provided by user 129 | if ndvi_raster_file is None: 130 | print("Retrieving NDVI raster through planetary computer...") 131 | start_ndvi_retrieval = time() 132 | 133 | # Transform CRS to comply with planetary computer requirements 134 | bounding_box_pc = transform(epsg_transformer.transform, poi_polygon).bounds 135 | # Swap coords order to match with planetary computer format 136 | bounding_box_pc = [bounding_box_pc[1], bounding_box_pc[0], bounding_box_pc[3], bounding_box_pc[2]] 137 | 138 | # Query planetary computer 139 | catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1",modifier=planetary_computer.sign_inplace) 140 | # Obtain Area of Interest 141 | time_of_interest = f"{year}-01-01/{year}-12-30" 142 | # Search Data 143 | search = catalog.search(collections=["sentinel-2-l2a"], 144 | bbox=bounding_box_pc, 145 | datetime=time_of_interest, 146 | query={"eo:cloud_cover": {"lt": 20}}) 147 | # Obtain Data 148 | items = search.item_collection() 149 | # Create dataframe from planetary computer's item collection dictionary 150 | items_df = gpd.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326") 151 | # Make sure only images are maintained that contain all points/polygons of interest 152 | items_df_poi = items_df[items_df.geometry.contains(sg.box(*bounding_box_pc))] 153 | # Determine lowest percentage of cloud cover among filtered items 154 | lowest_cloud_cover = items_df_poi['eo:cloud_cover'].min() 155 | # Filter the satellite image which has the lowest cloud cover percentage 156 | item_to_select = items_df_poi[items_df_poi['eo:cloud_cover'] == lowest_cloud_cover] 157 | # Select item that matches the filters above and will be used to compose ndvi raster 158 | selected_item = next(item for item in items if item.properties["s2:granule_id"] == item_to_select.iloc[0]['s2:granule_id']) 159 | # Obtain Bands of Interest 160 | selected_item_data = odc.stac.stac_load([selected_item], bands = ['red', 'green', 'blue', 'nir'], bbox = bounding_box_pc).isel(time=0) 161 | # Calculate NDVI values 162 | ndvi = xrspatial.multispectral.ndvi(selected_item_data['nir'], selected_item_data['red']) 163 | # Reproject to original poi CRS 164 | ndvi_src = ndvi.rio.reproject(f"EPSG:{epsg}", resampling= Resampling.nearest, nodata=np.nan) 165 | 166 | # Provide information on satellite image that was used to user 167 | print(f"Information on the satellite image retrieved from planetary computer, use to calculate NDVI values:\ 168 | \n Date on which image was generated: {selected_item.properties['s2:generation_time']}\ 169 | \n Percentage of cloud cover: {selected_item.properties['eo:cloud_cover']}\ 170 | \n Percentage of pixels with missing data 
{selected_item.properties['s2:nodata_pixel_percentage']}") 171 | 172 | # Save satellite image that was used in case user specifies so 173 | if save_ndvi: 174 | # Retrieve the image URL 175 | image_url = selected_item.assets["rendered_preview"].href 176 | # Download the image data 177 | response = requests.get(image_url) 178 | # Create a PIL Image object from the downloaded image data 179 | image = Image.open(BytesIO(response.content)) 180 | # Create directory if the one specified by the user does not yet exist 181 | if not os.path.exists(output_dir): 182 | os.makedirs(output_dir) 183 | # Get filename of the poi file to append information to it 184 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file)) 185 | # Save the image to a file 186 | image.save(os.path.join(output_dir, f"{input_filename}_ndvi_satellite_image.png")) 187 | ndvi_src.rio.to_raster(os.path.join(output_dir, f"{input_filename}_ndvi_raster.tif")) 188 | print("Satellite image and created NDVI raster successfully saved to file") 189 | end_ndvi_retrieval = time() 190 | elapsed_ndvi_retrieval = end_ndvi_retrieval - start_ndvi_retrieval 191 | print(f"Done, running time: {str(timedelta(seconds=elapsed_ndvi_retrieval))} \n") 192 | else: 193 | # Read ndvi raster provided by user 194 | ndvi_src = rioxarray.open_rasterio(ndvi_raster_file) 195 | # Make sure that ndvi raster has same CRS as poi file 196 | if not ndvi_src.rio.crs.to_epsg() == epsg: 197 | print("Adjusting CRS of NDVI file to match with Point of Interest CRS...") 198 | ndvi_src.rio.write_crs(f'EPSG:{epsg}', inplace=True) 199 | print("Done \n") 200 | 201 | # Make sure all points of interest are within or do at least intersect (in case of polygons) the NDVI raster provided 202 | if not all(geom.within(sg.box(*ndvi_src.rio.bounds())) for geom in poi['geometry']): 203 | if geom_type == "Point": 204 | raise ValueError("Not all points of interest are within the NDVI file provided, please make sure they are and re-run the function") 205 | else: 206 | if not all(geom.intersects(sg.box(*ndvi_src.rio.bounds())) for geom in poi['geometry']): 207 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the NDVI file provided, please make sure they are/do and re-run the function") 208 | else: 209 | print("Warning: Not all polygons of interest are completely within the area covered by the NDVI file provided, results will be based on intersecting part of polygons involved \n") 210 | 211 | ### Step 2: Construct the Area of Interest based on the arguments as defined by user 212 | if buffer_type is None: 213 | # Buffer type == None implies that provided polygons serve as areas of interest 214 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry']) 215 | else: 216 | if buffer_type == "euclidean": 217 | # Create area of interest based on euclidean distance 218 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist)) 219 | else: 220 | # Make sure network type argument has valid value 221 | if network_type not in ["walk", "bike", "drive", "all"]: 222 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function") 223 | 224 | # If poi file still contains polygon geometries, compute centroids so that isochrone maps can be created 225 | if geom_type == "Polygon": 226 | print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...") 227 | poi['geometry'] = 
poi['geometry'].centroid 228 | print("Done \n") 229 | 230 | print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...") 231 | start_network_retrieval = time() 232 | # Transform total bounds polygon of poi file to 4326 for OSM 233 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326") 234 | # Extract polygon in EPSG 4326 235 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0] 236 | # Retrieve street network for desired network type 237 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type) 238 | # Project street network graph back to original poi CRS 239 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}") 240 | end_network_retrieval = time() 241 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval 242 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n") 243 | 244 | # Compute isochrone areas for points of interest 245 | aoi_geometry = [] 246 | for geom in tqdm(poi['geometry'], desc = 'Retrieving isochrone for point(s) of interest'): 247 | # Find node which is closest to point location as base for next steps 248 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y) 249 | # Create subgraph around point of interest for efficiency purposes 250 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length") 251 | # Calculate the time it takes to cover each edge's distance if speed_time is True 252 | if speed_time: 253 | for _, _, _, data in buffer_graph.edges(data=True, keys=True): 254 | data["time"] = data["length"] / meters_per_minute 255 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters 256 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time") 257 | else: 258 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters 259 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length") 260 | # Compute isochrones, see separate function for line by line explanation 261 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph) 262 | aoi_geometry.append(isochrone_poly) 263 | 264 | # Create geodataframe of isochrone geometries 265 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}") 266 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n") 267 | 268 | ### Step 3: Calculate mean NDVI values and write results to file 269 | print("Calculating mean NDVI values...") 270 | start_calc = time() 271 | # Check whether areas of interest, created in previous steps, are fully covered by the ndvi raster, provide warning if not 272 | if not all(geom.within(sg.box(*ndvi_src.rio.bounds())) for geom in aoi_gdf['geometry']): 273 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the NDVI raster, note that results will be based on the intersecting part of the buffer zone") 274 | # Calculate mean ndvi for geometries in poi file 275 | poi['mean_NDVI'] = aoi_gdf.apply(lambda row: ndvi_src.rio.clip([row.geometry]).clip(min=0).mean().values.round(3), axis=1) 276 | end_calc = time() 277 | elapsed_calc = end_calc - start_calc 278 | print(f"Done, running time: 
{str(timedelta(seconds=elapsed_calc))} \n") 279 | 280 | if write_to_file: 281 | print("Writing results to new geopackage file in specified directory...") 282 | # Create directory if output directory specified by user does not yet exist 283 | if not os.path.exists(output_dir): 284 | os.makedirs(output_dir) 285 | # Retrieve filename from original poi file to add information to it while writing to file 286 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file)) 287 | poi.to_file(os.path.join(output_dir, f"{input_filename}_ndvi_added.gpkg"), driver="GPKG") 288 | print("Done") 289 | 290 | return poi 291 | 292 | def get_landcover_percentages(point_of_interest_file, landcover_raster_file=None, crs_epsg=None, polygon_type="neighbourhood", 293 | buffer_type=None, buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, 294 | write_to_file=True, save_lulc=True, output_dir=os.getcwd()): 295 | ### Step 1: Read and process user input, check conditions 296 | poi = gpd.read_file(point_of_interest_file) 297 | # Make sure that geometries in poi file are either all provided using point geometries or all using polygon geometries 298 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'): 299 | geom_type = poi.iloc[0]['geometry'].geom_type 300 | else: 301 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function") 302 | 303 | # Make sure type of polygon is specified in case poi file contains polygon geometries 304 | if geom_type == "Polygon": 305 | if polygon_type not in ["neighbourhood", "house"]: 306 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 307 | 308 | # In case of house polygons, transform to centroids 309 | if geom_type == "Polygon": 310 | if polygon_type not in ["neighbourhood", "house"]: 311 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 312 | if polygon_type == "house": 313 | print("Changing geometry type to Point by computing polygon centroids...") 314 | poi['geometry'] = poi['geometry'].centroid 315 | geom_type = poi.iloc[0]['geometry'].geom_type 316 | print("Done \n") 317 | 318 | # Make sure buffer distance and type are set in case of point geometries 319 | if geom_type == "Point": 320 | if buffer_type not in ["euclidean", "network"]: 321 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function") 322 | 323 | # Make sure CRS is projected rather than geographic 324 | if not poi.crs.is_projected: 325 | if crs_epsg is None: 326 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395") 327 | epsg = 3395 328 | poi.to_crs(f"EPSG:{epsg}", inplace=True) 329 | else: 330 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified") 331 | epsg = crs_epsg 332 | poi.to_crs(f"EPSG:{epsg}", inplace=True) 333 | else: 334 | epsg = poi.crs.to_epsg() 335 | 336 | # Make sure poi dataframe contains ID column 337 | if "id" in poi.columns: 338 | if poi['id'].isnull().values.any(): 339 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int) 340 | else: 341 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int) 342 | 343 | # Make sure the buffer_type argument has a valid value if not 
None 344 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]: 345 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function") 346 | 347 | # If buffer type is set to euclidean, make sure that the buffer distance is set 348 | if buffer_type == "euclidean": 349 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0): 350 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer") 351 | 352 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set 353 | if buffer_type == "network": 354 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0): 355 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)): 356 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers") 357 | else: 358 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time 359 | # Convert km per hour to m per minute 360 | meters_per_minute = travel_speed * 1000 / 60 361 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects 362 | buffer_dist = trip_time * meters_per_minute * 1.25 363 | else: 364 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time 365 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0: 366 | raise TypeError("Please make sure that one of the following requirements is met:\ 367 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\ 368 | \n2. If travel_speed and trip_time are set, buffer_dist shoud not be set") 369 | speed_time = False 370 | 371 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. 
371 |     # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
372 |     if buffer_dist is None:
373 |         poi_polygon = sg.box(*poi.total_bounds)
374 |     else:
375 |         poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
376 | 
377 |     if landcover_raster_file is None:
378 |         # Create epsg transformer to use planetary computer
379 |         epsg_transformer = pyproj.Transformer.from_crs(f"epsg:{epsg}", "epsg:4326")
380 |         print("Retrieving landcover class raster through planetary computer...")
381 |         start_landcover_retrieval = time()
382 |         # Transform CRS to comply with planetary computer requirements
383 |         bounding_box_pc = transform(epsg_transformer.transform, poi_polygon).bounds
384 |         # Swap coords order to match with planetary computer format
385 |         bounding_box_pc = [bounding_box_pc[1], bounding_box_pc[0], bounding_box_pc[3], bounding_box_pc[2]]
386 | 
387 |         # Query planetary computer
388 |         catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace)
389 |         search = catalog.search(
390 |             collections=["esa-worldcover"],
391 |             bbox=bounding_box_pc,
392 |         )
393 |         # Retrieve the items and select the first, most recent one
394 |         items = search.item_collection()
395 |         selected_item = items[0]
396 |         # Extract landcover classes and store in dictionary to use in later stage
397 |         class_list = selected_item.assets["map"].extra_fields["classification:classes"]
398 |         classmap = {
399 |             c["value"]: c["description"]
400 |             for c in class_list
401 |         }
402 | 
403 |         # Load raster using rioxarray
404 |         landcover = rioxarray.open_rasterio(selected_item.assets["map"].href)
405 |         # Clip raster to bounds of geometries in poi file
406 |         landcover_clip = landcover.rio.clip_box(*bounding_box_pc)
407 |         # Reproject to original poi file CRS
408 |         landcover_src = landcover_clip.rio.reproject(f"EPSG:{epsg}", resampling=Resampling.nearest)
409 | 
410 |         # Provide landcover image information to user
411 |         print(f"Information on the land cover image retrieved from planetary computer:\
412 |             \n Image description: {selected_item.properties['description']}\
413 |             \n Image timeframe: {selected_item.properties['start_datetime']} - {selected_item.properties['end_datetime']}")
414 | 
415 |         if save_lulc:
416 |             # Create directory if the one specified by user does not yet exist
417 |             if not os.path.exists(output_dir):
418 |                 os.makedirs(output_dir)
419 |             # Extract filename of poi file to add information when writing to file
420 |             input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
421 |             # Write landcover raster to file
422 |             landcover_src.rio.to_raster(os.path.join(output_dir, f"{input_filename}_lulc_raster.tif"))
423 |             print("Landcover image successfully saved to raster file")
424 |         end_landcover_retrieval = time()
425 |         elapsed_landcover_retrieval = end_landcover_retrieval - start_landcover_retrieval
426 |         print(f"Done, running time: {str(timedelta(seconds=elapsed_landcover_retrieval))} \n")
427 |     else:
428 |         landcover_src = rioxarray.open_rasterio(landcover_raster_file)
429 |         # Make sure landcover raster has same CRS as poi file
430 |         if not landcover_src.rio.crs.to_epsg() == epsg:
431 |             print("Adjusting CRS of land cover file to match with Point of Interest CRS...")
432 |             landcover_src.rio.write_crs(f'EPSG:{epsg}', inplace=True)
433 |             print("Done \n")
434 | 
435 |     # Make sure all points of interest are within or do at least intersect (in case of polygons) the landcover raster provided
436 |     if not all(geom.within(sg.box(*landcover_src.rio.bounds())) for geom in poi['geometry']):
437 |         if geom_type
== "Point": 438 | raise ValueError("Not all points of interest are within the landcover file provided, please make sure they are and re-run the function") 439 | else: 440 | if not all(geom.intersects(sg.box(*landcover_src.rio.bounds())) for geom in poi['geometry']): 441 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the landcover file provided, please make sure they are/do and re-run the function") 442 | else: 443 | print("Warning: Not all polygons of interest are completely within the area covered by the landcover file provided, results will be based on intersecting part of polygons involved \n") 444 | 445 | ### Step 2: Construct the Area of Interest based on the arguments as defined by user 446 | if buffer_type is None: 447 | # Buffer type == None implies that polygons in poi file serve as areas of interest 448 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry']) 449 | else: 450 | # Make sure buffer_dist is set in case buffer_type set to euclidean 451 | if buffer_type == "euclidean": 452 | # Create area of interest based on euclidean distance 453 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist)) 454 | else: 455 | # Make sure network_type argument has valid value 456 | if network_type not in ["walk", "bike", "drive", "all"]: 457 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function") 458 | 459 | # In case poi still contains polygon geometries, compute centroids so that isochrones can be created 460 | if geom_type == "Polygon": 461 | print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...") 462 | poi['geometry'] = poi['geometry'].centroid 463 | print("Done \n") 464 | 465 | print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...") 466 | start_network_retrieval = time() 467 | # Transform bounds polygon of poi file to 4326 for OSM 468 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326") 469 | # Extract polygon in EPSG 4326 470 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0] 471 | # Retrieve street network for desired network type 472 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type) 473 | # Project street network graph back to original poi CRS 474 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}") 475 | end_network_retrieval = time() 476 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval 477 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n") 478 | 479 | # Compose area of interest based on isochrones 480 | aoi_geometry = [] 481 | for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'): 482 | # Find node which is closest to point location as base for next steps 483 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y) 484 | # Create subgraph for efficiency purposes 485 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length") 486 | # Calculate the time it takes to cover each edge's distance if speed_time is True 487 | if speed_time: 488 | for _, _, _, data in buffer_graph.edges(data=True, keys=True): 489 | data["time"] = data["length"] / meters_per_minute 490 | # Create sub graph of the street network which contains only parts which can be reached within 
specified travel parameters 491 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time") 492 | else: 493 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters 494 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length") 495 | # Compute isochrones, see separate function for line by line explanation 496 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph) 497 | aoi_geometry.append(isochrone_poly) 498 | 499 | # Create dataframe of isochrone polygons 500 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}") 501 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n") 502 | 503 | ### Step 3: Perform calculations and write results to file 504 | print("Calculating landcover class percentages...") 505 | start_calc = time() 506 | # Check if areas of interest, resulting from previous steps, are fully covered by landcover raster, provide warning if not 507 | if not all(geom.within(sg.box(*landcover_src.rio.bounds())) for geom in aoi_gdf['geometry']): 508 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the landcover raster, note that results will be based on the intersecting part of the buffer zone") 509 | 510 | # apply the landcover percentage function to each geometry in the GeoDataFrame and create a new Pandas Series 511 | landcover_percentages_series = aoi_gdf.geometry.apply(lambda x: pd.Series(calculate_landcover_percentages(landcover_src=landcover_src, geometry=x))) 512 | # rename the columns with the landcover class values 513 | if landcover_raster_file is None: 514 | landcover_percentages_series = landcover_percentages_series.rename(columns=lambda x: str(classmap.get(x, x))) 515 | else: 516 | landcover_percentages_series.columns = ["class_" + str(col) for col in landcover_percentages_series.columns] 517 | # concatenate the new series to the original dataframe 518 | poi = pd.concat([poi, landcover_percentages_series], axis=1) 519 | end_calc = time() 520 | elapsed_calc = end_calc - start_calc 521 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n") 522 | 523 | if write_to_file: 524 | print("Writing results to new geopackage file in specified directory...") 525 | # Create output directory if the one specified by user does not yet exist 526 | if not os.path.exists(output_dir): 527 | os.makedirs(output_dir) 528 | # Extract poi filename to add information to it when writing to file 529 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file)) 530 | poi.to_file(os.path.join(output_dir, f"{input_filename}_LCperc_added.gpkg"), driver="GPKG") 531 | print("Done") 532 | 533 | return poi 534 | 535 | def get_canopy_percentage(point_of_interest_file, canopy_vector_file, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None, 536 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, write_to_file=True, output_dir=os.getcwd()): 537 | ### Step 1: Read and process user input, check conditions 538 | poi = gpd.read_file(point_of_interest_file) 539 | # Make sure geometries of poi file are either all provided using point geometries or all using polygon geometries 540 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'): 541 | geom_type = 
poi.iloc[0]['geometry'].geom_type
542 |     else:
543 |         raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function")
544 | 
545 |     # Make sure type of polygon is specified in case poi file contains polygon geometries
546 |     if geom_type == "Polygon":
547 |         if polygon_type not in ["neighbourhood", "house"]:
548 |             raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
549 | 
550 |     # In case of house polygons, transform to centroids
551 |     if geom_type == "Polygon":
552 |         if polygon_type not in ["neighbourhood", "house"]:
553 |             raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
554 |         if polygon_type == "house":
555 |             print("Changing geometry type to Point by computing polygon centroids...")
556 |             poi['geometry'] = poi['geometry'].centroid
557 |             geom_type = poi.iloc[0]['geometry'].geom_type
558 |             print("Done \n")
559 | 
560 |     # Make sure buffer distance and type are set in case of point geometries
561 |     if geom_type == "Point":
562 |         if buffer_type not in ["euclidean", "network"]:
563 |             raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
564 | 
565 |     # Make sure CRS is projected rather than geographic
566 |     if not poi.crs.is_projected:
567 |         if crs_epsg is None:
568 |             print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
569 |             epsg = 3395
570 |             poi.to_crs(f"EPSG:{epsg}", inplace=True)
571 |         else:
572 |             print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
573 |             epsg = crs_epsg
574 |             poi.to_crs(f"EPSG:{epsg}", inplace=True)
575 |     else:
576 |         epsg = poi.crs.to_epsg()
577 | 
578 |     # Make sure poi dataframe contains ID column
579 |     if "id" in poi.columns:
580 |         if poi['id'].isnull().values.any():
581 |             poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
582 |     else:
583 |         poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
584 | 
585 |     # Retrieve tree canopy data
586 |     canopy_src = gpd.read_file(canopy_vector_file)
587 |     # Make sure geometries in canopy file are of polygon or multipolygon type as areas need to be calculated
588 |     if not (canopy_src['geometry'].geom_type.isin(['Polygon', 'MultiPolygon']).all()):
589 |         raise ValueError("Please make sure all geometries of the tree canopy file are of 'Polygon' or 'MultiPolygon' type and re-run the function")
590 | 
591 |     # Make sure canopy file has same CRS as poi file
592 |     if not canopy_src.crs.to_epsg() == epsg:
593 |         print("Adjusting CRS of tree canopy file to match with Point of Interest CRS...")
594 |         canopy_src.to_crs(f'EPSG:{epsg}', inplace=True)
595 |         print("Done \n")
596 | 
597 |     # Make sure all points of interest are within or do at least intersect (in case of polygons) the tree canopy file provided
598 |     if not all(geom.within(sg.box(*canopy_src.total_bounds)) for geom in poi['geometry']):
599 |         if geom_type == "Point":
600 |             raise ValueError("Not all points of interest are within the tree canopy file provided, please make sure they are and re-run the function")
601 |         else:
602 |             if not all(geom.intersects(sg.box(*canopy_src.total_bounds)) for geom in poi['geometry']):
603 |                 raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the tree canopy file provided, please make sure they are/do and re-run the function")
604 |             else:
605 |                 print("Warning: Not all polygons of interest are completely within the area covered by the tree canopy file provided, results will be based on intersecting part of polygons involved \n")
606 | 
607 |     # Make sure the buffer_type argument has a valid value if not None
608 |     if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
609 |         raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
610 | 
611 |     # If buffer type is set to euclidean, make sure that the buffer distance is set
612 |     if buffer_type == "euclidean":
613 |         if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
614 |             raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
615 | 
616 |     # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
617 |     if buffer_type == "network":
618 |         if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
619 |             if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
620 |                 raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
621 |             else:
622 |                 speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
623 |                 # Convert km per hour to m per minute
624 |                 meters_per_minute = travel_speed * 1000 / 60
625 |                 # Calculate max distance that can be travelled based on arguments specified by user and add 25% to account for edge effects
626 |                 buffer_dist = trip_time * meters_per_minute * 1.25
627 |         else:
628 |             # Buffer_dist and combination of travel_speed and trip_time cannot be set at the same time
629 |             if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
630 |                 raise TypeError("Please make sure that one of the following requirements is met:\
631 |                     \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
632 |                     \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
633 |             speed_time = False
634 | 
635 |     # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
636 |     if buffer_dist is None:
637 |         poi_polygon = sg.box(*poi.total_bounds)
638 |     else:
639 |         poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
640 | 
641 |     ### Step 2: Construct the Area of Interest based on the arguments as defined by user
642 |     if buffer_type is None:
643 |         # Buffer type == None implies that polygon geometries serve as areas of interest
644 |         aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
645 |     else:
646 |         # Make sure buffer dist is set in case buffer type set to euclidean
647 |         if buffer_type == "euclidean":
648 |             # Create area of interest based on euclidean buffer
649 |             aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
650 |         else:
651 |             # Make sure network_type has valid value
652 |             if network_type not in ["walk", "bike", "drive", "all"]:
653 |                 raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike', 'drive' or 'all', and re-run the function")
654 | 
655 |             # In case poi still contains polygon geometries, compute centroids so that isochrones can be created
656 |             if geom_type == "Polygon":
657 |                 print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...")
658 |                 poi['geometry'] = poi['geometry'].centroid
659 |                 print("Done \n")
660 | 
661 |             print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...")
662 |             start_network_retrieval = time()
663 |             # Transform bounds polygon of poi file to 4326 for OSM
664 |             polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
665 |             # Extract polygon in EPSG 4326
666 |             wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
667 |             # Retrieve street network for desired network type
668 |             graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
669 |             # Project street network graph back to original poi CRS
670 |             graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
671 |             end_network_retrieval = time()
672 |             elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
673 |             print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
674 | 
675 |             aoi_geometry = []
676 |             for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'):
677 |                 # Find node which is closest to point location as base for next steps
678 |                 center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
679 |                 # Create subgraph around poi for efficiency purposes
680 |                 buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
681 |                 # Calculate the time it takes to cover each edge's distance if speed_time is True
682 |                 if speed_time:
683 |                     for _, _, _, data in buffer_graph.edges(data=True, keys=True):
684 |                         data["time"] = data["length"] / meters_per_minute
685 |                     # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
686 |                     subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
687 |                 else:
688 |                     # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
689 |                     subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
690 |                 # Compute isochrones, see separate function for line by line explanation
691 |                 isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
692 |                 aoi_geometry.append(isochrone_poly)
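The loop above applies a two-stage `nx.ego_graph` pattern: first a cheap, length-bounded subgraph around the point (with a generous radius), then the exact time-bounded isochrone computed on that smaller graph. A self-contained sketch of the same pattern on a toy graph (node names and edge lengths are illustrative, not from the pipeline):

```python
# Two-stage ego_graph pattern, minimal sketch on a toy network.
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("a", "b", length=400)
G.add_edge("b", "c", length=400)
G.add_edge("c", "d", length=400)

meters_per_minute = 83.33  # e.g. 5 km/h walking speed
for _, _, data in G.edges(data=True):
    data["time"] = data["length"] / meters_per_minute  # minutes per edge

# Stage 1: coarse distance-based pre-filter (radius in metres)
pre = nx.ego_graph(G, "a", radius=1000, distance="length")  # reaches a, b, c
# Stage 2: exact time-based isochrone on the smaller graph (radius in minutes)
iso = nx.ego_graph(pre, "a", radius=5, distance="time")     # 5 min -> a, b
print(sorted(pre.nodes), sorted(iso.nodes))  # ['a', 'b', 'c'] ['a', 'b']
```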
693 | 694 | # Create dataframe of isochrone polygons 695 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}") 696 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n") 697 | 698 | 699 | ### Step 3: Perform calculations and write results to file 700 | print("Calculating percentage of tree canopy coverage...") 701 | start_calc = time() 702 | # Check whether areas of interest, resulting from previous steps, are fully covered by tree canopy file, provide warning if not 703 | if not all(geom.within(sg.box(*canopy_src.total_bounds)) for geom in aoi_gdf['geometry']): 704 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the tree canopy file, note that results will be based on the intersecting part of the buffer zone") 705 | 706 | # Calculate percentage of tree canopy cover 707 | poi['canopy_cover'] = aoi_gdf.apply(lambda row: str(((canopy_src.clip(row.geometry).area.sum()/row.geometry.area)*100).round(2))+'%', axis=1) 708 | end_calc = time() 709 | elapsed_calc = end_calc - start_calc 710 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n") 711 | 712 | if write_to_file: 713 | print("Writing results to new geopackage file in specified directory...") 714 | # Create directory if the one specified by the user does not yet exist 715 | if not os.path.exists(output_dir): 716 | os.makedirs(output_dir) 717 | # Extract filename of poi file to add information to it when writing to file 718 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file)) 719 | poi.to_file(os.path.join(output_dir, f"{input_filename}_CanopyPerc_added.gpkg"), driver="GPKG") 720 | print("Done") 721 | 722 | return poi 723 | 724 | def get_park_percentage(point_of_interest_file, park_vector_file=None, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None, 725 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, write_to_file=True, 726 | output_dir=os.getcwd()): 727 | ### Step 1: Read and process user input, check conditions 728 | poi = gpd.read_file(point_of_interest_file) 729 | # Make sure geometries of poi file are either all provided using point geometries or all using polygon geometries 730 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'): 731 | geom_type = poi.iloc[0]['geometry'].geom_type 732 | else: 733 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function") 734 | 735 | # Make sure type of polygon is specified in case poi file contains polygon geometries 736 | if geom_type == "Polygon": 737 | if polygon_type not in ["neighbourhood", "house"]: 738 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 739 | 740 | # In case of house polygons, transform to centroids 741 | if geom_type == "Polygon": 742 | if polygon_type not in ["neighbourhood", "house"]: 743 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'") 744 | if polygon_type == "house": 745 | print("Changing geometry type to Point by computing polygon centroids...") 746 | poi['geometry'] = poi['geometry'].centroid 747 | geom_type = poi.iloc[0]['geometry'].geom_type 748 | print("Done \n") 749 | 750 | # Make sure buffer distance and type are set in case of point geometries 
751 |     if geom_type == "Point":
752 |         if buffer_type not in ["euclidean", "network"]:
753 |             raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
754 | 
755 |     # Make sure CRS is projected rather than geographic
756 |     if not poi.crs.is_projected:
757 |         if crs_epsg is None:
758 |             print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
759 |             epsg = 3395
760 |             poi.to_crs(f"EPSG:{epsg}", inplace=True)
761 |         else:
762 |             print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
763 |             epsg = crs_epsg
764 |             poi.to_crs(f"EPSG:{epsg}", inplace=True)
765 |     else:
766 |         epsg = poi.crs.to_epsg()
767 | 
768 |     # Make sure poi dataframe contains ID column
769 |     if "id" in poi.columns:
770 |         if poi['id'].isnull().values.any():
771 |             poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
772 |     else:
773 |         poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
774 | 
775 |     # Make sure the buffer_type argument has a valid value if not None
776 |     if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
777 |         raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
778 | 
779 |     # If buffer type is set to euclidean, make sure that the buffer distance is set
780 |     if buffer_type == "euclidean":
781 |         if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
782 |             raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
783 | 
784 |     # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
785 |     if buffer_type == "network":
786 |         if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
787 |             if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
788 |                 raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
789 |             else:
790 |                 speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
791 |                 # Convert km per hour to m per minute
792 |                 meters_per_minute = travel_speed * 1000 / 60
793 |                 # Calculate max distance that can be travelled based on arguments specified by user and add 25% to account for edge effects
794 |                 buffer_dist = trip_time * meters_per_minute * 1.25
795 |         else:
796 |             # Buffer_dist and combination of travel_speed and trip_time cannot be set at the same time
797 |             if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
798 |                 raise TypeError("Please make sure that one of the following requirements is met:\
799 |                     \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
800 |                     \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
801 |             speed_time = False
802 | 
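The CRS check that each of these functions repeats exists because `buffer()` operates in the units of the active CRS: buffering by 500 in a geographic CRS would mean 500 degrees, not 500 metres, hence the forced projection to the metre-based EPSG:3395 fallback. A minimal sketch of the safe pattern (the input file name is hypothetical):

```python
# Why a projected CRS is required before buffering; file name is hypothetical.
import geopandas as gpd

poi = gpd.read_file("points.gpkg")      # hypothetical PoI file in EPSG:4326
if not poi.crs.is_projected:
    poi = poi.to_crs("EPSG:3395")       # World Mercator, units in metres
buffered = poi.geometry.buffer(500)     # now a genuine 500 m buffer
```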
803 |     # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
804 |     if buffer_dist is None:
805 |         poi_polygon = sg.box(*poi.total_bounds)
806 |     else:
807 |         poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
808 |     # Transform to 4326 for OSM
809 |     polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
810 |     # Extract polygon in EPSG 4326
811 |     wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
812 | 
813 |     ### Step 2: Read park data, retrieve from OSM if not provided by user
814 |     if park_vector_file is None:
815 |         print(f"Retrieving parks within total bounds of {geom_type}(s) of interest, extended by buffer distance if specified...")
816 |         start_park_retrieval = time()
817 |         # Tags seen as Urban Greenspace (UGS) require the following:
818 |         # 1. The tag represents an area
819 |         # 2. The area is outdoors
820 |         # 3. The area is (semi-)publicly available
821 |         # 4. The area is likely to contain trees, grass and/or greenery
822 |         # 5. The area can reasonably be used for walking or recreational activities
823 |         park_tags = {'landuse':['allotments','forest','greenfield','village_green'], 'leisure':['garden','fitness_station','nature_reserve','park','playground'],'natural':'grassland'}
824 |         # Extract parks from OpenStreetMap
825 |         park_src = ox.geometries_from_polygon(wgs_polygon, tags=park_tags)
826 |         # Change CRS to the same one as poi file
827 |         park_src.to_crs(f"EPSG:{epsg}", inplace=True)
828 |         # Create a boolean mask to filter out polygons and multipolygons
829 |         polygon_mask = park_src['geometry'].apply(lambda geom: geom.geom_type in ['Polygon', 'MultiPolygon'])
830 |         # Filter the GeoDataFrame to keep only polygons and multipolygons
831 |         park_src = park_src.loc[polygon_mask]
832 |         end_park_retrieval = time()
833 |         elapsed_park_retrieval = end_park_retrieval - start_park_retrieval
834 |         print(f"Done, running time: {str(timedelta(seconds=elapsed_park_retrieval))} \n")
835 |     else:
836 |         park_src = gpd.read_file(park_vector_file)
837 |         # Make sure geometries are all polygons or multipolygons as areas should be calculated
838 |         if not (park_src['geometry'].geom_type.isin(['Polygon', 'MultiPolygon']).all()):
839 |             raise ValueError("Please make sure all geometries of the park file are of 'Polygon' or 'MultiPolygon' type and re-run the function")
840 | 
841 |         # Make sure CRS of park file is same as CRS of poi file
842 |         if not park_src.crs.to_epsg() == epsg:
843 |             print("Adjusting CRS of park file to match with Point of Interest CRS...")
844 |             park_src.to_crs(f'EPSG:{epsg}', inplace=True)
845 |             print("Done \n")
846 | 
847 |     # Make sure all points of interest are within or do at least intersect (in case of polygons) the park file provided
848 |     if not all(geom.within(sg.box(*park_src.total_bounds)) for geom in poi['geometry']):
849 |         if geom_type == "Point":
850 |             raise ValueError("Not all points of interest are within the park file provided, please make sure they are and re-run the function")
851 |         else:
852 |             if not all(geom.intersects(sg.box(*park_src.total_bounds)) for geom in poi['geometry']):
853 |                 raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the park file provided, please make sure they are/do and re-run the function")
854 |             else:
855 |                 print("Warning: Not all polygons of interest are completely within the area covered by the park file provided, results will be based on intersecting part of polygons involved \n")
856 | 
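For reference, the OSM greenspace retrieval above can be reproduced in isolation. A minimal sketch with an illustrative bounding box and a reduced tag set; note that `ox.geometries_from_polygon` is the pre-2.0 OSMnx name for this call (later OSMnx versions renamed it `features_from_polygon`):

```python
# Standalone sketch of the OSM greenspace query; coordinates are illustrative.
import osmnx as ox
import shapely.geometry as sg

bbox = sg.box(5.10, 52.08, 5.13, 52.10)  # small area in EPSG:4326
tags = {"leisure": ["park", "garden"], "landuse": ["forest"]}
parks = ox.geometries_from_polygon(bbox, tags=tags)
# Keep only area features, mirroring the polygon mask used above
parks = parks[parks.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]
print(len(parks), "greenspace polygons found")
```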
857 |     ### Step 3: Construct the Area of Interest based on the arguments as defined by user
858 |     if buffer_type is None:
859 |         # Buffer type == None implies that polygon geometries serve as areas of interest
860 |         aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
861 |     else:
862 |         # Make sure buffer dist is set in case buffer type is euclidean
863 |         if buffer_type == "euclidean":
864 |             # Create area of interest based on euclidean buffer
865 |             aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
866 |         else:
867 |             # Make sure network type has valid value
868 |             if network_type not in ["walk", "bike", "drive", "all"]:
869 |                 raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike', 'drive' or 'all', and re-run the function")
870 | 
871 |             # If poi still contains polygon geometries, compute centroids so that isochrones can be created
872 |             if geom_type == "Polygon":
873 |                 print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...")
874 |                 poi['geometry'] = poi['geometry'].centroid
875 |                 print("Done \n")
876 | 
877 |             print(f"Retrieving network within total bounds of {geom_type}(s) of interest, extended by buffer distance as specified...")
878 |             start_network_retrieval = time()
879 |             # Retrieve street network for desired network type
880 |             graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
881 |             # Project street network graph back to original poi CRS
882 |             graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
883 |             end_network_retrieval = time()
884 |             elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
885 |             print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
886 | 
887 |             aoi_geometry = []
888 |             for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'):
889 |                 # Find node which is closest to point location as base for next steps
890 |                 center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
891 |                 # Create subgraph around poi for efficiency purposes
892 |                 buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
893 |                 # Calculate the time it takes to cover each edge's distance if speed_time is True
894 |                 if speed_time:
895 |                     for _, _, _, data in buffer_graph.edges(data=True, keys=True):
896 |                         data["time"] = data["length"] / meters_per_minute
897 |                     # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
898 |                     subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
899 |                 else:
900 |                     # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
901 |                     subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
902 |                 # Compute isochrones, see separate function for line by line explanation
903 |                 isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
904 |                 aoi_geometry.append(isochrone_poly)
905 | 
906 |             # Create dataframe with isochrone geometries
907 |             aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}")
908 |             print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n")
909 | 
910 |     ### Step 4: Perform calculations and write results to file
911 |     print("Calculating percentage of park area coverage...")
912 |     start_calc = time()
913 |     # Check whether areas of interest, resulting from previous steps, are fully covered by park file, provide warning if not
914 |     if not all(geom.within(sg.box(*park_src.total_bounds)) for geom in aoi_gdf['geometry']):
915 |         print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the park file, note that results will be based on the intersecting part of the buffer zone")
916 | 
917 |     # Calculate percentage of park area cover
918 |     poi['park_cover'] = aoi_gdf.apply(lambda row: str(((park_src.clip(row.geometry).area.sum()/row.geometry.area)*100).round(2))+'%', axis=1)
919 |     end_calc = time()
920 |     elapsed_calc = end_calc - start_calc
921 |     print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n")
922 | 
923 |     if write_to_file:
924 |         print("Writing results to new geopackage file in specified directory...")
925 |         # Create output directory if the one specified by user does not yet exist
926 |         if not os.path.exists(output_dir):
927 |             os.makedirs(output_dir)
928 |         # Extract filename of poi file to add information to it when writing to file
929 |         input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
930 |         poi.to_file(os.path.join(output_dir, f"{input_filename}_ParkPerc_added.gpkg"), driver="GPKG")
931 |         print("Done")
932 | 
933 |     return poi
934 | 
935 | ##### SUPPORTING FUNCTIONS
936 | # Function to create isochrone polygon of network
937 | def make_iso_poly(buffer_graph, subgraph, edge_buff=25, node_buff=0):
938 |     # Note: based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb
939 |     node_points = [sg.Point((data["x"], data["y"])) for node, data in subgraph.nodes(data=True)] # Create list of point geometries consisting of x and y coordinates for each node in the subgraph retrieved in the previous step
940 |     nodes_gdf = gpd.GeoDataFrame({"id": list(subgraph.nodes)}, geometry=node_points) # Create geodataframe containing data from previous step
941 |     nodes_gdf = nodes_gdf.set_index("id") # Set index to node ID
942 | 
943 |     edge_lines = []
944 |     for n_fr, n_to in subgraph.edges(): # Iterate over edges in subgraph
945 |         f = nodes_gdf.loc[n_fr].geometry # Retrieve geometry of the 'from' node of the edge
946 |         t = nodes_gdf.loc[n_to].geometry # Retrieve geometry of the 'to' node of the edge
947 |         edge_lookup = buffer_graph.get_edge_data(n_fr, n_to)[0].get("geometry", sg.LineString([f, t])) # Retrieve edge geometry between from and to nodes
948 |         edge_lines.append(edge_lookup) # Append edge geometry to list of edge lines
949 | 
950 |     n = nodes_gdf.buffer(node_buff).geometry # Create buffer around the nodes
951 |     e = gpd.GeoSeries(edge_lines).buffer(edge_buff).geometry # Create buffer around the edges
952 |     all_gs = list(n) + list(e) # Concatenate nodes and edges
953 |     isochrone_poly = gpd.GeoSeries(all_gs).unary_union # Create polygon of the concatenated nodes and edges
954 | 
955 |     isochrone_poly = sg.Polygon(isochrone_poly.exterior) # Fill in surrounded areas so the resulting shape appears solid, without interior holes
956 | 
957 |     return isochrone_poly
958 | 
959 | # Function to calculate land cover percentages for a single geometry
960 | def calculate_landcover_percentages(landcover_src, geometry):
961 |     # Clip landcover raster to area of interest
962 |     clipped = landcover_src.rio.clip([geometry]).clip(min=0)
963 |     # Count the occurrences of all unique raster values
964 |     unique, counts = np.unique(clipped.values, return_counts=True)
965 |     # Calculate total nr. of occurrences
966 |     total = counts.sum()
967 |     # Calculate percentages for each class
968 |     percentages = {value: str((count / total * 100).round(3)) + "%" for value, count in zip(unique, counts)}
969 |     return percentages
--------------------------------------------------------------------------------
/modules/osmnx_road_network.py:
--------------------------------------------------------------------------------
1 | # Libraries for working with maps and geospatial data
2 | from vt2geojson.tools import vt_bytes_to_geojson
3 | from shapely.geometry import Point
4 | from scipy.spatial import cKDTree
5 | import geopandas as gpd
6 | import osmnx as ox
7 | import mercantile
8 | 
9 | # Libraries for working with concurrency and file manipulation
10 | from concurrent.futures import ThreadPoolExecutor, as_completed
11 | from tqdm import tqdm
12 | import pandas as pd
13 | import numpy as np
14 | import requests
15 | 
16 | def get_road_network(city):
17 |     # Get the road network graph using OpenStreetMap data
18 |     # 'network_type' argument is set to 'drive' to get the road network suitable for driving
19 |     # 'simplify' argument is set to 'True' to simplify the road network
20 |     G = ox.graph_from_place(city, network_type="drive", simplify=True)
21 | 
22 |     # Create a set to store unique road identifiers
23 |     unique_roads = set()
24 |     # Create a new graph to store the simplified road network
25 |     G_simplified = G.copy()
26 | 
27 |     # Iterate over each road segment
28 |     for u, v, key, data in G.edges(keys=True, data=True):
29 |         # Check if the road segment is a duplicate
30 |         if (v, u) in unique_roads:
31 |             # Remove the duplicate road segment
32 |             G_simplified.remove_edge(u, v, key)
33 |         else:
34 |             # Add the road segment to the set of unique roads
35 |             unique_roads.add((u, v))
36 | 
37 |     # Update the graph with the simplified road network
38 |     G = G_simplified
39 | 
40 |     # Project the graph from latitude-longitude coordinates to a local projection (in meters)
41 |     G_proj = ox.project_graph(G)
42 | 
43 |     # Convert the projected graph to a GeoDataFrame
44 |     # Note: project_graph (above) projected the graph to the UTM CRS for the UTM zone in which the graph's centroid lies
45 |     _, edges = ox.graph_to_gdfs(G_proj)
46 | 
47 |     return edges
48 | 
49 | 
50 | # Get a list of points over the road map with a distance of N meters between them
51 | def select_points_on_road_network(roads, N=50):
52 |     points = []
53 |     # Iterate over each road
54 | 
55 |     for row in roads.itertuples(index=True, name='Road'):
56 |         # Get the LineString object from the geometry
57 |         linestring = row.geometry
58 |         index = row.Index
59 | 
60 |         # Calculate the distance along the linestring and create points every N meters
61 |         for distance in range(0, int(linestring.length), N):
62 |             # Get the point on the road at the current position
63 |             point = linestring.interpolate(distance)
64 | 
65 |             # Add the current point to the list of points
66 |             points.append([point, index])
67 | 
68 |     # Convert the list of points to a GeoDataFrame
69 |     gdf_points = gpd.GeoDataFrame(points, columns=["geometry", "road_index"], geometry="geometry")
70 | 
71 |     # Set the same CRS as the road dataframes for the points dataframe
72 |     gdf_points.set_crs(roads.crs, inplace=True)
73 | 
74 |     # Drop duplicate rows based on the geometry column
75 |     gdf_points = gdf_points.drop_duplicates(subset=['geometry'])
76 |     gdf_points = gdf_points.reset_index(drop=True)
77 | 
78 |     return gdf_points
79 | 
80 | 
81 | # This function extracts the features for a given tile
82 | def get_features_for_tile(tile, access_token):
83 |     # This
URL retrieves all the features within the tile. These features are then going to be assigned to each sample point depending on the distance. 84 | tile_url = f"https://tiles.mapillary.com/maps/vtp/mly1_public/2/{tile.z}/{tile.x}/{tile.y}?access_token={access_token}" 85 | response = requests.get(tile_url) 86 | result = vt_bytes_to_geojson(response.content, tile.x, tile.y, tile.z, layer="image") 87 | return [tile, result] 88 | 89 | 90 | def get_features_on_points(points, access_token, max_distance=50, zoom=14): 91 | # Store the local crs in meters that was assigned by osmnx previously so we can use it to calculate the distances between features and points 92 | local_crs = points.crs 93 | 94 | # Set the CRS to 4326 because it is used by Mapillary 95 | points.to_crs(crs=4326, inplace=True) 96 | 97 | # Add a new column to gdf_points that contains the tile coordinates for each point 98 | points["tile"] = [mercantile.tile(x, y, zoom) for x, y in zip(points.geometry.x, points.geometry.y)] 99 | 100 | # Group the points by their corresponding tiles 101 | groups = points.groupby("tile") 102 | 103 | # Download the tiles and extract the features for each group 104 | features = [] 105 | 106 | # To make the process faster the tiles are downloaded using threads 107 | with ThreadPoolExecutor(max_workers=10) as executor: 108 | futures = [] 109 | 110 | for tile, _ in groups: 111 | futures.append(executor.submit(get_features_for_tile, tile, access_token)) 112 | 113 | for future in tqdm(as_completed(futures), total=len(futures), desc="Downloading tiles"): 114 | result = future.result() 115 | features.append(result) 116 | 117 | pd_features = pd.DataFrame(features, columns=["tile", "features"]) 118 | 119 | # Compute distances between each feature and all the points in gdf_points 120 | feature_points = gpd.GeoDataFrame( 121 | [(Point(f["geometry"]["coordinates"]), f) for row in pd_features["features"] for f in row["features"]], 122 | columns=["geometry", "feature"], 123 | geometry="geometry", 124 | crs=4326 125 | ) 126 | 127 | # Transform from EPSG:4326 (world °) to the local crs in meters that we got when we projected the roads graph in the previous step 128 | feature_points.to_crs(local_crs, inplace=True) 129 | points.to_crs(local_crs, inplace=True) 130 | 131 | # Create a KDTree (k-dimensional tree) from the "geometry" coordinates of feature_points 132 | feature_tree = cKDTree(feature_points["geometry"].apply(lambda p: [p.x, p.y]).tolist()) 133 | # Use the KDTree to query the nearest neighbors of the points in the "geometry" column of points DataFrame 134 | # The query returns the distances and indices of the nearest neighbors 135 | # The parameter "k=1" specifies that we want to find the nearest neighbor 136 | # The parameter "distance_upper_bound=max_distance" sets a maximum distance for the nearest neighbors 137 | distances, indices = feature_tree.query(points["geometry"].apply(lambda p: [p.x, p.y]).tolist(), k=1, distance_upper_bound=max_distance/2) 138 | 139 | # Create a list to store the closest features and distances to each point. 
If there are no images close then set the value of both to None 140 | closest_features = [feature_points.loc[i, "feature"] if np.isfinite(distances[idx]) else None for idx, i in enumerate(indices)] 141 | closest_distances = [distances[idx] if np.isfinite(distances[idx]) else None for idx in range(len(distances))] 142 | 143 | # Store the closest feature for each point in the "feature" column of the points DataFrame 144 | points["feature"] = closest_features 145 | 146 | # Store the distances as a new column in points 147 | points["distance"] = closest_distances 148 | 149 | # Store image id and is panoramic information as part of the dataframe 150 | points["image_id"] = points.apply(lambda row: str(row["feature"]["properties"]["id"]) if row["feature"] else "", axis=1) 151 | points["image_id"] = points["image_id"].astype(str) 152 | 153 | points["is_panoramic"] = points.apply(lambda row: bool(row["feature"]["properties"]["is_pano"]) if row["feature"] else None, axis=1) 154 | points["is_panoramic"] = points["is_panoramic"].astype(bool) 155 | 156 | # Convert results to geodataframe 157 | points["road_index"] = points["road_index"].astype(str) 158 | points["tile"] = points["tile"].astype(str) 159 | 160 | # Save the current index as a column 161 | points["id"] = points.index 162 | points = points.reset_index(drop=True) 163 | 164 | # Transform the coordinate reference system to EPSG 4326 165 | points.to_crs(epsg=4326, inplace=True) 166 | 167 | return points -------------------------------------------------------------------------------- /modules/process_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | os.environ['USE_PYGEOS'] = '0' 3 | 4 | from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation 5 | from scipy.signal import find_peaks 6 | import torch 7 | 8 | from concurrent.futures import ThreadPoolExecutor, as_completed 9 | from tqdm import tqdm 10 | import threading 11 | import csv 12 | 13 | from modules.segmentation_images import save_images 14 | 15 | from PIL import Image, ImageFile 16 | import numpy as np 17 | import requests 18 | 19 | ImageFile.LOAD_TRUNCATED_IMAGES = True 20 | 21 | def prepare_folders(city): 22 | # Create folder for storing GVI results, sample points and road network if they don't exist yet 23 | dir_path = os.path.join("results", city, "gvi") 24 | if not os.path.exists(dir_path): 25 | os.makedirs(dir_path) 26 | 27 | dir_path = os.path.join("results", city, "points") 28 | if not os.path.exists(dir_path): 29 | os.makedirs(dir_path) 30 | 31 | dir_path = os.path.join("results", city, "roads") 32 | if not os.path.exists(dir_path): 33 | os.makedirs(dir_path) 34 | 35 | dir_path = os.path.join("results", city, "sample_images") 36 | if not os.path.exists(dir_path): 37 | os.makedirs(dir_path) 38 | 39 | 40 | def get_models(): 41 | # Load the pretrained AutoImageProcessor from the "facebook/mask2former-swin-large-cityscapes-semantic" model 42 | processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic") 43 | # Set the device to GPU if available, otherwise use CPU 44 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 45 | # Load the pretrained Mask2FormerForUniversalSegmentation model from "facebook/mask2former-swin-large-cityscapes-semantic" 46 | model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic") 47 | # Move the model to the specified device (GPU or CPU) 48 | model = 
model.to(device)
49 |     # Return the processor and model as a tuple
50 |     return processor, model
51 | 
52 | 
53 | def segment_images(image, processor, model):
54 |     # Preprocess the image using the image processor
55 |     inputs = processor(images=image, return_tensors="pt")
56 | 
57 |     # Perform a forward pass through the model to obtain the segmentation
58 |     with torch.no_grad():
59 |         # Check if a GPU is available
60 |         if torch.cuda.is_available():
61 |             # Move the inputs to the GPU
62 |             inputs = {k: v.to('cuda') for k, v in inputs.items()}
63 |             # Perform the forward pass through the model
64 |             outputs = model(**inputs)
65 |             # Post-process the semantic segmentation outputs using the processor and move the results to CPU
66 |             segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0].to('cpu')
67 |         else:
68 |             # Perform the forward pass through the model
69 |             outputs = model(**inputs)
70 |             # Post-process the semantic segmentation outputs using the processor
71 |             segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
72 | 
73 |     return segmentation
74 | 
75 | 
76 | # Based on Matthew Danish's code (https://github.com/mrd/vsvi_filter/tree/master)
77 | def run_length_encoding(in_array):
78 |     # Convert input array to a NumPy array
79 |     image_array = np.asarray(in_array)
80 |     length = len(image_array)
81 |     if length == 0:
82 |         # Return None values if the array is empty (same arity as the regular return below)
83 |         return (None, None)
84 |     else:
85 |         # Calculate run lengths and change points in the array
86 |         pairwise_unequal = image_array[1:] != image_array[:-1]
87 |         change_points = np.append(np.where(pairwise_unequal), length - 1) # must include the last element's position
88 |         run_lengths = np.diff(np.append(-1, change_points)) # run lengths
89 |         return (run_lengths, image_array[change_points])
90 | 
91 | def get_road_pixels_per_column(prediction):
92 |     # Check which pixels in the prediction array correspond to roads (label 0)
93 |     road_pixels = prediction == 0.0
94 |     road_pixels_per_col = np.zeros(road_pixels.shape[1])
95 | 
96 |     for i in range(road_pixels.shape[1]):
97 |         # Encode the road pixels in each column and calculate the maximum run length
98 |         run_lengths, values = run_length_encoding(road_pixels[:,i])
99 |         road_pixels_per_col[i] = run_lengths[values.nonzero()].max(initial=0)
100 |     return road_pixels_per_col
101 | 
102 | def get_road_centres(prediction, distance=2000, prominence=100):
103 |     # Get the road pixels per column in the prediction
104 |     road_pixels_per_col = get_road_pixels_per_column(prediction)
105 | 
106 |     # Find peaks in the road_pixels_per_col array based on distance and prominence criteria
107 |     peaks, _ = find_peaks(road_pixels_per_col, distance=distance, prominence=prominence)
108 | 
109 |     return peaks
110 | 
111 | 
112 | def find_road_centre(segmentation):
113 |     # Calculate distance and prominence thresholds based on the segmentation shape
114 |     distance = int(2000 * segmentation.shape[1] // 5760)
115 |     prominence = int(100 * segmentation.shape[0] // 2880)
116 | 
117 |     # Find road centres based on the segmentation, distance, and prominence thresholds
118 |     centres = get_road_centres(segmentation, distance=distance, prominence=prominence)
119 | 
120 |     return centres
121 | 
122 | 
123 | def crop_panoramic_images_roads(original_width, image, segmentation, road_centre):
124 |     width, height = image.size
125 | 
126 |     # Find duplicated centres
127 |     duplicated_centres = [centre - original_width for centre in road_centre if centre >= original_width]
128 | 129 | # Drop the duplicated centres 130 | road_centre = [centre for centre in road_centre if centre not in duplicated_centres] 131 | 132 | # Calculate dimensions and offsets 133 | w4 = int(width / 4) # 134 | h4 = int(height / 4) 135 | hFor43 = int(w4 * 3 / 4) 136 | w98 = width + (w4 / 2) 137 | xrapneeded = int(width * 7 / 8) 138 | 139 | images = [] 140 | pickles = [] 141 | 142 | # Crop the panoramic image based on road centers 143 | for centre in road_centre: 144 | # Wrapped all the way around 145 | if centre >= w98: 146 | xlo = int((width - centre) - w4/2) 147 | cropped_image = image.crop((xlo, h4, xlo + w4, h4 + hFor43)) 148 | cropped_segmentation = segmentation[h4:h4+hFor43, xlo:xlo+w4] 149 | 150 | # Image requires assembly of two sides 151 | elif centre > xrapneeded: 152 | xlo = int(centre - (w4/2)) # horizontal_offset 153 | w4_p1 = width - xlo 154 | w4_p2 = w4 - w4_p1 155 | 156 | # Crop and concatenate image and segmentation 157 | cropped_image_1 = image.crop((xlo, h4, xlo + w4_p1, h4 + hFor43)) 158 | cropped_image_2 = image.crop((0, h4, w4_p2, h4 + hFor43)) 159 | 160 | cropped_image = Image.new(image.mode, (w4, hFor43)) 161 | cropped_image.paste(cropped_image_1, (0, 0)) 162 | cropped_image.paste(cropped_image_2, (w4_p1, 0)) 163 | 164 | cropped_segmentation_1 = segmentation[h4:h4+hFor43, xlo:xlo+w4_p1] 165 | cropped_segmentation_2 = segmentation[h4:h4+hFor43, 0:w4_p2] 166 | cropped_segmentation = torch.cat((cropped_segmentation_1, cropped_segmentation_2), dim=1) 167 | 168 | # Must paste together the two sides of the image 169 | elif centre < (w4 / 2): 170 | w4_p1 = int((w4 / 2) - centre) 171 | xhi = width - w4_p1 172 | w4_p2 = w4 - w4_p1 173 | 174 | # Crop and concatenate image and segmentation 175 | cropped_image_1 = image.crop((xhi, h4, xhi + w4_p1, h4 + hFor43)) 176 | cropped_image_2 = image.crop((0, h4, w4_p2, h4 + hFor43)) 177 | 178 | cropped_image = Image.new(image.mode, (w4, hFor43)) 179 | cropped_image.paste(cropped_image_1, (0, 0)) 180 | cropped_image.paste(cropped_image_2, (w4_p1, 0)) 181 | 182 | cropped_segmentation_1 = segmentation[h4:h4+hFor43, xhi:xhi+w4_p1] 183 | cropped_segmentation_2 = segmentation[h4:h4+hFor43, 0:w4_p2] 184 | cropped_segmentation = torch.cat((cropped_segmentation_1, cropped_segmentation_2), dim=1) 185 | 186 | # Straightforward crop 187 | else: 188 | xlo = int(centre - w4/2) 189 | cropped_image = image.crop((xlo, h4, xlo + w4, h4 + hFor43)) 190 | cropped_segmentation = segmentation[h4:h4+hFor43, xlo:xlo+w4] 191 | 192 | images.append(cropped_image) 193 | pickles.append(cropped_segmentation) 194 | 195 | return images, pickles 196 | 197 | 198 | def crop_panoramic_images(image, segmentation): 199 | width, height = image.size 200 | 201 | w4 = int(width / 4) 202 | h4 = int(height / 4) 203 | hFor43 = int(w4 * 3 / 4) 204 | 205 | images = [] 206 | pickles = [] 207 | 208 | # Crop the panoramic image based on road centers 209 | for w in range(4): 210 | x_begin = w * w4 211 | x_end = (w + 1) * w4 212 | cropped_image = image.crop((x_begin, h4, x_end, h4 + hFor43)) 213 | cropped_segmentation = segmentation[h4:h4+hFor43, x_begin:x_end] 214 | 215 | images.append(cropped_image) 216 | pickles.append(cropped_segmentation) 217 | 218 | return images, pickles 219 | 220 | 221 | def get_GVI(segmentations): 222 | total_pixels = 0 223 | vegetation_pixels = 0 224 | 225 | for segment in segmentations: 226 | # Calculate the total number of pixels in the segmentation 227 | total_pixels += segment.numel() 228 | # Filter the pixels that represent vegetation (label 8) and count 
them 229 | vegetation_pixels += (segment == 8).sum().item() 230 | 231 | # Calculate the percentage of green pixels in the segmentation 232 | return vegetation_pixels / total_pixels if total_pixels else 0 233 | 234 | 235 | def process_images(image_url, is_panoramic, cut_by_road_centres, processor, model): 236 | try: 237 | # Fetch and process the image 238 | image = Image.open(requests.get(image_url, stream=True).raw) 239 | 240 | if is_panoramic: 241 | # Get the size of the image 242 | width, height = image.size 243 | 244 | # Crop the bottom 20% of the image to remove the band at the bottom of the panoramic image 245 | bottom_crop = int(height * 0.2) 246 | image = image.crop((0, 0, width, height - bottom_crop)) 247 | 248 | # Apply the semantic segmentation to the image 249 | segmentation = segment_images(image, processor, model) 250 | 251 | if cut_by_road_centres: 252 | # Create a widened panorama by wrapping the first 25% of the image onto the right edge 253 | width, height = image.size 254 | w4 = int(0.25 * width) 255 | 256 | segmentation_25 = segmentation[:, :w4] 257 | # Concatenate the tensors along the first dimension (rows) to create the widened panorama with the segmentations 258 | segmentation_road = torch.cat((segmentation, segmentation_25), dim=1) 259 | 260 | cropped_image = image.crop((0, 0, w4, height)) 261 | widened_image = Image.new(image.mode, (width + w4, height)) 262 | widened_image.paste(image, (0, 0)) 263 | widened_image.paste(cropped_image, (width, 0)) 264 | 265 | # Find the road centers to determine if the image is suitable for analysis 266 | road_centre = find_road_centre(segmentation_road) 267 | 268 | # Crop the image and its segmentation based on the previously found road centers 269 | images, pickles = crop_panoramic_images_roads(width, widened_image, segmentation_road, road_centre) 270 | 271 | # Calculate the Green View Index (GVI) for the cropped segmentations 272 | GVI = get_GVI(pickles) 273 | else: 274 | # Cut panoramic image in 4 equal parts 275 | # Crop the image and its segmentation based on the previously found road centers 276 | images, pickles = crop_panoramic_images(image, segmentation) 277 | 278 | # Calculate the Green View Index (GVI) for the cropped segmentations 279 | GVI = get_GVI(pickles) 280 | 281 | return images, pickles, [GVI, True, False, False] 282 | 283 | else: 284 | # Apply the semantic segmentation to the image 285 | segmentation = segment_images(image, processor, model) 286 | 287 | # If the image is not panoramic, use the segmentation as it is 288 | # Find the road centers to determine if the image is suitable for analysis 289 | road_centre = find_road_centre(segmentation) 290 | 291 | if len(road_centre) > 0: 292 | # Calculate the Green View Index (GVI) for the cropped segmentations 293 | GVI = get_GVI([segmentation]) 294 | return [image], [segmentation], [GVI, False, False, False] 295 | else: 296 | # There are no road centers, so the image is not suitable for analysis 297 | return [image], [segmentation], [None, None, True, False] 298 | except: 299 | # If there was an error while processing the image, set the "error" flag to true and continue with other images 300 | return None, None, [None, None, True, True] 301 | 302 | 303 | # Download images 304 | def download_image(id, geometry, image_id, is_panoramic, save_sample, city, cut_by_road_centres, access_token, processor, model): 305 | # Check if the image id exists 306 | if image_id: 307 | try: 308 | # Create the authorization header for the Mapillary API request 309 | header = 
309 |             header = {'Authorization': 'OAuth {}'.format(access_token)}
310 | 
311 |             # Build the URL to fetch the image thumbnail's original URL
312 |             url = 'https://graph.mapillary.com/{}?fields=thumb_original_url'.format(image_id)
313 | 
314 |             # Send a GET request to the Mapillary API to obtain the image URL
315 |             response = requests.get(url, headers=header)
316 |             data = response.json()
317 | 
318 |             # Extract the image URL from the response data
319 |             image_url = data["thumb_original_url"]
320 | 
321 |             # Process the downloaded image using the provided image URL, is_panoramic flag, processor, and model
322 |             images, segmentations, result = process_images(image_url, is_panoramic, cut_by_road_centres, processor, model)
323 | 
324 |             if save_sample:
325 |                 save_images(city, id, images, segmentations, result[0])
326 | 
327 |         except Exception:
328 |             # An error occurred while downloading the image
329 |             result = [None, None, True, True]
330 |     else:
331 |         # The point doesn't have an associated image, so we set the missing value flags
332 |         result = [None, None, True, False]
333 | 
334 |     # Insert the coordinates (x and y) and the point ID at the beginning of the result list
335 |     # This helps us associate the values in the result list with their corresponding point
336 |     result.insert(0, geometry.y)
337 |     result.insert(0, geometry.x)
338 |     result.insert(0, id)
339 | 
340 |     return result
341 | 
342 | 
343 | def download_images_for_points(gdf, access_token, max_workers, cut_by_road_centres, city, file_name):
344 |     # Get image processing models
345 |     processor, model = get_models()
346 | 
347 |     # Prepare CSV file path
348 |     csv_file = f"gvi-points-{file_name}.csv"
349 |     csv_path = os.path.join("results", city, "gvi", csv_file)
350 | 
351 |     # Check if the CSV file exists and choose the correct write mode
352 |     file_exists = os.path.exists(csv_path)
353 |     mode = 'a' if file_exists else 'w'
354 | 
355 |     # Results list and a lock object for thread safety
356 |     results = []
357 |     lock = threading.Lock()
358 | 
359 |     # Open the CSV file in the chosen mode with newline=''
360 |     with open(csv_path, mode, newline='') as csvfile:
361 |         # Create a CSV writer object
362 |         writer = csv.writer(csvfile)
363 | 
364 |         # Write the header row if the file is newly created
365 |         if not file_exists:
366 |             writer.writerow(["id", "x", "y", "GVI", "is_panoramic", "missing", "error"])
367 | 
368 |         # Create a ThreadPoolExecutor to process images concurrently
369 |         with ThreadPoolExecutor(max_workers=max_workers) as executor:
370 |             futures = []
371 | 
372 |             # Iterate over the rows in the GeoDataFrame
373 |             for _, row in gdf.iterrows():
374 |                 try:
375 |                     # Submit a download_image task to the executor
376 |                     futures.append(executor.submit(download_image, row["id"], row["geometry"], row["image_id"], row["is_panoramic"], row["save_sample"], city, cut_by_road_centres, access_token, processor, model))
377 |                 except Exception as e:
378 |                     print(f"Exception occurred for row {row['id']}: {str(e)}")
379 | 
380 |             # Process the completed futures using tqdm for progress tracking
381 |             for future in tqdm(as_completed(futures), total=len(futures), desc="Downloading images"):
382 |                 # Retrieve the result of the completed future
383 |                 image_result = future.result()
384 | 
385 |                 # Acquire the lock before appending to results and writing to the CSV file
386 |                 with lock:
387 |                     results.append(image_result)
388 |                     writer.writerow(image_result)
389 | 
390 |     # Return the processed image results
391 |     return results
--------------------------------------------------------------------------------
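A minimal usage sketch of the entry point above (hypothetical values throughout: the city name, token, and file suffix are examples; the GeoDataFrame is assumed to be the points layer produced by the earlier sampling steps, carrying the id, geometry, image_id, is_panoramic and save_sample columns that download_image reads):

    import geopandas as gpd

    # Points layer from the sampling/assignment steps (path follows the
    # results/<city>/points convention used elsewhere in the repository)
    gdf = gpd.read_file("results/amsterdam/points/points.gpkg", layer="points")

    results = download_images_for_points(
        gdf,
        access_token="MLY|...",    # hypothetical Mapillary access token
        max_workers=8,             # concurrent download threads
        cut_by_road_centres=True,  # crop panoramas around detected road centres
        city="amsterdam",
        file_name="part-1",        # suffix of the CSV written to results/amsterdam/gvi/
    )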
/modules/segmentation_images.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 | 
4 | # Color palette to map each class to a RGB value
5 | color_palette = [
6 |     [128, 64, 128],  # 0: road - purple
7 |     [244, 35, 232],  # 1: sidewalk - pink
8 |     [70, 70, 70],    # 2: building - dark gray
9 |     [102, 102, 156], # 3: wall - purple-gray
10 |     [190, 153, 153], # 4: fence - light brown
11 |     [153, 153, 153], # 5: pole - gray
12 |     [250, 170, 30],  # 6: traffic light - orange
13 |     [220, 220, 0],   # 7: traffic sign - yellow
14 |     [0, 255, 0],     # 8: vegetation - bright green
15 |     [152, 251, 152], # 9: terrain - light green
16 |     [70, 130, 180],  # 10: sky - blue
17 |     [220, 20, 60],   # 11: person - red
18 |     [255, 0, 0],     # 12: rider - bright red
19 |     [0, 0, 142],     # 13: car - dark blue
20 |     [0, 0, 70],      # 14: truck - navy blue
21 |     [0, 60, 100],    # 15: bus - dark teal
22 |     [0, 80, 100],    # 16: train - teal
23 |     [0, 0, 230],     # 17: motorcycle - blue
24 |     [119, 11, 32]    # 18: bicycle - dark red
25 | ]
26 | 
27 | def visualize_results(city, image_id, image, segmentation, gvi, num):
28 | 
29 |     fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(9, 4), sharey=True)
30 | 
31 |     # Display the (possibly cropped) input image
32 |     ax1.imshow(image)
33 |     ax1.set_title("Image")
34 |     ax1.axis("off")
35 | 
36 |     # Map the segmentation result to the color palette
37 |     seg_color = np.zeros(segmentation.shape + (3,), dtype=np.uint8)
38 |     for label, color in enumerate(color_palette):
39 |         seg_color[segmentation == label] = color
40 | 
41 |     # Display the colored segmentation result
42 |     ax2.imshow(seg_color)
43 |     ax2.set_title("Segmentation")
44 |     ax2.axis("off")
45 | 
46 |     fig.savefig("results/{}/sample_images/{}-{}.png".format(city, image_id, num), bbox_inches='tight', dpi=100)
47 |     plt.close(fig)  # Free the figure; save_images may be called for many samples
48 | 
49 | def save_images(city, image_id, images, pickles, gvi):
50 |     num = 0
51 | 
52 |     for image, segmentation in zip(images, pickles):
53 |         num += 1
54 |         visualize_results(city, image_id, image, segmentation, gvi, num)
--------------------------------------------------------------------------------
/predict_missing_gvi.py:
--------------------------------------------------------------------------------
1 | from sklearn.model_selection import cross_val_score
2 | from sklearn.linear_model import LinearRegression
3 | from pygam import LinearGAM, s
4 | import geopandas as gpd
5 | import pandas as pd
6 | import numpy as np
7 | import rasterio
8 | import sys
9 | import os
10 | 
11 | # Function to calculate the mean NDVI, taken from Yúri Grings' GitHub repository
12 | # https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py
13 | from modules.availability import get_mean_NDVI
14 | 
15 | def calculate_ndvi(gvi, ndvi, N, city, crs):
16 |     ndvi_folder = os.path.join("results", city, "ndvi")
17 | 
18 |     mean_ndvi = get_mean_NDVI(point_of_interest_file=gvi,
19 |                               ndvi_raster_file=ndvi,
20 |                               buffer_type="euclidean",
21 |                               buffer_dist=N,
22 |                               crs_epsg=crs,
23 |                               write_to_file=False,
24 |                               save_ndvi=False)
25 | 
26 |     # Save the calculated NDVI values to a file
27 |     mean_ndvi.to_crs(crs=4326, inplace=True)
28 |     path_to_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
29 |     mean_ndvi.to_file(path_to_file, driver="GPKG", crs=4326)
30 | 
31 |     return path_to_file
32 | 
33 | 
34 | def linear_regression(city):
35 |     ndvi_folder = os.path.join("results", city, "ndvi")
36 | 
37 |     # Load the NDVI layer
38 |     ndvi_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
39 |     ndvi_df = gpd.read_file(ndvi_file, layer="calculated_ndvi_values", crs=4326)
40 | 
41 |     # Separate the data into known and missing GVI values
42 |     known_df = ndvi_df[ndvi_df['missing'] == False].copy()
43 |     missing_df = ndvi_df[ndvi_df['missing'] == True].copy()
44 | 
45 |     # Split the known data into features (NDVI) and target (GVI)
46 |     X_train = known_df[['mean_NDVI']]
47 |     y_train = known_df['GVI']
48 | 
49 |     # Prepare the missing data for prediction
50 |     X_test = missing_df[['mean_NDVI']]
51 | 
52 |     # Perform linear regression
53 |     lin_reg = LinearRegression()
54 |     lin_reg.fit(X_train, y_train)
55 | 
56 |     predicted_GVI = lin_reg.predict(X_test)
57 | 
58 |     # Assign the predicted values to the missing GVI values in the DataFrame
59 |     missing_df['GVI'] = predicted_GVI
60 | 
61 |     # Concatenate the updated missing values with the known values
62 |     updated_df = pd.concat([known_df, missing_df])
63 | 
64 |     path_to_file = os.path.join(ndvi_folder, "calculated_missing_values_linreg.gpkg")
65 |     updated_df.to_file(path_to_file, driver="GPKG", crs=4326)
66 | 
67 |     # Compute the RMSE using cross-validation
68 |     rmse_scores = np.sqrt(-cross_val_score(lin_reg, X_train, y_train, scoring='neg_mean_squared_error', cv=5))
69 |     avg_rmse = np.mean(rmse_scores)
70 | 
71 |     # Compute the R2 score using cross-validation
72 |     r2_scores = cross_val_score(lin_reg, X_train, y_train, scoring='r2', cv=5)
73 |     avg_r2 = np.mean(r2_scores)
74 | 
75 |     # Get the number of parameters (including the intercept)
76 |     k = X_train.shape[1] + 1
77 |     n = len(y_train)  # number of samples
78 | 
79 |     # Calculate the AIC (Gaussian approximation, n*ln(RMSE^2) + 2k, using the cross-validated RMSE)
80 |     aic = n * np.log(avg_rmse ** 2) + 2 * k
81 | 
82 |     print("<----- Linear Regression ----->")
83 |     print("R2 value:", avg_r2)
84 |     print("RMSE:", avg_rmse)
85 |     print("AIC value:", aic)
86 | 
87 |     return updated_df
88 | 
89 | 
90 | def gam_regression(city):
91 |     ndvi_folder = os.path.join("results", city, "ndvi")
92 | 
93 |     # Load the NDVI layer
94 |     ndvi_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
95 |     ndvi_df = gpd.read_file(ndvi_file, layer="calculated_ndvi_values", crs=4326)
96 | 
97 |     # Separate the data into known and missing GVI values
98 |     known_df = ndvi_df[ndvi_df['missing'] == False].copy()
99 |     missing_df = ndvi_df[ndvi_df['missing'] == True].copy()
100 | 
101 |     # Split the known data into features (NDVI) and target (GVI)
102 |     X_train = known_df[['mean_NDVI']]
103 |     y_train = known_df['GVI']
104 | 
105 |     # Prepare the missing data for prediction
106 |     X_test = missing_df[['mean_NDVI']]
107 | 
108 |     n_features = 1  # number of features used in the model
109 |     lams = np.logspace(-5, 5, 20) * n_features  # smoothing penalties to grid-search over
110 |     splines = 25  # number of spline basis functions
111 | 
112 |     # Train a Generalized Additive Model (GAM)
113 |     gam = LinearGAM(
114 |         s(0, n_splines=splines)).gridsearch(
115 |             X_train.values,
116 |             y_train.values,
117 |             lam=lams
118 |         )
119 | 
120 |     predicted_GVI = gam.predict(X_test.values)
121 | 
122 |     # Assign the predicted values to the missing GVI values in the DataFrame
123 |     missing_df['GVI'] = predicted_GVI
124 | 
125 |     # Concatenate the updated missing values with the known values
126 |     updated_df = pd.concat([known_df, missing_df])
127 | 
128 |     path_to_file = os.path.join(ndvi_folder, "calculated_missing_values_gam.gpkg")
129 |     updated_df.to_file(path_to_file, driver="GPKG", crs=4326)
130 | 
131 |     # Compute the RMSE using cross-validation
132 |     rmse_scores = np.sqrt(-cross_val_score(gam, X_train, y_train, scoring='neg_mean_squared_error', cv=5))
133 |     avg_rmse = np.mean(rmse_scores)
134 | 
135 |     # Get the number of parameters (including the intercept)
136 |     k = X_train.shape[1] + 1
137 |     n = len(y_train)  # number of samples
138 | 
139 |     # Calculate the AIC (Gaussian approximation; note that k understates a GAM's effective degrees of freedom)
140 |     aic = n * np.log(avg_rmse ** 2) + 2 * k
141 | 
142 |     print("<----- Linear GAM ----->")
143 |     print("RMSE:", avg_rmse)
144 |     print("AIC value:", aic)
145 | 
146 |     return updated_df
147 | 
148 | 
149 | def clean_points(city, crs):
150 |     # Cleans the GVI points data by dropping points outside the extent of the NDVI file.
151 | 
152 |     # File paths for the GVI points and NDVI files
153 |     # The NDVI raster has to be stored in the results/<city>/ndvi folder and has to be named ndvi.tif
154 |     gvi = os.path.join("results", city, "gvi", "gvi-points.gpkg")
155 |     ndvi = os.path.join("results", city, "ndvi", "ndvi.tif")
156 | 
157 |     gvi_df = gpd.read_file(gvi, layer="gvi-points", crs=4326)
158 |     gvi_df.to_crs(epsg=crs, inplace=True)
159 | 
160 |     # Get the extent of the NDVI file
161 |     with rasterio.open(ndvi) as src:
162 |         extent = src.bounds
163 | 
164 |     # Filter the GVI points to include only those within the extent of the NDVI file
165 |     filtered_gvi = gvi_df.cx[extent[0]:extent[2], extent[1]:extent[3]]
166 | 
167 |     # Save the filtered GVI points to a new file to preserve the original data
168 |     filtered_gvi_path = os.path.join("results", city, "ndvi", "filtered-points.gpkg")
169 |     filtered_gvi.to_file(filtered_gvi_path, driver="GPKG", crs=crs)
170 | 
171 |     return filtered_gvi_path, ndvi
172 | 
173 | 
174 | if __name__ == "__main__":
175 |     # Read the command-line arguments
176 |     args = sys.argv
177 | 
178 |     # Extract the city and, when the NDVI values still have to be computed, the CRS and sampling distance
179 |     city = args[1]  # City to analyze
180 |     ndvi_file_exists = bool(int(args[2]))  # Indicates if we already have the NDVI values
181 | 
182 |     if not ndvi_file_exists:
183 |         crs = int(args[3])  # CRS in meters, suitable for the area in which we are working
184 |         # For example, we can use the same CRS as the roads.gpkg file
185 |         # IMPORTANT: The NDVI image should be in this CRS
186 |         distance = int(args[4])  # The distance used to generate the sample points
187 | 
188 |         # Step 1: Clean the GVI points by filtering points outside the extent of the NDVI file
189 |         gvi, ndvi = clean_points(city, crs)
190 | 
191 |         # Step 2: Calculate the mean NDVI values from the filtered GVI points, using a buffer of half the sampling distance
192 |         ndvi_path = calculate_ndvi(gvi, ndvi, distance//2, city, crs)
193 | 
194 |     # Step 3: Train a Linear Regression model and a GAM to predict the missing GVI values
195 |     linreg = linear_regression(city)
196 |     lingam = gam_regression(city)
--------------------------------------------------------------------------------
/scripts/get_gvi_gpkg.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import geopandas as gpd
3 | from shapely.geometry import Point
4 | import glob
5 | import sys
6 | 
7 | """
8 | This script converts the CSV files generated with the main script into a single merged CSV and a GeoPackage (gpkg) file. It processes the CSV files for a specific city, performs data cleaning and validation, and saves the resulting files in the results/<city>/gvi folder.
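For example (with a purely hypothetical city name): python scripts/get_gvi_gpkg.py amsterdam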
9 | As shown above, the script takes the city name as its only command-line argument.
10 | """
11 | 
12 | if __name__ == "__main__":
13 |     args = sys.argv
14 | 
15 |     city = args[1]  # City to analyse
16 | 
17 |     # Path to the CSV files
18 |     csv_files = glob.glob(f"results/{city}/gvi/*.csv")
19 | 
20 |     # Create an empty list to store the individual DataFrames
21 |     dfs = []
22 | 
23 |     # Loop through the CSV files, read each file using pandas, and append the resulting DataFrame to the list
24 |     for csv_file in csv_files:
25 |         df = pd.read_csv(csv_file)
26 |         dfs.append(df)
27 | 
28 |     # Concatenate all the DataFrames in the list along the rows
29 |     merged_df = pd.concat(dfs, ignore_index=True)
30 | 
31 |     # Coerce the coordinate columns to numeric; malformed "x" or "y" values become NaN
32 |     merged_df["x"] = pd.to_numeric(merged_df["x"], errors="coerce")
33 |     merged_df["y"] = pd.to_numeric(merged_df["y"], errors="coerce")
34 | 
35 |     # Drop the rows whose coordinates could not be parsed
36 |     merged_df.dropna(subset=["x", "y"], inplace=True)
37 | 
38 |     # Drop duplicate rows based on the id column
39 |     merged_df = merged_df.drop_duplicates(subset=['id'])
40 | 
41 |     merged_df.to_csv(f"results/{city}/gvi/gvi-points.csv", index=False)
42 | 
43 |     # Convert the coordinates to valid Point geometries
44 |     merged_df['geometry'] = merged_df.apply(lambda row: Point(row["x"], row["y"]), axis=1)
45 |     merged_df["id"] = merged_df["id"].astype(int)
46 | 
47 |     # Convert the merged DataFrame to a GeoDataFrame
48 |     gdf = gpd.GeoDataFrame(merged_df, geometry='geometry', crs=4326)
49 | 
50 |     path_to_file = "results/{}/gvi/gvi-points.gpkg".format(city)
51 |     gdf.to_file(path_to_file, driver="GPKG", crs=4326)
--------------------------------------------------------------------------------
/scripts/mean_gvi_street.py:
--------------------------------------------------------------------------------
1 | import geopandas as gpd
2 | import sys
3 | import os
4 | 
5 | """
6 | This script computes per-road-segment statistics: the mean GVI, the number of missing points, and the total number of points. These statistics provide insight into the visibility and quality of street-level greenery within a given city.
7 | 
8 | To use the script, the user provides the name of the city to be analysed as a command-line argument. The script then retrieves the necessary data files from the corresponding directories and performs the statistical calculations.
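A typical invocation, again with a hypothetical city name: python scripts/mean_gvi_street.py amsterdam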
9 | """ 10 | 11 | 12 | if __name__ == "__main__": 13 | args = sys.argv 14 | 15 | city = args[1] # City to analyse 16 | 17 | dir_path = os.path.join("results", city) 18 | 19 | # Load roads layer 20 | roads_path = os.path.join(dir_path, "roads", "roads.gpkg") 21 | roads = gpd.read_file(roads_path, layer="roads") 22 | 23 | # Load points with gvi layer 24 | points_path = os.path.join(dir_path, "ndvi", "calculated_missing_values_linreg.gpkg") 25 | points = gpd.read_file(points_path, layer="calculated_missing_values_linreg", crs=4326) 26 | points.to_crs(crs=roads.crs, inplace=True) 27 | 28 | # Load points with roads layer 29 | points_road_path = os.path.join(dir_path, "points", "points.gpkg") 30 | points_road = gpd.read_file(points_road_path, layer="points", crs=4326) 31 | points_road.to_crs(crs=roads.crs, inplace=True) 32 | 33 | # Merge the dataframe containing the GVI value with the dataframe containing the roads ids 34 | points_road = points.merge(points_road, on="id") 35 | 36 | # Merge the previous dataframe with the roads dataframe 37 | intersection = points_road.merge(roads, left_on="road_index", right_on="index") 38 | 39 | # Get statistics per road (mean GVI value, number of null points, number of total points) 40 | gvi_per_road = intersection.groupby("road_index").agg( 41 | {'GVI': ['mean', lambda x: x.isnull().sum(), 'size']} 42 | ).reset_index() 43 | 44 | gvi_per_road.columns = ['road_index', 'avg_GVI', 'null_points_count', 'total_points'] 45 | 46 | # Merge the results back into the road layer 47 | roads_with_avg_gvi = roads.merge(gvi_per_road, left_on="index", right_on="road_index", how='left') 48 | 49 | # Save results to GPKG 50 | path_to_file="results/{}/gvi/gvi-streets.gpkg".format(city) 51 | roads_with_avg_gvi.to_file(path_to_file, driver="GPKG", crs=roads.crs) 52 | -------------------------------------------------------------------------------- /scripts/results_metrics.py: -------------------------------------------------------------------------------- 1 | import geopandas as gpd 2 | import numpy as np 3 | import pandas as pd 4 | import os 5 | import sys 6 | 7 | import seaborn as sns 8 | import matplotlib.pyplot as plt 9 | 10 | 11 | def plot_unavailable_images(df, city): 12 | grouped = df.groupby( 13 | ["highway", "city"] 14 | ).agg( 15 | {"total_null": "sum", 16 | "proportion_null": "sum"}) 17 | 18 | grouped2 = grouped.groupby("highway").agg({"total_null": "sum"}) 19 | 20 | # Sort the grouped DataFrame by 'total_null' column in descending order and select the top 5 rows 21 | top_5_highways = list(grouped2.nlargest(5, 'total_null').index) 22 | 23 | grouped = grouped.loc[top_5_highways] 24 | 25 | # Reset the index for proper sorting and grouping 26 | grouped = grouped.reset_index() 27 | 28 | grouped = grouped.sort_values(by="proportion_null", ascending=False) 29 | 30 | custom_palette = ["#D53E4F", "#FC8D59", "#FEE08B", "#FFFFBF", "#E6F598", "#99D594", "#3288BD"] 31 | 32 | # Create a bar plot for the top 5 highway types 33 | bar1 = sns.barplot(data=grouped, x="proportion_null", y="highway", hue="city", palette=custom_palette) 34 | 35 | # Create custom legend handles and labels for bar1 36 | handles1, labels1 = bar1.get_legend_handles_labels() 37 | # Add the legends outside of the plot 38 | plt.legend(handles1, labels1, title='City', bbox_to_anchor=(1.05, 1), loc='upper left') 39 | 40 | # Set labels and title 41 | plt.title('Top 5 Highway Types with Most Missing Images') 42 | 43 | plt.xlabel('Highway Types') 44 | plt.ylabel('Proportion of Missing Images by Highway Type') 45 | 46 | # 
46 |     # Set the maximum value for the x-axis
47 |     plt.xlim(0, 1)
48 | 
49 |     # Set the figure size before saving
50 |     bar1.figure.set_size_inches(8, 6)  # Adjust the width and height as needed
51 | 
52 |     bar1.figure.savefig(f'results/{city}/plot_missing_images_{city}.svg', format='svg', bbox_inches='tight')
53 | 
54 |     plt.show()  # Show the plot after it has been saved
55 | 
56 |     return grouped
57 | 
58 | 
59 | def get_unavailable_images(intersection, city):
60 |     grouped = intersection.groupby(['road_index_x', 'highway']).agg({
61 |         'image_id': lambda x: ((x == "") | x.isnull()).sum(),  # Count the points without an associated image
62 |     }).reset_index()
63 | 
64 |     # Rename the columns of the grouped dataframe
65 |     grouped.columns = ['road_index', 'highway', 'total_null']
66 | 
67 |     # Count the number of missing values per road type
68 |     count = grouped.groupby('highway').agg({
69 |         'total_null': 'sum',
70 |     }).sort_values('total_null', ascending=False)
71 | 
72 |     count['city'] = city
73 |     count['proportion_null'] = count["total_null"] / len(intersection)
74 | 
75 |     return count
76 | 
77 | 
78 | def get_road_unavailable_images(city):
79 |     dir_path = os.path.join("results", city)
80 | 
81 |     # Load the roads layer with the per-street GVI values
82 |     roads_path = os.path.join(dir_path, "gvi", "gvi-streets.gpkg")
83 |     roads = gpd.read_file(roads_path, layer="gvi-streets")
84 | 
85 |     # Load the points layer with the GVI values
86 |     points_path = os.path.join(dir_path, "gvi", "gvi-points.gpkg")
87 |     points = gpd.read_file(points_path, layer="gvi-points", crs=4326)
88 |     points.to_crs(crs=roads.crs, inplace=True)
89 | 
90 |     # Load the sample points layer that carries the road indices
91 |     points_road_path = os.path.join(dir_path, "points", "points.gpkg")
92 |     points_road = gpd.read_file(points_road_path, layer="points", crs=4326)
93 |     points_road.to_crs(crs=roads.crs, inplace=True)
94 | 
95 |     points_road = points_road.merge(points, on="id")
96 | 
97 |     # Merge the previous dataframe with the roads dataframe
98 |     intersection = points_road.merge(roads, left_on="road_index", right_on="index")
99 | 
100 |     intersection = intersection[["id", "image_id", "distance", "is_panoramic_x", "road_index_x", "geometry_x", "GVI", "length", "highway"]]
101 | 
102 |     count = get_unavailable_images(intersection, city)
103 | 
104 |     return intersection, count
105 | 
106 | 
107 | def get_missing_images(df):
108 |     unavailable = len(df[df["image_id"] == ""])
109 |     unsuitable = len(df[(df["GVI"].isnull()) & (df["image_id"] != "")])
110 |     total_null = len(df[df["GVI"].isnull()])
111 |     total = len(df)
112 |     percentage_null = total_null / total
113 | 
114 |     result_table = [unavailable, unsuitable, total_null, percentage_null, total]
115 |     return pd.DataFrame([result_table], columns=['Unavailable', 'Unsuitable', 'Total Missing', 'Proportion Missing', 'Total Sample Points'])
116 | 
117 | 
118 | # Summarise how many of the downloaded images are panoramic
119 | def get_panoramic_images(df):
120 |     is_panoramic = len(df[df["is_panoramic_x"]])
121 |     total = len(df[df["image_id"] != ""])
122 | 
123 |     result_table = [is_panoramic, total, is_panoramic/total]
124 |     return pd.DataFrame([result_table], columns=['Panoramic Images', 'Total Images', "Proportion"])
125 | 
126 | 
127 | def get_availability_score(df):
128 |     gvi_points = len(df[df["image_id"] != ""])
129 |     road_length = df["length"].sum() / 1000  # total road length in km
130 |     total = len(df)
131 | 
132 |     result_table = [gvi_points, road_length, total, gvi_points/total, (gvi_points * np.log(road_length))/total]
133 |     return pd.DataFrame([result_table], columns=['GVI Points', 'Road Length', 'Total Sample', 'Availability Score', 'Adjusted Availability Score'])
134 | 
135 | 
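# For intuition, a worked example with purely hypothetical numbers: with 800
# sample points that have an image, 50 km of roads and 1,000 sample points in
# total, the availability score is 800 / 1000 = 0.80 and the adjusted score is
# 800 * ln(50) / 1000 ≈ 3.13, so the adjustment rewards coverage that spans a
# larger road network. The usability score below is computed the same way, but
# counts only the points whose image also produced a GVI value.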
136 | def get_usability_score(df):
137 |     gvi_points = len(df[(~df["GVI"].isnull()) & (df["image_id"] != "")])
138 |     road_length = df["length"].sum() / 1000  # total road length in km
139 |     total = len(df[df["image_id"] != ""])
140 | 
141 |     result_table = [gvi_points, road_length, total, gvi_points/total, (gvi_points * np.log(road_length))/total]
142 | 
143 |     return pd.DataFrame([result_table], columns=['GVI Points', 'Road Length', 'Total Sample', 'Usability Score', 'Adjusted Usability Score'])
144 | 
145 | 
146 | def get_metrics(city):
147 |     pd.set_option('display.max_columns', None)
148 |     intersection, count = get_road_unavailable_images(city)
149 | 
150 |     print(f"Unavailable images per road type for {city}")
151 |     print(plot_unavailable_images(count, city))
152 | 
153 |     print(f"\nMissing images for {city}")
154 |     print(get_missing_images(intersection))
155 | 
156 |     print(f"\nPanoramic images for {city}")
157 |     print(get_panoramic_images(intersection))
158 | 
159 |     print(f"\nImage Availability Score and Adjusted Image Availability Score for {city}")
160 |     print(get_availability_score(intersection))
161 | 
162 |     print(f"\nImage Usability Score and Adjusted Image Usability Score for {city}")
163 |     print(get_usability_score(intersection))
164 | 
165 | 
166 | if __name__ == "__main__":
167 |     # Read the command-line arguments
168 |     args = sys.argv
169 | 
170 |     # Extract the city name from the command-line arguments
171 |     city = args[1]  # City to analyze
172 | 
173 |     get_metrics(city)
--------------------------------------------------------------------------------
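A hedged sketch of how these post-processing scripts might be chained for a city already processed by main_script.py (the city name, CRS and sampling distance are examples only; the NDVI raster is assumed to be stored as results/amsterdam/ndvi/ndvi.tif):

    python scripts/get_gvi_gpkg.py amsterdam             # merge the per-run CSVs into gvi-points.csv/.gpkg
    python predict_missing_gvi.py amsterdam 0 32631 50   # fill missing GVI from NDVI (hypothetical metric CRS and distance)
    python scripts/mean_gvi_street.py amsterdam          # mean GVI per road segment
    python scripts/results_metrics.py amsterdam          # availability and usability metrics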