├── .gitignore
├── CITATION.cff
├── README.md
├── images
│   ├── 1.png
│   ├── 2.png
│   ├── 3.png
│   ├── 4.png
│   ├── 5.png
│   ├── 6.png
│   ├── 7.png
│   ├── 8.png
│   ├── panoramic-noroads.png
│   ├── panoramic-roads.png
│   └── pipeline.png
├── main_script.py
├── mapillaryGVI.yml
├── mapillary_GVI_googlecolab.ipynb
├── modules
│   ├── availability.py
│   ├── osmnx_road_network.py
│   ├── process_data.py
│   └── segmentation_images.py
├── pipeline_step_by_step.ipynb
├── predict_missing_gvi.py
├── pygam
│   └── pygam.py
└── scripts
    ├── get_gvi_gpkg.py
    ├── mean_gvi_street.py
    └── results_metrics.py
/.gitignore:
--------------------------------------------------------------------------------
1 | results
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | cff-version: 1.2.0
2 | message: "If you use this software, please cite it as below."
3 | authors:
4 | - family-names: "Vázquez Sánchez"
5 |   given-names: "Ilse Abril"
6 | - family-names: "Labib"
7 |   given-names: "SM"
8 |   orcid: "https://orcid.org/0000-0002-4127-2075"
9 | title: "Automated Green View Index Modeling Pipeline using Mapillary Street Images and Transformer models"
10 | version: 0.1.0
11 | doi: 10.5281/zenodo.8106479
12 | date-released: 2023-07-03
13 | url: "https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility"
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Automated Green View Index Modeling Pipeline using Mapillary Street Images and Transformer models [![DOI](https://zenodo.org/badge/637342975.svg)](https://zenodo.org/badge/latestdoi/637342975)
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 | ## Aims and objectives
10 | Urban green spaces provide various benefits, but assessing their visibility is challenging. Traditional methods and Google Street View (GSV) have limitations, so integrating Volunteered Street View Imagery (VSVI) platforms such as Mapillary has been proposed. Mapillary offers open data and a large community of contributors, but it has its own limitations in terms of data quality and coverage. For areas with insufficient street image data, the Normalised Difference Vegetation Index (NDVI) can be used as an alternative indicator for quantifying greenery. While some studies have shown the potential of Mapillary for evaluating urban greenness visibility, there is a lack of systematic evaluation and standardised methodologies.
11 |
12 | The primary objective of this project is to develop a scalable and reproducible framework for leveraging Mapillary street-view image data to assess the Green View Index (GVI) in diverse geographical contexts. Additionally, the framework will utilise NDVI to supplement information in areas where data is unavailable.
13 |
14 |
15 | ## Content
16 | - [Setting up the environment](#setting-up-the-environment)
17 | - [Running in Google Colab](#running-in-google-colab)
18 | - [Running in a local environment](#running-in-a-local-environment)
19 | - [Explaining the Pipeline](#explaining-the-pipeline)
20 | - [Step 1. Retrieve street road network and generate sample points](#step-1-retrieve-street-road-network-and-generate-sample-points)
21 | - [Step 2. Assign images to each sample point based on proximity](#step-2-assign-images-to-each-sample-point-based-on-proximity)
22 | - [Step 3. Clean and process data](#step-3-clean-and-process-data)
23 | - [Step 4. Calculate GVI](#step-4-calculate-gvi)
24 | - [Step 5 (Optional). Evaluate image availability and image usability of Mapillary Image data](#step-5-optional-evaluate-image-availability-and-image-usability-of-mapillary-image-data)
25 | - [Step 6 (Optional). Model GVI for missing points](#step-6-optional-model-gvi-for-missing-points)
26 | - [Acknowledgements and Contact Information](#acknowledgements-and-contact-information)
27 |
28 |
29 |
30 | ## Setting up the environment
31 |
32 | ### Running in Google Colab
33 | To run the project in Google Colab, you have two options:
34 |
35 |
36 | - Download the mapillary_GVI_googlecolab.ipynb notebook and open it in Google Colab.
37 | - Alternatively, you can directly access the notebook using this link
38 |
39 |
40 | Before running the Jupyter Notebook, configuring Google Colab to use a GPU is optional but highly recommended. Follow these steps:
41 |
42 | - Go to the "Runtime" menu at the top.
43 | - Select "Change runtime type" from the dropdown menu.
44 | - In the "Runtime type" section, choose "Python 3".
45 | - In the "Hardware accelerator" section, select "GPU".
46 | - In the "GPU type" section, choose "T4" if available.
47 | - In the "Runtime shape" section, select "High RAM".
48 | - Save the notebook settings.
49 |
50 |
51 | This notebook contains the following code:
52 |
53 | - Install Required Libraries: To begin, the notebook installs the required libraries, making sure that all the necessary dependencies are available for execution within the Google Colab environment.
54 |
55 | ```python
56 | %pip install transformers==4.29.2
57 | %pip install geopandas==0.12.2
58 | %pip install torch==1.13.1
59 | %pip install vt2geojson==0.2.1
60 | %pip install mercantile==1.2.1
61 | %pip install osmnx==1.3.0
62 | ```
63 |
64 |
65 | - Mount Google Drive: To facilitate convenient access to files and storage, the notebook mounts Google Drive. This allows the project folder to be uploaded and then accessed throughout the notebook.
66 |
67 | ```python
68 | from google.colab import drive
69 |
70 | drive.mount('/content/drive')
71 |
72 | %cd /content/drive/MyDrive
73 | ```
74 |
75 |
76 | - Clone GitHub Repository (If Needed): To ensure the availability of the required scripts and files from the "StreetView-NatureVisibility" GitHub repository, the notebook first checks whether the repository has already been cloned into Google Drive. If the repository is not found, the notebook clones it using the 'git clone' command. This step guarantees that all the necessary components from the repository are accessible and ready for use.
77 |
78 | ```python
79 | import os
80 |
81 | if not os.path.isdir('StreetView-NatureVisibility'):
82 | !git clone https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility.git
83 |
84 | %cd StreetView-NatureVisibility
85 | ```
86 |
87 |
88 | - Set Analysis Parameters: To customise the analysis, it is essential to modify the values of the following variables based on your specific requirements.
89 |
90 | ```python
91 | place = 'De Uithof, Utrecht'
92 | distance = 50
93 | cut_by_road_centres = 0
94 | access_token = 'MLY|'
95 | file_name = 'utrecht-gvi'
96 | max_workers = 6
97 | num_sample_images = 10
98 | begin = None
99 | end = None
100 | ```
101 | In this example, the main_script.py file will be executed to analyse the Green View Index for De Uithof, Utrecht (Utrecht Science Park).
102 |
103 | Replace the following parameters with appropriate values:
104 |
105 | - place: Indicates the name of the place to analyse. You can set the name of any city, neighbourhood or street you want to analyse.
106 | - distance: Represents the distance between sample points in metres.
107 | - cut_by_road_centres: 1 indicates that panoramic images will be cropped using the road centres. If 0 is chosen, the panoramic images will be cropped into 4 equal-width images, allowing the complete panorama to be analysed.
108 | - access_token: Access token for Mapillary (e.g. MLY|). If you don't have an access token yet, you can follow the instructions on the Mapillary website.
109 | - file_name: Represents the name of the CSV file where the points with the GVI (Green View Index) value will be stored.
110 | - max_workers: Indicates the number of threads to be used. A good starting point is the number of CPU cores in the computer running the code. However, you can experiment with different thread counts to find the optimal balance between performance and resource utilisation. Keep in mind that this may not always be the maximum number of threads or the number of CPU cores.
111 | - num_sample_images: Number of images that are going to be stored along with their segmentations.
112 | - begin and end: Define the range of points to be analysed. If desired, you can omit these parameters, allowing the code to run for the entire dataset. However, specifying the range can be useful, especially if the code stops running before analysing all the points.
113 |
114 |
115 |
116 |
117 | - Retrieve Green View Index (GVI) Data: The notebook executes a script ('main_script.py') to retrieve the Green View Index (GVI) data. The script takes the specified analysis parameters as input and performs the data retrieval process.
118 |
119 | ```python
120 | command = f"python main_script.py '{place}' {distance} {cut_by_road_centres} '{access_token}' {file_name} {max_workers} {num_sample_images} {begin if begin is not None else ''} {end if end is not None else ''}"
121 | !{command}
122 | ```
123 |
124 | - Generate GeoPackage Files (Optional): After retrieving the GVI data, the notebook executes another script ('get_gvi_gpkg.py') to generate GeoPackage files from the obtained CSV files. The generated GeoPackage files include the road network of the analysed place, sample points, and the CSV file containing GVI values.
125 |
126 | ```python
127 | command = f"python get_gvi_gpkg.py '{place}'"
128 | !{command}
129 | ```
130 |
131 | - Compute Mean GVI per Street, and Get Availability and Usability Scores (Optional): Additionally, the notebook provides the option to compute the mean Green View Index (GVI) value per street in the road network. Running a script ('mean_gvi_street.py') achieves this computation.
132 |
133 | ```python
134 | command = f"python mean_gvi_street.py '{place}'"
135 | !{command}
136 | ```
137 |
138 | Once this script has been executed, the code to calculate the Image Availability Score and Image Usability Score, along with other quality metrics, can be run.
139 |
140 | ```python
141 | command = f"python results_metrics.py '{place}'"
142 | !{command}
143 | ```
144 |
145 |
146 | - Estimate missing GVI points with NDVI (Optional): Finally, it is possible to estimate the GVI values for the points with missing images using the NDVI value and linear regression. Before proceeding to the next cell, please make sure to follow these steps:
147 |
148 | - Choose a projection in metres that is suitable for your study area.
149 | - Ensure that you have created the required folder structure: StreetView-NatureVisibility/results/{place}/ndvi.
150 | - Place the corresponding NDVI file, named ndvi.tif, inside this folder. It is recommended to use an NDVI file that has been consistently generated for the study area over the course of a year. The NDVI file must be in the same projection chosen for your study area.
151 |
152 |
153 | Important note: the EPSG code specified in the code (32631) is just an example for De Uithof, Netherlands.
154 |
155 | ```python
156 | epsg_code = 32631
157 | command = f"python predict_missing_gvi.py '{place}' {epsg_code} {distance}"
158 | !{command}
159 | ```
160 |
161 |
162 | - Accessing Results: Once the analysis is completed, you can access your Google Drive and navigate to the 'StreetView-NatureVisibility/results/' folder. Inside this folder, you will find a subfolder named after the location that was analysed. This subfolder contains several directories, including:
163 |
164 | - roads: This directory contains the road network GeoPackage file, which provides information about the road infrastructure in the analysed area.
165 | - points: Here, you can find the sample points GeoPackage file, which includes the spatial data of the sampled points used in the analysis.
166 | - ndvi: This directory has the GeoPackage file with the estimated GVI values using linear regression.
167 | - gvi: Initially, this directory contains the CSV file generated during the analysis. It includes the calculated Green View Index (GVI) values for each sampled point. Additionally, if the script for computing the mean GVI per street was executed, this directory will also contain a GeoPackage (GPKG) file with the GVI values aggregated at the street level.
168 |
169 |
170 |
171 |
172 |
173 |
174 | ### Running in a local environment
175 | To create a Conda environment and run the code using the provided YML file, follow these steps:
176 |
177 |
178 | - Cloning GitHub Repository: Open a terminal or command prompt on your computer and navigate to the directory where you want to clone the GitHub repository using the following commands:
179 |
180 |
181 | - Use the cd command to change directories. For example, if you want to clone the repository in the "Documents" folder, you can use the following command:
182 |
183 | ```bash
184 | cd Documents
185 | ```
186 |
187 | - Clone the GitHub repository named "StreetView-NatureVisibility" by executing the following command:
188 |
189 | ```bash
190 | git clone https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility.git
191 | ```
192 | This command will download the repository and create a local copy on your computer.
193 |
194 | - Once the cloning process is complete, navigate to the cloned repository by using the cd command:
195 |
196 | ```bash
197 | cd StreetView-NatureVisibility
198 | ```
199 |
200 |
201 |
202 | - Create a Conda environment using the provided YML file: Run the following command to create the Conda environment:
203 |
204 | ```bash
205 | conda env create -f mapillaryGVI.yml
206 | ```
207 | This command will read the YML file and start creating the environment with the specified dependencies. The process may take a few minutes to complete.
208 |
209 | - Activate conda environment: After the environment creation is complete, activate the newly created environment using the following command:
210 |
211 | ```bash
212 | conda activate mapillaryGVI
213 | ```
214 |
215 | - Compute GVI index: Once the environment is activated, you can start using the project. To run the code and analyse the Green View Index of a specific place, open the terminal and execute the following command:
216 |
217 | ```bash
218 | python main_script.py place distance cut_by_road_centres access_token file_name max_workers num_sample_images begin end
219 | ```
220 |
221 | Replace the following parameters with appropriate values:
222 |
223 | - place: Indicates the name of the place to analyse. You can set the name of any city, neighbourhood or street you want to analyse.
224 | - distance: Represents the distance between sample points in metres.
225 | - cut_by_road_centres: 1 indicates that panoramic images will be cropped using the road centres. If 0 is chosen, the panoramic images will be cropped into 4 equal-width images, allowing the complete panorama to be analysed.
226 | - access_token: Access token for Mapillary (e.g. MLY|). If you don't have an access token yet, you can follow the instructions on the Mapillary website.
227 | - file_name: Represents the name of the CSV file where the points with the GVI (Green View Index) value will be stored.
228 | - max_workers: Indicates the number of threads to be used. A good starting point is the number of CPU cores in the computer running the code. However, you can experiment with different thread counts to find the optimal balance between performance and resource utilisation. Keep in mind that this may not always be the maximum number of threads or the number of CPU cores.
229 | - num_sample_images: Number of images that are going to be stored along with their segmentations.
230 | - begin and end: Define the range of points to be analysed. If desired, you can omit these parameters, allowing the code to run for the entire dataset. However, specifying the range can be useful, especially if the code stops running before analysing all the points.
231 |
232 |
233 |
234 |
235 | - Generate GeoPackage Files (Optional): After retrieving the GVI data, you have the option to generate GeoPackage files from the obtained CSV files. This step can be executed by running the following command in the terminal:
236 |
237 | ```bash
238 | python get_gvi_gpkg.py place
239 | ```
240 |
241 | - Compute Mean GVI per Street, and Get Availability and Usability Scores (Optional): Additionally, you can compute the mean Green View Index (GVI) value per street in the road network. To perform this computation, run the following command in the terminal:
242 |
243 | ```bash
244 | python mean_gvi_street.py place
245 | ```
246 |
247 | Once this script has been executed, the script to calculate the Image Availability Score and Image Usability Score, along with other quality metrics, can be run.
248 |
249 | ```bash
250 | python results_metrics.py place
251 | ```
252 |
253 | - Estimate missing GVI points with NDVI file (Optional): Finally, it is possible to estimate the GVI values for the points with missing images using the NDVI value and linear regression. Before running the command below, please make sure to follow these steps:
254 |
255 | - Make sure to use an appropriate projection in metres that is suitable for your study area. For example, you can use the same projection as the one used in the roads.gpkg file.
256 | - Ensure that you have created the required folder structure: StreetView-NatureVisibility/results/{place}/ndvi. Place the corresponding NDVI file, named ndvi.tif, inside this folder. It is recommended to use an NDVI file that has been consistently generated for the study area over the course of a year. The NDVI file must be in the same projection chosen for your study area.
257 |
258 |
259 | ```bash
260 | python predict_missing_gvi.py {place} {epsg_code} {distance}
261 | ```
262 |
263 |
264 | - Accessing Results: Once the analysis is completed, you can navigate to the cloned repository directory on your local computer. Inside the repository, you will find a folder named results. Within the results folder, there will be a subfolder named after the location that was analysed. This subfolder contains several directories, including:
265 |
266 | - roads: This directory contains the road network GeoPackage file, which provides information about the road infrastructure in the analysed area.
267 | - points: Here, you can find the sample points GeoPackage file, which includes the spatial data of the sampled points used in the analysis.
268 | - ndvi: This directory has the GeoPackage file with the estimated GVI values using linear regression.
269 | - gvi: Initially, this directory contains the CSV file generated during the analysis. It includes the calculated Green View Index (GVI) values for each sampled point. Additionally, if the script for computing the mean GVI per street was executed, this directory will also contain a GeoPackage (GPKG) file with the GVI values aggregated at the street level.
270 |
271 |
272 |
273 |
274 | ## Explaining the Pipeline
275 |
276 | For this explanation, Utrecht Science Park will be used. Therefore, the command should look like this:
277 |
278 | ```bash
279 | python main_script.py 'De Uithof, Utrecht' 50 0 'MLY|' sample-file 8 10
280 | ```
281 | When executing this command, the code will automatically run from Step 1 to Step 4.
282 |
283 | 
284 |
285 |
286 | ### Step 1. Retrieve street road network and generate sample points
287 |
288 | The first step of the code is to retrieve the road network for a specific place using OpenStreetMap data with the help of the OSMnx library. It begins by fetching the road network graph, focusing on roads that are suitable for driving. One important thing to note is that for bidirectional streets, the osmnx library returns duplicate lines. In this code, we take care to remove these duplicates and keep only the unique road segments, ensuring that samples are not taken on the same road multiple times and preventing redundancy in subsequent analysis.
289 |
290 | Following that, the code proceeds to project the graph from its original latitude-longitude coordinates to a local projection in metres. This projection is crucial for achieving accurate measurements in subsequent steps where we need to calculate distances between points. By converting the graph to a local projection, we ensure that our measurements align with the real-world distances on the ground, enabling precise analysis and calculations based on the road network data.
291 |
292 |
293 | ```python
294 | road = get_road_network(place)
295 | ```
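For reference, a minimal sketch of what this retrieval step can look like with osmnx is shown below (the actual implementation lives in modules/osmnx_road_network.py; the function name and the dedup-by-geometry step are illustrative simplifications):

```python
import osmnx as ox

def get_road_network_sketch(place):
    # Fetch the drivable road network for the given place name
    graph = ox.graph_from_place(place, network_type="drive")
    # Project from latitude/longitude to a local metric CRS (UTM)
    graph = ox.project_graph(graph)
    # Convert the graph edges to a GeoDataFrame of road segments
    roads = ox.graph_to_gdfs(graph, nodes=False)
    # Keep only unique road geometries (bidirectional streets appear twice)
    return roads.drop_duplicates(subset="geometry")
```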
296 |
297 | 
298 |
299 | Then, a list of evenly distributed points along the road network is generated, with a specified distance between each point. This is achieved using a function that takes the road network data and an optional distance parameter N, which defaults to 50 metres.
300 |
301 | The function iterates over each road in the roads dataframe and creates points at regular intervals of the specified distance (N). By doing so, it ensures that the generated points are evenly spaced along the road network.
302 |
303 | To maintain a consistent spatial reference, the function sets the Coordinate Reference System (CRS) of the gdf_points dataframe to match the CRS of the roads dataframe. This ensures that the points and the road network are in the same local projected CRS, measured in metres.
304 |
305 | Furthermore, to avoid duplication and redundancy, the function removes any duplicate points in the gdf_points dataframe based on the geometry column. This ensures that each point in the resulting dataframe is unique and represents a distinct location along the road network.
306 |
307 |
308 | ```python
309 | points = select_points_on_road_network(road, distance)
310 | ```
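Conceptually, the point generation boils down to interpolating along each road geometry at fixed intervals; a simplified sketch (names are illustrative, not the module's actual code):

```python
import geopandas as gpd

def select_points_sketch(roads, n=50):
    # Interpolate a point every n metres along each road geometry
    points = []
    for line in roads.geometry:
        distance = 0
        while distance <= line.length:
            points.append(line.interpolate(distance))
            distance += n
    gdf_points = gpd.GeoDataFrame(geometry=points, crs=roads.crs)
    # Drop duplicate points created where road segments meet
    return gdf_points.drop_duplicates(subset="geometry").reset_index(drop=True)
```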
311 |
312 |
313 |
314 |
315 | |   | geometry |
316 | |---|----------|
317 | | 0 | POINT (649611.194 5772295.371) |
318 | | 1 | POINT (649609.587 5772345.345) |
319 | | 2 | POINT (649607.938 5772395.318) |
320 | | 3 | POINT (649606.112 5772445.285) |
321 | | 4 | POINT (649604.286 5772495.252) |
339 |
340 |
341 |
342 |
343 | 
344 |
345 |
346 | ### Step 2. Assign images to each sample point based on proximity
347 |
348 | The next step in the pipeline focuses on finding the closest features (images) for each point.
349 |
350 | To facilitate this process, the map is divided into smaller sections called tiles. Each tile represents a specific region of the map at a given zoom level. The XYZ tile scheme is employed, where each tile is identified by its zoom level (z), row (x), and column (y) coordinates. In this case, a zoom level of 14 is used, as it aligns with the supported zoom level in the Mapillary API.
351 |
352 | The get_features_on_points function utilises the mercantile.tile function from the mercantile library to determine the tile coordinates for each point in the points dataframe. By providing the latitude and longitude coordinates of a point, this function returns the corresponding tile coordinates (x, y, z) at the specified zoom level.
353 |
354 | Once the points are grouped based on their tile coordinates, the tiles are downloaded in parallel using threads. The get_features_for_tile function constructs a unique URL for each tile and sends a request to the Mapillary API to retrieve the features (images) within that specific tile.
355 |
356 | To calculate the distances between the features and the points, a k-dimensional tree (KDTree) approach is employed using the local projected CRS in metres. The KDTree is built using the geometry coordinates of the feature points. By querying the KDTree, the nearest neighbours of the points in the points dataframe are identified. The closest feature and distance information are then assigned to each point accordingly.
357 |
358 |
359 | ```python
360 | features = get_features_on_points(points, access_token, distance)
361 | ```
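The two building blocks here, tile lookup and nearest-neighbour search, can be illustrated as follows (the coordinates are illustrative; the tile matches the sample output below):

```python
import numpy as np
import mercantile
from scipy.spatial import cKDTree

# Which zoom-14 tile contains a given (longitude, latitude) point?
tile = mercantile.tile(5.18339, 52.08099, 14)
print(tile)  # Tile(x=8427, y=5405, z=14)

# Nearest feature per sample point: build a KDTree on the feature
# coordinates (in the projected CRS, metres) and query it per point
feature_coords = np.array([[649600.0, 5772300.0], [649650.0, 5772350.0]])  # illustrative
point_coords = np.array([[649611.2, 5772295.4]])
tree = cKDTree(feature_coords)
distances, indices = tree.query(point_coords, k=1)
```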
362 |
363 |
364 |
365 |
366 | |   | geometry | tile | feature | distance | image_id | is_panoramic | id |
367 | |---|----------|------|---------|----------|----------|--------------|----|
368 | | 0 | POINT (5.18339 52.08099) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 4.750421 | 211521443868382 | False | 0 |
369 | | 1 | POINT (5.18338 52.08144) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 0.852942 | 844492656278272 | False | 1 |
370 | | 2 | POINT (5.18338 52.08189) | Tile(x=8427, y=5405, z=14) | {'type': 'Feature', 'geometry': {'type': 'Poin... | 0.787206 | 938764229999108 | False | 2 |
406 |
407 |
408 |
409 |
410 |
411 | ### Step 3. Clean and process data
412 |
413 | In this step, the download_images_for_points function is responsible for efficiently downloading and processing images associated with the points in the GeoDataFrame to calculate the Green View Index (GVI). The function performs the following sub-steps:
414 |
415 | 1. Initialisation and Setup: The function initialises the image processing models and prepares the CSV file for storing the results. It also creates a lock object to ensure thread safety during concurrent execution.
416 |
417 | 2. Image Download and Processing: The function iterates over the rows in the GeoDataFrame and submits download tasks to a ThreadPoolExecutor for concurrent execution. Each task downloads the associated image, applies specific processing steps, and calculates the GVI. The processing steps include:
418 |
419 | - Panoramic Image Cropping using Road Centres: If the image is panoramic and is to be cropped using road centres, the following steps are followed:
420 | 1. Crop off the bottom 20% band to improve analysis accuracy, focusing on critical features.
421 | 2. Apply semantic segmentation to categorise different regions or objects within the image.
422 | 3. Extend the panorama's width by wrapping the initial 25% of the image around its right edge, so that the 360-degree scene is treated as continuous across the image seam.
423 | 4. Identify road centres using the segmentation output to establish the base points for cropping.
424 | 5. Crop the image based on the identified road centres.
425 |
426 | ![Panoramic image cropped using road centres](images/panoramic-roads.png)
427 |
428 | - Panoramic Image without Road Centre Cropping: When dealing with panoramic images not intended for cropping via road centres, the process unfolds as follows:
429 | 1. Crop off the bottom 20% band to improve analysis accuracy.
430 | 2. Apply semantic segmentation to assign labels to different regions or objects in the image.
431 | 3. Divide the image into four equal-width sections.
432 |
433 | ![Panoramic image divided into four equal-width sections](images/panoramic-noroads.png)
434 |
435 | - Non-Panoramic Image
436 | 1. Apply semantic segmentation to assign labels to different regions or objects in the image.
437 | 2. Identify road centres using the segmentation to determine the suitability of the image. This involves ascertaining whether the camera angle captures a valuable portion of the panorama for analysis.
438 | 3. If road centres cannot be identified, the image is disregarded and excluded from further analysis.
439 |
440 | ```python
441 | results = download_images_for_points(features_copy, access_token, max_workers, cut_by_road_centres, place, file_name)
442 | ```
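As a rough sketch of the per-image processing, the following shows the bottom-band crop and a transformer-based semantic segmentation with the Hugging Face transformers library (the checkpoint name is illustrative; the pipeline's own model setup presumably lives in modules/segmentation_images.py):

```python
from PIL import Image
import torch
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Illustrative checkpoint; the repository may use a different model
checkpoint = "facebook/mask2former-swin-large-cityscapes-semantic"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

image = Image.open("street_image.jpg")
# Crop off the bottom 20% band (camera mount, vehicle, road surface)
width, height = image.size
image = image.crop((0, 0, width, int(height * 0.8)))

# Assign a class id to every pixel
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
segmentation = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```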
443 |
444 |
445 | ### Step 4. Calculate GVI
446 | After each image has been cleaned and processed in the previous steps, the Green View Index (GVI), representing the percentage of vegetation visible in the analysed images, is calculated.
447 |
448 | The GVI results, along with the is_panoramic flag and error flags, are collected for each image. The results are written to a CSV file, with each row corresponding to a point in the GeoDataFrame, as soon as a thread finishes its task.
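With a segmentation in hand, the GVI itself reduces to the share of pixels labelled as vegetation. A minimal sketch (the class ids counted as greenery depend on the segmentation model actually used):

```python
import numpy as np

def compute_gvi_sketch(segmentation, vegetation_labels=(8,)):
    # segmentation: 2-D array of per-pixel class ids
    # vegetation_labels: ids counted as greenery (8 is 'vegetation'
    # in the Cityscapes label set; adjust for the model in use)
    mask = np.isin(segmentation, list(vegetation_labels))
    return mask.sum() / segmentation.size
```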
449 |
450 | 
451 |
452 | When the code finishes running, there will be a folder "results/{place}/gvi" containing a CSV file with the results. It can be loaded as a GeoDataFrame using the following code.
453 |
454 | ```python
455 | import pandas as pd
456 | import geopandas as gpd
457 | from shapely.geometry import Point
458 |
459 | path = "results/De Uithof, Utrecht/gvi/gvi-points.csv"
460 | results = pd.read_csv(path)
461 |
462 | # Convert the 'geometry' column to valid Point objects
463 | results['geometry'] = results.apply(lambda row: Point(float(row["x"]), float(row["y"])), axis=1)
464 |
465 | # Convert the DataFrame to a GeoDataFrame in WGS84 (EPSG:4326)
466 | gdf = gpd.GeoDataFrame(results, geometry='geometry', crs=4326)
463 | ```
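From there, the points can be inspected or exported like any GeoDataFrame, for example (the 'GVI' column name is an assumption about the CSV layout):

```python
# Write to GeoPackage, or plot the points coloured by their GVI value
gdf.to_file("results/De Uithof, Utrecht/gvi/gvi-points.gpkg", driver="GPKG")
gdf.plot(column="GVI", legend=True, markersize=4)
```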
464 |
465 | ### Step 5 (Optional). Evaluate image availability and image usability of Mapillary Image data
466 | After analysing the desired images, the image availability and usability are measured by utilising the following equations:
467 |
468 | $$\text{Image Availability Score} = \frac{N_{\text{img assigned}}}{N_{\text{total}}}$$
469 |
470 | $$\text{Image Usability Score} = \frac{N_{\text{img assigned} \land \text{GVI known}}}{N_{\text{img assigned}}}$$
471 |
472 | Then, to allow comparisons between multiple cities, adjusted scores for both metrics are calculated by multiplying each score by the natural logarithm of the total road length.
473 |
474 | $$\text{Adjusted Availability Score} = \frac{N_{\text{img assigned}}}{N_{\text{total}}} \times \ln(\text{road length})$$
475 |
476 | $$\text{Adjusted Usability Score} = \frac{N_{\text{img assigned} \land \text{GVI known}}}{N_{\text{img assigned}}} \times \ln(\text{road length})$$
477 |
478 | ```bash
479 | python results_metrics.py "De Uithof, Utrecht"
480 | ```
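For concreteness, the arithmetic behind these scores looks as follows, shown here with hypothetical counts:

```python
import math

# Hypothetical counts for a study area
n_total = 1200         # all sample points
n_img_assigned = 950   # points that were assigned an image
n_gvi_known = 900      # assigned points with a computable GVI
road_length = 185000   # total road length in metres

availability = n_img_assigned / n_total                  # ~0.79
usability = n_gvi_known / n_img_assigned                 # ~0.95
adj_availability = availability * math.log(road_length)  # ~9.6
adj_usability = usability * math.log(road_length)        # ~11.5
```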
481 |
482 | To illustrate the types of images considered usable for this analysis, we provide the following examples. As can be seen, images that are centred on the road are deemed suitable for this analysis, while images with obstructed or limited visibility have been excluded due to their lack of useful information. This selection was made using the algorithm developed by [Matthew Danish](https://github.com/mrd/vsvi_filter).
483 |
484 | **Suitable images for the analysis**
485 | 
486 | 
487 |
488 |
489 | **Unsuitable images for the analysis**
490 | 
491 | 
492 |
493 |
494 | ### Step 6 (Optional). Model GVI for missing points
495 | Finally, the analysis employs Linear Regression and Linear Generalised Additive models (GAM) to extract insights from the GVI points calculated in the previous step. The primary objective here is to estimate the GVI values for points with missing images. For this purpose, the code incorporates a module developed by [Yúri Grings](https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py), which facilitates the extraction of the NDVI values from a TIF file for a given list of points of interest.
496 |
497 | To successfully execute this step, an NDVI file specific to the study area is needed. For optimal results, it is recommended to use an NDVI file that has been consistently generated for the study area throughout an entire year. Furthermore, ensure that the coordinate reference system (CRS) of the NDVI file is projected, with metres as the unit of measurement.
498 |
499 | ```bash
500 | python predict_missing_gvi.py "De Uithof, Utrecht" 32631 50
501 | ```
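Conceptually, the regression step fits GVI against NDVI at points where both are known and predicts GVI where only NDVI is available. A sketch with hypothetical data (the script also fits a Linear GAM via the bundled pygam module):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical NDVI/GVI pairs for points with a computed GVI
ndvi_known = np.array([[0.31], [0.45], [0.52], [0.60]])
gvi_known = np.array([0.12, 0.20, 0.26, 0.33])

# NDVI values at points whose street images were missing
ndvi_missing = np.array([[0.40], [0.55]])

model = LinearRegression().fit(ndvi_known, gvi_known)
gvi_predicted = model.predict(ndvi_missing)
```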
502 |
503 |
504 |
505 |
506 | |      | Linear Regression | Linear GAM |
507 | |------|-------------------|------------|
508 | | RMSE | 0.1707 | 0.1640 |
509 | | AIC  | -879.7232 | -899.8143 |
521 |
522 |
523 |
524 |
525 | 
526 |
527 |
528 | ## Acknowledgements and Contact Information
529 | Project made in collaboration with Dr. SM Labib from the Department of Human Geography and Spatial Planning at Utrecht University. This is a project of the Spatial Data Science and Geo-AI Lab, conducted for the Applied Data Science MSc degree.
530 |
531 | Ilse Abril Vázquez Sánchez
532 | i.a.vazquezsanchez@students.uu.nl
533 | GitHub profile: iabrilvzqz
534 |
535 | Dr. S.M. Labib
536 | s.m.labib@uu.nl
537 |
--------------------------------------------------------------------------------
/images/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/1.png
--------------------------------------------------------------------------------
/images/2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/2.png
--------------------------------------------------------------------------------
/images/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/3.png
--------------------------------------------------------------------------------
/images/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/4.png
--------------------------------------------------------------------------------
/images/5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/5.png
--------------------------------------------------------------------------------
/images/6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/6.png
--------------------------------------------------------------------------------
/images/7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/7.png
--------------------------------------------------------------------------------
/images/8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/8.png
--------------------------------------------------------------------------------
/images/panoramic-noroads.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/panoramic-noroads.png
--------------------------------------------------------------------------------
/images/panoramic-roads.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/panoramic-roads.png
--------------------------------------------------------------------------------
/images/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Spatial-Data-Science-and-GEO-AI-Lab/StreetView-NatureVisibility/f4e6b5f53890db13bc32154682591937ba2271d0/images/pipeline.png
--------------------------------------------------------------------------------
/main_script.py:
--------------------------------------------------------------------------------
1 | import os
2 | os.environ['USE_PYGEOS'] = '0'
3 |
4 | import modules.process_data as process_data
5 | import modules.osmnx_road_network as road_network
6 |
7 | import geopandas as gpd
8 | from datetime import timedelta
9 | from time import time
10 | import random
11 | import sys
12 |
13 |
14 | if __name__ == "__main__":
15 | # When running the code from the terminal, this is the order in which the parameters should be entered
16 | args = sys.argv
17 | city = args[1] # Name of the city to analyse (e.g. Amsterdam, Netherlands)
18 | distance = int(args[2]) # Distance between the sample points in meters
19 | cut_by_road_centres = int(args[3]) # Determine if panoramic images are going to be cropped using the road centres
20 | access_token = args[4] # Access token for mapillary (e.g. MLY|)
21 | file_name = args[5] # Name of the csv file in which the points with the GVI value are going to be stored
22 | max_workers = int(args[6]) # Number of threads that are going to be used, a good starting point could be the number of cores of the computer
23 | num_sample_images = int(args[7])
24 | begin = int(args[8]) if len(args) > 8 else None
25 | end = int(args[9]) if len(args) > 9 else None
26 |
27 | process_data.prepare_folders(city)
28 |
29 | file_path_features = os.path.join("results", city, "points", "points.gpkg")
30 | file_path_road = os.path.join("results", city, "roads", "roads.gpkg")
31 |
32 | if not os.path.exists(file_path_features):
33 | # Get the sample points and the features assigned to each point
34 | road = road_network.get_road_network(city)
35 |
36 | # Save road in gpkg file
37 | road["index"] = road.index
38 | road["index"] = road["index"].astype(str)
39 | road["highway"] = road["highway"].astype(str)
40 | road["length"] = road["length"].astype(float)
41 | road[["index", "geometry", "length", "highway"]].to_file(file_path_road, driver="GPKG", crs=road.crs)
42 |
43 | points = road_network.select_points_on_road_network(road, distance)
44 | features = road_network.get_features_on_points(points, access_token, distance)
45 | features.to_file(file_path_features, driver="GPKG")
46 | else:
47 | # If the points file already exists, then we use it to continue with the analysis
48 | features = gpd.read_file(file_path_features, layer="points")
49 |
50 | features = features.sort_values(by='id')
51 |
52 | # If begin and end values are provided, the dataframe is sliced and only those points are analysed
53 | if begin is not None and end is not None:
54 | features = features.iloc[begin:end]
55 |
56 | # Get a list of n random row indices
57 | sample_indices = random.sample(range(len(features)), num_sample_images)
58 | # Create a new column 'save_sample' and set it to False for all rows
59 | features["save_sample"] = False
60 |
61 | # Set True for the randomly selected rows
62 | features.loc[sample_indices, "save_sample"] = True
63 |
64 | # Get the initial time
65 | start_time = time()
66 |
67 | results = process_data.download_images_for_points(features, access_token, max_workers, cut_by_road_centres, city, file_name)
68 |
69 | # Get the final time
70 | end_time = time()
71 |
72 | # Calculate the elapsed time
73 | elapsed_time = end_time - start_time
74 |
75 | # Format the elapsed time as "hh:mm:ss"
76 | formatted_time = str(timedelta(seconds=elapsed_time))
77 |
78 | print(f"Running time: {formatted_time}")
--------------------------------------------------------------------------------
/mapillaryGVI.yml:
--------------------------------------------------------------------------------
1 | name: mapillaryGVI
2 | channels:
3 | - conda-forge
4 | - defaults
5 | dependencies:
6 | - ipykernel=6.21.1=pyh736e0ef_0
7 | - geopandas=0.12.2=pyhd8ed1ab_0
8 | - pandas=1.5.3=py310hecf8f37_0
9 | - pip=23.0=pyhd8ed1ab_0
10 | - planetary-computer=0.4.9=pyhd8ed1ab_0
11 | - pystac=1.6.1=pyhd8ed1ab_1
12 | - pystac-client=0.6.0=pyhd8ed1ab_0
13 | - python=3.10.9=he7542f4_0_cpython
14 | - rasterio=1.3.4=py310h3600f62_0
15 | - rioxarray=0.13.3=pyhd8ed1ab_0
16 | - scikit-learn=1.2.1=py310hcebe997_0
17 | - seaborn=0.12.2=hd8ed1ab_0
18 | - shapely=2.0.1=py310h4e43f2a_0
19 | - watchdog=2.2.1=py310h389cd99_0
20 | - xarray-spatial=0.3.5=pyhd8ed1ab_0
21 | - pip:
22 | - folium==0.14.0
23 | - numpy==1.23.0
24 | - matplotlib==3.7.1
25 | - huggingface-hub==0.14.1
26 | - mercantile==1.2.1
28 | - odc-geo==0.4.0
29 | - odc-stac==0.3.6
30 | - osmnx==1.3.0
31 | - pygam==0.8.0
32 | - scipy==1.9.0
33 | - torch==2.0.1
34 | - tqdm==4.61.1
35 | - transformers==4.29.2
36 | - vt2geojson==0.2.1
--------------------------------------------------------------------------------
/modules/availability.py:
--------------------------------------------------------------------------------
1 | # Module taken from Yúri Grings' GitHub repository
2 | # https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py
3 |
4 | # Data manipulation and analysis
5 | import numpy as np
6 | import pandas as pd
7 |
8 | # File and directory operations
9 | import os
10 |
11 | # Geospatial data processing and analysis
12 | import geopandas as gpd
13 | import osmnx as ox
14 | import networkx as nx
15 | import rioxarray
16 | import xrspatial
17 | from rasterio.enums import Resampling
18 | import pyproj
19 | import shapely.geometry as sg
20 | from shapely.ops import transform
21 |
22 | # Geospatial data access and catalogs
23 | import pystac_client
24 | import planetary_computer
25 | import odc.stac
26 |
27 | # Date and time manipulation
28 | from datetime import datetime, timedelta
29 | from time import time
30 |
31 | # Progress tracking
32 | from tqdm import tqdm
33 |
34 | # Images
35 | from PIL import Image
36 | import requests
37 | from io import BytesIO
38 |
39 | ##### MAIN FUNCTIONS
40 | def get_mean_NDVI(point_of_interest_file, ndvi_raster_file=None, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None,
41 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, year=datetime.now().year,
42 | write_to_file=True, save_ndvi=True, output_dir=os.getcwd()):
43 | ### Step 1: Read and process user inputs, check conditions
44 | poi = gpd.read_file(point_of_interest_file)
45 | # Verify that locations are either all provided using point geometries or all provided using polygon geometries
46 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'):
47 | geom_type = poi.iloc[0]['geometry'].geom_type
48 | else:
49 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function")
50 |
51 | # Make sure the type of polygon is specified if poi file contains polygon geometries
52 | if geom_type == "Polygon":
53 | if polygon_type not in ["neighbourhood", "house"]:
54 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
55 |
56 | # In case of house polygons, transform to centroids
57 | if geom_type == "Polygon":
58 | if polygon_type not in ["neighbourhood", "house"]:
59 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
60 | if polygon_type == "house":
61 | print("Changing geometry type to Point by computing polygon centroids...")
62 | poi['geometry'] = poi['geometry'].centroid
63 | geom_type = poi.iloc[0]['geometry'].geom_type
64 | print("Done \n")
65 |
66 | # Make sure buffer distance and type are set in case of point geometries
67 | if geom_type == "Point":
68 | if buffer_type not in ["euclidean", "network"]:
69 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
70 |
71 | # Make sure CRS is projected rather than geographic
72 | if not poi.crs.is_projected:
73 | if crs_epsg is None:
74 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
75 | epsg = 3395
76 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
77 | else:
78 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
79 | epsg = crs_epsg
80 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
81 | else:
82 | epsg = poi.crs.to_epsg()
83 |
84 | # Create epsg transformer to use planetary computer and OSM
85 | epsg_transformer = pyproj.Transformer.from_crs(f"epsg:{epsg}", "epsg:4326")
86 |
87 | # Make sure poi dataframe contains ID column
88 | if "id" in poi.columns:
89 | if poi['id'].isnull().values.any():
90 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
91 | else:
92 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
93 |
94 | # Make sure the buffer_type argument has a valid value if not None
95 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
96 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
97 |
98 | # If buffer type is set to euclidean, make sure that the buffer distance is set
99 | if buffer_type == "euclidean":
100 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
101 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
102 |
103 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
104 | if buffer_type == "network":
105 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
106 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
107 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
108 | else:
109 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
110 | # Convert km per hour to m per minute
111 | meters_per_minute = travel_speed * 1000 / 60
112 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects
113 | buffer_dist = trip_time * meters_per_minute * 1.25
114 | else:
115 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time
116 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
117 | raise TypeError("Please make sure that one of the following requirements is met:\
118 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
119 | \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
120 | speed_time = False
121 |
122 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
123 | if buffer_dist is None:
124 | poi_polygon = sg.box(*poi.total_bounds)
125 | else:
126 | poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
127 |
128 | # Retrieve NDVI raster, use planetary computer if not provided by user
129 | if ndvi_raster_file is None:
130 | print("Retrieving NDVI raster through planetary computer...")
131 | start_ndvi_retrieval = time()
132 |
133 | # Transform CRS to comply with planetary computer requirements
134 | bounding_box_pc = transform(epsg_transformer.transform, poi_polygon).bounds
135 | # Swap coords order to match with planetary computer format
136 | bounding_box_pc = [bounding_box_pc[1], bounding_box_pc[0], bounding_box_pc[3], bounding_box_pc[2]]
137 |
138 | # Query planetary computer
139 | catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1",modifier=planetary_computer.sign_inplace)
140 | # Obtain Area of Interest
141 | time_of_interest = f"{year}-01-01/{year}-12-30"
142 | # Search Data
143 | search = catalog.search(collections=["sentinel-2-l2a"],
144 | bbox=bounding_box_pc,
145 | datetime=time_of_interest,
146 | query={"eo:cloud_cover": {"lt": 20}})
147 | # Obtain Data
148 | items = search.item_collection()
149 | # Create dataframe from planetary computer's item collection dictionary
150 | items_df = gpd.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
151 | # Make sure only images are maintained that contain all points/polygons of interest
152 | items_df_poi = items_df[items_df.geometry.contains(sg.box(*bounding_box_pc))]
153 | # Determine lowest percentage of cloud cover among filtered items
154 | lowest_cloud_cover = items_df_poi['eo:cloud_cover'].min()
155 | # Filter the satellite image which has the lowest cloud cover percentage
156 | item_to_select = items_df_poi[items_df_poi['eo:cloud_cover'] == lowest_cloud_cover]
157 | # Select item that matches the filters above and will be used to compose ndvi raster
158 | selected_item = next(item for item in items if item.properties["s2:granule_id"] == item_to_select.iloc[0]['s2:granule_id'])
159 | # Obtain Bands of Interest
160 | selected_item_data = odc.stac.stac_load([selected_item], bands = ['red', 'green', 'blue', 'nir'], bbox = bounding_box_pc).isel(time=0)
161 | # Calculate NDVI values
162 | ndvi = xrspatial.multispectral.ndvi(selected_item_data['nir'], selected_item_data['red'])
163 | # Reproject to original poi CRS
164 | ndvi_src = ndvi.rio.reproject(f"EPSG:{epsg}", resampling= Resampling.nearest, nodata=np.nan)
165 |
166 | # Provide information on satellite image that was used to user
167 | print(f"Information on the satellite image retrieved from planetary computer, used to calculate NDVI values:\
168 | \n Date on which image was generated: {selected_item.properties['s2:generation_time']}\
169 | \n Percentage of cloud cover: {selected_item.properties['eo:cloud_cover']}\
170 | \n Percentage of pixels with missing data: {selected_item.properties['s2:nodata_pixel_percentage']}")
171 |
172 | # Save satellite image that was used in case user specifies so
173 | if save_ndvi:
174 | # Retrieve the image URL
175 | image_url = selected_item.assets["rendered_preview"].href
176 | # Download the image data
177 | response = requests.get(image_url)
178 | # Create a PIL Image object from the downloaded image data
179 | image = Image.open(BytesIO(response.content))
180 | # Create directory if the one specified by the user does not yet exist
181 | if not os.path.exists(output_dir):
182 | os.makedirs(output_dir)
183 | # Get filename of the poi file to append information to it
184 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
185 | # Save the image to a file
186 | image.save(os.path.join(output_dir, f"{input_filename}_ndvi_satellite_image.png"))
187 | ndvi_src.rio.to_raster(os.path.join(output_dir, f"{input_filename}_ndvi_raster.tif"))
188 | print("Satellite image and created NDVI raster successfully saved to file")
189 | end_ndvi_retrieval = time()
190 | elapsed_ndvi_retrieval = end_ndvi_retrieval - start_ndvi_retrieval
191 | print(f"Done, running time: {str(timedelta(seconds=elapsed_ndvi_retrieval))} \n")
192 | else:
193 | # Read ndvi raster provided by user
194 | ndvi_src = rioxarray.open_rasterio(ndvi_raster_file)
195 | # Make sure that ndvi raster has same CRS as poi file
196 | if not ndvi_src.rio.crs.to_epsg() == epsg:
197 | print("Adjusting CRS of NDVI file to match with Point of Interest CRS...")
198 | ndvi_src.rio.write_crs(f'EPSG:{epsg}', inplace=True)
199 | print("Done \n")
200 |
201 | # Make sure all points of interest are within or do at least intersect (in case of polygons) the NDVI raster provided
202 | if not all(geom.within(sg.box(*ndvi_src.rio.bounds())) for geom in poi['geometry']):
203 | if geom_type == "Point":
204 | raise ValueError("Not all points of interest are within the NDVI file provided, please make sure they are and re-run the function")
205 | else:
206 | if not all(geom.intersects(sg.box(*ndvi_src.rio.bounds())) for geom in poi['geometry']):
207 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the NDVI file provided, please make sure they are/do and re-run the function")
208 | else:
209 | print("Warning: Not all polygons of interest are completely within the area covered by the NDVI file provided, results will be based on intersecting part of polygons involved \n")
210 |
211 | ### Step 2: Construct the Area of Interest based on the arguments as defined by user
212 | if buffer_type is None:
213 | # Buffer type == None implies that provided polygons serve as areas of interest
214 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
215 | else:
216 | if buffer_type == "euclidean":
217 | # Create area of interest based on euclidean distance
218 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
219 | else:
220 | # Make sure network type argument has valid value
221 | if network_type not in ["walk", "bike", "drive", "all"]:
222 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function")
223 |
224 | # If poi file still contains polygon geometries, compute centroids so that isochrone maps can be created
225 | if geom_type == "Polygon":
226 | print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...")
227 | poi['geometry'] = poi['geometry'].centroid
228 | print("Done \n")
229 |
230 | print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...")
231 | start_network_retrieval = time()
232 | # Transform total bounds polygon of poi file to 4326 for OSM
233 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
234 | # Extract polygon in EPSG 4326
235 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
236 | # Retrieve street network for desired network type
237 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
238 | # Project street network graph back to original poi CRS
239 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
240 | end_network_retrieval = time()
241 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
242 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
243 |
244 | # Compute isochrone areas for points of interest
245 | aoi_geometry = []
246 | for geom in tqdm(poi['geometry'], desc = 'Retrieving isochrone for point(s) of interest'):
247 | # Find node which is closest to point location as base for next steps
248 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
249 | # Create subgraph around point of interest for efficiency purposes
250 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
251 | # Calculate the time it takes to cover each edge's distance if speed_time is True
252 | if speed_time:
253 | for _, _, _, data in buffer_graph.edges(data=True, keys=True):
254 | data["time"] = data["length"] / meters_per_minute
255 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
256 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
257 | else:
258 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
259 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
260 | # Compute isochrones, see separate function for line by line explanation
261 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
262 | aoi_geometry.append(isochrone_poly)
263 |
264 | # Create geodataframe of isochrone geometries
265 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}")
266 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n")
267 |
268 | ### Step 3: Calculate mean NDVI values and write results to file
269 | print("Calculating mean NDVI values...")
270 | start_calc = time()
271 | # Check whether areas of interest, created in previous steps, are fully covered by the ndvi raster, provide warning if not
272 | if not all(geom.within(sg.box(*ndvi_src.rio.bounds())) for geom in aoi_gdf['geometry']):
273 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the NDVI raster, note that results will be based on the intersecting part of the buffer zone")
274 | # Calculate mean ndvi for geometries in poi file
275 | poi['mean_NDVI'] = aoi_gdf.apply(lambda row: ndvi_src.rio.clip([row.geometry]).clip(min=0).mean().values.round(3), axis=1)
276 | end_calc = time()
277 | elapsed_calc = end_calc - start_calc
278 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n")
279 |
280 | if write_to_file:
281 | print("Writing results to new geopackage file in specified directory...")
282 | # Create directory if output directory specified by user does not yet exist
283 | if not os.path.exists(output_dir):
284 | os.makedirs(output_dir)
285 | # Retrieve filename from original poi file to add information to it while writing to file
286 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
287 | poi.to_file(os.path.join(output_dir, f"{input_filename}_ndvi_added.gpkg"), driver="GPKG")
288 | print("Done")
289 |
290 | return poi
291 |
292 | def get_landcover_percentages(point_of_interest_file, landcover_raster_file=None, crs_epsg=None, polygon_type="neighbourhood",
293 | buffer_type=None, buffer_dist=None, network_type=None, trip_time=None, travel_speed=None,
294 | write_to_file=True, save_lulc=True, output_dir=os.getcwd()):
295 | ### Step 1: Read and process user input, check conditions
296 | poi = gpd.read_file(point_of_interest_file)
297 | # Make sure that geometries in poi file are either all provided using point geometries or all using polygon geometries
298 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'):
299 | geom_type = poi.iloc[0]['geometry'].geom_type
300 | else:
301 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function")
302 |
303 | # Make sure type of polygon is specified in case poi file contains polygon geometries
304 | if geom_type == "Polygon":
305 | if polygon_type not in ["neighbourhood", "house"]:
306 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
307 |
308 | # In case of house polygons, transform to centroids
309 | if geom_type == "Polygon":
310 | if polygon_type not in ["neighbourhood", "house"]:
311 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
312 | if polygon_type == "house":
313 | print("Changing geometry type to Point by computing polygon centroids...")
314 | poi['geometry'] = poi['geometry'].centroid
315 | geom_type = poi.iloc[0]['geometry'].geom_type
316 | print("Done \n")
317 |
318 | # Make sure buffer distance and type are set in case of point geometries
319 | if geom_type == "Point":
320 | if buffer_type not in ["euclidean", "network"]:
321 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
322 |
323 | # Make sure CRS is projected rather than geographic
324 | if not poi.crs.is_projected:
325 | if crs_epsg is None:
326 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
327 | epsg = 3395
328 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
329 | else:
330 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
331 | epsg = crs_epsg
332 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
333 | else:
334 | epsg = poi.crs.to_epsg()
335 |
336 | # Make sure poi dataframe contains ID column
337 | if "id" in poi.columns:
338 | if poi['id'].isnull().values.any():
339 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
340 | else:
341 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
342 |
343 | # Make sure the buffer_type argument has a valid value if not None
344 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
345 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
346 |
347 | # If buffer type is set to euclidean, make sure that the buffer distance is set
348 | if buffer_type == "euclidean":
349 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
350 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
351 |
352 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
353 | if buffer_type == "network":
354 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
355 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
356 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
357 | else:
358 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
359 | # Convert km per hour to m per minute
360 | meters_per_minute = travel_speed * 1000 / 60
361 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects
362 | buffer_dist = trip_time * meters_per_minute * 1.25
363 | else:
364 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time
365 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
366 | raise TypeError("Please make sure that one of the following requirements is met:\
367 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
368 |             \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
369 | speed_time = False
370 |
371 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
372 | if buffer_dist is None:
373 | poi_polygon = sg.box(*poi.total_bounds)
374 | else:
375 | poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
376 |
377 | if landcover_raster_file is None:
378 | # Create epsg transformer to use planetary computer
379 | epsg_transformer = pyproj.Transformer.from_crs(f"epsg:{epsg}", "epsg:4326")
380 | print("Retrieving landcover class raster through planetary computer...")
381 | start_landcover_retrieval = time()
382 | # transform CRS to comply with planetary computer requirements
383 | bounding_box_pc = transform(epsg_transformer.transform, poi_polygon).bounds
384 |         # Swap coordinate order: pyproj returns (lat, lon) for EPSG:4326, while the STAC API expects [west, south, east, north]
385 | bounding_box_pc = [bounding_box_pc[1], bounding_box_pc[0], bounding_box_pc[3], bounding_box_pc[2]]
386 |
387 | # Query planetary computer
388 | catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1",modifier=planetary_computer.sign_inplace)
389 | search = catalog.search(
390 | collections=["esa-worldcover"],
391 | bbox=bounding_box_pc,
392 | )
393 | # Retrieve the items and select the first, most recent one
394 | items = search.item_collection()
395 | selected_item = items[0]
396 | # Extract landcover classes and store in dictionary to use in later stage
397 | class_list = selected_item.assets["map"].extra_fields["classification:classes"]
398 | classmap = {
399 | c["value"]: c["description"]
400 | for c in class_list
401 | }
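        # For ESA WorldCover the resulting classmap looks roughly like
        # {10: 'Tree cover', 20: 'Shrubland', 30: 'Grassland', ...} (exact labels come from the collection metadata)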
402 |
403 | # Load raster using rioxarray
404 | landcover = rioxarray.open_rasterio(selected_item.assets["map"].href)
405 | # Clip raster to bounds of geometries in poi file
406 | landcover_clip = landcover.rio.clip_box(*bounding_box_pc)
407 | # Reproject to original poi file CRS
408 | landcover_src = landcover_clip.rio.reproject(f"EPSG:{epsg}", resampling= Resampling.nearest)
409 |
410 | # Provide landcover image information to user
411 | print(f"Information on the land cover image retrieved from planetary computer:\
412 | \n Image description: {selected_item.properties['description']}\
413 | \n Image timeframe: {selected_item.properties['start_datetime']} - {selected_item.properties['end_datetime']}")
414 |
415 | if save_lulc:
416 | # Create directory if the one specified by user does not yet exist
417 | if not os.path.exists(output_dir):
418 | os.makedirs(output_dir)
419 | # Extract filename of poi file to add information when writing to file
420 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
421 | # Write landcover raster to file
422 | landcover_src.rio.to_raster(os.path.join(output_dir, f"{input_filename}_lulc_raster.tif"))
423 | print("Landcover image successfully saved to raster file")
424 | end_landcover_retrieval = time()
425 | elapsed_landcover_retrieval = end_landcover_retrieval - start_landcover_retrieval
426 | print(f"Done, running time: {str(timedelta(seconds=elapsed_landcover_retrieval))} \n")
427 | else:
428 | landcover_src = rioxarray.open_rasterio(landcover_raster_file)
429 | # Make sure landcover raster has same CRS as poi file
430 | if not landcover_src.rio.crs.to_epsg() == epsg:
431 | print("Adjusting CRS of land cover file to match with Point of Interest CRS...")
432 | landcover_src.rio.write_crs(f'EPSG:{epsg}', inplace=True)
433 | print("Done \n")
434 |
435 |     # Make sure all points of interest are within, or at least intersect (in case of polygons), the landcover raster provided
436 | if not all(geom.within(sg.box(*landcover_src.rio.bounds())) for geom in poi['geometry']):
437 | if geom_type == "Point":
438 | raise ValueError("Not all points of interest are within the landcover file provided, please make sure they are and re-run the function")
439 | else:
440 | if not all(geom.intersects(sg.box(*landcover_src.rio.bounds())) for geom in poi['geometry']):
441 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the landcover file provided, please make sure they are/do and re-run the function")
442 | else:
443 | print("Warning: Not all polygons of interest are completely within the area covered by the landcover file provided, results will be based on intersecting part of polygons involved \n")
444 |
445 | ### Step 2: Construct the Area of Interest based on the arguments as defined by user
446 | if buffer_type is None:
447 | # Buffer type == None implies that polygons in poi file serve as areas of interest
448 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
449 | else:
450 | # Make sure buffer_dist is set in case buffer_type set to euclidean
451 | if buffer_type == "euclidean":
452 | # Create area of interest based on euclidean distance
453 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
454 | else:
455 | # Make sure network_type argument has valid value
456 | if network_type not in ["walk", "bike", "drive", "all"]:
457 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function")
458 |
459 | # In case poi still contains polygon geometries, compute centroids so that isochrones can be created
460 | if geom_type == "Polygon":
461 | print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...")
462 | poi['geometry'] = poi['geometry'].centroid
463 | print("Done \n")
464 |
465 | print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...")
466 | start_network_retrieval = time()
467 | # Transform bounds polygon of poi file to 4326 for OSM
468 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
469 | # Extract polygon in EPSG 4326
470 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
471 | # Retrieve street network for desired network type
472 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
473 | # Project street network graph back to original poi CRS
474 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
475 | end_network_retrieval = time()
476 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
477 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
478 |
479 | # Compose area of interest based on isochrones
480 | aoi_geometry = []
481 | for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'):
482 | # Find node which is closest to point location as base for next steps
483 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
484 | # Create subgraph for efficiency purposes
485 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
486 | # Calculate the time it takes to cover each edge's distance if speed_time is True
487 | if speed_time:
488 | for _, _, _, data in buffer_graph.edges(data=True, keys=True):
489 | data["time"] = data["length"] / meters_per_minute
490 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
491 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
492 | else:
493 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
494 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
495 | # Compute isochrones, see separate function for line by line explanation
496 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
497 | aoi_geometry.append(isochrone_poly)
498 |
499 | # Create dataframe of isochrone polygons
500 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}")
501 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n")
502 |
503 | ### Step 3: Perform calculations and write results to file
504 | print("Calculating landcover class percentages...")
505 | start_calc = time()
506 | # Check if areas of interest, resulting from previous steps, are fully covered by landcover raster, provide warning if not
507 | if not all(geom.within(sg.box(*landcover_src.rio.bounds())) for geom in aoi_gdf['geometry']):
508 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the landcover raster, note that results will be based on the intersecting part of the buffer zone")
509 |
510 | # apply the landcover percentage function to each geometry in the GeoDataFrame and create a new Pandas Series
511 | landcover_percentages_series = aoi_gdf.geometry.apply(lambda x: pd.Series(calculate_landcover_percentages(landcover_src=landcover_src, geometry=x)))
512 | # rename the columns with the landcover class values
513 | if landcover_raster_file is None:
514 | landcover_percentages_series = landcover_percentages_series.rename(columns=lambda x: str(classmap.get(x, x)))
515 | else:
516 | landcover_percentages_series.columns = ["class_" + str(col) for col in landcover_percentages_series.columns]
517 | # concatenate the new series to the original dataframe
518 | poi = pd.concat([poi, landcover_percentages_series], axis=1)
519 | end_calc = time()
520 | elapsed_calc = end_calc - start_calc
521 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n")
522 |
523 | if write_to_file:
524 | print("Writing results to new geopackage file in specified directory...")
525 | # Create output directory if the one specified by user does not yet exist
526 | if not os.path.exists(output_dir):
527 | os.makedirs(output_dir)
528 | # Extract poi filename to add information to it when writing to file
529 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
530 | poi.to_file(os.path.join(output_dir, f"{input_filename}_LCperc_added.gpkg"), driver="GPKG")
531 | print("Done")
532 |
533 | return poi
534 |
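# Minimal usage sketch (hypothetical file name); with landcover_raster_file=None the
# ESA WorldCover raster is retrieved from Planetary Computer:
# poi = get_landcover_percentages("pois.gpkg", buffer_type="euclidean", buffer_dist=500)
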
535 | def get_canopy_percentage(point_of_interest_file, canopy_vector_file, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None,
536 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, write_to_file=True, output_dir=os.getcwd()):
537 | ### Step 1: Read and process user input, check conditions
538 | poi = gpd.read_file(point_of_interest_file)
539 | # Make sure geometries of poi file are either all provided using point geometries or all using polygon geometries
540 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'):
541 | geom_type = poi.iloc[0]['geometry'].geom_type
542 | else:
543 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function")
544 |
545 | # Make sure type of polygon is specified in case poi file contains polygon geometries
546 | if geom_type == "Polygon":
547 | if polygon_type not in ["neighbourhood", "house"]:
548 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
549 |
550 | # In case of house polygons, transform to centroids
551 | if geom_type == "Polygon":
552 | if polygon_type not in ["neighbourhood", "house"]:
553 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
554 | if polygon_type == "house":
555 | print("Changing geometry type to Point by computing polygon centroids...")
556 | poi['geometry'] = poi['geometry'].centroid
557 | geom_type = poi.iloc[0]['geometry'].geom_type
558 | print("Done \n")
559 |
560 | # Make sure buffer distance and type are set in case of point geometries
561 | if geom_type == "Point":
562 | if buffer_type not in ["euclidean", "network"]:
563 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
564 |
565 | # Make sure CRS is projected rather than geographic
566 | if not poi.crs.is_projected:
567 | if crs_epsg is None:
568 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
569 | epsg = 3395
570 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
571 | else:
572 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
573 | epsg = crs_epsg
574 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
575 | else:
576 | epsg = poi.crs.to_epsg()
577 |
578 | # Make sure poi dataframe contains ID column
579 | if "id" in poi.columns:
580 | if poi['id'].isnull().values.any():
581 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
582 | else:
583 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
584 |
585 | # Retrieve tree canopy data
586 | canopy_src = gpd.read_file(canopy_vector_file)
587 | # Make sure geometries in canopy file are of polygon or multipolygon as areas need to be calculated
588 | if not (canopy_src['geometry'].geom_type.isin(['Polygon', 'MultiPolygon']).all()):
589 | raise ValueError("Please make sure all geometries of the tree canopy file are of 'Polygon' or 'MultiPolygon' type and re-run the function")
590 |
591 | # Make sure canopy file has same CRS as poi file
592 | if not canopy_src.crs.to_epsg() == epsg:
593 | print("Adjusting CRS of Greenspace file to match with Point of Interest CRS...")
594 | canopy_src.to_crs(f'EPSG:{epsg}', inplace=True)
595 | print("Done \n")
596 |
597 |     # Make sure all points of interest are within, or at least intersect (in case of polygons), the tree canopy file provided
598 | if not all(geom.within(sg.box(*canopy_src.total_bounds)) for geom in poi['geometry']):
599 | if geom_type == "Point":
600 | raise ValueError("Not all points of interest are within the tree canopy file provided, please make sure they are and re-run the function")
601 | else:
602 | if not all(geom.intersects(sg.box(*canopy_src.total_bounds)) for geom in poi['geometry']):
603 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the tree canopy file provided, please make sure they are/do and re-run the function")
604 | else:
605 | print("Warning: Not all polygons of interest are completely within the area covered by the tree canopy file provided, results will be based on intersecting part of polygons involved \n")
606 |
607 | # Make sure the buffer_type argument has a valid value if not None
608 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
609 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
610 |
611 | # If buffer type is set to euclidean, make sure that the buffer distance is set
612 | if buffer_type == "euclidean":
613 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
614 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
615 |
616 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
617 | if buffer_type == "network":
618 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
619 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
620 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
621 | else:
622 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
623 | # Convert km per hour to m per minute
624 | meters_per_minute = travel_speed * 1000 / 60
625 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects
626 | buffer_dist = trip_time * meters_per_minute * 1.25
627 | else:
628 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time
629 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
630 | raise TypeError("Please make sure that one of the following requirements is met:\
631 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
632 |             \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
633 | speed_time = False
634 |
635 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
636 | if buffer_dist is None:
637 | poi_polygon = sg.box(*poi.total_bounds)
638 | else:
639 | poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
640 |
641 | ### Step 2: Construct the Area of Interest based on the arguments as defined by user
642 | if buffer_type is None:
643 | # Buffer type == None implies that polygon geometries serve as areas of interest
644 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
645 | else:
646 | # Make sure buffer dist is set in case buffer type set to euclidean
647 | if buffer_type == "euclidean":
648 | # Create area of interest based on euclidean buffer
649 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
650 | else:
651 | # Make sure network_type has valid value
652 | if network_type not in ["walk", "bike", "drive", "all"]:
653 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function")
654 |
655 | # In case poi still contain polygon geometries, compute centroids so that isochrones can be created
656 | if geom_type == "Polygon":
657 | print("Changing geometry type to Point by computing polygon centroids so that isochrone can be retrieved...")
658 | poi['geometry'] = poi['geometry'].centroid
659 | print("Done \n")
660 |
661 | print("Retrieving network within total bounds of point(s) of interest, extended by buffer distance as specified...")
662 | start_network_retrieval = time()
663 | # Transform bounds polygon of poi file to 4326 for OSM
664 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
665 | # Extract polygon in EPSG 4326
666 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
667 | # Retrieve street network for desired network type
668 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
669 | # Project street network graph back to original poi CRS
670 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
671 | end_network_retrieval = time()
672 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
673 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
674 |
675 | aoi_geometry = []
676 | for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'):
677 | # Find node which is closest to point location as base for next steps
678 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
679 | # Create subgraph around poi for efficiency purposes
680 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
681 | # Calculate the time it takes to cover each edge's distance if speed_time is True
682 | if speed_time:
683 | for _, _, _, data in buffer_graph.edges(data=True, keys=True):
684 | data["time"] = data["length"] / meters_per_minute
685 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
686 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
687 | else:
688 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
689 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
690 | # Compute isochrones, see separate function for line by line explanation
691 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
692 | aoi_geometry.append(isochrone_poly)
693 |
694 | # Create dataframe of isochrone polygons
695 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}")
696 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n")
697 |
698 |
699 | ### Step 3: Perform calculations and write results to file
700 | print("Calculating percentage of tree canopy coverage...")
701 | start_calc = time()
702 | # Check whether areas of interest, resulting from previous steps, are fully covered by tree canopy file, provide warning if not
703 | if not all(geom.within(sg.box(*canopy_src.total_bounds)) for geom in aoi_gdf['geometry']):
704 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the tree canopy file, note that results will be based on the intersecting part of the buffer zone")
705 |
706 | # Calculate percentage of tree canopy cover
707 | poi['canopy_cover'] = aoi_gdf.apply(lambda row: str(((canopy_src.clip(row.geometry).area.sum()/row.geometry.area)*100).round(2))+'%', axis=1)
708 | end_calc = time()
709 | elapsed_calc = end_calc - start_calc
710 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n")
711 |
712 | if write_to_file:
713 | print("Writing results to new geopackage file in specified directory...")
714 | # Create directory if the one specified by the user does not yet exist
715 | if not os.path.exists(output_dir):
716 | os.makedirs(output_dir)
717 | # Extract filename of poi file to add information to it when writing to file
718 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
719 | poi.to_file(os.path.join(output_dir, f"{input_filename}_CanopyPerc_added.gpkg"), driver="GPKG")
720 | print("Done")
721 |
722 | return poi
723 |
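# Minimal usage sketch (hypothetical file names):
# poi = get_canopy_percentage("pois.gpkg", "canopy.gpkg", buffer_type="euclidean", buffer_dist=300)
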
724 | def get_park_percentage(point_of_interest_file, park_vector_file=None, crs_epsg=None, polygon_type="neighbourhood", buffer_type=None,
725 | buffer_dist=None, network_type=None, trip_time=None, travel_speed=None, write_to_file=True,
726 | output_dir=os.getcwd()):
727 | ### Step 1: Read and process user input, check conditions
728 | poi = gpd.read_file(point_of_interest_file)
729 | # Make sure geometries of poi file are either all provided using point geometries or all using polygon geometries
730 | if all(poi['geometry'].geom_type == 'Point') or all(poi['geometry'].geom_type == 'Polygon'):
731 | geom_type = poi.iloc[0]['geometry'].geom_type
732 | else:
733 | raise ValueError("Please make sure all geometries are of 'Point' type or all geometries are of 'Polygon' type and re-run the function")
734 |
735 | # Make sure type of polygon is specified in case poi file contains polygon geometries
736 | if geom_type == "Polygon":
737 | if polygon_type not in ["neighbourhood", "house"]:
738 | raise ValueError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
739 |
740 | # In case of house polygons, transform to centroids
741 | if geom_type == "Polygon":
742 | if polygon_type not in ["neighbourhood", "house"]:
743 | raise TypeError("Please make sure that the polygon_type argument is set to either 'neighbourhood' or 'house'")
744 | if polygon_type == "house":
745 | print("Changing geometry type to Point by computing polygon centroids...")
746 | poi['geometry'] = poi['geometry'].centroid
747 | geom_type = poi.iloc[0]['geometry'].geom_type
748 | print("Done \n")
749 |
750 | # Make sure buffer distance and type are set in case of point geometries
751 | if geom_type == "Point":
752 | if buffer_type not in ["euclidean", "network"]:
753 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
754 |
755 | # Make sure CRS is projected rather than geographic
756 | if not poi.crs.is_projected:
757 | if crs_epsg is None:
758 | print("Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to CRS with EPSG:3395")
759 | epsg = 3395
760 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
761 | else:
762 | print(f"Warning: The CRS of the PoI dataset is currently geographic, therefore it will now be projected to EPSG:{crs_epsg} as specified")
763 | epsg = crs_epsg
764 | poi.to_crs(f"EPSG:{epsg}", inplace=True)
765 | else:
766 | epsg = poi.crs.to_epsg()
767 |
768 | # Make sure poi dataframe contains ID column
769 | if "id" in poi.columns:
770 | if poi['id'].isnull().values.any():
771 | poi['id'] = poi['id'].fillna(pd.Series(range(1, len(poi) + 1))).astype(int)
772 | else:
773 | poi['id'] = pd.Series(range(1, len(poi) + 1)).astype(int)
774 |
775 | # Make sure the buffer_type argument has a valid value if not None
776 | if buffer_type is not None and buffer_type not in ["euclidean", "network"]:
777 | raise ValueError("Please make sure that the buffer_type argument is set to either 'euclidean' or 'network' and re-run the function")
778 |
779 | # If buffer type is set to euclidean, make sure that the buffer distance is set
780 | if buffer_type == "euclidean":
781 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
782 | raise TypeError("Please make sure that the buffer_dist argument is set to a positive integer")
783 |
784 | # If buffer type is set to network, make sure that either the buffer distance is set or both trip_time and travel_speed are set
785 | if buffer_type == "network":
786 | if not isinstance(buffer_dist, int) or (not buffer_dist > 0):
787 | if not isinstance(travel_speed, int) or (not travel_speed > 0) or (not isinstance(trip_time, int) or (not trip_time > 0)):
788 | raise TypeError("Please make sure that either the buffer_dist argument is set to a positive integer or both the travel_speed and trip_time are set to positive integers")
789 | else:
790 | speed_time = True # Set variable stating whether buffer_dist is calculated using travel speed and trip time
791 | # Convert km per hour to m per minute
792 | meters_per_minute = travel_speed * 1000 / 60
793 | # Calculate max distance that can be travelled based on argument specified by user and add 25% to account for edge effects
794 | buffer_dist = trip_time * meters_per_minute * 1.25
795 | else:
796 | # Buffer_dist and combination of travel_speed and trip_time cannot be set at same time
797 | if isinstance(travel_speed, int) and travel_speed > 0 and isinstance(trip_time, int) and trip_time > 0:
798 | raise TypeError("Please make sure that one of the following requirements is met:\
799 | \n1. If buffer_dist is set, travel_speed and trip_time should not be set\
800 |             \n2. If travel_speed and trip_time are set, buffer_dist should not be set")
801 | speed_time = False
802 |
803 | # Create polygon in which all pois are located to extract data from PC/OSM, incl. buffer if specified
804 | if buffer_dist is None:
805 | poi_polygon = sg.box(*poi.total_bounds)
806 | else:
807 | poi_polygon = sg.box(*poi.total_bounds).buffer(buffer_dist)
808 | # Transform to 4326 for OSM
809 | polygon_gdf_wgs = gpd.GeoDataFrame(geometry=[poi_polygon], crs=f"EPSG:{epsg}").to_crs("EPSG:4326")
810 | # Extract polygon in EPSG 4326
811 | wgs_polygon = polygon_gdf_wgs['geometry'].values[0]
812 |
813 | ### Step 2: Read park data, retrieve from OSM if not provided by user
814 | if park_vector_file is None:
815 | print(f"Retrieving parks within total bounds of {geom_type}(s) of interest, extended by buffer distance if specified...")
816 | start_park_retrieval = time()
817 | # Tags seen as Urban Greenspace (UGS) require the following:
818 | # 1. Tag represent an area
819 |         # 2. The area is outdoors
820 |         # 3. The area is (semi-)publicly available
821 |         # 4. The area is likely to contain trees, grass and/or greenery
822 |         # 5. The area can reasonably be used for walking or recreational activities
823 | park_tags = {'landuse':['allotments','forest','greenfield','village_green'], 'leisure':['garden','fitness_station','nature_reserve','park','playground'],'natural':'grassland'}
824 | # Extract parks from OpenStreetMap
825 | park_src = ox.geometries_from_polygon(wgs_polygon, tags=park_tags)
826 | # Change CRS to the same one as poi file
827 | park_src.to_crs(f"EPSG:{epsg}", inplace=True)
828 | # Create a boolean mask to filter out polygons and multipolygons
829 | polygon_mask = park_src['geometry'].apply(lambda geom: geom.geom_type in ['Polygon', 'MultiPolygon'])
830 | # Filter the GeoDataFrame to keep only polygons and multipolygons
831 | park_src = park_src.loc[polygon_mask]
832 | end_park_retrieval = time()
833 | elapsed_park_retrieval = end_park_retrieval - start_park_retrieval
834 | print(f"Done, running time: {str(timedelta(seconds=elapsed_park_retrieval))} \n")
835 | else:
836 | park_src = gpd.read_file(park_vector_file)
837 | # Make sure geometries are all polygons or multipolygons as areas should be calculated
838 | if not (park_src['geometry'].geom_type.isin(['Polygon', 'MultiPolygon']).all()):
839 | raise ValueError("Please make sure all geometries of the park file are of 'Polygon' or 'MultiPolygon' type and re-run the function")
840 |
841 | # Make sure CRS of park file is same as CRS of poi file
842 | if not park_src.crs.to_epsg() == epsg:
843 | print("Adjusting CRS of Greenspace file to match with Point of Interest CRS...")
844 | park_src.to_crs(f'EPSG:{epsg}', inplace=True)
845 | print("Done \n")
846 |
847 |     # Make sure all points of interest are within, or at least intersect (in case of polygons), the park file provided
848 | if not all(geom.within(sg.box(*park_src.total_bounds)) for geom in poi['geometry']):
849 | if geom_type == "Point":
850 | raise ValueError("Not all points of interest are within the park file provided, please make sure they are and re-run the function")
851 | else:
852 | if not all(geom.intersects(sg.box(*park_src.total_bounds)) for geom in poi['geometry']):
853 | raise ValueError("Not all polygons of interest are within, or do at least partly intersect, with the area covered by the park file provided, please make sure they are/do and re-run the function")
854 | else:
855 | print("Warning: Not all polygons of interest are completely within the area covered by the park file provided, results will be based on intersecting part of polygons involved \n")
856 |
857 | ### Step 3: Construct the Area of Interest based on the arguments as defined by user
858 | if buffer_type is None:
859 | # Buffer type == None implies that polygon geometries serve as areas of interest
860 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'])
861 | else:
862 | # Make sure buffer dist is set in case buffer type is euclidean
863 | if buffer_type == "euclidean":
864 | # Create area of interest based on euclidean buffer
865 | aoi_gdf = gpd.GeoDataFrame(geometry=poi['geometry'].buffer(buffer_dist))
866 | else:
867 | # Make sure network type has valid value
868 | if network_type not in ["walk", "bike", "drive", "all"]:
869 | raise ValueError("Please make sure that the network_type argument is set to either 'walk', 'bike, 'drive' or 'all', and re-run the function")
870 |
871 | # If poi still contains polygon geometries, compute centroids so that isochrones can be created
872 | if geom_type == "Polygon":
873 | print("Changing geometry type to Point by computing polygon centroids so that isochrones can be retrieved...")
874 | poi['geometry'] = poi['geometry'].centroid
875 | print("Done \n")
876 |
877 | print(f"Retrieving network within total bounds of {geom_type}(s) of interest, extended by buffer distance as specified...")
878 | start_network_retrieval = time()
879 | # Retrieve street network for desired network type
880 | graph = ox.graph_from_polygon(wgs_polygon, network_type=network_type)
881 | # Project street network graph back to original poi CRS
882 | graph_projected = ox.project_graph(graph, to_crs=f"EPSG:{epsg}")
883 | end_network_retrieval = time()
884 | elapsed_network_retrieval = end_network_retrieval - start_network_retrieval
885 | print(f"Done, running time: {str(timedelta(seconds=elapsed_network_retrieval))} \n")
886 |
887 | aoi_geometry = []
888 | for geom in tqdm(poi['geometry'], desc='Retrieving isochrone for point(s) of interest'):
889 | # Find node which is closest to point location as base for next steps
890 | center_node = ox.distance.nearest_nodes(graph_projected, geom.x, geom.y)
891 | # Create subgraph around poi for efficiency purposes
892 | buffer_graph = nx.ego_graph(graph_projected, center_node, radius=buffer_dist*2, distance="length")
893 | # Calculate the time it takes to cover each edge's distance if speed_time is True
894 | if speed_time:
895 | for _, _, _, data in buffer_graph.edges(data=True, keys=True):
896 | data["time"] = data["length"] / meters_per_minute
897 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
898 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=trip_time, distance="time")
899 | else:
900 | # Create sub graph of the street network which contains only parts which can be reached within specified travel parameters
901 | subgraph = nx.ego_graph(buffer_graph, center_node, radius=buffer_dist, distance="length")
902 | # Compute isochrones, see separate function for line by line explanation
903 | isochrone_poly = make_iso_poly(buffer_graph=buffer_graph, subgraph=subgraph)
904 | aoi_geometry.append(isochrone_poly)
905 |
906 | # Create dataframe with isochrone geometries
907 | aoi_gdf = gpd.GeoDataFrame(geometry=aoi_geometry, crs=f"EPSG:{epsg}")
908 | print("Note: creation of isochrones based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb \n")
909 |
910 | ### Step 4: Perform calculations and write results to file
911 | print("Calculating percentage of park area coverage...")
912 | start_calc = time()
913 | # Check whether areas of interest, resulting from previous steps, are fully covered by park file, provide warning if not
914 | if not all(geom.within(sg.box(*park_src.total_bounds)) for geom in aoi_gdf['geometry']):
915 | print(f"Warning: Not all buffer zones for the {geom_type}s of Interest are completely within the area covered by the park file, note that results will be based on the intersecting part of the buffer zone")
916 |
917 | # Calculate percentage of park area cover
918 | poi['park_cover'] = aoi_gdf.apply(lambda row: str(((park_src.clip(row.geometry).area.sum()/row.geometry.area)*100).round(2))+'%', axis=1)
919 | end_calc = time()
920 | elapsed_calc = end_calc - start_calc
921 | print(f"Done, running time: {str(timedelta(seconds=elapsed_calc))} \n")
922 |
923 | if write_to_file:
924 | print("Writing results to new geopackage file in specified directory...")
925 | # Create output directory if the one specified by user does not yet exist
926 | if not os.path.exists(output_dir):
927 | os.makedirs(output_dir)
928 | # Extract filename of poi file to add information to it when writing to file
929 | input_filename, _ = os.path.splitext(os.path.basename(point_of_interest_file))
930 | poi.to_file(os.path.join(output_dir, f"{input_filename}_ParkPerc_added.gpkg"), driver="GPKG")
931 | print("Done")
932 |
933 | return poi
934 |
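# Minimal usage sketch (hypothetical file name); parks are fetched from OSM when
# park_vector_file is None:
# poi = get_park_percentage("pois.gpkg", buffer_type="network", network_type="walk", buffer_dist=500)
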
935 | ##### SUPPORTING FUNCTIONS
936 | # Function to create isochrone polygon of network
937 | def make_iso_poly(buffer_graph, subgraph, edge_buff=25, node_buff=0):
938 | #Note: based on code by gboeing, source: https://github.com/gboeing/osmnx-examples/blob/main/notebooks/13-isolines-isochrones.ipynb
939 | node_points = [sg.Point((data["x"], data["y"])) for node, data in subgraph.nodes(data=True)] # Create list of point geometries existing of x and y coordinates for each node in subgraph retrieved from previous step
940 | nodes_gdf = gpd.GeoDataFrame({"id": list(subgraph.nodes)}, geometry=node_points) # Create geodataframe containing data from previous step
941 | nodes_gdf = nodes_gdf.set_index("id") # Set index to node ID
942 |
943 | edge_lines = []
944 | for n_fr, n_to in subgraph.edges(): # Iterate over edges in subgraph
945 | f = nodes_gdf.loc[n_fr].geometry # Retrieve geometry of the 'from' node of the edge
946 | t = nodes_gdf.loc[n_to].geometry # Retrieve geometry of the 'to' node of the edge
947 | edge_lookup = buffer_graph.get_edge_data(n_fr, n_to)[0].get("geometry", sg.LineString([f, t])) # Retrieve edge geometry between from and to nodes
948 | edge_lines.append(edge_lookup) # Append edge geometry to list of edge lines
949 |
950 | n = nodes_gdf.buffer(node_buff).geometry # Create buffer around the nodes
951 | e = gpd.GeoSeries(edge_lines).buffer(edge_buff).geometry # Create buffer around the edges
952 | all_gs = list(n) + list(e) # Concatenate nodes and edges
953 | isochrone_poly = gpd.GeoSeries(all_gs).unary_union # Create polygon of the concatenated nodes and edges
954 |
955 |     isochrone_poly = sg.Polygon(isochrone_poly.exterior) # Keep only the exterior ring so the polygon is solid, filling in any holes inside it
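    # Note: this assumes the unioned buffer is a single Polygon; if the network buffer
    # produced a MultiPolygon, accessing .exterior would raise an AttributeError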
956 |
957 | return isochrone_poly
958 |
959 | # Function to calculate land cover percentages for a single geometry
960 | def calculate_landcover_percentages(landcover_src, geometry):
961 | # Clip landcover raster to area of interest
962 | clipped = landcover_src.rio.clip([geometry]).clip(min=0)
963 | # Count the occurrences of all unique raster values
964 | unique, counts = np.unique(clipped.values, return_counts=True)
965 | # Calculate total nr. of occurrences
966 | total = counts.sum()
967 | # Calculate percentages for each class
968 | percentages = {value: str((count / total * 100).round(3)) + "%" for value, count in zip(unique, counts)}
969 | return percentages
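# Example of the returned mapping (values depend on the raster):
# {10: '42.857%', 30: '57.143%'}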
--------------------------------------------------------------------------------
/modules/osmnx_road_network.py:
--------------------------------------------------------------------------------
1 | # Libraries for working with maps and geospatial data
2 | from vt2geojson.tools import vt_bytes_to_geojson
3 | from shapely.geometry import Point
4 | from scipy.spatial import cKDTree
5 | import geopandas as gpd
6 | import osmnx as ox
7 | import mercantile
8 |
9 | # Libraries for working with concurrency and file manipulation
10 | from concurrent.futures import ThreadPoolExecutor, as_completed
11 | from tqdm import tqdm
12 | import pandas as pd
13 | import numpy as np
14 | import requests
15 |
16 | def get_road_network(city):
17 | # Get the road network graph using OpenStreetMap data
18 | # 'network_type' argument is set to 'drive' to get the road network suitable for driving
19 | # 'simplify' argument is set to 'True' to simplify the road network
20 | G = ox.graph_from_place(city, network_type="drive", simplify=True)
21 |
22 | # Create a set to store unique road identifiers
23 | unique_roads = set()
24 | # Create a new graph to store the simplified road network
25 | G_simplified = G.copy()
26 |
27 | # Iterate over each road segment
28 | for u, v, key, data in G.edges(keys=True, data=True):
29 | # Check if the road segment is a duplicate
30 | if (v, u) in unique_roads:
31 | # Remove the duplicate road segment
32 | G_simplified.remove_edge(u, v, key)
33 | else:
34 | # Add the road segment to the set of unique roads
35 | unique_roads.add((u, v))
36 |
37 | # Update the graph with the simplified road network
38 | G = G_simplified
39 |
40 |     # Project the graph from latitude-longitude coordinates to a local projection (in meters)
41 |     # ox.project_graph projects to the UTM CRS of the UTM zone in which the graph's centroid lies
42 |     G_proj = ox.project_graph(G)
43 |
44 |     # Convert the projected graph to GeoDataFrames and keep only the edges
45 |     _, edges = ox.graph_to_gdfs(G_proj)
46 |
47 | return edges
48 |
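# Usage sketch (hypothetical place name):
# edges = get_road_network("Amsterdam, Netherlands")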
49 |
50 | # Get a list of points over the road map with a N distance between them
51 | def select_points_on_road_network(roads, N=50):
52 | points = []
53 | # Iterate over each road
54 |
55 | for row in roads.itertuples(index=True, name='Road'):
56 | # Get the LineString object from the geometry
57 | linestring = row.geometry
58 | index = row.Index
59 |
60 |         # Walk along the linestring and create a point every N meters (default 50)
61 |         for distance in range(0, int(linestring.length), N):
62 | # Get the point on the road at the current position
63 | point = linestring.interpolate(distance)
64 |
65 |             # Add the current point to the list of points
66 | points.append([point, index])
67 |
68 | # Convert the list of points to a GeoDataFrame
69 | gdf_points = gpd.GeoDataFrame(points, columns=["geometry", "road_index"], geometry="geometry")
70 |
71 | # Set the same CRS as the road dataframes for the points dataframe
72 | gdf_points.set_crs(roads.crs, inplace=True)
73 |
74 | # Drop duplicate rows based on the geometry column
75 | gdf_points = gdf_points.drop_duplicates(subset=['geometry'])
76 | gdf_points = gdf_points.reset_index(drop=True)
77 |
78 | return gdf_points
79 |
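# Usage sketch, continuing from get_road_network:
# points = select_points_on_road_network(edges, N=50)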
80 |
81 | # This function extracts the features for a given tile
82 | def get_features_for_tile(tile, access_token):
83 |     # This URL retrieves all the features within the tile; these are later assigned to the sample points based on distance.
84 | tile_url = f"https://tiles.mapillary.com/maps/vtp/mly1_public/2/{tile.z}/{tile.x}/{tile.y}?access_token={access_token}"
85 | response = requests.get(tile_url)
86 | result = vt_bytes_to_geojson(response.content, tile.x, tile.y, tile.z, layer="image")
87 | return [tile, result]
88 |
89 |
90 | def get_features_on_points(points, access_token, max_distance=50, zoom=14):
91 | # Store the local crs in meters that was assigned by osmnx previously so we can use it to calculate the distances between features and points
92 | local_crs = points.crs
93 |
94 | # Set the CRS to 4326 because it is used by Mapillary
95 | points.to_crs(crs=4326, inplace=True)
96 |
97 | # Add a new column to gdf_points that contains the tile coordinates for each point
98 | points["tile"] = [mercantile.tile(x, y, zoom) for x, y in zip(points.geometry.x, points.geometry.y)]
99 |
100 | # Group the points by their corresponding tiles
101 | groups = points.groupby("tile")
102 |
103 | # Download the tiles and extract the features for each group
104 | features = []
105 |
106 | # To make the process faster the tiles are downloaded using threads
107 | with ThreadPoolExecutor(max_workers=10) as executor:
108 | futures = []
109 |
110 | for tile, _ in groups:
111 | futures.append(executor.submit(get_features_for_tile, tile, access_token))
112 |
113 | for future in tqdm(as_completed(futures), total=len(futures), desc="Downloading tiles"):
114 | result = future.result()
115 | features.append(result)
116 |
117 | pd_features = pd.DataFrame(features, columns=["tile", "features"])
118 |
119 | # Compute distances between each feature and all the points in gdf_points
120 | feature_points = gpd.GeoDataFrame(
121 | [(Point(f["geometry"]["coordinates"]), f) for row in pd_features["features"] for f in row["features"]],
122 | columns=["geometry", "feature"],
123 | geometry="geometry",
124 | crs=4326
125 | )
126 |
127 |     # Transform from EPSG:4326 (degrees) back to the local CRS in meters obtained when the road network was projected
128 | feature_points.to_crs(local_crs, inplace=True)
129 | points.to_crs(local_crs, inplace=True)
130 |
131 | # Create a KDTree (k-dimensional tree) from the "geometry" coordinates of feature_points
132 | feature_tree = cKDTree(feature_points["geometry"].apply(lambda p: [p.x, p.y]).tolist())
133 | # Use the KDTree to query the nearest neighbors of the points in the "geometry" column of points DataFrame
134 | # The query returns the distances and indices of the nearest neighbors
135 | # The parameter "k=1" specifies that we want to find the nearest neighbor
136 | # The parameter "distance_upper_bound=max_distance" sets a maximum distance for the nearest neighbors
137 | distances, indices = feature_tree.query(points["geometry"].apply(lambda p: [p.x, p.y]).tolist(), k=1, distance_upper_bound=max_distance/2)
138 |
139 | # Create a list to store the closest features and distances to each point. If there are no images close then set the value of both to None
140 | closest_features = [feature_points.loc[i, "feature"] if np.isfinite(distances[idx]) else None for idx, i in enumerate(indices)]
141 | closest_distances = [distances[idx] if np.isfinite(distances[idx]) else None for idx in range(len(distances))]
142 |
143 | # Store the closest feature for each point in the "feature" column of the points DataFrame
144 | points["feature"] = closest_features
145 |
146 | # Store the distances as a new column in points
147 | points["distance"] = closest_distances
148 |
149 | # Store image id and is panoramic information as part of the dataframe
150 | points["image_id"] = points.apply(lambda row: str(row["feature"]["properties"]["id"]) if row["feature"] else "", axis=1)
151 | points["image_id"] = points["image_id"].astype(str)
152 |
153 | points["is_panoramic"] = points.apply(lambda row: bool(row["feature"]["properties"]["is_pano"]) if row["feature"] else None, axis=1)
154 | points["is_panoramic"] = points["is_panoramic"].astype(bool)
155 |
156 |     # Convert the tile and road_index columns to strings so the GeoDataFrame can be serialised
157 | points["road_index"] = points["road_index"].astype(str)
158 | points["tile"] = points["tile"].astype(str)
159 |
160 | # Save the current index as a column
161 | points["id"] = points.index
162 | points = points.reset_index(drop=True)
163 |
164 | # Transform the coordinate reference system to EPSG 4326
165 | points.to_crs(epsg=4326, inplace=True)
166 |
167 | return points
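# Usage sketch (MAPILLARY_TOKEN is a placeholder for a valid Mapillary access token):
# points = get_features_on_points(points, MAPILLARY_TOKEN, max_distance=50, zoom=14)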
--------------------------------------------------------------------------------
/modules/process_data.py:
--------------------------------------------------------------------------------
1 | import os
2 | os.environ['USE_PYGEOS'] = '0'
3 |
4 | from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
5 | from scipy.signal import find_peaks
6 | import torch
7 |
8 | from concurrent.futures import ThreadPoolExecutor, as_completed
9 | from tqdm import tqdm
10 | import threading
11 | import csv
12 |
13 | from modules.segmentation_images import save_images
14 |
15 | from PIL import Image, ImageFile
16 | import numpy as np
17 | import requests
18 |
19 | ImageFile.LOAD_TRUNCATED_IMAGES = True
20 |
21 | def prepare_folders(city):
22 |     # Create folders for storing the GVI results, sample points, road network and sample images if they don't exist yet
23 | dir_path = os.path.join("results", city, "gvi")
24 | if not os.path.exists(dir_path):
25 | os.makedirs(dir_path)
26 |
27 | dir_path = os.path.join("results", city, "points")
28 | if not os.path.exists(dir_path):
29 | os.makedirs(dir_path)
30 |
31 | dir_path = os.path.join("results", city, "roads")
32 | if not os.path.exists(dir_path):
33 | os.makedirs(dir_path)
34 |
35 | dir_path = os.path.join("results", city, "sample_images")
36 | if not os.path.exists(dir_path):
37 | os.makedirs(dir_path)
38 |
39 |
40 | def get_models():
41 | # Load the pretrained AutoImageProcessor from the "facebook/mask2former-swin-large-cityscapes-semantic" model
42 | processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic")
43 | # Set the device to GPU if available, otherwise use CPU
44 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
45 | # Load the pretrained Mask2FormerForUniversalSegmentation model from "facebook/mask2former-swin-large-cityscapes-semantic"
46 | model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-cityscapes-semantic")
47 | # Move the model to the specified device (GPU or CPU)
48 | model = model.to(device)
49 | # Return the processor and model as a tuple
50 | return processor, model
51 |
52 |
53 | def segment_images(image, processor, model):
54 | # Preprocess the image using the image processor
55 | inputs = processor(images=image, return_tensors="pt")
56 |
57 | # Perform a forward pass through the model to obtain the segmentation
58 | with torch.no_grad():
59 | # Check if a GPU is available
60 | if torch.cuda.is_available():
61 | # Move the inputs to the GPU
62 | inputs = {k: v.to('cuda') for k, v in inputs.items()}
63 | # Perform the forward pass through the model
64 | outputs = model(**inputs)
65 | # Post-process the semantic segmentation outputs using the processor and move the results to CPU
66 | segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0].to('cpu')
67 | else:
68 | # Perform the forward pass through the model
69 | outputs = model(**inputs)
70 | # Post-process the semantic segmentation outputs using the processor
71 | segmentation = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
72 |
73 | return segmentation
74 |
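# Usage sketch: load the models once, then segment a PIL image (image assumed loaded elsewhere):
# processor, model = get_models()
# segmentation = segment_images(image, processor, model)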
75 |
76 | # Based on Matthew Danish code (https://github.com/mrd/vsvi_filter/tree/master)
77 | def run_length_encoding(in_array):
78 | # Convert input array to a NumPy array
79 | image_array = np.asarray(in_array)
80 | length = len(image_array)
81 | if length == 0:
82 |         # Return None values if the array is empty (two elements, matching the return below)
83 |         return (None, None)
84 | else:
85 | # Calculate run lengths and change points in the array
86 | pairwise_unequal = image_array[1:] != image_array[:-1]
87 |         change_points = np.append(np.where(pairwise_unequal), length - 1) # must include the last element position
88 |         run_lengths = np.diff(np.append(-1, change_points)) # run lengths
89 |         return (run_lengths, image_array[change_points])
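# Worked example (booleans shown as 1/0): run_length_encoding([1, 1, 0, 0, 0, 1])
# returns run lengths [2, 3, 1] with values [1, 0, 1]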
90 |
91 | def get_road_pixels_per_column(prediction):
92 | # Check which pixels in the prediction array correspond to roads (label 0)
93 | road_pixels = prediction == 0.0
94 | road_pixels_per_col = np.zeros(road_pixels.shape[1])
95 |
96 | for i in range(road_pixels.shape[1]):
97 | # Encode the road pixels in each column and calculate the maximum run length
98 | run_lengths, values = run_length_encoding(road_pixels[:,i])
99 | road_pixels_per_col[i] = run_lengths[values.nonzero()].max(initial=0)
100 | return road_pixels_per_col
101 |
102 | def get_road_centres(prediction, distance=2000, prominence=100):
103 | # Get the road pixels per column in the prediction
104 | road_pixels_per_col = get_road_pixels_per_column(prediction)
105 |
106 | # Find peaks in the road_pixels_per_col array based on distance and prominence criteria
107 | peaks, _ = find_peaks(road_pixels_per_col, distance=distance, prominence=prominence)
108 |
109 | return peaks
110 |
111 |
112 | def find_road_centre(segmentation):
113 | # Calculate distance and prominence thresholds based on the segmentation shape
114 | distance = int(2000 * segmentation.shape[1] // 5760)
115 | prominence = int(100 * segmentation.shape[0] // 2880)
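    # The scaling suggests the 2000/100 defaults were tuned on 5760x2880 panoramas
    # and are rescaled to the current image size (an inference, not documented)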
116 |
117 | # Find road centers based on the segmentation, distance, and prominence thresholds
118 | centres = get_road_centres(segmentation, distance=distance, prominence=prominence)
119 |
120 | return centres
121 |
122 |
123 | def crop_panoramic_images_roads(original_width, image, segmentation, road_centre):
124 | width, height = image.size
125 |
126 | # Find duplicated centres
127 | duplicated_centres = [centre - original_width for centre in road_centre if centre >= original_width]
128 |
129 | # Drop the duplicated centres
130 | road_centre = [centre for centre in road_centre if centre not in duplicated_centres]
131 |
132 | # Calculate dimensions and offsets
133 |     w4 = int(width / 4)  # crop width: a quarter of the full image width
134 |     h4 = int(height / 4)  # vertical offset: skip the top quarter of the image
135 |     hFor43 = int(w4 * 3 / 4)  # crop height giving a 4:3 aspect ratio
136 |     w98 = width + (w4 / 2)  # beyond this x-position a centre has wrapped all the way around
137 |     xrapneeded = int(width * 7 / 8)  # beyond this x-position the crop must wrap around the image edge
138 |
139 | images = []
140 | pickles = []
141 |
142 | # Crop the panoramic image based on road centers
143 | for centre in road_centre:
144 | # Wrapped all the way around
145 | if centre >= w98:
146 | xlo = int((width - centre) - w4/2)
147 | cropped_image = image.crop((xlo, h4, xlo + w4, h4 + hFor43))
148 | cropped_segmentation = segmentation[h4:h4+hFor43, xlo:xlo+w4]
149 |
150 | # Image requires assembly of two sides
151 | elif centre > xrapneeded:
152 |             xlo = int(centre - (w4/2))  # horizontal offset of the crop
153 | w4_p1 = width - xlo
154 | w4_p2 = w4 - w4_p1
155 |
156 | # Crop and concatenate image and segmentation
157 | cropped_image_1 = image.crop((xlo, h4, xlo + w4_p1, h4 + hFor43))
158 | cropped_image_2 = image.crop((0, h4, w4_p2, h4 + hFor43))
159 |
160 | cropped_image = Image.new(image.mode, (w4, hFor43))
161 | cropped_image.paste(cropped_image_1, (0, 0))
162 | cropped_image.paste(cropped_image_2, (w4_p1, 0))
163 |
164 | cropped_segmentation_1 = segmentation[h4:h4+hFor43, xlo:xlo+w4_p1]
165 | cropped_segmentation_2 = segmentation[h4:h4+hFor43, 0:w4_p2]
166 | cropped_segmentation = torch.cat((cropped_segmentation_1, cropped_segmentation_2), dim=1)
167 |
168 | # Must paste together the two sides of the image
169 | elif centre < (w4 / 2):
170 | w4_p1 = int((w4 / 2) - centre)
171 | xhi = width - w4_p1
172 | w4_p2 = w4 - w4_p1
173 |
174 | # Crop and concatenate image and segmentation
175 | cropped_image_1 = image.crop((xhi, h4, xhi + w4_p1, h4 + hFor43))
176 | cropped_image_2 = image.crop((0, h4, w4_p2, h4 + hFor43))
177 |
178 | cropped_image = Image.new(image.mode, (w4, hFor43))
179 | cropped_image.paste(cropped_image_1, (0, 0))
180 | cropped_image.paste(cropped_image_2, (w4_p1, 0))
181 |
182 | cropped_segmentation_1 = segmentation[h4:h4+hFor43, xhi:xhi+w4_p1]
183 | cropped_segmentation_2 = segmentation[h4:h4+hFor43, 0:w4_p2]
184 | cropped_segmentation = torch.cat((cropped_segmentation_1, cropped_segmentation_2), dim=1)
185 |
186 | # Straightforward crop
187 | else:
188 | xlo = int(centre - w4/2)
189 | cropped_image = image.crop((xlo, h4, xlo + w4, h4 + hFor43))
190 | cropped_segmentation = segmentation[h4:h4+hFor43, xlo:xlo+w4]
191 |
192 | images.append(cropped_image)
193 | pickles.append(cropped_segmentation)
194 |
195 | return images, pickles
196 |
197 |
198 | def crop_panoramic_images(image, segmentation):
199 | width, height = image.size
200 |
201 | w4 = int(width / 4)
202 | h4 = int(height / 4)
203 | hFor43 = int(w4 * 3 / 4)
204 |
205 | images = []
206 | pickles = []
207 |
208 |     # Crop the panoramic image and its segmentation into four equal vertical sections
209 | for w in range(4):
210 | x_begin = w * w4
211 | x_end = (w + 1) * w4
212 | cropped_image = image.crop((x_begin, h4, x_end, h4 + hFor43))
213 | cropped_segmentation = segmentation[h4:h4+hFor43, x_begin:x_end]
214 |
215 | images.append(cropped_image)
216 | pickles.append(cropped_segmentation)
217 |
218 | return images, pickles
219 |
220 |
221 | def get_GVI(segmentations):
222 | total_pixels = 0
223 | vegetation_pixels = 0
224 |
225 | for segment in segmentations:
226 | # Calculate the total number of pixels in the segmentation
227 | total_pixels += segment.numel()
228 | # Filter the pixels that represent vegetation (label 8) and count them
229 | vegetation_pixels += (segment == 8).sum().item()
230 |
231 |     # Return the fraction of vegetation pixels across the segmentations (0 if there are no pixels)
232 | return vegetation_pixels / total_pixels if total_pixels else 0
233 |
234 |
235 | def process_images(image_url, is_panoramic, cut_by_road_centres, processor, model):
236 | try:
237 | # Fetch and process the image
238 | image = Image.open(requests.get(image_url, stream=True).raw)
239 |
240 | if is_panoramic:
241 | # Get the size of the image
242 | width, height = image.size
243 |
244 | # Crop the bottom 20% of the image to remove the band at the bottom of the panoramic image
245 | bottom_crop = int(height * 0.2)
246 | image = image.crop((0, 0, width, height - bottom_crop))
247 |
248 | # Apply the semantic segmentation to the image
249 | segmentation = segment_images(image, processor, model)
250 |
251 | if cut_by_road_centres:
252 | # Create a widened panorama by wrapping the first 25% of the image onto the right edge
253 | width, height = image.size
254 | w4 = int(0.25 * width)
255 |
256 | segmentation_25 = segmentation[:, :w4]
257 | # Concatenate the tensors along the first dimension (rows) to create the widened panorama with the segmentations
258 | segmentation_road = torch.cat((segmentation, segmentation_25), dim=1)
259 |
260 | cropped_image = image.crop((0, 0, w4, height))
261 | widened_image = Image.new(image.mode, (width + w4, height))
262 | widened_image.paste(image, (0, 0))
263 | widened_image.paste(cropped_image, (width, 0))
264 |
265 | # Find the road centers to determine if the image is suitable for analysis
266 | road_centre = find_road_centre(segmentation_road)
267 |
268 | # Crop the image and its segmentation based on the previously found road centers
269 | images, pickles = crop_panoramic_images_roads(width, widened_image, segmentation_road, road_centre)
270 |
271 | # Calculate the Green View Index (GVI) for the cropped segmentations
272 | GVI = get_GVI(pickles)
273 | else:
274 |                 # Cut the panoramic image into 4 equal parts
275 |                 # Crop the image and its segmentation without road-centre detection
276 | images, pickles = crop_panoramic_images(image, segmentation)
277 |
278 | # Calculate the Green View Index (GVI) for the cropped segmentations
279 | GVI = get_GVI(pickles)
280 |
281 | return images, pickles, [GVI, True, False, False]
282 |
283 | else:
284 | # Apply the semantic segmentation to the image
285 | segmentation = segment_images(image, processor, model)
286 |
287 | # If the image is not panoramic, use the segmentation as it is
288 | # Find the road centers to determine if the image is suitable for analysis
289 | road_centre = find_road_centre(segmentation)
290 |
291 | if len(road_centre) > 0:
292 | # Calculate the Green View Index (GVI) for the cropped segmentations
293 | GVI = get_GVI([segmentation])
294 | return [image], [segmentation], [GVI, False, False, False]
295 | else:
296 | # There are no road centers, so the image is not suitable for analysis
297 | return [image], [segmentation], [None, None, True, False]
298 |     except Exception:
299 | # If there was an error while processing the image, set the "error" flag to true and continue with other images
300 | return None, None, [None, None, True, True]
301 |
302 |
303 | # Download images
304 | def download_image(id, geometry, image_id, is_panoramic, save_sample, city, cut_by_road_centres, access_token, processor, model):
305 | # Check if the image id exists
306 | if image_id:
307 | try:
308 | # Create the authorization header for the Mapillary API request
309 | header = {'Authorization': 'OAuth {}'.format(access_token)}
310 |
311 | # Build the URL to fetch the image thumbnail's original URL
312 | url = 'https://graph.mapillary.com/{}?fields=thumb_original_url'.format(image_id)
313 |
314 | # Send a GET request to the Mapillary API to obtain the image URL
315 | response = requests.get(url, headers=header)
316 | data = response.json()
317 |
318 | # Extract the image URL from the response data
319 | image_url = data["thumb_original_url"]
320 |
321 | # Process the downloaded image using the provided image URL, is_panoramic flag, processor, and model
322 | images, segmentations, result = process_images(image_url, is_panoramic, cut_by_road_centres, processor, model)
323 |
324 | if save_sample:
325 | save_images(city, id, images, segmentations, result[0])
326 |
327 |         except Exception:
328 |             # An error occurred while downloading or processing the image
329 | result = [None, None, True, True]
330 | else:
331 | # The point doesn't have an associated image, so we set the missing value flags
332 | result = [None, None, True, False]
333 |
334 | # Insert the coordinates (x and y) and the point ID at the beginning of the result list
335 | # This helps us associate the values in the result list with their corresponding point
336 | result.insert(0, geometry.y)
337 | result.insert(0, geometry.x)
338 | result.insert(0, id)
339 |
340 | return result
341 |
342 |
343 | def download_images_for_points(gdf, access_token, max_workers, cut_by_road_centres, city, file_name):
344 | # Get image processing models
345 | processor, model = get_models()
346 |
347 | # Prepare CSV file path
348 | csv_file = f"gvi-points-{file_name}.csv"
349 | csv_path = os.path.join("results", city, "gvi", csv_file)
350 |
351 |     # Check if the CSV file exists and choose the correct writing mode
352 | file_exists = os.path.exists(csv_path)
353 | mode = 'a' if file_exists else 'w'
354 |
355 |     # Collect results and create a lock object for thread-safe writes
356 | results = []
357 | lock = threading.Lock()
358 |
359 |     # Open the CSV file in the chosen mode (append or write) with newline=''
360 | with open(csv_path, mode, newline='') as csvfile:
361 | # Create a CSV writer object
362 | writer = csv.writer(csvfile)
363 |
364 | # Write the header row if the file is newly created
365 | if not file_exists:
366 | writer.writerow(["id", "x", "y", "GVI", "is_panoramic", "missing", "error"])
367 |
368 | # Create a ThreadPoolExecutor to process images concurrently
369 | with ThreadPoolExecutor(max_workers=max_workers) as executor:
370 | futures = []
371 |
372 | # Iterate over the rows in the GeoDataFrame
373 | for _, row in gdf.iterrows():
374 | try:
375 | # Submit a download_image task to the executor
376 | futures.append(executor.submit(download_image, row["id"], row["geometry"], row["image_id"], row["is_panoramic"], row["save_sample"], city, cut_by_road_centres, access_token, processor, model))
377 | except Exception as e:
378 | print(f"Exception occurred for row {row['id']}: {str(e)}")
379 |
380 | # Process the completed futures using tqdm for progress tracking
381 |             for future in tqdm(as_completed(futures), total=len(futures), desc="Downloading images"):
382 | # Retrieve the result of the completed future
383 | image_result = future.result()
384 |
385 | # Acquire the lock before appending to results and writing to the CSV file
386 | with lock:
387 | results.append(image_result)
388 | writer.writerow(image_result)
389 |
390 | # Return the processed image results
391 | return results
--------------------------------------------------------------------------------
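
A minimal standalone sketch (not part of the repository) of what the run-length encoding above computes: the longest vertical run of road pixels in a single segmentation column. The toy label values are hypothetical; 0 is the road class.

import numpy as np

column = np.array([10, 0, 0, 0, 8, 0, 0, 10])  # hypothetical class labels for one pixel column
road = column == 0
pairwise_unequal = road[1:] != road[:-1]
change_points = np.append(np.where(pairwise_unequal), len(road) - 1)
run_lengths = np.diff(np.append(-1, change_points))
values = road[change_points]
print(run_lengths[values.nonzero()].max(initial=0))  # prints 3, the longest run of road pixels
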
/modules/segmentation_images.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | import numpy as np
3 |
4 | # Color palette to map each class to a RGB value
5 | color_palette = [
6 |     [128, 64, 128], # 0: road - purple
7 | [244, 35, 232], # 1: sidewalk - pink
8 | [70, 70, 70], # 2: building - dark gray
9 | [102, 102, 156], # 3: wall - purple
10 | [190, 153, 153], # 4: fence - light brown
11 | [153, 153, 153], # 5: pole - gray
12 | [250, 170, 30], # 6: traffic light - orange
13 | [220, 220, 0], # 7: traffic sign - yellow
14 |     [0, 255, 0],      # 8: vegetation - bright green
15 | [152, 251, 152], # 9: terrain - light green
16 | [70, 130, 180], # 10: sky - blue
17 | [220, 20, 60], # 11: person - red
18 | [255, 0, 0], # 12: rider - bright red
19 | [0, 0, 142], # 13: car - dark blue
20 | [0, 0, 70], # 14: truck - navy blue
21 | [0, 60, 100], # 15: bus - dark teal
22 |     [0, 80, 100],     # 16: train - dark teal
23 | [0, 0, 230], # 17: motorcycle - blue
24 | [119, 11, 32] # 18: bicycle - dark red
25 | ]
26 |
27 | def visualize_results(city, image_id, image, segmentation, gvi, num):
28 |
29 | fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(9, 4), sharey=True)
30 |
31 |     # Display the street image (or crop)
32 | ax1.imshow(image)
33 | ax1.set_title("Image")
34 | ax1.axis("off")
35 |
36 | # Map the segmentation result to the color palette
37 | seg_color = np.zeros(segmentation.shape + (3,), dtype=np.uint8)
38 | for label, color in enumerate(color_palette):
39 | seg_color[segmentation == label] = color
40 |
41 | # Display the colored segmentation result
42 | ax2.imshow(seg_color)
43 | ax2.set_title("Segmentation")
44 | ax2.axis("off")
45 |
46 | fig.savefig("results/{}/sample_images/{}-{}.png".format(city, image_id, num), bbox_inches='tight', dpi=100)
47 |
48 |
49 | def save_images(city, image_id, images, pickles, gvi):
50 | num = 0
51 |
52 | for image, segmentation in zip(images, pickles):
53 | num += 1
54 | visualize_results(city, image_id, image, segmentation, gvi, num)
--------------------------------------------------------------------------------
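
As a quick illustration (assuming the module is importable as modules.segmentation_images, i.e. the script is run from the repository root), the palette mapping above converts a label map into an RGB image as follows:

import numpy as np
from modules.segmentation_images import color_palette

labels = np.array([[0, 8], [8, 10]])  # toy label map: road, vegetation, vegetation, sky
seg_color = np.zeros(labels.shape + (3,), dtype=np.uint8)
for label, color in enumerate(color_palette):
    seg_color[labels == label] = color
print(seg_color[0, 1])  # [  0 255   0] -> vegetation
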
/predict_missing_gvi.py:
--------------------------------------------------------------------------------
1 | from sklearn.model_selection import cross_val_score
2 | from sklearn.linear_model import LinearRegression
3 | from pygam import LinearGAM, s
4 | import geopandas as gpd
5 | import pandas as pd
6 | import numpy as np
7 | import rasterio
8 | import sys
9 | import os
10 |
11 | # Function to calculate mean NDVI taken from Yúri Grings' GitHub repository
12 | # https://github.com/Spatial-Data-Science-and-GEO-AI-Lab/GreenEx_Py
13 | from modules.availability import get_mean_NDVI
14 |
15 | def calculate_ndvi(gvi, ndvi, N, city, crs):
16 | ndvi_folder = os.path.join("results", city, "ndvi")
17 |
18 |     mean_ndvi = get_mean_NDVI(point_of_interest_file=gvi,
19 |                               ndvi_raster_file=ndvi,
20 |                               buffer_type="euclidean",
21 |                               buffer_dist=N,
22 |                               crs_epsg=crs,
23 |                               write_to_file=False,
24 |                               save_ndvi=False)
25 |
26 | # Save the calculated NDVI values to a file
27 | mean_ndvi.to_crs(crs=4326, inplace=True)
28 | path_to_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
29 | mean_ndvi.to_file(path_to_file, driver="GPKG", crs=4326)
30 |
31 | return path_to_file
32 |
33 |
34 | def linear_regression(city):
35 | ndvi_folder = os.path.join("results", city, "ndvi")
36 |
37 | # Load ndvi layer
38 | ndvi_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
39 | ndvi_df = gpd.read_file(ndvi_file, layer="calculated_ndvi_values", crs=4326)
40 |
41 | # Separate data into known and missing GVI values
42 | known_df = ndvi_df[ndvi_df['missing'] == False].copy()
43 | missing_df = ndvi_df[ndvi_df['missing'] == True].copy()
44 |
45 | # Split known data into features (NDVI) and target (GVI)
46 | X_train = known_df[['mean_NDVI']]
47 | y_train = known_df['GVI']
48 |
49 | # Prepare missing data for prediction
50 | X_test = missing_df[['mean_NDVI']]
51 |
52 | # Perform linear regression
53 | lin_reg = LinearRegression()
54 | lin_reg.fit(X_train, y_train)
55 |
56 | predicted_GVI = lin_reg.predict(X_test)
57 |
58 | # Assign the predicted values to the missing GVI values in the DataFrame
59 | missing_df['GVI'] = predicted_GVI
60 |
61 | # Concatenate the updated missing values with the known values
62 | updated_df = pd.concat([known_df, missing_df])
63 |
64 | path_to_file = os.path.join(ndvi_folder, "calculated_missing_values_linreg.gpkg")
65 | updated_df.to_file(path_to_file, driver="GPKG", crs=4326)
66 |
67 | # Compute RMSE using cross-validation
68 | rmse_scores = np.sqrt(-cross_val_score(lin_reg, X_train, y_train, scoring='neg_mean_squared_error', cv=5))
69 | avg_rmse = np.mean(rmse_scores)
70 |
71 | # Compute R2 score using cross-validation
72 | r2_scores = cross_val_score(lin_reg, X_train, y_train, scoring='r2', cv=5)
73 | avg_r2 = np.mean(r2_scores)
74 |
75 | # Get the number of parameters (including intercept)
76 | k = X_train.shape[1] + 1
77 | n = len(y_train) # number of samples
78 |
79 |     # Calculate the AIC using the approximation AIC = n*ln(MSE) + 2k
80 | aic = n * np.log(avg_rmse ** 2) + 2 * k
81 |
82 | print("<----- Linear Regression ----->")
83 | print("R2 value:", avg_r2)
84 | print("RMSE:", avg_rmse)
85 | print("AIC value:", aic)
86 |
87 | return updated_df
88 |
89 |
90 | def gam_regression(city):
91 | ndvi_folder = os.path.join("results", city, "ndvi")
92 |
93 | # Load ndvi layer
94 | ndvi_file = os.path.join(ndvi_folder, "calculated_ndvi_values.gpkg")
95 | ndvi_df = gpd.read_file(ndvi_file, layer="calculated_ndvi_values", crs=4326)
96 |
97 | # Separate data into known and missing GVI values
98 | known_df = ndvi_df[ndvi_df['missing'] == False].copy()
99 | missing_df = ndvi_df[ndvi_df['missing'] == True].copy()
100 |
101 | # Split known data into features (NDVI) and target (GVI)
102 | X_train = known_df[['mean_NDVI']]
103 | y_train = known_df['GVI']
104 |
105 | # Prepare missing data for prediction
106 | X_test = missing_df[['mean_NDVI']]
107 |
108 | n_features = 1 # number of features used in the model
109 | lams = np.logspace(-5, 5, 20) * n_features
110 | splines = 25
111 |
112 | # Train a Generalized Additive Model (GAM)
113 | gam = LinearGAM(
114 | s(0, n_splines=splines)).gridsearch(
115 | X_train.values,
116 | y_train.values,
117 | lam=lams
118 | )
119 |
120 | predicted_GVI = gam.predict(X_test.values)
121 |
122 | # Assign the predicted values to the missing GVI values in the DataFrame
123 | missing_df['GVI'] = predicted_GVI
124 |
125 | # Concatenate the updated missing values with the known values
126 | updated_df = pd.concat([known_df, missing_df])
127 |
128 |     path_to_file = os.path.join(ndvi_folder, "calculated_missing_values_gam.gpkg")
129 | updated_df.to_file(path_to_file, driver="GPKG", crs=4326)
130 |
131 | # Compute RMSE using cross-validation
132 | rmse_scores = np.sqrt(-cross_val_score(gam, X_train, y_train, scoring='neg_mean_squared_error', cv=5))
133 | avg_rmse = np.mean(rmse_scores)
134 |
135 | # Get the number of parameters (including intercept)
136 | k = X_train.shape[1] + 1
137 | n = len(y_train) # number of samples
138 |
139 |     # Calculate the AIC using the approximation AIC = n*ln(MSE) + 2k
140 | aic = n * np.log(avg_rmse ** 2) + 2 * k
141 |
142 | print("<----- Linear GAM ----->")
143 | print("RMSE:", avg_rmse)
144 | print("AIC value:", aic)
145 |
146 | return updated_df
147 |
148 |
149 | def clean_points(city, crs):
150 | # Cleans the GVI points data by dropping points outside the extent of the NDVI file.
151 |
152 | # File paths for the GVI points and NDVI files
153 | # The NDVI file has to be stored in results/city/ndvi folder and has to be named ndvi.tif
154 | gvi = os.path.join("results", city, "gvi", "gvi-points.gpkg")
155 |     ndvi = os.path.join("results", city, "ndvi", "ndvi.tif")
156 |
157 | gvi_df = gpd.read_file(gvi, layer="gvi-points", crs=4326)
158 | gvi_df.to_crs(epsg=crs, inplace=True)
159 |
160 | # Get the extent of the NDVI file
161 | with rasterio.open(ndvi) as src:
162 | extent = src.bounds
163 |
164 | # Filter the GVI points to include only those within the extent of the NDVI file
165 | filtered_gvi = gvi_df.cx[extent[0]:extent[2], extent[1]:extent[3]]
166 |
167 | # Save the filtered GVI points to a new file to preserve the original data
168 | filtered_gvi_path = os.path.join("results", city, "ndvi", "filtered-points.gpkg")
169 | filtered_gvi.to_file(filtered_gvi_path, driver="GPKG", crs=crs)
170 |
171 | return filtered_gvi_path, ndvi
172 |
173 |
174 | if __name__ == "__main__":
175 | # Read command-line arguments
176 | args = sys.argv
177 |
178 |     # Extract the city and the NDVI-availability flag from the command-line arguments
179 | city = args[1] # City to analyze
180 | ndvi_file_exists = bool(int(args[2])) # Indicates if we already have the NDVI values
181 |
182 | if not ndvi_file_exists:
183 | crs = int(args[3]) # CRS in meters, suitable for the area in which we are working
184 | # For example, we can use the same CRS as the roads.gpkg file
185 | # IMPORTANT: The NDVI image should be in this CRS
186 | distance = int(args[4]) # The distance used to generate the sample points
187 |
188 | # Step 1: Clean the GVI points by filtering points outside the extent of the NDVI file
189 | gvi, ndvi = clean_points(city, crs)
190 |
191 | # Step 2: Calculate the mean NDVI values from the filtered GVI points
192 | ndvi_path = calculate_ndvi(gvi, ndvi, distance//2, city, crs)
193 |
194 |     # Step 3: Train Linear Regression and GAM models to predict the missing GVI values
195 | linreg = linear_regression(city)
196 | lingam = gam_regression(city)
--------------------------------------------------------------------------------
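
A minimal sketch with synthetic data (not repository output) of the NDVI-to-GVI imputation idea and the AIC approximation AIC = n*ln(MSE) + 2k used above; the coefficient and noise level are made up:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ndvi = rng.uniform(0.1, 0.8, size=(100, 1))        # hypothetical mean NDVI per point
gvi = 0.5 * ndvi[:, 0] + rng.normal(0, 0.05, 100)  # hypothetical GVI with noise

model = LinearRegression().fit(ndvi, gvi)
mse = np.mean((model.predict(ndvi) - gvi) ** 2)
n, k = len(gvi), ndvi.shape[1] + 1  # sample size; parameters including the intercept
aic = n * np.log(mse) + 2 * k
print(f"R2={model.score(ndvi, gvi):.3f}, AIC={aic:.1f}")
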
/scripts/get_gvi_gpkg.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import geopandas as gpd
3 | from shapely.geometry import Point
4 | import glob
5 | import sys
6 |
7 | """
8 | This script converts the CSV files generated with the main script into a single cleaned CSV and a GeoPackage (gpkg) file. It processes the CSV files for a specific city, performs data cleaning and validation, and saves the resulting files in the city's results folder.
9 | The script takes just the city name as a command-line argument
10 | """
11 |
12 | if __name__ == "__main__":
13 | args = sys.argv
14 |
15 | city = args[1] # City to analyse
16 |
17 | # Path to the CSV files
18 | csv_files = glob.glob(f"results/{city}/gvi/*.csv")
19 |
20 | # Create an empty list to store individual DataFrames
21 | dfs = []
22 |
23 | # Loop through the CSV files, read each file using pandas, and append the resulting DataFrame to the list
24 | for csv_file in csv_files:
25 | df = pd.read_csv(csv_file)
26 | dfs.append(df)
27 |
28 | # Concatenate all the DataFrames in the list along the rows
29 | merged_df = pd.concat(dfs, ignore_index=True)
30 |
31 |     # Iterate over the rows, dropping any whose coordinates cannot be parsed as floats
32 | for index, row in merged_df.iterrows():
33 | try:
34 | # Attempt to convert the "x" and "y" values to floats
35 | float(row["x"])
36 | float(row["y"])
37 | except ValueError:
38 | # If a ValueError occurs, drop the row from the DataFrame
39 | merged_df.drop(index, inplace=True)
40 |
41 | # Drop duplicate rows based on the id column
42 | merged_df = merged_df.drop_duplicates(subset=['id'])
43 |
44 | merged_df.to_csv(f"results/{city}/gvi/gvi-points.csv", index=False)
45 |
46 | # Convert the 'geometry' column to valid Point objects
47 | merged_df['geometry'] = merged_df.apply(lambda row: Point(float(row["x"]), float(row["y"])), axis=1)
48 | merged_df["id"] = merged_df["id"].astype(int)
49 |
50 | # Convert the merged DataFrame to a GeoDataFrame
51 | gdf = gpd.GeoDataFrame(merged_df, geometry='geometry', crs=4326)
52 |
53 |     path_to_file = "results/{}/gvi/gvi-points.gpkg".format(city)
54 | gdf.to_file(path_to_file, driver="GPKG", crs=4326)
--------------------------------------------------------------------------------
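
Example invocation (the city name is a placeholder and must match the folder under results/); the script reads results/<city>/gvi/*.csv and writes gvi-points.csv and gvi-points.gpkg next to them:

python scripts/get_gvi_gpkg.py amsterdam
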
/scripts/mean_gvi_street.py:
--------------------------------------------------------------------------------
1 | import geopandas as gpd
2 | import sys
3 | import os
4 |
5 | """
6 | The main purpose of the script is to compute, for each road segment, the mean GVI, the number of missing points, and the total number of points. These statistics summarise greenness visibility along the road network of a given city.
7 |
8 | To use the script, the user needs to provide the name of the city to be analyzed as a command-line argument. The script then retrieves the necessary data files from the corresponding directories and performs the statistical calculations.
9 | """
10 |
11 |
12 | if __name__ == "__main__":
13 | args = sys.argv
14 |
15 | city = args[1] # City to analyse
16 |
17 | dir_path = os.path.join("results", city)
18 |
19 | # Load roads layer
20 | roads_path = os.path.join(dir_path, "roads", "roads.gpkg")
21 | roads = gpd.read_file(roads_path, layer="roads")
22 |
23 | # Load points with gvi layer
24 | points_path = os.path.join(dir_path, "ndvi", "calculated_missing_values_linreg.gpkg")
25 | points = gpd.read_file(points_path, layer="calculated_missing_values_linreg", crs=4326)
26 | points.to_crs(crs=roads.crs, inplace=True)
27 |
28 | # Load points with roads layer
29 | points_road_path = os.path.join(dir_path, "points", "points.gpkg")
30 | points_road = gpd.read_file(points_road_path, layer="points", crs=4326)
31 | points_road.to_crs(crs=roads.crs, inplace=True)
32 |
33 | # Merge the dataframe containing the GVI value with the dataframe containing the roads ids
34 | points_road = points.merge(points_road, on="id")
35 |
36 | # Merge the previous dataframe with the roads dataframe
37 | intersection = points_road.merge(roads, left_on="road_index", right_on="index")
38 |
39 | # Get statistics per road (mean GVI value, number of null points, number of total points)
40 | gvi_per_road = intersection.groupby("road_index").agg(
41 | {'GVI': ['mean', lambda x: x.isnull().sum(), 'size']}
42 | ).reset_index()
43 |
44 | gvi_per_road.columns = ['road_index', 'avg_GVI', 'null_points_count', 'total_points']
45 |
46 | # Merge the results back into the road layer
47 | roads_with_avg_gvi = roads.merge(gvi_per_road, left_on="index", right_on="road_index", how='left')
48 |
49 | # Save results to GPKG
50 |     path_to_file = "results/{}/gvi/gvi-streets.gpkg".format(city)
51 | roads_with_avg_gvi.to_file(path_to_file, driver="GPKG", crs=roads.crs)
52 |
--------------------------------------------------------------------------------
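
For clarity, a tiny synthetic example (not repository data) of the per-road aggregation used above:

import numpy as np
import pandas as pd

df = pd.DataFrame({"road_index": [1, 1, 2], "GVI": [0.2, np.nan, 0.4]})
stats = df.groupby("road_index").agg(
    {"GVI": ["mean", lambda x: x.isnull().sum(), "size"]}
).reset_index()
stats.columns = ["road_index", "avg_GVI", "null_points_count", "total_points"]
print(stats)  # road 1: mean 0.2, 1 null point, 2 points; road 2: mean 0.4, 0 nulls, 1 point
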
/scripts/results_metrics.py:
--------------------------------------------------------------------------------
1 | import geopandas as gpd
2 | import numpy as np
3 | import pandas as pd
4 | import os
5 | import sys
6 |
7 | import seaborn as sns
8 | import matplotlib.pyplot as plt
9 |
10 |
11 | def plot_unavailable_images(df, city):
12 | grouped = df.groupby(
13 | ["highway", "city"]
14 | ).agg(
15 | {"total_null": "sum",
16 | "proportion_null": "sum"})
17 |
18 | grouped2 = grouped.groupby("highway").agg({"total_null": "sum"})
19 |
20 | # Sort the grouped DataFrame by 'total_null' column in descending order and select the top 5 rows
21 | top_5_highways = list(grouped2.nlargest(5, 'total_null').index)
22 |
23 | grouped = grouped.loc[top_5_highways]
24 |
25 | # Reset the index for proper sorting and grouping
26 | grouped = grouped.reset_index()
27 |
28 | grouped = grouped.sort_values(by="proportion_null", ascending=False)
29 |
30 | custom_palette = ["#D53E4F", "#FC8D59", "#FEE08B", "#FFFFBF", "#E6F598", "#99D594", "#3288BD"]
31 |
32 | # Create a bar plot for the top 5 highway types
33 | bar1 = sns.barplot(data=grouped, x="proportion_null", y="highway", hue="city", palette=custom_palette)
34 |
35 | # Create custom legend handles and labels for bar1
36 | handles1, labels1 = bar1.get_legend_handles_labels()
37 | # Add the legends outside of the plot
38 | plt.legend(handles1, labels1, title='City', bbox_to_anchor=(1.05, 1), loc='upper left')
39 |
40 |     # Set labels and title
41 |     plt.title('Top 5 Highway Types with Most Missing Images')
42 | 
43 |     plt.xlabel('Proportion of Missing Images')
44 |     plt.ylabel('Highway Type')
45 | 
46 |     # Set the maximum value for the x-axis
47 |     plt.xlim(0, 1)
48 | 
49 |     # Set the figure size before rendering and saving
50 |     bar1.figure.set_size_inches(8, 6)  # Adjust the width and height as needed
51 | 
52 |     bar1.figure.savefig(f'results/{city}/plot_missing_images_{city}.svg', format='svg', bbox_inches='tight')
53 | 
54 |     plt.show()
55 |
56 | return grouped
57 |
58 |
59 | def get_unavailable_images(intersection, city):
60 | grouped = intersection.groupby(['road_index_x', 'highway']).agg({
61 |         'image_id': lambda x: (x.isnull() | (x == "")).sum(), # Count the number of points with missing images
62 | }).reset_index()
63 |
64 | # Rename the columns of the grouped dataframe
65 | grouped.columns = ['road_index', 'highway', 'total_null']
66 |
67 | # Count the number of missing values per road type
68 | count = grouped.groupby('highway').agg({
69 | 'total_null': 'sum',
70 | }).sort_values('total_null', ascending=False)
71 |
72 | count['city'] = city
73 | count['proportion_null'] = count["total_null"] / len(intersection)
74 |
75 | return count
76 |
77 |
78 | def get_road_unavailable_images(city):
79 | dir_path = os.path.join("results", city)
80 |
81 | # Load roads layer
82 | roads_path = os.path.join(dir_path, "gvi", "gvi-streets.gpkg")
83 | roads = gpd.read_file(roads_path, layer="gvi-streets")
84 |
85 | # Load points with gvi layer
86 | points_path = os.path.join(dir_path, "gvi", "gvi-points.gpkg")
87 | points = gpd.read_file(points_path, layer="gvi-points", crs=4326)
88 | points.to_crs(crs=roads.crs, inplace=True)
89 |
90 | # Load points with roads layer
91 | points_road_path = os.path.join(dir_path, "points", "points.gpkg")
92 | points_road = gpd.read_file(points_road_path, layer="points", crs=4326)
93 | points_road.to_crs(crs=roads.crs, inplace=True)
94 |
95 | points_road = points_road.merge(points, on="id")
96 |
97 | # Merge the previous dataframe with the roads dataframe
98 | intersection = points_road.merge(roads, left_on="road_index", right_on="index")
99 |
100 | intersection = intersection[["id", "image_id", "distance", "is_panoramic_x", "road_index_x", "geometry_x", "GVI", "length", "highway"]]
101 |
102 | count = get_unavailable_images(intersection, city)
103 |
104 | return intersection, count
105 |
106 |
107 | def get_missing_images(df):
108 |     unavailable = len(df[df["image_id"] == ""])
109 |     unsuitable = len(df[(df["GVI"].isnull()) & (df["image_id"] != "")])
110 |     total_null = len(df[df["GVI"].isnull()])
111 |     total = len(df)
112 | percentage_null = total_null / total
113 |
114 | result_table = [unavailable, unsuitable, total_null, percentage_null, total]
115 | return pd.DataFrame([result_table], columns=['Unavailable', 'Unsuitable', 'Total', 'Proportion', 'Total Sample Points'])
116 |
117 |
118 | # Summarise the share of panoramic images among the available images
119 | def get_panoramic_images(df):
120 |     is_panoramic = len(df[df["is_panoramic_x"]])
121 |     total = len(df[df["image_id"] != ""])
122 |
123 | result_table = [is_panoramic, total, is_panoramic/total]
124 | return pd.DataFrame([result_table], columns=['Panoramic Images', 'Total Images', "Proportion"])
125 |
126 |
127 | def get_availability_score(df):
128 |     gvi_points = len(df[df["image_id"] != ""])
129 |     road_length = df["length"].sum() / 1000  # total road length in km
130 |     total = len(df)
131 |
132 | result_table = [gvi_points, road_length, total, gvi_points/total, (gvi_points * np.log(road_length))/total]
133 | return pd.DataFrame([result_table], columns=['GVI Points', 'Road Length', 'Total Sample', 'Availability Score', 'Adjusted Availability Score'])
134 |
135 |
136 | def get_usability_score(df):
137 |     gvi_points = len(df[(~df["GVI"].isnull()) & (df["image_id"] != "")])
138 |     road_length = df["length"].sum() / 1000  # total road length in km
139 |     total = len(df[df["image_id"] != ""])
140 |
141 | result_table = [gvi_points, road_length, total, gvi_points/total, (gvi_points * np.log(road_length))/total]
142 |
143 | return pd.DataFrame([result_table], columns=['GVI Points', 'Road Length', 'Total Sample', 'Usability Score', 'Adjusted Usability Score'])
144 |
145 |
146 | def get_metrics(city):
147 | pd.set_option('display.max_columns', None)
148 | intersection, count = get_road_unavailable_images(city)
149 |
150 | print(f"Unavailable images per road type for {city}")
151 | print(plot_unavailable_images(count, city))
152 |
153 | print(f"\nMissing images for {city}")
154 | print(get_missing_images(intersection))
155 |
156 | print(f"\nPanoramic images for {city}")
157 | print(get_panoramic_images(intersection))
158 |
159 | print(f"\nImage Availability Score and Adjusted Image Availability Score for {city}")
160 | print(get_availability_score(intersection))
161 |
162 |     print(f"\nImage Usability Score and Adjusted Image Usability Score for {city}")
163 | print(get_usability_score(intersection))
164 |
165 |
166 | if __name__ == "__main__":
167 | # Read command-line arguments
168 | args = sys.argv
169 |
170 |     # Extract the city from the command-line arguments
171 | city = args[1] # City to analyze
172 |
173 | get_metrics(city)
174 |
175 |
176 |
177 |
--------------------------------------------------------------------------------
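
Example invocation (the city name is a placeholder): python scripts/results_metrics.py amsterdam. To make the score definitions concrete, a toy calculation (hypothetical numbers) of the availability score and its length-adjusted variant computed in get_availability_score:

import numpy as np

gvi_points, total, road_length_km = 800, 1000, 250.0    # hypothetical counts
availability = gvi_points / total                       # 0.8
adjusted = gvi_points * np.log(road_length_km) / total  # ~4.42
print(availability, round(adjusted, 2))
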