├── .gitattributes
├── data
│   └── labels.csv.gz
├── LICENSE
├── CONTRIBUTING.md
├── README.md
├── CODE_OF_CONDUCT.md
└── README


/.gitattributes:
--------------------------------------------------------------------------------
*.gz filter=lfs diff=lfs merge=lfs -text

--------------------------------------------------------------------------------
/data/labels.csv.gz:
--------------------------------------------------------------------------------
version https://git-lfs.github.com/spec/v1
oid sha256:a6f679cbc1004d35b9eaa475080817667521aa6372853a5d2f883cefa27472c7
size 255524861

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

MIT License

Copyright (c) Meta Platforms, Inc. and affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing to HighResolutionSettlementLayer
We want to make contributing to this project as easy and transparent as
possible.

## Our Development Process
Currently this project is not in active development; we are only sharing data useful for training AI models, specifically human labels that were used to train machine-vision models designed for overhead images such as satellite imagery. In the future we will add more model code, which may be suitable for more active development.

## Pull Requests
We are not actively supporting pull requests at this time, since this is presently just a repo for training data.

However, for future contributions we advise the following best practices around pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.

Complete your CLA here:

## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

Meta has a [bounty program](https://bugbounty.meta.com/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.

## Coding Style
* 2 spaces for indentation rather than tabs. There is probably a good joke to put in here
* 80 character line length
* ...

## License
By contributing to HighResolutionSettlementLayer, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

## Examples
Schema for CSV training data labels:
1. `id`: A unique sequential non-negative integer identifier for the data point.
2. `quadkey`: A zoom 15 [Bing Tile](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system)
   quadkey containing the center of the image.
3. `latitude`: The latitude of the center of the image in the [WGS 84 datum](https://en.wikipedia.org/wiki/World_Geodetic_System).
4. `longitude`: The longitude of the center of the image in the [WGS 84 datum](https://en.wikipedia.org/wiki/World_Geodetic_System).
5. `num_true`: Number of true votes, i.e., votes that this image contains a building.
6. `num_false`: Number of false votes, i.e., votes that this image does not contain a building.
7. `num_unsure`: Number of unsure votes, i.e., votes that it is unclear whether this image contains a building.
8. `num_error`: Number of error votes, i.e., votes that this image has an error or image artifact.
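
The relationship between the `latitude`, `longitude`, and `quadkey` columns can be sketched as follows. This is a minimal sketch that assumes the quadkeys follow the public Bing Maps Tile System description linked above (Web Mercator projection, 256-pixel tiles); the repo does not say how the column was actually computed, and the function names here are hypothetical.

```python
import math

# Latitude bound of the Web Mercator projection used by the Bing Maps Tile System.
MAX_LATITUDE = 85.05112878


def latlon_to_tile(latitude, longitude, zoom=15):
    """Return (tile_x, tile_y) of the tile containing a WGS 84 point."""
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, latitude))
    lon = max(-180.0, min(180.0, longitude))
    # Normalized map coordinates in [0, 1], origin at the top-left corner.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    map_size = 256 << zoom  # map width and height in pixels at this zoom
    pixel_x = min(max(int(x * map_size + 0.5), 0), map_size - 1)
    pixel_y = min(max(int(y * map_size + 0.5), 0), map_size - 1)
    return pixel_x // 256, pixel_y // 256


def tile_to_quadkey(tile_x, tile_y, zoom=15):
    """Interleave the bits of tile_x and tile_y into a base-4 digit string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(latitude, longitude, zoom=15):
    """Quadkey of the tile containing the given point (assumed derivation)."""
    tile_x, tile_y = latlon_to_tile(latitude, longitude, zoom)
    return tile_to_quadkey(tile_x, tile_y, zoom)
```

At zoom 15 the result is a 15-character string of the digits 0 to 3, so the `quadkey` column is best read as a string to preserve leading zeros. If the column was derived this way, `latlon_to_quadkey(latitude, longitude)` should reproduce it for each row, which makes a cheap sanity check after loading.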

## Requirements
HighResolutionSettlementLayer requires or works with
* Mac OS X or Linux
* presently, only Gzip to unpack the data and the ability to read CSV (comma-separated values) files

## Building HighResolutionSettlementLayer
* CURRENTLY NOT APPLICABLE

## Installing HighResolutionSettlementLayer
* CURRENTLY NOT APPLICABLE

## How HighResolutionSettlementLayer works
*This repo contains a selection from the training, validation, and test data for
the [High Resolution Settlement Layer](https://dataforgood.facebook.com/dfg/tools/high-resolution-population-density-maps)
dataset. The current data consists of 9,869,460 data points with location and rater scores.*

## Full documentation

The file `labels.csv.gz` is a Gzip-compressed file; to decompress it on Unix-like
systems, run

```
$ gunzip labels.csv.gz
```

More information about Gzip, including how to decompress on other systems, can
be found at https://www.gnu.org/software/gzip/ .

This will produce a file `labels.csv`, which is a CSV file with a header row and the
schema given in the Examples section above and described in more detail in the README file.


## Join the HighResolutionSettlementLayer community
* Website: https://dataforgood.facebook.com/dfg/tools/high-resolution-population-density-maps
* Facebook page: NA
* Mailing list: NA
* irc: NA

See the [CONTRIBUTING](CONTRIBUTING.md) file for how to help out.

## License
HighResolutionSettlementLayer is MIT licensed, as found in the LICENSE file.
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

This Code of Conduct also applies outside the project spaces when there is a
reasonable belief that an individual's behavior may have a negative impact on
the project or its community.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at . All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
# HRSL Source Data

This repo contains a selection from the training, validation, and test data for
the [High Resolution Settlement Layer](https://dataforgood.facebook.com/dfg/tools/high-resolution-population-density-maps)
dataset. The current data consists of 9,869,460 data points with location and rater scores.

## Background

To train, validate, and test the
[ResNet](https://en.wikipedia.org/wiki/Residual_neural_network) model
underlying the HRSL dataset, a large number of images were sent to human raters
to determine if the image contained a human-made building. The images were all
64x64 pixel images from high-resolution Maxar imagery, with an equatorial
resolution of 50 cm x 50 cm per pixel.

Each rater could vote for one of four options:

1. **True**: The image contains a reasonable fraction of a building. Buildings
   can be warehouses or other structures that can contain a human, but do not
   include vehicles, roads, signs, or other structures. Partial structures
   would qualify if they were at least one quarter of the structure or one
   quarter of the image. For example, a corner of a house in a corner of the
   image would not get a True vote.
2. **False**: The image does not contain a reasonable fraction of a building.
   That is, it clearly does not satisfy the conditions of a True vote.
3. **Unsure**: The rater could not determine if there was or was not a building
   in the image. This could be because the image is blurry, occluded by clouds,
   or other aspects preventing a positive identification.
4. **Error**: The image has a processing artifact, such as
   being all black or having a substantial portion of the image incomplete.

Each image was rated by at least two raters. If they disagreed, a third
highly calibrated rater would break the tie. There are also cases where an
image would be rated by more than three raters, for example if an image was
requeued.

For training and testing purposes, we would ignore any image with more than
one vote for either Unsure or Error, or with as many True votes as False
votes. Any non-ignored image would be considered a positive
(building-containing) image if the number of True votes was greater than the
number of False votes; otherwise it would be considered an image not
containing a building.

## Labels

The file `labels.csv.gz` is a Gzip-compressed file; to decompress it on Unix-like
systems, run

```
$ gunzip labels.csv.gz
```

More information about Gzip, including how to decompress on other systems, can
be found at https://www.gnu.org/software/gzip/ .

This will produce a file `labels.csv`, which is a CSV file with a header row and the
following schema:

1. `id`: A unique sequential non-negative integer identifier for the data point.
2. `quadkey`: A zoom 15 [Bing Tile](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system)
   quadkey containing the center of the image.
3. `latitude`: The latitude of the center of the image in the [WGS 84 datum](https://en.wikipedia.org/wiki/World_Geodetic_System).
4. `longitude`: The longitude of the center of the image in the [WGS 84 datum](https://en.wikipedia.org/wiki/World_Geodetic_System).
5. `num_true`: Number of true votes, i.e., votes that this image contains a building.
6. `num_false`: Number of false votes, i.e., votes that this image does not contain a building.
7. `num_unsure`: Number of unsure votes, i.e., votes that it is unclear whether this image contains a building.
8. `num_error`: Number of error votes, i.e., votes that this image has an error or image artifact.

This can be processed by any tool that can parse CSV files (of which there are
many). For example, in Python one can read and classify the first row as follows:

```py
import csv

# Read only the first data row; DictReader maps each header name to its value.
with open("labels.csv", newline="") as f:
    reader = csv.DictReader(f)
    row = next(reader)

row_id = int(row['id'])
latitude = float(row['latitude'])
longitude = float(row['longitude'])
num_true = int(row['num_true'])
num_false = int(row['num_false'])

# Skip tied votes, and rows whose combined Unsure and Error votes exceed one.
should_skip = (num_true == num_false) or (
    int(row['num_unsure']) + int(row['num_error']) > 1
)
if should_skip:
    decision = 'SKIP'
elif num_true > num_false:
    decision = 'BUILDING'
else:
    decision = 'NO_BUILDING'

print(f"""Row:
  Id: {row_id}
  Location: {latitude} lat {longitude} lon
  Decision: {decision}
""")
```

--------------------------------------------------------------------------------
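
The same decision rule can be applied to every row rather than just the first. The sketch below streams `labels.csv.gz` directly with Python's standard `gzip` module, so no separate decompression step is needed, and counts how many rows fall into each category. The helper names `classify` and `tally_decisions` and the `limit` parameter are hypothetical, and the skip condition follows the sample code's reading of the rule (combined Unsure and Error votes greater than one).

```python
import csv
import gzip
from collections import Counter


def classify(row):
    """Apply the README's decision rule to one CSV row (a dict of strings)."""
    num_true = int(row["num_true"])
    num_false = int(row["num_false"])
    flagged = int(row["num_unsure"]) + int(row["num_error"])
    # Tied votes and rows with more than one Unsure/Error vote are ignored.
    if num_true == num_false or flagged > 1:
        return "SKIP"
    return "BUILDING" if num_true > num_false else "NO_BUILDING"


def tally_decisions(path="labels.csv.gz", limit=None):
    """Count decisions over the compressed file, optionally stopping early."""
    counts = Counter()
    # Text mode ("rt") decompresses on the fly, one row at a time.
    with gzip.open(path, mode="rt", newline="") as f:
        for index, row in enumerate(csv.DictReader(f)):
            if limit is not None and index >= limit:
                break
            counts[classify(row)] += 1
    return counts
```

Calling `tally_decisions(limit=100_000)` gives a quick preview on the first rows; omitting `limit` processes the whole file. Because rows are streamed, memory use stays small even though the file holds several million data points.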