├── How to Contribute ├── images │ ├── placeholder.md │ ├── placeholder.png │ ├── jdf-horizontal-color.svg │ ├── breakdown.svg │ └── organigram.svg ├── Appendix_A.md └── CONTRIBUTING.md ├── test ├── requirements.txt ├── test.sh └── validate.py ├── CHANGELOG.md ├── .gitignore ├── src ├── main.rs ├── lib.rs ├── util.rs ├── pbf.rs ├── osm_arrow.rs └── sink.rs ├── .github ├── ISSUE_TEMPLATE │ ├── bug.yml │ ├── question.yaml │ ├── action_item.yaml │ ├── documentation.yaml │ ├── phase_of_development.yml │ ├── default.yml │ ├── release-feedback.yaml │ └── feature.yml ├── workflows │ ├── release_deploy.yaml │ ├── build_test.yaml │ ├── issue-labeler.yml │ └── issue-labeler-node16.yml ├── DISCUSSION_TEMPLATE │ └── data_feedback.yml └── advanced-issue-labeler.yml ├── benches └── benchmark.rs ├── LICENSE ├── Cargo.toml ├── README.md └── CODE-OF-CONDUCT.md /How to Contribute/images/placeholder.md: -------------------------------------------------------------------------------- 1 | placeholder 2 | -------------------------------------------------------------------------------- /test/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas 2 | xmltodict 3 | pyarrow 4 | -------------------------------------------------------------------------------- /How to Contribute/images/placeholder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OvertureMaps/osm-pbf-parquet/HEAD/How to Contribute/images/placeholder.png -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | All notable changes to this project will be documented in this file. 3 | The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. 
4 | 5 | ## 0.1.0 - 2024-12-06 6 | ### Added 7 | Initial release, copied version 0.7.2 from [original source](https://github.com/brad-richardson/osm-pbf-parquet) 8 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Generated by Cargo 2 | # will have compiled files and executables 3 | debug/ 4 | target/ 5 | 6 | # Test outputs 7 | parquet/ 8 | out/ 9 | *.osm.pbf 10 | *.osm 11 | *.parquet 12 | 13 | # These are backup files generated by rustfmt 14 | **/*.rs.bk 15 | 16 | # MSVC Windows builds of rustc generate these, which store debugging information 17 | *.pdb 18 | -------------------------------------------------------------------------------- /src/main.rs: -------------------------------------------------------------------------------- 1 | use clap::Parser; 2 | use env_logger::{Builder, Env}; 3 | 4 | use osm_pbf_parquet::pbf_driver; 5 | use osm_pbf_parquet::util::Args; 6 | 7 | fn main() { 8 | Builder::from_env(Env::default().default_filter_or("info")).init(); 9 | 10 | let args = Args::parse(); 11 | println!("{:?}", args); 12 | 13 | tokio::runtime::Builder::new_multi_thread() 14 | .worker_threads(args.get_worker_threads()) 15 | .enable_all() 16 | .build() 17 | .unwrap() 18 | .block_on(async { 19 | let _ = pbf_driver(args).await; 20 | }); 21 | } 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug.yml: -------------------------------------------------------------------------------- 1 | name: Bug report 2 | description: Create a report to help us improve 3 | title: "[Bug report]" 4 | labels: ["bug"] 5 | projects: ["Add project"] 6 | assignees: 7 | body: 8 | - type: textarea 9 | id: what-happened 10 | attributes: 11 | label: Outline bug 12 | description: Outline the bug in detail 13 | placeholder: Tell us what you see 14 | value: "Add details" 15 | validations: 16 | required: true 17 | - type: dropdown 18 | id: TF-Groups 19 | attributes: 20 | label: Issue dependency with other TF Groups 21 | multiple: true 22 | options: 23 | - No Dependency 24 | - Add WG names 25 | 26 | -------------------------------------------------------------------------------- /test/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | set -e 3 | 4 | # Clean up old artifacts if they exist 5 | rm -rf ./parquet/**/*.parquet 6 | mkdir -p ./parquet/ 7 | 8 | test_file="test" 9 | 10 | # Download PBF, convert to OSM XML 11 | if [ !
-f "./${test_file}.osm.pbf" ]; then 12 | echo "Downloading file" 13 | curl -L "https://download.geofabrik.de/australia-oceania/cook-islands-latest.osm.pbf" -o "${test_file}.osm.pbf" 14 | echo "Creating OSM XML" 15 | osmium cat ${test_file}.osm.pbf -o ${test_file}.osm 16 | fi 17 | 18 | # Run parquet conversion 19 | echo "Running conversion" 20 | cargo run --release -- --input ${test_file}.osm.pbf --output ./parquet/ 21 | 22 | echo "Running validation" 23 | python3 ./validate.py 24 | 25 | # Cleanup artifacts 26 | # rm -rf ./parquet/**/*.parquet 27 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/question.yaml: -------------------------------------------------------------------------------- 1 | name: Question 2 | description: Question Submission 3 | title: "[Question]" 4 | labels: ["question"] 5 | projects: ["add_project"] 6 | assignees: add github name 7 | body: 8 | - type: textarea 9 | id: Question 10 | attributes: 11 | label: Question 12 | description: Question submission for group members 13 | placeholder: Tell us what you see 14 | value: "Question submission details" 15 | validations: 16 | required: true 17 | - type: dropdown 18 | id: TF-Groups 19 | attributes: 20 | label: Issue dependency with other TF Groups 21 | multiple: true 22 | options: 23 | - No Dependency 24 | - Admin 25 | - Address 26 | - Base 27 | - Data Platform 28 | - Dev Advocacy 29 | - GERs 30 | - Schema 31 | - Transportation 32 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/action_item.yaml: -------------------------------------------------------------------------------- 1 | name: Action Item 2 | description: Action Item submission 3 | title: "[Action Item]" 4 | labels: ["Action Item"] 5 | projects: ["Add project"] 6 | assignees: add github username 7 | body: 8 | - type: textarea 9 | id: what-happened 10 | attributes: 11 | label: Outline Action Item Details 12 | description: Outline Action Item Details 13 | placeholder: Tell us what you see 14 | value: "Add details" 15 | validations: 16 | required: true 17 | - type: dropdown 18 | id: TF-Groups 19 | attributes: 20 | label: Issue dependency with other TF Groups 21 | multiple: true 22 | options: 23 | - No Dependency 24 | - Admin 25 | - Address 26 | - Base 27 | - Data Platform 28 | - Dev Advocacy 29 | - GERs 30 | - Schema 31 | - Transportation 32 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/documentation.yaml: -------------------------------------------------------------------------------- 1 | name: Documentation 2 | description: Issue associated with Documentation 3 | title: "[Documentation]" 4 | labels: ["documentation"] 5 | projects: ["Add project"] 6 | assignees: 7 | body: 8 | - type: textarea 9 | id: what-happened 10 | attributes: 11 | label: Outline Action Item Details 12 | description: Outline Action Item Details 13 | placeholder: Tell us what you see 14 | value: "Add details" 15 | validations: 16 | required: true 17 | - type: dropdown 18 | id: TF-Groups 19 | attributes: 20 | label: Issue dependency with other TF Groups 21 | multiple: true 22 | options: 23 | - No Dependency 24 | - Admin 25 | - Address 26 | - Base 27 | - Data Platform 28 | - Dev Advocacy 29 | - GERs 30 | - Schema 31 | - Transportation 32 | 33 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/phase_of_development.yml: 
-------------------------------------------------------------------------------- 1 | name: Phase-of-development 2 | description: Phase development submission 3 | title: "[Phase-Dev]" 4 | labels: ["Phase-Dev"] 5 | projects: ["Add project name"] 6 | assignees: 7 | - 8 | 9 | body: 10 | - type: textarea 11 | id: other 12 | attributes: 13 | label: Other info 14 | description: Write your ideas ... 15 | validations: 16 | required: false 17 | 18 | - type: dropdown 19 | id: phases 20 | attributes: 21 | label: phases 22 | multiple: false 23 | options: 24 | - 'phase-1' 25 | - 'phase-2' 26 | - 'phase-3' 27 | 28 | - type: checkboxes 29 | id: Platform-work-question 30 | attributes: 31 | label: Platform work 32 | description: "Is this platform work that could be shared across multiple themes?" 33 | options: 34 | - label: Check box if YES 35 | -------------------------------------------------------------------------------- /.github/workflows/release_deploy.yaml: -------------------------------------------------------------------------------- 1 | name: Release build 2 | 3 | on: 4 | release: 5 | types: [created] 6 | 7 | permissions: 8 | contents: write 9 | 10 | jobs: 11 | upload-assets: 12 | strategy: 13 | matrix: 14 | include: 15 | - target: aarch64-unknown-linux-gnu 16 | os: ubuntu-latest 17 | - target: x86_64-unknown-linux-gnu 18 | os: ubuntu-latest 19 | - target: aarch64-apple-darwin 20 | os: macos-latest 21 | - target: x86_64-apple-darwin 22 | os: macos-latest 23 | - target: x86_64-pc-windows-msvc 24 | os: windows-latest 25 | runs-on: ${{ matrix.os }} 26 | steps: 27 | - uses: actions/checkout@v4 28 | - uses: taiki-e/upload-rust-binary-action@v1 29 | with: 30 | bin: osm-pbf-parquet 31 | target: ${{ matrix.target }} 32 | token: ${{ secrets.GITHUB_TOKEN }} 33 | -------------------------------------------------------------------------------- /benches/benchmark.rs: -------------------------------------------------------------------------------- 1 | use criterion::{criterion_group, criterion_main, Criterion}; 2 | use osm_pbf_parquet::pbf_driver; 3 | use osm_pbf_parquet::util::Args; 4 | use std::fs; 5 | 6 | async fn bench() { 7 | let args = Args::new( 8 | "./test/test.osm.pbf".to_string(), 9 | "./test/bench-out/".to_string(), 10 | 0, 11 | ); 12 | pbf_driver(args).await.unwrap(); 13 | } 14 | 15 | pub fn criterion_benchmark(c: &mut Criterion) { 16 | c.bench_function("benchmark", |b| { 17 | let rt = tokio::runtime::Builder::new_multi_thread() 18 | .enable_all() 19 | .build() 20 | .unwrap(); 21 | b.to_async(rt).iter(bench) 22 | }); 23 | let _ = fs::remove_dir_all("./test/bench-out/"); 24 | } 25 | 26 | criterion_group! { 27 | name = benches; 28 | config = Criterion::default().sample_size(10); 29 | targets = criterion_benchmark 30 | } 31 | criterion_main!(benches); 32 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/default.yml: -------------------------------------------------------------------------------- 1 | --- 2 | 3 | name: Report an Issue - no Config 4 | description: Issue form for reporting issues and requesting new features. 5 | 6 | body: 7 | - type: markdown 8 | attributes: 9 | value: Thanks for taking the time to fill out this issue! 
10 | 11 | - type: dropdown 12 | id: without-policy 13 | attributes: 14 | label: Type of issue 15 | multiple: false 16 | options: 17 | - 'Bug Report' 18 | - 'Feature Request' 19 | - 'Other' 20 | validations: 21 | required: false 22 | 23 | - type: textarea 24 | id: description 25 | attributes: 26 | label: Description 27 | description: A clear and concise description of what the problem is. E.g. I'm always frustrated when [...] 28 | validations: 29 | required: false 30 | 31 | - type: textarea 32 | id: solution 33 | attributes: 34 | label: Describe the solution you'd like 35 | description: A clear and concise description of what you want to happen. 36 | validations: 37 | required: false 38 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/release-feedback.yaml: -------------------------------------------------------------------------------- 1 | name: Feedback 2 | description: Feedback Report 3 | title: "[Feedback]" 4 | labels: ["Feedback"] 5 | projects: ["ADD-PROJECT"] 6 | assignees: add Github name 7 | body: 8 | - type: input 9 | id: contact 10 | attributes: 11 | label: contact 12 | description: How can we contact you if we need more info? 13 | placeholder: ex. email@example.com 14 | validations: 15 | required: false 16 | - type: textarea 17 | id: Feedback 18 | attributes: 19 | label: Feedback 20 | description: Release feedback 21 | placeholder: Tell us your feedback! 22 | value: "## Summary Feedback \n\n [add details]" 23 | validations: 24 | required: true 25 | - type: dropdown 26 | id: Feedback-Theme 27 | attributes: 28 | label: Feedback Theme 29 | description: 30 | options: 31 | - No Dependency 32 | - Admin 33 | - Address 34 | - Base 35 | - Data Platform 36 | - Dev Advocacy 37 | - GERs 38 | - Schema 39 | - Transportation 40 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Overture Maps 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /.github/workflows/build_test.yaml: -------------------------------------------------------------------------------- 1 | name: Build and test 2 | 3 | on: 4 | push: 5 | branches: [ "main" ] 6 | pull_request: 7 | branches: [ "main" ] 8 | 9 | env: 10 | CARGO_TERM_COLOR: always 11 | # Make sure CI fails on all warnings, including Clippy lints 12 | RUSTFLAGS: "-Dwarnings" 13 | 14 | jobs: 15 | build: 16 | runs-on: ubuntu-latest 17 | steps: 18 | - uses: actions/checkout@v4 19 | - uses: actions/setup-python@v5 20 | with: 21 | python-version: '3.12' 22 | - name: Build 23 | run: cargo build 24 | - name: Install osmium 25 | run: sudo apt install osmium-tool 26 | - name: Install python test dependencies 27 | working-directory: ./test 28 | run: pip3 install -r requirements.txt 29 | - name: Prepare and run test 30 | if: success() || failure() 31 | run: ./test.sh 32 | working-directory: ./test 33 | - name: Benchmark 34 | if: success() || failure() 35 | run: cargo bench 36 | - name: Run clippy 37 | if: success() || failure() 38 | run: cargo clippy --all-targets --all-features 39 | - name: Run cargo fmt 40 | if: success() || failure() 41 | run: cargo fmt --all -- --check 42 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "osm-pbf-parquet" 3 | version = "0.1.0" 4 | edition = "2021" 5 | 6 | [dependencies] 7 | anyhow = "1.*" 8 | arrow = "52.*" 9 | bytes = "1.*" 10 | clap = { version = "4.*", features = ["derive"] } 11 | env_logger = "0.11.*" 12 | flate2 = { version = "1.*", features = ["zlib-ng"], default-features = false } 13 | futures = "0.3.*" 14 | futures-util = "0.3.*" 15 | log = "0.4.*" 16 | object_store = {version = "0.10.2", features = ["aws"] } 17 | # TODO - move this upstream once https://github.com/b-r-u/osmpbf/pull/48 is landed 18 | osmpbf = { version = "0.3.*", features = ["zlib-ng", "async"], default-features = false, git = "https://github.com/brad-richardson/osmpbf.git", branch = "async-blob-reader" } 19 | parquet = { version = "52.*", features = ["async"] } 20 | sysinfo = "0.31.*" 21 | tokio = { version = "1.*", features = ["rt", "rt-multi-thread", "io-util"] } 22 | tokio-util = {version = "0.7.*", features = ["io-util"] } 23 | url = "2.*" 24 | 25 | [dev-dependencies] 26 | criterion = { version = "0.5.*", features = ["async_futures", "async_tokio"] } 27 | 28 | [lib] 29 | name = "osm_pbf_parquet" 30 | path = "src/lib.rs" 31 | 32 | [[bin]] 33 | name = "osm-pbf-parquet" 34 | path = "src/main.rs" 35 | 36 | [[bench]] 37 | name = "benchmark" 38 | harness = false 39 | -------------------------------------------------------------------------------- /src/lib.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::sync::{Arc, Mutex}; 3 | 4 | use tokio::runtime::Handle; 5 | use url::Url; 6 | 7 | pub mod osm_arrow; 8 | pub mod pbf; 9 | pub mod sink; 10 | pub mod util; 11 | use crate::osm_arrow::OSMType; 12 | use crate::pbf::{ 13 | create_local_buf_reader, create_s3_buf_reader, finish_sinks, monitor, process_blobs, 14 | }; 15 | use crate::sink::ElementSink; 16 | use crate::util::{Args, SinkpoolStore, ARGS}; 17 | 18 | pub async fn pbf_driver(args: Args) -> Result<(), anyhow::Error> { 19 | // TODO - validation of args 20 | // Store value for reading across threads (write-once) 21 | let _ = 
ARGS.set(args.clone()); 22 | 23 | let sinkpools: Arc<SinkpoolStore> = Arc::new(HashMap::from([ 24 | (OSMType::Node, Arc::new(Mutex::new(vec![]))), 25 | (OSMType::Way, Arc::new(Mutex::new(vec![]))), 26 | (OSMType::Relation, Arc::new(Mutex::new(vec![]))), 27 | ])); 28 | 29 | // Verify we're running in a tokio runtime and start separate monitoring thread 30 | let sinkpool_monitor = sinkpools.clone(); 31 | Handle::current().spawn(async move { monitor(sinkpool_monitor).await }); 32 | 33 | let full_path = args.input; 34 | let buf_reader = if let Ok(url) = Url::parse(&full_path) { 35 | create_s3_buf_reader(url).await? 36 | } else { 37 | create_local_buf_reader(&full_path).await? 38 | }; 39 | process_blobs(buf_reader, sinkpools.clone()).await?; 40 | 41 | finish_sinks(sinkpools.clone(), true).await?; 42 | 43 | Ok(()) 44 | } 45 | -------------------------------------------------------------------------------- /.github/DISCUSSION_TEMPLATE/data_feedback.yml: -------------------------------------------------------------------------------- 1 | title: "[Data-Feedback] " 2 | labels: ["data"] 3 | body: 4 | - type: markdown 5 | attributes: 6 | value: | 7 | This is text that will show up in the template! 8 | - type: dropdown 9 | id: category 10 | attributes: 11 | label: Feedback Theme? 12 | options: 13 | - Address 14 | - Admin Boundaries 15 | - Base 16 | - Buildings 17 | - Places 18 | - Transportation 19 | validations: 20 | required: true 21 | - type: dropdown 22 | id: download 23 | attributes: 24 | label: Feedback category 25 | options: 26 | - Documentation 27 | - Suggested improvements 28 | - Bug fix time 29 | - Release cadence 30 | validations: 31 | required: true 32 | - type: textarea 33 | id: improvements 34 | attributes: 35 | label: feedback-details 36 | description: "outline your observations in detail" 37 | value: | 38 | ## Feedback Details: 39 | ``` 40 | Add details here 41 | ``` 42 | ## Associated Links: 43 | ``` 44 | Add details here 45 | ``` 46 | ## Other: 47 | ``` 48 | Add details here 49 | ``` 50 | ... 51 | render: bash 52 | validations: 53 | required: true 54 | - type: markdown 55 | attributes: 56 | value: | 57 | ## Further Details 58 | And some more markdown 59 | - type: input 60 | id: has-id 61 | attributes: 62 | label: Suggestions 63 | description: A description of suggestions to help us 64 | validations: 65 | required: true 66 | - type: checkboxes 67 | attributes: 68 | label: Check box if needed! 69 | options: 70 | - label: Feedback required 71 | required: true 72 | - label: Feedback not required 73 | - type: markdown 74 | attributes: 75 | value: | 76 | ### Thanks for your submission 77 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature.yml: -------------------------------------------------------------------------------- 1 | name: Feature request 2 | description: Suggest an improvement 3 | labels: ["RFE 🎁"] 4 | 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: Thanks for taking the time to fill out this feature request! 9 | 10 | - type: dropdown 11 | id: component 12 | attributes: 13 | label: Component 14 | description: Please choose components related to this feature request.
15 | multiple: true 16 | options: 17 | - 'bootctl' 18 | - 'homectl' 19 | - 'journalctl' 20 | - 'loginctl' 21 | - 'networkctl' 22 | - 'nss-resolve' 23 | - 'pam_systemd' 24 | - 'pam_systemd_home' 25 | - 'resolvectl' 26 | - 'systemctl' 27 | - 'systemd' 28 | - 'systemd-boot' 29 | - 'systemd-homed' 30 | - 'systemd-journald' 31 | - 'systemd-logind' 32 | - 'systemd-networkd' 33 | - 'systemd-networkd-wait-online' 34 | - 'systemd-nspawn' 35 | - 'systemd-resolved' 36 | - 'systemd-stub' 37 | - 'systemd-udevd' 38 | - 'the bootloader itself' 39 | - 'udev builtins' 40 | - 'udevadm' 41 | - '.network files' 42 | - 'tests' 43 | - 'other' 44 | validations: 45 | required: false 46 | 47 | - type: textarea 48 | id: description 49 | attributes: 50 | label: Is your feature request related to a problem? Please describe 51 | description: A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 52 | validations: 53 | required: false 54 | 55 | - type: textarea 56 | id: solution 57 | attributes: 58 | label: Describe the solution you'd like 59 | description: A clear and concise description of what you want to happen. 60 | validations: 61 | required: false 62 | 63 | - type: textarea 64 | id: alternatives 65 | attributes: 66 | label: Describe alternatives you've considered 67 | description: A clear and concise description of any alternative solutions or features you've considered. 68 | validations: 69 | required: false 70 | 71 | - type: input 72 | id: version 73 | attributes: 74 | label: The systemd version you checked that didn't have the feature you are asking for 75 | description: If this is not the most recently released upstream version, then please check first if it has that feature already. 76 | placeholder: '251' 77 | validations: 78 | required: false 79 | -------------------------------------------------------------------------------- /.github/workflows/issue-labeler.yml: -------------------------------------------------------------------------------- 1 | # Inspired by: https://github.com/stefanbuck/ristorante 2 | # See: https://stefanbuck.com/blog/codeless-contributions-with-github-issue-forms 3 | --- 4 | 5 | name: Issue labeler 6 | on: 7 | issues: 8 | types: [ opened ] 9 | 10 | permissions: 11 | issues: write 12 | contents: read 13 | id-token: write 14 | 15 | jobs: 16 | label-issues-policy: 17 | runs-on: ubuntu-latest 18 | 19 | strategy: 20 | matrix: 21 | template: [ bug.yml, feature.yml, animals.yml, phase_of_development.yml ] 22 | 23 | steps: 24 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 25 | 26 | - name: Parse issue form 27 | uses: stefanbuck/github-issue-parser@v3 28 | id: issue-parser 29 | with: 30 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 31 | 32 | - name: Set labels based on policy 33 | uses: redhat-plumbers-in-action/advanced-issue-labeler@main 34 | with: 35 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 36 | template: ${{ matrix.template }} 37 | token: ${{ secrets.GITHUB_TOKEN }} 38 | 39 | label-issues-default-policy: 40 | runs-on: ubuntu-latest 41 | 42 | strategy: 43 | matrix: 44 | template: [ default-policy.yml ] 45 | 46 | steps: 47 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 48 | 49 | - name: Parse issue form 50 | uses: stefanbuck/github-issue-parser@v3 51 | id: issue-parser 52 | with: 53 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 54 | 55 | - name: Set labels based on policy 56 | uses: redhat-plumbers-in-action/advanced-issue-labeler@main 57 | 
with: 58 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 59 | token: ${{ secrets.GITHUB_TOKEN }} 60 | 61 | label-issues-without-policy: 62 | runs-on: ubuntu-latest 63 | 64 | strategy: 65 | matrix: 66 | template: [ default.yml ] 67 | 68 | steps: 69 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 70 | 71 | - name: Parse issue form 72 | uses: stefanbuck/github-issue-parser@v3 73 | id: issue-parser 74 | with: 75 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 76 | 77 | - name: Set labels based on policy 78 | uses: redhat-plumbers-in-action/advanced-issue-labeler@main 79 | with: 80 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 81 | section: without-policy 82 | block-list: | 83 | Other 84 | token: ${{ secrets.GITHUB_TOKEN }} 85 | -------------------------------------------------------------------------------- /.github/workflows/issue-labeler-node16.yml: -------------------------------------------------------------------------------- 1 | 2 | # Inspired by: https://github.com/stefanbuck/ristorante 3 | # See: https://stefanbuck.com/blog/codeless-contributions-with-github-issue-forms 4 | --- 5 | 6 | name: Issue labeler - node16 7 | on: 8 | issues: 9 | types: [ opened ] 10 | 11 | permissions: 12 | issues: write 13 | contents: read 14 | id-token: write 15 | 16 | jobs: 17 | label-issues-policy: 18 | runs-on: ubuntu-latest 19 | 20 | strategy: 21 | matrix: 22 | template: [ bug.yml, feature.yml, animals.yml, phase_of_development.yml ] 23 | 24 | steps: 25 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 26 | 27 | - name: Parse issue form 28 | uses: stefanbuck/github-issue-parser@v3 29 | id: issue-parser 30 | with: 31 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 32 | 33 | - name: Set labels based on policy - node16 34 | uses: redhat-plumbers-in-action/advanced-issue-labeler@node16 35 | with: 36 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 37 | template: ${{ matrix.template }} 38 | token: ${{ secrets.GITHUB_TOKEN }} 39 | 40 | label-issues-default-policy: 41 | runs-on: ubuntu-latest 42 | 43 | strategy: 44 | matrix: 45 | template: [ default-policy.yml ] 46 | 47 | steps: 48 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 49 | 50 | - name: Parse issue form 51 | uses: stefanbuck/github-issue-parser@v3 52 | id: issue-parser 53 | with: 54 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 55 | 56 | - name: Set labels based on policy - node16 57 | uses: redhat-plumbers-in-action/advanced-issue-labeler@node16 58 | with: 59 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 60 | token: ${{ secrets.GITHUB_TOKEN }} 61 | 62 | label-issues-without-policy: 63 | runs-on: ubuntu-latest 64 | 65 | strategy: 66 | matrix: 67 | template: [ default.yml ] 68 | 69 | steps: 70 | - uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4.1.4 71 | 72 | - name: Parse issue form 73 | uses: stefanbuck/github-issue-parser@v3 74 | id: issue-parser 75 | with: 76 | template-path: .github/ISSUE_TEMPLATE/${{ matrix.template }} 77 | 78 | - name: Set labels based on policy - node16 79 | uses: redhat-plumbers-in-action/advanced-issue-labeler@node16 80 | with: 81 | issue-form: ${{ steps.issue-parser.outputs.jsonString }} 82 | section: without-policy 83 | block-list: | 84 | Other 85 | token: ${{ secrets.GITHUB_TOKEN }} 86 | -------------------------------------------------------------------------------- /src/util.rs: 
-------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::sync::atomic::AtomicU64; 3 | use std::sync::OnceLock; 4 | use std::sync::{Arc, Mutex}; 5 | use sysinfo::System; 6 | 7 | use clap::Parser; 8 | 9 | use crate::osm_arrow::OSMType; 10 | use crate::ElementSink; 11 | 12 | pub type SinkpoolStore = HashMap<OSMType, Arc<Mutex<Vec<ElementSink>>>>; 13 | 14 | // Write once, safe read across threads 15 | pub static ARGS: OnceLock<Args> = OnceLock::new(); 16 | 17 | // Element counter to track read progress 18 | pub static ELEMENT_COUNTER: AtomicU64 = AtomicU64::new(0); 19 | 20 | static BYTES_IN_MB: usize = 1024 * 1024; 21 | 22 | #[derive(Parser, Debug, Clone)] 23 | #[command(version, about, long_about = None)] 24 | pub struct Args { 25 | /// Path to input PBF 26 | /// S3 URIs and filesystem paths are supported 27 | /// Note that reading from S3 is *much* slower, consider copying locally first 28 | #[arg(short, long)] 29 | pub input: String, 30 | 31 | /// Path to output directory 32 | /// S3 URIs and filesystem paths are supported 33 | #[arg(short, long, default_value = "./parquet")] 34 | pub output: String, 35 | 36 | /// Zstd compression level, 1-22, 0 for no compression 37 | #[arg(long, default_value_t = 3)] 38 | pub compression: u8, 39 | 40 | /// Worker thread count, default CPU count 41 | #[arg(long)] 42 | pub worker_threads: Option<usize>, 43 | 44 | /// Advanced options: 45 | /// 46 | /// Input buffer size, default 16MB 47 | #[arg(long)] 48 | pub input_buffer_size_mb: Option<usize>, 49 | 50 | /// Override target record batch size, balance this with available memory 51 | /// default is total memory (MB) / CPU count / 8 52 | #[arg(long)] 53 | pub record_batch_target_mb: Option<usize>, 54 | 55 | /// Max feature count per row group 56 | #[arg(long)] 57 | pub max_row_group_count: Option<usize>, 58 | 59 | /// Override target parquet file size, default 500MB 60 | #[arg(long, default_value_t = 500usize)] 61 | pub file_target_mb: usize, 62 | } 63 | 64 | impl Args { 65 | pub fn new(input: String, output: String, compression: u8) -> Self { 66 | Args { 67 | input, 68 | output, 69 | compression, 70 | worker_threads: None, 71 | input_buffer_size_mb: None, 72 | record_batch_target_mb: None, 73 | max_row_group_count: None, 74 | file_target_mb: 500usize, 75 | } 76 | } 77 | 78 | pub fn get_worker_threads(&self) -> usize { 79 | self.worker_threads.unwrap_or(default_worker_thread_count()) 80 | } 81 | 82 | pub fn get_input_buffer_size_bytes(&self) -> usize { 83 | // Max size of an uncompressed single blob is 32MB, assumes compression ratio of 2:1 or better 84 | self.input_buffer_size_mb.unwrap_or(16) * BYTES_IN_MB 85 | } 86 | 87 | pub fn get_record_batch_target_bytes(&self) -> usize { 88 | self.record_batch_target_mb 89 | .unwrap_or(default_record_batch_size_mb()) 90 | * BYTES_IN_MB 91 | } 92 | 93 | pub fn get_file_target_bytes(&self) -> usize { 94 | self.file_target_mb * BYTES_IN_MB 95 | } 96 | } 97 | 98 | fn default_record_batch_size_mb() -> usize { 99 | let system = System::new_all(); 100 | // Estimate per thread available memory, leaving overhead for copies and system processes 101 | ((system.total_memory() as usize / BYTES_IN_MB) / system.cpus().len()) / 8usize 102 | } 103 | 104 | fn default_worker_thread_count() -> usize { 105 | let system = System::new_all(); 106 | system.cpus().len() 107 | } 108 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # osm-pbf-parquet 2 | Transcode an
OSM PBF file to Parquet files with hive-style partitioning by type 3 | 4 | ## Getting started 5 | 6 | ### Download 7 | Download latest version from [releases](https://github.com/OvertureMaps/osm-pbf-parquet/releases) 8 | 9 | ### Usage 10 | Example for an x86_64 Linux system with a pre-compiled binary: 11 | ``` 12 | curl -L "https://github.com/OvertureMaps/osm-pbf-parquet/releases/latest/download/osm-pbf-parquet-x86_64-unknown-linux-gnu.tar.gz" -o "osm-pbf-parquet.tar.gz" 13 | tar -xzf osm-pbf-parquet.tar.gz 14 | chmod +x osm-pbf-parquet 15 | ./osm-pbf-parquet --input your.osm.pbf --output ./parquet 16 | ``` 17 | 18 | OR compile and run locally: 19 | ``` 20 | git clone https://github.com/OvertureMaps/osm-pbf-parquet.git 21 | cargo run --release -- --input your.osm.pbf --output ./parquet 22 | ``` 23 | 24 | ### Supported input/output 25 | - Local filesystem 26 | - AWS S3 (auth read from environment, see [object_store docs](https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html)) 27 | 28 | ### Output structure 29 | ``` 30 | planet.osm.pbf 31 | parquet/ 32 | type=node/ 33 | node_0000.zstd.parquet 34 | ... 35 | type=relation/ 36 | relation_0000.zstd.parquet 37 | ... 38 | type=way/ 39 | way_0000.zstd.parquet 40 | ... 41 | ``` 42 | [Reference Arrow/SQL schema](https://github.com/OvertureMaps/osm-pbf-parquet/blob/main/src/osm_arrow.rs) 43 | 44 | ### Querying 45 | 46 | #### DuckDB 47 | ``` 48 | duckdb -c "SELECT * FROM read_parquet('s3://your-s3-bucket/path/**/*.parquet', hive_partitioning=true) LIMIT 10;" 49 | ``` 50 | 51 | #### Athena/Presto/Trino 52 | ``` 53 | CREATE EXTERNAL TABLE IF NOT EXISTS `osm` ( 54 | `id` BIGINT, 55 | `tags` MAP<STRING, STRING>, 56 | `lat` DOUBLE, 57 | `lon` DOUBLE, 58 | `nds` ARRAY<STRUCT<ref: BIGINT>>, 59 | `members` ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>, 60 | `changeset` BIGINT, 61 | `timestamp` TIMESTAMP, 62 | `uid` BIGINT, 63 | `user` STRING, 64 | `version` BIGINT, 65 | `visible` BOOLEAN 66 | ) 67 | PARTITIONED BY ( 68 | `type` STRING 69 | ) 70 | ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 71 | STORED AS PARQUET 72 | LOCATION 's3://your-s3-bucket/path/'; 73 | 74 | MSCK REPAIR TABLE `osm`; 75 | 76 | SELECT * FROM osm LIMIT 10; 77 | ``` 78 | 79 | ## Development 80 | 1. [Install rust](https://www.rust-lang.org/tools/install) 81 | 2. Clone repo `git clone https://github.com/OvertureMaps/osm-pbf-parquet.git` 82 | 3. Make changes 83 | 4. Run against a PBF with `cargo run -- --input your.osm.pbf` ([Geofabrik regional PBF extracts here](https://download.geofabrik.de/)) 84 | 5. Test with `cd test && ./test.sh` 85 | 86 | 87 | ## Benchmarks 88 | osm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving ordering. Here is a comparison against similar tools on the 2024-06-24 OSM planet PBF with a target file size of 500MB: 89 | | | Time (wall) | Output size | File count | 90 | | - | - | - | - | 91 | | **osm-pbf-parquet** (zstd:3) | 30 minutes | 182GB | ~600 | 92 | | **osm-pbf-parquet** (zstd:9) | 60 minutes | 165GB | ~600 | 93 | | [osm-parquetizer](https://github.com/adrianulbona/osm-parquetizer) | 196 minutes | 285GB | 3 | 94 | | [osm2orc](https://github.com/mojodna/osm2orc) | 385 minutes | 110GB | 1 | 95 | 96 | Test system: 97 | ``` 98 | i5-9400 (6 CPU, 32GB memory) 99 | Ubuntu 24.04 100 | OpenJDK 17 101 | Rust 1.79.0 102 | ``` 103 | 104 | 105 | ## License 106 | Distributed under the MIT License. See `LICENSE` for more information.
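## Appendix: reading the output in Python

As a complement to the SQL examples in the Querying section above, here is a minimal Python sketch for reading the hive-partitioned output. This is illustrative rather than something the project ships: the `./parquet` path and layout come from the Output structure section, the column names come from the Athena schema, and PyArrow (already listed in `test/requirements.txt`) is just one reasonable reader.

```
# Minimal sketch: read the hive-partitioned output with PyArrow.
# Assumes the ./parquet layout shown under "Output structure";
# column names follow the Athena schema under "Querying".
import pyarrow.dataset as ds

# "hive" partitioning turns the type=node/way/relation directories
# into a regular "type" column.
dataset = ds.dataset("./parquet", format="parquet", partitioning="hive")

# Filter pushdown: only files under type=node/ are scanned.
nodes = dataset.to_table(
    filter=ds.field("type") == "node",
    columns=["id", "lat", "lon", "tags"],
)
print(nodes.num_rows)
print(nodes.slice(0, 10).to_pandas())
```

Because `type` is a hive partition, the filter prunes whole directories instead of reading every file.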
107 | 108 | ## Acknowledgments 109 | * [osmpbf](https://github.com/b-r-u/osmpbf) and [osm2gzip](https://github.com/b-r-u/osm2gzip) for reading PBF data 110 | * [osm2orc](https://github.com/mojodna/osm2orc) for schema and processing ideas 111 | -------------------------------------------------------------------------------- /CODE-OF-CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, caste, color, religion, or sexual identity 10 | and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | - Demonstrating empathy and kindness toward other people 21 | - Being respectful of differing opinions, viewpoints, and experiences 22 | - Giving and gracefully accepting constructive feedback 23 | - Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | - Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | - The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | - Trolling, insulting or derogatory comments, and personal or political attacks 33 | - Public or private harassment 34 | - Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | - Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be directly 62 | reported to the Executive Director of the Green Software Foundation at exec@greensoftware.foundation or any community leaders responsible for enforcement. 63 | 64 | All complaints will be reviewed and investigated promptly and fairly. 
65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | [https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0]. 120 | 121 | Community Impact Guidelines were inspired by 122 | [Mozilla's code of conduct enforcement ladder][mozilla coc]. 123 | 124 | For answers to common questions about this code of conduct, see the FAQ at 125 | [https://www.contributor-covenant.org/faq][faq]. Translations are available 126 | at [https://www.contributor-covenant.org/translations][translations]. 
127 | 128 | [homepage]: https://www.contributor-covenant.org 129 | [v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html 130 | [mozilla coc]: https://github.com/mozilla/diversity 131 | [faq]: https://www.contributor-covenant.org/faq 132 | [translations]: https://www.contributor-covenant.org/translations 133 | 134 | --- 135 | -------------------------------------------------------------------------------- /How to Contribute/Appendix_A.md: -------------------------------------------------------------------------------- 1 | ## Appendix - Checklist for Working Groups Chairs 2 | 3 | ### General Roles of Officers 4 | 5 | - [ ] The chair has overall responsibility for the management of the group 6 | - [ ] Co-chairs can provide full cover for the Chair - the intent being continuity of leadership to the group 7 | - [ ] Chairs and co-chairs are to behave as a single team generating the group's vision 8 | - [ ] Co-chairs provide the chair with a 2nd pair of eyes and ears to gauge the meeting 9 | - [ ] Officers have to have visibility of all officers' communications to ensure transparency in the leadership 10 | (e.g. a leadership mailing list) 11 | - [ ] Chair needs to drive the co-chairs' support 12 | 13 | ### General Responsibilities 14 | 15 | - [ ] Read, understand and follow the OMP processes as defined in the OMP Rules of Engagement, and ensure that all rules are 16 | followed. Where there is doubt, seek advice from the parent group chair or ORG Team Officers as appropriate. 17 | - [ ] Read, understand and follow any other related OMP procedures and guidelines. 18 | - [ ] Familiarize themselves with IPR and Anti-Trust laws. 19 | - [ ] Ensure execution and fulfilment of the mandate bestowed on the group through its charter and assigned work packages. 20 | - [ ] Work closely with the officers of the parent group to ensure the integrated approach required for OMP is achieved. 21 | - [ ] Inform the parent group if the work of the group has been completed and recommend closure of the group. 22 | - [ ] Interact with other groups as may be necessary to fulfil the group's own, or another group's, charter and work packages. 23 | - [ ] Interact with external fora (consistent with the liaison process) as necessary and report all interaction with external fora to the parent group. 24 | - [ ] At all times act fully impartially in conducting the group's mandate; be prepared to work confidentially with a group member if they wish to express any concerns in private. 25 | - [ ] Support, promote and keep updated the work packages as appropriate. 26 | - [ ] Support the parent group with the responsibilities delegated. 27 | - [ ] Conduct all group business in a fair, reasonable and open manner in accordance with the approved OMP processes and procedures. 28 | - [ ] Chairs: Arrange for an agreed number of co-chairs to support the chair in the group leadership; arrange the task assignment for the chairs and co-chairs, delegate assigned tasks to, and supervise, the co-chairs. 29 | - [ ] Co-chairs: Support the chair/convenor/interim chair in task assignment and execute delegated tasks and stand in for the chair as appropriate to fulfill the roles of the group. 30 | - [ ] Internally structure the group as appropriate (e.g. create/modify/close sub-working groups, etc) to best fulfil the mandate. 31 | - [ ] Conduct elections of and manage all sub-group officers consistent with officer election rules, and act as election officer in the strictest confidentiality.
32 | - [ ] Regularly report group status and issues to the parent group, Steering Committee and members as required using the approved templates. 33 | - [ ] Ensure the contact information shown on MS Teams is correct and updated as necessary. The OMP Organization Chart and MS Teams should also be updated to reflect the appointment. 34 | - [ ] Organize group meetings and phone conferences in a timely manner. 35 | - [ ] Monitor your GitHub issues for requests to respond to questions from the public and provide responses. 36 | - [ ] The chair will manage the creation, regular maintenance and obsolescence of their group's charter. 37 | 38 | ### Meeting Responsibilities 39 | 40 | - [ ] Moderate the meeting to ensure the work within the group progresses in a timely manner. 41 | * When issues are raised within the meeting that the chair considers editorial and that may result in lengthy discussions, the chair should direct the editor to handle these issues outside the face to face session or conference call session subject to agreement on the resulting changes. 42 | * Editorial PRs should be pre-reviewed; in the meeting the chair should then seek agreement and direct the editor to incorporate them in the document without further discussion during the meeting (unless non-editorial issues are raised). 43 | - [ ] Organise and run the meetings in accordance with the processes and procedures of OMP as defined in the OMP "Rules of Engagement" and other guidelines, and ensure that all rules are followed. 44 | * Where there is doubt, seek advice from the parent group chair or ORG Officers as appropriate. 45 | - [ ] Issue a call for meeting agenda items in a timely manner as defined in the "Rules of Engagement". 46 | - [ ] Ensure that an agenda is prepared for every meeting (face to face or by conference call). 47 | - [ ] When scheduling non-face-to-face real-time meetings such as phone conferences, take the time zones of the participants into consideration. 48 | - [ ] Issue meeting agendas to fulfil the group's charter and work packages. 49 | - [ ] Monitor the discussion and activity of the group and facilitate to ensure that decisions are reached through consensus in a timely manner. 50 | - [ ] Monitor the discussion and activity of the group to ensure that no anti-trust violations occur. 51 | - [ ] Remind members regarding IPR and anti-trust at the beginning of each meeting. 52 | - [ ] Ensure that meeting minutes are written, reviewed and published after each meeting. 53 | - [ ] Prepare group status for the parent or Steering Committee meetings as required and ensure there is a representative from the working group who will handle these duties if the chair cannot be present, e.g. co-chair. 54 | - [ ] Ensure that group GitHub repos are kept up to date in a timely fashion. 55 | - [ ] Manage the Specification Development Process. 56 | 57 | ### Administrative Aspects 58 | 59 | - [ ] Ensure the technical activities of the group are progressed in a timely manner in accordance with the processes and procedures defined in the "Rules of Engagement". 60 | - [ ] Organise and run the meetings in accordance with the processes and procedures of OMP as defined in the OMP "Rules of Engagement" and other documents, and ensure that all rules are followed. Where there is doubt, seek advice from the parent group chair or ORG Officers as appropriate. 61 | 62 | ### Technical Aspects 63 | 64 | - [ ] Complete all reviews (Requirements, Architecture and Consistency) and address all issues raised in a timely manner.
65 | - [ ] Conduct a full specification dependency analysis, and document the results in the specification and ensure the reference policy and guidelines are adhered to. 66 | - [ ] Address all maintenance issues (PRs) in a timely manner. 67 | -------------------------------------------------------------------------------- /.github/advanced-issue-labeler.yml: -------------------------------------------------------------------------------- 1 | # syntax - https://github.com/redhat-plumbers-in-action/advanced-issue-labeler#policy 2 | --- 3 | 4 | policy: 5 | - template: [bug.yml, feature.yml] 6 | section: 7 | - id: [component] 8 | block-list: [other] 9 | label: 10 | - name: analyze 11 | keys: ['systemd-analyze'] 12 | 13 | - name: ask-password 14 | keys: ['systemd-ask-password'] 15 | 16 | - name: binfmt 17 | keys: ['systemd-binfmt'] 18 | 19 | - name: cgtop 20 | keys: ['systemd-cgtop'] 21 | 22 | - name: cryptsetup 23 | keys: ['systemd-cryptsetup'] 24 | 25 | - name: delta 26 | keys: ['systemd-delta'] 27 | 28 | - name: env 29 | keys: ['systemd-env-generator'] 30 | 31 | - name: fsck 32 | keys: ['systemd-fsck'] 33 | 34 | - name: gpt-auto 35 | keys: ['systemd-gpt-auto-generator'] 36 | 37 | - name: growfs 38 | keys: ['systemd-growfs'] 39 | 40 | - name: homed 41 | keys: ['systemd-homed', 'homectl', 'pam_systemd_home'] 42 | 43 | - name: hostname 44 | keys: ['systemd-hostnamed', 'hostnamectl'] 45 | 46 | - name: hwdb 47 | keys: ['systemd-hwdb', 'hardware database files'] 48 | 49 | - name: import 50 | keys: ['systemd-import'] 51 | 52 | - name: journal 53 | keys: ['systemd-journald', 'journalctl'] 54 | 55 | - name: journal-remote 56 | keys: ['systemd-journal-remote', 'systemd-journal-upload', 'systemd-journal-gatewayd'] 57 | 58 | - name: kernel-install 59 | keys: ['kernel-install'] 60 | 61 | - name: logind 62 | keys: ['systemd-logind', 'loginctl', 'pam_systemd'] 63 | 64 | - name: machined 65 | keys: ['systemd-machined', 'machinectl'] 66 | 67 | - name: modules-load 68 | keys: ['systemd-modules-load'] 69 | 70 | - name: network 71 | keys: ['systemd-networkd', 'networkctl', 'systemd-networkd-wait-online', 'systemd-network-generator'] 72 | 73 | - name: nspawn 74 | keys: ['systemd-nspawn'] 75 | 76 | - name: oomd 77 | keys: ['systemd-oomd', 'oomctl'] 78 | 79 | - name: pid1 80 | keys: ['systemd'] 81 | 82 | - name: portabled 83 | keys: ['systemd-portabled', 'portablectl'] 84 | 85 | - name: pstore 86 | keys: ['systemd-pstore'] 87 | 88 | - name: repart 89 | keys: ['systemd-repart'] 90 | 91 | - name: resolve 92 | keys: ['systemd-resolved', 'resolvectl', 'nss-resolve'] 93 | 94 | - name: rfkill 95 | keys: ['systemd-rfkill'] 96 | 97 | - name: rpm 98 | keys: ['rpm scriptlets'] 99 | 100 | - name: run 101 | keys: ['systemd-run'] 102 | 103 | - name: sd-boot/sd-stub/bootctl 104 | keys: ['bootctl', 'systemd-boot', 'systemd-stub'] 105 | 106 | - name: sysctl 107 | keys: ['systemd-sysctl'] 108 | 109 | - name: sysext 110 | keys: ['systemd-sysext'] 111 | 112 | - name: systemctl 113 | keys: ['systemctl'] 114 | 115 | - name: sysusers 116 | keys: ['systemd-sysusers'] 117 | 118 | - name: sysv 119 | keys: ['systemd-sysv-generator'] 120 | 121 | - name: tests 122 | keys: ['tests'] 123 | 124 | - name: timedate 125 | keys: ['systemd-timedate', 'timedatectl'] 126 | 127 | - name: timesync 128 | keys: ['systemd-timesync'] 129 | 130 | - name: tmpfiles 131 | keys: ['systemd-tmpfiles'] 132 | 133 | - name: udev 134 | keys: ['systemd-udevd', 'udevadm', 'udev rule files'] 135 | 136 | - name: userdb 137 | keys: ['systemd-userdb', 'userdbctl'] 138 | 139 
| - name: veritysetup 140 | keys: ['systemd-veritysetup'] 141 | 142 | - name: xdg-autostart 143 | keys: ['systemd-xdg-autostart-generator'] 144 | 145 | - template: ['animals.yml'] 146 | section: 147 | - id: [animals] 148 | block-list: ['I do not like animals', 'Other'] 149 | label: 150 | - name: 'kind: amphibians' 151 | keys: ['🐸 Frog'] 152 | 153 | - name: 'kind: birds' 154 | keys: ['🐓 Rooster', '🐦 Bird', '🐧 Penguin'] 155 | 156 | - name: 'kind: fish' 157 | keys: ['🐡 Blowfish', '🐟 Fish', '🦈 Shark'] 158 | 159 | - name: 'kind: mammals' 160 | keys: ['🦍 Gorilla', '🐶 Dog', '🐬 Dolphin', '🐺 Wolf', '🦊 Fox', '🐴 Horse'] 161 | 162 | - name: 'kind: reptiles' 163 | keys: ['🐊 Crocodile'] 164 | 165 | - name: 'invertebrates' 166 | keys: ['🐛 Bug', '🕷️ Spider'] 167 | 168 | - id: ['food'] 169 | block-list: ["I don't like food", 'Other'] 170 | label: 171 | - name: 'food: fruits' 172 | keys: ['🍎 apple', '🥒 cucumber', '🍊 orange', '🍅 tomato'] 173 | 174 | - name: 'food: vegetables' 175 | keys: ['🥔 potato'] 176 | 177 | - id: ['severity', 'priority'] 178 | block-list: [] 179 | label: 180 | - name: 'low' 181 | keys: ['low', 'I do not know'] 182 | 183 | - name: 'medium' 184 | keys: ['medium'] 185 | 186 | - name: 'high' 187 | keys: ['high'] 188 | 189 | - name: 'urgent' 190 | keys: ['urgent'] 191 | 192 | - section: 193 | - id: [type] 194 | block-list: ['Other'] 195 | label: 196 | - name: 'bug 🐛' 197 | keys: ['Bug Report'] 198 | 199 | - name: 'RFE 🎁' 200 | keys: ['Feature Request'] 201 | 202 | - template: ['phase_of_development.yml'] 203 | section: 204 | - id: [phases] 205 | block-list: ['other'] 206 | label: 207 | - name: 'phase 1' 208 | keys: ['phase-1'] 209 | 210 | - name: 'phase 2' 211 | keys: ['phase-2'] 212 | 213 | - name: 'phase 3' 214 | keys: ['phase-3'] 215 | 216 | - id: ['milestone'] 217 | block-list: ['other'] 218 | label: 219 | - name: 'milestone 1' 220 | keys: ['milestone-1'] 221 | 222 | - name: 'milestone 2' 223 | keys: ['milestone-2'] 224 | 225 | - name: 'milestone 3' 226 | keys: ['milestone-3'] 227 | 228 | - name: 'medium' 229 | keys: ['medium'] 230 | 231 | - name: 'high' 232 | keys: ['high'] 233 | 234 | - name: 'urgent' 235 | keys: ['urgent'] 236 | 237 | -------------------------------------------------------------------------------- /test/validate.py: -------------------------------------------------------------------------------- 1 | import xmltodict 2 | import pandas as pd 3 | from pathlib import Path 4 | from datetime import datetime 5 | 6 | OSM_XML_FILE = "test.osm" 7 | PARQUET_DIR = "./parquet" 8 | 9 | # Parse, process OSM XML file into a dataframe 10 | xml_lines = Path(OSM_XML_FILE).read_text() 11 | xml_dict = xmltodict.parse(xml_lines)["osm"] 12 | xml_rows = [] 13 | for feature_type in ["node", "way", "relation"]: 14 | for row in xml_dict[feature_type]: 15 | new_entries = {} 16 | new_entries["type"] = feature_type 17 | for key, value in row.items(): 18 | new_value = value 19 | # Everything is read in as strings, try to parse 20 | try: 21 | new_value = int(value) 22 | except Exception: 23 | try: 24 | new_value = float(value) 25 | except Exception: 26 | pass 27 | if "timestamp" in key: 28 | new_value = round(datetime.fromisoformat(value).timestamp() * 1e3) 29 | 30 | if "@" in key: 31 | new_entries[key.replace("@", "")] = new_value 32 | else: 33 | new_entries[key] = new_value 34 | 35 | xml_rows.append(new_entries) 36 | xml_df = pd.DataFrame.from_dict(xml_rows) 37 | xml_df = xml_df.replace({float("nan"): None}) 38 | 39 | # Read in parquet data 40 | pq_df = pd.read_parquet(PARQUET_DIR) 41 | 42 | 
print("Samples, metadata") 43 | print(xml_df.head()) 44 | print(pq_df.head()) 45 | print(xml_df.dtypes) 46 | print(pq_df.dtypes) 47 | 48 | # Check for duplicates 49 | duplicate_count = pq_df.duplicated(subset=["id", "type"]).sum() 50 | if duplicate_count > 0: 51 | print(xml_df.count()) 52 | print(pq_df.count()) 53 | print(xml_df.groupby("type").count()) 54 | print(pq_df.groupby("type").count()) 55 | raise AssertionError(f"Duplicate count: {duplicate_count}") 56 | 57 | # Find all missing from either side 58 | joined = xml_df.set_index(["id", "type"]).join( 59 | pq_df.set_index(["id", "type"]), 60 | on=["id", "type"], 61 | how="outer", 62 | rsuffix="_pq", 63 | validate="one_to_one", 64 | ) 65 | 66 | print("Checking parquet has all XML data") 67 | pq_missing = joined[joined["version_pq"].isnull()] 68 | if pq_missing.count().sum() > 0: 69 | print(pq_missing.count()) 70 | print(pq_missing.head()) 71 | raise AssertionError( 72 | f"Unexpected missing rows from parquet, count: {pq_missing.count().sum()}" 73 | ) 74 | 75 | print("Checking unexpected parquet data (not in XML)") 76 | xml_missing = joined[joined["version"].isnull()] 77 | if xml_missing.count().sum() > 0: 78 | print(xml_missing.count()) 79 | print(xml_missing.head()) 80 | raise AssertionError( 81 | f"Unexpected extra rows in parquet, count: {xml_missing.count().sum()}" 82 | ) 83 | 84 | print("Checking mismatched scalar values") 85 | mismatched = joined[ 86 | ~( 87 | (joined["version"].eq(joined["version_pq"])) 88 | | (joined["timestamp"].eq(joined["timestamp_pq"])) 89 | | (joined["lat"].eq(joined["lat_pq"])) 90 | | (joined["lon"].eq(joined["lon_pq"])) 91 | ) 92 | ] 93 | if mismatched.count().sum() > 0: 94 | mismatched["version_match"] = mismatched["version"].eq(mismatched["version_pq"]) 95 | mismatched["timestamp_match"] = mismatched["timestamp"].eq( 96 | mismatched["timestamp_pq"] 97 | ) 98 | mismatched["lat_match"] = mismatched["lat"].eq(mismatched["lat_pq"]) 99 | mismatched["lon_match"] = mismatched["lon"].eq(mismatched["lon_pq"]) 100 | print(mismatched.count()) 101 | print( 102 | mismatched[ 103 | [ 104 | "version", 105 | "version_pq", 106 | "version_match", 107 | "timestamp", 108 | "timestamp_pq", 109 | "timestamp_match", 110 | "lat", 111 | "lat_pq", 112 | "lat_match", 113 | "lon", 114 | "lon_pq", 115 | "lon_match", 116 | ] 117 | ].head() 118 | ) 119 | raise AssertionError(f"Mismatched data, count: {mismatched.count().sum()}") 120 | 121 | 122 | def remap_tags(tags): 123 | if not tags: 124 | return [] 125 | if type(tags) == dict: 126 | return [(tags["@k"], tags["@v"])] 127 | return [(pair["@k"], pair["@v"]) for pair in tags] 128 | 129 | 130 | print("Checking tags") 131 | mismatched_tags = joined[ 132 | ~(joined["tag"].apply(lambda tags: remap_tags(tags)).eq(joined["tags"])) 133 | ] 134 | if mismatched_tags.count().sum() > 0: 135 | print(mismatched_tags.count()) 136 | print(mismatched_tags[["tag", "tags"]].head()) 137 | raise AssertionError(f"Mismatched tags, count: {mismatched_tags.count().sum()}") 138 | 139 | 140 | def remap_nodes(nodes): 141 | if nodes is None or len(nodes) == 0: 142 | return [] 143 | if type(nodes) == dict: 144 | return [int(nodes.get("@ref", nodes.get("ref")))] 145 | return [int(node.get("@ref", node.get("ref"))) for node in nodes] 146 | 147 | 148 | def compare_nodes(xml_nodes, pq_nodes): 149 | for xml, pq in zip(xml_nodes, pq_nodes): 150 | if str(xml) != str(pq): 151 | return False 152 | return True 153 | 154 | 155 | print("Checking nodes") 156 | joined["nd_remap"] = joined["nd"].apply(lambda nodes: 
remap_nodes(nodes)) 157 | joined["nds_remap"] = joined["nds"].apply(lambda nodes: remap_nodes(nodes)) 158 | joined["nd_compare"] = joined.apply( 159 | lambda row: compare_nodes(row["nd_remap"], row["nds_remap"]), axis=1 160 | ) 161 | mismatched_nodes = joined[~(joined["nd_compare"])] 162 | if mismatched_nodes.count().sum() > 0: 163 | print(mismatched_nodes.count()) 164 | print(mismatched_nodes[["nd_remap", "nds_remap"]].head()) 165 | raise AssertionError(f"Mismatched nodes, count: {mismatched_nodes.count().sum()}") 166 | 167 | 168 | def remap_members(members): 169 | if members is None or len(members) == 0: 170 | return [] 171 | if type(members) == dict: 172 | return [ 173 | ( 174 | members.get("@type", members.get("type")), 175 | int(members.get("@ref", members.get("ref"))), 176 | members.get("@role", members.get("role")), 177 | ) 178 | ] 179 | return [ 180 | ( 181 | member.get("@type", member.get("type")), 182 | int(member.get("@ref", member.get("ref"))), 183 | member.get("@role", member.get("role")), 184 | ) 185 | for member in members 186 | ] 187 | 188 | 189 | def compare_members(xml_members, pq_members): 190 | for xml, pq in zip(xml_members, pq_members): 191 | if xml != pq: 192 | return False 193 | return True 194 | 195 | 196 | print("Checking members") 197 | joined["member_remap"] = joined["member"].apply(lambda nodes: remap_members(nodes)) 198 | joined["members_remap"] = joined["members"].apply(lambda nodes: remap_members(nodes)) 199 | joined["members_compare"] = joined.apply( 200 | lambda row: compare_members(row["member_remap"], row["members_remap"]), axis=1 201 | ) 202 | mismatched_members = joined[~(joined["members_compare"])] 203 | if mismatched_members.count().sum() > 0: 204 | print(mismatched_members.count()) 205 | print(mismatched_members[["member_remap", "members_remap"]].head()) 206 | raise AssertionError( 207 | f"Mismatched members, count: {mismatched_members.count().sum()}" 208 | ) 209 | -------------------------------------------------------------------------------- /src/pbf.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | use std::sync::{Arc, Mutex}; 3 | 4 | use futures::StreamExt; 5 | use futures_util::pin_mut; 6 | use log::info; 7 | use object_store::aws::AmazonS3Builder; 8 | use object_store::buffered::BufReader; 9 | use object_store::local::LocalFileSystem; 10 | use object_store::path::Path; 11 | use object_store::ObjectStore; 12 | use osmpbf::{AsyncBlobReader, BlobDecode, Element, PrimitiveBlock}; 13 | use tokio::runtime::Handle; 14 | use tokio::sync::Semaphore; 15 | use tokio::task::JoinSet; 16 | use url::Url; 17 | 18 | use crate::osm_arrow::OSMType; 19 | use crate::sink::ElementSink; 20 | use crate::util::{SinkpoolStore, ARGS, ELEMENT_COUNTER}; 21 | 22 | pub async fn create_s3_buf_reader(url: Url) -> Result { 23 | let s3_store = AmazonS3Builder::from_env().with_url(url.clone()).build()?; 24 | let path = Path::parse(url.path())?; 25 | let meta = s3_store.head(&path).await?; 26 | Ok(BufReader::with_capacity( 27 | Arc::new(s3_store), 28 | &meta, 29 | ARGS.get().unwrap().get_input_buffer_size_bytes(), 30 | )) 31 | } 32 | 33 | pub async fn create_local_buf_reader(path: &str) -> Result { 34 | let local_store: LocalFileSystem = LocalFileSystem::new(); 35 | let path = std::path::Path::new(path); 36 | let filesystem_path = object_store::path::Path::from_filesystem_path(path)?; 37 | let meta = local_store.head(&filesystem_path).await?; 38 | Ok(BufReader::with_capacity( 39 | Arc::new(local_store), 40 
| &meta, 41 | ARGS.get().unwrap().get_input_buffer_size_bytes(), 42 | )) 43 | } 44 | 45 | pub async fn process_blobs( 46 | buf_reader: BufReader, 47 | sinkpools: Arc, 48 | ) -> Result<(), anyhow::Error> { 49 | let mut blob_reader = AsyncBlobReader::new(buf_reader); 50 | 51 | let stream = blob_reader.stream(); 52 | pin_mut!(stream); 53 | 54 | let filenums: Arc>>> = Arc::new(HashMap::from([ 55 | (OSMType::Node, Arc::new(Mutex::new(0))), 56 | (OSMType::Way, Arc::new(Mutex::new(0))), 57 | (OSMType::Relation, Arc::new(Mutex::new(0))), 58 | ])); 59 | 60 | // Avoid too many tasks in memory 61 | let active_tasks = (1.5 * ARGS.get().unwrap().get_worker_threads() as f32) as usize; 62 | let semaphore = Arc::new(Semaphore::new(active_tasks)); 63 | 64 | let mut join_set = JoinSet::new(); 65 | while let Some(Ok(blob)) = stream.next().await { 66 | let sinkpools = sinkpools.clone(); 67 | let filenums = filenums.clone(); 68 | let permit = semaphore.clone().acquire_owned().await.unwrap(); 69 | join_set.spawn(async move { 70 | match blob.decode() { 71 | Ok(BlobDecode::OsmHeader(_)) => (), 72 | Ok(BlobDecode::OsmData(block)) => { 73 | process_block(block, sinkpools.clone(), filenums.clone()) 74 | .await 75 | .unwrap(); 76 | } 77 | Ok(BlobDecode::Unknown(unknown)) => { 78 | panic!("Unknown blob: {}", unknown); 79 | } 80 | Err(error) => { 81 | panic!("Error decoding blob: {}", error); 82 | } 83 | } 84 | drop(permit); 85 | }); 86 | } 87 | while let Some(result) = join_set.join_next().await { 88 | result?; 89 | } 90 | Ok(()) 91 | } 92 | 93 | pub async fn monitor(sinkpools: Arc) { 94 | let mut interval = tokio::time::interval(std::time::Duration::from_secs(60)); 95 | interval.tick().await; // First tick is immediate 96 | 97 | loop { 98 | interval.tick().await; 99 | 100 | // Run cleanup 101 | finish_sinks(sinkpools.clone(), false).await.unwrap(); 102 | 103 | // Log progress 104 | let processed = ELEMENT_COUNTER.load(std::sync::atomic::Ordering::Relaxed); 105 | let mut processed_str = format!("{}", processed); 106 | if processed >= 1_000_000_000 { 107 | processed_str = format!("{:.2}B", (processed as f64) / 1_000_000_000.0); 108 | } else if processed >= 1_000_000 { 109 | processed_str = format!("{:.2}M", (processed as f64) / 1_000_000.0); 110 | } 111 | info!("Processed {} elements", processed_str); 112 | } 113 | } 114 | 115 | pub async fn finish_sinks( 116 | sinkpools: Arc, 117 | force_finish: bool, 118 | ) -> Result<(), anyhow::Error> { 119 | let handle = Handle::current(); 120 | let mut join_set = JoinSet::new(); 121 | for sinkpool in sinkpools.values() { 122 | let mut pool = sinkpool.lock().unwrap(); 123 | let sinks = pool.drain(..).collect::>(); 124 | for mut sink in sinks { 125 | if force_finish || sink.last_write_cycle.elapsed().as_secs() > 30 { 126 | // Finish, old or final cleanup run 127 | join_set.spawn_on( 128 | async move { 129 | sink.finish().await.unwrap(); 130 | }, 131 | &handle, 132 | ); 133 | } else { 134 | // Retain, still being written to 135 | pool.push(sink); 136 | } 137 | } 138 | } 139 | while let Some(result) = join_set.join_next().await { 140 | result?; 141 | } 142 | Ok(()) 143 | } 144 | 145 | fn get_sink_from_pool( 146 | osm_type: OSMType, 147 | sinkpools: Arc, 148 | filenums: Arc>>>, 149 | ) -> Result { 150 | { 151 | let mut pool = sinkpools[&osm_type].lock().unwrap(); 152 | if let Some(sink) = pool.pop() { 153 | return Ok(sink); 154 | } 155 | } 156 | ElementSink::new(filenums[&osm_type].clone(), osm_type) 157 | } 158 | 159 | fn add_sink_to_pool(sink: ElementSink, sinkpools: Arc) { 160 | 
let osm_type = sink.osm_type.clone(); 161 | let mut pool = sinkpools[&osm_type].lock().unwrap(); 162 | pool.push(sink); 163 | } 164 | 165 | async fn process_block( 166 | block: PrimitiveBlock, 167 | sinkpools: Arc, 168 | filenums: Arc>>>, 169 | ) -> Result { 170 | let mut node_sink = get_sink_from_pool(OSMType::Node, sinkpools.clone(), filenums.clone())?; 171 | let mut way_sink = get_sink_from_pool(OSMType::Way, sinkpools.clone(), filenums.clone())?; 172 | let mut rel_sink = get_sink_from_pool(OSMType::Relation, sinkpools.clone(), filenums.clone())?; 173 | 174 | let mut block_counter = 0u64; 175 | for element in block.elements() { 176 | block_counter += 1; 177 | match element { 178 | Element::Node(ref node) => { 179 | node_sink.add_node(node); 180 | } 181 | Element::DenseNode(ref node) => { 182 | node_sink.add_dense_node(node); 183 | } 184 | Element::Way(ref way) => { 185 | way_sink.add_way(way); 186 | } 187 | Element::Relation(ref rel) => { 188 | rel_sink.add_relation(rel); 189 | } 190 | } 191 | } 192 | ELEMENT_COUNTER.fetch_add(block_counter, std::sync::atomic::Ordering::Relaxed); 193 | 194 | node_sink.increment_and_cycle().await?; 195 | way_sink.increment_and_cycle().await?; 196 | rel_sink.increment_and_cycle().await?; 197 | add_sink_to_pool(node_sink, sinkpools.clone()); 198 | add_sink_to_pool(way_sink, sinkpools.clone()); 199 | add_sink_to_pool(rel_sink, sinkpools.clone()); 200 | 201 | Ok(block_counter) 202 | } 203 | -------------------------------------------------------------------------------- /How to Contribute/images/jdf-horizontal-color.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /src/osm_arrow.rs: -------------------------------------------------------------------------------- 1 | use std::fmt; 2 | use std::sync::Arc; 3 | 4 | use arrow::array::builder::{ 5 | BooleanBuilder, Int64Builder, ListBuilder, MapBuilder, StringBuilder, StructBuilder, 6 | }; 7 | use arrow::array::{ArrayRef, Float64Builder, Int32Builder, TimestampMillisecondBuilder}; 8 | use arrow::datatypes::{DataType, Field, Fields, Schema, TimeUnit}; 9 | use arrow::error::ArrowError; 10 | use arrow::record_batch::RecordBatch; 11 | 12 | #[derive(Clone, Debug, Eq, PartialEq, Hash)] 13 | pub enum OSMType { 14 | Node, 15 | Way, 16 | Relation, 17 | } 18 | 19 | impl fmt::Display for OSMType { 20 | fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result { 21 | write!(formatter, "{}", format!("{:?}", self).to_lowercase()) 22 | } 23 | } 24 | 25 | pub fn osm_arrow_schema() -> Schema { 26 | // Derived from this schema: 27 | // `id` BIGINT, 28 | // `tags` MAP, 29 | // `lat` DOUBLE, 30 | // `lon` DOUBLE, 31 | // `nds` ARRAY>, 32 | // `members` ARRAY>, 33 | // `changeset` BIGINT, 34 | // `timestamp` TIMESTAMP, 35 | // `uid` BIGINT, 36 | // `user` STRING, 37 | // `version` BIGINT, 38 | // `visible` BOOLEAN 39 | 40 | // TODO - add type field when not writing with partitions 41 | // `type` STRING 42 | // Field::new("type", DataType::Utf8, false) 43 | 44 | Schema::new(vec![ 45 | Field::new("id", DataType::Int64, false), 46 | Field::new( 47 | "tags", 48 | DataType::Map( 49 | Arc::new(Field::new( 50 | "entries", 51 | DataType::Struct(Fields::from(vec![ 52 | Field::new("keys", DataType::Utf8, false), 53 | Field::new("values", DataType::Utf8, true), 54 | ])), 55 | false, 56 | )), 57 | false, 58 | ), 59 | true, 60 | ), 61 | Field::new("lat", DataType::Float64, true), 62 | Field::new("lon", DataType::Float64, 
true), 63 | Field::new( 64 | "nds", 65 | DataType::List(Arc::new(Field::new( 66 | "item", 67 | DataType::Struct(Fields::from(vec![Field::new("ref", DataType::Int64, true)])), 68 | true, 69 | ))), 70 | true, 71 | ), 72 | Field::new( 73 | "members", 74 | DataType::List(Arc::new(Field::new( 75 | "item", 76 | DataType::Struct(Fields::from(vec![ 77 | Field::new("type", DataType::Utf8, true), 78 | Field::new("ref", DataType::Int64, true), 79 | Field::new("role", DataType::Utf8, true), 80 | ])), 81 | true, 82 | ))), 83 | true, 84 | ), 85 | Field::new("changeset", DataType::Int64, true), 86 | Field::new( 87 | "timestamp", 88 | DataType::Timestamp(TimeUnit::Millisecond, None), 89 | true, 90 | ), 91 | Field::new("uid", DataType::Int32, true), 92 | Field::new("user", DataType::Utf8, true), 93 | Field::new("version", DataType::Int32, true), 94 | Field::new("visible", DataType::Boolean, true), 95 | ]) 96 | } 97 | 98 | pub struct OSMArrowBuilder { 99 | id_builder: Box, 100 | tags_builder: Box>, 101 | lat_builder: Box, 102 | lon_builder: Box, 103 | nodes_builder: Box>, 104 | members_builder: Box>, 105 | changeset_builder: Box, 106 | timestamp_builder: Box, 107 | uid_builder: Box, 108 | user_builder: Box, 109 | version_builder: Box, 110 | visible_builder: Box, 111 | } 112 | 113 | impl Default for OSMArrowBuilder { 114 | fn default() -> Self { 115 | Self::new() 116 | } 117 | } 118 | 119 | impl OSMArrowBuilder { 120 | pub fn new() -> Self { 121 | let id_builder = Box::new(Int64Builder::new()); 122 | let tags_builder = Box::new(MapBuilder::new( 123 | None, 124 | StringBuilder::new(), 125 | StringBuilder::new(), 126 | )); 127 | let lat_builder = Box::new(Float64Builder::new()); 128 | let lon_builder = Box::new(Float64Builder::new()); 129 | let nodes_builder = Box::new(ListBuilder::new(StructBuilder::from_fields( 130 | vec![Field::new("ref", DataType::Int64, true)], 131 | 0, 132 | ))); 133 | let members_builder = Box::new(ListBuilder::new(StructBuilder::from_fields( 134 | vec![ 135 | Field::new("type", DataType::Utf8, true), 136 | Field::new("ref", DataType::Int64, true), 137 | Field::new("role", DataType::Utf8, true), 138 | ], 139 | 0, 140 | ))); 141 | let changeset_builder = Box::new(Int64Builder::new()); 142 | let timestamp_builder = Box::new(TimestampMillisecondBuilder::new()); 143 | let uid_builder = Box::new(Int32Builder::new()); 144 | let user_builder = Box::new(StringBuilder::new()); 145 | let version_builder = Box::new(Int32Builder::new()); 146 | let visible_builder = Box::new(BooleanBuilder::new()); 147 | 148 | OSMArrowBuilder { 149 | id_builder, 150 | tags_builder, 151 | lat_builder, 152 | lon_builder, 153 | nodes_builder, 154 | members_builder, 155 | changeset_builder, 156 | timestamp_builder, 157 | uid_builder, 158 | user_builder, 159 | version_builder, 160 | visible_builder, 161 | } 162 | } 163 | 164 | #[allow(clippy::too_many_arguments)] 165 | pub fn append_row( 166 | &mut self, 167 | id: i64, 168 | _type_: OSMType, 169 | tags_iter: T, 170 | lat: Option, 171 | lon: Option, 172 | nodes_iter: N, 173 | members_iter: M, 174 | changeset: Option, 175 | timestamp_ms: Option, 176 | uid: Option, 177 | user: Option, 178 | version: Option, 179 | visible: Option, 180 | ) -> usize 181 | where 182 | T: IntoIterator, 183 | N: IntoIterator, 184 | M: IntoIterator)>, 185 | { 186 | // Track approximate size of inserted data, starting with known constant sizes 187 | let mut est_size_bytes = 64usize; 188 | 189 | self.id_builder.append_value(id); 190 | 191 | for (key, value) in tags_iter { 192 | est_size_bytes += 
key.len() + value.len(); 193 | self.tags_builder.keys().append_value(key); 194 | self.tags_builder.values().append_value(value); 195 | } 196 | let _ = self.tags_builder.append(true); 197 | 198 | self.lat_builder.append_option(lat); 199 | self.lon_builder.append_option(lon); 200 | 201 | // Derived from https://docs.rs/arrow/latest/arrow/array/struct.StructBuilder.html 202 | let struct_builder = self.nodes_builder.values(); 203 | for node_id in nodes_iter { 204 | est_size_bytes += 8usize; 205 | struct_builder 206 | .field_builder::(0) 207 | .unwrap() 208 | .append_value(node_id); 209 | struct_builder.append(true); 210 | } 211 | self.nodes_builder.append(true); 212 | 213 | let members_struct_builder = self.members_builder.values(); 214 | for (osm_type, ref_, role) in members_iter { 215 | // Rough size to avoid unwrapping, role should be fairly short. 216 | est_size_bytes += 10usize; 217 | 218 | members_struct_builder 219 | .field_builder::(0) 220 | .unwrap() 221 | .append_value(osm_type.to_string()); 222 | 223 | members_struct_builder 224 | .field_builder::(1) 225 | .unwrap() 226 | .append_value(ref_); 227 | 228 | members_struct_builder 229 | .field_builder::(2) 230 | .unwrap() 231 | .append_option(role); 232 | 233 | members_struct_builder.append(true); 234 | } 235 | self.members_builder.append(true); 236 | 237 | self.changeset_builder.append_option(changeset); 238 | self.timestamp_builder.append_option(timestamp_ms); 239 | self.uid_builder.append_option(uid); 240 | self.user_builder.append_option(user); 241 | self.version_builder.append_option(version); 242 | self.visible_builder.append_option(visible); 243 | 244 | est_size_bytes 245 | } 246 | 247 | pub fn finish(&mut self) -> Result { 248 | let array_refs: Vec = vec![ 249 | Arc::new(self.id_builder.finish()), 250 | Arc::new(self.tags_builder.finish()), 251 | Arc::new(self.lat_builder.finish()), 252 | Arc::new(self.lon_builder.finish()), 253 | Arc::new(self.nodes_builder.finish()), 254 | Arc::new(self.members_builder.finish()), 255 | Arc::new(self.changeset_builder.finish()), 256 | Arc::new(self.timestamp_builder.finish()), 257 | Arc::new(self.uid_builder.finish()), 258 | Arc::new(self.user_builder.finish()), 259 | Arc::new(self.version_builder.finish()), 260 | Arc::new(self.visible_builder.finish()), 261 | ]; 262 | 263 | RecordBatch::try_new(Arc::new(osm_arrow_schema()), array_refs) 264 | } 265 | } 266 | -------------------------------------------------------------------------------- /src/sink.rs: -------------------------------------------------------------------------------- 1 | use object_store::buffered::BufWriter; 2 | use std::path::absolute; 3 | use std::sync::{Arc, Mutex}; 4 | use std::time::Instant; 5 | 6 | use object_store::aws::AmazonS3Builder; 7 | use object_store::local::LocalFileSystem; 8 | use object_store::path::Path; 9 | use osmpbf::{DenseNode, Node, RelMemberType, Relation, Way}; 10 | use parquet::arrow::async_writer::AsyncArrowWriter; 11 | use parquet::basic::{Compression, ZstdLevel}; 12 | use parquet::file::properties::WriterProperties; 13 | use url::Url; 14 | 15 | use crate::osm_arrow::osm_arrow_schema; 16 | use crate::osm_arrow::OSMArrowBuilder; 17 | use crate::osm_arrow::OSMType; 18 | use crate::util::ARGS; 19 | 20 | pub struct ElementSink { 21 | // Config for writing file 22 | pub osm_type: OSMType, 23 | filenum: Arc>, 24 | 25 | // Arrow wrappers 26 | osm_builder: Box, 27 | writer: Option>, // Wrapped so we can replace this on the fly 28 | 29 | // State tracking for batching 30 | estimated_record_batch_bytes: usize, 
31 | estimated_file_bytes: usize, 32 | target_record_batch_bytes: usize, 33 | target_file_bytes: usize, 34 | pub last_write_cycle: Instant, 35 | } 36 | 37 | impl ElementSink { 38 | pub fn new(filenum: Arc>, osm_type: OSMType) -> Result { 39 | let args = ARGS.get().unwrap(); 40 | 41 | let full_path = Self::create_full_path(&args.output, &osm_type, &filenum, args.compression); 42 | let buf_writer = Self::create_buf_writer(&full_path)?; 43 | let writer = Self::create_writer(buf_writer, args.compression, args.max_row_group_count)?; 44 | 45 | Ok(ElementSink { 46 | osm_type, 47 | filenum, 48 | 49 | osm_builder: Box::new(OSMArrowBuilder::new()), 50 | writer: Some(writer), 51 | 52 | estimated_record_batch_bytes: 0usize, 53 | estimated_file_bytes: 0usize, 54 | target_record_batch_bytes: args.get_record_batch_target_bytes(), 55 | target_file_bytes: args.get_file_target_bytes(), 56 | last_write_cycle: Instant::now(), 57 | }) 58 | } 59 | 60 | pub async fn finish(&mut self) -> Result<(), anyhow::Error> { 61 | self.finish_batch().await?; 62 | self.writer.take().unwrap().close().await?; 63 | Ok(()) 64 | } 65 | 66 | async fn finish_batch(&mut self) -> Result<(), anyhow::Error> { 67 | if self.estimated_record_batch_bytes == 0 { 68 | // Nothing to write 69 | return Ok(()); 70 | } 71 | let batch = self.osm_builder.finish()?; 72 | self.writer.as_mut().unwrap().write(&batch).await?; 73 | 74 | // Reset writer to new path if needed 75 | self.estimated_file_bytes += self.estimated_record_batch_bytes; 76 | if self.estimated_file_bytes >= self.target_file_bytes { 77 | self.writer.take().unwrap().close().await?; 78 | 79 | // Create new writer and output 80 | let args = ARGS.get().unwrap(); 81 | let full_path = Self::create_full_path( 82 | &args.output, 83 | &self.osm_type, 84 | &self.filenum, 85 | args.compression, 86 | ); 87 | let buf_writer = Self::create_buf_writer(&full_path)?; 88 | self.writer = Some(Self::create_writer( 89 | buf_writer, 90 | args.compression, 91 | args.max_row_group_count, 92 | )?); 93 | self.estimated_file_bytes = 0; 94 | } 95 | 96 | self.estimated_record_batch_bytes = 0; 97 | Ok(()) 98 | } 99 | 100 | pub async fn increment_and_cycle(&mut self) -> Result<(), anyhow::Error> { 101 | self.last_write_cycle = Instant::now(); 102 | if self.estimated_record_batch_bytes >= self.target_record_batch_bytes { 103 | self.finish_batch().await?; 104 | } 105 | Ok(()) 106 | } 107 | 108 | fn create_buf_writer(full_path: &str) -> Result { 109 | // TODO - better validation of URL/paths here and error handling 110 | if let Ok(url) = Url::parse(full_path) { 111 | let s3_store = AmazonS3Builder::from_env().with_url(url.clone()).build()?; 112 | let path = Path::parse(url.path())?; 113 | 114 | Ok(BufWriter::new(Arc::new(s3_store), path)) 115 | } else { 116 | let object_store = LocalFileSystem::new(); 117 | let absolute_path = absolute(full_path)?; 118 | let store_path = Path::from_absolute_path(absolute_path)?; 119 | 120 | Ok(BufWriter::new(Arc::new(object_store), store_path)) 121 | } 122 | } 123 | 124 | fn create_writer( 125 | buffer: BufWriter, 126 | compression: u8, 127 | max_row_group_rows: Option, 128 | ) -> Result, anyhow::Error> { 129 | let mut props_builder = WriterProperties::builder(); 130 | if compression == 0 { 131 | props_builder = props_builder.set_compression(Compression::UNCOMPRESSED); 132 | } else if compression > 0 && compression <= 22 { 133 | props_builder = props_builder 134 | .set_compression(Compression::ZSTD(ZstdLevel::try_new(compression as i32)?)); 135 | } 136 | if let Some(max_rows) = 
max_row_group_rows { 137 | props_builder = props_builder.set_max_row_group_size(max_rows); 138 | } 139 | let props = props_builder.build(); 140 | 141 | let writer = AsyncArrowWriter::try_new(buffer, Arc::new(osm_arrow_schema()), Some(props))?; 142 | Ok(writer) 143 | } 144 | 145 | fn create_full_path( 146 | output_path: &str, 147 | osm_type: &OSMType, 148 | filenum: &Arc>, 149 | compression: u8, 150 | ) -> String { 151 | let trailing_path = Self::new_trailing_path(osm_type, filenum, compression != 0); 152 | // Remove trailing `/`s to avoid empty path segment 153 | format!("{0}{trailing_path}", &output_path.trim_end_matches('/')) 154 | } 155 | 156 | fn new_trailing_path( 157 | osm_type: &OSMType, 158 | filenum: &Arc>, 159 | is_zstd_compression: bool, 160 | ) -> String { 161 | let mut num = filenum.lock().unwrap(); 162 | let compression_stem = if is_zstd_compression { ".zstd" } else { "" }; 163 | let path = format!( 164 | "/type={}/{}_{:04}{}.parquet", 165 | osm_type, osm_type, num, compression_stem 166 | ); 167 | *num += 1; 168 | path 169 | } 170 | 171 | pub fn add_node(&mut self, node: &Node) { 172 | let info = node.info(); 173 | let user = info.user().unwrap_or(Ok("")).unwrap_or("").to_string(); 174 | 175 | let est_size_bytes = self.osm_builder.append_row( 176 | node.id(), 177 | OSMType::Node, 178 | node.tags() 179 | .map(|(key, value)| (key.to_string(), value.to_string())), 180 | Some(node.lat()), 181 | Some(node.lon()), 182 | std::iter::empty(), 183 | std::iter::empty(), 184 | info.changeset(), 185 | info.milli_timestamp(), 186 | info.uid(), 187 | Some(user), 188 | info.version(), 189 | Some(info.visible()), 190 | ); 191 | self.estimated_record_batch_bytes += est_size_bytes; 192 | } 193 | 194 | pub fn add_dense_node(&mut self, node: &DenseNode) { 195 | let info = node.info(); 196 | let mut user: Option = None; 197 | if let Some(info) = info { 198 | user = Some(info.user().unwrap_or("").to_string()); 199 | } 200 | 201 | let est_size_bytes = self.osm_builder.append_row( 202 | node.id(), 203 | OSMType::Node, 204 | node.tags() 205 | .map(|(key, value)| (key.to_string(), value.to_string())), 206 | Some(node.lat()), 207 | Some(node.lon()), 208 | std::iter::empty(), 209 | std::iter::empty(), 210 | info.map(|info| info.changeset()), 211 | info.map(|info| info.milli_timestamp()), 212 | info.map(|info| info.uid()), 213 | user, 214 | info.map(|info| info.version()), 215 | info.map(|info| info.visible()), 216 | ); 217 | self.estimated_record_batch_bytes += est_size_bytes; 218 | } 219 | 220 | pub fn add_way(&mut self, way: &Way) { 221 | let info = way.info(); 222 | let user = info.user().unwrap_or(Ok("")).unwrap_or("").to_string(); 223 | 224 | let est_size_bytes = self.osm_builder.append_row( 225 | way.id(), 226 | OSMType::Way, 227 | way.tags() 228 | .map(|(key, value)| (key.to_string(), value.to_string())), 229 | None, 230 | None, 231 | way.refs(), 232 | std::iter::empty(), 233 | info.changeset(), 234 | info.milli_timestamp(), 235 | info.uid(), 236 | Some(user), 237 | info.version(), 238 | Some(info.visible()), 239 | ); 240 | self.estimated_record_batch_bytes += est_size_bytes; 241 | } 242 | 243 | pub fn add_relation(&mut self, relation: &Relation) { 244 | let info = relation.info(); 245 | let user = info.user().unwrap_or(Ok("")).unwrap_or("").to_string(); 246 | 247 | let members_iter = relation.members().map(|member| { 248 | let type_ = match member.member_type { 249 | RelMemberType::Node => OSMType::Node, 250 | RelMemberType::Way => OSMType::Way, 251 | RelMemberType::Relation => OSMType::Relation, 
252 | }; 253 | 254 | let role = match member.role() { 255 | Ok(role) => Some(role.to_string()), 256 | Err(_) => None, 257 | }; 258 | (type_, member.member_id, role) 259 | }); 260 | 261 | let est_size_bytes = self.osm_builder.append_row( 262 | relation.id(), 263 | OSMType::Relation, 264 | relation 265 | .tags() 266 | .map(|(key, value)| (key.to_string(), value.to_string())), 267 | None, 268 | None, 269 | std::iter::empty(), 270 | members_iter, 271 | info.changeset(), 272 | info.milli_timestamp(), 273 | info.uid(), 274 | Some(user), 275 | info.version(), 276 | Some(info.visible()), 277 | ); 278 | self.estimated_record_batch_bytes += est_size_bytes; 279 | } 280 | } 281 | -------------------------------------------------------------------------------- /How to Contribute/images/breakdown.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 |
[Figure: breakdown.svg — work breakdown diagram: Work Packages (A, B) contain Epics (A-1, A-2, A-3, B-1); Epics break down into Stories and Tasks; releases are cut as Version 1.0, Version 1.1, …, Version x.y]
4 | -------------------------------------------------------------------------------- /How to Contribute/images/organigram.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |
[Figure: organigram.svg — organization chart: an Executive Director (ED) and the Technical Steering Committee (SC) sit above Committee Teams (CT 1, CT 2), each led by a Committee Chair, and Working Groups 1–3 (WG), each with a WG Chair and Vice-Chair and a charter per WG approved by the SC; Working Groups may have Sub Working Groups (SWG 1, SWG 2), each led by an SWG Convener, which do not need a charter]
-------------------------------------------------------------------------------- /How to Contribute/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Organization Operational Document 2 | 3 | ## Table of contents 4 | - [Meetings](#meetings) 5 | - [Technical Decision Making](#technical-decision-making) 6 | - [Using Supermajority vote to achieve agreement](#using-supermajority-vote-to-achieve-agreement) 7 | - [Approval Process](#approval-process) 8 | - [Documentation](#documentation) 9 | - [Publication](#publication) 10 | - [Intellectual Property Rights](#intellectual-property-rights) 11 | - [Certificate of Origin](#certificate-of-origin) 12 | 13 | ## Meetings 14 | 15 | ``` 16 | Note: Effective group meetings allow the Steering Committee, Committee Teams and Working Groups to discuss complex issues and talk through ideas and solutions. It is recommended to keep a Meeting Minutes record for each meeting. 17 | ``` 18 | * WGs are encouraged to schedule regular conference calls. 19 | * Meetings MUST be announced at least 7 days in advance for conference calls, and 1 month in advance for face-to-face meetings. 20 | * All the Organization members are contractually bound to the IPR policy under the terms of the Membership Application, and these IPR Guidelines must be followed. 21 | * Meetings SHALL have an antitrust statement and an IPR call where a reminder of the IPR policy and the duties and obligations of members is provided. 22 | * A meeting attendee list MUST be produced for each meeting. This is necessary to determine which members can vote in a Supermajority vote. 23 | 24 | ### Meeting Agenda 25 | 26 | ``` 27 | Note: An effective Meeting Agenda enables a team to organize its topics and give a fair chance for every topic to be discussed. 28 | ``` 29 | * Please refer to the Organization Agenda & Meeting Minutes [Template]() 30 | 31 | ### Meeting Minutes 32 | 33 | ``` 34 | Note: It is important to record the key issues discussed during the meeting, motions proposed or voted, and activities to be undertaken. 35 | Meeting attendance should also be recorded, especially if there is a voting requirement associated with members' attendance. 36 | 37 | It is recommended to define an Agenda & Meeting Minutes template. 38 | ``` 39 | * Please refer to the Organization Agenda & Meeting Minutes [Template]() 40 | 41 | ## Technical Decision Making 42 | ### Decision Making 43 | 44 | ``` 45 | Note: the following bullet points need to be reviewed by the group: 3.1, 3.1 and 3.2 46 | This is to ensure that the fact of having membership levels doesn't contradict these rules 47 | ``` 48 | 49 | Note: from `[Organization_Abbreviation]` Scope & Governance 50 | 51 | ``` 52 | 2. Decision Making. 53 | 2.1. Consensus-Based Decision Making. Working Groups make decisions through a consensus process (“Approval” or “Approved”). While the agreement of all Participants is preferred, it is not required for consensus. Rather, the Maintainer will determine consensus based on their good faith consideration of a number of factors, including the dominant view of the Working Group Participants and nature of support and objections. The Maintainer will document evidence of consensus in accordance with these requirements. Consensus will not be deemed to have been met in the event of a sustained objection from one or more Working Group participants. 54 | 55 | 2.2. Appeal Process.
Decisions may be appealed via a pull request or an issue, and that appeal will be considered by the Maintainer in good faith, who will respond in writing within a reasonable time. 56 | 57 | 3. Ways of Working. 58 | Inspired by American National Standards Institute’s (ANSI) Essential Requirements for Due Process, Community Specification Working Groups must adhere to consensus-based due process requirements. These requirements apply to activities related to the development of consensus for approval, revision, reaffirmation, and withdrawal of Community Specifications. Due process means that any person (organization, company, government agency, individual, etc.) with a direct and material interest has a right to participate by: a) expressing a position and its basis, b) having that position considered, and c) having the right to appeal. Due process allows for equity and fair play. The following constitute the minimum acceptable due process requirements for the development of consensus. 59 | 60 | 3.1. Openness. Participation shall be open to all persons who are directly and materially affected by the activity in question. There shall be no undue financial barriers to participation. Voting membership on the consensus body shall not be conditional upon membership in any organization, nor unreasonably restricted on the basis of technical qualifications or other such requirements. Membership in a Working Group’s parent organization, if any, may be required. 61 | 62 | 3.2. Lack of Dominance. The development process shall not be dominated by any single interest category, individual or organization. Dominance means a position or exercise of dominant authority, leadership, or influence by reason of superior leverage, strength, or representation to the exclusion of fair and equitable consideration of other viewpoints. 63 | 64 | 3.3. Balance. The development process should have a balance of interests. Participants from diverse interest categories shall be sought with the objective of achieving balance. 65 | 66 | 3.4. Coordination and Harmonization. Good faith efforts shall be made to resolve potential conflicts between and among deliverables developed under this Working Group and existing industry standards. 67 | 68 | 3.5. Consideration of Views and Objections. Prompt consideration shall be given to the written views and objections of all Participants. 69 | 70 | 3.6. Written procedures. This governance document and other materials documenting the Community Specification development process shall be available to any interested person. 71 | ``` 72 | 73 | As part of their responsibilities defined in [from WG Maintainers](#from-wg-maintainers), Maintainers need to ensure efficient and effective decision-making: 74 | * The decision-making process in WGs is intended to be as inclusive as possible. 75 | * WGs shall attempt to use consensus to make decisions. 76 | * If consensus cannot be reached, voting mechanisms MAY be used. 77 | * Formal notice SHALL be given for decision making, e.g.: 78 | * Inclusion of a document on an agenda, proposing a specific decision to be taken (e.g. a Pull Request). 79 | * Inclusion of an item directly in the agenda (e.g. a proposed next meeting date). 80 | * Items proposed for approval via the group mailing list (e.g. agreement of a document revision). 81 | * Inclusion of a document for decision in an electronic Review, Comment and Approval event. 82 | * Inclusion of a document for decision in an e-vote (Supermajority). 83 | 84 | > The above list is not exhaustive.
85 | 86 | * There SHALL be no distinction in the decision-making merit of real-time or non-real-time meetings. 87 | * In real-time meetings, consensus can be determined by receiving no sustained objections to a proposal. 88 | * In non-real-time meetings, consensus SHOULD be developed using Review, Comment and Agreement periods, e.g. using [Review and Approval](#[Organization_Abbreviation]-approval-process). 89 | * Proposals SHALL be available for a given period. 90 | 91 | 92 | ### Seeking Consensus 93 | * Groups shall endeavour to reach consensus on all decisions. 94 | * Informal methods of reaching consensus are encouraged (e.g. a show of hands). 95 | * Groups SHOULD attempt to ensure contributions relating to the same subject matter are considered together before being disposed of. 96 | * However, the Maintainer SHALL ensure that progress is not delayed by unavailable contributions or participants. 97 | * Agreement SHALL be sought in all forms of meeting. 98 | 99 | ### Handling objections when seeking consensus 100 | * Objections from a small minority SHOULD be minuted, and the objecting delegates SHOULD be asked whether having their objections minuted is sufficient and whether they agree not to sustain their objections. 101 | * If such agreements are secured, then there is consensus for approving the proposal. 102 | * If such agreements are not secured, then the proposal is not agreed and further action SHALL be taken (e.g. the proposal is withdrawn, updated, or voted on). 103 | * Members are discouraged from sustaining their objections when it is clear that they would be overruled by a vote were one to take place. 104 | * In real-time meetings, consensus can be determined by receiving no sustained objections to a proposal. 105 | * Efforts to immediately resolve or record objections can be taken to attempt to achieve consensus. 106 | * Where attendance is sparse when viewed against normal participation levels, potentially controversial proposals SHOULD be made available to the broader membership. 107 | * The Maintainer is responsible for ensuring such opportunity for participation in the decision-making process. 108 | * Sparsely attended meetings SHOULD NOT be used to drive through proposals that would not have broad support. 109 | * Following a decision-making meeting, a summary of decisions and document dispositions SHALL be published as soon as is practical. 110 | * This will be addressed if the meeting minutes are available in a timely fashion. 111 | * When there is insufficient time for review in a real-time meeting, non-real-time consensus approaches SHOULD be considered. 112 | * In non-real-time meetings, consensus SHOULD be developed by using [Review and Approval](#[Organization_Abbreviation]-approval-process) periods, e.g.: 113 | * Using the group mailing list 114 | * Using the GitHub "Review and Approval" label 115 | * Proposals SHALL be available for a given period. 116 | 117 | ## Using Supermajority vote to achieve agreement 118 | ### Phrasing of Voting Questions 119 | * The Maintainer ensures that questions to be voted upon SHALL be phrased in a concise and unambiguous manner. 120 | * Questions SHOULD NOT be phrased as “The group SHALL not do xyz”. Examples of appropriate questions are: 121 | * SHALL the group agree the Specification? 122 | * SHALL the liaison be approved? 123 | * SHALL the new Work Package be approved? 124 | * SHALL the existing Work Package be stopped? 125 | * If the issue is to choose between two options (i.e.
A or B), an example of the appropriate question may be: 126 | * SHALL the group agree Option A or Option B? 127 | * The option receiving no less than **3/4** of the Supermajority Votes SHALL be the decision of the group. 128 | * If the issue is to choose between three or more options, the group SHOULD use informal voting to reduce the number of options to two, and then use formal voting, if necessary. 129 | 130 | ### Voting on Technical Issues 131 | 132 | Note: Supermajority Vote 133 | 134 | ``` 135 | Note: Define Supermajority 136 | ``` 137 | * Before voting, a clear definition of the issues SHALL be provided by the Maintainer. 138 | * Members eligible to vote SHALL only be entitled to one vote each. 139 | * Each member MAY cast its vote as often as it wishes, and the last vote it casts counts. 140 | * Voting MAY be performed electronically. 141 | * Voting MAY be performed by a show of hands, with members announcing their vote verbally one by one, or by paper ballots. 142 | * The result of the vote SHALL be recorded in the meeting minutes. 143 | * Groups MAY use informal voting to reach consensus. If the Group is still unable to reach consensus, then a formal vote MAY be taken. 144 | * Each member’s electronic vote SHALL be electronically acknowledged to confirm participation in the vote. 145 | * The voting periods for proposals are: 146 | * In-person meetings require at least 30 days prior written notice 147 | * Teleconference meetings require at least 7 days prior written notice 148 | * Electronic voting MUST remain open for no less than 7 days. 149 | 150 | ## Approval Process 151 |
152 | [Figure: Review & Approval] 153 | 154 |
155 | 156 | In Standards Development Organizations (SDOs), the approval or rejection of a contribution follows a democratic process: **the majority**. This differs from an Open Source organization, which normally follows a meritocratic process where the Maintainer decides what to accept or reject. If a person disagrees with the decision to reject her contribution, she can “fork” the project. 157 | 158 | The goal for an SDO is to reach interoperability; therefore, “forking” is not the solution to a technical dispute. If there is a sustained objection to a contribution, the resolution is via a vote, see [Seeking Consensus](#seeking-consensus). 159 | 160 | The Review & Approval process implies that all contributions need to be accepted by the Working Group. 161 | 162 | 163 | ### Review & Approval Process 164 | * **Review period**: 165 | * Period of time during which the contribution will be under review before being merged. 166 | * The period can be: 0, 1, 2, 3, 5, 7, or 14 days. 167 | * 0 days implies that the contribution is merged without Working Group review. 168 | 169 | * **Comments or Objections**: 170 | * During the Review & Approval process members MAY raise **comments** or **objections**. 171 | * **Comments** MUST be taken into consideration by the Working Group, but they MAY be dismissed if the group thinks they are not relevant. 172 | 173 | * **Objections** MUST be taken into consideration and cannot be dismissed by the Working Group without being reviewed. 174 | * If a contribution receives an **objection**, the group MUST resolve the issue with the person who raised the objection before deciding the status of the contribution. If the **objection** is sustained, meaning the person doesn’t remove it, then the group will have to resort to a [vote](#voting-on-technical-issues) to resolve it. 175 | 176 | * **Approval Criteria**: 177 | * A contribution is considered **approved**, and therefore can be merged, if: 178 | * The contribution has not received any sustained **objection** during the review period, AND 179 | * At least 3 reviewers have indicated that they agree with the contribution. 180 | * If a sustained **objection** is received, the contribution cannot be merged, even if 3 or more contributors agreed with the contribution. 181 | * If during the review period a contribution receives a **comment**, it is up to the group or maintainer to accept the comment or not. In any case, in order to merge the contribution at least 3 reviewers MUST indicate that they agree with the contribution. 182 | 183 | ## [Organization_Abbreviation] Process Flows 184 | 185 | ### Work Packages 186 | 187 |
188 | [Figure: [Organization_Abbreviation] Work Units] 189 | 190 |
191 | 192 | 193 | #### Work Package 194 | * The Work Package (WP) SHALL describe the scope and expected deliverables and SHALL require WG approval. 195 | * WPs are the means by which release packages (version x.y.z) are defined. 196 | 197 | ##### Epics 198 | * An Epic could be a feature, customer request or business requirement. 199 | * It is recommended to define the list of Epics that will form the release package for the corresponding Work Package. 200 | * The WG SHOULD define a placeholder for each Epic with a few lines of description. 201 | * Epics can be broken down into user stories and tasks, which are not defined in detail at the creation of the Work Package. 202 | 203 | ### Technical Specifications Life Cycle 204 | 205 | Note: from `[Organization_Abbreviation]` Scope & Governance 206 | ``` 207 | 4. Specification Development Process. 208 | 4.1. Pre-Draft. Any Participant may submit a proposed initial draft document as a candidate Draft Specification of that Working Group. The Maintainer will designate each submission as a “Pre-Draft” document. 209 | 210 | 4.2. Draft. Each Pre-Draft document of a Working Group must first be Approved to become a ”Draft Specification”. Once the Working Group approves a document as a Draft Specification, the Draft Specification becomes the basis for all going forward work on that specification. 211 | 212 | 4.3. Working Group Approval. Once a Working Group believes it has achieved the objectives for its specification as described in the Scope, it will submit it to the Steering Committee for its approval. Any Draft Specification approved by vote of the Steering Committee becomes an “Approved Specification”. 213 | 214 | 4.4. Publication and Submission. Upon the designation of a Draft Specification as an Approved Specification by the Steering Committee, the Maintainer will publish the Approved Specification in a manner agreed upon by the Steering Committee (i.e., Working Group Participant only location, publicly available location, Working Group maintained website, Working Group member website, etc.). The publication of an Approved Specification in a publicly accessible manner must include the terms under which the Approved Specification is being made available. 215 | 216 | 4.5. Submissions to Standards Bodies. The Governing Board of the LF Energy Foundation (the “Governing Board”) may submit a Draft Specification or Approved Specification to another standards development organization by vote. No Draft Specification or Approved Specification may be submitted to another standards development organization without the vote of the Governing Board. Upon an affirmative vote of the Governing Board regarding such a submission, the applicable Maintainer or Maintainers, or any other individuals so directed by the Governing Board, will coordinate the submission of the applicable Draft Specification or Approved Specification to the other standards development organization as directed by the Governing Board. Working Group Participants that developed that Draft Specification or Approved Specification agree to grant the copyright rights necessary to make those submissions. 217 | 218 | 4.6 Steering Committee. The Steering Committee is responsible for (a) approval of any Draft Specification as an Approved Specification and (b) alignment among each of the Working Groups of the [Organization_Name] project. 219 | 220 | 4.7. Voting of the Steering Committee and Strategy Committee.
In any vote or Approval before the Steering Committee or Strategy Committee, the affirmative vote of at least 50% of the voting members of the Steering Committee or Strategy Committee is required. The voting members of the Steering Committee and Strategy Committee consist of one appointee from each General Member and each Strategic Member of the LF Energy Foundation of the Linux Foundation. 221 | ``` 222 |
223 | [Figure: Specifications Life Cycle] 224 | 225 |
226 | 227 | In this section the diagram below depicts the development phases of technical documents. 228 |
**Technical Specifications Development Phases**

| Phase | Description |
|-------|-------------|
| Work Package | In this phase the group agrees the scope of the work to be developed. Any member can provide a new Work Package proposal; the document is discussed among the group and further elaborated. The group will vote on whether the Work Package is formally approved and endorsed by the majority of the group or rejected. If the proposal is approved, the Work Package moves to the next phase, Technical Development. |
| Development | A Technical Specification MAY be composed of one or more documents:<br>• Requirements Document (RD): contains the business requirements (non-technical requirements); the business requirements are derived from the Use Cases described in the RD document.<br>• Architecture Document (AD): describes all functional elements of the system and its interfaces or reference points.<br>• Technical Specification Document(s) (TS): a set of documented requirements to be satisfied by a material, design, product, or service; it helps to understand the configuration and architecture of a system.<br>• Supporting Document(s) (SUP): contains profile data, metadata, schemas, etc.<br>Note: in some cases the group MAY agree to develop a single document that contains the above list as sections. Each document will follow the phases described in the above diagram. |
| Consistency Review | In this phase, the document(s) developed by the WG are formally reviewed by the group. A Review period is open for members to submit their comments. After this period, the Working Group will address the issues received. |
| WG Approval | Once the WG completes the Consistency Review, the document(s) MUST be agreed by the WG (in a Review & Approval) before sending the document(s) to the Steering Committee for formal Ratification. |
| Ratification | Once the WG approves the document(s), they are sent to the Steering Committee for Ratification. |
| Publication \| Maintenance | Upon Steering Committee Ratification, the document(s) are ready for Publication.<br>• To publish the document(s), the Maintainer will create a new Release Tag. The new Release Tag will be produced with the content of the "main" branch and stored in the Release section of the GitHub repository.<br>• The WG SHOULD open a *dialogue* with the public via GitHub Discussions.<br>• The input collected during the Maintenance phase SHOULD be used to improve the Technical Specifications as well as to collect business requirements for future releases. |
288 | [Figure: [Organization_Abbreviation] Technical Specifications Development Phases] 289 | 290 |
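To make the flow above concrete, here is a minimal, illustrative sketch of the lifecycle as a state machine. It is not part of the governance template: the names are made up for this example, and the "back to Development" transitions on a failed review or vote are an assumption the table does not spell out.

```
from enum import Enum, auto

class Phase(Enum):
    WORK_PACKAGE = auto()
    DEVELOPMENT = auto()
    CONSISTENCY_REVIEW = auto()
    WG_APPROVAL = auto()
    RATIFICATION = auto()
    PUBLICATION_MAINTENANCE = auto()

# Allowed transitions between phases, following the table above.
# Assumption: a failed review or vote sends the document back to Development,
# and Maintenance input feeds future releases.
TRANSITIONS = {
    Phase.WORK_PACKAGE: {Phase.DEVELOPMENT},
    Phase.DEVELOPMENT: {Phase.CONSISTENCY_REVIEW},
    Phase.CONSISTENCY_REVIEW: {Phase.WG_APPROVAL, Phase.DEVELOPMENT},
    Phase.WG_APPROVAL: {Phase.RATIFICATION, Phase.DEVELOPMENT},
    Phase.RATIFICATION: {Phase.PUBLICATION_MAINTENANCE, Phase.DEVELOPMENT},
    Phase.PUBLICATION_MAINTENANCE: {Phase.DEVELOPMENT},
}

def advance(current: Phase, target: Phase) -> Phase:
    """Move a document to another phase, rejecting transitions the process does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.name} to {target.name}")
    return target
```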
291 | 292 | ### GitHub Flows 293 | It is suggested to follow the principles of [Trunk Based Development](https://trunkbaseddevelopment.com/) whenever possible. 294 | 295 |
296 | [Figure: [Organization_Abbreviation] GitHub Flow] 297 | 298 |
**GitHub Work Flow - Public Repositories**

| Branch | Description |
|--------|-------------|
| Rel vX.Y.Z | Release tags contain all the different versions of the Technical Specifications that have been approved by the Working Group and ratified by the Technical Steering Committee. The name of the release tag will follow [Semantic Versioning](#semantic-versioning) principles. |
| main | This branch contains the latest version of the Technical Specification approved by the Working Group. Its content will be moved into a release tag after the Consistency Review and Technical Steering Committee Ratification phases. |
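As an illustration of the tag naming convention above, the following hedged sketch parses a `Rel vX.Y.Z` tag into its version fields. The regex and helper are hypothetical, not project tooling; it assumes the service field Z may be omitted, per the Document Version rules in the Semantic Versioning section below.

```
import re

# "Rel vX.Y.Z" naming from the table above; the service field Z is optional.
RELEASE_TAG = re.compile(r"^Rel v(?P<major>\d+)\.(?P<minor>\d+)(?:\.(?P<service>\d+))?$")

def parse_release_tag(tag: str) -> tuple[int, int, int]:
    """Split a release tag into (major, minor, service), defaulting service to 0."""
    match = RELEASE_TAG.match(tag)
    if match is None:
        raise ValueError(f"Not a valid release tag: {tag!r}")
    major, minor, service = match.group("major", "minor", "service")
    return int(major), int(minor), int(service or 0)

assert parse_release_tag("Rel v1.2.0") == (1, 2, 0)
assert parse_release_tag("Rel v1.0") == (1, 0, 0)
```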
319 | 320 | ### GitHub Access Rights 321 | 322 |
**GitHub Access Rights**

| Role | Access Rights |
|------|---------------|
| Participants | TRIAGE - Can read and clone this repository. Can also manage issues and pull requests. |
| Editors | WRITE - Can read, clone, and push to this repository. Can also manage issues and pull requests. |
| Maintainer | ADMINISTRATOR - Can read, clone, and push to this repository. They can also manage issues, pull requests, and some repository settings. |
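One possible way to encode the table above in automation is a simple role-to-permission mapping. The sketch below is illustrative only, not an official configuration; it assumes GitHub's built-in permission level names (`triage`, `write`, `admin`).

```
# Illustrative mapping of the roles above onto GitHub's permission levels.
ACCESS_RIGHTS = {
    "participant": "triage",  # manage issues and pull requests
    "editor": "write",        # plus push access
    "maintainer": "admin",    # plus repository settings
}

def can_push(role: str) -> bool:
    """Write access (and above) is required to push to the repository."""
    return ACCESS_RIGHTS[role] in {"write", "admin"}
```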
344 | 345 | ## Publication 346 | There are at least three different options for publishing content using GitHub: 347 |
348 | [Figure: Publication] 349 | 350 |
351 | 352 | ## Documentation 353 | ### Semantic Versioning 354 | 355 |
356 | [Figure: Semantic Versioning] 357 | 358 |
**Document Version**

| Field | Use | Description |
|-------|-----|-------------|
| X | Major Version Indicator | This mandatory field SHALL identify the major version of the document as determined by the WG. Major versions contain major feature additions, MAY contain incompatibilities with previous document or specification revisions, and MAY change, drop, or replace existing interfaces. Initial releases are “1_0”. |
| Y | Minor Version Indicator | This mandatory field SHALL identify the minor version of the document. It is incremented every time a minor change is made to the approved document version. Minor versions MAY contain minor feature additions, be compatible with the preceding Major_Minor specification revision, and MAY provide evolving interfaces. The initial minor release for any major release is “0”, i.e. 1_0. |
| Z | Service Indicator | Service indicator for the document, incremented every time a corrective update is made to the Approved (not draft) document version by the WG. This field is OPTIONAL, and SHALL be provided whenever a service release of the document is made. The first service indicator release SHALL be “_1” for any Major_Minor release. Service indicators are intended to be compatible with the Major_Minor release they relate to but add bug fixes. No new functions will be added through the release of Service Indicators. |
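The bump rules in the table can be summarized in a short sketch. This is illustrative only (the `bump` helper is hypothetical), under the assumption that a major bump resets Y and Z and a minor bump resets Z.

```
# Minimal sketch of the X_Y_Z rules above: majors for incompatible changes,
# minors for compatible feature additions, service releases for bug fixes only.
def bump(version: tuple[int, int, int], change: str) -> tuple[int, int, int]:
    major, minor, service = version
    if change == "major":    # may break previous revisions
        return (major + 1, 0, 0)
    if change == "minor":    # compatible feature addition
        return (major, minor + 1, 0)
    if change == "service":  # corrective update, no new functions
        return (major, minor, service + 1)
    raise ValueError(f"Unknown change type: {change!r}")

assert bump((1, 0, 0), "minor") == (1, 1, 0)
assert bump((1, 1, 0), "service") == (1, 1, 1)  # first service release is "_1"
```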
392 | 393 | ## Intellectual Property Rights 394 | 395 | ### Copyright 396 | This section provides a recommendation based on the best practices implemented by other projects. 397 | 398 | Most LF project communities do not require or recommend that every contributor include their copyright notice in contributed files. 399 | 400 | Instead, many LF Project communities recommend using a more general statement in a form similar to the following: (choose one) 401 | 402 | * `Copyright The [Organization_Name] Authors.` 403 | * `Copyright The [Organization_Name] Contributors.` 404 | * `Copyright Contributors to the XYZ project.` 405 | 406 | These statements are intended to communicate the following: 407 | 408 | * the work is copyrighted; 409 | * the contributors of the code licensed it, but retain ownership of their copyrights; and 410 | * it was licensed for distribution as part of the `[Organization_Name]`. 411 | 412 | With any of the above statements, the project avoids having to maintain lists of names of the authors or copyright holders, years or ranges of years, and variations on the (c) symbol. 413 | This aims to minimize the burden on the contributors and maintainers. 414 | 415 | This section provides a recommended format for ease of use, but it is not mandated. 416 | 417 | > Note: You may consider discussing with your legal department whether they require you to include a copyright notice identifying the employer as the copyright holder in contributions. Many LF members' legal departments have already approved the above 418 | recommended practice. 419 | 420 | #### Reasons to Avoid Listing Copyright Holders 421 | These are some of the reasons why `[Organization_Abbreviation]` does not recommend trying to list every copyright holder for contributions to every file: 422 | 423 | * Copyright notices are not mandatory in order for the contributor to retain ownership of their copyright. 424 | * Copyright notices are rarely kept up to date as documentation evolves, resulting in inaccurate statements. 425 | * Trying to keep notices up to date, or to correct notices that have become inaccurate, increases the burden on editors and maintainers without tangible benefit. 426 | * Editors and maintainers often do not want to have to worry about e.g. whether a minor contribution (such as a typo fix) means that a new copyright notice should be added. 427 | 428 | #### Other Copyright Rules 429 | * If your contribution contains content from a third-party source who didn't contribute it themselves, then you should not add the notice above. 430 | * You should not change or remove someone else's copyright notice unless they have expressly (in writing) permitted you to do so. 431 | 432 | ### Licenses 433 | This section provides a recommendation on how to communicate software or document license information in a project.
434 | 435 | #### Software Code Licenses 436 | 437 | Ideally, the project SHOULD communicate the software license information via three different methods: 438 | 439 | * In the README file 440 | * Inside the repository with a ```License.txt``` document 441 | * Inside each code file created by the group 442 | 443 | #### Statement in README File 444 | Insert in the README file the MIT License badge: 445 | 446 | ``` 447 | ![APM license](https://img.shields.io/badge/License-MIT-brightgreen) 448 | 449 | ``` 450 | 451 | The README file will display: 452 | 453 | * ![APM license](https://img.shields.io/badge/License-MIT-brightgreen) 454 | 455 | In addition, it is recommended to include a plain text statement of the license in the README file, for accessibility purposes as well as for enabling parsing by automated tooling. This can be done by including a "License" section with: 456 | 457 | * This project is licensed under the MIT license. 458 | 459 | #### License File in the Repository 460 | Insert in the repository a file called ```License.txt```. 461 | 462 | The Maintainer can copy the corresponding license file from the [templates/license]() repository and upload it to the project repository. 463 | 464 | #### License Reference in each Source Code File 465 | The recommendation is that projects SHOULD use [SPDX short-form license identifiers](https://spdx.dev/ids/) in all source code and documentation files that are **original to the project**. 466 | 467 | Each source code file created by the project SHOULD have one of these SPDX license identifiers (depending on the type of source code license allocated to the project): 468 | 469 | * **for an MIT license:** 470 | 471 | ``` 472 | # SPDX-License-Identifier: MIT 473 | # Copyright Contributors to the [Organization_Name] 474 | ``` 475 | 476 | If the project needs to include source code or documents from a different upstream project, the recommendation is to retain those files in **unmodified form** _**(don't add identifiers)**_. 477 | 478 | Also consider: 479 | 480 | * keeping these files in sync with the upstream project 481 | * asking the upstream project to insert the identifiers in their source code files / documents. 482 | 483 | ### [Organization_Name] Software License Policy 484 | This policy is intended to assist `[Organization_Name]` Technical Working Groups in handling Software Licenses in the Projects. 485 | 486 | #### Recommended Safeguards 487 | **1. Escalation Path** 488 | 489 | - Any question about licensing should be resolved by the Working Group (WG); if the WG cannot resolve it, then the question can be sent to the Technical Steering Committee (TSC) 490 | - LF doesn’t provide legal advice or comments about license compatibility (unless LF identifies some clear incompatibilities) 491 | - The Steering members may need to involve their Legal Counsel to make a license decision 492 | - Only the TSC can decide if a component created by the Project can be delivered under a different license than the Project License 493 | 494 | **2. Linked Libraries & 3rd Party Software** 495 | 496 | - It is not recommended to pull software code under a different license than the Project License into the project repository. Use linked libraries instead. 497 | - If 3rd party software is embedded, it should be under the Project License. If different licenses are used, then create a NOTICE file listing all the 3rd party license notices.
**3. License Compatibility**

- Any upstream license needs to be compatible with the Project License.
- Any copyleft license inserted into a project repository needs to be flagged to the Organization Team.

**4. Binary Distribution**

- It is a good practice to point users to the libraries so they can compile them on their own.
- If the group decides to ship binaries, the binaries should be ONLY for the code developed under the Project License.
- If there are any other binaries under a different license, then each such binary should be distributed in its own file. Binaries under a license different than the Project License CANNOT be packaged together with the binaries created by the group.

### Technical Document License
In projects where the main deliverables are technical documents, each document MUST have a legal disclaimer.

The legal disclaimer to insert in each project document SHOULD be:

```
© `[Organization_Abbreviation]` 2022, All rights reserved.

“THESE MATERIALS ARE PROVIDED “AS IS.” The parties expressly disclaim any warranties
(express, implied, or otherwise), including implied warranties of merchantability, non-infringement,
fitness for a particular purpose, or title, related to the materials. The entire risk as to
implementing or otherwise using the materials is assumed by the implementer and user.

IN NO EVENT WILL THE PARTIES BE LIABLE TO ANY OTHER PARTY FOR LOST PROFITS OR ANY FORM OF
INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER FROM ANY CAUSES
OF ACTION OF ANY KIND WITH RESPECT TO THIS DELIVERABLE OR ITS GOVERNING AGREEMENT, WHETHER
BASED ON BREACH OF CONTRACT, TORT (INCLUDING NEGLIGENCE), OR OTHERWISE, AND WHETHER OR NOT
THE OTHER MEMBER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.”
```

## :medal_sports: Certificate of Origin

*Developer's Certificate of Origin 1.1*

By making a contribution to this project, I certify that:

> 1. The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
> 2. The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or
> 3. The contribution was provided directly to me by some other person who certified (1), (2) or (3) and I have not modified it.
> 4. I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.
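In practice, contributors typically attest to the DCO by adding a `Signed-off-by` trailer to each commit. Git appends it automatically with the `-s` (`--signoff`) flag, using the name and email from your Git configuration:

```
# Commit with an automatic Signed-off-by trailer
git commit -s -m "Describe your change"

# The commit message will then end with a line like:
#   Signed-off-by: Your Name <your.email@example.com>
```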
### Source Code
Working Group Participants contributing source code to this Working Group agree that those source code contributions are subject to the Developer Certificate of Origin version 1.1, available at http://developercertificate.org/, the license indicated below, and any policies and governance rules included in the source code's repository. Source code may not be a required element of an Approved Deliverable specification. See the project's [DCO](https://github.com/OvertureMaps/template-repo/blob/main/dco.md).

### Dataset
Working Group Participants contributing data to a dataset in this Working Group agree that those data contributions are subject to the license indicated below. The dataset may not be a required element of an Approved Deliverable specification.

```
Data contributed to ODbL-licensed datasets will be contributed under both the ODbL and CDLA Permissive v2.
Contributions to CDLA Permissive v2 datasets will be contributed under the CDLA Permissive v2.
```

### Patent Licensing
Each Working Group must specify the patent mode under which it will operate prior to initiating any work on any Draft Deliverable or Approved Deliverable other than source code or datasets. The patent mode for this Working Group is:

```
No Patent License. No patent licenses are granted for the Draft Deliverables or Approved Deliverables developed by this Working Group.
```
--------------------------------------------------------------------------------