├── .gitignore ├── CHANGELOG.md ├── LICENSE ├── README.rst ├── conflate ├── __init__.py ├── __main__.py ├── conflate.py ├── conflator.py ├── data.py ├── dataset.py ├── geocoder.py ├── osm.py ├── places.bin ├── profile.py └── version.py ├── filter ├── CMakeLists.txt ├── FindOsmium.cmake ├── FindProtozero.cmake ├── README.md ├── RTree.h ├── filter_planet_by_cats.cpp └── xml_centers_output.hpp ├── profiles ├── auchan_moscow.py ├── azbuka.py ├── burgerking.py ├── minkult.py ├── moscow_addr.py ├── moscow_parkomats.py ├── navads_shell.py ├── navads_shell_json.py ├── rosinter.py ├── schocoladnitsa.py ├── velobike.py └── yandex_parser.py ├── scripts ├── README.md └── pack_places.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | *.user 4 | private/ 5 | data/ 6 | dist/ 7 | __pycache__/ 8 | *.egg* 9 | build/ 10 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # OSM Conflator Change Log 2 | 3 | ## master branch 4 | 5 | ## 1.4.1 6 | 7 | _Released 2019-06-04_ 8 | 9 | * Fixed an error when the query is pure regexp and it did not match anything. 10 | 11 | ## 1.4.0 12 | 13 | _Released 2018-05-30_ 14 | 15 | * Refactored `conflate.py` into seven smaller files. 16 | * Added a simple kd-tree based geocoder for countries and regions. Controlled by the `regions` parameter in a profile. 17 | * You can filter by regions using `-r` argument or `"regions"` list in an audit file. 18 | * Using the new `nwr` query type of Overpass API. 19 | * Reduced default `max_request_boxes` to four. 20 | * New argument `--alt-overpass` to use Kumi Systems' server (since the main one is blocked in Russia). 21 | * Better handling of server runtime errors. 22 | * Find matches in OSM with `--list `. 23 | * Control number of nearest points to check for matches with `nearest_points` profile parameter. 24 | * When you have dataset ID in an URL or other tag, use `find_ref` profile function to match on it. 25 | 26 | ## 1.3.3 27 | 28 | _Released 2018-04-26_ 29 | 30 | * Fixed processing of `''` tag value. 31 | * More that 3 duplicate points in a single place are processed correctly. 32 | * Now you can `yield` points from a profile instead of making a list. 33 | * Not marking nodes with `move` in the audit file as modified, unless we move them. 34 | 35 | ## 1.3.2 36 | 37 | _Released 2018-04-19_ 38 | 39 | * Fixed bug in categories building. 40 | * Fixed threshold for tags in duplicates check. 41 | * Now the script prints "Done" when finished, to better measure time. 42 | 43 | ## 1.3.1 44 | 45 | _Released 2018-03-20_ 46 | 47 | * "Similar tags" now means at least 66% instead of 50%. 48 | * Instead of removing all duplicates, conflating them and removing only unmatched. 49 | 50 | ## 1.3.0 51 | 52 | _Released 2018-03-15_ 53 | 54 | * Support for categories: `category_tag` and `categories` parameters in a profile. 55 | * LibOsmium-based C++ filtering script for categories. 56 | * More than one tag value works as "one of": `[('amenity', 'cafe', 'restaurant')]`. 57 | * Query can be a list of queries, providing for "OR" clause. An example: 58 | 59 | `[[('amenity', 'swimming_pool')], [('leisure', 'swimming_pool')]]` 60 | 61 | * Parameters for profiles, using `-p` argument. 62 | * No more default imports solely for profiles, import `zipfile` youself now. 63 | * Remarks for source points, thanks [@nixi](https://github.com/hixi). 
64 | * Better error message for Overpass API timeouts. 65 | * Lifecycle prefixes are conflated, e.g. `amenity=*` and `was:amenity=*`. 66 | * Dataset is checked for duplicates, which are reported (see `-d`) and removed. 67 | * Support GeoJSON input (put identifiers into `id` property). 68 | 69 | ## 1.2.3 70 | 71 | _Released 2017-12-29_ 72 | 73 | * Fix error in applying audit json after conflating `contact:` namespace. 74 | 75 | ## 1.2.2 76 | 77 | _Released 2017-12-27_ 78 | 79 | * Addr:full tag is not set when addr:housenumber is present. 80 | * Whitespace is stripped from tag values in a dataset. 81 | * Conflate `contact:` namespace. 82 | 83 | ## 1.2.1 84 | 85 | _Released 2017-12-20_ 86 | 87 | * Support force creating points with `audit['create']`. 88 | * Fix green colour for created points in JSON. 89 | * Make `--output` optional and remove the default. 90 | 91 | ## 1.2.0 92 | 93 | _Released 2017-11-23_ 94 | 95 | * Checking moveability for json output (`-m`) for cf_audit. 96 | * Support for cf_audit json (`-a`). 97 | 98 | ## 1.1.0 99 | 100 | _Released 2017-10-06_ 101 | 102 | * Use `-v` for debug messages and `-q` to suppress informational messages. 103 | * You can run `conflate/conflate.py` as a script, again. 104 | * Profiles: added "override" dict with dataset id → OSM POI name or id like 'n12345'. 105 | * Profiles: added "matched" function that returns `False` if an OSM point should not be matched to dataset point (fixes [#6](https://github.com/mapsme/osm_conflate/issues/6)). 106 | * Profiles: `master_tags` is no longer mandatory. 107 | * If no `master_tags` specified in a profile, all tags are now considered non-master. 108 | * When a tag value was `None`, the tag was deleted on object modification. That should be done only on retagging non-matched objects. 109 | * OSM objects filtering failed when a query was a string. 110 | 111 | ## 1.0.0 112 | 113 | _Released 2017-06-07_ 114 | 115 | The initial PyPi release with all the features. 116 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 
30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | OSM Conflator 2 | ============= 3 | 4 | This is a script for merging points from some third-party source with 5 | OpenStreetMap data. Please make sure the license allows that. After 6 | merging and uploading, the data can be updated. 7 | 8 | See `the OSM wiki page`_ for detailed description and instructions. 9 | 10 | Installation 11 | ------------ 12 | 13 | Run 14 | ``pip install osm_conflate``. 15 | 16 | Profiles 17 | -------- 18 | 19 | Each source should have a profile. It is a python script with variables 20 | configuring names, tags and processing. See heavily commented examples 21 | in the ``profiles`` directory. 22 | 23 | Usage 24 | ----- 25 | 26 | For a simplest case, run: 27 | 28 | :: 29 | 30 | conflate -o result.osm 31 | 32 | You might want to add other arguments, 33 | to pass a dataset file or prepare a preview GeoJSON. Run 34 | ``conflate -h`` to see a list of arguments. 35 | 36 | Uploading to OpenStreetMap 37 | -------------------------- 38 | 39 | It is recommended to open the resulting file in the JOSM editor and 40 | manually check the changes. Alternatively, you can use 41 | `bulk\_upload.py`_ to upload a change file from the command line. 42 | 43 | Please mind the `Import Guidelines`_, or your work may be reverted. 
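For reference, a profile for the simplest case above can be a handful of
module-level variables. The sketch below is illustrative only: the tag
values, dataset id and download URL are made up, but the variable names
(``source``, ``dataset_id``, ``download_url``, ``query``, ``master_tags``)
are the ones the conflator reads. See the heavily commented profiles in the
``profiles`` directory for the full set of options.

::

    # hypothetical minimal profile, e.g. profiles/example_cafes.py
    source = 'Example Cafe Chain'        # value of the "source" tag on uploaded objects
    dataset_id = 'example_cafes'         # produces the ref:example_cafes tag
    download_url = 'https://example.com/cafes.json'  # used when -i/--source is omitted
    query = [('amenity', 'cafe')]        # which OSM objects to download and match against
    master_tags = ('name', 'opening_hours')  # dataset values win over OSM for these tags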
44 | 45 | License 46 | ------- 47 | 48 | Written by Ilya Zverev for MAPS.ME. Published under the Apache 2.0 49 | license. 50 | 51 | .. _the OSM wiki page: https://wiki.openstreetmap.org/wiki/OSM_Conflator 52 | .. _bulk\_upload.py: https://wiki.openstreetmap.org/wiki/Bulk_upload.py 53 | .. _Import Guidelines: https://wiki.openstreetmap.org/wiki/Import/Guidelines 54 | 55 | -------------------------------------------------------------------------------- /conflate/__init__.py: -------------------------------------------------------------------------------- 1 | try: 2 | from lxml import etree 3 | except ImportError: 4 | import xml.etree.ElementTree as etree 5 | from .data import SourcePoint 6 | from .conflate import run 7 | from .version import __version__ 8 | from .profile import Profile, ProfileException 9 | from .conflator import OsmConflator 10 | -------------------------------------------------------------------------------- /conflate/__main__.py: -------------------------------------------------------------------------------- 1 | from . import run 2 | 3 | run() 4 | -------------------------------------------------------------------------------- /conflate/conflate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import csv 4 | import json 5 | import logging 6 | import os 7 | import sys 8 | from .geocoder import Geocoder 9 | from .profile import Profile 10 | from .conflator import OsmConflator, TITLE 11 | from .dataset import ( 12 | read_dataset, 13 | add_categories_to_dataset, 14 | transform_dataset, 15 | check_dataset_for_duplicates, 16 | add_regions, 17 | ) 18 | 19 | 20 | def write_for_filter(profile, dataset, f): 21 | def query_to_tag_strings(query): 22 | if isinstance(query, str): 23 | raise ValueError('Query string for filter should not be a string') 24 | result = [] 25 | if not isinstance(query[0], str) and isinstance(query[0][0], str): 26 | query = [query] 27 | for q in query: 28 | if isinstance(q, str): 29 | raise ValueError('Query string for filter should not be a string') 30 | parts = [] 31 | for part in q: 32 | if len(part) == 1: 33 | parts.append(part[0]) 34 | elif part[1] is None or len(part[1]) == 0: 35 | parts.append('{}='.format(part[0])) 36 | elif part[1][0] == '~': 37 | raise ValueError('Cannot use regular expressions in filter') 38 | elif '|' in part[1] or ';' in part[1]: 39 | raise ValueError('"|" and ";" symbols is not allowed in query values') 40 | else: 41 | parts.append('='.join(part)) 42 | result.append('|'.join(parts)) 43 | return result 44 | 45 | def tags_to_query(tags): 46 | return [(k, v) for k, v in tags.items()] 47 | 48 | categories = profile.get('categories', {}) 49 | p_query = profile.get('query', None) 50 | if p_query is not None: 51 | categories[None] = {'query': p_query} 52 | cat_map = {} 53 | i = 0 54 | try: 55 | for name, query in categories.items(): 56 | for tags in query_to_tag_strings(query.get('query', tags_to_query(query.get('tags')))): 57 | f.write('{},{},{}\n'.format(i, name or '', tags)) 58 | cat_map[name] = i 59 | i += 1 60 | except ValueError as e: 61 | logging.error(e) 62 | return False 63 | f.write('\n') 64 | for d in dataset: 65 | if d.category in cat_map: 66 | f.write('{},{},{}\n'.format(d.lon, d.lat, cat_map[d.category])) 67 | return True 68 | 69 | 70 | def run(profile=None): 71 | parser = argparse.ArgumentParser( 72 | description='''{}. 73 | Reads a profile with source data and conflates it with OpenStreetMap data. 
74 | Produces an JOSM XML file ready to be uploaded.'''.format(TITLE)) 75 | if not profile: 76 | parser.add_argument('profile', type=argparse.FileType('r'), 77 | help='Name of a profile (python or json) to use') 78 | parser.add_argument('-i', '--source', type=argparse.FileType('rb'), 79 | help='Source file to pass to the profile dataset() function') 80 | parser.add_argument('-a', '--audit', type=argparse.FileType('r'), 81 | help='Conflation validation result as a JSON file') 82 | parser.add_argument('-o', '--output', type=argparse.FileType('w'), 83 | help='Output OSM XML file name') 84 | parser.add_argument('-p', '--param', 85 | help='Optional parameter for the profile') 86 | parser.add_argument('--osc', action='store_true', 87 | help='Produce an osmChange file instead of JOSM XML') 88 | parser.add_argument('--osm', 89 | help='Instead of querying Overpass API, use this unpacked osm file. ' + 90 | 'Create one from Overpass data if not found') 91 | parser.add_argument('-c', '--changes', type=argparse.FileType('w'), 92 | help='Write changes as GeoJSON for visualization') 93 | parser.add_argument('-m', '--check-move', action='store_true', 94 | help='Check for moveability of modified modes') 95 | parser.add_argument('-f', '--for-filter', type=argparse.FileType('w'), 96 | help='Prepare a file for the filtering script') 97 | parser.add_argument('-l', '--list', type=argparse.FileType('w'), 98 | help='Print a CSV list of matches') 99 | parser.add_argument('-d', '--list_duplicates', action='store_true', 100 | help='List all duplicate points in the dataset') 101 | parser.add_argument('-r', '--regions', 102 | help='Conflate only points with regions in this comma-separated list') 103 | parser.add_argument('--alt-overpass', action='store_true', 104 | help='Use an alternate Overpass API server') 105 | parser.add_argument('-v', '--verbose', action='store_true', 106 | help='Display debug messages') 107 | parser.add_argument('-q', '--quiet', action='store_true', 108 | help='Do not display informational messages') 109 | options = parser.parse_args() 110 | 111 | if (not options.output and not options.changes and 112 | not options.for_filter and not options.list): 113 | parser.print_help() 114 | return 115 | 116 | if options.verbose: 117 | log_level = logging.DEBUG 118 | elif options.quiet: 119 | log_level = logging.WARNING 120 | else: 121 | log_level = logging.INFO 122 | logging.basicConfig(level=log_level, format='%(asctime)s %(message)s', datefmt='%H:%M:%S') 123 | logging.getLogger("requests").setLevel(logging.WARNING) 124 | logging.getLogger("urllib3").setLevel(logging.WARNING) 125 | 126 | if not profile: 127 | logging.debug('Loading profile %s', options.profile) 128 | profile = Profile(profile or options.profile, options.param) 129 | 130 | audit = None 131 | if options.audit: 132 | audit = json.load(options.audit) 133 | 134 | geocoder = Geocoder(profile.get_raw('regions')) 135 | if options.regions: 136 | geocoder.set_filter(options.regions) 137 | elif audit and audit.get('regions'): 138 | geocoder.set_filter(audit.get('regions')) 139 | 140 | dataset = read_dataset(profile, options.source) 141 | if not dataset: 142 | logging.error('Empty source dataset') 143 | sys.exit(2) 144 | transform_dataset(profile, dataset) 145 | add_categories_to_dataset(profile, dataset) 146 | check_dataset_for_duplicates(profile, dataset, options.list_duplicates) 147 | add_regions(dataset, geocoder) 148 | logging.info('Read %s items from the dataset', len(dataset)) 149 | 150 | if options.for_filter: 151 | if 
write_for_filter(profile, dataset, options.for_filter): 152 | logging.info('Prepared data for filtering, exitting') 153 | return 154 | 155 | conflator = OsmConflator(profile, dataset, audit) 156 | conflator.geocoder = geocoder 157 | if options.alt_overpass: 158 | conflator.set_overpass('alt') 159 | if options.osm and os.path.exists(options.osm): 160 | with open(options.osm, 'r') as f: 161 | conflator.parse_osm(f) 162 | else: 163 | conflator.download_osm() 164 | if len(conflator.osmdata) > 0 and options.osm: 165 | with open(options.osm, 'w') as f: 166 | f.write(conflator.backup_osm()) 167 | logging.info('Downloaded %s objects from OSM', len(conflator.osmdata)) 168 | 169 | conflator.match() 170 | 171 | if options.output: 172 | diff = conflator.to_osc(not options.osc) 173 | options.output.write(diff) 174 | 175 | if options.changes: 176 | if options.check_move: 177 | conflator.check_moveability() 178 | fc = {'type': 'FeatureCollection', 'features': conflator.changes} 179 | json.dump(fc, options.changes, ensure_ascii=False, sort_keys=True, indent=1) 180 | 181 | if options.list: 182 | writer = csv.writer(options.list) 183 | writer.writerow(['ref', 'osm_type', 'osm_id', 'lat', 'lon', 'action']) 184 | for row in conflator.matches: 185 | writer.writerow(row) 186 | 187 | logging.info('Done') 188 | -------------------------------------------------------------------------------- /conflate/conflator.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import kdtree 3 | from collections import defaultdict 4 | from .data import OSMPoint 5 | from .version import __version__ 6 | from .osm import OsmDownloader, check_moveability 7 | from . import etree 8 | 9 | 10 | TITLE = 'OSM Conflator ' + __version__ 11 | CONTACT_KEYS = set(('phone', 'website', 'email', 'fax', 'facebook', 'twitter', 'instagram')) 12 | LIFECYCLE_KEYS = set(('amenity', 'shop', 'tourism', 'craft', 'office')) 13 | LIFECYCLE_PREFIXES = ('proposed', 'construction', 'disused', 'abandoned', 'was', 'removed') 14 | 15 | 16 | class OsmConflator: 17 | """The main class for the conflator. 18 | 19 | It receives a dataset, after which one must call either 20 | "download_osm" or "parse_osm" methods. Then it is ready to match: 21 | call the "match" method and get results with "to_osc". 22 | """ 23 | def __init__(self, profile, dataset, audit=None): 24 | self.dataset = {p.id: p for p in dataset} 25 | self.audit = audit or {} 26 | self.osmdata = {} 27 | self.matched = [] 28 | self.changes = [] 29 | self.matches = [] 30 | self.profile = profile 31 | self.geocoder = None 32 | self.downloader = OsmDownloader(profile) 33 | self.source = self.profile.get( 34 | 'source', required='value of "source" tag for uploaded OSM objects') 35 | self.add_source_tag = self.profile.get('add_source', False) 36 | if self.profile.get('no_dataset_id', False): 37 | self.ref = None 38 | else: 39 | self.ref = 'ref:' + self.profile.get( 40 | 'dataset_id', required='A fairly unique id of the dataset to query OSM') 41 | 42 | def set_overpass(self, server='alt'): 43 | self.downloader.set_overpass(server) 44 | 45 | def download_osm(self): 46 | bboxes = self.downloader.calc_boxes(self.dataset.values()) 47 | self.osmdata = self.downloader.download(bboxes) 48 | 49 | def parse_osm(self, fileobj): 50 | self.osmdata = self.downloader.parse_xml(fileobj) 51 | 52 | def register_match(self, dataset_key, osmdata_key, keep=False, retag=None): 53 | """Registers a match between an OSM point and a dataset point. 
54 | 55 | Merges tags from an OSM Point and a dataset point, and add the result to the 56 | self.matched list. 57 | If dataset_key is None, deletes or retags the OSM point. 58 | If osmdata_key is None, adds a new OSM point for the dataset point. 59 | """ 60 | def get_osm_key(k, osm_tags): 61 | """Conflating contact: namespace.""" 62 | if k in CONTACT_KEYS and k not in osm_tags and 'contact:'+k in osm_tags: 63 | return 'contact:'+k 64 | elif k.startswith('contact:') and k not in osm_tags and k[8:] in osm_tags: 65 | return k[8:] 66 | 67 | # Now conflating lifecycle prefixes, only forward 68 | if k in LIFECYCLE_KEYS and k not in osm_tags: 69 | for prefix in LIFECYCLE_PREFIXES: 70 | if prefix+':'+k in osm_tags: 71 | return prefix+':'+k 72 | return k 73 | 74 | def update_tags(tags, source, master_tags=None, retagging=False, audit=None): 75 | """Updates tags dictionary with tags from source, 76 | returns True is something was changed.""" 77 | keep = set() 78 | override = set() 79 | changed = False 80 | if source: 81 | if audit: 82 | keep = set(audit.get('keep', [])) 83 | override = set(audit.get('override', [])) 84 | for k, v in source.items(): 85 | osm_key = get_osm_key(k, tags) 86 | 87 | if k in keep or osm_key in keep: 88 | continue 89 | if k in override or osm_key in override: 90 | if not v and osm_key in tags: 91 | del tags[osm_key] 92 | changed = True 93 | elif v and tags.get(osm_key, None) != v: 94 | tags[osm_key] = v 95 | changed = True 96 | continue 97 | 98 | if osm_key not in tags or retagging or ( 99 | tags[osm_key] != v and (master_tags and k in master_tags)): 100 | if v is not None and len(v) > 0: 101 | # Not setting addr:full when the object has addr:housenumber 102 | if k == 'addr:full' and 'addr:housenumber' in tags: 103 | continue 104 | tags[osm_key] = v 105 | changed = True 106 | elif osm_key in tags and (v == '' or retagging): 107 | del tags[osm_key] 108 | changed = True 109 | return changed 110 | 111 | def format_change(before, after, ref): 112 | MARKER_COLORS = { 113 | 'delete': '#ee2211', # deleting feature from OSM 114 | 'create': '#11dd11', # creating a new node 115 | 'update': '#0000ee', # changing tags on an existing feature 116 | 'retag': '#660000', # cannot delete unmatched feature, changing tags 117 | 'move': '#110055', # moving an existing node 118 | } 119 | marker_action = None 120 | geometry = {'type': 'Point', 'coordinates': [after.lon, after.lat]} 121 | props = { 122 | 'osm_type': after.osm_type, 123 | 'osm_id': after.osm_id, 124 | 'action': after.action 125 | } 126 | if after.action in ('create', 'delete'): 127 | # Red if deleted, green if added 128 | marker_action = after.action 129 | for k, v in after.tags.items(): 130 | props['tags.{}'.format(k)] = v 131 | if ref: 132 | props['ref_id'] = ref.id 133 | else: # modified 134 | # Blue if updated from dataset, dark red if retagged, dark blue if moved 135 | marker_action = 'update' if ref else 'retag' 136 | if ref: 137 | props['ref_id'] = ref.id 138 | props['ref_distance'] = round(10 * ref.distance(before)) / 10.0 139 | props['ref_coords'] = [ref.lon, ref.lat] 140 | if before.lon != after.lon or before.lat != after.lat: 141 | # The object was moved 142 | props['were_coords'] = [before.lon, before.lat] 143 | marker_action = 'move' 144 | # Find tags that were superseeded by OSM tags 145 | for k, v in ref.tags.items(): 146 | osm_key = get_osm_key(k, after.tags) 147 | if osm_key not in after.tags or after.tags[osm_key] != v: 148 | props['ref_unused_tags.{}'.format(osm_key)] = v 149 | # Now compare old and new OSM tags 
150 | for k in set(after.tags.keys()).union(set(before.tags.keys())): 151 | v0 = before.tags.get(k, None) 152 | v1 = after.tags.get(k, None) 153 | if v0 == v1: 154 | props['tags.{}'.format(k)] = v0 155 | elif v0 is None: 156 | props['tags_new.{}'.format(k)] = v1 157 | elif v1 is None: 158 | props['tags_deleted.{}'.format(k)] = v0 159 | else: 160 | props['tags_changed.{}'.format(k)] = '{} -> {}'.format(v0, v1) 161 | props['marker-color'] = MARKER_COLORS[marker_action] 162 | if ref and ref.remarks: 163 | props['remarks'] = ref.remarks 164 | if ref and ref.region: 165 | props['region'] = ref.region 166 | elif self.geocoder: 167 | region, present = self.geocoder.find(after) 168 | if not present: 169 | return None 170 | if region is not None: 171 | props['region'] = region 172 | return {'type': 'Feature', 'geometry': geometry, 'properties': props} 173 | 174 | p = self.osmdata.pop(osmdata_key, None) 175 | p0 = None if p is None else p.copy() 176 | sp = self.dataset.pop(dataset_key, None) 177 | audit = self.audit.get(sp.id if sp else '{}{}'.format(p.osm_type, p.osm_id), {}) 178 | if audit.get('skip', False): 179 | return 180 | 181 | if sp is not None: 182 | if p is None: 183 | p = OSMPoint('node', -1-len(self.matched), 1, sp.lat, sp.lon, sp.tags) 184 | p.action = 'create' 185 | else: 186 | master_tags = set(self.profile.get('master_tags', [])) 187 | if update_tags(p.tags, sp.tags, master_tags, audit=audit): 188 | p.action = 'modify' 189 | # Move a node if it is too far from the dataset point 190 | if not p.is_area() and sp.distance(p) > self.profile.max_distance: 191 | p.lat = sp.lat 192 | p.lon = sp.lon 193 | p.action = 'modify' 194 | if self.add_source_tag: 195 | if 'source' in p.tags: 196 | if self.source not in p.tags['source']: 197 | p.tags['source'] = ';'.join([p.tags['source'], self.source]) 198 | else: 199 | p.tags['source'] = self.source 200 | if self.ref is not None: 201 | p.tags[self.ref] = sp.id 202 | if 'fixme' in audit and audit['fixme'] != p.tags.get('fixme'): 203 | p.tags['fixme'] = audit['fixme'] 204 | if p.action is None: 205 | p.action = 'modify' 206 | if 'move' in audit and not p.is_area(): 207 | if p0 and audit['move'] == 'osm': 208 | p.lat = p0.lat 209 | p.lon = p0.lon 210 | elif audit['move'] == 'dataset': 211 | p.lat = sp.lat 212 | p.lon = sp.lon 213 | elif len(audit['move']) == 2: 214 | p.lat = audit['move'][1] 215 | p.lon = audit['move'][0] 216 | if p.action is None and p0.distance(p) > 0.1: 217 | p.action = 'modify' 218 | if p.action != 'create': 219 | self.matches.append([sp.id, p.osm_type, p.osm_id, p.lat, p.lon, p.action]) 220 | else: 221 | self.matches.append([sp.id, '', '', p.lat, p.lon, p.action]) 222 | elif keep or p.is_area(): 223 | if update_tags(p.tags, retag, retagging=True, audit=audit): 224 | p.action = 'modify' 225 | else: 226 | p.action = 'delete' 227 | 228 | if p.action is not None: 229 | change = format_change(p0, p, sp) 230 | if change is not None: 231 | self.matched.append(p) 232 | self.changes.append(change) 233 | 234 | def match_dataset_points_smart(self): 235 | """Smart matching for dataset <-> OSM points. 236 | 237 | We find a shortest link between a dataset and an OSM point. 238 | Then we match these and remove both from dicts. 239 | Then find another link and so on, until the length of a link 240 | becomes larger than "max_distance". 241 | 242 | Currently the worst case complexity is around O(n^2*log^2 n). 243 | But given the small number of objects to match, and that 244 | the average case complexity is ~O(n*log^2 n), this is fine. 
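        A small illustration (hypothetical distances, max_distance=100):
        with dataset points A, B and OSM points X, Y, where
        dist(A, X)=5, dist(B, X)=7 and dist(B, Y)=20, the pair (A, X) is
        matched first; B's candidate X is then gone, so B's nearest match
        is recomputed and B ends up matched to Y.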
245 | """ 246 | def search_nn_fix(kd, point): 247 | nearest = kd.search_knn(point, self.profile.get('nearest_points', 10)) 248 | if not nearest: 249 | return None, None 250 | match_func = self.profile.get_raw('matches') 251 | if match_func: 252 | nearest = [p for p in nearest if match_func(p[0].data.tags, point.tags)] 253 | if not nearest: 254 | return None, None 255 | nearest = [(n[0], n[0].data.distance(point)) 256 | for n in nearest if point.category in n[0].data.categories] 257 | return sorted(nearest, key=lambda kv: kv[1])[0] 258 | 259 | if not self.osmdata: 260 | return 261 | osm_kd = kdtree.create(list(self.osmdata.values())) 262 | count_matched = 0 263 | 264 | # Process overridden features first 265 | for override, osm_find in self.profile.get('override', {}).items(): 266 | override = str(override) 267 | if override not in self.dataset: 268 | continue 269 | found = None 270 | if len(osm_find) > 2 and osm_find[0] in 'nwr' and osm_find[1].isdigit(): 271 | if osm_find in self.osmdata: 272 | found = self.osmdata[osm_find] 273 | # Search nearest 100 points 274 | nearest = osm_kd.search_knn(self.dataset[override], 100) 275 | if nearest: 276 | for p in nearest: 277 | if 'name' in p[0].data.tags and p[0].data.tags['name'] == osm_find: 278 | found = p[0].data 279 | if found: 280 | count_matched += 1 281 | self.register_match(override, found.id) 282 | osm_kd = osm_kd.remove(found) 283 | 284 | # Prepare distance list: match OSM points to each of the dataset points 285 | dist = [] 286 | for sp, v in self.dataset.items(): 287 | osm_point, distance = search_nn_fix(osm_kd, v) 288 | if osm_point is not None and distance <= self.profile.max_distance: 289 | dist.append((distance, sp, osm_point.data)) 290 | 291 | # The main matching loop: sort dist list if needed, 292 | # register the closes match, update the list 293 | needs_sorting = True 294 | while dist: 295 | if needs_sorting: 296 | dist.sort(key=lambda x: x[0]) 297 | needs_sorting = False 298 | count_matched += 1 299 | osm_point = dist[0][2] 300 | self.register_match(dist[0][1], osm_point.id) 301 | osm_kd = osm_kd.remove(osm_point) 302 | del dist[0] 303 | for i in reversed(range(len(dist))): 304 | if dist[i][2] == osm_point: 305 | nearest, distance = search_nn_fix(osm_kd, self.dataset[dist[i][1]]) 306 | if nearest and distance <= self.profile.max_distance: 307 | dist[i] = (distance, dist[i][1], nearest.data) 308 | needs_sorting = i == 0 or distance < dist[0][0] 309 | else: 310 | del dist[i] 311 | needs_sorting = i == 0 312 | logging.info('Matched %s points', count_matched) 313 | 314 | def match(self): 315 | """Matches each osm object with a SourcePoint, or marks it as obsolete. 
316 | The resulting list of OSM Points are written to the "matched" field.""" 317 | find_ref = self.profile.get_raw('find_ref') 318 | if self.ref is not None or callable(find_ref): 319 | # First match all objects with ref:whatever tag set 320 | count_ref = 0 321 | for k, p in list(self.osmdata.items()): 322 | ref = None 323 | if self.ref and self.ref in p.tags: 324 | ref = p.tags[self.ref] 325 | elif find_ref: 326 | ref = find_ref(p.tags) 327 | if ref is not None: 328 | if ref in self.dataset: 329 | count_ref += 1 330 | self.register_match(ref, k) 331 | logging.info('Updated %s OSM objects with %s tag', count_ref, self.ref) 332 | 333 | # Add points for which audit specifically mentioned creating 334 | count_created = 0 335 | for ref, a in self.audit.items(): 336 | if ref in self.dataset: 337 | if a.get('create', None): 338 | count_created += 1 339 | self.register_match(ref, None) 340 | elif a.get('skip', None): 341 | # If we skip an object here, it would affect the conflation order 342 | pass 343 | if count_created > 0: 344 | logging.info('Created %s audit-overridden dataset points', count_created) 345 | 346 | # Prepare exclusive groups dict 347 | exclusive_groups = defaultdict(set) 348 | for p, v in self.dataset.items(): 349 | if v.exclusive_group is not None: 350 | exclusive_groups[v.exclusive_group].add(p) 351 | 352 | # Then find matches for unmatched dataset points 353 | self.match_dataset_points_smart() 354 | 355 | # Remove unmatched duplicates 356 | count_duplicates = 0 357 | for ids in exclusive_groups.values(): 358 | found = False 359 | for p in ids: 360 | if p not in self.dataset: 361 | found = True 362 | break 363 | for p in ids: 364 | if p in self.dataset: 365 | if found: 366 | count_duplicates += 1 367 | del self.dataset[p] 368 | else: 369 | # Leave one element when not matched any 370 | found = True 371 | if count_duplicates > 0: 372 | logging.info('Removed %s unmatched duplicates', count_duplicates) 373 | 374 | # Add unmatched dataset points 375 | logging.info('Adding %s unmatched dataset points', len(self.dataset)) 376 | for k in sorted(list(self.dataset.keys())): 377 | self.register_match(k, None) 378 | 379 | # And finally delete some or all of the remaining osm objects 380 | if len(self.osmdata) > 0: 381 | count_deleted = 0 382 | count_retagged = 0 383 | delete_unmatched = self.profile.get('delete_unmatched', False) 384 | retag = self.profile.get('tag_unmatched') 385 | for k, p in list(self.osmdata.items()): 386 | ref = None 387 | if self.ref and self.ref in p.tags: 388 | ref = p.tags[self.ref] 389 | elif find_ref: 390 | ref = find_ref(p.tags) 391 | if ref is not None: 392 | # When ref:whatever is present, we can delete that object safely 393 | count_deleted += 1 394 | self.register_match(None, k, retag=retag) 395 | elif delete_unmatched or retag: 396 | if not delete_unmatched or p.is_area(): 397 | count_retagged += 1 398 | else: 399 | count_deleted += 1 400 | self.register_match(None, k, keep=not delete_unmatched, retag=retag) 401 | logging.info( 402 | 'Deleted %s and retagged %s unmatched objects from OSM', 403 | count_deleted, count_retagged) 404 | 405 | def backup_osm(self): 406 | """Writes OSM data as-is.""" 407 | osm = etree.Element('osm', version='0.6', generator=TITLE) 408 | for osmel in self.osmdata.values(): 409 | el = osmel.to_xml() 410 | if osmel.osm_type != 'node': 411 | etree.SubElement(el, 'center', lat=str(osmel.lat), lon=str(osmel.lon)) 412 | osm.append(el) 413 | return ("\n" + 414 | etree.tostring(osm, encoding='utf-8').decode('utf-8')) 415 | 416 | def 
to_osc(self, josm=False): 417 | """Returns a string with osmChange or JOSM XML.""" 418 | osc = etree.Element('osm' if josm else 'osmChange', version='0.6', generator=TITLE) 419 | if josm: 420 | neg_id = -1 421 | changeset = etree.SubElement(osc, 'changeset') 422 | ch_tags = { 423 | 'source': self.source, 424 | 'created_by': TITLE, 425 | 'type': 'import' 426 | } 427 | for k, v in ch_tags.items(): 428 | etree.SubElement(changeset, 'tag', k=k, v=v) 429 | for osmel in self.matched: 430 | if osmel.action is not None: 431 | el = osmel.to_xml() 432 | if josm: 433 | if osmel.action == 'create': 434 | el.set('id', str(neg_id)) 435 | neg_id -= 1 436 | else: 437 | el.set('action', osmel.action) 438 | osc.append(el) 439 | else: 440 | etree.SubElement(osc, osmel.action).append(el) 441 | return ("\n" + 442 | etree.tostring(osc, encoding='utf-8').decode('utf-8')) 443 | 444 | def check_moveability(self): 445 | check_moveability(self.changes) 446 | -------------------------------------------------------------------------------- /conflate/data.py: -------------------------------------------------------------------------------- 1 | import math 2 | from . import etree 3 | 4 | 5 | class SourcePoint: 6 | """A common class for points. Has an id, latitude and longitude, 7 | and a dict of tags. Remarks are optional for reviewers hints only.""" 8 | def __init__(self, pid, lat, lon, tags=None, category=None, remarks=None, region=None): 9 | self.id = str(pid) 10 | self.lat = lat 11 | self.lon = lon 12 | self.tags = {} if tags is None else { 13 | k.lower(): str(v).strip() for k, v in tags.items() if v is not None} 14 | self.category = category 15 | self.dist_offset = 0 16 | self.remarks = remarks 17 | self.region = region 18 | self.exclusive_group = None 19 | 20 | def distance(self, other): 21 | """Calculate distance in meters.""" 22 | dx = math.radians(self.lon - other.lon) * math.cos(0.5 * math.radians(self.lat + other.lat)) 23 | dy = math.radians(self.lat - other.lat) 24 | return 6378137 * math.sqrt(dx*dx + dy*dy) - self.dist_offset 25 | 26 | def __len__(self): 27 | return 2 28 | 29 | def __getitem__(self, i): 30 | if i == 0: 31 | return self.lon 32 | elif i == 1: 33 | return self.lat 34 | else: 35 | raise ValueError('A SourcePoint has only lat and lon in a list') 36 | 37 | def __eq__(self, other): 38 | return self.id == other.id 39 | 40 | def __hash__(self): 41 | return hash(self.id) 42 | 43 | def __repr__(self): 44 | return 'SourcePoint({}, {}, {}, offset={}, tags={})'.format( 45 | self.id, self.lat, self.lon, self.dist_offset, self.tags) 46 | 47 | 48 | class OSMPoint(SourcePoint): 49 | """An OSM points is a SourcePoint with a few extra fields. 50 | Namely, version, members (for ways and relations), and an action. 
51 | The id is compound and created from object type and object id.""" 52 | def __init__(self, ptype, pid, version, lat, lon, tags=None, categories=None): 53 | super().__init__('{}{}'.format(ptype[0], pid), lat, lon, tags) 54 | self.tags = {k: v for k, v in self.tags.items() if v is not None and len(v) > 0} 55 | self.osm_type = ptype 56 | self.osm_id = pid 57 | self.version = version 58 | self.members = None 59 | self.action = None 60 | self.categories = categories or set() 61 | self.remarks = None 62 | 63 | def copy(self): 64 | """Returns a copy of this object, except for members field.""" 65 | c = OSMPoint(self.osm_type, self.osm_id, self.version, self.lat, self.lon, self.tags.copy()) 66 | c.action = self.action 67 | c.remarks = self.remarks 68 | c.categories = self.categories.copy() 69 | return c 70 | 71 | def is_area(self): 72 | return self.osm_type != 'node' 73 | 74 | def is_poi(self): 75 | if self.osm_type == 'node': 76 | return True 77 | if self.osm_type == 'way' and len(self.members) > 2: 78 | return self.members[0] == self.members[-1] 79 | if self.osm_type == 'relation' and len(self.members) > 0: 80 | return self.tags.get('type', None) == 'multipolygon' 81 | return False 82 | 83 | def to_xml(self): 84 | """Produces an XML out of the point data. Disregards the "action" field.""" 85 | el = etree.Element(self.osm_type, id=str(self.osm_id), version=str(self.version)) 86 | for tag, value in self.tags.items(): 87 | etree.SubElement(el, 'tag', k=tag, v=value) 88 | 89 | if self.osm_type == 'node': 90 | el.set('lat', str(self.lat)) 91 | el.set('lon', str(self.lon)) 92 | elif self.osm_type == 'way': 93 | for node_id in self.members: 94 | etree.SubElement(el, 'nd', ref=str(node_id)) 95 | elif self.osm_type == 'relation': 96 | for member in self.members: 97 | m = etree.SubElement(el, 'member') 98 | for i, n in enumerate(('type', 'ref', 'role')): 99 | m.set(n, str(member[i])) 100 | return el 101 | 102 | def __repr__(self): 103 | return 'OSMPoint({} {} v{}, {}, {}, action={}, tags={})'.format( 104 | self.osm_type, self.osm_id, self.version, self.lat, self.lon, self.action, self.tags) 105 | -------------------------------------------------------------------------------- /conflate/dataset.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import codecs 4 | import requests 5 | import kdtree 6 | from io import BytesIO 7 | from .data import SourcePoint 8 | 9 | 10 | def read_dataset(profile, fileobj): 11 | """A helper function to call a "dataset" function in the profile. 
12 | If the fileobj is not specified, tries to download a dataset from 13 | an URL specified in "download_url" profile variable.""" 14 | if not fileobj: 15 | url = profile.get('download_url') 16 | if url is None: 17 | logging.error('No download_url specified in the profile, ' 18 | 'please provide a dataset file with --source') 19 | return None 20 | r = requests.get(url) 21 | if r.status_code != 200: 22 | logging.error('Could not download source data: %s %s', r.status_code, r.text) 23 | return None 24 | if len(r.content) == 0: 25 | logging.error('Empty response from %s', url) 26 | return None 27 | fileobj = BytesIO(r.content) 28 | if not profile.has('dataset'): 29 | # The default option is to parse the source as a JSON 30 | try: 31 | data = [] 32 | reader = codecs.getreader('utf-8') 33 | json_src = json.load(reader(fileobj)) 34 | if 'features' in json_src: 35 | # Parse GeoJSON 36 | for item in json_src['features']: 37 | if item['geometry'].get('type') != 'Point' or 'properties' not in item: 38 | continue 39 | # Get the identifier from "id", "ref", "ref*" 40 | iid = item['properties'].get('id', item['properties'].get('ref')) 41 | if not iid: 42 | for k, v in item['properties'].items(): 43 | if k.startswith('ref'): 44 | iid = v 45 | break 46 | if not iid: 47 | continue 48 | data.append(SourcePoint( 49 | iid, 50 | item['geometry']['coordinates'][1], 51 | item['geometry']['coordinates'][0], 52 | {k: v for k, v in item['properties'].items() if k != 'id'})) 53 | else: 54 | for item in json_src: 55 | data.append(SourcePoint(item['id'], item['lat'], item['lon'], item['tags'])) 56 | return data 57 | except Exception: 58 | logging.error('Failed to parse the source as a JSON') 59 | return list(profile.get( 60 | 'dataset', args=(fileobj,), 61 | required='returns a list of SourcePoints with the dataset')) 62 | 63 | 64 | def add_categories_to_dataset(profile, dataset): 65 | categories = profile.get('categories') 66 | if not categories: 67 | return 68 | tag = profile.get('category_tag') 69 | other = categories.get('other', {}) 70 | for d in dataset: 71 | if tag and tag in d.tags: 72 | d.category = d.tags[tag] 73 | del d.tags[tag] 74 | if d.category: 75 | cat_tags = categories.get(d.category, other).get('tags', None) 76 | if cat_tags: 77 | d.tags.update(cat_tags) 78 | 79 | 80 | def transform_dataset(profile, dataset): 81 | """Transforms tags in the dataset using the "transform" method in the profile 82 | or the instructions in that field in string or dict form.""" 83 | transform = profile.get_raw('transform') 84 | if not transform: 85 | return 86 | if callable(transform): 87 | for d in dataset: 88 | transform(d.tags) 89 | return 90 | if isinstance(transform, str): 91 | # Convert string of "key=value|rule1|rule2" lines to a dict 92 | lines = [line.split('=', 1) for line in transform.splitlines()] 93 | transform = {l[0].strip(): l[1].strip() for l in lines} 94 | if not transform or not isinstance(transform, dict): 95 | return 96 | for key in transform: 97 | if isinstance(transform[key], str): 98 | transform[key] = [x.strip() for x in transform[key].split('|')] 99 | 100 | for d in dataset: 101 | for key, rules in transform.items(): 102 | if not rules: 103 | continue 104 | value = None 105 | if callable(rules): 106 | # The value can be generated 107 | value = rules(None if key not in d.tags else d.tags[key]) 108 | if value is None and key in d.tags: 109 | del d.tags[key] 110 | elif not rules[0]: 111 | # Use the value of the tag 112 | if key in d.tags: 113 | value = d.tags[key] 114 | elif not 
isinstance(rules[0], str): 115 | # If the value is not a string, use it 116 | value = str(rules[0]) 117 | elif rules[0][0] == '.': 118 | # Use the value from another tag 119 | alt_key = rules[0][1:] 120 | if alt_key in d.tags: 121 | value = d.tags[alt_key] 122 | elif rules[0][0] == '>': 123 | # Replace the key 124 | if key in d.tags: 125 | d.tags[rules[0][1:]] = d.tags[key] 126 | del d.tags[key] 127 | elif rules[0][0] == '<': 128 | # Replace the key, the same but backwards 129 | alt_key = rules[0][1:] 130 | if alt_key in d.tags: 131 | d.tags[key] = d.tags[alt_key] 132 | del d.tags[alt_key] 133 | elif rules[0] == '-': 134 | # Delete the tag 135 | if key in d.tags: 136 | del d.tags[key] 137 | else: 138 | # Take the value as written 139 | value = rules[0] 140 | if value is None: 141 | continue 142 | if isinstance(rules, list): 143 | for rule in rules[1:]: 144 | if rule == 'lower': 145 | value = value.lower() 146 | d.tags[key] = value 147 | 148 | 149 | def check_dataset_for_duplicates(profile, dataset, print_all=False): 150 | # First checking for duplicate ids and collecting tags with varying values 151 | ids = set() 152 | tags = {} 153 | found_duplicate_ids = False 154 | for d in dataset: 155 | if d.id in ids: 156 | found_duplicate_ids = True 157 | logging.error('Duplicate id {} in the dataset'.format(d.id)) 158 | ids.add(d.id) 159 | for k, v in d.tags.items(): 160 | if k not in tags: 161 | tags[k] = v 162 | elif tags[k] != '---' and tags[k] != v: 163 | tags[k] = '---' 164 | 165 | # And then for near-duplicate points with similar tags 166 | uncond_distance = profile.get('duplicate_distance', 1) 167 | diff_tags = [k for k in tags if tags[k] == '---'] 168 | kd = kdtree.create(list(dataset)) 169 | duplicates = set() 170 | group = 0 171 | for d in dataset: 172 | if d.id in duplicates: 173 | continue 174 | group += 1 175 | dups = kd.search_knn(d, 2) # The first one will be equal to d 176 | if len(dups) < 2 or dups[1][0].data.distance(d) > profile.max_distance: 177 | continue 178 | for alt, _ in kd.search_knn(d, 20): 179 | dist = alt.data.distance(d) 180 | if alt.data.id != d.id and dist <= profile.max_distance: 181 | tags_differ = 0 182 | if dist > uncond_distance: 183 | for k in diff_tags: 184 | if alt.data.tags.get(k) != d.tags.get(k): 185 | tags_differ += 1 186 | if tags_differ <= len(diff_tags) / 3: 187 | duplicates.add(alt.data.id) 188 | d.exclusive_group = group 189 | alt.data.exclusive_group = group 190 | if print_all or len(duplicates) <= 5: 191 | is_duplicate = tags_differ <= 1 192 | logging.error('Dataset points %s: %s and %s', 193 | 'duplicate each other' if is_duplicate else 'are too similar', 194 | d.id, alt.data.id) 195 | if duplicates: 196 | logging.error('Found %s duplicates in the dataset', len(duplicates)) 197 | if found_duplicate_ids: 198 | raise KeyError('Cannot continue with duplicate ids') 199 | 200 | 201 | def add_regions(dataset, geocoder): 202 | if not geocoder.enabled: 203 | return 204 | if geocoder.filter: 205 | logging.info('Geocoding and filtering points') 206 | else: 207 | logging.info('Geocoding points') 208 | for i in reversed(range(len(dataset))): 209 | region, present = geocoder.find(dataset[i]) 210 | if not present: 211 | del dataset[i] 212 | else: 213 | dataset[i].region = region 214 | -------------------------------------------------------------------------------- /conflate/geocoder.py: -------------------------------------------------------------------------------- 1 | import struct 2 | import logging 3 | import os 4 | import kdtree 5 | 6 | 7 | class Geocoder: 
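    """Simple kd-tree based geocoder for countries and regions.

    Uses a tree of packed place points (places.bin; see
    scripts/pack_places.py) to look up a region for each dataset point.
    An optional filter, set with set_filter(), keeps only points in the
    listed regions; a leading '-' or '^' negates the list. The geocoder
    is enabled when the profile's "regions" parameter is truthy.
    """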
8 | def __init__(self, profile_regions='all'): 9 | self.filter = None 10 | self.enabled = bool(profile_regions) 11 | if self.enabled: 12 | logging.info('Initializing geocoder (this will take a minute)') 13 | self.regions = self.parse_regions(profile_regions) 14 | self.tree = self.load_places_tree() 15 | if not self.tree: 16 | if callable(profile_regions): 17 | logging.warn('Could not read the geocoding file') 18 | else: 19 | logging.error('Could not read the geocoding file, no regions will be added') 20 | self.enabled = False 21 | 22 | def set_filter(self, opt_regions): 23 | if isinstance(opt_regions, str): 24 | self.f_negate = opt_regions[0] in ('-', '^') 25 | if self.f_negate: 26 | opt_regions = opt_regions[1:] 27 | self.filter = set([r.strip() for r in opt_regions.split(',')]) 28 | elif isinstance(opt_regions, list): 29 | self.f_negate = False 30 | self.filter = set(opt_regions) 31 | 32 | def load_places_tree(self): 33 | class PlacePoint: 34 | def __init__(self, lon, lat, country, region): 35 | self.coord = (lon, lat) 36 | self.country = country 37 | self.region = region 38 | 39 | def __len__(self): 40 | return len(self.coord) 41 | 42 | def __getitem__(self, i): 43 | return self.coord[i] 44 | 45 | def unpack_coord(data): 46 | if data[-1] > 0x7f: 47 | data += b'\xFF' 48 | else: 49 | data += b'\0' 50 | return struct.unpack(' 2: 49 | q = '"{}"~"^({})$"'.format(t[0], '|'.join(t[1:])) 50 | else: 51 | q = '"{}"="{}"'.format(t[0], t[1]) 52 | tag_str += '[' + q + ']' 53 | tag_strs.append(tag_str) 54 | 55 | if self.profile.get('no_dataset_id', False): 56 | ref = None 57 | else: 58 | ref = 'nwr["ref:' + self.profile.get( 59 | 'dataset_id', required='A fairly unique id of the dataset to query OSM') + '"]' 60 | timeout = self.profile.get('overpass_timeout', 120) 61 | query = '[out:xml]{};('.format('' if timeout is None else '[timeout:{}]'.format(timeout)) 62 | for bbox in bboxes: 63 | bbox_str = '' if bbox is None else '(' + ','.join([str(x) for x in bbox]) + ')' 64 | for tag_str in tag_strs: 65 | query += 'nwr' + tag_str + bbox_str + ';' 66 | if ref is not None: 67 | if not self.profile.get('bounded_update', False): 68 | query += ref + ';' 69 | else: 70 | for bbox in bboxes: 71 | bbox_str = '' if bbox is None else '(' + ','.join( 72 | [str(x) for x in bbox]) + ')' 73 | query += ref + bbox_str + ';' 74 | query += '); out meta qt center;' 75 | return query 76 | 77 | def get_bbox(self, points): 78 | """Plain iterates over the dataset and returns the bounding box 79 | that encloses it.""" 80 | padding = self.profile.get('bbox_padding', BBOX_PADDING) 81 | bbox = [90.0, 180.0, -90.0, -180.0] 82 | for p in points: 83 | bbox[0] = min(bbox[0], p.lat - padding) 84 | bbox[1] = min(bbox[1], p.lon - padding) 85 | bbox[2] = max(bbox[2], p.lat + padding) 86 | bbox[3] = max(bbox[3], p.lon + padding) 87 | return bbox 88 | 89 | def split_into_bboxes(self, points): 90 | """ 91 | Splits the dataset into multiple bboxes to lower load on the overpass api. 92 | 93 | Returns a list of tuples (minlat, minlon, maxlat, maxlon). 
94 | """ 95 | max_bboxes = self.profile.get('max_request_boxes', 4) 96 | if max_bboxes <= 1 or len(points) <= 1: 97 | return [self.get_bbox(points)] 98 | 99 | # coord, alt coord, total w/h to the left/bottom, total w/h to the right/top 100 | lons = sorted([[d.lon, d.lat, 0, 0] for d in points]) 101 | lats = sorted([[d.lat, d.lon, 0, 0] for d in points]) 102 | 103 | def update_side_dimensions(ar): 104 | """For each point, calculates the maximum and 105 | minimum bound for all points left and right.""" 106 | fwd_top = fwd_bottom = ar[0][1] 107 | back_top = back_bottom = ar[-1][1] 108 | for i in range(len(ar)): 109 | fwd_top = max(fwd_top, ar[i][1]) 110 | fwd_bottom = min(fwd_bottom, ar[i][1]) 111 | ar[i][2] = fwd_top - fwd_bottom 112 | back_top = max(back_top, ar[-i-1][1]) 113 | back_bottom = min(back_bottom, ar[-i-1][1]) 114 | ar[-i-1][3] = back_top - back_bottom 115 | 116 | def find_max_gap(ar, h): 117 | """Select an interval between points, which would give 118 | the maximum area if split there.""" 119 | max_id = None 120 | max_gap = 0 121 | for i in range(len(ar) - 1): 122 | # "Extra" variables are for area to the left and right 123 | # that would be freed after splitting. 124 | extra_left = (ar[i][0]-ar[0][0]) * (h-ar[i][2]) 125 | extra_right = (ar[-1][0]-ar[i+1][0]) * (h-ar[i+1][3]) 126 | # Gap is the area of the column between points i and i+1 127 | # plus extra areas to the left and right. 128 | gap = (ar[i+1][0] - ar[i][0]) * h + extra_left + extra_right 129 | if gap > max_gap: 130 | max_id = i 131 | max_gap = gap 132 | return max_id, max_gap 133 | 134 | def get_bbox(b, pad=0): 135 | """Returns a list of [min_lat, min_lon, max_lat, max_lon] for a box.""" 136 | return [b[2][0][0]-pad, b[3][0][0]-pad, b[2][-1][0]+pad, b[3][-1][0]+pad] 137 | 138 | def split(box, point_array, point_id): 139 | """Split the box over axis point_array at point point_id...point_id+1. 
140 | Modifies the box in-place and returns a new box.""" 141 | alt_array = 5 - point_array # 3->2, 2->3 142 | points = box[point_array][point_id+1:] 143 | del box[point_array][point_id+1:] 144 | alt = {True: [], False: []} # True means point is in new box 145 | for p in box[alt_array]: 146 | alt[(p[1], p[0]) >= (points[0][0], points[0][1])].append(p) 147 | 148 | new_box = [None] * 4 149 | new_box[point_array] = points 150 | new_box[alt_array] = alt[True] 151 | box[alt_array] = alt[False] 152 | for i in range(2): 153 | box[i] = box[i+2][-1][0] - box[i+2][0][0] 154 | new_box[i] = new_box[i+2][-1][0] - new_box[i+2][0][0] 155 | return new_box 156 | 157 | # height, width, lats, lons 158 | boxes = [[lats[-1][0]-lats[0][0], lons[-1][0]-lons[0][0], lats, lons]] 159 | initial_area = boxes[0][0] * boxes[0][1] 160 | while len(boxes) < max_bboxes and len(boxes) <= len(points): 161 | candidate_box = None 162 | area = 0 163 | point_id = None 164 | point_array = None 165 | for box in boxes: 166 | for ar in (2, 3): 167 | # Find a box and an axis for splitting that would decrease the area the most 168 | update_side_dimensions(box[ar]) 169 | max_id, max_area = find_max_gap(box[ar], box[3-ar]) 170 | if max_area > area: 171 | area = max_area 172 | candidate_box = box 173 | point_id = max_id 174 | point_array = ar 175 | if area * 100 < initial_area: 176 | # Stop splitting when the area decrease is less than 1% 177 | break 178 | logging.debug('Splitting bbox %s at %s %s..%s; area decrease %s%%', 179 | get_bbox(candidate_box), 180 | 'longs' if point_array == 3 else 'lats', 181 | candidate_box[point_array][point_id][0], 182 | candidate_box[point_array][point_id+1][0], 183 | round(100*area/initial_area)) 184 | boxes.append(split(candidate_box, point_array, point_id)) 185 | 186 | padding = self.profile.get('bbox_padding', BBOX_PADDING) 187 | return [get_bbox(b, padding) for b in boxes] 188 | 189 | def get_categories(self, tags): 190 | def match_query(tags, query): 191 | for tag in query: 192 | if len(tag) == 1: 193 | return tag[0] in tags 194 | else: 195 | value = tags.get(tag[0], None) 196 | if tag[1] is None or tag[1] == '': 197 | return value is None 198 | if value is None: 199 | return False 200 | found = False 201 | for t2 in tag[1:]: 202 | if t2[0] == '~': 203 | if re.search(t2[1:], value): 204 | found = True 205 | elif t2[0] == '!': 206 | if t2[1:].lower() in value.lower(): 207 | found = True 208 | elif t2 == value: 209 | found = True 210 | if found: 211 | break 212 | if not found: 213 | return False 214 | return True 215 | 216 | def tags_to_query(tags): 217 | return [(k, v) for k, v in tags.items()] 218 | 219 | result = set() 220 | qualifies = self.profile.get('qualifies', args=tags) 221 | if qualifies is not None: 222 | if qualifies: 223 | result.add(None) 224 | return result 225 | 226 | # First check default query 227 | query = self.profile.get('query', None) 228 | if query is not None: 229 | if isinstance(query, str): 230 | result.add(None) 231 | else: 232 | if isinstance(query[0][0], str): 233 | query = [query] 234 | for q in query: 235 | if match_query(tags, q): 236 | result.add(None) 237 | break 238 | 239 | # Then check each category if we got these 240 | categories = self.profile.get('categories', {}) 241 | for name, params in categories.items(): 242 | if 'tags' not in params and 'query' not in params: 243 | raise ValueError('No tags and query attributes for category "{}"'.format(name)) 244 | if match_query(tags, params.get('query', tags_to_query(params.get('tags')))): 245 | result.add(name) 246 | 
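# The set may contain None, which marks a match of the default query or the 'qualifies' check rather than a named category.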
247 | return result 248 | 249 | def calc_boxes(self, dataset_points): 250 | profile_bbox = self.profile.get('bbox', True) 251 | if not profile_bbox: 252 | bboxes = [None] 253 | elif hasattr(profile_bbox, '__len__') and len(profile_bbox) == 4: 254 | bboxes = [profile_bbox] 255 | else: 256 | bboxes = self.split_into_bboxes(dataset_points) 257 | return bboxes 258 | 259 | def download(self, bboxes=None): 260 | """Constructs an Overpass API query and requests objects 261 | to match from a server.""" 262 | if not bboxes: 263 | pbbox = self.profile.get('bbox', True) 264 | if pbbox and hasattr(pbbox, '__len__') and len(pbbox) == 4: 265 | bboxes = [pbbox] 266 | else: 267 | bboxes = [None] 268 | 269 | query = self.construct_overpass_query(bboxes) 270 | logging.debug('Overpass query: %s', query) 271 | r = requests.get(OVERPASS_SERVER + 'interpreter', {'data': query}) 272 | if r.encoding is None: 273 | r.encoding = 'utf-8' 274 | if r.status_code != 200: 275 | logging.error('Failed to download data from Overpass API: %s', r.status_code) 276 | if 'rate_limited' in r.text: 277 | r = requests.get(OVERPASS_SERVER + 'status') 278 | logging.warning('Seems like you are rate limited. API status:\n%s', r.text) 279 | else: 280 | logging.error('Error message: %s', r.text) 281 | raise IOError() 282 | if 'runtime error: ' in r.text: 283 | m = re.search(r'runtime error: ([^<]+)', r.text) 284 | error = 'unknown' if not m else m.group(1) 285 | if 'Query timed out' in error: 286 | logging.error( 287 | 'Query timed out, try increasing the "overpass_timeout" profile variable') 288 | else: 289 | logging.error('Runtime error: %s', error) 290 | raise IOError() 291 | return self.parse_xml(r.content) 292 | 293 | def parse_xml(self, fileobj): 294 | """Parses an OSM XML file into the "osmdata" field. For ways and relations, 295 | finds the center. 
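When the response has no center element, it is approximated by averaging the coordinates of the available member nodes (and member way centers for relations).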
Drops objects that do not match the overpass query tags 296 | (see "check_against_profile_tags" method).""" 297 | if isinstance(fileobj, bytes): 298 | xml = etree.fromstring(fileobj) 299 | else: 300 | xml = etree.parse(fileobj).getroot() 301 | nodes = {} 302 | for nd in xml.findall('node'): 303 | nodes[nd.get('id')] = (float(nd.get('lat')), float(nd.get('lon'))) 304 | ways = {} 305 | for way in xml.findall('way'): 306 | center = way.find('center') 307 | if center is not None: 308 | ways[way.get('id')] = [float(center.get('lat')), float(center.get('lon'))] 309 | else: 310 | logging.debug('Way %s does not have a center', way.get('id')) 311 | coord = [0, 0] 312 | count = 0 313 | for nd in way.findall('nd'): 314 | if nd.get('ref') in nodes: 315 | count += 1 316 | for i in range(len(coord)): 317 | coord[i] += nodes[nd.get('ref')][i] 318 | ways[way.get('id')] = [coord[0] / count, coord[1] / count] 319 | 320 | # For calculating weight of OSM objects 321 | weight_fn = self.profile.get_raw('weight') 322 | osmdata = {} 323 | 324 | for el in xml: 325 | tags = {} 326 | for tag in el.findall('tag'): 327 | tags[tag.get('k')] = tag.get('v') 328 | categories = self.get_categories(tags) 329 | if categories is False or categories is None or len(categories) == 0: 330 | continue 331 | 332 | if el.tag == 'node': 333 | coord = nodes[el.get('id')] 334 | members = None 335 | elif el.tag == 'way': 336 | coord = ways[el.get('id')] 337 | members = [nd.get('ref') for nd in el.findall('nd')] 338 | elif el.tag == 'relation': 339 | center = el.find('center') 340 | if center is not None: 341 | coord = [float(center.get('lat')), float(center.get('lon'))] 342 | else: 343 | logging.debug('Relation %s does not have a center', el.get('id')) 344 | coord = [0, 0] 345 | count = 0 346 | for m in el.findall('member'): 347 | if m.get('type') == 'node' and m.get('ref') in nodes: 348 | count += 1 349 | for i in range(len(coord)): 350 | coord[i] += nodes[m.get('ref')][i] 351 | elif m.get('type') == 'way' and m.get('ref') in ways: 352 | count += 1 353 | for i in range(len(coord)): 354 | coord[i] += ways[m.get('ref')][i] 355 | if count > 0: 356 | coord = [coord[0] / count, coord[1] / count] 357 | members = [ 358 | (m.get('type'), m.get('ref'), m.get('role')) 359 | for m in el.findall('member') 360 | ] 361 | else: 362 | continue 363 | if not coord or coord == [0, 0]: 364 | continue 365 | pt = OSMPoint( 366 | el.tag, int(el.get('id')), int(el.get('version')), 367 | coord[0], coord[1], tags, categories) 368 | pt.members = members 369 | if pt.is_poi(): 370 | if callable(weight_fn): 371 | weight = weight_fn(pt) 372 | if weight: 373 | if abs(weight) > 3: 374 | pt.dist_offset = weight 375 | else: 376 | pt.dist_offset = weight * self.profile.max_distance 377 | osmdata[pt.id] = pt 378 | return osmdata 379 | 380 | 381 | def check_moveability(changes): 382 | to_check = [x for x in changes if x['properties']['osm_type'] == 'node' and 383 | x['properties']['action'] == 'modify'] 384 | logging.info('Checking moveability of %s modified nodes', len(to_check)) 385 | for c in to_check: 386 | p = c['properties'] 387 | p['can_move'] = False 388 | r = requests.get('{}node/{}/ways'.format(OSM_API_SERVER, p['osm_id'])) 389 | if r.status_code == 200: 390 | xml = etree.fromstring(r.content) 391 | p['can_move'] = xml.find('way') is None 392 | -------------------------------------------------------------------------------- /conflate/places.bin: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mapsme/osm_conflate/a7af835ce44b3ac194469b53b7f388bba168cbe4/conflate/places.bin -------------------------------------------------------------------------------- /conflate/profile.py: -------------------------------------------------------------------------------- 1 | import json 2 | from .data import SourcePoint # So we don't have to import this in profiles 3 | from . import etree 4 | 5 | 6 | class ProfileException(Exception): 7 | """An exception class for the Profile instance.""" 8 | def __init__(self, attr, desc): 9 | super().__init__('Field missing in profile: {} ({})'.format(attr, desc)) 10 | 11 | 12 | class Profile: 13 | """A wrapper for a profile. 14 | 15 | A profile is a python script that sets a few local variables. 16 | These variables become properties of the profile, accessible with 17 | a "get" method. If something is a function, it will be called, 18 | optional parameters might be passed to it. 19 | 20 | You can compile a list of all supported variables by grepping through 21 | this code, or by looking at a few example profiles. If something 22 | is required, you will be notified of that. 23 | """ 24 | def __init__(self, fileobj, par=None): 25 | global param 26 | param = par 27 | if isinstance(fileobj, dict): 28 | self.profile = fileobj 29 | elif hasattr(fileobj, 'read'): 30 | s = fileobj.read().replace('\r', '') 31 | if s[0] == '{': 32 | self.profile = json.loads(s) 33 | else: 34 | self.profile = {} 35 | exec(s, globals(), self.profile) 36 | else: 37 | # Got a class 38 | self.profile = {name: getattr(fileobj, name) 39 | for name in dir(fileobj) if not name.startswith('_')} 40 | self.max_distance = self.get('max_distance', 100) 41 | 42 | def has(self, attr): 43 | return attr in self.profile 44 | 45 | def get(self, attr, default=None, required=None, args=None): 46 | if attr in self.profile: 47 | value = self.profile[attr] 48 | if callable(value): 49 | if args is None: 50 | return value() 51 | else: 52 | return value(*args) 53 | else: 54 | return value 55 | if required is not None: 56 | raise ProfileException(attr, required) 57 | return default 58 | 59 | def get_raw(self, attr, default=None): 60 | if attr in self.profile: 61 | return self.profile[attr] 62 | return default 63 | -------------------------------------------------------------------------------- /conflate/version.py: -------------------------------------------------------------------------------- 1 | __version__ = '1.4.1' 2 | -------------------------------------------------------------------------------- /filter/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 2.8) 2 | set(NAME filter_planet_by_cats) 3 | project(${NAME} C CXX) 4 | set(CMAKE_CXX_STANDARD 11) 5 | message(STATUS "Configuring ${NAME}") 6 | list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}") 7 | find_package(Osmium REQUIRED COMPONENTS io) 8 | include_directories(SYSTEM ${OSMIUM_INCLUDE_DIRS}) 9 | add_executable( 10 | ${NAME} 11 | ${NAME}.cpp 12 | RTree.h 13 | xml_centers_output.hpp 14 | ) 15 | target_link_libraries(${NAME} ${OSMIUM_IO_LIBRARIES}) 16 | -------------------------------------------------------------------------------- /filter/FindOsmium.cmake: -------------------------------------------------------------------------------- 1 | #---------------------------------------------------------------------- 2 | # 3 | # FindOsmium.cmake 4 | # 5 | # Find the Libosmium headers and, optionally, several components needed 6 | # for different 
Libosmium functions. 7 | # 8 | #---------------------------------------------------------------------- 9 | # 10 | # Usage: 11 | # 12 | # Copy this file somewhere into your project directory, where cmake can 13 | # find it. Usually this will be a directory called "cmake" which you can 14 | # add to the CMake module search path with the following line in your 15 | # CMakeLists.txt: 16 | # 17 | # list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake") 18 | # 19 | # Then add the following in your CMakeLists.txt: 20 | # 21 | # find_package(Osmium [version] REQUIRED COMPONENTS ) 22 | # include_directories(SYSTEM ${OSMIUM_INCLUDE_DIRS}) 23 | # 24 | # The version number is optional. If it is not set, any version of 25 | # libosmium will do. 26 | # 27 | # For the substitute a space separated list of one or more of the 28 | # following components: 29 | # 30 | # pbf - include libraries needed for PBF input and output 31 | # xml - include libraries needed for XML input and output 32 | # io - include libraries needed for any type of input/output 33 | # geos - include if you want to use any of the GEOS functions 34 | # gdal - include if you want to use any of the OGR functions 35 | # proj - include if you want to use any of the Proj.4 functions 36 | # sparsehash - include if you use the sparsehash index 37 | # 38 | # You can check for success with something like this: 39 | # 40 | # if(NOT OSMIUM_FOUND) 41 | # message(WARNING "Libosmium not found!\n") 42 | # endif() 43 | # 44 | #---------------------------------------------------------------------- 45 | # 46 | # Variables: 47 | # 48 | # OSMIUM_FOUND - True if Osmium found. 49 | # OSMIUM_INCLUDE_DIRS - Where to find include files. 50 | # OSMIUM_XML_LIBRARIES - Libraries needed for XML I/O. 51 | # OSMIUM_PBF_LIBRARIES - Libraries needed for PBF I/O. 52 | # OSMIUM_IO_LIBRARIES - Libraries needed for XML or PBF I/O. 53 | # OSMIUM_LIBRARIES - All libraries Osmium uses somewhere. 54 | # 55 | #---------------------------------------------------------------------- 56 | 57 | # This is the list of directories where we look for osmium includes. 58 | set(_osmium_include_path 59 | ../libosmium 60 | ~/Library/Frameworks 61 | /Library/Frameworks 62 | /opt/local # DarwinPorts 63 | /opt 64 | ) 65 | 66 | # Look for the header file. 
67 | find_path(OSMIUM_INCLUDE_DIR osmium/version.hpp 68 | PATH_SUFFIXES include 69 | PATHS ${_osmium_include_path} 70 | ) 71 | 72 | # Check libosmium version number 73 | if(Osmium_FIND_VERSION) 74 | file(STRINGS "${OSMIUM_INCLUDE_DIR}/osmium/version.hpp" _libosmium_version_define REGEX "#define LIBOSMIUM_VERSION_STRING") 75 | if("${_libosmium_version_define}" MATCHES "#define LIBOSMIUM_VERSION_STRING \"([0-9.]+)\"") 76 | set(_libosmium_version "${CMAKE_MATCH_1}") 77 | else() 78 | set(_libosmium_version "unknown") 79 | endif() 80 | endif() 81 | 82 | set(OSMIUM_INCLUDE_DIRS "${OSMIUM_INCLUDE_DIR}") 83 | 84 | #---------------------------------------------------------------------- 85 | # 86 | # Check for optional components 87 | # 88 | #---------------------------------------------------------------------- 89 | if(Osmium_FIND_COMPONENTS) 90 | foreach(_component ${Osmium_FIND_COMPONENTS}) 91 | string(TOUPPER ${_component} _component_uppercase) 92 | set(Osmium_USE_${_component_uppercase} TRUE) 93 | endforeach() 94 | endif() 95 | 96 | #---------------------------------------------------------------------- 97 | # Component 'io' is an alias for 'pbf' and 'xml' 98 | if(Osmium_USE_IO) 99 | set(Osmium_USE_PBF TRUE) 100 | set(Osmium_USE_XML TRUE) 101 | endif() 102 | 103 | #---------------------------------------------------------------------- 104 | # Component 'ogr' is an alias for 'gdal' 105 | if(Osmium_USE_OGR) 106 | set(Osmium_USE_GDAL TRUE) 107 | endif() 108 | 109 | #---------------------------------------------------------------------- 110 | # Component 'pbf' 111 | if(Osmium_USE_PBF) 112 | find_package(ZLIB) 113 | find_package(Threads) 114 | find_package(Protozero 1.5.1) 115 | 116 | list(APPEND OSMIUM_EXTRA_FIND_VARS ZLIB_FOUND Threads_FOUND PROTOZERO_INCLUDE_DIR) 117 | if(ZLIB_FOUND AND Threads_FOUND AND PROTOZERO_FOUND) 118 | list(APPEND OSMIUM_PBF_LIBRARIES 119 | ${ZLIB_LIBRARIES} 120 | ${CMAKE_THREAD_LIBS_INIT} 121 | ) 122 | list(APPEND OSMIUM_INCLUDE_DIRS 123 | ${ZLIB_INCLUDE_DIR} 124 | ${PROTOZERO_INCLUDE_DIR} 125 | ) 126 | else() 127 | message(WARNING "Osmium: Can not find some libraries for PBF input/output, please install them or configure the paths.") 128 | endif() 129 | endif() 130 | 131 | #---------------------------------------------------------------------- 132 | # Component 'xml' 133 | if(Osmium_USE_XML) 134 | find_package(EXPAT) 135 | find_package(BZip2) 136 | find_package(ZLIB) 137 | find_package(Threads) 138 | 139 | list(APPEND OSMIUM_EXTRA_FIND_VARS EXPAT_FOUND BZIP2_FOUND ZLIB_FOUND Threads_FOUND) 140 | if(EXPAT_FOUND AND BZIP2_FOUND AND ZLIB_FOUND AND Threads_FOUND) 141 | list(APPEND OSMIUM_XML_LIBRARIES 142 | ${EXPAT_LIBRARIES} 143 | ${BZIP2_LIBRARIES} 144 | ${ZLIB_LIBRARIES} 145 | ${CMAKE_THREAD_LIBS_INIT} 146 | ) 147 | list(APPEND OSMIUM_INCLUDE_DIRS 148 | ${EXPAT_INCLUDE_DIR} 149 | ${BZIP2_INCLUDE_DIR} 150 | ${ZLIB_INCLUDE_DIR} 151 | ) 152 | else() 153 | message(WARNING "Osmium: Can not find some libraries for XML input/output, please install them or configure the paths.") 154 | endif() 155 | endif() 156 | 157 | #---------------------------------------------------------------------- 158 | list(APPEND OSMIUM_IO_LIBRARIES 159 | ${OSMIUM_PBF_LIBRARIES} 160 | ${OSMIUM_XML_LIBRARIES} 161 | ) 162 | 163 | list(APPEND OSMIUM_LIBRARIES 164 | ${OSMIUM_IO_LIBRARIES} 165 | ) 166 | 167 | #---------------------------------------------------------------------- 168 | # Component 'geos' 169 | if(Osmium_USE_GEOS) 170 | find_path(GEOS_INCLUDE_DIR geos/geom.h) 171 | find_library(GEOS_LIBRARY 
NAMES geos) 172 | 173 | list(APPEND OSMIUM_EXTRA_FIND_VARS GEOS_INCLUDE_DIR GEOS_LIBRARY) 174 | if(GEOS_INCLUDE_DIR AND GEOS_LIBRARY) 175 | SET(GEOS_FOUND 1) 176 | list(APPEND OSMIUM_LIBRARIES ${GEOS_LIBRARY}) 177 | list(APPEND OSMIUM_INCLUDE_DIRS ${GEOS_INCLUDE_DIR}) 178 | else() 179 | message(WARNING "Osmium: GEOS library is required but not found, please install it or configure the paths.") 180 | endif() 181 | endif() 182 | 183 | #---------------------------------------------------------------------- 184 | # Component 'gdal' (alias 'ogr') 185 | if(Osmium_USE_GDAL) 186 | find_package(GDAL) 187 | 188 | list(APPEND OSMIUM_EXTRA_FIND_VARS GDAL_FOUND) 189 | if(GDAL_FOUND) 190 | list(APPEND OSMIUM_LIBRARIES ${GDAL_LIBRARIES}) 191 | list(APPEND OSMIUM_INCLUDE_DIRS ${GDAL_INCLUDE_DIRS}) 192 | else() 193 | message(WARNING "Osmium: GDAL library is required but not found, please install it or configure the paths.") 194 | endif() 195 | endif() 196 | 197 | #---------------------------------------------------------------------- 198 | # Component 'proj' 199 | if(Osmium_USE_PROJ) 200 | find_path(PROJ_INCLUDE_DIR proj_api.h) 201 | find_library(PROJ_LIBRARY NAMES proj) 202 | 203 | list(APPEND OSMIUM_EXTRA_FIND_VARS PROJ_INCLUDE_DIR PROJ_LIBRARY) 204 | if(PROJ_INCLUDE_DIR AND PROJ_LIBRARY) 205 | set(PROJ_FOUND 1) 206 | list(APPEND OSMIUM_LIBRARIES ${PROJ_LIBRARY}) 207 | list(APPEND OSMIUM_INCLUDE_DIRS ${PROJ_INCLUDE_DIR}) 208 | else() 209 | message(WARNING "Osmium: PROJ.4 library is required but not found, please install it or configure the paths.") 210 | endif() 211 | endif() 212 | 213 | #---------------------------------------------------------------------- 214 | # Component 'sparsehash' 215 | if(Osmium_USE_SPARSEHASH) 216 | find_path(SPARSEHASH_INCLUDE_DIR google/sparsetable) 217 | 218 | list(APPEND OSMIUM_EXTRA_FIND_VARS SPARSEHASH_INCLUDE_DIR) 219 | if(SPARSEHASH_INCLUDE_DIR) 220 | # Find size of sparsetable::size_type. This does not work on older 221 | # CMake versions because they can do this check only in C, not in C++. 222 | if(NOT CMAKE_VERSION VERSION_LESS 3.0) 223 | include(CheckTypeSize) 224 | set(CMAKE_REQUIRED_INCLUDES ${SPARSEHASH_INCLUDE_DIR}) 225 | set(CMAKE_EXTRA_INCLUDE_FILES "google/sparsetable") 226 | check_type_size("google::sparsetable::size_type" SPARSETABLE_SIZE_TYPE LANGUAGE CXX) 227 | set(CMAKE_EXTRA_INCLUDE_FILES) 228 | set(CMAKE_REQUIRED_INCLUDES) 229 | else() 230 | set(SPARSETABLE_SIZE_TYPE ${CMAKE_SIZEOF_VOID_P}) 231 | endif() 232 | 233 | # Sparsetable::size_type must be at least 8 bytes (64bit), otherwise 234 | # OSM object IDs will not fit. 
235 | if(SPARSETABLE_SIZE_TYPE GREATER 7) 236 | set(SPARSEHASH_FOUND 1) 237 | add_definitions(-DOSMIUM_WITH_SPARSEHASH=${SPARSEHASH_FOUND}) 238 | list(APPEND OSMIUM_INCLUDE_DIRS ${SPARSEHASH_INCLUDE_DIR}) 239 | else() 240 | message(WARNING "Osmium: Disabled Google SparseHash library on 32bit system (size_type=${SPARSETABLE_SIZE_TYPE}).") 241 | endif() 242 | else() 243 | message(WARNING "Osmium: Google SparseHash library is required but not found, please install it or configure the paths.") 244 | endif() 245 | endif() 246 | 247 | #---------------------------------------------------------------------- 248 | 249 | list(REMOVE_DUPLICATES OSMIUM_INCLUDE_DIRS) 250 | 251 | if(OSMIUM_XML_LIBRARIES) 252 | list(REMOVE_DUPLICATES OSMIUM_XML_LIBRARIES) 253 | endif() 254 | 255 | if(OSMIUM_PBF_LIBRARIES) 256 | list(REMOVE_DUPLICATES OSMIUM_PBF_LIBRARIES) 257 | endif() 258 | 259 | if(OSMIUM_IO_LIBRARIES) 260 | list(REMOVE_DUPLICATES OSMIUM_IO_LIBRARIES) 261 | endif() 262 | 263 | if(OSMIUM_LIBRARIES) 264 | list(REMOVE_DUPLICATES OSMIUM_LIBRARIES) 265 | endif() 266 | 267 | #---------------------------------------------------------------------- 268 | # 269 | # Check that all required libraries are available 270 | # 271 | #---------------------------------------------------------------------- 272 | if(OSMIUM_EXTRA_FIND_VARS) 273 | list(REMOVE_DUPLICATES OSMIUM_EXTRA_FIND_VARS) 274 | endif() 275 | # Handle the QUIETLY and REQUIRED arguments and the optional version check 276 | # and set OSMIUM_FOUND to TRUE if all listed variables are TRUE. 277 | include(FindPackageHandleStandardArgs) 278 | find_package_handle_standard_args(Osmium 279 | REQUIRED_VARS OSMIUM_INCLUDE_DIR ${OSMIUM_EXTRA_FIND_VARS} 280 | VERSION_VAR _libosmium_version) 281 | unset(OSMIUM_EXTRA_FIND_VARS) 282 | 283 | #---------------------------------------------------------------------- 284 | # 285 | # A function for setting the -pthread option in compilers/linkers 286 | # 287 | #---------------------------------------------------------------------- 288 | function(set_pthread_on_target _target) 289 | if(NOT MSVC) 290 | set_target_properties(${_target} PROPERTIES COMPILE_FLAGS "-pthread") 291 | if(NOT APPLE) 292 | set_target_properties(${_target} PROPERTIES LINK_FLAGS "-pthread") 293 | endif() 294 | endif() 295 | endfunction() 296 | 297 | #---------------------------------------------------------------------- 298 | # 299 | # Add compiler flags 300 | # 301 | #---------------------------------------------------------------------- 302 | add_definitions(-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64) 303 | 304 | if(MSVC) 305 | add_definitions(-wd4996) 306 | 307 | # Disable warning C4068: "unknown pragma" because we want it to ignore 308 | # pragmas for other compilers. 309 | add_definitions(-wd4068) 310 | 311 | # Disable warning C4715: "not all control paths return a value" because 312 | # it generates too many false positives. 313 | add_definitions(-wd4715) 314 | 315 | # Disable warning C4351: new behavior: elements of array '...' will be 316 | # default initialized. The new behaviour is correct and we don't support 317 | # old compilers anyway. 
318 | add_definitions(-wd4351) 319 | 320 | # Disable warning C4503: "decorated name length exceeded, name was truncated" 321 | # there are more than 150 of generated names in libosmium longer than 4096 symbols supported in MSVC 322 | add_definitions(-wd4503) 323 | 324 | add_definitions(-DNOMINMAX -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS) 325 | endif() 326 | 327 | if(APPLE) 328 | # following only available from cmake 2.8.12: 329 | # add_compile_options(-stdlib=libc++) 330 | # so using this instead: 331 | add_definitions(-stdlib=libc++) 332 | set(LDFLAGS ${LDFLAGS} -stdlib=libc++) 333 | endif() 334 | 335 | #---------------------------------------------------------------------- 336 | 337 | # This is a set of recommended warning options that can be added when compiling 338 | # libosmium code. 339 | if(MSVC) 340 | set(OSMIUM_WARNING_OPTIONS "/W3 /wd4514" CACHE STRING "Recommended warning options for libosmium") 341 | else() 342 | set(OSMIUM_WARNING_OPTIONS "-Wall -Wextra -pedantic -Wredundant-decls -Wdisabled-optimization -Wctor-dtor-privacy -Wnon-virtual-dtor -Woverloaded-virtual -Wsign-promo -Wold-style-cast" CACHE STRING "Recommended warning options for libosmium") 343 | endif() 344 | 345 | set(OSMIUM_DRACONIC_CLANG_OPTIONS "-Wdocumentation -Wunused-exception-parameter -Wmissing-declarations -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-unused-macros -Wno-exit-time-destructors -Wno-global-constructors -Wno-padded -Wno-switch-enum -Wno-missing-prototypes -Wno-weak-vtables -Wno-cast-align -Wno-float-equal") 346 | 347 | if(Osmium_DEBUG) 348 | message(STATUS "OSMIUM_XML_LIBRARIES=" ${OSMIUM_XML_LIBRARIES}) 349 | message(STATUS "OSMIUM_PBF_LIBRARIES=" ${OSMIUM_PBF_LIBRARIES}) 350 | message(STATUS "OSMIUM_IO_LIBRARIES=" ${OSMIUM_IO_LIBRARIES}) 351 | message(STATUS "OSMIUM_LIBRARIES=" ${OSMIUM_LIBRARIES}) 352 | message(STATUS "OSMIUM_INCLUDE_DIRS=" ${OSMIUM_INCLUDE_DIRS}) 353 | endif() 354 | 355 | -------------------------------------------------------------------------------- /filter/FindProtozero.cmake: -------------------------------------------------------------------------------- 1 | #---------------------------------------------------------------------- 2 | # 3 | # FindProtozero.cmake 4 | # 5 | # Find the protozero headers. 6 | # 7 | #---------------------------------------------------------------------- 8 | # 9 | # Usage: 10 | # 11 | # Copy this file somewhere into your project directory, where cmake can 12 | # find it. Usually this will be a directory called "cmake" which you can 13 | # add to the CMake module search path with the following line in your 14 | # CMakeLists.txt: 15 | # 16 | # list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake") 17 | # 18 | # Then add the following in your CMakeLists.txt: 19 | # 20 | # find_package(Protozero [version] [REQUIRED]) 21 | # include_directories(SYSTEM ${PROTOZERO_INCLUDE_DIR}) 22 | # 23 | # The version number is optional. If it is not set, any version of 24 | # protozero will do. 25 | # 26 | # if(NOT PROTOZERO_FOUND) 27 | # message(WARNING "Protozero not found!\n") 28 | # endif() 29 | # 30 | #---------------------------------------------------------------------- 31 | # 32 | # Variables: 33 | # 34 | # PROTOZERO_FOUND - True if Protozero was found. 35 | # PROTOZERO_INCLUDE_DIR - Where to find include files. 
36 | # 37 | #---------------------------------------------------------------------- 38 | 39 | # find include path 40 | find_path(PROTOZERO_INCLUDE_DIR protozero/version.hpp 41 | PATH_SUFFIXES include 42 | PATHS ${CMAKE_SOURCE_DIR}/../protozero 43 | ) 44 | 45 | # Check version number 46 | if(Protozero_FIND_VERSION) 47 | file(STRINGS "${PROTOZERO_INCLUDE_DIR}/protozero/version.hpp" _version_define REGEX "#define PROTOZERO_VERSION_STRING") 48 | if("${_version_define}" MATCHES "#define PROTOZERO_VERSION_STRING \"([0-9.]+)\"") 49 | set(_version "${CMAKE_MATCH_1}") 50 | else() 51 | set(_version "unknown") 52 | endif() 53 | endif() 54 | 55 | #set(PROTOZERO_INCLUDE_DIRS "${PROTOZERO_INCLUDE_DIR}") 56 | 57 | include(FindPackageHandleStandardArgs) 58 | find_package_handle_standard_args(Protozero 59 | REQUIRED_VARS PROTOZERO_INCLUDE_DIR 60 | VERSION_VAR _version) 61 | 62 | 63 | #---------------------------------------------------------------------- 64 | -------------------------------------------------------------------------------- /filter/README.md: -------------------------------------------------------------------------------- 1 | # Filtering OSM by external dataset 2 | 3 | When you have points of multiple categories, an Overpass API request may fail 4 | because of the number of query clauses. In that case, you will need to filter the planet 5 | file yourself. First, prepare a list of categories and dataset points: 6 | 7 | conflate.py profile.py -f points.lst 8 | 9 | Then compile the filtering tool: 10 | 11 | mkdir build && cd build 12 | cmake .. 13 | make 14 | 15 | Download a planet file or an extract for the country of import, update it to the minute, 16 | and feed it to the filtering tool: 17 | 18 | ./filter_planet_by_cats points.lst planet-latest.osm.pbf > filtered.osm 19 | 20 | This will take an hour or two. The resulting OSM file should be used as an input to 21 | the conflation tool: 22 | 23 | conflate.py profile.py --osm filtered.osm -c changes.json 24 | 25 | ## Authors and License 26 | 27 | The `filter_planet_by_cats` script was written by Ilya Zverev for MAPS.ME and 28 | published under Apache License 2.0. 29 | 30 | The `xml_centers_output.hpp` and `*.cmake` files are based on 31 | [libosmium](https://github.com/osmcode/libosmium) code and hence published 32 | under the Boost License terms. 33 | 34 | `RTree.h` is in the public domain, downloaded from 35 | [this repository](https://github.com/nushoin/RTree). 36 | -------------------------------------------------------------------------------- /filter/RTree.h: -------------------------------------------------------------------------------- 1 | #ifndef RTREE_H 2 | #define RTREE_H 3 | 4 | // NOTE This file compiles under MSVC 6 SP5 and MSVC .Net 2003 it may not work on other compilers without modification. 5 | 6 | // NOTE These next few lines may be win32 specific, you may need to modify them to compile on other platform 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | 14 | #define ASSERT assert // RTree uses ASSERT( condition ) 15 | #ifndef Min 16 | #define Min std::min 17 | #endif //Min 18 | #ifndef Max 19 | #define Max std::max 20 | #endif //Max 21 | 22 | // 23 | // RTree.h 24 | // 25 | 26 | #define RTREE_TEMPLATE template 27 | #define RTREE_QUAL RTree 28 | 29 | #define RTREE_DONT_USE_MEMPOOLS // This version does not contain a fixed memory allocator, fill in lines with EXAMPLE to implement one.
30 | #define RTREE_USE_SPHERICAL_VOLUME // Better split classification, may be slower on some systems 31 | 32 | // Fwd decl 33 | class RTFileStream; // File I/O helper class, look below for implementation and notes. 34 | 35 | 36 | /// \class RTree 37 | /// Implementation of RTree, a multidimensional bounding rectangle tree. 38 | /// Example usage: For a 3-dimensional tree use RTree myTree; 39 | /// 40 | /// This modified, templated C++ version by Greg Douglas at Auran (http://www.auran.com) 41 | /// 42 | /// DATATYPE Referenced data, should be int, void*, obj* etc. no larger than sizeof and simple type 43 | /// ELEMTYPE Type of element such as int or float 44 | /// NUMDIMS Number of dimensions such as 2 or 3 45 | /// ELEMTYPEREAL Type of element that allows fractional and large values such as float or double, for use in volume calcs 46 | /// 47 | /// NOTES: Inserting and removing data requires the knowledge of its constant Minimal Bounding Rectangle. 48 | /// This version uses new/delete for nodes, I recommend using a fixed size allocator for efficiency. 49 | /// Instead of using a callback function for returned results, I recommend and efficient pre-sized, grow-only memory 50 | /// array similar to MFC CArray or STL Vector for returning search query result. 51 | /// 52 | template 54 | class RTree 55 | { 56 | protected: 57 | 58 | struct Node; // Fwd decl. Used by other internal structs and iterator 59 | 60 | public: 61 | 62 | // These constant must be declared after Branch and before Node struct 63 | // Stuck up here for MSVC 6 compiler. NSVC .NET 2003 is much happier. 64 | enum 65 | { 66 | MAXNODES = TMAXNODES, ///< Max elements in node 67 | MINNODES = TMINNODES, ///< Min elements in node 68 | }; 69 | 70 | typedef bool (*t_resultCallback)(DATATYPE, void*); 71 | 72 | public: 73 | 74 | RTree(); 75 | virtual ~RTree(); 76 | 77 | /// Insert entry 78 | /// \param a_min Min of bounding rect 79 | /// \param a_max Max of bounding rect 80 | /// \param a_dataId Positive Id of data. Maybe zero, but negative numbers not allowed. 81 | void Insert(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId); 82 | 83 | /// Remove entry 84 | /// \param a_min Min of bounding rect 85 | /// \param a_max Max of bounding rect 86 | /// \param a_dataId Positive Id of data. Maybe zero, but negative numbers not allowed. 87 | void Remove(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId); 88 | 89 | /// Find all within search rectangle 90 | /// \param a_min Min of search bounding rect 91 | /// \param a_max Max of search bounding rect 92 | /// \param a_searchResult Search result array. Caller should set grow size. Function will reset, not append to array. 93 | /// \param a_resultCallback Callback function to return result. Callback should return 'true' to continue searching 94 | /// \param a_context User context to pass as parameter to a_resultCallback 95 | /// \return Returns the number of entries found 96 | int Search(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], t_resultCallback a_resultCallback, void* a_context); 97 | 98 | /// Remove all entries from tree 99 | void RemoveAll(); 100 | 101 | /// Count the data elements in this container. This is slow as no internal counter is maintained. 
102 | int Count(); 103 | 104 | /// Load tree contents from file 105 | bool Load(const char* a_fileName); 106 | /// Load tree contents from stream 107 | bool Load(RTFileStream& a_stream); 108 | 109 | 110 | /// Save tree contents to file 111 | bool Save(const char* a_fileName); 112 | /// Save tree contents to stream 113 | bool Save(RTFileStream& a_stream); 114 | 115 | /// Iterator is not remove safe. 116 | class Iterator 117 | { 118 | private: 119 | 120 | enum { MAX_STACK = 32 }; // Max stack size. Allows almost n^32 where n is number of branches in node 121 | 122 | struct StackElement 123 | { 124 | Node* m_node; 125 | int m_branchIndex; 126 | }; 127 | 128 | public: 129 | 130 | Iterator() { Init(); } 131 | 132 | ~Iterator() { } 133 | 134 | /// Is iterator invalid 135 | bool IsNull() { return (m_tos <= 0); } 136 | 137 | /// Is iterator pointing to valid data 138 | bool IsNotNull() { return (m_tos > 0); } 139 | 140 | /// Access the current data element. Caller must be sure iterator is not NULL first. 141 | DATATYPE& operator*() 142 | { 143 | ASSERT(IsNotNull()); 144 | StackElement& curTos = m_stack[m_tos - 1]; 145 | return curTos.m_node->m_branch[curTos.m_branchIndex].m_data; 146 | } 147 | 148 | /// Access the current data element. Caller must be sure iterator is not NULL first. 149 | const DATATYPE& operator*() const 150 | { 151 | ASSERT(IsNotNull()); 152 | StackElement& curTos = m_stack[m_tos - 1]; 153 | return curTos.m_node->m_branch[curTos.m_branchIndex].m_data; 154 | } 155 | 156 | /// Find the next data element 157 | bool operator++() { return FindNextData(); } 158 | 159 | /// Get the bounds for this node 160 | void GetBounds(ELEMTYPE a_min[NUMDIMS], ELEMTYPE a_max[NUMDIMS]) 161 | { 162 | ASSERT(IsNotNull()); 163 | StackElement& curTos = m_stack[m_tos - 1]; 164 | Branch& curBranch = curTos.m_node->m_branch[curTos.m_branchIndex]; 165 | 166 | for(int index = 0; index < NUMDIMS; ++index) 167 | { 168 | a_min[index] = curBranch.m_rect.m_min[index]; 169 | a_max[index] = curBranch.m_rect.m_max[index]; 170 | } 171 | } 172 | 173 | private: 174 | 175 | /// Reset iterator 176 | void Init() { m_tos = 0; } 177 | 178 | /// Find the next data element in the tree (For internal use only) 179 | bool FindNextData() 180 | { 181 | for(;;) 182 | { 183 | if(m_tos <= 0) 184 | { 185 | return false; 186 | } 187 | StackElement curTos = Pop(); // Copy stack top cause it may change as we use it 188 | 189 | if(curTos.m_node->IsLeaf()) 190 | { 191 | // Keep walking through data while we can 192 | if(curTos.m_branchIndex+1 < curTos.m_node->m_count) 193 | { 194 | // There is more data, just point to the next one 195 | Push(curTos.m_node, curTos.m_branchIndex + 1); 196 | return true; 197 | } 198 | // No more data, so it will fall back to previous level 199 | } 200 | else 201 | { 202 | if(curTos.m_branchIndex+1 < curTos.m_node->m_count) 203 | { 204 | // Push sibling on for future tree walk 205 | // This is the 'fall back' node when we finish with the current level 206 | Push(curTos.m_node, curTos.m_branchIndex + 1); 207 | } 208 | // Since cur node is not a leaf, push first of next level to get deeper into the tree 209 | Node* nextLevelnode = curTos.m_node->m_branch[curTos.m_branchIndex].m_child; 210 | Push(nextLevelnode, 0); 211 | 212 | // If we pushed on a new leaf, exit as the data is ready at TOS 213 | if(nextLevelnode->IsLeaf()) 214 | { 215 | return true; 216 | } 217 | } 218 | } 219 | } 220 | 221 | /// Push node and branch onto iteration stack (For internal use only) 222 | void Push(Node* a_node, int a_branchIndex) 223 
| { 224 | m_stack[m_tos].m_node = a_node; 225 | m_stack[m_tos].m_branchIndex = a_branchIndex; 226 | ++m_tos; 227 | ASSERT(m_tos <= MAX_STACK); 228 | } 229 | 230 | /// Pop element off iteration stack (For internal use only) 231 | StackElement& Pop() 232 | { 233 | ASSERT(m_tos > 0); 234 | --m_tos; 235 | return m_stack[m_tos]; 236 | } 237 | 238 | StackElement m_stack[MAX_STACK]; ///< Stack as we are doing iteration instead of recursion 239 | int m_tos; ///< Top Of Stack index 240 | 241 | friend class RTree; // Allow hiding of non-public functions while allowing manipulation by logical owner 242 | }; 243 | 244 | /// Get 'first' for iteration 245 | void GetFirst(Iterator& a_it) 246 | { 247 | a_it.Init(); 248 | Node* first = m_root; 249 | while(first) 250 | { 251 | if(first->IsInternalNode() && first->m_count > 1) 252 | { 253 | a_it.Push(first, 1); // Descend sibling branch later 254 | } 255 | else if(first->IsLeaf()) 256 | { 257 | if(first->m_count) 258 | { 259 | a_it.Push(first, 0); 260 | } 261 | break; 262 | } 263 | first = first->m_branch[0].m_child; 264 | } 265 | } 266 | 267 | /// Get Next for iteration 268 | void GetNext(Iterator& a_it) { ++a_it; } 269 | 270 | /// Is iterator NULL, or at end? 271 | bool IsNull(Iterator& a_it) { return a_it.IsNull(); } 272 | 273 | /// Get object at iterator position 274 | DATATYPE& GetAt(Iterator& a_it) { return *a_it; } 275 | 276 | protected: 277 | 278 | /// Minimal bounding rectangle (n-dimensional) 279 | struct Rect 280 | { 281 | ELEMTYPE m_min[NUMDIMS]; ///< Min dimensions of bounding box 282 | ELEMTYPE m_max[NUMDIMS]; ///< Max dimensions of bounding box 283 | }; 284 | 285 | /// May be data or may be another subtree 286 | /// The parents level determines this. 287 | /// If the parents level is 0, then this is data 288 | struct Branch 289 | { 290 | Rect m_rect; ///< Bounds 291 | Node* m_child; ///< Child node 292 | DATATYPE m_data; ///< Data Id 293 | }; 294 | 295 | /// Node for each branch level 296 | struct Node 297 | { 298 | bool IsInternalNode() { return (m_level > 0); } // Not a leaf, but a internal node 299 | bool IsLeaf() { return (m_level == 0); } // A leaf, contains data 300 | 301 | int m_count; ///< Count 302 | int m_level; ///< Leaf is zero, others positive 303 | Branch m_branch[MAXNODES]; ///< Branch 304 | }; 305 | 306 | /// A link list of nodes for reinsertion after a delete operation 307 | struct ListNode 308 | { 309 | ListNode* m_next; ///< Next in list 310 | Node* m_node; ///< Node 311 | }; 312 | 313 | /// Variables for finding a split partition 314 | struct PartitionVars 315 | { 316 | enum { NOT_TAKEN = -1 }; // indicates that position 317 | 318 | int m_partition[MAXNODES+1]; 319 | int m_total; 320 | int m_minFill; 321 | int m_count[2]; 322 | Rect m_cover[2]; 323 | ELEMTYPEREAL m_area[2]; 324 | 325 | Branch m_branchBuf[MAXNODES+1]; 326 | int m_branchCount; 327 | Rect m_coverSplit; 328 | ELEMTYPEREAL m_coverSplitArea; 329 | }; 330 | 331 | Node* AllocNode(); 332 | void FreeNode(Node* a_node); 333 | void InitNode(Node* a_node); 334 | void InitRect(Rect* a_rect); 335 | bool InsertRectRec(const Branch& a_branch, Node* a_node, Node** a_newNode, int a_level); 336 | bool InsertRect(const Branch& a_branch, Node** a_root, int a_level); 337 | Rect NodeCover(Node* a_node); 338 | bool AddBranch(const Branch* a_branch, Node* a_node, Node** a_newNode); 339 | void DisconnectBranch(Node* a_node, int a_index); 340 | int PickBranch(const Rect* a_rect, Node* a_node); 341 | Rect CombineRect(const Rect* a_rectA, const Rect* a_rectB); 342 | void SplitNode(Node* 
a_node, const Branch* a_branch, Node** a_newNode); 343 | ELEMTYPEREAL RectSphericalVolume(Rect* a_rect); 344 | ELEMTYPEREAL RectVolume(Rect* a_rect); 345 | ELEMTYPEREAL CalcRectVolume(Rect* a_rect); 346 | void GetBranches(Node* a_node, const Branch* a_branch, PartitionVars* a_parVars); 347 | void ChoosePartition(PartitionVars* a_parVars, int a_minFill); 348 | void LoadNodes(Node* a_nodeA, Node* a_nodeB, PartitionVars* a_parVars); 349 | void InitParVars(PartitionVars* a_parVars, int a_maxRects, int a_minFill); 350 | void PickSeeds(PartitionVars* a_parVars); 351 | void Classify(int a_index, int a_group, PartitionVars* a_parVars); 352 | bool RemoveRect(Rect* a_rect, const DATATYPE& a_id, Node** a_root); 353 | bool RemoveRectRec(Rect* a_rect, const DATATYPE& a_id, Node* a_node, ListNode** a_listNode); 354 | ListNode* AllocListNode(); 355 | void FreeListNode(ListNode* a_listNode); 356 | bool Overlap(Rect* a_rectA, Rect* a_rectB); 357 | void ReInsert(Node* a_node, ListNode** a_listNode); 358 | bool Search(Node* a_node, Rect* a_rect, int& a_foundCount, t_resultCallback a_resultCallback, void* a_context); 359 | void RemoveAllRec(Node* a_node); 360 | void Reset(); 361 | void CountRec(Node* a_node, int& a_count); 362 | 363 | bool SaveRec(Node* a_node, RTFileStream& a_stream); 364 | bool LoadRec(Node* a_node, RTFileStream& a_stream); 365 | 366 | Node* m_root; ///< Root of tree 367 | ELEMTYPEREAL m_unitSphereVolume; ///< Unit sphere constant for required number of dimensions 368 | }; 369 | 370 | 371 | // Because there is not stream support, this is a quick and dirty file I/O helper. 372 | // Users will likely replace its usage with a Stream implementation from their favorite API. 373 | class RTFileStream 374 | { 375 | FILE* m_file; 376 | 377 | public: 378 | 379 | 380 | RTFileStream() 381 | { 382 | m_file = NULL; 383 | } 384 | 385 | ~RTFileStream() 386 | { 387 | Close(); 388 | } 389 | 390 | bool OpenRead(const char* a_fileName) 391 | { 392 | m_file = fopen(a_fileName, "rb"); 393 | if(!m_file) 394 | { 395 | return false; 396 | } 397 | return true; 398 | } 399 | 400 | bool OpenWrite(const char* a_fileName) 401 | { 402 | m_file = fopen(a_fileName, "wb"); 403 | if(!m_file) 404 | { 405 | return false; 406 | } 407 | return true; 408 | } 409 | 410 | void Close() 411 | { 412 | if(m_file) 413 | { 414 | fclose(m_file); 415 | m_file = NULL; 416 | } 417 | } 418 | 419 | template< typename TYPE > 420 | size_t Write(const TYPE& a_value) 421 | { 422 | ASSERT(m_file); 423 | return fwrite((void*)&a_value, sizeof(a_value), 1, m_file); 424 | } 425 | 426 | template< typename TYPE > 427 | size_t WriteArray(const TYPE* a_array, int a_count) 428 | { 429 | ASSERT(m_file); 430 | return fwrite((void*)a_array, sizeof(TYPE) * a_count, 1, m_file); 431 | } 432 | 433 | template< typename TYPE > 434 | size_t Read(TYPE& a_value) 435 | { 436 | ASSERT(m_file); 437 | return fread((void*)&a_value, sizeof(a_value), 1, m_file); 438 | } 439 | 440 | template< typename TYPE > 441 | size_t ReadArray(TYPE* a_array, int a_count) 442 | { 443 | ASSERT(m_file); 444 | return fread((void*)a_array, sizeof(TYPE) * a_count, 1, m_file); 445 | } 446 | }; 447 | 448 | 449 | RTREE_TEMPLATE 450 | RTREE_QUAL::RTree() 451 | { 452 | ASSERT(MAXNODES > MINNODES); 453 | ASSERT(MINNODES > 0); 454 | 455 | // Precomputed volumes of the unit spheres for the first few dimensions 456 | const float UNIT_SPHERE_VOLUMES[] = { 457 | 0.000000f, 2.000000f, 3.141593f, // Dimension 0,1,2 458 | 4.188790f, 4.934802f, 5.263789f, // Dimension 3,4,5 459 | 5.167713f, 4.724766f, 
4.058712f, // Dimension 6,7,8 460 | 3.298509f, 2.550164f, 1.884104f, // Dimension 9,10,11 461 | 1.335263f, 0.910629f, 0.599265f, // Dimension 12,13,14 462 | 0.381443f, 0.235331f, 0.140981f, // Dimension 15,16,17 463 | 0.082146f, 0.046622f, 0.025807f, // Dimension 18,19,20 464 | }; 465 | 466 | m_root = AllocNode(); 467 | m_root->m_level = 0; 468 | m_unitSphereVolume = (ELEMTYPEREAL)UNIT_SPHERE_VOLUMES[NUMDIMS]; 469 | } 470 | 471 | 472 | RTREE_TEMPLATE 473 | RTREE_QUAL::~RTree() 474 | { 475 | Reset(); // Free, or reset node memory 476 | } 477 | 478 | 479 | RTREE_TEMPLATE 480 | void RTREE_QUAL::Insert(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId) 481 | { 482 | #ifdef _DEBUG 483 | for(int index=0; indexIsInternalNode()) // not a leaf node 567 | { 568 | for(int index = 0; index < a_node->m_count; ++index) 569 | { 570 | CountRec(a_node->m_branch[index].m_child, a_count); 571 | } 572 | } 573 | else // A leaf node 574 | { 575 | a_count += a_node->m_count; 576 | } 577 | } 578 | 579 | 580 | RTREE_TEMPLATE 581 | bool RTREE_QUAL::Load(const char* a_fileName) 582 | { 583 | RemoveAll(); // Clear existing tree 584 | 585 | RTFileStream stream; 586 | if(!stream.OpenRead(a_fileName)) 587 | { 588 | return false; 589 | } 590 | 591 | bool result = Load(stream); 592 | 593 | stream.Close(); 594 | 595 | return result; 596 | } 597 | 598 | 599 | 600 | RTREE_TEMPLATE 601 | bool RTREE_QUAL::Load(RTFileStream& a_stream) 602 | { 603 | // Write some kind of header 604 | int _dataFileId = ('R'<<0)|('T'<<8)|('R'<<16)|('E'<<24); 605 | int _dataSize = sizeof(DATATYPE); 606 | int _dataNumDims = NUMDIMS; 607 | int _dataElemSize = sizeof(ELEMTYPE); 608 | int _dataElemRealSize = sizeof(ELEMTYPEREAL); 609 | int _dataMaxNodes = TMAXNODES; 610 | int _dataMinNodes = TMINNODES; 611 | 612 | int dataFileId = 0; 613 | int dataSize = 0; 614 | int dataNumDims = 0; 615 | int dataElemSize = 0; 616 | int dataElemRealSize = 0; 617 | int dataMaxNodes = 0; 618 | int dataMinNodes = 0; 619 | 620 | a_stream.Read(dataFileId); 621 | a_stream.Read(dataSize); 622 | a_stream.Read(dataNumDims); 623 | a_stream.Read(dataElemSize); 624 | a_stream.Read(dataElemRealSize); 625 | a_stream.Read(dataMaxNodes); 626 | a_stream.Read(dataMinNodes); 627 | 628 | bool result = false; 629 | 630 | // Test if header was valid and compatible 631 | if( (dataFileId == _dataFileId) 632 | && (dataSize == _dataSize) 633 | && (dataNumDims == _dataNumDims) 634 | && (dataElemSize == _dataElemSize) 635 | && (dataElemRealSize == _dataElemRealSize) 636 | && (dataMaxNodes == _dataMaxNodes) 637 | && (dataMinNodes == _dataMinNodes) 638 | ) 639 | { 640 | // Recursively load tree 641 | result = LoadRec(m_root, a_stream); 642 | } 643 | 644 | return result; 645 | } 646 | 647 | 648 | RTREE_TEMPLATE 649 | bool RTREE_QUAL::LoadRec(Node* a_node, RTFileStream& a_stream) 650 | { 651 | a_stream.Read(a_node->m_level); 652 | a_stream.Read(a_node->m_count); 653 | 654 | if(a_node->IsInternalNode()) // not a leaf node 655 | { 656 | for(int index = 0; index < a_node->m_count; ++index) 657 | { 658 | Branch* curBranch = &a_node->m_branch[index]; 659 | 660 | a_stream.ReadArray(curBranch->m_rect.m_min, NUMDIMS); 661 | a_stream.ReadArray(curBranch->m_rect.m_max, NUMDIMS); 662 | 663 | curBranch->m_child = AllocNode(); 664 | LoadRec(curBranch->m_child, a_stream); 665 | } 666 | } 667 | else // A leaf node 668 | { 669 | for(int index = 0; index < a_node->m_count; ++index) 670 | { 671 | Branch* curBranch = &a_node->m_branch[index]; 672 | 673 | 
a_stream.ReadArray(curBranch->m_rect.m_min, NUMDIMS); 674 | a_stream.ReadArray(curBranch->m_rect.m_max, NUMDIMS); 675 | 676 | a_stream.Read(curBranch->m_data); 677 | } 678 | } 679 | 680 | return true; // Should do more error checking on I/O operations 681 | } 682 | 683 | 684 | RTREE_TEMPLATE 685 | bool RTREE_QUAL::Save(const char* a_fileName) 686 | { 687 | RTFileStream stream; 688 | if(!stream.OpenWrite(a_fileName)) 689 | { 690 | return false; 691 | } 692 | 693 | bool result = Save(stream); 694 | 695 | stream.Close(); 696 | 697 | return result; 698 | } 699 | 700 | 701 | RTREE_TEMPLATE 702 | bool RTREE_QUAL::Save(RTFileStream& a_stream) 703 | { 704 | // Write some kind of header 705 | int dataFileId = ('R'<<0)|('T'<<8)|('R'<<16)|('E'<<24); 706 | int dataSize = sizeof(DATATYPE); 707 | int dataNumDims = NUMDIMS; 708 | int dataElemSize = sizeof(ELEMTYPE); 709 | int dataElemRealSize = sizeof(ELEMTYPEREAL); 710 | int dataMaxNodes = TMAXNODES; 711 | int dataMinNodes = TMINNODES; 712 | 713 | a_stream.Write(dataFileId); 714 | a_stream.Write(dataSize); 715 | a_stream.Write(dataNumDims); 716 | a_stream.Write(dataElemSize); 717 | a_stream.Write(dataElemRealSize); 718 | a_stream.Write(dataMaxNodes); 719 | a_stream.Write(dataMinNodes); 720 | 721 | // Recursively save tree 722 | bool result = SaveRec(m_root, a_stream); 723 | 724 | return result; 725 | } 726 | 727 | 728 | RTREE_TEMPLATE 729 | bool RTREE_QUAL::SaveRec(Node* a_node, RTFileStream& a_stream) 730 | { 731 | a_stream.Write(a_node->m_level); 732 | a_stream.Write(a_node->m_count); 733 | 734 | if(a_node->IsInternalNode()) // not a leaf node 735 | { 736 | for(int index = 0; index < a_node->m_count; ++index) 737 | { 738 | Branch* curBranch = &a_node->m_branch[index]; 739 | 740 | a_stream.WriteArray(curBranch->m_rect.m_min, NUMDIMS); 741 | a_stream.WriteArray(curBranch->m_rect.m_max, NUMDIMS); 742 | 743 | SaveRec(curBranch->m_child, a_stream); 744 | } 745 | } 746 | else // A leaf node 747 | { 748 | for(int index = 0; index < a_node->m_count; ++index) 749 | { 750 | Branch* curBranch = &a_node->m_branch[index]; 751 | 752 | a_stream.WriteArray(curBranch->m_rect.m_min, NUMDIMS); 753 | a_stream.WriteArray(curBranch->m_rect.m_max, NUMDIMS); 754 | 755 | a_stream.Write(curBranch->m_data); 756 | } 757 | } 758 | 759 | return true; // Should do more error checking on I/O operations 760 | } 761 | 762 | 763 | RTREE_TEMPLATE 764 | void RTREE_QUAL::RemoveAll() 765 | { 766 | // Delete all existing nodes 767 | Reset(); 768 | 769 | m_root = AllocNode(); 770 | m_root->m_level = 0; 771 | } 772 | 773 | 774 | RTREE_TEMPLATE 775 | void RTREE_QUAL::Reset() 776 | { 777 | #ifdef RTREE_DONT_USE_MEMPOOLS 778 | // Delete all existing nodes 779 | RemoveAllRec(m_root); 780 | #else // RTREE_DONT_USE_MEMPOOLS 781 | // Just reset memory pools. 
We are not using complex types 782 | // EXAMPLE 783 | #endif // RTREE_DONT_USE_MEMPOOLS 784 | } 785 | 786 | 787 | RTREE_TEMPLATE 788 | void RTREE_QUAL::RemoveAllRec(Node* a_node) 789 | { 790 | ASSERT(a_node); 791 | ASSERT(a_node->m_level >= 0); 792 | 793 | if(a_node->IsInternalNode()) // This is an internal node in the tree 794 | { 795 | for(int index=0; index < a_node->m_count; ++index) 796 | { 797 | RemoveAllRec(a_node->m_branch[index].m_child); 798 | } 799 | } 800 | FreeNode(a_node); 801 | } 802 | 803 | 804 | RTREE_TEMPLATE 805 | typename RTREE_QUAL::Node* RTREE_QUAL::AllocNode() 806 | { 807 | Node* newNode; 808 | #ifdef RTREE_DONT_USE_MEMPOOLS 809 | newNode = new Node; 810 | #else // RTREE_DONT_USE_MEMPOOLS 811 | // EXAMPLE 812 | #endif // RTREE_DONT_USE_MEMPOOLS 813 | InitNode(newNode); 814 | return newNode; 815 | } 816 | 817 | 818 | RTREE_TEMPLATE 819 | void RTREE_QUAL::FreeNode(Node* a_node) 820 | { 821 | ASSERT(a_node); 822 | 823 | #ifdef RTREE_DONT_USE_MEMPOOLS 824 | delete a_node; 825 | #else // RTREE_DONT_USE_MEMPOOLS 826 | // EXAMPLE 827 | #endif // RTREE_DONT_USE_MEMPOOLS 828 | } 829 | 830 | 831 | // Allocate space for a node in the list used in DeletRect to 832 | // store Nodes that are too empty. 833 | RTREE_TEMPLATE 834 | typename RTREE_QUAL::ListNode* RTREE_QUAL::AllocListNode() 835 | { 836 | #ifdef RTREE_DONT_USE_MEMPOOLS 837 | return new ListNode; 838 | #else // RTREE_DONT_USE_MEMPOOLS 839 | // EXAMPLE 840 | #endif // RTREE_DONT_USE_MEMPOOLS 841 | } 842 | 843 | 844 | RTREE_TEMPLATE 845 | void RTREE_QUAL::FreeListNode(ListNode* a_listNode) 846 | { 847 | #ifdef RTREE_DONT_USE_MEMPOOLS 848 | delete a_listNode; 849 | #else // RTREE_DONT_USE_MEMPOOLS 850 | // EXAMPLE 851 | #endif // RTREE_DONT_USE_MEMPOOLS 852 | } 853 | 854 | 855 | RTREE_TEMPLATE 856 | void RTREE_QUAL::InitNode(Node* a_node) 857 | { 858 | a_node->m_count = 0; 859 | a_node->m_level = -1; 860 | } 861 | 862 | 863 | RTREE_TEMPLATE 864 | void RTREE_QUAL::InitRect(Rect* a_rect) 865 | { 866 | for(int index = 0; index < NUMDIMS; ++index) 867 | { 868 | a_rect->m_min[index] = (ELEMTYPE)0; 869 | a_rect->m_max[index] = (ELEMTYPE)0; 870 | } 871 | } 872 | 873 | 874 | // Inserts a new data rectangle into the index structure. 875 | // Recursively descends tree, propagates splits back up. 876 | // Returns 0 if node was not split. Old node updated. 877 | // If node was split, returns 1 and sets the pointer pointed to by 878 | // new_node to point to the new node. Old node updated to become one of two. 879 | // The level argument specifies the number of steps up from the leaf 880 | // level to insert; e.g. a data rectangle goes in at level = 0. 881 | RTREE_TEMPLATE 882 | bool RTREE_QUAL::InsertRectRec(const Branch& a_branch, Node* a_node, Node** a_newNode, int a_level) 883 | { 884 | ASSERT(a_node && a_newNode); 885 | ASSERT(a_level >= 0 && a_level <= a_node->m_level); 886 | 887 | // recurse until we reach the correct level for the new record. data records 888 | // will always be called with a_level == 0 (leaf) 889 | if(a_node->m_level > a_level) 890 | { 891 | // Still above level for insertion, go down tree recursively 892 | Node* otherNode; 893 | 894 | // find the optimal branch for this record 895 | int index = PickBranch(&a_branch.m_rect, a_node); 896 | 897 | // recursively insert this record into the picked branch 898 | bool childWasSplit = InsertRectRec(a_branch, a_node->m_branch[index].m_child, &otherNode, a_level); 899 | 900 | if (!childWasSplit) 901 | { 902 | // Child was not split. 
Merge the bounding box of the new record with the 903 | // existing bounding box 904 | a_node->m_branch[index].m_rect = CombineRect(&a_branch.m_rect, &(a_node->m_branch[index].m_rect)); 905 | return false; 906 | } 907 | else 908 | { 909 | // Child was split. The old branches are now re-partitioned to two nodes 910 | // so we have to re-calculate the bounding boxes of each node 911 | a_node->m_branch[index].m_rect = NodeCover(a_node->m_branch[index].m_child); 912 | Branch branch; 913 | branch.m_child = otherNode; 914 | branch.m_rect = NodeCover(otherNode); 915 | 916 | // The old node is already a child of a_node. Now add the newly-created 917 | // node to a_node as well. a_node might be split because of that. 918 | return AddBranch(&branch, a_node, a_newNode); 919 | } 920 | } 921 | else if(a_node->m_level == a_level) 922 | { 923 | // We have reached level for insertion. Add rect, split if necessary 924 | return AddBranch(&a_branch, a_node, a_newNode); 925 | } 926 | else 927 | { 928 | // Should never occur 929 | ASSERT(0); 930 | return false; 931 | } 932 | } 933 | 934 | 935 | // Insert a data rectangle into an index structure. 936 | // InsertRect provides for splitting the root; 937 | // returns 1 if root was split, 0 if it was not. 938 | // The level argument specifies the number of steps up from the leaf 939 | // level to insert; e.g. a data rectangle goes in at level = 0. 940 | // InsertRect2 does the recursion. 941 | // 942 | RTREE_TEMPLATE 943 | bool RTREE_QUAL::InsertRect(const Branch& a_branch, Node** a_root, int a_level) 944 | { 945 | ASSERT(a_root); 946 | ASSERT(a_level >= 0 && a_level <= (*a_root)->m_level); 947 | #ifdef _DEBUG 948 | for(int index=0; index < NUMDIMS; ++index) 949 | { 950 | ASSERT(a_branch.m_rect.m_min[index] <= a_branch.m_rect.m_max[index]); 951 | } 952 | #endif //_DEBUG 953 | 954 | Node* newNode; 955 | 956 | if(InsertRectRec(a_branch, *a_root, &newNode, a_level)) // Root split 957 | { 958 | // Grow tree taller and new root 959 | Node* newRoot = AllocNode(); 960 | newRoot->m_level = (*a_root)->m_level + 1; 961 | 962 | Branch branch; 963 | 964 | // add old root node as a child of the new root 965 | branch.m_rect = NodeCover(*a_root); 966 | branch.m_child = *a_root; 967 | AddBranch(&branch, newRoot, NULL); 968 | 969 | // add the split node as a child of the new root 970 | branch.m_rect = NodeCover(newNode); 971 | branch.m_child = newNode; 972 | AddBranch(&branch, newRoot, NULL); 973 | 974 | // set the new root as the root node 975 | *a_root = newRoot; 976 | 977 | return true; 978 | } 979 | 980 | return false; 981 | } 982 | 983 | 984 | // Find the smallest rectangle that includes all rectangles in branches of a node. 985 | RTREE_TEMPLATE 986 | typename RTREE_QUAL::Rect RTREE_QUAL::NodeCover(Node* a_node) 987 | { 988 | ASSERT(a_node); 989 | 990 | Rect rect = a_node->m_branch[0].m_rect; 991 | for(int index = 1; index < a_node->m_count; ++index) 992 | { 993 | rect = CombineRect(&rect, &(a_node->m_branch[index].m_rect)); 994 | } 995 | 996 | return rect; 997 | } 998 | 999 | 1000 | // Add a branch to a node. Split the node if necessary. 1001 | // Returns 0 if node not split. Old node updated. 1002 | // Returns 1 if node split, sets *new_node to address of new node. 1003 | // Old node updated, becomes one of two. 
1004 | RTREE_TEMPLATE 1005 | bool RTREE_QUAL::AddBranch(const Branch* a_branch, Node* a_node, Node** a_newNode) 1006 | { 1007 | ASSERT(a_branch); 1008 | ASSERT(a_node); 1009 | 1010 | if(a_node->m_count < MAXNODES) // Split won't be necessary 1011 | { 1012 | a_node->m_branch[a_node->m_count] = *a_branch; 1013 | ++a_node->m_count; 1014 | 1015 | return false; 1016 | } 1017 | else 1018 | { 1019 | ASSERT(a_newNode); 1020 | 1021 | SplitNode(a_node, a_branch, a_newNode); 1022 | return true; 1023 | } 1024 | } 1025 | 1026 | 1027 | // Disconnect a dependent node. 1028 | // Caller must return (or stop using iteration index) after this as count has changed 1029 | RTREE_TEMPLATE 1030 | void RTREE_QUAL::DisconnectBranch(Node* a_node, int a_index) 1031 | { 1032 | ASSERT(a_node && (a_index >= 0) && (a_index < MAXNODES)); 1033 | ASSERT(a_node->m_count > 0); 1034 | 1035 | // Remove element by swapping with the last element to prevent gaps in array 1036 | a_node->m_branch[a_index] = a_node->m_branch[a_node->m_count - 1]; 1037 | 1038 | --a_node->m_count; 1039 | } 1040 | 1041 | 1042 | // Pick a branch. Pick the one that will need the smallest increase 1043 | // in area to accomodate the new rectangle. This will result in the 1044 | // least total area for the covering rectangles in the current node. 1045 | // In case of a tie, pick the one which was smaller before, to get 1046 | // the best resolution when searching. 1047 | RTREE_TEMPLATE 1048 | int RTREE_QUAL::PickBranch(const Rect* a_rect, Node* a_node) 1049 | { 1050 | ASSERT(a_rect && a_node); 1051 | 1052 | bool firstTime = true; 1053 | ELEMTYPEREAL increase; 1054 | ELEMTYPEREAL bestIncr = (ELEMTYPEREAL)-1; 1055 | ELEMTYPEREAL area; 1056 | ELEMTYPEREAL bestArea; 1057 | int best; 1058 | Rect tempRect; 1059 | 1060 | for(int index=0; index < a_node->m_count; ++index) 1061 | { 1062 | Rect* curRect = &a_node->m_branch[index].m_rect; 1063 | area = CalcRectVolume(curRect); 1064 | tempRect = CombineRect(a_rect, curRect); 1065 | increase = CalcRectVolume(&tempRect) - area; 1066 | if((increase < bestIncr) || firstTime) 1067 | { 1068 | best = index; 1069 | bestArea = area; 1070 | bestIncr = increase; 1071 | firstTime = false; 1072 | } 1073 | else if((increase == bestIncr) && (area < bestArea)) 1074 | { 1075 | best = index; 1076 | bestArea = area; 1077 | bestIncr = increase; 1078 | } 1079 | } 1080 | return best; 1081 | } 1082 | 1083 | 1084 | // Combine two rectangles into larger one containing both 1085 | RTREE_TEMPLATE 1086 | typename RTREE_QUAL::Rect RTREE_QUAL::CombineRect(const Rect* a_rectA, const Rect* a_rectB) 1087 | { 1088 | ASSERT(a_rectA && a_rectB); 1089 | 1090 | Rect newRect; 1091 | 1092 | for(int index = 0; index < NUMDIMS; ++index) 1093 | { 1094 | newRect.m_min[index] = Min(a_rectA->m_min[index], a_rectB->m_min[index]); 1095 | newRect.m_max[index] = Max(a_rectA->m_max[index], a_rectB->m_max[index]); 1096 | } 1097 | 1098 | return newRect; 1099 | } 1100 | 1101 | 1102 | 1103 | // Split a node. 1104 | // Divides the nodes branches and the extra one between two nodes. 1105 | // Old node is one of the new ones, and one really new one is created. 1106 | // Tries more than one method for choosing a partition, uses best result. 
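//
// An outline of the steps implemented below (sketch only):
//
//   GetBranches(a_node, a_branch, parVars);  // buffer all MAXNODES+1 branches
//   ChoosePartition(parVars, MINNODES);      // pick seeds, classify the rest
//   LoadNodes(a_node, *a_newNode, parVars);  // refill the old node and the new one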
1107 | RTREE_TEMPLATE 1108 | void RTREE_QUAL::SplitNode(Node* a_node, const Branch* a_branch, Node** a_newNode) 1109 | { 1110 | ASSERT(a_node); 1111 | ASSERT(a_branch); 1112 | 1113 | // Could just use local here, but member or external is faster since it is reused 1114 | PartitionVars localVars; 1115 | PartitionVars* parVars = &localVars; 1116 | 1117 | // Load all the branches into a buffer, initialize old node 1118 | GetBranches(a_node, a_branch, parVars); 1119 | 1120 | // Find partition 1121 | ChoosePartition(parVars, MINNODES); 1122 | 1123 | // Create a new node to hold (about) half of the branches 1124 | *a_newNode = AllocNode(); 1125 | (*a_newNode)->m_level = a_node->m_level; 1126 | 1127 | // Put branches from buffer into 2 nodes according to the chosen partition 1128 | a_node->m_count = 0; 1129 | LoadNodes(a_node, *a_newNode, parVars); 1130 | 1131 | ASSERT((a_node->m_count + (*a_newNode)->m_count) == parVars->m_total); 1132 | } 1133 | 1134 | 1135 | // Calculate the n-dimensional volume of a rectangle 1136 | RTREE_TEMPLATE 1137 | ELEMTYPEREAL RTREE_QUAL::RectVolume(Rect* a_rect) 1138 | { 1139 | ASSERT(a_rect); 1140 | 1141 | ELEMTYPEREAL volume = (ELEMTYPEREAL)1; 1142 | 1143 | for(int index=0; indexm_max[index] - a_rect->m_min[index]; 1146 | } 1147 | 1148 | ASSERT(volume >= (ELEMTYPEREAL)0); 1149 | 1150 | return volume; 1151 | } 1152 | 1153 | 1154 | // The exact volume of the bounding sphere for the given Rect 1155 | RTREE_TEMPLATE 1156 | ELEMTYPEREAL RTREE_QUAL::RectSphericalVolume(Rect* a_rect) 1157 | { 1158 | ASSERT(a_rect); 1159 | 1160 | ELEMTYPEREAL sumOfSquares = (ELEMTYPEREAL)0; 1161 | ELEMTYPEREAL radius; 1162 | 1163 | for(int index=0; index < NUMDIMS; ++index) 1164 | { 1165 | ELEMTYPEREAL halfExtent = ((ELEMTYPEREAL)a_rect->m_max[index] - (ELEMTYPEREAL)a_rect->m_min[index]) * 0.5f; 1166 | sumOfSquares += halfExtent * halfExtent; 1167 | } 1168 | 1169 | radius = (ELEMTYPEREAL)sqrt(sumOfSquares); 1170 | 1171 | // Pow maybe slow, so test for common dims like 2,3 and just use x*x, x*x*x. 1172 | if(NUMDIMS == 3) 1173 | { 1174 | return (radius * radius * radius * m_unitSphereVolume); 1175 | } 1176 | else if(NUMDIMS == 2) 1177 | { 1178 | return (radius * radius * m_unitSphereVolume); 1179 | } 1180 | else 1181 | { 1182 | return (ELEMTYPEREAL)(pow(radius, NUMDIMS) * m_unitSphereVolume); 1183 | } 1184 | } 1185 | 1186 | 1187 | // Use one of the methods to calculate retangle volume 1188 | RTREE_TEMPLATE 1189 | ELEMTYPEREAL RTREE_QUAL::CalcRectVolume(Rect* a_rect) 1190 | { 1191 | #ifdef RTREE_USE_SPHERICAL_VOLUME 1192 | return RectSphericalVolume(a_rect); // Slower but helps certain merge cases 1193 | #else // RTREE_USE_SPHERICAL_VOLUME 1194 | return RectVolume(a_rect); // Faster but can cause poor merges 1195 | #endif // RTREE_USE_SPHERICAL_VOLUME 1196 | } 1197 | 1198 | 1199 | // Load branch buffer with branches from full node plus the extra branch. 
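// The node being split is expected to be full (ASSERTed below as
// m_count == MAXNODES), so the buffer ends up holding MAXNODES + 1 branches;
// m_coverSplit and m_coverSplitArea cache their combined rectangle and its
// volume for the partitioning step.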
1200 | RTREE_TEMPLATE 1201 | void RTREE_QUAL::GetBranches(Node* a_node, const Branch* a_branch, PartitionVars* a_parVars) 1202 | { 1203 | ASSERT(a_node); 1204 | ASSERT(a_branch); 1205 | 1206 | ASSERT(a_node->m_count == MAXNODES); 1207 | 1208 | // Load the branch buffer 1209 | for(int index=0; index < MAXNODES; ++index) 1210 | { 1211 | a_parVars->m_branchBuf[index] = a_node->m_branch[index]; 1212 | } 1213 | a_parVars->m_branchBuf[MAXNODES] = *a_branch; 1214 | a_parVars->m_branchCount = MAXNODES + 1; 1215 | 1216 | // Calculate rect containing all in the set 1217 | a_parVars->m_coverSplit = a_parVars->m_branchBuf[0].m_rect; 1218 | for(int index=1; index < MAXNODES+1; ++index) 1219 | { 1220 | a_parVars->m_coverSplit = CombineRect(&a_parVars->m_coverSplit, &a_parVars->m_branchBuf[index].m_rect); 1221 | } 1222 | a_parVars->m_coverSplitArea = CalcRectVolume(&a_parVars->m_coverSplit); 1223 | } 1224 | 1225 | 1226 | // Method #0 for choosing a partition: 1227 | // As the seeds for the two groups, pick the two rects that would waste the 1228 | // most area if covered by a single rectangle, i.e. evidently the worst pair 1229 | // to have in the same group. 1230 | // Of the remaining, one at a time is chosen to be put in one of the two groups. 1231 | // The one chosen is the one with the greatest difference in area expansion 1232 | // depending on which group - the rect most strongly attracted to one group 1233 | // and repelled from the other. 1234 | // If one group gets too full (more would force other group to violate min 1235 | // fill requirement) then other group gets the rest. 1236 | // These last are the ones that can go in either group most easily. 1237 | RTREE_TEMPLATE 1238 | void RTREE_QUAL::ChoosePartition(PartitionVars* a_parVars, int a_minFill) 1239 | { 1240 | ASSERT(a_parVars); 1241 | 1242 | ELEMTYPEREAL biggestDiff; 1243 | int group, chosen, betterGroup; 1244 | 1245 | InitParVars(a_parVars, a_parVars->m_branchCount, a_minFill); 1246 | PickSeeds(a_parVars); 1247 | 1248 | while (((a_parVars->m_count[0] + a_parVars->m_count[1]) < a_parVars->m_total) 1249 | && (a_parVars->m_count[0] < (a_parVars->m_total - a_parVars->m_minFill)) 1250 | && (a_parVars->m_count[1] < (a_parVars->m_total - a_parVars->m_minFill))) 1251 | { 1252 | biggestDiff = (ELEMTYPEREAL) -1; 1253 | for(int index=0; indexm_total; ++index) 1254 | { 1255 | if(PartitionVars::NOT_TAKEN == a_parVars->m_partition[index]) 1256 | { 1257 | Rect* curRect = &a_parVars->m_branchBuf[index].m_rect; 1258 | Rect rect0 = CombineRect(curRect, &a_parVars->m_cover[0]); 1259 | Rect rect1 = CombineRect(curRect, &a_parVars->m_cover[1]); 1260 | ELEMTYPEREAL growth0 = CalcRectVolume(&rect0) - a_parVars->m_area[0]; 1261 | ELEMTYPEREAL growth1 = CalcRectVolume(&rect1) - a_parVars->m_area[1]; 1262 | ELEMTYPEREAL diff = growth1 - growth0; 1263 | if(diff >= 0) 1264 | { 1265 | group = 0; 1266 | } 1267 | else 1268 | { 1269 | group = 1; 1270 | diff = -diff; 1271 | } 1272 | 1273 | if(diff > biggestDiff) 1274 | { 1275 | biggestDiff = diff; 1276 | chosen = index; 1277 | betterGroup = group; 1278 | } 1279 | else if((diff == biggestDiff) && (a_parVars->m_count[group] < a_parVars->m_count[betterGroup])) 1280 | { 1281 | chosen = index; 1282 | betterGroup = group; 1283 | } 1284 | } 1285 | } 1286 | Classify(chosen, betterGroup, a_parVars); 1287 | } 1288 | 1289 | // If one group too full, put remaining rects in the other 1290 | if((a_parVars->m_count[0] + a_parVars->m_count[1]) < a_parVars->m_total) 1291 | { 1292 | if(a_parVars->m_count[0] >= a_parVars->m_total - 
a_parVars->m_minFill) 1293 | { 1294 | group = 1; 1295 | } 1296 | else 1297 | { 1298 | group = 0; 1299 | } 1300 | for(int index=0; indexm_total; ++index) 1301 | { 1302 | if(PartitionVars::NOT_TAKEN == a_parVars->m_partition[index]) 1303 | { 1304 | Classify(index, group, a_parVars); 1305 | } 1306 | } 1307 | } 1308 | 1309 | ASSERT((a_parVars->m_count[0] + a_parVars->m_count[1]) == a_parVars->m_total); 1310 | ASSERT((a_parVars->m_count[0] >= a_parVars->m_minFill) && 1311 | (a_parVars->m_count[1] >= a_parVars->m_minFill)); 1312 | } 1313 | 1314 | 1315 | // Copy branches from the buffer into two nodes according to the partition. 1316 | RTREE_TEMPLATE 1317 | void RTREE_QUAL::LoadNodes(Node* a_nodeA, Node* a_nodeB, PartitionVars* a_parVars) 1318 | { 1319 | ASSERT(a_nodeA); 1320 | ASSERT(a_nodeB); 1321 | ASSERT(a_parVars); 1322 | 1323 | for(int index=0; index < a_parVars->m_total; ++index) 1324 | { 1325 | ASSERT(a_parVars->m_partition[index] == 0 || a_parVars->m_partition[index] == 1); 1326 | 1327 | int targetNodeIndex = a_parVars->m_partition[index]; 1328 | Node* targetNodes[] = {a_nodeA, a_nodeB}; 1329 | 1330 | // It is assured that AddBranch here will not cause a node split. 1331 | bool nodeWasSplit = AddBranch(&a_parVars->m_branchBuf[index], targetNodes[targetNodeIndex], NULL); 1332 | ASSERT(!nodeWasSplit); 1333 | } 1334 | } 1335 | 1336 | 1337 | // Initialize a PartitionVars structure. 1338 | RTREE_TEMPLATE 1339 | void RTREE_QUAL::InitParVars(PartitionVars* a_parVars, int a_maxRects, int a_minFill) 1340 | { 1341 | ASSERT(a_parVars); 1342 | 1343 | a_parVars->m_count[0] = a_parVars->m_count[1] = 0; 1344 | a_parVars->m_area[0] = a_parVars->m_area[1] = (ELEMTYPEREAL)0; 1345 | a_parVars->m_total = a_maxRects; 1346 | a_parVars->m_minFill = a_minFill; 1347 | for(int index=0; index < a_maxRects; ++index) 1348 | { 1349 | a_parVars->m_partition[index] = PartitionVars::NOT_TAKEN; 1350 | } 1351 | } 1352 | 1353 | 1354 | RTREE_TEMPLATE 1355 | void RTREE_QUAL::PickSeeds(PartitionVars* a_parVars) 1356 | { 1357 | int seed0, seed1; 1358 | ELEMTYPEREAL worst, waste; 1359 | ELEMTYPEREAL area[MAXNODES+1]; 1360 | 1361 | for(int index=0; indexm_total; ++index) 1362 | { 1363 | area[index] = CalcRectVolume(&a_parVars->m_branchBuf[index].m_rect); 1364 | } 1365 | 1366 | worst = -a_parVars->m_coverSplitArea - 1; 1367 | for(int indexA=0; indexA < a_parVars->m_total-1; ++indexA) 1368 | { 1369 | for(int indexB = indexA+1; indexB < a_parVars->m_total; ++indexB) 1370 | { 1371 | Rect oneRect = CombineRect(&a_parVars->m_branchBuf[indexA].m_rect, &a_parVars->m_branchBuf[indexB].m_rect); 1372 | waste = CalcRectVolume(&oneRect) - area[indexA] - area[indexB]; 1373 | if(waste > worst) 1374 | { 1375 | worst = waste; 1376 | seed0 = indexA; 1377 | seed1 = indexB; 1378 | } 1379 | } 1380 | } 1381 | 1382 | Classify(seed0, 0, a_parVars); 1383 | Classify(seed1, 1, a_parVars); 1384 | } 1385 | 1386 | 1387 | // Put a branch in one of the groups. 
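// Marks branch a_index as taken by a_group, then updates that group's running
// cover rectangle, its volume and its branch count. For example, PickSeeds
// above starts the two groups with:
//
//   Classify(seed0, 0, a_parVars);
//   Classify(seed1, 1, a_parVars);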
1388 | RTREE_TEMPLATE 1389 | void RTREE_QUAL::Classify(int a_index, int a_group, PartitionVars* a_parVars) 1390 | { 1391 | ASSERT(a_parVars); 1392 | ASSERT(PartitionVars::NOT_TAKEN == a_parVars->m_partition[a_index]); 1393 | 1394 | a_parVars->m_partition[a_index] = a_group; 1395 | 1396 | // Calculate combined rect 1397 | if (a_parVars->m_count[a_group] == 0) 1398 | { 1399 | a_parVars->m_cover[a_group] = a_parVars->m_branchBuf[a_index].m_rect; 1400 | } 1401 | else 1402 | { 1403 | a_parVars->m_cover[a_group] = CombineRect(&a_parVars->m_branchBuf[a_index].m_rect, &a_parVars->m_cover[a_group]); 1404 | } 1405 | 1406 | // Calculate volume of combined rect 1407 | a_parVars->m_area[a_group] = CalcRectVolume(&a_parVars->m_cover[a_group]); 1408 | 1409 | ++a_parVars->m_count[a_group]; 1410 | } 1411 | 1412 | 1413 | // Delete a data rectangle from an index structure. 1414 | // Pass in a pointer to a Rect, the tid of the record, ptr to ptr to root node. 1415 | // Returns 1 if record not found, 0 if success. 1416 | // RemoveRect provides for eliminating the root. 1417 | RTREE_TEMPLATE 1418 | bool RTREE_QUAL::RemoveRect(Rect* a_rect, const DATATYPE& a_id, Node** a_root) 1419 | { 1420 | ASSERT(a_rect && a_root); 1421 | ASSERT(*a_root); 1422 | 1423 | ListNode* reInsertList = NULL; 1424 | 1425 | if(!RemoveRectRec(a_rect, a_id, *a_root, &reInsertList)) 1426 | { 1427 | // Found and deleted a data item 1428 | // Reinsert any branches from eliminated nodes 1429 | while(reInsertList) 1430 | { 1431 | Node* tempNode = reInsertList->m_node; 1432 | 1433 | for(int index = 0; index < tempNode->m_count; ++index) 1434 | { 1435 | // TODO go over this code. should I use (tempNode->m_level - 1)? 1436 | InsertRect(tempNode->m_branch[index], 1437 | a_root, 1438 | tempNode->m_level); 1439 | } 1440 | 1441 | ListNode* remLNode = reInsertList; 1442 | reInsertList = reInsertList->m_next; 1443 | 1444 | FreeNode(remLNode->m_node); 1445 | FreeListNode(remLNode); 1446 | } 1447 | 1448 | // Check for redundant root (not leaf, 1 child) and eliminate TODO replace 1449 | // if with while? In case there is a whole branch of redundant roots... 1450 | if((*a_root)->m_count == 1 && (*a_root)->IsInternalNode()) 1451 | { 1452 | Node* tempNode = (*a_root)->m_branch[0].m_child; 1453 | 1454 | ASSERT(tempNode); 1455 | FreeNode(*a_root); 1456 | *a_root = tempNode; 1457 | } 1458 | return false; 1459 | } 1460 | else 1461 | { 1462 | return true; 1463 | } 1464 | } 1465 | 1466 | 1467 | // Delete a rectangle from non-root part of an index structure. 1468 | // Called by RemoveRect. Descends tree recursively, 1469 | // merges branches on the way back up. 1470 | // Returns 1 if record not found, 0 if success. 
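// When a child ends up with fewer than MINNODES entries after a removal, it is
// not rebalanced in place: the whole child node is pushed onto the a_listNode
// reinsertion list (see ReInsert below) and disconnected from its parent, and
// RemoveRect above later reinserts its branches into the tree.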
1471 | RTREE_TEMPLATE 1472 | bool RTREE_QUAL::RemoveRectRec(Rect* a_rect, const DATATYPE& a_id, Node* a_node, ListNode** a_listNode) 1473 | { 1474 | ASSERT(a_rect && a_node && a_listNode); 1475 | ASSERT(a_node->m_level >= 0); 1476 | 1477 | if(a_node->IsInternalNode()) // not a leaf node 1478 | { 1479 | for(int index = 0; index < a_node->m_count; ++index) 1480 | { 1481 | if(Overlap(a_rect, &(a_node->m_branch[index].m_rect))) 1482 | { 1483 | if(!RemoveRectRec(a_rect, a_id, a_node->m_branch[index].m_child, a_listNode)) 1484 | { 1485 | if(a_node->m_branch[index].m_child->m_count >= MINNODES) 1486 | { 1487 | // child removed, just resize parent rect 1488 | a_node->m_branch[index].m_rect = NodeCover(a_node->m_branch[index].m_child); 1489 | } 1490 | else 1491 | { 1492 | // child removed, not enough entries in node, eliminate node 1493 | ReInsert(a_node->m_branch[index].m_child, a_listNode); 1494 | DisconnectBranch(a_node, index); // Must return after this call as count has changed 1495 | } 1496 | return false; 1497 | } 1498 | } 1499 | } 1500 | return true; 1501 | } 1502 | else // A leaf node 1503 | { 1504 | for(int index = 0; index < a_node->m_count; ++index) 1505 | { 1506 | if(a_node->m_branch[index].m_data == a_id) 1507 | { 1508 | DisconnectBranch(a_node, index); // Must return after this call as count has changed 1509 | return false; 1510 | } 1511 | } 1512 | return true; 1513 | } 1514 | } 1515 | 1516 | 1517 | // Decide whether two rectangles overlap. 1518 | RTREE_TEMPLATE 1519 | bool RTREE_QUAL::Overlap(Rect* a_rectA, Rect* a_rectB) 1520 | { 1521 | ASSERT(a_rectA && a_rectB); 1522 | 1523 | for(int index=0; index < NUMDIMS; ++index) 1524 | { 1525 | if (a_rectA->m_min[index] > a_rectB->m_max[index] || 1526 | a_rectB->m_min[index] > a_rectA->m_max[index]) 1527 | { 1528 | return false; 1529 | } 1530 | } 1531 | return true; 1532 | } 1533 | 1534 | 1535 | // Add a node to the reinsertion list. All its branches will later 1536 | // be reinserted into the index structure. 1537 | RTREE_TEMPLATE 1538 | void RTREE_QUAL::ReInsert(Node* a_node, ListNode** a_listNode) 1539 | { 1540 | ListNode* newListNode; 1541 | 1542 | newListNode = AllocListNode(); 1543 | newListNode->m_node = a_node; 1544 | newListNode->m_next = *a_listNode; 1545 | *a_listNode = newListNode; 1546 | } 1547 | 1548 | 1549 | // Search in an index tree or subtree for all data retangles that overlap the argument rectangle. 1550 | RTREE_TEMPLATE 1551 | bool RTREE_QUAL::Search(Node* a_node, Rect* a_rect, int& a_foundCount, t_resultCallback a_resultCallback, void* a_context) 1552 | { 1553 | ASSERT(a_node); 1554 | ASSERT(a_node->m_level >= 0); 1555 | ASSERT(a_rect); 1556 | 1557 | if(a_node->IsInternalNode()) 1558 | { 1559 | // This is an internal node in the tree 1560 | for(int index=0; index < a_node->m_count; ++index) 1561 | { 1562 | if(Overlap(a_rect, &a_node->m_branch[index].m_rect)) 1563 | { 1564 | if(!Search(a_node->m_branch[index].m_child, a_rect, a_foundCount, a_resultCallback, a_context)) 1565 | { 1566 | // The callback indicated to stop searching 1567 | return false; 1568 | } 1569 | } 1570 | } 1571 | } 1572 | else 1573 | { 1574 | // This is a leaf node 1575 | for(int index=0; index < a_node->m_count; ++index) 1576 | { 1577 | if(Overlap(a_rect, &a_node->m_branch[index].m_rect)) 1578 | { 1579 | DATATYPE& id = a_node->m_branch[index].m_data; 1580 | ++a_foundCount; 1581 | 1582 | // NOTE: There are different ways to return results. 
Here's where to modify 1583 | if(a_resultCallback) 1584 | { 1585 | if(!a_resultCallback(id, a_context)) 1586 | { 1587 | return false; // Don't continue searching 1588 | } 1589 | } 1590 | } 1591 | } 1592 | } 1593 | 1594 | return true; // Continue searching 1595 | } 1596 | 1597 | 1598 | #undef RTREE_TEMPLATE 1599 | #undef RTREE_QUAL 1600 | 1601 | #endif //RTREE_H 1602 | 1603 | -------------------------------------------------------------------------------- /filter/filter_planet_by_cats.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Filters a planet file by categories and location. 3 | 4 | Serves as a replacement for Overpass API for the OSM Conflator. 5 | Takes two parameters: a list of coordinates and categories prepared by 6 | conflate.py and an OSM PBF/XML file. Prints an OSM XML file with 7 | objects that will then be conflated with the external dataset. 8 | Either specify that XML file name as the third parameter, or redirect 9 | the output. 10 | 11 | Based on the osmium_amenity_list.cpp from libosmium. 12 | 13 | Published under Apache Public License 2.0. 14 | 15 | Written by Ilya Zverev for MAPS.ME. 16 | */ 17 | 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | 34 | #include "RTree.h" 35 | #include "xml_centers_output.hpp" 36 | 37 | using index_type = osmium::index::map::FlexMem; 39 | using location_handler_type = osmium::handler::NodeLocationsForWays; 40 | 41 | bool AppendToVector(uint16_t cat_id, void *vec) { 42 | static_cast*>(vec)->push_back(cat_id); 43 | return true; 44 | } 45 | 46 | class AmenityHandler : public osmium::handler::Handler { 47 | 48 | constexpr static double kSearchRadius = 0.01; 49 | 50 | typedef RTree DatasetTree; 51 | typedef std::vector> TQuery; 52 | typedef std::vector TCategory; 53 | 54 | DatasetTree m_tree; 55 | osmium::io::xmlcenters::XMLCentersOutput m_centers; 56 | std::map> m_categories; 57 | std::map m_category_names; 58 | 59 | void print_object(const osmium::OSMObject &obj, 60 | const osmium::Location ¢er) { 61 | std::cout << m_centers.apply(obj, center); 62 | } 63 | 64 | // Calculate the center point of a NodeRefList. 65 | osmium::Location calc_center(const osmium::NodeRefList &nr_list) { 66 | int64_t x = 0; 67 | int64_t y = 0; 68 | 69 | for (const auto &nr : nr_list) { 70 | x += nr.x(); 71 | y += nr.y(); 72 | } 73 | 74 | x /= nr_list.size(); 75 | y /= nr_list.size(); 76 | 77 | return osmium::Location{x, y}; 78 | } 79 | 80 | bool TestTags(osmium::TagList const & tags, TQuery const & query) { 81 | for (std::vector const & pair : query) { 82 | const char *value = tags[pair[0].c_str()]; 83 | if (pair.size() == 2 && pair[1].empty()) { 84 | if (value != nullptr) 85 | return false; 86 | } else { 87 | if (value == nullptr) 88 | return false; 89 | if (pair.size() > 1) { 90 | // TODO: substrings? 
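// Each `pair` holds a tag key followed by its allowed values; the empty-value
// case ("key must be absent") is handled above, and the loop below looks for
// an exact match against any of the listed values.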
91 | bool found = false; 92 | for (size_t i = 1; i < pair.size(); i++) { 93 | if (!strcmp(value, pair[i].c_str())) { 94 | found = true; 95 | break; 96 | } 97 | } 98 | if (!found) 99 | return false; 100 | } 101 | } 102 | } 103 | return true; 104 | } 105 | 106 | bool IsEligible(const osmium::Location & loc, osmium::TagList const & tags) { 107 | if (tags.empty()) 108 | return false; 109 | 110 | int32_t radius = osmium::Location::double_to_fix(kSearchRadius); 111 | int32_t min[] = {loc.x() - radius, loc.y() - radius}; 112 | int32_t max[] = {loc.x() + radius, loc.y() + radius}; 113 | std::vector found; 114 | if (!m_tree.Search(min, max, &AppendToVector, &found)) 115 | return false; 116 | for (uint16_t cat_id : found) 117 | for (TQuery query : m_categories[cat_id]) 118 | if (TestTags(tags, query)) 119 | return true; 120 | return false; 121 | } 122 | 123 | void SplitTrim(std::string const & s, char delimiter, std::size_t limit, std::vector & target) { 124 | target.clear(); 125 | std::size_t start = 0, end = 0; 126 | while (start < s.length()) { 127 | end = s.find(delimiter, start); 128 | if (end == std::string::npos || target.size() == limit) 129 | end = s.length(); 130 | while (start < end && std::isspace(s[start])) 131 | start++; 132 | 133 | std::size_t tmpend = end - 1; 134 | while (tmpend > start && std::isspace(s[tmpend])) 135 | tmpend++; 136 | target.push_back(s.substr(start, tmpend - start + 1)); 137 | start = end + 1; 138 | } 139 | } 140 | 141 | TQuery ParseQuery(std::string const & query) { 142 | TQuery q; 143 | std::vector parts; 144 | SplitTrim(query, '|', 100, parts); 145 | for (std::string const & part : parts) { 146 | std::vector keys; 147 | SplitTrim(part, '=', 100, keys); 148 | if (keys.size() > 0) 149 | q.push_back(keys); 150 | } 151 | return q; 152 | } 153 | 154 | void LoadCategories(const char *filename) { 155 | std::ifstream infile(filename); 156 | std::string line; 157 | std::vector parts; 158 | bool parsingPoints = false; 159 | while (std::getline(infile, line)) { 160 | if (!parsingPoints) { 161 | if (!line.size()) 162 | parsingPoints = true; 163 | else { 164 | SplitTrim(line, ',', 3, parts); // cat_id, name, query 165 | uint16_t cat_id = std::stoi(parts[0]); 166 | m_category_names[cat_id] = parts[1]; 167 | m_categories[cat_id].push_back(ParseQuery(parts[2])); 168 | } 169 | } else { 170 | SplitTrim(line, ',', 3, parts); // lon, lat, cat_id 171 | const osmium::Location loc(std::stod(parts[0]), std::stod(parts[1])); 172 | int32_t coords[] = {loc.x(), loc.y()}; 173 | uint16_t cat_id = std::stoi(parts[2]); 174 | m_tree.Insert(coords, coords, cat_id); 175 | } 176 | } 177 | } 178 | 179 | public: 180 | AmenityHandler(const char *categories) { 181 | LoadCategories(categories); 182 | } 183 | 184 | void node(osmium::Node const & node) { 185 | if (IsEligible(node.location(), node.tags())) { 186 | print_object(node, node.location()); 187 | } 188 | } 189 | 190 | void way(osmium::Way const & way) { 191 | if (!way.is_closed()) 192 | return; 193 | 194 | int64_t x = 0, y = 0, cnt = 0; 195 | for (const auto& node_ref : way.nodes()) { 196 | if (node_ref.location()) { 197 | x += node_ref.x(); 198 | y += node_ref.y(); 199 | cnt++; 200 | } 201 | } 202 | if (!cnt) 203 | return; 204 | 205 | const osmium::Location center(x / cnt, y / cnt); 206 | if (IsEligible(center, way.tags())) { 207 | print_object(way, center); 208 | } 209 | } 210 | 211 | void multi(osmium::Relation const & rel, osmium::Location const & center) { 212 | if (IsEligible(center, rel.tags())) { 213 | print_object(rel, center); 214 | 
} 215 | } 216 | 217 | }; // class AmenityHandler 218 | 219 | class AmenityRelationsManager : public osmium::relations::RelationsManager { 220 | 221 | AmenityHandler *m_handler; 222 | 223 | public: 224 | 225 | AmenityRelationsManager(AmenityHandler & handler) : 226 | RelationsManager(), 227 | m_handler(&handler) { 228 | } 229 | 230 | bool new_relation(osmium::Relation const & rel) noexcept { 231 | const char *rel_type = rel.tags().get_value_by_key("type"); 232 | return rel_type && !std::strcmp(rel_type, "multipolygon"); 233 | } 234 | 235 | void complete_relation(osmium::Relation const & rel) { 236 | int64_t x = 0, y = 0, cnt = 0; 237 | for (auto const & member : rel.members()) { 238 | if (member.ref() != 0) { 239 | const osmium::Way* way = this->get_member_way(member.ref()); 240 | for (const auto& node_ref : way->nodes()) { 241 | if (node_ref.location()) { 242 | x += node_ref.x(); 243 | y += node_ref.y(); 244 | cnt++; 245 | } 246 | } 247 | } 248 | } 249 | if (cnt > 0) 250 | m_handler->multi(rel, osmium::Location{x / cnt, y / cnt}); 251 | } 252 | }; // class AmenityRelationsManager 253 | 254 | int main(int argc, char *argv[]) { 255 | if (argc < 3) { 256 | std::cerr << "Usage: " << argv[0] 257 | << " \n"; 258 | std::exit(1); 259 | } 260 | 261 | const osmium::io::File input_file{argv[2]}; 262 | const osmium::io::File output_file{"", "osm"}; 263 | 264 | AmenityHandler data_handler(argv[1]); 265 | AmenityRelationsManager manager(data_handler); 266 | osmium::relations::read_relations(input_file, manager); 267 | 268 | osmium::io::Header header; 269 | header.set("generator", argv[0]); 270 | osmium::io::Writer writer{output_file, header, osmium::io::overwrite::allow}; 271 | 272 | index_type index; 273 | location_handler_type location_handler{index}; 274 | location_handler.ignore_errors(); 275 | osmium::io::Reader reader{input_file}; 276 | 277 | osmium::apply(reader, location_handler, data_handler, manager.handler()); 278 | 279 | std::cout.flush(); 280 | reader.close(); 281 | writer.close(); 282 | } 283 | -------------------------------------------------------------------------------- /filter/xml_centers_output.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | This file is based on xml_output_format.hpp from the Osmium library 4 | (http://osmcode.org/libosmium). 5 | 6 | Copyright 2013-2017 Jochen Topf and others (see README). 7 | Copyright 2017 Ilya Zverev , MAPS.ME 8 | 9 | Boost Software License - Version 1.0 - August 17th, 2003 10 | 11 | Permission is hereby granted, free of charge, to any person or organization 12 | obtaining a copy of the software and accompanying documentation covered by 13 | this license (the "Software") to use, reproduce, display, distribute, 14 | execute, and transmit the Software, and to prepare derivative works of the 15 | Software, and to permit third-parties to whom the Software is furnished to 16 | do so, all subject to the following: 17 | 18 | The copyright notices in the Software and this entire statement, including 19 | the above license grant, this restriction and the following disclaimer, 20 | must be included in all copies of the Software, in whole or in part, and 21 | all derivative works of the Software, unless such copies or derivative 22 | works are solely in the form of machine-executable object code generated by 23 | a source language processor. 
24 | 25 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 26 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 27 | FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT 28 | SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE 29 | FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, 30 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 31 | DEALINGS IN THE SOFTWARE. 32 | 33 | */ 34 | 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include 41 | #include 42 | #include 43 | #include 44 | #include 45 | #include 46 | #include 47 | 48 | #include 49 | #include 50 | #include 51 | #include 52 | 53 | namespace osmium { 54 | 55 | namespace io { 56 | 57 | namespace xmlcenters { 58 | 59 | namespace detail { 60 | 61 | inline void append_lat_lon_attributes(std::string& out, const char* lat, const char* lon, const osmium::Location& location) { 62 | out += ' '; 63 | out += lat; 64 | out += "=\""; 65 | osmium::detail::append_location_coordinate_to_string(std::back_inserter(out), location.y()); 66 | out += "\" "; 67 | out += lon; 68 | out += "=\""; 69 | osmium::detail::append_location_coordinate_to_string(std::back_inserter(out), location.x()); 70 | out += "\""; 71 | } 72 | 73 | } // namespace detail 74 | 75 | class XMLCentersOutput { 76 | 77 | std::shared_ptr m_out; 78 | 79 | inline void append_xml_encoded_string(std::string & out, const char *data) { 80 | osmium::io::detail::append_xml_encoded_string(out, data); 81 | } 82 | 83 | void output_int(int64_t value) { 84 | if (value < 0) { 85 | *m_out += '-'; 86 | value = -value; 87 | } 88 | 89 | char temp[20]; 90 | char *t = temp; 91 | do { 92 | *t++ = char(value % 10) + '0'; 93 | value /= 10; 94 | } while (value > 0); 95 | 96 | const auto old_size = m_out->size(); 97 | m_out->resize(old_size + (t - temp)); 98 | char* data = &(*m_out)[old_size]; 99 | do { 100 | *data++ += *--t; 101 | } while (t != temp); 102 | } 103 | 104 | void write_spaces(int num) { 105 | for (; num != 0; --num) { 106 | *m_out += ' '; 107 | } 108 | } 109 | 110 | void write_prefix() { 111 | write_spaces(2); 112 | } 113 | 114 | template 115 | void write_attribute(const char* name, T value) { 116 | *m_out += ' '; 117 | *m_out += name; 118 | *m_out += "=\""; 119 | output_int(value); 120 | *m_out += '"'; 121 | } 122 | 123 | void write_meta(const osmium::OSMObject& object) { 124 | write_attribute("id", object.id()); 125 | 126 | if (object.version()) { 127 | write_attribute("version", object.version()); 128 | } 129 | 130 | if (object.timestamp()) { 131 | *m_out += " timestamp=\""; 132 | *m_out += object.timestamp().to_iso(); 133 | *m_out += "\""; 134 | } 135 | 136 | if (!object.user_is_anonymous()) { 137 | write_attribute("uid", object.uid()); 138 | *m_out += " user=\""; 139 | append_xml_encoded_string(*m_out, object.user()); 140 | *m_out += "\""; 141 | } 142 | 143 | if (object.changeset()) { 144 | write_attribute("changeset", object.changeset()); 145 | } 146 | } 147 | 148 | void write_tags(const osmium::TagList& tags) { 149 | for (const auto& tag : tags) { 150 | write_spaces(2); 151 | *m_out += " \n"; 156 | } 157 | } 158 | 159 | public: 160 | 161 | XMLCentersOutput() : m_out(std::make_shared()) { 162 | } 163 | 164 | std::string apply(osmium::OSMObject const & item, osmium::Location const & center) { 165 | switch(item.type()) { 166 | case osmium::item_type::node: 167 | node(static_cast(item)); 168 | break; 169 | case 
osmium::item_type::way: 170 | way(static_cast(item), center); 171 | break; 172 | case osmium::item_type::relation: 173 | relation(static_cast(item), center); 174 | break; 175 | default: 176 | throw osmium::unknown_type{}; 177 | } 178 | 179 | std::string out; 180 | using std::swap; 181 | swap(out, *m_out); 182 | 183 | return out; 184 | } 185 | 186 | void node(const osmium::Node& node) { 187 | write_prefix(); 188 | *m_out += "\n"; 265 | } 266 | 267 | write_tags(relation.tags()); 268 | 269 | write_prefix(); 270 | *m_out += "\n"; 271 | } 272 | 273 | }; // class XMLCentersOutputBlock 274 | 275 | } // namespace xmlcenters 276 | 277 | } // namespace io 278 | 279 | } // namespace osmium 280 | -------------------------------------------------------------------------------- /profiles/auchan_moscow.py: -------------------------------------------------------------------------------- 1 | # A web page with a list of shops in Moscow. You can replace it with one for another city 2 | download_url = 'https://www.auchan.ru/ru/moscow/' 3 | source = 'auchan.ru' 4 | # Not adding a ref:auchan tag, since we don't have good identifiers 5 | no_dataset_id = True 6 | # Using a name query with regular expressions 7 | query = [('shop', 'supermarket', 'mall'), ('name', '~Ашан|АШАН')] 8 | master_tags = ('name', 'opening_hours', 'phone', 'website') 9 | # Empty dict so we don't add a fixme tag to unmatched objects 10 | tag_unmatched = {} 11 | # Coordinates are VERY approximate, so increasing max distance to 1 km 12 | max_distance = 1000 13 | 14 | # For some reason, functions here cannot use variables defined above 15 | # And defining them as "global" moves these from locals() to globals() 16 | download_url_copy = download_url 17 | def dataset(fileobj): 18 | def parse_weekdays(s): 19 | weekdays = {k: v for k, v in map(lambda x: x.split(), 'пн Mo,вт Tu,ср We,чт Th,пт Fr,сб Sa,вс Su'.split(','))} 20 | s = s.replace(' ', '').lower().replace('c', 'с') 21 | if s == 'ежедневно' or s == 'пн-вс': 22 | return '' 23 | parts = [] 24 | for x in s.split(','): 25 | p = None 26 | if x in weekdays: 27 | p = weekdays[x] 28 | elif '-' in x: 29 | m = re.match(r'(\w\w)-(\w\w)', x) 30 | if m: 31 | pts = [weekdays.get(m.group(i), None) for i in (1, 2)] 32 | if pts[0] and pts[1]: 33 | p = '-'.join(pts) 34 | if p: 35 | parts.append(p) 36 | else: 37 | logging.warning('Could not parse opening hours: %s', s) 38 | return None 39 | return ','.join(parts) 40 | 41 | # We are parsing HTML, and for that we need an lxml package 42 | from lxml import html 43 | import logging 44 | import re 45 | global download_url_copy, re 46 | h = html.fromstring(fileobj.read().decode('utf-8')) 47 | shops = h.find_class('shops-in-the-city-holder')[0] 48 | shops.make_links_absolute(download_url_copy) 49 | blocks = shops.xpath("//div[@class='mark-box'] | //ul[@class='shops-list']") 50 | name = None 51 | RE_GMAPS = re.compile(r'q=(-?[0-9.]+)\+(-?[0-9.]+)$') 52 | RE_OH = re.compile(r'(Ежедневно|(?:(?:Пн|Вт|Ср|Чт|Пт|Сб|В[сc])[, -]*)+)[ сc:]+(\d\d?[:.]\d\d)[- до]+(\d\d[.:]\d\d)', re.I) 53 | data = [] 54 | for block in blocks: 55 | if block.get('class') == 'mark-box': 56 | name = block.xpath("strong[contains(@class, 'name')]/text()")[0].replace('АШАН', 'Ашан') 57 | logging.debug('Name: %s', name) 58 | elif block.get('class') == 'shops-list': 59 | for li in block: 60 | title = li.xpath("strong[@class='title']/a/text()") 61 | title = title[0].lower() if title else None 62 | website = li.xpath("strong[@class='title']/a/@href") 63 | website = website[0] if website else None 64 | addr 
= li.xpath("p[1]/text()") 65 | addr = addr[0].strip() if addr else None 66 | lat = None 67 | lon = None 68 | gmapslink = li.xpath(".//a[contains(@href, 'maps.google')]/@href") 69 | if gmapslink: 70 | m = RE_GMAPS.search(gmapslink[0]) 71 | if m: 72 | lat = float(m.group(1)) 73 | lon = float(m.group(2)) 74 | opening_hours = [] 75 | # Extract opening hours 76 | oh = ' '.join(li.xpath("p/text()")) 77 | for m in RE_OH.finditer(oh): 78 | weekdays = parse_weekdays(m.group(1)) 79 | if weekdays is not None: 80 | opening_hours.append('{}{:0>5s}-{:0>5s}'.format( 81 | weekdays + ' ' if weekdays else '', m.group(2).replace('.', ':'), m.group(3).replace('.', ':'))) 82 | logging.debug('Found title: %s, website: %s, opens: %s, coords: %s, %s', title, website, '; '.join(opening_hours) or None, lat, lon) 83 | if lat is not None and name is not None: 84 | tags = { 85 | 'name': name, 86 | 'brand': 'Auchan', 87 | 'shop': 'supermarket', 88 | 'phone': '8-800-700-5-800', 89 | 'operator': 'ООО «АШАН»', 90 | 'opening_hours': '; '.join(opening_hours), 91 | 'addr:full': addr, 92 | 'website': website 93 | } 94 | data.append(SourcePoint(title, lat, lon, tags)) 95 | return data 96 | -------------------------------------------------------------------------------- /profiles/azbuka.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import conflate 3 | import requests 4 | import logging 5 | import re 6 | from io import BytesIO 7 | from yandex_parser import parse_feed 8 | 9 | 10 | class Profile: 11 | source = 'Азбука Вкуса' 12 | dataset_id = 'av' 13 | query = [('shop', 'convenience', 'supermarket', 'wine', 'alcohol')] 14 | master_tags = ('operator', 'shop', 'opening_hours', 'name', 'contact:website', 'contact:phone') 15 | download_url = 'https://av.ru/yandex/supermarket.xml' 16 | bounded_update = True 17 | 18 | def matches(osmtags, avtags): 19 | if 'Энотека' in avtags['name']: 20 | return osmtags.get('shop') in ('wine', 'alcohol') 21 | name = osmtags.get('name') 22 | if osmtags.get('shop') not in ('convenience', 'supermarket'): 23 | return False 24 | if not name or re.search(r'AB|АВ|Азбука|Daily', name, re.I): 25 | return True 26 | if name.upper() in ('SPAR', 'СПАР') or 'континент' in name.lower(): 27 | return True 28 | return False 29 | 30 | def dataset(fileobj): 31 | data = [] 32 | other_urls = [ 33 | None, 34 | 'http://av.ru/yandex/market.xml', 35 | 'http://av.ru/yandex/daily.xml', 36 | 'http://av.ru/yandex/enoteka.xml', 37 | ] 38 | for url in other_urls: 39 | if url: 40 | r = requests.get(url) 41 | if r.status_code != 200: 42 | logging.error('Could not download source data: %s %s', r.status_code, r.text) 43 | return None 44 | f = BytesIO(r.content) 45 | else: 46 | f = fileobj 47 | for c in parse_feed(f): 48 | name = next(iter(c.name.values())) 49 | tags = { 50 | 'name': name, 51 | 'operator': 'ООО «Городской супермаркет»', 52 | 'contact:phone': '; '.join(c.phones) or None, 53 | 'contact:website': c.url_add, 54 | 'opening_hours': c.opening_hours, 55 | } 56 | if 'Энотека' in name: 57 | tags['shop'] = 'wine' 58 | elif 'Daily' in name: 59 | tags['shop'] = 'convenience' 60 | else: 61 | tags['shop'] = 'supermarket' 62 | data.append(conflate.SourcePoint(c.id, c.lat, c.lon, tags)) 63 | return data 64 | 65 | 66 | if __name__ == '__main__': 67 | conflate.run(Profile) 68 | -------------------------------------------------------------------------------- /profiles/burgerking.py: -------------------------------------------------------------------------------- 1 | # Note: 
the json file at the burgerking website was restructured 2 | # and does not contain any useful data now. 3 | # So this profile is here solely for demonstration purposes. 4 | 5 | download_url = 'https://burgerking.ru/restaurant-locations-json-reply-new' 6 | source = 'Burger King' 7 | dataset_id = 'burger_king' 8 | no_dataset_id = True 9 | query = '[amenity~"cafe|restaurant|fast_food"][name~"burger.*king|бургер.*кинг",i]' 10 | max_distance = 1000 11 | overpass_timeout = 1200 12 | max_request_boxes = 4 13 | master_tags = ('name', 'amenity', 'name:ru', 'name:en', 'contact:phone', 'opening_hours') 14 | tag_unmatched = { 15 | 'fixme': 'Проверить на местности: в данных сайта отсутствует.', 16 | 'amenity': None, 17 | 'was:amenity': 'fast_food' 18 | } 19 | 20 | 21 | def dataset(fileobj): 22 | def parse_hours(s): 23 | global re 24 | s = re.sub('^зал:? *', '', s.lower()) 25 | s = s.replace('
<br>', ';').replace('<br/>
', ';').replace('\n', ';').replace(' ', '').replace(',', ';').replace('–', '-') 26 | s = s.replace('-00:', '-24:') 27 | weekdays = {k: v for k, v in map(lambda x: x.split(), 'пн Mo,вт Tu,ср We,чт Th,пт Fr,сб Sa,вс Su'.split(','))} 28 | if s == 'круглосуточно': 29 | return '24/7' 30 | parts = s.split(';') 31 | WEEKDAY_PATH = '(?:пн|вт|ср|чт|пт|сб|вск?)' 32 | result = [] 33 | found_allweek = False 34 | for p in parts: 35 | if not p: 36 | continue 37 | m = re.match(r'^('+WEEKDAY_PATH+'(?:[-,]'+WEEKDAY_PATH+')*)?с?(\d?\d[:.]\d\d-\d?\d[:.]\d\d)$', p) 38 | if not m: 39 | # Disregarding other parts 40 | return None 41 | times = re.sub('(^|-)(\d:)', r'\g<1>0\g<2>', m[2].replace('.', ':')) 42 | if m[1]: 43 | wd = m[1].replace('вск', 'вс') 44 | for k, v in weekdays.items(): 45 | wd = wd.replace(k, v) 46 | else: 47 | found_allweek = True 48 | wd = 'Mo-Su' 49 | result.append(wd + ' ' + times) 50 | if not result or (found_allweek and len(result) > 1): 51 | return None 52 | return '; '.join(result) 53 | 54 | def parse_phone(s): 55 | s = s.replace('(', '').replace(')', '').replace('-', '') 56 | s = s.replace(' доб. ', '-') 57 | return s 58 | 59 | import json 60 | import codecs 61 | import re 62 | notes = { 63 | 172: 'Подвинуть на второй терминал', 64 | 25: 'Подвинуть в ЮниМолл', 65 | 133: 'Передвинуть в Парк №1: https://prnt.sc/gtlwjs', 66 | 471: 'Передвинуть в ТЦ Балканский 6, самый северный, где кино', 67 | 234: 'Передвинуть на север, в дом 7', 68 | 111: 'Сдвинуть в здание', 69 | 59: 'Сдвинуть в торговый центр севернее', 70 | 346: 'Передвинуть к кафе', 71 | 72 | } 73 | json_src = codecs.getreader('utf-8')(fileobj).read() 74 | p = json_src.find(' 0: 76 | json_src = json_src[:p] 77 | source = json.loads(json_src) 78 | data = [] 79 | for el in source: 80 | gid = int(el['origID']) 81 | tags = { 82 | 'amenity': 'fast_food', 83 | 'name': 'Бургер Кинг', 84 | 'name:ru': 'Бургер Кинг', 85 | 'name:en': 'Burger King', 86 | 'ref': gid, 87 | 'cuisine': 'burger', 88 | 'takeaway': 'yes', 89 | 'wikipedia:brand': 'ru:Burger King', 90 | 'wikidata:brand': 'Q177054', 91 | 'contact:website': 'https://burgerking.ru/', 92 | 'contact:email': el['email'], 93 | 'contact:phone': parse_phone(el['tel']), 94 | 'opening_hours': parse_hours(el['opened']) 95 | } 96 | if gid in notes: 97 | tags['fixme'] = notes[gid] 98 | if el['is_wifi']: 99 | tags['internet_access'] = 'wlan' 100 | tags['internet_access:fee'] = 'no' 101 | else: 102 | tags['internet_access'] = 'no' 103 | data.append(SourcePoint(gid, float(el['lat']), float(el['lng']), tags)) 104 | return data 105 | -------------------------------------------------------------------------------- /profiles/minkult.py: -------------------------------------------------------------------------------- 1 | source = 'opendata.mkrf.ru' 2 | dataset_id = 'mkrf_theaters' 3 | query = [('amenity', 'theatre')] 4 | max_distance = 300 5 | master_tags = ('official_name', 'phone', 'opening_hours', 'website') 6 | 7 | 8 | # Reading the dataset passport to determine an URL of the latest dataset version 9 | def download_url(): 10 | import logging 11 | import requests 12 | 13 | dataset_id = '7705851331-' + (param or 'museums') 14 | r = requests.get('http://opendata.mkrf.ru/opendata/{}/meta.json'.format(dataset_id)) 15 | if r.status_code != 200 or len(r.content) == 0: 16 | logging.error('Could not get URL for dataset: %s %s', r.status_code, r.text) 17 | logging.error('Please check http://opendata.mkrf.ru/opendata/{}'.format(dataset_id)) 18 | return None 19 | result = r.json() 20 | latest = result['data'][-1] 21 
| logging.info('Downloading %s from %s', result['title'], latest['created']) 22 | return latest['source'] 23 | 24 | source = 'opendata.mkrf.ru' 25 | dataset_id = 'mkrf_'+(param or 'museums') 26 | if not param or param == 'museums': 27 | query = [('tourism', 'museum')] 28 | elif param == 'theaters': 29 | query = [('amenity', 'theatre')] 30 | elif param == 'circuses': 31 | query = [('amenity', 'circus')] 32 | elif param == 'philharmonic': 33 | query = [('amenity', 'theatre')] 34 | else: 35 | raise ValueError('Unknown param value: {}'.format(param)) 36 | 37 | max_distance = 300 38 | master_tags = ('official_name', 'phone', 'opening_hours', 'website') 39 | 40 | 41 | def dataset(fileobj): 42 | import json 43 | import codecs 44 | 45 | def make_wd_ranges(r): 46 | """Converts e.g. [0,1,4] into 'Mo-Tu, Fr'.""" 47 | wd = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'] 48 | res = wd[r[0]] 49 | in_range = False 50 | for i in range(1, len(r)+1): 51 | if i < len(r) and r[i] == r[i-1] + 1: 52 | in_range = True 53 | else: 54 | if in_range: 55 | res += '-' + wd[r[i-1]] 56 | in_range = False 57 | if i < len(r): 58 | res += ', ' + wd[r[i]] 59 | return res 60 | 61 | def parse_hours(h): 62 | """Receives a dict {'0': {'from': '10:00:00', 'to': '18:00:00'}, ...} 63 | and returns a proper opening_hours value.""" 64 | days = {} 65 | for wd, d in h.items(): 66 | if not d['from']: 67 | continue 68 | for i in ('from', 'to'): 69 | d[i] = d[i][:5] 70 | if d['to'] == '00:00': 71 | d['to'] = '24:00' 72 | elif not d['to']: 73 | d['to'] = '19:00+' 74 | k = '{}-{}'.format(d['from'], d['to']) 75 | if k not in days: 76 | days[k] = set() 77 | days[k].add(int(wd)) 78 | days2 = {} 79 | for op, d in days.items(): 80 | days2[tuple(sorted(d))] = op 81 | res = [] 82 | for d in sorted(days2.keys(), key=lambda x: min(x)): 83 | res.append(' '.join([make_wd_ranges(d), days2[d]])) 84 | return '; '.join(res) 85 | 86 | def wrap(coord, absmax): 87 | if coord < -absmax: 88 | return coord + absmax * 2 89 | if coord > absmax: 90 | return coord - absmax * 2 91 | return coord 92 | 93 | def format_phone(ph): 94 | if ph and len(ph) == 11 and ph[0] == '7': 95 | return '+7 {} {}-{}-{}'.format(ph[1:4], ph[4:7], ph[7:9], ph[9:]) 96 | return ph 97 | 98 | source = json.load(codecs.getreader('utf-8')(fileobj)) 99 | data = [] 100 | for el in source: 101 | d = el['data']['general'] 102 | gid = d['id'] 103 | lon = wrap(d['address']['mapPosition']['coordinates'][1], 180) 104 | lat = d['address']['mapPosition']['coordinates'][0] 105 | tags = { 106 | 'amenity': 'theatre', 107 | 'name': d['name'], 108 | # 'official_name': d['name'], 109 | # 'image': d['image']['url'], 110 | 'operator': d['organization']['name'], 111 | 'addr:full': '{}, {}'.format(d['locale']['name'], d['address']['street']), 112 | } 113 | if tags['operator'] == tags['name']: 114 | del tags['operator'] 115 | if d.get('workingSchedule'): 116 | tags['opening_hours'] = parse_hours(d['workingSchedule']) 117 | if 'email' in d['contacts']: 118 | tags['email'] = d['contacts']['email'] 119 | if 'website' in d['contacts']: 120 | tags['website'] = d['contacts']['website'] 121 | if tags['website'].endswith('.ru'): 122 | tags['website'] += '/' 123 | if 'phones' in d['contacts'] and d['contacts']['phones']: 124 | tags['phone'] = format_phone(d['contacts']['phones'][0]['value']) 125 | data.append(SourcePoint(gid, lat, lon, tags)) 126 | return data 127 | -------------------------------------------------------------------------------- /profiles/moscow_addr.py: 
-------------------------------------------------------------------------------- 1 | source = 'dit.mos.ru' 2 | no_dataset_id = True 3 | query = [('building',)] 4 | max_distance = 50 5 | max_request_boxes = 2 6 | master_tags = ('addr:housenumber', 'addr:street') 7 | 8 | COMPLEX = False 9 | ADMS = { 10 | '1': 'Северо-Западный административный округ', 11 | '2': 'Северный административный округ', 12 | '3': 'Северо-Восточный административный округ', 13 | '4': 'Западный административный округ', 14 | '5': 'Центральный административный округ', 15 | '6': 'Восточный административный округ', 16 | '7': 'Юго-Западный административный округ', 17 | '8': 'Южный административный округ', 18 | '9': 'Юго-Восточный административный округ', 19 | '10': 'Зеленоградский административный округ', 20 | '11': 'Троицкий административный округ', 21 | '12': 'Новомосковский административный округ', 22 | } 23 | ADM = ADMS['2'] 24 | if param: 25 | if param[0] == 'c': 26 | COMPLEX = True 27 | param = param[1:] 28 | if param in ADMS: 29 | ADM = ADMS[param] 30 | if param == '5': 31 | query = [[('addr:housenumber',)], [('building',)]] 32 | 33 | 34 | def dataset(fileobj): 35 | import zipfile 36 | import json 37 | import logging 38 | global COMPLEX, ADM 39 | 40 | def find_center(geodata): 41 | if not geodata: 42 | return None 43 | if 'center' in geodata: 44 | return geodata['center'][0] 45 | if 'coordinates' in geodata: 46 | typ = geodata['type'] 47 | lonlat = [0, 0] 48 | cnt = 0 49 | if typ == 'Polygon': 50 | for p in geodata['coordinates'][0]: 51 | lonlat[0] += p[0] 52 | lonlat[1] += p[1] 53 | cnt += 1 54 | elif typ == 'LineString': 55 | for p in geodata['coordinates']: 56 | lonlat[0] += p[0] 57 | lonlat[1] += p[1] 58 | cnt += 1 59 | elif typ == 'Point': 60 | p = geodata['coordinates'] 61 | lonlat[0] += p[0] 62 | lonlat[1] += p[1] 63 | cnt += 1 64 | if cnt > 0: 65 | return [lonlat[0]/cnt, lonlat[1]/cnt] 66 | return None 67 | 68 | logging.info('Экспортируем %s (%s)', ADM, 'строения' if COMPLEX else 'без строений') 69 | zf = zipfile.ZipFile(fileobj) 70 | data = [] 71 | no_geodata = 0 72 | no_addr = 0 73 | count = 0 74 | for zname in zf.namelist(): 75 | source = json.loads(zf.read(zname).decode('cp1251')) 76 | for el in source: 77 | gid = el['global_id'] 78 | try: 79 | adm_area = el['ADM_AREA'] 80 | if adm_area != ADM: 81 | continue 82 | count += 1 83 | lonlat = find_center(el.get('geoData')) 84 | if not lonlat: 85 | no_geodata += 1 86 | street = el.get('P7') 87 | house = el.get('L1_VALUE') 88 | htype = el.get('L1_TYPE') 89 | corpus = el.get('L2_VALUE') 90 | ctype = el.get('L2_TYPE') 91 | stroenie = el.get('L3_VALUE') 92 | stype = el.get('L3_TYPE') 93 | if not street or not house or 'Б/Н' in house: 94 | no_addr += 1 95 | continue 96 | if not lonlat: 97 | continue 98 | is_complex = False 99 | housenumber = house.replace(' ', '') 100 | if htype != 'дом': 101 | is_complex = True 102 | if htype in ('владение', 'домовладение'): 103 | housenumber = 'вл' + housenumber 104 | else: 105 | logging.warn('Unknown house number type: %s', htype) 106 | continue 107 | if corpus: 108 | if ctype == 'корпус': 109 | housenumber += ' к{}'.format(corpus) 110 | else: 111 | logging.warn('Unknown corpus type: %s', ctype) 112 | continue 113 | if stroenie: 114 | is_complex = True 115 | if stype == 'строение' or stype == 'сооружение': 116 | housenumber += ' с{}'.format(stroenie) 117 | else: 118 | logging.warn('Unknown stroenie type: %s', stype) 119 | continue 120 | if is_complex != COMPLEX: 121 | continue 122 | tags = { 123 | 'addr:street': street, 124 | 
'addr:housenumber': housenumber, 125 | } 126 | data.append(SourcePoint(gid, lonlat[1], lonlat[0], tags)) 127 | except Exception as e: 128 | logging.warning('PROFILE: Failed to get attributes for address %s: %s', gid, str(e)) 129 | logging.warning(json.dumps(el, ensure_ascii=False)) 130 | 131 | if no_addr + no_geodata > 0: 132 | logging.warning('%.2f%% of data have no centers, and %.2f%% have no streets or house numbers', 133 | 100*no_geodata/count, 100*no_addr/count) 134 | return data 135 | -------------------------------------------------------------------------------- /profiles/moscow_parkomats.py: -------------------------------------------------------------------------------- 1 | # What will be put into "source" tags. Lower case please 2 | source = 'dit.mos.ru' 3 | # A fairly unique id of the dataset to query OSM, used for "ref:mos_parking" tags 4 | # If you omit it, set explicitly "no_dataset_id = True" 5 | dataset_id = 'mos_parking' 6 | # Tags for querying with overpass api 7 | query = [('amenity', 'vending_machine'), ('vending', 'parking_tickets')] 8 | # Use bbox from dataset points (default). False = query whole world, [minlat, minlon, maxlat, maxlon] to override 9 | bbox = True 10 | # How close OSM point should be to register a match, in meters. Default is 100 11 | max_distance = 30 12 | # Delete objects that match query tags but not dataset? False is the default 13 | delete_unmatched = False 14 | # If set, and delete_unmatched is False, modify tags on unmatched objects instead 15 | # Always used for area features, since these are not deleted 16 | tag_unmatched = { 17 | 'fixme': 'Проверить на местности: в данных ДИТ отсутствует. Вероятно, демонтирован', 18 | 'amenity': None, 19 | 'was:amenity': 'vending_machine' 20 | } 21 | # Actually, after the initial upload we should not touch any existing non-matched objects 22 | tag_unmatched = None 23 | # A set of authoritative tags to replace on matched objects 24 | master_tags = ('zone:parking', 'ref', 'contact:phone', 'contact:website', 'operator') 25 | 26 | 27 | def download_url(mos_dataset_id=1421): 28 | import requests 29 | import logging 30 | r = requests.get('https://data.mos.ru/api/datasets/expformats/?datasetId={}'.format(mos_dataset_id)) 31 | if r.status_code != 200 or len(r.content) == 0: 32 | logging.error('Could not get URL for dataset: %s %s', r.status_code, r.text) 33 | logging.error('Please check http://data.mos.ru/opendata/{}/passport'.format(mos_dataset_id)) 34 | return None 35 | url = [x for x in r.json() if x['Format'] == 'json'][0] 36 | version = '?' 37 | title = 'dataset' 38 | r = requests.get('https://data.mos.ru/apiproxy/opendata/{}/meta.json'.format(mos_dataset_id)) 39 | if r.status_code == 200: 40 | title = r.json()['Title'] 41 | version = r.json()['VersionNumber'] 42 | logging.info('Downloading %s %s from %s', title, version, url['GenerationStart']) 43 | return 'https://op.mos.ru/EHDWSREST/catalog/export/get?id=' + url['EhdId'] 44 | 45 | 46 | # A list of SourcePoint objects. Initialize with (id, lat, lon, {tags}). 
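# For example (coordinates and identifier are purely illustrative):
#   SourcePoint(1234567, 55.7558, 37.6176,
#               {'amenity': 'vending_machine', 'vending': 'parking_tickets'})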
47 | def dataset(fileobj): 48 | import json 49 | import logging 50 | import zipfile 51 | import re 52 | zf = zipfile.ZipFile(fileobj) 53 | source = json.loads(zf.read(zf.namelist()[0]).decode('cp1251')) 54 | RE_NUM4 = re.compile(r'\d{4,6}') 55 | data = [] 56 | for el in source: 57 | try: 58 | gid = el['global_id'] 59 | zone = el['ParkingZoneNumber'] 60 | lon = el['Longitude_WGS84'] 61 | lat = el['Latitude_WGS84'] 62 | pnum = el['NumberOfParkingMeter'] 63 | tags = { 64 | 'amenity': 'vending_machine', 65 | 'vending': 'parking_tickets', 66 | 'zone:parking': zone, 67 | 'contact:phone': '+7 495 539-54-54', 68 | 'contact:website': 'http://parking.mos.ru/', 69 | 'opening_hours': '24/7', 70 | 'operator': 'ГКУ «Администратор Московского парковочного пространства»', 71 | 'payment:cash': 'no', 72 | 'payment:credit_cards': 'yes', 73 | 'payment:debit_cards': 'yes' 74 | } 75 | try: 76 | lat = float(lat) 77 | lon = float(lon) 78 | tags['ref'] = RE_NUM4.search(pnum).group(0) 79 | data.append(SourcePoint(gid, lat, lon, tags)) 80 | except Exception as e: 81 | logging.warning('PROFILE: Failed to parse lat/lon/ref for parking meter %s: %s', gid, str(e)) 82 | except Exception as e: 83 | logging.warning('PROFILE: Failed to get attributes for parking meter: %s', str(e)) 84 | return data 85 | -------------------------------------------------------------------------------- /profiles/navads_shell.py: -------------------------------------------------------------------------------- 1 | # This profile reads a prepared JSON, thus no "dataset" function 2 | 3 | # Value for the changeset "source" tag 4 | source = 'Navads' 5 | # Keeping identifiers in a "ref:navads_shell" tag 6 | dataset_id = 'navads_shell' 7 | # Overpass API query is a simple [amenity="fuel"] 8 | query = [('amenity', 'fuel')] 9 | # These tag values override values on OSM objects 10 | master_tags = ('brand', 'addr:postcode', 'phone', 'opening_hours') 11 | # Looking at most 50 meters around a dataset point 12 | max_distance = 50 13 | 14 | 15 | def format_phone(ph): 16 | if ph and len(ph) == 13 and ph[:3] == '+44': 17 | if (ph[3] == '1' and ph[4] != '1' and ph[5] != '1') or ph[3:7] == '7624': 18 | return ' '.join([ph[:3], ph[3:7], ph[7:]]) 19 | elif ph[3] in ('1', '3', '8', '9'): 20 | return ' '.join([ph[:3], ph[3:6], ph[6:9], ph[9:]]) 21 | else: 22 | return ' '.join([ph[:3], ph[3:5], ph[5:9], ph[9:]]) 23 | return ph 24 | 25 | 26 | # Tag transformation 27 | transform = { 28 | # Just add this tag 29 | 'amenity': 'fuel', 30 | # Rename key 31 | 'postal_code': '>addr:postcode', 32 | # Use a function to transform a value 33 | 'phone': format_phone, 34 | # Remove this tag 35 | 'name': '-' 36 | } 37 | 38 | # Example JSON line: 39 | # 40 | # { 41 | # "id": "NVDS298-10018804", 42 | # "lat": 51.142491, 43 | # "lon": -0.074893, 44 | # "tags": { 45 | # "name": "Shell", 46 | # "brand": "Shell", 47 | # "addr:street": "Snow Hill", 48 | # "postal_code": "RH10 3EQ", 49 | # "addr:city": "Crawley", 50 | # "phone": "+441342718750", 51 | # "website": "http://www.shell.co.uk", 52 | # "operator": "Shell", 53 | # "opening_hours": "24/7", 54 | # "amenity": "fuel" 55 | # } 56 | # } 57 | -------------------------------------------------------------------------------- /profiles/navads_shell_json.py: -------------------------------------------------------------------------------- 1 | source = 'Navads' 2 | dataset_id = 'navads_shell' 3 | query = [('amenity', 'fuel')] 4 | master_tags = ('brand', 'phone', 'opening_hours') 5 | max_distance = 50 6 | max_request_boxes = 3 7 | 8 | 9 | def 
dataset(fileobj): 10 | import json 11 | import codecs 12 | import re 13 | from collections import defaultdict 14 | 15 | def format_phone(ph): 16 | if ph and len(ph) == 13 and ph[:3] == '+44': 17 | if (ph[3] == '1' and ph[4] != '1' and ph[5] != '1') or ph[3:7] == '7624': 18 | return ' '.join([ph[:3], ph[3:7], ph[7:]]) 19 | elif ph[3] in ('1', '3', '8', '9'): 20 | return ' '.join([ph[:3], ph[3:6], ph[6:9], ph[9:]]) 21 | else: 22 | return ' '.join([ph[:3], ph[3:5], ph[5:9], ph[9:]]) 23 | return ph 24 | 25 | def make_wd_ranges(r): 26 | wd = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'] 27 | res = wd[r[0]] 28 | in_range = False 29 | for i in range(1, len(r)+1): 30 | if i < len(r) and r[i] == r[i-1] + 1: 31 | in_range = True 32 | else: 33 | if in_range: 34 | res += '-' + wd[r[i-1]] 35 | in_range = False 36 | if i < len(r): 37 | res += ',' + wd[r[i]] 38 | return res 39 | 40 | def parse_hours(h): 41 | if not h: 42 | return None 43 | WD = {x: i for i, x in enumerate([ 44 | 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY' 45 | ])} 46 | days = defaultdict(list) 47 | for d in h.split(';'): 48 | parts = re.findall(r'([A-Z]+)=([0-9:-]+)', d) 49 | if len(set([p[0] for p in parts])) != 1: 50 | raise Exception('Parts format fail: {}'.format(d)) 51 | days[','.join([p[1] for p in parts])].append(WD[parts[0][0]]) 52 | res = [] 53 | for time, wd in sorted(days.items(), key=lambda x: min(x[1])): 54 | res.append(' '.join([make_wd_ranges(wd), time])) 55 | if res[0] == 'Mo-Su 00:00-23:59': 56 | return '24/7' 57 | return '; '.join(res).replace('23:59', '24:00') 58 | 59 | global re, defaultdict 60 | source = json.load(codecs.getreader('utf-8-sig')(fileobj)) 61 | data = [] 62 | for el in source['Locations']: 63 | if not el['location']: 64 | continue 65 | coords = [float(x) for x in el['location'].split(',')] 66 | tags = { 67 | 'amenity': 'fuel', 68 | 'brand': el['name'], 69 | 'addr:postcode': el['address_zip'] or None, 70 | 'phone': format_phone('+'+str(el['phone'])), 71 | 'opening_hours': parse_hours(el['daily_hours']), 72 | } 73 | if (el['address_street'] and el['address_number'] and 74 | not re.search(r'^([ABCDM]\d+|Junction)', el['address_street']) and 75 | 'Ln' not in el['address_street'] and 'A' not in el['address_number']): 76 | tags['addr:street'] = el['address_street'] 77 | tags['addr:housenumber'] = el['address_number'] 78 | data.append(SourcePoint(el['place_id'], coords[0], coords[1], tags)) 79 | return data 80 | 81 | 82 | # Example line of the source JSON: 83 | # 84 | # { 85 | # "place_id": "NVDS353-10019224", 86 | # "name": "Shell", 87 | # "category": "GAS_STATION", 88 | # "location": "54.978366,-1.57441", 89 | # "description": "", 90 | # "phone": 441912767084, 91 | # "address_street": "Shields Road", 92 | # "address_number": "308", 93 | # "address_city": "Newcastle-Upon-Tyne", 94 | # "address_zip": "NE6 2UU", 95 | # "address_country": "GB", 96 | # "website": "http://www.shell.co.uk/motorist/station-locator.html?id=10019224&modeselected=true", 97 | # "daily_hours": "MONDAY=00:00-23:59;TUESDAY=00:00-23:59;WEDNESDAY=00:00-23:59;THURSDAY=00:00-23:59;FRIDAY=00:00-23:59;SATURDAY=00:00-23:59;SUNDAY=00:00-23:59", 98 | # "brand": "Shell", 99 | # "is_deleted": false 100 | # }, 101 | -------------------------------------------------------------------------------- /profiles/rosinter.py: -------------------------------------------------------------------------------- 1 | download_url = 
'http://www.rosinter.ru/locator/RestaurantsFeed.aspx?city=all&location=&lang=ru&brand=all&cuisine=all&metro=&hasDelivery=&isCorporate=' 2 | source = 'Rosinter' 3 | no_dataset_id = True 4 | max_distance = 500 5 | query = [('amenity', 'restaurant', 'cafe', 'bar', 'pub', 'fast_food')] 6 | overpass_timeout = 1000 7 | duplicate_distance = -1 8 | nearest_points = 30 9 | master_tags = ('name', 'phone', 'amenity') 10 | 11 | types = { 12 | # substr: osm_substr, amenity, cuisine 13 | 'Costa': ['costa', 'cafe', 'coffee_shop'], 14 | 'IL': [('patio', 'патио'), 'restaurant', 'italian'], 15 | 'TGI': [('tgi', 'friday'), 'restaurant', 'american'], 16 | 'Бар и': ['гриль', 'restaurant', 'american'], 17 | 'Макд': ['мак', 'fast_food', None], 18 | 'Раша': ['мама', 'fast_food', 'russian'], 19 | 'Планета': ['планета', 'restaurant', 'japanese'], 20 | 'Шика': ['шика', 'restaurant', 'asian'], 21 | 'Свои': ['сво', 'restaurant', None], 22 | } 23 | 24 | 25 | def matches(osmtags, ritags): 26 | global types 27 | rname = ritags['name'] 28 | name = osmtags.get('name', '').lower() 29 | for k, v in types.items(): 30 | if k in rname: 31 | if isinstance(v[0], str): 32 | return v[0] in name 33 | for n in v[0]: 34 | if n in name: 35 | return True 36 | return False 37 | logging.error('Unknown rname value: %s', rname) 38 | return False 39 | 40 | 41 | def dataset(f): 42 | global types 43 | from lxml import etree 44 | root = etree.parse(f).getroot() 45 | for el in root.find('Restaurants'): 46 | rid = el.find('id').text 47 | city = el.find('city').text 48 | if city in ('Прага', 'Будапешт', 'Варшава', 'Баку', 'Рига'): 49 | continue 50 | brand = el.find('brand').text 51 | if 'TGI' in brand: 52 | brand = 'TGI Fridays' 53 | elif 'СВОИ' in brand: 54 | brand = 'Свои' 55 | phone = el.find('telephone').text 56 | if phone: 57 | phone = phone.replace('(', '').replace(')', '') 58 | website = el.find('siteurl').text 59 | if website and 'il-patio' in website: 60 | website = 'http://ilpatio.ru' 61 | if 'Свои' in brand: 62 | website = 'http://restoransvoi.by' 63 | lat = float(el.find('latitude').text) 64 | lon = float(el.find('longitude').text) 65 | tags = { 66 | 'amenity': 'restaurant', 67 | 'name': brand, 68 | 'phone': phone, 69 | 'website': website, 70 | } 71 | address = el.find('address').text 72 | for k, v in types.items(): 73 | if k in brand: 74 | tags['amenity'] = v[1] 75 | tags['cuisine'] = v[2] 76 | yield SourcePoint( 77 | rid, lat, lon, tags, 78 | remarks='Обязательно подвиньте точку!\nАдрес: ' + str(address)) 79 | -------------------------------------------------------------------------------- /profiles/schocoladnitsa.py: -------------------------------------------------------------------------------- 1 | download_url = 'http://new.shoko.ru/addresses/' 2 | source = 'Шоколадница' 3 | no_dataset_id = True 4 | overpass_timeout = 600 5 | max_distance = 250 6 | max_request_boxes = 6 7 | query = [('amenity',), ('name', '~Шоколадница')] 8 | master_tags = ['amenity', 'name', 'name:ru', 'name:en', 'website', 'phone', 'opening_hours'] 9 | 10 | 11 | def dataset(fileobj): 12 | def parse_oh(s): 13 | if not s: 14 | return None 15 | olds = s 16 | if s.strip().lower() == 'круглосуточно': 17 | return '24/7' 18 | trans = { 19 | 'будни': 'Mo-Fr', 20 | 'суббота': 'Sa', 21 | 'воскресенье': 'Su', 22 | 'ежедневно': 'Mo-Su', 23 | 'выходные': 'Sa-Su', 24 | 'восерсенье': 'Su', 25 | 'ежеденевно': 'Mo-Su', 26 | 'пн-чтивс': 'Mo-Th,Su', 27 | 'пн-чт,вс': 'Mo-Th,Su', 28 | 'пт.-сб': 'Fr-Sa', 29 | 'вск.-чт': 'Su-Th', 30 | 'смаяпооктябрь': 'May-Oct', 31 | 
'ч.смаяпооктябрь': 'May-Oct', 32 | 'сентября': 'May-Sep', 33 | } 34 | weekdays = {'пн': 'Mo', 'вт': 'Tu', 'ср': 'We', 'чт': 'Th', 'пт': 'Fr', 'сб': 'Sa', 'вс': 'Su'} 35 | if s == 'с 10 до 22' or s == 'с 10.00-22.00': 36 | s = '10:00 - 22:00' 37 | s = s.replace('круглосуточно', '00:00-24:00') 38 | s = s.replace('23,', '23:00') 39 | parts = [] 40 | for m in re.finditer(r'([а-яА-Я ,.:\(\)-]+?)?(?:\sс)?\s*(\d?\d[:.]\d\d)(?: до |[^\w\d]+)(\d\d[:.]\d\d)', s): 41 | days = (m[1] or '').strip(' -.,:()').lower().replace(' ', '') 42 | m2 = re.match(r'^([б-ч]{2})\s?[,и-]\s?([б-ч]{2})$', days) 43 | if not days: 44 | days = 'Mo-Su' 45 | elif days in weekdays: 46 | days = weekdays[days] 47 | elif m2 and m2[1] in weekdays and m2[2] in weekdays: 48 | days = weekdays[m2[1]] + '-' + weekdays[m2[2]] 49 | else: 50 | if days not in trans: 51 | logging.warn('Unknown days: %s', days) 52 | continue 53 | days = trans[days] 54 | parts.append('{} {:0>5}-{}'.format(days, m[2].replace('.', ':'), m[3].replace('.', ':'))) 55 | # logging.info('%s -> %s', olds, '; '.join(parts)) 56 | if parts: 57 | return '; '.join(parts) 58 | return None 59 | 60 | from lxml import html 61 | import re 62 | import logging 63 | import phonenumbers 64 | h = html.fromstring(fileobj.read().decode('utf-8')) 65 | markers = h.get_element_by_id('markers') 66 | i = 0 67 | for m in markers: 68 | lat = m.get('data-lat') 69 | lon = m.get('data-lng') 70 | if not lat or not lon: 71 | continue 72 | oh = parse_oh(m.get('data-time')) 73 | phone = m.get('data-phone') 74 | if phone[:3] == '812': 75 | phone = '+7' + phone 76 | if ' 891' in phone: 77 | phone = phone[:phone.index(' 891')] 78 | if ' 8-91' in phone: 79 | phone = phone[:phone.index(' 8-91')] 80 | try: 81 | if phone == 'отключен' or not phone: 82 | phone = None 83 | else: 84 | parsed_phone = phonenumbers.parse(phone.replace(';', ',').split(',')[0], "RU") 85 | except: 86 | logging.info(phone) 87 | raise 88 | if phone is None: 89 | fphone = None 90 | else: 91 | fphone = phonenumbers.format_number( 92 | parsed_phone, phonenumbers.PhoneNumberFormat.INTERNATIONAL) 93 | tags = { 94 | 'amenity': 'cafe', 95 | 'name': 'Шоколадница', 96 | 'name:ru': 'Шоколадница', 97 | 'name:en': 'Shokoladnitsa', 98 | 'website': 'http://shoko.ru', 99 | 'cuisine': 'coffee_shop', 100 | 'phone': fphone, 101 | 'opening_hours': oh 102 | } 103 | i += 1 104 | yield SourcePoint(i, float(lat), float(lon), tags, remarks=m.get('data-title')) 105 | -------------------------------------------------------------------------------- /profiles/velobike.py: -------------------------------------------------------------------------------- 1 | # Where to get the latest feed 2 | download_url = 'http://www.velobike.ru/proxy/parkings/' 3 | # What to write for the changeset's source tag 4 | source = 'velobike.ru' 5 | # These two lines negate each other: 6 | dataset_id = 'velobike' 7 | # We actually do not use ref:velobike tag 8 | no_dataset_id = True 9 | # Overpass API query: [amenity="bicycle_rental"][network="Велобайк"] 10 | query = [('amenity', 'bicycle_rental'), ('network', 'Велобайк')] 11 | # Maximum lookup radius is 100 meters 12 | max_distance = 100 13 | # The overpass query chooses all relevant points, 14 | # so points that are not in the dataset should be deleted 15 | delete_unmatched = True 16 | # If delete_unmatched were False, we'd be retagging these parkings: 17 | tag_unmatched = { 18 | 'fixme': 'Проверить на местности: в данных велобайка отсутствует. 
Вероятно, демонтирована', 19 | 'amenity': None, 20 | 'was:amenity': 'bicycle_rental' 21 | } 22 | # Overwriting these tags 23 | master_tags = ('ref', 'capacity', 'capacity:electric', 'contact:email', 24 | 'contact:phone', 'contact:website', 'operator') 25 | 26 | 27 | def dataset(fileobj): 28 | import codecs 29 | import json 30 | import logging 31 | 32 | # Specifying utf-8 is important, otherwise you'd get "bytes" instead of "str" 33 | source = json.load(codecs.getreader('utf-8')(fileobj)) 34 | data = [] 35 | for el in source['Items']: 36 | try: 37 | gid = int(el['Id']) 38 | lon = el['Position']['Lon'] 39 | lat = el['Position']['Lat'] 40 | terminal = 'yes' if el['HasTerminal'] else 'no' 41 | tags = { 42 | 'amenity': 'bicycle_rental', 43 | 'network': 'Велобайк', 44 | 'ref': gid, 45 | 'capacity': el['TotalOrdinaryPlaces'], 46 | 'capacity:electric': el['TotalElectricPlaces'], 47 | 'contact:email': 'info@velobike.ru', 48 | 'contact:phone': '+7 495 966-46-69', 49 | 'contact:website': 'https://velobike.ru/', 50 | 'opening_hours': '24/7', 51 | 'operator': 'ЗАО «СитиБайк»', 52 | 'payment:cash': 'no', 53 | 'payment:troika': 'no', 54 | 'payment:mastercard': terminal, 55 | 'payment:visa': terminal, 56 | } 57 | try: 58 | lat = float(lat) 59 | lon = float(lon) 60 | data.append(SourcePoint(gid, lat, lon, tags)) 61 | except Exception as e: 62 | logging.warning('PROFILE: Failed to parse lat/lon for rental stand %s: %s', gid, str(e)) 63 | except Exception as e: 64 | logging.warning('PROFILE: Failed to get attributes for rental stand: %s', str(e)) 65 | return data 66 | -------------------------------------------------------------------------------- /profiles/yandex_parser.py: -------------------------------------------------------------------------------- 1 | from lxml import etree 2 | import logging 3 | import re 4 | import phonenumbers # https://pypi.python.org/pypi/phonenumberslite 5 | 6 | 7 | class Company: 8 | def __init__(self, cid): 9 | self.id = cid 10 | self.name = {} 11 | self.alt_name = {} 12 | self.address = {} 13 | self.country = {} 14 | self.address_add = {} 15 | self.opening_hours = None 16 | self.url = None 17 | self.url_add = None 18 | self.url_ext = None 19 | self.email = None 20 | self.rubric = [] 21 | self.phones = [] 22 | self.faxes = [] 23 | self.photos = [] 24 | self.lat = None 25 | self.lon = None 26 | self.other = {} 27 | 28 | 29 | def parse_feed(f): 30 | def multilang(c, name): 31 | for el in company.findall(name): 32 | lang = el.get('lang', 'default') 33 | value = el.text 34 | if value and len(value.strip()) > 0: 35 | c[lang] = value.strip() 36 | 37 | def parse_subels(el): 38 | res = {} 39 | if el is None: 40 | return res 41 | for subel in el: 42 | name = subel.tag 43 | text = subel.text 44 | if text and text.strip(): 45 | res[name] = text 46 | return res 47 | 48 | def parse_opening_hours(s): 49 | if 'углосуточн' in s: 50 | return '24/7' 51 | m = re.search(r'([01]?\d:\d\d).*?([12]?\d:\d\d)', s) 52 | if m: 53 | # TODO: parse weekdays 54 | start = m.group(1) 55 | start = re.sub(r'^(\d:)', r'0\1', start) 56 | end = m.group(2) 57 | end = re.sub(r'^0?0:', '24:', end) 58 | return 'Mo-Su {}-{}'.format(start, end) 59 | # TODO 60 | return None 61 |
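# The feed root is expected to be <companies>; each <company> child below is
# turned into a Company object. Entries without an id or coordinates are
# skipped, phone numbers are normalized via phonenumbers, and urls, rubrics,
# photos and feature-* attributes are copied onto the object before yielding.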
62 | xml = etree.parse(f).getroot() 63 | if xml.tag != 'companies': 64 | logging.error('Root node must be named "companies", not %s', xml.tag) 65 | for company in xml: 66 | if company.tag != 'company': 67 | logging.warning('Non-company in yandex xml: %s', company.tag) 68 | continue 69 | cid = company.find('company-id') 70 | if cid is None or not cid.text: 71 | logging.error('No id for a company') 72 | continue 73 | c = Company(cid.text.strip()) 74 | multilang(c.name, 'name') 75 | multilang(c.alt_name, 'name-other') 76 | multilang(c.address, 'address') 77 | loc = {} 78 | multilang(loc, 'locality-name') 79 | if loc: 80 | for lng, place in loc.items(): 81 | if lng in c.address: 82 | c.address[lng] = place + ', ' + c.address[lng] 83 | multilang(c.address_add, 'address-add') 84 | multilang(c.country, 'country') 85 | coord = parse_subels(company.find('coordinates')) 86 | if 'lat' in coord and 'lon' in coord: 87 | c.lat = float(coord['lat']) 88 | c.lon = float(coord['lon']) 89 | else: 90 | logging.warning('No coordinates for %s', c.id) 91 | continue 92 | for ph in company.findall('phone'): 93 | phone = parse_subels(ph) 94 | if 'number' not in phone: 95 | continue 96 | parsed_phone = phonenumbers.parse(phone['number'], 'RU') 97 | number = phonenumbers.format_number( 98 | parsed_phone, phonenumbers.PhoneNumberFormat.INTERNATIONAL) 99 | if 'ext' in phone: 100 | number += ' ext. ' + phone['ext'] 101 | typ = phone.get('type', 'phone') 102 | if typ == 'fax': 103 | c.faxes.append(number) 104 | else: 105 | c.phones.append(number) 106 | email = company.find('email') 107 | if email is not None and email.text: 108 | c.email = email.text.strip() 109 | url = company.find('url') 110 | if url is not None and url.text: 111 | c.url = url.text.strip() 112 | url_add = company.find('add-url') 113 | if url_add is not None and url_add.text: 114 | c.url_add = url_add.text.strip() 115 | url_ext = company.find('info-page') 116 | if url_ext is not None and url_ext.text: 117 | c.url_ext = url_ext.text.strip() 118 | for rub in company.findall('rubric-rd'): 119 | if rub.text: 120 | c.rubric.append(int(rub.text.strip())) 121 | coh = company.find('working-time') 122 | if coh is not None and coh.text: 123 | c.opening_hours = parse_opening_hours(coh.text) 124 | photos = company.find('photos') 125 | if photos is not None: 126 | for photo in photos: 127 | if photo.get('type', 'interior') != 'food': 128 | c.photos.append(photo.get('url')) 129 | for feat in company: 130 | if feat.tag.startswith('feature-'): 131 | name = feat.get('name', None) 132 | value = feat.get('value', None) 133 | if name is not None and value is not None: 134 | if feat.tag == 'feature-boolean': 135 | value = value == '1' 136 | elif '-numeric' in feat.tag: 137 | value = float(value) 138 | c.other[name] = value 139 | yield c 140 | -------------------------------------------------------------------------------- /scripts/README.md: -------------------------------------------------------------------------------- 1 | # Scripts 2 | 3 | Here are some scripts (one at the moment) to prepare data for the conflator 4 | or to post-process the results after conflating. 5 | 6 | ## pack_places.py 7 | 8 | Prepares the `places.bin` file for the geocoder. Requires three JSON files: 9 | 10 | * places.json 11 | * regions.json 12 | * countries.json 13 | 14 | These comprise the "places feed" and can be prepared using 15 | [these scripts](https://github.com/mapsme/geocoding_data). You can 16 | find a link to a ready-made feed in that repository. -------------------------------------------------------------------------------- /scripts/pack_places.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import json 3 | import struct 4 | import os 5 | import sys 6 | 7 | 8 | def pack_coord(coord): 9 | data = struct.pack('