├── .gitignore ├── CHANGELOG.md ├── LICENSE ├── README.rst ├── conflate ├── __init__.py ├── __main__.py ├── conflate.py ├── conflator.py ├── data.py ├── dataset.py ├── geocoder.py ├── osm.py ├── places.bin ├── profile.py └── version.py ├── filter ├── CMakeLists.txt ├── FindOsmium.cmake ├── FindProtozero.cmake ├── README.md ├── RTree.h ├── filter_planet_by_cats.cpp └── xml_centers_output.hpp ├── profiles ├── auchan_moscow.py ├── azbuka.py ├── burgerking.py ├── minkult.py ├── moscow_addr.py ├── moscow_parkomats.py ├── navads_shell.py ├── navads_shell_json.py ├── rosinter.py ├── schocoladnitsa.py ├── velobike.py └── yandex_parser.py ├── scripts ├── README.md └── pack_places.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | *.user 4 | private/ 5 | data/ 6 | dist/ 7 | __pycache__/ 8 | *.egg* 9 | build/ 10 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # OSM Conflator Change Log 2 | 3 | ## master branch 4 | 5 | ## 1.4.1 6 | 7 | _Released 2019-06-04_ 8 | 9 | * Fixed an error when the query is pure regexp and it did not match anything. 10 | 11 | ## 1.4.0 12 | 13 | _Released 2018-05-30_ 14 | 15 | * Refactored `conflate.py` into seven smaller files. 16 | * Added a simple kd-tree based geocoder for countries and regions. Controlled by the `regions` parameter in a profile. 17 | * You can filter by regions using `-r` argument or `"regions"` list in an audit file. 18 | * Using the new `nwr` query type of Overpass API. 19 | * Reduced default `max_request_boxes` to four. 20 | * New argument `--alt-overpass` to use Kumi Systems' server (since the main one is blocked in Russia). 21 | * Better handling of server runtime errors. 22 | * Find matches in OSM with `--list `. 23 | * Control number of nearest points to check for matches with `nearest_points` profile parameter. 24 | * When you have dataset ID in an URL or other tag, use `find_ref` profile function to match on it. 25 | 26 | ## 1.3.3 27 | 28 | _Released 2018-04-26_ 29 | 30 | * Fixed processing of `''` tag value. 31 | * More that 3 duplicate points in a single place are processed correctly. 32 | * Now you can `yield` points from a profile instead of making a list. 33 | * Not marking nodes with `move` in the audit file as modified, unless we move them. 34 | 35 | ## 1.3.2 36 | 37 | _Released 2018-04-19_ 38 | 39 | * Fixed bug in categories building. 40 | * Fixed threshold for tags in duplicates check. 41 | * Now the script prints "Done" when finished, to better measure time. 42 | 43 | ## 1.3.1 44 | 45 | _Released 2018-03-20_ 46 | 47 | * "Similar tags" now means at least 66% instead of 50%. 48 | * Instead of removing all duplicates, conflating them and removing only unmatched. 49 | 50 | ## 1.3.0 51 | 52 | _Released 2018-03-15_ 53 | 54 | * Support for categories: `category_tag` and `categories` parameters in a profile. 55 | * LibOsmium-based C++ filtering script for categories. 56 | * More than one tag value works as "one of": `[('amenity', 'cafe', 'restaurant')]`. 57 | * Query can be a list of queries, providing for "OR" clause. An example: 58 | 59 | `[[('amenity', 'swimming_pool')], [('leisure', 'swimming_pool')]]` 60 | 61 | * Parameters for profiles, using `-p` argument. 62 | * No more default imports solely for profiles, import `zipfile` youself now. 63 | * Remarks for source points, thanks [@nixi](https://github.com/hixi). 
64 | * Better error message for Overpass API timeouts. 65 | * Lifecycle prefixes are conflated, e.g. `amenity=*` and `was:amenity=*`. 66 | * Dataset is checked for duplicates, which are reported (see `-d`) and removed. 67 | * Support GeoJSON input (put identifiers into `id` property). 68 | 69 | ## 1.2.3 70 | 71 | _Released 2017-12-29_ 72 | 73 | * Fix error in applying audit json after conflating `contact:` namespace. 74 | 75 | ## 1.2.2 76 | 77 | _Released 2017-12-27_ 78 | 79 | * Addr:full tag is not set when addr:housenumber is present. 80 | * Whitespace is stripped from tag values in a dataset. 81 | * Conflate `contact:` namespace. 82 | 83 | ## 1.2.1 84 | 85 | _Released 2017-12-20_ 86 | 87 | * Support force creating points with `audit['create']`. 88 | * Fix green colour for created points in JSON. 89 | * Make `--output` optional and remove the default. 90 | 91 | ## 1.2.0 92 | 93 | _Released 2017-11-23_ 94 | 95 | * Checking moveability for json output (`-m`) for cf_audit. 96 | * Support for cf_audit json (`-a`). 97 | 98 | ## 1.1.0 99 | 100 | _Released 2017-10-06_ 101 | 102 | * Use `-v` for debug messages and `-q` to suppress informational messages. 103 | * You can run `conflate/conflate.py` as a script, again. 104 | * Profiles: added "override" dict with dataset id → OSM POI name or id like 'n12345'. 105 | * Profiles: added "matched" function that returns `False` if an OSM point should not be matched to dataset point (fixes [#6](https://github.com/mapsme/osm_conflate/issues/6)). 106 | * Profiles: `master_tags` is no longer mandatory. 107 | * If no `master_tags` specified in a profile, all tags are now considered non-master. 108 | * When a tag value was `None`, the tag was deleted on object modification. That should be done only on retagging non-matched objects. 109 | * OSM objects filtering failed when a query was a string. 110 | 111 | ## 1.0.0 112 | 113 | _Released 2017-06-07_ 114 | 115 | The initial PyPi release with all the features. 116 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 
30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. 
Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | OSM Conflator 2 | ============= 3 | 4 | This is a script for merging points from some third-party source with 5 | OpenStreetMap data. Please make sure the license allows that. After 6 | merging and uploading, the data can be updated. 7 | 8 | See `the OSM wiki page`_ for detailed description and instructions. 9 | 10 | Installation 11 | ------------ 12 | 13 | Run 14 | ``pip install osm_conflate``. 15 | 16 | Profiles 17 | -------- 18 | 19 | Each source should have a profile. It is a python script with variables 20 | configuring names, tags and processing. See heavily commented examples 21 | in the ``profiles`` directory. 22 | 23 | Usage 24 | ----- 25 | 26 | For a simplest case, run: 27 | 28 | :: 29 | 30 | conflate -o result.osm 31 | 32 | You might want to add other arguments, 33 | to pass a dataset file or prepare a preview GeoJSON. Run 34 | ``conflate -h`` to see a list of arguments. 35 | 36 | Uploading to OpenStreetMap 37 | -------------------------- 38 | 39 | It is recommended to open the resulting file in the JOSM editor and 40 | manually check the changes. Alternatively, you can use 41 | `bulk\_upload.py`_ to upload a change file from the command line. 42 | 43 | Please mind the `Import Guidelines`_, or your work may be reverted. 
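For reference, a profile for the simplest case above can be a handful of
module-level variables. The sketch below is illustrative only: the tag
values, dataset id and download URL are made up, but the variable names
(``source``, ``dataset_id``, ``download_url``, ``query``, ``master_tags``)
are the ones the conflator reads. See the heavily commented profiles in the
``profiles`` directory for the full set of options.

::

    # hypothetical minimal profile, e.g. profiles/example_cafes.py
    source = 'Example Cafe Chain'        # value of the "source" tag on uploaded objects
    dataset_id = 'example_cafes'         # produces the ref:example_cafes tag
    download_url = 'https://example.com/cafes.json'  # used when -i/--source is omitted
    query = [('amenity', 'cafe')]        # which OSM objects to download and match against
    master_tags = ('name', 'opening_hours')  # dataset values win over OSM for these tags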
44 | 45 | License 46 | ------- 47 | 48 | Written by Ilya Zverev for MAPS.ME. Published under the Apache 2.0 49 | license. 50 | 51 | .. _the OSM wiki page: https://wiki.openstreetmap.org/wiki/OSM_Conflator 52 | .. _bulk\_upload.py: https://wiki.openstreetmap.org/wiki/Bulk_upload.py 53 | .. _Import Guidelines: https://wiki.openstreetmap.org/wiki/Import/Guidelines 54 | 55 | -------------------------------------------------------------------------------- /conflate/__init__.py: -------------------------------------------------------------------------------- 1 | try: 2 | from lxml import etree 3 | except ImportError: 4 | import xml.etree.ElementTree as etree 5 | from .data import SourcePoint 6 | from .conflate import run 7 | from .version import __version__ 8 | from .profile import Profile, ProfileException 9 | from .conflator import OsmConflator 10 | -------------------------------------------------------------------------------- /conflate/__main__.py: -------------------------------------------------------------------------------- 1 | from . import run 2 | 3 | run() 4 | -------------------------------------------------------------------------------- /conflate/conflate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import argparse 3 | import csv 4 | import json 5 | import logging 6 | import os 7 | import sys 8 | from .geocoder import Geocoder 9 | from .profile import Profile 10 | from .conflator import OsmConflator, TITLE 11 | from .dataset import ( 12 | read_dataset, 13 | add_categories_to_dataset, 14 | transform_dataset, 15 | check_dataset_for_duplicates, 16 | add_regions, 17 | ) 18 | 19 | 20 | def write_for_filter(profile, dataset, f): 21 | def query_to_tag_strings(query): 22 | if isinstance(query, str): 23 | raise ValueError('Query string for filter should not be a string') 24 | result = [] 25 | if not isinstance(query[0], str) and isinstance(query[0][0], str): 26 | query = [query] 27 | for q in query: 28 | if isinstance(q, str): 29 | raise ValueError('Query string for filter should not be a string') 30 | parts = [] 31 | for part in q: 32 | if len(part) == 1: 33 | parts.append(part[0]) 34 | elif part[1] is None or len(part[1]) == 0: 35 | parts.append('{}='.format(part[0])) 36 | elif part[1][0] == '~': 37 | raise ValueError('Cannot use regular expressions in filter') 38 | elif '|' in part[1] or ';' in part[1]: 39 | raise ValueError('"|" and ";" symbols is not allowed in query values') 40 | else: 41 | parts.append('='.join(part)) 42 | result.append('|'.join(parts)) 43 | return result 44 | 45 | def tags_to_query(tags): 46 | return [(k, v) for k, v in tags.items()] 47 | 48 | categories = profile.get('categories', {}) 49 | p_query = profile.get('query', None) 50 | if p_query is not None: 51 | categories[None] = {'query': p_query} 52 | cat_map = {} 53 | i = 0 54 | try: 55 | for name, query in categories.items(): 56 | for tags in query_to_tag_strings(query.get('query', tags_to_query(query.get('tags')))): 57 | f.write('{},{},{}\n'.format(i, name or '', tags)) 58 | cat_map[name] = i 59 | i += 1 60 | except ValueError as e: 61 | logging.error(e) 62 | return False 63 | f.write('\n') 64 | for d in dataset: 65 | if d.category in cat_map: 66 | f.write('{},{},{}\n'.format(d.lon, d.lat, cat_map[d.category])) 67 | return True 68 | 69 | 70 | def run(profile=None): 71 | parser = argparse.ArgumentParser( 72 | description='''{}. 73 | Reads a profile with source data and conflates it with OpenStreetMap data. 
74 | Produces an JOSM XML file ready to be uploaded.'''.format(TITLE)) 75 | if not profile: 76 | parser.add_argument('profile', type=argparse.FileType('r'), 77 | help='Name of a profile (python or json) to use') 78 | parser.add_argument('-i', '--source', type=argparse.FileType('rb'), 79 | help='Source file to pass to the profile dataset() function') 80 | parser.add_argument('-a', '--audit', type=argparse.FileType('r'), 81 | help='Conflation validation result as a JSON file') 82 | parser.add_argument('-o', '--output', type=argparse.FileType('w'), 83 | help='Output OSM XML file name') 84 | parser.add_argument('-p', '--param', 85 | help='Optional parameter for the profile') 86 | parser.add_argument('--osc', action='store_true', 87 | help='Produce an osmChange file instead of JOSM XML') 88 | parser.add_argument('--osm', 89 | help='Instead of querying Overpass API, use this unpacked osm file. ' + 90 | 'Create one from Overpass data if not found') 91 | parser.add_argument('-c', '--changes', type=argparse.FileType('w'), 92 | help='Write changes as GeoJSON for visualization') 93 | parser.add_argument('-m', '--check-move', action='store_true', 94 | help='Check for moveability of modified modes') 95 | parser.add_argument('-f', '--for-filter', type=argparse.FileType('w'), 96 | help='Prepare a file for the filtering script') 97 | parser.add_argument('-l', '--list', type=argparse.FileType('w'), 98 | help='Print a CSV list of matches') 99 | parser.add_argument('-d', '--list_duplicates', action='store_true', 100 | help='List all duplicate points in the dataset') 101 | parser.add_argument('-r', '--regions', 102 | help='Conflate only points with regions in this comma-separated list') 103 | parser.add_argument('--alt-overpass', action='store_true', 104 | help='Use an alternate Overpass API server') 105 | parser.add_argument('-v', '--verbose', action='store_true', 106 | help='Display debug messages') 107 | parser.add_argument('-q', '--quiet', action='store_true', 108 | help='Do not display informational messages') 109 | options = parser.parse_args() 110 | 111 | if (not options.output and not options.changes and 112 | not options.for_filter and not options.list): 113 | parser.print_help() 114 | return 115 | 116 | if options.verbose: 117 | log_level = logging.DEBUG 118 | elif options.quiet: 119 | log_level = logging.WARNING 120 | else: 121 | log_level = logging.INFO 122 | logging.basicConfig(level=log_level, format='%(asctime)s %(message)s', datefmt='%H:%M:%S') 123 | logging.getLogger("requests").setLevel(logging.WARNING) 124 | logging.getLogger("urllib3").setLevel(logging.WARNING) 125 | 126 | if not profile: 127 | logging.debug('Loading profile %s', options.profile) 128 | profile = Profile(profile or options.profile, options.param) 129 | 130 | audit = None 131 | if options.audit: 132 | audit = json.load(options.audit) 133 | 134 | geocoder = Geocoder(profile.get_raw('regions')) 135 | if options.regions: 136 | geocoder.set_filter(options.regions) 137 | elif audit and audit.get('regions'): 138 | geocoder.set_filter(audit.get('regions')) 139 | 140 | dataset = read_dataset(profile, options.source) 141 | if not dataset: 142 | logging.error('Empty source dataset') 143 | sys.exit(2) 144 | transform_dataset(profile, dataset) 145 | add_categories_to_dataset(profile, dataset) 146 | check_dataset_for_duplicates(profile, dataset, options.list_duplicates) 147 | add_regions(dataset, geocoder) 148 | logging.info('Read %s items from the dataset', len(dataset)) 149 | 150 | if options.for_filter: 151 | if 
write_for_filter(profile, dataset, options.for_filter): 152 | logging.info('Prepared data for filtering, exitting') 153 | return 154 | 155 | conflator = OsmConflator(profile, dataset, audit) 156 | conflator.geocoder = geocoder 157 | if options.alt_overpass: 158 | conflator.set_overpass('alt') 159 | if options.osm and os.path.exists(options.osm): 160 | with open(options.osm, 'r') as f: 161 | conflator.parse_osm(f) 162 | else: 163 | conflator.download_osm() 164 | if len(conflator.osmdata) > 0 and options.osm: 165 | with open(options.osm, 'w') as f: 166 | f.write(conflator.backup_osm()) 167 | logging.info('Downloaded %s objects from OSM', len(conflator.osmdata)) 168 | 169 | conflator.match() 170 | 171 | if options.output: 172 | diff = conflator.to_osc(not options.osc) 173 | options.output.write(diff) 174 | 175 | if options.changes: 176 | if options.check_move: 177 | conflator.check_moveability() 178 | fc = {'type': 'FeatureCollection', 'features': conflator.changes} 179 | json.dump(fc, options.changes, ensure_ascii=False, sort_keys=True, indent=1) 180 | 181 | if options.list: 182 | writer = csv.writer(options.list) 183 | writer.writerow(['ref', 'osm_type', 'osm_id', 'lat', 'lon', 'action']) 184 | for row in conflator.matches: 185 | writer.writerow(row) 186 | 187 | logging.info('Done') 188 | -------------------------------------------------------------------------------- /conflate/conflator.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import kdtree 3 | from collections import defaultdict 4 | from .data import OSMPoint 5 | from .version import __version__ 6 | from .osm import OsmDownloader, check_moveability 7 | from . import etree 8 | 9 | 10 | TITLE = 'OSM Conflator ' + __version__ 11 | CONTACT_KEYS = set(('phone', 'website', 'email', 'fax', 'facebook', 'twitter', 'instagram')) 12 | LIFECYCLE_KEYS = set(('amenity', 'shop', 'tourism', 'craft', 'office')) 13 | LIFECYCLE_PREFIXES = ('proposed', 'construction', 'disused', 'abandoned', 'was', 'removed') 14 | 15 | 16 | class OsmConflator: 17 | """The main class for the conflator. 18 | 19 | It receives a dataset, after which one must call either 20 | "download_osm" or "parse_osm" methods. Then it is ready to match: 21 | call the "match" method and get results with "to_osc". 22 | """ 23 | def __init__(self, profile, dataset, audit=None): 24 | self.dataset = {p.id: p for p in dataset} 25 | self.audit = audit or {} 26 | self.osmdata = {} 27 | self.matched = [] 28 | self.changes = [] 29 | self.matches = [] 30 | self.profile = profile 31 | self.geocoder = None 32 | self.downloader = OsmDownloader(profile) 33 | self.source = self.profile.get( 34 | 'source', required='value of "source" tag for uploaded OSM objects') 35 | self.add_source_tag = self.profile.get('add_source', False) 36 | if self.profile.get('no_dataset_id', False): 37 | self.ref = None 38 | else: 39 | self.ref = 'ref:' + self.profile.get( 40 | 'dataset_id', required='A fairly unique id of the dataset to query OSM') 41 | 42 | def set_overpass(self, server='alt'): 43 | self.downloader.set_overpass(server) 44 | 45 | def download_osm(self): 46 | bboxes = self.downloader.calc_boxes(self.dataset.values()) 47 | self.osmdata = self.downloader.download(bboxes) 48 | 49 | def parse_osm(self, fileobj): 50 | self.osmdata = self.downloader.parse_xml(fileobj) 51 | 52 | def register_match(self, dataset_key, osmdata_key, keep=False, retag=None): 53 | """Registers a match between an OSM point and a dataset point. 
54 | 55 | Merges tags from an OSM Point and a dataset point, and add the result to the 56 | self.matched list. 57 | If dataset_key is None, deletes or retags the OSM point. 58 | If osmdata_key is None, adds a new OSM point for the dataset point. 59 | """ 60 | def get_osm_key(k, osm_tags): 61 | """Conflating contact: namespace.""" 62 | if k in CONTACT_KEYS and k not in osm_tags and 'contact:'+k in osm_tags: 63 | return 'contact:'+k 64 | elif k.startswith('contact:') and k not in osm_tags and k[8:] in osm_tags: 65 | return k[8:] 66 | 67 | # Now conflating lifecycle prefixes, only forward 68 | if k in LIFECYCLE_KEYS and k not in osm_tags: 69 | for prefix in LIFECYCLE_PREFIXES: 70 | if prefix+':'+k in osm_tags: 71 | return prefix+':'+k 72 | return k 73 | 74 | def update_tags(tags, source, master_tags=None, retagging=False, audit=None): 75 | """Updates tags dictionary with tags from source, 76 | returns True is something was changed.""" 77 | keep = set() 78 | override = set() 79 | changed = False 80 | if source: 81 | if audit: 82 | keep = set(audit.get('keep', [])) 83 | override = set(audit.get('override', [])) 84 | for k, v in source.items(): 85 | osm_key = get_osm_key(k, tags) 86 | 87 | if k in keep or osm_key in keep: 88 | continue 89 | if k in override or osm_key in override: 90 | if not v and osm_key in tags: 91 | del tags[osm_key] 92 | changed = True 93 | elif v and tags.get(osm_key, None) != v: 94 | tags[osm_key] = v 95 | changed = True 96 | continue 97 | 98 | if osm_key not in tags or retagging or ( 99 | tags[osm_key] != v and (master_tags and k in master_tags)): 100 | if v is not None and len(v) > 0: 101 | # Not setting addr:full when the object has addr:housenumber 102 | if k == 'addr:full' and 'addr:housenumber' in tags: 103 | continue 104 | tags[osm_key] = v 105 | changed = True 106 | elif osm_key in tags and (v == '' or retagging): 107 | del tags[osm_key] 108 | changed = True 109 | return changed 110 | 111 | def format_change(before, after, ref): 112 | MARKER_COLORS = { 113 | 'delete': '#ee2211', # deleting feature from OSM 114 | 'create': '#11dd11', # creating a new node 115 | 'update': '#0000ee', # changing tags on an existing feature 116 | 'retag': '#660000', # cannot delete unmatched feature, changing tags 117 | 'move': '#110055', # moving an existing node 118 | } 119 | marker_action = None 120 | geometry = {'type': 'Point', 'coordinates': [after.lon, after.lat]} 121 | props = { 122 | 'osm_type': after.osm_type, 123 | 'osm_id': after.osm_id, 124 | 'action': after.action 125 | } 126 | if after.action in ('create', 'delete'): 127 | # Red if deleted, green if added 128 | marker_action = after.action 129 | for k, v in after.tags.items(): 130 | props['tags.{}'.format(k)] = v 131 | if ref: 132 | props['ref_id'] = ref.id 133 | else: # modified 134 | # Blue if updated from dataset, dark red if retagged, dark blue if moved 135 | marker_action = 'update' if ref else 'retag' 136 | if ref: 137 | props['ref_id'] = ref.id 138 | props['ref_distance'] = round(10 * ref.distance(before)) / 10.0 139 | props['ref_coords'] = [ref.lon, ref.lat] 140 | if before.lon != after.lon or before.lat != after.lat: 141 | # The object was moved 142 | props['were_coords'] = [before.lon, before.lat] 143 | marker_action = 'move' 144 | # Find tags that were superseeded by OSM tags 145 | for k, v in ref.tags.items(): 146 | osm_key = get_osm_key(k, after.tags) 147 | if osm_key not in after.tags or after.tags[osm_key] != v: 148 | props['ref_unused_tags.{}'.format(osm_key)] = v 149 | # Now compare old and new OSM tags 
150 | for k in set(after.tags.keys()).union(set(before.tags.keys())): 151 | v0 = before.tags.get(k, None) 152 | v1 = after.tags.get(k, None) 153 | if v0 == v1: 154 | props['tags.{}'.format(k)] = v0 155 | elif v0 is None: 156 | props['tags_new.{}'.format(k)] = v1 157 | elif v1 is None: 158 | props['tags_deleted.{}'.format(k)] = v0 159 | else: 160 | props['tags_changed.{}'.format(k)] = '{} -> {}'.format(v0, v1) 161 | props['marker-color'] = MARKER_COLORS[marker_action] 162 | if ref and ref.remarks: 163 | props['remarks'] = ref.remarks 164 | if ref and ref.region: 165 | props['region'] = ref.region 166 | elif self.geocoder: 167 | region, present = self.geocoder.find(after) 168 | if not present: 169 | return None 170 | if region is not None: 171 | props['region'] = region 172 | return {'type': 'Feature', 'geometry': geometry, 'properties': props} 173 | 174 | p = self.osmdata.pop(osmdata_key, None) 175 | p0 = None if p is None else p.copy() 176 | sp = self.dataset.pop(dataset_key, None) 177 | audit = self.audit.get(sp.id if sp else '{}{}'.format(p.osm_type, p.osm_id), {}) 178 | if audit.get('skip', False): 179 | return 180 | 181 | if sp is not None: 182 | if p is None: 183 | p = OSMPoint('node', -1-len(self.matched), 1, sp.lat, sp.lon, sp.tags) 184 | p.action = 'create' 185 | else: 186 | master_tags = set(self.profile.get('master_tags', [])) 187 | if update_tags(p.tags, sp.tags, master_tags, audit=audit): 188 | p.action = 'modify' 189 | # Move a node if it is too far from the dataset point 190 | if not p.is_area() and sp.distance(p) > self.profile.max_distance: 191 | p.lat = sp.lat 192 | p.lon = sp.lon 193 | p.action = 'modify' 194 | if self.add_source_tag: 195 | if 'source' in p.tags: 196 | if self.source not in p.tags['source']: 197 | p.tags['source'] = ';'.join([p.tags['source'], self.source]) 198 | else: 199 | p.tags['source'] = self.source 200 | if self.ref is not None: 201 | p.tags[self.ref] = sp.id 202 | if 'fixme' in audit and audit['fixme'] != p.tags.get('fixme'): 203 | p.tags['fixme'] = audit['fixme'] 204 | if p.action is None: 205 | p.action = 'modify' 206 | if 'move' in audit and not p.is_area(): 207 | if p0 and audit['move'] == 'osm': 208 | p.lat = p0.lat 209 | p.lon = p0.lon 210 | elif audit['move'] == 'dataset': 211 | p.lat = sp.lat 212 | p.lon = sp.lon 213 | elif len(audit['move']) == 2: 214 | p.lat = audit['move'][1] 215 | p.lon = audit['move'][0] 216 | if p.action is None and p0.distance(p) > 0.1: 217 | p.action = 'modify' 218 | if p.action != 'create': 219 | self.matches.append([sp.id, p.osm_type, p.osm_id, p.lat, p.lon, p.action]) 220 | else: 221 | self.matches.append([sp.id, '', '', p.lat, p.lon, p.action]) 222 | elif keep or p.is_area(): 223 | if update_tags(p.tags, retag, retagging=True, audit=audit): 224 | p.action = 'modify' 225 | else: 226 | p.action = 'delete' 227 | 228 | if p.action is not None: 229 | change = format_change(p0, p, sp) 230 | if change is not None: 231 | self.matched.append(p) 232 | self.changes.append(change) 233 | 234 | def match_dataset_points_smart(self): 235 | """Smart matching for dataset <-> OSM points. 236 | 237 | We find a shortest link between a dataset and an OSM point. 238 | Then we match these and remove both from dicts. 239 | Then find another link and so on, until the length of a link 240 | becomes larger than "max_distance". 241 | 242 | Currently the worst case complexity is around O(n^2*log^2 n). 243 | But given the small number of objects to match, and that 244 | the average case complexity is ~O(n*log^2 n), this is fine. 
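        A small illustration (hypothetical distances, max_distance=100):
        with dataset points A, B and OSM points X, Y, where
        dist(A, X)=5, dist(B, X)=7 and dist(B, Y)=20, the pair (A, X) is
        matched first; B's candidate X is then gone, so B's nearest match
        is recomputed and B ends up matched to Y.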
245 | """ 246 | def search_nn_fix(kd, point): 247 | nearest = kd.search_knn(point, self.profile.get('nearest_points', 10)) 248 | if not nearest: 249 | return None, None 250 | match_func = self.profile.get_raw('matches') 251 | if match_func: 252 | nearest = [p for p in nearest if match_func(p[0].data.tags, point.tags)] 253 | if not nearest: 254 | return None, None 255 | nearest = [(n[0], n[0].data.distance(point)) 256 | for n in nearest if point.category in n[0].data.categories] 257 | return sorted(nearest, key=lambda kv: kv[1])[0] 258 | 259 | if not self.osmdata: 260 | return 261 | osm_kd = kdtree.create(list(self.osmdata.values())) 262 | count_matched = 0 263 | 264 | # Process overridden features first 265 | for override, osm_find in self.profile.get('override', {}).items(): 266 | override = str(override) 267 | if override not in self.dataset: 268 | continue 269 | found = None 270 | if len(osm_find) > 2 and osm_find[0] in 'nwr' and osm_find[1].isdigit(): 271 | if osm_find in self.osmdata: 272 | found = self.osmdata[osm_find] 273 | # Search nearest 100 points 274 | nearest = osm_kd.search_knn(self.dataset[override], 100) 275 | if nearest: 276 | for p in nearest: 277 | if 'name' in p[0].data.tags and p[0].data.tags['name'] == osm_find: 278 | found = p[0].data 279 | if found: 280 | count_matched += 1 281 | self.register_match(override, found.id) 282 | osm_kd = osm_kd.remove(found) 283 | 284 | # Prepare distance list: match OSM points to each of the dataset points 285 | dist = [] 286 | for sp, v in self.dataset.items(): 287 | osm_point, distance = search_nn_fix(osm_kd, v) 288 | if osm_point is not None and distance <= self.profile.max_distance: 289 | dist.append((distance, sp, osm_point.data)) 290 | 291 | # The main matching loop: sort dist list if needed, 292 | # register the closes match, update the list 293 | needs_sorting = True 294 | while dist: 295 | if needs_sorting: 296 | dist.sort(key=lambda x: x[0]) 297 | needs_sorting = False 298 | count_matched += 1 299 | osm_point = dist[0][2] 300 | self.register_match(dist[0][1], osm_point.id) 301 | osm_kd = osm_kd.remove(osm_point) 302 | del dist[0] 303 | for i in reversed(range(len(dist))): 304 | if dist[i][2] == osm_point: 305 | nearest, distance = search_nn_fix(osm_kd, self.dataset[dist[i][1]]) 306 | if nearest and distance <= self.profile.max_distance: 307 | dist[i] = (distance, dist[i][1], nearest.data) 308 | needs_sorting = i == 0 or distance < dist[0][0] 309 | else: 310 | del dist[i] 311 | needs_sorting = i == 0 312 | logging.info('Matched %s points', count_matched) 313 | 314 | def match(self): 315 | """Matches each osm object with a SourcePoint, or marks it as obsolete. 
316 | The resulting list of OSM Points are written to the "matched" field.""" 317 | find_ref = self.profile.get_raw('find_ref') 318 | if self.ref is not None or callable(find_ref): 319 | # First match all objects with ref:whatever tag set 320 | count_ref = 0 321 | for k, p in list(self.osmdata.items()): 322 | ref = None 323 | if self.ref and self.ref in p.tags: 324 | ref = p.tags[self.ref] 325 | elif find_ref: 326 | ref = find_ref(p.tags) 327 | if ref is not None: 328 | if ref in self.dataset: 329 | count_ref += 1 330 | self.register_match(ref, k) 331 | logging.info('Updated %s OSM objects with %s tag', count_ref, self.ref) 332 | 333 | # Add points for which audit specifically mentioned creating 334 | count_created = 0 335 | for ref, a in self.audit.items(): 336 | if ref in self.dataset: 337 | if a.get('create', None): 338 | count_created += 1 339 | self.register_match(ref, None) 340 | elif a.get('skip', None): 341 | # If we skip an object here, it would affect the conflation order 342 | pass 343 | if count_created > 0: 344 | logging.info('Created %s audit-overridden dataset points', count_created) 345 | 346 | # Prepare exclusive groups dict 347 | exclusive_groups = defaultdict(set) 348 | for p, v in self.dataset.items(): 349 | if v.exclusive_group is not None: 350 | exclusive_groups[v.exclusive_group].add(p) 351 | 352 | # Then find matches for unmatched dataset points 353 | self.match_dataset_points_smart() 354 | 355 | # Remove unmatched duplicates 356 | count_duplicates = 0 357 | for ids in exclusive_groups.values(): 358 | found = False 359 | for p in ids: 360 | if p not in self.dataset: 361 | found = True 362 | break 363 | for p in ids: 364 | if p in self.dataset: 365 | if found: 366 | count_duplicates += 1 367 | del self.dataset[p] 368 | else: 369 | # Leave one element when not matched any 370 | found = True 371 | if count_duplicates > 0: 372 | logging.info('Removed %s unmatched duplicates', count_duplicates) 373 | 374 | # Add unmatched dataset points 375 | logging.info('Adding %s unmatched dataset points', len(self.dataset)) 376 | for k in sorted(list(self.dataset.keys())): 377 | self.register_match(k, None) 378 | 379 | # And finally delete some or all of the remaining osm objects 380 | if len(self.osmdata) > 0: 381 | count_deleted = 0 382 | count_retagged = 0 383 | delete_unmatched = self.profile.get('delete_unmatched', False) 384 | retag = self.profile.get('tag_unmatched') 385 | for k, p in list(self.osmdata.items()): 386 | ref = None 387 | if self.ref and self.ref in p.tags: 388 | ref = p.tags[self.ref] 389 | elif find_ref: 390 | ref = find_ref(p.tags) 391 | if ref is not None: 392 | # When ref:whatever is present, we can delete that object safely 393 | count_deleted += 1 394 | self.register_match(None, k, retag=retag) 395 | elif delete_unmatched or retag: 396 | if not delete_unmatched or p.is_area(): 397 | count_retagged += 1 398 | else: 399 | count_deleted += 1 400 | self.register_match(None, k, keep=not delete_unmatched, retag=retag) 401 | logging.info( 402 | 'Deleted %s and retagged %s unmatched objects from OSM', 403 | count_deleted, count_retagged) 404 | 405 | def backup_osm(self): 406 | """Writes OSM data as-is.""" 407 | osm = etree.Element('osm', version='0.6', generator=TITLE) 408 | for osmel in self.osmdata.values(): 409 | el = osmel.to_xml() 410 | if osmel.osm_type != 'node': 411 | etree.SubElement(el, 'center', lat=str(osmel.lat), lon=str(osmel.lon)) 412 | osm.append(el) 413 | return ("\n" + 414 | etree.tostring(osm, encoding='utf-8').decode('utf-8')) 415 | 416 | def 
to_osc(self, josm=False): 417 | """Returns a string with osmChange or JOSM XML.""" 418 | osc = etree.Element('osm' if josm else 'osmChange', version='0.6', generator=TITLE) 419 | if josm: 420 | neg_id = -1 421 | changeset = etree.SubElement(osc, 'changeset') 422 | ch_tags = { 423 | 'source': self.source, 424 | 'created_by': TITLE, 425 | 'type': 'import' 426 | } 427 | for k, v in ch_tags.items(): 428 | etree.SubElement(changeset, 'tag', k=k, v=v) 429 | for osmel in self.matched: 430 | if osmel.action is not None: 431 | el = osmel.to_xml() 432 | if josm: 433 | if osmel.action == 'create': 434 | el.set('id', str(neg_id)) 435 | neg_id -= 1 436 | else: 437 | el.set('action', osmel.action) 438 | osc.append(el) 439 | else: 440 | etree.SubElement(osc, osmel.action).append(el) 441 | return ("\n" + 442 | etree.tostring(osc, encoding='utf-8').decode('utf-8')) 443 | 444 | def check_moveability(self): 445 | check_moveability(self.changes) 446 | -------------------------------------------------------------------------------- /conflate/data.py: -------------------------------------------------------------------------------- 1 | import math 2 | from . import etree 3 | 4 | 5 | class SourcePoint: 6 | """A common class for points. Has an id, latitude and longitude, 7 | and a dict of tags. Remarks are optional for reviewers hints only.""" 8 | def __init__(self, pid, lat, lon, tags=None, category=None, remarks=None, region=None): 9 | self.id = str(pid) 10 | self.lat = lat 11 | self.lon = lon 12 | self.tags = {} if tags is None else { 13 | k.lower(): str(v).strip() for k, v in tags.items() if v is not None} 14 | self.category = category 15 | self.dist_offset = 0 16 | self.remarks = remarks 17 | self.region = region 18 | self.exclusive_group = None 19 | 20 | def distance(self, other): 21 | """Calculate distance in meters.""" 22 | dx = math.radians(self.lon - other.lon) * math.cos(0.5 * math.radians(self.lat + other.lat)) 23 | dy = math.radians(self.lat - other.lat) 24 | return 6378137 * math.sqrt(dx*dx + dy*dy) - self.dist_offset 25 | 26 | def __len__(self): 27 | return 2 28 | 29 | def __getitem__(self, i): 30 | if i == 0: 31 | return self.lon 32 | elif i == 1: 33 | return self.lat 34 | else: 35 | raise ValueError('A SourcePoint has only lat and lon in a list') 36 | 37 | def __eq__(self, other): 38 | return self.id == other.id 39 | 40 | def __hash__(self): 41 | return hash(self.id) 42 | 43 | def __repr__(self): 44 | return 'SourcePoint({}, {}, {}, offset={}, tags={})'.format( 45 | self.id, self.lat, self.lon, self.dist_offset, self.tags) 46 | 47 | 48 | class OSMPoint(SourcePoint): 49 | """An OSM points is a SourcePoint with a few extra fields. 50 | Namely, version, members (for ways and relations), and an action. 
51 | The id is compound and created from object type and object id.""" 52 | def __init__(self, ptype, pid, version, lat, lon, tags=None, categories=None): 53 | super().__init__('{}{}'.format(ptype[0], pid), lat, lon, tags) 54 | self.tags = {k: v for k, v in self.tags.items() if v is not None and len(v) > 0} 55 | self.osm_type = ptype 56 | self.osm_id = pid 57 | self.version = version 58 | self.members = None 59 | self.action = None 60 | self.categories = categories or set() 61 | self.remarks = None 62 | 63 | def copy(self): 64 | """Returns a copy of this object, except for members field.""" 65 | c = OSMPoint(self.osm_type, self.osm_id, self.version, self.lat, self.lon, self.tags.copy()) 66 | c.action = self.action 67 | c.remarks = self.remarks 68 | c.categories = self.categories.copy() 69 | return c 70 | 71 | def is_area(self): 72 | return self.osm_type != 'node' 73 | 74 | def is_poi(self): 75 | if self.osm_type == 'node': 76 | return True 77 | if self.osm_type == 'way' and len(self.members) > 2: 78 | return self.members[0] == self.members[-1] 79 | if self.osm_type == 'relation' and len(self.members) > 0: 80 | return self.tags.get('type', None) == 'multipolygon' 81 | return False 82 | 83 | def to_xml(self): 84 | """Produces an XML out of the point data. Disregards the "action" field.""" 85 | el = etree.Element(self.osm_type, id=str(self.osm_id), version=str(self.version)) 86 | for tag, value in self.tags.items(): 87 | etree.SubElement(el, 'tag', k=tag, v=value) 88 | 89 | if self.osm_type == 'node': 90 | el.set('lat', str(self.lat)) 91 | el.set('lon', str(self.lon)) 92 | elif self.osm_type == 'way': 93 | for node_id in self.members: 94 | etree.SubElement(el, 'nd', ref=str(node_id)) 95 | elif self.osm_type == 'relation': 96 | for member in self.members: 97 | m = etree.SubElement(el, 'member') 98 | for i, n in enumerate(('type', 'ref', 'role')): 99 | m.set(n, str(member[i])) 100 | return el 101 | 102 | def __repr__(self): 103 | return 'OSMPoint({} {} v{}, {}, {}, action={}, tags={})'.format( 104 | self.osm_type, self.osm_id, self.version, self.lat, self.lon, self.action, self.tags) 105 | -------------------------------------------------------------------------------- /conflate/dataset.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import json 3 | import codecs 4 | import requests 5 | import kdtree 6 | from io import BytesIO 7 | from .data import SourcePoint 8 | 9 | 10 | def read_dataset(profile, fileobj): 11 | """A helper function to call a "dataset" function in the profile. 
12 | If the fileobj is not specified, tries to download a dataset from 13 | an URL specified in "download_url" profile variable.""" 14 | if not fileobj: 15 | url = profile.get('download_url') 16 | if url is None: 17 | logging.error('No download_url specified in the profile, ' 18 | 'please provide a dataset file with --source') 19 | return None 20 | r = requests.get(url) 21 | if r.status_code != 200: 22 | logging.error('Could not download source data: %s %s', r.status_code, r.text) 23 | return None 24 | if len(r.content) == 0: 25 | logging.error('Empty response from %s', url) 26 | return None 27 | fileobj = BytesIO(r.content) 28 | if not profile.has('dataset'): 29 | # The default option is to parse the source as a JSON 30 | try: 31 | data = [] 32 | reader = codecs.getreader('utf-8') 33 | json_src = json.load(reader(fileobj)) 34 | if 'features' in json_src: 35 | # Parse GeoJSON 36 | for item in json_src['features']: 37 | if item['geometry'].get('type') != 'Point' or 'properties' not in item: 38 | continue 39 | # Get the identifier from "id", "ref", "ref*" 40 | iid = item['properties'].get('id', item['properties'].get('ref')) 41 | if not iid: 42 | for k, v in item['properties'].items(): 43 | if k.startswith('ref'): 44 | iid = v 45 | break 46 | if not iid: 47 | continue 48 | data.append(SourcePoint( 49 | iid, 50 | item['geometry']['coordinates'][1], 51 | item['geometry']['coordinates'][0], 52 | {k: v for k, v in item['properties'].items() if k != 'id'})) 53 | else: 54 | for item in json_src: 55 | data.append(SourcePoint(item['id'], item['lat'], item['lon'], item['tags'])) 56 | return data 57 | except Exception: 58 | logging.error('Failed to parse the source as a JSON') 59 | return list(profile.get( 60 | 'dataset', args=(fileobj,), 61 | required='returns a list of SourcePoints with the dataset')) 62 | 63 | 64 | def add_categories_to_dataset(profile, dataset): 65 | categories = profile.get('categories') 66 | if not categories: 67 | return 68 | tag = profile.get('category_tag') 69 | other = categories.get('other', {}) 70 | for d in dataset: 71 | if tag and tag in d.tags: 72 | d.category = d.tags[tag] 73 | del d.tags[tag] 74 | if d.category: 75 | cat_tags = categories.get(d.category, other).get('tags', None) 76 | if cat_tags: 77 | d.tags.update(cat_tags) 78 | 79 | 80 | def transform_dataset(profile, dataset): 81 | """Transforms tags in the dataset using the "transform" method in the profile 82 | or the instructions in that field in string or dict form.""" 83 | transform = profile.get_raw('transform') 84 | if not transform: 85 | return 86 | if callable(transform): 87 | for d in dataset: 88 | transform(d.tags) 89 | return 90 | if isinstance(transform, str): 91 | # Convert string of "key=value|rule1|rule2" lines to a dict 92 | lines = [line.split('=', 1) for line in transform.splitlines()] 93 | transform = {l[0].strip(): l[1].strip() for l in lines} 94 | if not transform or not isinstance(transform, dict): 95 | return 96 | for key in transform: 97 | if isinstance(transform[key], str): 98 | transform[key] = [x.strip() for x in transform[key].split('|')] 99 | 100 | for d in dataset: 101 | for key, rules in transform.items(): 102 | if not rules: 103 | continue 104 | value = None 105 | if callable(rules): 106 | # The value can be generated 107 | value = rules(None if key not in d.tags else d.tags[key]) 108 | if value is None and key in d.tags: 109 | del d.tags[key] 110 | elif not rules[0]: 111 | # Use the value of the tag 112 | if key in d.tags: 113 | value = d.tags[key] 114 | elif not 
isinstance(rules[0], str): 115 | # If the value is not a string, use it 116 | value = str(rules[0]) 117 | elif rules[0][0] == '.': 118 | # Use the value from another tag 119 | alt_key = rules[0][1:] 120 | if alt_key in d.tags: 121 | value = d.tags[alt_key] 122 | elif rules[0][0] == '>': 123 | # Replace the key 124 | if key in d.tags: 125 | d.tags[rules[0][1:]] = d.tags[key] 126 | del d.tags[key] 127 | elif rules[0][0] == '<': 128 | # Replace the key, the same but backwards 129 | alt_key = rules[0][1:] 130 | if alt_key in d.tags: 131 | d.tags[key] = d.tags[alt_key] 132 | del d.tags[alt_key] 133 | elif rules[0] == '-': 134 | # Delete the tag 135 | if key in d.tags: 136 | del d.tags[key] 137 | else: 138 | # Take the value as written 139 | value = rules[0] 140 | if value is None: 141 | continue 142 | if isinstance(rules, list): 143 | for rule in rules[1:]: 144 | if rule == 'lower': 145 | value = value.lower() 146 | d.tags[key] = value 147 | 148 | 149 | def check_dataset_for_duplicates(profile, dataset, print_all=False): 150 | # First checking for duplicate ids and collecting tags with varying values 151 | ids = set() 152 | tags = {} 153 | found_duplicate_ids = False 154 | for d in dataset: 155 | if d.id in ids: 156 | found_duplicate_ids = True 157 | logging.error('Duplicate id {} in the dataset'.format(d.id)) 158 | ids.add(d.id) 159 | for k, v in d.tags.items(): 160 | if k not in tags: 161 | tags[k] = v 162 | elif tags[k] != '---' and tags[k] != v: 163 | tags[k] = '---' 164 | 165 | # And then for near-duplicate points with similar tags 166 | uncond_distance = profile.get('duplicate_distance', 1) 167 | diff_tags = [k for k in tags if tags[k] == '---'] 168 | kd = kdtree.create(list(dataset)) 169 | duplicates = set() 170 | group = 0 171 | for d in dataset: 172 | if d.id in duplicates: 173 | continue 174 | group += 1 175 | dups = kd.search_knn(d, 2) # The first one will be equal to d 176 | if len(dups) < 2 or dups[1][0].data.distance(d) > profile.max_distance: 177 | continue 178 | for alt, _ in kd.search_knn(d, 20): 179 | dist = alt.data.distance(d) 180 | if alt.data.id != d.id and dist <= profile.max_distance: 181 | tags_differ = 0 182 | if dist > uncond_distance: 183 | for k in diff_tags: 184 | if alt.data.tags.get(k) != d.tags.get(k): 185 | tags_differ += 1 186 | if tags_differ <= len(diff_tags) / 3: 187 | duplicates.add(alt.data.id) 188 | d.exclusive_group = group 189 | alt.data.exclusive_group = group 190 | if print_all or len(duplicates) <= 5: 191 | is_duplicate = tags_differ <= 1 192 | logging.error('Dataset points %s: %s and %s', 193 | 'duplicate each other' if is_duplicate else 'are too similar', 194 | d.id, alt.data.id) 195 | if duplicates: 196 | logging.error('Found %s duplicates in the dataset', len(duplicates)) 197 | if found_duplicate_ids: 198 | raise KeyError('Cannot continue with duplicate ids') 199 | 200 | 201 | def add_regions(dataset, geocoder): 202 | if not geocoder.enabled: 203 | return 204 | if geocoder.filter: 205 | logging.info('Geocoding and filtering points') 206 | else: 207 | logging.info('Geocoding points') 208 | for i in reversed(range(len(dataset))): 209 | region, present = geocoder.find(dataset[i]) 210 | if not present: 211 | del dataset[i] 212 | else: 213 | dataset[i].region = region 214 | -------------------------------------------------------------------------------- /conflate/geocoder.py: -------------------------------------------------------------------------------- 1 | import struct 2 | import logging 3 | import os 4 | import kdtree 5 | 6 | 7 | class Geocoder: 
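    """Simple kd-tree based geocoder for countries and regions.

    Uses a tree of packed place points (places.bin; see
    scripts/pack_places.py) to look up a region for each dataset point.
    An optional filter, set with set_filter(), keeps only points in the
    listed regions; a leading '-' or '^' negates the list. The geocoder
    is enabled when the profile's "regions" parameter is truthy.
    """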
8 | def __init__(self, profile_regions='all'): 9 | self.filter = None 10 | self.enabled = bool(profile_regions) 11 | if self.enabled: 12 | logging.info('Initializing geocoder (this will take a minute)') 13 | self.regions = self.parse_regions(profile_regions) 14 | self.tree = self.load_places_tree() 15 | if not self.tree: 16 | if callable(profile_regions): 17 | logging.warn('Could not read the geocoding file') 18 | else: 19 | logging.error('Could not read the geocoding file, no regions will be added') 20 | self.enabled = False 21 | 22 | def set_filter(self, opt_regions): 23 | if isinstance(opt_regions, str): 24 | self.f_negate = opt_regions[0] in ('-', '^') 25 | if self.f_negate: 26 | opt_regions = opt_regions[1:] 27 | self.filter = set([r.strip() for r in opt_regions.split(',')]) 28 | elif isinstance(opt_regions, list): 29 | self.f_negate = False 30 | self.filter = set(opt_regions) 31 | 32 | def load_places_tree(self): 33 | class PlacePoint: 34 | def __init__(self, lon, lat, country, region): 35 | self.coord = (lon, lat) 36 | self.country = country 37 | self.region = region 38 | 39 | def __len__(self): 40 | return len(self.coord) 41 | 42 | def __getitem__(self, i): 43 | return self.coord[i] 44 | 45 | def unpack_coord(data): 46 | if data[-1] > 0x7f: 47 | data += b'\xFF' 48 | else: 49 | data += b'\0' 50 | return struct.unpack(' 2: 49 | q = '"{}"~"^({})$"'.format(t[0], '|'.join(t[1:])) 50 | else: 51 | q = '"{}"="{}"'.format(t[0], t[1]) 52 | tag_str += '[' + q + ']' 53 | tag_strs.append(tag_str) 54 | 55 | if self.profile.get('no_dataset_id', False): 56 | ref = None 57 | else: 58 | ref = 'nwr["ref:' + self.profile.get( 59 | 'dataset_id', required='A fairly unique id of the dataset to query OSM') + '"]' 60 | timeout = self.profile.get('overpass_timeout', 120) 61 | query = '[out:xml]{};('.format('' if timeout is None else '[timeout:{}]'.format(timeout)) 62 | for bbox in bboxes: 63 | bbox_str = '' if bbox is None else '(' + ','.join([str(x) for x in bbox]) + ')' 64 | for tag_str in tag_strs: 65 | query += 'nwr' + tag_str + bbox_str + ';' 66 | if ref is not None: 67 | if not self.profile.get('bounded_update', False): 68 | query += ref + ';' 69 | else: 70 | for bbox in bboxes: 71 | bbox_str = '' if bbox is None else '(' + ','.join( 72 | [str(x) for x in bbox]) + ')' 73 | query += ref + bbox_str + ';' 74 | query += '); out meta qt center;' 75 | return query 76 | 77 | def get_bbox(self, points): 78 | """Plain iterates over the dataset and returns the bounding box 79 | that encloses it.""" 80 | padding = self.profile.get('bbox_padding', BBOX_PADDING) 81 | bbox = [90.0, 180.0, -90.0, -180.0] 82 | for p in points: 83 | bbox[0] = min(bbox[0], p.lat - padding) 84 | bbox[1] = min(bbox[1], p.lon - padding) 85 | bbox[2] = max(bbox[2], p.lat + padding) 86 | bbox[3] = max(bbox[3], p.lon + padding) 87 | return bbox 88 | 89 | def split_into_bboxes(self, points): 90 | """ 91 | Splits the dataset into multiple bboxes to lower load on the overpass api. 92 | 93 | Returns a list of tuples (minlat, minlon, maxlat, maxlon). 
94 | """ 95 | max_bboxes = self.profile.get('max_request_boxes', 4) 96 | if max_bboxes <= 1 or len(points) <= 1: 97 | return [self.get_bbox(points)] 98 | 99 | # coord, alt coord, total w/h to the left/bottom, total w/h to the right/top 100 | lons = sorted([[d.lon, d.lat, 0, 0] for d in points]) 101 | lats = sorted([[d.lat, d.lon, 0, 0] for d in points]) 102 | 103 | def update_side_dimensions(ar): 104 | """For each point, calculates the maximum and 105 | minimum bound for all points left and right.""" 106 | fwd_top = fwd_bottom = ar[0][1] 107 | back_top = back_bottom = ar[-1][1] 108 | for i in range(len(ar)): 109 | fwd_top = max(fwd_top, ar[i][1]) 110 | fwd_bottom = min(fwd_bottom, ar[i][1]) 111 | ar[i][2] = fwd_top - fwd_bottom 112 | back_top = max(back_top, ar[-i-1][1]) 113 | back_bottom = min(back_bottom, ar[-i-1][1]) 114 | ar[-i-1][3] = back_top - back_bottom 115 | 116 | def find_max_gap(ar, h): 117 | """Select an interval between points, which would give 118 | the maximum area if split there.""" 119 | max_id = None 120 | max_gap = 0 121 | for i in range(len(ar) - 1): 122 | # "Extra" variables are for area to the left and right 123 | # that would be freed after splitting. 124 | extra_left = (ar[i][0]-ar[0][0]) * (h-ar[i][2]) 125 | extra_right = (ar[-1][0]-ar[i+1][0]) * (h-ar[i+1][3]) 126 | # Gap is the area of the column between points i and i+1 127 | # plus extra areas to the left and right. 128 | gap = (ar[i+1][0] - ar[i][0]) * h + extra_left + extra_right 129 | if gap > max_gap: 130 | max_id = i 131 | max_gap = gap 132 | return max_id, max_gap 133 | 134 | def get_bbox(b, pad=0): 135 | """Returns a list of [min_lat, min_lon, max_lat, max_lon] for a box.""" 136 | return [b[2][0][0]-pad, b[3][0][0]-pad, b[2][-1][0]+pad, b[3][-1][0]+pad] 137 | 138 | def split(box, point_array, point_id): 139 | """Split the box over axis point_array at point point_id...point_id+1. 
140 | Modifies the box in-place and returns a new box.""" 141 | alt_array = 5 - point_array # 3->2, 2->3 142 | points = box[point_array][point_id+1:] 143 | del box[point_array][point_id+1:] 144 | alt = {True: [], False: []} # True means point is in new box 145 | for p in box[alt_array]: 146 | alt[(p[1], p[0]) >= (points[0][0], points[0][1])].append(p) 147 | 148 | new_box = [None] * 4 149 | new_box[point_array] = points 150 | new_box[alt_array] = alt[True] 151 | box[alt_array] = alt[False] 152 | for i in range(2): 153 | box[i] = box[i+2][-1][0] - box[i+2][0][0] 154 | new_box[i] = new_box[i+2][-1][0] - new_box[i+2][0][0] 155 | return new_box 156 | 157 | # height, width, lats, lons 158 | boxes = [[lats[-1][0]-lats[0][0], lons[-1][0]-lons[0][0], lats, lons]] 159 | initial_area = boxes[0][0] * boxes[0][1] 160 | while len(boxes) < max_bboxes and len(boxes) <= len(points): 161 | candidate_box = None 162 | area = 0 163 | point_id = None 164 | point_array = None 165 | for box in boxes: 166 | for ar in (2, 3): 167 | # Find a box and an axis for splitting that would decrease the area the most 168 | update_side_dimensions(box[ar]) 169 | max_id, max_area = find_max_gap(box[ar], box[3-ar]) 170 | if max_area > area: 171 | area = max_area 172 | candidate_box = box 173 | point_id = max_id 174 | point_array = ar 175 | if area * 100 < initial_area: 176 | # Stop splitting when the area decrease is less than 1% 177 | break 178 | logging.debug('Splitting bbox %s at %s %s..%s; area decrease %s%%', 179 | get_bbox(candidate_box), 180 | 'longs' if point_array == 3 else 'lats', 181 | candidate_box[point_array][point_id][0], 182 | candidate_box[point_array][point_id+1][0], 183 | round(100*area/initial_area)) 184 | boxes.append(split(candidate_box, point_array, point_id)) 185 | 186 | padding = self.profile.get('bbox_padding', BBOX_PADDING) 187 | return [get_bbox(b, padding) for b in boxes] 188 | 189 | def get_categories(self, tags): 190 | def match_query(tags, query): 191 | for tag in query: 192 | if len(tag) == 1: 193 | return tag[0] in tags 194 | else: 195 | value = tags.get(tag[0], None) 196 | if tag[1] is None or tag[1] == '': 197 | return value is None 198 | if value is None: 199 | return False 200 | found = False 201 | for t2 in tag[1:]: 202 | if t2[0] == '~': 203 | if re.search(t2[1:], value): 204 | found = True 205 | elif t2[0] == '!': 206 | if t2[1:].lower() in value.lower(): 207 | found = True 208 | elif t2 == value: 209 | found = True 210 | if found: 211 | break 212 | if not found: 213 | return False 214 | return True 215 | 216 | def tags_to_query(tags): 217 | return [(k, v) for k, v in tags.items()] 218 | 219 | result = set() 220 | qualifies = self.profile.get('qualifies', args=tags) 221 | if qualifies is not None: 222 | if qualifies: 223 | result.add(None) 224 | return result 225 | 226 | # First check default query 227 | query = self.profile.get('query', None) 228 | if query is not None: 229 | if isinstance(query, str): 230 | result.add(None) 231 | else: 232 | if isinstance(query[0][0], str): 233 | query = [query] 234 | for q in query: 235 | if match_query(tags, q): 236 | result.add(None) 237 | break 238 | 239 | # Then check each category if we got these 240 | categories = self.profile.get('categories', {}) 241 | for name, params in categories.items(): 242 | if 'tags' not in params and 'query' not in params: 243 | raise ValueError('No tags and query attributes for category "{}"'.format(name)) 244 | if match_query(tags, params.get('query', tags_to_query(params.get('tags')))): 245 | result.add(name) 246 | 
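# The set may contain None, which marks a match of the default query or the 'qualifies' check rather than a named category.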
247 | return result 248 | 249 | def calc_boxes(self, dataset_points): 250 | profile_bbox = self.profile.get('bbox', True) 251 | if not profile_bbox: 252 | bboxes = [None] 253 | elif hasattr(profile_bbox, '__len__') and len(profile_bbox) == 4: 254 | bboxes = [profile_bbox] 255 | else: 256 | bboxes = self.split_into_bboxes(dataset_points) 257 | return bboxes 258 | 259 | def download(self, bboxes=None): 260 | """Constructs an Overpass API query and requests objects 261 | to match from a server.""" 262 | if not bboxes: 263 | pbbox = self.profile.get('bbox', True) 264 | if pbbox and hasattr(pbbox, '__len__') and len(pbbox) == 4: 265 | bboxes = [pbbox] 266 | else: 267 | bboxes = [None] 268 | 269 | query = self.construct_overpass_query(bboxes) 270 | logging.debug('Overpass query: %s', query) 271 | r = requests.get(OVERPASS_SERVER + 'interpreter', {'data': query}) 272 | if r.encoding is None: 273 | r.encoding = 'utf-8' 274 | if r.status_code != 200: 275 | logging.error('Failed to download data from Overpass API: %s', r.status_code) 276 | if 'rate_limited' in r.text: 277 | r = requests.get(OVERPASS_SERVER + 'status') 278 | logging.warning('Seems like you are rate limited. API status:\n%s', r.text) 279 | else: 280 | logging.error('Error message: %s', r.text) 281 | raise IOError() 282 | if 'runtime error: ' in r.text: 283 | m = re.search(r'runtime error: ([^<]+)', r.text) 284 | error = 'unknown' if not m else m.group(1) 285 | if 'Query timed out' in error: 286 | logging.error( 287 | 'Query timed out, try increasing the "overpass_timeout" profile variable') 288 | else: 289 | logging.error('Runtime error: %s', error) 290 | raise IOError() 291 | return self.parse_xml(r.content) 292 | 293 | def parse_xml(self, fileobj): 294 | """Parses an OSM XML file into the "osmdata" field. For ways and relations, 295 | finds the center. 
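When the response has no center element, it is approximated by averaging the coordinates of the available member nodes (and member way centers for relations).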
Drops objects that do not match the overpass query tags 296 | (see "check_against_profile_tags" method).""" 297 | if isinstance(fileobj, bytes): 298 | xml = etree.fromstring(fileobj) 299 | else: 300 | xml = etree.parse(fileobj).getroot() 301 | nodes = {} 302 | for nd in xml.findall('node'): 303 | nodes[nd.get('id')] = (float(nd.get('lat')), float(nd.get('lon'))) 304 | ways = {} 305 | for way in xml.findall('way'): 306 | center = way.find('center') 307 | if center is not None: 308 | ways[way.get('id')] = [float(center.get('lat')), float(center.get('lon'))] 309 | else: 310 | logging.debug('Way %s does not have a center', way.get('id')) 311 | coord = [0, 0] 312 | count = 0 313 | for nd in way.findall('nd'): 314 | if nd.get('ref') in nodes: 315 | count += 1 316 | for i in range(len(coord)): 317 | coord[i] += nodes[nd.get('ref')][i] 318 | ways[way.get('id')] = [coord[0] / count, coord[1] / count] 319 | 320 | # For calculating weight of OSM objects 321 | weight_fn = self.profile.get_raw('weight') 322 | osmdata = {} 323 | 324 | for el in xml: 325 | tags = {} 326 | for tag in el.findall('tag'): 327 | tags[tag.get('k')] = tag.get('v') 328 | categories = self.get_categories(tags) 329 | if categories is False or categories is None or len(categories) == 0: 330 | continue 331 | 332 | if el.tag == 'node': 333 | coord = nodes[el.get('id')] 334 | members = None 335 | elif el.tag == 'way': 336 | coord = ways[el.get('id')] 337 | members = [nd.get('ref') for nd in el.findall('nd')] 338 | elif el.tag == 'relation': 339 | center = el.find('center') 340 | if center is not None: 341 | coord = [float(center.get('lat')), float(center.get('lon'))] 342 | else: 343 | logging.debug('Relation %s does not have a center', el.get('id')) 344 | coord = [0, 0] 345 | count = 0 346 | for m in el.findall('member'): 347 | if m.get('type') == 'node' and m.get('ref') in nodes: 348 | count += 1 349 | for i in range(len(coord)): 350 | coord[i] += nodes[m.get('ref')][i] 351 | elif m.get('type') == 'way' and m.get('ref') in ways: 352 | count += 1 353 | for i in range(len(coord)): 354 | coord[i] += ways[m.get('ref')][i] 355 | if count > 0: 356 | coord = [coord[0] / count, coord[1] / count] 357 | members = [ 358 | (m.get('type'), m.get('ref'), m.get('role')) 359 | for m in el.findall('member') 360 | ] 361 | else: 362 | continue 363 | if not coord or coord == [0, 0]: 364 | continue 365 | pt = OSMPoint( 366 | el.tag, int(el.get('id')), int(el.get('version')), 367 | coord[0], coord[1], tags, categories) 368 | pt.members = members 369 | if pt.is_poi(): 370 | if callable(weight_fn): 371 | weight = weight_fn(pt) 372 | if weight: 373 | if abs(weight) > 3: 374 | pt.dist_offset = weight 375 | else: 376 | pt.dist_offset = weight * self.profile.max_distance 377 | osmdata[pt.id] = pt 378 | return osmdata 379 | 380 | 381 | def check_moveability(changes): 382 | to_check = [x for x in changes if x['properties']['osm_type'] == 'node' and 383 | x['properties']['action'] == 'modify'] 384 | logging.info('Checking moveability of %s modified nodes', len(to_check)) 385 | for c in to_check: 386 | p = c['properties'] 387 | p['can_move'] = False 388 | r = requests.get('{}node/{}/ways'.format(OSM_API_SERVER, p['osm_id'])) 389 | if r.status_code == 200: 390 | xml = etree.fromstring(r.content) 391 | p['can_move'] = xml.find('way') is None 392 | -------------------------------------------------------------------------------- /conflate/places.bin: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/mapsme/osm_conflate/a7af835ce44b3ac194469b53b7f388bba168cbe4/conflate/places.bin -------------------------------------------------------------------------------- /conflate/profile.py: -------------------------------------------------------------------------------- 1 | import json 2 | from .data import SourcePoint # So we don't have to import this in profiles 3 | from . import etree 4 | 5 | 6 | class ProfileException(Exception): 7 | """An exception class for the Profile instance.""" 8 | def __init__(self, attr, desc): 9 | super().__init__('Field missing in profile: {} ({})'.format(attr, desc)) 10 | 11 | 12 | class Profile: 13 | """A wrapper for a profile. 14 | 15 | A profile is a python script that sets a few local variables. 16 | These variables become properties of the profile, accessible with 17 | a "get" method. If something is a function, it will be called, 18 | optional parameters might be passed to it. 19 | 20 | You can compile a list of all supported variables by grepping through 21 | this code, or by looking at a few example profiles. If something 22 | is required, you will be notified of that. 23 | """ 24 | def __init__(self, fileobj, par=None): 25 | global param 26 | param = par 27 | if isinstance(fileobj, dict): 28 | self.profile = fileobj 29 | elif hasattr(fileobj, 'read'): 30 | s = fileobj.read().replace('\r', '') 31 | if s[0] == '{': 32 | self.profile = json.loads(s) 33 | else: 34 | self.profile = {} 35 | exec(s, globals(), self.profile) 36 | else: 37 | # Got a class 38 | self.profile = {name: getattr(fileobj, name) 39 | for name in dir(fileobj) if not name.startswith('_')} 40 | self.max_distance = self.get('max_distance', 100) 41 | 42 | def has(self, attr): 43 | return attr in self.profile 44 | 45 | def get(self, attr, default=None, required=None, args=None): 46 | if attr in self.profile: 47 | value = self.profile[attr] 48 | if callable(value): 49 | if args is None: 50 | return value() 51 | else: 52 | return value(*args) 53 | else: 54 | return value 55 | if required is not None: 56 | raise ProfileException(attr, required) 57 | return default 58 | 59 | def get_raw(self, attr, default=None): 60 | if attr in self.profile: 61 | return self.profile[attr] 62 | return default 63 | -------------------------------------------------------------------------------- /conflate/version.py: -------------------------------------------------------------------------------- 1 | __version__ = '1.4.1' 2 | -------------------------------------------------------------------------------- /filter/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 2.8) 2 | set(NAME filter_planet_by_cats) 3 | project(${NAME} C CXX) 4 | set(CMAKE_CXX_STANDARD 11) 5 | message(STATUS "Configuring ${NAME}") 6 | list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}") 7 | find_package(Osmium REQUIRED COMPONENTS io) 8 | include_directories(SYSTEM ${OSMIUM_INCLUDE_DIRS}) 9 | add_executable( 10 | ${NAME} 11 | ${NAME}.cpp 12 | RTree.h 13 | xml_centers_output.hpp 14 | ) 15 | target_link_libraries(${NAME} ${OSMIUM_IO_LIBRARIES}) 16 | -------------------------------------------------------------------------------- /filter/FindOsmium.cmake: -------------------------------------------------------------------------------- 1 | #---------------------------------------------------------------------- 2 | # 3 | # FindOsmium.cmake 4 | # 5 | # Find the Libosmium headers and, optionally, several components needed 6 | # for different 
Libosmium functions. 7 | # 8 | #---------------------------------------------------------------------- 9 | # 10 | # Usage: 11 | # 12 | # Copy this file somewhere into your project directory, where cmake can 13 | # find it. Usually this will be a directory called "cmake" which you can 14 | # add to the CMake module search path with the following line in your 15 | # CMakeLists.txt: 16 | # 17 | # list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake") 18 | # 19 | # Then add the following in your CMakeLists.txt: 20 | # 21 | # find_package(Osmium [version] REQUIRED COMPONENTS ) 22 | # include_directories(SYSTEM ${OSMIUM_INCLUDE_DIRS}) 23 | # 24 | # The version number is optional. If it is not set, any version of 25 | # libosmium will do. 26 | # 27 | # For the substitute a space separated list of one or more of the 28 | # following components: 29 | # 30 | # pbf - include libraries needed for PBF input and output 31 | # xml - include libraries needed for XML input and output 32 | # io - include libraries needed for any type of input/output 33 | # geos - include if you want to use any of the GEOS functions 34 | # gdal - include if you want to use any of the OGR functions 35 | # proj - include if you want to use any of the Proj.4 functions 36 | # sparsehash - include if you use the sparsehash index 37 | # 38 | # You can check for success with something like this: 39 | # 40 | # if(NOT OSMIUM_FOUND) 41 | # message(WARNING "Libosmium not found!\n") 42 | # endif() 43 | # 44 | #---------------------------------------------------------------------- 45 | # 46 | # Variables: 47 | # 48 | # OSMIUM_FOUND - True if Osmium found. 49 | # OSMIUM_INCLUDE_DIRS - Where to find include files. 50 | # OSMIUM_XML_LIBRARIES - Libraries needed for XML I/O. 51 | # OSMIUM_PBF_LIBRARIES - Libraries needed for PBF I/O. 52 | # OSMIUM_IO_LIBRARIES - Libraries needed for XML or PBF I/O. 53 | # OSMIUM_LIBRARIES - All libraries Osmium uses somewhere. 54 | # 55 | #---------------------------------------------------------------------- 56 | 57 | # This is the list of directories where we look for osmium includes. 58 | set(_osmium_include_path 59 | ../libosmium 60 | ~/Library/Frameworks 61 | /Library/Frameworks 62 | /opt/local # DarwinPorts 63 | /opt 64 | ) 65 | 66 | # Look for the header file. 
67 | find_path(OSMIUM_INCLUDE_DIR osmium/version.hpp 68 | PATH_SUFFIXES include 69 | PATHS ${_osmium_include_path} 70 | ) 71 | 72 | # Check libosmium version number 73 | if(Osmium_FIND_VERSION) 74 | file(STRINGS "${OSMIUM_INCLUDE_DIR}/osmium/version.hpp" _libosmium_version_define REGEX "#define LIBOSMIUM_VERSION_STRING") 75 | if("${_libosmium_version_define}" MATCHES "#define LIBOSMIUM_VERSION_STRING \"([0-9.]+)\"") 76 | set(_libosmium_version "${CMAKE_MATCH_1}") 77 | else() 78 | set(_libosmium_version "unknown") 79 | endif() 80 | endif() 81 | 82 | set(OSMIUM_INCLUDE_DIRS "${OSMIUM_INCLUDE_DIR}") 83 | 84 | #---------------------------------------------------------------------- 85 | # 86 | # Check for optional components 87 | # 88 | #---------------------------------------------------------------------- 89 | if(Osmium_FIND_COMPONENTS) 90 | foreach(_component ${Osmium_FIND_COMPONENTS}) 91 | string(TOUPPER ${_component} _component_uppercase) 92 | set(Osmium_USE_${_component_uppercase} TRUE) 93 | endforeach() 94 | endif() 95 | 96 | #---------------------------------------------------------------------- 97 | # Component 'io' is an alias for 'pbf' and 'xml' 98 | if(Osmium_USE_IO) 99 | set(Osmium_USE_PBF TRUE) 100 | set(Osmium_USE_XML TRUE) 101 | endif() 102 | 103 | #---------------------------------------------------------------------- 104 | # Component 'ogr' is an alias for 'gdal' 105 | if(Osmium_USE_OGR) 106 | set(Osmium_USE_GDAL TRUE) 107 | endif() 108 | 109 | #---------------------------------------------------------------------- 110 | # Component 'pbf' 111 | if(Osmium_USE_PBF) 112 | find_package(ZLIB) 113 | find_package(Threads) 114 | find_package(Protozero 1.5.1) 115 | 116 | list(APPEND OSMIUM_EXTRA_FIND_VARS ZLIB_FOUND Threads_FOUND PROTOZERO_INCLUDE_DIR) 117 | if(ZLIB_FOUND AND Threads_FOUND AND PROTOZERO_FOUND) 118 | list(APPEND OSMIUM_PBF_LIBRARIES 119 | ${ZLIB_LIBRARIES} 120 | ${CMAKE_THREAD_LIBS_INIT} 121 | ) 122 | list(APPEND OSMIUM_INCLUDE_DIRS 123 | ${ZLIB_INCLUDE_DIR} 124 | ${PROTOZERO_INCLUDE_DIR} 125 | ) 126 | else() 127 | message(WARNING "Osmium: Can not find some libraries for PBF input/output, please install them or configure the paths.") 128 | endif() 129 | endif() 130 | 131 | #---------------------------------------------------------------------- 132 | # Component 'xml' 133 | if(Osmium_USE_XML) 134 | find_package(EXPAT) 135 | find_package(BZip2) 136 | find_package(ZLIB) 137 | find_package(Threads) 138 | 139 | list(APPEND OSMIUM_EXTRA_FIND_VARS EXPAT_FOUND BZIP2_FOUND ZLIB_FOUND Threads_FOUND) 140 | if(EXPAT_FOUND AND BZIP2_FOUND AND ZLIB_FOUND AND Threads_FOUND) 141 | list(APPEND OSMIUM_XML_LIBRARIES 142 | ${EXPAT_LIBRARIES} 143 | ${BZIP2_LIBRARIES} 144 | ${ZLIB_LIBRARIES} 145 | ${CMAKE_THREAD_LIBS_INIT} 146 | ) 147 | list(APPEND OSMIUM_INCLUDE_DIRS 148 | ${EXPAT_INCLUDE_DIR} 149 | ${BZIP2_INCLUDE_DIR} 150 | ${ZLIB_INCLUDE_DIR} 151 | ) 152 | else() 153 | message(WARNING "Osmium: Can not find some libraries for XML input/output, please install them or configure the paths.") 154 | endif() 155 | endif() 156 | 157 | #---------------------------------------------------------------------- 158 | list(APPEND OSMIUM_IO_LIBRARIES 159 | ${OSMIUM_PBF_LIBRARIES} 160 | ${OSMIUM_XML_LIBRARIES} 161 | ) 162 | 163 | list(APPEND OSMIUM_LIBRARIES 164 | ${OSMIUM_IO_LIBRARIES} 165 | ) 166 | 167 | #---------------------------------------------------------------------- 168 | # Component 'geos' 169 | if(Osmium_USE_GEOS) 170 | find_path(GEOS_INCLUDE_DIR geos/geom.h) 171 | find_library(GEOS_LIBRARY 
NAMES geos) 172 | 173 | list(APPEND OSMIUM_EXTRA_FIND_VARS GEOS_INCLUDE_DIR GEOS_LIBRARY) 174 | if(GEOS_INCLUDE_DIR AND GEOS_LIBRARY) 175 | SET(GEOS_FOUND 1) 176 | list(APPEND OSMIUM_LIBRARIES ${GEOS_LIBRARY}) 177 | list(APPEND OSMIUM_INCLUDE_DIRS ${GEOS_INCLUDE_DIR}) 178 | else() 179 | message(WARNING "Osmium: GEOS library is required but not found, please install it or configure the paths.") 180 | endif() 181 | endif() 182 | 183 | #---------------------------------------------------------------------- 184 | # Component 'gdal' (alias 'ogr') 185 | if(Osmium_USE_GDAL) 186 | find_package(GDAL) 187 | 188 | list(APPEND OSMIUM_EXTRA_FIND_VARS GDAL_FOUND) 189 | if(GDAL_FOUND) 190 | list(APPEND OSMIUM_LIBRARIES ${GDAL_LIBRARIES}) 191 | list(APPEND OSMIUM_INCLUDE_DIRS ${GDAL_INCLUDE_DIRS}) 192 | else() 193 | message(WARNING "Osmium: GDAL library is required but not found, please install it or configure the paths.") 194 | endif() 195 | endif() 196 | 197 | #---------------------------------------------------------------------- 198 | # Component 'proj' 199 | if(Osmium_USE_PROJ) 200 | find_path(PROJ_INCLUDE_DIR proj_api.h) 201 | find_library(PROJ_LIBRARY NAMES proj) 202 | 203 | list(APPEND OSMIUM_EXTRA_FIND_VARS PROJ_INCLUDE_DIR PROJ_LIBRARY) 204 | if(PROJ_INCLUDE_DIR AND PROJ_LIBRARY) 205 | set(PROJ_FOUND 1) 206 | list(APPEND OSMIUM_LIBRARIES ${PROJ_LIBRARY}) 207 | list(APPEND OSMIUM_INCLUDE_DIRS ${PROJ_INCLUDE_DIR}) 208 | else() 209 | message(WARNING "Osmium: PROJ.4 library is required but not found, please install it or configure the paths.") 210 | endif() 211 | endif() 212 | 213 | #---------------------------------------------------------------------- 214 | # Component 'sparsehash' 215 | if(Osmium_USE_SPARSEHASH) 216 | find_path(SPARSEHASH_INCLUDE_DIR google/sparsetable) 217 | 218 | list(APPEND OSMIUM_EXTRA_FIND_VARS SPARSEHASH_INCLUDE_DIR) 219 | if(SPARSEHASH_INCLUDE_DIR) 220 | # Find size of sparsetable::size_type. This does not work on older 221 | # CMake versions because they can do this check only in C, not in C++. 222 | if(NOT CMAKE_VERSION VERSION_LESS 3.0) 223 | include(CheckTypeSize) 224 | set(CMAKE_REQUIRED_INCLUDES ${SPARSEHASH_INCLUDE_DIR}) 225 | set(CMAKE_EXTRA_INCLUDE_FILES "google/sparsetable") 226 | check_type_size("google::sparsetable::size_type" SPARSETABLE_SIZE_TYPE LANGUAGE CXX) 227 | set(CMAKE_EXTRA_INCLUDE_FILES) 228 | set(CMAKE_REQUIRED_INCLUDES) 229 | else() 230 | set(SPARSETABLE_SIZE_TYPE ${CMAKE_SIZEOF_VOID_P}) 231 | endif() 232 | 233 | # Sparsetable::size_type must be at least 8 bytes (64bit), otherwise 234 | # OSM object IDs will not fit. 
235 | if(SPARSETABLE_SIZE_TYPE GREATER 7) 236 | set(SPARSEHASH_FOUND 1) 237 | add_definitions(-DOSMIUM_WITH_SPARSEHASH=${SPARSEHASH_FOUND}) 238 | list(APPEND OSMIUM_INCLUDE_DIRS ${SPARSEHASH_INCLUDE_DIR}) 239 | else() 240 | message(WARNING "Osmium: Disabled Google SparseHash library on 32bit system (size_type=${SPARSETABLE_SIZE_TYPE}).") 241 | endif() 242 | else() 243 | message(WARNING "Osmium: Google SparseHash library is required but not found, please install it or configure the paths.") 244 | endif() 245 | endif() 246 | 247 | #---------------------------------------------------------------------- 248 | 249 | list(REMOVE_DUPLICATES OSMIUM_INCLUDE_DIRS) 250 | 251 | if(OSMIUM_XML_LIBRARIES) 252 | list(REMOVE_DUPLICATES OSMIUM_XML_LIBRARIES) 253 | endif() 254 | 255 | if(OSMIUM_PBF_LIBRARIES) 256 | list(REMOVE_DUPLICATES OSMIUM_PBF_LIBRARIES) 257 | endif() 258 | 259 | if(OSMIUM_IO_LIBRARIES) 260 | list(REMOVE_DUPLICATES OSMIUM_IO_LIBRARIES) 261 | endif() 262 | 263 | if(OSMIUM_LIBRARIES) 264 | list(REMOVE_DUPLICATES OSMIUM_LIBRARIES) 265 | endif() 266 | 267 | #---------------------------------------------------------------------- 268 | # 269 | # Check that all required libraries are available 270 | # 271 | #---------------------------------------------------------------------- 272 | if(OSMIUM_EXTRA_FIND_VARS) 273 | list(REMOVE_DUPLICATES OSMIUM_EXTRA_FIND_VARS) 274 | endif() 275 | # Handle the QUIETLY and REQUIRED arguments and the optional version check 276 | # and set OSMIUM_FOUND to TRUE if all listed variables are TRUE. 277 | include(FindPackageHandleStandardArgs) 278 | find_package_handle_standard_args(Osmium 279 | REQUIRED_VARS OSMIUM_INCLUDE_DIR ${OSMIUM_EXTRA_FIND_VARS} 280 | VERSION_VAR _libosmium_version) 281 | unset(OSMIUM_EXTRA_FIND_VARS) 282 | 283 | #---------------------------------------------------------------------- 284 | # 285 | # A function for setting the -pthread option in compilers/linkers 286 | # 287 | #---------------------------------------------------------------------- 288 | function(set_pthread_on_target _target) 289 | if(NOT MSVC) 290 | set_target_properties(${_target} PROPERTIES COMPILE_FLAGS "-pthread") 291 | if(NOT APPLE) 292 | set_target_properties(${_target} PROPERTIES LINK_FLAGS "-pthread") 293 | endif() 294 | endif() 295 | endfunction() 296 | 297 | #---------------------------------------------------------------------- 298 | # 299 | # Add compiler flags 300 | # 301 | #---------------------------------------------------------------------- 302 | add_definitions(-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64) 303 | 304 | if(MSVC) 305 | add_definitions(-wd4996) 306 | 307 | # Disable warning C4068: "unknown pragma" because we want it to ignore 308 | # pragmas for other compilers. 309 | add_definitions(-wd4068) 310 | 311 | # Disable warning C4715: "not all control paths return a value" because 312 | # it generates too many false positives. 313 | add_definitions(-wd4715) 314 | 315 | # Disable warning C4351: new behavior: elements of array '...' will be 316 | # default initialized. The new behaviour is correct and we don't support 317 | # old compilers anyway. 
318 | add_definitions(-wd4351) 319 | 320 | # Disable warning C4503: "decorated name length exceeded, name was truncated" 321 | # there are more than 150 of generated names in libosmium longer than 4096 symbols supported in MSVC 322 | add_definitions(-wd4503) 323 | 324 | add_definitions(-DNOMINMAX -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS) 325 | endif() 326 | 327 | if(APPLE) 328 | # following only available from cmake 2.8.12: 329 | # add_compile_options(-stdlib=libc++) 330 | # so using this instead: 331 | add_definitions(-stdlib=libc++) 332 | set(LDFLAGS ${LDFLAGS} -stdlib=libc++) 333 | endif() 334 | 335 | #---------------------------------------------------------------------- 336 | 337 | # This is a set of recommended warning options that can be added when compiling 338 | # libosmium code. 339 | if(MSVC) 340 | set(OSMIUM_WARNING_OPTIONS "/W3 /wd4514" CACHE STRING "Recommended warning options for libosmium") 341 | else() 342 | set(OSMIUM_WARNING_OPTIONS "-Wall -Wextra -pedantic -Wredundant-decls -Wdisabled-optimization -Wctor-dtor-privacy -Wnon-virtual-dtor -Woverloaded-virtual -Wsign-promo -Wold-style-cast" CACHE STRING "Recommended warning options for libosmium") 343 | endif() 344 | 345 | set(OSMIUM_DRACONIC_CLANG_OPTIONS "-Wdocumentation -Wunused-exception-parameter -Wmissing-declarations -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-unused-macros -Wno-exit-time-destructors -Wno-global-constructors -Wno-padded -Wno-switch-enum -Wno-missing-prototypes -Wno-weak-vtables -Wno-cast-align -Wno-float-equal") 346 | 347 | if(Osmium_DEBUG) 348 | message(STATUS "OSMIUM_XML_LIBRARIES=" ${OSMIUM_XML_LIBRARIES}) 349 | message(STATUS "OSMIUM_PBF_LIBRARIES=" ${OSMIUM_PBF_LIBRARIES}) 350 | message(STATUS "OSMIUM_IO_LIBRARIES=" ${OSMIUM_IO_LIBRARIES}) 351 | message(STATUS "OSMIUM_LIBRARIES=" ${OSMIUM_LIBRARIES}) 352 | message(STATUS "OSMIUM_INCLUDE_DIRS=" ${OSMIUM_INCLUDE_DIRS}) 353 | endif() 354 | 355 | -------------------------------------------------------------------------------- /filter/FindProtozero.cmake: -------------------------------------------------------------------------------- 1 | #---------------------------------------------------------------------- 2 | # 3 | # FindProtozero.cmake 4 | # 5 | # Find the protozero headers. 6 | # 7 | #---------------------------------------------------------------------- 8 | # 9 | # Usage: 10 | # 11 | # Copy this file somewhere into your project directory, where cmake can 12 | # find it. Usually this will be a directory called "cmake" which you can 13 | # add to the CMake module search path with the following line in your 14 | # CMakeLists.txt: 15 | # 16 | # list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake") 17 | # 18 | # Then add the following in your CMakeLists.txt: 19 | # 20 | # find_package(Protozero [version] [REQUIRED]) 21 | # include_directories(SYSTEM ${PROTOZERO_INCLUDE_DIR}) 22 | # 23 | # The version number is optional. If it is not set, any version of 24 | # protozero will do. 25 | # 26 | # if(NOT PROTOZERO_FOUND) 27 | # message(WARNING "Protozero not found!\n") 28 | # endif() 29 | # 30 | #---------------------------------------------------------------------- 31 | # 32 | # Variables: 33 | # 34 | # PROTOZERO_FOUND - True if Protozero was found. 35 | # PROTOZERO_INCLUDE_DIR - Where to find include files. 
36 | # 37 | #---------------------------------------------------------------------- 38 | 39 | # find include path 40 | find_path(PROTOZERO_INCLUDE_DIR protozero/version.hpp 41 | PATH_SUFFIXES include 42 | PATHS ${CMAKE_SOURCE_DIR}/../protozero 43 | ) 44 | 45 | # Check version number 46 | if(Protozero_FIND_VERSION) 47 | file(STRINGS "${PROTOZERO_INCLUDE_DIR}/protozero/version.hpp" _version_define REGEX "#define PROTOZERO_VERSION_STRING") 48 | if("${_version_define}" MATCHES "#define PROTOZERO_VERSION_STRING \"([0-9.]+)\"") 49 | set(_version "${CMAKE_MATCH_1}") 50 | else() 51 | set(_version "unknown") 52 | endif() 53 | endif() 54 | 55 | #set(PROTOZERO_INCLUDE_DIRS "${PROTOZERO_INCLUDE_DIR}") 56 | 57 | include(FindPackageHandleStandardArgs) 58 | find_package_handle_standard_args(Protozero 59 | REQUIRED_VARS PROTOZERO_INCLUDE_DIR 60 | VERSION_VAR _version) 61 | 62 | 63 | #---------------------------------------------------------------------- 64 | -------------------------------------------------------------------------------- /filter/README.md: -------------------------------------------------------------------------------- 1 | # Filtering OSM by external dataset 2 | 3 | When you have points of multiple categories, an Overpass API request may fail 4 | because of the number of query clauses. In that case, you will need to filter the planet 5 | file yourself. First, prepare a list of categories and dataset points: 6 | 7 | conflate.py profile.py -f points.lst 8 | 9 | Then compile the filtering tool: 10 | 11 | mkdir build && cd build 12 | cmake .. 13 | make 14 | 15 | Download a planet file or an extract for the country of import, update it to the minute, 16 | and feed it to the filtering tool: 17 | 18 | ./filter_planet_by_cats points.lst planet-latest.osm.pbf > filtered.osm 19 | 20 | This will take an hour or two. The resulting OSM file should be used as an input to 21 | the conflation tool: 22 | 23 | conflate.py profile.py --osm filtered.osm -c changes.json 24 | 25 | ## Authors and License 26 | 27 | The `filter_planet_by_cats` script was written by Ilya Zverev for MAPS.ME and 28 | published under Apache License 2.0. 29 | 30 | The `xml_centers_output.hpp` and `*.cmake` files are based on 31 | [libosmium](https://github.com/osmcode/libosmium) code and hence published 32 | under the Boost License terms. 33 | 34 | `RTree.h` is in the public domain, downloaded from 35 | [this repository](https://github.com/nushoin/RTree). 36 | -------------------------------------------------------------------------------- /filter/RTree.h: -------------------------------------------------------------------------------- 1 | #ifndef RTREE_H 2 | #define RTREE_H 3 | 4 | // NOTE This file compiles under MSVC 6 SP5 and MSVC .Net 2003 it may not work on other compilers without modification. 5 | 6 | // NOTE These next few lines may be win32 specific, you may need to modify them to compile on other platform 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | 14 | #define ASSERT assert // RTree uses ASSERT( condition ) 15 | #ifndef Min 16 | #define Min std::min 17 | #endif //Min 18 | #ifndef Max 19 | #define Max std::max 20 | #endif //Max 21 | 22 | // 23 | // RTree.h 24 | // 25 | 26 | #define RTREE_TEMPLATE template 27 | #define RTREE_QUAL RTree 28 | 29 | #define RTREE_DONT_USE_MEMPOOLS // This version does not contain a fixed memory allocator, fill in lines with EXAMPLE to implement one.
30 | #define RTREE_USE_SPHERICAL_VOLUME // Better split classification, may be slower on some systems 31 | 32 | // Fwd decl 33 | class RTFileStream; // File I/O helper class, look below for implementation and notes. 34 | 35 | 36 | /// \class RTree 37 | /// Implementation of RTree, a multidimensional bounding rectangle tree. 38 | /// Example usage: For a 3-dimensional tree use RTree myTree; 39 | /// 40 | /// This modified, templated C++ version by Greg Douglas at Auran (http://www.auran.com) 41 | /// 42 | /// DATATYPE Referenced data, should be int, void*, obj* etc. no larger than sizeof and simple type 43 | /// ELEMTYPE Type of element such as int or float 44 | /// NUMDIMS Number of dimensions such as 2 or 3 45 | /// ELEMTYPEREAL Type of element that allows fractional and large values such as float or double, for use in volume calcs 46 | /// 47 | /// NOTES: Inserting and removing data requires the knowledge of its constant Minimal Bounding Rectangle. 48 | /// This version uses new/delete for nodes, I recommend using a fixed size allocator for efficiency. 49 | /// Instead of using a callback function for returned results, I recommend and efficient pre-sized, grow-only memory 50 | /// array similar to MFC CArray or STL Vector for returning search query result. 51 | /// 52 | template 54 | class RTree 55 | { 56 | protected: 57 | 58 | struct Node; // Fwd decl. Used by other internal structs and iterator 59 | 60 | public: 61 | 62 | // These constant must be declared after Branch and before Node struct 63 | // Stuck up here for MSVC 6 compiler. NSVC .NET 2003 is much happier. 64 | enum 65 | { 66 | MAXNODES = TMAXNODES, ///< Max elements in node 67 | MINNODES = TMINNODES, ///< Min elements in node 68 | }; 69 | 70 | typedef bool (*t_resultCallback)(DATATYPE, void*); 71 | 72 | public: 73 | 74 | RTree(); 75 | virtual ~RTree(); 76 | 77 | /// Insert entry 78 | /// \param a_min Min of bounding rect 79 | /// \param a_max Max of bounding rect 80 | /// \param a_dataId Positive Id of data. Maybe zero, but negative numbers not allowed. 81 | void Insert(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId); 82 | 83 | /// Remove entry 84 | /// \param a_min Min of bounding rect 85 | /// \param a_max Max of bounding rect 86 | /// \param a_dataId Positive Id of data. Maybe zero, but negative numbers not allowed. 87 | void Remove(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId); 88 | 89 | /// Find all within search rectangle 90 | /// \param a_min Min of search bounding rect 91 | /// \param a_max Max of search bounding rect 92 | /// \param a_searchResult Search result array. Caller should set grow size. Function will reset, not append to array. 93 | /// \param a_resultCallback Callback function to return result. Callback should return 'true' to continue searching 94 | /// \param a_context User context to pass as parameter to a_resultCallback 95 | /// \return Returns the number of entries found 96 | int Search(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], t_resultCallback a_resultCallback, void* a_context); 97 | 98 | /// Remove all entries from tree 99 | void RemoveAll(); 100 | 101 | /// Count the data elements in this container. This is slow as no internal counter is maintained. 
102 | int Count(); 103 | 104 | /// Load tree contents from file 105 | bool Load(const char* a_fileName); 106 | /// Load tree contents from stream 107 | bool Load(RTFileStream& a_stream); 108 | 109 | 110 | /// Save tree contents to file 111 | bool Save(const char* a_fileName); 112 | /// Save tree contents to stream 113 | bool Save(RTFileStream& a_stream); 114 | 115 | /// Iterator is not remove safe. 116 | class Iterator 117 | { 118 | private: 119 | 120 | enum { MAX_STACK = 32 }; // Max stack size. Allows almost n^32 where n is number of branches in node 121 | 122 | struct StackElement 123 | { 124 | Node* m_node; 125 | int m_branchIndex; 126 | }; 127 | 128 | public: 129 | 130 | Iterator() { Init(); } 131 | 132 | ~Iterator() { } 133 | 134 | /// Is iterator invalid 135 | bool IsNull() { return (m_tos <= 0); } 136 | 137 | /// Is iterator pointing to valid data 138 | bool IsNotNull() { return (m_tos > 0); } 139 | 140 | /// Access the current data element. Caller must be sure iterator is not NULL first. 141 | DATATYPE& operator*() 142 | { 143 | ASSERT(IsNotNull()); 144 | StackElement& curTos = m_stack[m_tos - 1]; 145 | return curTos.m_node->m_branch[curTos.m_branchIndex].m_data; 146 | } 147 | 148 | /// Access the current data element. Caller must be sure iterator is not NULL first. 149 | const DATATYPE& operator*() const 150 | { 151 | ASSERT(IsNotNull()); 152 | StackElement& curTos = m_stack[m_tos - 1]; 153 | return curTos.m_node->m_branch[curTos.m_branchIndex].m_data; 154 | } 155 | 156 | /// Find the next data element 157 | bool operator++() { return FindNextData(); } 158 | 159 | /// Get the bounds for this node 160 | void GetBounds(ELEMTYPE a_min[NUMDIMS], ELEMTYPE a_max[NUMDIMS]) 161 | { 162 | ASSERT(IsNotNull()); 163 | StackElement& curTos = m_stack[m_tos - 1]; 164 | Branch& curBranch = curTos.m_node->m_branch[curTos.m_branchIndex]; 165 | 166 | for(int index = 0; index < NUMDIMS; ++index) 167 | { 168 | a_min[index] = curBranch.m_rect.m_min[index]; 169 | a_max[index] = curBranch.m_rect.m_max[index]; 170 | } 171 | } 172 | 173 | private: 174 | 175 | /// Reset iterator 176 | void Init() { m_tos = 0; } 177 | 178 | /// Find the next data element in the tree (For internal use only) 179 | bool FindNextData() 180 | { 181 | for(;;) 182 | { 183 | if(m_tos <= 0) 184 | { 185 | return false; 186 | } 187 | StackElement curTos = Pop(); // Copy stack top cause it may change as we use it 188 | 189 | if(curTos.m_node->IsLeaf()) 190 | { 191 | // Keep walking through data while we can 192 | if(curTos.m_branchIndex+1 < curTos.m_node->m_count) 193 | { 194 | // There is more data, just point to the next one 195 | Push(curTos.m_node, curTos.m_branchIndex + 1); 196 | return true; 197 | } 198 | // No more data, so it will fall back to previous level 199 | } 200 | else 201 | { 202 | if(curTos.m_branchIndex+1 < curTos.m_node->m_count) 203 | { 204 | // Push sibling on for future tree walk 205 | // This is the 'fall back' node when we finish with the current level 206 | Push(curTos.m_node, curTos.m_branchIndex + 1); 207 | } 208 | // Since cur node is not a leaf, push first of next level to get deeper into the tree 209 | Node* nextLevelnode = curTos.m_node->m_branch[curTos.m_branchIndex].m_child; 210 | Push(nextLevelnode, 0); 211 | 212 | // If we pushed on a new leaf, exit as the data is ready at TOS 213 | if(nextLevelnode->IsLeaf()) 214 | { 215 | return true; 216 | } 217 | } 218 | } 219 | } 220 | 221 | /// Push node and branch onto iteration stack (For internal use only) 222 | void Push(Node* a_node, int a_branchIndex) 223 
| { 224 | m_stack[m_tos].m_node = a_node; 225 | m_stack[m_tos].m_branchIndex = a_branchIndex; 226 | ++m_tos; 227 | ASSERT(m_tos <= MAX_STACK); 228 | } 229 | 230 | /// Pop element off iteration stack (For internal use only) 231 | StackElement& Pop() 232 | { 233 | ASSERT(m_tos > 0); 234 | --m_tos; 235 | return m_stack[m_tos]; 236 | } 237 | 238 | StackElement m_stack[MAX_STACK]; ///< Stack as we are doing iteration instead of recursion 239 | int m_tos; ///< Top Of Stack index 240 | 241 | friend class RTree; // Allow hiding of non-public functions while allowing manipulation by logical owner 242 | }; 243 | 244 | /// Get 'first' for iteration 245 | void GetFirst(Iterator& a_it) 246 | { 247 | a_it.Init(); 248 | Node* first = m_root; 249 | while(first) 250 | { 251 | if(first->IsInternalNode() && first->m_count > 1) 252 | { 253 | a_it.Push(first, 1); // Descend sibling branch later 254 | } 255 | else if(first->IsLeaf()) 256 | { 257 | if(first->m_count) 258 | { 259 | a_it.Push(first, 0); 260 | } 261 | break; 262 | } 263 | first = first->m_branch[0].m_child; 264 | } 265 | } 266 | 267 | /// Get Next for iteration 268 | void GetNext(Iterator& a_it) { ++a_it; } 269 | 270 | /// Is iterator NULL, or at end? 271 | bool IsNull(Iterator& a_it) { return a_it.IsNull(); } 272 | 273 | /// Get object at iterator position 274 | DATATYPE& GetAt(Iterator& a_it) { return *a_it; } 275 | 276 | protected: 277 | 278 | /// Minimal bounding rectangle (n-dimensional) 279 | struct Rect 280 | { 281 | ELEMTYPE m_min[NUMDIMS]; ///< Min dimensions of bounding box 282 | ELEMTYPE m_max[NUMDIMS]; ///< Max dimensions of bounding box 283 | }; 284 | 285 | /// May be data or may be another subtree 286 | /// The parents level determines this. 287 | /// If the parents level is 0, then this is data 288 | struct Branch 289 | { 290 | Rect m_rect; ///< Bounds 291 | Node* m_child; ///< Child node 292 | DATATYPE m_data; ///< Data Id 293 | }; 294 | 295 | /// Node for each branch level 296 | struct Node 297 | { 298 | bool IsInternalNode() { return (m_level > 0); } // Not a leaf, but a internal node 299 | bool IsLeaf() { return (m_level == 0); } // A leaf, contains data 300 | 301 | int m_count; ///< Count 302 | int m_level; ///< Leaf is zero, others positive 303 | Branch m_branch[MAXNODES]; ///< Branch 304 | }; 305 | 306 | /// A link list of nodes for reinsertion after a delete operation 307 | struct ListNode 308 | { 309 | ListNode* m_next; ///< Next in list 310 | Node* m_node; ///< Node 311 | }; 312 | 313 | /// Variables for finding a split partition 314 | struct PartitionVars 315 | { 316 | enum { NOT_TAKEN = -1 }; // indicates that position 317 | 318 | int m_partition[MAXNODES+1]; 319 | int m_total; 320 | int m_minFill; 321 | int m_count[2]; 322 | Rect m_cover[2]; 323 | ELEMTYPEREAL m_area[2]; 324 | 325 | Branch m_branchBuf[MAXNODES+1]; 326 | int m_branchCount; 327 | Rect m_coverSplit; 328 | ELEMTYPEREAL m_coverSplitArea; 329 | }; 330 | 331 | Node* AllocNode(); 332 | void FreeNode(Node* a_node); 333 | void InitNode(Node* a_node); 334 | void InitRect(Rect* a_rect); 335 | bool InsertRectRec(const Branch& a_branch, Node* a_node, Node** a_newNode, int a_level); 336 | bool InsertRect(const Branch& a_branch, Node** a_root, int a_level); 337 | Rect NodeCover(Node* a_node); 338 | bool AddBranch(const Branch* a_branch, Node* a_node, Node** a_newNode); 339 | void DisconnectBranch(Node* a_node, int a_index); 340 | int PickBranch(const Rect* a_rect, Node* a_node); 341 | Rect CombineRect(const Rect* a_rectA, const Rect* a_rectB); 342 | void SplitNode(Node* 
a_node, const Branch* a_branch, Node** a_newNode); 343 | ELEMTYPEREAL RectSphericalVolume(Rect* a_rect); 344 | ELEMTYPEREAL RectVolume(Rect* a_rect); 345 | ELEMTYPEREAL CalcRectVolume(Rect* a_rect); 346 | void GetBranches(Node* a_node, const Branch* a_branch, PartitionVars* a_parVars); 347 | void ChoosePartition(PartitionVars* a_parVars, int a_minFill); 348 | void LoadNodes(Node* a_nodeA, Node* a_nodeB, PartitionVars* a_parVars); 349 | void InitParVars(PartitionVars* a_parVars, int a_maxRects, int a_minFill); 350 | void PickSeeds(PartitionVars* a_parVars); 351 | void Classify(int a_index, int a_group, PartitionVars* a_parVars); 352 | bool RemoveRect(Rect* a_rect, const DATATYPE& a_id, Node** a_root); 353 | bool RemoveRectRec(Rect* a_rect, const DATATYPE& a_id, Node* a_node, ListNode** a_listNode); 354 | ListNode* AllocListNode(); 355 | void FreeListNode(ListNode* a_listNode); 356 | bool Overlap(Rect* a_rectA, Rect* a_rectB); 357 | void ReInsert(Node* a_node, ListNode** a_listNode); 358 | bool Search(Node* a_node, Rect* a_rect, int& a_foundCount, t_resultCallback a_resultCallback, void* a_context); 359 | void RemoveAllRec(Node* a_node); 360 | void Reset(); 361 | void CountRec(Node* a_node, int& a_count); 362 | 363 | bool SaveRec(Node* a_node, RTFileStream& a_stream); 364 | bool LoadRec(Node* a_node, RTFileStream& a_stream); 365 | 366 | Node* m_root; ///< Root of tree 367 | ELEMTYPEREAL m_unitSphereVolume; ///< Unit sphere constant for required number of dimensions 368 | }; 369 | 370 | 371 | // Because there is not stream support, this is a quick and dirty file I/O helper. 372 | // Users will likely replace its usage with a Stream implementation from their favorite API. 373 | class RTFileStream 374 | { 375 | FILE* m_file; 376 | 377 | public: 378 | 379 | 380 | RTFileStream() 381 | { 382 | m_file = NULL; 383 | } 384 | 385 | ~RTFileStream() 386 | { 387 | Close(); 388 | } 389 | 390 | bool OpenRead(const char* a_fileName) 391 | { 392 | m_file = fopen(a_fileName, "rb"); 393 | if(!m_file) 394 | { 395 | return false; 396 | } 397 | return true; 398 | } 399 | 400 | bool OpenWrite(const char* a_fileName) 401 | { 402 | m_file = fopen(a_fileName, "wb"); 403 | if(!m_file) 404 | { 405 | return false; 406 | } 407 | return true; 408 | } 409 | 410 | void Close() 411 | { 412 | if(m_file) 413 | { 414 | fclose(m_file); 415 | m_file = NULL; 416 | } 417 | } 418 | 419 | template< typename TYPE > 420 | size_t Write(const TYPE& a_value) 421 | { 422 | ASSERT(m_file); 423 | return fwrite((void*)&a_value, sizeof(a_value), 1, m_file); 424 | } 425 | 426 | template< typename TYPE > 427 | size_t WriteArray(const TYPE* a_array, int a_count) 428 | { 429 | ASSERT(m_file); 430 | return fwrite((void*)a_array, sizeof(TYPE) * a_count, 1, m_file); 431 | } 432 | 433 | template< typename TYPE > 434 | size_t Read(TYPE& a_value) 435 | { 436 | ASSERT(m_file); 437 | return fread((void*)&a_value, sizeof(a_value), 1, m_file); 438 | } 439 | 440 | template< typename TYPE > 441 | size_t ReadArray(TYPE* a_array, int a_count) 442 | { 443 | ASSERT(m_file); 444 | return fread((void*)a_array, sizeof(TYPE) * a_count, 1, m_file); 445 | } 446 | }; 447 | 448 | 449 | RTREE_TEMPLATE 450 | RTREE_QUAL::RTree() 451 | { 452 | ASSERT(MAXNODES > MINNODES); 453 | ASSERT(MINNODES > 0); 454 | 455 | // Precomputed volumes of the unit spheres for the first few dimensions 456 | const float UNIT_SPHERE_VOLUMES[] = { 457 | 0.000000f, 2.000000f, 3.141593f, // Dimension 0,1,2 458 | 4.188790f, 4.934802f, 5.263789f, // Dimension 3,4,5 459 | 5.167713f, 4.724766f, 
4.058712f, // Dimension 6,7,8 460 | 3.298509f, 2.550164f, 1.884104f, // Dimension 9,10,11 461 | 1.335263f, 0.910629f, 0.599265f, // Dimension 12,13,14 462 | 0.381443f, 0.235331f, 0.140981f, // Dimension 15,16,17 463 | 0.082146f, 0.046622f, 0.025807f, // Dimension 18,19,20 464 | }; 465 | 466 | m_root = AllocNode(); 467 | m_root->m_level = 0; 468 | m_unitSphereVolume = (ELEMTYPEREAL)UNIT_SPHERE_VOLUMES[NUMDIMS]; 469 | } 470 | 471 | 472 | RTREE_TEMPLATE 473 | RTREE_QUAL::~RTree() 474 | { 475 | Reset(); // Free, or reset node memory 476 | } 477 | 478 | 479 | RTREE_TEMPLATE 480 | void RTREE_QUAL::Insert(const ELEMTYPE a_min[NUMDIMS], const ELEMTYPE a_max[NUMDIMS], const DATATYPE& a_dataId) 481 | { 482 | #ifdef _DEBUG 483 | for(int index=0; indexIsInternalNode()) // not a leaf node 567 | { 568 | for(int index = 0; index < a_node->m_count; ++index) 569 | { 570 | CountRec(a_node->m_branch[index].m_child, a_count); 571 | } 572 | } 573 | else // A leaf node 574 | { 575 | a_count += a_node->m_count; 576 | } 577 | } 578 | 579 | 580 | RTREE_TEMPLATE 581 | bool RTREE_QUAL::Load(const char* a_fileName) 582 | { 583 | RemoveAll(); // Clear existing tree 584 | 585 | RTFileStream stream; 586 | if(!stream.OpenRead(a_fileName)) 587 | { 588 | return false; 589 | } 590 | 591 | bool result = Load(stream); 592 | 593 | stream.Close(); 594 | 595 | return result; 596 | } 597 | 598 | 599 | 600 | RTREE_TEMPLATE 601 | bool RTREE_QUAL::Load(RTFileStream& a_stream) 602 | { 603 | // Write some kind of header 604 | int _dataFileId = ('R'<<0)|('T'<<8)|('R'<<16)|('E'<<24); 605 | int _dataSize = sizeof(DATATYPE); 606 | int _dataNumDims = NUMDIMS; 607 | int _dataElemSize = sizeof(ELEMTYPE); 608 | int _dataElemRealSize = sizeof(ELEMTYPEREAL); 609 | int _dataMaxNodes = TMAXNODES; 610 | int _dataMinNodes = TMINNODES; 611 | 612 | int dataFileId = 0; 613 | int dataSize = 0; 614 | int dataNumDims = 0; 615 | int dataElemSize = 0; 616 | int dataElemRealSize = 0; 617 | int dataMaxNodes = 0; 618 | int dataMinNodes = 0; 619 | 620 | a_stream.Read(dataFileId); 621 | a_stream.Read(dataSize); 622 | a_stream.Read(dataNumDims); 623 | a_stream.Read(dataElemSize); 624 | a_stream.Read(dataElemRealSize); 625 | a_stream.Read(dataMaxNodes); 626 | a_stream.Read(dataMinNodes); 627 | 628 | bool result = false; 629 | 630 | // Test if header was valid and compatible 631 | if( (dataFileId == _dataFileId) 632 | && (dataSize == _dataSize) 633 | && (dataNumDims == _dataNumDims) 634 | && (dataElemSize == _dataElemSize) 635 | && (dataElemRealSize == _dataElemRealSize) 636 | && (dataMaxNodes == _dataMaxNodes) 637 | && (dataMinNodes == _dataMinNodes) 638 | ) 639 | { 640 | // Recursively load tree 641 | result = LoadRec(m_root, a_stream); 642 | } 643 | 644 | return result; 645 | } 646 | 647 | 648 | RTREE_TEMPLATE 649 | bool RTREE_QUAL::LoadRec(Node* a_node, RTFileStream& a_stream) 650 | { 651 | a_stream.Read(a_node->m_level); 652 | a_stream.Read(a_node->m_count); 653 | 654 | if(a_node->IsInternalNode()) // not a leaf node 655 | { 656 | for(int index = 0; index < a_node->m_count; ++index) 657 | { 658 | Branch* curBranch = &a_node->m_branch[index]; 659 | 660 | a_stream.ReadArray(curBranch->m_rect.m_min, NUMDIMS); 661 | a_stream.ReadArray(curBranch->m_rect.m_max, NUMDIMS); 662 | 663 | curBranch->m_child = AllocNode(); 664 | LoadRec(curBranch->m_child, a_stream); 665 | } 666 | } 667 | else // A leaf node 668 | { 669 | for(int index = 0; index < a_node->m_count; ++index) 670 | { 671 | Branch* curBranch = &a_node->m_branch[index]; 672 | 673 | 
a_stream.ReadArray(curBranch->m_rect.m_min, NUMDIMS); 674 | a_stream.ReadArray(curBranch->m_rect.m_max, NUMDIMS); 675 | 676 | a_stream.Read(curBranch->m_data); 677 | } 678 | } 679 | 680 | return true; // Should do more error checking on I/O operations 681 | } 682 | 683 | 684 | RTREE_TEMPLATE 685 | bool RTREE_QUAL::Save(const char* a_fileName) 686 | { 687 | RTFileStream stream; 688 | if(!stream.OpenWrite(a_fileName)) 689 | { 690 | return false; 691 | } 692 | 693 | bool result = Save(stream); 694 | 695 | stream.Close(); 696 | 697 | return result; 698 | } 699 | 700 | 701 | RTREE_TEMPLATE 702 | bool RTREE_QUAL::Save(RTFileStream& a_stream) 703 | { 704 | // Write some kind of header 705 | int dataFileId = ('R'<<0)|('T'<<8)|('R'<<16)|('E'<<24); 706 | int dataSize = sizeof(DATATYPE); 707 | int dataNumDims = NUMDIMS; 708 | int dataElemSize = sizeof(ELEMTYPE); 709 | int dataElemRealSize = sizeof(ELEMTYPEREAL); 710 | int dataMaxNodes = TMAXNODES; 711 | int dataMinNodes = TMINNODES; 712 | 713 | a_stream.Write(dataFileId); 714 | a_stream.Write(dataSize); 715 | a_stream.Write(dataNumDims); 716 | a_stream.Write(dataElemSize); 717 | a_stream.Write(dataElemRealSize); 718 | a_stream.Write(dataMaxNodes); 719 | a_stream.Write(dataMinNodes); 720 | 721 | // Recursively save tree 722 | bool result = SaveRec(m_root, a_stream); 723 | 724 | return result; 725 | } 726 | 727 | 728 | RTREE_TEMPLATE 729 | bool RTREE_QUAL::SaveRec(Node* a_node, RTFileStream& a_stream) 730 | { 731 | a_stream.Write(a_node->m_level); 732 | a_stream.Write(a_node->m_count); 733 | 734 | if(a_node->IsInternalNode()) // not a leaf node 735 | { 736 | for(int index = 0; index < a_node->m_count; ++index) 737 | { 738 | Branch* curBranch = &a_node->m_branch[index]; 739 | 740 | a_stream.WriteArray(curBranch->m_rect.m_min, NUMDIMS); 741 | a_stream.WriteArray(curBranch->m_rect.m_max, NUMDIMS); 742 | 743 | SaveRec(curBranch->m_child, a_stream); 744 | } 745 | } 746 | else // A leaf node 747 | { 748 | for(int index = 0; index < a_node->m_count; ++index) 749 | { 750 | Branch* curBranch = &a_node->m_branch[index]; 751 | 752 | a_stream.WriteArray(curBranch->m_rect.m_min, NUMDIMS); 753 | a_stream.WriteArray(curBranch->m_rect.m_max, NUMDIMS); 754 | 755 | a_stream.Write(curBranch->m_data); 756 | } 757 | } 758 | 759 | return true; // Should do more error checking on I/O operations 760 | } 761 | 762 | 763 | RTREE_TEMPLATE 764 | void RTREE_QUAL::RemoveAll() 765 | { 766 | // Delete all existing nodes 767 | Reset(); 768 | 769 | m_root = AllocNode(); 770 | m_root->m_level = 0; 771 | } 772 | 773 | 774 | RTREE_TEMPLATE 775 | void RTREE_QUAL::Reset() 776 | { 777 | #ifdef RTREE_DONT_USE_MEMPOOLS 778 | // Delete all existing nodes 779 | RemoveAllRec(m_root); 780 | #else // RTREE_DONT_USE_MEMPOOLS 781 | // Just reset memory pools. 
We are not using complex types 782 | // EXAMPLE 783 | #endif // RTREE_DONT_USE_MEMPOOLS 784 | } 785 | 786 | 787 | RTREE_TEMPLATE 788 | void RTREE_QUAL::RemoveAllRec(Node* a_node) 789 | { 790 | ASSERT(a_node); 791 | ASSERT(a_node->m_level >= 0); 792 | 793 | if(a_node->IsInternalNode()) // This is an internal node in the tree 794 | { 795 | for(int index=0; index < a_node->m_count; ++index) 796 | { 797 | RemoveAllRec(a_node->m_branch[index].m_child); 798 | } 799 | } 800 | FreeNode(a_node); 801 | } 802 | 803 | 804 | RTREE_TEMPLATE 805 | typename RTREE_QUAL::Node* RTREE_QUAL::AllocNode() 806 | { 807 | Node* newNode; 808 | #ifdef RTREE_DONT_USE_MEMPOOLS 809 | newNode = new Node; 810 | #else // RTREE_DONT_USE_MEMPOOLS 811 | // EXAMPLE 812 | #endif // RTREE_DONT_USE_MEMPOOLS 813 | InitNode(newNode); 814 | return newNode; 815 | } 816 | 817 | 818 | RTREE_TEMPLATE 819 | void RTREE_QUAL::FreeNode(Node* a_node) 820 | { 821 | ASSERT(a_node); 822 | 823 | #ifdef RTREE_DONT_USE_MEMPOOLS 824 | delete a_node; 825 | #else // RTREE_DONT_USE_MEMPOOLS 826 | // EXAMPLE 827 | #endif // RTREE_DONT_USE_MEMPOOLS 828 | } 829 | 830 | 831 | // Allocate space for a node in the list used in DeletRect to 832 | // store Nodes that are too empty. 833 | RTREE_TEMPLATE 834 | typename RTREE_QUAL::ListNode* RTREE_QUAL::AllocListNode() 835 | { 836 | #ifdef RTREE_DONT_USE_MEMPOOLS 837 | return new ListNode; 838 | #else // RTREE_DONT_USE_MEMPOOLS 839 | // EXAMPLE 840 | #endif // RTREE_DONT_USE_MEMPOOLS 841 | } 842 | 843 | 844 | RTREE_TEMPLATE 845 | void RTREE_QUAL::FreeListNode(ListNode* a_listNode) 846 | { 847 | #ifdef RTREE_DONT_USE_MEMPOOLS 848 | delete a_listNode; 849 | #else // RTREE_DONT_USE_MEMPOOLS 850 | // EXAMPLE 851 | #endif // RTREE_DONT_USE_MEMPOOLS 852 | } 853 | 854 | 855 | RTREE_TEMPLATE 856 | void RTREE_QUAL::InitNode(Node* a_node) 857 | { 858 | a_node->m_count = 0; 859 | a_node->m_level = -1; 860 | } 861 | 862 | 863 | RTREE_TEMPLATE 864 | void RTREE_QUAL::InitRect(Rect* a_rect) 865 | { 866 | for(int index = 0; index < NUMDIMS; ++index) 867 | { 868 | a_rect->m_min[index] = (ELEMTYPE)0; 869 | a_rect->m_max[index] = (ELEMTYPE)0; 870 | } 871 | } 872 | 873 | 874 | // Inserts a new data rectangle into the index structure. 875 | // Recursively descends tree, propagates splits back up. 876 | // Returns 0 if node was not split. Old node updated. 877 | // If node was split, returns 1 and sets the pointer pointed to by 878 | // new_node to point to the new node. Old node updated to become one of two. 879 | // The level argument specifies the number of steps up from the leaf 880 | // level to insert; e.g. a data rectangle goes in at level = 0. 881 | RTREE_TEMPLATE 882 | bool RTREE_QUAL::InsertRectRec(const Branch& a_branch, Node* a_node, Node** a_newNode, int a_level) 883 | { 884 | ASSERT(a_node && a_newNode); 885 | ASSERT(a_level >= 0 && a_level <= a_node->m_level); 886 | 887 | // recurse until we reach the correct level for the new record. data records 888 | // will always be called with a_level == 0 (leaf) 889 | if(a_node->m_level > a_level) 890 | { 891 | // Still above level for insertion, go down tree recursively 892 | Node* otherNode; 893 | 894 | // find the optimal branch for this record 895 | int index = PickBranch(&a_branch.m_rect, a_node); 896 | 897 | // recursively insert this record into the picked branch 898 | bool childWasSplit = InsertRectRec(a_branch, a_node->m_branch[index].m_child, &otherNode, a_level); 899 | 900 | if (!childWasSplit) 901 | { 902 | // Child was not split. 
Merge the bounding box of the new record with the 903 | // existing bounding box 904 | a_node->m_branch[index].m_rect = CombineRect(&a_branch.m_rect, &(a_node->m_branch[index].m_rect)); 905 | return false; 906 | } 907 | else 908 | { 909 | // Child was split. The old branches are now re-partitioned to two nodes 910 | // so we have to re-calculate the bounding boxes of each node 911 | a_node->m_branch[index].m_rect = NodeCover(a_node->m_branch[index].m_child); 912 | Branch branch; 913 | branch.m_child = otherNode; 914 | branch.m_rect = NodeCover(otherNode); 915 | 916 | // The old node is already a child of a_node. Now add the newly-created 917 | // node to a_node as well. a_node might be split because of that. 918 | return AddBranch(&branch, a_node, a_newNode); 919 | } 920 | } 921 | else if(a_node->m_level == a_level) 922 | { 923 | // We have reached level for insertion. Add rect, split if necessary 924 | return AddBranch(&a_branch, a_node, a_newNode); 925 | } 926 | else 927 | { 928 | // Should never occur 929 | ASSERT(0); 930 | return false; 931 | } 932 | } 933 | 934 | 935 | // Insert a data rectangle into an index structure. 936 | // InsertRect provides for splitting the root; 937 | // returns 1 if root was split, 0 if it was not. 938 | // The level argument specifies the number of steps up from the leaf 939 | // level to insert; e.g. a data rectangle goes in at level = 0. 940 | // InsertRect2 does the recursion. 941 | // 942 | RTREE_TEMPLATE 943 | bool RTREE_QUAL::InsertRect(const Branch& a_branch, Node** a_root, int a_level) 944 | { 945 | ASSERT(a_root); 946 | ASSERT(a_level >= 0 && a_level <= (*a_root)->m_level); 947 | #ifdef _DEBUG 948 | for(int index=0; index < NUMDIMS; ++index) 949 | { 950 | ASSERT(a_branch.m_rect.m_min[index] <= a_branch.m_rect.m_max[index]); 951 | } 952 | #endif //_DEBUG 953 | 954 | Node* newNode; 955 | 956 | if(InsertRectRec(a_branch, *a_root, &newNode, a_level)) // Root split 957 | { 958 | // Grow tree taller and new root 959 | Node* newRoot = AllocNode(); 960 | newRoot->m_level = (*a_root)->m_level + 1; 961 | 962 | Branch branch; 963 | 964 | // add old root node as a child of the new root 965 | branch.m_rect = NodeCover(*a_root); 966 | branch.m_child = *a_root; 967 | AddBranch(&branch, newRoot, NULL); 968 | 969 | // add the split node as a child of the new root 970 | branch.m_rect = NodeCover(newNode); 971 | branch.m_child = newNode; 972 | AddBranch(&branch, newRoot, NULL); 973 | 974 | // set the new root as the root node 975 | *a_root = newRoot; 976 | 977 | return true; 978 | } 979 | 980 | return false; 981 | } 982 | 983 | 984 | // Find the smallest rectangle that includes all rectangles in branches of a node. 985 | RTREE_TEMPLATE 986 | typename RTREE_QUAL::Rect RTREE_QUAL::NodeCover(Node* a_node) 987 | { 988 | ASSERT(a_node); 989 | 990 | Rect rect = a_node->m_branch[0].m_rect; 991 | for(int index = 1; index < a_node->m_count; ++index) 992 | { 993 | rect = CombineRect(&rect, &(a_node->m_branch[index].m_rect)); 994 | } 995 | 996 | return rect; 997 | } 998 | 999 | 1000 | // Add a branch to a node. Split the node if necessary. 1001 | // Returns 0 if node not split. Old node updated. 1002 | // Returns 1 if node split, sets *new_node to address of new node. 1003 | // Old node updated, becomes one of two. 
1004 | RTREE_TEMPLATE 1005 | bool RTREE_QUAL::AddBranch(const Branch* a_branch, Node* a_node, Node** a_newNode) 1006 | { 1007 | ASSERT(a_branch); 1008 | ASSERT(a_node); 1009 | 1010 | if(a_node->m_count < MAXNODES) // Split won't be necessary 1011 | { 1012 | a_node->m_branch[a_node->m_count] = *a_branch; 1013 | ++a_node->m_count; 1014 | 1015 | return false; 1016 | } 1017 | else 1018 | { 1019 | ASSERT(a_newNode); 1020 | 1021 | SplitNode(a_node, a_branch, a_newNode); 1022 | return true; 1023 | } 1024 | } 1025 | 1026 | 1027 | // Disconnect a dependent node. 1028 | // Caller must return (or stop using iteration index) after this as count has changed 1029 | RTREE_TEMPLATE 1030 | void RTREE_QUAL::DisconnectBranch(Node* a_node, int a_index) 1031 | { 1032 | ASSERT(a_node && (a_index >= 0) && (a_index < MAXNODES)); 1033 | ASSERT(a_node->m_count > 0); 1034 | 1035 | // Remove element by swapping with the last element to prevent gaps in array 1036 | a_node->m_branch[a_index] = a_node->m_branch[a_node->m_count - 1]; 1037 | 1038 | --a_node->m_count; 1039 | } 1040 | 1041 | 1042 | // Pick a branch. Pick the one that will need the smallest increase 1043 | // in area to accomodate the new rectangle. This will result in the 1044 | // least total area for the covering rectangles in the current node. 1045 | // In case of a tie, pick the one which was smaller before, to get 1046 | // the best resolution when searching. 1047 | RTREE_TEMPLATE 1048 | int RTREE_QUAL::PickBranch(const Rect* a_rect, Node* a_node) 1049 | { 1050 | ASSERT(a_rect && a_node); 1051 | 1052 | bool firstTime = true; 1053 | ELEMTYPEREAL increase; 1054 | ELEMTYPEREAL bestIncr = (ELEMTYPEREAL)-1; 1055 | ELEMTYPEREAL area; 1056 | ELEMTYPEREAL bestArea; 1057 | int best; 1058 | Rect tempRect; 1059 | 1060 | for(int index=0; index < a_node->m_count; ++index) 1061 | { 1062 | Rect* curRect = &a_node->m_branch[index].m_rect; 1063 | area = CalcRectVolume(curRect); 1064 | tempRect = CombineRect(a_rect, curRect); 1065 | increase = CalcRectVolume(&tempRect) - area; 1066 | if((increase < bestIncr) || firstTime) 1067 | { 1068 | best = index; 1069 | bestArea = area; 1070 | bestIncr = increase; 1071 | firstTime = false; 1072 | } 1073 | else if((increase == bestIncr) && (area < bestArea)) 1074 | { 1075 | best = index; 1076 | bestArea = area; 1077 | bestIncr = increase; 1078 | } 1079 | } 1080 | return best; 1081 | } 1082 | 1083 | 1084 | // Combine two rectangles into larger one containing both 1085 | RTREE_TEMPLATE 1086 | typename RTREE_QUAL::Rect RTREE_QUAL::CombineRect(const Rect* a_rectA, const Rect* a_rectB) 1087 | { 1088 | ASSERT(a_rectA && a_rectB); 1089 | 1090 | Rect newRect; 1091 | 1092 | for(int index = 0; index < NUMDIMS; ++index) 1093 | { 1094 | newRect.m_min[index] = Min(a_rectA->m_min[index], a_rectB->m_min[index]); 1095 | newRect.m_max[index] = Max(a_rectA->m_max[index], a_rectB->m_max[index]); 1096 | } 1097 | 1098 | return newRect; 1099 | } 1100 | 1101 | 1102 | 1103 | // Split a node. 1104 | // Divides the nodes branches and the extra one between two nodes. 1105 | // Old node is one of the new ones, and one really new one is created. 1106 | // Tries more than one method for choosing a partition, uses best result. 
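//
// An outline of the steps implemented below (sketch only):
//
//   GetBranches(a_node, a_branch, parVars);  // buffer all MAXNODES+1 branches
//   ChoosePartition(parVars, MINNODES);      // pick seeds, classify the rest
//   LoadNodes(a_node, *a_newNode, parVars);  // refill the old node and the new one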
1107 | RTREE_TEMPLATE 1108 | void RTREE_QUAL::SplitNode(Node* a_node, const Branch* a_branch, Node** a_newNode) 1109 | { 1110 | ASSERT(a_node); 1111 | ASSERT(a_branch); 1112 | 1113 | // Could just use local here, but member or external is faster since it is reused 1114 | PartitionVars localVars; 1115 | PartitionVars* parVars = &localVars; 1116 | 1117 | // Load all the branches into a buffer, initialize old node 1118 | GetBranches(a_node, a_branch, parVars); 1119 | 1120 | // Find partition 1121 | ChoosePartition(parVars, MINNODES); 1122 | 1123 | // Create a new node to hold (about) half of the branches 1124 | *a_newNode = AllocNode(); 1125 | (*a_newNode)->m_level = a_node->m_level; 1126 | 1127 | // Put branches from buffer into 2 nodes according to the chosen partition 1128 | a_node->m_count = 0; 1129 | LoadNodes(a_node, *a_newNode, parVars); 1130 | 1131 | ASSERT((a_node->m_count + (*a_newNode)->m_count) == parVars->m_total); 1132 | } 1133 | 1134 | 1135 | // Calculate the n-dimensional volume of a rectangle 1136 | RTREE_TEMPLATE 1137 | ELEMTYPEREAL RTREE_QUAL::RectVolume(Rect* a_rect) 1138 | { 1139 | ASSERT(a_rect); 1140 | 1141 | ELEMTYPEREAL volume = (ELEMTYPEREAL)1; 1142 | 1143 | for(int index=0; indexm_max[index] - a_rect->m_min[index]; 1146 | } 1147 | 1148 | ASSERT(volume >= (ELEMTYPEREAL)0); 1149 | 1150 | return volume; 1151 | } 1152 | 1153 | 1154 | // The exact volume of the bounding sphere for the given Rect 1155 | RTREE_TEMPLATE 1156 | ELEMTYPEREAL RTREE_QUAL::RectSphericalVolume(Rect* a_rect) 1157 | { 1158 | ASSERT(a_rect); 1159 | 1160 | ELEMTYPEREAL sumOfSquares = (ELEMTYPEREAL)0; 1161 | ELEMTYPEREAL radius; 1162 | 1163 | for(int index=0; index < NUMDIMS; ++index) 1164 | { 1165 | ELEMTYPEREAL halfExtent = ((ELEMTYPEREAL)a_rect->m_max[index] - (ELEMTYPEREAL)a_rect->m_min[index]) * 0.5f; 1166 | sumOfSquares += halfExtent * halfExtent; 1167 | } 1168 | 1169 | radius = (ELEMTYPEREAL)sqrt(sumOfSquares); 1170 | 1171 | // Pow maybe slow, so test for common dims like 2,3 and just use x*x, x*x*x. 1172 | if(NUMDIMS == 3) 1173 | { 1174 | return (radius * radius * radius * m_unitSphereVolume); 1175 | } 1176 | else if(NUMDIMS == 2) 1177 | { 1178 | return (radius * radius * m_unitSphereVolume); 1179 | } 1180 | else 1181 | { 1182 | return (ELEMTYPEREAL)(pow(radius, NUMDIMS) * m_unitSphereVolume); 1183 | } 1184 | } 1185 | 1186 | 1187 | // Use one of the methods to calculate retangle volume 1188 | RTREE_TEMPLATE 1189 | ELEMTYPEREAL RTREE_QUAL::CalcRectVolume(Rect* a_rect) 1190 | { 1191 | #ifdef RTREE_USE_SPHERICAL_VOLUME 1192 | return RectSphericalVolume(a_rect); // Slower but helps certain merge cases 1193 | #else // RTREE_USE_SPHERICAL_VOLUME 1194 | return RectVolume(a_rect); // Faster but can cause poor merges 1195 | #endif // RTREE_USE_SPHERICAL_VOLUME 1196 | } 1197 | 1198 | 1199 | // Load branch buffer with branches from full node plus the extra branch. 
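// The node being split is expected to be full (ASSERTed below as
// m_count == MAXNODES), so the buffer ends up holding MAXNODES + 1 branches;
// m_coverSplit and m_coverSplitArea cache their combined rectangle and its
// volume for the partitioning step.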
1200 | RTREE_TEMPLATE 1201 | void RTREE_QUAL::GetBranches(Node* a_node, const Branch* a_branch, PartitionVars* a_parVars) 1202 | { 1203 | ASSERT(a_node); 1204 | ASSERT(a_branch); 1205 | 1206 | ASSERT(a_node->m_count == MAXNODES); 1207 | 1208 | // Load the branch buffer 1209 | for(int index=0; index < MAXNODES; ++index) 1210 | { 1211 | a_parVars->m_branchBuf[index] = a_node->m_branch[index]; 1212 | } 1213 | a_parVars->m_branchBuf[MAXNODES] = *a_branch; 1214 | a_parVars->m_branchCount = MAXNODES + 1; 1215 | 1216 | // Calculate rect containing all in the set 1217 | a_parVars->m_coverSplit = a_parVars->m_branchBuf[0].m_rect; 1218 | for(int index=1; index < MAXNODES+1; ++index) 1219 | { 1220 | a_parVars->m_coverSplit = CombineRect(&a_parVars->m_coverSplit, &a_parVars->m_branchBuf[index].m_rect); 1221 | } 1222 | a_parVars->m_coverSplitArea = CalcRectVolume(&a_parVars->m_coverSplit); 1223 | } 1224 | 1225 | 1226 | // Method #0 for choosing a partition: 1227 | // As the seeds for the two groups, pick the two rects that would waste the 1228 | // most area if covered by a single rectangle, i.e. evidently the worst pair 1229 | // to have in the same group. 1230 | // Of the remaining, one at a time is chosen to be put in one of the two groups. 1231 | // The one chosen is the one with the greatest difference in area expansion 1232 | // depending on which group - the rect most strongly attracted to one group 1233 | // and repelled from the other. 1234 | // If one group gets too full (more would force other group to violate min 1235 | // fill requirement) then other group gets the rest. 1236 | // These last are the ones that can go in either group most easily. 1237 | RTREE_TEMPLATE 1238 | void RTREE_QUAL::ChoosePartition(PartitionVars* a_parVars, int a_minFill) 1239 | { 1240 | ASSERT(a_parVars); 1241 | 1242 | ELEMTYPEREAL biggestDiff; 1243 | int group, chosen, betterGroup; 1244 | 1245 | InitParVars(a_parVars, a_parVars->m_branchCount, a_minFill); 1246 | PickSeeds(a_parVars); 1247 | 1248 | while (((a_parVars->m_count[0] + a_parVars->m_count[1]) < a_parVars->m_total) 1249 | && (a_parVars->m_count[0] < (a_parVars->m_total - a_parVars->m_minFill)) 1250 | && (a_parVars->m_count[1] < (a_parVars->m_total - a_parVars->m_minFill))) 1251 | { 1252 | biggestDiff = (ELEMTYPEREAL) -1; 1253 | for(int index=0; indexm_total; ++index) 1254 | { 1255 | if(PartitionVars::NOT_TAKEN == a_parVars->m_partition[index]) 1256 | { 1257 | Rect* curRect = &a_parVars->m_branchBuf[index].m_rect; 1258 | Rect rect0 = CombineRect(curRect, &a_parVars->m_cover[0]); 1259 | Rect rect1 = CombineRect(curRect, &a_parVars->m_cover[1]); 1260 | ELEMTYPEREAL growth0 = CalcRectVolume(&rect0) - a_parVars->m_area[0]; 1261 | ELEMTYPEREAL growth1 = CalcRectVolume(&rect1) - a_parVars->m_area[1]; 1262 | ELEMTYPEREAL diff = growth1 - growth0; 1263 | if(diff >= 0) 1264 | { 1265 | group = 0; 1266 | } 1267 | else 1268 | { 1269 | group = 1; 1270 | diff = -diff; 1271 | } 1272 | 1273 | if(diff > biggestDiff) 1274 | { 1275 | biggestDiff = diff; 1276 | chosen = index; 1277 | betterGroup = group; 1278 | } 1279 | else if((diff == biggestDiff) && (a_parVars->m_count[group] < a_parVars->m_count[betterGroup])) 1280 | { 1281 | chosen = index; 1282 | betterGroup = group; 1283 | } 1284 | } 1285 | } 1286 | Classify(chosen, betterGroup, a_parVars); 1287 | } 1288 | 1289 | // If one group too full, put remaining rects in the other 1290 | if((a_parVars->m_count[0] + a_parVars->m_count[1]) < a_parVars->m_total) 1291 | { 1292 | if(a_parVars->m_count[0] >= a_parVars->m_total - 
a_parVars->m_minFill) 1293 | { 1294 | group = 1; 1295 | } 1296 | else 1297 | { 1298 | group = 0; 1299 | } 1300 | for(int index=0; indexm_total; ++index) 1301 | { 1302 | if(PartitionVars::NOT_TAKEN == a_parVars->m_partition[index]) 1303 | { 1304 | Classify(index, group, a_parVars); 1305 | } 1306 | } 1307 | } 1308 | 1309 | ASSERT((a_parVars->m_count[0] + a_parVars->m_count[1]) == a_parVars->m_total); 1310 | ASSERT((a_parVars->m_count[0] >= a_parVars->m_minFill) && 1311 | (a_parVars->m_count[1] >= a_parVars->m_minFill)); 1312 | } 1313 | 1314 | 1315 | // Copy branches from the buffer into two nodes according to the partition. 1316 | RTREE_TEMPLATE 1317 | void RTREE_QUAL::LoadNodes(Node* a_nodeA, Node* a_nodeB, PartitionVars* a_parVars) 1318 | { 1319 | ASSERT(a_nodeA); 1320 | ASSERT(a_nodeB); 1321 | ASSERT(a_parVars); 1322 | 1323 | for(int index=0; index < a_parVars->m_total; ++index) 1324 | { 1325 | ASSERT(a_parVars->m_partition[index] == 0 || a_parVars->m_partition[index] == 1); 1326 | 1327 | int targetNodeIndex = a_parVars->m_partition[index]; 1328 | Node* targetNodes[] = {a_nodeA, a_nodeB}; 1329 | 1330 | // It is assured that AddBranch here will not cause a node split. 1331 | bool nodeWasSplit = AddBranch(&a_parVars->m_branchBuf[index], targetNodes[targetNodeIndex], NULL); 1332 | ASSERT(!nodeWasSplit); 1333 | } 1334 | } 1335 | 1336 | 1337 | // Initialize a PartitionVars structure. 1338 | RTREE_TEMPLATE 1339 | void RTREE_QUAL::InitParVars(PartitionVars* a_parVars, int a_maxRects, int a_minFill) 1340 | { 1341 | ASSERT(a_parVars); 1342 | 1343 | a_parVars->m_count[0] = a_parVars->m_count[1] = 0; 1344 | a_parVars->m_area[0] = a_parVars->m_area[1] = (ELEMTYPEREAL)0; 1345 | a_parVars->m_total = a_maxRects; 1346 | a_parVars->m_minFill = a_minFill; 1347 | for(int index=0; index < a_maxRects; ++index) 1348 | { 1349 | a_parVars->m_partition[index] = PartitionVars::NOT_TAKEN; 1350 | } 1351 | } 1352 | 1353 | 1354 | RTREE_TEMPLATE 1355 | void RTREE_QUAL::PickSeeds(PartitionVars* a_parVars) 1356 | { 1357 | int seed0, seed1; 1358 | ELEMTYPEREAL worst, waste; 1359 | ELEMTYPEREAL area[MAXNODES+1]; 1360 | 1361 | for(int index=0; indexm_total; ++index) 1362 | { 1363 | area[index] = CalcRectVolume(&a_parVars->m_branchBuf[index].m_rect); 1364 | } 1365 | 1366 | worst = -a_parVars->m_coverSplitArea - 1; 1367 | for(int indexA=0; indexA < a_parVars->m_total-1; ++indexA) 1368 | { 1369 | for(int indexB = indexA+1; indexB < a_parVars->m_total; ++indexB) 1370 | { 1371 | Rect oneRect = CombineRect(&a_parVars->m_branchBuf[indexA].m_rect, &a_parVars->m_branchBuf[indexB].m_rect); 1372 | waste = CalcRectVolume(&oneRect) - area[indexA] - area[indexB]; 1373 | if(waste > worst) 1374 | { 1375 | worst = waste; 1376 | seed0 = indexA; 1377 | seed1 = indexB; 1378 | } 1379 | } 1380 | } 1381 | 1382 | Classify(seed0, 0, a_parVars); 1383 | Classify(seed1, 1, a_parVars); 1384 | } 1385 | 1386 | 1387 | // Put a branch in one of the groups. 
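// Marks branch a_index as taken by a_group, then updates that group's running
// cover rectangle, its volume and its branch count. For example, PickSeeds
// above starts the two groups with:
//
//   Classify(seed0, 0, a_parVars);
//   Classify(seed1, 1, a_parVars);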
1388 | RTREE_TEMPLATE 1389 | void RTREE_QUAL::Classify(int a_index, int a_group, PartitionVars* a_parVars) 1390 | { 1391 | ASSERT(a_parVars); 1392 | ASSERT(PartitionVars::NOT_TAKEN == a_parVars->m_partition[a_index]); 1393 | 1394 | a_parVars->m_partition[a_index] = a_group; 1395 | 1396 | // Calculate combined rect 1397 | if (a_parVars->m_count[a_group] == 0) 1398 | { 1399 | a_parVars->m_cover[a_group] = a_parVars->m_branchBuf[a_index].m_rect; 1400 | } 1401 | else 1402 | { 1403 | a_parVars->m_cover[a_group] = CombineRect(&a_parVars->m_branchBuf[a_index].m_rect, &a_parVars->m_cover[a_group]); 1404 | } 1405 | 1406 | // Calculate volume of combined rect 1407 | a_parVars->m_area[a_group] = CalcRectVolume(&a_parVars->m_cover[a_group]); 1408 | 1409 | ++a_parVars->m_count[a_group]; 1410 | } 1411 | 1412 | 1413 | // Delete a data rectangle from an index structure. 1414 | // Pass in a pointer to a Rect, the tid of the record, ptr to ptr to root node. 1415 | // Returns 1 if record not found, 0 if success. 1416 | // RemoveRect provides for eliminating the root. 1417 | RTREE_TEMPLATE 1418 | bool RTREE_QUAL::RemoveRect(Rect* a_rect, const DATATYPE& a_id, Node** a_root) 1419 | { 1420 | ASSERT(a_rect && a_root); 1421 | ASSERT(*a_root); 1422 | 1423 | ListNode* reInsertList = NULL; 1424 | 1425 | if(!RemoveRectRec(a_rect, a_id, *a_root, &reInsertList)) 1426 | { 1427 | // Found and deleted a data item 1428 | // Reinsert any branches from eliminated nodes 1429 | while(reInsertList) 1430 | { 1431 | Node* tempNode = reInsertList->m_node; 1432 | 1433 | for(int index = 0; index < tempNode->m_count; ++index) 1434 | { 1435 | // TODO go over this code. should I use (tempNode->m_level - 1)? 1436 | InsertRect(tempNode->m_branch[index], 1437 | a_root, 1438 | tempNode->m_level); 1439 | } 1440 | 1441 | ListNode* remLNode = reInsertList; 1442 | reInsertList = reInsertList->m_next; 1443 | 1444 | FreeNode(remLNode->m_node); 1445 | FreeListNode(remLNode); 1446 | } 1447 | 1448 | // Check for redundant root (not leaf, 1 child) and eliminate TODO replace 1449 | // if with while? In case there is a whole branch of redundant roots... 1450 | if((*a_root)->m_count == 1 && (*a_root)->IsInternalNode()) 1451 | { 1452 | Node* tempNode = (*a_root)->m_branch[0].m_child; 1453 | 1454 | ASSERT(tempNode); 1455 | FreeNode(*a_root); 1456 | *a_root = tempNode; 1457 | } 1458 | return false; 1459 | } 1460 | else 1461 | { 1462 | return true; 1463 | } 1464 | } 1465 | 1466 | 1467 | // Delete a rectangle from non-root part of an index structure. 1468 | // Called by RemoveRect. Descends tree recursively, 1469 | // merges branches on the way back up. 1470 | // Returns 1 if record not found, 0 if success. 
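// When a child ends up with fewer than MINNODES entries after a removal, it is
// not rebalanced in place: the whole child node is pushed onto the a_listNode
// reinsertion list (see ReInsert below) and disconnected from its parent, and
// RemoveRect above later reinserts its branches into the tree.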
1471 | RTREE_TEMPLATE 1472 | bool RTREE_QUAL::RemoveRectRec(Rect* a_rect, const DATATYPE& a_id, Node* a_node, ListNode** a_listNode) 1473 | { 1474 | ASSERT(a_rect && a_node && a_listNode); 1475 | ASSERT(a_node->m_level >= 0); 1476 | 1477 | if(a_node->IsInternalNode()) // not a leaf node 1478 | { 1479 | for(int index = 0; index < a_node->m_count; ++index) 1480 | { 1481 | if(Overlap(a_rect, &(a_node->m_branch[index].m_rect))) 1482 | { 1483 | if(!RemoveRectRec(a_rect, a_id, a_node->m_branch[index].m_child, a_listNode)) 1484 | { 1485 | if(a_node->m_branch[index].m_child->m_count >= MINNODES) 1486 | { 1487 | // child removed, just resize parent rect 1488 | a_node->m_branch[index].m_rect = NodeCover(a_node->m_branch[index].m_child); 1489 | } 1490 | else 1491 | { 1492 | // child removed, not enough entries in node, eliminate node 1493 | ReInsert(a_node->m_branch[index].m_child, a_listNode); 1494 | DisconnectBranch(a_node, index); // Must return after this call as count has changed 1495 | } 1496 | return false; 1497 | } 1498 | } 1499 | } 1500 | return true; 1501 | } 1502 | else // A leaf node 1503 | { 1504 | for(int index = 0; index < a_node->m_count; ++index) 1505 | { 1506 | if(a_node->m_branch[index].m_data == a_id) 1507 | { 1508 | DisconnectBranch(a_node, index); // Must return after this call as count has changed 1509 | return false; 1510 | } 1511 | } 1512 | return true; 1513 | } 1514 | } 1515 | 1516 | 1517 | // Decide whether two rectangles overlap. 1518 | RTREE_TEMPLATE 1519 | bool RTREE_QUAL::Overlap(Rect* a_rectA, Rect* a_rectB) 1520 | { 1521 | ASSERT(a_rectA && a_rectB); 1522 | 1523 | for(int index=0; index < NUMDIMS; ++index) 1524 | { 1525 | if (a_rectA->m_min[index] > a_rectB->m_max[index] || 1526 | a_rectB->m_min[index] > a_rectA->m_max[index]) 1527 | { 1528 | return false; 1529 | } 1530 | } 1531 | return true; 1532 | } 1533 | 1534 | 1535 | // Add a node to the reinsertion list. All its branches will later 1536 | // be reinserted into the index structure. 1537 | RTREE_TEMPLATE 1538 | void RTREE_QUAL::ReInsert(Node* a_node, ListNode** a_listNode) 1539 | { 1540 | ListNode* newListNode; 1541 | 1542 | newListNode = AllocListNode(); 1543 | newListNode->m_node = a_node; 1544 | newListNode->m_next = *a_listNode; 1545 | *a_listNode = newListNode; 1546 | } 1547 | 1548 | 1549 | // Search in an index tree or subtree for all data retangles that overlap the argument rectangle. 1550 | RTREE_TEMPLATE 1551 | bool RTREE_QUAL::Search(Node* a_node, Rect* a_rect, int& a_foundCount, t_resultCallback a_resultCallback, void* a_context) 1552 | { 1553 | ASSERT(a_node); 1554 | ASSERT(a_node->m_level >= 0); 1555 | ASSERT(a_rect); 1556 | 1557 | if(a_node->IsInternalNode()) 1558 | { 1559 | // This is an internal node in the tree 1560 | for(int index=0; index < a_node->m_count; ++index) 1561 | { 1562 | if(Overlap(a_rect, &a_node->m_branch[index].m_rect)) 1563 | { 1564 | if(!Search(a_node->m_branch[index].m_child, a_rect, a_foundCount, a_resultCallback, a_context)) 1565 | { 1566 | // The callback indicated to stop searching 1567 | return false; 1568 | } 1569 | } 1570 | } 1571 | } 1572 | else 1573 | { 1574 | // This is a leaf node 1575 | for(int index=0; index < a_node->m_count; ++index) 1576 | { 1577 | if(Overlap(a_rect, &a_node->m_branch[index].m_rect)) 1578 | { 1579 | DATATYPE& id = a_node->m_branch[index].m_data; 1580 | ++a_foundCount; 1581 | 1582 | // NOTE: There are different ways to return results. 
Here's where to modify 1583 | if(a_resultCallback) 1584 | { 1585 | if(!a_resultCallback(id, a_context)) 1586 | { 1587 | return false; // Don't continue searching 1588 | } 1589 | } 1590 | } 1591 | } 1592 | } 1593 | 1594 | return true; // Continue searching 1595 | } 1596 | 1597 | 1598 | #undef RTREE_TEMPLATE 1599 | #undef RTREE_QUAL 1600 | 1601 | #endif //RTREE_H 1602 | 1603 | -------------------------------------------------------------------------------- /filter/filter_planet_by_cats.cpp: -------------------------------------------------------------------------------- 1 | /* 2 | Filters a planet file by categories and location. 3 | 4 | Serves as a replacement for Overpass API for the OSM Conflator. 5 | Takes two parameters: a list of coordinates and categories prepared by 6 | conflate.py and an OSM PBF/XML file. Prints an OSM XML file with 7 | objects that will then be conflated with the external dataset. 8 | Either specify that XML file name as the third parameter, or redirect 9 | the output. 10 | 11 | Based on the osmium_amenity_list.cpp from libosmium. 12 | 13 | Published under Apache Public License 2.0. 14 | 15 | Written by Ilya Zverev for MAPS.ME. 16 | */ 17 | 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | #include 24 | #include 25 | 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | 34 | #include "RTree.h" 35 | #include "xml_centers_output.hpp" 36 | 37 | using index_type = osmium::index::map::FlexMem; 39 | using location_handler_type = osmium::handler::NodeLocationsForWays; 40 | 41 | bool AppendToVector(uint16_t cat_id, void *vec) { 42 | static_cast*>(vec)->push_back(cat_id); 43 | return true; 44 | } 45 | 46 | class AmenityHandler : public osmium::handler::Handler { 47 | 48 | constexpr static double kSearchRadius = 0.01; 49 | 50 | typedef RTree DatasetTree; 51 | typedef std::vector> TQuery; 52 | typedef std::vector TCategory; 53 | 54 | DatasetTree m_tree; 55 | osmium::io::xmlcenters::XMLCentersOutput m_centers; 56 | std::map> m_categories; 57 | std::map m_category_names; 58 | 59 | void print_object(const osmium::OSMObject &obj, 60 | const osmium::Location ¢er) { 61 | std::cout << m_centers.apply(obj, center); 62 | } 63 | 64 | // Calculate the center point of a NodeRefList. 65 | osmium::Location calc_center(const osmium::NodeRefList &nr_list) { 66 | int64_t x = 0; 67 | int64_t y = 0; 68 | 69 | for (const auto &nr : nr_list) { 70 | x += nr.x(); 71 | y += nr.y(); 72 | } 73 | 74 | x /= nr_list.size(); 75 | y /= nr_list.size(); 76 | 77 | return osmium::Location{x, y}; 78 | } 79 | 80 | bool TestTags(osmium::TagList const & tags, TQuery const & query) { 81 | for (std::vector const & pair : query) { 82 | const char *value = tags[pair[0].c_str()]; 83 | if (pair.size() == 2 && pair[1].empty()) { 84 | if (value != nullptr) 85 | return false; 86 | } else { 87 | if (value == nullptr) 88 | return false; 89 | if (pair.size() > 1) { 90 | // TODO: substrings? 
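// Each `pair` holds a tag key followed by its allowed values; the empty-value
// case ("key must be absent") is handled above, and the loop below looks for
// an exact match against any of the listed values.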
91 | bool found = false; 92 | for (size_t i = 1; i < pair.size(); i++) { 93 | if (!strcmp(value, pair[i].c_str())) { 94 | found = true; 95 | break; 96 | } 97 | } 98 | if (!found) 99 | return false; 100 | } 101 | } 102 | } 103 | return true; 104 | } 105 | 106 | bool IsEligible(const osmium::Location & loc, osmium::TagList const & tags) { 107 | if (tags.empty()) 108 | return false; 109 | 110 | int32_t radius = osmium::Location::double_to_fix(kSearchRadius); 111 | int32_t min[] = {loc.x() - radius, loc.y() - radius}; 112 | int32_t max[] = {loc.x() + radius, loc.y() + radius}; 113 | std::vector found; 114 | if (!m_tree.Search(min, max, &AppendToVector, &found)) 115 | return false; 116 | for (uint16_t cat_id : found) 117 | for (TQuery query : m_categories[cat_id]) 118 | if (TestTags(tags, query)) 119 | return true; 120 | return false; 121 | } 122 | 123 | void SplitTrim(std::string const & s, char delimiter, std::size_t limit, std::vector & target) { 124 | target.clear(); 125 | std::size_t start = 0, end = 0; 126 | while (start < s.length()) { 127 | end = s.find(delimiter, start); 128 | if (end == std::string::npos || target.size() == limit) 129 | end = s.length(); 130 | while (start < end && std::isspace(s[start])) 131 | start++; 132 | 133 | std::size_t tmpend = end - 1; 134 | while (tmpend > start && std::isspace(s[tmpend])) 135 | tmpend++; 136 | target.push_back(s.substr(start, tmpend - start + 1)); 137 | start = end + 1; 138 | } 139 | } 140 | 141 | TQuery ParseQuery(std::string const & query) { 142 | TQuery q; 143 | std::vector parts; 144 | SplitTrim(query, '|', 100, parts); 145 | for (std::string const & part : parts) { 146 | std::vector keys; 147 | SplitTrim(part, '=', 100, keys); 148 | if (keys.size() > 0) 149 | q.push_back(keys); 150 | } 151 | return q; 152 | } 153 | 154 | void LoadCategories(const char *filename) { 155 | std::ifstream infile(filename); 156 | std::string line; 157 | std::vector parts; 158 | bool parsingPoints = false; 159 | while (std::getline(infile, line)) { 160 | if (!parsingPoints) { 161 | if (!line.size()) 162 | parsingPoints = true; 163 | else { 164 | SplitTrim(line, ',', 3, parts); // cat_id, name, query 165 | uint16_t cat_id = std::stoi(parts[0]); 166 | m_category_names[cat_id] = parts[1]; 167 | m_categories[cat_id].push_back(ParseQuery(parts[2])); 168 | } 169 | } else { 170 | SplitTrim(line, ',', 3, parts); // lon, lat, cat_id 171 | const osmium::Location loc(std::stod(parts[0]), std::stod(parts[1])); 172 | int32_t coords[] = {loc.x(), loc.y()}; 173 | uint16_t cat_id = std::stoi(parts[2]); 174 | m_tree.Insert(coords, coords, cat_id); 175 | } 176 | } 177 | } 178 | 179 | public: 180 | AmenityHandler(const char *categories) { 181 | LoadCategories(categories); 182 | } 183 | 184 | void node(osmium::Node const & node) { 185 | if (IsEligible(node.location(), node.tags())) { 186 | print_object(node, node.location()); 187 | } 188 | } 189 | 190 | void way(osmium::Way const & way) { 191 | if (!way.is_closed()) 192 | return; 193 | 194 | int64_t x = 0, y = 0, cnt = 0; 195 | for (const auto& node_ref : way.nodes()) { 196 | if (node_ref.location()) { 197 | x += node_ref.x(); 198 | y += node_ref.y(); 199 | cnt++; 200 | } 201 | } 202 | if (!cnt) 203 | return; 204 | 205 | const osmium::Location center(x / cnt, y / cnt); 206 | if (IsEligible(center, way.tags())) { 207 | print_object(way, center); 208 | } 209 | } 210 | 211 | void multi(osmium::Relation const & rel, osmium::Location const & center) { 212 | if (IsEligible(center, rel.tags())) { 213 | print_object(rel, center); 214 | 
} 215 | } 216 | 217 | }; // class AmenityHandler 218 | 219 | class AmenityRelationsManager : public osmium::relations::RelationsManager { 220 | 221 | AmenityHandler *m_handler; 222 | 223 | public: 224 | 225 | AmenityRelationsManager(AmenityHandler & handler) : 226 | RelationsManager(), 227 | m_handler(&handler) { 228 | } 229 | 230 | bool new_relation(osmium::Relation const & rel) noexcept { 231 | const char *rel_type = rel.tags().get_value_by_key("type"); 232 | return rel_type && !std::strcmp(rel_type, "multipolygon"); 233 | } 234 | 235 | void complete_relation(osmium::Relation const & rel) { 236 | int64_t x = 0, y = 0, cnt = 0; 237 | for (auto const & member : rel.members()) { 238 | if (member.ref() != 0) { 239 | const osmium::Way* way = this->get_member_way(member.ref()); 240 | for (const auto& node_ref : way->nodes()) { 241 | if (node_ref.location()) { 242 | x += node_ref.x(); 243 | y += node_ref.y(); 244 | cnt++; 245 | } 246 | } 247 | } 248 | } 249 | if (cnt > 0) 250 | m_handler->multi(rel, osmium::Location{x / cnt, y / cnt}); 251 | } 252 | }; // class AmenityRelationsManager 253 | 254 | int main(int argc, char *argv[]) { 255 | if (argc < 3) { 256 | std::cerr << "Usage: " << argv[0] 257 | << " \n"; 258 | std::exit(1); 259 | } 260 | 261 | const osmium::io::File input_file{argv[2]}; 262 | const osmium::io::File output_file{"", "osm"}; 263 | 264 | AmenityHandler data_handler(argv[1]); 265 | AmenityRelationsManager manager(data_handler); 266 | osmium::relations::read_relations(input_file, manager); 267 | 268 | osmium::io::Header header; 269 | header.set("generator", argv[0]); 270 | osmium::io::Writer writer{output_file, header, osmium::io::overwrite::allow}; 271 | 272 | index_type index; 273 | location_handler_type location_handler{index}; 274 | location_handler.ignore_errors(); 275 | osmium::io::Reader reader{input_file}; 276 | 277 | osmium::apply(reader, location_handler, data_handler, manager.handler()); 278 | 279 | std::cout.flush(); 280 | reader.close(); 281 | writer.close(); 282 | } 283 | -------------------------------------------------------------------------------- /filter/xml_centers_output.hpp: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | This file is based on xml_output_format.hpp from the Osmium library 4 | (http://osmcode.org/libosmium). 5 | 6 | Copyright 2013-2017 Jochen Topf and others (see README). 7 | Copyright 2017 Ilya Zverev , MAPS.ME 8 | 9 | Boost Software License - Version 1.0 - August 17th, 2003 10 | 11 | Permission is hereby granted, free of charge, to any person or organization 12 | obtaining a copy of the software and accompanying documentation covered by 13 | this license (the "Software") to use, reproduce, display, distribute, 14 | execute, and transmit the Software, and to prepare derivative works of the 15 | Software, and to permit third-parties to whom the Software is furnished to 16 | do so, all subject to the following: 17 | 18 | The copyright notices in the Software and this entire statement, including 19 | the above license grant, this restriction and the following disclaimer, 20 | must be included in all copies of the Software, in whole or in part, and 21 | all derivative works of the Software, unless such copies or derivative 22 | works are solely in the form of machine-executable object code generated by 23 | a source language processor. 
24 | 25 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 26 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 27 | FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT 28 | SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE 29 | FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, 30 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 31 | DEALINGS IN THE SOFTWARE. 32 | 33 | */ 34 | 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include 41 | #include 42 | #include 43 | #include 44 | #include 45 | #include 46 | #include 47 | 48 | #include 49 | #include 50 | #include 51 | #include 52 | 53 | namespace osmium { 54 | 55 | namespace io { 56 | 57 | namespace xmlcenters { 58 | 59 | namespace detail { 60 | 61 | inline void append_lat_lon_attributes(std::string& out, const char* lat, const char* lon, const osmium::Location& location) { 62 | out += ' '; 63 | out += lat; 64 | out += "=\""; 65 | osmium::detail::append_location_coordinate_to_string(std::back_inserter(out), location.y()); 66 | out += "\" "; 67 | out += lon; 68 | out += "=\""; 69 | osmium::detail::append_location_coordinate_to_string(std::back_inserter(out), location.x()); 70 | out += "\""; 71 | } 72 | 73 | } // namespace detail 74 | 75 | class XMLCentersOutput { 76 | 77 | std::shared_ptr m_out; 78 | 79 | inline void append_xml_encoded_string(std::string & out, const char *data) { 80 | osmium::io::detail::append_xml_encoded_string(out, data); 81 | } 82 | 83 | void output_int(int64_t value) { 84 | if (value < 0) { 85 | *m_out += '-'; 86 | value = -value; 87 | } 88 | 89 | char temp[20]; 90 | char *t = temp; 91 | do { 92 | *t++ = char(value % 10) + '0'; 93 | value /= 10; 94 | } while (value > 0); 95 | 96 | const auto old_size = m_out->size(); 97 | m_out->resize(old_size + (t - temp)); 98 | char* data = &(*m_out)[old_size]; 99 | do { 100 | *data++ += *--t; 101 | } while (t != temp); 102 | } 103 | 104 | void write_spaces(int num) { 105 | for (; num != 0; --num) { 106 | *m_out += ' '; 107 | } 108 | } 109 | 110 | void write_prefix() { 111 | write_spaces(2); 112 | } 113 | 114 | template 115 | void write_attribute(const char* name, T value) { 116 | *m_out += ' '; 117 | *m_out += name; 118 | *m_out += "=\""; 119 | output_int(value); 120 | *m_out += '"'; 121 | } 122 | 123 | void write_meta(const osmium::OSMObject& object) { 124 | write_attribute("id", object.id()); 125 | 126 | if (object.version()) { 127 | write_attribute("version", object.version()); 128 | } 129 | 130 | if (object.timestamp()) { 131 | *m_out += " timestamp=\""; 132 | *m_out += object.timestamp().to_iso(); 133 | *m_out += "\""; 134 | } 135 | 136 | if (!object.user_is_anonymous()) { 137 | write_attribute("uid", object.uid()); 138 | *m_out += " user=\""; 139 | append_xml_encoded_string(*m_out, object.user()); 140 | *m_out += "\""; 141 | } 142 | 143 | if (object.changeset()) { 144 | write_attribute("changeset", object.changeset()); 145 | } 146 | } 147 | 148 | void write_tags(const osmium::TagList& tags) { 149 | for (const auto& tag : tags) { 150 | write_spaces(2); 151 | *m_out += " \n"; 156 | } 157 | } 158 | 159 | public: 160 | 161 | XMLCentersOutput() : m_out(std::make_shared()) { 162 | } 163 | 164 | std::string apply(osmium::OSMObject const & item, osmium::Location const & center) { 165 | switch(item.type()) { 166 | case osmium::item_type::node: 167 | node(static_cast(item)); 168 | break; 169 | case 
osmium::item_type::way: 170 | way(static_cast(item), center); 171 | break; 172 | case osmium::item_type::relation: 173 | relation(static_cast(item), center); 174 | break; 175 | default: 176 | throw osmium::unknown_type{}; 177 | } 178 | 179 | std::string out; 180 | using std::swap; 181 | swap(out, *m_out); 182 | 183 | return out; 184 | } 185 | 186 | void node(const osmium::Node& node) { 187 | write_prefix(); 188 | *m_out += "\n"; 265 | } 266 | 267 | write_tags(relation.tags()); 268 | 269 | write_prefix(); 270 | *m_out += "\n"; 271 | } 272 | 273 | }; // class XMLCentersOutputBlock 274 | 275 | } // namespace xmlcenters 276 | 277 | } // namespace io 278 | 279 | } // namespace osmium 280 | -------------------------------------------------------------------------------- /profiles/auchan_moscow.py: -------------------------------------------------------------------------------- 1 | # A web page with a list of shops in Moscow. You can replace it with one for another city 2 | download_url = 'https://www.auchan.ru/ru/moscow/' 3 | source = 'auchan.ru' 4 | # Not adding a ref:auchan tag, since we don't have good identifiers 5 | no_dataset_id = True 6 | # Using a name query with regular expressions 7 | query = [('shop', 'supermarket', 'mall'), ('name', '~Ашан|АШАН')] 8 | master_tags = ('name', 'opening_hours', 'phone', 'website') 9 | # Empty dict so we don't add a fixme tag to unmatched objects 10 | tag_unmatched = {} 11 | # Coordinates are VERY approximate, so increasing max distance to 1 km 12 | max_distance = 1000 13 | 14 | # For some reason, functions here cannot use variables defined above 15 | # And defining them as "global" moves these from locals() to globals() 16 | download_url_copy = download_url 17 | def dataset(fileobj): 18 | def parse_weekdays(s): 19 | weekdays = {k: v for k, v in map(lambda x: x.split(), 'пн Mo,вт Tu,ср We,чт Th,пт Fr,сб Sa,вс Su'.split(','))} 20 | s = s.replace(' ', '').lower().replace('c', 'с') 21 | if s == 'ежедневно' or s == 'пн-вс': 22 | return '' 23 | parts = [] 24 | for x in s.split(','): 25 | p = None 26 | if x in weekdays: 27 | p = weekdays[x] 28 | elif '-' in x: 29 | m = re.match(r'(\w\w)-(\w\w)', x) 30 | if m: 31 | pts = [weekdays.get(m.group(i), None) for i in (1, 2)] 32 | if pts[0] and pts[1]: 33 | p = '-'.join(pts) 34 | if p: 35 | parts.append(p) 36 | else: 37 | logging.warning('Could not parse opening hours: %s', s) 38 | return None 39 | return ','.join(parts) 40 | 41 | # We are parsing HTML, and for that we need an lxml package 42 | from lxml import html 43 | import logging 44 | import re 45 | global download_url_copy, re 46 | h = html.fromstring(fileobj.read().decode('utf-8')) 47 | shops = h.find_class('shops-in-the-city-holder')[0] 48 | shops.make_links_absolute(download_url_copy) 49 | blocks = shops.xpath("//div[@class='mark-box'] | //ul[@class='shops-list']") 50 | name = None 51 | RE_GMAPS = re.compile(r'q=(-?[0-9.]+)\+(-?[0-9.]+)$') 52 | RE_OH = re.compile(r'(Ежедневно|(?:(?:Пн|Вт|Ср|Чт|Пт|Сб|В[сc])[, -]*)+)[ сc:]+(\d\d?[:.]\d\d)[- до]+(\d\d[.:]\d\d)', re.I) 53 | data = [] 54 | for block in blocks: 55 | if block.get('class') == 'mark-box': 56 | name = block.xpath("strong[contains(@class, 'name')]/text()")[0].replace('АШАН', 'Ашан') 57 | logging.debug('Name: %s', name) 58 | elif block.get('class') == 'shops-list': 59 | for li in block: 60 | title = li.xpath("strong[@class='title']/a/text()") 61 | title = title[0].lower() if title else None 62 | website = li.xpath("strong[@class='title']/a/@href") 63 | website = website[0] if website else None 64 | addr 
= li.xpath("p[1]/text()") 65 | addr = addr[0].strip() if addr else None 66 | lat = None 67 | lon = None 68 | gmapslink = li.xpath(".//a[contains(@href, 'maps.google')]/@href") 69 | if gmapslink: 70 | m = RE_GMAPS.search(gmapslink[0]) 71 | if m: 72 | lat = float(m.group(1)) 73 | lon = float(m.group(2)) 74 | opening_hours = [] 75 | # Extract opening hours 76 | oh = ' '.join(li.xpath("p/text()")) 77 | for m in RE_OH.finditer(oh): 78 | weekdays = parse_weekdays(m.group(1)) 79 | if weekdays is not None: 80 | opening_hours.append('{}{:0>5s}-{:0>5s}'.format( 81 | weekdays + ' ' if weekdays else '', m.group(2).replace('.', ':'), m.group(3).replace('.', ':'))) 82 | logging.debug('Found title: %s, website: %s, opens: %s, coords: %s, %s', title, website, '; '.join(opening_hours) or None, lat, lon) 83 | if lat is not None and name is not None: 84 | tags = { 85 | 'name': name, 86 | 'brand': 'Auchan', 87 | 'shop': 'supermarket', 88 | 'phone': '8-800-700-5-800', 89 | 'operator': 'ООО «АШАН»', 90 | 'opening_hours': '; '.join(opening_hours), 91 | 'addr:full': addr, 92 | 'website': website 93 | } 94 | data.append(SourcePoint(title, lat, lon, tags)) 95 | return data 96 | -------------------------------------------------------------------------------- /profiles/azbuka.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import conflate 3 | import requests 4 | import logging 5 | import re 6 | from io import BytesIO 7 | from yandex_parser import parse_feed 8 | 9 | 10 | class Profile: 11 | source = 'Азбука Вкуса' 12 | dataset_id = 'av' 13 | query = [('shop', 'convenience', 'supermarket', 'wine', 'alcohol')] 14 | master_tags = ('operator', 'shop', 'opening_hours', 'name', 'contact:website', 'contact:phone') 15 | download_url = 'https://av.ru/yandex/supermarket.xml' 16 | bounded_update = True 17 | 18 | def matches(osmtags, avtags): 19 | if 'Энотека' in avtags['name']: 20 | return osmtags.get('shop') in ('wine', 'alcohol') 21 | name = osmtags.get('name') 22 | if osmtags.get('shop') not in ('convenience', 'supermarket'): 23 | return False 24 | if not name or re.search(r'AB|АВ|Азбука|Daily', name, re.I): 25 | return True 26 | if name.upper() in ('SPAR', 'СПАР') or 'континент' in name.lower(): 27 | return True 28 | return False 29 | 30 | def dataset(fileobj): 31 | data = [] 32 | other_urls = [ 33 | None, 34 | 'http://av.ru/yandex/market.xml', 35 | 'http://av.ru/yandex/daily.xml', 36 | 'http://av.ru/yandex/enoteka.xml', 37 | ] 38 | for url in other_urls: 39 | if url: 40 | r = requests.get(url) 41 | if r.status_code != 200: 42 | logging.error('Could not download source data: %s %s', r.status_code, r.text) 43 | return None 44 | f = BytesIO(r.content) 45 | else: 46 | f = fileobj 47 | for c in parse_feed(f): 48 | name = next(iter(c.name.values())) 49 | tags = { 50 | 'name': name, 51 | 'operator': 'ООО «Городской супермаркет»', 52 | 'contact:phone': '; '.join(c.phones) or None, 53 | 'contact:website': c.url_add, 54 | 'opening_hours': c.opening_hours, 55 | } 56 | if 'Энотека' in name: 57 | tags['shop'] = 'wine' 58 | elif 'Daily' in name: 59 | tags['shop'] = 'convenience' 60 | else: 61 | tags['shop'] = 'supermarket' 62 | data.append(conflate.SourcePoint(c.id, c.lat, c.lon, tags)) 63 | return data 64 | 65 | 66 | if __name__ == '__main__': 67 | conflate.run(Profile) 68 | -------------------------------------------------------------------------------- /profiles/burgerking.py: -------------------------------------------------------------------------------- 1 | # Note: 
the json file at the burgerking website was restructured 2 | # and does not contain any useful data now. 3 | # So this profile is here solely for demonstration purposes. 4 | 5 | download_url = 'https://burgerking.ru/restaurant-locations-json-reply-new' 6 | source = 'Burger King' 7 | dataset_id = 'burger_king' 8 | no_dataset_id = True 9 | query = '[amenity~"cafe|restaurant|fast_food"][name~"burger.*king|бургер.*кинг",i]' 10 | max_distance = 1000 11 | overpass_timeout = 1200 12 | max_request_boxes = 4 13 | master_tags = ('name', 'amenity', 'name:ru', 'name:en', 'contact:phone', 'opening_hours') 14 | tag_unmatched = { 15 | 'fixme': 'Проверить на местности: в данных сайта отсутствует.', 16 | 'amenity': None, 17 | 'was:amenity': 'fast_food' 18 | } 19 | 20 | 21 | def dataset(fileobj): 22 | def parse_hours(s): 23 | global re 24 | s = re.sub('^зал:? *', '', s.lower()) 25 | s = s.replace('
<br>', ';').replace('<br/>
', ';').replace('\n', ';').replace(' ', '').replace(',', ';').replace('–', '-') 26 | s = s.replace('-00:', '-24:') 27 | weekdays = {k: v for k, v in map(lambda x: x.split(), 'пн Mo,вт Tu,ср We,чт Th,пт Fr,сб Sa,вс Su'.split(','))} 28 | if s == 'круглосуточно': 29 | return '24/7' 30 | parts = s.split(';') 31 | WEEKDAY_PATH = '(?:пн|вт|ср|чт|пт|сб|вск?)' 32 | result = [] 33 | found_allweek = False 34 | for p in parts: 35 | if not p: 36 | continue 37 | m = re.match(r'^('+WEEKDAY_PATH+'(?:[-,]'+WEEKDAY_PATH+')*)?с?(\d?\d[:.]\d\d-\d?\d[:.]\d\d)$', p) 38 | if not m: 39 | # Disregarding other parts 40 | return None 41 | times = re.sub('(^|-)(\d:)', r'\g<1>0\g<2>', m[2].replace('.', ':')) 42 | if m[1]: 43 | wd = m[1].replace('вск', 'вс') 44 | for k, v in weekdays.items(): 45 | wd = wd.replace(k, v) 46 | else: 47 | found_allweek = True 48 | wd = 'Mo-Su' 49 | result.append(wd + ' ' + times) 50 | if not result or (found_allweek and len(result) > 1): 51 | return None 52 | return '; '.join(result) 53 | 54 | def parse_phone(s): 55 | s = s.replace('(', '').replace(')', '').replace('-', '') 56 | s = s.replace(' доб. ', '-') 57 | return s 58 | 59 | import json 60 | import codecs 61 | import re 62 | notes = { 63 | 172: 'Подвинуть на второй терминал', 64 | 25: 'Подвинуть в ЮниМолл', 65 | 133: 'Передвинуть в Парк №1: https://prnt.sc/gtlwjs', 66 | 471: 'Передвинуть в ТЦ Балканский 6, самый северный, где кино', 67 | 234: 'Передвинуть на север, в дом 7', 68 | 111: 'Сдвинуть в здание', 69 | 59: 'Сдвинуть в торговый центр севернее', 70 | 346: 'Передвинуть к кафе', 71 | 72 | } 73 | json_src = codecs.getreader('utf-8')(fileobj).read() 74 | p = json_src.find(' 0: 76 | json_src = json_src[:p] 77 | source = json.loads(json_src) 78 | data = [] 79 | for el in source: 80 | gid = int(el['origID']) 81 | tags = { 82 | 'amenity': 'fast_food', 83 | 'name': 'Бургер Кинг', 84 | 'name:ru': 'Бургер Кинг', 85 | 'name:en': 'Burger King', 86 | 'ref': gid, 87 | 'cuisine': 'burger', 88 | 'takeaway': 'yes', 89 | 'wikipedia:brand': 'ru:Burger King', 90 | 'wikidata:brand': 'Q177054', 91 | 'contact:website': 'https://burgerking.ru/', 92 | 'contact:email': el['email'], 93 | 'contact:phone': parse_phone(el['tel']), 94 | 'opening_hours': parse_hours(el['opened']) 95 | } 96 | if gid in notes: 97 | tags['fixme'] = notes[gid] 98 | if el['is_wifi']: 99 | tags['internet_access'] = 'wlan' 100 | tags['internet_access:fee'] = 'no' 101 | else: 102 | tags['internet_access'] = 'no' 103 | data.append(SourcePoint(gid, float(el['lat']), float(el['lng']), tags)) 104 | return data 105 | -------------------------------------------------------------------------------- /profiles/minkult.py: -------------------------------------------------------------------------------- 1 | source = 'opendata.mkrf.ru' 2 | dataset_id = 'mkrf_theaters' 3 | query = [('amenity', 'theatre')] 4 | max_distance = 300 5 | master_tags = ('official_name', 'phone', 'opening_hours', 'website') 6 | 7 | 8 | # Reading the dataset passport to determine an URL of the latest dataset version 9 | def download_url(): 10 | import logging 11 | import requests 12 | 13 | dataset_id = '7705851331-' + (param or 'museums') 14 | r = requests.get('http://opendata.mkrf.ru/opendata/{}/meta.json'.format(dataset_id)) 15 | if r.status_code != 200 or len(r.content) == 0: 16 | logging.error('Could not get URL for dataset: %s %s', r.status_code, r.text) 17 | logging.error('Please check http://opendata.mkrf.ru/opendata/{}'.format(dataset_id)) 18 | return None 19 | result = r.json() 20 | latest = result['data'][-1] 21 
| logging.info('Downloading %s from %s', result['title'], latest['created']) 22 | return latest['source'] 23 | 24 | source = 'opendata.mkrf.ru' 25 | dataset_id = 'mkrf_'+(param or 'museums') 26 | if not param or param == 'museums': 27 | query = [('tourism', 'museum')] 28 | elif param == 'theaters': 29 | query = [('amenity', 'theatre')] 30 | elif param == 'circuses': 31 | query = [('amenity', 'circus')] 32 | elif param == 'philharmonic': 33 | query = [('amenity', 'theatre')] 34 | else: 35 | raise ValueError('Unknown param value: {}'.format(param)) 36 | 37 | max_distance = 300 38 | master_tags = ('official_name', 'phone', 'opening_hours', 'website') 39 | 40 | 41 | def dataset(fileobj): 42 | import json 43 | import codecs 44 | 45 | def make_wd_ranges(r): 46 | """Converts e.g. [0,1,4] into 'Mo-Tu, Fr'.""" 47 | wd = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'] 48 | res = wd[r[0]] 49 | in_range = False 50 | for i in range(1, len(r)+1): 51 | if i < len(r) and r[i] == r[i-1] + 1: 52 | in_range = True 53 | else: 54 | if in_range: 55 | res += '-' + wd[r[i-1]] 56 | in_range = False 57 | if i < len(r): 58 | res += ', ' + wd[r[i]] 59 | return res 60 | 61 | def parse_hours(h): 62 | """Receives a dict {'0': {'from': '10:00:00', 'to': '18:00:00'}, ...} 63 | and returns a proper opening_hours value.""" 64 | days = {} 65 | for wd, d in h.items(): 66 | if not d['from']: 67 | continue 68 | for i in ('from', 'to'): 69 | d[i] = d[i][:5] 70 | if d['to'] == '00:00': 71 | d['to'] = '24:00' 72 | elif not d['to']: 73 | d['to'] = '19:00+' 74 | k = '{}-{}'.format(d['from'], d['to']) 75 | if k not in days: 76 | days[k] = set() 77 | days[k].add(int(wd)) 78 | days2 = {} 79 | for op, d in days.items(): 80 | days2[tuple(sorted(d))] = op 81 | res = [] 82 | for d in sorted(days2.keys(), key=lambda x: min(x)): 83 | res.append(' '.join([make_wd_ranges(d), days2[d]])) 84 | return '; '.join(res) 85 | 86 | def wrap(coord, absmax): 87 | if coord < -absmax: 88 | return coord + absmax * 2 89 | if coord > absmax: 90 | return coord - absmax * 2 91 | return coord 92 | 93 | def format_phone(ph): 94 | if ph and len(ph) == 11 and ph[0] == '7': 95 | return '+7 {} {}-{}-{}'.format(ph[1:4], ph[4:7], ph[7:9], ph[9:]) 96 | return ph 97 | 98 | source = json.load(codecs.getreader('utf-8')(fileobj)) 99 | data = [] 100 | for el in source: 101 | d = el['data']['general'] 102 | gid = d['id'] 103 | lon = wrap(d['address']['mapPosition']['coordinates'][1], 180) 104 | lat = d['address']['mapPosition']['coordinates'][0] 105 | tags = { 106 | 'amenity': 'theatre', 107 | 'name': d['name'], 108 | # 'official_name': d['name'], 109 | # 'image': d['image']['url'], 110 | 'operator': d['organization']['name'], 111 | 'addr:full': '{}, {}'.format(d['locale']['name'], d['address']['street']), 112 | } 113 | if tags['operator'] == tags['name']: 114 | del tags['operator'] 115 | if d.get('workingSchedule'): 116 | tags['opening_hours'] = parse_hours(d['workingSchedule']) 117 | if 'email' in d['contacts']: 118 | tags['email'] = d['contacts']['email'] 119 | if 'website' in d['contacts']: 120 | tags['website'] = d['contacts']['website'] 121 | if tags['website'].endswith('.ru'): 122 | tags['website'] += '/' 123 | if 'phones' in d['contacts'] and d['contacts']['phones']: 124 | tags['phone'] = format_phone(d['contacts']['phones'][0]['value']) 125 | data.append(SourcePoint(gid, lat, lon, tags)) 126 | return data 127 | -------------------------------------------------------------------------------- /profiles/moscow_addr.py: 
-------------------------------------------------------------------------------- 1 | source = 'dit.mos.ru' 2 | no_dataset_id = True 3 | query = [('building',)] 4 | max_distance = 50 5 | max_request_boxes = 2 6 | master_tags = ('addr:housenumber', 'addr:street') 7 | 8 | COMPLEX = False 9 | ADMS = { 10 | '1': 'Северо-Западный административный округ', 11 | '2': 'Северный административный округ', 12 | '3': 'Северо-Восточный административный округ', 13 | '4': 'Западный административный округ', 14 | '5': 'Центральный административный округ', 15 | '6': 'Восточный административный округ', 16 | '7': 'Юго-Западный административный округ', 17 | '8': 'Южный административный округ', 18 | '9': 'Юго-Восточный административный округ', 19 | '10': 'Зеленоградский административный округ', 20 | '11': 'Троицкий административный округ', 21 | '12': 'Новомосковский административный округ', 22 | } 23 | ADM = ADMS['2'] 24 | if param: 25 | if param[0] == 'c': 26 | COMPLEX = True 27 | param = param[1:] 28 | if param in ADMS: 29 | ADM = ADMS[param] 30 | if param == '5': 31 | query = [[('addr:housenumber',)], [('building',)]] 32 | 33 | 34 | def dataset(fileobj): 35 | import zipfile 36 | import json 37 | import logging 38 | global COMPLEX, ADM 39 | 40 | def find_center(geodata): 41 | if not geodata: 42 | return None 43 | if 'center' in geodata: 44 | return geodata['center'][0] 45 | if 'coordinates' in geodata: 46 | typ = geodata['type'] 47 | lonlat = [0, 0] 48 | cnt = 0 49 | if typ == 'Polygon': 50 | for p in geodata['coordinates'][0]: 51 | lonlat[0] += p[0] 52 | lonlat[1] += p[1] 53 | cnt += 1 54 | elif typ == 'LineString': 55 | for p in geodata['coordinates']: 56 | lonlat[0] += p[0] 57 | lonlat[1] += p[1] 58 | cnt += 1 59 | elif typ == 'Point': 60 | p = geodata['coordinates'] 61 | lonlat[0] += p[0] 62 | lonlat[1] += p[1] 63 | cnt += 1 64 | if cnt > 0: 65 | return [lonlat[0]/cnt, lonlat[1]/cnt] 66 | return None 67 | 68 | logging.info('Экспортируем %s (%s)', ADM, 'строения' if COMPLEX else 'без строений') 69 | zf = zipfile.ZipFile(fileobj) 70 | data = [] 71 | no_geodata = 0 72 | no_addr = 0 73 | count = 0 74 | for zname in zf.namelist(): 75 | source = json.loads(zf.read(zname).decode('cp1251')) 76 | for el in source: 77 | gid = el['global_id'] 78 | try: 79 | adm_area = el['ADM_AREA'] 80 | if adm_area != ADM: 81 | continue 82 | count += 1 83 | lonlat = find_center(el.get('geoData')) 84 | if not lonlat: 85 | no_geodata += 1 86 | street = el.get('P7') 87 | house = el.get('L1_VALUE') 88 | htype = el.get('L1_TYPE') 89 | corpus = el.get('L2_VALUE') 90 | ctype = el.get('L2_TYPE') 91 | stroenie = el.get('L3_VALUE') 92 | stype = el.get('L3_TYPE') 93 | if not street or not house or 'Б/Н' in house: 94 | no_addr += 1 95 | continue 96 | if not lonlat: 97 | continue 98 | is_complex = False 99 | housenumber = house.replace(' ', '') 100 | if htype != 'дом': 101 | is_complex = True 102 | if htype in ('владение', 'домовладение'): 103 | housenumber = 'вл' + housenumber 104 | else: 105 | logging.warn('Unknown house number type: %s', htype) 106 | continue 107 | if corpus: 108 | if ctype == 'корпус': 109 | housenumber += ' к{}'.format(corpus) 110 | else: 111 | logging.warn('Unknown corpus type: %s', ctype) 112 | continue 113 | if stroenie: 114 | is_complex = True 115 | if stype == 'строение' or stype == 'сооружение': 116 | housenumber += ' с{}'.format(stroenie) 117 | else: 118 | logging.warn('Unknown stroenie type: %s', stype) 119 | continue 120 | if is_complex != COMPLEX: 121 | continue 122 | tags = { 123 | 'addr:street': street, 124 | 
'addr:housenumber': housenumber, 125 | } 126 | data.append(SourcePoint(gid, lonlat[1], lonlat[0], tags)) 127 | except Exception as e: 128 | logging.warning('PROFILE: Failed to get attributes for address %s: %s', gid, str(e)) 129 | logging.warning(json.dumps(el, ensure_ascii=False)) 130 | 131 | if no_addr + no_geodata > 0: 132 | logging.warning('%.2f%% of data have no centers, and %.2f%% have no streets or house numbers', 133 | 100*no_geodata/count, 100*no_addr/count) 134 | return data 135 | -------------------------------------------------------------------------------- /profiles/moscow_parkomats.py: -------------------------------------------------------------------------------- 1 | # What will be put into "source" tags. Lower case please 2 | source = 'dit.mos.ru' 3 | # A fairly unique id of the dataset to query OSM, used for "ref:mos_parking" tags 4 | # If you omit it, set explicitly "no_dataset_id = True" 5 | dataset_id = 'mos_parking' 6 | # Tags for querying with overpass api 7 | query = [('amenity', 'vending_machine'), ('vending', 'parking_tickets')] 8 | # Use bbox from dataset points (default). False = query whole world, [minlat, minlon, maxlat, maxlon] to override 9 | bbox = True 10 | # How close OSM point should be to register a match, in meters. Default is 100 11 | max_distance = 30 12 | # Delete objects that match query tags but not dataset? False is the default 13 | delete_unmatched = False 14 | # If set, and delete_unmatched is False, modify tags on unmatched objects instead 15 | # Always used for area features, since these are not deleted 16 | tag_unmatched = { 17 | 'fixme': 'Проверить на местности: в данных ДИТ отсутствует. Вероятно, демонтирован', 18 | 'amenity': None, 19 | 'was:amenity': 'vending_machine' 20 | } 21 | # Actually, after the initial upload we should not touch any existing non-matched objects 22 | tag_unmatched = None 23 | # A set of authoritative tags to replace on matched objects 24 | master_tags = ('zone:parking', 'ref', 'contact:phone', 'contact:website', 'operator') 25 | 26 | 27 | def download_url(mos_dataset_id=1421): 28 | import requests 29 | import logging 30 | r = requests.get('https://data.mos.ru/api/datasets/expformats/?datasetId={}'.format(mos_dataset_id)) 31 | if r.status_code != 200 or len(r.content) == 0: 32 | logging.error('Could not get URL for dataset: %s %s', r.status_code, r.text) 33 | logging.error('Please check http://data.mos.ru/opendata/{}/passport'.format(mos_dataset_id)) 34 | return None 35 | url = [x for x in r.json() if x['Format'] == 'json'][0] 36 | version = '?' 37 | title = 'dataset' 38 | r = requests.get('https://data.mos.ru/apiproxy/opendata/{}/meta.json'.format(mos_dataset_id)) 39 | if r.status_code == 200: 40 | title = r.json()['Title'] 41 | version = r.json()['VersionNumber'] 42 | logging.info('Downloading %s %s from %s', title, version, url['GenerationStart']) 43 | return 'https://op.mos.ru/EHDWSREST/catalog/export/get?id=' + url['EhdId'] 44 | 45 | 46 | # A list of SourcePoint objects. Initialize with (id, lat, lon, {tags}). 
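# For example (coordinates and identifier are purely illustrative):
#   SourcePoint(1234567, 55.7558, 37.6176,
#               {'amenity': 'vending_machine', 'vending': 'parking_tickets'})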
47 | def dataset(fileobj): 48 | import json 49 | import logging 50 | import zipfile 51 | import re 52 | zf = zipfile.ZipFile(fileobj) 53 | source = json.loads(zf.read(zf.namelist()[0]).decode('cp1251')) 54 | RE_NUM4 = re.compile(r'\d{4,6}') 55 | data = [] 56 | for el in source: 57 | try: 58 | gid = el['global_id'] 59 | zone = el['ParkingZoneNumber'] 60 | lon = el['Longitude_WGS84'] 61 | lat = el['Latitude_WGS84'] 62 | pnum = el['NumberOfParkingMeter'] 63 | tags = { 64 | 'amenity': 'vending_machine', 65 | 'vending': 'parking_tickets', 66 | 'zone:parking': zone, 67 | 'contact:phone': '+7 495 539-54-54', 68 | 'contact:website': 'http://parking.mos.ru/', 69 | 'opening_hours': '24/7', 70 | 'operator': 'ГКУ «Администратор Московского парковочного пространства»', 71 | 'payment:cash': 'no', 72 | 'payment:credit_cards': 'yes', 73 | 'payment:debit_cards': 'yes' 74 | } 75 | try: 76 | lat = float(lat) 77 | lon = float(lon) 78 | tags['ref'] = RE_NUM4.search(pnum).group(0) 79 | data.append(SourcePoint(gid, lat, lon, tags)) 80 | except Exception as e: 81 | logging.warning('PROFILE: Failed to parse lat/lon/ref for parking meter %s: %s', gid, str(e)) 82 | except Exception as e: 83 | logging.warning('PROFILE: Failed to get attributes for parking meter: %s', str(e)) 84 | return data 85 | -------------------------------------------------------------------------------- /profiles/navads_shell.py: -------------------------------------------------------------------------------- 1 | # This profile reads a prepared JSON, thus no "dataset" function 2 | 3 | # Value for the changeset "source" tag 4 | source = 'Navads' 5 | # Keeping identifiers in a "ref:navads_shell" tag 6 | dataset_id = 'navads_shell' 7 | # Overpass API query is a simple [amenity="fuel"] 8 | query = [('amenity', 'fuel')] 9 | # These tag values override values on OSM objects 10 | master_tags = ('brand', 'addr:postcode', 'phone', 'opening_hours') 11 | # Looking at most 50 meters around a dataset point 12 | max_distance = 50 13 | 14 | 15 | def format_phone(ph): 16 | if ph and len(ph) == 13 and ph[:3] == '+44': 17 | if (ph[3] == '1' and ph[4] != '1' and ph[5] != '1') or ph[3:7] == '7624': 18 | return ' '.join([ph[:3], ph[3:7], ph[7:]]) 19 | elif ph[3] in ('1', '3', '8', '9'): 20 | return ' '.join([ph[:3], ph[3:6], ph[6:9], ph[9:]]) 21 | else: 22 | return ' '.join([ph[:3], ph[3:5], ph[5:9], ph[9:]]) 23 | return ph 24 | 25 | 26 | # Tag transformation 27 | transform = { 28 | # Just add this tag 29 | 'amenity': 'fuel', 30 | # Rename key 31 | 'postal_code': '>addr:postcode', 32 | # Use a function to transform a value 33 | 'phone': format_phone, 34 | # Remove this tag 35 | 'name': '-' 36 | } 37 | 38 | # Example JSON line: 39 | # 40 | # { 41 | # "id": "NVDS298-10018804", 42 | # "lat": 51.142491, 43 | # "lon": -0.074893, 44 | # "tags": { 45 | # "name": "Shell", 46 | # "brand": "Shell", 47 | # "addr:street": "Snow Hill", 48 | # "postal_code": "RH10 3EQ", 49 | # "addr:city": "Crawley", 50 | # "phone": "+441342718750", 51 | # "website": "http://www.shell.co.uk", 52 | # "operator": "Shell", 53 | # "opening_hours": "24/7", 54 | # "amenity": "fuel" 55 | # } 56 | # } 57 | -------------------------------------------------------------------------------- /profiles/navads_shell_json.py: -------------------------------------------------------------------------------- 1 | source = 'Navads' 2 | dataset_id = 'navads_shell' 3 | query = [('amenity', 'fuel')] 4 | master_tags = ('brand', 'phone', 'opening_hours') 5 | max_distance = 50 6 | max_request_boxes = 3 7 | 8 | 9 | def 
dataset(fileobj): 10 | import json 11 | import codecs 12 | import re 13 | from collections import defaultdict 14 | 15 | def format_phone(ph): 16 | if ph and len(ph) == 13 and ph[:3] == '+44': 17 | if (ph[3] == '1' and ph[4] != '1' and ph[5] != '1') or ph[3:7] == '7624': 18 | return ' '.join([ph[:3], ph[3:7], ph[7:]]) 19 | elif ph[3] in ('1', '3', '8', '9'): 20 | return ' '.join([ph[:3], ph[3:6], ph[6:9], ph[9:]]) 21 | else: 22 | return ' '.join([ph[:3], ph[3:5], ph[5:9], ph[9:]]) 23 | return ph 24 | 25 | def make_wd_ranges(r): 26 | wd = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'] 27 | res = wd[r[0]] 28 | in_range = False 29 | for i in range(1, len(r)+1): 30 | if i < len(r) and r[i] == r[i-1] + 1: 31 | in_range = True 32 | else: 33 | if in_range: 34 | res += '-' + wd[r[i-1]] 35 | in_range = False 36 | if i < len(r): 37 | res += ',' + wd[r[i]] 38 | return res 39 | 40 | def parse_hours(h): 41 | if not h: 42 | return None 43 | WD = {x: i for i, x in enumerate([ 44 | 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY', 'SUNDAY' 45 | ])} 46 | days = defaultdict(list) 47 | for d in h.split(';'): 48 | parts = re.findall(r'([A-Z]+)=([0-9:-]+)', d) 49 | if len(set([p[0] for p in parts])) != 1: 50 | raise Exception('Parts format fail: {}'.format(d)) 51 | days[','.join([p[1] for p in parts])].append(WD[parts[0][0]]) 52 | res = [] 53 | for time, wd in sorted(days.items(), key=lambda x: min(x[1])): 54 | res.append(' '.join([make_wd_ranges(wd), time])) 55 | if res[0] == 'Mo-Su 00:00-23:59': 56 | return '24/7' 57 | return '; '.join(res).replace('23:59', '24:00') 58 | 59 | global re, defaultdict 60 | source = json.load(codecs.getreader('utf-8-sig')(fileobj)) 61 | data = [] 62 | for el in source['Locations']: 63 | if not el['location']: 64 | continue 65 | coords = [float(x) for x in el['location'].split(',')] 66 | tags = { 67 | 'amenity': 'fuel', 68 | 'brand': el['name'], 69 | 'addr:postcode': el['address_zip'] or None, 70 | 'phone': format_phone('+'+str(el['phone'])), 71 | 'opening_hours': parse_hours(el['daily_hours']), 72 | } 73 | if (el['address_street'] and el['address_number'] and 74 | not re.search(r'^([ABCDM]\d+|Junction)', el['address_street']) and 75 | 'Ln' not in el['address_street'] and 'A' not in el['address_number']): 76 | tags['addr:street'] = el['address_street'] 77 | tags['addr:housenumber'] = el['address_number'] 78 | data.append(SourcePoint(el['place_id'], coords[0], coords[1], tags)) 79 | return data 80 | 81 | 82 | # Example line of the source JSON: 83 | # 84 | # { 85 | # "place_id": "NVDS353-10019224", 86 | # "name": "Shell", 87 | # "category": "GAS_STATION", 88 | # "location": "54.978366,-1.57441", 89 | # "description": "", 90 | # "phone": 441912767084, 91 | # "address_street": "Shields Road", 92 | # "address_number": "308", 93 | # "address_city": "Newcastle-Upon-Tyne", 94 | # "address_zip": "NE6 2UU", 95 | # "address_country": "GB", 96 | # "website": "http://www.shell.co.uk/motorist/station-locator.html?id=10019224&modeselected=true", 97 | # "daily_hours": "MONDAY=00:00-23:59;TUESDAY=00:00-23:59;WEDNESDAY=00:00-23:59;THURSDAY=00:00-23:59;FRIDAY=00:00-23:59;SATURDAY=00:00-23:59;SUNDAY=00:00-23:59", 98 | # "brand": "Shell", 99 | # "is_deleted": false 100 | # }, 101 | -------------------------------------------------------------------------------- /profiles/rosinter.py: -------------------------------------------------------------------------------- 1 | download_url = 
'http://www.rosinter.ru/locator/RestaurantsFeed.aspx?city=all&location=&lang=ru&brand=all&cuisine=all&metro=&hasDelivery=&isCorporate=' 2 | source = 'Rosinter' 3 | no_dataset_id = True 4 | max_distance = 500 5 | query = [('amenity', 'restaurant', 'cafe', 'bar', 'pub', 'fast_food')] 6 | overpass_timeout = 1000 7 | duplicate_distance = -1 8 | nearest_points = 30 9 | master_tags = ('name', 'phone', 'amenity') 10 | 11 | types = { 12 | # substr: osm_substr, amenity, cuisine 13 | 'Costa': ['costa', 'cafe', 'coffee_shop'], 14 | 'IL': [('patio', 'патио'), 'restaurant', 'italian'], 15 | 'TGI': [('tgi', 'friday'), 'restaurant', 'american'], 16 | 'Бар и': ['гриль', 'restaurant', 'american'], 17 | 'Макд': ['мак', 'fast_food', None], 18 | 'Раша': ['мама', 'fast_food', 'russian'], 19 | 'Планета': ['планета', 'restaurant', 'japanese'], 20 | 'Шика': ['шика', 'restaurant', 'asian'], 21 | 'Свои': ['сво', 'restaurant', None], 22 | } 23 | 24 | 25 | def matches(osmtags, ritags): 26 | global types 27 | rname = ritags['name'] 28 | name = osmtags.get('name', '').lower() 29 | for k, v in types.items(): 30 | if k in rname: 31 | if isinstance(v[0], str): 32 | return v[0] in name 33 | for n in v[0]: 34 | if n in name: 35 | return True 36 | return False 37 | logging.error('Unknown rname value: %s', rname) 38 | return False 39 | 40 | 41 | def dataset(f): 42 | global types 43 | from lxml import etree 44 | root = etree.parse(f).getroot() 45 | for el in root.find('Restaurants'): 46 | rid = el.find('id').text 47 | city = el.find('city').text 48 | if city in ('Прага', 'Будапешт', 'Варшава', 'Баку', 'Рига'): 49 | continue 50 | brand = el.find('brand').text 51 | if 'TGI' in brand: 52 | brand = 'TGI Fridays' 53 | elif 'СВОИ' in brand: 54 | brand = 'Свои' 55 | phone = el.find('telephone').text 56 | if phone: 57 | phone = phone.replace('(', '').replace(')', '') 58 | website = el.find('siteurl').text 59 | if website and 'il-patio' in website: 60 | website = 'http://ilpatio.ru' 61 | if 'Свои' in brand: 62 | website = 'http://restoransvoi.by' 63 | lat = float(el.find('latitude').text) 64 | lon = float(el.find('longitude').text) 65 | tags = { 66 | 'amenity': 'restaurant', 67 | 'name': brand, 68 | 'phone': phone, 69 | 'website': website, 70 | } 71 | address = el.find('address').text 72 | for k, v in types.items(): 73 | if k in brand: 74 | tags['amenity'] = v[1] 75 | tags['cuisine'] = v[2] 76 | yield SourcePoint( 77 | rid, lat, lon, tags, 78 | remarks='Обязательно подвиньте точку!\nАдрес: ' + str(address)) 79 | -------------------------------------------------------------------------------- /profiles/schocoladnitsa.py: -------------------------------------------------------------------------------- 1 | download_url = 'http://new.shoko.ru/addresses/' 2 | source = 'Шоколадница' 3 | no_dataset_id = True 4 | overpass_timeout = 600 5 | max_distance = 250 6 | max_request_boxes = 6 7 | query = [('amenity',), ('name', '~Шоколадница')] 8 | master_tags = ['amenity', 'name', 'name:ru', 'name:en', 'website', 'phone', 'opening_hours'] 9 | 10 | 11 | def dataset(fileobj): 12 | def parse_oh(s): 13 | if not s: 14 | return None 15 | olds = s 16 | if s.strip().lower() == 'круглосуточно': 17 | return '24/7' 18 | trans = { 19 | 'будни': 'Mo-Fr', 20 | 'суббота': 'Sa', 21 | 'воскресенье': 'Su', 22 | 'ежедневно': 'Mo-Su', 23 | 'выходные': 'Sa-Su', 24 | 'восерсенье': 'Su', 25 | 'ежеденевно': 'Mo-Su', 26 | 'пн-чтивс': 'Mo-Th,Su', 27 | 'пн-чт,вс': 'Mo-Th,Su', 28 | 'пт.-сб': 'Fr-Sa', 29 | 'вск.-чт': 'Su-Th', 30 | 'смаяпооктябрь': 'May-Oct', 31 | 
'ч.смаяпооктябрь': 'May-Oct', 32 | 'сентября': 'May-Sep', 33 | } 34 | weekdays = {'пн': 'Mo', 'вт': 'Tu', 'ср': 'We', 'чт': 'Th', 'пт': 'Fr', 'сб': 'Sa', 'вс': 'Su'} 35 | if s == 'с 10 до 22' or s == 'с 10.00-22.00': 36 | s = '10:00 - 22:00' 37 | s = s.replace('круглосуточно', '00:00-24:00') 38 | s = s.replace('23,', '23:00') 39 | parts = [] 40 | for m in re.finditer(r'([а-яА-Я ,.:\(\)-]+?)?(?:\sс)?\s*(\d?\d[:.]\d\d)(?: до |[^\w\d]+)(\d\d[:.]\d\d)', s): 41 | days = (m[1] or '').strip(' -.,:()').lower().replace(' ', '') 42 | m2 = re.match(r'^([б-ч]{2})\s?[,и-]\s?([б-ч]{2})$', days) 43 | if not days: 44 | days = 'Mo-Su' 45 | elif days in weekdays: 46 | days = weekdays[days] 47 | elif m2 and m2[1] in weekdays and m2[2] in weekdays: 48 | days = weekdays[m2[1]] + '-' + weekdays[m2[2]] 49 | else: 50 | if days not in trans: 51 | logging.warn('Unknown days: %s', days) 52 | continue 53 | days = trans[days] 54 | parts.append('{} {:0>5}-{}'.format(days, m[2].replace('.', ':'), m[3].replace('.', ':'))) 55 | # logging.info('%s -> %s', olds, '; '.join(parts)) 56 | if parts: 57 | return '; '.join(parts) 58 | return None 59 | 60 | from lxml import html 61 | import re 62 | import logging 63 | import phonenumbers 64 | h = html.fromstring(fileobj.read().decode('utf-8')) 65 | markers = h.get_element_by_id('markers') 66 | i = 0 67 | for m in markers: 68 | lat = m.get('data-lat') 69 | lon = m.get('data-lng') 70 | if not lat or not lon: 71 | continue 72 | oh = parse_oh(m.get('data-time')) 73 | phone = m.get('data-phone') 74 | if phone[:3] == '812': 75 | phone = '+7' + phone 76 | if ' 891' in phone: 77 | phone = phone[:phone.index(' 891')] 78 | if ' 8-91' in phone: 79 | phone = phone[:phone.index(' 8-91')] 80 | try: 81 | if phone == 'отключен' or not phone: 82 | phone = None 83 | else: 84 | parsed_phone = phonenumbers.parse(phone.replace(';', ',').split(',')[0], "RU") 85 | except: 86 | logging.info(phone) 87 | raise 88 | if phone is None: 89 | fphone = None 90 | else: 91 | fphone = phonenumbers.format_number( 92 | parsed_phone, phonenumbers.PhoneNumberFormat.INTERNATIONAL) 93 | tags = { 94 | 'amenity': 'cafe', 95 | 'name': 'Шоколадница', 96 | 'name:ru': 'Шоколадница', 97 | 'name:en': 'Shokoladnitsa', 98 | 'website': 'http://shoko.ru', 99 | 'cuisine': 'coffee_shop', 100 | 'phone': fphone, 101 | 'opening_hours': oh 102 | } 103 | i += 1 104 | yield SourcePoint(i, float(lat), float(lon), tags, remarks=m.get('data-title')) 105 | -------------------------------------------------------------------------------- /profiles/velobike.py: -------------------------------------------------------------------------------- 1 | # Where to get the latest feed 2 | download_url = 'http://www.velobike.ru/proxy/parkings/' 3 | # What to write for the changeset's source tag 4 | source = 'velobike.ru' 5 | # These two lines negate each other: 6 | dataset_id = 'velobike' 7 | # We actually do not use ref:velobike tag 8 | no_dataset_id = True 9 | # Overpass API query: [amenity="bicycle_rental"][network="Велобайк"] 10 | query = [('amenity', 'bicycle_rental'), ('network', 'Велобайк')] 11 | # Maximum lookup radius is 100 meters 12 | max_distance = 100 13 | # The overpass query chooses all relevant points, 14 | # so points that are not in the dataset should be deleted 15 | delete_unmatched = True 16 | # If delete_unmatched were False, we'd be retagging these parkings: 17 | tag_unmatched = { 18 | 'fixme': 'Проверить на местности: в данных велобайка отсутствует. 
Вероятно, демонтирована', 19 | 'amenity': None, 20 | 'was:amenity': 'bicycle_rental' 21 | } 22 | # Overwriting these tags 23 | master_tags = ('ref', 'capacity', 'capacity:electric', 'contact:email', 24 | 'contact:phone', 'contact:website', 'operator') 25 | 26 | 27 | def dataset(fileobj): 28 | import codecs 29 | import json 30 | import logging 31 | 32 | # Specifying utf-8 is important, otherwise you'd get "bytes" instead of "str" 33 | source = json.load(codecs.getreader('utf-8')(fileobj)) 34 | data = [] 35 | for el in source['Items']: 36 | try: 37 | gid = int(el['Id']) 38 | lon = el['Position']['Lon'] 39 | lat = el['Position']['Lat'] 40 | terminal = 'yes' if el['HasTerminal'] else 'no' 41 | tags = { 42 | 'amenity': 'bicycle_rental', 43 | 'network': 'Велобайк', 44 | 'ref': gid, 45 | 'capacity': el['TotalOrdinaryPlaces'], 46 | 'capacity:electric': el['TotalElectricPlaces'], 47 | 'contact:email': 'info@velobike.ru', 48 | 'contact:phone': '+7 495 966-46-69', 49 | 'contact:website': 'https://velobike.ru/', 50 | 'opening_hours': '24/7', 51 | 'operator': 'ЗАО «СитиБайк»', 52 | 'payment:cash': 'no', 53 | 'payment:troika': 'no', 54 | 'payment:mastercard': terminal, 55 | 'payment:visa': terminal, 56 | } 57 | try: 58 | lat = float(lat) 59 | lon = float(lon) 60 | data.append(SourcePoint(gid, lat, lon, tags)) 61 | except Exception as e: 62 | logging.warning('PROFILE: Failed to parse lat/lon for rental stand %s: %s', gid, str(e)) 63 | except Exception as e: 64 | logging.warning('PROFILE: Failed to get attributes for rental stand: %s', str(e)) 65 | return data 66 | -------------------------------------------------------------------------------- /profiles/yandex_parser.py: -------------------------------------------------------------------------------- 1 | from lxml import etree 2 | import logging 3 | import re 4 | import phonenumbers # https://pypi.python.org/pypi/phonenumberslite 5 | 6 | 7 | class Company: 8 | def __init__(self, cid): 9 | self.id = cid 10 | self.name = {} 11 | self.alt_name = {} 12 | self.address = {} 13 | self.country = {} 14 | self.address_add = {} 15 | self.opening_hours = None 16 | self.url = None 17 | self.url_add = None 18 | self.url_ext = None 19 | self.email = None 20 | self.rubric = [] 21 | self.phones = [] 22 | self.faxes = [] 23 | self.photos = [] 24 | self.lat = None 25 | self.lon = None 26 | self.other = {} 27 | 28 | 29 | def parse_feed(f): 30 | def multilang(c, name): 31 | for el in company.findall(name): 32 | lang = el.get('lang', 'default') 33 | value = el.text 34 | if value and len(value.strip()) > 0: 35 | c[lang] = value.strip() 36 | 37 | def parse_subels(el): 38 | res = {} 39 | if el is None: 40 | return res 41 | for subel in el: 42 | name = subel.tag 43 | text = subel.text 44 | if text and text.strip(): 45 | res[name] = text 46 | return res 47 | 48 | def parse_opening_hours(s): 49 | if 'углосуточн' in s: 50 | return '24/7' 51 | m = re.search(r'([01]?\d:\d\d).*?([12]?\d:\d\d)', s) 52 | if m: 53 | # TODO: parse weekdays 54 | start = m.group(1) 55 | start = re.sub(r'^(\d:)', r'0\1', start) 56 | end = m.group(2) 57 | end = re.sub(r'^0?0:', '24:', end) 58 | return 'Mo-Su {}-{}'.format(start, end) 59 | # TODO 60 | return None 61 |
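# The feed root is expected to be <companies>; each <company> child below is
# turned into a Company object. Entries without an id or coordinates are
# skipped, phone numbers are normalized via phonenumbers, and urls, rubrics,
# photos and feature-* attributes are copied onto the object before yielding.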
62 | xml = etree.parse(f).getroot() 63 | if xml.tag != 'companies': 64 | logging.error('Root node must be named "companies", not %s', xml.tag) 65 | for company in xml: 66 | if company.tag != 'company': 67 | logging.warning('Non-company in yandex xml: %s', company.tag) 68 | continue 69 | cid = company.find('company-id') 70 | if cid is None or not cid.text: 71 | logging.error('No id for a company') 72 | continue 73 | c = Company(cid.text.strip()) 74 | multilang(c.name, 'name') 75 | multilang(c.alt_name, 'name-other') 76 | multilang(c.address, 'address') 77 | loc = {} 78 | multilang(loc, 'locality-name') 79 | if loc: 80 | for lng, place in loc.items(): 81 | if lng in c.address: 82 | c.address[lng] = place + ', ' + c.address[lng] 83 | multilang(c.address_add, 'address-add') 84 | multilang(c.country, 'country') 85 | coord = parse_subels(company.find('coordinates')) 86 | if 'lat' in coord and 'lon' in coord: 87 | c.lat = float(coord['lat']) 88 | c.lon = float(coord['lon']) 89 | else: 90 | logging.warning('No coordinates for %s', c.id) 91 | continue 92 | for ph in company.findall('phone'): 93 | phone = parse_subels(ph) 94 | if 'number' not in phone: 95 | continue 96 | parsed_phone = phonenumbers.parse(phone['number'], 'RU') 97 | number = phonenumbers.format_number( 98 | parsed_phone, phonenumbers.PhoneNumberFormat.INTERNATIONAL) 99 | if 'ext' in phone: 100 | number += ' ext. ' + phone['ext'] 101 | typ = phone.get('type', 'phone') 102 | if typ == 'fax': 103 | c.faxes.append(number) 104 | else: 105 | c.phones.append(number) 106 | email = company.find('email') 107 | if email is not None and email.text: 108 | c.email = email.text.strip() 109 | url = company.find('url') 110 | if url is not None and url.text: 111 | c.url = url.text.strip() 112 | url_add = company.find('add-url') 113 | if url_add is not None and url_add.text: 114 | c.url_add = url_add.text.strip() 115 | url_ext = company.find('info-page') 116 | if url_ext is not None and url_ext.text: 117 | c.url_ext = url_ext.text.strip() 118 | for rub in company.findall('rubric-rd'): 119 | if rub.text: 120 | c.rubric.append(int(rub.text.strip())) 121 | coh = company.find('working-time') 122 | if coh is not None and coh.text: 123 | c.opening_hours = parse_opening_hours(coh.text) 124 | photos = company.find('photos') 125 | if photos is not None: 126 | for photo in photos: 127 | if photo.get('type', 'interior') != 'food': 128 | c.photos.append(photo.get('url')) 129 | for feat in company: 130 | if feat.tag.startswith('feature-'): 131 | name = feat.get('name', None) 132 | value = feat.get('value', None) 133 | if name is not None and value is not None: 134 | if feat.tag == 'feature-boolean': 135 | value = value == '1' 136 | elif '-numeric' in feat.tag: 137 | value = float(value) 138 | c.other[name] = value 139 | yield c 140 | -------------------------------------------------------------------------------- /scripts/README.md: -------------------------------------------------------------------------------- 1 | # Scripts 2 | 3 | Here are some scripts (one at the moment) to prepare data for the conflator 4 | or to post-process the results after conflating. 5 | 6 | ## pack_places.py 7 | 8 | Prepares the `places.bin` file for the geocoder. Requires three JSON files: 9 | 10 | * places.json 11 | * regions.json 12 | * countries.json 13 | 14 | These comprise the "places feed" and can be prepared using 15 | [these scripts](https://github.com/mapsme/geocoding_data). You can 16 | find a link to a ready-made feed in that repository. -------------------------------------------------------------------------------- /scripts/pack_places.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import json 3 | import struct 4 | import os 5 | import sys 6 | 7 | 8 | def pack_coord(coord): 9 | data = struct.pack('