├── .github └── ISSUE_TEMPLATE │ ├── bug_report.md │ └── feature_request.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── MANIFEST.in ├── README.md ├── _config.yml ├── benchmark └── Benchmark.ipynb ├── csv_schema_inference ├── __init__.py └── csv_schema_inference.py ├── googled57bdb220576a44a.html ├── pyproject.toml └── setup.cfg /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. 
I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 21 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | We as members, contributors, and leaders pledge to make participation in our 6 | community a harassment-free experience for everyone, regardless of age, body 7 | size, visible or invisible disability, ethnicity, sex characteristics, gender 8 | identity and expression, level of experience, education, socio-economic status, 9 | nationality, personal appearance, race, religion, or sexual identity 10 | and orientation. 11 | 12 | We pledge to act and interact in ways that contribute to an open, welcoming, 13 | diverse, inclusive, and healthy community. 
14 | 15 | ## Our Standards 16 | 17 | Examples of behavior that contributes to a positive environment for our 18 | community include: 19 | 20 | * Demonstrating empathy and kindness toward other people 21 | * Being respectful of differing opinions, viewpoints, and experiences 22 | * Giving and gracefully accepting constructive feedback 23 | * Accepting responsibility and apologizing to those affected by our mistakes, 24 | and learning from the experience 25 | * Focusing on what is best not just for us as individuals, but for the 26 | overall community 27 | 28 | Examples of unacceptable behavior include: 29 | 30 | * The use of sexualized language or imagery, and sexual attention or 31 | advances of any kind 32 | * Trolling, insulting or derogatory comments, and personal or political attacks 33 | * Public or private harassment 34 | * Publishing others' private information, such as a physical or email 35 | address, without their explicit permission 36 | * Other conduct which could reasonably be considered inappropriate in a 37 | professional setting 38 | 39 | ## Enforcement Responsibilities 40 | 41 | Community leaders are responsible for clarifying and enforcing our standards of 42 | acceptable behavior and will take appropriate and fair corrective action in 43 | response to any behavior that they deem inappropriate, threatening, offensive, 44 | or harmful. 45 | 46 | Community leaders have the right and responsibility to remove, edit, or reject 47 | comments, commits, code, wiki edits, issues, and other contributions that are 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation 49 | decisions when appropriate. 50 | 51 | ## Scope 52 | 53 | This Code of Conduct applies within all community spaces, and also applies when 54 | an individual is officially representing the community in public spaces. 
55 | Examples of representing our community include using an official e-mail address, 56 | posting via an official social media account, or acting as an appointed 57 | representative at an online or offline event. 58 | 59 | ## Enforcement 60 | 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 62 | reported to the community leaders responsible for enforcement at 63 | . 64 | All complaints will be reviewed and investigated promptly and fairly. 65 | 66 | All community leaders are obligated to respect the privacy and security of the 67 | reporter of any incident. 68 | 69 | ## Enforcement Guidelines 70 | 71 | Community leaders will follow these Community Impact Guidelines in determining 72 | the consequences for any action they deem in violation of this Code of Conduct: 73 | 74 | ### 1. Correction 75 | 76 | **Community Impact**: Use of inappropriate language or other behavior deemed 77 | unprofessional or unwelcome in the community. 78 | 79 | **Consequence**: A private, written warning from community leaders, providing 80 | clarity around the nature of the violation and an explanation of why the 81 | behavior was inappropriate. A public apology may be requested. 82 | 83 | ### 2. Warning 84 | 85 | **Community Impact**: A violation through a single incident or series 86 | of actions. 87 | 88 | **Consequence**: A warning with consequences for continued behavior. No 89 | interaction with the people involved, including unsolicited interaction with 90 | those enforcing the Code of Conduct, for a specified period of time. This 91 | includes avoiding interactions in community spaces as well as external channels 92 | like social media. Violating these terms may lead to a temporary or 93 | permanent ban. 94 | 95 | ### 3. Temporary Ban 96 | 97 | **Community Impact**: A serious violation of community standards, including 98 | sustained inappropriate behavior. 
99 | 100 | **Consequence**: A temporary ban from any sort of interaction or public 101 | communication with the community for a specified period of time. No public or 102 | private interaction with the people involved, including unsolicited interaction 103 | with those enforcing the Code of Conduct, is allowed during this period. 104 | Violating these terms may lead to a permanent ban. 105 | 106 | ### 4. Permanent Ban 107 | 108 | **Community Impact**: Demonstrating a pattern of violation of community 109 | standards, including sustained inappropriate behavior, harassment of an 110 | individual, or aggression toward or disparagement of classes of individuals. 111 | 112 | **Consequence**: A permanent ban from any sort of public interaction within 113 | the community. 114 | 115 | ## Attribution 116 | 117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 118 | version 2.0, available at 119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. 120 | 121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct 122 | enforcement ladder](https://github.com/mozilla/diversity). 123 | 124 | [homepage]: https://www.contributor-covenant.org 125 | 126 | For answers to common questions about this code of conduct, see the FAQ at 127 | https://www.contributor-covenant.org/faq. Translations are available at 128 | https://www.contributor-covenant.org/translations. 
129 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | CONTRIBUTING.md 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Ramses Alexander Coraspe Valdez 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | global-include *.* -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # **Csv Schema Inference** 2 | A tool to automatically infer column data types in .csv files 3 | 4 | ### Check the article here: Building a Schema Inference Data Pipeline for Large CSV files 5 | 6 |

7 | 10 |

11 | 12 | 13 |
14 | 15 | ## **Installing csv-schema-inference** 🔧 16 | 17 |
18 | 19 |
20 | 21 | ``` python 22 | pip install csv-schema-inference 23 | ``` 24 | 25 |
26 | 27 | Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ 28 | Collecting csv-schema-inference 29 | Downloading csv_schema_inference-0.0.9-py3-none-any.whl (7.3 kB) 30 | Installing collected packages: csv-schema-inference 31 | Successfully installed csv-schema-inference-0.0.9 32 | 33 |
34 | 35 |
36 | 37 |
38 | 39 | ## **Importing csv-schema-inference library** ⚡ 40 | 41 |
42 | 43 |
44 | 45 | ``` python 46 | from csv_schema_inference import csv_schema_inference 47 | ``` 48 | 49 |
50 | 51 |
52 | 53 | ## **Setting csv-schema-inference configuration** ✍ 54 | 55 |
56 | 57 |
58 | 59 | ``` python 60 | 61 | # If the inferred data type is INTEGER but FLOAT values were also found in the column, the result will be FLOAT 62 | conditions = {"INTEGER":"FLOAT"} 63 | 64 | csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.9, max_length=100, batch_size = 200000, acc = 0.8, seed=2, header=True, sep=",", conditions = conditions) 65 | pathfile = "/content/file__500k.csv" 66 | ``` 67 | 68 |
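The `conditions` mapping can be read as a promotion rule. The following is only a minimal sketch of that idea, not the library's actual implementation: `resolve_type` is a hypothetical helper and the counts are made-up illustration values.

```python
from typing import Dict


def resolve_type(type_counts: Dict[str, int], conditions: Dict[str, str]) -> str:
    """Pick the most frequent type, then apply a promotion rule if one matches.

    Hypothetical helper for illustration; the real library may decide differently.
    """
    # Most frequent type among the sampled values.
    dominant = max(type_counts, key=lambda name: type_counts[name])
    # Promote only when the target type was actually observed in the column.
    target = conditions.get(dominant)
    if target is not None and type_counts.get(target, 0) > 0:
        return target
    return dominant


conditions = {"INTEGER": "FLOAT"}
# Mostly integers with some floats: the rule promotes the column to FLOAT.
print(resolve_type({"INTEGER": 900, "FLOAT": 100}, conditions))
# Only integers were seen: the column stays INTEGER.
print(resolve_type({"INTEGER": 1000}, conditions))
```

The rule is deliberately one-directional: a column that is mostly FLOAT is never demoted to INTEGER.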
69 | 70 |
71 | 72 | ## **Run inference** 🏃 73 | 74 |
75 | 76 |
77 | 78 | ``` python 79 | aprox_schema = csv_infer.run_inference(pathfile) 80 | ``` 81 | 82 |
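To give an intuition for what an inference pass has to do for each cell, here is a minimal sketch that classifies raw CSV strings and tallies the types per column. It is an assumption-laden illustration, not the code behind `run_inference`: `detect_type` and `count_types` are hypothetical names, and the real library may recognize more types or use different rules.

```python
from collections import Counter
from typing import Iterable


def detect_type(value: str) -> str:
    """Classify one raw CSV cell as BOOLEAN, INTEGER, FLOAT, or STRING."""
    text = value.strip()
    if text.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(text)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(text)
        return "FLOAT"
    except ValueError:
        return "STRING"


def count_types(values: Iterable[str]) -> Counter:
    """Tally detected types, skipping empty cells (treated here as nulls)."""
    return Counter(detect_type(v) for v in values if v.strip() != "")


print(count_types(["1", "2", "2.5", "", "7"]))
```

Note that Python's `int()` and `float()` are permissive (for example, `float("nan")` succeeds), so a production implementation would likely need stricter checks.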
83 | 84 |
85 | 86 | ## **Showing the approximate data type inference for each column** 🔍 87 | 88 |
89 | 90 |
91 | 92 | ``` python 93 | csv_infer.pretty(aprox_schema) 94 | ``` 95 | 96 |
97 | 98 | 0 99 | name 100 | id 101 | type 102 | INTEGER 103 | nullable 104 | False 105 | 1 106 | name 107 | full_name 108 | type 109 | STRING 110 | nullable 111 | True 112 | 2 113 | name 114 | age 115 | type 116 | INTEGER 117 | nullable 118 | False 119 | 3 120 | name 121 | city 122 | type 123 | STRING 124 | nullable 125 | True 126 | 4 127 | name 128 | weight 129 | type 130 | FLOAT 131 | nullable 132 | False 133 | 5 134 | name 135 | height 136 | type 137 | FLOAT 138 | nullable 139 | False 140 | 6 141 | name 142 | isActive 143 | type 144 | BOOLEAN 145 | nullable 146 | False 147 | 7 148 | name 149 | col_int1 150 | type 151 | INTEGER 152 | nullable 153 | False 154 | 8 155 | name 156 | col_int2 157 | type 158 | INTEGER 159 | nullable 160 | False 161 | 9 162 | name 163 | col_int3 164 | type 165 | INTEGER 166 | nullable 167 | False 168 | 10 169 | name 170 | col_float1 171 | type 172 | FLOAT 173 | nullable 174 | False 175 | 11 176 | name 177 | col_float2 178 | type 179 | FLOAT 180 | nullable 181 | False 182 | 12 183 | name 184 | col_float3 185 | type 186 | FLOAT 187 | nullable 188 | False 189 | 13 190 | name 191 | col_float4 192 | type 193 | FLOAT 194 | nullable 195 | False 196 | 14 197 | name 198 | col_float5 199 | type 200 | FLOAT 201 | nullable 202 | False 203 | 15 204 | name 205 | col_float6 206 | type 207 | FLOAT 208 | nullable 209 | False 210 | 16 211 | name 212 | col_float7 213 | type 214 | FLOAT 215 | nullable 216 | False 217 | 17 218 | name 219 | col_float8 220 | type 221 | FLOAT 222 | nullable 223 | False 224 | 18 225 | name 226 | col_float9 227 | type 228 | FLOAT 229 | nullable 230 | False 231 | 19 232 | name 233 | col_float10 234 | type 235 | FLOAT 236 | nullable 237 | False 238 | 20 239 | name 240 | test_column 241 | type 242 | FLOAT 243 | nullable 244 | False 245 | 246 |
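One possible use of this result is to turn it into a dtype mapping for loading the file. The sketch below assumes the schema is a dict keyed by column position whose entries have `name`, `type`, and `nullable` keys, as the printed output suggests; that shape, the `PANDAS_DTYPES` table, and `to_pandas_dtypes` are assumptions for illustration, not part of the library.

```python
# Assumed mapping from inferred type names to pandas dtype strings.
PANDAS_DTYPES = {
    "INTEGER": "Int64",    # nullable integer extension dtype
    "FLOAT": "float64",
    "BOOLEAN": "boolean",  # nullable boolean extension dtype
    "STRING": "string",
}


def to_pandas_dtypes(schema: dict) -> dict:
    """Build a {column_name: dtype} mapping from an inferred schema dict."""
    return {
        col["name"]: PANDAS_DTYPES.get(col["type"], "object")
        for col in schema.values()
    }


# Hand-written sample that mirrors the first columns of the output above.
sample_schema = {
    0: {"name": "id", "type": "INTEGER", "nullable": False},
    1: {"name": "full_name", "type": "STRING", "nullable": True},
    4: {"name": "weight", "type": "FLOAT", "nullable": False},
    6: {"name": "isActive", "type": "BOOLEAN", "nullable": False},
}
dtypes = to_pandas_dtypes(sample_schema)
print(dtypes)
# With pandas installed, the mapping could then be passed along, e.g.:
# df = pd.read_csv(pathfile, dtype=dtypes)
```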
247 | 248 |
249 | 250 |
251 | 252 | ## **Checking schema values for specific columns** ✔ 253 | 254 |
255 | 256 |
257 | 258 | ``` python 259 | result = csv_infer.get_schema_columns(columns = {"test_column"}) 260 | csv_infer.pretty(result) 261 | ``` 262 | 263 |
264 | 265 | 20 266 | _name 267 | test_column 268 | types_found 269 | INTEGER 270 | cnt 271 | 406130 272 | FLOAT 273 | cnt 274 | 50964 275 | nullable 276 | False 277 | type 278 | FLOAT 279 | 280 |
281 | 282 |
283 | 284 |
285 | 286 | ## **Explore all possible data types for a specific column** ✅ 287 | 288 |
289 | 290 |
291 | 292 | ``` python 293 | result = csv_infer.explore_schema_column(column = "test_column") 294 | csv_infer.pretty(result) 295 | ``` 296 | 297 |
298 | 299 | 20 300 | name 301 | test_column 302 | types_found 303 | INTEGER 304 | 88.85043339006856 305 | FLOAT 306 | 11.149566609931437 307 | nullable 308 | False 309 | 310 |
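These percentages appear to be each type's share of the values counted for the column. As a sanity check, the sketch below recomputes them from the counts printed by `get_schema_columns` for `test_column` (406130 INTEGER, 50964 FLOAT); `type_percentages` is a hypothetical helper, not a library function.

```python
def type_percentages(type_counts: dict) -> dict:
    """Convert per-type counts into percentages of all counted values."""
    total = sum(type_counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * cnt / total for name, cnt in type_counts.items()}


# Counts taken from the get_schema_columns output for test_column.
counts = {"INTEGER": 406130, "FLOAT": 50964}
shares = type_percentages(counts)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```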
311 | 312 |
313 | 314 | ## Benchmark 315 | The tests used 9 .csv files with 21 columns each and different sizes and record counts. Each process (shuffling and inference) was run 5 times per file, and the average shuffle time and inferring time were calculated. 316 | 317 | - file__20m.csv: 20 million records 318 | - file__15m.csv: 15 million records 319 | - file__12m.csv: 12 million records 320 | - file__10m.csv: 10 million records 321 | - And so on... 322 | 323 | To learn more about the shuffling process, see this other repository: A tool to automatically Shuffle lines in .csv files. The shuffling process helps us to: 324 | 325 | 1. Increase the probability of finding all the data types present in a single column. 326 | 2. Avoid iterating over the entire dataset. 327 | 3. Avoid biases that come from how the data happens to be ordered, since we do not know how the file was constructed. 328 | 329 |
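The effect of shuffling on a partial read can be sketched with a toy column. This is an illustration under stated assumptions, not the shuffling tool itself: the data is synthetic, `types_in_head` is a hypothetical helper, and the 10% portion is chosen only to make the effect visible.

```python
import random


def types_in_head(rows: list, portion: float) -> set:
    """Return the set of types seen in the first `portion` of the rows."""
    head = rows[: int(len(rows) * portion)]
    # Toy classifier: a decimal point means FLOAT, anything else INTEGER.
    return {"FLOAT" if "." in value else "INTEGER" for value in head}


# Synthetic column: 500 integers followed by 500 floats (ordered on purpose).
rows = [str(i) for i in range(500)] + [f"{i}.5" for i in range(500)]

# Reading the first 10% of the ordered file only ever sees integers.
before = types_in_head(rows, 0.1)

# After a seeded shuffle, the same 10% is a random sample of the whole column.
shuffled = rows[:]
random.Random(2).shuffle(shuffled)
after = types_in_head(shuffled, 0.1)

print("ordered:", before)
print("shuffled:", after)
```

With the ordered file, a partial read misses FLOAT entirely; after shuffling, a sample of the same size almost certainly contains both types, which is why a portion of the file can stand in for the whole.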

330 | 333 |

334 | 335 | ## Contributing and Feedback 336 | Any ideas or feedback about this repository? Help me improve it. 337 | 338 | ## Authors 339 | - Created by Ramses Alexander Coraspe Valdez 340 | - Created in 2022 341 | 342 | ## License 343 | This project is licensed under the terms of the MIT License. 344 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman 2 | title: "A tool to automatically infer columns data types in .csv files" 3 | description: "A parallel implementation of Schema inference using python" 4 | author: "Ramses Alexander Coraspe Valdez" 5 | -------------------------------------------------------------------------------- /benchmark/Benchmark.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Benchmark.ipynb", 7 | "provenance": [], 8 | "collapsed_sections": [] 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "code", 21 | "source": [ 22 | "import pandas as pd" 23 | ], 24 | "metadata": { 25 | "id": "NrcJWz22npeq" 26 | }, 27 | "execution_count": 99, 28 | "outputs": [] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 100, 33 | "metadata": { 34 | "id": "XyKNLam6m4Vv" 35 | }, 36 | "outputs": [], 37 | "source": [ 38 | "benchmark_data = [ \n", 39 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 73.00043550, 'inferring_time': 111.56356970},\n", 40 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 72.97213800, 'inferring_time': 115.25191430},\n", 41 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 82.32063370, 'inferring_time': 
116.76299740},\n", 42 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 77.67622630, 'inferring_time': 114.59385790},\n", 43 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 73.26938180, 'inferring_time': 112.55643420},\n", 44 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 55.82634800, 'inferring_time': 74.62251340},\n", 45 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.93429800, 'inferring_time': 71.26189710},\n", 46 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.87042450, 'inferring_time': 69.13962730},\n", 47 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.14651010, 'inferring_time': 71.23978310},\n", 48 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.21968120, 'inferring_time': 69.67053280},\n", 49 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 32.48983010, 'inferring_time': 58.08111770},\n", 50 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 34.64318280, 'inferring_time': 57.98930810},\n", 51 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 34.85442540, 'inferring_time': 57.71942010},\n", 52 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 33.38362710, 'inferring_time': 59.86055910},\n", 53 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 32.79728820, 'inferring_time': 57.41156370},\n", 54 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 28.28831460, 'inferring_time': 53.78283170},\n", 55 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 30.25130520, 'inferring_time': 51.21287500},\n", 56 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 29.83213370, 'inferring_time': 53.01958860},\n", 57 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 
'shuffle_time': 30.21982290, 'inferring_time': 51.81474830},\n", 58 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 29.52344140, 'inferring_time': 58.40408200},\n", 59 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.60465530, 'inferring_time': 44.68717590},\n", 60 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 23.84743100, 'inferring_time': 42.68867510},\n", 61 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.94851320, 'inferring_time': 46.96807710},\n", 62 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.77527450, 'inferring_time': 42.62858490},\n", 63 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.20869720, 'inferring_time': 42.98606580},\n", 64 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 15.88705860, 'inferring_time': 28.34111610},\n", 65 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.08761300, 'inferring_time': 29.42147060},\n", 66 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 16.48110200, 'inferring_time': 29.21088670},\n", 67 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.10600270, 'inferring_time': 28.82191680},\n", 68 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.17415740, 'inferring_time': 29.26859480},\n", 69 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.47866530, 'inferring_time': 19.30165580},\n", 70 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 12.42761710, 'inferring_time': 19.83578670},\n", 71 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.44712670, 'inferring_time': 21.38865030},\n", 72 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.67422640, 'inferring_time': 23.90071370},\n", 73 | " {'filename': 'file__4m.csv', 'file_size': 
617284424, 'shuffle_time': 11.75241490, 'inferring_time': 23.17653020},\n", 74 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.56659010, 'inferring_time': 9.77755900},\n", 75 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.37670290, 'inferring_time': 9.85879350},\n", 76 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.50792340, 'inferring_time': 9.83664550},\n", 77 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.77451570, 'inferring_time': 9.72117910},\n", 78 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.56910340, 'inferring_time': 9.84671710},\n", 79 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.42946810, 'inferring_time': 4.65625420},\n", 80 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.38822270, 'inferring_time': 5.17744930},\n", 81 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.74428740, 'inferring_time': 4.82960490},\n", 82 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.58021890, 'inferring_time': 5.17412620},\n", 83 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.67854850, 'inferring_time': 5.08991410}\n", 84 | "]" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "source": [ 90 | "df = pd.DataFrame(benchmark_data)\n", 91 | "df" 92 | ], 93 | "metadata": { 94 | "colab": { 95 | "base_uri": "https://localhost:8080/", 96 | "height": 1000 97 | }, 98 | "id": "R9ABJG7qnohL", 99 | "outputId": "fb7f4f3c-6542-44cb-9d1c-4ce44d9537bb" 100 | }, 101 | "execution_count": 102, 102 | "outputs": [ 103 | { 104 | "output_type": "execute_result", 105 | "data": { 106 | "text/plain": [ 107 | " filename file_size shuffle_time inferring_time\n", 108 | "0 file__20m.csv 3100848423 73.000435 111.563570\n", 109 | "1 file__20m.csv 3100848423 72.972138 115.251914\n", 110 | "2 file__20m.csv 3100848423 82.320634 
116.762997\n", 111 | "3 file__20m.csv 3100848423 77.676226 114.593858\n", 112 | "4 file__20m.csv 3100848423 73.269382 112.556434\n", 113 | "5 file__15m.csv 2322887546 55.826348 74.622513\n", 114 | "6 file__15m.csv 2322887546 42.934298 71.261897\n", 115 | "7 file__15m.csv 2322887546 42.870424 69.139627\n", 116 | "8 file__15m.csv 2322887546 42.146510 71.239783\n", 117 | "9 file__15m.csv 2322887546 42.219681 69.670533\n", 118 | "10 file__12m.csv 1856118441 32.489830 58.081118\n", 119 | "11 file__12m.csv 1856118441 34.643183 57.989308\n", 120 | "12 file__12m.csv 1856118441 34.854425 57.719420\n", 121 | "13 file__12m.csv 1856118441 33.383627 59.860559\n", 122 | "14 file__12m.csv 1856118441 32.797288 57.411564\n", 123 | "15 file__10m.csv 1544899668 28.288315 53.782832\n", 124 | "16 file__10m.csv 1544899668 30.251305 51.212875\n", 125 | "17 file__10m.csv 1544899668 29.832134 53.019589\n", 126 | "18 file__10m.csv 1544899668 30.219823 51.814748\n", 127 | "19 file__10m.csv 1544899668 29.523441 58.404082\n", 128 | "20 file__8m.csv 1235682644 22.604655 44.687176\n", 129 | "21 file__8m.csv 1235682644 23.847431 42.688675\n", 130 | "22 file__8m.csv 1235682644 22.948513 46.968077\n", 131 | "23 file__8m.csv 1235682644 22.775274 42.628585\n", 132 | "24 file__8m.csv 1235682644 22.208697 42.986066\n", 133 | "25 file__6m.csv 926480055 15.887059 28.341116\n", 134 | "26 file__6m.csv 926480055 17.087613 29.421471\n", 135 | "27 file__6m.csv 926480055 16.481102 29.210887\n", 136 | "28 file__6m.csv 926480055 17.106003 28.821917\n", 137 | "29 file__6m.csv 926480055 17.174157 29.268595\n", 138 | "30 file__4m.csv 617284424 11.478665 19.301656\n", 139 | "31 file__4m.csv 617284424 12.427617 19.835787\n", 140 | "32 file__4m.csv 617284424 11.447127 21.388650\n", 141 | "33 file__4m.csv 617284424 11.674226 23.900714\n", 142 | "34 file__4m.csv 617284424 11.752415 23.176530\n", 143 | "35 file__2m.csv 308078962 5.566590 9.777559\n", 144 | "36 file__2m.csv 308078962 5.376703 9.858794\n", 145 | "37 
file__2m.csv 308078962 5.507923 9.836645\n", 146 | "38 file__2m.csv 308078962 5.774516 9.721179\n", 147 | "39 file__2m.csv 308078962 5.569103 9.846717\n", 148 | "40 file__1m.csv 153491820 2.429468 4.656254\n", 149 | "41 file__1m.csv 153491820 2.388223 5.177449\n", 150 | "42 file__1m.csv 153491820 2.744287 4.829605\n", 151 | "43 file__1m.csv 153491820 2.580219 5.174126\n", 152 | "44 file__1m.csv 153491820 2.678549 5.089914" 153 | ], 154 | "text/html": [ 155 | "\n", 156 | "
         filename   file_size  shuffle_time  inferring_time
 0  file__20m.csv  3100848423     73.000435      111.563570
 1  file__20m.csv  3100848423     72.972138      115.251914
 2  file__20m.csv  3100848423     82.320634      116.762997
 3  file__20m.csv  3100848423     77.676226      114.593858
 4  file__20m.csv  3100848423     73.269382      112.556434
 5  file__15m.csv  2322887546     55.826348       74.622513
 6  file__15m.csv  2322887546     42.934298       71.261897
 7  file__15m.csv  2322887546     42.870424       69.139627
 8  file__15m.csv  2322887546     42.146510       71.239783
 9  file__15m.csv  2322887546     42.219681       69.670533
10  file__12m.csv  1856118441     32.489830       58.081118
11  file__12m.csv  1856118441     34.643183       57.989308
12  file__12m.csv  1856118441     34.854425       57.719420
13  file__12m.csv  1856118441     33.383627       59.860559
14  file__12m.csv  1856118441     32.797288       57.411564
15  file__10m.csv  1544899668     28.288315       53.782832
16  file__10m.csv  1544899668     30.251305       51.212875
17  file__10m.csv  1544899668     29.832134       53.019589
18  file__10m.csv  1544899668     30.219823       51.814748
19  file__10m.csv  1544899668     29.523441       58.404082
20   file__8m.csv  1235682644     22.604655       44.687176
21   file__8m.csv  1235682644     23.847431       42.688675
22   file__8m.csv  1235682644     22.948513       46.968077
23   file__8m.csv  1235682644     22.775274       42.628585
24   file__8m.csv  1235682644     22.208697       42.986066
25   file__6m.csv   926480055     15.887059       28.341116
26   file__6m.csv   926480055     17.087613       29.421471
27   file__6m.csv   926480055     16.481102       29.210887
28   file__6m.csv   926480055     17.106003       28.821917
29   file__6m.csv   926480055     17.174157       29.268595
30   file__4m.csv   617284424     11.478665       19.301656
31   file__4m.csv   617284424     12.427617       19.835787
32   file__4m.csv   617284424     11.447127       21.388650
33   file__4m.csv   617284424     11.674226       23.900714
34   file__4m.csv   617284424     11.752415       23.176530
35   file__2m.csv   308078962      5.566590        9.777559
36   file__2m.csv   308078962      5.376703        9.858794
37   file__2m.csv   308078962      5.507923        9.836645
38   file__2m.csv   308078962      5.774516        9.721179
39   file__2m.csv   308078962      5.569103        9.846717
40   file__1m.csv   153491820      2.429468        4.656254
41   file__1m.csv   153491820      2.388223        5.177449
42   file__1m.csv   153491820      2.744287        4.829605
43   file__1m.csv   153491820      2.580219        5.174126
44   file__1m.csv   153491820      2.678549        5.089914
\n", 500 | "
\n", 501 | " \n", 511 | " \n", 512 | " \n", 549 | "\n", 550 | " \n", 574 | "
\n", 575 | "
\n", 576 | " " 577 | ] 578 | }, 579 | "metadata": {}, 580 | "execution_count": 102 581 | } 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "source": [ 587 | "df['file_size'] = round((df['file_size'] / 1e+9), 3)" 588 | ], 589 | "metadata": { 590 | "id": "VBxKzFLjuyfL" 591 | }, 592 | "execution_count": 103, 593 | "outputs": [] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "source": [ 598 | "df = df.groupby(['file_size', 'filename'], sort = False)['shuffle_time', 'inferring_time'].mean()\n", 599 | "df" 600 | ], 601 | "metadata": { 602 | "colab": { 603 | "base_uri": "https://localhost:8080/", 604 | "height": 398 605 | }, 606 | "id": "fDBxYlEbn3TB", 607 | "outputId": "98af830a-86f4-435f-ac3e-91ec5f41b0da" 608 | }, 609 | "execution_count": 105, 610 | "outputs": [ 611 | { 612 | "output_type": "stream", 613 | "name": "stderr", 614 | "text": [ 615 | "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.\n", 616 | " \"\"\"Entry point for launching an IPython kernel.\n" 617 | ] 618 | }, 619 | { 620 | "output_type": "execute_result", 621 | "data": { 622 | "text/plain": [ 623 | " shuffle_time inferring_time\n", 624 | "file_size filename \n", 625 | "3.101 file__20m.csv 75.847763 114.145755\n", 626 | "2.323 file__15m.csv 45.199452 71.186871\n", 627 | "1.856 file__12m.csv 33.633671 58.212394\n", 628 | "1.545 file__10m.csv 29.623004 53.646825\n", 629 | "1.236 file__8m.csv 22.876914 43.991716\n", 630 | "0.926 file__6m.csv 16.747187 29.012797\n", 631 | "0.617 file__4m.csv 11.756010 21.520667\n", 632 | "0.308 file__2m.csv 5.558967 9.808179\n", 633 | "0.153 file__1m.csv 2.564149 4.985470" 634 | ], 635 | "text/html": [ 636 | "\n", 637 | "
\n", 638 | "
\n", 639 | "
\n", 640 | "\n", 653 | "\n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 679 | " \n", 680 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 696 | " \n", 697 | " \n", 698 | " \n", 699 | " \n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | "
                         shuffle_time  inferring_time
file_size filename
3.101     file__20m.csv     75.847763      114.145755
2.323     file__15m.csv     45.199452       71.186871
1.856     file__12m.csv     33.633671       58.212394
1.545     file__10m.csv     29.623004       53.646825
1.236     file__8m.csv      22.876914       43.991716
0.926     file__6m.csv      16.747187       29.012797
0.617     file__4m.csv      11.756010       21.520667
0.308     file__2m.csv       5.558967        9.808179
0.153     file__1m.csv       2.564149        4.985470
\n", 725 | "
\n", 726 | " \n", 736 | " \n", 737 | " \n", 774 | "\n", 775 | " \n", 799 | "
\n", 800 | "
\n", 801 | " " 802 | ] 803 | }, 804 | "metadata": {}, 805 | "execution_count": 105 806 | } 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "source": [ 812 | "df.reset_index(inplace=True)" 813 | ], 814 | "metadata": { 815 | "id": "zh6NHO3csESF" 816 | }, 817 | "execution_count": 106, 818 | "outputs": [] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "source": [ 823 | "import matplotlib.pyplot as plt\n", 824 | "import seaborn as sns\n", 825 | "\n", 826 | "plt.figure(figsize=(15,4))\n", 827 | "\n", 828 | "sns.set(style='white')\n", 829 | "\n", 830 | "df.set_index(['file_size', 'filename']).plot(kind='bar', stacked=True, color=sns.set_palette(\"colorblind\"))\n", 831 | "\n", 832 | "plt.title('Time taken by Shuffle Time & Inferring Time', fontsize=12)\n", 833 | "\n", 834 | "plt.xlabel('Files Sizes (Gigabytes)')\n", 835 | "plt.ylabel('Time (Seconds)')\n", 836 | "\n", 837 | "plt.xticks(rotation=90)" 838 | ], 839 | "metadata": { 840 | "colab": { 841 | "base_uri": "https://localhost:8080/", 842 | "height": 464 843 | }, 844 | "id": "ikFD4xeysAVV", 845 | "outputId": "60dbaf2d-252f-4f2d-b2a2-ec9caa86909b" 846 | }, 847 | "execution_count": 107, 848 | "outputs": [ 849 | { 850 | "output_type": "execute_result", 851 | "data": { 852 | "text/plain": [ 853 | "(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),\n", 854 | " )" 855 | ] 856 | }, 857 | "metadata": {}, 858 | "execution_count": 107 859 | }, 860 | { 861 | "output_type": "display_data", 862 | "data": { 863 | "text/plain": [ 864 | "
" 865 | ] 866 | }, 867 | "metadata": {} 868 | }, 869 | { 870 | "output_type": "display_data", 871 | "data": { 872 | "text/plain": [ 873 | "
" 874 | ], 875 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAGLCAYAAADUPKXyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd1gU59rH8S9VVEhQVIIltthNAoqiMTbsRgRbbLH3htEDUVFRiVGxR49GTYw5iSQaGwgWEhu22PVYYyIRUVFQsADSd98/eNnjSnFBYAa9P9fldbmzu/P8mB32Zp7nmRkjrVarRQghhHgJY6UDCCGEKBqkYAghhDCIFAwhhBAGkYIhhBDCIFIwhBBCGEQKhhBCCINIwVCBTz75hJMnTyodI5OTJ0/SokWLAm9nwIABbNmypcDbmTp1KsuWLcvz+5ctW4aTkxPNmjUD4Pfff6dly5Y4ODhw9epVnJ2dOX78eH7FBeDMmTN06NAhX9epFK1Wy7Rp02jUqBE9e/bMl3VGRETg4OBAWlpavqzPUN7e3qxatapQ21QDU6UDvAkcHBx0/09ISMDc3BwTExMA5syZw65duwolx8qVK7l16xaLFy8ulPaUsGXLFtavX09kZCTFixenXr16LFu2DEtLy1dab0REBBs2bODgwYPY2NgA4Ovry8yZM2nbtm2e17tz505mzZoFQFpaGsnJyRQvXlz3/Pnz5wkODn6l7Lmxfv16NmzYQGJiInXq1OHbb7/FwsIi29dPnToVW1tbJk2a9NJ1nz17lmPHjhESEkKJEiXyJW/58uU5f/58vqzrecOHD+fs2bMAJCcnY2RkhJmZGQAuLi74+Pjke5tFgRSMQvD8Du3s7MzcuXP56KOPFEz0ejp16hTLli3ju+++o27dujx+/JiDBw/my7ojIiKwtrbWFYuMZTVq1Hil9Xbt2pWuXbsC6Ud0np6eHD58+JXWmVehoaEsX76crVu3Ur16dc6ePYuxcf51Qty9e5cKFSrkqVikpqZiamr60mX55bvvvtP9PzdF8XUnXVIq8HxXxsqVK3F3d8fDwwMHBwdcXFy4efMma9eupWnTprRs2ZKjR4/q3hsbG4uXlxcff/wxzZs3Z9myZVkenh8+fJi1a9eyZ88eHBwcdF9S27Zto1OnTjg4ONCmTRs2bdqUbc4ff/yRzp07c//+fZKTk/H19aVVq1Z89NFHeHt7k5iYCPyvK+v777+nadOmfPzxx2zbti3HbRAeHk7Pnj1p0KABY8aM4fHjxwCMHDmSn376Se+1Li4u/P7775nWcenSJezt7albty4A1tbWdOvWTe/o4unTp4wcORIHBwd69epFeHg4AHfu3KFWrVqkpqbqXpvRVXb8+HGGDh1KVFQUDg4OTJ48WdcN4urqmuURhkajYd26dbRt2xYnJycmTpyo+5ly48VuQWdnZ7777jtcXFywt7fHy8uLhw8fMnz4cBwcHBg8eDBPnjzRvf7ChQv06dMHR0dHunbtmmPXp6mpKSYmJlSoUAFTU1OcnJwwNzc3OGvGNtyxYwetWrXCycmJb775Bkg/8psxYwYXLlzAwcGBFStWAHDw4EFcXV1xdHSkT58+/Pnnn3o/67p163Q/661bt6hVqxZbtmyhVatWDBo0KNPnNmDAAJYvX06fPn1wcHBg6NChxMTE6Nbp7+9P69atcXJyYtWqVXnuRny+ezPjM/r22291+/u+ffsICQmhQ4cONG7cmDVr1ujem1/7hhKkYKhQxi/R6dOnqVOnDsOGDUOj0XD48GHGjRuHt7e37rVTp07F1NSU3377DX9/f44dO5bleECLFi0YNWoUnTp14vz58+zcuRMAGxsb1q5dy7lz55g/fz7z58/nypUrmd7/73//mx07drBx40beeecdFi9ezM2bN/H39+e3334jKipKr0/34cOHxMbGcvjwYb766it8f
Hz0vshe5O/vz7x58zh69CimpqbMnTsXADc3N11WgD///JOoqChatmyZaR0ffvghR48eZcWKFZw9e5bk5ORMr9m9ezfjx4/n9OnTvPvuuwaNaXz00Ud8++23lCtXjvPnz7N06VLdUWNAQAD79u3L9J6ffvqJffv2sXHjRo4cOcLbb7+db90Yv/32Gxs2bCA4OJiDBw8yYsQIJk+ezIkTJ9BoNLoCGxkZyahRoxgzZgynTp1iypQpuLu7632BPs/GxobSpUvj7u5OUlJSnvOdPXuWvXv38p///IdVq1YRGhpKr169mDNnDvb29pw/fx53d3euXr2Kl5cXPj4+nDx5kt69ezN27Fi9z23Xrl2sW7eOM2fO6LpxT58+ze7du1m/fn2W7QcFBTF//nz++OMPUlJS+P777wG4ceMGc+bMYdGiRRw5coS4uDgiIyPz/HM+7+HDhyQlJXH48GHc3d2ZMWMGO3fuZNu2bfj5+bF69Wpu374NFOy+UdCkYKiQo6MjzZs3x9TUlI4dO/Lo0SNGjhyJmZkZnTt35u7duzx9+pSHDx8SEhKCl5cXJUqUwMbGhsGDB+dqTKRVq1a8++67GBkZ0bhxY5o1a8aZM2d0z2u1WubPn8+xY8f48ccfKV26NFqtll9//RUvLy+sra2xtLRk1KhReu2ampoybtw4zMzMaNmyJSVKlODmzZvZ5nB1daVmzZqUKFGCiRMnsnfvXtLS0mjTpg1hYWGEhYUB6V/QnTp1yvIvX0dHR1auXMnVq1cZNWoUTk5OzJ8/X++Iq23btnzwwQeYmprStWtXrl27ZvC2yo1NmzYxadIk3nnnHczNzRk/fjzBwcF6RzB59dlnn1GmTBlsbW1xdHTkgw8+oG7duhQrVox27dpx9epVIH1btWjRgpYtW2JsbEyzZs2oX78+ISEhWa534sSJ9O7dm8qVKzN27Fhd0fDw8Mh0lJeT8ePHY2FhQe3ataldu7beUcPzNm/eTO/evfnwww8xMTGhW7dumJmZceHCBd1rBgwYgJ2dnd44yoQJEyhRokS2Yyvdu3enatWqWFhY0LFjR91nvHfvXlq3bo2joyPm5ua4u7tjZGRk8M+VE1NTU8aMGaP7HX306BEDBw7E0tKSGjVq8N5773H9+nWgYPeNgiZjGCr0fD+5hYUFpUqV0v11lfFL8uzZM6KiokhNTeXjjz/WvV6j0WBnZ2dwWyEhIaxatYqwsDA0Gg2JiYnUrFlT93xsbCy//vory5Ytw8rKCoCYmBgSEhLo3r277nVarRaNRqN7bG1trde/XLx4cZ49e5Ztjuczly9fnpSUFB49ekSZMmXo1KkTO3fuZPz48QQFBem6M7LSsmVLWrZsiUaj4eTJk0ycOJGqVavSp08fAMqUKaN7rYWFRY6ZXkVERATjxo3TGwMwNjYmOjoaW1vbV1r38z9DsWLFsv2ZIiIi2Lt3r944TmpqKk5OTpnW+c8//3Du3DnWrFmDiYkJnp6ejB07ln//+99cuHCBUaNG5SlfTp97REQE/v7+bNy4UbcsJSWFqKgo3eOs9uV33nknx/bLli2bZftRUVF67y1evDjW1tYv+WkMY21tnel39Pnf42LFihEfHw8U7L5R0KRgFGEZf6GcOHHCoMG/F/+aSk5Oxt3dHV9fX9q0aYOZmRljx47l+QsYv/XWWyxatIjPP/+cf//73zRs2JBSpUphYWHBrl278m0Hv3fvnt7/zczMKFWqFADdunXjiy++oGHDhhQvXlxv1ll2jI2Nadq0KU2aNOHvv/9+6eszBmITExN1Yx4PHjzIy48CpH828+bNo2HDhnlex6uys7PD1dVV172Xk7S0NNLS0tBqtRgbG7NgwQLGjRuHm5sb1atXf+XB/ezyjR49mjFjxmT7mqyOAPJ6VFCuXDm9o9zExERFxg7UsG/klXRJFWHlypWjWbNmLFiwgLi4ODQaDeHh4Zw6dSrL19vY2HD37l3dkUBycjLJycmULl0aU1NTQ
kJCOHbsWKb3OTk5sXjxYiZMmMDFixcxNjamV69ezJs3j+joaCC9v/zIkSN5/ll27tzJjRs3SEhI4Ouvv6ZDhw66v9gcHBx0X2IZg/VZ2bdvH7t27eLJkydotVouXrzIqVOn+PDDD1/afunSpbG1tSUgIIC0tDS2bt2q63POi759+7J8+XLu3r0LpB+VZTXWUZC6du3KwYMHOXLkCGlpaSQlJXHy5Enu37+f6bXVqlWjcuXKzJkzh9jYWFJTU/noo48ICwujRIkSFMRdEHr16sWmTZv473//i1ar5dmzZxw6dIi4uLh8bwugQ4cOHDhwgHPnzpGcnMzKlSsL5Od6GTXsG3klBaOIW7hwISkpKXTu3JlGjRrh7u6e7V/GHTt2BNILQMbsoRkzZvD555/TqFEjgoKCcHZ2zvK9zZo1Y968eYwePZorV67g6elJ5cqV+fTTT2nQoAGDBw/OcYziZVxdXZk6dSrNmjUjOTmZ6dOnZ3r+r7/+wtXVNdt1vP322/z666+0b9+eBg0a4OnpybBhw3IsMs/78ssvWb9+PU5OTty4ccOgI5nsDBw4EGdnZ4YOHYqDgwOffvopFy9ezPP68sLOzo7Vq1frzbBbv369XtdhBhMTE9auXUtsbCxt27alRYsWnDt3ju3bt3P16lWWL1+e7/nef/99vvzyS3x8fGjUqBHt27dn+/bt+d5Ohho1ajBz5kwmT55M8+bNKVGiBKVLl87VTLD8oIZ9I6+M5AZKoijw9/dn8+bN/PLLL0pHEa+J+Ph4GjVqRHBwMJUqVVI6TpEgRxhC9RISEvj555/p3bu30lFEEXfgwAESEhJ49uwZvr6+1KxZk4oVKyodq8iQgiFU7ciRIzRt2hQbGxu6dOmidBxRxO3fv5/mzZvTvHlzbt26xdKlS/Ntau2bQLqkhBBCGOS1nVabmJjI5cuXKVu2rG62jRBCiJylpaXx4MED6tevn+nkyNe2YFy+fJn+/fsrHUMIIYokPz8/HB0d9Za9tgUj42xPPz+/l54ZKoQQIt39+/fp37+/3hnzGV7bgpHRDfXOO+/ILAghhMilrLryZZaUEEIIg0jBEEIIYRApGEIIIQzy2o5hCCHyV0pKCnfu3NHdWVEUbRYWFlSsWFF3r3JDSMEQQhjkzp07WFlZUaVKFTk7uojTarVER0dz584dqlatavD7pEtKCGGQxMREbGxspFi8BoyMjLCxscn10aIUDCGEwaRYvD7y8llKwRBCCGGQN75gaFLzbwAvP9clhNolpqQVqfWKV/fGD3obm1pwc3n+3HGr6ufJ+bIeIYoCCzMTjD0C8329msUueXrfyZMn8fX1zdNd+6ZNm8bFixepUaMGy5cv13vcokULDh06xIoVK/KU64cffsDFxQUbGxsAfvnlF5KSkhg8eHCe1qekN75gCCHebA8fPiQ4OJgzZ85gbGyc6fGr3jb2xx9/5KOPPtIVjL59++ZHbEVIwRBCFDkJCQlMmTKFGzduYGpqStWqVenXrx9paWl4e3tz/vx5jIyMWLZsGdWrV2f79u16RwkZj+fNm8fAgQNJTEykW7dutG/fnl27duked+vWjbfeekuv7R07dvDzzz+TlpaGpaUls2fPplq1alnm/Oabb4iKisLd3Z1ixYqxZMkS9uzZw7Nnz5gyZQrbt28nKCgIKysrrl+/jq2tLTNnzsTX15fw8HDq16/P4sWLMTIyIi4ujvnz53P9+nWSkpJwcnJi2rRphXr7hjd+DEMIUfQcPXqU+Ph4du/ezc6dO/Hx8QHgxo0b9OnTh8DAQDp16sTq1atzXI+lpSXr1q3DysqKgIAAxo0bp/f4xW6jM2fOsGfPHvz8/Ni+fTvDhg3Dy8sr2/WPGTOGcuXKsWLFCgICAnjvvfcyvebSpUtMmzaNvXv3YmFhwb/+9S+WLFnCrl27+Ouvv/jjjz8AmD9/Po0aNWLr1q0EBAQQExPDtm3bcrnlXo0cYQghipzatWsTGhrKnDlza
Ny4Ma1atQKgatWq1K1bFwB7e3sOHjyYr+0eOHCAP//8k169egHpJ8A9ffr0ldbZoEED3S0Y6tSpQ4UKFXRHNbVr1+bWrVt89NFHHDhwgIsXL7JhwwYg/bwYW1vbV2o7t6RgCCGKnEqVKhEUFMSJEyc4fPgwy5YtY8aMGZib/28Ci7GxMampqUD6pbo1Go3uuaSkpDy1q9Vq6dGjBxMnTny1H+A5xYoV0/3fxMQk0+O0tDRd26tXr6ZSpUr51nZuScEQQuRJYkpanmc0vWy9FmY598vfv3+ft99+m7Zt29KsWTOaN2/OkydPsn195cqVuX79OsnJ6TMZg4ODM41NGMLZ2ZkpU6bQu3dv3nnnHdLS0rh27Rr169fP9j0lS5YkNjY2121l1fa6deuYPXs2JiYmxMTEEB8fX6gFRAqGECJPXvalXpDrvX79OkuWLAFAo9EwcuRIypUrl+3r7e3tadq0KZ988gnlypWjdu3aPHjwINfZGjVqxOeff86YMWNIS0sjJSWFjh075lgwBg4ciJeXFxYWFrrMeeHl5cWiRYtwdXXFyMgIMzMzvLy8CrVgGGm1Wm2htVaI7ty5Q5s2bdi/f/9L77gn52EI8XLXrl2jTp06SscQ+SirzzSn706ZJSWEEMIg0iUlhBCvaMuWLWzcuDHT8gULFrxWR2VSMIQQ4hX16tVLN9X2dVZoBcPX15fg4GDu3r1LYGAgNWvW5M6dO4wbN073mtjYWOLi4jh16hSQPivA3NxcN83Mw8OD5s2bF1ZkIYQQzym0gtGmTRsGDhxI//79dcsqVqxIQECA7vFXX32lm3OcYcWKFdSsWbOwYgohhMhGoRUMR0fHHJ9PTk4mMDCQ9evXF1IiIYQQuaGaMYwDBw5ga2tLvXr19JZ7eHig1Wpp2LAhkydPztPJNkKI/KdJTcTY1KLIrFe8OtUUjG3bttGjRw+9ZX5+ftjZ2ZGcnMxXX32Fj48PixcvViihEOJ5+XkvmecZej6Tq6srmzdvxsIi5+Jy7tw5vL29MTU1ZerUqTRp0uSV8hnabm5du3aNmzdv0rlz5wJvK69UcR5GZGQkp0+fxsVF/zIDdnZ2AJibm9OvXz/OnTunRDwhhAoFBAQY9EUaEBCAm5sb/v7+uSoWGdehevGxoe3m1rVr19i7d6/esoJqK69UcYSxY8cOWrZsSalSpXTLnj17RlpaGlZWVmi1Wnbv3v1azWcWQryaWrVqce7cOUqWLImzszOurq4cP36cBw8eMHToUD777DO+++479uzZg4WFBYGBgWzevJmIiAjmzZvHo0ePSElJYdCgQbrejVq1ajF+/HgOHTpE8+bNuX//PiYmJty8eZP4+HgCAgIMahfSL4U+Z84cAJycnNi/fz9r167NchLPo0ePWLFiBXFxcbi6utKoUSNmzJiRqS0XFxdOnDhBZGQk//rXv4iOjiYoKIgnT54wb948GjVqBEBISAjffPMNycnJmJmZMW3aNOzt7V95mxdawZg7dy6//fYbDx8+ZMiQIVhbW7Nr1y4gvWBMnz5d7/XR0dFMmDCBtLQ0NBoN1atXZ9asWYUVVwhRxCQmJrJ582bu3LmDi4sL3bp1Y/jw4dy4cYP69evz2WefkZqaioeHB4sWLaJ69erExcXRo0cP7O3tqV69OpB+9diM+0xMnTqVa9eusXHjRkqUKGFwu2ZmZkyePJmlS5fi6OjI77//zk8//ZRt9lKlSuHu7v7SW8EmJyezefNmLl68yMCBA/H09GTr1q3s3r2bpUuX8ssvvxAeHs7q1atZv349lpaW/P3334wYMYJDhw7lfeP+v0IrGDNmzGDGjBlZPhccHJxpWaVKlfD39y/oWEKI10RG33/FihV56623uH//vq4IZAgLCyM0NJTJkyfrlqWkpPDPP//oXtutWze993Ts2DHbYpFduykpKVhYWOhmh7Zr1y5fJuxktFWvXj0SEhLo1KkTAPXr1yc8PByAI0eOEB4erncKQ2pqK
g8fPqRMmTKv1L4quqSEEOJVZXcfiedptVpKlSqld/7Xi14sDjkVC0PbzS8ZbWXcljXj8fP3/gBo3rw5CxcuzPf2VTHoLYQQhaFq1apYWFjo9V6EhoYSFxeXr+1Uq1aNhIQEzp49C8C+ffteemc+S0vLfLlvRrNmzThy5Ah///23btnFixdfeb0gRxhCiDzSpCYWyCX9C/I8DFNTU9asWcO8efNYv349Go0GGxsbli9fnq/tmJubs2TJEmbPng1A48aNsbGxwcrKKtv3NG3alO+//56uXbvSuHHjbLvwX6ZKlSosWrSI6dOnk5iYSEpKCg0aNOCDDz7I0/qeJ/fDQO6HIYQh5H4YuRMXF4elpSUAJ06cYNq0aezfvx9jY/V07OT2fhhyhCGEEAXgt99+44cffkCr1WJubs7ixYtVVSzyQgqGEEIUgO7du9O9e/csl784MP7hhx/i4+NTWNHyTAqGEEIUou3btysdIc+K9vGREKJQvaZDnm+kvHyWUjCEEAaxsLAgOjpaisZrQKvVEh0dnevrVEmXlBDCIBUrVuTOnTs8ePBA6SgiH1hYWLx0BumLpGAIIQxiZmZG1apVlY4hFCRdUkIIIQwiBUMIIYRBpGAIIYQwiBQMIYQQBpGCIYQQwiBSMIQQQhhECoYQQgiDFNp5GL6+vgQHB3P37l0CAwN1N0J3dnbG3Nxcd+coDw8PmjdvDsCFCxfw9vYmKSmJChUqsGjRImxsbAorshBCiOcU2hFGmzZt8PPzo0KFCpmeW7FiBQEBAQQEBOiKhUajwdPTE29vb4KDg3F0dGTx4sWFFVcIIcQLCq1gODo6YmdnZ/DrL1++TLFixXQ3Ue/Tpw979+4tqHhCCCFeQhWXBvHw8ECr1dKwYUMmT57MW2+9xb179yhfvrzuNaVLl0aj0fD48WOsra0VTCuEEG8mxQe9/fz82LlzJ9u2bUOr1RaJm4gIIcSbSPGCkdFNZW5uTr9+/Th37pxueUREhO51MTExGBsby9GFEEIoRNGC8ezZM2JjY4H067Pv3r1bd0Py+vXrk5iYyJkzZwDYtGkTHTt2VCyrEEK86QptDGPu3Ln89ttvPHz4kCFDhmBtbc2aNWuYMGECaWlpaDQaqlevzqxZswAwNjZm4cKFzJo1S29arRBCCGUUWsGYMWMGM2bMyLTc398/2/c0aNCAwMDAgowlhBDCQIqPYQghhCgapGAIIYQwiBQMIYQQBpGCIYQQwiBSMIQQQhgkx1lSqampHDhwgEOHDvHnn38SGxuLlZUVtWvXpkWLFrRt2xZTU1VcXUQIIUQBy/bb/pdffmHt2rVUr16dRo0a0bp1a0qWLEl8fDyhoaFs2bKFBQsWMGrUKPr27VuYmYUQQigg24IRHh7Oli1bKFu2bKbn2rVrx+jRo4mKimLDhg0FGlAIIYQ6ZFswpkyZ8tI3lytXzqDXCSGEKPoMGoC4ceMG1tbWlClThri4OL7//nuMjY0ZNmwYxYsXL+iMQgghVMCgWVKTJ0/m6dOnACxcuJDTp0/rbp8qhBDizWDQEcbdu3epVq0aWq2W33//nV27dmFhYUGbNm0KOp8QQgiVMKhgFCtWjLi4OEJDQ7Gzs6N06dKkpqaSlJRU0PmEEEKohEEFo0uXLgwaNIj4+Hg+++wzAK5evUrFihULNJwQQgj1MKhgeHl5cfToUUxNTWnSpAkARkZGTJs2rUDDCSGEUA+DT9P++OOP9R6///77+R5GCCGEemVbMPr164eRkdFLV+Dn55evgYQQQqhTtgWjV69euv+Hh4ezbds2unXrRvny5YmIiMDf358ePXoUSkghhBDKy7ZgdOvWTff/Tz/9lPXr11OjRg3dMhcXF7y8vHB3dzeoIV9fX4KDg7l79y6BgYHUrFmTR48e8cUXXxAeHo65uTmVK1fGx8eH0qVLA1CrVi1q1qyJsXH66SILFy6kVq1aefpBhRBCvBqDTtwLDQ3l3Xff1VtWsWJF/vnnH4Mba
tOmDX5+flSoUEG3zMjIiOHDhxMcHExgYCCVKlVi8eLFeu/btGkTAQEBBAQESLEQQggFGVQwGjVqxNSpUwkLCyMxMZGbN28yffp0HB0dDW7I0dEROzs7vWXW1tY4OTnpHtvb2xMREWHwOoUQQhQegwrGggULgPTzMRwcHHBxcUGr1TJv3rx8C6LRaPjll19wdnbWWz5gwABcXV1ZsmQJycnJ+daeEEKI3DFoWq21tTXLli1Do9EQExND6dKldeMK+eXLL7+kRIkSuhMDAQ4dOoSdnR1xcXF4enqyatUqJk2alK/tCiGEMIzB52HExsZy8+ZN4uPj9ZY3bdr0lUP4+vpy69Yt1qxZo1eIMrqwLC0t6dWrl9x7QwghFGRQwdi+fTs+Pj6UKFECCwsL3XIjIyP279//SgGWLl3K5cuXWbduHebm5rrlT548oVixYlhYWJCamkpwcDB16tR5pbaEEELknUEFY9myZXz99de0bNkyzw3NnTuX3377jYcPHzJkyBCsra1Zvnw5a9eupUqVKvTp0wdIn321atUq/vnnH7y9vTEyMiI1NRUHBwcmTpyY5/aFEEK8GoMKRlpaWqZLg+TWjBkzmDFjRqbl169fz/L1Dg4OBAYGvlKbQggh8o9BI9cjRozgm2++QaPRFHQeIYQQKmXQEcYPP/zAw4cP+e6777C2ttZ77tChQwWRSwghhMoYVDAWLVpU0DmEEEKonEEFo3HjxgWdQwghhMoZNIaRkpLCihUraNOmDe+//z5t2rRhxYoVcuZ1AdGkJqpyXUKIN5vBXVIXL15kzpw5usubr169mri4OLy8vAo64xvH2NSCm8vNX/5CA1T9XIq6ECJ/GFQw9u7dS0BAAKVKlQKgWrVq1K1bF1dXVykYQgjxhjCoS0qr1eZquRBCiNePQQWjY8eOjBkzhiNHjhAaGsrhw4cZN24cnTp1Kuh8QgghVMKgLilPT0+++eYbfHx8iIqKwtbWls6dOzN27NiCzieEEEIlDCoY5ubmTJw4Ua7lJIQQbzCDuqTWrVvHxYsX9ZZdvHiRb7/9tkBCCSGEUB+DCsaPP/7Ie++9p7esevXq/Oc//ymQUEIIIdTH4BP3TE31e6/MzMzkxD0hhHiDGFQw6tWrx88//6y3bNOmTXN82h0AACAASURBVNStW7dAQgkhhFAfgwa9p02bxpAhQ9i5cyeVKlXi9u3bPHjwQG6ZKoQQbxCDCkaNGjUIDg7m0KFD3Lt3j/bt29OqVStKlixZ0PmEEEKohEEFA6BkyZI0aNCAyMhI7O3tCzKTEEIIFTJoDCMiIoI+ffrQqVMnhgwZAqRfX2r69OkFGk4IIYR6GFQwvL29adWqFefOndPNlmrWrBnHjx83qBFfX1+cnZ2pVasWf/31l275zZs36d27Nx06dKB3796EhYUZ9JwQQojCZ1DBuHTpEiNHjsTY2BgjIyMArKysiI2NNaiRNm3a4OfnR4UKFfSWz5o1i379+hEcHEy/fv3w9vY26DkhhBCFz6CCYWNjw61bt/SW3bhxAzs7O4MacXR0zPTa6Ohorl69SpcuXQDo0qULV69eJSYmJsfnhBBCKMOggjF06FBGjx7Ntm3bSE1NJSgoiEmTJjFixIg8N3zv3j1sbW0xMTEBwMTEhHLlynHv3r0cnxNCCKEMg2ZJ9ezZE2trazZv3oydnR07duxg4sSJtG3btqDzCSGEUAmDp9W2bds2XwuEnZ0dkZGRpKWlYWJiQlpaGlFRUdjZ2aHVarN9TgghhDJy7JK6fPmy3qymmJgY/vWvf9G1a1e8vb2Jj4/Pc8M2NjbUqVOHoKAgAIKCgqhTpw6lS5fO8TkhhBDKyLFgzJs3j4cPH+oeT58+nbCwMHr37s3ff//NokWLDGpk7ty5tGjRgvv37zNkyBA++eQTAGbPns3GjRvp0KEDGzduZM6cObr35PScEEKIwpdjl1RoaCiOjo4APH36lCNHjhAYGEjVqlVxdnamT58+zJ49+
6WNzJgxgxkzZmRaXr16dbZs2ZLle3J6TgghROHL8QgjLS0NMzMzAC5cuECZMmWoWrUqkD4G8fTp04JPKIQQQhVyLBjvvfcee/bsAWD37t00bdpU91xkZCRWVlYFm04IIYRq5Ngl5eHhwZgxY5g9ezbGxsZ698TYvXs3DRo0KPCAQggh1CHHguHo6MjBgwcJCwujSpUqWFpa6p5r2bIlnTt3LvCAQh00qYkYm1qobl1CiMLz0vMwLC0tqV+/fqbl1apVK5BAQp2MTS24udw8X9ZV9XO5ta8QRVG2Yxg9evRgz5492d63Ozk5md27d9OrV68CCyeEEEI9sj3C8PX1ZcWKFcyePZt69epRtWpVSpYsSXx8PGFhYVy5coUmTZqwYMGCwswrhBBCIdkWjPfee48VK1bw4MEDjh07xl9//cWjR4946623cHV1ZeHChdjY2BRmViGEEAp66RhG2bJlcXNzK4wsQgghVMygy5sLIYQQUjCEEEIYRAqGEEIIg0jBEEWaJjVRVesR4nVm0A2UtFotW7ZsISgoiEePHhEYGMjp06d58OCBnO0tFJVfJxTKyYRCvJxBRxhff/01W7dupXfv3rr7ar/zzjt89913BRpOCCGEehhUMHbs2MGaNWv45JNPMDIyAqBixYrcvn27QMMJIYRQD4MKRlpaGiVLlgTQFYz4+HhKlChRcMmEEEKoikEFo2XLlsyfP193XSmtVsvXX39N69atCzScEEII9TBo0HvatGlMmTKFhg0bkpqaioODA82aNcPX1/eVA9y5c4dx48bpHsfGxhIXF8epU6dwdnbG3NycYsWKAen352jevPkrtymEECL3DCoYlpaWrFq1iocPHxIREYGdnR1ly5bNlwAVK1YkICBA9/irr74iLS1N93jFihXUrFkzX9oSQgiRdwYVjAwWFhbY2tqi0WiIjIwEwNbWNt/CJCcnExgYyPr16/NtnS+jSU3ItymVmtQEjE2L58u6hBBCbQwqGMePH2fmzJlERESg1Wp1y42MjLh27Vq+hTlw4AC2trbUq1dPt8zDwwOtVkvDhg2ZPHkyb731Vr61B2BsWhxjj8B8WZdmsUu+rEcIIdTIoIIxffp0xo4dS+fOnbGwKLhba27bto0ePXroHvv5+WFnZ0dycjJfffUVPj4+LF68uMDaF0IIkT2DZkklJSXRvXt3SpYsiYmJid6//BIZGcnp06dxcfnfX+l2dnYAmJub069fP86dO5dv7QkhhMgdgwrG4MGD+e677/S6o/Lbjh07aNmyJaVKlQLg2bNnxMbGAunTeHfv3k2dOnUKrH0hhBA5M6hLqn379gwbNoy1a9fqvtAz7N+/P1+C7Nixg+nTp+seR0dHM2HCBNLS0tBoNFSvXp1Zs2blS1tCCCFyz6CC4e7ujqOjIx07diywMYzg4GC9x5UqVcLf379A2hJCCJF7BhWMO3fu4O/vj7GxXA1dCCHeVAZVgDZt2nDixImCziKEEELFDDrCSE5OZsyYMTg6OmJjY6P33MKFCwskmBBCCHUxqGDUqFGDGjVqFHQWIV4LmtREjE1ffawvv9YjRH4xqGCMHz++oHMI8dqQuwCK11W2BeP06dM0atQIgD/++CPbFTRt2jT/UwkhhFCdbAvGnDlzCAoKAtA7P+J5RkZG+XYehhBCCHXLtmAEBQURFBREly5dOHDgQGFmEkIIoUI5Tqv19vYurBxCCCFULseCUZDXjhJCCFG05DhLSqPRcOLEiRwLhwx6CyHEmyHHgpGcnMz06dOzLRgy6C2EEG+OHAtG8eLFpSAIIYQADLyWlBBCCCGD3kIIIQySY8E4f/58YeUQQgihctIlJYQQwiBSMIQQQhjEoKvVFjRnZ2fMzc0pVqwYAB4eHjRv3pwLFy7g7e1NUlISFSpUYNGiRZnuxyGEEKJwqKJgAKxYsYKaNWvqHms0Gjw9PZk/fz6Ojo6sXr2axYsXM3/+fAVTCiHEm0u1XVKXL1+mWLFiODo6A
tCnTx/27t2rcCohhHhzqeYIw8PDA61WS8OGDZk8eTL37t2jfPnyuudLly6NRqPh8ePHWFtbK5hUCCHeTKo4wvDz82Pnzp1s27YNrVaLj4+P0pGEEEK8QBUFw87ODgBzc3P69evHuXPnsLOzIyIiQveamJgYjI2N5ehCCCEUonjBePbsGbGxsUD6meW7d++mTp061K9fn8TERM6cOQPApk2b6Nixo5JRhRDijab4GEZ0dDQTJkwgLS0NjUZD9erVmTVrFsbGxixcuJBZs2bpTasVQgihDMULRqVKlfD398/yuQYNGhAYGFjIiYR4/WhSEzE2tVDdukTRonjBEJlpUhOo+nlyvq3L2LR4vqxLFF3GphbcXG6eL+vKr31TFD1SMFTI2LQ4xh75c2SlWeySL+sRQgjFB72FEEIUDVIwhBBCGEQKhhBCCINIwRBCCGEQKRhCCCEMIgVDCCGEQaRgCCGEMIgUDCGEEAaRgiGEEMIgUjCEEEIYRC4NIgwi17cSQkjBEAaR61uJ/CZX0C16pGAIIRQhV9AtemQMQwghhEGkYAghhDCIdEmJIi2/BuNlIF6Il1O8YDx69IgvvviC8PBwzM3NqVy5Mj4+PpQuXZpatWpRs2ZNjI3TD4QWLlxIrVq1FE4s1CS/BuNlIF6Il1O8YBgZGTF8+HCcnJwA8PX1ZfHixcybNw+ATZs2UbJkSSUjCiGEQAVjGNbW1rpiAWBvb09ERISCiYQQQmRF8SOM52k0Gn755RecnZ11ywYMGEBaWhotWrRgwoQJmJvnzzQ8IYQQuaP4EcbzvvzyS0qUKMFnn30GwKFDh9i+fTt+fn7cuHGDVatWKZxQCCHeXKopGL6+vty6dYvly5frBrnt7OwAsLS0pFevXpw7d07JiEII8UZTRZfU0qVLuXz5MuvWrdN1OT158oRixYphYWFBamoqwcHB1KlTR+GkQrycTPUVryvFC8bff//N2rVrqVKlCn369AGgYsWKDB8+HG9vb4yMjEhNTcXBwYGJEycqnFaIl5OpvuJ1pXjBqFGjBtevX8/yucDA/LnYnRBCiFenmjEMIYQQ6iYFQwghhEGkYAghhDCIFAwhhBAGUXzQWwhR8OQWuyI/SMEQ4g0gt9g1jNw2NmdSMIQQ4v/JbWNzJmMYQgghDCIFQwghhEGkS0oIoQgZiC96pGAIIRQhA/FFj3RJCSGEMIgUDCGEEAaRgiGEEMIgUjCEEELlNKmJqliPDHoLIYTK5dcJha86K00KhhBC/D+Z6pszKRhCCPH/ZKpvzmQMQwghhEFUf4Rx8+ZNpk6dyuPHj7G2tsbX15cqVaooHUsIIQpNfnWVvWo3meoLxqxZs+jXrx+urq4EBATg7e3Njz/+qHQsIYQoNPnVVfaq3WSqLhjR0dFcvXqVDRs2ANClSxe+/PJLYmJiKF26dI7vTUtLA+D+/fsvbcf0WcyrhwXu3LmTL+sByZQb+ZFLMhnudd+n1JgJCm+fyvjOzPgOfZ6RVqvVvnKKAnL58mWmTJnCrl27dMs6d+7MokWLqFevXo7vPXPmDP379y/oiEII8Vry8/PD0dFRb5mqjzBeRf369fHz86Ns2bKYmJgoHUcIIYqEtLQ0Hjx4QP369TM9p+qCYWdnR2RkJGlpaZiYmJCWlkZUVBR2dnYvfa+FhUWm6iiEEOLlKleunOVyVU+rtbGxoU6dOgQFBQEQFBREnTp1Xjp+IYQQIv+pegwDIDQ0lKlTp/L06VPeeustfH19qVatmtKxhBDijaP6giGEEEIdVN0lJYQQQj2kYAghhDCIFAwhhBAGkYIhhBDCIFIwhBBCGETVJ+6JzG7evMn9+/exsLCgRo0aWFpaKh1JlWQ7CZH/pGBkIzw8nD/++EP3pVO7dm2aNGlCsWLFCj1LXFwcGzZsYOvWrZibm2NjY0NycjK3b9/mww8/ZPjw4TRp0qTQcwE8e/aMc+fO6W2n9957T5Esat5OoK59SjJJpryQ8zBecOHCB
ZYsWUJMTAwffvghZcuWJSkpidDQUEJDQ3Fzc2PkyJFYWFgUWqYePXrg6upK586dKVOmjG65RqPh7NmzbNq0icaNG9O7d+9Cy3T37l1WrlzJ4cOHqVGjBmXKlCE5OZnQ0FCMjIwYOnQoPXr0KLQ8oM7tBOrcpySTZMoTrdAzadIk7bVr17J87tmzZ9pNmzZpt2zZUqiZkpKS8uU1+emzzz7T7tu3T5uSkpLpuTt37miXLFmi3bhxY6FmUuN20mrVuU9JJsmUF3KEUYSsWrWK7t27G3TxxTeZbCchCobJ7NmzZysdQo0GDhyIiYkJlStXxtRUHUM9Bw8e5KuvvuLw4cMYGxtTpUoVxbN5eXlhbW1N+fLlFc3xPDVuJ1DnPiWZDCOZ0knByEapUqXYtWsX8+bNIywsDGtra8X/Yv34448ZNGgQ1tbW7N69m/nz53Pr1i1at26tWKb79++zdu1a1q9fT1xcHJUqVVJ8RpIatxOoc5+STJIpN6RL6iUePXrErl272LFjB/Hx8ezdu1fpSAD89ddffP/99wQGBnLlyhWl43D9+nX8/f3ZtWsXNWrUYP369UpHAtS3nUCd+5RkkkyGUMexlYoZG6ef26jValG6tj5+/JigoCC2b99OfHw83bp1Y9++fYpmylCjRg0aN27MrVu3OHXqlKJZ1LydQF37VAbJZJg3PZMcYWTjwIED7Nixg7Nnz9KmTRvc3Nxo2LChopmaNGlCu3btVJElw/Xr19mxYwdBQUHUrFmTbt260a5du0KdZvgiNW4nUOc+JZkkU25IwcjGkCFD6NatG+3bt1f0y+95iYmJqsmSoUOHDnTr1g1XV1fF+3QzqHE7gTr3KclkGMmUTgqGAZKTk3ny5Ally5ZVNMeGDRvo2bMnVlZWeHp6cunSJWbMmMHHH3+saC61KQrbSS371PMkk2He5Exy8cFsTJo0idjYWBITE3FxceGTTz5RfCB3+/btWFlZceLECWJiYpg3bx5Lly5VNNOCBQuIjY0lNTWVfv36YW9vT0BAgKKZ1LidQJ37lGSSTLkhBSMbN2/exMrKikOHDuHk5ERISAj+/v6KZjIxMQHg5MmTuLi40KBBA8UH3o4fP46VlRVHjx7F1taW4OBgvv/+e0UzqXE7gTr3KckkmXJDCkY2UlNTATh9+jQtW7akePHiutkISrGwsGDdunXs2rWLZs2aodVqSUlJUTRThtOnT9OuXTtsbW0xMjJSNItat5Ma9ynJJJlyQwpGNqpXr87w4cM5ePAgTZs2JTExUelIzJ8/nwcPHuDh4UHZsmW5ffs2Li4uimaysbFh1qxZ7Nmzh2bNmpGamkpaWpqimdS4nUCd+5Rkkky5IYPe2UhMTOTo0aPUqlWLSpUqERkZyfXr12nRooXS0VQlJiaGnTt3Ym9vj729PXfu3OHUqVN0795d6Wiqo8Z9SjJJplwpkEsavgauX7+ujY+P1z2Oi4vT/vXXXwom0mr79Omjffz4se7xo0ePtP369VMwkVYbHR2tdwXYpKQkbXR0tIKJ1LmdtFp17lOSyTCSKZ10SWVj6tSpmJmZ6R6bmZkxZcoUBROl36zo7bff1j22trYmPj5ewUQwatQovS6o1NRURo8erWAidW4nUOc+JZkMI5nSScHIRlpamt6HYW5urnjfvEajISEhQfc4Pj5eN/CllOTkZIoXL657XKJECZKSkhRMpM7tBOrcpySTYSRTOikY2TA1NeX27du6x+Hh4brpmkrp0qULQ4YMISAggICAAIYNG0bXrl0VzQTp4xgZoqOj0Wg0CqZR73ZS4z4lmQwjmf6/zQJdexE2fvx4+vbtS8uWLQEICQlh7ty5imYaNWoU5cqV48CBAwD06dMHNzc3RTMNGDCAvn374urqCkBAQAAjR45UNJMatxOoc5+STJIpN2SWVA7CwsI4duwYkH6PhcqVKyucSJ1OnjxJSEgIAK1ataJx48YKJ1IvNe5TkskwkkkKRpG3efNmevfur
XQM1ZPtJMSrkzGMXJg5c6bSETKJjIxUOkImK1euVDpCJmrcTqDOfUoyGeZNzCS3aM2lqlWrKh1Bj5OTk9IRMrl58yb169dXOoYeNW6nDGrbp0AyGepNyyRdUkVMeHg44eHhetPnMga9xP/IdhIi/8ksqWwkJiYSFBREeHi43hz+L774QrFMCxcuxN/fn6pVq+ouMmZkZKT4F+Eff/yRaTv1799fsTxq3U5q3Kckk2TKDSkY2Rg/fjzGxsbUq1cPc3NzpeMAsG/fPvbv3693opzSPD09uX79OrVr11Z8XnoGNW4nUOc+JZkMI5nSScHIxr1799i1a5fSMfTY2dnpndmpBpcuXWLXrl2qKRagzu0E6tynJJNhJFM6KRjZqFGjBlFRUZQrV07pKDpTp05l9OjRNGvWTO8vCiW7f959910SEhKwtLRULMOL1LidQJ37lGQyjGRKJ4Pe2bhx4wbDhw+ndu3aFCtWTLf866+/VizTpEmT+Oeff6hVq5beX/Tz589XLFNoaCgeHh40bNhQ78tZyb5dNW4nUOc+JZkkU27IEUY2vvjiC5ydnalbt65quluuXLlCcHCw4ne0e97cuXOxtbXFyspKttNLqHGfkkyGkUzppGBkIyUlBW9vb6Vj6KlSpQrPnj2jZMmSSkfRuX//Pnv27FE6hh41bidQ5z4lmQwjmdJJwciGvb09169fp1atWkpH0bG0tKR79+40b95cNd0/tWrVUl3frhq3E6hzn5JMhpFM6aRgZOPixYv06NGDqlWr6vUPbt26VbFM1apVo1q1aoq1n5XY2FhcXFxwcHBQTd+uGrcTqHOfkkySKTdk0Dsbp06dynK5XIlV344dO7Jc3q1bt0JOon5q3Kckk2EkUzopGEXQH3/8wblz56hbty6tW7dWOo6qnDlzhj179nDv3j0g/ZyMTp064ejoqHAykRehoaFUr15d6Rji/8nVal+QlJTEqlWr6NKlC46Ojjg6OuLi4sKqVatITExUJNPzl+X29/fH29ubuLg4li5dyoYNGxTJBLBz507WrFnDn3/+qbd87dq1iuRZvXo1Pj4+VKhQARcXF1xcXKhQoQI+Pj6sWrVKkUyQ/qU3cuRIvL29efLkCaNHj8bBwYHevXsTGhqqSKZt27bp7pR4//59Bg0aRIMGDejXrx/h4eGKZEpISMj0b8SIESQmJurdcrcwJScn88033zBz5kwOHTqk99yXX36pSKacFPTVauUI4wWTJ0+mRIkS9OnTh/LlywMQERHBpk2biIuLY/ny5YWeyc3NDX9/fyC9eCxZsoSKFSvy+PFjBgwYQGBgYKFnWrRoEefPn6du3boEBwczbNgwBg8eDKR3R2XXVVWQ2rdvT2BgoF5/LqRfc8fFxYXff/+90DNB+gmDHTt2JDY2lj179uDm5oabmxuHDh3C39+fn376qdAzdenShaCgIAA+//xz7O3t6dq1K4cPH8bf358ffvih0DPVrl0bIyMjsvpKMjIy4tq1a4WeycvLi4SEBD744AO2bdtG06ZNmT59OqDcfp6TVq1aZSps+UkGvV+QMYf/eaVLl2bu3Ll06NBBkUzPn0+QkpJCxYoVAbC2tsbUVJmPMCQkhB07dmBmZsaYMWMYO3YscXFxjB8/Pstf+MKg1WqzPPciuy+hwhIXF8eAAQOA9Bs5DRs2DIAePXooUiwAvYvV3bp1S/eHkJubmyLFAtK/gI2NjZk2bZruygHOzs66W+0q4dKlS7o/yPr27cvkyZPx8vLiq6++Umyfatq0aZbLtVotsbGxBdq2FIwXGBsbc/v2bSpVqqS3PDw8XLETwW7evEnPnj3RarWEh4cTFxen+4VKSUlRJBOgu16TjY0N69evZ8yYMSQlJSm2ndzc3OjVqxdubm56R4f+/v6K3tM7NTWVpKQk4uPjefr0KdHR0djY2JCQkEBSUpIimd59910OHDiAs7Mz7777LmFhYVSpUoUHDx4okgfSz8Q/ePAggwcPxt3dn
RYtWih+8uXzl8e3sLBg5cqVeHh44OnpiUajUSSTVqvlhx9+wMrKKtPyvn37FmjbUjBe4OnpSd++falfvz4VKlQA4O7du1y+fBkfHx9FMq1bt07vccYv0YMHDwp8B8mOpaUl4eHhvPvuu7rH3377LaNGjeKvv/5SJNO4ceNwcnJi9+7dnDx5EoDy5cszffp0RWezuLi40KlTJ1JTU5kwYQLu7u7UqlWLs2fP0qZNG0UyzZo1i/Hjx7NhwwbefvttevXqRb169bh3756id5Jr3bo19vb2fPnll+zevVvvC1sJZcqU4c8//6R27doAmJiYsGTJEqZMmcLff/+tSKb69evz6NEjXabn2draFmjbMoaRhWfPnnH48GG9mTbNmzdX3ZnDSjp//jxWVla89957esuTk5PZsmWL4hf6U5uMiQG1a9cmIiKCvXv3UrFiRdq3b69oruPHj3Pjxg00Gg12dna0aNFCNZeF37NnD6dOnWLWrFmKZQgLC8PMzEz3x2MGrVbL4cOHFbnHSnJyMiYmJspcokQrirSVK1cqHaFI2Lp1q9IRXmrMmDFKR8hEjZl69OihdIRM1JipID47mVb7gqI2jU7JM02zo2SXRnZWrlypdISXioiIUDpCJmrM9PyAvVqoMVNBfHYyhvGC2bNn66bRLV68mGPHjumm0Z07d06RTD179sxyuVarJTo6upDTvNyRI0cUaXfixIlZLtdqtTx58qSQ0+Se0gO8WZFMhnlTMknBeIEap9GFhYWxZMmSTH3LWq2WSZMmKZJJyal92QkJCcHLyyvT3fa0Wq1uEFwIkXdSMF6gxml0devWxdLSkoYNG2Z6TqlbkWoVnNqXnTp16lC7dm0++OCDTM8peTFEIV4XMobxgoxpdBkyptEZGRkpNo1u0aJF2V7C+ODBg4WcJl3G1L4KFSro/atYsWKBT+3LzqxZs7Czs8vyuZ9//rmQ0+TejRs3lI6QiRozKfV7mBM1ZiqIz04Kxgt8fHwy/dVsbGzMwoULWbNmjSKZbG1ts71ntrHx/z7C2bNnF1Ki9Os2ZXdug1ID8bVr16Zs2bJZPvf8tMjVq1cXVqRcUeNF9tSY6cWp3GqgxkwF8dlJwXhBlSpVMs25hvQBpOfnXGc3EK2k//73v4XWlrm5uUHzwMeOHVsIaXJHqWtKvcybMnD6qiSTYQoikxSMPFLjNDo1UuO0TKUmLwhR1EnByCM1/kWhRmrcTmrMJERRIAVDCJVQ45GPZDLMm5JJCsZrRI0zNdRIySv8ArobF71IqXNqQH2Z4uLiuHLlSpbPLViwoJDTpFNjJijcz04KRh6p8ctZjTM11DgtUyn//e9/ad26te5+55cuXdK7jIoSF7JTY6aQkBA++eQTJkyYoMs0evRo3fNZXaX1TcykxGcnBSOP1PjlrMa+eTVOy1TqZMf58+fz7bffUqpUKQDef/99xS43o+ZMK1asYOvWrbz11lu6TErdNlbNmZT47KRg5JEav5zVSLbT/6SkpGT6Q0Op4pVBjZmATOfTmJubK5Tkf9SWSYnPTgqGEIXE3Nyc+Ph4XRG9ceNGpvuPSyYoWbIkDx8+1GU6efJkppNpJZMyn51cS+o1osaZGmqk1HYaPXo0w4YNIyoqiqlTp3LkyBEWLVqkSBY1Z/Lw8GDEiBHcuXOHAQMGEBYWxjfffCOZXqDEZyd33MsjNzc3/P39lY6hZ8uWLfTq1UvpGHrUuJ1WrVrFuHHjFGn79u3bHDlyBK1Wy8cff0zlypUVyaH2TLGxsbr+eAcHB93YgZLUmKmwPzspGDmIi4vj1q1b1KtXL9Nzz9/nt7CEhYUxbdo0IiMjOXDgAFeuXOHAgQO6mRtKiomJoXTp0pmWh4SEFPpMm+joaObPn8+9e/fw8/Pjzz//5Pz584pdRVeI14UUjGyEhITg7e2NiYkJBw4c4NKlS6xatUqxCxACDB48mKFDh7JkyRICAgLQaDS4uLiwa9cuxTL997//5fPPP0ej0
RASEsKlS5f49ddfFb074ZgxY2jRogU///wzgYGBJCcn06NHD919Tgpbjx49chz8V+JijWrM1KRJkywzabVajIyM1pKA9AAAHmJJREFU+OOPPyQTyn52MoaRjYxpdCNGjADUMY0uNjaWFi1asHTpUiD9SrVKz2jJmNrn4eEBpG+nqVOnKpopMjKSvn37snnzZiB9cPD5q/oWtilTpijWdnbUmGnbtm1KR8hEjZmU/OykYORAbdPoTExMSElJ0f11ERkZqegXIahzWqapqf5u/fTpU0UnBGR3GXglqTFTVleJVpoaMyn52UnByIYap9H169eP8ePH8+jRI1auXIm/v7+il5MAdU7LbNeuHd7e3sTHx7N9+3Z+/vlnevTooVieRYsW4enpibu7e5ZdCUrcDVCNmTw9PVm0aFG2XS5KdJOpMZOSn50UjGyocRqdm5sbFStW5ODBgyQkJODr64ujo6OimdQ4LXPEiBHs3LmTp0+fEhISwoABA3B1dVUsT8Zn1Lp1a8UyvEiNmQYNGgSoq7tMjZkU/ey0IltPnz7VHjp0SHvo0CHtkydPlI6jWuHh4Vo/Pz/txo0btWFhYUrHUZ1JkyZptVqt9ocfflA4yf+oMVP//v21Wq1Wu3DhQoWT/I8aMyn52ckRRg6srKwUufjai7I79MygRPfB8ypVqkS/fv0UzQCwcOHCHJ//4osvCimJvowLVfr7++v+YlWaGjNFR0fz6NEjjh49yoQJEzKNOxUvXlwyoexnJwXjBWqcRqemboMMapyWWaJEiUJv0xD169enYcOGJCUl0bRpU91yJfcpNWZq3749rVq1Ijk5GXt7eyD9WmQZma5duyaZUPazk/MwXnD37t0cn1fjrAklnDp1Ksfn1TgLR0kPHz5k0KBBrFu3LtNzSu1TaswE0L9/f/z8/BRrPytqy6TUZycFowj4z3/+w6BBg7LtclGqq0Vt9uzZQ6dOnbL9xe7fv38hJ9IXHx9PyZIlFc3wIjVmEoZR4rOTLqkXqHEanYWFBaCuLhc1TsvMuFnT5cuXC71tQ6jxi1mNmYRhlPjspGC8QI3T6G7evAmk34yoU6dOCqdJp8ZpmRm/QD179qRhw4YKpxHi9SMF4wULFixg48aNhISE4OnpqXQcAN0g1rp161RTMAIDA2ndujVPnz5VzSybwMBAhg4dyty5c9mxY4fScYR47UjBeIEap9GVK1cOFxcX7ty5Q8+ePTM9r0Q3mRqnZRYrVozRo0dz9+5dJk6cmOl5pacfi7yJiorC2tpa8UvzPE+NmQqDFIwXqHEa3erVq7l69Sqenp6qGeBW47TMNWvWcPz4ca5fv06rVq0Kvf3c6tSpE6ampowcORIXFxel4wDqzDRo0CCePHnCwIEDGT16tNJxAHVmKozPTmZJZUNt0+ggfSyjatWqSsfQUeu0zJMnT+Lk5KRY+4aKjIwkKiqKs2fPMnjwYKXjAOrMBOn3Wzl79izt2rVTOoqO2jIVxmcnBUO8EpmWKcSbQ9lrY4siT4rFyx05coQnT54A6Zda9/LywsXFBU9PT2JiYiTT/3N3d+f3338nNTVVkfazEhoaysiRI/H29ubJkyeMHj0aBwcHevfuTWhoqCKZUlJS2LhxI35+fqSmprJ7927GjBnDsmXLSE5OLtC25QhDiALWpUsXdu7cibGxMTNmzKBEiRJ06dKFI0eOcPXqVVatWiWZSL8sj52dHZGRkbi4uNCjRw9q1qxZ6Dme179/fzp27EhsbCx79uzBzc0NNzc3Dh06hL+/Pz/99FOhZ/L29ubx48ckJiZiaWlJSkoKnTp1Yv/+/VhbWzNz5swCa1sGvYUoYFqtVnejqytXruim/H7wwQeKXXZdjZneeecdduzYwZUrV9i+fTsDBgygUqVK9OjRAxcXFywtLQs9U1xcHAMGDABg8+bNDBs2DEi/lpoSxQLg/PnzulsPN23alGPHjmFhYUHbtm0L/L4v0iVloKioqAI/3MutwYMHM
isrER8fP2AZ27ZtQ0dHBxwdHbF582Z4enoyF14nJyc4OztDLBbDzc0NhoaG/YasFixYgPPnz8PW1hY5OTlISEjQGD7z8fFBdXV1v9+wcuVKFBQUwNbWFrGxsTA2NkZycjLOnz8PZ2dnODk54cCBA4zzKScnB25ubpgzZw7S09M1nv7Onj1LF++9gtBotRSKlomLiwOfz8df//rXl1Le/v370dbWhr17976U8hobG+Hh4YHS0tJhCVFfVVWF7du3IyMj46WXTRleaIdBobxi9DjDrKysUFlZiffffx+7d+/GwoULf3XZ3d3d+PTTTyGTyfDpp5++BLWU3xJ00ptCecXo7OxEVFQUWltbMW7cOAQFBWHBggW/uly5XA5HR0eYm5sjKSnpJSil/NagTxgUCoVCGRJ00ptCoVAoQ4J2GBQKhUIZErTDoFAoFMqQoB0GhUKhUIYE7TAoFAqFMiT+DwsAfj24NhVUAAAAAElFTkSuQmCC\n" 876 | }, 877 | "metadata": {} 878 | } 879 | ] 880 | } 881 | ] 882 | } -------------------------------------------------------------------------------- /csv_schema_inference/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Wittline/csv-schema-inference/8121193f82b02984c811b2cc794c539e28b5e5ef/csv_schema_inference/__init__.py -------------------------------------------------------------------------------- /csv_schema_inference/csv_schema_inference.py: -------------------------------------------------------------------------------- 1 | import mmap 2 | import os 3 | import multiprocessing as mp 4 | import datetime as dt 5 | import operator 6 | 7 | 8 | 9 | class DetectType: 10 | 11 | def __init__(self, max_length, sep): 12 | self.max_length = max_length 13 | self.sep = sep 14 | 15 | def __get_local_type(self, value): 16 | try: 17 | float(value) 18 | except ValueError: 19 | return "STRING" 20 | 21 | if float(value).is_integer(): 22 | return "INTEGER" 23 | else: 24 | return "FLOAT" 25 | 26 | 27 | def __get_date_type(self, value): 28 | 29 | 30 | if "T" in value: 31 | segments = value.split("T") 32 | try: 33 | 34 | if len(segments) == 2: 35 | valid_date = False 36 | d_elements = segments[0].split("-") 37 | if len(d_elements) == 3 and len(d_elements[0]) in {2, 4} and \ 38 | len(d_elements[1]) == 2 and len(d_elements[2]) == 2: 39 | dt.date(*(int(e) for e in d_elements)) 40 | valid_date = True 41 | t_elements 
= segments[1].split(":") 42 | valid_time = False 43 | if len(t_elements) in (2, 3): 44 | valid_time = (len(t_elements[0]) == 2 and 0 <= int(t_elements[0]) < 24 and 45 | len(t_elements[1]) and 0 <= int(t_elements[1]) < 60) 46 | if len(t_elements) == 3: 47 | valid_time = (valid_time and len(t_elements[2]) == 2 and 48 | 0 <= int(t_elements[2]) < 60) 49 | if valid_time and valid_date: 50 | return "TIMESTAMP" 51 | 52 | except ValueError: 53 | return "STRING" 54 | 55 | elif "-" in value: 56 | 57 | segments = value.split("-") 58 | try: 59 | 60 | if len(segments) == 3 and len(segments[0]) in {2, 4} and \ 61 | len(segments[1]) == 2 and len(segments[2]) == 2: 62 | 63 | dt.date(*(int(e) for e in segments)) 64 | return "DATE" 65 | except ValueError: 66 | return "STRING" 67 | else: 68 | 69 | try: 70 | segments = value.split(":") 71 | if len(segments) in {2, 3}: 72 | valid = (len(segments[0]) == 2 and 0 <= int(segments[0]) < 24 and 73 | len(segments[1]) and 0 <= int(segments[1]) < 60) 74 | if len(segments) == 3: 75 | valid = (valid and len(segments[2]) == 2 and 76 | 0 <= int(segments[2]) < 60) 77 | if valid: 78 | return "TIME" 79 | except ValueError: 80 | return "STRING" 81 | 82 | 83 | return "STRING" 84 | 85 | 86 | def __infer_value_type(self, value, index, schema, values_type): 87 | 88 | if value not in values_type.keys(): 89 | 90 | local_type = self.__get_local_type(value) 91 | 92 | if local_type == 'STRING': 93 | 94 | if value in {"", "na", "NA", "null", "NULL"}: 95 | schema[index]["nullable"] = True 96 | _type = "STRING" 97 | elif value in {"true", "false", "TRUE", "FALSE", "True", "False"}: 98 | _type = "BOOLEAN" 99 | elif len(value) < 21: 100 | _type = self.__get_date_type(value) 101 | else: 102 | _type = local_type 103 | else: 104 | _type = local_type 105 | 106 | values_type[value] = _type 107 | 108 | if values_type[value] not in schema[index]["types_found"].keys(): 109 | schema[index]["types_found"][values_type[value]] = { "cnt": 1} 110 | else: 111 | 
schema[index]["types_found"][values_type[value]]["cnt"] += 1 112 | else: 113 | if values_type[value] not in schema[index]["types_found"].keys(): 114 | schema[index]["types_found"][values_type[value]] = { "cnt": 1} 115 | else: 116 | schema[index]["types_found"][values_type[value]]["cnt"] += 1 117 | 118 | 119 | def execute(self, records, schema): 120 | values_type = {} 121 | for record in records: 122 | values = record.rstrip().split(self.sep) 123 | for index, value in enumerate(values): 124 | self.__infer_value_type(value[0:self.max_length], index, schema, values_type) 125 | 126 | 127 | class Parallel: 128 | 129 | def __init__(self): 130 | pass 131 | 132 | 133 | def execute(self, records, x, obj, d_schema): 134 | obj.execute(records, d_schema) 135 | return d_schema 136 | 137 | 138 | def parallel(self, records, obj, d_schema): 139 | 140 | 141 | 142 | cpus = (mp.cpu_count() - 2) 143 | 144 | if cpus <= 0: 145 | cpus = mp.cpu_count() 146 | 147 | chunk_size = len(records) / cpus 148 | 149 | if chunk_size < 1: 150 | cpus = int(chunk_size * 10) 151 | chunk_size = 1 152 | else: 153 | chunk_size = round(chunk_size) 154 | 155 | 156 | 157 | pool = mp.Pool(processes=cpus) 158 | 159 | results = [pool.apply_async(self.execute, args=(records[x:x+chunk_size], x, obj, d_schema)) for x in range(0, len(records), chunk_size)] 160 | pool.close() 161 | pool.join() 162 | 163 | return [p.get() for p in results] 164 | 165 | 166 | class CsvSchemaInference: 167 | 168 | def __init__(self, portion = 0.5, max_length = 1000, batch_size = 250000, acc = 0.7, seed= 1, header= True, sep=";", conditions = {}): 169 | self.portion = portion 170 | self.seed = seed 171 | self.header = header 172 | self.sep = sep 173 | self.accuracy = acc 174 | self.__schema = {} 175 | self.max_length = max_length 176 | self.data_types = {"STRING", "INTEGER", "FLOAT", "DATETIME", "DATE", "TIME", "TIMESTAMP", "BOOLEAN"} 177 | self.batch_size = batch_size 178 | 179 | if isinstance(conditions,dict): 180 | 181 | if conditions: 
182 | for k, v in conditions.items(): 183 | if k not in self.data_types or v not in self.data_types: 184 | raise ValueError('Keys and values in conditions must be valid data types') 185 | 186 | 187 | self.conditions = conditions 188 | 189 | 190 | 191 | 192 | def __set_header(self, header): 193 | 194 | header = header.rstrip().split(self.sep) 195 | for i in range(0, len(header)): 196 | self.__schema[i] = { 197 | "_name": header[i].replace('"', ''), 198 | "types_found":{ 199 | }, 200 | "nullable":False, 201 | "type":"" 202 | } 203 | 204 | 205 | def __estimate_count(self, filename, reader): 206 | buffer = reader.read(1<<13) 207 | file_size = os.path.getsize(filename) 208 | return file_size // (len(buffer) // buffer.count(b'\n')) 209 | 210 | 211 | def __merge_schemas(self, schemas): 212 | 213 | for c_inx in self.__schema: 214 | 215 | for s_inx in range(0, len(schemas)): 216 | 217 | _v = schemas[s_inx][c_inx] 218 | 219 | if _v['nullable']: 220 | self.__schema[c_inx]['nullable'] = True 221 | 222 | 223 | for k in _v['types_found']: 224 | 225 | if k not in self.__schema[c_inx]['types_found'].keys(): 226 | 227 | self.__schema[c_inx]['types_found'][k] = { 228 | "cnt": _v['types_found'][k]['cnt'] 229 | } 230 | else: 231 | self.__schema[c_inx]['types_found'][k]['cnt'] += _v['types_found'][k]['cnt'] 232 | 233 | 234 | 235 | def check_condition(self, _types, acc): 236 | 237 | try: 238 | _type = max({k: v for k, v in _types.items() if v >= (acc * 100)}.items(), 239 | key=operator.itemgetter(1))[0] 240 | 241 | if _type in self.conditions: 242 | if self.conditions[_type] in _types: 243 | _type = self.conditions[_type] 244 | 245 | except ValueError: 246 | 247 | if "STRING" in _types or len(_types) > 2: 248 | _type = "STRING" 249 | else: 250 | if {"INTEGER", "FLOAT"}.issubset(_types): 251 | _type = "FLOAT" 252 | else: 253 | _type = "STRING" 254 | 255 | return _type 256 | 257 | 258 | 259 | 260 | 261 | def __approximate_types(self, acc = 0.5): 262 | 263 | result = {} 264 | for c in 
self.__schema: 265 | _types = {} 266 | t = 0 267 | for v in self.__schema[c]['types_found']: 268 | t += self.__schema[c]['types_found'][v]['cnt'] 269 | if v not in _types.keys(): 270 | _types[v] = self.__schema[c]['types_found'][v]['cnt'] 271 | else: 272 | _types[v] += self.__schema[c]['types_found'][v]['cnt'] 273 | 274 | for ft in _types: 275 | _types[ft] = (_types[ft] * 100) / t 276 | 277 | 278 | _type = self.check_condition(_types, acc) 279 | 280 | 281 | self.__schema[c]['type'] = _type 282 | 283 | result[c] = { 284 | "name": self.__schema[c]['_name'], 285 | "type": _type, 286 | "nullable": self.__schema[c]['nullable'] 287 | } 288 | 289 | return result 290 | 291 | 292 | def pretty(self, d, ind=0): 293 | 294 | for k, v in d.items(): 295 | print('\t' * ind + str(k)) 296 | if isinstance(v, dict): 297 | self.pretty(v, ind+1) 298 | else: 299 | print('\t' * (ind+1) + str(v)) 300 | 301 | 302 | def get_schema_columns(self, columns = {}): 303 | 304 | 305 | result = {} 306 | 307 | for c in self.__schema: 308 | if self.__schema[c]["_name"] in columns: 309 | result[c] = { 310 | "_name": self.__schema[c]["_name"], 311 | "types_found":self.__schema[c]["types_found"], 312 | "nullable":self.__schema[c]["nullable"], 313 | "type":self.__schema[c]["type"] 314 | } 315 | 316 | return result 317 | 318 | 319 | def explore_schema_column(self, column): 320 | 321 | result = {} 322 | 323 | for c in self.__schema: 324 | 325 | if column == self.__schema[c]['_name']: 326 | 327 | _types = {} 328 | t = 0 329 | for v in self.__schema[c]['types_found']: 330 | t += self.__schema[c]['types_found'][v]['cnt'] 331 | 332 | if v not in _types.keys(): 333 | _types[v] = self.__schema[c]['types_found'][v]['cnt'] 334 | else: 335 | _types[v] += self.__schema[c]['types_found'][v]['cnt'] 336 | 337 | for ft in _types: 338 | _types[ft] = (_types[ft] * 100) / t 339 | 340 | result[c] = { 341 | "name" : self.__schema[c]['_name'], 342 | "types_found": _types, 343 | "nullable": self.__schema[c]['nullable'] 344 | } 
345 | 346 | break 347 | 348 | return result 349 | 350 | 351 | 352 | def run_inference(self, filename): 353 | 354 | with open(filename, mode="r", encoding = "ISO-8859-1") as file_obj: 355 | 356 | with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as map_file: 357 | 358 | less_header = 0 359 | 360 | if self.header: 361 | less_header = 1 362 | 363 | no_lines = self.__estimate_count(filename, map_file) - less_header 364 | portion = int(no_lines * self.portion) 365 | map_file.seek(0) 366 | 367 | if self.header: 368 | self.__set_header(map_file.readline().decode("ISO-8859-1")) 369 | 370 | lines = [] 371 | schemas = [] 372 | batch_count = 0 373 | 374 | dtype = DetectType(self.max_length, self.sep) 375 | 376 | 377 | while batch_count < portion: 378 | 379 | batch_count += 1 380 | lines.append(map_file.readline().decode("ISO-8859-1")) 381 | 382 | if batch_count % self.batch_size == 0: 383 | 384 | prl = Parallel() 385 | schemas_result = prl.parallel(records = lines, obj=dtype, d_schema = self.__schema) 386 | 387 | for schema in schemas_result: 388 | schemas.append(schema) 389 | 390 | lines = [] 391 | 392 | if len(lines) > 0: 393 | 394 | prl = Parallel() 395 | schemas_result = prl.parallel(records = lines,obj=dtype, d_schema = self.__schema) 396 | 397 | for schema in schemas_result: 398 | schemas.append(schema) 399 | 400 | del lines 401 | del batch_count 402 | 403 | 404 | #Joining schemas results 405 | self.__merge_schemas(schemas) 406 | 407 | #Approximate data types 408 | return self.__approximate_types(acc = self.accuracy) -------------------------------------------------------------------------------- /googled57bdb220576a44a.html: -------------------------------------------------------------------------------- 1 | google-site-verification: googled57bdb220576a44a.html -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | 
[build-system] 2 | requires = [ 3 | "setuptools>=42", 4 | "wheel" 5 | ] 6 | build-backend = "setuptools.build_meta" -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | name = csv-schema-inference 3 | version = 0.0.9 4 | author = Ramses Alexander Coraspe Valdez 5 | author_email = contacto@wittline.com 6 | description = A tool to automatically infer columns data types in .csv files 7 | long_description = file: README.md 8 | long_description_content_type = text/markdown 9 | url = https://github.com/Wittline/csv-schema-inference 10 | classifiers = 11 | Programming Language :: Python :: 3 12 | License :: OSI Approved :: MIT License 13 | Operating System :: OS Independent 14 | 15 | [options] 16 | packages = find: 17 | python_requires = >=3.7 18 | include_package_data = False --------------------------------------------------------------------------------
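The type decision made by `check_condition` in `csv_schema_inference.py` is a threshold vote over the per-column type percentages. The block below is a minimal standalone sketch of that rule, not part of the package: `pick_type` is a hypothetical helper name, and the percentages used with it are made-up illustration values.

```python
def pick_type(type_shares, acc=0.7, conditions=None):
    """Choose a column type from {type_name: percentage} shares.

    Hypothetical standalone helper mirroring the threshold rule in
    CsvSchemaInference.check_condition; it is not the package's API.
    """
    conditions = conditions or {}
    # Types whose share reaches the accuracy threshold (acc is a 0-1 fraction).
    winners = {t: share for t, share in type_shares.items() if share >= acc * 100}
    if winners:
        best = max(winners, key=winners.get)
        # A condition may swap the winner for another type that was also seen.
        promoted = conditions.get(best)
        return promoted if promoted in type_shares else best
    # No clear winner: a pure INTEGER/FLOAT mix widens to FLOAT, anything else is STRING.
    if set(type_shares) == {"INTEGER", "FLOAT"}:
        return "FLOAT"
    return "STRING"


# Illustrative shares only; real values come from the counts gathered per column.
print(pick_type({"INTEGER": 80.0, "FLOAT": 20.0}))
print(pick_type({"INTEGER": 80.0, "FLOAT": 20.0}, conditions={"INTEGER": "FLOAT"}))
print(pick_type({"INTEGER": 60.0, "FLOAT": 40.0}))
```

Passing `conditions={"INTEGER": "FLOAT"}` is one way to keep a mostly-integer column from losing the occasional decimal value, at the cost of a wider type.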