├── .github
│   └── ISSUE_TEMPLATE
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── _config.yml
├── benchmark
│   └── Benchmark.ipynb
├── csv_schema_inference
│   ├── __init__.py
│   └── csv_schema_inference.py
├── googled57bdb220576a44a.html
├── pyproject.toml
└── setup.cfg
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Bug report
3 | about: Create a report to help us improve
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 |
13 | **To Reproduce**
14 | Steps to reproduce the behavior:
15 | 1. Go to '...'
16 | 2. Click on '....'
17 | 3. Scroll down to '....'
18 | 4. See error
19 |
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 |
23 | **Screenshots**
24 | If applicable, add screenshots to help explain your problem.
25 |
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Browser [e.g. chrome, safari]
29 | - Version [e.g. 22]
30 |
31 | **Smartphone (please complete the following information):**
32 | - Device: [e.g. iPhone6]
33 | - OS: [e.g. iOS8.1]
34 | - Browser [e.g. stock browser, safari]
35 | - Version [e.g. 22]
36 |
37 | **Additional context**
38 | Add any other context about the problem here.
39 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
1 | ---
2 | name: Feature request
3 | about: Suggest an idea for this project
4 | title: ''
5 | labels: ''
6 | assignees: ''
7 |
8 | ---
9 |
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 |
13 | **Describe the solution you'd like**
14 | A clear and concise description of what you want to happen.
15 |
16 | **Describe alternatives you've considered**
17 | A clear and concise description of any alternative solutions or features you've considered.
18 |
19 | **Additional context**
20 | Add any other context or screenshots about the feature request here.
21 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | pip-wheel-metadata/
24 | share/python-wheels/
25 | *.egg-info/
26 | .installed.cfg
27 | *.egg
28 | MANIFEST
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .nox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | *.py,cover
51 | .hypothesis/
52 | .pytest_cache/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | target/
76 |
77 | # Jupyter Notebook
78 | .ipynb_checkpoints
79 |
80 | # IPython
81 | profile_default/
82 | ipython_config.py
83 |
84 | # pyenv
85 | .python-version
86 |
87 | # pipenv
88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
91 | # install all needed dependencies.
92 | #Pipfile.lock
93 |
94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
95 | __pypackages__/
96 |
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 |
101 | # SageMath parsed files
102 | *.sage.py
103 |
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 |
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 |
117 | # Rope project settings
118 | .ropeproject
119 |
120 | # mkdocs documentation
121 | /site
122 |
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 |
128 | # Pyre type checker
129 | .pyre/
130 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the
26 | overall community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards of
42 | acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies when
54 | an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail address,
56 | posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement at
63 | .
64 | All complaints will be reviewed and investigated promptly and fairly.
65 |
66 | All community leaders are obligated to respect the privacy and security of the
67 | reporter of any incident.
68 |
69 | ## Enforcement Guidelines
70 |
71 | Community leaders will follow these Community Impact Guidelines in determining
72 | the consequences for any action they deem in violation of this Code of Conduct:
73 |
74 | ### 1. Correction
75 |
76 | **Community Impact**: Use of inappropriate language or other behavior deemed
77 | unprofessional or unwelcome in the community.
78 |
79 | **Consequence**: A private, written warning from community leaders, providing
80 | clarity around the nature of the violation and an explanation of why the
81 | behavior was inappropriate. A public apology may be requested.
82 |
83 | ### 2. Warning
84 |
85 | **Community Impact**: A violation through a single incident or series
86 | of actions.
87 |
88 | **Consequence**: A warning with consequences for continued behavior. No
89 | interaction with the people involved, including unsolicited interaction with
90 | those enforcing the Code of Conduct, for a specified period of time. This
91 | includes avoiding interactions in community spaces as well as external channels
92 | like social media. Violating these terms may lead to a temporary or
93 | permanent ban.
94 |
95 | ### 3. Temporary Ban
96 |
97 | **Community Impact**: A serious violation of community standards, including
98 | sustained inappropriate behavior.
99 |
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 |
106 | ### 4. Permanent Ban
107 |
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 |
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 |
115 | ## Attribution
116 |
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | CONTRIBUTING.md
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Ramses Alexander Coraspe Valdez
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | global-include *.*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # **Csv Schema Inference**
2 | A tool to automatically infer column data types in .csv files
3 |
4 | ### Check the article here: Building a Schema Inference Data Pipeline for Large CSV files
5 |
6 |
7 |
10 |
11 |
12 |
13 |
14 |
15 | ## **Installing csv-schema-inference** 🔧
16 |
17 |
18 |
19 |
20 |
21 | ``` shell
22 | pip install csv-schema-inference
23 | ```
24 |
25 |
26 |
27 | Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
28 | Collecting csv-schema-inference
29 | Downloading csv_schema_inference-0.0.9-py3-none-any.whl (7.3 kB)
30 | Installing collected packages: csv-schema-inference
31 | Successfully installed csv-schema-inference-0.0.9
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 | ## **Importing csv-schema-inference library** ⚡
40 |
41 |
42 |
43 |
44 |
45 | ``` python
46 | from csv_schema_inference import csv_schema_inference
47 | ```
48 |
49 |
50 |
51 |
52 |
53 | ## **Setting csv-schema-inference configuration** ✍
54 |
55 |
56 |
57 |
58 |
59 | ``` python
60 |
61 | # If the inferred type is INTEGER but FLOAT values were also found, the final type becomes FLOAT.
62 | conditions = {"INTEGER": "FLOAT"}
63 |
64 | csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.9, max_length=100, batch_size=200000, acc=0.8, seed=2, header=True, sep=",", conditions=conditions)
65 | pathfile = "/content/file__500k.csv"
66 | ```
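
The `conditions` rule described in the comment above can be illustrated with a minimal sketch. This is not the library's implementation: `resolve_type` is a hypothetical helper, and the assumption is that the inferred type is the most frequent one and that a rule applies only when the override type was actually observed. The counts are the ones reported for `test_column` later in this README.

``` python
def resolve_type(type_counts, conditions):
    """Pick the most frequent type, then apply the override rules.

    Hypothetical helper: `conditions` maps an inferred type to the type
    that replaces it whenever that replacement type was also observed.
    """
    inferred = max(type_counts, key=type_counts.get)
    override = conditions.get(inferred)
    if override is not None and type_counts.get(override, 0) > 0:
        return override
    return inferred


counts = {"INTEGER": 406130, "FLOAT": 50964}
print(resolve_type(counts, {"INTEGER": "FLOAT"}))  # FLOAT
print(resolve_type(counts, {}))                    # INTEGER
```

Without the rule, the majority type (INTEGER) would win even though about one value in nine cannot be stored as an integer.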
67 |
68 |
69 |
70 |
71 |
72 | ## **Run inference** 🏃
73 |
74 |
75 |
76 |
77 |
78 | ``` python
79 | aprox_schema = csv_infer.run_inference(pathfile)
80 | ```
81 |
82 |
83 |
84 |
85 |
86 | ## **Showing the approximate data type inference for each column** 🔍
87 |
88 |
89 |
90 |
91 |
92 | ``` python
93 | csv_infer.pretty(aprox_schema)
94 | ```
95 |
96 |
97 |
98 |     0
99 |         name
100 |             id
101 |         type
102 |             INTEGER
103 |         nullable
104 |             False
105 |     1
106 |         name
107 |             full_name
108 |         type
109 |             STRING
110 |         nullable
111 |             True
112 |     2
113 |         name
114 |             age
115 |         type
116 |             INTEGER
117 |         nullable
118 |             False
119 |     3
120 |         name
121 |             city
122 |         type
123 |             STRING
124 |         nullable
125 |             True
126 |     4
127 |         name
128 |             weight
129 |         type
130 |             FLOAT
131 |         nullable
132 |             False
133 |     5
134 |         name
135 |             height
136 |         type
137 |             FLOAT
138 |         nullable
139 |             False
140 |     6
141 |         name
142 |             isActive
143 |         type
144 |             BOOLEAN
145 |         nullable
146 |             False
147 |     7
148 |         name
149 |             col_int1
150 |         type
151 |             INTEGER
152 |         nullable
153 |             False
154 |     8
155 |         name
156 |             col_int2
157 |         type
158 |             INTEGER
159 |         nullable
160 |             False
161 |     9
162 |         name
163 |             col_int3
164 |         type
165 |             INTEGER
166 |         nullable
167 |             False
168 |     10
169 |         name
170 |             col_float1
171 |         type
172 |             FLOAT
173 |         nullable
174 |             False
175 |     11
176 |         name
177 |             col_float2
178 |         type
179 |             FLOAT
180 |         nullable
181 |             False
182 |     12
183 |         name
184 |             col_float3
185 |         type
186 |             FLOAT
187 |         nullable
188 |             False
189 |     13
190 |         name
191 |             col_float4
192 |         type
193 |             FLOAT
194 |         nullable
195 |             False
196 |     14
197 |         name
198 |             col_float5
199 |         type
200 |             FLOAT
201 |         nullable
202 |             False
203 |     15
204 |         name
205 |             col_float6
206 |         type
207 |             FLOAT
208 |         nullable
209 |             False
210 |     16
211 |         name
212 |             col_float7
213 |         type
214 |             FLOAT
215 |         nullable
216 |             False
217 |     17
218 |         name
219 |             col_float8
220 |         type
221 |             FLOAT
222 |         nullable
223 |             False
224 |     18
225 |         name
226 |             col_float9
227 |         type
228 |             FLOAT
229 |         nullable
230 |             False
231 |     19
232 |         name
233 |             col_float10
234 |         type
235 |             FLOAT
236 |         nullable
237 |             False
238 |     20
239 |         name
240 |             test_column
241 |         type
242 |             FLOAT
243 |         nullable
244 |             False
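
The listing above has one entry per column index, each holding `name`, `type` and `nullable`. As a rough illustration of how such a nested listing can be produced, here is a minimal sketch of a recursive dictionary printer. It assumes the schema is a plain nested `dict`; `format_nested` is a hypothetical helper and not the library's `pretty` method.

``` python
def format_nested(data, indent=0, step=4):
    """Return the lines of an indented listing for a nested dict."""
    lines = []
    for key, value in data.items():
        lines.append(" " * indent + str(key))
        if isinstance(value, dict):
            # Nested dictionaries are rendered one level deeper.
            lines.extend(format_nested(value, indent + step, step))
        else:
            lines.append(" " * (indent + step) + str(value))
    return lines


# Hypothetical schema entry shaped like the first column above.
schema = {0: {"name": "id", "type": "INTEGER", "nullable": False}}
print("\n".join(format_nested(schema)))
```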
245 |
246 |
247 |
248 |
249 |
250 |
251 |
252 | ## **Checking schema values for specific columns** ✔
253 |
254 |
255 |
256 |
257 |
258 | ``` python
259 | result = csv_infer.get_schema_columns(columns = {"test_column"})
260 | csv_infer.pretty(result)
261 | ```
262 |
263 |
264 |
265 |     20
266 |         _name
267 |             test_column
268 |         types_found
269 |             INTEGER
270 |                 cnt
271 |                     406130
272 |             FLOAT
273 |                 cnt
274 |                     50964
275 |         nullable
276 |             False
277 |         type
278 |             FLOAT
279 |
280 |
281 |
282 |
283 |
284 |
285 |
286 | ## **Explore all possible data types for a specific column** ✅
287 |
288 |
289 |
290 |
291 |
292 | ``` python
293 | result = csv_infer.explore_schema_column(column = "test_column")
294 | csv_infer.pretty(result)
295 | ```
296 |
297 |
298 |
299 |     20
300 |         name
301 |             test_column
302 |         types_found
303 |             INTEGER
304 |                 88.85043339006856
305 |             FLOAT
306 |                 11.149566609931437
307 |         nullable
308 |             False
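
The percentages shown here agree with the `cnt` values reported by `get_schema_columns` for the same column (406130 INTEGER and 50964 FLOAT values). The sketch below reproduces that arithmetic. It assumes each percentage is simply a count divided by the total; `type_percentages` is a hypothetical helper, not part of the library.

``` python
def type_percentages(type_counts):
    """Convert per-type counts into percentages of all sampled values."""
    total = sum(type_counts.values())
    return {name: 100 * count / total for name, count in type_counts.items()}


shares = type_percentages({"INTEGER": 406130, "FLOAT": 50964})
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
# INTEGER: 88.85%
# FLOAT: 11.15%
```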
309 |
310 |
311 |
312 |
313 |
314 | ## Benchmark
315 | The tests used 9 .csv files with 21 columns each and different sizes and record counts. For each file, the shuffle time and the inferring time were averaged over 5 executions.
316 |
317 | - file__20m.csv: 20 million records
318 | - file__15m.csv: 15 million records
319 | - file__12m.csv: 12 million records
320 | - file__10m.csv: 10 million records
321 | - And so on, down to file__8m.csv, file__6m.csv, file__4m.csv, file__2m.csv and file__1m.csv
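
As a small illustration of the averaging step, the sketch below groups runs by file and takes the mean of each timing. The ten runs are copied from `benchmark/Benchmark.ipynb`. The grouping code is an assumption about how the averages could be computed, and `average_times` is a hypothetical helper.

``` python
from collections import defaultdict
from statistics import mean

# (filename, shuffle_time, inferring_time), copied from Benchmark.ipynb.
runs = [
    ("file__2m.csv", 5.56659010, 9.77755900),
    ("file__2m.csv", 5.37670290, 9.85879350),
    ("file__2m.csv", 5.50792340, 9.83664550),
    ("file__2m.csv", 5.77451570, 9.72117910),
    ("file__2m.csv", 5.56910340, 9.84671710),
    ("file__1m.csv", 2.42946810, 4.65625420),
    ("file__1m.csv", 2.38822270, 5.17744930),
    ("file__1m.csv", 2.74428740, 4.82960490),
    ("file__1m.csv", 2.58021890, 5.17412620),
    ("file__1m.csv", 2.67854850, 5.08991410),
]


def average_times(rows):
    """Average shuffle and inferring times per file."""
    grouped = defaultdict(list)
    for filename, shuffle_time, inferring_time in rows:
        grouped[filename].append((shuffle_time, inferring_time))
    return {
        filename: {
            "shuffle_time": mean(s for s, _ in timings),
            "inferring_time": mean(i for _, i in timings),
        }
        for filename, timings in grouped.items()
    }


averages = average_times(runs)
for filename, avg in averages.items():
    print(f"{filename}: shuffle {avg['shuffle_time']:.2f}, inferring {avg['inferring_time']:.2f}")
```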
322 |
323 | To learn more about the shuffling process, see this other repository: A tool to automatically Shuffle lines in .csv files. Shuffling helps us to:
324 |
325 | 1. Increase the probability of finding all the data types present in a single column.
326 | 2. Avoid iterating over the entire dataset.
327 | 3. Avoid biases in the data that may be part of its organic behavior, given that we do not know how it was constructed.
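
The first point can be seen in a toy example. The sketch below is only an illustration under stated assumptions: a single in-memory column whose FLOAT values all sit at the end of the file, a deliberately naive classifier, and hypothetical helpers (`sample_head`, `observed_types`). It is not the shuffling tool or the library's inference code.

``` python
import random


def sample_head(lines, portion):
    """Return the first `portion` (between 0 and 1) of the lines."""
    return lines[: int(len(lines) * portion)]


def observed_types(values):
    """Toy classifier: a value containing '.' is FLOAT, anything else INTEGER."""
    return {"FLOAT" if "." in value else "INTEGER" for value in values}


# 800 integers followed by 200 floats, as in a file sorted or appended over time.
column = [str(i) for i in range(800)] + [f"{i}.5" for i in range(200)]

# Reading only the first 20% of the unshuffled file misses every FLOAT value.
print(observed_types(sample_head(column, 0.2)))  # {'INTEGER'}

# After a seeded shuffle, the same 20% sample is drawn from the whole file.
shuffled = column[:]
random.Random(2).shuffle(shuffled)
print(observed_types(sample_head(shuffled, 0.2)))
```

With the values spread across the file, a partial read is very likely to see both types, which is why a sample can stand in for a full pass over the data.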
328 |
329 |
330 |
333 |
334 |
335 | ## Contributing and Feedback
336 | Any ideas or feedback about this repository? Help me improve it.
337 |
338 | ## Authors
339 | - Created by Ramses Alexander Coraspe Valdez
340 | - Created in 2022
341 |
342 | ## License
343 | This project is licensed under the terms of the MIT License.
344 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
2 | title: "A tool to automatically infer column data types in .csv files"
3 | description: "A parallel implementation of schema inference using Python"
4 | author: "Ramses Alexander Coraspe Valdez"
5 |
--------------------------------------------------------------------------------
/benchmark/Benchmark.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "Benchmark.ipynb",
7 | "provenance": [],
8 | "collapsed_sections": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "code",
21 | "source": [
22 | "import pandas as pd"
23 | ],
24 | "metadata": {
25 | "id": "NrcJWz22npeq"
26 | },
27 | "execution_count": 99,
28 | "outputs": []
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 100,
33 | "metadata": {
34 | "id": "XyKNLam6m4Vv"
35 | },
36 | "outputs": [],
37 | "source": [
38 | "benchmark_data = [ \n",
39 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 73.00043550, 'inferring_time': 111.56356970},\n",
40 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 72.97213800, 'inferring_time': 115.25191430},\n",
41 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 82.32063370, 'inferring_time': 116.76299740},\n",
42 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 77.67622630, 'inferring_time': 114.59385790},\n",
43 | " {'filename': 'file__20m.csv', 'file_size': 3100848423, 'shuffle_time': 73.26938180, 'inferring_time': 112.55643420},\n",
44 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 55.82634800, 'inferring_time': 74.62251340},\n",
45 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.93429800, 'inferring_time': 71.26189710},\n",
46 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.87042450, 'inferring_time': 69.13962730},\n",
47 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.14651010, 'inferring_time': 71.23978310},\n",
48 | " {'filename': 'file__15m.csv', 'file_size': 2322887546, 'shuffle_time': 42.21968120, 'inferring_time': 69.67053280},\n",
49 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 32.48983010, 'inferring_time': 58.08111770},\n",
50 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 34.64318280, 'inferring_time': 57.98930810},\n",
51 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 34.85442540, 'inferring_time': 57.71942010},\n",
52 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 33.38362710, 'inferring_time': 59.86055910},\n",
53 | " {'filename': 'file__12m.csv', 'file_size': 1856118441, 'shuffle_time': 32.79728820, 'inferring_time': 57.41156370},\n",
54 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 28.28831460, 'inferring_time': 53.78283170},\n",
55 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 30.25130520, 'inferring_time': 51.21287500},\n",
56 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 29.83213370, 'inferring_time': 53.01958860},\n",
57 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 30.21982290, 'inferring_time': 51.81474830},\n",
58 | " {'filename': 'file__10m.csv', 'file_size': 1544899668, 'shuffle_time': 29.52344140, 'inferring_time': 58.40408200},\n",
59 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.60465530, 'inferring_time': 44.68717590},\n",
60 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 23.84743100, 'inferring_time': 42.68867510},\n",
61 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.94851320, 'inferring_time': 46.96807710},\n",
62 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.77527450, 'inferring_time': 42.62858490},\n",
63 | " {'filename': 'file__8m.csv', 'file_size': 1235682644, 'shuffle_time': 22.20869720, 'inferring_time': 42.98606580},\n",
64 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 15.88705860, 'inferring_time': 28.34111610},\n",
65 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.08761300, 'inferring_time': 29.42147060},\n",
66 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 16.48110200, 'inferring_time': 29.21088670},\n",
67 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.10600270, 'inferring_time': 28.82191680},\n",
68 | " {'filename': 'file__6m.csv', 'file_size': 926480055, 'shuffle_time': 17.17415740, 'inferring_time': 29.26859480},\n",
69 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.47866530, 'inferring_time': 19.30165580},\n",
70 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 12.42761710, 'inferring_time': 19.83578670},\n",
71 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.44712670, 'inferring_time': 21.38865030},\n",
72 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.67422640, 'inferring_time': 23.90071370},\n",
73 | " {'filename': 'file__4m.csv', 'file_size': 617284424, 'shuffle_time': 11.75241490, 'inferring_time': 23.17653020},\n",
74 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.56659010, 'inferring_time': 9.77755900},\n",
75 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.37670290, 'inferring_time': 9.85879350},\n",
76 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.50792340, 'inferring_time': 9.83664550},\n",
77 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.77451570, 'inferring_time': 9.72117910},\n",
78 | " {'filename': 'file__2m.csv', 'file_size': 308078962, 'shuffle_time': 5.56910340, 'inferring_time': 9.84671710},\n",
79 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.42946810, 'inferring_time': 4.65625420},\n",
80 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.38822270, 'inferring_time': 5.17744930},\n",
81 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.74428740, 'inferring_time': 4.82960490},\n",
82 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.58021890, 'inferring_time': 5.17412620},\n",
83 | " {'filename': 'file__1m.csv', 'file_size': 153491820, 'shuffle_time': 2.67854850, 'inferring_time': 5.08991410}\n",
84 | "]"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "source": [
90 | "df = pd.DataFrame(benchmark_data)\n",
91 | "df"
92 | ],
93 | "metadata": {
94 | "colab": {
95 | "base_uri": "https://localhost:8080/",
96 | "height": 1000
97 | },
98 | "id": "R9ABJG7qnohL",
99 | "outputId": "fb7f4f3c-6542-44cb-9d1c-4ce44d9537bb"
100 | },
101 | "execution_count": 102,
102 | "outputs": [
103 | {
104 | "output_type": "execute_result",
105 | "data": {
106 | "text/plain": [
107 | " filename file_size shuffle_time inferring_time\n",
108 | "0 file__20m.csv 3100848423 73.000435 111.563570\n",
109 | "1 file__20m.csv 3100848423 72.972138 115.251914\n",
110 | "2 file__20m.csv 3100848423 82.320634 116.762997\n",
111 | "3 file__20m.csv 3100848423 77.676226 114.593858\n",
112 | "4 file__20m.csv 3100848423 73.269382 112.556434\n",
113 | "5 file__15m.csv 2322887546 55.826348 74.622513\n",
114 | "6 file__15m.csv 2322887546 42.934298 71.261897\n",
115 | "7 file__15m.csv 2322887546 42.870424 69.139627\n",
116 | "8 file__15m.csv 2322887546 42.146510 71.239783\n",
117 | "9 file__15m.csv 2322887546 42.219681 69.670533\n",
118 | "10 file__12m.csv 1856118441 32.489830 58.081118\n",
119 | "11 file__12m.csv 1856118441 34.643183 57.989308\n",
120 | "12 file__12m.csv 1856118441 34.854425 57.719420\n",
121 | "13 file__12m.csv 1856118441 33.383627 59.860559\n",
122 | "14 file__12m.csv 1856118441 32.797288 57.411564\n",
123 | "15 file__10m.csv 1544899668 28.288315 53.782832\n",
124 | "16 file__10m.csv 1544899668 30.251305 51.212875\n",
125 | "17 file__10m.csv 1544899668 29.832134 53.019589\n",
126 | "18 file__10m.csv 1544899668 30.219823 51.814748\n",
127 | "19 file__10m.csv 1544899668 29.523441 58.404082\n",
128 | "20 file__8m.csv 1235682644 22.604655 44.687176\n",
129 | "21 file__8m.csv 1235682644 23.847431 42.688675\n",
130 | "22 file__8m.csv 1235682644 22.948513 46.968077\n",
131 | "23 file__8m.csv 1235682644 22.775274 42.628585\n",
132 | "24 file__8m.csv 1235682644 22.208697 42.986066\n",
133 | "25 file__6m.csv 926480055 15.887059 28.341116\n",
134 | "26 file__6m.csv 926480055 17.087613 29.421471\n",
135 | "27 file__6m.csv 926480055 16.481102 29.210887\n",
136 | "28 file__6m.csv 926480055 17.106003 28.821917\n",
137 | "29 file__6m.csv 926480055 17.174157 29.268595\n",
138 | "30 file__4m.csv 617284424 11.478665 19.301656\n",
139 | "31 file__4m.csv 617284424 12.427617 19.835787\n",
140 | "32 file__4m.csv 617284424 11.447127 21.388650\n",
141 | "33 file__4m.csv 617284424 11.674226 23.900714\n",
142 | "34 file__4m.csv 617284424 11.752415 23.176530\n",
143 | "35 file__2m.csv 308078962 5.566590 9.777559\n",
144 | "36 file__2m.csv 308078962 5.376703 9.858794\n",
145 | "37 file__2m.csv 308078962 5.507923 9.836645\n",
146 | "38 file__2m.csv 308078962 5.774516 9.721179\n",
147 | "39 file__2m.csv 308078962 5.569103 9.846717\n",
148 | "40 file__1m.csv 153491820 2.429468 4.656254\n",
149 | "41 file__1m.csv 153491820 2.388223 5.177449\n",
150 | "42 file__1m.csv 153491820 2.744287 4.829605\n",
151 | "43 file__1m.csv 153491820 2.580219 5.174126\n",
152 | "44 file__1m.csv 153491820 2.678549 5.089914"
153 | ],
154 | "text/html": [
155 | "\n",
156 | " \n",
157 | "
\n",
158 | "
\n",
159 | "\n",
172 | "
\n",
173 | " \n",
174 | " \n",
175 | " \n",
176 | " filename \n",
177 | " file_size \n",
178 | " shuffle_time \n",
179 | " inferring_time \n",
180 | " \n",
181 | " \n",
182 | " \n",
183 | " \n",
184 | " 0 \n",
185 | " file__20m.csv \n",
186 | " 3100848423 \n",
187 | " 73.000435 \n",
188 | " 111.563570 \n",
189 | " \n",
190 | " \n",
191 | " 1 \n",
192 | " file__20m.csv \n",
193 | " 3100848423 \n",
194 | " 72.972138 \n",
195 | " 115.251914 \n",
196 | " \n",
197 | " \n",
198 | " 2 \n",
199 | " file__20m.csv \n",
200 | " 3100848423 \n",
201 | " 82.320634 \n",
202 | " 116.762997 \n",
203 | " \n",
204 | " \n",
205 | " 3 \n",
206 | " file__20m.csv \n",
207 | " 3100848423 \n",
208 | " 77.676226 \n",
209 | " 114.593858 \n",
210 | " \n",
211 | " \n",
212 | " 4 \n",
213 | " file__20m.csv \n",
214 | " 3100848423 \n",
215 | " 73.269382 \n",
216 | " 112.556434 \n",
217 | " \n",
218 | " \n",
219 | " 5 \n",
220 | " file__15m.csv \n",
221 | " 2322887546 \n",
222 | " 55.826348 \n",
223 | " 74.622513 \n",
224 | " \n",
225 | " \n",
226 | " 6 \n",
227 | " file__15m.csv \n",
228 | " 2322887546 \n",
229 | " 42.934298 \n",
230 | " 71.261897 \n",
231 | " \n",
232 | " \n",
233 | " 7 \n",
234 | " file__15m.csv \n",
235 | " 2322887546 \n",
236 | " 42.870424 \n",
237 | " 69.139627 \n",
238 | " \n",
239 | " \n",
240 | " 8 \n",
241 | " file__15m.csv \n",
242 | " 2322887546 \n",
243 | " 42.146510 \n",
244 | " 71.239783 \n",
245 | " \n",
246 | " \n",
247 | " 9 \n",
248 | " file__15m.csv \n",
249 | " 2322887546 \n",
250 | " 42.219681 \n",
251 | " 69.670533 \n",
252 | " \n",
253 | " \n",
254 | " 10 \n",
255 | " file__12m.csv \n",
256 | " 1856118441 \n",
257 | " 32.489830 \n",
258 | " 58.081118 \n",
259 | " \n",
260 | " \n",
261 | " 11 \n",
262 | " file__12m.csv \n",
263 | " 1856118441 \n",
264 | " 34.643183 \n",
265 | " 57.989308 \n",
266 | " \n",
267 | " \n",
268 | " 12 \n",
269 | " file__12m.csv \n",
270 | " 1856118441 \n",
271 | " 34.854425 \n",
272 | " 57.719420 \n",
273 | " \n",
274 | " \n",
275 | " 13 \n",
276 | " file__12m.csv \n",
277 | " 1856118441 \n",
278 | " 33.383627 \n",
279 | " 59.860559 \n",
280 | " \n",
281 | " \n",
282 | " 14 \n",
283 | " file__12m.csv \n",
284 | " 1856118441 \n",
285 | " 32.797288 \n",
286 | " 57.411564 \n",
287 | " \n",
288 | " \n",
289 | " 15 \n",
290 | " file__10m.csv \n",
291 | " 1544899668 \n",
292 | " 28.288315 \n",
293 | " 53.782832 \n",
294 | " \n",
295 | " \n",
296 | " 16 \n",
297 | " file__10m.csv \n",
298 | " 1544899668 \n",
299 | " 30.251305 \n",
300 | " 51.212875 \n",
301 | " \n",
302 | " \n",
303 | " 17 \n",
304 | " file__10m.csv \n",
305 | " 1544899668 \n",
306 | " 29.832134 \n",
307 | " 53.019589 \n",
308 | " \n",
309 | " \n",
310 | " 18 \n",
311 | " file__10m.csv \n",
312 | " 1544899668 \n",
313 | " 30.219823 \n",
314 | " 51.814748 \n",
315 | " \n",
316 | " \n",
317 | " 19 \n",
318 | " file__10m.csv \n",
319 | " 1544899668 \n",
320 | " 29.523441 \n",
321 | " 58.404082 \n",
322 | " \n",
323 | " \n",
324 | " 20 \n",
325 | " file__8m.csv \n",
326 | " 1235682644 \n",
327 | " 22.604655 \n",
328 | " 44.687176 \n",
329 | " \n",
330 | " \n",
331 | " 21 \n",
332 | " file__8m.csv \n",
333 | " 1235682644 \n",
334 | " 23.847431 \n",
335 | " 42.688675 \n",
336 | " \n",
337 | " \n",
338 | " 22 \n",
339 | " file__8m.csv \n",
340 | " 1235682644 \n",
341 | " 22.948513 \n",
342 | " 46.968077 \n",
343 | " \n",
344 | " \n",
345 | " 23 \n",
346 | " file__8m.csv \n",
347 | " 1235682644 \n",
348 | " 22.775274 \n",
349 | " 42.628585 \n",
350 | " \n",
351 | " \n",
352 | " 24 \n",
353 | " file__8m.csv \n",
354 | " 1235682644 \n",
355 | " 22.208697 \n",
356 | " 42.986066 \n",
357 | " \n",
358 | " \n",
359 | " 25 \n",
360 | " file__6m.csv \n",
361 | " 926480055 \n",
362 | " 15.887059 \n",
363 | " 28.341116 \n",
364 | " \n",
365 | " \n",
366 | " 26 \n",
367 | " file__6m.csv \n",
368 | " 926480055 \n",
369 | " 17.087613 \n",
370 | " 29.421471 \n",
371 | " \n",
372 | " \n",
373 | " 27 \n",
374 | " file__6m.csv \n",
375 | " 926480055 \n",
376 | " 16.481102 \n",
377 | " 29.210887 \n",
378 | " \n",
379 | " \n",
380 | " 28 \n",
381 | " file__6m.csv \n",
382 | " 926480055 \n",
383 | " 17.106003 \n",
384 | " 28.821917 \n",
385 | " \n",
386 | " \n",
387 | " 29 \n",
388 | " file__6m.csv \n",
389 | " 926480055 \n",
390 | " 17.174157 \n",
391 | " 29.268595 \n",
392 | " \n",
393 | " \n",
394 | " 30 \n",
395 | " file__4m.csv \n",
396 | " 617284424 \n",
397 | " 11.478665 \n",
398 | " 19.301656 \n",
399 | " \n",
400 | " \n",
401 | " 31 \n",
402 | " file__4m.csv \n",
403 | " 617284424 \n",
404 | " 12.427617 \n",
405 | " 19.835787 \n",
406 | " \n",
407 | " \n",
408 | " 32 \n",
409 | " file__4m.csv \n",
410 | " 617284424 \n",
411 | " 11.447127 \n",
412 | " 21.388650 \n",
413 | " \n",
414 | " \n",
415 | " 33 \n",
416 | " file__4m.csv \n",
417 | " 617284424 \n",
418 | " 11.674226 \n",
419 | " 23.900714 \n",
420 | " \n",
421 | " \n",
422 | " 34 \n",
423 | " file__4m.csv \n",
424 | " 617284424 \n",
425 | " 11.752415 \n",
426 | " 23.176530 \n",
427 | " \n",
428 | " \n",
429 | " 35 \n",
430 | " file__2m.csv \n",
431 | " 308078962 \n",
432 | " 5.566590 \n",
433 | " 9.777559 \n",
434 | " \n",
435 | " \n",
436 | " 36 \n",
437 | " file__2m.csv \n",
438 | " 308078962 \n",
439 | " 5.376703 \n",
440 | " 9.858794 \n",
441 | " \n",
442 | " \n",
443 | " 37 \n",
444 | " file__2m.csv \n",
445 | " 308078962 \n",
446 | " 5.507923 \n",
447 | " 9.836645 \n",
448 | " \n",
449 | " \n",
450 | " 38 \n",
451 | " file__2m.csv \n",
452 | " 308078962 \n",
453 | " 5.774516 \n",
454 | " 9.721179 \n",
455 | " \n",
456 | " \n",
457 | " 39 \n",
458 | " file__2m.csv \n",
459 | " 308078962 \n",
460 | " 5.569103 \n",
461 | " 9.846717 \n",
462 | " \n",
463 | " \n",
464 | " 40 \n",
465 | " file__1m.csv \n",
466 | " 153491820 \n",
467 | " 2.429468 \n",
468 | " 4.656254 \n",
469 | " \n",
470 | " \n",
471 | " 41 \n",
472 | " file__1m.csv \n",
473 | " 153491820 \n",
474 | " 2.388223 \n",
475 | " 5.177449 \n",
476 | " \n",
477 | " \n",
478 | " 42 \n",
479 | " file__1m.csv \n",
480 | " 153491820 \n",
481 | " 2.744287 \n",
482 | " 4.829605 \n",
483 | " \n",
484 | " \n",
485 | " 43 \n",
486 | " file__1m.csv \n",
487 | " 153491820 \n",
488 | " 2.580219 \n",
489 | " 5.174126 \n",
490 | " \n",
491 | " \n",
492 | " 44 \n",
493 | " file__1m.csv \n",
494 | " 153491820 \n",
495 | " 2.678549 \n",
496 | " 5.089914 \n",
497 | " \n",
498 | " \n",
576 | " "
577 | ]
578 | },
579 | "metadata": {},
580 | "execution_count": 102
581 | }
582 | ]
583 | },
584 | {
585 | "cell_type": "code",
586 | "source": [
587 | "df['file_size'] = (df['file_size'] / 1e9).round(3)  # convert bytes to gigabytes, rounded to 3 decimals"
588 | ],
589 | "metadata": {
590 | "id": "VBxKzFLjuyfL"
591 | },
592 | "execution_count": 103,
593 | "outputs": []
594 | },
595 | {
596 | "cell_type": "code",
597 | "source": [
598 | "df = df.groupby(['file_size', 'filename'], sort=False)[['shuffle_time', 'inferring_time']].mean()\n",
599 | "df"
600 | ],
601 | "metadata": {
602 | "colab": {
603 | "base_uri": "https://localhost:8080/",
604 | "height": 398
605 | },
606 | "id": "fDBxYlEbn3TB",
607 | "outputId": "98af830a-86f4-435f-ac3e-91ec5f41b0da"
608 | },
609 | "execution_count": 105,
610 | "outputs": [
611 | {
612 | "output_type": "stream",
613 | "name": "stderr",
614 | "text": [
615 | "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.\n",
616 | " \"\"\"Entry point for launching an IPython kernel.\n"
617 | ]
618 | },
619 | {
620 | "output_type": "execute_result",
621 | "data": {
622 | "text/plain": [
623 | "                         shuffle_time  inferring_time\n",
624 | "file_size filename                                   \n",
625 | "3.101     file__20m.csv     75.847763      114.145755\n",
626 | "2.323     file__15m.csv     45.199452       71.186871\n",
627 | "1.856     file__12m.csv     33.633671       58.212394\n",
628 | "1.545     file__10m.csv     29.623004       53.646825\n",
629 | "1.236     file__8m.csv      22.876914       43.991716\n",
630 | "0.926     file__6m.csv      16.747187       29.012797\n",
631 | "0.617     file__4m.csv      11.756010       21.520667\n",
632 | "0.308     file__2m.csv       5.558967        9.808179\n",
633 | "0.153     file__1m.csv       2.564149        4.985470"
634 | ]
803 | },
804 | "metadata": {},
805 | "execution_count": 105
806 | }
807 | ]
808 | },
809 | {
810 | "cell_type": "code",
811 | "source": [
812 | "df.reset_index(inplace=True)"
813 | ],
814 | "metadata": {
815 | "id": "zh6NHO3csESF"
816 | },
817 | "execution_count": 106,
818 | "outputs": []
819 | },
820 | {
821 | "cell_type": "code",
822 | "source": [
823 | "import matplotlib.pyplot as plt\n",
824 | "import seaborn as sns\n",
825 | "\n",
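826 | "sns.set(style='white')\n",
827 | "\n",
828 | "# DataFrame.plot creates its own figure, so pass the size and palette to it directly\n",
829 | "# instead of calling plt.figure() first (which would leave an empty figure behind).\n",
830 | "df.set_index(['file_size', 'filename']).plot(kind='bar', stacked=True, figsize=(15, 4), color=sns.color_palette(\"colorblind\", 2))\n",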
831 | "\n",
832 | "plt.title('Mean Shuffle Time and Inferring Time per File', fontsize=12)\n",
833 | "\n",
834 | "plt.xlabel('File Size (Gigabytes)')\n",
835 | "plt.ylabel('Time (Seconds)')\n",
836 | "\n",
837 | "plt.xticks(rotation=90)"
838 | ],
839 | "metadata": {
840 | "colab": {
841 | "base_uri": "https://localhost:8080/",
842 | "height": 464
843 | },
844 | "id": "ikFD4xeysAVV",
845 | "outputId": "60dbaf2d-252f-4f2d-b2a2-ec9caa86909b"
846 | },
847 | "execution_count": 107,
848 | "outputs": [
849 | {
850 | "output_type": "execute_result",
851 | "data": {
852 | "text/plain": [
853 | "(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),\n",
854 | " )"
855 | ]
856 | },
857 | "metadata": {},
858 | "execution_count": 107
859 | },
860 | {
861 | "output_type": "display_data",
862 | "data": {
863 | "text/plain": [
864 | ""
865 | ]
866 | },
867 | "metadata": {}
868 | },
869 | {
870 | "output_type": "display_data",
871 | "data": {
872 | "text/plain": [
873 | ""
874 | ],
875 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAGLCAYAAADUPKXyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd1gU59rH8S9VVEhQVIIltthNAoqiMTbsRgRbbLH3htEDUVFRiVGxR49GTYw5iSQaGwgWEhu22PVYYyIRUVFQsADSd98/eNnjSnFBYAa9P9fldbmzu/P8mB32Zp7nmRkjrVarRQghhHgJY6UDCCGEKBqkYAghhDCIFAwhhBAGkYIhhBDCIFIwhBBCGEQKhhBCCINIwVCBTz75hJMnTyodI5OTJ0/SokWLAm9nwIABbNmypcDbmTp1KsuWLcvz+5ctW4aTkxPNmjUD4Pfff6dly5Y4ODhw9epVnJ2dOX78eH7FBeDMmTN06NAhX9epFK1Wy7Rp02jUqBE9e/bMl3VGRETg4OBAWlpavqzPUN7e3qxatapQ21QDU6UDvAkcHBx0/09ISMDc3BwTExMA5syZw65duwolx8qVK7l16xaLFy8ulPaUsGXLFtavX09kZCTFixenXr16LFu2DEtLy1dab0REBBs2bODgwYPY2NgA4Ovry8yZM2nbtm2e17tz505mzZoFQFpaGsnJyRQvXlz3/Pnz5wkODn6l7Lmxfv16NmzYQGJiInXq1OHbb7/FwsIi29dPnToVW1tbJk2a9NJ1nz17lmPHjhESEkKJEiXyJW/58uU5f/58vqzrecOHD+fs2bMAJCcnY2RkhJmZGQAuLi74+Pjke5tFgRSMQvD8Du3s7MzcuXP56KOPFEz0ejp16hTLli3ju+++o27dujx+/JiDBw/my7ojIiKwtrbWFYuMZTVq1Hil9Xbt2pWuXbsC6Ud0np6eHD58+JXWmVehoaEsX76crVu3Ur16dc6ePYuxcf51Qty9e5cKFSrkqVikpqZiamr60mX55bvvvtP9PzdF8XUnXVIq8HxXxsqVK3F3d8fDwwMHBwdcXFy4efMma9eupWnTprRs2ZKjR4/q3hsbG4uXlxcff/wxzZs3Z9myZVkenh8+fJi1a9eyZ88eHBwcdF9S27Zto1OnTjg4ONCmTRs2bdqUbc4ff/yRzp07c//+fZKTk/H19aVVq1Z89NFHeHt7k5iYCPyvK+v777+nadOmfPzxx2zbti3HbRAeHk7Pnj1p0KABY8aM4fHjxwCMHDmSn376Se+1Li4u/P7775nWcenSJezt7albty4A1tbWdOvWTe/o4unTp4wcORIHBwd69epFeHg4AHfu3KFWrVqkpqbqXpvRVXb8+HGGDh1KVFQUDg4OTJ48WdcN4urqmuURhkajYd26dbRt2xYnJycmTpyo+5ly48VuQWdnZ7777jtcXFywt7fHy8uLhw8fMnz4cBwcHBg8eDBPnjzRvf7ChQv06dMHR0dHunbtmmPXp6mpKSYmJlSoUAFTU1OcnJwwNzc3OGvGNtyxYwetWrXCycmJb775Bkg/8psxYwYXLlzAwcGBFStWAHDw4EFcXV1xdHSkT58+/Pnnn3o/67p163Q/661bt6hVqxZbtmyhVatWDBo0KNPnNmDAAJYvX06fPn1wcHBg6NChxMTE6Nbp7+9P69atcXJyYtWqVXnuRny+ezPjM/r22291+/u+ffsICQmhQ4cONG7cmDVr1ujem1/7hhKkYKhQxi/R6dOnqVOnDsOGDUOj0XD48GHGjRuHt7e37rVTp07F1NSU3377DX9/f44dO5bleECLFi0YNWoUnTp14vz58+zcuRMAGxsb1q5dy7lz55g/fz7z58/nypUrmd7/73//mx07drBx40beeecdFi9ezM2bN/H39+e3334jKipKr0/34cOHxMbGcvjwYb766it8fHz0vshe5O/v
z7x58zh69CimpqbMnTsXADc3N11WgD///JOoqChatmyZaR0ffvghR48eZcWKFZw9e5bk5ORMr9m9ezfjx4/n9OnTvPvuuwaNaXz00Ud8++23lCtXjvPnz7N06VLdUWNAQAD79u3L9J6ffvqJffv2sXHjRo4cOcLbb7+db90Yv/32Gxs2bCA4OJiDBw8yYsQIJk+ezIkTJ9BoNLoCGxkZyahRoxgzZgynTp1iypQpuLu7632BPs/GxobSpUvj7u5OUlJSnvOdPXuWvXv38p///IdVq1YRGhpKr169mDNnDvb29pw/fx53d3euXr2Kl5cXPj4+nDx5kt69ezN27Fi9z23Xrl2sW7eOM2fO6LpxT58+ze7du1m/fn2W7QcFBTF//nz++OMPUlJS+P777wG4ceMGc+bMYdGiRRw5coS4uDgiIyPz/HM+7+HDhyQlJXH48GHc3d2ZMWMGO3fuZNu2bfj5+bF69Wpu374NFOy+UdCkYKiQo6MjzZs3x9TUlI4dO/Lo0SNGjhyJmZkZnTt35u7duzx9+pSHDx8SEhKCl5cXJUqUwMbGhsGDB+dqTKRVq1a8++67GBkZ0bhxY5o1a8aZM2d0z2u1WubPn8+xY8f48ccfKV26NFqtll9//RUvLy+sra2xtLRk1KhReu2ampoybtw4zMzMaNmyJSVKlODmzZvZ5nB1daVmzZqUKFGCiRMnsnfvXtLS0mjTpg1hYWGEhYUB6V/QnTp1yvIvX0dHR1auXMnVq1cZNWoUTk5OzJ8/X++Iq23btnzwwQeYmprStWtXrl27ZvC2yo1NmzYxadIk3nnnHczNzRk/fjzBwcF6RzB59dlnn1GmTBlsbW1xdHTkgw8+oG7duhQrVox27dpx9epVIH1btWjRgpYtW2JsbEyzZs2oX78+ISEhWa534sSJ9O7dm8qVKzN27Fhd0fDw8Mh0lJeT8ePHY2FhQe3ataldu7beUcPzNm/eTO/evfnwww8xMTGhW7dumJmZceHCBd1rBgwYgJ2dnd44yoQJEyhRokS2Yyvdu3enatWqWFhY0LFjR91nvHfvXlq3bo2joyPm5ua4u7tjZGRk8M+VE1NTU8aMGaP7HX306BEDBw7E0tKSGjVq8N5773H9+nWgYPeNgiZjGCr0fD+5hYUFpUqV0v11lfFL8uzZM6KiokhNTeXjjz/WvV6j0WBnZ2dwWyEhIaxatYqwsDA0Gg2JiYnUrFlT93xsbCy//vory5Ytw8rKCoCYmBgSEhLo3r277nVarRaNRqN7bG1trde/XLx4cZ49e5Ztjuczly9fnpSUFB49ekSZMmXo1KkTO3fuZPz48QQFBem6M7LSsmVLWrZsiUaj4eTJk0ycOJGqVavSp08fAMqUKaN7rYWFRY6ZXkVERATjxo3TGwMwNjYmOjoaW1vbV1r38z9DsWLFsv2ZIiIi2Lt3r944TmpqKk5OTpnW+c8//3Du3DnWrFmDiYkJnp6ejB07ln//+99cuHCBUaNG5SlfTp97REQE/v7+bNy4UbcsJSWFqKgo3eOs9uV33nknx/bLli2bZftRUVF67y1evDjW1tYv+WkMY21tnel39Pnf42LFihEfHw8U7L5R0KRgFGEZf6GcOHHCoMG/F/+aSk5Oxt3dHV9fX9q0aYOZmRljx47l+QsYv/XWWyxatIjPP/+cf//73zRs2JBSpUphYWHBrl278m0Hv3fvnt7/zczMKFWqFADdunXjiy++oGHDhhQvXlxv1ll2jI2Nadq0KU2aNOHvv/9+6eszBmITExN1Yx4PHjzIy48CpH828+bNo2HDhnlex6uys7PD1dVV172Xk7S0NNLS0tBqtRgbG7NgwQLGjRuHm5sb1atXf+XB/ezyjR49mjFjxmT7mqyOAPJ6VFCuXDm9o9zExERFxg7UsG/klXRJFWHlypWjWbNmLFiwgLi4ODQaDeHh4Zw6dSrL19vY2HD37l3dkUBycjLJycmULl0aU1NTQkJCOHbsWKb3
OTk5sXjxYiZMmMDFixcxNjamV69ezJs3j+joaCC9v/zIkSN5/ll27tzJjRs3SEhI4Ouvv6ZDhw66v9gcHBx0X2IZg/VZ2bdvH7t27eLJkydotVouXrzIqVOn+PDDD1/afunSpbG1tSUgIIC0tDS2bt2q63POi759+7J8+XLu3r0LpB+VZTXWUZC6du3KwYMHOXLkCGlpaSQlJXHy5Enu37+f6bXVqlWjcuXKzJkzh9jYWFJTU/noo48ICwujRIkSFMRdEHr16sWmTZv473//i1ar5dmzZxw6dIi4uLh8bwugQ4cOHDhwgHPnzpGcnMzKlSsL5Od6GTXsG3klBaOIW7hwISkpKXTu3JlGjRrh7u6e7V/GHTt2BNILQMbsoRkzZvD555/TqFEjgoKCcHZ2zvK9zZo1Y968eYwePZorV67g6elJ5cqV+fTTT2nQoAGDBw/OcYziZVxdXZk6dSrNmjUjOTmZ6dOnZ3r+r7/+wtXVNdt1vP322/z666+0b9+eBg0a4OnpybBhw3IsMs/78ssvWb9+PU5OTty4ccOgI5nsDBw4EGdnZ4YOHYqDgwOffvopFy9ezPP68sLOzo7Vq1frzbBbv369XtdhBhMTE9auXUtsbCxt27alRYsWnDt3ju3bt3P16lWWL1+e7/nef/99vvzyS3x8fGjUqBHt27dn+/bt+d5Ohho1ajBz5kwmT55M8+bNKVGiBKVLl87VTLD8oIZ9I6+M5AZKoijw9/dn8+bN/PLLL0pHEa+J+Ph4GjVqRHBwMJUqVVI6TpEgRxhC9RISEvj555/p3bu30lFEEXfgwAESEhJ49uwZvr6+1KxZk4oVKyodq8iQgiFU7ciRIzRt2hQbGxu6dOmidBxRxO3fv5/mzZvTvHlzbt26xdKlS/Ntau2bQLqkhBBCGOS1nVabmJjI5cuXKVu2rG62jRBCiJylpaXx4MED6tevn+nkyNe2YFy+fJn+/fsrHUMIIYokPz8/HB0d9Za9tgUj42xPPz+/l54ZKoQQIt39+/fp37+/3hnzGV7bgpHRDfXOO+/ILAghhMilrLryZZaUEEIIg0jBEEIIYRApGEIIIQzy2o5hCCHyV0pKCnfu3NHdWVEUbRYWFlSsWFF3r3JDSMEQQhjkzp07WFlZUaVKFTk7uojTarVER0dz584dqlatavD7pEtKCGGQxMREbGxspFi8BoyMjLCxscn10aIUDCGEwaRYvD7y8llKwRBCCGGQN75gaFLzbwAvP9clhNolpqQVqfWKV/fGD3obm1pwc3n+3HGr6ufJ+bIeIYoCCzMTjD0C8329msUueXrfyZMn8fX1zdNd+6ZNm8bFixepUaMGy5cv13vcokULDh06xIoVK/KU64cffsDFxQUbGxsAfvnlF5KSkhg8eHCe1qekN75gCCHebA8fPiQ4OJgzZ85gbGyc6fGr3jb2xx9/5KOPPtIVjL59++ZHbEVIwRBCFDkJCQlMmTKFGzduYGpqStWqVenXrx9paWl4e3tz/vx5jIyMWLZsGdWrV2f79u16RwkZj+fNm8fAgQNJTEykW7dutG/fnl27duked+vWjbfeekuv7R07dvDzzz+TlpaGpaUls2fPplq1alnm/Oabb4iKisLd3Z1ixYqxZMkS9uzZw7Nnz5gyZQrbt28nKCgIKysrrl+/jq2tLTNnzsTX15fw8HDq16/P4sWLMTIyIi4ujvnz53P9+nWSkpJwcnJi2rRphXr7hjd+DEMIUfQcPXqU+Ph4du/ezc6dO/Hx8QHgxo0b9OnTh8DAQDp16sTq1atzXI+lpSXr1q3DysqKgIAAxo0bp/f4xW6jM2fOsGfPHvz8/Ni+fTvDhg3Dy8sr2/WPGTOGcuXKsWLFCgICAnjvvfcyvebSpUtMmzaNvXv3YmFhwb/+9S+WLFnCrl27+Ouvv/jjjz8AmD9/Po0aNWLr1q0EBAQQExPDtm3bcrnlXo0cYQghipzatWsTGhrKnDlzaNy4Ma1atQKg
atWq1K1bFwB7e3sOHjyYr+0eOHCAP//8k169egHpJ8A9ffr0ldbZoEED3S0Y6tSpQ4UKFXRHNbVr1+bWrVt89NFHHDhwgIsXL7JhwwYg/bwYW1vbV2o7t6RgCCGKnEqVKhEUFMSJEyc4fPgwy5YtY8aMGZib/28Ci7GxMampqUD6pbo1Go3uuaSkpDy1q9Vq6dGjBxMnTny1H+A5xYoV0/3fxMQk0+O0tDRd26tXr6ZSpUr51nZuScEQQuRJYkpanmc0vWy9FmY598vfv3+ft99+m7Zt29KsWTOaN2/OkydPsn195cqVuX79OsnJ6TMZg4ODM41NGMLZ2ZkpU6bQu3dv3nnnHdLS0rh27Rr169fP9j0lS5YkNjY2121l1fa6deuYPXs2JiYmxMTEEB8fX6gFRAqGECJPXvalXpDrvX79OkuWLAFAo9EwcuRIypUrl+3r7e3tadq0KZ988gnlypWjdu3aPHjwINfZGjVqxOeff86YMWNIS0sjJSWFjh075lgwBg4ciJeXFxYWFrrMeeHl5cWiRYtwdXXFyMgIMzMzvLy8CrVgGGm1Wm2htVaI7ty5Q5s2bdi/f/9L77gn52EI8XLXrl2jTp06SscQ+SirzzSn706ZJSWEEMIg0iUlhBCvaMuWLWzcuDHT8gULFrxWR2VSMIQQ4hX16tVLN9X2dVZoBcPX15fg4GDu3r1LYGAgNWvW5M6dO4wbN073mtjYWOLi4jh16hSQPivA3NxcN83Mw8OD5s2bF1ZkIYQQzym0gtGmTRsGDhxI//79dcsqVqxIQECA7vFXX32lm3OcYcWKFdSsWbOwYgohhMhGoRUMR0fHHJ9PTk4mMDCQ9evXF1IiIYQQuaGaMYwDBw5ga2tLvXr19JZ7eHig1Wpp2LAhkydPztPJNkKI/KdJTcTY1KLIrFe8OtUUjG3bttGjRw+9ZX5+ftjZ2ZGcnMxXX32Fj48PixcvViihEOJ5+XkvmecZej6Tq6srmzdvxsIi5+Jy7tw5vL29MTU1ZerUqTRp0uSV8hnabm5du3aNmzdv0rlz5wJvK69UcR5GZGQkp0+fxsVF/zIDdnZ2AJibm9OvXz/OnTunRDwhhAoFBAQY9EUaEBCAm5sb/v7+uSoWGdehevGxoe3m1rVr19i7d6/esoJqK69UcYSxY8cOWrZsSalSpXTLnj17RlpaGlZWVmi1Wnbv3v1azWcWQryaWrVqce7cOUqWLImzszOurq4cP36cBw8eMHToUD777DO+++479uzZg4WFBYGBgWzevJmIiAjmzZvHo0ePSElJYdCgQbrejVq1ajF+/HgOHTpE8+bNuX//PiYmJty8eZP4+HgCAgIMahfSL4U+Z84cAJycnNi/fz9r167NchLPo0ePWLFiBXFxcbi6utKoUSNmzJiRqS0XFxdOnDhBZGQk//rXv4iOjiYoKIgnT54wb948GjVqBEBISAjffPMNycnJmJmZMW3aNOzt7V95mxdawZg7dy6//fYbDx8+ZMiQIVhbW7Nr1y4gvWBMnz5d7/XR0dFMmDCBtLQ0NBoN1atXZ9asWYUVVwhRxCQmJrJ582bu3LmDi4sL3bp1Y/jw4dy4cYP69evz2WefkZqaioeHB4sWLaJ69erExcXRo0cP7O3tqV69OpB+9diM+0xMnTqVa9eusXHjRkqUKGFwu2ZmZkyePJmlS5fi6OjI77//zk8//ZRt9lKlSuHu7v7SW8EmJyezefNmLl68yMCBA/H09GTr1q3s3r2bpUuX8ssvvxAeHs7q1atZv349lpaW/P3334wYMYJDhw7lfeP+v0IrGDNmzGDGjBlZPhccHJxpWaVKlfD39y/oWEKI10RG33/FihV56623uH//vq4IZAgLCyM0NJTJkyfrlqWkpPDPP//oXtutWze993Ts2DHbYpFduykpKVhYWOhmh7Zr1y5fJuxktFWvXj0SEhLo1KkTAPXr1yc8PByAI0eOEB4erncKQ2pqKg8fPqRMmTKv
1L4quqSEEOJVZXcfiedptVpKlSqld/7Xi14sDjkVC0PbzS8ZbWXcljXj8fP3/gBo3rw5CxcuzPf2VTHoLYQQhaFq1apYWFjo9V6EhoYSFxeXr+1Uq1aNhIQEzp49C8C+ffteemc+S0vLfLlvRrNmzThy5Ah///23btnFixdfeb0gRxhCiDzSpCYWyCX9C/I8DFNTU9asWcO8efNYv349Go0GGxsbli9fnq/tmJubs2TJEmbPng1A48aNsbGxwcrKKtv3NG3alO+//56uXbvSuHHjbLvwX6ZKlSosWrSI6dOnk5iYSEpKCg0aNOCDDz7I0/qeJ/fDQO6HIYQh5H4YuRMXF4elpSUAJ06cYNq0aezfvx9jY/V07OT2fhhyhCGEEAXgt99+44cffkCr1WJubs7ixYtVVSzyQgqGEEIUgO7du9O9e/csl784MP7hhx/i4+NTWNHyTAqGEEIUou3btysdIc+K9vGREKJQvaZDnm+kvHyWUjCEEAaxsLAgOjpaisZrQKvVEh0dnevrVEmXlBDCIBUrVuTOnTs8ePBA6SgiH1hYWLx0BumLpGAIIQxiZmZG1apVlY4hFCRdUkIIIQwiBUMIIYRBpGAIIYQwiBQMIYQQBpGCIYQQwiBSMIQQQhhECoYQQgiDFNp5GL6+vgQHB3P37l0CAwN1N0J3dnbG3Nxcd+coDw8PmjdvDsCFCxfw9vYmKSmJChUqsGjRImxsbAorshBCiOcU2hFGmzZt8PPzo0KFCpmeW7FiBQEBAQQEBOiKhUajwdPTE29vb4KDg3F0dGTx4sWFFVcIIcQLCq1gODo6YmdnZ/DrL1++TLFixXQ3Ue/Tpw979+4tqHhCCCFeQhWXBvHw8ECr1dKwYUMmT57MW2+9xb179yhfvrzuNaVLl0aj0fD48WOsra0VTCuEEG8mxQe9/fz82LlzJ9u2bUOr1RaJm4gIIcSbSPGCkdFNZW5uTr9+/Th37pxueUREhO51MTExGBsby9GFEEIoRNGC8ezZM2JjY4H067Pv3r1bd0Py+vXrk5iYyJkzZwDYtGkTHTt2VCyrEEK86QptDGPu3Ln89ttvPHz4kCFDhmBtbc2aNWuYMGECaWlpaDQaqlevzqxZswAwNjZm4cKFzJo1S29arRBCCGUUWsGYMWMGM2bMyLTc398/2/c0aNCAwMDAgowlhBDCQIqPYQghhCgapGAIIYQwiBQMIYQQBpGCIYQQwiBSMIQQQhgkx1lSqampHDhwgEOHDvHnn38SGxuLlZUVtWvXpkWLFrRt2xZTU1VcXUQIIUQBy/bb/pdffmHt2rVUr16dRo0a0bp1a0qWLEl8fDyhoaFs2bKFBQsWMGrUKPr27VuYmYUQQigg24IRHh7Oli1bKFu2bKbn2rVrx+jRo4mKimLDhg0FGlAIIYQ6ZFswpkyZ8tI3lytXzqDXCSGEKPoMGoC4ceMG1tbWlClThri4OL7//nuMjY0ZNmwYxYsXL+iMQgghVMCgWVKTJ0/m6dOnACxcuJDTp0/rbp8qhBDizWDQEcbdu3epVq0aWq2W33//nV27dmFhYUGbNm0KOp8QQgiVMKhgFCtWjLi4OEJDQ7Gzs6N06dKkpqaSlJRU0PmEEEKohEEFo0uXLgwaNIj4+Hg+++wzAK5evUrFihULNJwQQgj1MKhgeHl5cfToUUxNTWnSpAkARkZGTJs2rUDDCSGEUA+DT9P++OOP9R6///77+R5GCCGEemVbMPr164eRkdFLV+Dn55evgYQQQqhTtgWjV69euv+Hh4ezbds2unXrRvny5YmIiMDf358ePXoUSkghhBDKy7ZgdOvWTff/Tz/9lPXr11OjRg3dMhcXF7y8vHB3dzeoIV9fX4KDg7l79y6BgYHUrFmTR48e8cUXXxAeHo65uTmVK1fGx8eH0qVLA1CrVi1q1qyJsXH66SILFy6kVq1aefpBhRBCvBqDTtwLDQ3l3Xff1VtWsWJF/vnnH4MbatOmDX5+flSo
UEG3zMjIiOHDhxMcHExgYCCVKlVi8eLFeu/btGkTAQEBBAQESLEQQggFGVQwGjVqxNSpUwkLCyMxMZGbN28yffp0HB0dDW7I0dEROzs7vWXW1tY4OTnpHtvb2xMREWHwOoUQQhQegwrGggULgPTzMRwcHHBxcUGr1TJv3rx8C6LRaPjll19wdnbWWz5gwABcXV1ZsmQJycnJ+daeEEKI3DFoWq21tTXLli1Do9EQExND6dKldeMK+eXLL7+kRIkSuhMDAQ4dOoSdnR1xcXF4enqyatUqJk2alK/tCiGEMIzB52HExsZy8+ZN4uPj9ZY3bdr0lUP4+vpy69Yt1qxZo1eIMrqwLC0t6dWrl9x7QwghFGRQwdi+fTs+Pj6UKFECCwsL3XIjIyP279//SgGWLl3K5cuXWbduHebm5rrlT548oVixYlhYWJCamkpwcDB16tR5pbaEEELknUEFY9myZXz99de0bNkyzw3NnTuX3377jYcPHzJkyBCsra1Zvnw5a9eupUqVKvTp0wdIn321atUq/vnnH7y9vTEyMiI1NRUHBwcmTpyY5/aFEEK8GoMKRlpaWqZLg+TWjBkzmDFjRqbl169fz/L1Dg4OBAYGvlKbQggh8o9BI9cjRozgm2++QaPRFHQeIYQQKmXQEcYPP/zAw4cP+e6777C2ttZ77tChQwWRSwghhMoYVDAWLVpU0DmEEEKonEEFo3HjxgWdQwghhMoZNIaRkpLCihUraNOmDe+//z5t2rRhxYoVcuZ1AdGkJqpyXUKIN5vBXVIXL15kzpw5usubr169mri4OLy8vAo64xvH2NSCm8vNX/5CA1T9XIq6ECJ/GFQw9u7dS0BAAKVKlQKgWrVq1K1bF1dXVykYQgjxhjCoS0qr1eZquRBCiNePQQWjY8eOjBkzhiNHjhAaGsrhw4cZN24cnTp1Kuh8QgghVMKgLilPT0+++eYbfHx8iIqKwtbWls6dOzN27NiCzieEEEIlDCoY5ubmTJw4Ua7lJIQQbzCDuqTWrVvHxYsX9ZZdvHiRb7/9tkBCCSGEUB+DCsaPP/7Ie++9p7esevXq/Oc//ymQUEIIIdTH4BP3TE31e6/MzMzkxD0hhHiDGFQw6tWrx88//6y3bNOmTXN82h0AACAASURBVNStW7dAQgkhhFAfgwa9p02bxpAhQ9i5cyeVKlXi9u3bPHjwQG6ZKoQQbxCDCkaNGjUIDg7m0KFD3Lt3j/bt29OqVStKlixZ0PmEEEKohEEFA6BkyZI0aNCAyMhI7O3tCzKTEEIIFTJoDCMiIoI+ffrQqVMnhgwZAqRfX2r69OkFGk4IIYR6GFQwvL29adWqFefOndPNlmrWrBnHjx83qBFfX1+cnZ2pVasWf/31l275zZs36d27Nx06dKB3796EhYUZ9JwQQojCZ1DBuHTpEiNHjsTY2BgjIyMArKysiI2NNaiRNm3a4OfnR4UKFfSWz5o1i379+hEcHEy/fv3w9vY26DkhhBCFz6CCYWNjw61bt/SW3bhxAzs7O4MacXR0zPTa6Ohorl69SpcuXQDo0qULV69eJSYmJsfnhBBCKMOggjF06FBGjx7Ntm3bSE1NJSgoiEmTJjFixIg8N3zv3j1sbW0xMTEBwMTEhHLlynHv3r0cnxNCCKEMg2ZJ9ezZE2trazZv3oydnR07duxg4sSJtG3btqDzCSGEUAmDp9W2bds2XwuEnZ0dkZGRpKWlYWJiQlpaGlFRUdjZ2aHVarN9TgghhDJy7JK6fPmy3qymmJgY/vWvf9G1a1e8vb2Jj4/Pc8M2NjbUqVOHoKAgAIKCgqhTpw6lS5fO8TkhhBDKyLFgzJs3j4cPH+oeT58+nbCwMHr37s3ff//NokWLDGpk7ty5tGjRgvv37zNkyBA++eQTAGbPns3GjRvp0KEDGzduZM6cObr35PScEEKIwpdjl1RoaCiOjo4APH36lCNHjhAYGEjVqlVxdnamT58+zJ49+6WNzJgxgxkz
ZmRaXr16dbZs2ZLle3J6TgghROHL8QgjLS0NMzMzAC5cuECZMmWoWrUqkD4G8fTp04JPKIQQQhVyLBjvvfcee/bsAWD37t00bdpU91xkZCRWVlYFm04IIYRq5Ngl5eHhwZgxY5g9ezbGxsZ698TYvXs3DRo0KPCAQggh1CHHguHo6MjBgwcJCwujSpUqWFpa6p5r2bIlnTt3LvCAQh00qYkYm1qobl1CiMLz0vMwLC0tqV+/fqbl1apVK5BAQp2MTS24udw8X9ZV9XO5ta8QRVG2Yxg9evRgz5492d63Ozk5md27d9OrV68CCyeEEEI9sj3C8PX1ZcWKFcyePZt69epRtWpVSpYsSXx8PGFhYVy5coUmTZqwYMGCwswrhBBCIdkWjPfee48VK1bw4MEDjh07xl9//cWjR4946623cHV1ZeHChdjY2BRmViGEEAp66RhG2bJlcXNzK4wsQgghVMygy5sLIYQQUjCEEEIYRAqGEEIIg0jBEEWaJjVRVesR4nVm0A2UtFotW7ZsISgoiEePHhEYGMjp06d58OCBnO0tFJVfJxTKyYRCvJxBRxhff/01W7dupXfv3rr7ar/zzjt89913BRpOCCGEehhUMHbs2MGaNWv45JNPMDIyAqBixYrcvn27QMMJIYRQD4MKRlpaGiVLlgTQFYz4+HhKlChRcMmEEEKoikEFo2XLlsyfP193XSmtVsvXX39N69atCzScEEII9TBo0HvatGlMmTKFhg0bkpqaioODA82aNcPX1/eVA9y5c4dx48bpHsfGxhIXF8epU6dwdnbG3NycYsWKAen352jevPkrtymEECL3DCoYlpaWrFq1iocPHxIREYGdnR1ly5bNlwAVK1YkICBA9/irr74iLS1N93jFihXUrFkzX9oSQgiRdwYVjAwWFhbY2tqi0WiIjIwEwNbWNt/CJCcnExgYyPr16/NtnS+jSU3ItymVmtQEjE2L58u6hBBCbQwqGMePH2fmzJlERESg1Wp1y42MjLh27Vq+hTlw4AC2trbUq1dPt8zDwwOtVkvDhg2ZPHkyb731Vr61B2BsWhxjj8B8WZdmsUu+rEcIIdTIoIIxffp0xo4dS+fOnbGwKLhba27bto0ePXroHvv5+WFnZ0dycjJfffUVPj4+LF68uMDaF0IIkT2DZkklJSXRvXt3SpYsiYmJid6//BIZGcnp06dxcfnfX+l2dnYAmJub069fP86dO5dv7QkhhMgdgwrG4MGD+e677/S6o/Lbjh07aNmyJaVKlQLg2bNnxMbGAunTeHfv3k2dOnUKrH0hhBA5M6hLqn379gwbNoy1a9fqvtAz7N+/P1+C7Nixg+nTp+seR0dHM2HCBNLS0tBoNFSvXp1Zs2blS1tCCCFyz6CC4e7ujqOjIx07diywMYzg4GC9x5UqVcLf379A2hJCCJF7BhWMO3fu4O/vj7GxXA1dCCHeVAZVgDZt2nDixImCziKEEELFDDrCSE5OZsyYMTg6OmJjY6P33MKFCwskmBBCCHUxqGDUqFGDGjVqFHQWIV4LmtREjE1ffawvv9YjRH4xqGCMHz++oHMI8dqQuwCK11W2BeP06dM0atQIgD/++CPbFTRt2jT/UwkhhFCdbAvGnDlzCAoKAtA7P+J5RkZG+XYehhBCCHXLtmAEBQURFBREly5dOHDgQGFmEkIIoUI5Tqv19vYurBxCCCFULseCUZDXjhJCCFG05DhLSqPRcOLEiRwLhwx6CyHEmyHHgpGcnMz06dOzLRgy6C2EEG+OHAtG8eLFpSAIIYQADLyWlBBCCCGD3kIIIQySY8E4f/58YeUQQgihctIlJYQQwiBSMIQQQhjEoKvVFjRnZ2fMzc0pVqwYAB4eHjRv3pwLFy7g7e1NUlISFSpUYNGiRZnuxyGEEKJwqKJgAKxYsYKaNWvqHms0Gjw9PZk/fz6Ojo6sXr2axYsXM3/+fAVTCiHEm0u1XVKXL1+mWLFiODo6AtCnTx/27t2r
cCohhHhzqeYIw8PDA61WS8OGDZk8eTL37t2jfPnyuudLly6NRqPh8ePHWFtbK5hUCCHeTKo4wvDz82Pnzp1s27YNrVaLj4+P0pGEEEK8QBUFw87ODgBzc3P69evHuXPnsLOzIyIiQveamJgYjI2N5ehCCCEUonjBePbsGbGxsUD6meW7d++mTp061K9fn8TERM6cOQPApk2b6Nixo5JRhRDijab4GEZ0dDQTJkwgLS0NjUZD9erVmTVrFsbGxixcuJBZs2bpTasVQgihDMULRqVKlfD398/yuQYNGhAYGFjIiYR4/WhSEzE2tVDdukTRonjBEJlpUhOo+nlyvq3L2LR4vqxLFF3GphbcXG6eL+vKr31TFD1SMFTI2LQ4xh75c2SlWeySL+sRQgjFB72FEEIUDVIwhBBCGEQKhhBCCINIwRBCCGEQKRhCCCEMIgVDCCGEQaRgCCGEMIgUDCGEEAaRgiGEEMIgUjCEEEIYRC4NIgwi17cSQkjBEAaR61uJ/CZX0C16pGAIIRQhV9AtemQMQwghhEGkYAghhDCIdEmJIi2/BuNlIF6Il1O8YDx69IgvvviC8PBwzM3NqVy5Mj4+PpQuXZpatWpRs2ZNjI3TD4QWLlxIrVq1FE4s1CS/BuNlIF6Il1O8YBgZGTF8+HCcnJwA8PX1ZfHixcybNw+ATZs2UbJkSSUjCiGEQAVjGNbW1rpiAWBvb09ERISCiYQQQmRF8SOM52k0Gn755RecnZ11ywYMGEBaWhotWrRgwoQJmJvnzzQ8IYQQuaP4EcbzvvzyS0qUKMFnn30GwKFDh9i+fTt+fn7cuHGDVatWKZxQCCHeXKopGL6+vty6dYvly5frBrnt7OwAsLS0pFevXpw7d07JiEII8UZTRZfU0qVLuXz5MuvWrdN1OT158oRixYphYWFBamoqwcHB1KlTR+GkQrycTPUVryvFC8bff//N2rVrqVKlCn369AGgYsWKDB8+HG9vb4yMjEhNTcXBwYGJEycqnFaIl5OpvuJ1pXjBqFGjBtevX8/yucDA/LnYnRBCiFenmjEMIYQQ6iYFQwghhEGkYAghhDCIFAwhhBAGUXzQWwhR8OQWuyI/SMEQ4g0gt9g1jNw2NmdSMIQQ4v/JbWNzJmMYQgghDCIFQwghhEGkS0oIoQgZiC96pGAIIRQhA/FFj3RJCSGEMIgUDCGEEAaRgiGEEMIgUjCEEELlNKmJqliPDHoLIYTK5dcJha86K00KhhBC/D+Z6pszKRhCCPH/ZKpvzmQMQwghhEFUf4Rx8+ZNpk6dyuPHj7G2tsbX15cqVaooHUsIIQpNfnWVvWo3meoLxqxZs+jXrx+urq4EBATg7e3Njz/+qHQsIYQoNPnVVfaq3WSqLhjR0dFcvXqVDRs2ANClSxe+/PJLYmJiKF26dI7vTUtLA+D+/fsvbcf0WcyrhwXu3LmTL+sByZQb+ZFLMhnudd+n1JgJCm+fyvjOzPgOfZ6RVqvVvnKKAnL58mWmTJnCrl27dMs6d+7MokWLqFevXo7vPXPmDP379y/oiEII8Vry8/PD0dFRb5mqjzBeRf369fHz86Ns2bKYmJgoHUcIIYqEtLQ0Hjx4QP369TM9p+qCYWdnR2RkJGlpaZiYmJCWlkZUVBR2dnYvfa+FhUWm6iiEEOLlKleunOVyVU+rtbGxoU6dOgQFBQEQFBREnTp1Xjp+IYQQIv+pegwDIDQ0lKlTp/L06VPeeustfH19qVatmtKxhBDijaP6giGEEEIdVN0lJYQQQj2kYAghhDCIFAwhhBAGkYIhhBDCIFIwhBBCGETVJ+6JzG7evMn9+/exsLCgRo0aWFpaKh1JlWQ7CZH/pGBkIzw8nD/++EP3pVO7dm2aNGlCsWLFCj1LXFwcGzZsYOvWrZibm2NjY0NycjK3b9/mww8/ZPjw4TRp0qTQcwE8e/aMc+fO6W2n9957T5Esat5OoK59SjJJpryQ8zBecOHCBZYsWUJMTAwf
fvghZcuWJSkpidDQUEJDQ3Fzc2PkyJFYWFgUWqYePXrg6upK586dKVOmjG65RqPh7NmzbNq0icaNG9O7d+9Cy3T37l1WrlzJ4cOHqVGjBmXKlCE5OZnQ0FCMjIwYOnQoPXr0KLQ8oM7tBOrcpySTZMoTrdAzadIk7bVr17J87tmzZ9pNmzZpt2zZUqiZkpKS8uU1+emzzz7T7tu3T5uSkpLpuTt37miXLFmi3bhxY6FmUuN20mrVuU9JJsmUF3KEUYSsWrWK7t27G3TxxTeZbCchCobJ7NmzZysdQo0GDhyIiYkJlStXxtRUHUM9Bw8e5KuvvuLw4cMYGxtTpUoVxbN5eXlhbW1N+fLlFc3xPDVuJ1DnPiWZDCOZ0knByEapUqXYtWsX8+bNIywsDGtra8X/Yv34448ZNGgQ1tbW7N69m/nz53Pr1i1at26tWKb79++zdu1a1q9fT1xcHJUqVVJ8RpIatxOoc5+STJIpN6RL6iUePXrErl272LFjB/Hx8ezdu1fpSAD89ddffP/99wQGBnLlyhWl43D9+nX8/f3ZtWsXNWrUYP369UpHAtS3nUCd+5RkkkyGUMexlYoZG6ef26jValG6tj5+/JigoCC2b99OfHw83bp1Y9++fYpmylCjRg0aN27MrVu3OHXqlKJZ1LydQF37VAbJZJg3PZMcYWTjwIED7Nixg7Nnz9KmTRvc3Nxo2LChopmaNGlCu3btVJElw/Xr19mxYwdBQUHUrFmTbt260a5du0KdZvgiNW4nUOc+JZkkU25IwcjGkCFD6NatG+3bt1f0y+95iYmJqsmSoUOHDnTr1g1XV1fF+3QzqHE7gTr3KclkGMmUTgqGAZKTk3ny5Ally5ZVNMeGDRvo2bMnVlZWeHp6cunSJWbMmMHHH3+saC61KQrbSS371PMkk2He5Exy8cFsTJo0idjYWBITE3FxceGTTz5RfCB3+/btWFlZceLECWJiYpg3bx5Lly5VNNOCBQuIjY0lNTWVfv36YW9vT0BAgKKZ1LidQJ37lGSSTLkhBSMbN2/exMrKikOHDuHk5ERISAj+/v6KZjIxMQHg5MmTuLi40KBBA8UH3o4fP46VlRVHjx7F1taW4OBgvv/+e0UzqXE7gTr3KckkmXJDCkY2UlNTATh9+jQtW7akePHiutkISrGwsGDdunXs2rWLZs2aodVqSUlJUTRThtOnT9OuXTtsbW0xMjJSNItat5Ma9ynJJJlyQwpGNqpXr87w4cM5ePAgTZs2JTExUelIzJ8/nwcPHuDh4UHZsmW5ffs2Li4uimaysbFh1qxZ7Nmzh2bNmpGamkpaWpqimdS4nUCd+5Rkkky5IYPe2UhMTOTo0aPUqlWLSpUqERkZyfXr12nRooXS0VQlJiaGnTt3Ym9vj729PXfu3OHUqVN0795d6Wiqo8Z9SjJJplwpkEsavgauX7+ujY+P1z2Oi4vT/vXXXwom0mr79Omjffz4se7xo0ePtP369VMwkVYbHR2tdwXYpKQkbXR0tIKJ1LmdtFp17lOSyTCSKZ10SWVj6tSpmJmZ6R6bmZkxZcoUBROl36zo7bff1j22trYmPj5ewUQwatQovS6o1NRURo8erWAidW4nUOc+JZkMI5nSScHIRlpamt6HYW5urnjfvEajISEhQfc4Pj5eN/CllOTkZIoXL657XKJECZKSkhRMpM7tBOrcpySTYSRTOikY2TA1NeX27du6x+Hh4brpmkrp0qULQ4YMISAggICAAIYNG0bXrl0VzQTp4xgZoqOj0Wg0CqZR73ZS4z4lmQwjmf6/zQJdexE2fvx4+vbtS8uWLQEICQlh7ty5imYaNWoU5cqV48CBAwD06dMHNzc3RTMNGDCAvn374urqCkBAQAAjR45UNJMatxOoc5+STJIpN2SWVA7CwsI4duwYkH6PhcqVKyucSJ1OnjxJSEgIAK1ataJx48YKJ1IvNe5TkskwkkkKRpG3efNmevfurXQM1ZPtJMSr
kzGMXJg5c6bSETKJjIxUOkImK1euVDpCJmrcTqDOfUoyGeZNzCS3aM2lqlWrKh1Bj5OTk9IRMrl58yb169dXOoYeNW6nDGrbp0AyGepNyyRdUkVMeHg44eHhetPnMga9xP/IdhIi/8ksqWwkJiYSFBREeHi43hz+L774QrFMCxcuxN/fn6pVq+ouMmZkZKT4F+Eff/yRaTv1799fsTxq3U5q3Kckk2TKDSkY2Rg/fjzGxsbUq1cPc3NzpeMAsG/fPvbv3693opzSPD09uX79OrVr11Z8XnoGNW4nUOc+JZkMI5nSScHIxr1799i1a5fSMfTY2dnpndmpBpcuXWLXrl2qKRagzu0E6tynJJNhJFM6KRjZqFGjBlFRUZQrV07pKDpTp05l9OjRNGvWTO8vCiW7f959910SEhKwtLRULMOL1LidQJ37lGQyjGRKJ4Pe2bhx4wbDhw+ndu3aFCtWTLf866+/VizTpEmT+Oeff6hVq5beX/Tz589XLFNoaCgeHh40bNhQ78tZyb5dNW4nUOc+JZkkU27IEUY2vvjiC5ydnalbt65quluuXLlCcHCw4ne0e97cuXOxtbXFyspKttNLqHGfkkyGkUzppGBkIyUlBW9vb6Vj6KlSpQrPnj2jZMmSSkfRuX//Pnv27FE6hh41bidQ5z4lmQwjmdJJwciGvb09169fp1atWkpH0bG0tKR79+40b95cNd0/tWrVUl3frhq3E6hzn5JMhpFM6aRgZOPixYv06NGDqlWr6vUPbt26VbFM1apVo1q1aoq1n5XY2FhcXFxwcHBQTd+uGrcTqHOfkkySKTdk0Dsbp06dynK5XIlV344dO7Jc3q1bt0JOon5q3Kckk2EkUzopGEXQH3/8wblz56hbty6tW7dWOo6qnDlzhj179nDv3j0g/ZyMTp064ejoqHAykRehoaFUr15d6Rji/8nVal+QlJTEqlWr6NKlC46Ojjg6OuLi4sKqVatITExUJNPzl+X29/fH29ubuLg4li5dyoYNGxTJBLBz507WrFnDn3/+qbd87dq1iuRZvXo1Pj4+VKhQARcXF1xcXKhQoQI+Pj6sWrVKkUyQ/qU3cuRIvL29efLkCaNHj8bBwYHevXsTGhqqSKZt27bp7pR4//59Bg0aRIMGDejXrx/h4eGKZEpISMj0b8SIESQmJurdcrcwJScn88033zBz5kwOHTqk99yXX36pSKacFPTVauUI4wWTJ0+mRIkS9OnTh/LlywMQERHBpk2biIuLY/ny5YWeyc3NDX9/fyC9eCxZsoSKFSvy+PFjBgwYQGBgYKFnWrRoEefPn6du3boEBwczbNgwBg8eDKR3R2XXVVWQ2rdvT2BgoF5/LqRfc8fFxYXff/+90DNB+gmDHTt2JDY2lj179uDm5oabmxuHDh3C39+fn376qdAzdenShaCgIAA+//xz7O3t6dq1K4cPH8bf358ffvih0DPVrl0bIyMjsvpKMjIy4tq1a4WeycvLi4SEBD744AO2bdtG06ZNmT59OqDcfp6TVq1aZSps+UkGvV+QMYf/eaVLl2bu3Ll06NBBkUzPn0+QkpJCxYoVAbC2tsbUVJmPMCQkhB07dmBmZsaYMWMYO3YscXFxjB8/Pstf+MKg1WqzPPciuy+hwhIXF8eAAQOA9Bs5DRs2DIAePXooUiwAvYvV3bp1S/eHkJubmyLFAtK/gI2NjZk2bZruygHOzs66W+0q4dKlS7o/yPr27cvkyZPx8vLiq6++Umyfatq0aZbLtVotsbGxBdq2FIwXGBsbc/v2bSpVqqS3PDw8XLETwW7evEnPnj3RarWEh4cTFxen+4VKSUlRJBOgu16TjY0N69evZ8yYMSQlJSm2ndzc3OjVqxdubm56R4f+/v6K3tM7NTWVpKQk4uPjefr0KdHR0djY2JCQkEBSUpIimd59910OHDiAs7Mz7777LmFhYVSpUoUHDx4okgfSz8Q/ePAggwcPxt3dnRYtWih+8uXz
l8e3sLBg5cqVeHh44OnpiUajUSSTVqvlhx9+wMrKKtPyvn37FmjbUjBe4OnpSd++falfvz4VKlQA4O7du1y+fBkfHx9FMq1bt07vccYv0YMHDwp8B8mOpaUl4eHhvPvuu7rH3377LaNGjeKvv/5SJNO4ceNwcnJi9+7dnDx5EoDy5cszffp0RWezuLi40KlTJ1JTU5kwYQLu7u7UqlWLs2fP0qZNG0UyzZo1i/Hjx7NhwwbefvttevXqRb169bh3756id5Jr3bo19vb2fPnll+zevVvvC1sJZcqU4c8//6R27doAmJiYsGTJEqZMmcLff/+tSKb69evz6NEjXabn2draFmjbMoaRhWfPnnH48GG9mTbNmzdX3ZnDSjp//jxWVla89957esuTk5PZsmWL4hf6U5uMiQG1a9cmIiKCvXv3UrFiRdq3b69oruPHj3Pjxg00Gg12dna0aNFCNZeF37NnD6dOnWLWrFmKZQgLC8PMzEz3x2MGrVbL4cOHFbnHSnJyMiYmJspcokQrirSVK1cqHaFI2Lp1q9IRXmrMmDFKR8hEjZl69OihdIRM1JipID47mVb7gqI2jU7JM02zo2SXRnZWrlypdISXioiIUDpCJmrM9PyAvVqoMVNBfHYyhvGC2bNn66bRLV68mGPHjumm0Z07d06RTD179sxyuVarJTo6upDTvNyRI0cUaXfixIlZLtdqtTx58qSQ0+Se0gO8WZFMhnlTMknBeIEap9GFhYWxZMmSTH3LWq2WSZMmKZJJyal92QkJCcHLyyvT3fa0Wq1uEFwIkXdSMF6gxml0devWxdLSkoYNG2Z6TqlbkWoVnNqXnTp16lC7dm0++OCDTM8peTFEIV4XMobxgoxpdBkyptEZGRkpNo1u0aJF2V7C+ODBg4WcJl3G1L4KFSro/atYsWKBT+3LzqxZs7Czs8vyuZ9//rmQ0+TejRs3lI6QiRozKfV7mBM1ZiqIz04Kxgt8fHwy/dVsbGzMwoULWbNmjSKZbG1ts71ntrHx/z7C2bNnF1Ki9Os2ZXdug1ID8bVr16Zs2bJZPvf8tMjVq1cXVqRcUeNF9tSY6cWp3GqgxkwF8dlJwXhBlSpVMs25hvQBpOfnXGc3EK2k//73v4XWlrm5uUHzwMeOHVsIaXJHqWtKvcybMnD6qiSTYQoikxSMPFLjNDo1UuO0TKUmLwhR1EnByCM1/kWhRmrcTmrMJERRIAVDCJVQ45GPZDLMm5JJCsZrRI0zNdRIySv8ArobF71IqXNqQH2Z4uLiuHLlSpbPLViwoJDTpFNjJijcz04KRh6p8ctZjTM11DgtUyn//e9/ad26te5+55cuXdK7jIoSF7JTY6aQkBA++eQTJkyYoMs0evRo3fNZXaX1TcykxGcnBSOP1PjlrMa+eTVOy1TqZMf58+fz7bffUqpUKQDef/99xS43o+ZMK1asYOvWrbz11lu6TErdNlbNmZT47KRg5JEav5zVSLbT/6SkpGT6Q0Op4pVBjZmATOfTmJubK5Tkf9SWSYnPTgqGEIXE3Nyc+Ph4XRG9ceNGpvuPSyYoWbIkDx8+1GU6efJkppNpJZMyn51cS+o1osaZGmqk1HYaPXo0w4YNIyoqiqlTp3LkyBEWLVqkSBY1Z/Lw8GDEiBHcuXOHAQMGEBYWxjfffCOZXqDEZyd33MsjNzc3/P39lY6hZ8uWLfTq1UvpGHrUuJ1WrVrFuHHjFGn79u3bHDlyBK1Wy8cff0zlypUVyaH2TLGxsbr+eAcHB93YgZLUmKmwPzspGDmIi4vj1q1b1KtXL9Nzz9/nt7CEhYUxbdo0IiMjOXDgAFeuXOHAgQO6mRtKiomJoXTp0pmWh4SEFPpMm+joaObPn8+9e/fw8/Pjzz//5Pz584pdRVeI14UUjGyEhITg7e2NiYkJBw4c4NKlS6xatUqxCxACDB48mKFDh7JkyRICAgLQaDS4uLiwa9cuxTL997//5fPPP0ej0RASEsKlS5f4
9ddfFb074ZgxY2jRogU///wzgYGBJCcn06NHD919Tgpbjx49chz8V+JijWrM1KRJkywzabVajIyM1pKA9AAAHmJJREFU+OOPPyQTyn52MoaRjYxpdCNGjADUMY0uNjaWFi1asHTpUiD9SrVKz2jJmNrn4eEBpG+nqVOnKpopMjKSvn37snnzZiB9cPD5q/oWtilTpijWdnbUmGnbtm1KR8hEjZmU/OykYORAbdPoTExMSElJ0f11ERkZqegXIahzWqapqf5u/fTpU0UnBGR3GXglqTFTVleJVpoaMyn52UnByIYap9H169eP8ePH8+jRI1auXIm/v7+il5MAdU7LbNeuHd7e3sTHx7N9+3Z+/vlnevTooVieRYsW4enpibu7e5ZdCUrcDVCNmTw9PVm0aFG2XS5KdJOpMZOSn50UjGyocRqdm5sbFStW5ODBgyQkJODr64ujo6OimdQ4LXPEiBHs3LmTp0+fEhISwoABA3B1dVUsT8Zn1Lp1a8UyvEiNmQYNGgSoq7tMjZkU/ey0IltPnz7VHjp0SHvo0CHtkydPlI6jWuHh4Vo/Pz/txo0btWFhYUrHUZ1JkyZptVqt9ocfflA4yf+oMVP//v21Wq1Wu3DhQoWT/I8aMyn52ckRRg6srKwUufjai7I79MygRPfB8ypVqkS/fv0UzQCwcOHCHJ//4osvCimJvowLVfr7++v+YlWaGjNFR0fz6NEjjh49yoQJEzKNOxUvXlwyoexnJwXjBWqcRqemboMMapyWWaJEiUJv0xD169enYcOGJCUl0bRpU91yJfcpNWZq3749rVq1Ijk5GXt7eyD9WmQZma5duyaZUPazk/MwXnD37t0cn1fjrAklnDp1Ksfn1TgLR0kPHz5k0KBBrFu3LtNzSu1TaswE0L9/f/z8/BRrPytqy6TUZycFowj4z3/+w6BBg7LtclGqq0Vt9uzZQ6dOnbL9xe7fv38hJ9IXHx9PyZIlFc3wIjVmEoZR4rOTLqkXqHEanYWFBaCuLhc1TsvMuFnT5cuXC71tQ6jxi1mNmYRhlPjspGC8QI3T6G7evAmk34yoU6dOCqdJp8ZpmRm/QD179qRhw4YKpxHi9SMF4wULFixg48aNhISE4OnpqXQcAN0g1rp161RTMAIDA2ndujVPnz5VzSybwMBAhg4dyty5c9mxY4fScYR47UjBeIEap9GVK1cOFxcX7ty5Q8+ePTM9r0Q3mRqnZRYrVozRo0dz9+5dJk6cmOl5pacfi7yJiorC2tpa8UvzPE+NmQqDFIwXqHEa3erVq7l69Sqenp6qGeBW47TMNWvWcPz4ca5fv06rVq0Kvf3c6tSpE6ampowcORIXFxel4wDqzDRo0CCePHnCwIEDGT16tNJxAHVmKozPTmZJZUNt0+ggfSyjatWqSsfQUeu0zJMnT+Lk5KRY+4aKjIwkKiqKs2fPMnjwYKXjAOrMBOn3Wzl79izt2rVTOoqO2jIVxmcnBUO8EpmWKcSbQ9lrY4siT4rFyx05coQnT54A6Zda9/LywsXFBU9PT2JiYiTT/3N3d+f3338nNTVVkfazEhoaysiRI/H29ubJkyeMHj0aBwcHevfuTWhoqCKZUlJS2LhxI35+fqSmprJ7927GjBnDsmXLSE5OLtC25QhDiALWpUsXdu7cibGxMTNmzKBEiRJ06dKFI0eOcPXqVVatWiWZSL8sj52dHZGRkbi4uNCjRw9q1qxZ6Dme179/fzp27EhsbCx79uzBzc0NNzc3Dh06hL+/Pz/99FOhZ/L29ubx48ckJiZiaWlJSkoKnTp1Yv/+/VhbWzNz5swCa1sGvYUoYFqtVnejqytXruim/H7wwQeKXXZdjZneeecdduzYwZUrV9i+fTsDBgygUqVK9OjRAxcXFywtLQs9U1xcHAMGDABg8+bNDBs2DEi/lpoSxQLg/PnzulsPN23alGPHjmFhYUHbtm0L/L4v0iVloKioqAI/3MutwYMHM3r0aM6dO6d0
FFWbMmUKc+fOJSIiQpH2S5UqxcWLF4H0uzhGR0cD6eM/Go1GMv2/jCsG1KtXj5kzZ3LkyBGGDBnCvn37aN68uSKZUlNTSUpKIiYmhqdPn+q2U0JCAklJSYpkyrijpLm5OZUqVdJdCcLc3BwTE5OCbbtA1/4aUeM0usmTJxMREcGePXto0KCB0nEAdU7LbNeuHbdu3WLBggWsWLGi0NufPn06EyZMwNHRkbJly/Lpp5/SpEkTLl68yP+1d+ZBUZz5G39gOFbEqKOCEMMmHjCi4o4ORwKIoOuAMAzCglhRs4IRDYcoKMYkaimaeCGGiEahJEt2ASEKaMmRLQVcdAtxjaAVghIhyCnBg2F0Dnh/f/ijl+EwuJHpNnk/VVM13T3zvs+80P12v+/zfr+rV6/Wuh6uauo7Om5gYABPT094enqiubmZFU0SiQQeHh5Qq9UIDw9HREQErKyscO3aNSxYsIAVTTo6OlCr1dDT09NwKCqVSnR1dQ1v3XQOY+hwzUbHRbhqy2SbJ0+e4Ny5c7hz5w66urpgbm4ODw8PmJmZUU3/T0FBAcRiMSt1P4+qqioAgEAgQENDAwoKCjBp0iQsWrSINT2TJ0/ut2iwvr4e169fh7e397DVTTuMV4SLFy9CV1cXLi4uKC8vR35+PqysrODv78+2NM7y448/oqKiAgKBAAKBgG05FMorD53DeAGG033wPOLj45GYmIj4+Hjs3bsXhw8fhomJCXJzc5GQkMCKJi7aMsPDw5n3xcXFWLlyJS5cuICQkBDk5OSwoglg1wb5PHJzc3Hs2DHmDrqHL7/8kiVFg8PWuadQKJCYmAgvLy+IRCKIRCJIJBIcOXIET58+ZUVTz3nXQ1ZWFrZu3YrU1NR+w3ovG/qE8QLMnz8fRUVFWq9XIpEgOzsbT548gZOTE4qKijBmzBjI5XIsXboUZ8+e1bomLtoyfXx8kJ2dDQBYsWIFtm7diunTp6OhoQGhoaHMMW3Dpg1yMPbv34/r16/D2toaBQUFCA4OZoYQlyxZwrngjWydexs3boSRkRECAwNhbm4OAGhsbER6ejpkMhni4+O1rqn33yc5ORmFhYWQSCS4cOECZsyYgaioqGGrm05696F3XKTeEELQ0dGhZTXP0NPTA4/Hg7GxMSwsLDBmzBgAz/JjDLcrYjC4aMvsnZdDJpNh+vTpANjPksimDXIwiouLcebMGejr62PdunX44IMPIJPJEBYWNux3qYPBxXPv1q1bKCgo0NjH5/MRGxvL2nxL77/P+fPnceLECfD5fPj5+cHf3592GNqEEIKUlBSMGjWq3/5ly5axoqm7u5sJ6rdnzx4NTWytiu2xZdrY2DC2zHHjxrFqy7x37x7Wr18PQghaWlqgVCqZiUE2Vw+zaYN8Hvr6+gCAcePGITk5GevWrYNCoXhurvbhhIvnnq6uLurr6/HGG29o7P/pp59Ya6fe9ero6IDP5wN4Fkm7539tuKAdRh9mzpyJBw8eDDhJampqyoIiIDo6Gk+fPsWIESMwc+ZMZn9dXR2WLFnCiiYu2jK3bt3KvHd1dYVcLoeBgQFaWlpYs0AC7NogB8PY2Bg//fQTLCwsmO0TJ04gJCQE1dXVrGji4rm3adMmLFu2DDNnzmSeVBsaGnDz5k3s3LmTFU3V1dV4++23QQhBZ2cn2tvbwefzoVarqa1W2yiVSvB4PFbv/F4VuGbL5Cps2iAH4/r16xg1ahSmTp2qsV+pVCIzM5OV/OdcPffkcjlKSkrQ1NQEADAzM4OzszNrcdQaGho0tk1MTKCvr68V2z/tMF4Rrl27hsbGRjg4OGDChAnM/jNnzrD2lPEqIJVKWXVIUSi/JWiH0QeFQoGkpCTk5eUxq0vNzMzg7u6O4OBgZvxZm5w8eRJpaWmYPHkyKioqsGPHDmbREFuOlkePHmH06NHMdlZWFv7zn/9g+vTpWL58OSvjuwNlI/zhhx9gZWUFgJ3MhD3k5uaisbER8+fP1xhy+fLL
LxESEqJ1PSqVCikpKTh79iwaGxvB4/EwdepUBAcHw83NTet6AOCbb76Bq6sr+Hw+mpubERMTg8rKSggEAnz22WfM8Jk26ejoQGJiInR0dBAaGoq0tDTk5ubC0tISH3/8MWNA4QqffPIJdu3aNWzl0w6jD1y00UkkEqSlpcHY2Bg1NTUIDQ3FmjVr4Ovrq2El1SZsWvsGw9PTE0KhEFKpFIQQEEIQFRWFuLg4AICdnZ3WNQHctLB++OGH0NPTw7x585CXl4c333wT06ZNw4kTJ+Dn58fKkJSXlxfOnTsHAIiMjMSf/vQneHt7o6SkBNnZ2UhJSdG6psjISJiamkIul6Ourg5TpkyBj48PCgoK0NbWhn379mld0/MYdvsxoWiwaNGi/+nYcOLl5aWx3dLSQjw9PUlaWhrx8fFhRZNUKmXe+/r6kp9//pkQQohcLieenp6saFIoFOSzzz4j69atI62trYQQQtzc3FjR0htPT0+iVCoJIYS0tbWRgIAAkpCQQAjRbEdt4uHhwbxXqVQkMDCQEELIw4cPiVgsZkVT73r7/l+z1U49555arSZ2dnZErVYTQgjp7u7ud15qCwcHhwFf9vb2xNraeljrpi6pPnDRRqevr4/79+8zcxcmJib46quvEBQUhLq6OlY0sWntGwwDAwPExMSgvLwca9euZeUueTC4ZmHl8XhQKBQwNDSETCZjIq+OHj2a0aptLCwscOHCBbi5ucHCwgK1tbV48803cf/+fVb0AP+1RPN4PJiZmTET8jo6Osw6JG1DWLQf0w6jD1y00YWGhqKpqUljsnvcuHFISUnBiRMnWNHEprXvlxCJRPj666+xb98+TmQE5KKF1cvLCwEBARCJRCgtLWU617a2Nlb0AMD27dsRFhaGkydPYvTo0fD398eMGTPQ1NTEWmgQXV1dpmPtPfQrl8tZ0QOwaz+mcxgDwDUbHRfpa+2bMGECDAwMaETfAeCihRUASktLUVVVhRkzZsDBwYEVDQNx+fJl3LlzB93d3TAzM8O8efMwYsQIVrTcv38ffD6/n9W3ubkZd+/eHXR1+nDCpv2YdhivACqVChkZGdDR0cHSpUtRWFiIs2fPwtLSEqGhof38/b9XuOjcolB+S9BotS8AW4/Fu3btQllZGYqLi7F582bk5eVBIpHg3r172Lt3LyuaampqsGbNGmzbtg2PHj3C2rVrIRQKsXTpUtTU1LCiqXf+jeTkZGRmZsLa2hoXL15knFJcg61ow8+Dahoav0dNdA7jBbh06RIr9XIxeN22bdvg7u6Ojo4OLF++HD4+Pti9ezeKioqwY8cOVvIdExaDsv2v9J6X4gpU09D4PWqiHUYfuBgxk4vB62QyGVasWAEAyMjIQHBwMADAz8+Plc4C4KZz65cIDAxkW0I/qKah8XvUxM2ziEXYtKwNBheD16nVaigUCnR2duLx48dMtNonT54wFk1tw2Xn1mCUlpbC0dGRlbobGxuRn5+vYe4Qi8WshoOnmritic5h9KHHsvb6669rvCZNmsRaxMw9e/YwIcNNTEyY/S0tLXj//fdZ0SSRSODh4QEfHx+Eh4cjIiICO3fuRGBgIGuRYQsLC5GVlYVvvvkG+fn5TKf/+PFjREREsKLpl/joo49YqTczMxPLli1DQ0MDTE1NYWpqioaGBrz77rvIzMykmqimAaEuqT5wNWImF+lJ7SkQCJg7nkmTJjFxrijPGCx8BCEEp06dwrVr17SsCBCLxUhLS2OG7Xpob29HYGAgCgsLqSaqqR/0CaMPbCe1eVEyMjJYq1sgEDCLh8zNzREUFMTZzoJNR0tqaioMDQ1hZGSk8Ro5ciRrVt/u7u5+FxzgWWIstu4hqaahwaYmOofxAiQkJCA8PJxtGRq0tLSwLaEfXGwnNh0tlpaWEIvFA67MZWtYw8nJCatXr0ZAQIBGkM1Tp06xNqdCNXFfEx2SegHS09M56YzgGrSdNCktLYWFhUW/+GTAszwnc+fO1bqm
7u5u5ObmIi8vD42NjQCePSW6u7tDKpWyEieJauK+JtphvCIolUo8ePCg38T77du3MW3aNJZUvTqw6UaiUH4r0DmMAaiursbt27cBALW1tUhJScHly5dZ0/Ovf/0Ljo6O8PLygq+vr0aE2s2bN7OmazBKS0vZltAPttxIAHDz5s3nHlcqlVpfHU81DQ2qSRM6h9GH1NRUnDx5Emq1GsHBwcjJycGsWbOQnp6OFStWsBIo7tChQ0hNTYVAIMCZM2ewatUqJCYmQiAQsDbx9jw++uij4U3iMgjPcyOxtegSAI4fPw65XA4vLy/Mnj0b48ePh0KhwN27d3Hp0iUUFxdjy5YtmDJlCtVENXFaEx2S6oO3tzfS09Mhl8uxYMECFBQUYOLEiWhvb0dQUBAr2e365qUuKytDTEwMDh8+jO3bt7OSsY2LVtFZs2Zh9erVA7rcUlJSUF5ernVNPVRUVCAjIwNlZWVobm7GiBEjYGlpiYULF+Ivf/kLjI2NqSaqifOa6BNGH3R1dRnb4xtvvIGJEycCAPh8PmsWyK6uLiYmP/As1WhcXBzWr1/P2qrq1NTUQS/ObLUTF91IPdjY2MDGxoZVDX2hmoYG1fRfaIfRh54V1cCz/N69UalU2pYDAFi8eDHKy8s1Jm2FQiEOHz7MWlInLl6cN27cOGjOEq5Gq6VQXiXokFQfTp8+DbFY3O/CU1NTg1OnTuHDDz9kSRm34KJVlEKhDC+0w3gFuHnzJmbOnDnocaVSifr6eq1OvHER2k4UyvBCbbV94KKN7vjx41i9ejWys7Nx9+5ddHR0oK2tDVevXkVcXBz8/f3R2tqqVU20nSiU3x/0CaMPERERQ7KsaTuXL9ecGrSdKJTfH7TDGAB60RkatJ0olN8XtMOgUCgUypCgcxgUCoVCGRK0w6BQKBTKkKAdBoVCoVCGBO0wKKwjFApRX18PANiyZQsOHTqktbobGxshFArR1dWltToPHjyIlJSUF/pO7zb6NSQkJCA6OvpXl/NrqKqqovlSXlFoaBCK1nBzc0NbW5tG/Kn8/Hxcv359WOttbm7G7t27UVZWBrVaDTMzMwQFBcHX1xfm5ubDXn9v2tvbkZ2djW+//ZbZJ5PJkJCQgG+//Rbt7e0YM2YMbGxsEBwcjNmzZwOAVjUOxooVK+Dt7Q1/f/9fVY5AIMCoUaNw4cIFuLm5vSR1FG1AOwyKVjl27Bjeeecdrda5adMmCAQCXLx4EQYGBqiursb9+/e1qqGH06dPw8XFBX/4wx8APFvg+N577+G1117DsWPHMGXKFCgUCpSUlKCkpITpMH5rSCQSZGRk0A7jFYMOSVFYx8rKSiMpVG8uXrwIqVQKkUiEwMBAVFVVMceOHz8OZ2dnCIVCiMViXLlyZcAybt68CV9fXxgZGUFPTw/W1tZwcXEBANy7dw9WVlZQq9W4fv06hEIh85o1axZzQevu7sbx48excOFC2NvbY/369Xj48CEAQKFQIDo6Gvb29hCJRPDz80NbW9uAWkpKSmBra8ts5+TkoKWlBUeOHIGlpSV4PB6MjIzg7u6ukRe9dxs9ePAAa9euxZw5c+Dn54dDhw5h2bJlzGdjY2Ph4uKCOXPmwNfXt19Yd6VSicjISAiFQixZsoRp06SkpH652GNjYxEbG4tDhw6hvLwcO3fuhFAoZIJe1tTUYNWqVbCzs4NYLMb58+eZ7xYXF2Px4sUQCoVwdnZGcnIyc8ze3h5XrlyBUqkcsJ0oHIVQKFrC1dWVlJaW9ttvaWlJamtrCSGExMTEkLi4OEIIIbdu3SIODg7ku+++I2q1mpw+fZq4uroShUJBampqyLx580hzczMhhJD6+npSV1c3YL3vvfceWbp0KTl37hxpaGjQOFZfX08sLS2JSqXS2K9UKsm7775LDhw4QAghJCUlhfj7+5OmpiaiUCjIJ598QjZs2EAIISQtLY2EhIQQuVxO1Go1qaysJB0dHQNqsbe3Jzdu3GC2IyMjSUxM
zC+2Xe82ioyMJJGRkUQul5Pbt2+TefPmkcDAQOaz2dnZpL29nahUKpKcnEzeeecd8vTpU0IIIZ9//jmxtrYmeXl5RKlUkqSkJOLq6kqUSiVpaWkhs2fPJo8ePSKEEKJSqYiDgwOprKwkhBCyfPlycurUKaaezs5OMm/ePJKVlUVUKhW5desWsbOzI7dv3yaEEOLo6EiuXr1KCCHk4cOH5ObNmxq/SSgUku+///4XfzuFO9AnDIpWCQ0NhUgkgkgkwgcffPDcz2ZkZGDp0qWYPXs2eDwelixZAn19fXz33Xfg8XhMvCqVSoVJkybBwsJiwHIOHz4MkUiExMRELFiwAFKpFBUVFc+tOzY2FiNHjsSGDRsAAOnp6diwYQMmTpwIAwMDhIWFoaCgAGq1Gnp6enj48CHq6urA4/Ewc+bMQVe5d3R0aERCfvDgAcaPH89sf//99xCJRJgzZw7EYnG/73d1daGwsBDh4eEYMWIEpk6dCh8fH43PSKVSjB07Fnp6eggKCoJSqcTdu3eZ4zNmzIC7uzv09fWxatUqKJVK3LhxAyYmJhCJRMjPzwcAXLp0CWPHjh00oGNRURFef/11+Pn5MU9uYrGY+b6enh7u3LkDmUyG0aNHY8aMGRrfHzlyJKuZECkvDp3DoGiVI0eODHkOo7GxEdnZ2fj666+ZfSqVCq2trbCzs8PWrVuRkJCAO3fuwMnJCVu2bIGpqWm/ckaPHo3o6GhER0ejvb0d+/btQ2hoKEpKSgasNz09HWVlZcjMzISuri6jJTQ0lNkGniXb+vnnnyGVStHc3IyNGzfi8ePH8Pb2xoYNG6Cvr9+v7Ndeew2dnZ3M9pgxYzTmU6ZPn47y8nJcvnwZH3/8cb/vt7e3MxP3PfR+DwDJycnIyspCa2srdHR0IJPJ8ODBA+Z4T1Kwnt9gamrKBGVcsmQJ0tLSEBAQgNzcXEil0gHbCAAaGhpQUVEBkUjE7Ovq6oK3tzcA4PPPP8fRo0dx8OBBWFlZISoqCkKhkPlsZ2cnRo0aNWj5FO5BOwwKZzEzM8PatWuxbt26AY9LJBJIJBLIZDJs27YNBw4cwP79+59bJp/PR1BQEM6cOcPMQfSmvLwchw8fxj/+8Q+Np4SJEydiz549g+b5CAsLQ1hYGO7du4c1a9bgrbfeGtBNZGVlhdraWiZb2ttvv42EhATI5XIYGRk9V3uPfj09PTQ3N+Ott94CADQ1NWnoT0pKQkpKCqZNmwZdXV3Y2tpq5H5vbm5m3nd3d6OlpQUmJiYAgIULF2LHjh2orq5GUVERNm3aNKgWMzMz2Nra4uTJkwMet7GxwdGjR6FSqfD3v/8dkZGRKC4uBgC0tLRApVJh8uTJv/ibKdyBDklROIu/vz/S09Nx48YNEEIgl8tRVFQEmUyGH3/8kZk0NTAwgKGhocbdf2/279+P6upqqNVqyGQypKWl4Y9//CPGjh2r8bmmpiZERkZi7969zMW4h2XLliE+Ph4NDQ0Ant3p//Of/wQA/Pvf/8YPP/yArq4uGBsbQ09Pb1AtLi4uuHr1KrPt4+ODCRMmICwsDNXV1Uw63sHCx/N4PPz5z3/GF198gSdPnqCmpkYj33tnZyd4PB74fD7UajW++OILyGQyjTJu3bqFwsJCqNVqfPXVVzAwMGDcWIaGhhCLxYiKisKsWbNgbm7OfG/8+PEaa0Hmz5+P2tpaZGdnQ6VSQaVSoaKiAjU1NVAqlcjNzUVHRwf09fUxcuRIjTYpKyuDg4MDDAwMBvydFG5COwwKZ5k1axZ27dqFnTt3wtbWFosWLcLp06cBPHP6HDx4EPb29nByckJ7e3u/lLo9PH36FGFhYbC1tcXChQvR2NiIo0eP9vvclStX0NbWhvXr1zNOKU9PTwDAypUr4ebmhqCgIAiFQgQEBDDzIG1tbYiIiMDcuXOxePFi2NnZDTqUI5VKUVxcjKdPnwJ4doH+29/+hilTpiAkJARz586Fu7s7KisrER8fP2AZ
27ZtQ0dHBxwdHbF582Z4enoyF14nJyc4OztDLBbDzc0NhoaG/YasFixYgPPnz8PW1hY5OTlISEjQGD7z8fFBdXV1v9+wcuVKFBQUwNbWFrGxsTA2NkZycjLOnz8PZ2dnODk54cCBA4zzKScnB25ubpgzZw7S09M1nv7Onj1LF++9gtBotRSKlomLiwOfz8df//rXl1Le/v370dbWhr17976U8hobG+Hh4YHS0tJhCVFfVVWF7du3IyMj46WXTRleaIdBobxi9DjDrKysUFlZiffffx+7d+/GwoULf3XZ3d3d+PTTTyGTyfDpp5++BLWU3xJ00ptCecXo7OxEVFQUWltbMW7cOAQFBWHBggW/uly5XA5HR0eYm5sjKSnpJSil/NagTxgUCoVCGRJ00ptCoVAoQ4J2GBQKhUIZErTDoFAoFMqQoB0GhUKhUIYE7TAoFAqFMiT+DwsAfj24NhVUAAAAAElFTkSuQmCC\n"
876 | },
877 | "metadata": {}
878 | }
879 | ]
880 | }
881 | ]
882 | }
--------------------------------------------------------------------------------
/csv_schema_inference/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Wittline/csv-schema-inference/8121193f82b02984c811b2cc794c539e28b5e5ef/csv_schema_inference/__init__.py
--------------------------------------------------------------------------------
/csv_schema_inference/csv_schema_inference.py:
--------------------------------------------------------------------------------
1 | import mmap
2 | import os
3 | import multiprocessing as mp
4 | import datetime as dt
5 | import operator
6 |
7 |
8 |
9 | class DetectType:
10 |
11 | def __init__(self, max_length, sep):
12 | self.max_length = max_length
13 | self.sep = sep
14 |
15 | def __get_local_type(self, value):
16 | try:
17 | number = float(value)
18 | except ValueError:
19 | return "STRING"
20 | # Values with no fractional part (e.g. "1.0" or "1e3") are reported as INTEGER.
21 | if number.is_integer():
22 | return "INTEGER"
23 | else:
24 | return "FLOAT"
25 |
26 |
27 | def __get_date_type(self, value):
28 |
29 |
30 | if "T" in value:
31 | segments = value.split("T")
32 | try:
33 |
34 | if len(segments) == 2:
35 | valid_date = False
36 | d_elements = segments[0].split("-")
37 | if len(d_elements) == 3 and len(d_elements[0]) in {2, 4} and \
38 | len(d_elements[1]) == 2 and len(d_elements[2]) == 2:
39 | dt.date(*(int(e) for e in d_elements))
40 | valid_date = True
41 | t_elements = segments[1].split(":")
42 | valid_time = False
43 | if len(t_elements) in (2, 3):
44 | valid_time = (len(t_elements[0]) == 2 and 0 <= int(t_elements[0]) < 24 and
45 | len(t_elements[1]) == 2 and 0 <= int(t_elements[1]) < 60)
46 | if len(t_elements) == 3:
47 | valid_time = (valid_time and len(t_elements[2]) == 2 and
48 | 0 <= int(t_elements[2]) < 60)
49 | if valid_time and valid_date:
50 | return "TIMESTAMP"
51 |
52 | except ValueError:
53 | return "STRING"
54 |
55 | elif "-" in value:
56 |
57 | segments = value.split("-")
58 | try:
59 |
60 | if len(segments) == 3 and len(segments[0]) in {2, 4} and \
61 | len(segments[1]) == 2 and len(segments[2]) == 2:
62 |
63 | dt.date(*(int(e) for e in segments))
64 | return "DATE"
65 | except ValueError:
66 | return "STRING"
67 | else:
68 |
69 | try:
70 | segments = value.split(":")
71 | if len(segments) in {2, 3}:
72 | valid = (len(segments[0]) == 2 and 0 <= int(segments[0]) < 24 and
73 | len(segments[1]) == 2 and 0 <= int(segments[1]) < 60)
74 | if len(segments) == 3:
75 | valid = (valid and len(segments[2]) == 2 and
76 | 0 <= int(segments[2]) < 60)
77 | if valid:
78 | return "TIME"
79 | except ValueError:
80 | return "STRING"
81 |
82 |
83 | return "STRING"
84 |
85 |
86 | def __infer_value_type(self, value, index, schema, values_type):
87 |
88 | if value not in values_type.keys():
89 |
90 | local_type = self.__get_local_type(value)
91 |
92 | if local_type == 'STRING':
93 |
94 | if value in {"", "na", "NA", "null", "NULL"}:
95 | schema[index]["nullable"] = True
96 | _type = "STRING"
97 | elif value in {"true", "false", "TRUE", "FALSE", "True", "False"}:
98 | _type = "BOOLEAN"
99 | elif len(value) < 21:
100 | _type = self.__get_date_type(value)
101 | else:
102 | _type = local_type
103 | else:
104 | _type = local_type
105 |
106 | values_type[value] = _type
107 |
108 | if values_type[value] not in schema[index]["types_found"].keys():
109 | schema[index]["types_found"][values_type[value]] = { "cnt": 1}
110 | else:
111 | schema[index]["types_found"][values_type[value]]["cnt"] += 1
112 | else:
113 | if values_type[value] not in schema[index]["types_found"].keys():
114 | schema[index]["types_found"][values_type[value]] = { "cnt": 1}
115 | else:
116 | schema[index]["types_found"][values_type[value]]["cnt"] += 1
117 |
118 |
119 | def execute(self, records, schema):
120 | values_type = {}
121 | for record in records:
122 | values = record.rstrip().split(self.sep)
123 | for index, value in enumerate(values):
124 | self.__infer_value_type(value[0:self.max_length], index, schema, values_type)
125 |
126 |
127 | class Parallel:
128 |
129 | def __init__(self):
130 | pass
131 |
132 |
133 | def execute(self, records, x, obj, d_schema):
134 | obj.execute(records, d_schema)
135 | return d_schema
136 |
137 |
138 | def parallel(self, records, obj, d_schema):
139 |
140 |
141 |
142 | cpus = (mp.cpu_count() - 2)
143 |
144 | if cpus <= 0:
145 | cpus = mp.cpu_count()
146 |
147 | chunk_size = len(records) / cpus
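148 | # Fewer records than workers: use one worker per record (int(chunk_size * 10) could give Pool 0 processes).
149 | if chunk_size < 1:
150 | cpus = max(1, len(records))
151 | chunk_size = 1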
152 | else:
153 | chunk_size = round(chunk_size)
154 |
155 |
156 |
157 | pool = mp.Pool(processes=cpus)
158 |
159 | results = [pool.apply_async(self.execute, args=(records[x:x+chunk_size], x, obj, d_schema)) for x in range(0, len(records), chunk_size)]
160 | pool.close()
161 | pool.join()
162 |
163 | return [p.get() for p in results]
164 |
165 |
166 | class CsvSchemaInference:
167 |
168 | def __init__(self, portion = 0.5, max_length = 1000, batch_size = 250000, acc = 0.7, seed= 1, header= True, sep=";", conditions = {}):
169 | self.portion = portion
170 | self.seed = seed
171 | self.header = header
172 | self.sep = sep
173 | self.accuracy = acc
174 | self.__schema = {}
175 | self.max_length = max_length
176 | self.data_types = {"STRING", "INTEGER", "FLOAT", "DATETIME", "DATE", "TIME", "TIMESTAMP", "BOOLEAN"}
177 | self.batch_size = batch_size
178 |
179 | if not isinstance(conditions, dict):
180 | raise TypeError('conditions must be a dict mapping data types to data types')
181 | if conditions:
182 | for k, v in conditions.items():
183 | if k not in self.data_types or v not in self.data_types:
184 | raise ValueError('Keys and values in conditions must be valid data types')
185 |
186 |
187 | self.conditions = conditions
188 |
189 |
190 |
191 |
192 | def __set_header(self, header):
193 |
194 | header = header.rstrip().split(self.sep)
195 | for i in range(0, len(header)):
196 | self.__schema[i] = {
197 | "_name": header[i].replace('"', ''),
198 | "types_found":{
199 | },
200 | "nullable":False,
201 | "type":""
202 | }
203 |
204 |
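205 | def __estimate_count(self, filename, reader):
206 | # Estimate the row count from the average line length in the first 8 KiB of the file.
207 | buffer = reader.read(1<<13)
208 | file_size = os.path.getsize(filename)
209 | newlines = buffer.count(b'\n') or 1 # avoid ZeroDivisionError when the sample has no newline
210 | return file_size // max(1, len(buffer) // newlines)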
211 | def __merge_schemas(self, schemas):
212 |
213 | for c_inx in self.__schema:
214 |
215 | for s_inx in range(0, len(schemas)):
216 |
217 | _v = schemas[s_inx][c_inx]
218 |
219 | if _v['nullable']:
220 | self.__schema[c_inx]['nullable'] = True
221 |
222 |
223 | for k in _v['types_found']:
224 |
225 | if k not in self.__schema[c_inx]['types_found'].keys():
226 |
227 | self.__schema[c_inx]['types_found'][k] = {
228 | "cnt": _v['types_found'][k]['cnt']
229 | }
230 | else:
231 | self.__schema[c_inx]['types_found'][k]['cnt'] += _v['types_found'][k]['cnt']
232 |
233 |
234 |
235 | def check_condition(self, _types, acc):
236 |
237 | try:
238 | _type = max({k: v for k, v in _types.items() if v >= (acc * 100)}.items(),
239 | key=operator.itemgetter(1))[0]
240 |
241 | if _type in self.conditions:
242 | if self.conditions[_type] in _types:
243 | _type = self.conditions[_type]
244 |
245 | except ValueError:
246 |
247 | if "STRING" in _types or len(_types) > 2:
248 | _type = "STRING"
249 | else:
250 | if {"INTEGER", "FLOAT"}.issubset(_types):
251 | _type = "FLOAT"
252 | else:
253 | _type = "STRING"
254 |
255 | return _type
256 |
257 |
258 |
259 |
260 |
261 | def __approximate_types(self, acc = 0.5):
262 |
263 | result = {}
264 | for c in self.__schema:
265 | _types = {}
266 | t = 0
267 | for v in self.__schema[c]['types_found']:
268 | t += self.__schema[c]['types_found'][v]['cnt']
269 | if v not in _types.keys():
270 | _types[v] = self.__schema[c]['types_found'][v]['cnt']
271 | else:
272 | _types[v] += self.__schema[c]['types_found'][v]['cnt']
273 |
274 | for ft in _types:
275 | _types[ft] = (_types[ft] * 100) / t
276 |
277 |
278 | _type = self.check_condition(_types, acc)
279 |
280 |
281 | self.__schema[c]['type'] = _type
282 |
283 | result[c] = {
284 | "name": self.__schema[c]['_name'],
285 | "type": _type,
286 | "nullable": self.__schema[c]['nullable']
287 | }
288 |
289 | return result
290 |
291 |
292 | def pretty(self, d, ind=0):
293 |
294 | for k, v in d.items():
295 | print('\t' * ind + str(k))
296 | if isinstance(v, dict):
297 | self.pretty(v, ind+1)
298 | else:
299 | print('\t' * (ind+1) + str(v))
300 |
301 |
302 | def get_schema_columns(self, columns = {}):
303 |
304 |
305 | result = {}
306 |
307 | for c in self.__schema:
308 | if self.__schema[c]["_name"] in columns:
309 | result[c] = {
310 | "_name": self.__schema[c]["_name"],
311 | "types_found":self.__schema[c]["types_found"],
312 | "nullable":self.__schema[c]["nullable"],
313 | "type":self.__schema[c]["type"]
314 | }
315 |
316 | return result
317 |
318 |
319 | def explore_schema_column(self, column):
320 |
321 | result = {}
322 |
323 | for c in self.__schema:
324 |
325 | if column == self.__schema[c]['_name']:
326 |
327 | _types = {}
328 | t = 0
329 | for v in self.__schema[c]['types_found']:
330 | t += self.__schema[c]['types_found'][v]['cnt']
331 |
332 | if v not in _types.keys():
333 | _types[v] = self.__schema[c]['types_found'][v]['cnt']
334 | else:
335 | _types[v] += self.__schema[c]['types_found'][v]['cnt']
336 |
337 | for ft in _types:
338 | _types[ft] = (_types[ft] * 100) / t
339 |
340 | result[c] = {
341 | "name" : self.__schema[c]['_name'],
342 | "types_found": _types,
343 | "nullable": self.__schema[c]['nullable']
344 | }
345 |
346 | break
347 |
348 | return result
349 |
350 |
351 |
352 | def run_inference(self, filename):
353 |
354 | with open(filename, mode="r", encoding = "ISO-8859-1") as file_obj:
355 |
356 | with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as map_file:
357 |
358 | less_header = 0
359 |
360 | if self.header:
361 | less_header = 1
362 |
363 | no_lines = self.__estimate_count(filename, map_file) - less_header
364 | portion = int(no_lines * self.portion)
365 | map_file.seek(0)
366 |
367 | if self.header:
368 | self.__set_header(map_file.readline().decode("ISO-8859-1"))
369 |
370 | lines = []
371 | schemas = []
372 | batch_count = 0
373 |
374 | dtype = DetectType(self.max_length, self.sep)
375 |
376 |
377 | while batch_count < portion:
378 | line = map_file.readline()
379 | if not line: break # the row count is only an estimate; stop at end of file
380 | batch_count += 1
381 | lines.append(line.decode("ISO-8859-1"))
382 | if batch_count % self.batch_size == 0:
383 |
384 | prl = Parallel()
385 | schemas_result = prl.parallel(records = lines, obj=dtype, d_schema = self.__schema)
386 |
387 | for schema in schemas_result:
388 | schemas.append(schema)
389 |
390 | lines = []
391 |
392 | if len(lines) > 0:
393 |
394 | prl = Parallel()
395 | schemas_result = prl.parallel(records = lines,obj=dtype, d_schema = self.__schema)
396 |
397 | for schema in schemas_result:
398 | schemas.append(schema)
399 |
400 | del lines
401 | del batch_count
402 |
403 |
404 | #Joining schemas results
405 | self.__merge_schemas(schemas)
406 |
407 | #Approximate data types
408 | return self.__approximate_types(acc = self.accuracy)
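
The type-selection rule used by `check_condition` above can be sketched as a standalone function. This is a minimal illustration, not the package's API: the name `pick_type` and its signature are hypothetical, and the input is assumed to be a mapping from type name to the percentage of sampled values of that type, as built in `__approximate_types`.

```python
import operator


def pick_type(percentages, acc=0.7, conditions=None):
    """Pick a column type from {type_name: percentage of sampled values}.

    Hypothetical helper that mirrors the rule in check_condition.
    """
    conditions = conditions or {}

    # Keep only the types whose share reaches the accuracy threshold.
    candidates = {t: p for t, p in percentages.items() if p >= acc * 100}

    if candidates:
        chosen = max(candidates.items(), key=operator.itemgetter(1))[0]
        # A condition lets the dominant type yield to another type
        # that was also observed in the column.
        if chosen in conditions and conditions[chosen] in percentages:
            chosen = conditions[chosen]
        return chosen

    # No dominant type: a pure INTEGER/FLOAT mix widens to FLOAT,
    # every other mix falls back to STRING.
    if ("STRING" not in percentages and len(percentages) <= 2
            and {"INTEGER", "FLOAT"}.issubset(percentages)):
        return "FLOAT"
    return "STRING"


print(pick_type({"INTEGER": 80.0, "FLOAT": 20.0}))
print(pick_type({"INTEGER": 60.0, "FLOAT": 40.0}))
print(pick_type({"INTEGER": 80.0, "FLOAT": 20.0}, conditions={"INTEGER": "FLOAT"}))
```

With the default threshold of 0.7, the first call keeps the dominant type, the second has no dominant type and widens to `FLOAT`, and the third shows a condition overriding the dominant `INTEGER` because `FLOAT` was also observed.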
--------------------------------------------------------------------------------
/googled57bdb220576a44a.html:
--------------------------------------------------------------------------------
1 | google-site-verification: googled57bdb220576a44a.html
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = [
3 | "setuptools>=42",
4 | "wheel"
5 | ]
6 | build-backend = "setuptools.build_meta"
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | name = csv-schema-inference
3 | version = 0.0.9
4 | author = Ramses Alexander Coraspe Valdez
5 | author_email = contacto@wittline.com
6 | description = A tool to automatically infer column data types in .csv files
7 | long_description = file: README.md
8 | long_description_content_type = text/markdown
9 | url = https://github.com/Wittline/csv-schema-inference
10 | classifiers =
11 | Programming Language :: Python :: 3
12 | License :: OSI Approved :: MIT License
13 | Operating System :: OS Independent
14 |
15 | [options]
16 | packages = find:
17 | python_requires = >=3.7
18 | include_package_data = False
--------------------------------------------------------------------------------