├── .gitignore
├── Assignment
│   ├── README.md
│   └── titles.ipynb
├── Data Visualization
│   ├── Matplotlib
│   │   ├── DV2021EN-0101-Basic Plotting.ipynb
│   │   ├── DV2021EN-0102-Basic Color and Line Styles.ipynb
│   │   ├── DV2021EN-0103-Simple Line Plots with Line Styles.ipynb
│   │   ├── DV2021EN-0104-Simple Scatter Plots.ipynb
│   │   ├── DV2021EN-0105-Errorbars.ipynb
│   │   ├── DV2021EN-0106-Density_and_Contour_Plots.ipynb
│   │   ├── DV2021EN-0107-Histograms_and_Binnings.ipynb
│   │   ├── DV2021EN-0108-Customizing_Legends.ipynb
│   │   ├── Hough-Circle-Transform-Opencv.py
│   │   ├── MP0101- Matplotlib Complite .ipynb
│   │   ├── README.md
│   │   ├── Scatter plot.ipynb
│   │   ├── barh.py
│   │   ├── img
│   │   │   └── Text.png
│   │   ├── pieplot.py
│   │   ├── plot_camera_numpy.py
│   │   ├── polar_scatter.ipynb
│   │   ├── sample_plots.ipynb
│   │   ├── water Mark.ipynb
│   │   └── watermark_image.py
│   ├── README.md
│   └── SeaBorn
│       └── README.md
├── Data
│   ├── Boston
│   │   ├── README.md
│   │   ├── housing.data.txt
│   │   └── housing.names.txt
│   ├── README.md
│   ├── iris
│   │   ├── README.md
│   │   └── iris.csv
│   └── titles.csv
├── LICENSE
├── Life Cycle Process of Data Science In Real World project
│   ├── DSPD0101ENT-Business Understanding(Problem)-to-Analytic Approach.ipynb
│   ├── DSPD0101ENT-Business Understanding.ipynb
│   ├── DSPD0102ENT-Understanding-to-Preparation.ipynb
│   ├── DSPD0103ENT-Requirements-to-Collection-py.ipynb
│   ├── DSPD0104ENT-Modeling-to-Evaluation-py.ipynb
│   ├── IBMOpenSource_FoundationalMethologyforDataScience.PDF
│   └── README.md
├── Modeling
│   ├── README.md
│   ├── Semi Supervised Learning
│   │   └── README.md
│   ├── Supervised Learning
│   │   └── README.md
│   └── Unsupervised Learning
│       └── README.md
├── Numpy
│   ├── README.md
│   └── img
│       ├── README.md
│       └── Where we use numpy.png
├── Pandas
│   ├── DSPD0100ENT-Business Understanding.ipynb
│   ├── DSPD0101EN-Introduction-to-Pandas.ipynb
│   ├── DSPD0102EN-Pandas Series,Data Frame and Index.ipynb
│   ├── DSPD0103EN-Data Collection & Data Source.ipynb
│   ├── DSPD0104EN-Data_Loading.ipynb
│   ├── DSPD0105EN-Missing-Values.ipynb
│   ├── DSPD0106EN - Loading data from SQL databases.ipynb
│   ├── DSPD0107EN-Operations-in-Pandas.ipynb
│   ├── DSPD0108EN-Working-With-Strings.ipynb
│   ├── DSPD0109EN-Hierarchical-Indexing.ipynb
│   ├── DSPD0110EN-Merge-and-Join.ipynb
│   ├── DSPD0111-Handling Missing Values with Numpy and Pandas.ipynb
│   ├── Predicting_Credit_Risk_Model_Pipeline.ipynb
│   └── README.md
├── README.md
└── Statistics
    ├── DS0101-summary-stats.ipynb
    ├── DS0102-Exploratory Data Analysis(EDA).ipynb
    ├── DS0103-Probability Density Functions(pdf).ipynb
    ├── DS0104-Probability Mass Functions.ipynb
    ├── DS0105-Hypothesis-Testing.ipynb
    ├── DS0106-Bootstrapping.ipynb
    ├── DS0107-Covariance and Correlation.ipynb
    ├── DS0108-Linear-Reqression-LeastSquares.ipynb
    ├── DS0109-Data Distributions.ipynb
    ├── DS0110-Probability distributions.ipynb
    ├── Data
    │   └── README.md
    ├── Practice
    │   ├── 01 - Day 0 - Mean, Median, and Mode.py
    │   ├── 02 - Day 0 - Weighted Mean.py
    │   ├── 03 - Day 1 - Quartiles.py
    │   ├── 04 - Day 1 - Interquartile Range.py
    │   ├── 05 - Day 1 - Standard Deviation.py
    │   ├── 06 - Day 2 - Basic Probability.py
    │   ├── 07 - Day 2 - More Dice.py
    │   ├── 08 - Day 2 - Compound Event Probability.py
    │   ├── 09 - Day 3 - Conditional Probability.py
    │   ├── 10 - Day 3 - Cards of the Same Suit.txt
    │   ├── 11 - Day 3 - Drawing Marbles.py
    │   ├── 12 - Day 4 - Binomial Distribution I.py
    │   ├── 13 - Day 4 - Binomial Distribution II.py
    │   ├── 14 - Day 4 - Geometric Distribution I.py
    │   ├── 15 - Day 4 - Geometric Distribution II.py
    │   ├── 16 - Day 5 - Poisson Distribution I.py
    │   ├── 17 - Day 5 - Poisson Distribution II.py
    │   ├── 18 - Day 5 - Normal Distribution I.py
    │   ├── 19 - Day 5 - Normal Distribution II.py
    │   ├── 20 - Day 6 - The Central Limit Theorem I.py
    │   ├── 21 - Day 6 - The Central Limit Theorem II.py
    │   ├── 22 - Day 6 - The Central Limit Theorem III.py
    │   ├── 23 - Day 7 - Pearson Correlation Coefficient I.py
    │   ├── 24 - Day 7 - Spearman's Rank Correlation.py
    │   ├── 25 - Day 8 - Least Sqaure Regression Line.py
    │   ├── 26 - Day 8 - Pearson Correlation Coefficient II.txt
    │   ├── 27 - Day 9 - Multiple Linear Regression.py
    │   └── Readme.md
    └── README.md
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
--------------------------------------------------------------------------------
/Assignment/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Assignment/titles.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "source": [
6 | "%matplotlib inline\n",
7 | "import pandas as pd"
8 | ],
9 | "outputs": [],
10 | "execution_count": 1,
11 | "metadata": {
12 | "execution": {
13 | "iopub.status.busy": "2020-08-02T15:59:48.826Z",
14 | "iopub.execute_input": "2020-08-02T15:59:48.863Z",
15 | "shell.execute_reply": "2020-08-02T15:59:54.394Z",
16 | "iopub.status.idle": "2020-08-02T15:59:54.440Z"
17 | }
18 | }
19 | },
20 | {
21 | "cell_type": "code",
22 | "source": [
23 | "titles = pd.read_csv('titles.csv')\n",
24 | "titles.head()"
25 | ],
26 | "outputs": [
27 | {
28 | "output_type": "execute_result",
29 | "execution_count": 3,
30 | "data": {
31 | "text/plain": " title year\n0 Tasveer Mere Sanam 1996\n1 Only You 1994\n2 El pueblo del terror 1970\n3 Machine 2007\n4 MARy 2008",
32 | "text/html": "
\n\n
\n \n \n | \n title | \n year | \n
\n \n \n \n 0 | \n Tasveer Mere Sanam | \n 1996 | \n
\n \n 1 | \n Only You | \n 1994 | \n
\n \n 2 | \n El pueblo del terror | \n 1970 | \n
\n \n 3 | \n Machine | \n 2007 | \n
\n \n 4 | \n MARy | \n 2008 | \n
\n \n
\n
"
33 | },
34 | "metadata": {}
35 | }
36 | ],
37 | "execution_count": 3,
38 | "metadata": {
39 | "execution": {
40 | "iopub.status.busy": "2020-08-02T16:00:03.676Z",
41 | "iopub.execute_input": "2020-08-02T16:00:03.696Z",
42 | "iopub.status.idle": "2020-08-02T16:00:03.813Z",
43 | "shell.execute_reply": "2020-08-02T16:00:03.852Z"
44 | }
45 | }
46 | },
47 | {
48 | "cell_type": "code",
49 | "source": [
50 | "\n",
51 | "titles.tail()"
52 | ],
53 | "outputs": [
54 | {
55 | "output_type": "execute_result",
56 | "execution_count": 6,
57 | "data": {
58 | "text/plain": " title year\n244909 Black Butterfly in a Colorful World 2018\n244910 Hua fei hua wu chun man cheng 1980\n244911 Nippon dabi katsukyu 1970\n244912 Under Siege 2: Dark Territory 1995\n244913 She Must Be Seeing Things 1987",
59 | "text/html": "\n\n
\n \n \n | \n title | \n year | \n
\n \n \n \n 244909 | \n Black Butterfly in a Colorful World | \n 2018 | \n
\n \n 244910 | \n Hua fei hua wu chun man cheng | \n 1980 | \n
\n \n 244911 | \n Nippon dabi katsukyu | \n 1970 | \n
\n \n 244912 | \n Under Siege 2: Dark Territory | \n 1995 | \n
\n \n 244913 | \n She Must Be Seeing Things | \n 1987 | \n
\n \n
\n
"
60 | },
61 | "metadata": {}
62 | }
63 | ],
64 | "execution_count": 6,
65 | "metadata": {
66 | "execution": {
67 | "iopub.status.busy": "2020-08-02T16:08:53.347Z",
68 | "iopub.execute_input": "2020-08-02T16:08:53.373Z",
69 | "iopub.status.idle": "2020-08-02T16:08:53.417Z",
70 | "shell.execute_reply": "2020-08-02T16:08:53.434Z"
71 | }
72 | }
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "source": [
77 | "### How many movies are listed in the titles dataframe?"
78 | ],
79 | "metadata": {
80 | "collapsed": true
81 | }
82 | },
83 | {
84 | "cell_type": "code",
85 | "source": [],
86 | "outputs": [
87 | {
88 | "output_type": "stream",
89 | "name": "stdout",
90 | "text": [
91 | "Index(['title', 'year'], dtype='object')\n"
92 | ]
93 | }
94 | ],
95 | "execution_count": 8,
96 | "metadata": {
97 | "execution": {
98 | "iopub.status.busy": "2020-08-02T16:11:11.003Z",
99 | "iopub.execute_input": "2020-08-02T16:11:11.026Z",
100 | "iopub.status.idle": "2020-08-02T16:11:11.114Z",
101 | "shell.execute_reply": "2020-08-02T16:11:11.135Z"
102 | }
103 | }
104 | },
105 | {
106 | "cell_type": "code",
107 | "source": [],
108 | "outputs": [],
109 | "execution_count": null,
110 | "metadata": {
111 | "collapsed": true
112 | }
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "source": [
117 | "### What are the earliest two films listed in the titles dataframe?"
118 | ],
119 | "metadata": {
120 | "collapsed": true
121 | }
122 | },
123 | {
124 | "cell_type": "code",
125 | "source": [],
126 | "outputs": [],
127 | "execution_count": null,
128 | "metadata": {}
129 | },
130 | {
131 | "cell_type": "code",
132 | "source": [],
133 | "outputs": [],
134 | "execution_count": null,
135 | "metadata": {
136 | "collapsed": true
137 | }
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "source": [
142 | "### How many movies have the title \"Hamlet\"?"
143 | ],
144 | "metadata": {
145 | "collapsed": true
146 | }
147 | },
148 | {
149 | "cell_type": "code",
150 | "source": [
151 | "\n"
152 | ],
153 | "outputs": [],
154 | "execution_count": null,
155 | "metadata": {}
156 | },
157 | {
158 | "cell_type": "code",
159 | "source": [],
160 | "outputs": [],
161 | "execution_count": null,
162 | "metadata": {}
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "source": [
167 | "### How many movies are titled \"North by Northwest\"?"
168 | ],
169 | "metadata": {
170 | "collapsed": true
171 | }
172 | },
173 | {
174 | "cell_type": "code",
175 | "source": [
176 | "\n"
177 | ],
178 | "outputs": [],
179 | "execution_count": null,
180 | "metadata": {}
181 | },
182 | {
183 | "cell_type": "code",
184 | "source": [],
185 | "outputs": [],
186 | "execution_count": null,
187 | "metadata": {
188 | "collapsed": true
189 | }
190 | },
191 | {
192 | "cell_type": "markdown",
193 | "source": [
194 | "### When was the first movie titled \"Hamlet\" made?"
195 | ],
196 | "metadata": {
197 | "collapsed": true
198 | }
199 | },
200 | {
201 | "cell_type": "code",
202 | "source": [],
203 | "outputs": [],
204 | "execution_count": 19,
205 | "metadata": {}
206 | },
207 | {
208 | "cell_type": "markdown",
209 | "source": [
210 | "### List all of the \"Treasure Island\" movies from earliest to most recent."
211 | ],
212 | "metadata": {
213 | "collapsed": true
214 | }
215 | },
216 | {
217 | "cell_type": "code",
218 | "source": [],
219 | "outputs": [],
220 | "execution_count": null,
221 | "metadata": {
222 | "collapsed": true
223 | }
224 | },
225 | {
226 | "cell_type": "markdown",
227 | "source": [
228 | "### How many movies were made in the year 1950?"
229 | ],
230 | "metadata": {
231 | "collapsed": true
232 | }
233 | },
234 | {
235 | "cell_type": "code",
236 | "source": [],
237 | "outputs": [],
238 | "execution_count": null,
239 | "metadata": {
240 | "collapsed": true
241 | }
242 | },
243 | {
244 | "cell_type": "markdown",
245 | "source": [
246 | "### How many movies were made in the year 1960?"
247 | ],
248 | "metadata": {
249 | "collapsed": true
250 | }
251 | },
252 | {
253 | "cell_type": "code",
254 | "source": [],
255 | "outputs": [],
256 | "execution_count": null,
257 | "metadata": {
258 | "collapsed": true
259 | }
260 | },
261 | {
262 | "cell_type": "markdown",
263 | "source": [
264 | "### How many movies were made from 1950 through 1959?"
265 | ],
266 | "metadata": {
267 | "collapsed": true
268 | }
269 | },
270 | {
271 | "cell_type": "markdown",
272 | "source": [],
273 | "metadata": {}
274 | },
275 | {
276 | "cell_type": "code",
277 | "source": [],
278 | "outputs": [],
279 | "execution_count": null,
280 | "metadata": {
281 | "collapsed": true
282 | }
283 | },
284 | {
285 | "cell_type": "markdown",
286 | "source": [
287 | "### In what years has a movie titled \"Batman\" been released?"
288 | ],
289 | "metadata": {
290 | "collapsed": true
291 | }
292 | },
293 | {
294 | "cell_type": "code",
295 | "source": [],
296 | "outputs": [],
297 | "execution_count": null,
298 | "metadata": {
299 | "collapsed": true
300 | }
301 | },
302 | {
303 | "cell_type": "markdown",
304 | "source": [
305 | "### How many roles were there in the movie \"Inception\"?"
306 | ],
307 | "metadata": {
308 | "collapsed": true
309 | }
310 | },
311 | {
312 | "cell_type": "code",
313 | "source": [],
314 | "outputs": [],
315 | "execution_count": null,
316 | "metadata": {
317 | "collapsed": true
318 | }
319 | },
320 | {
321 | "cell_type": "code",
322 | "source": [],
323 | "outputs": [],
324 | "execution_count": null,
325 | "metadata": {
326 | "collapsed": true
327 | }
328 | },
329 | {
330 | "cell_type": "markdown",
331 | "source": [
332 | "### How many roles in the movie \"Inception\" are NOT ranked by an \"n\" value?"
333 | ],
334 | "metadata": {
335 | "collapsed": true
336 | }
337 | },
338 | {
339 | "cell_type": "code",
340 | "source": [],
341 | "outputs": [],
342 | "execution_count": null,
343 | "metadata": {
344 | "collapsed": true
345 | }
346 | },
347 | {
348 | "cell_type": "code",
349 | "source": [],
350 | "outputs": [],
351 | "execution_count": null,
352 | "metadata": {
353 | "collapsed": true
354 | }
355 | },
356 | {
357 | "cell_type": "markdown",
358 | "source": [
359 | "### But how many roles in the movie \"Inception\" did receive an \"n\" value?"
360 | ],
361 | "metadata": {
362 | "collapsed": true
363 | }
364 | },
365 | {
366 | "cell_type": "code",
367 | "source": [],
368 | "outputs": [],
369 | "execution_count": null,
370 | "metadata": {
371 | "collapsed": true
372 | }
373 | },
374 | {
375 | "cell_type": "code",
376 | "source": [],
377 | "outputs": [],
378 | "execution_count": null,
379 | "metadata": {
380 | "collapsed": true
381 | }
382 | },
383 | {
384 | "cell_type": "markdown",
385 | "source": [
386 | "### Display the cast of \"North by Northwest\" in their correct \"n\"-value order, ignoring roles that did not earn a numeric \"n\" value."
387 | ],
388 | "metadata": {
389 | "collapsed": true
390 | }
391 | },
392 | {
393 | "cell_type": "code",
394 | "source": [],
395 | "outputs": [],
396 | "execution_count": null,
397 | "metadata": {
398 | "collapsed": true
399 | }
400 | },
401 | {
402 | "cell_type": "code",
403 | "source": [],
404 | "outputs": [],
405 | "execution_count": null,
406 | "metadata": {
407 | "collapsed": true
408 | }
409 | },
410 | {
411 | "cell_type": "markdown",
412 | "source": [
413 | "### Display the entire cast, in \"n\"-order, of the 1972 film \"Sleuth\"."
414 | ],
415 | "metadata": {
416 | "collapsed": true
417 | }
418 | },
419 | {
420 | "cell_type": "code",
421 | "source": [],
422 | "outputs": [],
423 | "execution_count": null,
424 | "metadata": {
425 | "collapsed": true
426 | }
427 | },
428 | {
429 | "cell_type": "code",
430 | "source": [],
431 | "outputs": [],
432 | "execution_count": null,
433 | "metadata": {
434 | "collapsed": true
435 | }
436 | },
437 | {
438 | "cell_type": "markdown",
439 | "source": [
440 | "### Now display the entire cast, in \"n\"-order, of the 2007 version of \"Sleuth\"."
441 | ],
442 | "metadata": {
443 | "collapsed": true
444 | }
445 | },
446 | {
447 | "cell_type": "code",
448 | "source": [],
449 | "outputs": [],
450 | "execution_count": null,
451 | "metadata": {
452 | "collapsed": true
453 | }
454 | },
455 | {
456 | "cell_type": "code",
457 | "source": [],
458 | "outputs": [],
459 | "execution_count": null,
460 | "metadata": {
461 | "collapsed": true
462 | }
463 | },
464 | {
465 | "cell_type": "markdown",
466 | "source": [
467 | "### How many roles were credited in the silent 1921 version of Hamlet?"
468 | ],
469 | "metadata": {
470 | "collapsed": true
471 | }
472 | },
473 | {
474 | "cell_type": "code",
475 | "source": [],
476 | "outputs": [],
477 | "execution_count": null,
478 | "metadata": {
479 | "collapsed": true
480 | }
481 | },
482 | {
483 | "cell_type": "code",
484 | "source": [],
485 | "outputs": [],
486 | "execution_count": null,
487 | "metadata": {
488 | "collapsed": true
489 | }
490 | },
491 | {
492 | "cell_type": "markdown",
493 | "source": [
494 | "### How many roles were credited in Branagh’s 1996 Hamlet?"
495 | ],
496 | "metadata": {
497 | "collapsed": true
498 | }
499 | },
500 | {
501 | "cell_type": "code",
502 | "source": [],
503 | "outputs": [],
504 | "execution_count": null,
505 | "metadata": {
506 | "collapsed": true
507 | }
508 | },
509 | {
510 | "cell_type": "code",
511 | "source": [],
512 | "outputs": [],
513 | "execution_count": null,
514 | "metadata": {
515 | "collapsed": true
516 | }
517 | },
518 | {
519 | "cell_type": "markdown",
520 | "source": [
521 | "### How many \"Hamlet\" roles have been listed in all film credits through history?"
522 | ],
523 | "metadata": {
524 | "collapsed": true
525 | }
526 | },
527 | {
528 | "cell_type": "code",
529 | "source": [],
530 | "outputs": [],
531 | "execution_count": null,
532 | "metadata": {
533 | "collapsed": true
534 | }
535 | },
536 | {
537 | "cell_type": "code",
538 | "source": [],
539 | "outputs": [],
540 | "execution_count": null,
541 | "metadata": {
542 | "collapsed": true
543 | }
544 | },
545 | {
546 | "cell_type": "markdown",
547 | "source": [
548 | "### How many people have played an \"Ophelia\"?"
549 | ],
550 | "metadata": {
551 | "collapsed": true
552 | }
553 | },
554 | {
555 | "cell_type": "code",
556 | "source": [],
557 | "outputs": [],
558 | "execution_count": null,
559 | "metadata": {
560 | "collapsed": true
561 | }
562 | },
563 | {
564 | "cell_type": "code",
565 | "source": [],
566 | "outputs": [],
567 | "execution_count": null,
568 | "metadata": {
569 | "collapsed": true
570 | }
571 | },
572 | {
573 | "cell_type": "markdown",
574 | "source": [
575 | "### How many people have played a role called \"The Dude\"?"
576 | ],
577 | "metadata": {
578 | "collapsed": true
579 | }
580 | },
581 | {
582 | "cell_type": "code",
583 | "source": [],
584 | "outputs": [],
585 | "execution_count": null,
586 | "metadata": {
587 | "collapsed": true
588 | }
589 | },
590 | {
591 | "cell_type": "code",
592 | "source": [],
593 | "outputs": [],
594 | "execution_count": null,
595 | "metadata": {
596 | "collapsed": true
597 | }
598 | },
599 | {
600 | "cell_type": "markdown",
601 | "source": [
602 | "### How many people have played a role called \"The Stranger\"?"
603 | ],
604 | "metadata": {
605 | "collapsed": true
606 | }
607 | },
608 | {
609 | "cell_type": "code",
610 | "source": [],
611 | "outputs": [],
612 | "execution_count": null,
613 | "metadata": {
614 | "collapsed": true
615 | }
616 | },
617 | {
618 | "cell_type": "code",
619 | "source": [],
620 | "outputs": [],
621 | "execution_count": null,
622 | "metadata": {
623 | "collapsed": true
624 | }
625 | },
626 | {
627 | "cell_type": "markdown",
628 | "source": [
629 | "### How many roles has Sidney Poitier played throughout his career?"
630 | ],
631 | "metadata": {
632 | "collapsed": true
633 | }
634 | },
635 | {
636 | "cell_type": "code",
637 | "source": [],
638 | "outputs": [],
639 | "execution_count": null,
640 | "metadata": {
641 | "collapsed": true
642 | }
643 | },
644 | {
645 | "cell_type": "code",
646 | "source": [],
647 | "outputs": [],
648 | "execution_count": null,
649 | "metadata": {
650 | "collapsed": true
651 | }
652 | },
653 | {
654 | "cell_type": "markdown",
655 | "source": [
656 | "### How many roles has Judi Dench played?"
657 | ],
658 | "metadata": {
659 | "collapsed": true
660 | }
661 | },
662 | {
663 | "cell_type": "code",
664 | "source": [],
665 | "outputs": [],
666 | "execution_count": null,
667 | "metadata": {
668 | "collapsed": true
669 | }
670 | },
671 | {
672 | "cell_type": "code",
673 | "source": [],
674 | "outputs": [],
675 | "execution_count": null,
676 | "metadata": {
677 | "collapsed": true
678 | }
679 | },
680 | {
681 | "cell_type": "markdown",
682 | "source": [
683 | "### List the supporting roles (having n=2) played by Cary Grant in the 1940s, in order by year."
684 | ],
685 | "metadata": {
686 | "collapsed": true
687 | }
688 | },
689 | {
690 | "cell_type": "code",
691 | "source": [],
692 | "outputs": [],
693 | "execution_count": null,
694 | "metadata": {
695 | "collapsed": true
696 | }
697 | },
698 | {
699 | "cell_type": "code",
700 | "source": [],
701 | "outputs": [],
702 | "execution_count": null,
703 | "metadata": {
704 | "collapsed": true
705 | }
706 | },
707 | {
708 | "cell_type": "markdown",
709 | "source": [
710 | "### List the leading roles that Cary Grant played in the 1940s in order by year."
711 | ],
712 | "metadata": {
713 | "collapsed": true
714 | }
715 | },
716 | {
717 | "cell_type": "code",
718 | "source": [],
719 | "outputs": [],
720 | "execution_count": null,
721 | "metadata": {
722 | "collapsed": true
723 | }
724 | },
725 | {
726 | "cell_type": "code",
727 | "source": [],
728 | "outputs": [],
729 | "execution_count": null,
730 | "metadata": {
731 | "collapsed": true
732 | }
733 | },
734 | {
735 | "cell_type": "markdown",
736 | "source": [
737 | "### How many roles were available for actors in the 1950s?"
738 | ],
739 | "metadata": {
740 | "collapsed": true
741 | }
742 | },
743 | {
744 | "cell_type": "code",
745 | "source": [],
746 | "outputs": [],
747 | "execution_count": null,
748 | "metadata": {
749 | "collapsed": true
750 | }
751 | },
752 | {
753 | "cell_type": "code",
754 | "source": [],
755 | "outputs": [],
756 | "execution_count": null,
757 | "metadata": {
758 | "collapsed": true
759 | }
760 | },
761 | {
762 | "cell_type": "markdown",
763 | "source": [
764 | "### How many roles were available for actresses in the 1950s?"
765 | ],
766 | "metadata": {
767 | "collapsed": true
768 | }
769 | },
770 | {
771 | "cell_type": "code",
772 | "source": [],
773 | "outputs": [],
774 | "execution_count": null,
775 | "metadata": {
776 | "collapsed": true
777 | }
778 | },
779 | {
780 | "cell_type": "code",
781 | "source": [],
782 | "outputs": [],
783 | "execution_count": null,
784 | "metadata": {
785 | "collapsed": true
786 | }
787 | },
788 | {
789 | "cell_type": "markdown",
790 | "source": [
791 | "### How many leading roles (n=1) were available from the beginning of film history through 1980?"
792 | ],
793 | "metadata": {
794 | "collapsed": true
795 | }
796 | },
797 | {
798 | "cell_type": "code",
799 | "source": [],
800 | "outputs": [],
801 | "execution_count": null,
802 | "metadata": {
803 | "collapsed": true
804 | }
805 | },
806 | {
807 | "cell_type": "code",
808 | "source": [],
809 | "outputs": [],
810 | "execution_count": null,
811 | "metadata": {
812 | "collapsed": true
813 | }
814 | },
815 | {
816 | "cell_type": "markdown",
817 | "source": [
818 | "### How many non-leading roles were available through from the beginning of film history through 1980?"
819 | ],
820 | "metadata": {
821 | "collapsed": true
822 | }
823 | },
824 | {
825 | "cell_type": "code",
826 | "source": [],
827 | "outputs": [],
828 | "execution_count": null,
829 | "metadata": {
830 | "collapsed": true
831 | }
832 | },
833 | {
834 | "cell_type": "code",
835 | "source": [],
836 | "outputs": [],
837 | "execution_count": null,
838 | "metadata": {
839 | "collapsed": true
840 | }
841 | },
842 | {
843 | "cell_type": "markdown",
844 | "source": [
845 | "### How many roles through 1980 were minor enough that they did not warrant a numeric \"n\" rank?"
846 | ],
847 | "metadata": {
848 | "collapsed": true
849 | }
850 | },
851 | {
852 | "cell_type": "code",
853 | "source": [],
854 | "outputs": [],
855 | "execution_count": null,
856 | "metadata": {
857 | "collapsed": true
858 | }
859 | },
860 | {
861 | "cell_type": "code",
862 | "source": [],
863 | "outputs": [],
864 | "execution_count": null,
865 | "metadata": {
866 | "collapsed": true
867 | }
868 | }
869 | ],
870 | "metadata": {
871 | "kernelspec": {
872 | "display_name": "Python 3",
873 | "language": "python",
874 | "name": "python3"
875 | },
876 | "language_info": {
877 | "name": "python",
878 | "version": "3.6.8",
879 | "mimetype": "text/x-python",
880 | "codemirror_mode": {
881 | "name": "ipython",
882 | "version": 3
883 | },
884 | "pygments_lexer": "ipython3",
885 | "nbconvert_exporter": "python",
886 | "file_extension": ".py"
887 | },
888 | "nteract": {
889 | "version": "0.24.1"
890 | }
891 | },
892 | "nbformat": 4,
893 | "nbformat_minor": 1
894 | }
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/Hough-Circle-Transform-Opencv.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import numpy as np
3 |
4 | img = cv2.imread('eye.jpg',0)
5 | img = cv2.medianBlur(img,5)
6 | cimg = cv2.cvtColor(img,cv2.COLOR_GRAY2BGR)
7 |
8 | circles = cv2.HoughCircles(img,cv2.HOUGH_GRADIENT,1,20,
9 | param1=50,param2=30,minRadius=0,maxRadius=0)
10 |
11 | circles = np.uint16(np.around(circles))
12 | for i in circles[0,:]:
13 | # draw the outer circle
14 | cv2.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2)
15 | # draw the center of the circle
16 | cv2.circle(cimg,(i[0],i[1]),2,(0,0,255),3)
17 |
18 | cv2.imshow('detected circles',cimg)
19 | cv2.waitKey(0)
20 | cv2.destroyAllWindows()
21 |
22 |
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/README.md:
--------------------------------------------------------------------------------
1 | Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
2 |
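3 | A minimal sketch of the object-oriented API (nothing beyond Matplotlib and NumPy assumed):
4 |
5 | ```python
6 | import matplotlib.pyplot as plt
7 | import numpy as np
8 |
9 | x = np.linspace(0, 2 * np.pi, 100)
10 | fig, ax = plt.subplots()               # explicit Figure and Axes objects
11 | ax.plot(x, np.sin(x), label='sin(x)')  # draw on the Axes, not via pyplot state
12 | ax.set_xlabel('x')
13 | ax.legend()
14 | plt.show()
15 | ```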
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/barh.py:
--------------------------------------------------------------------------------
1 | # Horizontal bar chart
2 |
3 | import matplotlib.pyplot as plt
4 | import numpy as np
5 |
6 | # Fixing random state for reproducibility
7 | np.random.seed(19680801)
8 |
9 | plt.rcdefaults()
10 | fig, ax = plt.subplots()
11 |
12 | # Example data
13 | people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
14 | y_pos = np.arange(len(people))
15 | performance = 3 + 10 * np.random.rand(len(people))
16 | error = np.random.rand(len(people))
17 |
18 | ax.barh(y_pos, performance, xerr=error, align='center',
19 | color='blue', ecolor='black')
20 | ax.set_yticks(y_pos)
21 | ax.set_yticklabels(people)
22 | ax.invert_yaxis()  # labels read top-to-bottom
23 | ax.set_xlabel('Performance')
24 | ax.set_title('How fast do you want to go today?')
25 |
26 | plt.show()
27 | ##import matplotlib.pyplot as plt
28 | ##import numpy as np
29 | ##
30 | ##np.random.seed(19680801)
31 | ##data = np.random.randn(2, 100)
32 | ##
33 | ##fig, axs = plt.subplots(3, 3, figsize=(5, 5))
34 | ##axs[0, 0].hist(data[0])
35 | ##axs[1, 0].scatter(data[0], data[1])
36 | ##axs[0, 1].plot(data[0], data[1])
37 | ##axs[1, 1].hist2d(data[0], data[1])
38 | ##a = np.linspace(1,2,20)
39 | ##axs[0,2].barh(a[0],[2])
40 | ##
41 | ##
42 | ##plt.show()
43 |
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/img/Text.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Data Visualization/Matplotlib/img/Text.png
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/pieplot.py:
--------------------------------------------------------------------------------
1 | import matplotlib.pyplot as plt
2 | # Data Sets
3 | Group = ['EUL','PES','EFA','EDD','ELDR','EPP','UEN','OTHER']
4 | Seats = [39,200,42,15,67,276,27,66]
5 | explode = (0,0,0,0,0,0.1,0,0)
6 | # Equal aspect ratio so the pie is drawn as a circle
7 | plt.axis('equal')
8 | # Give the Title
9 | plt.title("European Parliment election ,2019")
10 | # Colors Code
11 | colors = ['red','orangered','forestgreen','lemonchiffon','yellow','navy','royalblue','lightgrey']
12 | plt.pie(x=Seats,colors=colors,autopct='%1.0f%%',explode=explode)
13 | plt.legend(loc="center right", labels=Group)
14 | plt.show()
15 |
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/plot_camera_numpy.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from skimage import data
3 | import matplotlib.pyplot as plt
4 |
5 | camera = data.camera()
6 | camera[:10] = 0  # black out the top ten rows
7 | mask = camera < 87  # boolean mask of the darkest pixels
8 | camera[mask] = 255  # turn the masked pixels white
9 | inds_x = np.arange(len(camera))
10 | inds_y = (4 * inds_x) % len(camera)
11 | camera[inds_x, inds_y] = 0  # fancy indexing: one black pixel per row
12 |
13 | l_x, l_y = camera.shape[0], camera.shape[1]
14 | X, Y = np.ogrid[:l_x, :l_y]  # open grids of row and column indices
15 | outer_disk_mask = (X - l_x / 2)**2 + (Y - l_y / 2)**2 > (l_x / 2)**2
16 | camera[outer_disk_mask] = 0  # black out everything outside the inscribed disk
17 |
18 | plt.figure(figsize=(4, 4))
19 | plt.imshow(camera, cmap='gray', interpolation='nearest')
20 | plt.axis('off')
21 | plt.show()
22 |
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/sample_plots.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [],
10 | "source": [
11 | "%matplotlib inline"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "\n# Sample plots in Matplotlib\n\n\nHere you'll find a host of example plots with the code that\ngenerated them.\n\n\nLine Plot\n=========\n\nHere's how to create a line plot with text labels using\n:func:`~matplotlib.pyplot.plot`.\n\n.. figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_simple_plot_001.png\n :target: ../../gallery/lines_bars_and_markers/simple_plot.html\n :align: center\n :scale: 50\n\n Simple Plot\n\n\nMultiple subplots in one figure\n===============================\n\nMultiple axes (i.e. subplots) are created with the\n:func:`~matplotlib.pyplot.subplot` function:\n\n.. figure:: ../../gallery/subplots_axes_and_figures/images/sphx_glr_subplot_001.png\n :target: ../../gallery/subplots_axes_and_figures/subplot.html\n :align: center\n :scale: 50\n\n Subplot\n\n\nImages\n======\n\nMatplotlib can display images (assuming equally spaced\nhorizontal dimensions) using the :func:`~matplotlib.pyplot.imshow` function.\n\n.. figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_image_demo_003.png\n :target: ../../gallery/images_contours_and_fields/image_demo.html\n :align: center\n :scale: 50\n\n Example of using :func:`~matplotlib.pyplot.imshow` to display a CT scan\n\n\n\nContouring and pseudocolor\n==========================\n\nThe :func:`~matplotlib.pyplot.pcolormesh` function can make a colored\nrepresentation of a two-dimensional array, even if the horizontal dimensions\nare unevenly spaced. The\n:func:`~matplotlib.pyplot.contour` function is another way to represent\nthe same data:\n\n.. figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_pcolormesh_levels_001.png\n :target: ../../gallery/images_contours_and_fields/pcolormesh_levels.html\n :align: center\n :scale: 50\n\n Example comparing :func:`~matplotlib.pyplot.pcolormesh` and :func:`~matplotlib.pyplot.contour` for plotting two-dimensional data\n\n\nHistograms\n==========\n\nThe :func:`~matplotlib.pyplot.hist` function automatically generates\nhistograms and returns the bin counts or probabilities:\n\n.. figure:: ../../gallery/statistics/images/sphx_glr_histogram_features_001.png\n :target: ../../gallery/statistics/histogram_features.html\n :align: center\n :scale: 50\n\n Histogram Features\n\n\n\nPaths\n=====\n\nYou can add arbitrary paths in Matplotlib using the\n:mod:`matplotlib.path` module:\n\n.. figure:: ../../gallery/shapes_and_collections/images/sphx_glr_path_patch_001.png\n :target: ../../gallery/shapes_and_collections/path_patch.html\n :align: center\n :scale: 50\n\n Path Patch\n\n\nThree-dimensional plotting\n==========================\n\nThe mplot3d toolkit (see `toolkit_mplot3d-tutorial` and\n`mplot3d-examples-index`) has support for simple 3d graphs\nincluding surface, wireframe, scatter, and bar charts.\n\n.. figure:: ../../gallery/mplot3d/images/sphx_glr_surface3d_001.png\n :target: ../../gallery/mplot3d/surface3d.html\n :align: center\n :scale: 50\n\n Surface3d\n\nThanks to John Porter, Jonathon Taylor, Reinier Heeres, and Ben Root for\nthe `mplot3d` toolkit. This toolkit is included with all standard Matplotlib\ninstalls.\n\n\n\nStreamplot\n==========\n\nThe :meth:`~matplotlib.pyplot.streamplot` function plots the streamlines of\na vector field. In addition to simply plotting the streamlines, it allows you\nto map the colors and/or line widths of streamlines to a separate parameter,\nsuch as the speed or local intensity of the vector field.\n\n.. 
figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_plot_streamplot_001.png\n :target: ../../gallery/images_contours_and_fields/plot_streamplot.html\n :align: center\n :scale: 50\n\n Streamplot with various plotting options.\n\nThis feature complements the :meth:`~matplotlib.pyplot.quiver` function for\nplotting vector fields. Thanks to Tom Flannaghan and Tony Yu for adding the\nstreamplot function.\n\n\nEllipses\n========\n\nIn support of the `Phoenix `_\nmission to Mars (which used Matplotlib to display ground tracking of\nspacecraft), Michael Droettboom built on work by Charlie Moad to provide\nan extremely accurate 8-spline approximation to elliptical arcs (see\n:class:`~matplotlib.patches.Arc`), which are insensitive to zoom level.\n\n.. figure:: ../../gallery/shapes_and_collections/images/sphx_glr_ellipse_demo_001.png\n :target: ../../gallery/shapes_and_collections/ellipse_demo.html\n :align: center\n :scale: 50\n\n Ellipse Demo\n\n\nBar charts\n==========\n\nUse the :func:`~matplotlib.pyplot.bar` function to make bar charts, which\nincludes customizations such as error bars:\n\n.. figure:: ../../gallery/statistics/images/sphx_glr_barchart_demo_001.png\n :target: ../../gallery/statistics/barchart_demo.html\n :align: center\n :scale: 50\n\n Barchart Demo\n\nYou can also create stacked bars\n(`bar_stacked.py <../../gallery/lines_bars_and_markers/bar_stacked.html>`_),\nor horizontal bar charts\n(`barh.py <../../gallery/lines_bars_and_markers/barh.html>`_).\n\n\n\nPie charts\n==========\n\nThe :func:`~matplotlib.pyplot.pie` function allows you to create pie\ncharts. Optional features include auto-labeling the percentage of area,\nexploding one or more wedges from the center of the pie, and a shadow effect.\nTake a close look at the attached code, which generates this figure in just\na few lines of code.\n\n.. figure:: ../../gallery/pie_and_polar_charts/images/sphx_glr_pie_features_001.png\n :target: ../../gallery/pie_and_polar_charts/pie_features.html\n :align: center\n :scale: 50\n\n Pie Features\n\n\nTables\n======\n\nThe :func:`~matplotlib.pyplot.table` function adds a text table\nto an axes.\n\n.. figure:: ../../gallery/misc/images/sphx_glr_table_demo_001.png\n :target: ../../gallery/misc/table_demo.html\n :align: center\n :scale: 50\n\n Table Demo\n\n\n\n\nScatter plots\n=============\n\nThe :func:`~matplotlib.pyplot.scatter` function makes a scatter plot\nwith (optional) size and color arguments. This example plots changes\nin Google's stock price, with marker sizes reflecting the\ntrading volume and colors varying with time. Here, the\nalpha attribute is used to make semitransparent circle markers.\n\n.. figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_scatter_demo2_001.png\n :target: ../../gallery/lines_bars_and_markers/scatter_demo2.html\n :align: center\n :scale: 50\n\n Scatter Demo2\n\n\n\nGUI widgets\n===========\n\nMatplotlib has basic GUI widgets that are independent of the graphical\nuser interface you are using, allowing you to write cross GUI figures\nand widgets. See :mod:`matplotlib.widgets` and the\n`widget examples <../../gallery/index.html>`_.\n\n.. figure:: ../../gallery/widgets/images/sphx_glr_slider_demo_001.png\n :target: ../../gallery/widgets/slider_demo.html\n :align: center\n :scale: 50\n\n Slider and radio-button GUI.\n\n\n\nFilled curves\n=============\n\nThe :func:`~matplotlib.pyplot.fill` function lets you\nplot filled curves and polygons:\n\n.. 
figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_fill_001.png\n :target: ../../gallery/lines_bars_and_markers/fill.html\n :align: center\n :scale: 50\n\n Fill\n\nThanks to Andrew Straw for adding this function.\n\n\nDate handling\n=============\n\nYou can plot timeseries data with major and minor ticks and custom\ntick formatters for both.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_date_001.png\n :target: ../../gallery/text_labels_and_annotations/date.html\n :align: center\n :scale: 50\n\n Date\n\nSee :mod:`matplotlib.ticker` and :mod:`matplotlib.dates` for details and usage.\n\n\n\nLog plots\n=========\n\nThe :func:`~matplotlib.pyplot.semilogx`,\n:func:`~matplotlib.pyplot.semilogy` and\n:func:`~matplotlib.pyplot.loglog` functions simplify the creation of\nlogarithmic plots.\n\n.. figure:: ../../gallery/scales/images/sphx_glr_log_demo_001.png\n :target: ../../gallery/scales/log_demo.html\n :align: center\n :scale: 50\n\n Log Demo\n\nThanks to Andrew Straw, Darren Dale and Gregory Lielens for contributions\nlog-scaling infrastructure.\n\n\nPolar plots\n===========\n\nThe :func:`~matplotlib.pyplot.polar` function generates polar plots.\n\n.. figure:: ../../gallery/pie_and_polar_charts/images/sphx_glr_polar_demo_001.png\n :target: ../../gallery/pie_and_polar_charts/polar_demo.html\n :align: center\n :scale: 50\n\n Polar Demo\n\n\n\nLegends\n=======\n\nThe :func:`~matplotlib.pyplot.legend` function automatically\ngenerates figure legends, with MATLAB-compatible legend-placement\nfunctions.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_legend_001.png\n :target: ../../gallery/text_labels_and_annotations/legend.html\n :align: center\n :scale: 50\n\n Legend\n\nThanks to Charles Twardy for input on the legend function.\n\n\nTeX-notation for text objects\n=============================\n\nBelow is a sampling of the many TeX expressions now supported by Matplotlib's\ninternal mathtext engine. The mathtext module provides TeX style mathematical\nexpressions using `FreeType `_\nand the DejaVu, BaKoMa computer modern, or `STIX `_\nfonts. See the :mod:`matplotlib.mathtext` module for additional details.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_mathtext_examples_001.png\n :target: ../../gallery/text_labels_and_annotations/mathtext_examples.html\n :align: center\n :scale: 50\n\n Mathtext Examples\n\nMatplotlib's mathtext infrastructure is an independent implementation and\ndoes not require TeX or any external packages installed on your computer. See\nthe tutorial at :doc:`/tutorials/text/mathtext`.\n\n\n\nNative TeX rendering\n====================\n\nAlthough Matplotlib's internal math rendering engine is quite\npowerful, sometimes you need TeX. Matplotlib supports external TeX\nrendering of strings with the *usetex* option.\n\n.. 
figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_tex_demo_001.png\n :target: ../../gallery/text_labels_and_annotations/tex_demo.html\n :align: center\n :scale: 50\n\n Tex Demo\n\n\nEEG GUI\n=======\n\nYou can embed Matplotlib into pygtk, wx, Tk, or Qt applications.\nHere is a screenshot of an EEG viewer called `pbrain\n`__.\n\n\n\n\nThe lower axes uses :func:`~matplotlib.pyplot.specgram`\nto plot the spectrogram of one of the EEG channels.\n\nFor examples of how to embed Matplotlib in different toolkits, see:\n\n * :doc:`/gallery/user_interfaces/embedding_in_gtk3_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_wx2_sgskip`\n * :doc:`/gallery/user_interfaces/mpl_with_glade3_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_qt_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_tk_sgskip`\n\nXKCD-style sketch plots\n=======================\n\nJust for fun, Matplotlib supports plotting in the style of `xkcd\n`.\n\n.. figure:: ../../gallery/showcase/images/sphx_glr_xkcd_001.png\n :target: ../../gallery/showcase/xkcd.html\n :align: center\n :scale: 50\n\n xkcd\n\n"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "Subplot example\n===============\n\nMany plot types can be combined in one figure to create\npowerful and flexible representations of data.\n\n\n"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": null,
31 | "metadata": {
32 | "collapsed": false
33 | },
34 | "outputs": [],
35 | "source": [
36 | "import matplotlib.pyplot as plt\nimport numpy as np\n\nnp.random.seed(19680801)\ndata = np.random.randn(2, 100)\n\nfig, axs = plt.subplots(2, 2, figsize=(5, 5))\naxs[0, 0].hist(data[0])\naxs[1, 0].scatter(data[0], data[1])\naxs[0, 1].plot(data[0], data[1])\naxs[1, 1].hist2d(data[0], data[1])\n\nplt.show()"
37 | ]
38 | }
39 | ],
40 | "metadata": {
41 | "kernelspec": {
42 | "display_name": "Python 3",
43 | "language": "python",
44 | "name": "python3"
45 | },
46 | "language_info": {
47 | "codemirror_mode": {
48 | "name": "ipython",
49 | "version": 3
50 | },
51 | "file_extension": ".py",
52 | "mimetype": "text/x-python",
53 | "name": "python",
54 | "nbconvert_exporter": "python",
55 | "pygments_lexer": "ipython3",
56 | "version": "3.7.3"
57 | }
58 | },
59 | "nbformat": 4,
60 | "nbformat_minor": 0
61 | }
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/water Mark.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 4,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stdout",
10 | "output_type": "stream",
11 | "text": [
12 | "loading [[[1. 1. 1.]\n",
13 | " [1. 1. 1.]\n",
14 | " [1. 1. 1.]\n",
15 | " ...\n",
16 | " [1. 1. 1.]\n",
17 | " [1. 1. 1.]\n",
18 | " [1. 1. 1.]]\n",
19 | "\n",
20 | " [[1. 1. 1.]\n",
21 | " [1. 1. 1.]\n",
22 | " [1. 1. 1.]\n",
23 | " ...\n",
24 | " [1. 1. 1.]\n",
25 | " [1. 1. 1.]\n",
26 | " [1. 1. 1.]]\n",
27 | "\n",
28 | " [[1. 1. 1.]\n",
29 | " [1. 1. 1.]\n",
30 | " [1. 1. 1.]\n",
31 | " ...\n",
32 | " [1. 1. 1.]\n",
33 | " [1. 1. 1.]\n",
34 | " [1. 1. 1.]]\n",
35 | "\n",
36 | " ...\n",
37 | "\n",
38 | " [[1. 1. 1.]\n",
39 | " [1. 1. 1.]\n",
40 | " [1. 1. 1.]\n",
41 | " ...\n",
42 | " [1. 1. 1.]\n",
43 | " [1. 1. 1.]\n",
44 | " [1. 1. 1.]]\n",
45 | "\n",
46 | " [[1. 1. 1.]\n",
47 | " [1. 1. 1.]\n",
48 | " [1. 1. 1.]\n",
49 | " ...\n",
50 | " [1. 1. 1.]\n",
51 | " [1. 1. 1.]\n",
52 | " [1. 1. 1.]]\n",
53 | "\n",
54 | " [[1. 1. 1.]\n",
55 | " [1. 1. 1.]\n",
56 | " [1. 1. 1.]\n",
57 | " ...\n",
58 | " [1. 1. 1.]\n",
59 | " [1. 1. 1.]\n",
60 | " [1. 1. 1.]]]\n"
61 | ]
62 | },
63 | {
64 | "ename": "TypeError",
65 | "evalue": "Object does not appear to be a 8-bit string path or a Python file-like object",
66 | "output_type": "error",
67 | "traceback": [
68 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
69 | "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
70 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 11\u001b[0m \u001b[0mdatafile\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mimread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'img/matplot.png'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 12\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'loading %s'\u001b[0m \u001b[1;33m%\u001b[0m \u001b[0mdatafile\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 13\u001b[1;33m \u001b[0mim\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mimage\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mimread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdatafile\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 14\u001b[0m \u001b[0mim\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m:\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m-\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m0.5\u001b[0m \u001b[1;31m# set the alpha channel\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
71 | "\u001b[1;32mc:\\users\\kumar\\appdata\\local\\programs\\python\\python36\\lib\\site-packages\\matplotlib\\image.py\u001b[0m in \u001b[0;36mimread\u001b[1;34m(fname, format)\u001b[0m\n\u001b[0;32m 1375\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mhandler\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfd\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1376\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1377\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mhandler\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1378\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1379\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
72 | "\u001b[1;31mTypeError\u001b[0m: Object does not appear to be a 8-bit string path or a Python file-like object"
73 | ]
74 | }
75 | ],
76 | "source": [
77 | "\n",
78 | "import numpy as np\n",
79 | "import matplotlib.cbook as cbook\n",
80 | "import matplotlib.image as image\n",
81 | "import matplotlib.pyplot as plt\n",
82 | "\n",
83 | "# Fixing random state for reproducibility\n",
84 | "np.random.seed(19680801)\n",
85 | "\n",
86 | "\n",
87 | "datafile = plt.imread('img/matplot.png')\n",
88 | "print('loading %s' % datafile)\n",
89 | "im = image.imread(datafile)\n",
90 | "im[:, :, -1] = 0.5 # set the alpha channel\n",
91 | "\n",
92 | "fig, ax = plt.subplots()\n",
93 | "\n",
94 | "ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange')\n",
95 | "ax.grid()\n",
96 | "fig.figimage(im, 10, 10, zorder=3)\n",
97 | "\n",
98 | "plt.show()\n"
99 | ]
100 | }
101 | ],
102 | "metadata": {
103 | "kernelspec": {
104 | "display_name": "Python 3",
105 | "language": "python",
106 | "name": "python3"
107 | },
108 | "language_info": {
109 | "codemirror_mode": {
110 | "name": "ipython",
111 | "version": 3
112 | },
113 | "file_extension": ".py",
114 | "mimetype": "text/x-python",
115 | "name": "python",
116 | "nbconvert_exporter": "python",
117 | "pygments_lexer": "ipython3",
118 | "version": "3.6.3"
119 | }
120 | },
121 | "nbformat": 4,
122 | "nbformat_minor": 2
123 | }
124 |
--------------------------------------------------------------------------------
/Data Visualization/Matplotlib/watermark_image.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import matplotlib.cbook as cbook
3 | import matplotlib.image as image
4 | import matplotlib.pyplot as plt
5 |
6 | # Fixing random state for reproducibility
7 | np.random.seed(19680801)
8 |
9 |
10 | datafile = cbook.get_sample_data('logo2.png', asfileobj=False)
11 | print('loading %s' % datafile)
12 | im = image.imread(datafile)
13 | im[:, :, -1] = 0.5 # set the alpha channel
14 |
15 | fig, ax = plt.subplots()
16 |
17 | ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange')
18 | ax.grid()
19 | fig.figimage(im, 10, 10, zorder=3)
20 |
21 | plt.show()
22 |
--------------------------------------------------------------------------------
/Data Visualization/README.md:
--------------------------------------------------------------------------------
1 | ### Data visualization
2 | Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers. This communication is achieved through a systematic mapping between graphic marks and data values.
3 |
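4 | A small illustration of that mapping in Matplotlib: each record becomes a marker whose position, size, and color encode separate data values (synthetic data, for illustration only):
5 |
6 | ```python
7 | import matplotlib.pyplot as plt
8 | import numpy as np
9 |
10 | rng = np.random.default_rng(0)
11 | x, y = rng.random(50), rng.random(50)  # position encodes two variables
12 | sizes = 400 * rng.random(50)           # marker area encodes a third
13 | colors = rng.random(50)                # marker color encodes a fourth
14 | plt.scatter(x, y, s=sizes, c=colors, alpha=0.5)
15 | plt.colorbar(label='encoded value')
16 | plt.show()
17 | ```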
--------------------------------------------------------------------------------
/Data Visualization/SeaBorn/README.md:
--------------------------------------------------------------------------------
1 | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
2 |
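3 | A minimal sketch using the iris data shipped in this repository (assumes the script is run from the repository root):
4 |
5 | ```python
6 | import pandas as pd
7 | import seaborn as sns
8 | import matplotlib.pyplot as plt
9 |
10 | iris = pd.read_csv('Data/iris/iris.csv')
11 | # One high-level call handles grouping by species, colors, and the legend:
12 | sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species')
13 | plt.show()
14 | ```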
--------------------------------------------------------------------------------
/Data/Boston/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Data/Boston/housing.names.txt:
--------------------------------------------------------------------------------
1 | 1. Title: Boston Housing Data
2 |
3 | 2. Sources:
4 | (a) Origin: This dataset was taken from the StatLib library which is
5 | maintained at Carnegie Mellon University.
6 | (b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
7 | demand for clean air', J. Environ. Economics & Management,
8 | vol.5, 81-102, 1978.
9 | (c) Date: July 7, 1993
10 |
11 | 3. Past Usage:
12 | - Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley,
13 | 1980. N.B. Various transformations are used in the table on
14 | pages 244-261.
15 | - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning.
16 | In Proceedings on the Tenth International Conference of Machine
17 | Learning, 236-243, University of Massachusetts, Amherst. Morgan
18 | Kaufmann.
19 |
20 | 4. Relevant Information:
21 |
22 | Concerns housing values in suburbs of Boston.
23 |
24 | 5. Number of Instances: 506
25 |
26 | 6. Number of Attributes: 13 continuous attributes (including "class"
27 | attribute "MEDV"), 1 binary-valued attribute.
28 |
29 | 7. Attribute Information:
30 |
31 | 1. CRIM per capita crime rate by town
32 | 2. ZN proportion of residential land zoned for lots over
33 | 25,000 sq.ft.
34 | 3. INDUS proportion of non-retail business acres per town
35 | 4. CHAS Charles River dummy variable (= 1 if tract bounds
36 | river; 0 otherwise)
37 | 5. NOX nitric oxides concentration (parts per 10 million)
38 | 6. RM average number of rooms per dwelling
39 | 7. AGE proportion of owner-occupied units built prior to 1940
40 | 8. DIS weighted distances to five Boston employment centres
41 | 9. RAD index of accessibility to radial highways
42 | 10. TAX full-value property-tax rate per $10,000
43 | 11. PTRATIO pupil-teacher ratio by town
44 | 12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks
45 | by town
46 | 13. LSTAT % lower status of the population
47 | 14. MEDV Median value of owner-occupied homes in $1000's
48 |
49 | 8. Missing Attribute Values: None.
50 |
51 |
52 |
53 |
--------------------------------------------------------------------------------
/Data/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Data/iris/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Data/iris/iris.csv:
--------------------------------------------------------------------------------
1 | sepal_length,sepal_width,petal_length,petal_width,species
2 | 5.1,3.5,1.4,0.2,setosa
3 | 4.9,3,1.4,0.2,setosa
4 | 4.7,3.2,1.3,0.2,setosa
5 | 4.6,3.1,1.5,0.2,setosa
6 | 5,3.6,1.4,0.2,setosa
7 | 5.4,3.9,1.7,0.4,setosa
8 | 4.6,3.4,1.4,0.3,setosa
9 | 5,3.4,1.5,0.2,setosa
10 | 4.4,2.9,1.4,0.2,setosa
11 | 4.9,3.1,1.5,0.1,setosa
12 | 5.4,3.7,1.5,0.2,setosa
13 | 4.8,3.4,1.6,0.2,setosa
14 | 4.8,3,1.4,0.1,setosa
15 | 4.3,3,1.1,0.1,setosa
16 | 5.8,4,1.2,0.2,setosa
17 | 5.7,4.4,1.5,0.4,setosa
18 | 5.4,3.9,1.3,0.4,setosa
19 | 5.1,3.5,1.4,0.3,setosa
20 | 5.7,3.8,1.7,0.3,setosa
21 | 5.1,3.8,1.5,0.3,setosa
22 | 5.4,3.4,1.7,0.2,setosa
23 | 5.1,3.7,1.5,0.4,setosa
24 | 4.6,3.6,1,0.2,setosa
25 | 5.1,3.3,1.7,0.5,setosa
26 | 4.8,3.4,1.9,0.2,setosa
27 | 5,3,1.6,0.2,setosa
28 | 5,3.4,1.6,0.4,setosa
29 | 5.2,3.5,1.5,0.2,setosa
30 | 5.2,3.4,1.4,0.2,setosa
31 | 4.7,3.2,1.6,0.2,setosa
32 | 4.8,3.1,1.6,0.2,setosa
33 | 5.4,3.4,1.5,0.4,setosa
34 | 5.2,4.1,1.5,0.1,setosa
35 | 5.5,4.2,1.4,0.2,setosa
36 | 4.9,3.1,1.5,0.2,setosa
37 | 5,3.2,1.2,0.2,setosa
38 | 5.5,3.5,1.3,0.2,setosa
39 | 4.9,3.6,1.4,0.1,setosa
40 | 4.4,3,1.3,0.2,setosa
41 | 5.1,3.4,1.5,0.2,setosa
42 | 5,3.5,1.3,0.3,setosa
43 | 4.5,2.3,1.3,0.3,setosa
44 | 4.4,3.2,1.3,0.2,setosa
45 | 5,3.5,1.6,0.6,setosa
46 | 5.1,3.8,1.9,0.4,setosa
47 | 4.8,3,1.4,0.3,setosa
48 | 5.1,3.8,1.6,0.2,setosa
49 | 4.6,3.2,1.4,0.2,setosa
50 | 5.3,3.7,1.5,0.2,setosa
51 | 5,3.3,1.4,0.2,setosa
52 | 7,3.2,4.7,1.4,versicolor
53 | 6.4,3.2,4.5,1.5,versicolor
54 | 6.9,3.1,4.9,1.5,versicolor
55 | 5.5,2.3,4,1.3,versicolor
56 | 6.5,2.8,4.6,1.5,versicolor
57 | 5.7,2.8,4.5,1.3,versicolor
58 | 6.3,3.3,4.7,1.6,versicolor
59 | 4.9,2.4,3.3,1,versicolor
60 | 6.6,2.9,4.6,1.3,versicolor
61 | 5.2,2.7,3.9,1.4,versicolor
62 | 5,2,3.5,1,versicolor
63 | 5.9,3,4.2,1.5,versicolor
64 | 6,2.2,4,1,versicolor
65 | 6.1,2.9,4.7,1.4,versicolor
66 | 5.6,2.9,3.6,1.3,versicolor
67 | 6.7,3.1,4.4,1.4,versicolor
68 | 5.6,3,4.5,1.5,versicolor
69 | 5.8,2.7,4.1,1,versicolor
70 | 6.2,2.2,4.5,1.5,versicolor
71 | 5.6,2.5,3.9,1.1,versicolor
72 | 5.9,3.2,4.8,1.8,versicolor
73 | 6.1,2.8,4,1.3,versicolor
74 | 6.3,2.5,4.9,1.5,versicolor
75 | 6.1,2.8,4.7,1.2,versicolor
76 | 6.4,2.9,4.3,1.3,versicolor
77 | 6.6,3,4.4,1.4,versicolor
78 | 6.8,2.8,4.8,1.4,versicolor
79 | 6.7,3,5,1.7,versicolor
80 | 6,2.9,4.5,1.5,versicolor
81 | 5.7,2.6,3.5,1,versicolor
82 | 5.5,2.4,3.8,1.1,versicolor
83 | 5.5,2.4,3.7,1,versicolor
84 | 5.8,2.7,3.9,1.2,versicolor
85 | 6,2.7,5.1,1.6,versicolor
86 | 5.4,3,4.5,1.5,versicolor
87 | 6,3.4,4.5,1.6,versicolor
88 | 6.7,3.1,4.7,1.5,versicolor
89 | 6.3,2.3,4.4,1.3,versicolor
90 | 5.6,3,4.1,1.3,versicolor
91 | 5.5,2.5,4,1.3,versicolor
92 | 5.5,2.6,4.4,1.2,versicolor
93 | 6.1,3,4.6,1.4,versicolor
94 | 5.8,2.6,4,1.2,versicolor
95 | 5,2.3,3.3,1,versicolor
96 | 5.6,2.7,4.2,1.3,versicolor
97 | 5.7,3,4.2,1.2,versicolor
98 | 5.7,2.9,4.2,1.3,versicolor
99 | 6.2,2.9,4.3,1.3,versicolor
100 | 5.1,2.5,3,1.1,versicolor
101 | 5.7,2.8,4.1,1.3,versicolor
102 | 6.3,3.3,6,2.5,virginica
103 | 5.8,2.7,5.1,1.9,virginica
104 | 7.1,3,5.9,2.1,virginica
105 | 6.3,2.9,5.6,1.8,virginica
106 | 6.5,3,5.8,2.2,virginica
107 | 7.6,3,6.6,2.1,virginica
108 | 4.9,2.5,4.5,1.7,virginica
109 | 7.3,2.9,6.3,1.8,virginica
110 | 6.7,2.5,5.8,1.8,virginica
111 | 7.2,3.6,6.1,2.5,virginica
112 | 6.5,3.2,5.1,2,virginica
113 | 6.4,2.7,5.3,1.9,virginica
114 | 6.8,3,5.5,2.1,virginica
115 | 5.7,2.5,5,2,virginica
116 | 5.8,2.8,5.1,2.4,virginica
117 | 6.4,3.2,5.3,2.3,virginica
118 | 6.5,3,5.5,1.8,virginica
119 | 7.7,3.8,6.7,2.2,virginica
120 | 7.7,2.6,6.9,2.3,virginica
121 | 6,2.2,5,1.5,virginica
122 | 6.9,3.2,5.7,2.3,virginica
123 | 5.6,2.8,4.9,2,virginica
124 | 7.7,2.8,6.7,2,virginica
125 | 6.3,2.7,4.9,1.8,virginica
126 | 6.7,3.3,5.7,2.1,virginica
127 | 7.2,3.2,6,1.8,virginica
128 | 6.2,2.8,4.8,1.8,virginica
129 | 6.1,3,4.9,1.8,virginica
130 | 6.4,2.8,5.6,2.1,virginica
131 | 7.2,3,5.8,1.6,virginica
132 | 7.4,2.8,6.1,1.9,virginica
133 | 7.9,3.8,6.4,2,virginica
134 | 6.4,2.8,5.6,2.2,virginica
135 | 6.3,2.8,5.1,1.5,virginica
136 | 6.1,2.6,5.6,1.4,virginica
137 | 7.7,3,6.1,2.3,virginica
138 | 6.3,3.4,5.6,2.4,virginica
139 | 6.4,3.1,5.5,1.8,virginica
140 | 6,3,4.8,1.8,virginica
141 | 6.9,3.1,5.4,2.1,virginica
142 | 6.7,3.1,5.6,2.4,virginica
143 | 6.9,3.1,5.1,2.3,virginica
144 | 5.8,2.7,5.1,1.9,virginica
145 | 6.8,3.2,5.9,2.3,virginica
146 | 6.7,3.3,5.7,2.5,virginica
147 | 6.7,3,5.2,2.3,virginica
148 | 6.3,2.5,5,1.9,virginica
149 | 6.5,3,5.2,2,virginica
150 | 6.2,3.4,5.4,2.3,virginica
151 | 5.9,3,5.1,1.8,virginica
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/Life Cycle Process of Data Science In Real World project/DSPD0101ENT-Business Understanding(Problem)-to-Analytic Approach.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "button": false,
7 | "deletable": true,
8 | "new_sheet": false,
9 | "run_control": {
10 | "read_only": false
11 | }
12 | },
13 | "source": [
14 | "From Problem to Approach
"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {
20 | "button": false,
21 | "deletable": true,
22 | "new_sheet": false,
23 | "run_control": {
24 | "read_only": false
25 | }
26 | },
27 | "source": [
28 | "## Introduction\n",
29 | "\n",
30 | "The aim of these labs is to reinforce the concepts that we discuss in each module's videos. These labs will revolve around the use case of food recipes, and together, we will walk through the process that data scientists usually follow when trying to solve a problem. Let's get started!\n",
31 | "\n",
32 | "In this lab, we will start learning about the data science methodology, and focus on the **Business Understanding** and the **Analytic Approach** stages.\n",
33 | "\n",
34 | "------------"
35 | ]
36 | },
37 | {
38 | "cell_type": "markdown",
39 | "metadata": {
40 | "button": false,
41 | "deletable": true,
42 | "new_sheet": false,
43 | "run_control": {
44 | "read_only": false
45 | }
46 | },
47 | "source": [
48 | "## Table of Contents\n",
49 | "\n",
50 | "\n",
51 | "\n",
52 | "1. [Business Understanding](#0)
\n",
53 | "2. [Analytic Approach](#2)
\n",
54 | "
\n",
55 | "
"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {
61 | "button": false,
62 | "deletable": true,
63 | "new_sheet": false,
64 | "run_control": {
65 | "read_only": false
66 | }
67 | },
68 | "source": [
69 | "# Business Understanding "
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {
75 | "button": false,
76 | "deletable": true,
77 | "new_sheet": false,
78 | "run_control": {
79 | "read_only": false
80 | }
81 | },
82 | "source": [
83 | "This is the **Data Science Methodology**, a flowchart that begins with business understanding."
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {
89 | "button": false,
90 | "deletable": true,
91 | "new_sheet": false,
92 | "run_control": {
93 | "read_only": false
94 | }
95 | },
96 | "source": [
97 | "
"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {
103 | "button": false,
104 | "deletable": true,
105 | "new_sheet": false,
106 | "run_control": {
107 | "read_only": false
108 | }
109 | },
110 | "source": [
111 | "#### Why is the business understanding stage important?"
112 | ]
113 | },
114 | {
115 | "cell_type": "raw",
116 | "metadata": {
117 | "button": false,
118 | "deletable": true,
119 | "new_sheet": false,
120 | "run_control": {
121 | "read_only": false
122 | }
123 | },
124 | "source": [
125 | "Your Answer: the beginning of the methodology because getting\n",
126 | "clarity around the problem to be solved, allows you to determine which data will be used to\n",
127 | "answer the core question."
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {
133 | "button": false,
134 | "deletable": true,
135 | "new_sheet": false,
136 | "run_control": {
137 | "read_only": false
138 | }
139 | },
140 | "source": [
141 | "Double-click __here__ for the solution.\n",
142 | ""
145 | ]
146 | },
147 | {
148 | "cell_type": "markdown",
149 | "metadata": {
150 | "button": false,
151 | "deletable": true,
152 | "new_sheet": false,
153 | "run_control": {
154 | "read_only": false
155 | }
156 | },
157 | "source": [
158 | "#### Looking at this diagram, we immediately spot two outstanding features of the data science methodology."
159 | ]
160 | },
161 | {
162 | "cell_type": "markdown",
163 | "metadata": {
164 | "button": false,
165 | "deletable": true,
166 | "new_sheet": false,
167 | "run_control": {
168 | "read_only": false
169 | }
170 | },
171 | "source": [
172 | "
"
173 | ]
174 | },
175 | {
176 | "cell_type": "markdown",
177 | "metadata": {
178 | "button": false,
179 | "deletable": true,
180 | "new_sheet": false,
181 | "run_control": {
182 | "read_only": false
183 | }
184 | },
185 | "source": [
186 | "#### What are they?"
187 | ]
188 | },
189 | {
190 | "cell_type": "raw",
191 | "metadata": {
192 | "button": false,
193 | "deletable": true,
194 | "new_sheet": false,
195 | "run_control": {
196 | "read_only": false
197 | }
198 | },
199 | "source": [
200 | "Your Answer: \n",
201 | "1. It is Iterative Process \n",
202 | "2.Data Science is Never Ending Process"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {
208 | "button": false,
209 | "deletable": true,
210 | "new_sheet": false,
211 | "run_control": {
212 | "read_only": false
213 | }
214 | },
215 | "source": [
216 | "Double-click __here__ for the solution.\n",
217 | ""
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {
226 | "button": false,
227 | "deletable": true,
228 | "new_sheet": false,
229 | "run_control": {
230 | "read_only": false
231 | }
232 | },
233 | "source": [
234 | "#### Now let's illustrate the data science methodology with a case study."
235 | ]
236 | },
237 | {
238 | "cell_type": "markdown",
239 | "metadata": {
240 | "button": false,
241 | "deletable": true,
242 | "new_sheet": false,
243 | "run_control": {
244 | "read_only": false
245 | }
246 | },
247 | "source": [
248 | "Say, we are interested in automating the process of figuring out the cuisine of a given dish or recipe. Let's apply the business understanding stage to this problem."
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {
254 | "button": false,
255 | "deletable": true,
256 | "new_sheet": false,
257 | "run_control": {
258 | "read_only": false
259 | }
260 | },
261 | "source": [
262 | "#### Q. Can we predict the cuisine of a given dish using the name of the dish only?"
263 | ]
264 | },
265 | {
266 | "cell_type": "raw",
267 | "metadata": {
268 | "button": false,
269 | "deletable": true,
270 | "new_sheet": false,
271 | "run_control": {
272 | "read_only": false
273 | }
274 | },
275 | "source": [
276 | "Your Answer:no\n",
277 | "\n"
278 | ]
279 | },
280 | {
281 | "cell_type": "markdown",
282 | "metadata": {
283 | "button": false,
284 | "deletable": true,
285 | "new_sheet": false,
286 | "run_control": {
287 | "read_only": false
288 | }
289 | },
290 | "source": [
291 | "Double-click __here__ for the solution.\n",
292 | ""
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {
300 | "button": false,
301 | "deletable": true,
302 | "new_sheet": false,
303 | "run_control": {
304 | "read_only": false
305 | }
306 | },
307 | "source": [
308 | "#### Q. For example, the following dish names were taken from the menu of a local restaurant in Toronto, Ontario in Canada. \n",
309 | "\n",
310 | "#### 1. Beast\n",
311 | "#### 2. 2 PM\n",
312 | "#### 3. 4 Minute"
313 | ]
314 | },
315 | {
316 | "cell_type": "markdown",
317 | "metadata": {
318 | "button": false,
319 | "deletable": true,
320 | "new_sheet": false,
321 | "run_control": {
322 | "read_only": false
323 | }
324 | },
325 | "source": [
326 | "#### Are you able to tell the cuisine of these dishes?"
327 | ]
328 | },
329 | {
330 | "cell_type": "raw",
331 | "metadata": {
332 | "button": false,
333 | "deletable": true,
334 | "new_sheet": false,
335 | "run_control": {
336 | "read_only": false
337 | }
338 | },
339 | "source": [
340 | "Your Answer:\n",
341 | "\n"
342 | ]
343 | },
344 | {
345 | "cell_type": "markdown",
346 | "metadata": {
347 | "button": false,
348 | "deletable": true,
349 | "new_sheet": false,
350 | "run_control": {
351 | "read_only": false
352 | }
353 | },
354 | "source": [
355 | "Double-click __here__ for the solution.\n",
356 | "\n",
359 | "\n",
360 | "\n",
363 | "\n",
364 | "\n",
367 | "\n",
368 | "\n",
371 | "\n",
372 | ""
375 | ]
376 | },
377 | {
378 | "cell_type": "markdown",
379 | "metadata": {
380 | "button": false,
381 | "deletable": true,
382 | "new_sheet": false,
383 | "run_control": {
384 | "read_only": false
385 | }
386 | },
387 | "source": [
388 | "#### Q. What about by appearance only? Yes or No."
389 | ]
390 | },
391 | {
392 | "cell_type": "raw",
393 | "metadata": {
394 | "button": false,
395 | "deletable": true,
396 | "new_sheet": false,
397 | "run_control": {
398 | "read_only": false
399 | }
400 | },
401 | "source": [
402 | "Your Answer:\n",
403 | "\n",
404 | "no"
405 | ]
406 | },
407 | {
408 | "cell_type": "markdown",
409 | "metadata": {
410 | "button": false,
411 | "deletable": true,
412 | "new_sheet": false,
413 | "run_control": {
414 | "read_only": false
415 | }
416 | },
417 | "source": [
418 | "Double-click __here__ for the solution.\n",
419 | ""
422 | ]
423 | },
424 | {
425 | "cell_type": "markdown",
426 | "metadata": {
427 | "button": false,
428 | "deletable": true,
429 | "new_sheet": false,
430 | "run_control": {
431 | "read_only": false
432 | }
433 | },
434 | "source": [
435 | "At this point, we realize that automating the process of determining the cuisine of a given dish is not a straightforward problem as we need to come up with a way that is very robust to the many cuisines and their variations."
436 | ]
437 | },
438 | {
439 | "cell_type": "markdown",
440 | "metadata": {
441 | "button": false,
442 | "deletable": true,
443 | "new_sheet": false,
444 | "run_control": {
445 | "read_only": false
446 | }
447 | },
448 | "source": [
449 | "#### Q. What about determining the cuisine of a dish based on its ingredients?"
450 | ]
451 | },
452 | {
453 | "cell_type": "raw",
454 | "metadata": {
455 | "button": false,
456 | "deletable": true,
457 | "new_sheet": false,
458 | "run_control": {
459 | "read_only": false
460 | }
461 | },
462 | "source": [
463 | "Your Answer:\n",
464 | "\n",
465 | "Potentially yes, as there are specific ingredients unique to each cuisine"
466 | ]
467 | },
468 | {
469 | "cell_type": "markdown",
470 | "metadata": {
471 | "button": false,
472 | "deletable": true,
473 | "new_sheet": false,
474 | "run_control": {
475 | "read_only": false
476 | }
477 | },
478 | "source": [
479 | "Double-click __here__ for the solution.\n",
480 | ""
483 | ]
484 | },
485 | {
486 | "cell_type": "markdown",
487 | "metadata": {
488 | "button": false,
489 | "deletable": true,
490 | "new_sheet": false,
491 | "run_control": {
492 | "read_only": false
493 | }
494 | },
495 | "source": [
496 | "As you guessed, yes determining the cuisine of a given dish based on its ingredients seems like a viable solution as some ingredients are unique to cuisines. For example:"
497 | ]
498 | },
499 | {
500 | "cell_type": "markdown",
501 | "metadata": {
502 | "button": false,
503 | "deletable": true,
504 | "new_sheet": false,
505 | "run_control": {
506 | "read_only": false
507 | }
508 | },
509 | "source": [
510 | "* When we talk about **American** cuisines, the first ingredient that comes to one's mind (or at least to my mind =D) is beef or turkey.\n",
511 | "\n",
512 | "* When we talk about **British** cuisines, the first ingredient that comes to one's mind is haddock or mint sauce.\n",
513 | "\n",
514 | "* When we talk about **Canadian** cuisines, the first ingredient that comes to one's mind is bacon or poutine.\n",
515 | "\n",
516 | "* When we talk about **French** cuisines, the first ingredient that comes to one's mind is bread or butter.\n",
517 | "\n",
518 | "* When we talk about **Italian** cuisines, the first ingredient that comes to one's mind is tomato or ricotta.\n",
519 | "\n",
520 | "* When we talk about **Japanese** cuisines, the first ingredient that comes to one's mind is seaweed or soy sauce.\n",
521 | "\n",
522 | "* When we talk about **Chinese** cuisines, the first ingredient that comes to one's mind is ginger or garlic.\n",
523 | "\n",
524 | "* When we talk about **indian** cuisines, the first ingredient that comes to one's mind is masala or chillis."
525 | ]
526 | },
527 | {
528 | "cell_type": "markdown",
529 | "metadata": {
530 | "button": false,
531 | "deletable": true,
532 | "new_sheet": false,
533 | "run_control": {
534 | "read_only": false
535 | }
536 | },
537 | "source": [
538 | "#### Accordingly, can you determine the cuisine of the dish associated with the following list of ingredients?"
539 | ]
540 | },
541 | {
542 | "cell_type": "markdown",
543 | "metadata": {
544 | "button": false,
545 | "deletable": true,
546 | "new_sheet": false,
547 | "run_control": {
548 | "read_only": false
549 | }
550 | },
551 | "source": [
552 | "
"
553 | ]
554 | },
555 | {
556 | "cell_type": "raw",
557 | "metadata": {
558 | "button": false,
559 | "deletable": true,
560 | "new_sheet": false,
561 | "run_control": {
562 | "read_only": false
563 | }
564 | },
565 | "source": [
566 | "Your Answer:\n",
567 | "\n",
568 | "Japanese since the recipe is most likely that of a sushi roll."
569 | ]
570 | },
571 | {
572 | "cell_type": "markdown",
573 | "metadata": {
574 | "button": false,
575 | "deletable": true,
576 | "new_sheet": false,
577 | "run_control": {
578 | "read_only": false
579 | }
580 | },
581 | "source": [
582 | "Double-click __here__ for the solution.\n",
583 | ""
586 | ]
587 | },
588 | {
589 | "cell_type": "markdown",
590 | "metadata": {
591 | "button": false,
592 | "deletable": true,
593 | "new_sheet": false,
594 | "run_control": {
595 | "read_only": false
596 | }
597 | },
598 | "source": [
599 | "# Analytic Approach "
600 | ]
601 | },
602 | {
603 | "cell_type": "markdown",
604 | "metadata": {
605 | "button": false,
606 | "deletable": true,
607 | "new_sheet": false,
608 | "run_control": {
609 | "read_only": false
610 | }
611 | },
612 | "source": [
613 | "
"
614 | ]
615 | },
616 | {
617 | "cell_type": "markdown",
618 | "metadata": {
619 | "button": false,
620 | "deletable": true,
621 | "new_sheet": false,
622 | "run_control": {
623 | "read_only": false
624 | }
625 | },
626 | "source": [
627 | "#### So why are we interested in data science?"
628 | ]
629 | },
630 | {
631 | "cell_type": "markdown",
632 | "metadata": {
633 | "button": false,
634 | "deletable": true,
635 | "new_sheet": false,
636 | "run_control": {
637 | "read_only": false
638 | }
639 | },
640 | "source": [
641 | "Once the business problem has been clearly stated, the data scientist can define the analytic approach to solve the problem. This step entails expressing the problem in the context of statistical and machine-learning techniques, so that the entity or stakeholders with the problem can identify the most suitable techniques for the desired outcome. "
642 | ]
643 | },
644 | {
645 | "cell_type": "markdown",
646 | "metadata": {
647 | "button": false,
648 | "deletable": true,
649 | "new_sheet": false,
650 | "run_control": {
651 | "read_only": false
652 | }
653 | },
654 | "source": [
655 | "#### Why is the analytic approach stage important?"
656 | ]
657 | },
658 | {
659 | "cell_type": "raw",
660 | "metadata": {
661 | "button": false,
662 | "deletable": true,
663 | "new_sheet": false,
664 | "run_control": {
665 | "read_only": false
666 | }
667 | },
668 | "source": [
669 | "Your Answer:\n",
670 | "\n"
671 | ]
672 | },
673 | {
674 | "cell_type": "markdown",
675 | "metadata": {
676 | "button": false,
677 | "deletable": true,
678 | "new_sheet": false,
679 | "run_control": {
680 | "read_only": false
681 | }
682 | },
683 | "source": [
684 | "Double-click __here__ for the solution.\n",
685 | ""
688 | ]
689 | },
690 | {
691 | "cell_type": "markdown",
692 | "metadata": {
693 | "button": false,
694 | "deletable": true,
695 | "new_sheet": false,
696 | "run_control": {
697 | "read_only": false
698 | }
699 | },
700 | "source": [
701 | "#### Let's explore a machine learning algorithm, decision trees, and see if it is the right technique to automate the process of identifying the cuisine of a given dish or recipe while simultaneously providing us with some insight on why a given recipe is believed to belong to a certain type of cuisine."
702 | ]
703 | },
704 | {
705 | "cell_type": "markdown",
706 | "metadata": {
707 | "button": false,
708 | "deletable": true,
709 | "new_sheet": false,
710 | "run_control": {
711 | "read_only": false
712 | }
713 | },
714 | "source": [
715 | "This is a decision tree that a naive person might create manually. Starting at the top with all the recipes for all the cuisines in the world, if a recipe contains **rice**, then this decision tree would classify it as a **Japanese** cuisine. Otherwise, it would be classified as not a **Japanese** cuisine."
716 | ]
717 | },
718 | {
719 | "cell_type": "markdown",
720 | "metadata": {
721 | "button": false,
722 | "deletable": true,
723 | "new_sheet": false,
724 | "run_control": {
725 | "read_only": false
726 | }
727 | },
728 | "source": [
729 | "
"
730 | ]
731 | },
732 | {
733 | "cell_type": "markdown",
734 | "metadata": {
735 | "button": false,
736 | "deletable": true,
737 | "new_sheet": false,
738 | "run_control": {
739 | "read_only": false
740 | }
741 | },
742 | "source": [
743 | "#### Is this a good decision tree? Yes or No, and why? "
744 | ]
745 | },
746 | {
747 | "cell_type": "raw",
748 | "metadata": {
749 | "button": false,
750 | "deletable": true,
751 | "new_sheet": false,
752 | "run_control": {
753 | "read_only": false
754 | }
755 | },
756 | "source": [
757 | "Your Answer:\n",
758 | "\n"
759 | ]
760 | },
761 | {
762 | "cell_type": "markdown",
763 | "metadata": {
764 | "button": false,
765 | "deletable": true,
766 | "new_sheet": false,
767 | "run_control": {
768 | "read_only": false
769 | }
770 | },
771 | "source": [
772 | "Double-click __here__ for the solution.\n",
773 | ""
776 | ]
777 | },
778 | {
779 | "cell_type": "markdown",
780 | "metadata": {
781 | "button": false,
782 | "deletable": true,
783 | "new_sheet": false,
784 | "run_control": {
785 | "read_only": false
786 | }
787 | },
788 | "source": [
789 | "#### In order to build a very powerful decision tree for the recipe case study, let's take some time to learn more about decision trees."
790 | ]
791 | },
792 | {
793 | "cell_type": "markdown",
794 | "metadata": {
795 | "button": false,
796 | "deletable": true,
797 | "new_sheet": false,
798 | "run_control": {
799 | "read_only": false
800 | }
801 | },
802 | "source": [
803 | "* Decision trees are built using recursive partitioning to classify the data.\n",
804 | "* When partitioning the data, decision trees use the most predictive feature (ingredient in this case) to split the data.\n",
805 | "* **Predictiveness** is based on decrease in entropy - gain in information, or *impurity*."
806 | ]
807 | },
808 | {
809 | "cell_type": "markdown",
810 | "metadata": {
811 | "button": false,
812 | "deletable": true,
813 | "new_sheet": false,
814 | "run_control": {
815 | "read_only": false
816 | }
817 | },
818 | "source": [
819 | "#### Suppose that our data is comprised of green triangles and red circles."
820 | ]
821 | },
822 | {
823 | "cell_type": "markdown",
824 | "metadata": {
825 | "button": false,
826 | "deletable": true,
827 | "new_sheet": false,
828 | "run_control": {
829 | "read_only": false
830 | }
831 | },
832 | "source": [
833 | "The following decision tree would be considered the optimal model for classifying the data into a node for green triangles and a node for red circles."
834 | ]
835 | },
836 | {
837 | "cell_type": "markdown",
838 | "metadata": {
839 | "button": false,
840 | "deletable": true,
841 | "new_sheet": false,
842 | "run_control": {
843 | "read_only": false
844 | }
845 | },
846 | "source": [
847 | "
"
848 | ]
849 | },
850 | {
851 | "cell_type": "markdown",
852 | "metadata": {
853 | "button": false,
854 | "deletable": true,
855 | "new_sheet": false,
856 | "run_control": {
857 | "read_only": false
858 | }
859 | },
860 | "source": [
861 | "Each of the classes in the leaf nodes are completely pure – that is, each leaf node only contains datapoints that belong to the same class."
862 | ]
863 | },
864 | {
865 | "cell_type": "markdown",
866 | "metadata": {
867 | "button": false,
868 | "deletable": true,
869 | "new_sheet": false,
870 | "run_control": {
871 | "read_only": false
872 | }
873 | },
874 | "source": [
875 | "On the other hand, the following decision tree is an example of the worst-case scenario that the model could output. "
876 | ]
877 | },
878 | {
879 | "cell_type": "markdown",
880 | "metadata": {
881 | "button": false,
882 | "deletable": true,
883 | "new_sheet": false,
884 | "run_control": {
885 | "read_only": false
886 | }
887 | },
888 | "source": [
889 | "
"
890 | ]
891 | },
892 | {
893 | "cell_type": "markdown",
894 | "metadata": {
895 | "button": false,
896 | "deletable": true,
897 | "new_sheet": false,
898 | "run_control": {
899 | "read_only": false
900 | }
901 | },
902 | "source": [
903 | "Each leaf node contains datapoints belonging to the two classes resulting in many datapoints ultimately being misclassified."
904 | ]
905 | },
906 | {
907 | "cell_type": "markdown",
908 | "metadata": {
909 | "button": false,
910 | "deletable": true,
911 | "new_sheet": false,
912 | "run_control": {
913 | "read_only": false
914 | }
915 | },
916 | "source": [
917 | "#### A tree stops growing at a node when:\n",
918 | "* Pure or nearly pure.\n",
919 | "* No remaining variables on which to further subset the data.\n",
920 | "* The tree has grown to a preselected size limit."
921 | ]
922 | },
923 | {
924 | "cell_type": "markdown",
925 | "metadata": {
926 | "button": false,
927 | "deletable": true,
928 | "new_sheet": false,
929 | "run_control": {
930 | "read_only": false
931 | }
932 | },
933 | "source": [
934 | "#### Here are some characteristics of decision trees:"
935 | ]
936 | },
937 | {
938 | "cell_type": "markdown",
939 | "metadata": {
940 | "button": false,
941 | "deletable": true,
942 | "new_sheet": false,
943 | "run_control": {
944 | "read_only": false
945 | }
946 | },
947 | "source": [
948 | "
"
949 | ]
950 | },
951 | {
952 | "cell_type": "markdown",
953 | "metadata": {
954 | "button": false,
955 | "deletable": true,
956 | "new_sheet": false,
957 | "run_control": {
958 | "read_only": false
959 | }
960 | },
961 | "source": [
962 | "Now let's put what we learned about decision trees to use. Let's try and build a much better version of the decision tree for our recipe problem."
963 | ]
964 | },
965 | {
966 | "cell_type": "markdown",
967 | "metadata": {
968 | "button": false,
969 | "deletable": true,
970 | "new_sheet": false,
971 | "run_control": {
972 | "read_only": false
973 | }
974 | },
975 | "source": [
976 | "
"
977 | ]
978 | },
979 | {
980 | "cell_type": "markdown",
981 | "metadata": {
982 | "button": false,
983 | "deletable": true,
984 | "new_sheet": false,
985 | "run_control": {
986 | "read_only": false
987 | }
988 | },
989 | "source": [
990 | "I hope you agree that the above decision tree is a much better version than the previous one. Although we are still using **Rice** as the ingredient in the first *decision node*, recipes get divided into **Asian Food** and **Non-Asian Food**. **Asian Food** is then further divided into **Japanese** and **Not Japanese** based on the **Wasabi** ingredient. This process of splitting *leaf nodes* continues until each *leaf node* is pure, i.e., containing recipes belonging to only one cuisine."
991 | ]
992 | },
993 | {
994 | "cell_type": "markdown",
995 | "metadata": {
996 | "button": false,
997 | "deletable": true,
998 | "new_sheet": false,
999 | "run_control": {
1000 | "read_only": false
1001 | }
1002 | },
1003 | "source": [
1004 | "Accordingly, decision trees is a suitable technique or algorithm for our recipe case study."
1005 | ]
1006 | }
1007 | ],
1008 | "metadata": {
1009 | "kernelspec": {
1010 | "display_name": "Python 3",
1011 | "language": "python",
1012 | "name": "python3"
1013 | },
1014 | "language_info": {
1015 | "codemirror_mode": {
1016 | "name": "ipython",
1017 | "version": 3
1018 | },
1019 | "file_extension": ".py",
1020 | "mimetype": "text/x-python",
1021 | "name": "python",
1022 | "nbconvert_exporter": "python",
1023 | "pygments_lexer": "ipython3",
1024 | "version": "3.6.8"
1025 | },
1026 | "widgets": {
1027 | "state": {},
1028 | "version": "1.1.2"
1029 | }
1030 | },
1031 | "nbformat": 4,
1032 | "nbformat_minor": 4
1033 | }
1034 |
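A minimal sketch of the approach described in this notebook, assuming a toy one-hot ingredient matrix and invented cuisine labels in place of the real recipe dataset; scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"` performs the information-gain-based recursive partitioning discussed above:

```python
# Toy sketch of the recipe decision tree; the data below is invented for illustration.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical one-hot ingredient matrix: 1 means the recipe uses the ingredient.
recipes = pd.DataFrame({
    "rice":   [1, 1, 0, 0, 1, 0],
    "wasabi": [1, 0, 0, 0, 0, 0],
    "ginger": [0, 1, 0, 0, 1, 0],
    "butter": [0, 0, 1, 1, 0, 1],
})
cuisines = ["japanese", "chinese", "french", "french", "chinese", "french"]

# criterion="entropy" splits on information gain (decrease in entropy), and
# max_depth acts as the "preselected size limit" stopping criterion.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(recipes, cuisines)

# Inspect the learned splits and classify a new recipe (rice + wasabi).
print(export_text(tree, feature_names=list(recipes.columns)))
new_recipe = pd.DataFrame({"rice": [1], "wasabi": [1], "ginger": [0], "butter": [0]})
print(tree.predict(new_recipe))  # on this toy data: ['japanese']
```

On this toy data the tree first separates recipes containing wasabi (Japanese) and then splits the remainder on rice, mirroring the rice/wasabi tree sketched in the cells above.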
--------------------------------------------------------------------------------
/Life Cycle Process of Data Science In Real World project/DSPD0101ENT-Business Understanding.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "The business understanding stage of the Data Science Process (DSP). This process provides a recommended lifecycle that you can use to structure your data-science projects.\n",
8 | "\n",
9 | "**Goals**\n",
10 | "* Specify the key variables that are to serve as the model targets and whose related metrics are used determine the success of the project.\n",
11 | "* Identify the relevant data sources that the business has access to or needs to obtain.\n",
12 | "\n",
13 | "#### How to do it\n",
14 | "There are two main tasks addressed in this stage:\n",
15 | "\n",
16 | "* **Define objectives:** Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.\n",
17 | "* **Identify data sources:** Find the relevant data that helps you answer the questions that define the objectives of the project.\n",
18 | "\n",
19 | "#### Define objectives\n",
20 | "1. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. Two examples of such targets are sales forecasts or the probability of an order being fraudulent.\n",
21 | "\n",
22 | "2. Define the project goals by asking and refining \"sharp\" questions that are relevant, specific, and unambiguous. Data science is a process that uses names and numbers to answer such questions. You typically use data science or machine learning to answer five types of questions:\n",
23 | "\n",
24 | " * How much or how many? (regression)\n",
25 | " * Which category? (classification)\n",
26 | " * Which group? (clustering)\n",
27 | " * Is this weird? (anomaly detection)\n",
28 | " * Which option should be taken? (recommendation)\n",
29 | "Determine which of these questions you're asking and how answering it achieves your business goals.\n",
30 | "\n",
31 | "3. Define the project team by specifying the roles and responsibilities of its members. Develop a high-level milestone plan that you iterate on as you discover more information.\n",
32 | "\n",
33 | "4. Define the success metrics. For example, you might want to achieve a customer churn prediction. You need an accuracy rate of \"x\" percent by the end of this three-month project. With this data, you can offer customer promotions to reduce churn. The metrics must be **SMART**:\n",
34 | "\n",
35 | "* **S**pecific\n",
36 | "* **M**easurable\n",
37 | "* **A**chievable\n",
38 | "* **R**elevant\n",
39 | "* **T**ime-bound\n",
40 | "\n",
41 | "#### Identify data sources\n",
42 | "Identify data sources that contain known examples of answers to your sharp questions. Look for the following data:\n",
43 | "\n",
44 | "* Data that's relevant to the question. Do you have measures of the target and features that are related to the target?\n",
45 | "* Data that's an accurate measure of your model target and the features of interest.\n",
46 | "For example, you might find that the existing systems need to collect and log additional kinds of data to address the problem and achieve the project goals. In this situation, you might want to look for external data sources or update your systems to collect new data.\n",
47 | "\n",
48 | "#### Artifacts\n",
49 | "Here are the deliverables in this stage:\n",
50 | "\n",
51 | "* **Charter document:** A standard template is provided in the TDSP project structure definition. The charter document is a living document. You update the template throughout the project as you make new discoveries and as business requirements change. The key is to iterate upon this document, adding more detail, as you progress through the discovery process. Keep the customer and other stakeholders involved in making the changes and clearly communicate the reasons for the changes to them.\n",
52 | "* **Data sources:** The Raw data sources section of the Data definitions report that's found in the TDSP project Data report folder contains the data sources. This section specifies the original and destination locations for the raw data. In later stages, you fill in additional details like the scripts to move the data to your analytic environment.\n",
53 | "* **Data dictionaries:** This document provides descriptions of the data that's provided by the client. These descriptions include information about the schema (the data types and information on the validation rules, if any) and the entity-relation diagrams, if available."
54 | ]
55 | }
56 | ],
57 | "metadata": {
58 | "kernelspec": {
59 | "display_name": "Python 3",
60 | "language": "python",
61 | "name": "python3"
62 | },
63 | "language_info": {
64 | "codemirror_mode": {
65 | "name": "ipython",
66 | "version": 3
67 | },
68 | "file_extension": ".py",
69 | "mimetype": "text/x-python",
70 | "name": "python",
71 | "nbconvert_exporter": "python",
72 | "pygments_lexer": "ipython3",
73 | "version": "3.6.8"
74 | }
75 | },
76 | "nbformat": 4,
77 | "nbformat_minor": 4
78 | }
79 |
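A small illustrative sketch of how the five question types above map onto scikit-learn model families; the estimator choices below are common conventions, not prescriptions from this notebook:

```python
# Illustrative mapping from "sharp question" type to a typical scikit-learn estimator.
# These are common defaults for each task, not the only (or always best) choices.
from sklearn.linear_model import LinearRegression      # How much / how many? (regression)
from sklearn.ensemble import RandomForestClassifier    # Which category? (classification)
from sklearn.cluster import KMeans                     # Which group? (clustering)
from sklearn.ensemble import IsolationForest           # Is this weird? (anomaly detection)

question_to_estimator = {
    "how much / how many": LinearRegression(),
    "which category":      RandomForestClassifier(),
    "which group":         KMeans(n_clusters=3),
    "is this weird":       IsolationForest(),
    # "Which option should be taken?" (recommendation) usually calls for a
    # dedicated library rather than a single scikit-learn estimator.
}
```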
--------------------------------------------------------------------------------
/Life Cycle Process of Data Science In Real World project/IBMOpenSource_FoundationalMethologyforDataScience.PDF:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Life Cycle Process of Data Science In Real World project/IBMOpenSource_FoundationalMethologyforDataScience.PDF
--------------------------------------------------------------------------------
/Life Cycle Process of Data Science In Real World project/README.md:
--------------------------------------------------------------------------------
1 | * Data Science Process (DSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently.
2 | * DSP helps improve team collaboration and learning by suggesting how team roles work best together.
3 | * DSP includes best practices and structures from industry leaders to help toward the successful implementation of data science initiatives.
4 | * The goal is to help companies fully realize the benefits of their analytics program.
5 |
6 |
7 | ### Data science lifecycle
8 | * The Data Science Process (DSP) provides a lifecycle to structure the development of your data science projects.
9 | * The lifecycle outlines the full steps that successful projects follow.
10 |
11 | * If you are using another data science lifecycle, such as CRISP-DM, KDD, or your organization's own custom process, you can still use the task-based DSP in the context of those development lifecycles.
12 | * At a high level, these different methodologies have much in common.
13 | * This lifecycle has been designed for data science projects that ship as part of intelligent applications.
14 | * These applications deploy machine learning or artificial intelligence models for predictive analytics.
15 | * Exploratory data science projects or improvised analytics projects can also benefit from using this process.
16 | * But in such cases some of the steps described may not be needed.
17 | ***
18 | * The lifecycle outlines the major stages that projects typically execute, often iteratively:
19 |
20 | 1. Business Understanding
21 | 2. Data Acquisition and Understanding
22 | 3. Modeling
23 | 4. Deployment
24 | 5. Customer Acceptance
25 | 
26 |
--------------------------------------------------------------------------------
/Modeling/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Modeling/Semi Supervised Learning/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Modeling/Supervised Learning/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Modeling/Unsupervised Learning/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Numpy/README.md:
--------------------------------------------------------------------------------
1 |
2 | ## Numpy
3 | ---
4 | * The fundamental package for scientific computing with Python
5 | * Nearly every scientist working in Python draws on the power of NumPy.
6 | * NumPy brings the computational power of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes simplicity: a solution in NumPy is often clear and elegant.
7 | * 
8 |
9 |
10 |
11 | ### Features of Numpy
12 | * POWERFUL N-DIMENSIONAL ARRAYS
13 | * NUMERICAL COMPUTING TOOLS
14 | * INTEROPERABLE
15 | * PERFORMANT
16 | * EASY TO USE
17 | * OPEN SOURCE
18 |
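A minimal sketch of the array features listed above (the snippet is illustrative, using only core NumPy):

```python
import numpy as np

# Powerful N-dimensional arrays: shape, dtype, and fast elementwise math.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Numerical computing tools: vectorized operations replace explicit Python loops.
row_means = a.mean(axis=1)          # one mean per row
centered = a - row_means[:, None]   # broadcasting subtracts each row's mean

print(a.shape, a.dtype)   # (3, 4) float64
print(centered.sum())     # ~0.0, since each row is mean-centered
```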
--------------------------------------------------------------------------------
/Numpy/img/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Numpy/img/Where we use numpy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Numpy/img/Where we use numpy.png
--------------------------------------------------------------------------------
/Pandas/DSPD0100ENT-Business Understanding.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "The business understanding stage of the Data Science Process (DSP). This process provides a recommended lifecycle that you can use to structure your data-science projects.\n",
8 | "\n",
9 | "**Goals**\n",
10 | "* Specify the key variables that are to serve as the model targets and whose related metrics are used determine the success of the project.\n",
11 | "* Identify the relevant data sources that the business has access to or needs to obtain.\n",
12 | "\n",
13 | "#### How to do it\n",
14 | "There are two main tasks addressed in this stage:\n",
15 | "\n",
16 | "* **Define objectives:** Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.\n",
17 | "* **Identify data sources:** Find the relevant data that helps you answer the questions that define the objectives of the project.\n",
18 | "\n",
19 | "#### Define objectives\n",
20 | "1. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. Two examples of such targets are sales forecasts or the probability of an order being fraudulent.\n",
21 | "\n",
22 | "2. Define the project goals by asking and refining \"sharp\" questions that are relevant, specific, and unambiguous. Data science is a process that uses names and numbers to answer such questions. You typically use data science or machine learning to answer five types of questions:\n",
23 | "\n",
24 | " * How much or how many? (regression)\n",
25 | " * Which category? (classification)\n",
26 | " * Which group? (clustering)\n",
27 | " * Is this weird? (anomaly detection)\n",
28 | " * Which option should be taken? (recommendation)\n",
29 | "Determine which of these questions you're asking and how answering it achieves your business goals.\n",
30 | "\n",
31 | "3. Define the project team by specifying the roles and responsibilities of its members. Develop a high-level milestone plan that you iterate on as you discover more information.\n",
32 | "\n",
33 | "4. Define the success metrics. For example, you might want to achieve a customer churn prediction. You need an accuracy rate of \"x\" percent by the end of this three-month project. With this data, you can offer customer promotions to reduce churn. The metrics must be **SMART**:\n",
34 | "\n",
35 | "* **S**pecific\n",
36 | "* **M**easurable\n",
37 | "* **A**chievable\n",
38 | "* **R**elevant\n",
39 | "* **T**ime-bound\n",
40 | "\n",
41 | "#### Identify data sources\n",
42 | "Identify data sources that contain known examples of answers to your sharp questions. Look for the following data:\n",
43 | "\n",
44 | "* Data that's relevant to the question. Do you have measures of the target and features that are related to the target?\n",
45 | "* Data that's an accurate measure of your model target and the features of interest.\n",
46 | "For example, you might find that the existing systems need to collect and log additional kinds of data to address the problem and achieve the project goals. In this situation, you might want to look for external data sources or update your systems to collect new data.\n",
47 | "\n",
48 | "#### Artifacts\n",
49 | "Here are the deliverables in this stage:\n",
50 | "\n",
51 | "* **Charter document:** A standard template is provided in the TDSP project structure definition. The charter document is a living document. You update the template throughout the project as you make new discoveries and as business requirements change. The key is to iterate upon this document, adding more detail, as you progress through the discovery process. Keep the customer and other stakeholders involved in making the changes and clearly communicate the reasons for the changes to them.\n",
52 | "* **Data sources:** The Raw data sources section of the Data definitions report that's found in the TDSP project Data report folder contains the data sources. This section specifies the original and destination locations for the raw data. In later stages, you fill in additional details like the scripts to move the data to your analytic environment.\n",
53 | "* **Data dictionaries:** This document provides descriptions of the data that's provided by the client. These descriptions include information about the schema (the data types and information on the validation rules, if any) and the entity-relation diagrams, if available."
54 | ]
55 | }
56 | ],
57 | "metadata": {
58 | "kernelspec": {
59 | "display_name": "Python 3",
60 | "language": "python",
61 | "name": "python3"
62 | },
63 | "language_info": {
64 | "codemirror_mode": {
65 | "name": "ipython",
66 | "version": 3
67 | },
68 | "file_extension": ".py",
69 | "mimetype": "text/x-python",
70 | "name": "python",
71 | "nbconvert_exporter": "python",
72 | "pygments_lexer": "ipython3",
73 | "version": "3.6.8"
74 | }
75 | },
76 | "nbformat": 4,
77 | "nbformat_minor": 4
78 | }
79 |
--------------------------------------------------------------------------------
/Pandas/DSPD0101EN-Introduction-to-Pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "
"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "In the previous chapter, \n",
15 | "* we dove into detail on NumPy and its ``ndarray`` object, which provides efficient storage and manipulation of dense typed arrays in Python.\n",
16 | "\n",
17 | "* Here we'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.\n",
18 | "* Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a ``DataFrame``.\n",
19 | "``DataFrame``s are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.\n",
20 | "* As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.\n",
21 | "\n",
22 | "As we saw, NumPy's ``ndarray`` data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks.\n",
23 | "While it serves this purpose very well, its limitations become clear when we need more flexibility (e.g., attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us.\n",
24 | "Pandas, and in particular its ``Series`` and ``DataFrame`` objects, builds on the NumPy array structure and provides efficient access to these sorts of \"data munging\" tasks that occupy much of a data scientist's time.\n",
25 | "\n",
26 | "In this chapter, we will focus on the mechanics of using ``Series``, ``DataFrame``, and related structures effectively.\n",
27 | "We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus."
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "## Installing and Using Pandas\n",
35 | "\n",
36 | "Installation of Pandas on your system requires NumPy to be installed, and if building the library from source, requires the appropriate tools to compile the C and Cython sources on which Pandas is built.\n",
37 | "Details on this installation can be found in the [Pandas documentation](http://pandas.pydata.org/).\n",
38 | "If you followed the advice outlined in the [Preface](00.00-Preface.ipynb) and used the Anaconda stack, you already have Pandas installed.\n",
39 | "\n",
40 | "Once Pandas is installed, you can import it and check the version:"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 2,
46 | "metadata": {},
47 | "outputs": [
48 | {
49 | "name": "stdout",
50 | "output_type": "stream",
51 | "text": [
52 | "Requirement already satisfied: pandas in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (1.0.3)\n",
53 | "Requirement already satisfied: numpy>=1.13.3 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (1.18.3)\n",
54 | "Requirement already satisfied: python-dateutil>=2.6.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (2.8.1)\n",
55 | "Requirement already satisfied: pytz>=2017.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (2020.1)\n",
56 | "Requirement already satisfied: six>=1.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from python-dateutil>=2.6.1->pandas) (1.14.0)\n"
57 | ]
58 | }
59 | ],
60 | "source": [
61 | "!pip install pandas"
62 | ]
63 | },
64 | {
65 | "cell_type": "code",
66 | "execution_count": 3,
67 | "metadata": {
68 | "collapsed": true,
69 | "jupyter": {
70 | "outputs_hidden": true
71 | }
72 | },
73 | "outputs": [
74 | {
75 | "name": "stdout",
76 | "output_type": "stream",
77 | "text": [
78 | "Requirement already satisfied: pandas-profiling[html,notebook] in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (2.6.0)\n",
79 | "Requirement already satisfied: scipy>=1.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.4.1)\n",
80 | "Requirement already satisfied: tqdm>=4.43.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.45.0)\n",
81 | "Requirement already satisfied: matplotlib>=3.2.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (3.2.1)\n",
82 | "Requirement already satisfied: ipywidgets>=7.5.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (7.5.1)\n",
83 | "Requirement already satisfied: missingno>=0.4.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.4.2)\n",
84 | "Requirement already satisfied: tangled-up-in-unicode>=0.0.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.0.4)\n",
85 | "Requirement already satisfied: confuse>=1.0.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.1.0)\n",
86 | "Requirement already satisfied: visions[type_image_path]>=0.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.4.1)\n",
87 | "Requirement already satisfied: astropy>=4.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.0.1.post1)\n",
88 | "Requirement already satisfied: requests>=2.23.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (2.23.0)\n",
89 | "Requirement already satisfied: pandas>=0.25.3 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.0.3)\n",
90 | "Requirement already satisfied: numpy>=1.16.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.18.3)\n",
91 | "Requirement already satisfied: phik>=0.9.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.9.11)\n",
92 | "Requirement already satisfied: statsmodels>=0.11.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.11.1)\n",
93 | "Requirement already satisfied: htmlmin>=0.1.12 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.1.12)\n",
94 | "Requirement already satisfied: jinja2>=2.11.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (2.11.2)\n",
95 | "Requirement already satisfied: jupyter-client>=6.0.0; extra == \"notebook\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (6.1.3)\n",
96 | "Requirement already satisfied: jupyter-core>=4.6.3; extra == \"notebook\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.6.3)\n",
97 | "Requirement already satisfied: cycler>=0.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (0.10.0)\n",
98 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (2.4.7)\n",
99 | "Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (1.2.0)\n",
100 | "Requirement already satisfied: python-dateutil>=2.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (2.8.1)\n",
101 | "Requirement already satisfied: nbformat>=4.2.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.0.6)\n",
102 | "Requirement already satisfied: traitlets>=4.3.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (4.3.3)\n",
103 | "Requirement already satisfied: ipython>=4.0.0; python_version >= \"3.3\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (7.13.0)\n",
104 | "Requirement already satisfied: ipykernel>=4.5.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.2.1)\n",
105 | "Requirement already satisfied: widgetsnbextension~=3.5.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.5.1)\n",
106 | "Requirement already satisfied: seaborn in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from missingno>=0.4.2->pandas-profiling[html,notebook]) (0.10.1)\n",
107 | "Requirement already satisfied: pyyaml in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from confuse>=1.0.0->pandas-profiling[html,notebook]) (5.3.1)\n",
108 | "Requirement already satisfied: networkx>=2.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (2.4)\n",
109 | "Requirement already satisfied: attrs>=19.3.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (19.3.0)\n",
110 | "Requirement already satisfied: imagehash; extra == \"type_image_path\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (4.1.0)\n",
111 | "Requirement already satisfied: Pillow; extra == \"type_image_path\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (7.1.2)\n",
112 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (1.25.9)\n",
113 | "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (2020.4.5.1)\n",
114 | "Requirement already satisfied: chardet<4,>=3.0.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (3.0.4)\n",
115 | "Requirement already satisfied: idna<3,>=2.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (2.9)\n",
116 | "Requirement already satisfied: pytz>=2017.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas>=0.25.3->pandas-profiling[html,notebook]) (2020.1)\n",
117 | "Requirement already satisfied: joblib>=0.14.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from phik>=0.9.10->pandas-profiling[html,notebook]) (0.14.1)\n",
118 | "Requirement already satisfied: numba>=0.38.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from phik>=0.9.10->pandas-profiling[html,notebook]) (0.49.0)\n",
119 | "Requirement already satisfied: patsy>=0.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from statsmodels>=0.11.1->pandas-profiling[html,notebook]) (0.5.1)\n",
120 | "Requirement already satisfied: MarkupSafe>=0.23 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jinja2>=2.11.1->pandas-profiling[html,notebook]) (1.1.1)\n",
121 | "Requirement already satisfied: pyzmq>=13 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-client>=6.0.0; extra == \"notebook\"->pandas-profiling[html,notebook]) (19.0.0)\n",
122 | "Requirement already satisfied: tornado>=4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-client>=6.0.0; extra == \"notebook\"->pandas-profiling[html,notebook]) (6.0.4)\n",
123 | "Requirement already satisfied: pywin32>=1.0; sys_platform == \"win32\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-core>=4.6.3; extra == \"notebook\"->pandas-profiling[html,notebook]) (227)\n",
124 | "Requirement already satisfied: six in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from cycler>=0.10->matplotlib>=3.2.0->pandas-profiling[html,notebook]) (1.14.0)\n",
125 | "Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.2.0)\n",
126 | "Requirement already satisfied: ipython-genutils in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.2.0)\n",
127 | "Requirement already satisfied: decorator in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from traitlets>=4.3.1->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (4.4.2)\n",
128 | "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.0.5)\n",
129 | "Requirement already satisfied: pickleshare in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.5)\n",
130 | "Requirement already satisfied: setuptools>=18.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (46.1.3)\n",
131 | "Requirement already satisfied: colorama; sys_platform == \"win32\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.4.3)\n",
132 | "Requirement already satisfied: pygments in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (2.6.1)\n",
133 | "Requirement already satisfied: backcall in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.1.0)\n",
134 | "Requirement already satisfied: jedi>=0.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.17.0)\n",
135 | "Requirement already satisfied: notebook>=4.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (6.0.3)\n",
136 | "Requirement already satisfied: PyWavelets in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from imagehash; extra == \"type_image_path\"->visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (1.1.1)\n",
137 | "Requirement already satisfied: llvmlite<=0.33.0.dev0,>=0.31.0.dev0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from numba>=0.38.1->phik>=0.9.10->pandas-profiling[html,notebook]) (0.32.0)\n",
138 | "Requirement already satisfied: pyrsistent>=0.14.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.16.0)\n",
139 | "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.6.0)\n",
140 | "Requirement already satisfied: wcwidth in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.1.9)\n",
141 | "Requirement already satisfied: parso>=0.7.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.0)\n",
142 | "Requirement already satisfied: Send2Trash in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.5.0)\n",
143 | "Requirement already satisfied: prometheus-client in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.1)\n",
144 | "Requirement already satisfied: nbconvert in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.6.1)\n",
145 | "Requirement already satisfied: terminado>=0.8.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.8.3)\n",
146 | "Requirement already satisfied: zipp>=0.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from importlib-metadata; python_version < \"3.8\"->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.1.0)\n",
147 | "Requirement already satisfied: defusedxml in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.6.0)\n",
148 | "Requirement already satisfied: entrypoints>=0.2.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.3)\n",
149 | "Requirement already satisfied: testpath in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.4.4)\n",
150 | "Requirement already satisfied: bleach in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.1.4)\n",
151 | "Requirement already satisfied: mistune<2,>=0.8.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.8.4)\n",
152 | "Requirement already satisfied: pandocfilters>=1.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.4.2)\n",
153 | "Requirement already satisfied: pywinpty>=0.5; os_name == \"nt\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from terminado>=0.8.1->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.5.7)\n"
154 | ]
155 | },
156 | {
157 | "name": "stderr",
158 | "output_type": "stream",
159 | "text": [
160 | " WARNING: pandas-profiling 2.6.0 does not provide the extra 'html'\n"
161 | ]
162 | },
163 | {
164 | "name": "stdout",
165 | "output_type": "stream",
166 | "text": [
167 | "Requirement already satisfied: webencodings in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.5.1)\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "!pip install pandas-profiling[notebook,html]  # install the profiling-report generator used below"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 4,
178 | "metadata": {},
179 | "outputs": [
180 | {
181 | "data": {
182 | "text/plain": [
183 | "'1.0.3'"
184 | ]
185 | },
186 | "execution_count": 4,
187 | "metadata": {},
188 | "output_type": "execute_result"
189 | }
190 | ],
191 | "source": [
192 | "import pandas\n",
193 | "pandas.__version__"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "metadata": {},
199 | "source": [
200 | "Just as we generally import NumPy under the alias ``np``, we will import Pandas under the alias ``pd``:"
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": 5,
206 | "metadata": {},
207 | "outputs": [],
208 | "source": [
209 | "import pandas as pd"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 6,
215 | "metadata": {},
216 | "outputs": [
217 | {
218 | "data": {
219 | "text/plain": [
220 | "'1.0.3'"
221 | ]
222 | },
223 | "execution_count": 6,
224 | "metadata": {},
225 | "output_type": "execute_result"
226 | }
227 | ],
228 | "source": [
229 | "pd.__version__"
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "This import convention will be used throughout the remainder of this book."
237 | ]
238 | },
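 {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
   "As a quick, minimal sketch of the objects this chapter is about (the values here are arbitrary, chosen only for illustration), we can build a small ``Series`` and then a ``DataFrame`` from it:"
  ]
 },
 {
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": [
   "# Illustrative only: construct a small Series, then a DataFrame from it\n",
   "s = pd.Series([0.25, 0.5, 0.75, 1.0])\n",
   "df = pd.DataFrame({'a': s, 'b': s * 2})\n",
   "df"
  ]
 },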
239 | {
240 | "cell_type": "code",
241 | "execution_count": 8,
242 | "metadata": {},
243 | "outputs": [],
244 | "source": [
245 | "import pandas_profiling as pp  # used below to generate a data-profiling report"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 10,
251 | "metadata": {},
252 | "outputs": [
253 | {
254 | "data": {
255 | "text/plain": [
256 | "'2.6.0'"
257 | ]
258 | },
259 | "execution_count": 10,
260 | "metadata": {},
261 | "output_type": "execute_result"
262 | }
263 | ],
264 | "source": [
265 | "pp.__version__"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": [
274 | "# Type pd. and press <TAB> here to explore the contents of the pandas namespace"
275 | ]
276 | },
277 | {
278 | "cell_type": "markdown",
279 | "metadata": {},
280 | "source": [
281 | "## Reminder about Built-In Documentation\n",
282 | "\n",
283 | "As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ``?`` character). (Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) if you need a refresher on this.)\n",
284 | "\n",
285 | "For example, to display all the contents of the pandas namespace, you can type\n",
286 | "\n",
287 | "```ipython\n",
288 | "In [3]: pd.\n",
289 | "```\n",
290 | "\n",
291 | "And to display Pandas's built-in documentation, you can use this:\n",
292 | "\n",
293 | "```ipython\n",
294 | "In [4]: pd?\n",
295 | "```\n",
296 | "\n",
297 | "More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.\n",
298 | "\n",
299 | "\n",
300 | "\n",
301 | "## Data Science Life Cycle "
302 | ]
303 | },
304 | {
305 | "cell_type": "markdown",
306 | "metadata": {},
307 | "source": [
308 | ""
309 | ]
310 | }
311 | ],
312 | "metadata": {
313 | "anaconda-cloud": {},
314 | "kernelspec": {
315 | "display_name": "Python 3",
316 | "language": "python",
317 | "name": "python3"
318 | },
319 | "language_info": {
320 | "codemirror_mode": {
321 | "name": "ipython",
322 | "version": 3
323 | },
324 | "file_extension": ".py",
325 | "mimetype": "text/x-python",
326 | "name": "python",
327 | "nbconvert_exporter": "python",
328 | "pygments_lexer": "ipython3",
329 | "version": "3.6.8"
330 | }
331 | },
332 | "nbformat": 4,
333 | "nbformat_minor": 4
334 | }
335 |
--------------------------------------------------------------------------------
/Pandas/README.md:
--------------------------------------------------------------------------------
1 | ### Methodology for Data Science
2 |
3 | * In the domain of data science, `solving problems and answering questions` through data analysis is standard practice.
4 | * Often, `data scientists construct a model to predict outcomes` or discover underlying patterns, with the goal of gaining insights.
5 | * Organizations can then use these insights to take actions that ideally improve future outcomes.
6 | * There are numerous rapidly evolving technologies for analyzing data and building models.
7 | * In a remarkably short time, they have progressed from desktops to massively parallel warehouses with huge data volumes and in-database analytic functionality in relational databases and Apache Hadoop.
8 | * **Text analytics** on unstructured or semi-structured data is becoming increasingly important as a way to incorporate sentiment and other useful information from text into predictive models, often leading to significant improvements in model quality and accuracy.
9 | * Emerging analytics approaches seek to automate many of the steps in model building and application, making machine-learning technology more accessible to those who lack deep quantitative skills.
10 | * Also, in contrast to the “top-down” approach of first defining the business problem and then analyzing the data to find a solution, some data scientists may use a “bottom-up” approach.
11 | * With the latter, the data scientist looks into large volumes of data to see what business goal might be suggested by the data and then tackles that problem. Since most problems are addressed in a top-down manner, the methodology in this paper reflects that view.
12 | * **A 10-stage data science methodology that spans technologies and approaches.** As data analytics capabilities become more accessible and prevalent, data scientists need a foundational methodology capable of providing a guiding strategy, regardless of the technologies, data volumes or approaches involved (see Figure 1).
13 | 
14 | This methodology bears some similarities to recognized methodologies [1-5] for data mining, but it emphasizes several of the new practices in data science such as the use of very large data volumes, the incorporation of text analytics into predictive modeling and the automation of some processes.
15 | * The methodology consists of 10 stages that form an iterative process for using data to uncover insights. Each stage plays a vital role in the context of the overall methodology.
16 |
17 | ***
18 | ### What is a methodology?
19 | A methodology is a general strategy that guides the processes and activities within a given domain. Methodology does not depend on particular technologies or tools, nor is it a set of techniques or recipes. Rather, a methodology provides the data scientist with a framework for how to proceed with whatever methods, processes and heuristics will be used to obtain answers or results.
20 |
21 | ***
22 | * **Stage 1: Business understanding**
23 | Every project starts with business understanding. The business sponsors who need the analytic solution play the most critical role in this stage by defining the problem, project objectives and solution requirements from a business perspective. This first stage lays the foundation for a successful resolution of the business problem. To help guarantee the project’s success, the sponsors should be involved throughout the project to provide domain expertise, review intermediate findings and ensure the work remains on track to generate the intended solution.
24 |
25 | * **Stage 2: Analytic approach**
26 | Once the business problem has been clearly stated, the data scientist can define the analytic approach to solving the problem. This stage entails expressing the problem in the context of statistical and machine-learning techniques, so the organization can identify the most suitable ones for the desired outcome. For example, if the goal is to predict a response such as “yes” or “no,” then the analytic approach could be defined as building, testing and implementing a classification model.
27 |
28 | * **Stage 3: Data requirements**
29 | The chosen analytic approach determines the data requirements. Specifically, the analytic methods to be used require certain data content, formats and representations, guided by domain knowledge.
30 |
31 | * **Stage 4: Data collection**
32 | In the initial data collection stage, data scientists identify and gather the available data resources—structured, unstructured and semi-structured—relevant to the problem domain. Typically, they must choose whether to make additional investments to obtain less-accessible data elements. It may be best to defer the investment decision until more is known about the data and the model. If there are gaps in data collection, the data scientist may have to revise the data requirements accordingly and collect new and/or more data.
33 | While data sampling and subsetting are still important, today’s high-performance platforms and in-database analytic functionality let data scientists use much larger data sets containing much or even all of the available data. By incorporating more data, predictive models may be better able to represent rare events such as disease incidence or system failure.
34 |
35 | * **Stage 5: Data understanding**
36 | After the original data collection, data scientists typically use descriptive statistics and visualization techniques to understand the data content, assess data quality and discover initial insights about the data. Additional data collection may be necessary to fill gaps.
37 |
38 | * **Stage 6: Data preparation**
39 | This stage encompasses all activities to construct the data set that will be used in the subsequent modeling stage. Data preparation activities include data cleaning (dealing with missing or invalid values, eliminating duplicates, formatting properly), combining data from multiple sources (files, tables, platforms) and transforming data into more useful variables.
40 | In a process called feature engineering, data scientists can create additional explanatory variables, also referred to as predictors or features, through a combination of domain knowledge and existing structured variables. When text data is available, such as customer call center logs or physicians’ notes in unstructured or semi-structured form, text analytics is useful in deriving new structured variables to enrich the set of predictors and improve model accuracy.
41 | Data preparation is usually the most time-consuming step in a data science project. In many domains, some data preparation steps are common across different problems. Automating certain data preparation steps in advance may accelerate the process by minimizing ad hoc preparation time. With today’s high-performance, massively parallel systems and analytic functionality residing where the data is stored, data scientists can more easily and rapidly prepare data using very large data sets. (A small illustrative sketch of stages 6 through 8 appears after this list.)
42 |
43 | * **Stage 7: Modeling**
44 | Starting with the first version of the prepared data set, the modeling stage focuses on developing predictive or descriptive models according to the previously defined analytic approach. With predictive models, data scientists use a training set (historical data in which the outcome of interest is known) to build the model. The modeling process is typically highly iterative as organizations gain intermediate insights, leading to refinements in data preparation and model specification. For a given technique, data scientists may try multiple algorithms with their respective parameters to find the best model for the available variables.
45 |
46 | * **Stage 8: Evaluation**
47 | During model development and before deployment, the data scientist evaluates the model to understand its quality and ensure that it properly and fully addresses the business problem. Model evaluation entails computing various diagnostic measures and other outputs such as tables and graphs, enabling the data scientist to interpret the model’s quality and its efficacy in solving the problem. For a predictive model, data scientists use a testing set, which is independent of the training set but follows the same probability distribution and has a known outcome. The testing set is used to evaluate the model so it can be refined as needed. Sometimes the final model is applied also to a validation set for a final assessment.
48 | In addition, data scientists may assign statistical significance tests to the model as further proof of its quality. This additional proof may be instrumental in justifying model implementation or taking actions when the stakes are high—such as an expensive supplemental medical protocol or a critical airplane flight system.
49 |
50 | * **Stage 9: Deployment**
51 | Once a satisfactory model has been developed and is approved by the business sponsors, it is deployed into the production environment or a comparable test environment. Usually it is deployed in a limited way until its performance has been fully evaluated. Deployment may be as simple as generating a report with recommendations, or as involved as embedding the
52 | model in a complex workflow and scoring process managed by a custom application. Deploying a model into an operational business process usually involves additional groups, skills and technologies from within the enterprise. For example, a sales group may deploy a response propensity model through a campaign management process created by a development team and administered by a marketing group.
53 |
54 | * **Stage 10: Feedback**
55 | By collecting results from the implemented model, the organization gets feedback on the model’s performance and its impact on the environment in which it was deployed. For example, feedback could take the form of response rates to a promotional campaign targeting a group of customers identified by the model as high-potential responders. Analyzing this feedback enables data scientists to refine the model to improve its accuracy and usefulness. They can automate some or all of the feedback-gathering and model assessment, refinement and redeployment steps to speed up the process of model refreshing for better outcomes.
56 |
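To make stages 6 through 8 concrete, the following minimal sketch walks through data preparation, modeling and evaluation in Python with pandas and scikit-learn. Everything in it is hypothetical: the toy DataFrame, the column names (`age`, `income`, `responded`) and the choice of a logistic-regression classifier are illustrative assumptions, not part of the methodology itself.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stage 6 -- data preparation: remove duplicates, impute a missing value,
# and engineer an extra feature from existing columns.
df = pd.DataFrame({
    'age':       [25, 32, 47, 51, 32, 38, 41, 28],
    'income':    [30000, 45000, 52000, 80000, 45000, 61000, None, 39000],
    'responded': [0, 1, 0, 1, 1, 0, 1, 0],
})
df = df.drop_duplicates()                                  # row 4 duplicates row 1
df['income'] = df['income'].fillna(df['income'].median())  # handle the missing value
df['income_per_year'] = df['income'] / df['age']           # simple engineered feature

# Stage 7 -- modeling: fit a classifier on a training set.
X = df[['age', 'income', 'income_per_year']]
y = df['responded']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = LogisticRegression().fit(X_train, y_train)

# Stage 8 -- evaluation: score the model on an independent testing set.
print('test accuracy:', accuracy_score(y_test, model.predict(X_test)))
```

Stages 9 and 10 (deployment and feedback) happen outside a sketch like this, in the production environment where the model's predictions are acted on and measured.
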
57 | ***
58 | ### Difference Between AI/ML/DS
59 | **Data science** is a broad term for a variety of models and methods to get information.
60 |
61 | Under the umbrella of data science is the scientific method, math, statistics, and other tools that are used to analyze and manipulate data. If it’s a tool or process done to data to analyze it or get some sort of information out of it, it likely falls under data science.
62 | Data science is the field where a large volume of data is processed by software to find correlations between data sets and extract information. This information is then used by artificial intelligence-driven platforms. It is also a scientific method to study data and reach actionable conclusions.
63 |
64 | **Machine learning** is a kind of artificial intelligence that gives computers the ability to learn from new data sets without being explicitly programmed. It focuses primarily on the development of computer programs that can adapt when exposed to new sets of data. Machine learning is a current application of AI based on the idea that we should be able to give machines access to data and let them learn for themselves.
65 |
66 | Machine learning is a set of algorithms showing the properties of an artificial intelligence-driven platform. These algorithms help a machine learn new behaviors within a particular discipline.
67 | 
68 |
69 | **Artificial Intelligence** is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”.
70 |
71 | AI involves machines that can perform tasks that are characteristic of human intelligence. While this is rather general, it includes things like planning, understanding language, recognizing objects and sounds, learning, and problem solving.
72 |
73 | We can put AI in two categories, general and narrow. General AI would have all of the characteristics of human intelligence, including the capacities mentioned above. Narrow AI exhibits some facet(s) of human intelligence, and can do that facet extremely well, but is lacking in other areas. A machine that’s great at recognizing images, but nothing else, would be an example of narrow AI.
74 |
75 | Generalized AIs, systems or devices which can in theory handle any task, are less common, but this is where some of the most exciting advancements are happening today. It is also the area that has led to the development of machine learning. Machine learning is often referred to as a subset of AI, but it is more accurate to think of it as the current state of the art.
76 |
77 | Artificial intelligence can also be described as the property of computer systems that predict an output for an input provided by a user. These systems learn from past inputs and outputs and provide a better solution each time. Such self-learning systems fascinate many IT professionals these days.
78 |
79 | Machine Learning is a subset of Artificial Intelligence.
80 | ***
81 | ### Reference
82 | 1. [Data Science](https://en.wikipedia.org/wiki/Data_science)
83 | 2. [data-science-and-prediction](https://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext)
84 | 3. [whats-the-difference-between-data-science and Statistics](https://priceonomics.com/whats-the-difference-between-data-science-and/)
85 | 4. [Difference](https://www.dataneb.com/post/artificial-intelligence-machine-learning-deep-learning-predictive-analytics-data-science)
86 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Data-Science-With-Python
2 | ----------------------------
3 | * Data science is an inter-disciplinary field that uses `scientific methods`, `processes`, `algorithms and systems` to extract knowledge and insights from structured and unstructured data.
4 | * Data science is related to data mining and big data.
6 | * Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. [Reference Link](https://en.wikipedia.org/wiki/Data_science)
7 | * Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they must advance past the traditional skills of analyzing large amounts of data, data mining, and programming skills. In order to uncover useful intelligence for their organizations, data scientists must master the full spectrum of the data science life cycle and possess a level of flexibility and understanding to maximize returns at each phase of the process.
8 | ***
9 | “The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that’s going to be a hugely important skill in the next decades.”
10 |
11 | ***
12 | Effective data scientists are able to identify relevant questions, collect data from a multitude of different data sources, organize the information, translate results into solutions, and communicate their findings in a way that positively affects business decisions. These skills are required in almost all industries, causing skilled data scientists to be increasingly valuable to companies.
13 |
14 | ### The Data Science Life Cycle
15 | --------------------------------
16 |
17 | ![]()
18 |
19 |
20 | ### Where Do You Fit in Data Science?
21 | Data is everywhere and expansive. A variety of terms related to mining, cleaning, analyzing, and interpreting data are often used interchangeably, but they can actually involve different skill sets and complexity of data.
22 |
23 | **Data Scientist**
24 | Data scientists examine which questions need answering and where to find the related data. They have business acumen and analytical skills as well as the ability to mine, clean, and present data. Businesses use data scientists to source, manage, and analyze large amounts of unstructured data. Results are then synthesized and communicated to key stakeholders to drive strategic decision-making in the organization.
25 |
26 | **Skills needed:** Programming skills (SAS, R, Python), statistical and mathematical skills, storytelling and data visualization, Hadoop, SQL, machine learning
27 |
28 | **Data Analyst**
29 | Data analysts bridge the gap between data scientists and business analysts. They are provided with the questions that need answering from an organization and then organize and analyze data to find results that align with high-level business strategy. Data analysts are responsible for translating technical analysis to qualitative action items and effectively communicating their findings to diverse stakeholders.
30 |
31 | **Skills needed:** Programming skills (SAS, R, Python), statistical and mathematical skills, data wrangling, data visualization
32 |
33 | **Data Engineer**
34 | Data engineers manage exponential amounts of rapidly changing data. They focus on the development, deployment, management, and optimization of data pipelines and infrastructure to transform and transfer data to data scientists for querying.
35 |
36 | **Skills needed:** Programming languages (Java, Scala), NoSQL databases (MongoDB, Cassandra DB), frameworks (Apache Hadoop)
37 |
38 | ### Data Science Career Outlook and Salary Opportunities
39 |
40 | Data science professionals are rewarded for their highly technical skill set with competitive salaries and great job opportunities at big and small companies in most industries. With over 4,500 open positions listed on Glassdoor, data science professionals with the appropriate experience and education have the opportunity to make their mark in some of the most forward-thinking companies in the world. [6]
41 |
42 | Below are the average base salaries for the following positions: [7]
43 |
44 | |Position|Average base salary|
45 | |--------|---------|
46 | |`Data analyst`|$65,470|
47 | |`Data scientist`|$120,931|
48 | |`Senior data scientist`|$141,257|
49 | |`Data engineer`|$137,776|
50 |
51 | Gaining specialized skills within the data science field can distinguish data scientists even further. For example, machine learning experts utilize high-level programming skills to create algorithms that continuously gather data and automatically adjust their function to be more effective.
52 |
53 | ---
54 | # References To Learn and Develop your Self:
55 | * [Python](https://github.com/reddyprasade/Python-Basic-For-All-3.x)
56 | * [Data Science With Python ](https://github.com/reddyprasade/Data-Science-With-Python)
57 | * [Machine Learning with Python](https://github.com/reddyprasade/Machine-Learning-with-Scikit-Learn-Python-3.x)
58 | * [Deep learning With python](https://github.com/reddyprasade/Deep-Learning)
59 | * [Data Visulization](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Data%20Visualization)
60 | * [Life Cycle of Data Science](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Life%20Cycle%20Process%20of%20Data%20Science%20In%20Real%20World%20project)
61 | * [Statistics](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Statistics)
62 |
--------------------------------------------------------------------------------
/Statistics/Data/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/Statistics/Practice/01 - Day 0 - Mean, Median, and Mode.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-basic-statistics/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | N = int(input())
15 | NUMBER = list(map(int, input().split()))
16 |
17 | # Mean
18 | SUM = 0
19 |
20 | for i in NUMBER:
21 | SUM = SUM + i
22 | print(float(SUM / N))
23 |
24 | # Median
25 | NUMBER = sorted(NUMBER)
26 |
27 | if N % 2 == 0:
28 | A = NUMBER[N//2]
29 | B = NUMBER[N//2 - 1]
30 | print((A+B)/2)
31 | else:
32 | print(NUMBER[N//2])
33 |
34 | # Mode (the smallest value with the highest frequency)
35 | MODE = NUMBER[0]
36 | MAX_COUNT = 0
37 | 
38 | for value in sorted(set(NUMBER)):
39 |     count = NUMBER.count(value)
40 |     if count > MAX_COUNT:
41 |         MAX_COUNT = count
42 |         MODE = value
43 | 
44 | print(MODE)
45 | 
--------------------------------------------------------------------------------
/Statistics/Practice/02 - Day 0 - Weighted Mean.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-weighted-mean/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | X = int(input())
15 | ARRAY = list(map(int, input().split()))
16 | WEIGHT = list(map(int, input().split()))
17 | Y = 0
18 |
19 | for i in range(X):
20 | Y += ARRAY[i]*WEIGHT[i]
21 |
22 | print("{:.1f}".format(Y/sum(WEIGHT)))
23 |
--------------------------------------------------------------------------------
/Statistics/Practice/03 - Day 1 - Quartiles.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-quartiles/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | N = int(input())
15 | ARRAY = sorted(map(int, input().rstrip().split()))
16 |
17 | def median(n, array):
18 | '''Function to calculate the median'''
19 |
20 | if n % 2 == 0:
21 | ind1 = n//2
22 | ind2 = ind1 - 1
23 |         return (array[ind1] + array[ind2]) // 2  # integer division (the challenge guarantees integer quartiles)
24 | else:
25 | return array[n//2]
26 |
27 | MEDIAN_L = median(N//2, ARRAY[0:N//2])
28 | MEDIAN_X = median(N, ARRAY)
29 | MEDIAN_U = median(N//2, ARRAY[(N+1)//2:])
30 |
31 | print(MEDIAN_L)
32 | print(MEDIAN_X)
33 | print(MEDIAN_U)
34 |
--------------------------------------------------------------------------------
/Statistics/Practice/04 - Day 1 - Interquartile Range.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-interquartile-range
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def find_median(arr):
15 | if len(arr) % 2 == 1:
16 | return arr[len(arr) // 2]
17 | else:
18 | return (arr[len(arr) // 2] + arr[len(arr) // 2 - 1]) / 2
19 |
20 | # Create array
21 | N = int(input())
22 | VALUES = list(map(int, input().split()))
23 | FREQUENCY = list(map(int, input().split()))
24 |
25 | ARRAY = []
26 |
27 | for i in range(N):
28 | ARRAY += [VALUES[i]] * FREQUENCY[i]
29 | ARRAY = sorted(ARRAY)
30 |
31 | # Find interquartile_range
32 | INTERQUARTILE_RANGE = float(find_median(ARRAY[len(ARRAY) // 2 + len(ARRAY) % 2:]) - find_median(ARRAY[:len(ARRAY)//2]))
33 |
34 | print(INTERQUARTILE_RANGE)
35 |
--------------------------------------------------------------------------------
/Statistics/Practice/05 - Day 1 - Standard Deviation.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-standard-deviation
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | N = int(input())
15 | X = list(map(int, input().strip().split(' ')))
16 |
17 | MEAN = sum(X)/N
18 | VARIANCE = 0
19 | 
20 | for i in range(N):
21 |     VARIANCE += ((X[i]-MEAN)**2)/N
22 | 
23 | print(round(VARIANCE**0.5, 1))  # standard deviation = sqrt(variance)
24 |
--------------------------------------------------------------------------------
/Statistics/Practice/06 - Day 2 - Basic Probability.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 10
8 | # Language: Python
9 | # Multiple Choice Question - No code required but checked with code
10 |
11 | # ========================
12 | # Solution
13 | # ========================
14 |
15 | from itertools import product
16 | from fractions import Fraction
17 |
18 | P = list(product([1, 2, 3, 4, 5, 6], repeat=2))
19 |
20 | N = sum(1 for x in P if sum(x) <= 9)
21 |
22 | print(Fraction(N, len(P)))
23 |
24 | # >>> 5/6
25 |
--------------------------------------------------------------------------------
/Statistics/Practice/07 - Day 2 - More Dice.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 10
8 | # Language: Python
9 | # Multiple Choice Question - No code required but checked with code
10 |
11 | # ========================
12 | # Solution
13 | # ========================
14 |
15 | from itertools import product
16 | from fractions import Fraction
17 |
18 | P = list(product([1, 2, 3, 4, 5, 6], repeat=2))
19 |
20 | N = sum(1 for x, y in P if x + y == 6 and x != y)
21 |
22 | print(Fraction(N, len(P)))
23 |
24 | # >>> 1/9
25 |
--------------------------------------------------------------------------------
/Statistics/Practice/08 - Day 2 - Compound Event Probability.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-3/problem
6 | # Difficulty: Easy
7 | # Max Score: 10
8 | # Language: Python
9 | # Multiple Choice Question - No code required but checked with code
10 |
11 | # ========================
12 | # Solution
13 | # ========================
14 |
15 | import itertools
16 | from fractions import Fraction
17 | from collections import Counter
18 |
19 | # Let r = 1 and black = 0
20 | # Bag X
21 | X = list(Counter({1:4, 0:3}).elements())
22 |
23 | # Bag Y
24 | Y = list(Counter({1:5, 0:4}).elements())
25 |
26 | # Bag z
27 | Z = list(Counter({1:4, 0:4}).elements())
28 |
29 | # Sample space (one marble drawn from each bag)
30 | TOTAL_SAMPLES = list(itertools.product(X, Y, Z))
31 | 
32 | # Total number of outcomes
33 | TOTAL_SAMPLES_SIZE = len(TOTAL_SAMPLES)
34 | 
35 | # Number of favourable outcomes (exactly 2 of the 3 drawn marbles are red)
36 | FAVOURABLE_OUTCOMES_SIZE = sum(sum(i) == 2 for i in TOTAL_SAMPLES)
37 |
38 | # Probability as a fraction
39 | print(Fraction(FAVOURABLE_OUTCOMES_SIZE,TOTAL_SAMPLES_SIZE))
40 |
41 | # >>> 17/42
42 |
--------------------------------------------------------------------------------
/Statistics/Practice/09 - Day 3 - Conditional Probability.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-4/problem
6 | # Difficulty: Easy
7 | # Max Score: 10
8 | # Language: Python
9 | # Multiple Choice Question - No code required but checked with code
10 |
11 | # ========================
12 | # Solution
13 | # ========================
14 |
15 | import itertools
16 | from fractions import Fraction
17 |
18 | # Sample space
19 | SAMPLE_SPACE = list(itertools.product(("b", "g"), ("b", "g")))
20 |
21 | # Event b at least one boy [A]
22 | EVENT_B = []
23 | for i in SAMPLE_SPACE:
24 |     if i[0] == "b" or i[1] == "b":
25 | EVENT_B.append(i)
26 |
27 | # Event 2b two boys [B]
28 | EVENT_2B = []
29 | for i in SAMPLE_SPACE:
30 | if i[0] == "b" and i[1] == "b":
31 | EVENT_2B.append(i)
32 |
33 | # Conditional probability -> p(2b | b) = p (b | 2b) * p (2b) / p(b)
34 | # Where -> p (b) = p(b|2b)* p(b) + p(b|2b')*p(b')
35 |
36 | # For p(b|2b): every two-boy outcome contains at least one boy, so this equals 1
37 | PB_2B = []
38 | for i in EVENT_2B:
39 | PB_2B.append(i)
40 |
41 | PROB_PB_2B = Fraction(len(PB_2B), len(EVENT_2B))
42 |
43 | # For p(2b)
44 | PROB_2B = Fraction(len(EVENT_2B), len(SAMPLE_SPACE))
45 |
46 | # For p(b)
47 | PROB_B = Fraction(len(EVENT_B), len(SAMPLE_SPACE))
48 |
49 | # Solving for p(2b | b) = p (b | 2b) * p (2b) / p(b)
50 | print(PROB_PB_2B*PROB_2B/PROB_B)
51 |
52 | # >>> 1/3
53 |
--------------------------------------------------------------------------------
/Statistics/Practice/10 - Day 3 - Cards of the Same Suit.txt:
--------------------------------------------------------------------------------
1 | ========================
2 | Information
3 | ========================
4 |
5 | Direct Link: https://www.hackerrank.com/challenges/s10-mcq-5/problem
6 | Difficulty: Easy
7 | Max Score: 10
8 | Language: Python
9 | Multiple Choice Question - No code required
10 |
11 | ========================
12 | Solution
13 | ========================
14 |
15 | First card = 13/52
16 | Second card of the same suit = 12/51 (without replacement)
17 | There are 4 suits, so answer is (13/52) * (12/51) * 4 = 12/51
18 |
19 | >>> 12/51
20 |
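========================
Optional Check
========================

A quick brute-force sketch in Python (mirroring the "checked with code"
pattern of the neighbouring solutions) confirms the result:

    from itertools import permutations
    from fractions import Fraction

    # A deck is 4 suits x 13 ranks; draw 2 cards without replacement
    DECK = [(rank, suit) for suit in range(4) for rank in range(13)]
    DRAWS = list(permutations(DECK, 2))
    SAME_SUIT = sum(1 for a, b in DRAWS if a[1] == b[1])
    print(Fraction(SAME_SUIT, len(DRAWS)))  # 4/17, i.e. 12/51
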
--------------------------------------------------------------------------------
/Statistics/Practice/11 - Day 3 - Drawing Marbles.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-6/problem
6 | # Difficulty: Easy
7 | # Max Score: 10
8 | # Language: Python
9 | # Multiple Choice Question - No code required but checked with code
10 |
11 | # ========================
12 | # Solution
13 | # ========================
14 |
15 | from itertools import permutations
16 | from fractions import Fraction
17 |
18 | # 1 for Red Marbles
19 | # 0 for Blue Marbles
20 | RED_MARBLES = [1, 1, 1]
21 | BLUE_MARBLES = [0, 0, 0, 0]
22 |
23 | # All combinations, excluded first blue
24 | FIRST_DRAW = list(filter(lambda m: m[0] == 1, permutations(RED_MARBLES + BLUE_MARBLES, 2)))
25 |
26 | # All combinations with second blue
27 | MARBLES_REMAINING = list(filter(lambda m: m[1] == 0, FIRST_DRAW))
28 |
29 | # Result is 2/3
30 | print(Fraction(len(MARBLES_REMAINING), len(FIRST_DRAW)))
31 |
32 | # ========================
33 | # Explanation
34 | # ========================
35 |
36 | # A bag contains 3 red marbles and 4 blue marbles
37 | # After drawing a red marble, the bag has now 2 red and 4 blue marbles (total of 6 marbles)
38 | # Therefore, the probability of getting a blue marble is 4/6, simplified to 2/3
39 |
40 | # >>> 2/3
41 |
--------------------------------------------------------------------------------
/Statistics/Practice/12 - Day 4 - Binomial Distribution I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-binomial-distribution-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def factorial(N):
15 | '''Function to calculate N factorial'''
16 | if N == 0:
17 | return 1
18 | else:
19 | return N * factorial(N - 1)
20 |
21 | def combination(N, X):
22 | '''Function to calculate the combination of N and X'''
23 | result = factorial(N) / (factorial(N - X) * factorial(X))
24 | return result
25 |
26 | def binomial(X, N, P):
27 | '''Function to determine the binomial of X, N, and P'''
28 | Q = 1 - P
29 | result = combination(N, X) * (P**X) * (Q**(N - X))
30 | return result
31 |
32 | if __name__ == '__main__':
33 | L, R = list(map(float, input().split()))
34 |     ODDS = L / R  # odds of a boy relative to a girl
35 |     TOTAL = list()
36 |     for i in range(3, 7):  # at least 3 boys out of 6 children
37 |         TOTAL.append(binomial(i, 6, ODDS / (1 + ODDS)))  # P(boy) = odds/(1 + odds)
38 | print(round(sum(TOTAL), 3))
39 |
--------------------------------------------------------------------------------
/Statistics/Practice/13 - Day 4 - Binomial Distribution II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-binomial-distribution-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | P = 0.12  # probability that a piston is rejected
17 | ANS_1 = 0
18 |
19 | for i in range(0, 3):  # ANS_1 accumulates P(X <= 2)
20 |     ANS_1 += math.factorial(10)/math.factorial(i)/math.factorial(10-i) * P**i * (1-P)**(10-i)
21 |     if i == 1:
22 |         ANS_2 = 1 - ANS_1  # P(X >= 2) = 1 - P(X <= 1)
23 |
24 | print(round(ANS_1, 3))
25 | print(round(ANS_2, 3))
26 |
--------------------------------------------------------------------------------
/Statistics/Practice/14 - Day 4 - Geometric Distribution I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-geometric-distribution-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | A, B = map(int, input().strip().split(' '))
15 | C = int(input())
16 |
17 | P = float(A/B)
18 |
19 | RES = (1-P) ** (C-1) * P
20 |
21 | print(round(RES, 3))
22 |
--------------------------------------------------------------------------------
/Statistics/Practice/15 - Day 4 - Geometric Distribution II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-geometric-distribution-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def geometric_prob(P, X):
15 | '''Function to calculate the geometric probability'''
16 | G = (1-P)**(X-1) * P
17 | return G
18 |
19 | NUMERATOR, DENOMINATOR = map(float, input().split())
20 | X = int(input())
21 | P = NUMERATOR/DENOMINATOR
22 | G = 0
23 |
24 | for i in range(1, 6): # i = 1, 2, 3, 4, 5
25 | G += geometric_prob(P, i)
26 |
27 | print("%.3f" %G)
28 |
--------------------------------------------------------------------------------
/Statistics/Practice/16 - Day 5 - Poisson Distribution I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-poisson-distribution-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | from math import factorial, exp
15 |
16 | MEAN = float(input())
17 | K = int(input())
18 |
19 | POISSON = ((MEAN ** K) * exp(-MEAN)) / factorial(K)    # Poisson pmf: P(X = K)
20 |
21 | print("%.3f" % POISSON)
22 |
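23 | # Optional cross-check (assumes scipy):
24 | # from scipy.stats import poisson
25 | # print("%.3f" % poisson.pmf(K, MEAN))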
--------------------------------------------------------------------------------
/Statistics/Practice/17 - Day 5 - Poisson Distribution II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-poisson-distribution-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | AVERAGE_X, AVERAGE_Y = [float(num) for num in input().split(" ")]
15 |
16 | # Expected daily cost is linear in E[X^2]; for a Poisson variable with
17 | # mean lambda, E[X^2] = Var(X) + E[X]^2 = lambda + lambda^2.
18 | COST_X = 160 + 40*(AVERAGE_X + AVERAGE_X**2)
19 | COST_Y = 128 + 40*(AVERAGE_Y + AVERAGE_Y**2)
20 |
21 | print(round(COST_X, 3))
22 | print(round(COST_Y, 3))
23 |
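24 | # Optional cross-check (assumes scipy): the second moment can be queried directly.
25 | # from scipy.stats import poisson
26 | # print(poisson(AVERAGE_X).moment(2))   # equals AVERAGE_X + AVERAGE_X**2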
--------------------------------------------------------------------------------
/Statistics/Practice/18 - Day 5 - Normal Distribution I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-normal-distribution-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | MU = 20
17 | SD = 2
18 |
19 | def normal_cdf(X, MU, SD):
20 |     '''CDF of the normal distribution N(MU, SD^2), written via the error function'''
21 |     return 1/2*(1+math.erf((X-MU)/(SD*math.sqrt(2))))
22 |
23 | RESULT_1 = normal_cdf(19.5, MU, SD)
24 | RESULT_2 = normal_cdf(22, MU, SD) - normal_cdf(20, MU, SD)
25 |
26 | print(round(RESULT_1, 3))
27 | print(round(RESULT_2, 3))
28 |
29 | # .erf() -> https://docs.python.org/3.5/library/math.html#math.erf
30 |
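31 | # Optional cross-check (assumes scipy):
32 | # from scipy.stats import norm
33 | # print(round(norm.cdf(19.5, MU, SD), 3))
34 | # print(round(norm.cdf(22, MU, SD) - norm.cdf(20, MU, SD), 3))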
--------------------------------------------------------------------------------
/Statistics/Practice/19 - Day 5 - Normal Distribution II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-normal-distribution-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | MU, SD = list(map(float, input().rstrip().split()))
17 | X_1 = float(input())
18 | X_2 = float(input())
19 |
20 | def normal_distribution(X, MU, SD):
21 |     '''CDF of the normal distribution N(MU, SD^2), written via the error function'''
22 |     return 1/2*(1+math.erf((X-MU)/(SD*math.sqrt(2))))
23 |
24 | # percentage of students scoring higher than 80
25 | print(round((1-normal_distribution(X_1, MU, SD))*100, 2))
26 |
27 | # percentage passing (grade >= 60)
28 | print(round((1-normal_distribution(X_2, MU, SD))*100, 2))
29 |
30 | # percentage failing (grade < 60)
31 | print(round((normal_distribution(X_2, MU, SD))*100, 2))
32 |
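33 | # Optional cross-check (assumes scipy): norm.sf is the survival function 1 - cdf.
34 | # from scipy.stats import norm
35 | # print(round(norm.sf(X_1, MU, SD)*100, 2))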
--------------------------------------------------------------------------------
/Statistics/Practice/20 - Day 6 - The Central Limit Theorem I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | H = int(input())                       # maximum total weight the elevator can carry
17 | B = int(input())                       # number of boxes
18 | C = int(input())                       # mean weight of a single box
19 | INPUT = math.sqrt(B) * int(input())    # std. dev. of the total weight: sqrt(n) * sigma
20 |
21 | # CLT: the total weight is approximately N(B*C, INPUT^2); print P(total <= H)
22 | print(round(0.5 * (1 + math.erf((H - (B * C)) / (INPUT * math.sqrt(2)))), 4))
23 |
--------------------------------------------------------------------------------
/Statistics/Practice/21 - Day 6 - The Central Limit Theorem II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | TICKETS = 250
17 | STUDENTS = 100
18 | MEAN = 2.4
19 | SD = 2
20 |
21 | MU = STUDENTS * MEAN
22 | S = math.sqrt(STUDENTS)*SD    # std. dev. of the total scales with sqrt(n)
23 |
24 | def normal_distribution(x, mu, sd):
25 |     '''CDF of the normal distribution N(mu, sd^2), written via the error function'''
26 |     return 1/2*(1+math.erf((x-mu)/(sd*math.sqrt(2))))
27 |
28 | print(round(normal_distribution(x=TICKETS, mu=MU, sd=S), 4))
29 |
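30 | # Optional cross-check (assumes scipy):
31 | # from scipy.stats import norm
32 | # print(round(norm.cdf(TICKETS, MU, S), 4))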
--------------------------------------------------------------------------------
/Statistics/Practice/22 - Day 6 - The Central Limit Theorem III.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-3/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | SAMPLE = 100
17 | M = 500
18 | SD = 80
19 | Z = 1.96      # two-sided z-score for the 95% confidence level
20 | RNG = 0.95
21 |
22 | print(round(M - Z * (SD/math.sqrt(SAMPLE)), 2))
23 | print(round(M + Z * (SD/math.sqrt(SAMPLE)), 2))
24 |
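25 | # Optional cross-check (assumes scipy): interval() returns the same two bounds.
26 | # from scipy.stats import norm
27 | # print(norm.interval(RNG, loc=M, scale=SD/math.sqrt(SAMPLE)))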
--------------------------------------------------------------------------------
/Statistics/Practice/23 - Day 7 - Pearson Correlation Coefficient I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def std(x):    # population standard deviation
15 |     return (sum([(i-(sum(x))/len(x))**2 for i in x])/len(x))**0.5
16 |
17 | N = int(input())
18 | X = list(map(float, input().split()))
19 | Y = list(map(float, input().split()))
20 |
21 | X_M = sum(X)/len(X)
22 | Y_M = sum(Y)/len(Y)
23 |
24 | X_S = std(X)
25 | Y_S = std(Y)
26 |
27 | print(round(sum([(i-X_M)*(j-Y_M) for i, j in zip(X, Y)])/(N*X_S*Y_S), 3))
28 |
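29 | # Optional cross-check (assumes scipy):
30 | # from scipy.stats import pearsonr
31 | # print(round(pearsonr(X, Y)[0], 3))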
--------------------------------------------------------------------------------
/Statistics/Practice/24 - Day 7 - Spearman's Rank Correlation.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | N = int(input())
15 | X = list(map(float, input().strip().split()))
16 | Y = list(map(float, input().strip().split()))
17 |
18 | X_COPY = X.copy()
19 |
20 | Y_COPY = Y.copy()
21 |
22 | X_COPY.sort()
23 |
24 | XD = dict(zip(X_COPY, range(1, N+1)))    # value -> rank; assumes all values are distinct
25 |
26 | Y_COPY.sort()
27 |
28 | YD = dict(zip(Y_COPY, range(1, N+1)))
29 |
30 | RX = [XD[i] for i in X]
31 |
32 | RY = [YD[i] for i in Y]
33 |
34 | print(round(1-(6*sum([(rx-ry)**2 for rx, ry in zip(RX, RY)]))/((N**3)-N), 3))
35 |
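36 | # Optional cross-check (assumes scipy; scipy.stats.rankdata would also handle
37 | # ties, which the dict-based ranking above does not):
38 | # from scipy.stats import spearmanr
39 | # print(round(spearmanr(X, Y)[0], 3))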
--------------------------------------------------------------------------------
/Statistics/Practice/25 - Day 8 - Least Sqaure Regression Line.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-least-square-regression-line/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def mean(X):
15 |     '''Arithmetic mean'''
16 |     return sum(X)/len(X)
17 |
18 | def lsr(X, Y):
19 |     '''Fit the least-squares line y = A + B*x and predict at x = 80, the value the problem asks for'''
20 |     B = sum([(X[i] - mean(X)) * (Y[i] - mean(Y)) for i in range(len(X))])/sum([(j - mean(X))**2 for j in X])
21 |     A = mean(Y) - (B*mean(X))
22 |     return A+(B*80)
23 |
24 | X = []
25 | Y = []
26 |
27 | for i in range(5):
28 |     A, B = list(map(int, input().split()))
29 |     X.append(A)
30 |     Y.append(B)
31 |
32 | print(round(lsr(X, Y), 3))
33 |
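34 | # Optional cross-check (assumes scipy):
35 | # from scipy.stats import linregress
36 | # FIT = linregress(X, Y)
37 | # print(round(FIT.intercept + FIT.slope*80, 3))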
--------------------------------------------------------------------------------
/Statistics/Practice/26 - Day 8 - Pearson Correlation Coefficient II.txt:
--------------------------------------------------------------------------------
1 | ========================
2 | Information
3 | ========================
4 |
5 | Direct Link: https://www.hackerrank.com/challenges/s10-mcq-7/problem
6 | Difficulty: Easy
7 | Max Score: 30
8 | Language: Python
9 | Multiple Choice Question - No code required
10 |
11 | ========================
12 | Solution
13 | ========================
14 |
15 | Rewriting the first line as y in terms of x, and the second as x in terms of y:
16 | y = (-3/4)*x - 2
17 | x = (-3/4)*y + (-7/4)
18 |
19 | c1 = -3/4
20 | c2 = -3/4
21 |
22 | Let x_std and y_std be the standard deviations of x and y. Since the regression slopes satisfy c1 = p*(y_std/x_std) and c2 = p*(x_std/y_std):
23 |
24 | p = c1(x_std / y_std)
25 | p = c2(y_std / x_std)
26 |
27 | Multiplying the two equations:
28 | p^2 = c1 * c2
29 | p^2 = (-3/4) * (-3/4)
30 | p^2 = 9/16
31 | p = +/-(3/4)
32 |
33 | Since x_std and y_std are both positive, p must share the sign of c1 and c2. Thus, p = -3/4.
34 |
35 | >>> -3/4
--------------------------------------------------------------------------------
/Statistics/Practice/27 - Day 9 - Multiple Linear Regression.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-multiple-linear-regression/problem
6 | # Difficulty: Medium
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | from sklearn import linear_model
15 |
16 | M, N = list(map(int, input().strip().split()))
17 | X = [0]*N
18 | Y = [0]*N
19 |
20 | for i in range(N):
21 |     inp = list(map(float, input().strip().split()))
22 |     X[i] = inp[:-1]    # first M columns are features
23 |     Y[i] = inp[-1]     # last column is the target
24 |
25 | LM = linear_model.LinearRegression()
26 | LM.fit(X, Y)
27 | A = LM.intercept_
28 | B = LM.coef_
29 |
30 | Q = int(input())
31 |
32 | for i in range(Q):
33 |     f = list(map(float, input().strip().split()))
34 |     Y_PRED = A + sum([B[j] * f[j] for j in range(M)])    # intercept + dot(coefficients, features)
35 |     print(round(Y_PRED, 2))
36 |
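37 | # Equivalent, using sklearn's own prediction (predict expects a 2-D array of rows):
38 | # print(round(LM.predict([f])[0], 2))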
--------------------------------------------------------------------------------
/Statistics/Practice/Readme.md:
--------------------------------------------------------------------------------
1 |
2 | # 10 Days of Statistics in HackerRank
3 |
4 | | Day | Challenge | Problem | Difficulty | Score | Solution |
5 | | :---: | :-------------------------------------: | :------------------------------------------------------------------------------------------------: | :--------: | :---: | :----------------------------------------------------------------------------------------------------------: |
6 | | 0 | Mean, Median, and Mode | [Problem](https://www.hackerrank.com/challenges/s10-basic-statistics/problem) | Easy | 30 | [Solution](/Statistics/Practice/01%20-%20Day%200%20-%20Mean,%20Median,%20and%20Mode.py) |
7 | | 0 | Weighted Mean | [Problem](https://www.hackerrank.com/challenges/s10-weighted-mean/problem) | Easy | 30 | [Solution](/Statistics/Practice/02%20-%20Day%200%20-%20Weighted%20Mean.py) |
8 | | 1 | Quartiles | [Problem](https://www.hackerrank.com/challenges/s10-quartiles/problem) | Easy | 30 | [Solution](/Statistics/Practice/03%20-%20Day%201%20-%20Quartiles.py) |
9 | | 1 | Interquartile Range | [Problem](https://www.hackerrank.com/challenges/s10-interquartile-range) | Easy | 30 | [Solution](/Statistics/Practice/04%20-%20Day%201%20-%20Interquartile%20Range.py) |
10 | | 1 | Standard Deviation | [Problem](https://www.hackerrank.com/challenges/s10-standard-deviation) | Easy | 30 | [Solution](/Statistics/Practice/05%20-%20Day%201%20-%20Standard%20Deviation.py) |
11 | | 2 | Basic Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-1/problem) | Easy | 10 | [Solution](/Statistics/Practice/06%20-%20Day%202%20-%20Basic%20Probability.py) |
12 | | 2 | More Dice | [Problem](https://www.hackerrank.com/challenges/s10-mcq-2/problem) | Easy | 10 | [Solution](/Statistics/Practice/07%20-%20Day%202%20-%20More%20Dice.py) |
13 | | 2 | Compound Event Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-3/problem) | Easy | 10 | [Solution](/Statistics/Practice/08%20-%20Day%202%20-%20Compound%20Event%20Probability.py) |
14 | | 3 | Conditional Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-4/problem) | Easy | 10 | [Solution](/Statistics/Practice/09%20-%20Day%203%20-%20Conditional%20Probability.py) |
15 | | 3 | Cards of the Same Suit | [Problem](https://www.hackerrank.com/challenges/s10-mcq-5/problem) | Easy | 10 | [Solution](/Statistics/Practice/10%20-%20Day%203%20-%20Cards%20of%20the%20Same%20Suit.txt) |
16 | | 3 | Drawing Marbles | [Problem](https://www.hackerrank.com/challenges/s10-mcq-6/problem) | Easy | 10 | [Solution](/Statistics/Practice/11%20-%20Day%203%20-%20Drawing%20Marbles.py) |
17 | | 4 | Binomial Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-binomial-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/12%20-%20Day%204%20-%20Binomial%20Distribution%20I.py) |
18 | | 4 | Binomial Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-binomial-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/13%20-%20Day%204%20-%20Binomial%20Distribution%20II.py) |
19 | | 4 | Geometric Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-geometric-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/14%20-%20Day%204%20-%20Geometric%20Distribution%20I.py) |
20 | | 4 | Geometric Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-geometric-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/15%20-%20Day%204%20-%20Geometric%20Distribution%20II.py) |
21 | | 5 | Poisson Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-poisson-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/16%20-%20Day%205%20-%20Poisson%20Distribution%20I.py) |
22 | | 5 | Poisson Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-poisson-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/17%20-%20Day%205%20-%20Poisson%20Distribution%20II.py) |
23 | | 5 | Normal Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-normal-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/18%20-%20Day%205%20-%20Normal%20Distribution%20I.py) |
24 | | 5 | Normal Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-normal-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/19%20-%20Day%205%20-%20Normal%20Distribution%20II.py) |
25 | | 6 | The Central Limit Theorem I | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/20%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20I.py) |
26 | | 6 | The Central Limit Theorem II | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/21%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20II.py) |
27 | | 6 | The Central Limit Theorem III | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-3/problem) | Easy | 30 | [Solution](/Statistics/Practice/22%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20III.py) |
28 | | 7 | Pearson Correlation Coefficient I | [Problem](https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/problem) | Easy | 30 | [Solution](/Statistics/Practice/23%20-%20Day%207%20-%20Pearson%20Correlation%20Coefficient%20I.py) |
29 | | 7 | Spearman's Rank Correlation Coefficient | [Problem](https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/problem) | Easy | 30 | [Solution](/Statistics/Practice/24%20-%20Day%207%20-%20Spearman's%20Rank%20Correlation.py) |
30 | | 8 | Least Square Regression Line | [Problem](https://www.hackerrank.com/challenges/s10-least-square-regression-line/problem) | Easy | 30 | [Solution](/Statistics/Practice/25%20-%20Day%208%20-%20Least%20Sqaure%20Regression%20Line.py) |
31 | | 8 | Pearson Correlation Coefficient II | [Problem](https://www.hackerrank.com/challenges/s10-mcq-7/problem) | Easy | 30 | [Solution](/Statistics/Practice/26%20-%20Day%208%20-%20Pearson%20Correlation%20Coefficient%20II.txt) |
32 | | 9 | Multiple Linear Regression | [Problem](https://www.hackerrank.com/challenges/s10-multiple-linear-regression/problem) | Medium | 30 | [Solution](/Statistics/Practice/27%20-%20Day%209%20-%20Multiple%20Linear%20Regression.py) |
33 |
--------------------------------------------------------------------------------
/Statistics/README.md:
--------------------------------------------------------------------------------
1 | Statistics is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
2 |
--------------------------------------------------------------------------------