├── .gitignore ├── Assignment ├── README.md └── titles.ipynb ├── Data Visualization ├── Matplotlib │ ├── DV2021EN-0101-Basic Plotting.ipynb │ ├── DV2021EN-0102-Basic Color and Line Styles.ipynb │ ├── DV2021EN-0103-Simple Line Plots with Line Styles.ipynb │ ├── DV2021EN-0104-Simple Scatter Plots.ipynb │ ├── DV2021EN-0105-Errorbars.ipynb │ ├── DV2021EN-0106-Density_and_Contour_Plots.ipynb │ ├── DV2021EN-0107-Histograms_and_Binnings.ipynb │ ├── DV2021EN-0108-Customizing_Legends.ipynb │ ├── Hough-Circle-Transform-Opencv.py │ ├── MP0101- Matplotlib Complite .ipynb │ ├── README.md │ ├── Scatter plot.ipynb │ ├── barh.py │ ├── img │ │ └── Text.png │ ├── pieplot.py │ ├── plot_camera_numpy.py │ ├── polar_scatter.ipynb │ ├── sample_plots.ipynb │ ├── water Mark.ipynb │ └── watermark_image.py ├── README.md └── SeaBorn │ └── README.md ├── Data ├── Boston │ ├── README.md │ ├── housing.data.txt │ └── housing.names.txt ├── README.md ├── iris │ ├── README.md │ └── iris.csv └── titles.csv ├── LICENSE ├── Life Cycle Process of Data Science In Real World project ├── DSPD0101ENT-Business Understanding(Problem)-to-Analytic Approach.ipynb ├── DSPD0101ENT-Business Understanding.ipynb ├── DSPD0102ENT-Understanding-to-Preparation.ipynb ├── DSPD0103ENT-Requirements-to-Collection-py.ipynb ├── DSPD0104ENT-Modeling-to-Evaluation-py.ipynb ├── IBMOpenSource_FoundationalMethologyforDataScience.PDF └── README.md ├── Modeling ├── README.md ├── Semi Supervised Learning │ └── README.md ├── Supervised Learning │ └── README.md └── Unsupervised Learning │ └── README.md ├── Numpy ├── README.md └── img │ ├── README.md │ └── Where we use numpy.png ├── Pandas ├── DSPD0100ENT-Business Understanding.ipynb ├── DSPD0101EN-Introduction-to-Pandas.ipynb ├── DSPD0102EN-Pandas Series,Data Frame and Index.ipynb ├── DSPD0103EN-Data Collection & Data Source.ipynb ├── DSPD0104EN-Data_Loading.ipynb ├── DSPD0105EN-Missing-Values.ipynb ├── DSPD0106EN - Loading data from SQL databases.ipynb ├── DSPD0107EN-Operations-in-Pandas.ipynb ├── DSPD0108EN-Working-With-Strings.ipynb ├── DSPD0109EN-Hierarchical-Indexing.ipynb ├── DSPD0110EN-Merge-and-Join.ipynb ├── DSPD0111-Handling Missing Values with Numpy and Pandas.ipynb ├── Predicting_Credit_Risk_Model_Pipeline.ipynb └── README.md ├── README.md └── Statistics ├── DS0101-summary-stats.ipynb ├── DS0102-Exploratory Data Analysis(EDA).ipynb ├── DS0103-Probability Density Functions(pdf).ipynb ├── DS0104-Probability Mass Functions.ipynb ├── DS0105-Hypothesis-Testing.ipynb ├── DS0106-Bootstrapping.ipynb ├── DS0107-Covariance and Correlation.ipynb ├── DS0108-Linear-Reqression-LeastSquares.ipynb ├── DS0109-Data Distributions.ipynb ├── DS0110-Probability distributions.ipynb ├── Data └── README.md ├── Practice ├── 01 - Day 0 - Mean, Median, and Mode.py ├── 02 - Day 0 - Weighted Mean.py ├── 03 - Day 1 - Quartiles.py ├── 04 - Day 1 - Interquartile Range.py ├── 05 - Day 1 - Standard Deviation.py ├── 06 - Day 2 - Basic Probability.py ├── 07 - Day 2 - More Dice.py ├── 08 - Day 2 - Compound Event Probability.py ├── 09 - Day 3 - Conditional Probability.py ├── 10 - Day 3 - Cards of the Same Suit.txt ├── 11 - Day 3 - Drawing Marbles.py ├── 12 - Day 4 - Binomial Distribution I.py ├── 13 - Day 4 - Binomial Distribution II.py ├── 14 - Day 4 - Geometric Distribution I.py ├── 15 - Day 4 - Geometric Distribution II.py ├── 16 - Day 5 - Poisson Distribution I.py ├── 17 - Day 5 - Poisson Distribution II.py ├── 18 - Day 5 - Normal Distribution I.py ├── 19 - Day 5 - Normal Distribution II.py ├── 20 - Day 6 - The Central Limit Theorem 
I.py ├── 21 - Day 6 - The Central Limit Theorem II.py ├── 22 - Day 6 - The Central Limit Theorem III.py ├── 23 - Day 7 - Pearson Correlation Coefficient I.py ├── 24 - Day 7 - Spearman's Rank Correlation.py ├── 25 - Day 8 - Least Sqaure Regression Line.py ├── 26 - Day 8 - Pearson Correlation Coefficient II.txt ├── 27 - Day 9 - Multiple Linear Regression.py └── Readme.md └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | MANIFEST 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | db.sqlite3 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # Environments 85 | .env 86 | .venv 87 | env/ 88 | venv/ 89 | ENV/ 90 | env.bak/ 91 | venv.bak/ 92 | 93 | # Spyder project settings 94 | .spyderproject 95 | .spyproject 96 | 97 | # Rope project settings 98 | .ropeproject 99 | 100 | # mkdocs documentation 101 | /site 102 | 103 | # mypy 104 | .mypy_cache/ 105 | -------------------------------------------------------------------------------- /Assignment/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Assignment/titles.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "source": [ 6 | "%matplotlib inline\n", 7 | "import pandas as pd" 8 | ], 9 | "outputs": [], 10 | "execution_count": 1, 11 | "metadata": { 12 | "execution": { 13 | "iopub.status.busy": "2020-08-02T15:59:48.826Z", 14 | "iopub.execute_input": "2020-08-02T15:59:48.863Z", 15 | "shell.execute_reply": "2020-08-02T15:59:54.394Z", 16 | "iopub.status.idle": "2020-08-02T15:59:54.440Z" 17 | } 18 | } 19 | }, 20 | { 21 | "cell_type": "code", 22 | "source": [ 23 | "titles = pd.read_csv('titles.csv')\n", 24 | "titles.head()" 25 | ], 26 | "outputs": [ 27 | { 28 | "output_type": "execute_result", 29 | "execution_count": 3, 30 | "data": { 31 | "text/plain": " title year\n0 Tasveer Mere Sanam 1996\n1 Only You 1994\n2 El pueblo del terror 1970\n3 Machine 2007\n4 MARy 2008", 32 | "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
titleyear
0Tasveer Mere Sanam1996
1Only You1994
2El pueblo del terror1970
3Machine2007
4MARy2008
\n
" 33 | }, 34 | "metadata": {} 35 | } 36 | ], 37 | "execution_count": 3, 38 | "metadata": { 39 | "execution": { 40 | "iopub.status.busy": "2020-08-02T16:00:03.676Z", 41 | "iopub.execute_input": "2020-08-02T16:00:03.696Z", 42 | "iopub.status.idle": "2020-08-02T16:00:03.813Z", 43 | "shell.execute_reply": "2020-08-02T16:00:03.852Z" 44 | } 45 | } 46 | }, 47 | { 48 | "cell_type": "code", 49 | "source": [ 50 | "\n", 51 | "titles.tail()" 52 | ], 53 | "outputs": [ 54 | { 55 | "output_type": "execute_result", 56 | "execution_count": 6, 57 | "data": { 58 | "text/plain": " title year\n244909 Black Butterfly in a Colorful World 2018\n244910 Hua fei hua wu chun man cheng 1980\n244911 Nippon dabi katsukyu 1970\n244912 Under Siege 2: Dark Territory 1995\n244913 She Must Be Seeing Things 1987", 59 | "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
titleyear
244909Black Butterfly in a Colorful World2018
244910Hua fei hua wu chun man cheng1980
244911Nippon dabi katsukyu1970
244912Under Siege 2: Dark Territory1995
244913She Must Be Seeing Things1987
\n
" 60 | }, 61 | "metadata": {} 62 | } 63 | ], 64 | "execution_count": 6, 65 | "metadata": { 66 | "execution": { 67 | "iopub.status.busy": "2020-08-02T16:08:53.347Z", 68 | "iopub.execute_input": "2020-08-02T16:08:53.373Z", 69 | "iopub.status.idle": "2020-08-02T16:08:53.417Z", 70 | "shell.execute_reply": "2020-08-02T16:08:53.434Z" 71 | } 72 | } 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "source": [ 77 | "### How many movies are listed in the titles dataframe?" 78 | ], 79 | "metadata": { 80 | "collapsed": true 81 | } 82 | }, 83 | { 84 | "cell_type": "code", 85 | "source": [], 86 | "outputs": [ 87 | { 88 | "output_type": "stream", 89 | "name": "stdout", 90 | "text": [ 91 | "Index(['title', 'year'], dtype='object')\n" 92 | ] 93 | } 94 | ], 95 | "execution_count": 8, 96 | "metadata": { 97 | "execution": { 98 | "iopub.status.busy": "2020-08-02T16:11:11.003Z", 99 | "iopub.execute_input": "2020-08-02T16:11:11.026Z", 100 | "iopub.status.idle": "2020-08-02T16:11:11.114Z", 101 | "shell.execute_reply": "2020-08-02T16:11:11.135Z" 102 | } 103 | } 104 | }, 105 | { 106 | "cell_type": "code", 107 | "source": [], 108 | "outputs": [], 109 | "execution_count": null, 110 | "metadata": { 111 | "collapsed": true 112 | } 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "source": [ 117 | "### What are the earliest two films listed in the titles dataframe?" 118 | ], 119 | "metadata": { 120 | "collapsed": true 121 | } 122 | }, 123 | { 124 | "cell_type": "code", 125 | "source": [], 126 | "outputs": [], 127 | "execution_count": null, 128 | "metadata": {} 129 | }, 130 | { 131 | "cell_type": "code", 132 | "source": [], 133 | "outputs": [], 134 | "execution_count": null, 135 | "metadata": { 136 | "collapsed": true 137 | } 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "source": [ 142 | "### How many movies have the title \"Hamlet\"?" 143 | ], 144 | "metadata": { 145 | "collapsed": true 146 | } 147 | }, 148 | { 149 | "cell_type": "code", 150 | "source": [ 151 | "\n" 152 | ], 153 | "outputs": [], 154 | "execution_count": null, 155 | "metadata": {} 156 | }, 157 | { 158 | "cell_type": "code", 159 | "source": [], 160 | "outputs": [], 161 | "execution_count": null, 162 | "metadata": {} 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "source": [ 167 | "### How many movies are titled \"North by Northwest\"?" 168 | ], 169 | "metadata": { 170 | "collapsed": true 171 | } 172 | }, 173 | { 174 | "cell_type": "code", 175 | "source": [ 176 | "\n" 177 | ], 178 | "outputs": [], 179 | "execution_count": null, 180 | "metadata": {} 181 | }, 182 | { 183 | "cell_type": "code", 184 | "source": [], 185 | "outputs": [], 186 | "execution_count": null, 187 | "metadata": { 188 | "collapsed": true 189 | } 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "source": [ 194 | "### When was the first movie titled \"Hamlet\" made?" 195 | ], 196 | "metadata": { 197 | "collapsed": true 198 | } 199 | }, 200 | { 201 | "cell_type": "code", 202 | "source": [], 203 | "outputs": [], 204 | "execution_count": 19, 205 | "metadata": {} 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "source": [ 210 | "### List all of the \"Treasure Island\" movies from earliest to most recent." 211 | ], 212 | "metadata": { 213 | "collapsed": true 214 | } 215 | }, 216 | { 217 | "cell_type": "code", 218 | "source": [], 219 | "outputs": [], 220 | "execution_count": null, 221 | "metadata": { 222 | "collapsed": true 223 | } 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "source": [ 228 | "### How many movies were made in the year 1950?" 
229 | ], 230 | "metadata": { 231 | "collapsed": true 232 | } 233 | }, 234 | { 235 | "cell_type": "code", 236 | "source": [], 237 | "outputs": [], 238 | "execution_count": null, 239 | "metadata": { 240 | "collapsed": true 241 | } 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "source": [ 246 | "### How many movies were made in the year 1960?" 247 | ], 248 | "metadata": { 249 | "collapsed": true 250 | } 251 | }, 252 | { 253 | "cell_type": "code", 254 | "source": [], 255 | "outputs": [], 256 | "execution_count": null, 257 | "metadata": { 258 | "collapsed": true 259 | } 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "source": [ 264 | "### How many movies were made from 1950 through 1959?" 265 | ], 266 | "metadata": { 267 | "collapsed": true 268 | } 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "source": [], 273 | "metadata": {} 274 | }, 275 | { 276 | "cell_type": "code", 277 | "source": [], 278 | "outputs": [], 279 | "execution_count": null, 280 | "metadata": { 281 | "collapsed": true 282 | } 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "source": [ 287 | "### In what years has a movie titled \"Batman\" been released?" 288 | ], 289 | "metadata": { 290 | "collapsed": true 291 | } 292 | }, 293 | { 294 | "cell_type": "code", 295 | "source": [], 296 | "outputs": [], 297 | "execution_count": null, 298 | "metadata": { 299 | "collapsed": true 300 | } 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "source": [ 305 | "### How many roles were there in the movie \"Inception\"?" 306 | ], 307 | "metadata": { 308 | "collapsed": true 309 | } 310 | }, 311 | { 312 | "cell_type": "code", 313 | "source": [], 314 | "outputs": [], 315 | "execution_count": null, 316 | "metadata": { 317 | "collapsed": true 318 | } 319 | }, 320 | { 321 | "cell_type": "code", 322 | "source": [], 323 | "outputs": [], 324 | "execution_count": null, 325 | "metadata": { 326 | "collapsed": true 327 | } 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "source": [ 332 | "### How many roles in the movie \"Inception\" are NOT ranked by an \"n\" value?" 333 | ], 334 | "metadata": { 335 | "collapsed": true 336 | } 337 | }, 338 | { 339 | "cell_type": "code", 340 | "source": [], 341 | "outputs": [], 342 | "execution_count": null, 343 | "metadata": { 344 | "collapsed": true 345 | } 346 | }, 347 | { 348 | "cell_type": "code", 349 | "source": [], 350 | "outputs": [], 351 | "execution_count": null, 352 | "metadata": { 353 | "collapsed": true 354 | } 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "source": [ 359 | "### But how many roles in the movie \"Inception\" did receive an \"n\" value?" 360 | ], 361 | "metadata": { 362 | "collapsed": true 363 | } 364 | }, 365 | { 366 | "cell_type": "code", 367 | "source": [], 368 | "outputs": [], 369 | "execution_count": null, 370 | "metadata": { 371 | "collapsed": true 372 | } 373 | }, 374 | { 375 | "cell_type": "code", 376 | "source": [], 377 | "outputs": [], 378 | "execution_count": null, 379 | "metadata": { 380 | "collapsed": true 381 | } 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "source": [ 386 | "### Display the cast of \"North by Northwest\" in their correct \"n\"-value order, ignoring roles that did not earn a numeric \"n\" value." 
387 | ], 388 | "metadata": { 389 | "collapsed": true 390 | } 391 | }, 392 | { 393 | "cell_type": "code", 394 | "source": [], 395 | "outputs": [], 396 | "execution_count": null, 397 | "metadata": { 398 | "collapsed": true 399 | } 400 | }, 401 | { 402 | "cell_type": "code", 403 | "source": [], 404 | "outputs": [], 405 | "execution_count": null, 406 | "metadata": { 407 | "collapsed": true 408 | } 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "source": [ 413 | "### Display the entire cast, in \"n\"-order, of the 1972 film \"Sleuth\"." 414 | ], 415 | "metadata": { 416 | "collapsed": true 417 | } 418 | }, 419 | { 420 | "cell_type": "code", 421 | "source": [], 422 | "outputs": [], 423 | "execution_count": null, 424 | "metadata": { 425 | "collapsed": true 426 | } 427 | }, 428 | { 429 | "cell_type": "code", 430 | "source": [], 431 | "outputs": [], 432 | "execution_count": null, 433 | "metadata": { 434 | "collapsed": true 435 | } 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "source": [ 440 | "### Now display the entire cast, in \"n\"-order, of the 2007 version of \"Sleuth\"." 441 | ], 442 | "metadata": { 443 | "collapsed": true 444 | } 445 | }, 446 | { 447 | "cell_type": "code", 448 | "source": [], 449 | "outputs": [], 450 | "execution_count": null, 451 | "metadata": { 452 | "collapsed": true 453 | } 454 | }, 455 | { 456 | "cell_type": "code", 457 | "source": [], 458 | "outputs": [], 459 | "execution_count": null, 460 | "metadata": { 461 | "collapsed": true 462 | } 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "source": [ 467 | "### How many roles were credited in the silent 1921 version of Hamlet?" 468 | ], 469 | "metadata": { 470 | "collapsed": true 471 | } 472 | }, 473 | { 474 | "cell_type": "code", 475 | "source": [], 476 | "outputs": [], 477 | "execution_count": null, 478 | "metadata": { 479 | "collapsed": true 480 | } 481 | }, 482 | { 483 | "cell_type": "code", 484 | "source": [], 485 | "outputs": [], 486 | "execution_count": null, 487 | "metadata": { 488 | "collapsed": true 489 | } 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "source": [ 494 | "### How many roles were credited in Branagh’s 1996 Hamlet?" 495 | ], 496 | "metadata": { 497 | "collapsed": true 498 | } 499 | }, 500 | { 501 | "cell_type": "code", 502 | "source": [], 503 | "outputs": [], 504 | "execution_count": null, 505 | "metadata": { 506 | "collapsed": true 507 | } 508 | }, 509 | { 510 | "cell_type": "code", 511 | "source": [], 512 | "outputs": [], 513 | "execution_count": null, 514 | "metadata": { 515 | "collapsed": true 516 | } 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "source": [ 521 | "### How many \"Hamlet\" roles have been listed in all film credits through history?" 522 | ], 523 | "metadata": { 524 | "collapsed": true 525 | } 526 | }, 527 | { 528 | "cell_type": "code", 529 | "source": [], 530 | "outputs": [], 531 | "execution_count": null, 532 | "metadata": { 533 | "collapsed": true 534 | } 535 | }, 536 | { 537 | "cell_type": "code", 538 | "source": [], 539 | "outputs": [], 540 | "execution_count": null, 541 | "metadata": { 542 | "collapsed": true 543 | } 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "source": [ 548 | "### How many people have played an \"Ophelia\"?" 
549 | ], 550 | "metadata": { 551 | "collapsed": true 552 | } 553 | }, 554 | { 555 | "cell_type": "code", 556 | "source": [], 557 | "outputs": [], 558 | "execution_count": null, 559 | "metadata": { 560 | "collapsed": true 561 | } 562 | }, 563 | { 564 | "cell_type": "code", 565 | "source": [], 566 | "outputs": [], 567 | "execution_count": null, 568 | "metadata": { 569 | "collapsed": true 570 | } 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "source": [ 575 | "### How many people have played a role called \"The Dude\"?" 576 | ], 577 | "metadata": { 578 | "collapsed": true 579 | } 580 | }, 581 | { 582 | "cell_type": "code", 583 | "source": [], 584 | "outputs": [], 585 | "execution_count": null, 586 | "metadata": { 587 | "collapsed": true 588 | } 589 | }, 590 | { 591 | "cell_type": "code", 592 | "source": [], 593 | "outputs": [], 594 | "execution_count": null, 595 | "metadata": { 596 | "collapsed": true 597 | } 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "source": [ 602 | "### How many people have played a role called \"The Stranger\"?" 603 | ], 604 | "metadata": { 605 | "collapsed": true 606 | } 607 | }, 608 | { 609 | "cell_type": "code", 610 | "source": [], 611 | "outputs": [], 612 | "execution_count": null, 613 | "metadata": { 614 | "collapsed": true 615 | } 616 | }, 617 | { 618 | "cell_type": "code", 619 | "source": [], 620 | "outputs": [], 621 | "execution_count": null, 622 | "metadata": { 623 | "collapsed": true 624 | } 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "source": [ 629 | "### How many roles has Sidney Poitier played throughout his career?" 630 | ], 631 | "metadata": { 632 | "collapsed": true 633 | } 634 | }, 635 | { 636 | "cell_type": "code", 637 | "source": [], 638 | "outputs": [], 639 | "execution_count": null, 640 | "metadata": { 641 | "collapsed": true 642 | } 643 | }, 644 | { 645 | "cell_type": "code", 646 | "source": [], 647 | "outputs": [], 648 | "execution_count": null, 649 | "metadata": { 650 | "collapsed": true 651 | } 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "source": [ 656 | "### How many roles has Judi Dench played?" 657 | ], 658 | "metadata": { 659 | "collapsed": true 660 | } 661 | }, 662 | { 663 | "cell_type": "code", 664 | "source": [], 665 | "outputs": [], 666 | "execution_count": null, 667 | "metadata": { 668 | "collapsed": true 669 | } 670 | }, 671 | { 672 | "cell_type": "code", 673 | "source": [], 674 | "outputs": [], 675 | "execution_count": null, 676 | "metadata": { 677 | "collapsed": true 678 | } 679 | }, 680 | { 681 | "cell_type": "markdown", 682 | "source": [ 683 | "### List the supporting roles (having n=2) played by Cary Grant in the 1940s, in order by year." 684 | ], 685 | "metadata": { 686 | "collapsed": true 687 | } 688 | }, 689 | { 690 | "cell_type": "code", 691 | "source": [], 692 | "outputs": [], 693 | "execution_count": null, 694 | "metadata": { 695 | "collapsed": true 696 | } 697 | }, 698 | { 699 | "cell_type": "code", 700 | "source": [], 701 | "outputs": [], 702 | "execution_count": null, 703 | "metadata": { 704 | "collapsed": true 705 | } 706 | }, 707 | { 708 | "cell_type": "markdown", 709 | "source": [ 710 | "### List the leading roles that Cary Grant played in the 1940s in order by year." 
711 | ], 712 | "metadata": { 713 | "collapsed": true 714 | } 715 | }, 716 | { 717 | "cell_type": "code", 718 | "source": [], 719 | "outputs": [], 720 | "execution_count": null, 721 | "metadata": { 722 | "collapsed": true 723 | } 724 | }, 725 | { 726 | "cell_type": "code", 727 | "source": [], 728 | "outputs": [], 729 | "execution_count": null, 730 | "metadata": { 731 | "collapsed": true 732 | } 733 | }, 734 | { 735 | "cell_type": "markdown", 736 | "source": [ 737 | "### How many roles were available for actors in the 1950s?" 738 | ], 739 | "metadata": { 740 | "collapsed": true 741 | } 742 | }, 743 | { 744 | "cell_type": "code", 745 | "source": [], 746 | "outputs": [], 747 | "execution_count": null, 748 | "metadata": { 749 | "collapsed": true 750 | } 751 | }, 752 | { 753 | "cell_type": "code", 754 | "source": [], 755 | "outputs": [], 756 | "execution_count": null, 757 | "metadata": { 758 | "collapsed": true 759 | } 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "source": [ 764 | "### How many roles were available for actresses in the 1950s?" 765 | ], 766 | "metadata": { 767 | "collapsed": true 768 | } 769 | }, 770 | { 771 | "cell_type": "code", 772 | "source": [], 773 | "outputs": [], 774 | "execution_count": null, 775 | "metadata": { 776 | "collapsed": true 777 | } 778 | }, 779 | { 780 | "cell_type": "code", 781 | "source": [], 782 | "outputs": [], 783 | "execution_count": null, 784 | "metadata": { 785 | "collapsed": true 786 | } 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "source": [ 791 | "### How many leading roles (n=1) were available from the beginning of film history through 1980?" 792 | ], 793 | "metadata": { 794 | "collapsed": true 795 | } 796 | }, 797 | { 798 | "cell_type": "code", 799 | "source": [], 800 | "outputs": [], 801 | "execution_count": null, 802 | "metadata": { 803 | "collapsed": true 804 | } 805 | }, 806 | { 807 | "cell_type": "code", 808 | "source": [], 809 | "outputs": [], 810 | "execution_count": null, 811 | "metadata": { 812 | "collapsed": true 813 | } 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "source": [ 818 | "### How many non-leading roles were available from the beginning of film history through 1980?" 819 | ], 820 | "metadata": { 821 | "collapsed": true 822 | } 823 | }, 824 | { 825 | "cell_type": "code", 826 | "source": [], 827 | "outputs": [], 828 | "execution_count": null, 829 | "metadata": { 830 | "collapsed": true 831 | } 832 | }, 833 | { 834 | "cell_type": "code", 835 | "source": [], 836 | "outputs": [], 837 | "execution_count": null, 838 | "metadata": { 839 | "collapsed": true 840 | } 841 | }, 842 | { 843 | "cell_type": "markdown", 844 | "source": [ 845 | "### How many roles through 1980 were minor enough that they did not warrant a numeric \"n\" rank?"
846 | ], 847 | "metadata": { 848 | "collapsed": true 849 | } 850 | }, 851 | { 852 | "cell_type": "code", 853 | "source": [], 854 | "outputs": [], 855 | "execution_count": null, 856 | "metadata": { 857 | "collapsed": true 858 | } 859 | }, 860 | { 861 | "cell_type": "code", 862 | "source": [], 863 | "outputs": [], 864 | "execution_count": null, 865 | "metadata": { 866 | "collapsed": true 867 | } 868 | } 869 | ], 870 | "metadata": { 871 | "kernelspec": { 872 | "display_name": "Python 3", 873 | "language": "python", 874 | "name": "python3" 875 | }, 876 | "language_info": { 877 | "name": "python", 878 | "version": "3.6.8", 879 | "mimetype": "text/x-python", 880 | "codemirror_mode": { 881 | "name": "ipython", 882 | "version": 3 883 | }, 884 | "pygments_lexer": "ipython3", 885 | "nbconvert_exporter": "python", 886 | "file_extension": ".py" 887 | }, 888 | "nteract": { 889 | "version": "0.24.1" 890 | } 891 | }, 892 | "nbformat": 4, 893 | "nbformat_minor": 1 894 | } -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/Hough-Circle-Transform-Opencv.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | img = cv2.imread('eye.jpg',0)  # flag 0 loads the image as grayscale 5 | img = cv2.medianBlur(img,5)  # median blur to suppress noise and spurious circles 6 | cimg = cv2.cvtColor(img,cv2.COLOR_GRAY2BGR) 7 | 8 | circles = cv2.HoughCircles(img,cv2.HOUGH_GRADIENT,1,20, 9 | param1=50,param2=30,minRadius=0,maxRadius=0)  # param1: Canny edge threshold, param2: accumulator threshold; radius bounds of 0 mean no limit 10 | 11 | circles = np.uint16(np.around(circles))  # round centres and radii to integer pixels 12 | for i in circles[0,:]: 13 | # draw the outer circle 14 | cv2.circle(cimg,(i[0],i[1]),i[2],(0,255,0),2) 15 | # draw the center of the circle 16 | cv2.circle(cimg,(i[0],i[1]),2,(0,0,255),3) 17 | 18 | cv2.imshow('detected circles',cimg) 19 | cv2.waitKey(0) 20 | cv2.destroyAllWindows() 21 | 22 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/README.md: -------------------------------------------------------------------------------- 1 | Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
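2 | 3 | A minimal usage sketch (illustrative only: it assumes NumPy and Matplotlib are installed, and the sine-wave data below is made up for the example): 4 | 5 | ```python 6 | import numpy as np 7 | import matplotlib.pyplot as plt 8 | 9 | # made-up data: one period of a sine wave 10 | x = np.linspace(0, 2 * np.pi, 100) 11 | plt.plot(x, np.sin(x), label='sin(x)') 12 | plt.xlabel('x') 13 | plt.legend() 14 | plt.show() 15 | ```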
16 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/barh.py: -------------------------------------------------------------------------------- 1 | # Horizontal bar chart 2 | 3 | import matplotlib.pyplot as plt 4 | import numpy as np 5 | 6 | # Fixing random state for reproducibility 7 | np.random.seed(19680801) 8 | 9 | plt.rcdefaults() 10 | fig, ax = plt.subplots() 11 | 12 | # Example data 13 | people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim') 14 | y_pos = np.arange(len(people)) 15 | performance = 3 + 10 * np.random.rand(len(people)) 16 | error = np.random.rand(len(people)) 17 | 18 | ax.barh(y_pos, performance, xerr=error, align='center', 19 | color='blue', ecolor='black') 20 | ##ax.set_yticks(y_pos) 21 | ##ax.set_yticklabels(people) 22 | ##ax.invert_yaxis() # labels read top-to-bottom 23 | ##ax.set_xlabel('Performance') 24 | ##ax.set_title('How fast do you want to go today?') 25 | 26 | plt.show() 27 | ##import matplotlib.pyplot as plt 28 | ##import numpy as np 29 | ## 30 | ##np.random.seed(19680801) 31 | ##data = np.random.randn(2, 100) 32 | ## 33 | ##fig, axs = plt.subplots(3, 3, figsize=(5, 5)) 34 | ##axs[0, 0].hist(data[0]) 35 | ##axs[1, 0].scatter(data[0], data[1]) 36 | ##axs[0, 1].plot(data[0], data[1]) 37 | ##axs[1, 1].hist2d(data[0], data[1]) 38 | ##a = np.linspace(1,2,20) 39 | ##axs[0,2].barh(a[0],[2]) 40 | ## 41 | ## 42 | ##plt.show() 43 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/img/Text.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Data Visualization/Matplotlib/img/Text.png -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/pieplot.py: -------------------------------------------------------------------------------- 1 | import matplotlib.pyplot as plt 2 | # Data sets 3 | Group = ['EUL','PES','EFA','EDD','ELDR','EPP','UEN','OTHER'] 4 | Seats = [39,200,42,15,67,276,27,66] 5 | explode = (0,0,0,0,0,0.1,0,0) 6 | # Equal aspect ratio so the pie is drawn as a circle 7 | plt.axis('equal') 8 | # Give the title 9 | plt.title("European Parliament election, 2019") 10 | # Color codes 11 | colors = ['red','orangered','forestgreen','lemonchiffon','yellow','navy','royalblue','lightgrey'] 12 | plt.pie(x=Seats,colors=colors,autopct='%1.0f%%',explode=explode) 13 | plt.legend(loc="center right", labels=Group) 14 | plt.show() 15 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/plot_camera_numpy.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from skimage import data 3 | import matplotlib.pyplot as plt 4 | 5 | camera = data.camera()  # load scikit-image's sample grayscale "camera" image 6 | camera[:10] = 0  # slicing: black out the top ten rows 7 | mask = camera < 87  # boolean mask of the darker pixels 8 | camera[mask] = 255  # masking: set those dark pixels to white 9 | inds_x = np.arange(len(camera)) 10 | inds_y = (4 * inds_x) % len(camera)  # fancy indexing: zero out a wrapped diagonal of pixels 11 | camera[inds_x, inds_y] = 0 12 | 13 | l_x, l_y = camera.shape[0], camera.shape[1] 14 | X, Y = np.ogrid[:l_x, :l_y] 15 | outer_disk_mask = (X - l_x / 2)**2 + (Y - l_y / 2)**2 > (l_x / 2)**2  # True outside the inscribed disk 16 | camera[outer_disk_mask] = 0  # black out the corners outside the disk 17 | 18 | plt.figure(figsize=(4, 4)) 19 | plt.imshow(camera, cmap='gray', interpolation='nearest') 20 | plt.axis('off') 21 | plt.show()
22 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/sample_plots.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "\n# Sample plots in Matplotlib\n\n\nHere you'll find a host of example plots with the code that\ngenerated them.\n\n\nLine Plot\n=========\n\nHere's how to create a line plot with text labels using\n:func:`~matplotlib.pyplot.plot`.\n\n.. figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_simple_plot_001.png\n :target: ../../gallery/lines_bars_and_markers/simple_plot.html\n :align: center\n :scale: 50\n\n Simple Plot\n\n\nMultiple subplots in one figure\n===============================\n\nMultiple axes (i.e. subplots) are created with the\n:func:`~matplotlib.pyplot.subplot` function:\n\n.. figure:: ../../gallery/subplots_axes_and_figures/images/sphx_glr_subplot_001.png\n :target: ../../gallery/subplots_axes_and_figures/subplot.html\n :align: center\n :scale: 50\n\n Subplot\n\n\nImages\n======\n\nMatplotlib can display images (assuming equally spaced\nhorizontal dimensions) using the :func:`~matplotlib.pyplot.imshow` function.\n\n.. figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_image_demo_003.png\n :target: ../../gallery/images_contours_and_fields/image_demo.html\n :align: center\n :scale: 50\n\n Example of using :func:`~matplotlib.pyplot.imshow` to display a CT scan\n\n\n\nContouring and pseudocolor\n==========================\n\nThe :func:`~matplotlib.pyplot.pcolormesh` function can make a colored\nrepresentation of a two-dimensional array, even if the horizontal dimensions\nare unevenly spaced. The\n:func:`~matplotlib.pyplot.contour` function is another way to represent\nthe same data:\n\n.. figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_pcolormesh_levels_001.png\n :target: ../../gallery/images_contours_and_fields/pcolormesh_levels.html\n :align: center\n :scale: 50\n\n Example comparing :func:`~matplotlib.pyplot.pcolormesh` and :func:`~matplotlib.pyplot.contour` for plotting two-dimensional data\n\n\nHistograms\n==========\n\nThe :func:`~matplotlib.pyplot.hist` function automatically generates\nhistograms and returns the bin counts or probabilities:\n\n.. figure:: ../../gallery/statistics/images/sphx_glr_histogram_features_001.png\n :target: ../../gallery/statistics/histogram_features.html\n :align: center\n :scale: 50\n\n Histogram Features\n\n\n\nPaths\n=====\n\nYou can add arbitrary paths in Matplotlib using the\n:mod:`matplotlib.path` module:\n\n.. figure:: ../../gallery/shapes_and_collections/images/sphx_glr_path_patch_001.png\n :target: ../../gallery/shapes_and_collections/path_patch.html\n :align: center\n :scale: 50\n\n Path Patch\n\n\nThree-dimensional plotting\n==========================\n\nThe mplot3d toolkit (see `toolkit_mplot3d-tutorial` and\n`mplot3d-examples-index`) has support for simple 3d graphs\nincluding surface, wireframe, scatter, and bar charts.\n\n.. figure:: ../../gallery/mplot3d/images/sphx_glr_surface3d_001.png\n :target: ../../gallery/mplot3d/surface3d.html\n :align: center\n :scale: 50\n\n Surface3d\n\nThanks to John Porter, Jonathon Taylor, Reinier Heeres, and Ben Root for\nthe `mplot3d` toolkit. 
This toolkit is included with all standard Matplotlib\ninstalls.\n\n\n\nStreamplot\n==========\n\nThe :meth:`~matplotlib.pyplot.streamplot` function plots the streamlines of\na vector field. In addition to simply plotting the streamlines, it allows you\nto map the colors and/or line widths of streamlines to a separate parameter,\nsuch as the speed or local intensity of the vector field.\n\n.. figure:: ../../gallery/images_contours_and_fields/images/sphx_glr_plot_streamplot_001.png\n :target: ../../gallery/images_contours_and_fields/plot_streamplot.html\n :align: center\n :scale: 50\n\n Streamplot with various plotting options.\n\nThis feature complements the :meth:`~matplotlib.pyplot.quiver` function for\nplotting vector fields. Thanks to Tom Flannaghan and Tony Yu for adding the\nstreamplot function.\n\n\nEllipses\n========\n\nIn support of the `Phoenix `_\nmission to Mars (which used Matplotlib to display ground tracking of\nspacecraft), Michael Droettboom built on work by Charlie Moad to provide\nan extremely accurate 8-spline approximation to elliptical arcs (see\n:class:`~matplotlib.patches.Arc`), which are insensitive to zoom level.\n\n.. figure:: ../../gallery/shapes_and_collections/images/sphx_glr_ellipse_demo_001.png\n :target: ../../gallery/shapes_and_collections/ellipse_demo.html\n :align: center\n :scale: 50\n\n Ellipse Demo\n\n\nBar charts\n==========\n\nUse the :func:`~matplotlib.pyplot.bar` function to make bar charts, which\nincludes customizations such as error bars:\n\n.. figure:: ../../gallery/statistics/images/sphx_glr_barchart_demo_001.png\n :target: ../../gallery/statistics/barchart_demo.html\n :align: center\n :scale: 50\n\n Barchart Demo\n\nYou can also create stacked bars\n(`bar_stacked.py <../../gallery/lines_bars_and_markers/bar_stacked.html>`_),\nor horizontal bar charts\n(`barh.py <../../gallery/lines_bars_and_markers/barh.html>`_).\n\n\n\nPie charts\n==========\n\nThe :func:`~matplotlib.pyplot.pie` function allows you to create pie\ncharts. Optional features include auto-labeling the percentage of area,\nexploding one or more wedges from the center of the pie, and a shadow effect.\nTake a close look at the attached code, which generates this figure in just\na few lines of code.\n\n.. figure:: ../../gallery/pie_and_polar_charts/images/sphx_glr_pie_features_001.png\n :target: ../../gallery/pie_and_polar_charts/pie_features.html\n :align: center\n :scale: 50\n\n Pie Features\n\n\nTables\n======\n\nThe :func:`~matplotlib.pyplot.table` function adds a text table\nto an axes.\n\n.. figure:: ../../gallery/misc/images/sphx_glr_table_demo_001.png\n :target: ../../gallery/misc/table_demo.html\n :align: center\n :scale: 50\n\n Table Demo\n\n\n\n\nScatter plots\n=============\n\nThe :func:`~matplotlib.pyplot.scatter` function makes a scatter plot\nwith (optional) size and color arguments. This example plots changes\nin Google's stock price, with marker sizes reflecting the\ntrading volume and colors varying with time. Here, the\nalpha attribute is used to make semitransparent circle markers.\n\n.. figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_scatter_demo2_001.png\n :target: ../../gallery/lines_bars_and_markers/scatter_demo2.html\n :align: center\n :scale: 50\n\n Scatter Demo2\n\n\n\nGUI widgets\n===========\n\nMatplotlib has basic GUI widgets that are independent of the graphical\nuser interface you are using, allowing you to write cross GUI figures\nand widgets. See :mod:`matplotlib.widgets` and the\n`widget examples <../../gallery/index.html>`_.\n\n.. 
figure:: ../../gallery/widgets/images/sphx_glr_slider_demo_001.png\n :target: ../../gallery/widgets/slider_demo.html\n :align: center\n :scale: 50\n\n Slider and radio-button GUI.\n\n\n\nFilled curves\n=============\n\nThe :func:`~matplotlib.pyplot.fill` function lets you\nplot filled curves and polygons:\n\n.. figure:: ../../gallery/lines_bars_and_markers/images/sphx_glr_fill_001.png\n :target: ../../gallery/lines_bars_and_markers/fill.html\n :align: center\n :scale: 50\n\n Fill\n\nThanks to Andrew Straw for adding this function.\n\n\nDate handling\n=============\n\nYou can plot timeseries data with major and minor ticks and custom\ntick formatters for both.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_date_001.png\n :target: ../../gallery/text_labels_and_annotations/date.html\n :align: center\n :scale: 50\n\n Date\n\nSee :mod:`matplotlib.ticker` and :mod:`matplotlib.dates` for details and usage.\n\n\n\nLog plots\n=========\n\nThe :func:`~matplotlib.pyplot.semilogx`,\n:func:`~matplotlib.pyplot.semilogy` and\n:func:`~matplotlib.pyplot.loglog` functions simplify the creation of\nlogarithmic plots.\n\n.. figure:: ../../gallery/scales/images/sphx_glr_log_demo_001.png\n :target: ../../gallery/scales/log_demo.html\n :align: center\n :scale: 50\n\n Log Demo\n\nThanks to Andrew Straw, Darren Dale and Gregory Lielens for contributions\nlog-scaling infrastructure.\n\n\nPolar plots\n===========\n\nThe :func:`~matplotlib.pyplot.polar` function generates polar plots.\n\n.. figure:: ../../gallery/pie_and_polar_charts/images/sphx_glr_polar_demo_001.png\n :target: ../../gallery/pie_and_polar_charts/polar_demo.html\n :align: center\n :scale: 50\n\n Polar Demo\n\n\n\nLegends\n=======\n\nThe :func:`~matplotlib.pyplot.legend` function automatically\ngenerates figure legends, with MATLAB-compatible legend-placement\nfunctions.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_legend_001.png\n :target: ../../gallery/text_labels_and_annotations/legend.html\n :align: center\n :scale: 50\n\n Legend\n\nThanks to Charles Twardy for input on the legend function.\n\n\nTeX-notation for text objects\n=============================\n\nBelow is a sampling of the many TeX expressions now supported by Matplotlib's\ninternal mathtext engine. The mathtext module provides TeX style mathematical\nexpressions using `FreeType `_\nand the DejaVu, BaKoMa computer modern, or `STIX `_\nfonts. See the :mod:`matplotlib.mathtext` module for additional details.\n\n.. figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_mathtext_examples_001.png\n :target: ../../gallery/text_labels_and_annotations/mathtext_examples.html\n :align: center\n :scale: 50\n\n Mathtext Examples\n\nMatplotlib's mathtext infrastructure is an independent implementation and\ndoes not require TeX or any external packages installed on your computer. See\nthe tutorial at :doc:`/tutorials/text/mathtext`.\n\n\n\nNative TeX rendering\n====================\n\nAlthough Matplotlib's internal math rendering engine is quite\npowerful, sometimes you need TeX. Matplotlib supports external TeX\nrendering of strings with the *usetex* option.\n\n.. 
figure:: ../../gallery/text_labels_and_annotations/images/sphx_glr_tex_demo_001.png\n :target: ../../gallery/text_labels_and_annotations/tex_demo.html\n :align: center\n :scale: 50\n\n Tex Demo\n\n\nEEG GUI\n=======\n\nYou can embed Matplotlib into pygtk, wx, Tk, or Qt applications.\nHere is a screenshot of an EEG viewer called `pbrain\n`__.\n\n![](../../_static/eeg_small.png)\n\n\nThe lower axes uses :func:`~matplotlib.pyplot.specgram`\nto plot the spectrogram of one of the EEG channels.\n\nFor examples of how to embed Matplotlib in different toolkits, see:\n\n * :doc:`/gallery/user_interfaces/embedding_in_gtk3_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_wx2_sgskip`\n * :doc:`/gallery/user_interfaces/mpl_with_glade3_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_qt_sgskip`\n * :doc:`/gallery/user_interfaces/embedding_in_tk_sgskip`\n\nXKCD-style sketch plots\n=======================\n\nJust for fun, Matplotlib supports plotting in the style of `xkcd\n`.\n\n.. figure:: ../../gallery/showcase/images/sphx_glr_xkcd_001.png\n :target: ../../gallery/showcase/xkcd.html\n :align: center\n :scale: 50\n\n xkcd\n\n" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "Subplot example\n===============\n\nMany plot types can be combined in one figure to create\npowerful and flexible representations of data.\n\n\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": { 32 | "collapsed": false 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "import matplotlib.pyplot as plt\nimport numpy as np\n\nnp.random.seed(19680801)\ndata = np.random.randn(2, 100)\n\nfig, axs = plt.subplots(2, 2, figsize=(5, 5))\naxs[0, 0].hist(data[0])\naxs[1, 0].scatter(data[0], data[1])\naxs[0, 1].plot(data[0], data[1])\naxs[1, 1].hist2d(data[0], data[1])\n\nplt.show()" 37 | ] 38 | } 39 | ], 40 | "metadata": { 41 | "kernelspec": { 42 | "display_name": "Python 3", 43 | "language": "python", 44 | "name": "python3" 45 | }, 46 | "language_info": { 47 | "codemirror_mode": { 48 | "name": "ipython", 49 | "version": 3 50 | }, 51 | "file_extension": ".py", 52 | "mimetype": "text/x-python", 53 | "name": "python", 54 | "nbconvert_exporter": "python", 55 | "pygments_lexer": "ipython3", 56 | "version": "3.7.3" 57 | } 58 | }, 59 | "nbformat": 4, 60 | "nbformat_minor": 0 61 | } -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/water Mark.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "loading [[[1. 1. 1.]\n", 13 | " [1. 1. 1.]\n", 14 | " [1. 1. 1.]\n", 15 | " ...\n", 16 | " [1. 1. 1.]\n", 17 | " [1. 1. 1.]\n", 18 | " [1. 1. 1.]]\n", 19 | "\n", 20 | " [[1. 1. 1.]\n", 21 | " [1. 1. 1.]\n", 22 | " [1. 1. 1.]\n", 23 | " ...\n", 24 | " [1. 1. 1.]\n", 25 | " [1. 1. 1.]\n", 26 | " [1. 1. 1.]]\n", 27 | "\n", 28 | " [[1. 1. 1.]\n", 29 | " [1. 1. 1.]\n", 30 | " [1. 1. 1.]\n", 31 | " ...\n", 32 | " [1. 1. 1.]\n", 33 | " [1. 1. 1.]\n", 34 | " [1. 1. 1.]]\n", 35 | "\n", 36 | " ...\n", 37 | "\n", 38 | " [[1. 1. 1.]\n", 39 | " [1. 1. 1.]\n", 40 | " [1. 1. 1.]\n", 41 | " ...\n", 42 | " [1. 1. 1.]\n", 43 | " [1. 1. 1.]\n", 44 | " [1. 1. 1.]]\n", 45 | "\n", 46 | " [[1. 1. 1.]\n", 47 | " [1. 1. 1.]\n", 48 | " [1. 1. 1.]\n", 49 | " ...\n", 50 | " [1. 1. 
1.]\n", 51 | " [1. 1. 1.]\n", 52 | " [1. 1. 1.]]\n", 53 | "\n", 54 | " [[1. 1. 1.]\n", 55 | " [1. 1. 1.]\n", 56 | " [1. 1. 1.]\n", 57 | " ...\n", 58 | " [1. 1. 1.]\n", 59 | " [1. 1. 1.]\n", 60 | " [1. 1. 1.]]]\n" 61 | ] 62 | }, 63 | { 64 | "ename": "TypeError", 65 | "evalue": "Object does not appear to be a 8-bit string path or a Python file-like object", 66 | "output_type": "error", 67 | "traceback": [ 68 | "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", 69 | "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", 70 | "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 11\u001b[0m \u001b[0mdatafile\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mimread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'img/matplot.png'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 12\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'loading %s'\u001b[0m \u001b[1;33m%\u001b[0m \u001b[0mdatafile\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 13\u001b[1;33m \u001b[0mim\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mimage\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mimread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mdatafile\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 14\u001b[0m \u001b[0mim\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m:\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m-\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m0.5\u001b[0m \u001b[1;31m# set the alpha channel\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 15\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", 71 | "\u001b[1;32mc:\\users\\kumar\\appdata\\local\\programs\\python\\python36\\lib\\site-packages\\matplotlib\\image.py\u001b[0m in \u001b[0;36mimread\u001b[1;34m(fname, format)\u001b[0m\n\u001b[0;32m 1375\u001b[0m \u001b[1;32mreturn\u001b[0m \u001b[0mhandler\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfd\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1376\u001b[0m \u001b[1;32melse\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1377\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0mhandler\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfname\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1378\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1379\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", 72 | "\u001b[1;31mTypeError\u001b[0m: Object does not appear to be a 8-bit string path or a Python file-like object" 73 | ] 74 | } 75 | ], 76 | "source": [ 77 | "\n", 78 | "import numpy as np\n", 79 | "import matplotlib.cbook as cbook\n", 80 | "import matplotlib.image as image\n", 81 | "import matplotlib.pyplot as plt\n", 82 | "\n", 83 | "# Fixing random state for reproducibility\n", 84 | "np.random.seed(19680801)\n", 85 | "\n", 86 | "\n", 87 | "datafile = plt.imread('img/matplot.png')\n", 88 | "print('loading %s' % datafile)\n", 89 | "im = image.imread(datafile)\n", 90 | "im[:, :, -1] = 0.5 # set the alpha channel\n", 91 | "\n", 92 | "fig, ax = plt.subplots()\n", 93 | "\n", 94 | "ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange')\n", 95 | "ax.grid()\n", 96 | "fig.figimage(im, 10, 10, zorder=3)\n", 97 | "\n", 98 | "plt.show()\n" 99 | ] 100 | } 101 | ], 102 | "metadata": { 103 | "kernelspec": { 104 | "display_name": "Python 3", 105 | "language": "python", 
106 | "name": "python3" 107 | }, 108 | "language_info": { 109 | "codemirror_mode": { 110 | "name": "ipython", 111 | "version": 3 112 | }, 113 | "file_extension": ".py", 114 | "mimetype": "text/x-python", 115 | "name": "python", 116 | "nbconvert_exporter": "python", 117 | "pygments_lexer": "ipython3", 118 | "version": "3.6.3" 119 | } 120 | }, 121 | "nbformat": 4, 122 | "nbformat_minor": 2 123 | } 124 | -------------------------------------------------------------------------------- /Data Visualization/Matplotlib/watermark_image.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import matplotlib.cbook as cbook 3 | import matplotlib.image as image 4 | import matplotlib.pyplot as plt 5 | 6 | # Fixing random state for reproducibility 7 | np.random.seed(19680801) 8 | 9 | 10 | datafile = cbook.get_sample_data('logo2.png', asfileobj=False) 11 | print('loading %s' % datafile) 12 | im = image.imread(datafile) 13 | im[:, :, -1] = 0.5 # set the alpha channel 14 | 15 | fig, ax = plt.subplots() 16 | 17 | ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange') 18 | ax.grid() 19 | fig.figimage(im, 10, 10, zorder=3) 20 | 21 | plt.show() 22 | -------------------------------------------------------------------------------- /Data Visualization/README.md: -------------------------------------------------------------------------------- 1 | ### Data visualization 2 | Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers of the images. This communication is achieved through the use of a systematic mapping between graphic marks and data values in the creation of the visualization. 3 | -------------------------------------------------------------------------------- /Data Visualization/SeaBorn/README.md: -------------------------------------------------------------------------------- 1 | Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. 2 | -------------------------------------------------------------------------------- /Data/Boston/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Data/Boston/housing.names.txt: -------------------------------------------------------------------------------- 1 | 1. Title: Boston Housing Data 2 | 3 | 2. Sources: 4 | (a) Origin: This dataset was taken from the StatLib library which is 5 | maintained at Carnegie Mellon University. 6 | (b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the 7 | demand for clean air', J. Environ. Economics & Management, 8 | vol.5, 81-102, 1978. 9 | (c) Date: July 7, 1993 10 | 11 | 3. Past Usage: 12 | - Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 13 | 1980. N.B. Various transformations are used in the table on 14 | pages 244-261. 15 | - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. 16 | In Proceedings on the Tenth International Conference of Machine 17 | Learning, 236-243, University of Massachusetts, Amherst. Morgan 18 | Kaufmann. 19 | 20 | 4. Relevant Information: 21 | 22 | Concerns housing values in suburbs of Boston. 23 | 24 | 5. Number of Instances: 506 25 | 26 | 6. 
Number of Attributes: 13 continuous attributes (including "class" 27 | attribute "MEDV"), 1 binary-valued attribute. 28 | 29 | 7. Attribute Information: 30 | 31 | 1. CRIM per capita crime rate by town 32 | 2. ZN proportion of residential land zoned for lots over 33 | 25,000 sq.ft. 34 | 3. INDUS proportion of non-retail business acres per town 35 | 4. CHAS Charles River dummy variable (= 1 if tract bounds 36 | river; 0 otherwise) 37 | 5. NOX nitric oxides concentration (parts per 10 million) 38 | 6. RM average number of rooms per dwelling 39 | 7. AGE proportion of owner-occupied units built prior to 1940 40 | 8. DIS weighted distances to five Boston employment centres 41 | 9. RAD index of accessibility to radial highways 42 | 10. TAX full-value property-tax rate per $10,000 43 | 11. PTRATIO pupil-teacher ratio by town 44 | 12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks 45 | by town 46 | 13. LSTAT % lower status of the population 47 | 14. MEDV Median value of owner-occupied homes in $1000's 48 | 49 | 8. Missing Attribute Values: None. 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /Data/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Data/iris/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Data/iris/iris.csv: -------------------------------------------------------------------------------- 1 | sepal_length,sepal_width,petal_length,petal_width,species 2 | 5.1,3.5,1.4,0.2,setosa 3 | 4.9,3,1.4,0.2,setosa 4 | 4.7,3.2,1.3,0.2,setosa 5 | 4.6,3.1,1.5,0.2,setosa 6 | 5,3.6,1.4,0.2,setosa 7 | 5.4,3.9,1.7,0.4,setosa 8 | 4.6,3.4,1.4,0.3,setosa 9 | 5,3.4,1.5,0.2,setosa 10 | 4.4,2.9,1.4,0.2,setosa 11 | 4.9,3.1,1.5,0.1,setosa 12 | 5.4,3.7,1.5,0.2,setosa 13 | 4.8,3.4,1.6,0.2,setosa 14 | 4.8,3,1.4,0.1,setosa 15 | 4.3,3,1.1,0.1,setosa 16 | 5.8,4,1.2,0.2,setosa 17 | 5.7,4.4,1.5,0.4,setosa 18 | 5.4,3.9,1.3,0.4,setosa 19 | 5.1,3.5,1.4,0.3,setosa 20 | 5.7,3.8,1.7,0.3,setosa 21 | 5.1,3.8,1.5,0.3,setosa 22 | 5.4,3.4,1.7,0.2,setosa 23 | 5.1,3.7,1.5,0.4,setosa 24 | 4.6,3.6,1,0.2,setosa 25 | 5.1,3.3,1.7,0.5,setosa 26 | 4.8,3.4,1.9,0.2,setosa 27 | 5,3,1.6,0.2,setosa 28 | 5,3.4,1.6,0.4,setosa 29 | 5.2,3.5,1.5,0.2,setosa 30 | 5.2,3.4,1.4,0.2,setosa 31 | 4.7,3.2,1.6,0.2,setosa 32 | 4.8,3.1,1.6,0.2,setosa 33 | 5.4,3.4,1.5,0.4,setosa 34 | 5.2,4.1,1.5,0.1,setosa 35 | 5.5,4.2,1.4,0.2,setosa 36 | 4.9,3.1,1.5,0.2,setosa 37 | 5,3.2,1.2,0.2,setosa 38 | 5.5,3.5,1.3,0.2,setosa 39 | 4.9,3.6,1.4,0.1,setosa 40 | 4.4,3,1.3,0.2,setosa 41 | 5.1,3.4,1.5,0.2,setosa 42 | 5,3.5,1.3,0.3,setosa 43 | 4.5,2.3,1.3,0.3,setosa 44 | 4.4,3.2,1.3,0.2,setosa 45 | 5,3.5,1.6,0.6,setosa 46 | 5.1,3.8,1.9,0.4,setosa 47 | 4.8,3,1.4,0.3,setosa 48 | 5.1,3.8,1.6,0.2,setosa 49 | 4.6,3.2,1.4,0.2,setosa 50 | 5.3,3.7,1.5,0.2,setosa 51 | 5,3.3,1.4,0.2,setosa 52 | 7,3.2,4.7,1.4,versicolor 53 | 6.4,3.2,4.5,1.5,versicolor 54 | 6.9,3.1,4.9,1.5,versicolor 55 | 5.5,2.3,4,1.3,versicolor 56 | 6.5,2.8,4.6,1.5,versicolor 57 | 5.7,2.8,4.5,1.3,versicolor 58 | 6.3,3.3,4.7,1.6,versicolor 59 | 4.9,2.4,3.3,1,versicolor 60 | 6.6,2.9,4.6,1.3,versicolor 61 | 5.2,2.7,3.9,1.4,versicolor 62 | 5,2,3.5,1,versicolor 63 | 5.9,3,4.2,1.5,versicolor 64 | 6,2.2,4,1,versicolor 65 | 6.1,2.9,4.7,1.4,versicolor 66 | 5.6,2.9,3.6,1.3,versicolor 67 | 
6.7,3.1,4.4,1.4,versicolor 68 | 5.6,3,4.5,1.5,versicolor 69 | 5.8,2.7,4.1,1,versicolor 70 | 6.2,2.2,4.5,1.5,versicolor 71 | 5.6,2.5,3.9,1.1,versicolor 72 | 5.9,3.2,4.8,1.8,versicolor 73 | 6.1,2.8,4,1.3,versicolor 74 | 6.3,2.5,4.9,1.5,versicolor 75 | 6.1,2.8,4.7,1.2,versicolor 76 | 6.4,2.9,4.3,1.3,versicolor 77 | 6.6,3,4.4,1.4,versicolor 78 | 6.8,2.8,4.8,1.4,versicolor 79 | 6.7,3,5,1.7,versicolor 80 | 6,2.9,4.5,1.5,versicolor 81 | 5.7,2.6,3.5,1,versicolor 82 | 5.5,2.4,3.8,1.1,versicolor 83 | 5.5,2.4,3.7,1,versicolor 84 | 5.8,2.7,3.9,1.2,versicolor 85 | 6,2.7,5.1,1.6,versicolor 86 | 5.4,3,4.5,1.5,versicolor 87 | 6,3.4,4.5,1.6,versicolor 88 | 6.7,3.1,4.7,1.5,versicolor 89 | 6.3,2.3,4.4,1.3,versicolor 90 | 5.6,3,4.1,1.3,versicolor 91 | 5.5,2.5,4,1.3,versicolor 92 | 5.5,2.6,4.4,1.2,versicolor 93 | 6.1,3,4.6,1.4,versicolor 94 | 5.8,2.6,4,1.2,versicolor 95 | 5,2.3,3.3,1,versicolor 96 | 5.6,2.7,4.2,1.3,versicolor 97 | 5.7,3,4.2,1.2,versicolor 98 | 5.7,2.9,4.2,1.3,versicolor 99 | 6.2,2.9,4.3,1.3,versicolor 100 | 5.1,2.5,3,1.1,versicolor 101 | 5.7,2.8,4.1,1.3,versicolor 102 | 6.3,3.3,6,2.5,virginica 103 | 5.8,2.7,5.1,1.9,virginica 104 | 7.1,3,5.9,2.1,virginica 105 | 6.3,2.9,5.6,1.8,virginica 106 | 6.5,3,5.8,2.2,virginica 107 | 7.6,3,6.6,2.1,virginica 108 | 4.9,2.5,4.5,1.7,virginica 109 | 7.3,2.9,6.3,1.8,virginica 110 | 6.7,2.5,5.8,1.8,virginica 111 | 7.2,3.6,6.1,2.5,virginica 112 | 6.5,3.2,5.1,2,virginica 113 | 6.4,2.7,5.3,1.9,virginica 114 | 6.8,3,5.5,2.1,virginica 115 | 5.7,2.5,5,2,virginica 116 | 5.8,2.8,5.1,2.4,virginica 117 | 6.4,3.2,5.3,2.3,virginica 118 | 6.5,3,5.5,1.8,virginica 119 | 7.7,3.8,6.7,2.2,virginica 120 | 7.7,2.6,6.9,2.3,virginica 121 | 6,2.2,5,1.5,virginica 122 | 6.9,3.2,5.7,2.3,virginica 123 | 5.6,2.8,4.9,2,virginica 124 | 7.7,2.8,6.7,2,virginica 125 | 6.3,2.7,4.9,1.8,virginica 126 | 6.7,3.3,5.7,2.1,virginica 127 | 7.2,3.2,6,1.8,virginica 128 | 6.2,2.8,4.8,1.8,virginica 129 | 6.1,3,4.9,1.8,virginica 130 | 6.4,2.8,5.6,2.1,virginica 131 | 7.2,3,5.8,1.6,virginica 132 | 7.4,2.8,6.1,1.9,virginica 133 | 7.9,3.8,6.4,2,virginica 134 | 6.4,2.8,5.6,2.2,virginica 135 | 6.3,2.8,5.1,1.5,virginica 136 | 6.1,2.6,5.6,1.4,virginica 137 | 7.7,3,6.1,2.3,virginica 138 | 6.3,3.4,5.6,2.4,virginica 139 | 6.4,3.1,5.5,1.8,virginica 140 | 6,3,4.8,1.8,virginica 141 | 6.9,3.1,5.4,2.1,virginica 142 | 6.7,3.1,5.6,2.4,virginica 143 | 6.9,3.1,5.1,2.3,virginica 144 | 5.8,2.7,5.1,1.9,virginica 145 | 6.8,3.2,5.9,2.3,virginica 146 | 6.7,3.3,5.7,2.5,virginica 147 | 6.7,3,5.2,2.3,virginica 148 | 6.3,2.5,5,1.9,virginica 149 | 6.5,3,5.2,2,virginica 150 | 6.2,3.4,5.4,2.3,virginica 151 | 5.9,3,5.1,1.8,virginica -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /Life Cycle Process of Data Science In Real World project/DSPD0101ENT-Business Understanding(Problem)-to-Analytic Approach.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "button": false, 7 | "deletable": true, 8 | "new_sheet": false, 9 | "run_control": { 10 | "read_only": false 11 | } 12 | }, 13 | "source": [ 14 | "

From Problem to Approach

" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": { 20 | "button": false, 21 | "deletable": true, 22 | "new_sheet": false, 23 | "run_control": { 24 | "read_only": false 25 | } 26 | }, 27 | "source": [ 28 | "## Introduction\n", 29 | "\n", 30 | "The aim of these labs is to reinforce the concepts that we discuss in each module's videos. These labs will revolve around the use case of food recipes, and together, we will walk through the process that data scientists usually follow when trying to solve a problem. Let's get started!\n", 31 | "\n", 32 | "In this lab, we will start learning about the data science methodology, and focus on the **Business Understanding** and the **Analytic Approach** stages.\n", 33 | "\n", 34 | "------------" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "button": false, 41 | "deletable": true, 42 | "new_sheet": false, 43 | "run_control": { 44 | "read_only": false 45 | } 46 | }, 47 | "source": [ 48 | "## Table of Contents\n", 49 | "\n", 50 | "
\n", 51 | "\n", 52 | "1. [Business Understanding](#0)
\n", 53 | "2. [Analytic Approach](#2)
\n", 54 | "
\n", 55 | "
" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": { 61 | "button": false, 62 | "deletable": true, 63 | "new_sheet": false, 64 | "run_control": { 65 | "read_only": false 66 | } 67 | }, 68 | "source": [ 69 | "# Business Understanding " 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "button": false, 76 | "deletable": true, 77 | "new_sheet": false, 78 | "run_control": { 79 | "read_only": false 80 | } 81 | }, 82 | "source": [ 83 | "This is the **Data Science Methodology**, a flowchart that begins with business understanding." 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": { 89 | "button": false, 90 | "deletable": true, 91 | "new_sheet": false, 92 | "run_control": { 93 | "read_only": false 94 | } 95 | }, 96 | "source": [ 97 | "" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": { 103 | "button": false, 104 | "deletable": true, 105 | "new_sheet": false, 106 | "run_control": { 107 | "read_only": false 108 | } 109 | }, 110 | "source": [ 111 | "#### Why is the business understanding stage important?" 112 | ] 113 | }, 114 | { 115 | "cell_type": "raw", 116 | "metadata": { 117 | "button": false, 118 | "deletable": true, 119 | "new_sheet": false, 120 | "run_control": { 121 | "read_only": false 122 | } 123 | }, 124 | "source": [ 125 | "Your Answer: the beginning of the methodology because getting\n", 126 | "clarity around the problem to be solved, allows you to determine which data will be used to\n", 127 | "answer the core question." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "button": false, 134 | "deletable": true, 135 | "new_sheet": false, 136 | "run_control": { 137 | "read_only": false 138 | } 139 | }, 140 | "source": [ 141 | "Double-click __here__ for the solution.\n", 142 | "" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "metadata": { 150 | "button": false, 151 | "deletable": true, 152 | "new_sheet": false, 153 | "run_control": { 154 | "read_only": false 155 | } 156 | }, 157 | "source": [ 158 | "#### Looking at this diagram, we immediately spot two outstanding features of the data science methodology." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": { 164 | "button": false, 165 | "deletable": true, 166 | "new_sheet": false, 167 | "run_control": { 168 | "read_only": false 169 | } 170 | }, 171 | "source": [ 172 | " " 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": { 178 | "button": false, 179 | "deletable": true, 180 | "new_sheet": false, 181 | "run_control": { 182 | "read_only": false 183 | } 184 | }, 185 | "source": [ 186 | "#### What are they?" 187 | ] 188 | }, 189 | { 190 | "cell_type": "raw", 191 | "metadata": { 192 | "button": false, 193 | "deletable": true, 194 | "new_sheet": false, 195 | "run_control": { 196 | "read_only": false 197 | } 198 | }, 199 | "source": [ 200 | "Your Answer: \n", 201 | "1. 
It is Iterative Process \n", 202 | "2.Data Science is Never Ending Process" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": { 208 | "button": false, 209 | "deletable": true, 210 | "new_sheet": false, 211 | "run_control": { 212 | "read_only": false 213 | } 214 | }, 215 | "source": [ 216 | "Double-click __here__ for the solution.\n", 217 | "" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "button": false, 227 | "deletable": true, 228 | "new_sheet": false, 229 | "run_control": { 230 | "read_only": false 231 | } 232 | }, 233 | "source": [ 234 | "#### Now let's illustrate the data science methodology with a case study." 235 | ] 236 | }, 237 | { 238 | "cell_type": "markdown", 239 | "metadata": { 240 | "button": false, 241 | "deletable": true, 242 | "new_sheet": false, 243 | "run_control": { 244 | "read_only": false 245 | } 246 | }, 247 | "source": [ 248 | "Say, we are interested in automating the process of figuring out the cuisine of a given dish or recipe. Let's apply the business understanding stage to this problem." 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": { 254 | "button": false, 255 | "deletable": true, 256 | "new_sheet": false, 257 | "run_control": { 258 | "read_only": false 259 | } 260 | }, 261 | "source": [ 262 | "#### Q. Can we predict the cuisine of a given dish using the name of the dish only?" 263 | ] 264 | }, 265 | { 266 | "cell_type": "raw", 267 | "metadata": { 268 | "button": false, 269 | "deletable": true, 270 | "new_sheet": false, 271 | "run_control": { 272 | "read_only": false 273 | } 274 | }, 275 | "source": [ 276 | "Your Answer:no\n", 277 | "\n" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": { 283 | "button": false, 284 | "deletable": true, 285 | "new_sheet": false, 286 | "run_control": { 287 | "read_only": false 288 | } 289 | }, 290 | "source": [ 291 | "Double-click __here__ for the solution.\n", 292 | "" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": { 300 | "button": false, 301 | "deletable": true, 302 | "new_sheet": false, 303 | "run_control": { 304 | "read_only": false 305 | } 306 | }, 307 | "source": [ 308 | "#### Q. For example, the following dish names were taken from the menu of a local restaurant in Toronto, Ontario in Canada. \n", 309 | "\n", 310 | "#### 1. Beast\n", 311 | "#### 2. 2 PM\n", 312 | "#### 3. 4 Minute" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": { 318 | "button": false, 319 | "deletable": true, 320 | "new_sheet": false, 321 | "run_control": { 322 | "read_only": false 323 | } 324 | }, 325 | "source": [ 326 | "#### Are you able to tell the cuisine of these dishes?" 
327 | ] 328 | }, 329 | { 330 | "cell_type": "raw", 331 | "metadata": { 332 | "button": false, 333 | "deletable": true, 334 | "new_sheet": false, 335 | "run_control": { 336 | "read_only": false 337 | } 338 | }, 339 | "source": [ 340 | "Your Answer:\n", 341 | "\n" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": { 347 | "button": false, 348 | "deletable": true, 349 | "new_sheet": false, 350 | "run_control": { 351 | "read_only": false 352 | } 353 | }, 354 | "source": [ 355 | "Double-click __here__ for the solution.\n", 356 | "\n", 359 | "\n", 360 | "\n", 363 | "\n", 364 | "\n", 367 | "\n", 368 | "\n", 371 | "\n", 372 | "" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": { 380 | "button": false, 381 | "deletable": true, 382 | "new_sheet": false, 383 | "run_control": { 384 | "read_only": false 385 | } 386 | }, 387 | "source": [ 388 | "#### Q. What about by appearance only? Yes or No." 389 | ] 390 | }, 391 | { 392 | "cell_type": "raw", 393 | "metadata": { 394 | "button": false, 395 | "deletable": true, 396 | "new_sheet": false, 397 | "run_control": { 398 | "read_only": false 399 | } 400 | }, 401 | "source": [ 402 | "Your Answer:\n", 403 | "\n", 404 | "no" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": { 410 | "button": false, 411 | "deletable": true, 412 | "new_sheet": false, 413 | "run_control": { 414 | "read_only": false 415 | } 416 | }, 417 | "source": [ 418 | "Double-click __here__ for the solution.\n", 419 | "" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": { 427 | "button": false, 428 | "deletable": true, 429 | "new_sheet": false, 430 | "run_control": { 431 | "read_only": false 432 | } 433 | }, 434 | "source": [ 435 | "At this point, we realize that automating the process of determining the cuisine of a given dish is not a straightforward problem as we need to come up with a way that is very robust to the many cuisines and their variations." 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": { 441 | "button": false, 442 | "deletable": true, 443 | "new_sheet": false, 444 | "run_control": { 445 | "read_only": false 446 | } 447 | }, 448 | "source": [ 449 | "#### Q. What about determining the cuisine of a dish based on its ingredients?" 450 | ] 451 | }, 452 | { 453 | "cell_type": "raw", 454 | "metadata": { 455 | "button": false, 456 | "deletable": true, 457 | "new_sheet": false, 458 | "run_control": { 459 | "read_only": false 460 | } 461 | }, 462 | "source": [ 463 | "Your Answer:\n", 464 | "\n", 465 | "Potentially yes, as there are specific ingredients unique to each cuisine" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": { 471 | "button": false, 472 | "deletable": true, 473 | "new_sheet": false, 474 | "run_control": { 475 | "read_only": false 476 | } 477 | }, 478 | "source": [ 479 | "Double-click __here__ for the solution.\n", 480 | "" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": { 488 | "button": false, 489 | "deletable": true, 490 | "new_sheet": false, 491 | "run_control": { 492 | "read_only": false 493 | } 494 | }, 495 | "source": [ 496 | "As you guessed, yes determining the cuisine of a given dish based on its ingredients seems like a viable solution as some ingredients are unique to cuisines. 
For example:" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": { 502 | "button": false, 503 | "deletable": true, 504 | "new_sheet": false, 505 | "run_control": { 506 | "read_only": false 507 | } 508 | }, 509 | "source": [ 510 | "* When we talk about **American** cuisines, the first ingredient that comes to one's mind (or at least to my mind =D) is beef or turkey.\n", 511 | "\n", 512 | "* When we talk about **British** cuisines, the first ingredient that comes to one's mind is haddock or mint sauce.\n", 513 | "\n", 514 | "* When we talk about **Canadian** cuisines, the first ingredient that comes to one's mind is bacon or poutine.\n", 515 | "\n", 516 | "* When we talk about **French** cuisines, the first ingredient that comes to one's mind is bread or butter.\n", 517 | "\n", 518 | "* When we talk about **Italian** cuisines, the first ingredient that comes to one's mind is tomato or ricotta.\n", 519 | "\n", 520 | "* When we talk about **Japanese** cuisines, the first ingredient that comes to one's mind is seaweed or soy sauce.\n", 521 | "\n", 522 | "* When we talk about **Chinese** cuisines, the first ingredient that comes to one's mind is ginger or garlic.\n", 523 | "\n", 524 | "* When we talk about **indian** cuisines, the first ingredient that comes to one's mind is masala or chillis." 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": { 530 | "button": false, 531 | "deletable": true, 532 | "new_sheet": false, 533 | "run_control": { 534 | "read_only": false 535 | } 536 | }, 537 | "source": [ 538 | "#### Accordingly, can you determine the cuisine of the dish associated with the following list of ingredients?" 539 | ] 540 | }, 541 | { 542 | "cell_type": "markdown", 543 | "metadata": { 544 | "button": false, 545 | "deletable": true, 546 | "new_sheet": false, 547 | "run_control": { 548 | "read_only": false 549 | } 550 | }, 551 | "source": [ 552 | "" 553 | ] 554 | }, 555 | { 556 | "cell_type": "raw", 557 | "metadata": { 558 | "button": false, 559 | "deletable": true, 560 | "new_sheet": false, 561 | "run_control": { 562 | "read_only": false 563 | } 564 | }, 565 | "source": [ 566 | "Your Answer:\n", 567 | "\n", 568 | "Japanese since the recipe is most likely that of a sushi roll." 569 | ] 570 | }, 571 | { 572 | "cell_type": "markdown", 573 | "metadata": { 574 | "button": false, 575 | "deletable": true, 576 | "new_sheet": false, 577 | "run_control": { 578 | "read_only": false 579 | } 580 | }, 581 | "source": [ 582 | "Double-click __here__ for the solution.\n", 583 | "" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": { 591 | "button": false, 592 | "deletable": true, 593 | "new_sheet": false, 594 | "run_control": { 595 | "read_only": false 596 | } 597 | }, 598 | "source": [ 599 | "# Analytic Approach " 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": { 605 | "button": false, 606 | "deletable": true, 607 | "new_sheet": false, 608 | "run_control": { 609 | "read_only": false 610 | } 611 | }, 612 | "source": [ 613 | "" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": { 619 | "button": false, 620 | "deletable": true, 621 | "new_sheet": false, 622 | "run_control": { 623 | "read_only": false 624 | } 625 | }, 626 | "source": [ 627 | "#### So why are we interested in data science?" 
628 | ] 629 | }, 630 | { 631 | "cell_type": "markdown", 632 | "metadata": { 633 | "button": false, 634 | "deletable": true, 635 | "new_sheet": false, 636 | "run_control": { 637 | "read_only": false 638 | } 639 | }, 640 | "source": [ 641 | "Once the business problem has been clearly stated, the data scientist can define the analytic approach to solve the problem. This step entails expressing the problem in the context of statistical and machine-learning techniques, so that the entity or stakeholders with the problem can identify the most suitable techniques for the desired outcome. " 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": { 647 | "button": false, 648 | "deletable": true, 649 | "new_sheet": false, 650 | "run_control": { 651 | "read_only": false 652 | } 653 | }, 654 | "source": [ 655 | "#### Why is the analytic approach stage important?" 656 | ] 657 | }, 658 | { 659 | "cell_type": "raw", 660 | "metadata": { 661 | "button": false, 662 | "deletable": true, 663 | "new_sheet": false, 664 | "run_control": { 665 | "read_only": false 666 | } 667 | }, 668 | "source": [ 669 | "Your Answer:\n", 670 | "\n" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": { 676 | "button": false, 677 | "deletable": true, 678 | "new_sheet": false, 679 | "run_control": { 680 | "read_only": false 681 | } 682 | }, 683 | "source": [ 684 | "Double-click __here__ for the solution.\n", 685 | "" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": { 693 | "button": false, 694 | "deletable": true, 695 | "new_sheet": false, 696 | "run_control": { 697 | "read_only": false 698 | } 699 | }, 700 | "source": [ 701 | "#### Let's explore a machine learning algorithm, decision trees, and see if it is the right technique to automate the process of identifying the cuisine of a given dish or recipe while simultaneously providing us with some insight on why a given recipe is believed to belong to a certain type of cuisine." 702 | ] 703 | }, 704 | { 705 | "cell_type": "markdown", 706 | "metadata": { 707 | "button": false, 708 | "deletable": true, 709 | "new_sheet": false, 710 | "run_control": { 711 | "read_only": false 712 | } 713 | }, 714 | "source": [ 715 | "This is a decision tree that a naive person might create manually. Starting at the top with all the recipes for all the cuisines in the world, if a recipe contains **rice**, then this decision tree would classify it as a **Japanese** cuisine. Otherwise, it would be classified as not a **Japanese** cuisine." 716 | ] 717 | }, 718 | { 719 | "cell_type": "markdown", 720 | "metadata": { 721 | "button": false, 722 | "deletable": true, 723 | "new_sheet": false, 724 | "run_control": { 725 | "read_only": false 726 | } 727 | }, 728 | "source": [ 729 | "" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": { 735 | "button": false, 736 | "deletable": true, 737 | "new_sheet": false, 738 | "run_control": { 739 | "read_only": false 740 | } 741 | }, 742 | "source": [ 743 | "#### Is this a good decision tree? Yes or No, and why? 
" 744 | ] 745 | }, 746 | { 747 | "cell_type": "raw", 748 | "metadata": { 749 | "button": false, 750 | "deletable": true, 751 | "new_sheet": false, 752 | "run_control": { 753 | "read_only": false 754 | } 755 | }, 756 | "source": [ 757 | "Your Answer:\n", 758 | "\n" 759 | ] 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "metadata": { 764 | "button": false, 765 | "deletable": true, 766 | "new_sheet": false, 767 | "run_control": { 768 | "read_only": false 769 | } 770 | }, 771 | "source": [ 772 | "Double-click __here__ for the solution.\n", 773 | "" 776 | ] 777 | }, 778 | { 779 | "cell_type": "markdown", 780 | "metadata": { 781 | "button": false, 782 | "deletable": true, 783 | "new_sheet": false, 784 | "run_control": { 785 | "read_only": false 786 | } 787 | }, 788 | "source": [ 789 | "#### In order to build a very powerful decision tree for the recipe case study, let's take some time to learn more about decision trees." 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": { 795 | "button": false, 796 | "deletable": true, 797 | "new_sheet": false, 798 | "run_control": { 799 | "read_only": false 800 | } 801 | }, 802 | "source": [ 803 | "* Decision trees are built using recursive partitioning to classify the data.\n", 804 | "* When partitioning the data, decision trees use the most predictive feature (ingredient in this case) to split the data.\n", 805 | "* **Predictiveness** is based on decrease in entropy - gain in information, or *impurity*." 806 | ] 807 | }, 808 | { 809 | "cell_type": "markdown", 810 | "metadata": { 811 | "button": false, 812 | "deletable": true, 813 | "new_sheet": false, 814 | "run_control": { 815 | "read_only": false 816 | } 817 | }, 818 | "source": [ 819 | "#### Suppose that our data is comprised of green triangles and red circles." 820 | ] 821 | }, 822 | { 823 | "cell_type": "markdown", 824 | "metadata": { 825 | "button": false, 826 | "deletable": true, 827 | "new_sheet": false, 828 | "run_control": { 829 | "read_only": false 830 | } 831 | }, 832 | "source": [ 833 | "The following decision tree would be considered the optimal model for classifying the data into a node for green triangles and a node for red circles." 834 | ] 835 | }, 836 | { 837 | "cell_type": "markdown", 838 | "metadata": { 839 | "button": false, 840 | "deletable": true, 841 | "new_sheet": false, 842 | "run_control": { 843 | "read_only": false 844 | } 845 | }, 846 | "source": [ 847 | "" 848 | ] 849 | }, 850 | { 851 | "cell_type": "markdown", 852 | "metadata": { 853 | "button": false, 854 | "deletable": true, 855 | "new_sheet": false, 856 | "run_control": { 857 | "read_only": false 858 | } 859 | }, 860 | "source": [ 861 | "Each of the classes in the leaf nodes are completely pure – that is, each leaf node only contains datapoints that belong to the same class." 862 | ] 863 | }, 864 | { 865 | "cell_type": "markdown", 866 | "metadata": { 867 | "button": false, 868 | "deletable": true, 869 | "new_sheet": false, 870 | "run_control": { 871 | "read_only": false 872 | } 873 | }, 874 | "source": [ 875 | "On the other hand, the following decision tree is an example of the worst-case scenario that the model could output. 
" 876 | ] 877 | }, 878 | { 879 | "cell_type": "markdown", 880 | "metadata": { 881 | "button": false, 882 | "deletable": true, 883 | "new_sheet": false, 884 | "run_control": { 885 | "read_only": false 886 | } 887 | }, 888 | "source": [ 889 | "" 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "metadata": { 895 | "button": false, 896 | "deletable": true, 897 | "new_sheet": false, 898 | "run_control": { 899 | "read_only": false 900 | } 901 | }, 902 | "source": [ 903 | "Each leaf node contains datapoints belonging to the two classes resulting in many datapoints ultimately being misclassified." 904 | ] 905 | }, 906 | { 907 | "cell_type": "markdown", 908 | "metadata": { 909 | "button": false, 910 | "deletable": true, 911 | "new_sheet": false, 912 | "run_control": { 913 | "read_only": false 914 | } 915 | }, 916 | "source": [ 917 | "#### A tree stops growing at a node when:\n", 918 | "* Pure or nearly pure.\n", 919 | "* No remaining variables on which to further subset the data.\n", 920 | "* The tree has grown to a preselected size limit." 921 | ] 922 | }, 923 | { 924 | "cell_type": "markdown", 925 | "metadata": { 926 | "button": false, 927 | "deletable": true, 928 | "new_sheet": false, 929 | "run_control": { 930 | "read_only": false 931 | } 932 | }, 933 | "source": [ 934 | "#### Here are some characteristics of decision trees:" 935 | ] 936 | }, 937 | { 938 | "cell_type": "markdown", 939 | "metadata": { 940 | "button": false, 941 | "deletable": true, 942 | "new_sheet": false, 943 | "run_control": { 944 | "read_only": false 945 | } 946 | }, 947 | "source": [ 948 | "" 949 | ] 950 | }, 951 | { 952 | "cell_type": "markdown", 953 | "metadata": { 954 | "button": false, 955 | "deletable": true, 956 | "new_sheet": false, 957 | "run_control": { 958 | "read_only": false 959 | } 960 | }, 961 | "source": [ 962 | "Now let's put what we learned about decision trees to use. Let's try and build a much better version of the decision tree for our recipe problem." 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": { 968 | "button": false, 969 | "deletable": true, 970 | "new_sheet": false, 971 | "run_control": { 972 | "read_only": false 973 | } 974 | }, 975 | "source": [ 976 | "" 977 | ] 978 | }, 979 | { 980 | "cell_type": "markdown", 981 | "metadata": { 982 | "button": false, 983 | "deletable": true, 984 | "new_sheet": false, 985 | "run_control": { 986 | "read_only": false 987 | } 988 | }, 989 | "source": [ 990 | "I hope you agree that the above decision tree is a much better version than the previous one. Although we are still using **Rice** as the ingredient in the first *decision node*, recipes get divided into **Asian Food** and **Non-Asian Food**. **Asian Food** is then further divided into **Japanese** and **Not Japanese** based on the **Wasabi** ingredient. This process of splitting *leaf nodes* continues until each *leaf node* is pure, i.e., containing recipes belonging to only one cuisine." 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": { 996 | "button": false, 997 | "deletable": true, 998 | "new_sheet": false, 999 | "run_control": { 1000 | "read_only": false 1001 | } 1002 | }, 1003 | "source": [ 1004 | "Accordingly, decision trees is a suitable technique or algorithm for our recipe case study." 
1005 | ] 1006 | } 1007 | ], 1008 | "metadata": { 1009 | "kernelspec": { 1010 | "display_name": "Python 3", 1011 | "language": "python", 1012 | "name": "python3" 1013 | }, 1014 | "language_info": { 1015 | "codemirror_mode": { 1016 | "name": "ipython", 1017 | "version": 3 1018 | }, 1019 | "file_extension": ".py", 1020 | "mimetype": "text/x-python", 1021 | "name": "python", 1022 | "nbconvert_exporter": "python", 1023 | "pygments_lexer": "ipython3", 1024 | "version": "3.6.8" 1025 | }, 1026 | "widgets": { 1027 | "state": {}, 1028 | "version": "1.1.2" 1029 | } 1030 | }, 1031 | "nbformat": 4, 1032 | "nbformat_minor": 4 1033 | } 1034 | -------------------------------------------------------------------------------- /Life Cycle Process of Data Science In Real World project/DSPD0101ENT-Business Understanding.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook covers the business understanding stage of the Data Science Process (DSP). This process provides a recommended lifecycle that you can use to structure your data-science projects.\n", 8 | "\n", 9 | "**Goals**\n", 10 | "* Specify the key variables that are to serve as the model targets and whose related metrics are used to determine the success of the project.\n", 11 | "* Identify the relevant data sources that the business has access to or needs to obtain.\n", 12 | "\n", 13 | "#### How to do it\n", 14 | "There are two main tasks addressed in this stage:\n", 15 | "\n", 16 | "* **Define objectives:** Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.\n", 17 | "* **Identify data sources:** Find the relevant data that helps you answer the questions that define the objectives of the project.\n", 18 | "\n", 19 | "#### Define objectives\n", 20 | "1. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. Two examples of such targets are sales forecasts or the probability of an order being fraudulent.\n", 21 | "\n", 22 | "2. Define the project goals by asking and refining \"sharp\" questions that are relevant, specific, and unambiguous. Data science is a process that uses names and numbers to answer such questions. You typically use data science or machine learning to answer five types of questions:\n", 23 | "\n", 24 | " * How much or how many? (regression)\n", 25 | " * Which category? (classification)\n", 26 | " * Which group? (clustering)\n", 27 | " * Is this weird? (anomaly detection)\n", 28 | " * Which option should be taken? (recommendation)\n", 29 | "Determine which of these questions you're asking and how answering it achieves your business goals.\n", 30 | "\n", 31 | "3. Define the project team by specifying the roles and responsibilities of its members. Develop a high-level milestone plan that you iterate on as you discover more information.\n", 32 | "\n", 33 | "4. Define the success metrics. For example, you might want to predict customer churn and need an accuracy rate of \"x\" percent by the end of this three-month project. With this data, you can offer customer promotions to reduce churn. 
The metrics must be **SMART**:\n", 34 | "\n", 35 | "* **S**pecific\n", 36 | "* **M**easurable\n", 37 | "* **A**chievable\n", 38 | "* **R**elevant\n", 39 | "* **T**ime-bound\n", 40 | "\n", 41 | "#### Identify data sources\n", 42 | "Identify data sources that contain known examples of answers to your sharp questions. Look for the following data:\n", 43 | "\n", 44 | "* Data that's relevant to the question. Do you have measures of the target and features that are related to the target?\n", 45 | "* Data that's an accurate measure of your model target and the features of interest.\n", 46 | "For example, you might find that the existing systems need to collect and log additional kinds of data to address the problem and achieve the project goals. In this situation, you might want to look for external data sources or update your systems to collect new data.\n", 47 | "\n", 48 | "#### Artifacts\n", 49 | "Here are the deliverables in this stage:\n", 50 | "\n", 51 | "* **Charter document:** A standard template is provided in the TDSP project structure definition. The charter document is a living document. You update the template throughout the project as you make new discoveries and as business requirements change. The key is to iterate upon this document, adding more detail, as you progress through the discovery process. Keep the customer and other stakeholders involved in making the changes and clearly communicate the reasons for the changes to them.\n", 52 | "* **Data sources:** The Raw data sources section of the Data definitions report that's found in the TDSP project Data report folder contains the data sources. This section specifies the original and destination locations for the raw data. In later stages, you fill in additional details like the scripts to move the data to your analytic environment.\n", 53 | "* **Data dictionaries:** This document provides descriptions of the data that's provided by the client. These descriptions include information about the schema (the data types and information on the validation rules, if any) and the entity-relation diagrams, if available." 54 | ] 55 | } 56 | ], 57 | "metadata": { 58 | "kernelspec": { 59 | "display_name": "Python 3", 60 | "language": "python", 61 | "name": "python3" 62 | }, 63 | "language_info": { 64 | "codemirror_mode": { 65 | "name": "ipython", 66 | "version": 3 67 | }, 68 | "file_extension": ".py", 69 | "mimetype": "text/x-python", 70 | "name": "python", 71 | "nbconvert_exporter": "python", 72 | "pygments_lexer": "ipython3", 73 | "version": "3.6.8" 74 | } 75 | }, 76 | "nbformat": 4, 77 | "nbformat_minor": 4 78 | } 79 | -------------------------------------------------------------------------------- /Life Cycle Process of Data Science In Real World project/IBMOpenSource_FoundationalMethologyforDataScience.PDF: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Life Cycle Process of Data Science In Real World project/IBMOpenSource_FoundationalMethologyforDataScience.PDF -------------------------------------------------------------------------------- /Life Cycle Process of Data Science In Real World project/README.md: -------------------------------------------------------------------------------- 1 | * Data Science Process (DSP) is an agile, iterative data science methodology to deliver predictive analytics solutions and intelligent applications efficiently. 
2 | * DSP helps improve team collaboration and learning by suggesting how team roles work best together. 3 | * DSP includes best practices and structures from industry leaders to support the successful implementation of data science initiatives. 4 | * The goal is to help companies fully realize the benefits of their analytics program. 5 | 6 | 7 | ### Data science lifecycle 8 | * The Data Science Process (DSP) provides a lifecycle to structure the development of your data science projects. 9 | * The lifecycle outlines the full steps that successful projects follow. 10 | 11 | * If you are using another data science lifecycle, such as CRISP-DM, KDD, or your organization's own custom process, you can still use the task-based DSP in the context of those development lifecycles. 12 | * At a high level, these different methodologies have much in common. 13 | * This lifecycle has been designed for data science projects that ship as part of intelligent applications. 14 | * These applications deploy machine learning or artificial intelligence models for predictive analytics. 15 | * Exploratory data science projects or ad hoc analytics projects can also benefit from using this process. 16 | * In such cases, however, some of the steps described may not be needed. 17 | *** 18 | * The lifecycle outlines the major stages that projects typically execute, often iteratively: 19 | 20 | 1. Business Understanding 21 | 2. Data Acquisition and Understanding 22 | 3. Modeling 23 | 4. Deployment 24 | 5. Customer Acceptance 25 | ![](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/media/overview/tdsp-lifecycle2.png) 26 | -------------------------------------------------------------------------------- /Modeling/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Modeling/Semi Supervised Learning/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Modeling/Supervised Learning/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Modeling/Unsupervised Learning/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Numpy/README.md: -------------------------------------------------------------------------------- 1 | 2 | ## Numpy 3 | --- 4 | * The fundamental package for scientific computing with Python 5 | * Nearly every scientist working in Python draws on the power of NumPy. 6 | * NumPy brings the computational power of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes simplicity: a solution in NumPy is often clear and elegant. 
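As a quick, hypothetical illustration of that claim (the measurements below are invented), a single vectorized expression replaces an explicit Python loop:

```python
import numpy as np

heights_cm = np.array([150.5, 163.0, 171.2, 180.8])  # made-up sample data

# Loop-free idiom: arithmetic broadcasts over the whole array at once
heights_in = heights_cm / 2.54

print(heights_in.round(1))  # [59.3 64.2 67.4 71.2]
print(heights_in.mean())    # one call summarizes the array
```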
7 | * ![](https://github.com/reddyprasade/Data-Science-With-Python/blob/master/Numpy/img/Where%20we%20use%20numpy.png) 8 | 9 | 10 | 11 | ### Features of Numpy 12 | * POWERFUL N-DIMENSIONAL ARRAYS 13 | * NUMERICAL COMPUTING TOOLS 14 | * INTEROPERABLE 15 | * PERFORMANT 16 | * EASY TO USE 17 | * OPEN SOURCE 18 | -------------------------------------------------------------------------------- /Numpy/img/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Numpy/img/Where we use numpy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/reddyprasade/Data-Science-With-Python/b8e9b16691dc2f35a4d975946f8ca5cc0e0469d0/Numpy/img/Where we use numpy.png -------------------------------------------------------------------------------- /Pandas/DSPD0100ENT-Business Understanding.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook covers the business understanding stage of the Data Science Process (DSP). This process provides a recommended lifecycle that you can use to structure your data-science projects.\n", 8 | "\n", 9 | "**Goals**\n", 10 | "* Specify the key variables that are to serve as the model targets and whose related metrics are used to determine the success of the project.\n", 11 | "* Identify the relevant data sources that the business has access to or needs to obtain.\n", 12 | "\n", 13 | "#### How to do it\n", 14 | "There are two main tasks addressed in this stage:\n", 15 | "\n", 16 | "* **Define objectives:** Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.\n", 17 | "* **Identify data sources:** Find the relevant data that helps you answer the questions that define the objectives of the project.\n", 18 | "\n", 19 | "#### Define objectives\n", 20 | "1. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project. Two examples of such targets are sales forecasts or the probability of an order being fraudulent.\n", 21 | "\n", 22 | "2. Define the project goals by asking and refining \"sharp\" questions that are relevant, specific, and unambiguous. Data science is a process that uses names and numbers to answer such questions. You typically use data science or machine learning to answer five types of questions:\n", 23 | "\n", 24 | " * How much or how many? (regression)\n", 25 | " * Which category? (classification)\n", 26 | " * Which group? (clustering)\n", 27 | " * Is this weird? (anomaly detection)\n", 28 | " * Which option should be taken? (recommendation)\n", 29 | "Determine which of these questions you're asking and how answering it achieves your business goals (an illustrative mapping of these question types to scikit-learn estimators appears after this overview).\n", 30 | "\n", 31 | "3. Define the project team by specifying the roles and responsibilities of its members. Develop a high-level milestone plan that you iterate on as you discover more information.\n", 32 | "\n", 33 | "4. Define the success metrics. For example, you might want to predict customer churn and need an accuracy rate of \"x\" percent by the end of this three-month project. 
With this data, you can offer customer promotions to reduce churn. The metrics must be **SMART**:\n", 34 | "\n", 35 | "* **S**pecific\n", 36 | "* **M**easurable\n", 37 | "* **A**chievable\n", 38 | "* **R**elevant\n", 39 | "* **T**ime-bound\n", 40 | "\n", 41 | "#### Identify data sources\n", 42 | "Identify data sources that contain known examples of answers to your sharp questions. Look for the following data:\n", 43 | "\n", 44 | "* Data that's relevant to the question. Do you have measures of the target and features that are related to the target?\n", 45 | "* Data that's an accurate measure of your model target and the features of interest.\n", 46 | "For example, you might find that the existing systems need to collect and log additional kinds of data to address the problem and achieve the project goals. In this situation, you might want to look for external data sources or update your systems to collect new data.\n", 47 | "\n", 48 | "#### Artifacts\n", 49 | "Here are the deliverables in this stage:\n", 50 | "\n", 51 | "* **Charter document:** A standard template is provided in the TDSP project structure definition. The charter document is a living document. You update the template throughout the project as you make new discoveries and as business requirements change. The key is to iterate upon this document, adding more detail, as you progress through the discovery process. Keep the customer and other stakeholders involved in making the changes and clearly communicate the reasons for the changes to them.\n", 52 | "* **Data sources:** The Raw data sources section of the Data definitions report that's found in the TDSP project Data report folder contains the data sources. This section specifies the original and destination locations for the raw data. In later stages, you fill in additional details like the scripts to move the data to your analytic environment.\n", 53 | "* **Data dictionaries:** This document provides descriptions of the data that's provided by the client. These descriptions include information about the schema (the data types and information on the validation rules, if any) and the entity-relation diagrams, if available." 
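As a loose companion to the five question types listed above, here is an illustrative (not prescriptive) pairing with common scikit-learn estimators; any of several alternatives would serve equally well:

```python
# One common scikit-learn starting point per question type (illustrative only)
from sklearn.linear_model import LinearRegression    # How much or how many? (regression)
from sklearn.linear_model import LogisticRegression  # Which category?       (classification)
from sklearn.cluster import KMeans                   # Which group?          (clustering)
from sklearn.ensemble import IsolationForest         # Is this weird?        (anomaly detection)
from sklearn.neighbors import NearestNeighbors       # Which option?         (building block for recommenders)
```

Each of these follows the same fit/predict pattern, which is why framing the sharp question first tells you which estimator family to reach for.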
54 | ] 55 | } 56 | ], 57 | "metadata": { 58 | "kernelspec": { 59 | "display_name": "Python 3", 60 | "language": "python", 61 | "name": "python3" 62 | }, 63 | "language_info": { 64 | "codemirror_mode": { 65 | "name": "ipython", 66 | "version": 3 67 | }, 68 | "file_extension": ".py", 69 | "mimetype": "text/x-python", 70 | "name": "python", 71 | "nbconvert_exporter": "python", 72 | "pygments_lexer": "ipython3", 73 | "version": "3.6.8" 74 | } 75 | }, 76 | "nbformat": 4, 77 | "nbformat_minor": 4 78 | } 79 | -------------------------------------------------------------------------------- /Pandas/DSPD0101EN-Introduction-to-Pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In the previous chapter, \n", 15 | "* we dove into detail on NumPy and its ``ndarray`` object, which provides efficient storage and manipulation of dense typed arrays in Python.\n", 16 | "\n", 17 | "* Here we'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.\n", 18 | "* Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a ``DataFrame``.\n", 19 | "``DataFrame``s are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.\n", 20 | "* As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.\n", 21 | "\n", 22 | "As we saw, NumPy's ``ndarray`` data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks.\n", 23 | "While it serves this purpose very well, its limitations become clear when we need more flexibility (e.g., attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us.\n", 24 | "Pandas, and in particular its ``Series`` and ``DataFrame`` objects, builds on the NumPy array structure and provides efficient access to these sorts of \"data munging\" tasks that occupy much of a data scientist's time.\n", 25 | "\n", 26 | "In this chapter, we will focus on the mechanics of using ``Series``, ``DataFrame``, and related structures effectively.\n", 27 | "We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus." 
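Before diving in, here is a minimal sketch of the two structures just described; the country names and values are invented for illustration:

```python
import numpy as np
import pandas as pd

# A Series is a 1-D array with explicit row labels (the index)
population = pd.Series([38.0, 67.5, 125.7],
                       index=["Canada", "France", "Japan"],
                       name="population_millions")

# A DataFrame is a 2-D table of aligned Series; np.nan marks missing data
df = pd.DataFrame({"population_millions": population,
                   "capital": ["Ottawa", "Paris", np.nan]})

print(df)
print(df.isna().sum())  # labeled, per-column accounting of the missing entry
```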
28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Installing and Using Pandas\n", 35 | "\n", 36 | "Installation of Pandas on your system requires NumPy to be installed, and if building the library from source, requires the appropriate tools to compile the C and Cython sources on which Pandas is built.\n", 37 | "Details on this installation can be found in the [Pandas documentation](http://pandas.pydata.org/).\n", 38 | "If you followed the advice outlined in the [Preface](00.00-Preface.ipynb) and used the Anaconda stack, you already have Pandas installed.\n", 39 | "\n", 40 | "Once Pandas is installed, you can import it and check the version:" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 2, 46 | "metadata": {}, 47 | "outputs": [ 48 | { 49 | "name": "stdout", 50 | "output_type": "stream", 51 | "text": [ 52 | "Requirement already satisfied: pandas in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (1.0.3)\n", 53 | "Requirement already satisfied: numpy>=1.13.3 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (1.18.3)\n", 54 | "Requirement already satisfied: python-dateutil>=2.6.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (2.8.1)\n", 55 | "Requirement already satisfied: pytz>=2017.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas) (2020.1)\n", 56 | "Requirement already satisfied: six>=1.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from python-dateutil>=2.6.1->pandas) (1.14.0)\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "!pip install pandas" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": 3, 67 | "metadata": { 68 | "collapsed": true, 69 | "jupyter": { 70 | "outputs_hidden": true 71 | } 72 | }, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "Requirement already satisfied: pandas-profiling[html,notebook] in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (2.6.0)\n", 79 | "Requirement already satisfied: scipy>=1.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.4.1)\n", 80 | "Requirement already satisfied: tqdm>=4.43.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.45.0)\n", 81 | "Requirement already satisfied: matplotlib>=3.2.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (3.2.1)\n", 82 | "Requirement already satisfied: ipywidgets>=7.5.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (7.5.1)\n", 83 | "Requirement already satisfied: missingno>=0.4.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.4.2)\n", 84 | "Requirement already satisfied: tangled-up-in-unicode>=0.0.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.0.4)\n", 85 | "Requirement already satisfied: confuse>=1.0.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.1.0)\n", 86 | "Requirement 
already satisfied: visions[type_image_path]>=0.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.4.1)\n", 87 | "Requirement already satisfied: astropy>=4.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.0.1.post1)\n", 88 | "Requirement already satisfied: requests>=2.23.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (2.23.0)\n", 89 | "Requirement already satisfied: pandas>=0.25.3 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.0.3)\n", 90 | "Requirement already satisfied: numpy>=1.16.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (1.18.3)\n", 91 | "Requirement already satisfied: phik>=0.9.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.9.11)\n", 92 | "Requirement already satisfied: statsmodels>=0.11.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.11.1)\n", 93 | "Requirement already satisfied: htmlmin>=0.1.12 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (0.1.12)\n", 94 | "Requirement already satisfied: jinja2>=2.11.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (2.11.2)\n", 95 | "Requirement already satisfied: jupyter-client>=6.0.0; extra == \"notebook\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (6.1.3)\n", 96 | "Requirement already satisfied: jupyter-core>=4.6.3; extra == \"notebook\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas-profiling[html,notebook]) (4.6.3)\n", 97 | "Requirement already satisfied: cycler>=0.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (0.10.0)\n", 98 | "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (2.4.7)\n", 99 | "Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (1.2.0)\n", 100 | "Requirement already satisfied: python-dateutil>=2.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from matplotlib>=3.2.0->pandas-profiling[html,notebook]) (2.8.1)\n", 101 | "Requirement already satisfied: nbformat>=4.2.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.0.6)\n", 102 | "Requirement already satisfied: traitlets>=4.3.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (4.3.3)\n", 103 | "Requirement already satisfied: ipython>=4.0.0; python_version >= \"3.3\" in 
c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (7.13.0)\n", 104 | "Requirement already satisfied: ipykernel>=4.5.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.2.1)\n", 105 | "Requirement already satisfied: widgetsnbextension~=3.5.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.5.1)\n", 106 | "Requirement already satisfied: seaborn in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from missingno>=0.4.2->pandas-profiling[html,notebook]) (0.10.1)\n", 107 | "Requirement already satisfied: pyyaml in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from confuse>=1.0.0->pandas-profiling[html,notebook]) (5.3.1)\n", 108 | "Requirement already satisfied: networkx>=2.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (2.4)\n", 109 | "Requirement already satisfied: attrs>=19.3.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (19.3.0)\n", 110 | "Requirement already satisfied: imagehash; extra == \"type_image_path\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (4.1.0)\n", 111 | "Requirement already satisfied: Pillow; extra == \"type_image_path\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (7.1.2)\n", 112 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (1.25.9)\n", 113 | "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (2020.4.5.1)\n", 114 | "Requirement already satisfied: chardet<4,>=3.0.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (3.0.4)\n", 115 | "Requirement already satisfied: idna<3,>=2.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from requests>=2.23.0->pandas-profiling[html,notebook]) (2.9)\n", 116 | "Requirement already satisfied: pytz>=2017.2 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from pandas>=0.25.3->pandas-profiling[html,notebook]) (2020.1)\n", 117 | "Requirement already satisfied: joblib>=0.14.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from phik>=0.9.10->pandas-profiling[html,notebook]) (0.14.1)\n", 118 | "Requirement already satisfied: numba>=0.38.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from phik>=0.9.10->pandas-profiling[html,notebook]) (0.49.0)\n", 119 | "Requirement already satisfied: patsy>=0.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from 
statsmodels>=0.11.1->pandas-profiling[html,notebook]) (0.5.1)\n", 120 | "Requirement already satisfied: MarkupSafe>=0.23 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jinja2>=2.11.1->pandas-profiling[html,notebook]) (1.1.1)\n", 121 | "Requirement already satisfied: pyzmq>=13 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-client>=6.0.0; extra == \"notebook\"->pandas-profiling[html,notebook]) (19.0.0)\n", 122 | "Requirement already satisfied: tornado>=4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-client>=6.0.0; extra == \"notebook\"->pandas-profiling[html,notebook]) (6.0.4)\n", 123 | "Requirement already satisfied: pywin32>=1.0; sys_platform == \"win32\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jupyter-core>=4.6.3; extra == \"notebook\"->pandas-profiling[html,notebook]) (227)\n", 124 | "Requirement already satisfied: six in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from cycler>=0.10->matplotlib>=3.2.0->pandas-profiling[html,notebook]) (1.14.0)\n", 125 | "Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.2.0)\n", 126 | "Requirement already satisfied: ipython-genutils in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.2.0)\n", 127 | "Requirement already satisfied: decorator in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from traitlets>=4.3.1->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (4.4.2)\n", 128 | "Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.0.5)\n", 129 | "Requirement already satisfied: pickleshare in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.5)\n", 130 | "Requirement already satisfied: setuptools>=18.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (46.1.3)\n", 131 | "Requirement already satisfied: colorama; sys_platform == \"win32\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.4.3)\n", 132 | "Requirement already satisfied: pygments in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (2.6.1)\n", 133 | "Requirement already satisfied: backcall in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.1.0)\n", 134 | "Requirement already satisfied: jedi>=0.10 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages 
(from ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.17.0)\n", 135 | "Requirement already satisfied: notebook>=4.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (6.0.3)\n", 136 | "Requirement already satisfied: PyWavelets in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from imagehash; extra == \"type_image_path\"->visions[type_image_path]>=0.4.1->pandas-profiling[html,notebook]) (1.1.1)\n", 137 | "Requirement already satisfied: llvmlite<=0.33.0.dev0,>=0.31.0.dev0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from numba>=0.38.1->phik>=0.9.10->pandas-profiling[html,notebook]) (0.32.0)\n", 138 | "Requirement already satisfied: pyrsistent>=0.14.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.16.0)\n", 139 | "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.6.0)\n", 140 | "Requirement already satisfied: wcwidth in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.1.9)\n", 141 | "Requirement already satisfied: parso>=0.7.0 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= \"3.3\"->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.0)\n", 142 | "Requirement already satisfied: Send2Trash in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.5.0)\n", 143 | "Requirement already satisfied: prometheus-client in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.7.1)\n", 144 | "Requirement already satisfied: nbconvert in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (5.6.1)\n", 145 | "Requirement already satisfied: terminado>=0.8.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.8.3)\n", 146 | "Requirement already satisfied: zipp>=0.5 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from importlib-metadata; python_version < \"3.8\"->jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.1.0)\n", 147 | "Requirement already satisfied: defusedxml in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.6.0)\n", 148 | "Requirement already satisfied: entrypoints>=0.2.2 in 
c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.3)\n", 149 | "Requirement already satisfied: testpath in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.4.4)\n", 150 | "Requirement already satisfied: bleach in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (3.1.4)\n", 151 | "Requirement already satisfied: mistune<2,>=0.8.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.8.4)\n", 152 | "Requirement already satisfied: pandocfilters>=1.4.1 in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (1.4.2)\n", 153 | "Requirement already satisfied: pywinpty>=0.5; os_name == \"nt\" in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from terminado>=0.8.1->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.5.7)\n" 154 | ] 155 | }, 156 | { 157 | "name": "stderr", 158 | "output_type": "stream", 159 | "text": [ 160 | " WARNING: pandas-profiling 2.6.0 does not provide the extra 'html'\n" 161 | ] 162 | }, 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "Requirement already satisfied: webencodings in c:\\users\\reddy\\appdata\\local\\programs\\python\\python36\\lib\\site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.5.1->pandas-profiling[html,notebook]) (0.5.1)\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "!pip install pandas-profiling[notebook,html] # Generate the Report" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 4, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "'1.0.3'" 184 | ] 185 | }, 186 | "execution_count": 4, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "import pandas\n", 193 | "pandas.__version__" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "Just as we generally import NumPy under the alias ``np``, we will import Pandas under the alias ``pd``:" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 5, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "import pandas as pd" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 6, 215 | "metadata": {}, 216 | "outputs": [ 217 | { 218 | "data": { 219 | "text/plain": [ 220 | "'1.0.3'" 221 | ] 222 | }, 223 | "execution_count": 6, 224 | "metadata": {}, 225 | "output_type": "execute_result" 226 | } 227 | ], 228 | "source": [ 229 | "pd.__version__" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "This import convention will be used throughout the remainder of this book." 
237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 8, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "import pandas_profiling as pp # Generating the Report " 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 10, 251 | "metadata": {}, 252 | "outputs": [ 253 | { 254 | "data": { 255 | "text/plain": [ 256 | "'2.6.0'" 257 | ] 258 | }, 259 | "execution_count": 10, 260 | "metadata": {}, 261 | "output_type": "execute_result" 262 | } 263 | ], 264 | "source": [ 265 | "pp.__version__" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "pd." 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "## Reminder about Built-In Documentation\n", 282 | "\n", 283 | "As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ``?`` character). (Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) if you need a refresher on this.)\n", 284 | "\n", 285 | "For example, to display all the contents of the pandas namespace, you can type\n", 286 | "\n", 287 | "```ipython\n", 288 | "In [3]: pd.\n", 289 | "```\n", 290 | "\n", 291 | "And to display Pandas's built-in documentation, you can use this:\n", 292 | "\n", 293 | "```ipython\n", 294 | "In [4]: pd?\n", 295 | "```\n", 296 | "\n", 297 | "More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.\n", 298 | "\n", 299 | "\n", 300 | "\n", 301 | "## Data Science Life Cycle " 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "" 309 | ] 310 | } 311 | ], 312 | "metadata": { 313 | "anaconda-cloud": {}, 314 | "kernelspec": { 315 | "display_name": "Python 3", 316 | "language": "python", 317 | "name": "python3" 318 | }, 319 | "language_info": { 320 | "codemirror_mode": { 321 | "name": "ipython", 322 | "version": 3 323 | }, 324 | "file_extension": ".py", 325 | "mimetype": "text/x-python", 326 | "name": "python", 327 | "nbconvert_exporter": "python", 328 | "pygments_lexer": "ipython3", 329 | "version": "3.6.8" 330 | } 331 | }, 332 | "nbformat": 4, 333 | "nbformat_minor": 4 334 | } 335 | -------------------------------------------------------------------------------- /Pandas/README.md: -------------------------------------------------------------------------------- 1 | ### Methodology for Data Science 2 | 3 | * In the domain of data science, `solving problems and answering questions` through data analysis is standard practice. 4 | * Often, `data scientists construct a model to predict outcomes` or discover underlying patterns, with the goal of gaining insights. 5 | * Organizations can then use these insights to take actions that ideally improve future outcomes. 6 | * There are numerous rapidly evolving technologies for analyzing data and building models. 7 | * In a remarkably short time, they have progressed from desktops to massively parallel warehouses with huge data volumes and in-database analytic functionality in relational databases and Apache Hadoop. 
8 | * **Text analytics** on unstructured or semi-structured data is becoming increasingly important as a way to incorporate sentiment and other useful information from text into predictive models, often leading to significant improvements in model quality and accuracy. 9 | * Emerging analytics approaches seek to automate many of the steps in model building and application, making machine-learning technology more accessible to those who lack deep quantitative skills. 10 | * Also, in contrast to the “top-down” approach of first defining the business problem and then analyzing the data to find a solution, some data scientists may use a “bottom-up” approach. 11 | * With the latter, the data scientist looks into large volumes of data to see what business goal might be suggested by the data and then tackles that problem. Since most problems are addressed in a top-down manner, the methodology in this paper reflects that view. 12 | * A 10-stage data science methodology that spans technologies and approaches: as data analytics capabilities become more accessible and prevalent, data scientists need a foundational methodology capable of providing a guiding strategy, regardless of the technologies, data volumes or approaches involved (see Figure 1). 13 | ![](https://www.ibmbigdatahub.com/sites/default/files/figure01_revised.jpg) 14 | This methodology bears some similarities to recognized methodologies for data mining, but it emphasizes several of the new practices in data science, such as the use of very large data volumes, the incorporation of text analytics into predictive modeling and the automation of some processes. 15 | * The methodology consists of 10 stages that form an iterative process for using data to uncover insights. Each stage plays a vital role in the context of the overall methodology. 16 | 17 | *** 18 | ### What is a methodology? 19 | A methodology is a general strategy that guides the processes and activities within a given domain. A methodology does not depend on particular technologies or tools, nor is it a set of techniques or recipes. Rather, a methodology provides the data scientist with a framework for how to proceed with whatever methods, processes and heuristics will be used to obtain answers or results. 20 | 21 | *** 22 | * **Stage 1: Business understanding** 23 | Every project starts with business understanding. The business sponsors who need the analytic solution play the most critical role in this stage by defining the problem, project objectives and solution requirements from a business perspective. This first stage lays the foundation for a successful resolution of the business problem. To help guarantee the project’s success, the sponsors should be involved throughout the project to provide domain expertise, review intermediate findings and ensure the work remains on track to generate the intended solution. 24 | 25 | * **Stage 2: Analytic approach** 26 | Once the business problem has been clearly stated, the data scientist can define the analytic approach to solving the problem. This stage entails expressing the problem in the context of statistical and machine-learning techniques, so the organization can identify the most suitable ones for the desired outcome. For example, if the goal is to predict a response such as “yes” or “no,” then the analytic approach could be defined as building, testing and implementing a classification model. 27 | 28 | * **Stage 3: Data requirements** 29 | The chosen analytic approach determines the data requirements.
Specifically, the analytic methods to be used require certain data content, formats and representations, guided by domain knowledge. 30 | 31 | * **Stage 4: Data collection** 32 | In the initial data collection stage, data scientists identify and gather the available data resources—structured, unstructured and semi-structured—relevant to the problem domain. Typically, they must choose whether to make additional investments to obtain less-accessible data elements. It may be best to defer the investment decision until more is known about the data and the model. If there are gaps in data collection, the data scientist may have to revise the data requirements accordingly and collect new and/or more data. 33 | While data sampling and subsetting are still important, today’s high-performance platforms and in-database analytic functionality let data scientists use much larger data sets containing much or even all of the available data. By incorporating more data, predictive models may be better able to represent rare events such as disease incidence or system failure. 34 | 35 | * **Stage 5: Data understanding** 36 | After the original data collection, data scientists typically use descriptive statistics and visualization techniques to understand the data content, assess data quality and discover initial insights about the data. Additional data collection may be necessary to fill gaps. 37 | 38 | * **Stage 6: Data preparation** 39 | This stage encompasses all activities to construct the data set that will be used in the subsequent modeling stage. Data preparation activities include data cleaning (dealing with missing or invalid values, eliminating duplicates, formatting properly), combining data from multiple sources (files, tables, platforms) and transforming data into more useful variables. 40 | In a process called feature engineering, data scientists can create additional explanatory variables, also referred to as predictors or features, through a combination of domain knowledge and existing structured variables. When text data is available, such as customer call center logs or physicians’ notes in unstructured or semi-structured form, text analytics is useful in deriving new structured variables to enrich the set of predictors and improve model accuracy. 41 | Data preparation is usually the most time-consuming step in a data science project. In many domains, some data preparation steps are common across different problems. Automating certain data preparation steps in advance may accelerate the process by minimizing ad hoc preparation time. With today’s high-performance, massively parallel systems and analytic functionality residing where the data is stored, data scientists can more easily and rapidly prepare data using very large data sets. 42 | 43 | * **Stage 7: Modeling** 44 | Starting with the first version of the prepared data set, the modeling stage focuses on developing predictive or descriptive models according to the previously defined analytic approach. With predictive models, data scientists use a training set (historical data in which the outcome of interest is known) to build the model. The modeling process is typically highly iterative as organizations gain intermediate insights, leading to refinements in data preparation and model specification. For a given technique, data scientists may try multiple algorithms with their respective parameters to find the best model for the available variables. 
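To make Stages 6 and 7 concrete, here is a minimal, illustrative sketch with pandas and scikit-learn. It is not from the original paper; the tiny table, the column names and the model choice are all invented for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Stage 6: data preparation on a small, made-up customer table
df = pd.DataFrame({
    'income':      [48_000, None, 52_000, 61_000, 48_000, 75_000, 33_000, 61_000],
    'visits':      [3, 5, 2, 8, 3, 6, 1, 8],
    'total_spend': [120.0, 340.0, 90.0, 610.0, 120.0, 450.0, 40.0, 610.0],
    'responded':   [0, 1, 0, 1, 0, 1, 0, 1],
})
df = df.drop_duplicates()                                   # eliminate duplicate records
df['income'] = df['income'].fillna(df['income'].median())   # deal with missing values
df['spend_per_visit'] = df['total_spend'] / df['visits']    # feature engineering

# Stage 7: fit a predictive model on historical data with a known outcome
X = df[['income', 'visits', 'spend_per_visit']]
y = df['responded']                                          # the known "yes"/"no" response
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print(model.score(X_test, y_test))
```

The held-out `X_test`/`y_test` pair plays the role of the independent testing set described in Stage 8 below.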
45 | 46 | * **Stage 8: Evaluation** 47 | During model development and before deployment, the data scientist evaluates the model to understand its quality and ensure that it properly and fully addresses the business problem. Model evaluation entails computing various diagnostic measures and other outputs such as tables and graphs, enabling the data scientist to interpret the model’s quality and its efficacy in solving the problem. For a predictive model, data scientists use a testing set, which is independent of the training set but follows the same probability distribution and has a known outcome. The testing set is used to evaluate the model so it can be refined as needed. Sometimes the final model is also applied to a validation set for a final assessment. 48 | In addition, data scientists may apply statistical significance tests to the model as further proof of its quality. This additional proof may be instrumental in justifying model implementation or taking actions when the stakes are high—such as an expensive supplemental medical protocol or a critical airplane flight system. 49 | 50 | * **Stage 9: Deployment** 51 | Once a satisfactory model has been developed and approved by the business sponsors, it is deployed into the production environment or a comparable test environment. Usually it is deployed in a limited way until its performance has been fully evaluated. Deployment may be as simple as generating a report with recommendations, or as involved as embedding the 52 | model in a complex workflow and scoring process managed by a custom application. Deploying a model into an operational business process usually involves additional groups, skills and technologies from within the enterprise. For example, a sales group may deploy a response propensity model through a campaign management process created by a development team and administered by a marketing group. 53 | 54 | * **Stage 10: Feedback** 55 | By collecting results from the implemented model, the organization gets feedback on the model’s performance and its impact on the environment in which it was deployed. For example, feedback could take the form of response rates to a promotional campaign targeting a group of customers identified by the model as high-potential responders. Analyzing this feedback enables data scientists to refine the model to improve its accuracy and usefulness. They can automate some or all of the feedback-gathering and model assessment, refinement and redeployment steps to speed up the process of model refreshing for better outcomes. 56 | 57 | *** 58 | ### Difference Between AI/ML/DS 59 | **Data science** is a broad term for a variety of models and methods used to get information. 60 | 61 | Under the umbrella of data science are the scientific method, math, statistics, and other tools that are used to analyze and manipulate data. If it’s a tool or process applied to data to analyze it or get some sort of information out of it, it likely falls under data science. 62 | Data science is the field where a large volume of data is processed by software to find correlations between data sets and extract information. This information is used by artificial intelligence-driven platforms. It is also a scientific method to study data and reach actionable conclusions. 63 | 64 | **Machine learning** is a kind of artificial intelligence that gives computers the ability to learn from new data sets without being explicitly programmed. It focuses primarily on developing computer programs that can adapt when exposed to new sets of data. Machine Learning is a current application of AI based on the idea that we should be able to give machines access to data and let them learn for themselves, as the short sketch below illustrates.
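A minimal sketch of that idea (added here for illustration; the iris data set and the decision-tree model are assumptions chosen for familiarity, and scikit-learn is assumed to be installed):

```python
# "Learning from data": no classification rules are programmed explicitly;
# the model infers them from labeled examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # 150 labeled flower measurements
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # hold out data the model never sees

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)                # the "learning" step
print(accuracy_score(y_test, model.predict(X_test)))
```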
65 | 66 | Machine learning is a set of algorithms that give a platform the properties of artificial intelligence. These algorithms help a machine learn new behaviors within a particular discipline. 67 | And finally, 68 | 69 | **Artificial Intelligence** is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”. 70 | 71 | AI involves machines that can perform tasks that are characteristic of human intelligence. While this is rather general, it includes things like planning, understanding language, recognizing objects and sounds, learning, and problem solving. 72 | 73 | We can put AI in two categories, general and narrow. General AI would have all of the characteristics of human intelligence, including the capacities mentioned above. Narrow AI exhibits some facet(s) of human intelligence, and can do that facet extremely well, but is lacking in other areas. A machine that’s great at recognizing images, but nothing else, would be an example of narrow AI. 74 | 75 | Generalized AIs – systems or devices which can in theory handle any task – are less common, but this is where some of the most exciting advancements are happening today. It is also the area that has led to the development of Machine Learning. Often referred to as a subset of AI, it is really more accurate to think of machine learning as the current state of the art. 76 | 77 | Artificial intelligence can also be described as a property of computer systems that predict an output for an input provided by a user. These systems learn from many inputs and outputs and provide a better solution every time; such self-learning systems fascinate many IT professionals these days. 78 | 79 | Machine Learning is a subset of Artificial Intelligence. 80 | *** 81 | ### Reference 82 | 1. [Data Science](https://en.wikipedia.org/wiki/Data_science) 83 | 2. [Data Science and Prediction](https://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext) 84 | 3. [What's the Difference Between Data Science and Statistics?](https://priceonomics.com/whats-the-difference-between-data-science-and/) 85 | 4. [AI vs. ML vs. Data Science](https://www.dataneb.com/post/artificial-intelligence-machine-learning-deep-learning-predictive-analytics-data-science) 86 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Data-Science-With-Python 2 | ---------------------------- 3 | * Data science is an inter-disciplinary field that uses `scientific methods`, `processes`, `algorithms and systems` to extract knowledge and insights from structured and unstructured data. 4 | * Data science is related to data mining and big data. 5 | * Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to `understand and analyze actual phenomena with data`.
6 | * Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. [Reference Link](https://en.wikipedia.org/wiki/Data_science) 7 | * Data science continues to evolve as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they must advance past the traditional skills of analyzing large amounts of data, data mining, and programming. In order to uncover useful intelligence for their organizations, data scientists must master the full spectrum of the data science life cycle and possess a level of flexibility and understanding to maximize returns at each phase of the process. 8 | *** 9 |

“The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.”

10 | 11 | *** 12 | Effective data scientists are able to identify relevant questions, collect data from a multitude of different data sources, organize the information, translate results into solutions, and communicate their findings in a way that positively affects business decisions. These skills are required in almost all industries, causing skilled data scientists to be increasingly valuable to companies. 13 | 14 | ### The Data Science Life Cycle 15 | -------------------------------- 16 | 17 | ![]() 18 | 19 | 20 | ### Where Do You Fit in Data Science? 21 | Data is everywhere and expansive. A variety of terms related to mining, cleaning, analyzing, and interpreting data are often used interchangeably, but they can actually involve different skill sets and complexity of data. 22 | 23 | **Data Scientist** 24 | Data scientists examine which questions need answering and where to find the related data. They have business acumen and analytical skills as well as the ability to mine, clean, and present data. Businesses use data scientists to source, manage, and analyze large amounts of unstructured data. Results are then synthesized and communicated to key stakeholders to drive strategic decision-making in the organization. 25 | 26 | **Skills needed:** Programming skills (SAS, R, Python), statistical and mathematical skills, storytelling and data visualization, Hadoop, SQL, machine learning 27 | 28 | **Data Analyst** 29 | Data analysts bridge the gap between data scientists and business analysts. They are provided with the questions that need answering from an organization and then organize and analyze data to find results that align with high-level business strategy. Data analysts are responsible for translating technical analysis into qualitative action items and effectively communicating their findings to diverse stakeholders. 30 | 31 | **Skills needed:** Programming skills (SAS, R, Python), statistical and mathematical skills, data wrangling, data visualization 32 | 33 | **Data Engineer** 34 | Data engineers manage exponential amounts of rapidly changing data. They focus on the development, deployment, management, and optimization of data pipelines and infrastructure to transform and transfer data to data scientists for querying. 35 | 36 | **Skills needed:** Programming languages (Java, Scala), NoSQL databases (MongoDB, Cassandra DB), frameworks (Apache Hadoop) 37 | 38 | ### Data Science Career Outlook and Salary Opportunities 39 | 40 | Data science professionals are rewarded for their highly technical skill set with competitive salaries and great job opportunities at big and small companies in most industries. With over 4,500 open positions listed on Glassdoor, data science professionals with the appropriate experience and education have the opportunity to make their mark in some of the most forward-thinking companies in the world. 41 | 42 | Below are the average base salaries for the following positions: 43 | 44 | |Positions|Salaries| 45 | |--------|---------| 46 | |`Data analyst:`|$65,470| 47 | |`Data scientist:`| $120,931| 48 | |`Senior data scientist:`| $141,257| 49 | |`Data engineer:`| $137,776| 50 | 51 | Gaining specialized skills within the data science field can distinguish data scientists even further. For example, machine learning experts utilize high-level programming skills to create algorithms that continuously gather data and automatically adjust their function to be more effective, as the short sketch below illustrates.
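To give a feel for what "continuously gather data and automatically adjust" can look like in code, here is a small, hedged sketch using scikit-learn's incremental-learning API; the simulated data stream and the model choice are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def data_stream(n_batches=5, batch_size=50):
    """Simulate batches of labeled data arriving over time (illustrative only)."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 3))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a simple hidden rule to learn
        yield X, y

model = SGDClassifier()  # an incremental ("online") learner
for X_batch, y_batch in data_stream():
    # each new batch of data adjusts the model's parameters a little further
    model.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))

print(model.score(*next(data_stream(1))))  # rough accuracy on fresh data
```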
52 | 53 | --- 54 | # References To Learn and Develop Yourself: 55 | * [Python](https://github.com/reddyprasade/Python-Basic-For-All-3.x) 56 | * [Data Science With Python](https://github.com/reddyprasade/Data-Science-With-Python) 57 | * [Machine Learning with Python](https://github.com/reddyprasade/Machine-Learning-with-Scikit-Learn-Python-3.x) 58 | * [Deep Learning With Python](https://github.com/reddyprasade/Deep-Learning) 59 | * [Data Visualization](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Data%20Visualization) 60 | * [Life Cycle of Data Science](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Life%20Cycle%20Process%20of%20Data%20Science%20In%20Real%20World%20project) 61 | * [Statistics](https://github.com/reddyprasade/Data-Science-With-Python/tree/master/Statistics) 62 | -------------------------------------------------------------------------------- /Statistics/Data/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Statistics/Practice/01 - Day 0 - Mean, Median, and Mode.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-basic-statistics/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | N = int(input()) 15 | NUMBER = list(map(int, input().split())) 16 | 17 | # Mean 18 | SUM = 0 19 | 20 | for i in NUMBER: 21 | SUM = SUM + i 22 | print(float(SUM / N)) 23 | 24 | # Median 25 | NUMBER = sorted(NUMBER) 26 | 27 | if N % 2 == 0: 28 | A = NUMBER[N//2] 29 | B = NUMBER[N//2 - 1] 30 | print((A+B)/2) 31 | else: 32 | print(NUMBER[N//2]) 33 | 34 | # Mode (the list is sorted, so a tie resolves to the smallest value) 35 | MAX_1 = 0 36 | NUMBER = sorted(NUMBER) 37 | 38 | for i in NUMBER: 39 | COUNTER = 0 40 | INDEX = NUMBER.index(i) 41 | 42 | for j in range(INDEX, len(NUMBER)): 43 | if i == NUMBER[j]: 44 | COUNTER = COUNTER + 1 45 | if COUNTER > MAX_1: 46 | MAX_1 = COUNTER 47 | if MAX_1 == 1: 48 | MODE = NUMBER[0] 49 | else: 50 | MODE = i 51 | 52 | print(MODE) 53 | -------------------------------------------------------------------------------- /Statistics/Practice/02 - Day 0 - Weighted Mean.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-weighted-mean/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | X = int(input()) 15 | ARRAY = list(map(int, input().split())) 16 | WEIGHT = list(map(int, input().split())) 17 | Y = 0 18 | 19 | for i in range(X): 20 | Y += ARRAY[i]*WEIGHT[i] 21 | 22 | print("{:.1f}".format(Y/sum(WEIGHT))) 23 | -------------------------------------------------------------------------------- /Statistics/Practice/03 - Day 1 - Quartiles.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-quartiles/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # 
Solution 12 | # ======================== 13 | 14 | N = int(input()) 15 | ARRAY = sorted(map(int, input().rstrip().split())) 16 | 17 | def median(n, array): 18 | '''Function to calculate the median''' 19 | 20 | if n % 2 == 0: 21 | ind1 = n//2 22 | ind2 = ind1 - 1 23 | return (array[ind1] + array[ind2]) // 2 24 | else: 25 | return array[n//2] 26 | 27 | MEDIAN_L = median(N//2, ARRAY[0:N//2]) 28 | MEDIAN_X = median(N, ARRAY) 29 | MEDIAN_U = median(N//2, ARRAY[(N+1)//2:]) 30 | 31 | print(MEDIAN_L) 32 | print(MEDIAN_X) 33 | print(MEDIAN_U) 34 | -------------------------------------------------------------------------------- /Statistics/Practice/04 - Day 1 - Interquartile Range.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-interquartile-range 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | def find_median(arr): 15 | if len(arr) % 2 == 1: 16 | return arr[len(arr) // 2] 17 | else: 18 | return (arr[len(arr) // 2] + arr[len(arr) // 2 - 1]) / 2 19 | 20 | # Create array 21 | N = int(input()) 22 | VALUES = list(map(int, input().split())) 23 | FREQUENCY = list(map(int, input().split())) 24 | 25 | ARRAY = [] 26 | 27 | for i in range(N): 28 | ARRAY += [VALUES[i]] * FREQUENCY[i] 29 | ARRAY = sorted(ARRAY) 30 | 31 | # Find interquartile_range 32 | INTERQUARTILE_RANGE = float(find_median(ARRAY[len(ARRAY) // 2 + len(ARRAY) % 2:]) - find_median(ARRAY[:len(ARRAY)//2])) 33 | 34 | print(INTERQUARTILE_RANGE) 35 | -------------------------------------------------------------------------------- /Statistics/Practice/05 - Day 1 - Standard Deviation.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-standard-deviation 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | N = int(input()) 15 | X = list(map(int, input().strip().split(' '))) 16 | 17 | MEAN = sum(X)/N 18 | VARIANCE = 0 19 | 20 | for i in range(N): 21 | VARIANCE += ((X[i]-MEAN)**2)/N 22 | 23 | print(round(VARIANCE**0.5, 1)) 24 | -------------------------------------------------------------------------------- /Statistics/Practice/06 - Day 2 - Basic Probability.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 10 8 | # Language: Python 9 | # Multiple Choice Question - No code required but checked with code 10 | 11 | # ======================== 12 | # Solution 13 | # ======================== 14 | 15 | from itertools import product 16 | from fractions import Fraction 17 | 18 | P = list(product([1, 2, 3, 4, 5, 6], repeat=2)) 19 | 20 | N = sum(1 for x in P if sum(x) <= 9) 21 | 22 | print(Fraction(N, len(P))) 23 | 24 | # >>> 5/6 25 | -------------------------------------------------------------------------------- /Statistics/Practice/07 - Day 2 - More Dice.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # 
Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-2/problem 6 | # Difficulty: Easy 7 | # Max Score: 10 8 | # Language: Python 9 | # Multiple Choice Question - No code required but checked with code 10 | 11 | # ======================== 12 | # Solution 13 | # ======================== 14 | 15 | from itertools import product 16 | from fractions import Fraction 17 | 18 | P = list(product([1, 2, 3, 4, 5, 6], repeat=2)) 19 | 20 | N = sum(1 for x, y in P if x + y == 6 and x != y) 21 | 22 | print(Fraction(N, len(P))) 23 | 24 | # >>> 1/9 25 | -------------------------------------------------------------------------------- /Statistics/Practice/08 - Day 2 - Compound Event Probability.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-3/problem 6 | # Difficulty: Easy 7 | # Max Score: 10 8 | # Language: Python 9 | # Multiple Choice Question - No code required but checked with code 10 | 11 | # ======================== 12 | # Solution 13 | # ======================== 14 | 15 | import itertools 16 | from fractions import Fraction 17 | from collections import Counter 18 | 19 | # Let red = 1 and black = 0 20 | # Bag X 21 | X = list(Counter({1:4, 0:3}).elements()) 22 | 23 | # Bag Y 24 | Y = list(Counter({1:5, 0:4}).elements()) 25 | 26 | # Bag Z 27 | Z = list(Counter({1:4, 0:4}).elements()) 28 | 29 | # Sample space / total number of outcomes 30 | TOTAL_SAMPLES = list(itertools.product(X, Y, Z)) 31 | 32 | # Total number of outcomes 33 | TOTAL_SAMPLES_SIZE = len(TOTAL_SAMPLES) 34 | 35 | # Total number of favourable outcomes 36 | FAVOURABLE_OUTCOMES_SIZE = sum(sum(i) == 2 for i in TOTAL_SAMPLES) 37 | 38 | # Probability as a fraction 39 | print(Fraction(FAVOURABLE_OUTCOMES_SIZE, TOTAL_SAMPLES_SIZE)) 40 | 41 | # >>> 17/42 42 | -------------------------------------------------------------------------------- /Statistics/Practice/09 - Day 3 - Conditional Probability.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-4/problem 6 | # Difficulty: Easy 7 | # Max Score: 10 8 | # Language: Python 9 | # Multiple Choice Question - No code required but checked with code 10 | 11 | # ======================== 12 | # Solution 13 | # ======================== 14 | 15 | import itertools 16 | from fractions import Fraction 17 | 18 | # Sample space 19 | SAMPLE_SPACE = list(itertools.product(("b", "g"), ("b", "g"))) 20 | 21 | # Event b: at least one boy [A] 22 | EVENT_B = [] 23 | for i in SAMPLE_SPACE: 24 | if i[0] == "b" or i[1] == "b": 25 | EVENT_B.append(i) 26 | 27 | # Event 2b: two boys [B] 28 | EVENT_2B = [] 29 | for i in SAMPLE_SPACE: 30 | if i[0] == "b" and i[1] == "b": 31 | EVENT_2B.append(i) 32 | 33 | # Conditional probability -> p(2b | b) = p(b | 2b) * p(2b) / p(b) 34 | # Where -> p(b) = p(b|2b)*p(2b) + p(b|2b')*p(2b') 35 | 36 | # For p(b|2b) 37 | PB_2B = [] 38 | for i in EVENT_2B: 39 | PB_2B.append(i) 40 | 41 | PROB_PB_2B = Fraction(len(PB_2B), len(EVENT_2B)) 42 | 43 | # For p(2b) 44 | PROB_2B = Fraction(len(EVENT_2B), len(SAMPLE_SPACE)) 45 | 46 | # For p(b) 47 | PROB_B = Fraction(len(EVENT_B), len(SAMPLE_SPACE)) 48 | 49 | # Solving for p(2b | b) = p (b | 2b) * p (2b) / p(b) 50 | 
print(PROB_PB_2B*PROB_2B/PROB_B) 51 | 52 | # >>> 1/3 53 | -------------------------------------------------------------------------------- /Statistics/Practice/10 - Day 3 - Cards of the Same Suit.txt: -------------------------------------------------------------------------------- 1 | ======================== 2 | Information 3 | ======================== 4 | 5 | Direct Link: https://www.hackerrank.com/challenges/s10-mcq-5/problem 6 | Difficulty: Easy 7 | Max Score: 10 8 | Language: Python 9 | Multiple Choice Question - No code required 10 | 11 | ======================== 12 | Solution 13 | ======================== 14 | 15 | First card = 13/52 16 | Second card of the same suit = 12/51 (without replacement) 17 | There are 4 suits, so answer is (13/52) * (12/51) * 4 = 12/51 18 | 19 | >>> 12/51 20 | -------------------------------------------------------------------------------- /Statistics/Practice/11 - Day 3 - Drawing Marbles.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-mcq-6/problem 6 | # Difficulty: Easy 7 | # Max Score: 10 8 | # Language: Python 9 | # Multiple Choice Question - No code required but checked with code 10 | 11 | # ======================== 12 | # Solution 13 | # ======================== 14 | 15 | from itertools import permutations 16 | from fractions import Fraction 17 | 18 | # 1 for Red Marbles 19 | # 0 for Blue Marbles 20 | RED_MARBLES = [1, 1, 1] 21 | BLUE_MARBLES = [0, 0, 0, 0] 22 | 23 | # All combinations, excluded first blue 24 | FIRST_DRAW = list(filter(lambda m: m[0] == 1, permutations(RED_MARBLES + BLUE_MARBLES, 2))) 25 | 26 | # All combinations with second blue 27 | MARBLES_REMAINING = list(filter(lambda m: m[1] == 0, FIRST_DRAW)) 28 | 29 | # Result is 2/3 30 | print(Fraction(len(MARBLES_REMAINING), len(FIRST_DRAW))) 31 | 32 | # ======================== 33 | # Explanation 34 | # ======================== 35 | 36 | # A bag contains 3 red marbles and 4 blue marbles 37 | # After drawing a red marble, the bag has now 2 red and 4 blue marbles (total of 6 marbles) 38 | # Therefore, the probability of getting a blue marble is 4/6, simplified to 2/3 39 | 40 | # >>> 2/3 41 | -------------------------------------------------------------------------------- /Statistics/Practice/12 - Day 4 - Binomial Distribution I.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-binomial-distribution-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | def factorial(N): 15 | '''Function to calculate N factorial''' 16 | if N == 0: 17 | return 1 18 | else: 19 | return N * factorial(N - 1) 20 | 21 | def combination(N, X): 22 | '''Function to calculate the combination of N and X''' 23 | result = factorial(N) / (factorial(N - X) * factorial(X)) 24 | return result 25 | 26 | def binomial(X, N, P): 27 | '''Function to determine the binomial of X, N, and P''' 28 | Q = 1 - P 29 | result = combination(N, X) * (P**X) * (Q**(N - X)) 30 | return result 31 | 32 | if __name__ == '__main__': 33 | L, R = list(map(float, input().split())) 34 | ODDS = L / R 35 | TOTAL = list() 36 | for i in range(3, 7): 37 | TOTAL.append(binomial(i, 6, ODDS / 
(1 + ODDS))) 38 | print(round(sum(TOTAL), 3)) 39 | -------------------------------------------------------------------------------- /Statistics/Practice/13 - Day 4 - Binomial Distribution II.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-binomial-distribution-2/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | import math 15 | 16 | P = 0.12 17 | ANS_1 = 0 18 | 19 | for i in range(0, 3): 20 | ANS_1 += math.factorial(10)/math.factorial(i)/math.factorial(10-i) * P**i * (1-P)**(10-i) 21 | if i == 1: 22 | ANS_2 = 1 - ANS_1 23 | 24 | print(round(ANS_1, 3)) 25 | print(round(ANS_2, 3)) 26 | -------------------------------------------------------------------------------- /Statistics/Practice/14 - Day 4 - Geometric Distribution I.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-geometric-distribution-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | A, B = map(int, input().strip().split(' ')) 15 | C = int(input()) 16 | 17 | P = float(A/B) 18 | 19 | RES = (1-P) ** (C-1) * P 20 | 21 | print(round(RES, 3)) 22 | -------------------------------------------------------------------------------- /Statistics/Practice/15 - Day 4 - Geometric Distribution II.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-geometric-distribution-2/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | def geometric_prob(P, X): 15 | '''Function to calculate the geometric probability''' 16 | G = (1-P)**(X-1) * P 17 | return G 18 | 19 | NUMERATOR, DENOMINATOR = map(float, input().split()) 20 | X = int(input()) 21 | P = NUMERATOR/DENOMINATOR 22 | G = 0 23 | 24 | for i in range(1, 6): # i = 1, 2, 3, 4, 5 25 | G += geometric_prob(P, i) 26 | 27 | print("%.3f" %G) 28 | -------------------------------------------------------------------------------- /Statistics/Practice/16 - Day 5 - Poisson Distribution I.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-poisson-distribution-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | from math import factorial, exp 15 | 16 | MEAN = float(input()) 17 | K = int(input()) 18 | 19 | POISSON = ((MEAN ** K) * exp(-MEAN)) / factorial(K) 20 | 21 | print("%.3f" % POISSON) 22 | -------------------------------------------------------------------------------- /Statistics/Practice/17 - Day 5 - Poisson Distribution II.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | 
# Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-poisson-distribution-2/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | AVERAGE_X, AVERAGE_Y = [float(num) for num in input().split(" ")] 15 | 16 | # Cost 17 | COST_X = 160 + 40*(AVERAGE_X + AVERAGE_X**2) 18 | COST_Y = 128 + 40*(AVERAGE_Y + AVERAGE_Y**2) 19 | 20 | print(round(COST_X, 3)) 21 | print(round(COST_Y, 3)) 22 | -------------------------------------------------------------------------------- /Statistics/Practice/18 - Day 5 - Normal Distribution I.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-normal-distribution-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | import math 15 | 16 | MU = 20 17 | SD = 2 18 | 19 | def normal_cdf(X, MU, SD): 20 | '''Function to calculate the Cumulative Distribution Function''' 21 | return 1/2*(1+math.erf((X-MU)/(SD*math.sqrt(2)))) 22 | 23 | RESULT_1 = normal_cdf(19.5, MU, SD) 24 | RESULT_2 = normal_cdf(22, MU, SD) - normal_cdf(20, MU, SD) 25 | 26 | print(round(RESULT_1, 3)) 27 | print(round(RESULT_2, 3)) 28 | 29 | # .erf() -> https://docs.python.org/3.5/library/math.html#math.erf 30 | -------------------------------------------------------------------------------- /Statistics/Practice/19 - Day 5 - Normal Distribution II.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-normal-distribution-2/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | import math 15 | 16 | MU, SD = list(map(float, input().rstrip().split())) 17 | X_1 = float(input()) 18 | X_2 = float(input()) 19 | 20 | def normal_distribution(X, MU, SD): 21 | '''Function to calculate the normal distribution''' 22 | return 1/2*(1+math.erf((X-MU)/(SD*math.sqrt(2)))) 23 | 24 | # grade >80 25 | print(round((1-normal_distribution(X_1, MU, SD))*100, 2)) 26 | 27 | # grade >= 60 28 | print(round((1-normal_distribution(X_2, MU, SD))*100, 2)) 29 | 30 | # grade <60 31 | print(round((normal_distribution(X_2, MU, SD))*100, 2)) 32 | -------------------------------------------------------------------------------- /Statistics/Practice/20 - Day 6 - The Central Limit Theorem I.py: -------------------------------------------------------------------------------- 1 | # ======================== 2 | # Information 3 | # ======================== 4 | 5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/problem 6 | # Difficulty: Easy 7 | # Max Score: 30 8 | # Language: Python 9 | 10 | # ======================== 11 | # Solution 12 | # ======================== 13 | 14 | import math 15 | 16 | H = int(input()) 17 | B = int(input()) 18 | C = int(input()) 19 | INPUT = math.sqrt(B) * int(input()) 20 | 21 | print(round(0.5 * (1 + math.erf((H - (B * C)) / (INPUT * math.sqrt(2)))), 4)) 22 | -------------------------------------------------------------------------------- 
/Statistics/Practice/21 - Day 6 - The Central Limit Theorem II.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-2/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | TICKETS = 250
17 | STUDENTS = 100
18 | MEAN = 2.4
19 | SD = 2
20 |
21 | MU = STUDENTS * MEAN
22 | S = math.sqrt(STUDENTS)*SD  # std of the sum grows as sqrt(n)
23 |
24 | def normal_distribution(x, mu, sd):
25 |     '''Normal cumulative distribution function'''
26 |     return 1/2*(1+math.erf((x-mu)/(sd*math.sqrt(2))))
27 |
28 | print(round(normal_distribution(x=TICKETS, mu=MU, sd=S), 4))  # P(total tickets demanded <= 250)
29 |
--------------------------------------------------------------------------------
/Statistics/Practice/22 - Day 6 - The Central Limit Theorem III.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-3/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | import math
15 |
16 | SAMPLE = 100
17 | M = 500
18 | SD = 80
19 | Z = 1.96    # z-score for a 95% confidence interval
20 | RNG = 0.95  # confidence level
21 |
22 | print(round(-Z * (SD/math.sqrt(SAMPLE)) + M, 2))  # lower bound: M - z * SD/sqrt(n)
23 | print(round(Z * (SD/math.sqrt(SAMPLE)) + M, 2))   # upper bound: M + z * SD/sqrt(n)
24 |
--------------------------------------------------------------------------------
/Statistics/Practice/23 - Day 7 - Pearson Correlation Coefficient I.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def std(x):  # population standard deviation
15 |     return (sum([(i-(sum(x))/len(x))**2 for i in x])/len(x))**0.5
16 |
17 | N = int(input())
18 | X = list(map(float, input().split()))
19 | Y = list(map(float, input().split()))
20 |
21 | X_M = sum(X)/len(X)
22 | Y_M = sum(Y)/len(Y)
23 |
24 | X_S = std(X)
25 | Y_S = std(Y)
26 |
27 | print(round(sum([(i-X_M)*(j-Y_M) for i, j in zip(X, Y)])/(N*X_S*Y_S), 3))  # Pearson r = cov(X, Y) / (std(X) * std(Y))
28 |
--------------------------------------------------------------------------------
/Statistics/Practice/24 - Day 7 - Spearman's Rank Correlation.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | N = int(input())
15 | X = list(map(float, input().strip().split()))
16 | Y = list(map(float, input().strip().split()))
17 |
18 | X_COPY = X.copy()
19 |
20 | Y_COPY = Y.copy()
21 |
22 | X_COPY.sort()
23 |
24 | XD = dict(zip(X_COPY, range(1, N+1)))  # value -> rank; assumes all values are distinct (no ties)
25 |
26 | Y_COPY.sort()
27 |
28 | YD = dict(zip(Y_COPY, range(1, N+1)))
29 |
30 | RX = [XD[i] for i in X]
31 |
32 | RY = [YD[i] for i in Y]
33 |
34 | print(round(1-(6*sum([(rx-ry)**2 for rx, ry in zip(RX, RY)]))/((N**3)-N), 3))  # 1 - 6*sum(d^2) / (n^3 - n)
35 |
--------------------------------------------------------------------------------
/Statistics/Practice/25 - Day 8 - Least Sqaure Regression Line.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-least-square-regression-line/problem
6 | # Difficulty: Easy
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | def mean(X):
15 |     '''To calculate the mean'''
16 |     return sum(X)/len(X)
17 |
18 | def lsr(X, Y):
19 |     '''Least-squares regression line y = A + B*x'''
20 |     B = sum([(X[i] - mean(X)) * (Y[i] - mean(Y)) for i in range(len(X))])/sum([(j - mean(X))**2 for j in X])
21 |     A = mean(Y) - (B*mean(X))
22 |     return A+(B*80)  # predicted y at x = 80, the query point in the problem
23 |
24 | X = []
25 | Y = []
26 |
27 | for i in range(5):
28 |     A, B = list(map(int, input().split()))
29 |     X.append(A)
30 |     Y.append(B)
31 |
32 | print(round(lsr(X, Y), 3))
33 |
--------------------------------------------------------------------------------
/Statistics/Practice/26 - Day 8 - Pearson Correlation Coefficient II.txt:
--------------------------------------------------------------------------------
1 | ========================
2 | Information
3 | ========================
4 |
5 | Direct Link: https://www.hackerrank.com/challenges/s10-mcq-7/problem
6 | Difficulty: Easy
7 | Max Score: 30
8 | Language: Python
9 | Multiple Choice Question - No code required
10 |
11 | ========================
12 | Solution
13 | ========================
14 |
15 | Rewriting the first line in the form y = mx + c, and the second in the form x = my + c:
16 | y = (-3/4)*x - 2
17 | x = (-3/4)*y + (-7/4)
18 |
19 | m1 = -3/4
20 | m2 = -3/4
21 |
22 | Let x_std be the standard deviation of x, and let y_std be the standard deviation of y.
23 |
24 | p = m1 * (x_std / y_std)
25 | p = m2 * (y_std / x_std)
26 |
27 | Multiplying the two equations:
28 | p^2 = m1 * m2
29 | p^2 = (-3/4) * (-3/4)
30 | p^2 = 9/16
31 | p = +/-(3/4)
32 |
33 | Since x_std and y_std are both positive, p must share the same sign as m1 and m2. Thus, p = -3/4.
34 |
35 | >>> -3/4
--------------------------------------------------------------------------------
/Statistics/Practice/27 - Day 9 - Multiple Linear Regression.py:
--------------------------------------------------------------------------------
1 | # ========================
2 | # Information
3 | # ========================
4 |
5 | # Direct Link: https://www.hackerrank.com/challenges/s10-multiple-linear-regression/problem
6 | # Difficulty: Medium
7 | # Max Score: 30
8 | # Language: Python
9 |
10 | # ========================
11 | # Solution
12 | # ========================
13 |
14 | from sklearn import linear_model
15 |
16 | M, N = list(map(int, input().strip().split()))  # M features, N training rows
17 | X = [0]*N
18 | Y = [0]*N
19 |
20 | for i in range(N):
21 |     inp = list(map(float, input().strip().split()))
22 |     X[i] = inp[:-1]  # feature values
23 |     Y[i] = inp[-1]   # target value
24 |
25 | LM = linear_model.LinearRegression()
26 | LM.fit(X, Y)
27 | A = LM.intercept_
28 | B = LM.coef_
29 |
30 | Q = int(input())
31 |
32 | for i in range(Q):
33 |     f = list(map(float, input().strip().split()))
34 |     pred = A + sum([B[j] * f[j] for j in range(M)])  # a + b1*f1 + ... + bM*fM
35 |     print(round(pred, 2))
36 |
--------------------------------------------------------------------------------
/Statistics/Practice/Readme.md:
--------------------------------------------------------------------------------
1 |
2 | # 10 Days of Statistics in HackerRank
3 |
4 | | Day | Challenge | Problem | Difficulty | Score | Solution |
5 | | :---: | :---: | :---: | :---: | :---: | :---: |
6 | | 0 | Mean, Median, and Mode | [Problem](https://www.hackerrank.com/challenges/s10-basic-statistics/problem) | Easy | 30 | [Solution](/Statistics/Practice/01%20-%20Day%200%20-%20Mean,%20Median,%20and%20Mode.py) |
7 | | 0 | Weighted Mean | [Problem](https://www.hackerrank.com/challenges/s10-weighted-mean/problem) | Easy | 30 | [Solution](/Statistics/Practice/02%20-%20Day%200%20-%20Weighted%20Mean.py) |
8 | | 1 | Quartiles | [Problem](https://www.hackerrank.com/challenges/s10-quartiles/problem) | Easy | 30 | [Solution](/Statistics/Practice/03%20-%20Day%201%20-%20Quartiles.py) |
9 | | 1 | Interquartile Range | [Problem](https://www.hackerrank.com/challenges/s10-interquartile-range) | Easy | 30 | [Solution](/Statistics/Practice/04%20-%20Day%201%20-%20Interquartile%20Range.py) |
10 | | 1 | Standard Deviation | [Problem](https://www.hackerrank.com/challenges/s10-standard-deviation) | Easy | 30 | [Solution](/Statistics/Practice/05%20-%20Day%201%20-%20Standard%20Deviation.py) |
11 | | 2 | Basic Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-1/problem) | Easy | 10 | [Solution](/Statistics/Practice/06%20-%20Day%202%20-%20Basic%20Probability.py) |
12 | | 2 | More Dice | [Problem](https://www.hackerrank.com/challenges/s10-mcq-2/problem) | Easy | 10 | [Solution](/Statistics/Practice/07%20-%20Day%202%20-%20More%20Dice.py) |
13 | | 2 | Compound Event Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-3/problem) | Easy | 10 | [Solution](/Statistics/Practice/08%20-%20Day%202%20-%20Compound%20Event%20Probability.py) |
14 | | 3 | Conditional Probability | [Problem](https://www.hackerrank.com/challenges/s10-mcq-4/problem) | Easy | 10 | [Solution](/Statistics/Practice/09%20-%20Day%203%20-%20Conditional%20Probability.py) |
15 | | 3 | Cards of the Same Suit | [Problem](https://www.hackerrank.com/challenges/s10-mcq-5/problem) | Easy | 10 | [Solution](/Statistics/Practice/10%20-%20Day%203%20-%20Cards%20of%20the%20Same%20Suit.txt) |
16 | | 3 | Drawing Marbles | [Problem](https://www.hackerrank.com/challenges/s10-mcq-6/problem) | Easy | 10 | [Solution](/Statistics/Practice/11%20-%20Day%203%20-%20Drawing%20Marbles.py) |
17 | | 4 | Binomial Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-binomial-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/12%20-%20Day%204%20-%20Binomial%20Distribution%20I.py) |
18 | | 4 | Binomial Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-binomial-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/13%20-%20Day%204%20-%20Binomial%20Distribution%20II.py) |
19 | | 4 | Geometric Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-geometric-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/14%20-%20Day%204%20-%20Geometric%20Distribution%20I.py) |
20 | | 4 | Geometric Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-geometric-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/15%20-%20Day%204%20-%20Geometric%20Distribution%20II.py) |
21 | | 5 | Poisson Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-poisson-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/16%20-%20Day%205%20-%20Poisson%20Distribution%20I.py) |
22 | | 5 | Poisson Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-poisson-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/17%20-%20Day%205%20-%20Poisson%20Distribution%20II.py) |
23 | | 5 | Normal Distribution I | [Problem](https://www.hackerrank.com/challenges/s10-normal-distribution-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/18%20-%20Day%205%20-%20Normal%20Distribution%20I.py) |
24 | | 5 | Normal Distribution II | [Problem](https://www.hackerrank.com/challenges/s10-normal-distribution-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/19%20-%20Day%205%20-%20Normal%20Distribution%20II.py) |
25 | | 6 | The Central Limit Theorem I | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/problem) | Easy | 30 | [Solution](/Statistics/Practice/20%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20I.py) |
26 | | 6 | The Central Limit Theorem II | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-2/problem) | Easy | 30 | [Solution](/Statistics/Practice/21%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20II.py) |
27 | | 6 | The Central Limit Theorem III | [Problem](https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-3/problem) | Easy | 30 | [Solution](/Statistics/Practice/22%20-%20Day%206%20-%20The%20Central%20Limit%20Theorem%20III.py) |
28 | | 7 | Pearson Correlation Coefficient I | [Problem](https://www.hackerrank.com/challenges/s10-pearson-correlation-coefficient/problem) | Easy | 30 | [Solution](/Statistics/Practice/23%20-%20Day%207%20-%20Pearson%20Correlation%20Coefficient%20I.py) |
29 | | 7 | Spearman's Rank Correlation Coefficient | [Problem](https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/problem) | Easy | 30 | [Solution](/Statistics/Practice/24%20-%20Day%207%20-%20Spearman's%20Rank%20Correlation.py) |
30 | | 8 | Least Square Regression Line | [Problem](https://www.hackerrank.com/challenges/s10-least-square-regression-line/problem) | Easy | 30 | [Solution](/Statistics/Practice/25%20-%20Day%208%20-%20Least%20Sqaure%20Regression%20Line.py) |
31 | | 8 | Pearson Correlation Coefficient II | [Problem](https://www.hackerrank.com/challenges/s10-mcq-7/problem) | Easy | 30 | [Solution](/Statistics/Practice/26%20-%20Day%208%20-%20Pearson%20Correlation%20Coefficient%20II.txt) |
32 | | 9 | Multiple Linear Regression | [Problem](https://www.hackerrank.com/challenges/s10-multiple-linear-regression/problem) | Medium | 30 | [Solution](/Statistics/Practice/27%20-%20Day%209%20-%20Multiple%20Linear%20Regression.py) |
33 |
--------------------------------------------------------------------------------
/Statistics/README.md:
--------------------------------------------------------------------------------
1 | Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
2 |
--------------------------------------------------------------------------------
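The Day 9 solution above leans on sklearn, which is the shortest route on HackerRank. The same fit also drops out of ordinary least squares directly: append a column of ones for the intercept and solve the resulting system. A minimal numpy sketch, not part of the original solutions; the feature and target values below are made up for illustration.

import numpy as np

# Hypothetical training data: rows of [f1, f2] features and their targets.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([10.0, 12.0, 20.0])

X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)     # w[0] = intercept, w[1:] = coefficients

query = np.array([1.0, 2.0, 2.0])                 # [1, f1, f2] for one query row
print(round(float(query @ w), 2))                 # same prediction LinearRegression would give

Prepending the ones column is essentially what LinearRegression(fit_intercept=True) handles internally, so the two approaches agree up to floating-point noise.

--------------------------------------------------------------------------------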