├── .gitignore
├── Numpy
│   ├── assets
│   │   ├── array.jpg
│   │   ├── kZNzz.png
│   │   ├── vceRQ.png
│   │   ├── Matrix.svg.png
│   │   ├── elsp_0105.png
│   │   ├── array_vs_list.png
│   │   └── 583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg
│   └── 01 Numpy Basics.md
├── Pandas
│   ├── assets
│   │   ├── hMKKt.jpg
│   │   ├── structure_table.jpg
│   │   ├── structure_table-1557216961120.jpg
│   │   └── series-and-dataframe.width-1200.png
│   └── 01 Pandas Basics.md
├── assets
│   └── COFFEE BUTTON ヾ(°∇°^).png
├── README.md
└── LICENSE


/.gitignore:
--------------------------------------------------------------------------------
1 | 
2 | *.no_toc
3 | 


--------------------------------------------------------------------------------
/Numpy/assets/array.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array.jpg


--------------------------------------------------------------------------------
/Numpy/assets/kZNzz.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/kZNzz.png


--------------------------------------------------------------------------------
/Numpy/assets/vceRQ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/vceRQ.png


--------------------------------------------------------------------------------
/Pandas/assets/hMKKt.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/hMKKt.jpg


--------------------------------------------------------------------------------
/Numpy/assets/Matrix.svg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/Matrix.svg.png


--------------------------------------------------------------------------------
/Numpy/assets/elsp_0105.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/elsp_0105.png


--------------------------------------------------------------------------------
/Numpy/assets/array_vs_list.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array_vs_list.png


--------------------------------------------------------------------------------
/assets/COFFEE BUTTON ヾ(°∇°^).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/assets/COFFEE BUTTON ヾ(°∇°^).png


--------------------------------------------------------------------------------
/Pandas/assets/structure_table.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table.jpg


--------------------------------------------------------------------------------
/Pandas/assets/structure_table-1557216961120.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table-1557216961120.jpg


--------------------------------------------------------------------------------
/Pandas/assets/series-and-dataframe.width-1200.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/series-and-dataframe.width-1200.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # python-data-tools-reference
2 | A reference of frameworks and tools for data and ML in Python
3 | 
4 | 
5 | 
6 | A one-stop collection of code snippets and references for some of the most widely used tools and frameworks for data manipulation and ML in Python.
7 | 
8 | I'm building these as I'm learning or using them, so they won't be comprehensive, but maybe they'll be of use!


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 methylDragon
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/Numpy/assets/583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg:
--------------------------------------------------------------------------------
  1 | <svg xmlns:xlink="http://www.w3.org/1999/xlink" width="62.504ex" height="12.509ex" style="vertical-align: -5.671ex;" viewBox="0 -2944.1 26911.4 5385.9" role="img" focusable="false" xmlns="http://www.w3.org/2000/svg" aria-labelledby="MathJax-SVG-1-Title">
  2 | <title id="MathJax-SVG-1-Title">{\displaystyle {\begin{aligned}\mathbf {u} \otimes \mathbf {v} =\mathbf {u} \mathbf {v} ^{\top }={\begin{bmatrix}u_{1}\\u_{2}\\u_{3}\\u_{4}\end{bmatrix}}{\begin{bmatrix}v_{1}&amp;v_{2}&amp;v_{3}\end{bmatrix}}={\begin{bmatrix}u_{1}v_{1}&amp;u_{1}v_{2}&amp;u_{1}v_{3}\\u_{2}v_{1}&amp;u_{2}v_{2}&amp;u_{2}v_{3}\\u_{3}v_{1}&amp;u_{3}v_{2}&amp;u_{3}v_{3}\\u_{4}v_{1}&amp;u_{4}v_{2}&amp;u_{4}v_{3}\end{bmatrix}}.\end{aligned}}}</title>
  3 | <defs aria-hidden="true">
  4 | <path stroke-width="1" id="E1-MJMAINB-75" d="M40 442L134 446Q228 450 229 450H235V273V165Q235 90 238 74T254 52Q268 46 304 46H319Q352 46 380 67T419 121L420 123Q424 135 425 199Q425 201 425 207Q425 233 425 249V316Q425 354 423 363T410 376Q396 380 369 380H356V442L554 450V267Q554 84 556 79Q561 62 610 62H623V31Q623 0 622 0Q603 0 527 -3T432 -6Q431 -6 431 25V56L420 45Q373 6 332 -1Q313 -6 281 -6Q208 -6 165 14T109 87L107 98L106 230Q106 358 104 366Q96 380 50 380H37V442H40Z"></path>
  5 | <path stroke-width="1" id="E1-MJMAIN-2297" d="M56 250Q56 394 156 488T384 583Q530 583 626 485T722 250Q722 110 625 14T390 -83Q249 -83 153 14T56 250ZM582 471Q531 510 496 523Q446 542 381 542Q324 542 272 519T196 471L389 278L485 375L582 471ZM167 442Q95 362 95 250Q95 137 167 58L359 250L167 442ZM610 58Q682 138 682 250Q682 363 610 442L418 250L610 58ZM196 29Q209 16 230 2T295 -27T388 -42Q409 -42 429 -40T465 -33T496 -23T522 -11T544 1T561 13T574 22T582 29L388 222L196 29Z"></path>
  6 | <path stroke-width="1" id="E1-MJMAINB-76" d="M401 444Q413 441 495 441Q568 441 574 444H580V382H510L409 156Q348 18 339 6Q331 -4 320 -4Q318 -4 313 -4T303 -3H288Q273 -3 264 12T221 102Q206 135 197 156L96 382H26V444H34Q49 441 145 441Q252 441 270 444H279V382H231L284 264Q335 149 338 149Q338 150 389 264T442 381Q442 382 418 382H394V444H401Z"></path>
  7 | <path stroke-width="1" id="E1-MJMAIN-3D" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"></path>
  8 | <path stroke-width="1" id="E1-MJMAIN-22A4" d="M55 642T55 648T59 659T66 666T71 668H708Q723 660 723 648T708 628H409V15Q402 2 391 0Q387 0 384 1T379 3T375 6T373 9T371 13T369 16V628H71Q70 628 67 630T59 637Z"></path>
  9 | <path stroke-width="1" id="E1-MJMAIN-5B" d="M118 -250V750H255V710H158V-210H255V-250H118Z"></path>
 10 | <path stroke-width="1" id="E1-MJMATHI-75" d="M21 287Q21 295 30 318T55 370T99 420T158 442Q204 442 227 417T250 358Q250 340 216 246T182 105Q182 62 196 45T238 27T291 44T328 78L339 95Q341 99 377 247Q407 367 413 387T427 416Q444 431 463 431Q480 431 488 421T496 402L420 84Q419 79 419 68Q419 43 426 35T447 26Q469 29 482 57T512 145Q514 153 532 153Q551 153 551 144Q550 139 549 130T540 98T523 55T498 17T462 -8Q454 -10 438 -10Q372 -10 347 46Q345 45 336 36T318 21T296 6T267 -6T233 -11Q189 -11 155 7Q103 38 103 113Q103 170 138 262T173 379Q173 380 173 381Q173 390 173 393T169 400T158 404H154Q131 404 112 385T82 344T65 302T57 280Q55 278 41 278H27Q21 284 21 287Z"></path>
 11 | <path stroke-width="1" id="E1-MJMAIN-31" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"></path>
 12 | <path stroke-width="1" id="E1-MJMAIN-32" d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"></path>
 13 | <path stroke-width="1" id="E1-MJMAIN-33" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"></path>
 14 | <path stroke-width="1" id="E1-MJMAIN-34" d="M462 0Q444 3 333 3Q217 3 199 0H190V46H221Q241 46 248 46T265 48T279 53T286 61Q287 63 287 115V165H28V211L179 442Q332 674 334 675Q336 677 355 677H373L379 671V211H471V165H379V114Q379 73 379 66T385 54Q393 47 442 46H471V0H462ZM293 211V545L74 212L183 211H293Z"></path>
 15 | <path stroke-width="1" id="E1-MJMAIN-5D" d="M22 710V750H159V-250H22V-210H119V710H22Z"></path>
 16 | <path stroke-width="1" id="E1-MJSZ4-23A1" d="M319 -645V1154H666V1070H403V-645H319Z"></path>
 17 | <path stroke-width="1" id="E1-MJSZ4-23A3" d="M319 -644V1155H403V-560H666V-644H319Z"></path>
 18 | <path stroke-width="1" id="E1-MJSZ4-23A2" d="M319 0V602H403V0H319Z"></path>
 19 | <path stroke-width="1" id="E1-MJSZ4-23A4" d="M0 1070V1154H347V-645H263V1070H0Z"></path>
 20 | <path stroke-width="1" id="E1-MJSZ4-23A6" d="M263 -560V1155H347V-644H0V-560H263Z"></path>
 21 | <path stroke-width="1" id="E1-MJSZ4-23A5" d="M263 0V602H347V0H263Z"></path>
 22 | <path stroke-width="1" id="E1-MJMATHI-76" d="M173 380Q173 405 154 405Q130 405 104 376T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Q21 294 29 316T53 368T97 419T160 441Q202 441 225 417T249 361Q249 344 246 335Q246 329 231 291T200 202T182 113Q182 86 187 69Q200 26 250 26Q287 26 319 60T369 139T398 222T409 277Q409 300 401 317T383 343T365 361T357 383Q357 405 376 424T417 443Q436 443 451 425T467 367Q467 340 455 284T418 159T347 40T241 -11Q177 -11 139 22Q102 54 102 117Q102 148 110 181T151 298Q173 362 173 380Z"></path>
 23 | <path stroke-width="1" id="E1-MJMAIN-2E" d="M78 60Q78 84 95 102T138 120Q162 120 180 104T199 61Q199 36 182 18T139 0T96 17T78 60Z"></path>
 24 | </defs>
 25 | <g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)" aria-hidden="true">
 26 | <g transform="translate(167,0)">
 27 | <g transform="translate(-11,0)">
 28 |  <use xlink:href="#E1-MJMAINB-75" x="0" y="0"></use>
 29 |  <use xlink:href="#E1-MJMAIN-2297" x="861" y="0"></use>
 30 |  <use xlink:href="#E1-MJMAINB-76" x="1862" y="0"></use>
 31 |  <use xlink:href="#E1-MJMAIN-3D" x="2747" y="0"></use>
 32 |  <use xlink:href="#E1-MJMAINB-75" x="3804" y="0"></use>
 33 | <g transform="translate(4443,0)">
 34 |  <use xlink:href="#E1-MJMAINB-76" x="0" y="0"></use>
 35 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-22A4" x="859" y="583"></use>
 36 | </g>
 37 |  <use xlink:href="#E1-MJMAIN-3D" x="5979" y="0"></use>
 38 | <g transform="translate(7035,0)">
 39 | <g transform="translate(0,2850)">
 40 |  <use xlink:href="#E1-MJSZ4-23A1" x="0" y="-1155"></use>
 41 | <g transform="translate(0,-3446.188741721854) scale(1,2.8112582781456954)">
 42 |  <use xlink:href="#E1-MJSZ4-23A2"></use>
 43 | </g>
 44 |  <use xlink:href="#E1-MJSZ4-23A3" x="0" y="-4555"></use>
 45 | </g>
 46 | <g transform="translate(834,0)">
 47 | <g transform="translate(-11,0)">
 48 | <g transform="translate(0,2050)">
 49 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
 50 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="809" y="-213"></use>
 51 | </g>
 52 | <g transform="translate(0,650)">
 53 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
 54 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="809" y="-213"></use>
 55 | </g>
 56 | <g transform="translate(0,-750)">
 57 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
 58 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="809" y="-213"></use>
 59 | </g>
 60 | <g transform="translate(0,-2150)">
 61 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
 62 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-34" x="809" y="-213"></use>
 63 | </g>
 64 | </g>
 65 | </g>
 66 | <g transform="translate(2017,2850)">
 67 |  <use xlink:href="#E1-MJSZ4-23A4" x="0" y="-1155"></use>
 68 | <g transform="translate(0,-3446.188741721854) scale(1,2.8112582781456954)">
 69 |  <use xlink:href="#E1-MJSZ4-23A5"></use>
 70 | </g>
 71 |  <use xlink:href="#E1-MJSZ4-23A6" x="0" y="-4555"></use>
 72 | </g>
 73 | </g>
 74 | <g transform="translate(9720,0)">
 75 |  <use xlink:href="#E1-MJMAIN-5B" x="0" y="0"></use>
 76 | <g transform="translate(445,0)">
 77 | <g transform="translate(-11,0)">
 78 | <g transform="translate(0,-50)">
 79 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
 80 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="686" y="-213"></use>
 81 | </g>
 82 | </g>
 83 | <g transform="translate(1928,0)">
 84 | <g transform="translate(0,-50)">
 85 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
 86 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="686" y="-213"></use>
 87 | </g>
 88 | </g>
 89 | <g transform="translate(3868,0)">
 90 | <g transform="translate(0,-50)">
 91 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
 92 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="686" y="-213"></use>
 93 | </g>
 94 | </g>
 95 | </g>
 96 |  <use xlink:href="#E1-MJMAIN-5D" x="5420" y="0"></use>
 97 | </g>
 98 |  <use xlink:href="#E1-MJMAIN-3D" x="15697" y="0"></use>
 99 | <g transform="translate(16753,0)">
100 | <g transform="translate(0,2850)">
101 |  <use xlink:href="#E1-MJSZ4-23A1" x="0" y="-1155"></use>
102 | <g transform="translate(0,-3446.188741721854) scale(1,2.8112582781456954)">
103 |  <use xlink:href="#E1-MJSZ4-23A2"></use>
104 | </g>
105 |  <use xlink:href="#E1-MJSZ4-23A3" x="0" y="-4555"></use>
106 | </g>
107 | <g transform="translate(834,0)">
108 | <g transform="translate(-11,0)">
109 | <g transform="translate(0,2050)">
110 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
111 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="809" y="-213"></use>
112 | <g transform="translate(1026,0)">
113 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
114 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="686" y="-213"></use>
115 | </g>
116 | </g>
117 | <g transform="translate(0,650)">
118 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
119 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="809" y="-213"></use>
120 | <g transform="translate(1026,0)">
121 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
122 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="686" y="-213"></use>
123 | </g>
124 | </g>
125 | <g transform="translate(0,-750)">
126 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
127 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="809" y="-213"></use>
128 | <g transform="translate(1026,0)">
129 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
130 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="686" y="-213"></use>
131 | </g>
132 | </g>
133 | <g transform="translate(0,-2150)">
134 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
135 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-34" x="809" y="-213"></use>
136 | <g transform="translate(1026,0)">
137 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
138 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="686" y="-213"></use>
139 | </g>
140 | </g>
141 | </g>
142 | <g transform="translate(2955,0)">
143 | <g transform="translate(0,2050)">
144 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
145 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="809" y="-213"></use>
146 | <g transform="translate(1026,0)">
147 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
148 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="686" y="-213"></use>
149 | </g>
150 | </g>
151 | <g transform="translate(0,650)">
152 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
153 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="809" y="-213"></use>
154 | <g transform="translate(1026,0)">
155 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
156 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="686" y="-213"></use>
157 | </g>
158 | </g>
159 | <g transform="translate(0,-750)">
160 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
161 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="809" y="-213"></use>
162 | <g transform="translate(1026,0)">
163 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
164 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="686" y="-213"></use>
165 | </g>
166 | </g>
167 | <g transform="translate(0,-2150)">
168 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
169 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-34" x="809" y="-213"></use>
170 | <g transform="translate(1026,0)">
171 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
172 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="686" y="-213"></use>
173 | </g>
174 | </g>
175 | </g>
176 | <g transform="translate(5921,0)">
177 | <g transform="translate(0,2050)">
178 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
179 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-31" x="809" y="-213"></use>
180 | <g transform="translate(1026,0)">
181 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
182 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="686" y="-213"></use>
183 | </g>
184 | </g>
185 | <g transform="translate(0,650)">
186 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
187 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-32" x="809" y="-213"></use>
188 | <g transform="translate(1026,0)">
189 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
190 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="686" y="-213"></use>
191 | </g>
192 | </g>
193 | <g transform="translate(0,-750)">
194 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
195 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="809" y="-213"></use>
196 | <g transform="translate(1026,0)">
197 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
198 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="686" y="-213"></use>
199 | </g>
200 | </g>
201 | <g transform="translate(0,-2150)">
202 |  <use xlink:href="#E1-MJMATHI-75" x="0" y="0"></use>
203 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-34" x="809" y="-213"></use>
204 | <g transform="translate(1026,0)">
205 |  <use xlink:href="#E1-MJMATHI-76" x="0" y="0"></use>
206 |  <use transform="scale(0.707)" xlink:href="#E1-MJMAIN-33" x="686" y="-213"></use>
207 | </g>
208 | </g>
209 | </g>
210 | </g>
211 | <g transform="translate(8888,2850)">
212 |  <use xlink:href="#E1-MJSZ4-23A4" x="0" y="-1155"></use>
213 | <g transform="translate(0,-3446.188741721854) scale(1,2.8112582781456954)">
214 |  <use xlink:href="#E1-MJSZ4-23A5"></use>
215 | </g>
216 |  <use xlink:href="#E1-MJSZ4-23A6" x="0" y="-4555"></use>
217 | </g>
218 | </g>
219 |  <use xlink:href="#E1-MJMAIN-2E" x="26309" y="0"></use>
220 | </g>
221 | </g>
222 | </g>
223 | </svg>


--------------------------------------------------------------------------------
/Pandas/01 Pandas Basics.md:
--------------------------------------------------------------------------------
   1 | # Pandas Basics
   2 | 
   3 | Author: methylDragon  
   4 | Contains a syntax reference and code snippets for Pandas!  
   5 | It's a collection of code snippets and tutorials from everywhere all mashed together!       
   6 | 
   7 | ------
   8 | 
   9 | ## Pre-Requisites
  10 | 
  11 | ### Required
  12 | 
  13 | - Python knowledge, this isn't a tutorial!
  14 | - Pandas installed
  15 |   
  16 |   - I'll assume you've already run these lines as well 
  17 |   
  18 |     ```python
  19 |     import numpy as np
  20 |     import pandas as pd
  21 |     ```
  22 | 
  23 | 
  24 | 
  25 | ## Table Of Contents <a name="top"></a>
  26 | 
  27 | 1. [Introduction](#1)    
  28 | 2. [Pandas Basics](#2)    
  29 |    2.1 [Data Types](#2.1)    
  30 |    2.2 [Series Basics](#2.2)    
  31 |    2.3 [DataFrame Basics](#2.3)    
  32 |    2.4 [Panel Basics](#2.4)    
  33 |    2.5 [Categorical Data](#2.5)    
  34 |    2.6 [Basic Binary Operations](#2.6)    
  35 |    2.7 [Casting and Conversion](#2.7)    
  36 |    2.8 [Conditional Indexing](#2.8)    
  37 |    2.9 [IO](#2.9)    
  38 |    2.10 [Plotting](#2.10)    
  39 |    2.11 [Sparse Data](#2.11)    
  40 | 3. [Series Operations](#3)    
  41 |    3.1 [Manipulating Series Text](#3.1)    
  42 |    3.2 [Time Series](#3.2)    
  43 |    3.3 [Time Deltas](#3.3)    
  44 | 4. [DataFrame Operations](#4)    
  45 |    4.1 [Preface](#4.1)    
  46 |    4.2 [Iterating Through DataFrames](#4.2)    
  47 |    4.3 [Sorting, Reindexing, and Renaming DataFrame Values](#4.3)    
  48 |    4.4 [Replacing DataFrame Values](#4.4)    
  49 |    4.5 [Function Application on DataFrames](#4.5)    
  50 |    4.6 [Descriptive Statistics](#4.6)    
  51 |    4.7 [Statistical Methods](#4.7)    
  52 |    4.8 [Window Functions](#4.8)    
  53 |    4.9 [Data Aggregation](#4.9)    
  54 |    4.10 [Dealing with Missing Data](#4.10)    
  55 |    4.11 [GroupBy Operations](#4.11)    
  56 |    4.12 [Merging and Joining](#4.12)    
  57 |    4.13 [Concatenation](#4.13)    
  58 | 5. [EXTRA: Helpful Notes](#5)    
  59 | 
  60 | 
  61 | 
  62 | 
  63 | ## 1. Introduction <a name="1"></a>
  64 | 
  65 | > *pandas* is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the [Python](https://www.python.org/) programming language.
  66 | >
  67 | > *pandas* is a [NumFOCUS](https://www.numfocus.org/open-source-projects.html) sponsored project. This will help ensure the success of development of *pandas* as a world-class open-source project, and makes it possible to [donate](https://pandas.pydata.org/donate.html) to the project.
  68 | > 
  69 | > <https://pandas.pydata.org/>
  70 | 
  71 | This document will list the most commonly used functions in Pandas, to serve as a reference when using it.
  72 | 
  73 | It's not meant to be exhaustive, merely acting as a quick reference for the syntax for basic operations with Pandas. Please do not hesitate to consult the [official pandas documentation](https://pandas.pydata.org/pandas-docs/stable) for more in-depth dives into the library!
  74 | 
  75 | **Key Features**
  76 | 
  77 | Source: <https://www.tutorialspoint.com/python_pandas/python_pandas_introduction.htm>
  78 | 
  79 | - Fast and efficient DataFrame object with default and customized indexing.
  80 | - Tools for loading data into in-memory data objects from different file formats.
  81 | - Data alignment and integrated handling of missing data.
  82 | - Reshaping and pivoting of data sets.
  83 | - Label-based slicing, indexing and subsetting of large data sets.
  84 | - Columns from a data structure can be deleted or inserted.
  85 | - Group by data for aggregation and transformations.
  86 | - High performance merging and joining of data.
  87 | - Time Series functionality.
  88 | 
  89 | ---
  90 | 
  91 | Install it!
  92 | 
  93 | ```shell
  94 | # Best to use conda
  95 | $ conda install pandas
  96 | 
  97 | # But it's possible to use the PyPI wheels as well
  98 | $ pip install pandas
  99 | ```
 100 | 
 101 | You might also need to install some additional (mostly optional) dependencies. On older Debian/Ubuntu systems that ship Python 2 packages:
 102 | 
 103 | ```shell
 104 | $ sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook \
 105 |     python-pandas python-sympy python-nose
 106 | ```
 107 | 
 108 | 
 109 | 
 110 | If you need additional help or need a refresher on the parameters, feel free to use:
 111 | 
 112 | ```python
 113 | help(pd.FUNCTION_YOU_NEED_HELP_WITH)
 114 | ```
 115 | 
 116 | ---
 117 | 
 118 | **Credits:**
 119 | 
 120 | I'm adapting a lot of these notes from:
 121 | 
 122 | <https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html>
 123 | 
 124 | <https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python>
 125 | 
 126 | <https://www.tutorialspoint.com/python_pandas/python_pandas_introduction.htm>
 127 | 
 128 | 
 129 | 
 130 | ## 2. Pandas Basics <a name="2"></a>
 131 | 
 132 | ### 2.1 Data Types <a name="2.1"></a>
 133 | [go to top](#top)
 134 | 
 135 | 
 136 | Note that Pandas is built on top of Numpy.
 137 | 
 138 | There are three types of data structures that Pandas deals with:
 139 | 
 140 | - Series
 141 |   - 1D labelled homogeneous array, size-immutable
 142 |   - If heterogeneous data is entered, the data-type will become 'object'
 143 | - DataFrame
 144 |   - Contains series data
 145 |   - 2D labelled, size-mutable, table structure
 146 |   - Potentially heterogeneous columns
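 147 | - Panel (deprecated in pandas 0.20 and removed in 0.25; a MultiIndex DataFrame or the xarray library is the suggested replacement)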
 148 |   - Contains DataFrames
 149 |   - 3D labelled, size-mutable array
 150 | 
 151 | **This syntax reference focuses mainly on DataFrames**, since they're the most commonly manipulated objects in Pandas.
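To see those dtype rules in action, here's a minimal sketch using made-up data (the variable names are just for illustration). Exact dtype names can vary by platform and pandas version, so the comments only name the kind of dtype to expect.

```python
import pandas as pd

# A Series of integers gets a single integer dtype (e.g. int64)
ints = pd.Series([1, 2, 3])
print(ints.dtype)

# Mixing ints, strings and floats falls back to the generic 'object' dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object

# A DataFrame stores one dtype per column, so columns may differ
df = pd.DataFrame({'name': ['methylDragon', 'toothless', 'smaug'],
                   'rating': [10, 5, 2]})
print(df.dtypes)
```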
 152 | 
 153 | 
 154 | 
 155 | ### 2.2 Series Basics <a name="2.2"></a>
 156 | [go to top](#top)
 157 | 
 158 | 
 159 | > A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index.
 160 | >
 161 | > <https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm>
 162 | 
 163 | ![Series and DataFrame](assets/series-and-dataframe.width-1200.png)
 164 | 
 165 | [Image Source](<https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/>)
 166 | 
 167 | #### **Creating Series Objects**
 168 | 
 169 | ```python
 170 | # Empty Series
 171 | s = pd.Series(dtype=float) # Pass a dtype; pandas 1.x warns about the default dtype if you don't
 172 | 
 173 | # Series from ndarray
 174 | s = pd.Series(np.array([1, 2, 3]))
 175 | s = pd.Series(np.array([1, 2, 3]), index=[100, 101, 102]) # With custom indexing
 176 | 
 177 | # Series from Dict
 178 | # Dictionary keys are used to construct the index
 179 | s = pd.Series({'a': 0, 'b': 1, 'c': 2})
 180 | 
 181 | # Series from scalar
 182 | s = pd.Series(5, index=[0, 1, 2]) # Creates 3 rows of value 5
 183 | ```
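A quick sketch of how the constructors above assign labels, using toy values (the names `s_dict`, `s_scalar` and `s_custom` are hypothetical):

```python
import numpy as np
import pandas as pd

# Dict keys become the index labels, in insertion order
s_dict = pd.Series({'a': 0, 'b': 1, 'c': 2})
print(s_dict.index.tolist())  # ['a', 'b', 'c']

# A scalar is repeated once per index label
s_scalar = pd.Series(5, index=[0, 1, 2])
print(s_scalar.tolist())  # [5, 5, 5]

# A custom index replaces the default 0..n-1 labels
s_custom = pd.Series(np.array([1, 2, 3]), index=[100, 101, 102])
print(s_custom.loc[101])  # 2
```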
 184 | 
 185 | #### **Accessing Values**
 186 | 
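 187 | ```python
 188 | # By position (prefer iloc; on an integer index, s[0] looks up the *label* 0)
 189 | s.iloc[0]
 190 | 
 191 | # By index label
 192 | s['index_name'] # Or, more explicitly, s.loc['index_name']
 193 | 
 194 | # By slice
 195 | s.iloc[-3:] # Retrieves last 3 elements
 196 | 
 197 | # Fancy indexing works also!
 198 | s.iloc[[0, 1, 2]] # By position
 199 | s[['index_1', 'index_2', 'index_3']] # By label
 200 | 
 201 | # Head and Tail
 202 | s.head()   # First 5 (default)
 203 | s.tail()   # Last 5 (default)
 204 | s.head(10) # First 10
 205 | s.tail(10) # Last 10
 206 | ```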
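Positional vs. label access is where `s[...]` gets confusing. This minimal sketch assumes a Series whose integer labels deliberately run opposite to the positions, so the two kinds of lookup give different answers:

```python
import pandas as pd

# Labels run 2, 1, 0 while positions run 0, 1, 2
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s[0])                  # 30 -> label 0 (integer index, so [] is label-based)
print(s.iloc[0])             # 10 -> position 0
print(s.loc[0])              # 30 -> label 0, stated explicitly
print(s.iloc[-2:].tolist())  # [20, 30] -> last two by position
```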
 207 | 
 208 | #### **Series Properties**
 209 | 
 210 | ```python
 211 | s.axes 		# Returns list of row axis labels
 212 | s.dtype 	# Returns data type of entries
 213 | s.empty 	# True if series is empty
 214 | s.ndim 		# Dimension. 1 for series
 215 | s.size 		# Number of elements
 216 | s.values 	# Returns the Series as an ndarray
 217 | ```
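Here's a small sketch of those properties on a toy float Series (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], index=['x', 'y', 'z'])

print(s.axes)    # a one-element list holding the index
print(s.dtype)   # float64
print(s.empty)   # False
print(s.ndim)    # 1
print(s.size)    # 3
print(isinstance(s.values, np.ndarray))  # True
```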
 218 | 
 219 | 
 220 | 
 221 | ### 2.3 DataFrame Basics <a name="2.3"></a>
 222 | [go to top](#top)
 223 | 
 224 | 
 225 | > A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
 226 | >
 227 | > <https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm>
 228 | 
 229 | ![Structure Table](assets/structure_table.jpg)
 230 | 
 231 | Image Source: <https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm>
 232 | 
 233 | #### **Creating DataFrame Objects**
 234 | 
 235 | ```python
 236 | # Empty DataFrame
 237 | df = pd.DataFrame()
 238 | 
 239 | # DataFrame from List
 240 | df = pd.DataFrame([1, 2, 3, 4, 5]) # Single Column
 241 | df = pd.DataFrame([['a', 1], ['b', 2]], columns=['name_1', 'name_2']) # Multi columns
 242 | df = pd.DataFrame([1, 2, 3], dtype=float) # Convert the ints to floats
 243 | 
 244 | # DataFrame from Series
 245 | df = s.to_frame()
 246 | 
 247 | # DataFrame from Dict of Lists
 248 | df = pd.DataFrame({'Name':['methylDragon', 'toothless', 'smaug'], 'Rating': [10, 5, 2]})
 249 | 
 250 | # DataFrame from List of Dicts
 251 | df = pd.DataFrame([{'Name': 'methylDragon', 'Rating': 10},
 252 |                    {'Name': 'toothless', 'Rating': 5},
 253 |                    {'Name': 'smaug'}]) # NaN will be appended for missing values
 254 | 
 255 | # DataFrame from Dict of Series
 256 | # Similarly, NaN will be added for missing values
 257 | df = pd.DataFrame({'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
 258 |                    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
 259 | 
 260 | # Creating with Non-Default Index
 261 | df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'])
 262 | ```
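To see the NaN filling described in the comments above, here's a sketch reusing the same example data. A column that gains a NaN is upcast to a float dtype, since NaN is a floating-point value:

```python
import pandas as pd

# Missing keys in a list of dicts become NaN
ratings = pd.DataFrame([{'Name': 'methylDragon', 'Rating': 10},
                        {'Name': 'toothless', 'Rating': 5},
                        {'Name': 'smaug'}])
print(ratings)

# Series are aligned on their index; labels missing from one Series become NaN
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
print(df)

# 'one' is now float64 because of the NaN at label 'd'
print(df.dtypes)
```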
 263 | 
 264 | #### **Important Note on Mutability**
 265 | 
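 266 | **NOTE:** Most methods return a **new** object and do **not** change the original DataFrame. To keep the result, either **reassign** it or pass `inplace=True` (where the method supports it), which modifies the DataFrame in place. Assigning to a column with `df['col'] = ...` is an exception: it modifies `df` directly.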
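A minimal sketch of the difference, using `dropna` on a hypothetical one-column DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0]})

# dropna returns a new DataFrame; df itself still has 3 rows
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 2

# Option 1: reassign the result
df_reassigned = df.dropna()

# Option 2: modify in place (the method then returns None)
df_inplace = df.copy()
result = df_inplace.dropna(inplace=True)
print(result, len(df_inplace))  # None 2
```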
 267 | 
 268 | #### **Basic Operations**
 269 | 
 270 | **Column**
 271 | 
 272 | ```python
# Column Selection
df['column_name']
df.column_name # Also works, but only if the name is a valid Python identifier that doesn't clash with a DataFrame attribute or method

# Column Selection by dtype
df.select_dtypes(include=['number']) # e.g. 'number', 'object', 'datetime', 'category'

# Adding a new Column (values are aligned on the index)
df['new_column_name'] = pd.Series([1, 2, 3])

# Deleting a Column (Either one works)
del df['column_name']
df.pop('column_name') # Takes a single label, and also returns the removed column

# Math for Columns
df['column_1'] + df['column_2'] # Gives you a new column that is the element-wise sum of the first two
 289 | ```
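
The column operations above, run end to end on a small made-up DataFrame (the column names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'name': ['methylDragon', 'toothless', 'smaug'],
                   'wins': [10, 5, 2],
                   'losses': [1, 3, 8]})

df['games'] = df['wins'] + df['losses']       # element-wise column math, stored as a new column
numeric = df.select_dtypes(include='number')  # only the numeric columns
losses = df.pop('losses')                     # removes the column from df and returns it as a Series
```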
 290 | 
 291 | **Row**
 292 | 
 293 | ```python
# Row Selection by Label
df.loc['row_label/index']

# Row Selection by Position Index
df.iloc[0] # Selects first row

# Row Slicing (by position)
df[-3:] # Last 3 rows

# Adding Rows (df.append() was removed in pandas 2.0; use pd.concat)
pd.concat([df, df2])
pd.concat([df, df2], ignore_index=True) # Renumber the index instead of keeping the original labels

# Deleting Rows
df.drop('label_to_drop')

# Deleting rows with None/NaN values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
df.dropna(axis=0, how='any') # Drop rows with any missing value
df.dropna(axis=0, how='all') # Drop rows where every value is missing
df.dropna(axis=0, thresh=2) # Keep only rows with at least 2 non-missing values

# Head and Tail
df.head() # First 5 rows (default)
df.tail() # Last 5 rows (default)
df.head(10) # First 10 rows
df.tail(10) # Last 10 rows
 321 | ```
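
A small sketch of the row operations above on made-up data. The `'drogon'` row is a hypothetical extra row used only to show `pd.concat` standing in for the removed `append()`.

```python
import pandas as pd

df = pd.DataFrame({'Rating': [10, 5, 2]}, index=['methylDragon', 'toothless', 'smaug'])

by_label = df.loc['toothless']     # row selected by index label (returned as a Series)
by_position = df.iloc[0]           # first row, whatever its label
last_two = df[-2:]                 # positional slice of rows
without_smaug = df.drop('smaug')   # new DataFrame; df is unchanged

extra = pd.DataFrame({'Rating': [7]}, index=['drogon'])
combined = pd.concat([df, extra])  # replacement for the removed df.append(extra)
```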
 322 | 
 323 | #### **DataFrame Properties**
 324 | 
 325 | ```python
 326 | df.T 		# Transpose
 327 | df.axes 	# Row axis and column axis labels
 328 | df.dtypes 	# Data types of elements
 329 | df.empty 	# True if empty
 330 | df.ndim 	# Dimension (number of axes)
 331 | df.shape 	# Tuple representing the shape (dimensionality) of the DataFrame
 332 | df.size 	# Number of elements
df.values 	# NumPy ndarray representation (df.to_numpy() is now preferred)
 334 | ```
 335 | 
 336 | 
 337 | 
 338 | ### 2.4 Panel Basics <a name="2.4"></a>
 339 | [go to top](#top)
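
**NOTE:** `pd.Panel` was deprecated in pandas 0.20 and removed in pandas 0.25, so the code in this section only runs on older versions. The suggested replacements are a DataFrame with a MultiIndex, or the xarray package.
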
 342 | > A **panel** is a 3D container of data. The term **Panel data** is derived from econometrics and is partially responsible for the name pandas − **pan(el)-da(ta)**-s.
 343 | >
 344 | > The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −
 345 | >
 346 | > - **items** − axis 0, each item corresponds to a DataFrame contained inside.
 347 | > - **major_axis** − axis 1, it is the index (rows) of each of the DataFrames.
 348 | > - **minor_axis** − axis 2, it is the columns of each of the DataFrames.
 349 | >
 350 | > <https://www.tutorialspoint.com/python_pandas/python_pandas_panel.htm>
 351 | 
 352 | #### **Creating Panel Objects**
 353 | 
 354 | ```python
 355 | # Empty Panel
 356 | p = pd.Panel()
 357 | 
 358 | # Panel from 3D ndarray
 359 | p = pd.Panel(np.random.rand(2, 4, 5))
 360 | 
 361 | # Panel from dict of DataFrames
 362 | data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)), 
 363 |         'Item2' : pd.DataFrame(np.random.randn(4, 2))}
 364 | p = pd.Panel(data)
 365 | ```
 366 | 
 367 | #### **Accessing Values**
 368 | 
 369 | ```python
 370 | # By Item
 371 | p['Item1'] # Gives you the corresponding dataframe
 372 | 
 373 | # By Major Axis
 374 | p.major_xs(1) # Shows all data from the second row across all dataframes
 375 | 
 376 | '''
 377 | Eg: If the panel's first item is as such:
 378 |             0          1          2
 379 | 0    0.488224  -0.128637   0.930817
 380 | >> 1    0.417497   0.896681   0.576657 <<
 381 | 2   -2.775266   0.571668   0.290082
 382 | 3   -0.400538  -0.144234   1.110535
 383 | 
 384 | Then the Output of p.major_xs(1) is:
 385 |       Item1
 386 | 0   0.417497
 387 | 1   0.896681
 388 | 2   0.576657
 389 | 
 390 | It's a transpose of the second row's elements (of the original DataFrame)!
 391 | '''
 392 | 
 393 | # By Minor Axis
 394 | p.minor_xs(1)
 395 | 
 396 | '''
 397 | Eg: Same deal as above, same first item
 398 | 
 399 | Output of p.minor_xs(1) are the items under the second column (of the original DataFrame)!
 400 | 
 401 |        Item1
 402 | 0   -0.128637
 403 | 1    0.896681
 404 | 2    0.571668
 405 | 3   -0.144234
 406 | '''
 407 | ```
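
Since the Panel code above only runs on old pandas, here is a minimal sketch of the same look-ups on current pandas, assuming a DataFrame with two-level columns is an acceptable stand-in (a MultiIndex DataFrame is one of the replacements pandas recommended). `pd.concat` with a dict of DataFrames puts the item names in the outer column level; the `wide`, `major` and `minor` names are just illustrative.

```python
import numpy as np
import pandas as pd

data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}

# Columns become a MultiIndex of (item, original column)
wide = pd.concat(data, axis=1)

# Rough equivalent of p['Item1']
item1 = wide['Item1']

# Rough equivalent of p.major_xs(1): row 1 of every item, with items as columns
major = wide.loc[1].unstack(0)

# Rough equivalent of p.minor_xs(1): column 1 of every item, with items as columns
minor = wide.xs(1, axis=1, level=1)
```

Item2 only has two columns, so its entry for original column 2 in `major` is NaN, just as a Panel would have padded it.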
 408 | 
 409 | 
 410 | 
### 2.5 Categorical Data <a name="2.5"></a>
 412 | [go to top](#top)
 413 | 
 414 | 
Imagine your data is made up of only a few distinct values:

Eg: [1, 1, 1, 3, 2, 3, 2, 1, 2, 3, 1]

Pandas can encode the fact that there are only three kinds of values: categories!
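
Using the example values above, a sketch of what a categorical actually stores: each distinct value once (the categories), plus a small integer code per element pointing into them.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 3, 2, 3, 2, 1, 2, 3, 1], dtype="category")

print(list(s.cat.categories))  # [1, 2, 3]
print(s.cat.codes.tolist())    # [0, 0, 0, 2, 1, 2, 1, 0, 1, 2, 0]
```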
 420 | 
#### **Construct Categorical Data**
 422 | 
 423 | ```python
 424 | # Source: https://www.tutorialspoint.com/python_pandas/python_pandas_categorical_data.htm
 425 | 
 426 | s = pd.Series(["a","b","c","a"], dtype="category")
 427 | '''
 428 | Output
 429 | 
 430 | 0  a
 431 | 1  b
 432 | 2  c
 433 | 3  a
 434 | dtype: category
 435 | Categories (3, object): [a, b, c]
 436 | '''
 437 | 
# Generate just a list-like object with categories
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
# [a, b, c, a, b, c]
# Categories (3, object): [a, b, c]

# Or do it with stated categories! (Values not in the list, like 'd', become NaN)
cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'])
# [a, b, c, a, b, c, NaN]
# Categories (3, object): [c, b, a]

# Specify categories with ordered categories
# This one implies c < b < a
cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'], ordered=True)
 451 | ```
 452 | 
#### **Properties and Altering Categories**
 454 | 
 455 | ```python
s.describe() # General summary (count, unique, top, freq)
s.cat.categories # Find categories (an attribute, not a method)
s.cat.ordered # True if the categories are ordered
s.cat.rename_categories(['x', 'y', 'z']) # Returns a copy with renamed categories (one new name per category)

# Add categories
s = s.cat.add_categories([4])

# Remove categories (values in a removed category become NaN)
s.cat.remove_categories("a")

# Compare categories
# Ordered categoricals with the same categories can be compared element-wise
cat = pd.Series([1,2,3]).astype(pd.CategoricalDtype(categories=[1,2,3], ordered=True))
cat1 = pd.Series([2,2,2]).astype(pd.CategoricalDtype(categories=[1,2,3], ordered=True))

cat > cat1
'''
Output

0  False
1  False
2  True
dtype: bool
'''
 481 | ```
 482 | 
 483 | 
 484 | 
 485 | ### 2.6 Basic Binary Operations <a name="2.6"></a>
 486 | [go to top](#top)
 487 | 
 488 | 
<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html#pandas.DataFrame.add>
 490 | 
 491 | #### **Arithmetic**
 492 | 
 493 | ```python
df.add(other)       # df + other
df.sub(other)       # df - other
df.mul(other)       # df * other
df.div(other)       # df / other (true division; alias of truediv)
df.truediv(other)   # df / other
df.floordiv(other)  # df // other (floor division)
df.mod(other)       # df % other
df.pow(other)       # df ** other
divmod(df, other)   # Built-in divmod: returns a (df // other, df % other) tuple

df.radd(other)      # Reflected: other + df
df.rsub(other)      # Reflected: other - df
 506 | 
 507 | # You may specify fill-values for missing values too!
 508 | df.add(other, fill_value=0)
 509 | ```
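
A small sketch, on made-up frames, of why `fill_value` matters: binary operations align both row and column labels first, so a cell missing on either side normally produces NaN.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])
b = pd.DataFrame({'x': [10, 20]}, index=['r1', 'r3'])

# Labels are aligned; cells missing from either side become NaN
plain = a + b

# fill_value stands in for a cell missing on ONE side; cells missing on both sides stay NaN
filled = a.add(b, fill_value=0)
```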
 510 | 
 511 | #### **Boolean Reductions**
 512 | 
 513 | ```python
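(df > 0).all() # One bool per column: True if every value in it is > 0

# .any(), .all() and the .empty attribute all work (.bool() also exists for single-element objects, but is deprecated since pandas 2.1)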
 517 | 
 518 | # You can also do comparisons! (Eg. ==, >, etc.)
 519 | ```
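
A short sketch of the reductions above on a made-up frame; note that `.all()` reduces one axis at a time, so getting a single bool for the whole frame takes two calls.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, 5, 6]})

per_column = (df > 0).all()         # one bool per column
whole_frame = (df > 0).all().all()  # reduce twice for a single bool
per_row = (df > 0).any(axis=1)      # one bool per row
```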
 520 | 
 521 | 
 522 | 
 523 | ### 2.7 Casting and Conversion <a name="2.7"></a>
 524 | [go to top](#top)
 525 | 
 526 | 
 527 | ```python
 528 | # Casting object to dtype
 529 | df.astype(dtype)
df.astype(dtype, copy=False) # Avoid copying the data where possible

# Convert object columns to better dtypes (df.convert_objects() was removed from pandas)
df.infer_objects() # Infer better dtypes for object columns
pd.to_datetime(df['col'], errors='coerce') # Unconvertibles become NaT
pd.to_numeric(df['col'], errors='coerce') # Unconvertibles become NaN
 535 | ```
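
A sketch of the conversion helpers on made-up messy data, showing how `errors='coerce'` turns unparseable entries into NaN/NaT instead of raising.

```python
import pandas as pd

df = pd.DataFrame({'num': ['1', '2', 'x'],
                   'when': ['2019-03-01', 'not a date', '2019-03-03']})

nums = pd.to_numeric(df['num'], errors='coerce')      # 'x' becomes NaN
dates = pd.to_datetime(df['when'], errors='coerce')   # 'not a date' becomes NaT

mixed = pd.DataFrame({'n': [1, 2]}, dtype=object)
better = mixed.infer_objects()                        # object column of ints becomes int64
```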
 536 | 
 537 | 
 538 | 
 539 | ### 2.8 Conditional Indexing <a name="2.8"></a>
 540 | [go to top](#top)
 541 | 
 542 | 
Remember how fancy indexing works in NumPy?
 544 | 
 545 | ```python
# Now you can do it with conditions too! Both keep the shape, with NaN where the condition is False
df[df > 0]
df.where(df > 0)
 549 | ```
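
In practice, conditions are most often applied to a single column to filter whole rows. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['methylDragon', 'toothless', 'smaug'],
                   'rating': [10, 5, 2]})

high = df[df['rating'] > 3]                            # keep rows where the condition is True
middle = df[(df['rating'] > 3) & (df['rating'] < 10)]  # combine with & or | (parentheses required)
masked = df[['rating']].where(df[['rating']] > 3)      # same shape, NaN where False
```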
 550 | 
 551 | 
 552 | 
 553 | ### 2.9 IO <a name="2.9"></a>
 554 | [go to top](#top)
 555 | 
 556 | 
 557 | <https://pandas.pydata.org/pandas-docs/version/0.20/io.html>
 558 | 
 559 | | Format Type | Data Description                                             | Reader                                                       | Writer                                                       |
 560 | | :---------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
 561 | | text        | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values)  | [read_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-csv-table) | [to_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-store-in-csv) |
 562 | | text        | [JSON](http://www.json.org/)                                 | [read_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-reader) | [to_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-writer) |
 563 | | text        | [HTML](https://en.wikipedia.org/wiki/HTML)                   | [read_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-html) | [to_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-html) |
 564 | | text        | Local clipboard                                              | [read_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) | [to_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) |
 565 | | binary      | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel)    | [read_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-reader) | [to_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-writer) |
 566 | | binary      | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) | [to_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) |
 567 | | binary      | [Feather Format](https://github.com/wesm/feather)            | [read_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) | [to_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) |
| binary      | [Msgpack](http://msgpack.org/index.html) (removed in pandas 1.0) | [read_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) | [to_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) |
 569 | | binary      | [Stata](https://en.wikipedia.org/wiki/Stata)                 | [read_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-reader) | [to_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-writer) |
 570 | | binary      | [SAS](https://en.wikipedia.org/wiki/SAS_(software))          | [read_sas](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sas-reader) |                                                              |
 571 | | binary      | [Python Pickle Format](https://docs.python.org/3/library/pickle.html) | [read_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) | [to_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) |
 572 | | SQL         | [SQL](https://en.wikipedia.org/wiki/SQL)                     | [read_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) | [to_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) |
 573 | | SQL         | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery)   | [read_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) | [to_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) |
 574 | 
 575 | ```python
# Custom Indexing (use a column as the index)
pd.read_csv("file", index_col=['index_col_name'])

# With converted datatypes (dtype is a placeholder, e.g. 'float64')
pd.read_csv("file", dtype={'col': dtype})

# Column names (for files without a header row; also pass header=0 to replace an existing header)
pd.read_csv("file", names=['1', 'b', 'etc'])

# Skip rows (skips the first 2 lines of the file)
pd.read_csv("file", skiprows=2)
 587 | ```
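
A self-contained sketch combining several of the options above. `io.StringIO` stands in for a real file path here, and the CSV contents are made up.

```python
import io

import pandas as pd

csv_text = """# exported data
# second comment line
id,name,rating
1,methylDragon,10
2,toothless,5
"""

df = pd.read_csv(io.StringIO(csv_text),
                 skiprows=2,                  # skip the two comment lines
                 index_col='id',              # use the 'id' column as the index
                 dtype={'rating': 'float64'}) # force a column's dtype
```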
 588 | 
 589 | 
 590 | 
 591 | ### 2.10 Plotting <a name="2.10"></a>
 592 | [go to top](#top)
 593 | 
 594 | 
 595 | Source: <https://www.tutorialspoint.com/python_pandas/python_pandas_visualization.htm>
 596 | 
 597 | ```python
df.plot()                       # Line plot
df.plot.bar()                   # Bar chart
df.plot.bar(stacked=True)       # Stacked bar chart
df.plot.barh()                  # Horizontal bar chart
df.plot.barh(stacked=True)      # Horizontal stacked bar chart
df.plot.hist(bins=20)           # Histogram (all columns on one plot)
df.hist(bins=30)                # A separate histogram for each column
df.plot.box()                   # Box plot
df.plot.area()                  # Area plot
df.plot.scatter(x='a', y='b')   # Scatter plot
df.plot.pie(subplots=True)      # Pie plot (one per column)
 609 | ```
 610 | 
 611 | 
 612 | 
 613 | ### 2.11 Sparse Data <a name="2.11"></a>
 614 | [go to top](#top)
 615 | 
 616 | 
You can sparsify mostly-empty data to save memory!
 618 | 
 619 | ```python
# Sparsify (obj.to_sparse() was removed in pandas 1.0; cast to a SparseDtype instead)
sparse_obj = obj.astype(pd.SparseDtype('float', np.nan)) # NaN values are not stored
sparse_obj = obj.astype(pd.SparseDtype('float', 0)) # Zeros are not stored

# Convert back
sparse_obj.sparse.to_dense()

# Properties
sparse_obj.sparse.density # Fraction of values actually stored
 629 | ```
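
A minimal sketch of the `SparseDtype` approach on a made-up mostly-zero Series: only the two non-zero values are stored, and converting back gives the original dense data.

```python
import pandas as pd

s = pd.Series([0, 0, 1, 0, 0, 0, 2, 0, 0, 0], dtype='float64')

sparse_s = s.astype(pd.SparseDtype('float64', 0.0))  # zeros are not stored
print(sparse_s.sparse.density)                       # 0.2 (2 of 10 values stored)

dense_again = sparse_s.sparse.to_dense()
```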
 630 | 
 631 | 
 632 | 
 633 | ## 3. Series Operations <a name="3"></a>
 634 | 
 635 | ### 3.1 Manipulating Series Text <a name="3.1"></a>
 636 | [go to top](#top)
 637 | 
 638 | 
 639 | Source: <https://www.tutorialspoint.com/python_pandas/python_pandas_working_with_text_data.htm>
 640 | 
| Method | Description |
| ------ | ----------- |
| **lower()** | Converts strings in the Series/Index to lower case. |
| **upper()** | Converts strings in the Series/Index to upper case. |
| **len()** | Computes the length of each string. |
| **strip()** | Strips whitespace (including newlines) from both sides of each string in the Series/Index. |
| **split(' ')** | Splits each string on the given pattern. |
| **cat(sep=' ')** | Concatenates the Series/Index elements with the given separator. |
| **get_dummies()** | Returns a DataFrame of one-hot encoded values. |
| **contains(pattern)** | Returns True for each element that contains the pattern, else False. |
| **replace(a, b)** | Replaces occurrences of **a** with **b**. |
| **repeat(value)** | Repeats each element the specified number of times. |
| **count(pattern)** | Counts the occurrences of the pattern in each element. |
| **startswith(pattern)** | Returns True if the element in the Series/Index starts with the pattern. |
| **endswith(pattern)** | Returns True if the element in the Series/Index ends with the pattern. |
| **find(pattern)** | Returns the position of the first occurrence of the pattern (-1 if not found). |
| **findall(pattern)** | Returns a list of all occurrences of the pattern. |
| **swapcase()** | Swaps lower and upper case. |
| **islower()** | Checks whether all characters in each string are lower case. Returns Boolean. |
| **isupper()** | Checks whether all characters in each string are upper case. Returns Boolean. |
| **isnumeric()** | Checks whether all characters in each string are numeric. Returns Boolean. |
 661 | 
 662 | #### **Example**
 663 | 
 664 | ```python
 665 | s.str.lower()
 666 | ```
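
All of the methods in the table are reached through the `.str` accessor and can be chained, since each returns a new Series. A small sketch on made-up strings:

```python
import pandas as pd

s = pd.Series(['  methylDragon ', 'Toothless', 'smaug'])

cleaned = s.str.strip().str.lower()          # chain .str methods
has_dragon = cleaned.str.contains('dragon')  # element-wise boolean Series (pattern is a regex by default)
lengths = cleaned.str.len()
```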
 667 | 
 668 | 
 669 | 
 670 | ### 3.2 Time Series <a name="3.2"></a>
 671 | [go to top](#top)
 672 | 
 673 | 
 674 | ```python
# Get Current Time
pd.Timestamp.now() # pd.datetime.now() was removed in pandas 2.0

# Get Time from Timestamp
pd.Timestamp('2019-03-01')
pd.Timestamp(1587687575, unit='s') # From a Unix timestamp in seconds

# Get a date range
pd.date_range("11:00", "13:30", freq="h").time # Hourly (older pandas versions spell this "H")
pd.date_range("11:00", "13:30", freq="30min").time # Different frequency
# Output (of the 30min version):
# [datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0)
#  datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]

# Convert a Series of date strings (or epochs) to Timestamps
pd.to_datetime(SOME_DATETIME_SERIES)
 691 | ```
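
A sketch tying these together on made-up dates. Once a Series is parsed with `pd.to_datetime`, the `.dt` accessor exposes the date parts; the epoch value is the same one used above.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2019-03-01', '2019-03-15', '2019-04-01']))
months = dates.dt.month.tolist()         # [3, 3, 4]

ts = pd.Timestamp(1587687575, unit='s')  # 2020-04-24 00:19:35 (tz-naive, UTC clock time)
half_hours = pd.date_range("11:00", "13:30", freq="30min").time
```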
 692 | 
 693 | 
 694 | 
 695 | ### 3.3 Time Deltas <a name="3.3"></a>
 696 | [go to top](#top)
 697 | 
 698 | 
 699 | These are almost exactly like the datetime library's timedelta objects.
 700 | 
 701 | ```python
 702 | pd.Timedelta(6, unit='h')
 703 | pd.Timedelta(days=-2)
 704 | pd.Timedelta('2 days 2 hours 15 minutes 30 seconds') # Or even from a string!
 705 | 
 706 | # Or from a series
 707 | pd.to_timedelta(s)
 708 | ```
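
Like `datetime.timedelta`, a `Timedelta` can be added to a `Timestamp`, and subtracting two Timestamps gives one back. A minimal sketch using the string form above:

```python
import pandas as pd

start = pd.Timestamp('2019-03-01')
delta = pd.Timedelta('2 days 2 hours 15 minutes 30 seconds')

end = start + delta   # Timestamp + Timedelta -> Timestamp
gap = end - start     # Timestamp - Timestamp -> Timedelta
deltas = pd.to_timedelta(pd.Series(['1 days', '6 hours']))
```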
 709 | 
 710 | 
 711 | 
 712 | ## 4. DataFrame Operations <a name="4"></a>
 713 | 
 714 | ### 4.1 Preface <a name="4.1"></a>
 715 | [go to top](#top)
 716 | 
 717 | 
Even though this section focuses on DataFrames, a lot of these operations can also be applied to Series (and, in older pandas versions, Panel) objects! It's just that a large part of using Pandas is working with DataFrames.

To get a quick overview of your data, you can:
 721 | 
 722 | ```python
 723 | # Look at the first few rows of data
 724 | df.head()
 725 | 
 726 | # Look at essential details (like dimensions, data types, etc.)
 727 | df.info()
 728 | ```
 729 | 
 730 | 
 731 | 
 732 | ### 4.2 Iterating Through DataFrames <a name="4.2"></a>
 733 | [go to top](#top)
 734 | 
 735 | 
 736 | ```python
df.items() # (column label, Series) pairs (by columns); iteritems() was removed in pandas 2.0
df.iterrows() # (index, Series) pairs (by rows)
df.itertuples() # Iterate over rows as namedtuples (usually faster than iterrows)
 740 | ```
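
The three iterators side by side on a made-up two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'dragon': ['methylDragon', 'toothless'], 'rating': [10, 5]})

col_labels = [label for label, column in df.items()]       # column by column
ratings = [row['rating'] for index, row in df.iterrows()]  # row by row, each row a Series
dragons = [row.dragon for row in df.itertuples()]          # row by row, as namedtuples
```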
 741 | 
 742 | 
 743 | 
 744 | ### 4.3 Sorting, Reindexing, and Renaming DataFrame Values <a name="4.3"></a>
 745 | [go to top](#top)
 746 | 
 747 | 
 748 | ```python
 749 | # Sort by Values
 750 | df.sort_values('column_name', inplace=True) # Sort by values in column
 751 | 
 752 | # Sort by Index
 753 | df.sort_index(ascending=False) # Default is ascending=True
 754 | df.sort_index(axis=1) # Sort by column index
 755 | 
 756 | # Reset Index
 757 | df.reset_index(inplace=True, drop=True) # Reset index, skip inserting old index as a column
 758 | 
 759 | # Rename Columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
 761 | 
 762 | # Rename Index
 763 | df.rename(index={'index_element_1': 'new_name'})
 764 | 
 765 | # Reindex
 766 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html
 767 | df.reindex(index=[1, 2, 3], columns=[1, 2, 3])
 768 | 
 769 | # Reindex to match another dataframe
 770 | df.reindex_like(df2)
 771 | df.reindex_like(df2, method="ffill") # Fill missing values
 772 | # pad/ffill: Forward fill
 773 | # bfill/backfill: Backward fill
 774 | # nearest: Nearest index value fill
 775 | ```
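
A small sketch of chaining these methods (each returns a new object when `inplace` is not used), plus forward-fill reindexing on a made-up Series:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['smaug', 'methylDragon', 'toothless'], 'Rating': [2, 10, 5]})

# Sort, renumber the index, then rename a column
ranked = (df.sort_values('Rating', ascending=False)
            .reset_index(drop=True)
            .rename(columns={'Rating': 'Score'}))

# reindex with forward fill: new labels take the last known value before them
s = pd.Series([1, 2], index=[0, 2])
filled = s.reindex([0, 1, 2, 3], method='ffill')
```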
 776 | 
 777 | 
 778 | 
 779 | ### 4.4 Replacing DataFrame Values <a name="4.4"></a>
 780 | [go to top](#top)
 781 | 
 782 | 
 783 | ```python
 784 | # Replace strings with numbers
 785 | df.replace(['Awful', 'Poor', 'OK', 'Acceptable', 'Perfect'], [0, 1, 2, 3, 4]) 
 786 | 
 787 | # Replace using regex
 788 | df.replace({'\n': '<br>'}, regex=True)
 789 | 
 790 | # Removing Substrings
 791 | df['column_name'] = df['column_name'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
 792 | ```
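
A sketch of the same replacements on made-up columns. The substring cleanup here uses the vectorised `.str` methods, which do the same job as the `map(lambda ...)` version above.

```python
import pandas as pd

df = pd.DataFrame({'review': ['Poor', 'Perfect', 'OK'], 'code': ['+12a', '-7B', '3']})

scores = df['review'].replace(['Awful', 'Poor', 'OK', 'Acceptable', 'Perfect'], [0, 1, 2, 3, 4])
cleaned = df['code'].str.lstrip('+-').str.rstrip('aAbBcC')
```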
 793 | 
 794 | 
 795 | 
 796 | ### 4.5 Function Application on DataFrames <a name="4.5"></a>
 797 | [go to top](#top)
 798 | 
 799 | 
 800 | ```python
# Apply function to every value in a column (Series)
df['column_name'].apply(function_name)

# Apply function to every value in the DataFrame
df.map(function_name) # Called df.applymap() before pandas 2.1
 806 | ```
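
A sketch contrasting the element-wise calls with `DataFrame.apply`, which passes whole columns. The `hasattr` check is only there so the snippet also runs on pandas versions older than 2.1.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

doubled = df['a'].apply(lambda x: x * 2)  # element-wise on one column
col_sums = df.apply(sum)                  # DataFrame.apply passes each column (a Series) in turn

square = lambda x: x ** 2
# DataFrame.map only exists in pandas >= 2.1; fall back to applymap on older versions
squared = df.map(square) if hasattr(df, 'map') else df.applymap(square)
```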
 807 | 
 808 | 
 809 | 
 810 | ### 4.6 Descriptive Statistics <a name="4.6"></a>
 811 | [go to top](#top)
 812 | 
 813 | 
You can compute a bunch of basic statistics on a DataFrame, along either axis!
 815 | 
 816 | ```python
 817 | # Sum along axis
# axis=0 : Reduce down the rows, giving one result per column (default)
# axis=1 : Reduce across the columns, giving one result per row
 820 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html
 821 | df.sum() # Default axis is 0
 822 | df.sum(axis=1)
 823 | df.sum(axis=0, skipna=True, numeric_only=True, min_count=0)
 824 | 
 825 | # Even more!
 826 | df.count()		# Number of non-null observations
 827 | df.mean()		# Mean of Values
 828 | df.median()		# Median of Values
 829 | df.mode()		# Mode of values
 830 | df.std()		# Standard Deviation of the Values
 831 | df.min()		# Minimum Value
 832 | df.max()		# Maximum Value
 833 | df.abs()		# Absolute Value
 834 | df.prod()		# Product of Values
 835 | df.cumsum()		# Cumulative Sum
 836 | df.cumprod()	# Cumulative Product
 837 | 
 838 | # Or just call all of them at once!
 839 | df.describe()
 840 | ```
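
A small sketch of the axis behaviour and NaN handling on made-up data: missing values are skipped by default, so `count()` and `mean()` only consider the non-null entries.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, None]})

col_sums = df.sum()        # axis=0: one value per column, NaN skipped
row_sums = df.sum(axis=1)  # axis=1: one value per row
counts = df.count()        # non-null values per column
means = df.mean()
```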
 841 | 
 842 | 
 843 | 
 844 | ### 4.7 Statistical Methods <a name="4.7"></a>
 845 | [go to top](#top)
 846 | 
 847 | 
 848 | ```python
 849 | # Calculate percentage change
 850 | df.pct_change() # Column wise
 851 | df.pct_change(axis=1) # Row wise
 852 | 
 853 | # Covariance
 854 | s.cov(s2) # For series
 855 | df.cov() # For frame (calculates covariance between all columns)
 856 | 
 857 | # Correlation
 858 | df.corr() # For frames
 859 | df['col_1'].corr(df['col_2']) # For series
 860 | 
 861 | # Data Ranking (Series)
 862 | # https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.rank.html
 863 | # Check the docs for tie-breaking methods
# average, min, max, first, dense (Default method='average')
 865 | s.rank()
 866 | ```
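
A sketch of `pct_change` and the main `rank` tie-breaking methods on made-up numbers (the Series `r` contains a tie between the two 3s):

```python
import pandas as pd

s = pd.Series([100, 110, 99])
changes = s.pct_change()                 # (current - previous) / previous

r = pd.Series([3, 1, 3, 2, 5])
avg = r.rank().tolist()                  # ties share the average of their ranks
dense = r.rank(method='dense').tolist()  # ties share a rank, with no gap afterwards
first = r.rank(method='first').tolist()  # ties broken by order of appearance
```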
 867 | 
 868 | 
 869 | 
 870 | ### 4.8 Window Functions <a name="4.8"></a>
 871 | [go to top](#top)
 872 | 
 873 | 
 874 | ```python
 875 | # Rolling Window
 876 | df_rolling = df.rolling(window=3)
 877 | 
 878 | # Now you can use the window!
 879 | # You may use all the descriptive stats and statistical methods
 880 | df_rolling.sum()
 881 | df_rolling.mean()
 882 | df_rolling.median()
 883 | df_rolling.std()
 884 | # and so on...
 885 | 
 886 | # Expanding Window
 887 | # (Yields the value of the statistic with all the data available up to that point in time)
 888 | df_expanding = df.expanding(min_periods=1)
 889 | 
 890 | # Exponential Weighted Functions
 891 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html
# You must specify one of com, span, halflife, or alpha. Check the docs!
df.ewm(span=3).mean()
 894 | ```
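
The three window types on a tiny made-up Series. With `adjust=False`, the exponentially weighted mean follows the simple recursive formula in the comment.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

rolling_sum = s.rolling(window=3).sum()             # NaN until 3 values are available
expanding_mean = s.expanding(min_periods=1).mean()  # mean of everything seen so far
ewm_mean = s.ewm(alpha=0.5, adjust=False).mean()    # y_t = 0.5 * x_t + 0.5 * y_(t-1)
```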
 895 | 
 896 | 
 897 | 
 898 | ### 4.9 Data Aggregation <a name="4.9"></a>
 899 | [go to top](#top)
 900 | 
 901 | 
 902 | ```python
 903 | # Basically custom operations on windows!
 904 | df_rolling.aggregate(FUNCTION) # On Whole DF
 905 | df_rolling['col'].aggregate(FUNCTION) # On Single Column
 906 | df_rolling[['col', 'col2']].aggregate(FUNCTION) # On Multiple Columns
 907 | 
# Multiple functions (each column gets one output column per function)
df_rolling.aggregate([FUNCTION_1, FUNCTION_2])

# Multiple functions, on different columns
df_rolling.aggregate({'col_1': FUNCTION_1, 'col_2': FUNCTION_2})

# If you don't run it on a rolling window, it reduces the dimensionality of the data!
df.aggregate('sum') # One sum per column
 916 | ```
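
A sketch of both cases on made-up data, with built-in aggregation names (`'sum'`, `'max'`) standing in for `FUNCTION_1` and `FUNCTION_2`. On a window the result keeps one row per input row; without a window each column collapses to a single value.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]})

# Columns become (a, sum), (a, max), (b, sum), (b, max)
rolled = df.rolling(window=2).aggregate(['sum', 'max'])

# No window: one value per column
per_column = df.aggregate({'a': 'sum', 'b': 'max'})
```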
 917 | 
 918 | 
 919 | 
 920 | ### 4.10 Dealing with Missing Data <a name="4.10"></a>
 921 | [go to top](#top)
 922 | 
 923 | 
 924 | Null values can be NA, NaN, NaT, or None.
 925 | 
 926 | - NaN: Not a Number
 927 | - NaT: Not a Time
 928 | 
 929 | ```python
 930 | # Detect Missing Values
 931 | df.isnull() # Gives True if value is null
 932 | df.notnull() # Gives True if value is not null
 933 | 
 934 | # Filling Missing Data With Scalar
 935 | df.fillna(scalar_number)
 936 | 
# Filling Missing Data
# ffill: Fills forward (propagates the last valid value)
# bfill: Fills backwards (uses the next valid value)
df.ffill() # Replaces the deprecated df.fillna(method='pad')
df.bfill()

# Drop Missing Values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
df.dropna(axis=0, how='any') # Drop rows with any missing value
df.dropna(axis=0, how='all') # Drop rows where every value is missing
df.dropna(axis=0, thresh=2) # Keep only rows with at least 2 non-missing values
 947 | ```
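
A sketch of detecting, filling and dropping on a made-up frame with gaps. Note that forward fill cannot fill a gap that has no earlier valid value.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, None, 3], 'b': [None, None, 6]})

missing_per_column = df.isnull().sum()  # a: 1, b: 2
zero_filled = df.fillna(0)
forward_filled = df.ffill()             # b's leading gaps stay NaN: nothing to fill from
complete_rows = df.dropna(how='any')    # only row 2 has no missing values
some_data_rows = df.dropna(thresh=1)    # rows with at least 1 non-missing value
```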
 948 | 
 949 | 
 950 | 
 951 | ### 4.11 GroupBy Operations <a name="4.11"></a>
 952 | [go to top](#top)
 953 | 
 954 | 
 955 | Source: <https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm>
 956 | 
 957 | You can group data within your DataFrames in order to:
 958 | 
 959 | - Split the DF
 960 | - Apply a function to the DF
 961 |   - Aggregation
 962 |   - Transformation
 963 |   - Filtration
- Combine the results
 965 | 
 966 | ```python
 967 | # Group Data
df_grouped = df.groupby('key') # Group rows by the values in column 'key'
df_grouped = df.groupby('key', axis=1) # Group columns instead of rows (axis=1 is deprecated since pandas 2.1)
df_grouped = df.groupby(['col_1', 'col_2']) # Multi-Column Group
 971 | 
 972 | # View the groups!
 973 | df_grouped.groups
 974 | 
 975 | # You can iterate through grouped dfs as well!
for name, group in df_grouped:
    pass # name is the group key, group is that group's sub-DataFrame
 978 | 
 979 | # Select a Single Group
 980 | df_grouped.get_group('group_name')
 981 | 
 982 | # Apply Aggregations
 983 | df_grouped.agg(function)
 984 | df_grouped.agg([function_1, function_2])
 985 | 
 986 | # Apply Transformations
 987 | # Transforms groups or columns inside the dataframe
 988 | transformation_function = lambda x: (x - x.mean()) / x.std()*10
 989 | df_grouped.transform(transformation_function)
 990 | 
 991 | # Apply Filters
# Keeps only the groups for which filtering_function returns True
# (like Python's built-in filter(), but applied to whole groups)
df_grouped.filter(filtering_function)
df_grouped.filter(lambda x: len(x) >= 3)
 995 | ```
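
The split-apply-combine steps on a made-up scores table: aggregation gives one row per group, transformation keeps the original length, and filtration keeps or drops whole groups.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'points': [10, 20, 5, 15, 10]})
grouped = df.groupby('team')

summary = grouped['points'].agg(['sum', 'mean'])                # one row per group
centred = grouped['points'].transform(lambda x: x - x.mean())  # same length as df
big_teams = grouped.filter(lambda x: len(x) >= 3)              # every row of qualifying groups
sizes = {name: len(group) for name, group in grouped}          # iterating yields (key, sub-DataFrame)
```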
 996 | 
 997 | 
 998 | 
 999 | ### 4.12 Merging and Joining <a name="4.12"></a>
1000 | [go to top](#top)
1001 | 
1002 | 
1003 | > Pandas provides a single function, **merge**, as the entry point for all standard database join operations between DataFrame objects −
1004 | >
> `pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)`
1006 | >
1007 | > Here, we have used the following parameters −
1008 | >
1009 | > - **left** − A DataFrame object.
1010 | > - **right** − Another DataFrame object.
1011 | > - **on** − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
1012 | > - **left_on** − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
1013 | > - **right_on** − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
1014 | > - **left_index** − If **True,** use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
1015 | > - **right_index** − Same usage as **left_index** for the right DataFrame.
1016 | > - **how** − One of 'left', 'right', 'outer', 'inner'. Defaults to inner. Each method has been described below.
> - **sort** − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False, in which case the key order depends on the `how` join type; setting it to True sorts the keys at some performance cost.
1018 | >
1019 | > <https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm>
1020 | 
1021 | ```python
1022 | # Code source: https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm
1023 | 
1024 | # Merge two DFs via key
1025 | left = pd.DataFrame({
1026 |    'id':[1,2,3,4,5],
1027 |    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
1028 |    'subject_id':['sub1','sub2','sub4','sub6','sub5']})
1029 | right = pd.DataFrame({
   'id':[1,2,3,4,5],
1031 |    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
1032 |    'subject_id':['sub2','sub4','sub3','sub6','sub5']})
1033 | 
1034 | pd.merge(left,right,on='id')
1035 | 
'''
OUTPUT

   id  Name_x  subject_id_x  Name_y  subject_id_y
0   1    Alex          sub1   Billy          sub2
1   2     Amy          sub2   Brian          sub4
2   3   Allen          sub4    Bran          sub3
3   4   Alice          sub6   Bryce          sub6
4   5  Ayoung          sub5   Betty          sub5
'''
1046 | 
1047 | # Merge two DFs via multiple keys
pd.merge(left, right, on=['key_1', 'key_2']) # With the default how='inner', rows without a match in both DFs are dropped
1049 | 
1050 | # Merge using 'HOW'
'''
Merge Method  SQL Equivalent     Description
left          LEFT OUTER JOIN    Use keys from left object
right         RIGHT OUTER JOIN   Use keys from right object
outer         FULL OUTER JOIN    Use union of keys
inner         INNER JOIN         Use intersection of keys
'''
1058 | pd.merge(left, right, on='key', how='left')
1059 | ```
1060 | 
1061 | Join Intuitions
1062 | 
1063 | ![Image result for outer join inner join image](assets/hMKKt.jpg)
1064 | 
1065 | Image source: <https://stackoverflow.com/questions/38549/what-is-the-difference-between-inner-join-and-outer-join>
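
To connect the diagram to code, here is a sketch on two tiny made-up frames that share only the keys `'b'` and `'c'`, recording which keys survive each `how=` option:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'l_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'r_val': [20, 30, 40]})

keys = {}
for how in ['inner', 'left', 'right', 'outer']:
    merged = pd.merge(left, right, on='key', how=how)
    keys[how] = merged['key'].tolist()  # which keys end up in the result
```

Unmatched rows from the kept side get NaN in the other frame's columns (e.g. `r_val` for key `'a'` in the left join).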
1066 | 
1067 | 
1068 | 
1069 | ### 4.13 Concatenation <a name="4.13"></a>
1070 | [go to top](#top)
1071 | 
1072 | 
1073 | > Pandas provides various facilities for easily combining together **Series, DataFrame**, and **Panel** objects.
1074 | >
1075 | > ` pd.concat(objs,axis=0,join='outer',join_axes=None, ignore_index=False)`
1076 | >
1077 | > - **objs** − This is a sequence or mapping of Series, DataFrame, or Panel objects.
1078 | > - **axis** − {0, 1, ...}, default 0. This is the axis to concatenate along.
1079 | > - **join** − {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection.
1080 | > - **ignore_index** − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
> - **join_axes** − This is the list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. (Removed in pandas 1.0; call `.reindex()` on the result instead.)
1082 | >
1083 | > <https://www.tutorialspoint.com/python_pandas/python_pandas_concatenation.htm>
1084 | 
1085 | ```python
# Concatenate DFs
pd.concat([one, two]) # Stacks the rows of two below the rows of one
pd.concat([one, two], keys=['x', 'y']) # Labels each source DF with an outer index level
pd.concat([one, two], ignore_index=True) # Discards the original index and renumbers 0..n-1

# Concatenate using Append
# one.append(two) was removed in pandas 2.0; use pd.concat([one, two]) instead
1093 | ```
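
The options above on two tiny made-up frames. Without `ignore_index`, the original labels repeat; `axis=1` joins side by side instead of stacking.

```python
import pandas as pd

one = pd.DataFrame({'a': [1, 2]})
two = pd.DataFrame({'a': [3, 4]})

stacked = pd.concat([one, two])                        # index keeps the originals: 0, 1, 0, 1
renumbered = pd.concat([one, two], ignore_index=True)  # index becomes 0, 1, 2, 3
keyed = pd.concat([one, two], keys=['x', 'y'])         # outer index level 'x' / 'y'
side_by_side = pd.concat([one, two.rename(columns={'a': 'b'})], axis=1)  # join columns instead
```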
1094 | 
1095 | 
1096 | 
1097 | ## 5. EXTRA: Helpful Notes <a name="5"></a>
1098 | 
1099 | I couldn't find a suitable place to put this information, so I'll put it here:
1100 | 
1101 | - Pivot tables, stacking, and unstacking
1102 |   <https://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/>
1103 | - Package configuration
1104 |   <https://www.tutorialspoint.com/python_pandas/python_pandas_options_and_customization.htm>
1105 | 
1106 | 
1107 | 
1108 | ```
1109 |                             .     .
1110 |                          .  |\-^-/|  .    
1111 |                         /| } O.=.O { |\
1112 | ```
1113 | 
1115 | 
1116 | ------
1117 | 
1118 |  [![Yeah! Buy the DRAGON a COFFEE!](../assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)
1119 | 
1120 | 


--------------------------------------------------------------------------------
/Numpy/01 Numpy Basics.md:
--------------------------------------------------------------------------------
   1 | # Numpy Basics
   2 | 
   3 | Author: methylDragon  
   4 | Contains a syntax reference and code snippets for Numpy!  
   5 | It's a collection of code snippets and tutorials from everywhere all mashed together!       
   6 | 
   7 | ------
   8 | 
   9 | ## Pre-Requisites
  10 | 
  11 | ### Required
  12 | 
  13 | - Python knowledge, this isn't a tutorial!
  14 | - Numpy installed
  15 |   - I'll assume you've already run this line as well `import numpy as np`
  16 | 
  17 | 
  18 | 
  19 | ## Table Of Contents <a name="top"></a>
  20 | 
  21 | 1. [Introduction](#1)    
  22 | 2. [Array Basics](#2)    
  23 |    2.1 [Configuring Numpy](#2.1)    
  24 |    2.2 [Numpy Data Types](#2.2)    
  25 |    2.3 [Creating Arrays](#2.3)    
  26 |    2.4 [Array Basics and Attributes](#2.4)    
  27 |    2.5 [Casting](#2.5)    
  28 |    2.6 [Some Array Methods](#2.6)    
  29 |    2.7 [Array Indexing](#2.7)    
  30 |    2.8 [Array Slicing](#2.8)    
  31 |    2.9 [Reshaping Arrays](#2.9)    
  32 |    2.10 [Array Concatenation and Splitting](#2.10)    
  33 |    2.11 [Array Arithmetic](#2.11)    
  34 |    2.12 [More Array Math](#2.12)    
  35 | 3. [Going Deeper With Arrays](#3)    
  36 |    3.1 [Broadcasting](#3.1)    
  37 |    3.2 [Vectorize](#3.2)    
  38 |    3.3 [Iterating Through Axes](#3.3)    
  39 |    3.4 [Modifying Output Directly](#3.4)    
  40 |    3.5 [Locating Elements](#3.5)    
  41 |    3.6 [Aggregations](#3.6)    
  42 |    3.7 [Comparisons](#3.7)    
  43 |    3.8 [Sorting Arrays](#3.8)    
  44 |    3.9 [Fancy Indexing](#3.9)    
  45 |    3.10 [Structured Arrays](#3.10)    
  46 | 4. [Matrices](#4)    
  47 |    4.1 [Linear Algebra Functions](#4.1)    
  48 | 5. [Numpy I/O](#5)    
  49 |    5.1 [Import from CSV](#5.1)    
  50 |    5.2 [Saving and Loading](#5.2)    
  51 | 
  52 | 
  53 | 
  54 | 
  55 | ## 1. Introduction <a name="1"></a>
  56 | 
  57 | > NumPy is the fundamental package for scientific computing with Python. It contains among other things:
  58 | >
  59 | > - a powerful N-dimensional array object
  60 | > - sophisticated (broadcasting) functions
  61 | > - tools for integrating C/C++ and Fortran code
  62 | > - useful linear algebra, Fourier transform, and random number capabilities
  63 | >
  64 | > Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
  65 | >
  66 | > http://www.numpy.org/
  67 | 
  68 | This document will list the most commonly used functions in Numpy, to serve as a reference when using it.
  69 | 
  70 | It's especially useful because numpy arrays are more efficient than native Python lists, in both memory usage and speed!
  71 | 
  72 | The reason is how the arrays are stored:
  73 | 
  74 | ![Memory layout of a NumPy array vs a Python list](assets/array_vs_list.png)
  75 | 
  76 | Image source: https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html
  77 | 
  78 | You can see that the Python list stores pointers to separate Python objects, which have to be dereferenced, but the Numpy array doesn't: its values are stored contiguously, one after another, in a single block of memory!
  79 | 
  80 | Each of those pointers (and the full Python object it points to) is extra memory overhead, and dereferencing them costs extra time.
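You can inspect this contiguous layout directly on an array. A minimal sketch follows; the array size of 1000 and the explicit `int64` dtype are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(1000, dtype=np.int64)

item_size = a.itemsize                # 8 bytes per element
total_bytes = a.nbytes                # 8000 bytes: one block of raw int64 values, no per-element objects
step = a.strides                      # (8,): move 8 bytes to reach the next element
contiguous = a.flags['C_CONTIGUOUS']  # True: elements sit back to back in memory
```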
  81 | 
  82 | It's so useful that many other packages, like OpenCV, SciPy, and pandas, are built on it!
  83 | 
  84 | ---
  85 | 
  86 | Install it!
  87 | 
  88 | ```shell
  89 | $ pip install numpy
  90 | ```
  91 | 
  92 | If you need additional help or need a refresher on the parameters, feel free to use:
  93 | 
  94 | ```python
  95 | help(np.FUNCTION_YOU_NEED_HELP_WITH)
  96 | ```
  97 | 
  98 | ---
  99 | 
 100 | **Credits:**
 101 | 
 102 | A lot of these notes are adapted from:
 103 | 
 104 | https://jakevdp.github.io/PythonDataScienceHandbook/index.html
 105 | 
 106 | http://cs231n.github.io/python-numpy-tutorial/
 107 | 
 108 | https://docs.scipy.org/doc/numpy-1.15.1/reference/
 109 | 
 110 | 
 111 | 
 112 | ## 2. Array Basics <a name="2"></a>
 113 | 
 114 | The core, most important object in Numpy is the **ndarray**, which stands for n-dimensional array.
 115 | 
 116 | > An [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its [`shape`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape), which is a [`tuple`](https://docs.python.org/dev/library/stdtypes.html#tuple) of *N* positive integers that specify the sizes of each dimension. The type of items in the array is specified by a separate [data-type object (dtype)](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html#arrays-dtypes), one of which is associated with each ndarray.
 117 | >
 118 | > As with other container objects in Python, the contents of an [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can be accessed and modified by [indexing or slicing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#arrays-indexing) the array (using, for example, *N* integers), and via the methods and attributes of the [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray).
 119 | >
 120 | > Different [`ndarrays`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can share the same data, so that changes made in one [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) may be visible in another. That is, an ndarray can be a *“view”* to another ndarray, and the data it is referring to is taken care of by the *“base”* ndarray. ndarrays can also be views to memory owned by Python [`strings`](https://docs.python.org/dev/library/stdtypes.html#str) or objects implementing the `buffer` or [array](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.interface.html#arrays-interface) interfaces.
 121 | >
 122 | > https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html
 123 | 
 124 | 
 125 | 
 126 | ### 2.1 Configuring Numpy <a name="2.1"></a>
 127 | [go to top](#top)
 128 | 
 129 | 
 130 | ```python
 131 | # Set printing precision
 132 | np.set_printoptions(precision=2) 
 133 | ```
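`set_printoptions` changes the setting globally. If you only want it temporarily, NumPy (1.15 and later) also provides the `np.printoptions` context manager. The sketch below compares the two; the array values are arbitrary, and it assumes the default precision of 8 has not been changed elsewhere.

```python
import numpy as np

a = np.array([1 / 3, 2 / 3])

# Temporary: precision=2 applies only inside the with-block
with np.printoptions(precision=2):
    short = np.array2string(a)

# Outside the block the default precision (8) is back
full = np.array2string(a)
```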
 134 | 
 135 | 
 136 | 
 137 | ### 2.2 Numpy Data Types <a name="2.2"></a>
 138 | [go to top](#top)
 139 | 
 140 | 
 141 | #### **List**
 142 | 
 143 | | Data type    | Description                                                  |
 144 | | ------------ | ------------------------------------------------------------ |
 145 | | `bool_`      | Boolean (True or False) stored as a byte                     |
 146 | | `int_`       | Default integer type (C `long` before NumPy 2.0, `intp` from NumPy 2.0; normally either `int64` or `int32`) |
 147 | | `intc`       | Identical to C `int` (normally `int32` or `int64`)           |
 148 | | `intp`       | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`) |
 149 | | `int8`       | Byte (-128 to 127)                                           |
 150 | | `int16`      | Integer (-32768 to 32767)                                    |
 151 | | `int32`      | Integer (-2147483648 to 2147483647)                          |
 152 | | `int64`      | Integer (-9223372036854775808 to 9223372036854775807)        |
 153 | | `uint8`      | Unsigned integer (0 to 255)                                  |
 154 | | `uint16`     | Unsigned integer (0 to 65535)                                |
 155 | | `uint32`     | Unsigned integer (0 to 4294967295)                           |
 156 | | `uint64`     | Unsigned integer (0 to 18446744073709551615)                 |
 157 | | `float_`     | Shorthand for `float64` (removed in NumPy 2.0; use `float64`) |
 158 | | `float16`    | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
 159 | | `float32`    | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
 160 | | `float64`    | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
 161 | | `complex_`   | Shorthand for `complex128` (removed in NumPy 2.0; use `complex128`) |
 162 | | `complex64`  | Complex number, represented by two 32-bit floats             |
 163 | | `complex128` | Complex number, represented by two 64-bit floats             |
 164 | 
 165 | #### **nan and inf**
 166 | 
 167 | `np.nan` is numpy's floating-point "Not a Number" value (handy for marking missing data), and `np.inf` is infinity!
 168 | 
 169 | ```python
 170 | np.nan
 171 | np.inf
 172 | 
 173 | # To check if something is nan or inf,
 174 | np.isnan(x)
 175 | np.isinf(x)
 176 | ```
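A common gotcha is that NaN never compares equal to anything, not even itself, so you can't find NaNs with `==`. A small sketch on a made-up array:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])

nan_mask = np.isnan(x)        # [False, True, False, False]
inf_mask = np.isinf(x)        # [False, False, True, True]  (both +inf and -inf)
finite_mask = np.isfinite(x)  # [True, False, False, False] (neither nan nor inf)

# x == np.nan is False everywhere, even at the NaN position
eq_mask = (x == np.nan)
```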
 177 | 
 178 | 
 179 | 
 180 | ### 2.3 Creating Arrays <a name="2.3"></a>
 181 | [go to top](#top)
 182 | 
 183 | 
 184 | General note: most of these functions take a `dtype` parameter that sets the data type of the output.
 185 | 
 186 | #### **From Python List**
 187 | 
 188 | ```python
 189 | # Basic
 190 | np.array([1, 2, 3, 4, 5])
 191 | # Out: array([1, 2, 3, 4, 5])
 192 | 
 193 | # Upcast (the ints are cast to float, since all elements must share one dtype)
 194 | np.array([1.1, 2, 3, 4, 5])
 195 | # Out: array([1.1, 2., 3., 4., 5.])
 196 | 
 197 | # Explicit type
 198 | np.array([1, 2, 3, 4, 5], dtype='float32')
 199 | # Out: array([1., 2., 3., 4., 5.], dtype=float32)
 200 | 
 201 | # Multi-dimensional
 202 | np.array([[1,2],[3,4],[5,6]])
 203 | # Out: array([[1,2],
 204 | #             [3,4],
 205 | #             [5,6]])
 206 | ```
 207 | 
 208 | #### **From Scratch**
 209 | 
 210 | **Filled Arrays**
 211 | 
 212 | ```python
 213 | # All zeroes
 214 | np.zeros(5, dtype=int)
 215 | # Out: array([0, 0, 0, 0, 0])
 216 | 
 217 | # Multi-dimensional Zeros
 218 | np.zeros((2, 2))
 219 | # Out: array([[0., 0.],
 220 | #             [0., 0.]])
 221 | 
 222 | # Ones
 223 | np.ones((2, 2), dtype=float)
 224 | # Out: array([[1., 1.],
 225 | #             [1., 1.]])
 226 | 
 227 | # Filled array (It even works for non-numeric values like strings! AHAHAHA)
 228 | np.full((2, 2), 'CH3')
 229 | # Out: array([['CH3', 'CH3'],
 230 | #             ['CH3', 'CH3']], dtype='<U3')
 231 | 
 232 | # Identity
 233 | np.identity(3) # Number of rows and columns in n x n output
 234 | # Out: array([[1., 0., 0.],
 235 | #             [0., 1., 0.],
 236 | #             [0., 0., 1.]])
 237 | 
 238 | # Eye (Just identity with an offsettable diagonal)
 239 | np.eye(3, 4, k=1) # Rows, Columns, Offset
 240 | # Out: array([[0., 1., 0., 0.],
 241 | #             [0., 0., 1., 0.],
 242 | #             [0., 0., 0., 1.]])
 243 | 
 244 | # Empty array (It won't actually be empty, just uninitialised)
 245 | # YOU MUST RE-INITIALISE THEM!
 246 | np.empty(4)
 247 | # Out: array([1.85520069e-316, 2.37663529e-312, 2.56761491e-312, 2.37151510e-322])
 248 | ```
 249 | **Filled Arrays like Input Array**
 250 | 
 251 | ```python
 252 | # First specify an array to copy
 253 | example_array = np.array([[1, 1], [2, 2], [3, 3]])
 254 | 
 255 | # Zeros-like
 256 | np.zeros_like(example_array, dtype=int)
 257 | # Out: array([[0, 0],
 258 | #             [0, 0],
 259 | #             [0, 0]])
 260 | 
 261 | # Ones-like
 262 | np.ones_like(example_array, dtype=int)
 263 | # Out: array([[1, 1],
 264 | #             [1, 1],
 265 | #             [1, 1]])
 266 | ```
 267 | 
 268 | **Sequences**
 269 | 
 270 | ```python
 271 | # Linear Sequence (Think native Python range)
 272 | np.arange(0, 10, 2)
 273 | # Out: array([0, 2, 4, 6, 8])
 274 | 
 275 | # Linear Space
 276 | np.linspace(0, 1, 5) # start, stop, number
 277 | # Out: array([0., 0.25, 0.5, 0.75, 1.])
 278 | 
 279 | # Log Space
 280 | np.logspace(0, 2, 3, base=2) # start, stop, number, base
 281 | # Out: array([1., 2., 4.])
 282 | 
 283 | # Random Values (uniformly distributed in [0, 1))
 284 | # Scale and shift them for other intervals, e.g. low + (high - low) * values
 285 | np.random.random((2, 2))
 286 | # Out: array([[0.18818377, 0.72342759],
 287 | #             [0.29651442, 0.88577633]])
 288 | 
 289 | # Random Integers
 290 | np.random.randint(0, 10, (2, 2)) # lower (inclusive), upper (exclusive), size
 291 | # Out: array([[0, 2],
 292 | #             [3, 7]])
 293 | 
 294 | # Random Values, using Normal Distribution
 295 | np.random.normal(0, 1, (3, 3)) # Mean, std_dev, size
 296 | # Out: array([[-0.81699102, -1.01669763,  0.02438341],
 297 | #             [-0.00289402, -1.57459419,  0.73531925],
 298 | #             [ 0.33005433, -0.74426642,  0.97679512]])
 299 | ```
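On NumPy 1.17 and later, you can also draw random numbers from a seeded `Generator` created with `np.random.default_rng`. Seeding makes results reproducible. The sketch below mirrors the calls above; the seed 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # seeded Generator (NumPy >= 1.17)

uniform = rng.random((2, 2))             # floats in [0, 1)
ints = rng.integers(0, 10, size=(2, 2))  # ints in [0, 10): upper bound excluded
normal = rng.normal(0, 1, size=(3, 3))   # mean 0, std dev 1

# Re-creating the generator with the same seed replays the same numbers
again = np.random.default_rng(seed=42).random((2, 2))
```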
 300 | 
 301 | #### **Mesh Grids**
 302 | 
 303 | Mesh grids are useful when you need a rectangular grid of x and y values.
 304 | 
 305 | You can use them to evaluate and plot functions of two (or more) variables!
 306 | 
 307 | ![Mesh grid of x and y coordinates](assets/kZNzz.png)
 308 | 
 309 | ```python
 310 | xx, yy = np.mgrid[0:5,0:5]
 311 | 
 312 | # xx: array([[0, 0, 0, 0, 0],
 313 | #            [1, 1, 1, 1, 1],
 314 | #            [2, 2, 2, 2, 2],
 315 | #            [3, 3, 3, 3, 3],
 316 | #            [4, 4, 4, 4, 4]])
 317 | 
 318 | # yy: array([[0, 1, 2, 3, 4],
 319 | #            [0, 1, 2, 3, 4],
 320 | #            [0, 1, 2, 3, 4],
 321 | #            [0, 1, 2, 3, 4],
 322 | #            [0, 1, 2, 3, 4]])
 323 | ```
 324 | 
 325 | Here's an example
 326 | 
 327 | ![Plot of sin(x) + sin(y) evaluated on a mesh grid](assets/vceRQ.png)
 328 | 
 329 | ```python
 330 | # Source: https://stackoverflow.com/questions/36013063/what-is-the-purpose-of-meshgrid-in-python-numpy
 331 | 
 332 | import matplotlib.pyplot as plt
 333 | 
 334 | def sinus2d(x, y):
 335 |     return np.sin(x) + np.sin(y)
 336 | 
 337 | xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
 338 | z = sinus2d(xx, yy) # Create the image on this grid
 339 | 
 340 | plt.imshow(z, origin='lower', interpolation='none')
 341 | plt.show()
 342 | ```
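Watch out: `np.mgrid` and `np.meshgrid` lay out their outputs differently by default. `np.meshgrid` uses Cartesian (`'xy'`) indexing, where x varies along the columns. Passing `indexing='ij'` gives the matrix layout that `np.mgrid` produces. A small sketch with arbitrary ranges:

```python
import numpy as np

x = np.arange(3)  # [0, 1, 2]
y = np.arange(2)  # [0, 1]

# Default 'xy' indexing: shape is (len(y), len(x)) and x varies along columns
X, Y = np.meshgrid(x, y)
# X: [[0, 1, 2],
#     [0, 1, 2]]

# 'ij' indexing: shape is (len(x), len(y)), the same layout as np.mgrid
Xi, Yi = np.meshgrid(x, y, indexing='ij')
Xm, Ym = np.mgrid[0:3, 0:2]
```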
 343 | 
 344 | 
 345 | 
 346 | ### 2.4 Array Basics and Attributes <a name="2.4"></a>
 347 | [go to top](#top)
 348 | 
 349 | #### **Shape and Index**
 350 | 
 351 | It is important to get a proper understanding of the shape of numpy arrays!
 352 | 
 353 | ![Shapes of 1D, 2D, and 3D NumPy arrays](assets/elsp_0105.png)
 354 | 
 355 | [Image Source](https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html)
 356 | 
 357 | The corresponding arrays will look like:
 358 | 
 359 | ```python
 360 | # 1D
 361 | # A 1D array has shape (n,): it is neither a row nor a column vector
 362 | [7, 2, 9, 10]
 363 | 
 364 | # 2D
 365 | [[5.2, 3.0, 4.5],
 366 |  [9.1, 0.1, 0.3]]
 367 | 
 368 | # And so on
 369 | ```
 370 | 
 371 | Another way of looking at it is **matrix indexing**! From **2D arrays onwards**, Numpy indexes by **i, j**: row first, then column.
 372 | 
 373 | If you want to think of it as x and y, then axis 0 is y and axis 1 is x. So the indexing is `(y, x)`, i.e. `(i, j)`.
 374 | 
 375 | > If you want to do matrix or vector operations, it is best to do it from at least a 2D array.
 376 | >
 377 | > 
 378 | 
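To see why 2D is the safer starting point for vector maths, compare a 1D array with explicit row and column vectors. A minimal sketch, reusing the 1D example values from above:

```python
import numpy as np

v = np.array([7, 2, 9, 10])

v_shape = v.shape       # (4,)
vt_shape = v.T.shape    # (4,): transposing a 1D array does nothing

col = v.reshape(-1, 1)  # explicit column vector, shape (4, 1)
row = v.reshape(1, -1)  # explicit row vector, shape (1, 4)

outer = col @ row       # (4, 1) @ (1, 4) -> (4, 4) outer product
inner = row @ col       # (1, 4) @ (4, 1) -> (1, 1) inner product
```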
 379 | ![Matrix entries labelled by row i and column j](assets/Matrix.svg.png)
 380 | 
 381 | [Image Source](https://simple.wikipedia.org/wiki/Matrix_(mathematics))
 382 | 
 383 | #### **Attributes**
 384 | 
 385 | 
 386 | ```python
 387 | # Suppose we create a 3 dimensional array
 388 | example_array = np.random.randint(5, size=(2, 3, 4))
 389 | # Out: array([[[0, 0, 3, 3],
 390 | #              [2, 1, 1, 3],
 391 | #              [2, 2, 4, 4]],
 392 | #
 393 | #             [[2, 0, 1, 3],
 394 | #              [2, 3, 0, 1],
 395 | #              [2, 0, 1, 2]]])
 396 | 
 397 | # Dimensions
 398 | example_array.ndim # 3
 399 | 
 400 | # Shape
 401 | example_array.shape # (2, 3, 4): 2 blocks of 3 rows x 4 columns (an image array would be (height, width, channels))
 402 | 
 403 | # Total Elements
 404 | example_array.size # 24 (which is 2 * 3 * 4)
 405 | 
 406 | # Type
 407 | example_array.dtype # dtype('int64') on most 64-bit platforms (can be int32 on Windows before NumPy 2.0)
 408 | 
 409 | # Byte-size of each element
 410 | example_array.itemsize # 8 (bytes, for int64)
 411 | 
 412 | # Total byte-size
 413 | example_array.nbytes # 192 (which is 2 * 3 * 4 * 8)
 414 | ```
 415 | 
 416 | 
 417 | 
 418 | ### 2.5 Casting <a name="2.5"></a>
 419 | [go to top](#top)
 420 | 
 421 | 
 422 | ```python
 423 | # Just use the .astype() method!
 424 | 
 425 | np.array([True, True, True, False, False, False]).astype('int')
 426 | # Out: array([1, 1, 1, 0, 0, 0])
 427 | ```
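`astype` returns a new array and leaves the original untouched. Two behaviours are worth knowing: float-to-int casts truncate toward zero rather than round, and strings of digits can be parsed into numbers. A small sketch with arbitrary values:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.5])

ints = floats.astype(int)  # truncates toward zero: [1, -1, 2]
parsed = np.array(['1', '2', '3']).astype(int)  # strings -> numbers: [1, 2, 3]

# astype made new arrays; floats is still float64
original_dtype = floats.dtype
```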
 428 | #### **Array to List**
 429 | 
 430 | ```python
 431 | np.array([1, 2, 3]).tolist() # [1, 2, 3] native Python list!
 432 | ```
 433 | 
 434 | #### **List to Array**
 435 | 
 436 | ```python
 437 | np.asarray([1, 2, 3]) # This can take list of tuples, tuples, etc.!
 438 | ```
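A difference between `np.asarray` and `np.array`: `np.asarray` hands back the very same object if the input is already an ndarray with a compatible dtype. `np.array` copies by default. A quick sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

same = np.asarray(a)                    # already an ndarray -> no copy, same object
copied = np.array(a)                    # np.array copies by default
converted = np.asarray(a, dtype=float)  # a dtype change forces a new array
```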
 439 | 
 440 | 
 441 | 
 442 | ### 2.6 Some Array Methods <a name="2.6"></a>
 443 | [go to top](#top)
 444 | 
 445 | 
 446 | There are really a lot of them!
 447 | 
 448 | #### **Repeat and Tile**
 449 | 
 450 | ```python
 451 | a = [1, 2, 3]
 452 | 
 453 | np.tile(a, 2) # array([1, 2, 3, 1, 2, 3])
 454 | np.repeat(a, 2) # array([1, 1, 2, 2, 3, 3])
 455 | ```
 456 | 
 457 | #### **Get Unique**
 458 | 
 459 | ```python
 460 | a = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
 461 | 
 462 | np.unique(a, return_counts=True)
 463 | # Out: (array([1, 2, 3, 4]), array([4, 3, 2, 1]))
 464 | # (Unique set, Counts)
 465 | ```
 466 | 
 467 | #### **Rounding**
 468 | 
 469 | ```python
 470 | a = np.array([1.111, 2.222, 3.333, 4.444])
 471 | 
 472 | np.around(a) # array([1., 2., 3., 4.])
 473 | np.around(a, 2) # array([1.11, 2.22, 3.33, 4.44])
 474 | 
 475 | b = np.array([12345])
 476 | 
 477 | np.around(b, -1) # array([12340])
 478 | np.around(b, -2) # array([12300])
 479 | ```
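Note that `np.around` rounds exact halves to the nearest even value ("banker's rounding"), which may surprise you if you expect 0.5 to round up. The sketch below shows this. `round_half_up` is a hypothetical helper, not a NumPy function, and it is only meant for non-negative inputs.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

even = np.around(halves)  # halves go to the even neighbour: [0., 2., 2., 4.]

# Hypothetical helper: round halves up (for non-negative values only)
def round_half_up(x):
    return np.floor(x + 0.5)

up = round_half_up(halves)  # [1., 2., 3., 4.]
```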
 480 | 
 481 | #### **Floor**
 482 | 
 483 | ```python
 484 | a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])
 485 | 
 486 | np.floor(a) # array([1., 2., 3., 4., 5.])
 487 | ```
 488 | 
 489 | #### **Ceil**
 490 | 
 491 | ```python
 492 | a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])
 493 | 
 494 | np.ceil(a) # array([2., 3., 4., 5., 6.])
 495 | ```
 496 | 
 497 | #### **Count Non-Zeroes**
 498 | 
 499 | ```python
 500 | np.count_nonzero(array) # Gives you number of non-zero elements in the array
 501 | ```
 502 | 
 503 | #### **Digitize**
 504 | 
 505 | ```python
 506 | x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 507 | bins = np.array([0, 3, 6, 9])
 508 | 
 509 | # Return index of the bin each element belongs to
 510 | # Together with np.take, you can map each element to the lower edge of its bin
 511 | np.digitize(x, bins) # array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
 512 | ```
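Here is one way to combine `np.digitize` with `np.take`, as a sketch using the same `x` and `bins` as above. Subtracting 1 turns the 1-based bin numbers into indices into `bins`, so each element gets replaced by the lower edge of its bin.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
bins = np.array([0, 3, 6, 9])

idx = np.digitize(x, bins)            # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
lower_edges = np.take(bins, idx - 1)  # [0, 0, 0, 3, 3, 3, 6, 6, 6, 9]
```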
 513 | 
 514 | #### **Clip**
 515 | 
 516 | Clip values to the range [min, max]: anything below min becomes min, and anything above max becomes max.
 517 | 
 518 | ```python
 519 | x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 520 | 
 521 | np.clip(x, 3, 8) # array([3, 3, 3, 3, 4, 5, 6, 7, 8, 8])
 522 | ```
 523 | 
 524 | #### **Histogram and Bincount**
 525 | 
 526 | ```python
 527 | x = np.array([1,1,2,2,2,4,4,5,6,6,6])
 528 | 
 529 | np.bincount(x) # array([0, 2, 3, 0, 2, 1, 3])
 530 | # How to read output:
 531 | # 0 occurs 0 times
 532 | # 1 occurs 2 times
 533 | # 2 occurs 3 times and so on
 534 | 
 535 | np.histogram(x, [0, 2, 4, 6, 8]) # (array([2, 3, 3, 3]), array([0, 2, 4, 6, 8]))
 536 | # First array are the counts
 537 | # Second array are the bins
 538 | # Each bin includes its bottom edge but not its top, except the last bin, which includes both
 539 | # Eg. [0, 2): 2
 540 | #     [2, 4): 3
 541 | #     [4, 6): 3
 542 | #     [6, 8]: 3
 543 | ```
 544 | 
 545 | **At**
 546 | 
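 547 | If you want to apply a ufunc (like `np.negative` or `np.add`) in place to only some indices of an array, use its `at` method
 548 | 
 549 | ```python
 550 | np.some_ufunc.at(array, indices) # General pattern (binary ufuncs like np.add also take a third operand)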
 551 | 
 552 | # Example
 553 | x = np.array([1, 2, 3, 4])
 554 | np.negative.at(x, [0, 1]) # This will mutate x
 555 | 
 556 | # x is now array([-1, -2, 3, 4])
 557 | ```
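`at` is also unbuffered. When an index appears more than once, the operation is applied once per occurrence. Fancy-index assignment like `a[idx] += 1` only applies it once. A small sketch with made-up indices:

```python
import numpy as np

idx = np.array([0, 1, 1, 2, 2, 2])

buffered = np.zeros(3, dtype=int)
buffered[idx] += 1            # repeated indices only count once: [1, 1, 1]

counts = np.zeros(3, dtype=int)
np.add.at(counts, idx, 1)     # every occurrence counts: [1, 2, 3]
```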
 558 | 
 559 | 
 560 | 
 561 | ### 2.7 Array Indexing <a name="2.7"></a>
 562 | [go to top](#top)
 563 | 
 564 | 
 565 | Of course, you can also assign to elements once you've indexed them, as you'd expect!
 566 | 
 567 | **Note:** if you assign a float into an int array, it will be truncated to int (e.g. 3.12 -> 3).
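A quick demonstration of that silent truncation, using arbitrary values. Truncation goes toward zero, so negative values round up in magnitude terms.

```python
import numpy as np

a = np.array([1, 2, 3])  # integer dtype

a[0] = 3.12  # silently truncated to 3
a[1] = -3.9  # truncated toward zero, so this becomes -3

# To keep the fractional part, work on a float array instead
b = a.astype(float)
b[2] = 3.12
```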
 568 | 
 569 | ```python
 570 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
 571 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
 572 | # Out: array([[[ 1,  2,  3],
 573 | #              [ 4,  5,  6],
 574 | #              [ 7,  8,  9]],
 575 | #
 576 | #             [[10, 11, 12],
 577 | #              [13, 14, 15],
 578 | #              [16, 17, 18]]])
 579 | ```
 580 | #### **One-dimensional**
 581 | 
 582 | Works just like native Python!
 583 | 
 584 | ```python
 585 | array[0]
 586 | # Out: array([[1, 2, 3],
 587 | #             [4, 5, 6],
 588 | #             [7, 8, 9]])
 589 | 
 590 | array[-1]
 591 | # Out: array([[10, 11, 12],
 592 | #             [13, 14, 15],
 593 | #             [16, 17, 18]])
 594 | ```
 595 | 
 596 | #### **Multi-dimensional**
 597 | 
 598 | ```python
 599 | array[0, 0]
 600 | # Out: array([1, 2, 3])
 601 | 
 602 | array[0, 0, 0]
 603 | # Out: 1
 604 | ```
 605 | 
 606 | #### **Conditional Indexing (Boolean Masks)**
 607 | 
 608 | ```python
 609 | a = np.array([1, 2, 3, 4, 5])
 610 | 
 611 | a[a > 3] # array([4, 5])
 612 | a[np.iscomplex(a)] # array([], dtype=int64)
 613 | ```
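You can combine conditions with `&` (and), `|` (or) and `~` (not). Parentheses around each comparison are required, because `&` and `|` bind more tightly than `>` and `<`. Masks also work for assignment. A sketch on the same small array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

between = a[(a > 1) & (a < 5)]  # [2, 3, 4]
outside = a[(a < 2) | (a > 4)]  # [1, 5]
odd = a[~(a % 2 == 0)]          # [1, 3, 5]

# Masks can be used on the left-hand side too
b = a.copy()
b[b > 3] = 0                    # [1, 2, 3, 0, 0]
```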
 614 | 
 615 | 
 616 | 
 617 | ### 2.8 Array Slicing <a name="2.8"></a>
 618 | [go to top](#top)
 619 | 
 620 | 
 621 | **Note:** Unlike slicing a native Python list, slicing an array gives you an **array view**, not a copy! So if you alter the view, you also alter the original array!
 622 | 
 623 | #### **One-dimensional**
 624 | 
 625 | ```python
 626 | array = np.arange(10)
 627 | # Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 628 | 
 629 | # From start
 630 | array[:5]
 631 | # Out: array([0, 1, 2, 3, 4])
 632 | 
 633 | # From end
 634 | array[5:]
 635 | # Out: array([5, 6, 7, 8, 9])
 636 | 
 637 | # From middle
 638 | array[4:7]
 639 | # Out: array([4, 5, 6])
 640 | 
 641 | # Every other element
 642 | array[::2]
 643 | # Out: array([0, 2, 4, 6, 8])
 644 | 
 645 | # Every other element from index 1
 646 | array[1::2]
 647 | # Out: array([1, 3, 5, 7, 9])
 648 | 
 649 | # Reversed
 650 | array[::-1]
 651 | # Out: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
 652 | 
 653 | # Reversed, every other element from index 5
 654 | array[5::-2]
 655 | # Out: array([5, 3, 1])
 656 | ```
 657 | 
 658 | #### **Multi-dimensional**
 659 | 
 660 | ```python
 661 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
 662 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
 663 | # Out: array([[[ 1,  2,  3],
 664 | #              [ 4,  5,  6],
 665 | #              [ 7,  8,  9]],
 666 | #
 667 | #             [[10, 11, 12],
 668 | #              [13, 14, 15],
 669 | #              [16, 17, 18]]])
 670 | 
 671 | # Everything before index 1 on axis 0: the first block, kept nested
 672 | array[:1]
 673 | # Out: array([[[1, 2, 3],
 674 | #              [4, 5, 6],
 675 | #              [7, 8, 9]]])
 676 | 
 677 | # Everything from index 1 on axis 0: here, the last block, kept nested
 678 | array[1:]
 679 | # Out: array([[[10, 11, 12],
 680 | #              [13, 14, 15],
 681 | #              [16, 17, 18]]])
 682 | 
 683 | # First row of the first block, kept nested
 684 | array[:1, :1]
 685 | # Out: array([[[1, 2, 3]]])
 686 | 
 687 | # First element of every innermost row, kept nested
 688 | array[:, :, :1]
 689 | # Out: array([[[ 1],
 690 | #              [ 4],
 691 | #              [ 7]],
 692 | #
 693 | #             [[10],
 694 | #              [13],
 695 | #              [16]]])
 696 | 
 697 | # Reverse the rows and columns (the last two axes) of every block
 698 | array[:, ::-1, ::-1]
 699 | # Out: array([[[ 9,  8,  7],
 700 | #              [ 6,  5,  4],
 701 | #              [ 3,  2,  1]],
 702 | #
 703 | #             [[18, 17, 16],
 704 | #              [15, 14, 13],
 705 | #              [12, 11, 10]]])
 706 | ```
 707 | 
 708 | #### **Multi-dimensional Access**
 709 | 
 710 | Sometimes you just want the columns or rows nicely shown as a one dimensional array instead of a nested one.
 711 | 
 712 | **Note: They'll still be editable views!**
 713 | 
 714 | Here's how to do it!
 715 | 
 716 | ```python
 717 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
 718 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
 719 | # Out: array([[[ 1,  2,  3],
 720 | #              [ 4,  5,  6],
 721 | #              [ 7,  8,  9]],
 722 | #
 723 | #             [[10, 11, 12],
 724 | #              [13, 14, 15],
 725 | #              [16, 17, 18]]])
 726 | 
 727 | # First column from first array
 728 | array[0][:, 0]
 729 | # Out: array([1, 4, 7])
 730 | 
 731 | # First row from first array (also equivalent to array[0][0])
 732 | array[0][0, :]
 733 | # Out: array([1, 2, 3])
 734 | 
 735 | # Nested array of first column from each array
 736 | array[:, :, 0]
 737 | # Out: array([[ 1,  4,  7],
 738 | #             [10, 13, 16]])
 739 | ```
 740 | 
 741 | #### **Array Views**
 742 | 
 743 | Remember what I said about array views?
 744 | 
 745 | ```python
 746 | # Native Python
 747 | a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 748 | b = a[:5] # [0, 1, 2, 3, 4]
 749 | 
 750 | b[0]= 5 # b is now [5, 1, 2, 3, 4]
 751 | a # But a is still [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 752 | 
 753 | # Numpy
 754 | a = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 755 | 
 756 | b = a[:5] # array([0, 1, 2, 3, 4])
 757 | b[0] = 5
 758 | a # a is now array([5, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 759 | ```
 760 | 
 761 | #### **Copying Instead of Views**
 762 | 
 763 | ```python
 764 | # Just use .copy() !
 765 | 
 766 | a = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 767 | 
 768 | b = a[:5].copy() # array([0, 1, 2, 3, 4])
 769 | b[0] = 5 # b is array([5, 1, 2, 3, 4])
 770 | a # a is unchanged
 771 | ```
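If you're unsure whether you have a view or a copy, `np.shares_memory` can tell you. Keep in mind that fancy (integer-array) indexing always returns a copy, unlike slicing. A minimal sketch:

```python
import numpy as np

a = np.arange(10)

view = a[:5]           # slicing -> view
copy = a[:5].copy()    # explicit copy
fancy = a[[0, 1, 2]]   # fancy indexing -> always a copy

view_shares = np.shares_memory(a, view)    # True
copy_shares = np.shares_memory(a, copy)    # False
fancy_shares = np.shares_memory(a, fancy)  # False
base_is_a = view.base is a                 # True: a view keeps a reference to its base array
```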
 772 | 
 773 | 
 774 | 
 775 | ### 2.9 Reshaping Arrays <a name="2.9"></a>
 776 | [go to top](#top)
 777 | 
 778 | 
 779 | #### **Reshape**
 780 | 
 781 | ```python
 782 | array = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 783 | 
 784 | # Reshape reshapes the arrays. Of course!
 785 | # You can reshape the array into any number of dimensions, as long as the product of the new dimensions equals the number of elements in the input array!
 786 | 
 787 | array.reshape(10)
 788 | # Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 789 | 
 790 | array.reshape(1, 10)
 791 | # Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
 792 | 
 793 | array.reshape(2, 5)
 794 | # Out: array([[0, 1, 2, 3, 4],
 795 | #             [5, 6, 7, 8, 9]])
 796 | 
 797 | array.reshape(1, 1, 5, 2)
 798 | # Out: array([[[[0, 1],
 799 | #               [2, 3],
 800 | #               [4, 5],
 801 | #               [6, 7],
 802 | #               [8, 9]]]])
 803 | 
 804 | # You can also pass -1 for one dimension (e.g. reshape(-1, 5)) to have numpy figure out that size for you!
 805 | array.reshape(-1, 5)
 806 | # Out: array([[0, 1, 2, 3, 4],
 807 | #             [5, 6, 7, 8, 9]])
 808 | ```
 809 | 
 810 | #### **Reshaping with np.newaxis**
 811 | 
 812 | ```python
 813 | # Create as row
 814 | array[np.newaxis, :] # Equivalent to array.reshape(1, 10)
 815 | # Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
 816 | 
 817 | # Create as column
 818 | array[:, np.newaxis] # Equivalent to array.reshape(10, 1)
 819 | # Out: array([[0],
 820 | #             [1],
 821 | #             [2],
 822 | #             [3],
 823 | #             [4],
 824 | #             [5],
 825 | #             [6],
 826 | #             [7],
 827 | #             [8],
 828 | #             [9]])
 829 | ```
 830 | 
 831 | #### **Flatten and Ravel**
 832 | 
 833 | ```python
 834 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
 835 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
 836 | 
 837 | ## Flatten creates a copy!
 838 | 
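 839 | # Method only (there is no np.flatten function)
 840 | array.flatten()
 841 | array.flatten('C') # Same thing, with the default row-major order made explicit
 842 | # Out: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
 843 | 
 844 | ## Ravel returns a view whenever it can (e.g. for a contiguous array), so editing the ravelled array may edit the parent! Otherwise it returns a copy.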
 845 | 
 846 | # Equivalent
 847 | array.ravel()
 848 | np.ravel(array)
 849 | # Out: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
 850 | ```
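The sketch below checks the copy-vs-view behaviour with `np.shares_memory`. It also shows the case where `ravel` has to copy: a transposed array is not laid out row by row in memory.

```python
import numpy as np

array = np.arange(6).reshape(2, 3)

flat = array.flatten()   # always a copy
rav = array.ravel()      # contiguous input -> view
rav[0] = 100             # writes through: array[0, 0] is now 100

rav_t = array.T.ravel()  # transposed (non-contiguous) input -> copy
```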
 851 | 
 852 | #### **Squeeze**
 853 | 
 854 | Remove axes of length one from the shape
 855 | 
 856 | ```python
 857 | array = np.array([[[1]]])
 858 | 
 859 | # Equivalent
 860 | np.squeeze(array) # array(1), a 0-d array
 861 | array.squeeze() # array(1)
 862 | ```
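You can also squeeze only specific axes with the `axis` parameter. The axis you name must have length 1. A small sketch on an arbitrary array of zeros:

```python
import numpy as np

a = np.zeros((1, 3, 1))

all_squeezed = a.squeeze().shape      # (3,): every length-1 axis removed
first_only = a.squeeze(axis=0).shape  # (3, 1): only axis 0 removed

# Axis 1 has length 3, so it can't be squeezed
try:
    a.squeeze(axis=1)
    squeezed_len3 = True
except ValueError:
    squeezed_len3 = False
```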
 863 | 
 864 | #### **Transpose**
 865 | 
 866 | ```python
 867 | array = np.array([[1, 1], [2, 2]])
 868 | 
 869 | # Equivalent
 870 | array.T
 871 | array.transpose()
 872 | np.transpose(array)
 873 | np.moveaxis(array, 1, 0) # np.rollaxis(array, 1) also works, but moveaxis is the recommended replacement
 874 | np.swapaxes(array, 0, 1)
 875 | 
 876 | # Out: array([[1, 2],
 877 | #             [1, 2]])
 878 | ```
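For arrays with more than two dimensions, these functions are no longer interchangeable. `.T` reverses all axes, while the others let you choose which axes move. A sketch using only the shapes of an arbitrary 3D array:

```python
import numpy as np

a = np.zeros((2, 3, 4))

reversed_axes = a.T.shape                     # (4, 3, 2): .T reverses all axes
reordered = np.transpose(a, (1, 0, 2)).shape  # (3, 2, 4): explicit new axis order
moved = np.moveaxis(a, 0, -1).shape           # (3, 4, 2): axis 0 moved to the end
swapped = np.swapaxes(a, 0, 2).shape          # (4, 3, 2): only axes 0 and 2 swapped
```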
 879 | 
 880 | 
 881 | 
 882 | 
 883 | 
 884 | 
 885 | 
 886 | ### 2.10 Array Concatenation and Splitting <a name="2.10"></a>
 887 | [go to top](#top)
 888 | 
 889 | 
 890 | #### **Concatenating**
 891 | 
 892 | ```python
 893 | a = np.array([1, 2, 3])
 894 | b = np.array([4, 5, 6])
 895 | c = np.array([[7, 8, 9], [10, 11, 12]])
 896 | 
 897 | np.concatenate([a, b])
 898 | # Out: array([1, 2, 3, 4, 5, 6])
 899 | 
 900 | # You can do it with more than two arrays
 901 | np.concatenate([a, b, a, b])
 902 | # Out: array([1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6])
 903 | 
 904 | # Just make sure all inputs have the same number of dimensions, with matching sizes on every axis except the one you concatenate along!
 905 | np.concatenate([c, c, c])
 906 | # Out: array([[ 7,  8,  9],
 907 | #             [10, 11, 12],
 908 | #             [ 7,  8,  9],
 909 | #             [10, 11, 12],
 910 | #             [ 7,  8,  9],
 911 | #             [10, 11, 12]])
 912 | 
 913 | # You may even choose a different axis to concatenate along!
 914 | np.concatenate([c, c, c], axis=1)
 915 | # Out: array([[ 7,  8,  9,  7,  8,  9,  7,  8,  9],
 916 | #             [10, 11, 12, 10, 11, 12, 10, 11, 12]])
 917 | 
 918 | # More examples
 919 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
 920 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
 921 | 
 922 | np.concatenate([array, array], axis=0)
 923 | # Out: array([[[ 1,  2,  3],
 924 | #              [ 4,  5,  6],
 925 | #              [ 7,  8,  9]],
 926 | #
 927 | #             [[10, 11, 12],
 928 | #              [13, 14, 15],
 929 | #              [16, 17, 18]],
 930 | #
 931 | #             [[ 1,  2,  3],
 932 | #              [ 4,  5,  6],
 933 | #              [ 7,  8,  9]],
 934 | #
 935 | #             [[10, 11, 12],
 936 | #              [13, 14, 15],
 937 | #              [16, 17, 18]]])
 938 |     
 939 | np.concatenate([array, array], axis=1)
 940 | # Out: array([[[ 1,  2,  3],
 941 | #              [ 4,  5,  6],
 942 | #              [ 7,  8,  9],
 943 | #              [ 1,  2,  3],
 944 | #              [ 4,  5,  6],
 945 | #              [ 7,  8,  9]],
 946 | #
 947 | #             [[10, 11, 12],
 948 | #              [13, 14, 15],
 949 | #              [16, 17, 18],
 950 | #              [10, 11, 12],
 951 | #              [13, 14, 15],
 952 | #              [16, 17, 18]]])
 953 |     
 954 | np.concatenate([array, array], axis=2)
 955 | # Out: array([[[ 1,  2,  3,  1,  2,  3],
 956 | #              [ 4,  5,  6,  4,  5,  6],
 957 | #              [ 7,  8,  9,  7,  8,  9]],
 958 | #
 959 | #             [[10, 11, 12, 10, 11, 12],
 960 | #              [13, 14, 15, 13, 14, 15],
 961 | #              [16, 17, 18, 16, 17, 18]]])
 962 | ```
 963 | 
 964 | #### **Stacking**
 965 | 
 966 | ```python
 967 | a = np.array([1, 2, 3])
 968 | b = np.array([4, 5, 6])
 969 | 
 970 | # Vertical Stack
 971 | np.vstack([a, b])
 972 | # Out: array([[1, 2, 3],
 973 | #             [4, 5, 6]])
 974 | 
 975 | # Horizontal Stack
 976 | np.hstack([a, b])
 977 | # Out: array([1, 2, 3, 4, 5, 6])
 978 | 
 979 | # Third Axis Stack (Note how output is 3 dimensions)
 980 | np.dstack([a, b])
 981 | # Out: array([[[1, 4],
 982 | #              [2, 5],
 983 | #              [3, 6]]])
 984 | ```
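There is also the general `np.stack`, which joins arrays along a brand-new axis. In contrast, `np.concatenate` joins along an existing axis. A sketch with the same `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.stack([a, b])          # new axis 0: shape (2, 3), like vstack here
cols = np.stack([a, b], axis=1)  # new axis 1: shape (3, 2)
# cols: [[1, 4],
#        [2, 5],
#        [3, 6]]

joined = np.concatenate([a, b])  # existing axis 0: 1D inputs stay 1D, shape (6,)
```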
 985 | 
 986 | #### **Splitting**
 987 | 
 988 | ```python
 989 | array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
 990 | 
# Pass the indices at which to split!
 992 | a, b, c = np.split(array, [1, 2])
 993 | 
 994 | a # array([0])
 995 | b # array([1])
 996 | c # array([2, 3, 4, 5, 6, 7, 8, 9])
 997 | 
 998 | grid = np.arange(16).reshape((4, 4))
 999 | # Out: array([[ 0,  1,  2,  3],
1000 | #             [ 4,  5,  6,  7],
1001 | #             [ 8,  9, 10, 11],
1002 | #             [12, 13, 14, 15]])
1003 | 
1004 | upper, lower = np.vsplit(grid, [2])
1005 | 
upper # array([[0, 1, 2, 3], [4, 5, 6, 7]])
lower # array([[ 8,  9, 10, 11], [12, 13, 14, 15]])
1008 | 
1009 | left, right = np.hsplit(grid, [2])
1010 | 
1011 | left
1012 | # array([[ 0,  1],
1013 | #        [ 4,  5],
1014 | #        [ 8,  9],
1015 | #        [12, 13]])
1016 | 
1017 | right
1018 | # array([[ 2,  3],
1019 | #        [ 6,  7],
1020 | #        [10, 11],
1021 | #        [14, 15]])
1022 | 
1023 | # You can use dsplit also! But it only works on arrays of 3 dimensions or more
1024 | ```
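
`np.split` also accepts a single integer N instead of a list of indices, in which case it cuts the array into N equal pieces and raises an error if that isn't possible (`np.array_split` is the variant that tolerates unequal pieces). A small sketch using the same ten-element array:

```python
import numpy as np

array = np.arange(10)  # array([0, 1, 2, ..., 9])

# An integer means "this many equal sections"
pieces = np.split(array, 5)
# [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7]), array([8, 9])]

# 10 elements can't be cut into 3 equal sections, so np.split raises ValueError
try:
    np.split(array, 3)
    uneven_raised = False
except ValueError:
    uneven_raised = True
```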
1025 | 
1026 | 
1027 | 
1028 | ### 2.11 Array Arithmetic <a name="2.11"></a>
1029 | [go to top](#top)
1030 | 
1031 | 
1032 | ```python
1033 | array = np.arange(4) # array([0, 1, 2, 3])
1034 | 
1035 | array + 5 # array([5, 6, 7, 8])
1036 | array - 5 # array([-5, -4, -3, -2])
array * 2 # array([0, 2, 4, 6])
1038 | array / 2 # array([0., 0.5, 1., 1.5])
1039 | array // 2 # array([0, 0, 1, 1])
1040 | 
1041 | -array # array([0, -1, -2, -3])
1042 | array ** 2 # array([0, 1, 4, 9])
1043 | array % 2 # array([0, 1, 0, 1])
1044 | 
1045 | # Equivalent
1046 | np.add(array, 5) # +
1047 | np.subtract(array, 5) # -
1048 | np.multiply(array, 2) # *
1049 | np.divide(array, 2) # /
1050 | np.floor_divide(array, 2) # //
1051 | 
1052 | np.negative(array) # -
1053 | np.power(array, 2) # **
1054 | np.mod(array, 2) # %
1055 | ```
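
The same operators also work between two arrays of the same shape, pairing up elements position by position. A quick sketch (`a` and `b` are made-up example arrays):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b        # array([11, 22, 33, 44])
product = a * b      # array([ 10,  40,  90, 160])
quotient = b / a     # array([10., 10., 10., 10.])

# The operators are just shorthand for the matching ufuncs
same = np.array_equal(a + b, np.add(a, b)) and np.array_equal(a * b, np.multiply(a, b))
print(same)  # True
```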
1056 | 
1057 | 
1058 | 
1059 | ### 2.12 More Array Math <a name="2.12"></a>
1060 | [go to top](#top)
1061 | 
1062 | 
1063 | ```python
1064 | array = np.array([0, -1, 2, -3, 4])
1065 | ```
1066 | 
1067 | 
1068 | #### **Abs**
1069 | ```python
1070 | abs(array) # array([0, 1, 2, 3, 4])
1071 | np.abs(array) # Same
1072 | np.absolute(array) # Same
1073 | ```
1074 | 
#### **Complex Modulus**
1076 | ```python
1077 | x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
1078 | np.abs(x) # array([ 5.,  5.,  2.,  1.])
1079 | ```
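
For complex numbers, `np.abs` gives the modulus, sqrt(real**2 + imag**2). A short check of that on the same array, using `.real` and `.imag` to pull the two parts apart:

```python
import numpy as np

x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

# sqrt(3**2 + 4**2) = 5, and so on
by_hand = np.sqrt(x.real ** 2 + x.imag ** 2)

print(np.allclose(np.abs(x), by_hand))  # True
```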
1080 | 
1081 | #### **Trigonometry**
1082 | ```python
1083 | theta = np.linspace(0, np.pi, 3) # array([ 0., 1.57079633, 3.14159265])
1084 | 
1085 | np.sin(theta) # array([0.00000000e+00, 1.00000000e+00, 1.22464680e-16])
1086 | np.cos(theta) # array([1.00000000e+00, 6.12323400e-17,-1.00000000e+00])
1087 | np.tan(theta) # array([0.00000000e+00, 1.63312394e+16, -1.22464680e-16])
1088 | 
1089 | # More Trigonometry
x = [-1, 0, 1] # By the way, YES, this is a native Python list!

np.arcsin(x) # array([-1.57079633, 0., 1.57079633]) NumPy turns the list into an array!
np.arccos(x) # array([3.14159265, 1.57079633, 0.])
np.arctan(x) # array([-0.78539816, 0., 0.78539816])
1095 | ```
1096 | 
1097 | #### **Exponents**
1098 | ```python
1099 | x = [1, 2, 3]
1100 | 
1101 | # e^x
1102 | np.exp(x) # array([2.71828183, 7.3890561, 20.08553692])
1103 | 
1104 | # 2^x
1105 | np.exp2(x) # array([2., 4., 8.])
1106 | 
1107 | # 3^x
1108 | np.power(3, x) # array([3, 9, 27])
1109 | ```
1110 | 
1111 | #### **Logarithms**
1112 | ```python
1113 | np.log(x) # ln
1114 | np.log2(x) # log base 2
1115 | np.log10(x) # log base 10
1116 | 
# More precise than exp(x) - 1 and log(1 + x) when x is very small
1118 | np.expm1(x) # exp(x) - 1
1119 | np.log1p(x) # log(1 + x)
1120 | ```
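
Here's why `expm1` and `log1p` exist: for a tiny `x`, computing `1 + x` in floating point rounds straight back to `1.0`, so the naive formulas lose the value entirely. A sketch with `x = 1e-20` (an arbitrary tiny value):

```python
import numpy as np

x = 1e-20

naive_log = np.log(1 + x)    # 1 + 1e-20 rounds to 1.0, so this is 0.0
precise_log = np.log1p(x)    # keeps the tiny value, about 1e-20

naive_exp = np.exp(x) - 1    # exp(1e-20) rounds to 1.0, so this is 0.0
precise_exp = np.expm1(x)    # about 1e-20
```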
1121 | 
1122 | #### **Reciprocal**
1123 | 
1124 | ```python
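np.reciprocal(np.array(x, dtype=float)) # Basically power -1: array([1., 0.5, 0.33333333])
# Careful! Integer input gives integer results, so np.reciprocal([1, 2, 3]) is array([1, 0, 0])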
1126 | ```
1127 | 
1128 | #### **Return Range of Values**
1129 | 
1130 | ```python
1131 | a = np.array([1, 2, 3, 4])
1132 | 
1133 | np.ptp(a) # 3 (Maximum - Minimum)
1134 | ```
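
`np.ptp` ("peak to peak") is just the maximum minus the minimum, and like the other aggregations it accepts an `axis` argument. A small sketch on a made-up 2D array:

```python
import numpy as np

grid = np.array([[1, 5, 3],
                 [4, 2, 9]])

overall = np.ptp(grid)         # 8 (9 - 1 over the whole array)
cols = np.ptp(grid, axis=0)    # array([3, 3, 6]), one range per column
rows = np.ptp(grid, axis=1)    # array([4, 7]), one range per row
```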
1135 | 
1136 | #### **Standard Deviation and Variance**
1137 | 
1138 | ```python
1139 | np.std(x) # Standard Deviation
1140 | np.var(x) # Variance
1141 | ```
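
One detail worth knowing: by default `np.std` and `np.var` divide by N (the population formula). Passing `ddof=1` divides by N - 1 instead, which gives the sample estimate. A quick sketch with `[1, 2, 3, 4]`:

```python
import numpy as np

x = np.array([1, 2, 3, 4])  # mean is 2.5, squared deviations add up to 5.0

pop_var = np.var(x)             # 5.0 / 4 = 1.25
sample_var = np.var(x, ddof=1)  # 5.0 / 3 = 1.666...
sample_std = np.std(x, ddof=1)  # sqrt(5.0 / 3)
```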
1142 | 
There's a lot more! For more specialized functions (gamma, error functions, Bessel functions and so on), go look at the `scipy.special` package!
1144 | 
1145 | 
1146 | 
1147 | ## 3. Going Deeper With Arrays <a name="3"></a>
1148 | 
1149 | ### 3.1 Broadcasting <a name="3.1"></a>
1150 | [go to top](#top)
1151 | 
1152 | 
![array](assets/array.jpg)
1154 | 
1155 | Image source: https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
1156 | 
Broadcasting lets NumPy combine arrays of different shapes: it pads or 'stretches' the smaller array (virtually, without copying any data) so that it can operate with the larger one!
1158 | 
1159 | > Broadcasting is possible if the following rules are satisfied:
1160 | >
1161 | > - Array with smaller **ndim** than the other is prepended with '1' in its shape.
1162 | > - Size in each dimension of the output shape is maximum of the input sizes in that dimension.
1163 | > - An input can be used in calculation, if its size in a particular dimension matches the output size or its value is exactly 1.
1164 | > - If an input has a dimension size of 1, the first data entry in that dimension is used for all calculations along that dimension.
1165 | >
1166 | > A set of arrays is said to be **broadcastable** if the above rules produce a valid result and one of the following is true:
1167 | >
1168 | > - Arrays have exactly the same shape.
1169 | > - Arrays have the same number of dimensions and the length of each dimension is either a common length or 1.
1170 | > - Array having too few dimensions can have its shape prepended with a dimension of length 1, so that the above stated property is true.
1171 | >
1172 | > https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
1173 | 
1174 | **Example**
1175 | 
1176 | This is the example in the picture above!
1177 | 
![array](assets/array.jpg)
1179 | 
1180 | ```python
1181 | a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]]) 
1182 | b = np.array([1.0,2.0,3.0]) 
1183 | 
1184 | a
# Out: array([[ 0.,  0.,  0.],
#             [10., 10., 10.],
#             [20., 20., 20.],
#             [30., 30., 30.]])

b
# Out: array([1., 2., 3.])

a + b
# Out: array([[ 1.,  2.,  3.],
#             [11., 12., 13.],
#             [21., 22., 23.],
#             [31., 32., 33.]])
1198 | ```
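
To see the rules in action, a common trick is to turn a 1D array into a column with `np.newaxis` so that it broadcasts against a row, then check the resulting shape. The arrays below are made-up examples; the second case shows the error you get when the trailing dimensions differ and neither is 1:

```python
import numpy as np

col = np.arange(4)[:, np.newaxis]   # shape (4, 1)
row = np.array([1.0, 2.0, 3.0])     # shape (3,), treated as (1, 3)

grid = col + row                    # (4, 1) with (1, 3) -> (4, 3)
print(grid.shape)                   # (4, 3)

a = np.zeros((4, 3))
b = np.zeros(4)                     # shape (4,), treated as (1, 4): 3 vs 4 clash
try:
    a + b
    clash = False
except ValueError:
    clash = True
print(clash)                        # True
```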
1199 | 
1200 | **Uses**
1201 | 
1202 | Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
1203 | 
1204 | **Centering An Array**
1205 | 
1206 | ```python
X = np.random.random((10, 3))
Xmean = X.mean(axis=0) # Mean of each column, shape (3,)

X_centered = X - Xmean # Shape (10, 3) minus shape (3,): the means are broadcast across every row
1211 | ```
1212 | 
1213 | **Plotting a Two-Dimensional Array**
1214 | 
1215 | ```python
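import matplotlib.pyplot as plt

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)                 # shape (50,)
y = np.linspace(0, 5, 50)[:, np.newaxis]  # shape (50, 1), a column vector

# Broadcasting (50,) with (50, 1) gives a (50, 50) grid of z values
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5], cmap='viridis')
plt.colorbar()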
1221 | ```
1222 | 
1223 | 
1224 | 
1225 | ### 3.2 Vectorize <a name="3.2"></a>
1226 | [go to top](#top)
1227 | 
1228 | 
You'd have noticed that all of the functions above act on every element in the array without needing any for-loops!

You can get this ability for ANY function you write by using `np.vectorize`! Just keep in mind that it's there for convenience, not speed: under the hood it's essentially a Python for-loop.
1232 | 
1233 | **Vectorize**
1234 | 
1235 | ```python
1236 | def my_add_n(a, n):
1237 |     return a + n
1238 | 
1239 | vfunc = np.vectorize(my_add_n)
1240 | 
1241 | vfunc([0, 2, 4], 2) # array([2, 4, 6])
1242 | 
1243 | # You may specify the output type explicitly as well
1244 | # Note: Down-casting will occur if you stated int but inputted floats!
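vfunc_float = np.vectorize(my_add_n, otypes=[float]) # Use float or np.float64 (the old np.float alias was removed in NumPy 1.24)
vfunc_float([0, 2, 4], 2) # array([2., 4., 6.])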
1246 | ```
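
To make the note about `otypes` concrete: the declared output type wins, so asking for `int` while feeding in floats truncates the results. A small sketch (`my_add_n` is redefined here so the snippet runs on its own; `vfunc_int` is just an illustrative name):

```python
import numpy as np

def my_add_n(a, n):
    return a + n

vfunc_int = np.vectorize(my_add_n, otypes=[int])
vfunc_float = np.vectorize(my_add_n, otypes=[float])

truncated = vfunc_int([1.5, 2.7], 1)   # 2.5 and 3.7 are cut down to array([2, 3])
kept = vfunc_float([1.5, 2.7], 1)      # array([2.5, 3.7])
```
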
1247 | **Excluding Parameters**
1248 | 
1249 | ```python
1250 | # You may also declare parameters that shouldn't be vectorized!
1251 | # Source: https://docs.scipy.org/doc/numpy-1.9.2/reference/generated/numpy.vectorize.html
1252 | def mypolyval(p, x):
1253 |      _p = list(p)
1254 |      res = _p.pop(0)
1255 |      while _p:
1256 |          res = res*x + _p.pop(0)
1257 |      return res
1258 | 
1259 | vpolyval = np.vectorize(mypolyval, excluded=['p'])
1260 | 
1261 | # Think of this like x^2 + 2x + 3, then feed in x = 0, x = 1 successively
1262 | vpolyval(p=[1, 2, 3], x=[0, 1]) # array([3, 6])
1263 | 
# Or exclude a positional argument by its index (here 0, i.e. p) after creating the function
1265 | vpolyval.excluded.add(0)
1266 | vpolyval([1, 2, 3], x=[0, 1]) # array([3, 6])
1267 | ```
1268 | 
1269 | 
1270 | 
1271 | ### 3.3 Iterating Through Axes <a name="3.3"></a>
1272 | [go to top](#top)
1273 | 
1274 | 
To apply a function to each 1-D slice of an array along an axis, you could use a for loop, or you could use `np.apply_along_axis`!
1276 | 
1277 | ```python
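array_to_parse = np.array([[1, 5, 3],
                           [4, 2, 6]])

def state_max(x):
    return np.max(x) # x is one 1-D slice taken along the chosen axis

np.apply_along_axis(state_max, axis=0, arr=array_to_parse) # array([4, 5, 6]), one max per column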
1282 | ```
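
To see what `np.apply_along_axis` saves you, here's the for-loop version it replaces. The 2x3 `array_to_parse` is a made-up example; with `axis=0` the function receives each column in turn, and with `axis=1` each row:

```python
import numpy as np

array_to_parse = np.array([[1, 5, 3],
                           [4, 2, 6]])

def state_max(x):
    return np.max(x)

# The for-loop way: visit each column (a 1-D slice along axis 0) by hand
loop_result = np.array([state_max(array_to_parse[:, j])
                        for j in range(array_to_parse.shape[1])])

# The apply_along_axis way
applied = np.apply_along_axis(state_max, axis=0, arr=array_to_parse)  # array([4, 5, 6])
per_row = np.apply_along_axis(state_max, axis=1, arr=array_to_parse)  # array([5, 6])
```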
1283 | 
1284 | 
1285 | 
1286 | ### 3.4 Modifying Output Directly <a name="3.4"></a>
1287 | [go to top](#top)
1288 | 
1289 | 
Ok. So now you've noticed that most of the functions above are vectorized functions. NumPy calls them UFuncs (universal functions).

Here are some nifty things you can do with them!

For example, if you're dealing with a huge array, you can have a ufunc write its result straight into an existing array via the `out` argument, instead of allocating a new one:
1295 | 
1296 | ```python
1297 | a = np.arange(5)
1298 | b = np.empty(5)
1299 | 
1300 | # Less efficient
b = np.multiply(a, 10) # This creates a new temporary array, then rebinds the name b to it

# More efficient
np.multiply(a, 10, out=b) # This writes into b directly! This also works for array views!
1305 | ```
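
The `out` argument also accepts a view, which lets you write results into, say, every other slot of an existing array without an intermediate copy. A sketch (the arrays are made-up examples):

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)

# y[::2] is a view of every other element of y; the powers of 2 land directly in it
np.power(2, x, out=y[::2])
# y is now [1, 0, 2, 0, 4, 0, 8, 0, 16, 0] (as floats)
```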
1306 | 
1307 | 
1308 | 
1309 | ### 3.5 Locating Elements <a name="3.5"></a>
1310 | [go to top](#top)
1311 | 
1312 | 
1313 | #### **Where**
1314 | 
1315 | ```python
1316 | a = np.array([1, 2, 3, 4, 5])
1317 | 
1318 | b = np.where(a > 3) # (array([3, 4]),)  It's the locations of the satisfied conditions!
1319 | ```
1320 | 
1321 | #### **Take**
1322 | 
1323 | ```python
1324 | a.take(b) # array([[4, 5]])
1325 | ```
1326 | 
1327 | **Where Cases**
1328 | 
1329 | ```python
1330 | a = np.array([1, 2, 3, 4, 5])
1331 | 
1332 | b = np.where(a > 3, "NO", "YES") # array(['YES', 'YES', 'YES', 'NO', 'NO'], dtype='<U3')
1333 | ```
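
A handy use of three-argument `np.where` is replacing values conditionally while keeping the rest. For example, clipping negative numbers to zero (the array is a made-up example):

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5])

# Where a < 0 take 0, otherwise keep the original value from a
clipped = np.where(a < 0, 0, a)  # array([3, 0, 4, 0, 5])
```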
1334 | 
1335 | **Locate Maximum and Minimum Indices**
1336 | 
1337 | ```python
1338 | a = np.array([10, 20, 30, 40, 50])
1339 | 
1340 | # Equivalent
1341 | a.argmax() # 4
1342 | np.argmax(a) # 4
1343 | 
1344 | # Equivalent
1345 | a.argmin() # 0
1346 | np.argmin(a) # 0
1347 | ```
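
`argmax` and `argmin` also take an `axis`, returning one index per slice. Without an axis they index into the flattened array. A sketch with a made-up 2x3 array:

```python
import numpy as np

m = np.array([[1, 9, 3],
              [7, 2, 8]])

flat_idx = np.argmax(m)          # 1, the position of 9 in the flattened array
per_col = np.argmax(m, axis=0)   # array([1, 0, 1]): row holding each column's max
per_row = np.argmin(m, axis=1)   # array([0, 1]): column holding each row's min
```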
1348 | 
1349 | 
1350 | 
1351 | ### 3.6 Aggregations <a name="3.6"></a>
1352 | [go to top](#top)
1353 | 
1354 | 
1355 | #### **Reduce**
1356 | 
1357 | ```python
1358 | x = np.array([1, 2, 3, 4])
1359 | 
1360 | np.add.reduce(x) # 10 (which is 1 + 2 + 3 + 4)
1361 | np.multiply.reduce(x) # 24 (which is 1 * 2 * 3 * 4)
1362 | ```
1363 | 
1364 | #### **Accumulate**
1365 | 
1366 | Reduce, but show each step of the way!
1367 | 
1368 | ```python
1369 | x = np.array([1, 2, 3, 4])
1370 | 
1371 | np.add.accumulate(x) # array([1, 3, 6, 10])
1372 | np.multiply.accumulate(x) # array([1, 2, 6, 24])
1373 | ```
1374 | 
1375 | #### **Cumsum**
1376 | 
1377 | Cumulative sum
1378 | 
1379 | ```python
1380 | x = np.array([1, 2, 3, 4])
1381 | 
1382 | # Equivalent
np.cumsum(x) # array([1, 3, 6, 10])
x.cumsum()
np.add.accumulate(x)
1386 | ```
1387 | 
1388 | **Outer Product**
1389 | 
The outer product of two vectors u and v is the matrix whose entry in row i, column j is `u[i] * v[j]`, i.e. the matrix product of u (as a column) with v (as a row)!
1391 | 
1392 | ![outer_product](assets/583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg)
1393 | 
1394 | Image source: https://en.wikipedia.org/wiki/Outer_product
1395 | 
1396 | ```python
1397 | x = np.array([1, 2, 3, 4])
1398 | 
1399 | np.multiply.outer(x, x)
1400 | # Out: array([[ 1,  2,  3,  4],
1401 | #             [ 2,  4,  6,  8],
1402 | #             [ 3,  6,  9, 12],
1403 | #             [ 4,  8, 12, 16]])
1404 | ```
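
Since the outer product is just every pairing `u[i] * v[j]`, you can also build it with broadcasting, or with `np.outer`. A short check that all three agree (the vectors are made-up, and deliberately of different lengths):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([10, 20])

via_ufunc = np.multiply.outer(u, v)       # shape (3, 2)
via_broadcast = u[:, np.newaxis] * v      # (3, 1) times (2,) -> (3, 2)
via_outer = np.outer(u, v)

# All three give:
# array([[10, 20],
#        [20, 40],
#        [30, 60]])
```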
1405 | 
1406 | #### **Sum**
1407 | 
1408 | ```python
1409 | np.sum(np.array([1, 2, 3, 4])) # 10
1410 | 
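# Beware! Ragged nested lists become an array of Python list objects (NumPy 1.24+ raises
# a ValueError unless you pass dtype=object), and "summing" lists concatenates them!
np.sum(np.array([[1, 2, 3, 4], [1, 2]], dtype=object)) # [1, 2, 3, 4, 1, 2]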
1413 | ```
1414 | 
1415 | **Min and Max**
1416 | 
1417 | ```python
1418 | np.min(x) # Gives smallest element in array
1419 | np.max(x) # Gives largest element in array
1420 | 
1421 | # You can specify the axis!
1422 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
1423 |                   [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
1424 | 
1425 | np.min(array, axis=0)
1426 | # Out: array([[1, 2, 3],
1427 | #             [4, 5, 6],
1428 | #             [7, 8, 9]])
1429 |         
1430 | np.min(array, axis=1)
1431 | # Out: array([[ 1,  2,  3],
1432 | #             [10, 11, 12]])
1433 | 
1434 | np.min(array, axis=2)
1435 | # Out: array([[ 1,  4,  7],
1436 | #             [10, 13, 16]])
1437 | 
1438 | # Same applies to max
1439 | ```
1440 | 
1441 | #### **Mean**
1442 | 
1443 | ```python
1444 | np.mean(x)
1445 | ```
1446 | 
1447 | #### **Full List**
1448 | 
1449 | | Function Name   | NaN-safe Version   | Description                               |
1450 | | --------------- | ------------------ | ----------------------------------------- |
1451 | | `np.sum`        | `np.nansum`        | Compute sum of elements                   |
1452 | | `np.prod`       | `np.nanprod`       | Compute product of elements               |
1453 | | `np.mean`       | `np.nanmean`       | Compute mean of elements                  |
1454 | | `np.std`        | `np.nanstd`        | Compute standard deviation                |
1455 | | `np.var`        | `np.nanvar`        | Compute variance                          |
1456 | | `np.min`        | `np.nanmin`        | Find minimum value                        |
1457 | | `np.max`        | `np.nanmax`        | Find maximum value                        |
1458 | | `np.argmin`     | `np.nanargmin`     | Find index of minimum value               |
1459 | | `np.argmax`     | `np.nanargmax`     | Find index of maximum value               |
1460 | | `np.median`     | `np.nanmedian`     | Compute median of elements                |
1461 | | `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
1462 | | `np.any`        | N/A                | Evaluate whether any elements are true    |
1463 | | `np.all`        | N/A                | Evaluate whether all elements are true    |
1464 | 
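The NaN-safe versions matter because a single `np.nan` poisons the regular aggregations. A quick sketch with a made-up array:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

plain_sum = np.sum(data)      # nan
safe_sum = np.nansum(data)    # 4.0 (the NaN is skipped)
safe_mean = np.nanmean(data)  # 2.0 (mean of the two real values)
```
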
**Note:** Most of these (but not `median`, `percentile`, or the NaN-safe versions) are also methods you can call on the array itself!
1466 | 
1467 | ```python
1468 | x = np.array([1, 2, 3, 4])
1469 | 
1470 | # Equivalent!
1471 | np.sum(x)
1472 | x.sum()
1473 | 
1474 | # It even works for the arguments!
high_dim_array = np.arange(24).reshape(2, 3, 4)
high_dim_array.sum(axis=2) # array([[ 6, 22, 38], [54, 70, 86]]) And so on!
1476 | ```
1477 | 
1478 | 
1479 | 
1480 | ### 3.7 Comparisons <a name="3.7"></a>
1481 | [go to top](#top)
1482 | 
1483 | 
1484 | #### **Boolean Comparisons**
1485 | 
1486 | ```python
1487 | a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
1488 | 
1489 | b = a > 4 
1490 | # Out: array([False, False, False, False, False,  True,  True,  True,  True, True])
1491 | 
1492 | # This works for all the conditional operators!
1493 | # ==
1494 | # !=
1495 | # > , >=
1496 | # < , <=
1497 | ```
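
Because `True` counts as 1 and `False` as 0, these Boolean arrays can be summed to count matches, or used directly as masks. A sketch on the same `a`:

```python
import numpy as np

a = np.arange(10)  # array([0, 1, ..., 9])

count = np.sum(a > 4)          # 5 elements are greater than 4
picked = a[a > 4]              # array([5, 6, 7, 8, 9])
both = a[(a > 2) & (a < 6)]    # array([3, 4, 5]); combine with &, |, ~ (the parentheses are required!)
```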
1498 | 
1499 | #### **Maximum and Minimum**
1500 | 
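Note: These are **not** `np.max` and `np.min`! Those reduce a single array to one value, while `np.maximum` and `np.minimum` compare two arrays element by element.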
1502 | 
1503 | ```python
1504 | a = np.array([1, 2, 3, 4, 5])
1505 | b = np.array([5, 4, 3, 2, 1])
1506 | 
1507 | np.maximum(a, b) # array([5, 4, 3, 4, 5])
1508 | np.minimum(a, b) # array([1, 2, 3, 2, 1])
1509 | ```
1510 | 
1511 | #### **Any and All**
1512 | 
You can use `np.any` and `np.all` too!

```python
x = np.array([1, 2, 3, 4, 5, 6])

np.any(x > 5) # True, since 6 > 5
np.all(x < 0) # False
1518 | ```
1519 | 
1520 | 
1521 | 
1522 | ### 3.8 Sorting Arrays <a name="3.8"></a>
1523 | [go to top](#top)
1524 | 
1525 | 
By default `np.sort` uses quicksort (`kind='quicksort'`), though `'mergesort'`, `'heapsort'` and `'stable'` are also options via the `kind` argument.
1527 | 
1528 | #### **Sort**
1529 | 
1530 | ```python
x = np.array([2, 1, 4, 3, 5])

# Return indices of sorted elements instead (do this before x.sort() mutates x!)
np.argsort(x) # array([1, 0, 3, 2, 4])

# Does not mutate x
np.sort(x) # array([1, 2, 3, 4, 5])

# Mutates x (sorts in place and returns None)
x.sort() # x is now array([1, 2, 3, 4, 5])
1541 | ```
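
There's no `reverse=` flag on `np.sort`, but reversing the sorted result with `[::-1]` gives descending order. A sketch with the same values:

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

descending = np.sort(x)[::-1]    # array([5, 4, 3, 2, 1])
order = np.argsort(x)[::-1]      # indices from largest to smallest: array([4, 2, 3, 0, 1])
```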
1542 | 
1543 | #### **Sort Along Axes**
1544 | 
1545 | ```python
1546 | array = np.array([[[9, 2, 1], [4, 2, 6], [17, 8, 9]],
1547 |                   [[190, 11, 12], [13, 14, 115], [16, 17, 18]]])
1548 | 
1549 | np.sort(array, axis=0)
1550 | # Out: array([[[  9,   2,   1],
1551 | #              [  4,   2,   6],
1552 | #              [ 16,   8,   9]],
1553 | #
1554 | #             [[190,  11,  12],
1555 | #              [ 13,  14, 115],
1556 | #              [ 17,  17,  18]]])
1557 | 
1558 | np.sort(array, axis = 1)
1559 | # Out: array([[[  4,   2,   1],
1560 | #              [  9,   2,   6],
1561 | #              [ 17,   8,   9]],
1562 | #
1563 | #             [[ 13,  11,  12],
1564 | #              [ 16,  14,  18],
1565 | #              [190,  17, 115]]])
1566 | ```
1567 | 
1568 | #### **Partial Sorts**
1569 | 
1570 | ```python
1571 | x = np.array([7, 2, 3, 1, 6, 5, 4])
1572 | 
# The value a full sort would put at index 3 (here 4) lands at index 3, with the
# 3 smallest values before it and the rest after it, each side in arbitrary order
# This also works along multiple axes like in the previous example
np.partition(x, 3, axis=0) # e.g. array([2, 1, 3, 4, 6, 5, 7])
1576 | ```
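
Because the order within each side of a partition isn't guaranteed, it's safer to check the *property* than an exact output. A sketch doing exactly that for the array above:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

p = np.partition(x, k)

# p[k] is what a full sort would put at index k (here 4)...
assert p[k] == np.sort(x)[k]
# ...everything before it is no bigger, everything after it is no smaller
assert np.all(p[:k] <= p[k]) and np.all(p[k + 1:] >= p[k])
# ...so the first k values are the k smallest, in some order
assert sorted(p[:k].tolist()) == [1, 2, 3]
```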
1577 | 
1578 | 
1579 | 
1580 | ### 3.9 Fancy Indexing <a name="3.9"></a>
1581 | [go to top](#top)
1582 | 
1583 | 
We know how to index, slice, and apply Boolean masks (conditional indexing). But we can pass arrays of indices too!
1585 | 
1586 | ```python
1587 | x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
1588 | 
1589 | [x[3], x[4], x[8]] # [4, 5, 9]
1590 | 
1591 | ind = [3, 4, 8]
1592 | x[ind] # array([4, 5, 9])
1593 | 
# This is particularly useful because the result takes the SHAPE of the index array, so you can RESHAPE as you select!
1595 | ind = np.array([[3, 4], [8, 0]])
1596 | x[ind]
1597 | # array([[4, 5],
1598 | #        [9, 1]])
1599 | 
1600 | # You can also do it in multiple dimensions
1601 | x = np.array([[1, 2],
1602 |               [3, 4]])
1603 | 
row = np.array([0, 1]) # Row indices: pick [1, 2] or [3, 4]
col = np.array([0, 1]) # Column indices, paired element-wise with row: (0, 0) and (1, 1)
1606 | 
1607 | x[row, col] # array([1, 4])
1608 | 
1609 | # Also works with broadcasting
1610 | x[row[:, np.newaxis], col]
1611 | # Out: array([[1, 2],
1612 | #             [3, 4]])
1613 | ```
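
Two things worth knowing about fancy indexing: it returns a *copy* (unlike slicing, which returns a view), and you can also use it on the left-hand side to assign to several positions at once. A sketch with made-up values:

```python
import numpy as np

x = np.arange(10)
ind = [2, 5, 8]

picked = x[ind]      # A copy: changing it does not touch x
picked[0] = 100

x[ind] = 99          # Assigning through fancy indexing DOES modify x
# x is now array([ 0,  1, 99,  3,  4, 99,  6,  7, 99,  9])
```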
1614 | 
1615 | #### **Combined Indexing**
1616 | 
1617 | Combine fancy indexing with normal indexing!
1618 | 
1619 | ```python
1620 | x = np.arange(12).reshape(3, 4)
1621 | # Out: array([[ 0,  1,  2,  3],
1622 | #             [ 4,  5,  6,  7],
1623 | #             [ 8,  9, 10, 11]])
1624 | 
1625 | x[2, [2, 0, 1]] # array([10, 8, 9])
1626 | x[1:, [2, 0, 1]]
# Out: array([[ 6,  4,  5],
#             [10,  8,  9]])
1629 | ```
1630 | 
1631 | 
1632 | 
1633 | ### 3.10 Structured Arrays <a name="3.10"></a>
1634 | [go to top](#top)
1635 | 
1636 | 
1637 | Arrays of mixed type!
1638 | 
1639 | Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html
1640 | 
1641 | ```python
1642 | data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
1643 |                           'formats':('U10', 'i4', 'f8')})
1644 | 
1645 | data.dtype # [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
1646 | 
1647 | name = ['Alice', 'Bob', 'Cathy', 'Doug']
1648 | age = [25, 45, 37, 19]
1649 | weight = [55.0, 85.5, 68.0, 61.5]
1650 | 
1651 | data['name'] = name
1652 | data['age'] = age
1653 | data['weight'] = weight
1654 | 
1655 | # Data is now array([('Alice', 25, 55.0), ('Bob', 45, 85.5), ('Cathy', 37, 68.0),
1656 | # ('Doug', 19, 61.5)])
1657 | 
1658 | # Now you can index into it or index using the name!
1659 | data['name'] # ['Alice', 'Bob', 'Cathy', 'Doug']
1660 | data[-1]['name'] # Doug
1661 | 
1662 | # And it's nice with masks
1663 | # Get names where age is under 30
1664 | data[data['age'] < 30]['name'] # array(['Alice', 'Doug'], dtype='<U10')
1665 | ```
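
Structured arrays also sort nicely: `np.sort` takes an `order` argument naming the field to sort by. A sketch that rebuilds the same `data` array so it runs on its own:

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

by_age = np.sort(data, order='age')
by_age['name']  # array(['Doug', 'Alice', 'Cathy', 'Bob'], dtype='<U10')
```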
1666 | 
1667 | 
1668 | 
1669 | ## 4. Matrices <a name="4"></a>
1670 | 
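Matrices (`np.matrix`) are a special ndarray subclass that is strictly 2 dimensional!

Heads up: the NumPy documentation no longer recommends `np.matrix` (or `numpy.matlib`), even for linear algebra. Regular 2D arrays with the `@` operator do the same job.

You create them the same way as regular arrays, just with the `numpy.matlib` versions of the constructors.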
1674 | 
1675 | ```python
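import numpy.matlib as matlib

matlib.empty((2, 2)) # 2x2 matrix with uninitialized (arbitrary) values
matlib.zeros((2, 2)) # 2x2 matrix of zeros
matlib.ones((2, 2)) # 2x2 matrix of ones
matlib.eye(3) # 3x3 matrix with ones on the diagonal
matlib.identity(3) # 3x3 identity matrix
matlib.rand(2, 2) # 2x2 matrix of random values in [0, 1)

# You can even convert an existing array
some_numpy_array = np.array([[1, 2], [3, 4]])
m = np.asmatrix(some_numpy_array)

# Useful methods
m.diagonal() # Get the diagonal entries (here 1 and 4)

# You can sort them, and do general ndarray stuff with them as well!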
1692 | ```
1693 | 
1694 | ### 4.1 Linear Algebra Functions <a name="4.1"></a>
1695 | [go to top](#top)
1696 | 
1697 | 
1698 | ```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
u = np.array([1, 2])
v = np.array([3, 4])

np.dot(A, B) # Get dot product of two arrays (for 2D arrays, this is matrix multiplication)
np.vdot(u, v) # Get dot product of two vectors: 11

np.inner(u, v) # Get inner product of two arrays: 11
np.matmul(A, B) # Matrix multiplication

np.linalg.det(A) # Determinant: about -2.0
np.linalg.inv(A) # Find Inverse matrix

np.linalg.solve(A, u) # Solve the system of linear equations A @ x = u for x
1709 | ```
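
A good habit with `np.linalg.solve` is to plug the answer back in and check it. A sketch for a made-up 2x2 system (x + 2y = 1 and 3x + 4y = 2):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 2.0])

sol = np.linalg.solve(A, b)   # approximately array([0. , 0.5])

# A @ sol should reproduce b, up to floating-point rounding
print(np.allclose(A @ sol, b))                  # True

# Same answer as multiplying by the inverse; solve just avoids forming the inverse explicitly
print(np.allclose(sol, np.linalg.inv(A) @ b))   # True
```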
1710 | 
**Special Note: Dot Product and Multiply**

There are shorthand operators for these! Careful though: on regular arrays, `*` is **not** the dot product.

```python
# Suppose we have two 2D arrays A and B

# np.multiply(A, B)
A * B # Element-wise multiplication (only np.matrix objects treat * as matrix multiplication!)

# np.matmul(A, B)
A @ B # Matrix multiplication
1723 | ```
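
To see the difference between `*` and `@` on plain arrays, here's a tiny made-up pair of 2x2 arrays. `np.dot` agrees with `@` here because both inputs are 2D:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B    # array([[ 5, 12], [21, 32]])
matrix_prod = A @ B    # array([[19, 22], [43, 50]])
dotted = np.dot(A, B)  # same as A @ B for 2D arrays
```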
1724 | 
1725 | 
1726 | 
1727 | ## 5. Numpy I/O <a name="5"></a>
1728 | 
1729 | ### 5.1 Import from CSV <a name="5.1"></a>
1730 | [go to top](#top)
1731 | 
1732 | 
1733 | https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.genfromtxt.html
1734 | 
1735 | ```python
1736 | path = 'path_to_csv'
1737 | data = np.genfromtxt(path,
1738 |                      delimiter=',',
1739 |                      skip_header=1, # Number of lines to skip at beginning
1740 |                      filling_values=-999, # Value to use when data is missing
1741 |                      dtype='float')
1742 | 
# If you set dtype=None, NumPy infers a type for each column and returns a structured array,
# where each row is a record that prints like a Python tuple, e.g.
# (18., 8,  307., 130, 3504,  12. , 70, 1, b'"some_string_stuff"')
1745 | ```
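
To try `np.genfromtxt` without a file on disk, you can hand it a file-like object such as Python's `io.StringIO`. A sketch with a made-up three-line CSV where one value is missing:

```python
import io
import numpy as np

csv_text = io.StringIO("a,b,c\n1,2,3\n4,,6\n")

data = np.genfromtxt(csv_text,
                     delimiter=',',
                     skip_header=1,        # skip the "a,b,c" header line
                     filling_values=-999,  # the empty field becomes -999
                     dtype='float')
# array([[   1.,    2.,    3.],
#        [   4., -999.,    6.]])
```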
1746 | 
1747 | 
1748 | 
1749 | ### 5.2 Saving and Loading <a name="5.2"></a>
1750 | [go to top](#top)
1751 | 
1752 | 
1753 | ```python
# Example arrays to save
array = np.arange(5)
array_a = np.zeros(3)
array_b = np.ones(3)

# Save One Array
np.save('data.npy', array)

# Save Multiple Arrays
np.savez('data_mult.npz', a=array_a, b=array_b)

# Load
single = np.load('data.npy')
mult = np.load('data_mult.npz') # Use the same filename you saved to!

a = mult['a']
b = mult['b']
1766 | ```
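
A quick round trip makes a good sanity check. This sketch writes into a throwaway temporary directory (via Python's `tempfile`) so it doesn't leave files behind; the arrays are made-up examples:

```python
import os
import tempfile
import numpy as np

array_a = np.arange(6).reshape(2, 3)
array_b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, 'data.npy')
    npz_path = os.path.join(tmp, 'data_mult.npz')

    np.save(npy_path, array_a)
    np.savez(npz_path, a=array_a, b=array_b)

    single = np.load(npy_path)
    # An .npz file is read lazily, so pull out what you need inside a with block
    with np.load(npz_path) as mult:
        loaded_a = mult['a']
        loaded_b = mult['b']
        names = sorted(mult.files)  # ['a', 'b']
```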
1767 | 
1768 | **Save and Load as txt**
1769 | 
1770 | ```python
1771 | np.savetxt('out.txt', array)
1772 | 
1773 | np.loadtxt('out.txt')
1774 | ```
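
`np.savetxt` writes floats in scientific notation by default (`fmt='%.18e'`). The `fmt` and `delimiter` arguments let you write something friendlier, like an integer CSV; `np.loadtxt` then needs the same delimiter to read it back. A sketch using a temporary directory and a made-up array:

```python
import os
import tempfile
import numpy as np

grid = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.csv')
    np.savetxt(path, grid, fmt='%d', delimiter=',')  # lines "0,1,2" and "3,4,5"

    with open(path) as f:
        text = f.read()

    loaded = np.loadtxt(path, delimiter=',')  # comes back as floats by default
```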
1775 | 
1776 | 
1777 | 
1778 | ```
1779 |                             .     .
1780 |                          .  |\-^-/|  .    
1781 |                         /| } O.=.O { |\
1782 | ```
1783 | 
1785 | 
1786 | ------
1787 | 
1788 |  [![Yeah! Buy the DRAGON a COFFEE!](../assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)
1789 | 


--------------------------------------------------------------------------------