├── .gitignore
├── Numpy
    ├── assets
    │   ├── array.jpg
    │   ├── kZNzz.png
    │   ├── vceRQ.png
    │   ├── Matrix.svg.png
    │   ├── elsp_0105.png
    │   ├── array_vs_list.png
    │   └── 583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg
    └── 01 Numpy Basics.md
├── Pandas
    ├── assets
    │   ├── hMKKt.jpg
    │   ├── structure_table.jpg
    │   ├── structure_table-1557216961120.jpg
    │   └── series-and-dataframe.width-1200.png
    └── 01 Pandas Basics.md
├── assets
    └── COFFEE BUTTON ヾ(°∇°^).png
├── README.md
└── LICENSE


/.gitignore:
--------------------------------------------------------------------------------

*.no_toc

--------------------------------------------------------------------------------
/Numpy/assets/array.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array.jpg
--------------------------------------------------------------------------------
/Numpy/assets/kZNzz.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/kZNzz.png
--------------------------------------------------------------------------------
/Numpy/assets/vceRQ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/vceRQ.png
--------------------------------------------------------------------------------
/Pandas/assets/hMKKt.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/hMKKt.jpg
--------------------------------------------------------------------------------
/Numpy/assets/Matrix.svg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/Matrix.svg.png
--------------------------------------------------------------------------------
/Numpy/assets/elsp_0105.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/elsp_0105.png
--------------------------------------------------------------------------------
/Numpy/assets/array_vs_list.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array_vs_list.png
--------------------------------------------------------------------------------
/assets/COFFEE BUTTON ヾ(°∇°^).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/assets/COFFEE BUTTON ヾ(°∇°^).png
--------------------------------------------------------------------------------
/Pandas/assets/structure_table.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table.jpg
--------------------------------------------------------------------------------
/Pandas/assets/structure_table-1557216961120.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table-1557216961120.jpg
--------------------------------------------------------------------------------
/Pandas/assets/series-and-dataframe.width-1200.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/series-and-dataframe.width-1200.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# python-data-tools-reference
A reference of frameworks and tools for data and ML in Python

A one-stop collection of code references and snippets for some of the most widely used tools and frameworks for data manipulation and ML in Python.

I'm building these as I'm learning or using them, so they won't be comprehensive, but maybe they'll be of use!
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 methylDragon

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/Numpy/assets/583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg:
--------------------------------------------------------------------------------
$$
\mathbf{u} \otimes \mathbf{v} = \mathbf{u}\mathbf{v}^{\top} =
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} =
\begin{bmatrix}
u_1 v_1 & u_1 v_2 & u_1 v_3 \\
u_2 v_1 & u_2 v_2 & u_2 v_3 \\
u_3 v_1 & u_3 v_2 & u_3 v_3 \\
u_4 v_1 & u_4 v_2 & u_4 v_3
\end{bmatrix}.
$$
--------------------------------------------------------------------------------
/Pandas/01 Pandas Basics.md:
--------------------------------------------------------------------------------
# Pandas Basics

Author: methylDragon
Contains a syntax reference and code snippets for Pandas!
It's a collection of code snippets and tutorials from everywhere all mashed together!

------

## Pre-Requisites

### Required

- Python knowledge, this isn't a tutorial!
- Pandas installed
  - I'll assume you've already run these lines as well

```python
import numpy as np
import pandas as pd
```



## Table Of Contents

1. [Introduction](#1)
2. [Pandas Basics](#2)
   2.1 [Data Types](#2.1)
   2.2 [Series Basics](#2.2)
   2.3 [DataFrame Basics](#2.3)
   2.4 [Panel Basics](#2.4)
   2.5 [Categorical Data](#2.5)
   2.6 [Basic Binary Operations](#2.6)
   2.7 [Casting and Conversion](#2.7)
   2.8 [Conditional Indexing](#2.8)
   2.9 [IO](#2.9)
   2.10 [Plotting](#2.10)
   2.11 [Sparse Data](#2.11)
3. [Series Operations](#3)
   3.1 [Manipulating Series Text](#3.1)
   3.2 [Time Series](#3.2)
   3.3 [Time Deltas](#3.3)
4. [DataFrame Operations](#4)
   4.1 [Preface](#4.1)
   4.2 [Iterating Through DataFrames](#4.2)
   4.3 [Sorting, Reindexing, and Renaming DataFrame Values](#4.3)
   4.4 [Replacing DataFrame Values](#4.4)
   4.5 [Function Application on DataFrames](#4.5)
   4.6 [Descriptive Statistics](#4.6)
   4.7 [Statistical Methods](#4.7)
   4.8 [Window Functions](#4.8)
   4.9 [Data Aggregation](#4.9)
   4.10 [Dealing with Missing Data](#4.10)
   4.11 [GroupBy Operations](#4.11)
   4.12 [Merging and Joining](#4.12)
   4.13 [Concatenation](#4.13)
5. [EXTRA: Helpful Notes](#5)




## 1. Introduction

> *pandas* is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the [Python](https://www.python.org/) programming language.
>
> *pandas* is a [NumFOCUS](https://www.numfocus.org/open-source-projects.html) sponsored project. This will help ensure the success of development of *pandas* as a world-class open-source project, and makes it possible to [donate](https://pandas.pydata.org/donate.html) to the project.

This document lists the most commonly used functions in Pandas, to serve as a reference when using it.

It's not meant to be exhaustive, merely a quick reference for the syntax of basic operations with Pandas. Please do not hesitate to consult the official documentation.

- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.

---

Install it!

```shell
# Best to use conda
$ conda install pandas

# But it's possible to use the PyPI wheels as well
$ pip install pandas
```

On Debian/Ubuntu you might also need to install additional system packages (these are the older Python 2-era package names; newer releases use `python3-` prefixed packages):

```shell
$ sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook \
    python-pandas python-sympy python-nose
```



If you need additional help or need a refresher on the parameters, feel free to use:

```python
help(pd.FUNCTION_YOU_NEED_HELP_WITH)
```

---

**Credits:**

A lot of these notes I'm adapting from









## 2. Pandas Basics

### 2.1 Data Types
[go to top](#top)


Note that Pandas is built on top of Numpy.
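
As a quick illustration of what "built on top of Numpy" means in practice, here is a minimal sketch (the sample values are made up): a Series keeps its values in a NumPy array, NumPy functions accept a Series directly, and mixed Python types fall back to NumPy's generic `object` dtype.

```python
import numpy as np
import pandas as pd

# A Series pairs a NumPy array of values with an index of labels
s = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

arr = s.to_numpy()           # The values as a plain NumPy ndarray
print(type(arr))             # <class 'numpy.ndarray'>
print(arr.dtype)             # float64

# NumPy ufuncs accept a Series directly and keep its index
roots = np.sqrt(s)
print(roots.index.tolist())  # ['a', 'b', 'c']

# Mixed Python types fall back to NumPy's generic 'object' dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)           # object
```
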

There are three types of data structures that Pandas deals with:

- Series
  - 1D labelled homogeneous array, size-immutable
  - If heterogeneous data is entered, the data-type will become 'object'
- DataFrame
  - Contains series data
  - 2D labelled, size-mutable, table structure
  - Potentially heterogeneous columns
- Panel
  - Contains DataFrames
  - 3D labelled, size-mutable array

**The major focus of this syntax reference is DataFrames**, since they're the most commonly manipulated objects in Pandas.



### 2.2 Series Basics
[go to top](#top)


> A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index.

![Image result for pandas series](assets/series-and-dataframe.width-1200.png)

[Image Source]()

#### **Creating Series Objects**

```python
# Empty Series
s = pd.Series()

# Series from ndarray
s = pd.Series(np.array([1, 2, 3]))
s = pd.Series(np.array([1, 2, 3]), index=[100, 101, 102]) # With custom indexing

# Series from Dict
# Dictionary keys are used to construct the index
s = pd.Series({'a': 0, 'b': 1, 'c': 2})

# Series from scalar
s = pd.Series(5, index=[0, 1, 2]) # Creates 3 rows of value 5
```

#### **Accessing Values**

```python
# By position
s.iloc[0]
s[0] # Also works with the default integer index, where labels and positions coincide

# By index label
s['index_name']
s.loc['index_name']

# By slice
s[-3:] # Retrieves last 3 elements

# Fancy indexing works also!
s[[0, 1, 2]]
s[['index_1', 'index_2', 'index_3']]

# Head and Tail
s.head()
s.tail()
s.head(5) # First 5
s.tail(5) # Last 5
```

#### **Series Properties**

```python
s.axes # Returns list of row axis labels
s.dtype # Returns data type of entries
s.empty # True if series is empty
s.ndim # Dimension. 1 for series
s.size # Number of elements
s.values # Returns the Series as an ndarray
```



### 2.3 DataFrame Basics
[go to top](#top)


> A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

![Structure Table](assets/structure_table.jpg)

Image Source:

#### **Creating DataFrame Objects**

```python
# Empty DataFrame
df = pd.DataFrame()

# DataFrame from List
df = pd.DataFrame([1, 2, 3, 4, 5]) # Single Column
df = pd.DataFrame([['a', 1], ['b', 2]], columns=['name_1', 'name_2']) # Multi columns
df = pd.DataFrame([1, 2, 3], dtype=float) # Convert the ints to floats

# DataFrame from Series
df = s.to_frame()

# DataFrame from Dict of Lists
df = pd.DataFrame({'Name': ['methylDragon', 'toothless', 'smaug'], 'Rating': [10, 5, 2]})

# DataFrame from List of Dicts
df = pd.DataFrame([{'Name': 'methylDragon', 'Rating': 10},
                   {'Name': 'toothless', 'Rating': 5},
                   {'Name': 'smaug'}]) # NaN is filled in for missing values

# DataFrame from Dict of Series
# Similarly, NaN will be added for missing values
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})

# Creating with Non-Default Index
df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'])
```

#### **Important Note on Mutability**

**NOTE:** Most operations will **not** change the original DataFrame unless the DataFrame is **reassigned**, or you use an `inplace=True` flag, which changes the DataFrame in question in place.

#### **Basic Operations**

**Column**

```python
# Column Selection
df['column_name']
df.column_name # Also works, if the name is a valid Python identifier that doesn't clash with a DataFrame attribute

# Column Selection by dtype
df.select_dtypes(include=['number']) # e.g. all numeric columns

# Adding a new Column (values are aligned on the index)
df['new_column_name'] = pd.Series([1, 2, 3])

# Deleting a Column (Either one works)
del df['column_name']
df.pop('column_name') # Also returns the removed column

# Math for Columns
df['column_1'] + df['column_2'] # Gives you a new column that is the addition of the first two
```

**Row**

```python
# Row Selection by Label
df.loc['row_label']

# Row Selection by Position Index
df.iloc[0] # Selects first row

# Row Slicing
df[-3:]

# Adding Rows (df.append was removed in pandas 2.0, use pd.concat instead)
pd.concat([df, df2])
pd.concat([df, df2], ignore_index=True) # Renumber the resulting index

# Deleting Rows
df.drop('label_to_drop')

# Deleting rows with None/NaN values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
df.dropna(axis=0, how='any') # Drop rows with any value missing
df.dropna(axis=0, how='all') # Drop rows where every value is missing
df.dropna(axis=0, thresh=2) # Keep only rows with at least 2 non-missing values

# Head and Tail
df.head()
df.tail()
df.head(5) # First 5 rows
df.tail(5) # Last 5 rows
```

#### **DataFrame Properties**

```python
df.T # Transpose
df.axes # Row axis and column axis labels
df.dtypes # Data type of each column
df.empty # True if empty
df.ndim # Dimension (number of axes)
df.shape # Tuple representing the shape (dimensionality) of the DataFrame
df.size # Number of elements
df.values # NumPy representation (ndarray); df.to_numpy() is the recommended equivalent
```



### 2.4 Panel Basics
[go to top](#top)


> A **panel** is a 3D container of data. The term **Panel data** is derived from econometrics and is partially responsible for the name pandas − **pan(el)-da(ta)**-s.
>
> The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −
>
> - **items** − axis 0, each item corresponds to a DataFrame contained inside.
> - **major_axis** − axis 1, it is the index (rows) of each of the DataFrames.
> - **minor_axis** − axis 2, it is the columns of each of the DataFrames.

**NOTE:** `pd.Panel` was deprecated in pandas 0.20 and removed in 0.25, so the snippets below only run on older versions. For 3D data, a DataFrame with a `MultiIndex` (or the xarray library) is the usual replacement.

#### **Creating Panel Objects**

```python
# Empty Panel
p = pd.Panel()

# Panel from 3D ndarray
p = pd.Panel(np.random.rand(2, 4, 5))

# Panel from dict of DataFrames
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
```

#### **Accessing Values**

```python
# By Item
p['Item1'] # Gives you the corresponding dataframe

# By Major Axis
p.major_xs(1) # Shows all data from the second row across all dataframes

'''
Eg: If the panel's first item is as such:
           0         1         2
   0  0.488224 -0.128637  0.930817
>> 1  0.417497  0.896681  0.576657 <<
   2 -2.775266  0.571668  0.290082
   3 -0.400538 -0.144234  1.110535

Then the Output of p.major_xs(1) is:
      Item1
0  0.417497
1  0.896681
2  0.576657

It's a transpose of the second row's elements (of the original DataFrame)!
'''

# By Minor Axis
p.minor_xs(1)

'''
Eg: Same deal as above, same first item

Output of p.minor_xs(1) are the items under the second column (of the original DataFrame)!

      Item1
0 -0.128637
1  0.896681
2  0.571668
3 -0.144234
'''
```



### 2.5 Categorical Data
[go to top](#top)


Imagine you have data made up of only a limited number of distinct values.

Eg: [1, 1, 1, 3, 2, 3, 2, 1, 2, 3, 1]

You can encode the fact that there are only three kinds of values: categories!

#### **Construct Categorical Data**

```python
# Source: https://www.tutorialspoint.com/python_pandas/python_pandas_categorical_data.htm

s = pd.Series(["a", "b", "c", "a"], dtype="category")
'''
Output

0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]
'''

# Generate just a list-like object with categories
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
# [a, b, c, a, b, c]
# Categories (3, object): [a, b, c]

# Or do it with stated categories!
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'])
# [a, b, c, a, b, c, NaN]
# Categories (3, object): [c, b, a]

# Specify ordered categories
# This one implies c < b < a
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'], ordered=True)
```

#### **Properties and Altering Categories**

```python
s.describe() # Summary: count, number of unique values, top value and its frequency
s.cat.categories # The categories
s.cat.ordered # Whether the categories are ordered
s.cat.rename_categories(['x', 'y', 'z']) # Returns a copy with the categories renamed

# Add categories
s = s.cat.add_categories([4])

# Remove categories (values in a removed category become NaN)
s = s.cat.remove_categories("a")

# Compare categories
# Ordered categoricals with the same categories compare element by element
dtype = pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
cat = pd.Series([1, 2, 3]).astype(dtype)
cat1 = pd.Series([2, 2, 2]).astype(dtype)

cat > cat1
'''
Output

0    False
1    False
2     True
dtype: bool
'''
```



### 2.6 Basic Binary Operations
[go to top](#top)


https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html#pandas.DataFrame.add

#### **Arithmetic**

```python
df.add(other)
df.sub(other)
df.mul(other)
df.div(other)
df.truediv(other) # True (float) division, same as div
df.floordiv(other) # Floor division (//)
df.mod(other)
df.pow(other)
divmod(df, other) # Returns tuple of (quotient, remainder)

df.radd(other) # Reflected: other + df
df.rsub(other) # Reflected: other - df

# You may specify fill-values for missing values too!
df.add(other, fill_value=0)
```

#### **Boolean Reductions**

```python
(df > 0).all()

# .empty, .any(), .all() and .bool() all work (.bool() only on single-element objects, and deprecated since pandas 2.1)

# You can also do comparisons! (Eg. ==, >, etc.)
```
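
A small sketch of index alignment in the binary operations above, using two made-up frames named `df` and `other` (the names only mirror the snippets): plain operators leave NaN where labels don't match, `fill_value` substitutes a value for the missing side, and the boolean reductions then collapse each column to a single value.

```python
import pandas as pd

# Two small frames whose indexes only partly overlap (made-up data)
df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
other = pd.DataFrame({'x': [10, 20]}, index=['b', 'c'])

# Plain '+' aligns on the index labels; 'a' has no partner, so it becomes NaN
plain = df + other
print(plain)

# fill_value=0 stands in for the missing side before adding: a -> 1, b -> 12, c -> 23
filled = df.add(other, fill_value=0)
print(filled)

# Boolean reductions collapse each column of a boolean frame to one value
print((filled > 0).all())  # x: True, every value is positive
print((df > 2).any())      # x: True, at least one value exceeds 2
```

Note that the aligned results come back as floats, because alignment introduces NaN before any filling happens.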


### 2.7 Casting and Conversion
[go to top](#top)


```python
# Casting object to dtype
df.astype(dtype)
df.astype(dtype, copy=False) # Avoid copying data where possible

# Attempt to infer better dtypes for object columns
# (df.convert_objects has been removed from pandas)
df.infer_objects()
pd.to_datetime(df['column_name'], errors='coerce') # Unconvertibles become NaT
pd.to_numeric(df['column_name'], errors='coerce') # Unconvertibles become NaN
```



### 2.8 Conditional Indexing
[go to top](#top)


So you remember that fancy indexing works?

```python
# Now you can do it with conditions too!
df[df > 0] # Values failing the condition become NaN
df.where(df > 0) # Same result, as a method
df[df['column_name'] > 0] # Keep only the rows where the condition holds
```



### 2.9 IO
[go to top](#top)


| Format Type | Data Description | Reader | Writer |
| :--- | :--- | :--- | :--- |
| text | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | [read_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-csv-table) | [to_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-store-in-csv) |
| text | [JSON](http://www.json.org/) | [read_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-reader) | [to_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-writer) |
| text | [HTML](https://en.wikipedia.org/wiki/HTML) | [read_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-html) | [to_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-html) |
| text | Local clipboard | [read_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) | [to_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) |
| binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [read_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-reader) | [to_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-writer) |
| binary | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) | [to_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) |
| binary | [Feather Format](https://github.com/wesm/feather) | [read_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) | [to_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) |
| binary | [Msgpack](http://msgpack.org/index.html) (removed in pandas 1.0) | [read_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) | [to_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) |
| binary | [Stata](https://en.wikipedia.org/wiki/Stata) | [read_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-reader) | [to_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-writer) |
| binary | [SAS](https://en.wikipedia.org/wiki/SAS_(software)) | [read_sas](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sas-reader) | |
| binary | [Python Pickle Format](https://docs.python.org/3/library/pickle.html) | [read_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) | [to_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) |
| SQL | [SQL](https://en.wikipedia.org/wiki/SQL) | [read_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) | [to_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) |
| SQL | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery) | [read_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) | [to_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) |

```python
# Custom Indexing
pd.read_csv("file", index_col=['index_col_name'])

# With converted datatypes
pd.read_csv("file", dtype={'col': dtype})

# Column names
pd.read_csv("file", names=['1', 'b', 'etc'])

# Skip rows
pd.read_csv("file", skiprows=2) # Skip the first 2 lines of the file
```



### 2.10 Plotting
[go to top](#top)


Source:

```python
df.plot() # Line plot
df.plot.bar() # Bar chart
df.plot.bar(stacked=True) # Stacked bar chart
df.plot.barh() # Horizontal bar chart
df.plot.barh(stacked=True) # Horizontal stacked bar chart
df.plot.hist(bins=20) # Plot histogram
df.hist(bins=30) # Plot a separate histogram for each column
df.plot.box() # Box plot
df.plot.area() # Area plot
df.plot.scatter(x='a', y='b') # Scatter plot
df.plot.pie(subplots=True) # Pie plot
```



### 2.11 Sparse Data
[go to top](#top)


You can sparsify data to save space in memory!

```python
# Sparsify (obj.to_sparse() was removed in pandas 1.0; cast to a SparseDtype instead)
sparse_obj = obj.astype(pd.SparseDtype('float', np.nan)) # Sparsifies NaN/missing values
sparse_obj = obj.astype(pd.SparseDtype('float', 0)) # Sparsify a target value (here 0)

# Convert back
sparse_obj.sparse.to_dense()

# Properties
sparse_obj.sparse.density
```



## 3. Series Operations

### 3.1 Manipulating Series Text
[go to top](#top)


Source:

| # | Method | Description |
| ---- | ---- | ---- |
| 1 | **lower()** | Converts strings in the Series/Index to lower case. |
| 2 | **upper()** | Converts strings in the Series/Index to upper case. |
| 3 | **len()** | Computes the length of each string. |
| 4 | **strip()** | Strips whitespace (including newlines) from both sides of each string in the Series/Index. |
| 5 | **split(' ')** | Splits each string by the given pattern. |
| 6 | **cat(sep=' ')** | Concatenates the Series/Index elements with the given separator. |
| 7 | **get_dummies()** | Returns a DataFrame of one-hot encoded values. |
| 8 | **contains(pattern)** | Returns True for each element that contains the pattern, else False. |
| 9 | **replace(a, b)** | Replaces occurrences of **a** with **b**. |
| 10 | **repeat(value)** | Repeats each element the specified number of times. |
| 11 | **count(pattern)** | Returns the number of occurrences of the pattern in each element. |
| 12 | **startswith(pattern)** | Returns True if the element in the Series/Index starts with the pattern. |
| 13 | **endswith(pattern)** | Returns True if the element in the Series/Index ends with the pattern. |
| 14 | **find(pattern)** | Returns the position of the first occurrence of the pattern (-1 if not found). |
| 15 | **findall(pattern)** | Returns a list of all occurrences of the pattern. |
| 16 | **swapcase()** | Swaps lower/upper case. |
| 17 | **islower()** | Checks whether all characters in each string in the Series/Index are lower case. Returns Boolean. |
| 18 | **isupper()** | Checks whether all characters in each string in the Series/Index are upper case. Returns Boolean. |
| 19 | **isnumeric()** | Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean. |
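
The methods in this table are called through the `.str` accessor of a Series of strings, and each returns a new Series, so they chain. A minimal sketch on made-up data:

```python
import pandas as pd

s = pd.Series(['  Apple ', 'banana', 'Cherry pie'])

cleaned = s.str.strip().str.lower()           # Trim whitespace, then lowercase
print(cleaned.tolist())                       # ['apple', 'banana', 'cherry pie']
print(cleaned.str.len().tolist())             # [5, 6, 10]
print(cleaned.str.contains('an').tolist())    # [False, True, False]
print(cleaned.str.startswith('ch').tolist())  # [False, False, True]
print(cleaned.str.find('an').tolist())        # [-1, 1, -1]
print(cleaned.str.split(' ').tolist())        # [['apple'], ['banana'], ['cherry', 'pie']]
print(cleaned.str.cat(sep=', '))              # apple, banana, cherry pie
```
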
| 661 | 662 | #### **Example** 663 | 664 | ```python 665 | s.str.lower() 666 | ``` 667 | 668 | 669 | 670 | ### 3.2 Time Series 671 | [go to top](#top) 672 | 673 | 674 | ```python 675 | # Get Current Time 676 | pd.datetime.now() # Get current time 677 | 678 | # Get Time from Timestamp 679 | pd.Timestamp('2019-03-01') 680 | pd.Timestamp(1587687575, unit='s') 681 | 682 | # Get a date range 683 | pd.date_range("11:00", "13:30", freq="H").time 684 | pd.date_range("11:00", "13:30", freq="30min").time # Different frequency 685 | # Output: 686 | # [datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0) 687 | # datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)] 688 | 689 | # Convert Time Series to Timestamps 690 | pd.to_datetime(SOME_DATETIME_SERIES) 691 | ``` 692 | 693 | 694 | 695 | ### 3.3 Time Deltas 696 | [go to top](#top) 697 | 698 | 699 | These are almost exactly like the datetime library's timedelta objects. 700 | 701 | ```python 702 | pd.Timedelta(6, unit='h') 703 | pd.Timedelta(days=-2) 704 | pd.Timedelta('2 days 2 hours 15 minutes 30 seconds') # Or even from a string! 705 | 706 | # Or from a series 707 | pd.to_timedelta(s) 708 | ``` 709 | 710 | 711 | 712 | ## 4. DataFrame Operations 713 | 714 | ### 4.1 Preface 715 | [go to top](#top) 716 | 717 | 718 | Even though this section is supposed to be focused on DataFrames, a lot of these operations can be applied to Series and Panel objects as well! It's just that a large part of using Pandas is working with DataFrames 719 | 720 | To get at least some brief understanding of your data you can 721 | 722 | ```python 723 | # Look at the first few rows of data 724 | df.head() 725 | 726 | # Look at essential details (like dimensions, data types, etc.) 
df.info()
```



### 4.2 Iterating Through DataFrames
[go to top](#top)


```python
df.items()       # (column name, Series) pairs, column by column (df.iteritems() was removed in pandas 2.0)
df.iterrows()    # (index, Series) pairs, row by row
df.itertuples()  # Iterate over rows as namedtuples
```



### 4.3 Sorting, Reindexing, and Renaming DataFrame Values
[go to top](#top)


```python
# Sort by Values
df.sort_values('column_name', inplace=True)  # Sort by values in column

# Sort by Index
df.sort_index(ascending=False)  # Default is ascending=True
df.sort_index(axis=1)  # Sort by column labels

# Reset Index
df.reset_index(inplace=True, drop=True)  # Reset index, skip inserting old index as a column

# Rename Columns
df.rename(columns=newcol_names, inplace=True)  # newcol_names: dict of {old_name: new_name}

# Rename Index
df.rename(index={'index_element_1': 'new_name'})

# Reindex
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html
df.reindex(index=[1, 2, 3], columns=[1, 2, 3])

# Reindex to match another dataframe
df.reindex_like(df2)
df.reindex_like(df2, method="ffill")  # Fill missing values
# pad/ffill: Forward fill
# bfill/backfill: Backward fill
# nearest: Nearest index value fill
```



### 4.4 Replacing DataFrame Values
[go to top](#top)


```python
# Replace strings with numbers
df.replace(['Awful', 'Poor', 'OK', 'Acceptable', 'Perfect'], [0, 1, 2, 3, 4])

# Replace using regex
df.replace({'\n': '<br>'}, regex=True)

# Removing Substrings
df['column_name'] = df['column_name'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
```



### 4.5 Function Application on DataFrames
[go to top](#top)


```python
# Apply function to every value in a single column (Series)
df['column_name'].apply(function_name)

# Apply function element-wise to every value in the DataFrame
df.applymap(function_name)  # Renamed to df.map(function_name) in pandas 2.1+
```



### 4.6 Descriptive Statistics
[go to top](#top)


You can do a bunch of basic statistical calculations on a DataFrame! By default, each one is computed down the rows, giving one result per column.

```python
# Sum along axis
# axis=0 : Aggregate down each column (one result per column)
# axis=1 : Aggregate across each row (one result per row)
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html
df.sum()  # Default axis is 0
df.sum(axis=1)
df.sum(axis=0, skipna=True, numeric_only=True, min_count=0)

# Even more!
df.count()  # Number of non-null observations
df.mean()  # Mean of values
df.median()  # Median of values
df.mode()  # Mode of values
df.std()  # Standard deviation of the values
df.min()  # Minimum value
df.max()  # Maximum value
df.abs()  # Absolute value (element-wise, not a reduction)
df.prod()  # Product of values
df.cumsum()  # Cumulative sum
df.cumprod()  # Cumulative product

# Or get a summary (count, mean, std, min, quartiles, max) all at once!
df.describe()
```



### 4.7 Statistical Methods
[go to top](#top)


```python
# Calculate percentage change
df.pct_change()  # Down each column (change from the previous row)
df.pct_change(axis=1)  # Across each row (change from the previous column)

# Covariance
s.cov(s2)  # For series
df.cov()  # For frame (pairwise covariance between all columns)

# Correlation
df.corr()  # For frames
df['col_1'].corr(df['col_2'])  # For series

# Data Ranking (Series)
# https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.rank.html
# Check the docs for tie-breaking methods
# average, min, max, first (Default method='average')
s.rank()
```



### 4.8 Window Functions
[go to top](#top)


```python
# Rolling Window
df_rolling = df.rolling(window=3)

# Now you can use the window!
# Most of the descriptive stats and statistical methods work on windows too
df_rolling.sum()
df_rolling.mean()
df_rolling.median()
df_rolling.std()
# and so on...

# Expanding Window
# (Yields the value of the statistic with all the data available up to that point in time)
df_expanding = df.expanding(min_periods=1)
df_expanding.mean()  # e.g. a running mean

# Exponential Weighted Functions
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html
# You must specify exactly one of com, span, halflife, or alpha. Check the docs!
df.ewm(span=3).mean()
```



### 4.9 Data Aggregation
[go to top](#top)


```python
# Basically custom operations on windows!
df_rolling.aggregate(FUNCTION)  # On whole DF
df_rolling['col'].aggregate(FUNCTION)  # On single column
df_rolling[['col', 'col2']].aggregate(FUNCTION)  # On multiple columns

# Multiple functions (you'll get one output column per function, for each input column)
df_rolling.aggregate([FUNCTION_1, FUNCTION_2])

# Multiple functions, on different columns
df_rolling.aggregate({'col_1': FUNCTION_1, 'col_2': FUNCTION_2})

# If you don't run it on a rolling window, it reduces the dimensionality of the data!
df.aggregate(np.sum)  # Sums each column down to a single value
```



### 4.10 Dealing with Missing Data
[go to top](#top)


Null values can be NA, NaN, NaT, or None.

- NaN: Not a Number
- NaT: Not a Time

```python
# Detect Missing Values
df.isnull()  # True where a value is missing
df.notnull()  # True where a value is present

# Filling Missing Data With Scalar
df.fillna(scalar_number)

# Filling Missing Data
# ffill: Fills forward (older code: df.fillna(method='pad'), deprecated in pandas 2.1)
# bfill: Fills backwards (older code: df.fillna(method='bfill'))
df.ffill()
df.bfill()

# Drop Missing Values
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
df.dropna(axis=0, how='any')  # Drop rows where any column is missing
df.dropna(axis=0, how='all')  # Drop rows where all columns are missing
df.dropna(axis=0, thresh=2)  # Keep only rows with at least 2 non-missing values
```



### 4.11 GroupBy Operations
[go to top](#top)


Source:

You can group data within your DataFrames in order to:

- Split the DF
- Apply a function to each group
  - Aggregation
  - Transformation
  - Filtration
- Combine the results

```python
# Group Data
df_grouped = df.groupby('key')  # Group rows by the values in column 'key'
df_grouped = df.groupby('key', axis=1)  # Group along columns instead (axis=1 is deprecated since pandas 2.1)
df_grouped = df.groupby(['col_1', 'col_2'])  # Multi-Column Group

# View the groups!
df_grouped.groups

# You can iterate through grouped dfs as well!
for name, group in df_grouped:  # (group key, sub-DataFrame) pairs
    pass

# Select a Single Group
df_grouped.get_group('group_name')

# Apply Aggregations
df_grouped.agg(function)
df_grouped.agg([function_1, function_2])

# Apply Transformations
# Transforms groups or columns inside the dataframe
transformation_function = lambda x: (x - x.mean()) / x.std() * 10
df_grouped.transform(transformation_function)

# Apply Filters
# Keeps only the groups for which the function returns True (similar in spirit to Python's filter())
df_grouped.filter(filtering_function)
df_grouped.filter(lambda x: len(x) >= 3)
```



### 4.12 Merging and Joining
[go to top](#top)


> Pandas provides a single function, **merge**, as the entry point for all standard database join operations between DataFrame objects −
>
> `pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)`
>
> Here, we have used the following parameters −
>
> - **left** − A DataFrame object.
> - **right** − Another DataFrame object.
> - **on** − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
> - **left_on** − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
> - **right_on** − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
> - **left_index** − If **True,** use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
> - **right_index** − Same usage as **left_index** for the right DataFrame.
> - **how** − One of 'left', 'right', 'outer', 'inner'. Defaults to inner. Each method has been described below.
> - **sort** − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False; setting it to True sorts the result but can substantially hurt performance.

```python
# Code source: https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm

# Merge two DFs via key
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

pd.merge(left, right, on='id')

'''
OUTPUT

  id  Name_x subject_id_x Name_y subject_id_y
0  1    Alex         sub1  Billy         sub2
1  2     Amy         sub2  Brian         sub4
2  3   Allen         sub4   Bran         sub3
3  4   Alice         sub6  Bryce         sub6
4  5  Ayoung         sub5  Betty         sub5
'''

# Merge two DFs via multiple keys
pd.merge(left, right, on=['key_1', 'key_2'])  # With the default how='inner', rows without a match are discarded

# Merge using 'HOW'
'''
Merge Method   SQL Equivalent     Description
left           LEFT OUTER JOIN    Use keys from left object
right          RIGHT OUTER JOIN   Use keys from right object
outer          FULL OUTER JOIN    Use union of keys
inner          INNER JOIN         Use intersection of keys
'''
pd.merge(left, right, on='key', how='left')
```

Join Intuitions

![Image result for outer join inner join image](assets/hMKKt.jpg)

Image source:



### 4.13 Concatenation
[go to top](#top)


> Pandas provides various facilities for easily combining together **Series, DataFrame**, and **Panel** objects.
>
> `pd.concat(objs,axis=0,join='outer',join_axes=None, ignore_index=False)`
>
> - **objs** − This is a sequence or mapping of Series, DataFrame, or Panel objects.
> - **axis** − {0, 1, ...}, default 0. This is the axis to concatenate along.
> - **join** − {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection.
> - **ignore_index** − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
> - **join_axes** − This is the list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic.

Note: `Panel` and the `join_axes` parameter have both been removed from modern pandas (as of 1.0). Instead of `join_axes`, reindex the concatenated result.

```python
# Concatenate DFs
pd.concat([one, two])  # Stacks the rows of the two DFs
pd.concat([one, two], keys=['x', 'y'])  # Labels the rows from each DF with a key (adds an outer index level)
pd.concat([one, two], ignore_index=True)  # Discards the original index and renumbers 0..n-1

# Concatenate using Append
# one.append(two) used to do the same thing, but DataFrame.append was removed in pandas 2.0,
# so use pd.concat([one, two]) instead
```



## 5. EXTRA: Helpful Notes

I couldn't find a suitable place to put this information, so I'll put it here:

- Pivot tables, stacking, and unstacking

- Package configuration




```
    .     .
 .  |\-^-/|  .
/| } O.=.O { |\
```

------

[![Yeah! Buy the DRAGON a COFFEE!](../assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)

--------------------------------------------------------------------------------
/Numpy/01 Numpy Basics.md:
--------------------------------------------------------------------------------
# Numpy Basics

Author: methylDragon  
Contains a syntax reference and code snippets for Numpy!  
It's a collection of code snippets and tutorials from everywhere all mashed together!

------

## Pre-Requisites

### Required

- Python knowledge, this isn't a tutorial!
- Numpy installed
- I'll assume you've already run this line as well `import numpy as np`



## Table Of Contents

1. [Introduction](#1)  
2. [Array Basics](#2)  
   2.1 [Configuring Numpy](#2.1)  
   2.2 [Numpy Data Types](#2.2)  
   2.3 [Creating Arrays](#2.3)  
   2.4 [Array Basics and Attributes](#2.4)  
   2.5 [Casting](#2.5)  
   2.6 [Some Array Methods](#2.6)  
   2.7 [Array Indexing](#2.7)  
   2.8 [Array Slicing](#2.8)  
   2.9 [Reshaping Arrays](#2.9)  
   2.10 [Array Concatenation and Splitting](#2.10)  
   2.11 [Array Arithmetic](#2.11)  
   2.12 [More Array Math](#2.12)  
3. [Going Deeper With Arrays](#3)  
   3.1 [Broadcasting](#3.1)  
   3.2 [Vectorize](#3.2)  
   3.3 [Iterating Through Axes](#3.3)  
   3.4 [Modifying Output Directly](#3.4)  
   3.5 [Locating Elements](#3.5)  
   3.6 [Aggregations](#3.6)  
   3.7 [Comparisons](#3.7)  
   3.8 [Sorting Arrays](#3.8)  
   3.9 [Fancy Indexing](#3.9)  
   3.10 [Structured Arrays](#3.10)  
4. [Matrices](#4)  
   4.1 [Linear Algebra Functions](#4.1)  
5. [Numpy I/O](#5)  
   5.1 [Import from CSV](#5.1)  
   5.2 [Saving and Loading](#5.2)  




## 1. Introduction

> NumPy is the fundamental package for scientific computing with Python. It contains among other things:
>
> - a powerful N-dimensional array object
> - sophisticated (broadcasting) functions
> - tools for integrating C/C++ and Fortran code
> - useful linear algebra, Fourier transform, and random number capabilities
>
> Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
>
> http://www.numpy.org/

This document lists the most commonly used functions in Numpy, to serve as a reference when using it.

It's especially useful because Numpy is more efficient than native Python in terms of both memory usage and speed!

This is because of how the arrays are stored:

![Image result for numpy vs python](assets/array_vs_list.png)

Image source: https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html

A Python list stores pointers to separate Python objects, which have to be dereferenced to get at the values. A Numpy array doesn't need to do that, because its values are stored contiguously in a single block of memory, one after another from the head!

Both the pointers themselves and the work of dereferencing them are extra overhead.

It's so useful that it's used by a lot of other packages like OpenCV, SciPy, and pandas!

---

Install it!

```shell
$ pip install numpy
```

If you need additional help or need a refresher on the parameters, feel free to use:

```python
help(np.FUNCTION_YOU_NEED_HELP_WITH)
```

---

**Credits:**

A lot of these notes are adapted from:

https://jakevdp.github.io/PythonDataScienceHandbook/index.html

http://cs231n.github.io/python-numpy-tutorial/

https://docs.scipy.org/doc/numpy-1.15.1/reference/



## 2. Array Basics

The core, most important object in Numpy is the **ndarray**, which stands for n-dimensional array.

> An [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its [`shape`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape), which is a [`tuple`](https://docs.python.org/dev/library/stdtypes.html#tuple) of *N* positive integers that specify the sizes of each dimension. The type of items in the array is specified by a separate [data-type object (dtype)](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html#arrays-dtypes), one of which is associated with each ndarray.
>
> As with other container objects in Python, the contents of an [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can be accessed and modified by [indexing or slicing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#arrays-indexing) the array (using, for example, *N* integers), and via the methods and attributes of the [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray).
>
> Different [`ndarrays`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can share the same data, so that changes made in one [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) may be visible in another. That is, an ndarray can be a *“view”* to another ndarray, and the data it is referring to is taken care of by the *“base”* ndarray. ndarrays can also be views to memory owned by Python [`strings`](https://docs.python.org/dev/library/stdtypes.html#str) or objects implementing the `buffer` or [array](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.interface.html#arrays-interface) interfaces.
>
> https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html



### 2.1 Configuring Numpy
[go to top](#top)


```python
# Set printing precision
np.set_printoptions(precision=2)
```



### 2.2 Numpy Data Types
[go to top](#top)


#### **List**

| Data type    | Description                                                  |
| ------------ | ------------------------------------------------------------ |
| `bool_`      | Boolean (True or False) stored as a byte                     |
| `int_`       | Default integer type (same as C `long`; normally either `int64` or `int32`) |
| `intc`       | Identical to C `int` (normally `int32` or `int64`)           |
| `intp`       | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`) |
| `int8`       | Byte (-128 to 127)                                           |
| `int16`      | Integer (-32768 to 32767)                                    |
| `int32`      | Integer (-2147483648 to 2147483647)                          |
| `int64`      | Integer (-9223372036854775808 to 9223372036854775807)        |
| `uint8`      | Unsigned integer (0 to 255)                                  |
| `uint16`     | Unsigned integer (0 to 65535)                                |
| `uint32`     | Unsigned integer (0 to 4294967295)                           |
| `uint64`     | Unsigned integer (0 to 18446744073709551615)                 |
| `float_`     | Shorthand for `float64` (this alias was removed in NumPy 2.0) |
| `float16`    | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `float32`    | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `float64`    | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `complex_`   | Shorthand for `complex128` (this alias was removed in NumPy 2.0) |
| `complex64`  | Complex number, represented by two 32-bit floats             |
| `complex128` | Complex number, represented by two 64-bit floats             |

#### **nan and inf**

`np.nan` is Numpy's floating-point "Not a Number" value (commonly used to mark missing data), and `np.inf` is floating-point infinity!

```python
np.nan
np.inf

# To check if something is nan or inf,
# (np.nan == np.nan is False, so always use np.isnan() instead of ==)
np.isnan(x)
np.isinf(x)
```



### 2.3 Creating Arrays
[go to top](#top)


General note: almost all of these functions take a `dtype` parameter where you can state the data type of the output.

#### **From Python List**

```python
# Basic
np.array([1, 2, 3, 4, 5])
# Out: array([1, 2, 3, 4, 5])

# Upcast (ints are cast to float, since all elements must share one type)
np.array([1.1, 2, 3, 4, 5])
# Out: array([1.1, 2. , 3. , 4. , 5. ])

# Explicit type
np.array([1, 2, 3, 4, 5], dtype='float32')
# Out: array([1., 2., 3., 4., 5.], dtype=float32)

# Multi-dimensional
np.array([[1, 2], [3, 4], [5, 6]])
# Out: array([[1, 2],
#             [3, 4],
#             [5, 6]])
```

#### **From Scratch**

**Filled Arrays**

```python
# All zeroes
np.zeros(5, dtype=int)
# Out: array([0, 0, 0, 0, 0])

# Multi-dimensional Zeros
np.zeros((2, 2))
# Out: array([[0., 0.],
#             [0., 0.]])

# Ones
np.ones((2, 2), dtype=float)
# Out: array([[1., 1.],
#             [1., 1.]])

# Filled array (It even works for things that aren't numbers! AHAHAHA)
np.full((2, 2), 'CH3')
# array([['CH3', 'CH3'],
#        ['CH3', 'CH3']], dtype='<U3')
```

### 2.4 Array Basics and Attributes
[go to top](#top)

#### **Shape and Index**

It is important to get a proper understanding of the shape of numpy arrays!

![Image result for numpy shape](assets/elsp_0105.png)

[Image Source](https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html)

The corresponding arrays will look like:

```python
# 1D
# A 1D array has no row/column orientation; it's just a flat vector
[7, 2, 9, 10]

# 2D
[[5.2, 3.0, 4.5],
 [9.1, 0.1, 0.3]]

# And so on
```

Another way of looking at it is **matrix indexing**! From **2D arrays onwards**, Numpy indexes by **(i, j)**, i.e. (row, column).

If you want to think of it in terms of x and y, then axis 0 is y and axis 1 is x. So the indexing is `(y, x)`, i.e. `(i, j)`.

> If you want to do matrix or vector operations, it is best to do it from at least a 2D array.

![Image result for matrix i j](assets/Matrix.svg.png)

[Image Source](https://simple.wikipedia.org/wiki/Matrix_(mathematics))

#### **Attributes**


```python
# Suppose we create a 3 dimensional array
example_array = np.random.randint(5, size=(2, 3, 4))
# Out (values are random):
#      array([[[0, 0, 3, 3],
#              [2, 1, 1, 3],
#              [2, 2, 4, 4]],
#
#             [[2, 0, 1, 3],
#              [2, 3, 0, 1],
#              [2, 0, 1, 2]]])

# Dimensions
example_array.ndim  # 3

# Shape
example_array.shape  # (2, 3, 4): 2 planes, each with 3 rows and 4 columns
                     # (Image arrays are usually laid out as (height, width, channels) instead)

# Total Elements
example_array.size  # 24 (which is 2 * 3 * 4)

# Type
example_array.dtype  # dtype('int64') on most platforms (the default integer type is platform-dependent)

# Byte-size of each element
example_array.itemsize  # 8

# Total byte-size
example_array.nbytes  # 192 (which is 2 * 3 * 4 * 8)
```



### 2.5 Casting
[go to top](#top)


```python
# Just use the .astype() method!

np.array([True, True, True, False, False, False]).astype('int')
# Out: array([1, 1, 1, 0, 0, 0])
```
#### **Array to List**

```python
np.array([1, 2, 3]).tolist()  # [1, 2, 3] native Python list!
```

#### **List to Array**

```python
np.asarray([1, 2, 3])  # This can take lists of tuples, tuples, etc.!
```



### 2.6 Some Array Methods
[go to top](#top)


There are really a lot of them!
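
If you want to see them all, one quick way (a sketch, not the only option) is Python's built-in `dir()` on the `np.ndarray` type. Note that this only lists `ndarray` methods and attributes; top-level functions such as `np.tile` below are separate and can be listed with `dir(np)` instead. The exact count depends on your Numpy version.

```python
import numpy as np

# Public (non-underscore) attributes and methods of the ndarray type
methods = [name for name in dir(np.ndarray) if not name.startswith('_')]

print(len(methods))  # Exact count varies between Numpy versions
print(methods[:10])  # A small sample of the names
```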

#### **Repeat and Tile**

```python
a = [1, 2, 3]

np.tile(a, 2)  # array([1, 2, 3, 1, 2, 3])
np.repeat(a, 2)  # array([1, 1, 2, 2, 3, 3])
```

#### **Get Unique**

```python
a = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])

np.unique(a, return_counts=True)
# Out: (array([1, 2, 3, 4]), array([4, 3, 2, 1]))
# (Unique set, Counts)
```

#### **Rounding**

```python
a = np.array([1.111, 2.222, 3.333, 4.444])

np.around(a)  # array([1., 2., 3., 4.])
np.around(a, 2)  # array([1.11, 2.22, 3.33, 4.44])

b = np.array([12345])

np.around(b, -1)  # array([12340]) (halves round to the nearest even value, so not 12350)
np.around(b, -2)  # array([12300])
```

#### **Floor**

```python
a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])

np.floor(a)  # array([1., 2., 3., 4., 5.])
```

#### **Ceil**

```python
a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])

np.ceil(a)  # array([2., 3., 4., 5., 6.])
```

#### **Count Non-Zeroes**

```python
np.count_nonzero(array)  # Gives you the number of non-zero elements in the array
```

#### **Digitize**

```python
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
bins = np.array([0, 3, 6, 9])

# Return index of the bin each element belongs to
# You can use this together with take to get the digitized array!
np.digitize(x, bins)  # array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
```

#### **Clip**

Clip values to a range

```python
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.clip(x, 3, 8)  # array([3, 3, 3, 3, 4, 5, 6, 7, 8, 8])
```

#### **Histogram and Bincount**

```python
x = np.array([1, 1, 2, 2, 2, 4, 4, 5, 6, 6, 6])

np.bincount(x)  # array([0, 2, 3, 0, 2, 1, 3])
# How to read output:
# 0 occurs 0 times
# 1 occurs 2 times
# 2 occurs 3 times and so on

np.histogram(x, [0, 2, 4, 6, 8])  # (array([2, 3, 3, 3]), array([0, 2, 4, 6, 8]))
# First array are the counts
# Second array are the bin edges
# Each bin includes its lower edge but not its upper edge,
# except the last bin, which includes both edges
# Eg. [0, 2): 2
#     [2, 4): 3
#     [4, 6): 3
#     [6, 8]: 3
```

#### **At**

For ufuncs (like `np.add` or `np.negative`), use `.at` to apply the operation in place at specific indices only

```python
# General form: np.some_ufunc.at(array, indices)

# Example
x = np.array([1, 2, 3, 4])
np.negative.at(x, [0, 1])  # This will mutate x

# x is now array([-1, -2, 3, 4])
```



### 2.7 Array Indexing
[go to top](#top)


Of course, you can also assign to elements once you've indexed them, as usual!

**Note:** if you assign a float into an int array, it'll be truncated to an int (eg. 3.12 -> 3).

```python
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
# Out: array([[[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])
```
#### **One-dimensional**

Works just like native Python!

```python
array[0]
# Out: array([[1, 2, 3],
#             [4, 5, 6],
#             [7, 8, 9]])

array[-1]
# Out: array([[10, 11, 12],
#             [13, 14, 15],
#             [16, 17, 18]])
```

#### **Multi-dimensional**

```python
array[0, 0]
# Out: array([1, 2, 3])

array[0, 0, 0]
# Out: 1
```

#### **Conditional Indexing (Boolean Masks)**

```python
a = np.array([1, 2, 3, 4, 5])

a[a > 3]  # array([4, 5])
a[np.iscomplex(a)]  # array([], dtype=int64)
```



### 2.8 Array Slicing
[go to top](#top)


**Note:** Unlike in native Python, slicing an array gives you an **array view**, not a copy! So if you alter the array view, it'll alter the original array!

#### **One-dimensional**

```python
array = np.arange(10)
# Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# From start
array[:5]
# Out: array([0, 1, 2, 3, 4])

# From end
array[5:]
# Out: array([5, 6, 7, 8, 9])

# From middle
array[4:7]
# Out: array([4, 5, 6])

# Every other element
array[::2]
# Out: array([0, 2, 4, 6, 8])

# Every other element from index 1
array[1::2]
# Out: array([1, 3, 5, 7, 9])

# Reversed
array[::-1]
# Out: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

# Reversed, every other element from index 5
array[5::-2]
# Out: array([5, 3, 1])
```

#### **Multi-dimensional**

```python
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
# Out: array([[[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])

# First plane only (slicing keeps the outer dimension)
array[:1]
# Out: array([[[1, 2, 3],
#              [4, 5, 6],
#              [7, 8, 9]]])

# Everything from index 1 of the first axis (here, the last plane)
array[1:]
# Out: array([[[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])

# First row of the first plane, keeping all three dimensions
array[:1, :1]
# Out: array([[[1, 2, 3]]])

# Get first element from every innermost array as nested array
array[:, :, :1]
# Out: array([[[ 1],
#              [ 4],
#              [ 7]],
#
#             [[10],
#              [13],
#              [16]]])

# Reverse innermost two layers
array[:, ::-1, ::-1]
# Out: array([[[ 9,  8,  7],
#              [ 6,  5,  4],
#              [ 3,  2,  1]],
#
#             [[18, 17, 16],
#              [15, 14, 13],
#              [12, 11, 10]]])
```

#### **Multi-dimensional Access**

Sometimes you just want the columns or rows as a flat one-dimensional array instead of a nested one.

**Note: They'll still be editable views!**

Here's how to do it!

```python
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
# Out: array([[[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])

# First column from first array
array[0][:, 0]
# Out: array([1, 4, 7])

# First row from first array (also equivalent to array[0][0])
array[0][0, :]
# Out: array([1, 2, 3])

# Nested array of first column from each array
array[:, :, 0]
# Out: array([[ 1,  4,  7],
#             [10, 13, 16]])
```

#### **Array Views**

Remember what I said about array views?
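
You can check directly whether two arrays share data with `np.shares_memory`, or by looking at an array's `.base` attribute. Here's a minimal sketch on a fresh `np.arange(10)`: a basic slice shares memory with the original, while `.copy()` and fancy (integer-list) indexing both produce independent data.

```python
import numpy as np

a = np.arange(10)

view = a[:5]          # Basic slicing returns a view
copy = a[:5].copy()   # .copy() allocates new memory
fancy = a[[0, 1, 2]]  # Fancy indexing always returns a copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
print(np.shares_memory(a, fancy))  # False

# A view remembers the array it was taken from; a fresh copy owns its data
print(view.base is a)     # True
print(copy.base is None)  # True
```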

```python
# Native Python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = a[:5]  # [0, 1, 2, 3, 4]

b[0] = 5  # b is now [5, 1, 2, 3, 4]
a  # But a is still [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Numpy
a = np.arange(10)  # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

b = a[:5]  # array([0, 1, 2, 3, 4])
b[0] = 5
a  # a is now array([5, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

#### **Copying Instead of Views**

```python
# Just use .copy() !

a = np.arange(10)  # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

b = a[:5].copy()  # array([0, 1, 2, 3, 4])
b[0] = 5  # b is array([5, 1, 2, 3, 4])
a  # a is unchanged
```



### 2.9 Reshaping Arrays
[go to top](#top)


#### **Reshape**

```python
array = np.arange(10)  # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Reshape reshapes the arrays. Of course!
# You can reshape the array into any number of dimensions, as long as the
# product of the new dimensions equals the number of elements in the input array!

array.reshape(10)
# Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

array.reshape(1, 10)
# Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

array.reshape(2, 5)
# Out: array([[0, 1, 2, 3, 4],
#             [5, 6, 7, 8, 9]])

array.reshape(1, 1, 5, 2)
# Out: array([[[[0, 1],
#               [2, 3],
#               [4, 5],
#               [6, 7],
#               [8, 9]]]])

# Pass -1 for one of the dimensions to have Numpy infer its size for you!
array.reshape(-1, 5)
# Out: array([[0, 1, 2, 3, 4],
#             [5, 6, 7, 8, 9]])
```

#### **Reshaping with np.newaxis**

```python
# Create as row
array[np.newaxis, :]  # Equivalent to array.reshape(1, 10)
# Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

# Create as column
array[:, np.newaxis]  # Equivalent to array.reshape(10, 1)
# Out: array([[0],
#             [1],
#             [2],
#             [3],
#             [4],
#             [5],
#             [6],
#             [7],
#             [8],
#             [9]])
```

#### **Flatten and Ravel**

```python
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

## Flatten creates a copy! (It's only available as a method; there is no np.flatten function)

array.flatten()
# Out: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])

## Ravel returns a view whenever possible! Editing the ravelled array will then edit the parent!

# Equivalent
array.ravel()
np.ravel(array)
# Out: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
```

#### **Squeeze**

Remove axes of length 1 (single-dimensional entries) from the shape.

```python
array = np.array([[[1]]])  # shape (1, 1, 1)

# Equivalent
np.squeeze(array)  # array(1), a 0-dimensional array
array.squeeze()    # array(1)
```

#### **Transpose**

```python
array = np.array([[1, 1], [2, 2]])

# Equivalent (for a 2-D array)
array.T
array.transpose()
np.transpose(array)
np.rollaxis(array, 1)
np.swapaxes(array, 0, 1)

# Out: array([[1, 2],
#             [1, 2]])
```

### 2.10 Array Concatenation and Splitting
[go to top](#top)

#### **Concatenating**

```python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([[7, 8, 9], [10, 11, 12]])

np.concatenate([a, b])
# Out: array([1, 2, 3, 4, 5, 6])

# You can do it with more than two arrays
np.concatenate([a, b, a, b])
# Out: array([1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6])

# Just make sure all inputs have the same number of dimensions, with matching sizes on every axis except the one you join along!
np.concatenate([c, c, c])
# Out: array([[ 7,  8,  9],
#             [10, 11, 12],
#             [ 7,  8,  9],
#             [10, 11, 12],
#             [ 7,  8,  9],
#             [10, 11, 12]])

# You may even choose a different axis to concatenate along!
np.concatenate([c, c, c], axis=1)
# Out: array([[ 7,  8,  9,  7,  8,  9,  7,  8,  9],
#             [10, 11, 12, 10, 11, 12, 10, 11, 12]])

# More examples
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

np.concatenate([array, array], axis=0)
# Out: array([[[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]],
#
#             [[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])

np.concatenate([array, array], axis=1)
# Out: array([[[ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9],
#              [ 1,  2,  3],
#              [ 4,  5,  6],
#              [ 7,  8,  9]],
#
#             [[10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18],
#              [10, 11, 12],
#              [13, 14, 15],
#              [16, 17, 18]]])

np.concatenate([array, array], axis=2)
# Out: array([[[ 1,  2,  3,  1,  2,  3],
#              [ 4,  5,  6,  4,  5,  6],
#              [ 7,  8,  9,  7,  8,  9]],
#
#             [[10, 11, 12, 10, 11, 12],
#              [13, 14, 15, 13, 14, 15],
#              [16, 17, 18, 16, 17, 18]]])
```

#### **Stacking**

```python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical Stack
np.vstack([a, b])
# Out: array([[1, 2, 3],
#             [4, 5, 6]])

# Horizontal Stack
np.hstack([a, b])
# Out: array([1, 2, 3, 4, 5, 6])

# Third Axis (depth) Stack (note how the output has 3 dimensions)
np.dstack([a, b])
# Out: array([[[1, 4],
#              [2, 5],
#              [3, 6]]])
```

#### **Splitting**

```python
array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Pass the indices to split at!
a, b, c = np.split(array, [1, 2])

a  # array([0])
b  # array([1])
c  # array([2, 3, 4, 5, 6, 7, 8, 9])

grid = np.arange(16).reshape((4, 4))
# Out: array([[ 0,  1,  2,  3],
#             [ 4,  5,  6,  7],
#             [ 8,  9, 10, 11],
#             [12, 13, 14, 15]])

upper, lower = np.vsplit(grid, [2])

upper  # array([[0, 1, 2, 3], [4, 5, 6, 7]])
lower  # array([[ 8,  9, 10, 11], [12, 13, 14, 15]])

left, right = np.hsplit(grid, [2])

left
# array([[ 0,  1],
#        [ 4,  5],
#        [ 8,  9],
#        [12, 13]])

right
# array([[ 2,  3],
#        [ 6,  7],
#        [10, 11],
#        [14, 15]])

# You can use dsplit too! But it only works on arrays with 3 or more dimensions
```

### 2.11 Array Arithmetic
[go to top](#top)

```python
array = np.arange(4)  # array([0, 1, 2, 3])

array + 5   # array([5, 6, 7, 8])
array - 5   # array([-5, -4, -3, -2])
array * 2   # array([0, 2, 4, 6])
array / 2   # array([0. , 0.5, 1. , 1.5])
array // 2  # array([0, 0, 1, 1])

-array      # array([ 0, -1, -2, -3])
array ** 2  # array([0, 1, 4, 9])
array % 2   # array([0, 1, 0, 1])

# Equivalent
np.add(array, 5)           # +
np.subtract(array, 5)      # -
np.multiply(array, 2)      # *
np.divide(array, 2)        # /
np.floor_divide(array, 2)  # //

np.negative(array)         # unary -
np.power(array, 2)         # **
np.mod(array, 2)           # %
```

### 2.12 More Array Math
[go to top](#top)

```python
array = np.array([0, -1, 2, -3, 4])
```

#### **Abs**

```python
abs(array)          # array([0, 1, 2, 3, 4])
np.abs(array)       # Same
np.absolute(array)  # Same
```

#### **Complex Modulus**

For complex numbers, `np.abs` returns the magnitude.

```python
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)  # array([5., 5., 2., 1.])
```

#### **Trigonometry**

```python
theta = np.linspace(0, np.pi, 3)  # array([0.        , 1.57079633, 3.14159265])

np.sin(theta)  # array([0.00000000e+00, 1.00000000e+00, 1.22464680e-16])
np.cos(theta)  # array([ 1.00000000e+00,  6.12323400e-17, -1.00000000e+00])
np.tan(theta)  # array([ 0.00000000e+00,  1.63312394e+16, -1.22464680e-16])
# (Values like 1.22e-16 and 1.63e+16 are floating-point round-off from np.pi; mathematically they are 0 and undefined)

# More Trigonometry
x = [-1, 0, 1]  # By the way, YES, this is a Native Python list!

np.arcsin(x)  # array([-1.57079633,  0.        ,  1.57079633]) NumPy turns the list into a numpy array!
np.arccos(x)  # array([3.14159265, 1.57079633, 0.        ])
np.arctan(x)  # array([-0.78539816,  0.        ,  0.78539816])
```

#### **Exponents**

```python
x = [1, 2, 3]

# e^x
np.exp(x)       # array([ 2.71828183,  7.3890561 , 20.08553692])

# 2^x
np.exp2(x)      # array([2., 4., 8.])

# 3^x
np.power(3, x)  # array([ 3,  9, 27])
```

#### **Logarithms**

```python
np.log(x)    # ln
np.log2(x)   # log base 2
np.log10(x)  # log base 10

# More precise than the naive formulas when x is very small
np.expm1(x)  # exp(x) - 1
np.log1p(x)  # log(1 + x)
```

#### **Reciprocal**

```python
np.reciprocal(x)                   # Basically power -1. Beware: x holds integers, so this gives array([1, 0, 0])!
np.reciprocal(np.array(x, float))  # array([1.        , 0.5       , 0.33333333])
```

#### **Return Range of Values**

```python
a = np.array([1, 2, 3, 4])

np.ptp(a)  # 3 (Maximum - Minimum, "peak to peak")
```

#### **Standard Deviation and Variance**

```python
np.std(x)  # Standard Deviation
np.var(x)  # Variance
```

There's a lot more! Check out the `scipy.special` package for many more special functions!

## 3. Going Deeper With Arrays

### 3.1 Broadcasting
[go to top](#top)

![array](assets/array.jpg)

Image source: https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

Broadcasting makes Numpy treat smaller arrays as if they were padded or 'stretched' to match larger arrays, so they can operate with each other (no actual copies are made)!

> Broadcasting is possible if the following rules are satisfied:
>
> - Array with smaller **ndim** than the other is prepended with '1' in its shape.
> - Size in each dimension of the output shape is maximum of the input sizes in that dimension.
> - An input can be used in calculation, if its size in a particular dimension matches the output size or its value is exactly 1.
> - If an input has a dimension size of 1, the first data entry in that dimension is used for all calculations along that dimension.
>
> A set of arrays is said to be **broadcastable** if the above rules produce a valid result and one of the following is true:
>
> - Arrays have exactly the same shape.
> - Arrays have the same number of dimensions and the length of each dimension is either a common length or 1.
> - Array having too few dimensions can have its shape prepended with a dimension of length 1, so that the above stated property is true.
>
> https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

**Example**

This is the example in the picture above!

![array](assets/array.jpg)

```python
a = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 20.0, 20.0], [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

a
# Out: array([[ 0.,  0.,  0.],
#             [10., 10., 10.],
#             [20., 20., 20.],
#             [30., 30., 30.]])

b
# Out: array([1., 2., 3.])

a + b  # b is stretched across every row of a
# Out: array([[ 1.,  2.,  3.],
#             [11., 12., 13.],
#             [21., 22., 23.],
#             [31., 32., 33.]])
```

**Uses**

Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html

**Centering An Array**

```python
X = np.random.random((10, 3))  # 10 observations, 3 features
Xmean = X.mean(axis=0)         # Mean of each column, shape (3,)

X_centered = X - Xmean         # (10, 3) - (3,) broadcasts across every row
X_centered.mean(axis=0)        # Close to [0, 0, 0] (up to round-off)
```

**Plotting a Two-Dimensional Array**

```python
# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)                 # shape (50,)
y = np.linspace(0, 5, 50)[:, np.newaxis]  # shape (50, 1)

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)  # Broadcasts to shape (50, 50)
```

### 3.2 Vectorize
[go to top](#top)

You'd have noticed that all of the functions above can act on every element in the array without needing for-loops!

You can get this ability for ANY function that you might want to write by using `np.vectorize`! (Per the NumPy docs, `np.vectorize` is provided mainly for convenience: it is essentially a for loop under the hood, so don't expect a speed-up.)

**Vectorize**

```python
def my_add_n(a, n):
    return a + n

vfunc = np.vectorize(my_add_n)

vfunc([0, 2, 4], 2)  # array([2, 4, 6])

# You may specify the output type explicitly as well
# Note: Down-casting will occur if you stated int but inputted floats!
vfunc_float = np.vectorize(my_add_n, otypes=[float])  # np.float was removed in NumPy 1.24; use float or np.float64
```

**Excluding Parameters**

```python
# You may also declare parameters that shouldn't be vectorized!
# Source: https://docs.scipy.org/doc/numpy-1.9.2/reference/generated/numpy.vectorize.html
def mypolyval(p, x):
    _p = list(p)
    res = _p.pop(0)
    while _p:
        res = res * x + _p.pop(0)
    return res

vpolyval = np.vectorize(mypolyval, excluded=['p'])

# Think of this like x^2 + 2x + 3, then feed in x = 0, x = 1 successively
vpolyval(p=[1, 2, 3], x=[0, 1])  # array([3, 6])

# Or exclude a positional argument by its index after creating the function
vpolyval.excluded.add(0)
vpolyval([1, 2, 3], x=[0, 1])  # array([3, 6])
```

### 3.3 Iterating Through Axes
[go to top](#top)

Instead of writing a for loop over an axis, you can use `np.apply_along_axis`, which calls your function on each 1-D slice along the given axis:

```python
array_to_parse = np.array([[1, 5, 2],
                           [4, 3, 6]])

def state_max(x):
    return np.max(x)

np.apply_along_axis(state_max, axis=0, arr=array_to_parse)  # array([4, 5, 6]), the max of each column
np.apply_along_axis(state_max, axis=1, arr=array_to_parse)  # array([5, 6]), the max of each row
```

### 3.4 Modifying Output Directly
[go to top](#top)

Ok. So now you've noticed that many of the functions above (`np.add`, `np.multiply`, `np.sin`, `np.exp`, ...) are vectorized functions. These are called UFuncs, or universal functions.

Here are some nifty things you can do with them!

For example, if you're dealing with a huge array:

```python
a = np.arange(5)
b = np.empty(5)

# Less efficient
b = np.multiply(a, 10)  # This allocates a brand new array and rebinds b to it

# More efficient
np.multiply(a, 10, out=b)  # This writes the result into b directly! This also works for array views!
```

### 3.5 Locating Elements
[go to top](#top)

#### **Where**

```python
a = np.array([1, 2, 3, 4, 5])

b = np.where(a > 3)  # (array([3, 4]),) It's the locations (indices) where the condition is satisfied!
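# Sketch: on a 2-D array, np.where(condition) returns one index array per dimension
# (`grid`, `rows`, `cols` are illustrative names). np.argwhere gives the same positions as (row, col) pairs.
import numpy as np

grid = np.array([[1, 5],
                 [7, 2]])
rows, cols = np.where(grid > 3)  # rows = array([0, 1]), cols = array([1, 0])
grid[rows, cols]                 # array([5, 7])
np.argwhere(grid > 3)            # array([[0, 1], [1, 0]])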
```

#### **Take**

```python
a.take(b)     # array([[4, 5]]) (b is a tuple, so the result gains an extra dimension)
a.take(b[0])  # array([4, 5])
```

**Where Cases**

```python
a = np.array([1, 2, 3, 4, 5])

# np.where(condition, x, y) picks from x where the condition is True, and from y elsewhere
b = np.where(a > 3, "NO", "YES")  # array(['YES', 'YES', 'YES', 'NO', 'NO'], dtype='<U3')
```

[go to top](#top)

#### **Reduce**

```python
x = np.array([1, 2, 3, 4])

np.add.reduce(x)       # 10 (which is 1 + 2 + 3 + 4)
np.multiply.reduce(x)  # 24 (which is 1 * 2 * 3 * 4)
```

#### **Accumulate**

Reduce, but show each step of the way!

```python
x = np.array([1, 2, 3, 4])

np.add.accumulate(x)       # array([ 1,  3,  6, 10])
np.multiply.accumulate(x)  # array([ 1,  2,  6, 24])
```

#### **Cumsum**

Cumulative sum

```python
x = np.array([1, 2, 3, 4])

# Equivalent: array([ 1,  3,  6, 10])
np.cumsum(x)
x.cumsum()
np.add.accumulate(x)
```

**Outer Product**

The outer product of two vectors u and v is the matrix whose (i, j) entry is u_i * v_j, i.e. the matrix product of u (as a column) with v (as a row)!

![outer_product](assets/583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg)

Image source: https://en.wikipedia.org/wiki/Outer_product

```python
x = np.array([1, 2, 3, 4])

np.multiply.outer(x, x)
# Out: array([[ 1,  2,  3,  4],
#             [ 2,  4,  6,  8],
#             [ 3,  6,  9, 12],
#             [ 4,  8, 12, 16]])
```

#### **Sum**

```python
np.sum(np.array([1, 2, 3, 4]))  # 10

# Beware of ragged (unequal-length) nested lists!
# NumPy 1.24+ raises a ValueError for np.array([[1, 2, 3, 4], [1, 2]]) unless you pass dtype=object,
# and then the array just holds two Python lists, so "summing" concatenates them:
np.sum(np.array([[1, 2, 3, 4], [1, 2]], dtype=object))  # [1, 2, 3, 4, 1, 2]
```

**Min and Max**

```python
np.min(x)  # Gives smallest element in array
np.max(x)  # Gives largest element in array

# You can specify the axis!
array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])

np.min(array, axis=0)
# Out: array([[1, 2, 3],
#             [4, 5, 6],
#             [7, 8, 9]])

np.min(array, axis=1)
# Out: array([[ 1,  2,  3],
#             [10, 11, 12]])

np.min(array, axis=2)
# Out: array([[ 1,  4,  7],
#             [10, 13, 16]])

# Same applies to max
```

#### **Mean**

```python
np.mean(x)
```

#### **Full List**

| Function Name   | NaN-safe Version   | Description                               |
| --------------- | ------------------ | ----------------------------------------- |
| `np.sum`        | `np.nansum`        | Compute sum of elements                   |
| `np.prod`       | `np.nanprod`       | Compute product of elements               |
| `np.mean`       | `np.nanmean`       | Compute mean of elements                  |
| `np.std`        | `np.nanstd`        | Compute standard deviation                |
| `np.var`        | `np.nanvar`        | Compute variance                          |
| `np.min`        | `np.nanmin`        | Find minimum value                        |
| `np.max`        | `np.nanmax`        | Find maximum value                        |
| `np.argmin`     | `np.nanargmin`     | Find index of minimum value               |
| `np.argmax`     | `np.nanargmax`     | Find index of maximum value               |
| `np.median`     | `np.nanmedian`     | Compute median of elements                |
| `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
| `np.any`        | N/A                | Evaluate whether any elements are true    |
| `np.all`        | N/A                | Evaluate whether all elements are true    |

**Note:** Most of the non-NaN-safe functions (`sum`, `prod`, `mean`, `std`, `var`, `min`, `max`, `argmin`, `argmax`, `any`, `all`) are also methods you can call on the array itself!

```python
x = np.array([1, 2, 3, 4])

# Equivalent!
np.sum(x)
x.sum()

# It even works for the arguments!
high_dim_array = np.ones((2, 3, 4))  # Any array with at least 3 dimensions
high_dim_array.sum(axis=2)  # And so on!
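# Sketch of why the NaN-safe versions in the table exist (`data` is just an illustrative name):
import numpy as np

data = np.array([1.0, np.nan, 3.0])
np.sum(data)      # nan: a single NaN poisons the result
np.nansum(data)   # 4.0: NaNs are treated as zero
np.nanmean(data)  # 2.0: NaNs are left out of the count
np.nanmax(data)   # 3.0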
```

### 3.7 Comparisons
[go to top](#top)

#### **Boolean Comparisons**

```python
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

b = a > 4
# Out: array([False, False, False, False, False,  True,  True,  True,  True,  True])

# This works for all the comparison operators!
# ==
# !=
# > , >=
# < , <=
```

#### **Maximum and Minimum**

Note: It's **not** `np.max` and `np.min`! Those reduce a single array, while `np.maximum` and `np.minimum` compare two arrays element by element.

```python
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])

np.maximum(a, b)  # array([5, 4, 3, 4, 5])
np.minimum(a, b)  # array([1, 2, 3, 2, 1])
```

#### **Any and All**

You can use Any and All too!

```python
np.any(x > 5)  # True if at least one element satisfies the condition
np.all(x < 0)  # True only if every element satisfies the condition
```

### 3.8 Sorting Arrays
[go to top](#top)

`np.sort` uses `kind='quicksort'` by default, though `'mergesort'`, `'heapsort'`, and `'stable'` are also options.
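The `kind` option mostly matters when equal values must keep their original order. Here is a small sketch, assuming that is what you need (the `scores` and `names` arrays are purely illustrative): `kind='stable'` guarantees that ties keep their input order, which the default quicksort does not promise.

```python
import numpy as np

scores = np.array([3, 1, 3, 2, 1])
names = np.array(['a', 'b', 'c', 'd', 'e'])

# Stable sort: among equal scores, the original order is preserved
order = np.argsort(scores, kind='stable')
order         # array([1, 4, 3, 0, 2])
names[order]  # array(['b', 'e', 'd', 'a', 'c'], dtype='<U1')

# Every kind gives the same sorted values; only the order of ties can differ
np.sort(scores, kind='heapsort')  # array([1, 1, 2, 3, 3])
```

A stable sort is what makes multi-key sorting work: sort by the secondary key first, then stable-sort by the primary key.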

#### **Sort**

```python
x = np.array([2, 1, 4, 3, 5])

# Does not mutate x
np.sort(x)  # array([1, 2, 3, 4, 5])

# Return the indices that would sort x instead
np.argsort(x)  # array([1, 0, 3, 2, 4])

# Mutates x (sorts in place and returns None)
x.sort()  # x is now array([1, 2, 3, 4, 5])
```

#### **Sort Along Axes**

```python
array = np.array([[[9, 2, 1], [4, 2, 6], [17, 8, 9]],
                  [[190, 11, 12], [13, 14, 115], [16, 17, 18]]])

np.sort(array, axis=0)
# Out: array([[[  9,   2,   1],
#              [  4,   2,   6],
#              [ 16,   8,   9]],
#
#             [[190,  11,  12],
#              [ 13,  14, 115],
#              [ 17,  17,  18]]])

np.sort(array, axis=1)
# Out: array([[[  4,   2,   1],
#              [  9,   2,   6],
#              [ 17,   8,   9]],
#
#             [[ 13,  11,  12],
#              [ 16,  14,  18],
#              [190,  17, 115]]])
```

#### **Partial Sorts**

```python
x = np.array([7, 2, 3, 1, 6, 5, 4])

# The value at index 3 lands in its sorted position: everything before it is <= it,
# everything after it is >= it, and each side is in arbitrary order
# This also works along any axis, like in the previous example
np.partition(x, 3, axis=0)  # e.g. array([2, 1, 3, 4, 6, 5, 7])
```

### 3.9 Fancy Indexing
[go to top](#top)

We know how to index, slice, and apply Boolean masks (conditional indexing), but we can pass arrays of indices too!

```python
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[x[3], x[4], x[8]]  # [4, 5, 9]

ind = [3, 4, 8]
x[ind]  # array([4, 5, 9])

# This is particularly useful because the result takes the shape of the index array, so fancy indexing can RESHAPE the output!
ind = np.array([[3, 4], [8, 0]])
x[ind]
# array([[4, 5],
#        [9, 1]])

# You can also do it in multiple dimensions
x = np.array([[1, 2],
              [3, 4]])

row = np.array([0, 1])  # Row indices
col = np.array([0, 1])  # Column indices (paired up with the row indices)

x[row, col]  # array([1, 4]), i.e. x[0, 0] and x[1, 1]

# Also works with broadcasting
x[row[:, np.newaxis], col]
# Out: array([[1, 2],
#             [3, 4]])
```

#### **Combined Indexing**

Combine fancy indexing with normal indexing!

```python
x = np.arange(12).reshape(3, 4)
# Out: array([[ 0,  1,  2,  3],
#             [ 4,  5,  6,  7],
#             [ 8,  9, 10, 11]])

x[2, [2, 0, 1]]  # array([10,  8,  9])
x[1:, [2, 0, 1]]
# Out: array([[ 6,  4,  5],
#             [10,  8,  9]])
```

### 3.10 Structured Arrays
[go to top](#top)

Arrays of mixed type!

Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html

```python
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})

data.dtype  # [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
```

Matrices (`np.matrix`) are strictly 2-dimensional subclasses of ndarray!

You create them much like ndarrays, using the `numpy.matlib` module. (Heads up: the NumPy docs no longer recommend `np.matrix`, even for linear algebra; plain 2-D arrays with `@` are the preferred route.)

```python
import numpy as np
from numpy import matlib  # Importing numpy.matlib may emit a PendingDeprecationWarning

matlib.empty((2, 2))  # Uninitialised 2x2 matrix
matlib.zeros((2, 2))
matlib.ones((2, 2))
matlib.eye(3)
matlib.identity(3)
matlib.rand(2, 2)     # Random values in [0, 1)

# You can even convert an existing 2-D array
some_numpy_array = np.array([[1, 2], [3, 4]])
m = np.asmatrix(some_numpy_array)

# Useful methods
m.diagonal()  # Get the diagonal

# You can sort them, and do general ndarray stuff with them as well!
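# Sketch of how np.matrix keeps its 2-D shape where a plain array would not
# (`arr` and `mat` are illustrative names):
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.asmatrix(arr)

arr[0].shape           # (2,): indexing one row of an ndarray drops a dimension
mat[0].shape           # (1, 2): a matrix row is still a 2-D matrix
arr.sum(axis=1).shape  # (2,)
mat.sum(axis=1).shape  # (2, 1)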
```

### 4.1 Linear Algebra Functions
[go to top](#top)

```python
A = np.array([[2., 1.], [1., 3.]])  # Example square matrices
B = np.array([[1., 0.], [0., 2.]])
u = np.array([1., 2.])              # Example vectors
v = np.array([3., 4.])

np.dot(A, B)     # Dot product of two arrays (the matrix product for 2-D arrays)
np.vdot(u, v)    # Dot product of two vectors (flattens inputs, conjugates the first)

np.inner(u, v)   # Inner product of two arrays
np.matmul(A, B)  # Matrix multiplication

np.linalg.det(A)  # Determinant
np.linalg.inv(A)  # Inverse matrix

np.linalg.solve(A, u)  # Solve the linear system A @ x = u for x
```

**Special Note: Element-wise vs Matrix Multiplication**

There are shorthand operators for these!

```python
# Suppose we have two 2-D arrays A and B

# np.multiply(A, B)
A * B  # Element-wise multiplication (NOT the dot product!)

# np.matmul(A, B)
A @ B  # Matrix multiplication

# Beware: if A and B are np.matrix objects instead, A * B means matrix multiplication!
```

## 5. Numpy I/O

### 5.1 Import from CSV
[go to top](#top)

https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.genfromtxt.html

```python
path = 'path_to_csv'
data = np.genfromtxt(path,
                     delimiter=',',
                     skip_header=1,        # Number of lines to skip at beginning
                     filling_values=-999,  # Value to use when data is missing
                     dtype='float')

# If you set dtype=None, NumPy infers each column's type and returns a structured array,
# so each row shows up as a record (printed like a tuple), e.g.
# (18., 8, 307., 130, 3504, 12., 70, 1, b'"some_string_stuff"')
```

### 5.2 Saving and Loading
[go to top](#top)

```python
# Save One Array
np.save('data.npy', array)

# Save Multiple Arrays
np.savez('data_mult.npz', a=array_a, b=array_b)

# Load
single = np.load('data.npy')
mult = np.load('data_mult.npz')  # Must match the file name used with np.savez

a = mult['a']
b = mult['b']
```

**Save and Load as txt**

```python
np.savetxt('out.txt', array)  # For 1-D or 2-D arrays

np.loadtxt('out.txt')
```

```
. .
. |\-^-/| .
/| } O.=.O { |\
```

------

[![Yeah! Buy the DRAGON a COFFEE!](../assets/COFFEE%20BUTTON%20%E3%83%BE(%C2%B0%E2%88%87%C2%B0%5E).png)](https://www.buymeacoffee.com/methylDragon)
--------------------------------------------------------------------------------