├── .gitignore
├── Numpy
│   ├── assets
│   │   ├── array.jpg
│   │   ├── kZNzz.png
│   │   ├── vceRQ.png
│   │   ├── Matrix.svg.png
│   │   ├── elsp_0105.png
│   │   ├── array_vs_list.png
│   │   └── 583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg
│   └── 01 Numpy Basics.md
├── Pandas
│   ├── assets
│   │   ├── hMKKt.jpg
│   │   ├── structure_table.jpg
│   │   ├── structure_table-1557216961120.jpg
│   │   └── series-and-dataframe.width-1200.png
│   └── 01 Pandas Basics.md
├── assets
│   └── COFFEE BUTTON ヾ(°∇°^).png
├── README.md
└── LICENSE
/.gitignore:
--------------------------------------------------------------------------------
1 |
2 | *.no_toc
3 |
--------------------------------------------------------------------------------
/Numpy/assets/array.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array.jpg
--------------------------------------------------------------------------------
/Numpy/assets/kZNzz.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/kZNzz.png
--------------------------------------------------------------------------------
/Numpy/assets/vceRQ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/vceRQ.png
--------------------------------------------------------------------------------
/Pandas/assets/hMKKt.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/hMKKt.jpg
--------------------------------------------------------------------------------
/Numpy/assets/Matrix.svg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/Matrix.svg.png
--------------------------------------------------------------------------------
/Numpy/assets/elsp_0105.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/elsp_0105.png
--------------------------------------------------------------------------------
/Numpy/assets/array_vs_list.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Numpy/assets/array_vs_list.png
--------------------------------------------------------------------------------
/assets/COFFEE BUTTON ヾ(°∇°^).png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/assets/COFFEE BUTTON ヾ(°∇°^).png
--------------------------------------------------------------------------------
/Pandas/assets/structure_table.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table.jpg
--------------------------------------------------------------------------------
/Pandas/assets/structure_table-1557216961120.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/structure_table-1557216961120.jpg
--------------------------------------------------------------------------------
/Pandas/assets/series-and-dataframe.width-1200.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/python-data-tools-reference/master/Pandas/assets/series-and-dataframe.width-1200.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # python-data-tools-reference
2 | A reference of frameworks and tools for data and ML in Python
3 |
4 |
5 |
6 | A one-stop collection of code references, snippets, and tutorials for some of the most widely used tools and frameworks for data manipulation and ML in Python.
7 |
8 | I'm building these as I'm learning or using them, so they won't be comprehensive, but maybe they'll be of use!
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 methylDragon
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Numpy/assets/583d2f9f02f2644aa0acd092a29a9d0e49df1b4a.svg:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/Pandas/01 Pandas Basics.md:
--------------------------------------------------------------------------------
1 | # Pandas Basics
2 |
3 | Author: methylDragon
4 | Contains a syntax reference and code snippets for Pandas!
5 | It's a collection of code snippets and tutorials from everywhere all mashed together!
6 |
7 | ------
8 |
9 | ## Pre-Requisites
10 |
11 | ### Required
12 |
13 | - Python knowledge, this isn't a tutorial!
14 | - Pandas installed
15 |
16 | - I'll assume you've already run these lines as well
17 |
18 | ```python
19 | import numpy as np
20 | import pandas as pd
21 | ```
22 |
23 |
24 |
25 | ## Table Of Contents
26 |
27 | 1. [Introduction](#1)
28 | 2. [Pandas Basics](#2)
29 | 2.1 [Data Types](#2.1)
30 | 2.2 [Series Basics](#2.2)
31 | 2.3 [DataFrame Basics](#2.3)
32 | 2.4 [Panel Basics](#2.4)
33 |    2.5  [Categorical Data](#2.5)
34 | 2.6 [Basic Binary Operations](#2.6)
35 | 2.7 [Casting and Conversion](#2.7)
36 | 2.8 [Conditional Indexing](#2.8)
37 | 2.9 [IO](#2.9)
38 | 2.10 [Plotting](#2.10)
39 | 2.11 [Sparse Data](#2.11)
40 | 3. [Series Operations](#3)
41 | 3.1 [Manipulating Series Text](#3.1)
42 | 3.2 [Time Series](#3.2)
43 | 3.3 [Time Deltas](#3.3)
44 | 4. [DataFrame Operations](#4)
45 | 4.1 [Preface](#4.1)
46 | 4.2 [Iterating Through DataFrames](#4.2)
47 | 4.3 [Sorting, Reindexing, and Renaming DataFrame Values](#4.3)
48 | 4.4 [Replacing DataFrame Values](#4.4)
49 | 4.5 [Function Application on DataFrames](#4.5)
50 | 4.6 [Descriptive Statistics](#4.6)
51 | 4.7 [Statistical Methods](#4.7)
52 | 4.8 [Window Functions](#4.8)
53 | 4.9 [Data Aggregation](#4.9)
54 | 4.10 [Dealing with Missing Data](#4.10)
55 | 4.11 [GroupBy Operations](#4.11)
56 | 4.12 [Merging and Joining](#4.12)
57 | 4.13 [Concatenation](#4.13)
58 | 5. [EXTRA: Helpful Notes](#5)
59 |
60 |
61 |
62 |
63 | ## 1. Introduction
64 |
65 | > *pandas* is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the [Python](https://www.python.org/) programming language.
66 | >
67 | > *pandas* is a [NumFOCUS](https://www.numfocus.org/open-source-projects.html) sponsored project. This will help ensure the success of development of *pandas* as a world-class open-source project, and makes it possible to [donate](https://pandas.pydata.org/donate.html) to the project.
68 | >
69 | >
70 |
71 | This document will list the most commonly used functions in Pandas, to serve as a reference when using it.
72 |
73 | It's not meant to be exhaustive, merely acting as a quick reference for the syntax for basic operations with Pandas. Please do not hesitate to consult the [official documentation](https://pandas.pydata.org/docs/) whenever necessary.
74 | 
77 | Some key features of Pandas:
78 | 
79 | - Fast and efficient DataFrame object with default and customized indexing.
80 | - Tools for loading data into in-memory data objects from different file formats.
81 | - Data alignment and integrated handling of missing data.
82 | - Reshaping and pivoting of data sets.
83 | - Label-based slicing, indexing and subsetting of large data sets.
84 | - Columns from a data structure can be deleted or inserted.
85 | - Group-by functionality for aggregation and transformation.
86 | - High performance merging and joining of data.
87 | - Time Series functionality.
88 |
89 | ---
90 |
91 | Install it!
92 |
93 | ```shell
94 | # Best to use conda
95 | $ conda install pandas
96 |
97 | # But it's possible to use the PyPI wheels as well
98 | $ pip install pandas
99 | ```
100 |
101 | You might also need to install additional dependencies
102 |
103 | ```shell
104 | $ sudo apt-get install python-numpy python-scipy python-matplotlib ipython \
105 |       ipython-notebook python-pandas python-sympy python-nose
106 | ```
107 |
108 |
109 |
110 | If you need additional help or need a refresher on the parameters, feel free to use:
111 |
112 | ```python
113 | help(pd.FUNCTION_YOU_NEED_HELP_WITH)
114 | ```
115 |
116 | ---
117 |
118 | **Credits:**
119 |
120 | A lot of these notes I'm adapting from
121 |
122 |
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 | ## 2. Pandas Basics
131 |
132 | ### 2.1 Data Types
133 | [go to top](#top)
134 |
135 |
136 | Note that Pandas is built on top of Numpy.
137 |
138 | There are three types of data structures that Pandas deals with:
139 |
140 | - Series
141 | - 1D labelled homogeneous array, size-immutable
142 | - If heterogeneous data is entered, the data-type will become 'object'
143 | - DataFrame
144 | - Contains series data
145 | - 2D labelled, size-mutable, table structure
146 | - Potentially heterogeneous columns
147 | - Panel
148 | - Contains DataFrames
149 | - 3D labelled, size-mutable array
150 |
151 | **The major focus of this syntax reference will be DataFrames**, since they're the most commonly manipulated objects where Pandas is concerned.
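
For a quick feel of how the first two relate, here's a minimal sketch (the values are made up for illustration; also note that Panel has since been deprecated in newer Pandas versions in favour of MultiIndex DataFrames):

```python
# A Series: 1D labelled data
s = pd.Series([10, 5, 2], index=['methylDragon', 'toothless', 'smaug'])

# A DataFrame: 2D table whose columns are Series
df = pd.DataFrame({'Rating': s, 'CanFly': [True, True, True]})

print(type(df['Rating']))  # <class 'pandas.core.series.Series'>
```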
152 |
153 |
154 |
155 | ### 2.2 Series Basics
156 | [go to top](#top)
157 |
158 |
159 | > A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index.
160 | >
161 | >
162 |
163 | 
164 |
165 | [Image Source]()
166 |
167 | #### **Creating Series Objects**
168 |
169 | ```python
170 | # Empty Series
171 | s = pd.Series()
172 |
173 | # Series from ndarray
174 | s = pd.Series(np.array([1, 2, 3]))
175 | s = pd.Series(np.array([1, 2, 3]), index=[100, 101, 102]) # With custom indexing
176 |
177 | # Series from Dict
178 | # Dictionary keys are used to construct the index
179 | s = pd.Series({'a': 0, 'b': 1, 'c': 2})
180 |
181 | # Series from scalar
182 | s = pd.Series(5, index=[0, 1, 2]) # Creates 3 rows of value 5
183 | ```
184 |
185 | #### **Accessing Values**
186 |
187 | ```python
188 | # By position
189 | s[0]
190 |
191 | # By index
192 | s['index_name']
193 |
194 | # By slice
195 | s[-3:] # Retrieves last 3 elements
196 |
197 | # Fancy indexing works also!
198 | s[[0, 1, 2]]
199 | s[['index_1', 'index_2', 'index_3']]
200 |
201 | # Head and Tail
202 | s.head()
203 | s.tail()
204 | s.head(5) # First 5
205 | s.tail(5) # Last 5
206 | ```
207 |
208 | #### **Series Properties**
209 |
210 | ```python
211 | s.axes # Returns list of row axis labels
212 | s.dtype # Returns data type of entries
213 | s.empty # True if series is empty
214 | s.ndim # Dimension. 1 for series
215 | s.size # Number of elements
216 | s.values # Returns the Series as an ndarray
217 | ```
218 |
219 |
220 |
221 | ### 2.3 DataFrame Basics
222 | [go to top](#top)
223 |
224 |
225 | > A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
226 | >
227 | >
228 |
229 | 
230 |
231 | Image Source:
232 |
233 | #### **Creating DataFrame Objects**
234 |
235 | ```python
236 | # Empty DataFrame
237 | df = pd.DataFrame()
238 |
239 | # DataFrame from List
240 | df = pd.DataFrame([1, 2, 3, 4, 5]) # Single Column
241 | df = pd.DataFrame([['a', 1], ['b', 2]], columns=['name_1', 'name_2']) # Multi columns
242 | df = pd.DataFrame([1, 2, 3], dtype=float) # Convert the ints to floats
243 |
244 | # DataFrame from Series
245 | df = s.to_frame()
246 |
247 | # DataFrame from Dict of Lists
248 | df = pd.DataFrame({'Name':['methylDragon', 'toothless', 'smaug'], 'Rating': [10, 5, 2]})
249 |
250 | # DataFrame from List of Dicts
251 | df = pd.DataFrame([{'Name': 'methylDragon', 'Rating': 10},
252 | {'Name': 'toothless', 'Rating': 5},
253 | {'Name': 'smaug'}]) # NaN will be appended for missing values
254 |
255 | # DataFrame from Dict of Series
256 | # Similarly, NaN will be added for missing values
257 | df = pd.DataFrame({'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
258 | 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])})
259 |
260 | # Creating with Non-Default Index
261 | df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'])
262 | ```
263 |
264 | #### **Important Note on Mutability**
265 |
266 | **NOTE:** Most operations will **not** change the original DataFrame unless the DataFrame is **reassigned**, or you use an `inplace=True` flag, which changes the DataFrame in question in place.
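
A minimal illustration of the difference (the column name here is just a placeholder):

```python
df.sort_values('Rating')               # Returns a sorted copy; df itself is unchanged
df = df.sort_values('Rating')          # Reassign to keep the result
df.sort_values('Rating', inplace=True) # Or mutate df directly
```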
267 |
268 | #### **Basic Operations**
269 |
270 | **Column**
271 |
272 | ```python
273 | # Column Selection
274 | df['column_name']
275 | df.column_name # This also works! (Only if the column name is a valid Python identifier though!)
276 |
277 | # Column Selection by dtype
278 | df.select_dtypes(include=[dtypes])
279 |
280 | # Adding a new Column
281 | df['new_column_name'] = pd.Series([1, 2, 3])
282 |
283 | # Deleting a Column (Either one works)
284 | del df['column_name']
285 | df.pop('column_name') # pop takes a single column label
286 |
287 | # Math for Columns
288 | df['column_1'] + df['column_2'] # Gives you a new column that is the addition of the first two
289 | ```
290 |
291 | **Row**
292 |
293 | ```python
294 | # Row Selection by Label
295 | df.loc['row_label/index']
296 |
297 | # Row Selection by Position Index
298 | df.iloc[0] # Selects first row
299 |
300 | # Row Slicing
301 | df[-3:]
302 |
303 | # Adding Rows
304 | df.append(df2)
305 | df.append(df2, ignore_index=True) # To ignore indices
306 |
307 | # Deleting Rows
308 | df.drop('label_to_drop')
309 |
310 | # Deleting rows with None/NaN/empty values
311 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
312 | df.dropna(axis=0, how='any') # Drop rows with any column containing None
313 | df.dropna(axis=0, how='all') # Drop rows with all columns containing None
314 | df.dropna(axis=0, thresh=2) # Keep only rows with at least 2 non-null values
315 |
316 | # Head and Tail
317 | df.head()
318 | df.tail()
319 | df.head(5) # First 5 rows
320 | df.tail(5) # Last 5 rows
321 | ```
322 |
323 | #### **DataFrame Properties**
324 |
325 | ```python
326 | df.T # Transpose
327 | df.axes # Row axis and column axis labels
328 | df.dtypes # Data types of elements
329 | df.empty # True if empty
330 | df.ndim # Dimension (number of axes)
331 | df.shape # Tuple representing the shape (dimensionality) of the DataFrame
332 | df.size # Number of elements
333 | df.values # Numpy representation of the DataFrame
334 | ```
335 |
336 |
337 |
338 | ### 2.4 Panel Basics
339 | [go to top](#top)
340 |
341 |
342 | > A **panel** is a 3D container of data. The term **Panel data** is derived from econometrics and is partially responsible for the name pandas − **pan(el)-da(ta)**-s.
343 | >
344 | > The names for the 3 axes are intended to give some semantic meaning to describing operations involving panel data. They are −
345 | >
346 | > - **items** − axis 0, each item corresponds to a DataFrame contained inside.
347 | > - **major_axis** − axis 1, it is the index (rows) of each of the DataFrames.
348 | > - **minor_axis** − axis 2, it is the columns of each of the DataFrames.
349 | >
350 | >
351 |
352 | #### **Creating Panel Objects**
353 |
354 | ```python
355 | # Empty Panel
356 | p = pd.Panel()
357 |
358 | # Panel from 3D ndarray
359 | p = pd.Panel(np.random.rand(2, 4, 5))
360 |
361 | # Panel from dict of DataFrames
362 | data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
363 | 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
364 | p = pd.Panel(data)
365 | ```
366 |
367 | #### **Accessing Values**
368 |
369 | ```python
370 | # By Item
371 | p['Item1'] # Gives you the corresponding dataframe
372 |
373 | # By Major Axis
374 | p.major_xs(1) # Shows all data from the second row across all dataframes
375 |
376 | '''
377 | Eg: If the panel's first item is as such:
378 | 0 1 2
379 | 0 0.488224 -0.128637 0.930817
380 | >> 1 0.417497 0.896681 0.576657 <<
381 | 2 -2.775266 0.571668 0.290082
382 | 3 -0.400538 -0.144234 1.110535
383 |
384 | Then the Output of p.major_xs(1) is:
385 | Item1
386 | 0 0.417497
387 | 1 0.896681
388 | 2 0.576657
389 |
390 | It's a transpose of the second row's elements (of the original DataFrame)!
391 | '''
392 |
393 | # By Minor Axis
394 | p.minor_xs(1)
395 |
396 | '''
397 | Eg: Same deal as above, same first item
398 |
399 | Output of p.minor_xs(1) are the items under the second column (of the original DataFrame)!
400 |
401 | Item1
402 | 0 -0.128637
403 | 1 0.896681
404 | 2 0.571668
405 | 3 -0.144234
406 | '''
407 | ```
408 |
409 |
410 |
411 | ### 2.5 Categorical Data
412 | [go to top](#top)
413 |
414 |
415 | So imagine you have data that's made up of a limited number of distinct values
416 | 
417 | Eg: [1, 1, 1, 3, 2, 3, 2, 1, 2, 3, 1]
418 | 
419 | There's a way to encode the fact that there are only three kinds of values - Categories!
420 | 
421 | #### **Construct Categorical Data**
422 |
423 | ```python
424 | # Source: https://www.tutorialspoint.com/python_pandas/python_pandas_categorical_data.htm
425 |
426 | s = pd.Series(["a","b","c","a"], dtype="category")
427 | '''
428 | Output
429 |
430 | 0 a
431 | 1 b
432 | 2 c
433 | 3 a
434 | dtype: category
435 | Categories (3, object): [a, b, c]
436 | '''
437 |
438 | # Generate just a list-like object with categories
439 | cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
440 | # [a, b, c, a, b, c]
441 | # Categories (3, object): [a, b, c]
442 |
443 | # Or do it with stated categories!
444 | cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'])
445 | # [a, b, c, a, b, c, NaN]
446 | # Categories (3, object): [c, b, a]
447 |
448 | # Specify categories with an ordering
449 | # This one implies c < b < a
450 | cat = pd.Categorical(['a','b','c','a','b','c','d'], ['c', 'b', 'a'],ordered=True)
451 | ```
452 |
453 | #### **Properties and Altering Categories**
454 |
455 | ```python
456 | df.describe() # General summary (works on categorical columns too)
457 | s.cat.categories # Find the categories (a property, not a method)
458 | s.cat.ordered # Check whether the categories are ordered
459 | s.cat.rename_categories(['x', 'y', 'z']) # Rename/edit the categories
460 |
461 | # Add categories
462 | s = s.cat.add_categories([4])
463 | 
464 | # Remove categories
465 | s.cat.remove_categories("a")
466 | 
467 | # Compare categories
468 | # You may compare categorical data, aligned by category
469 | cat = pd.Series([1,2,3]).astype(pd.CategoricalDtype([1,2,3], ordered=True))
470 | cat1 = pd.Series([2,2,2]).astype(pd.CategoricalDtype([1,2,3], ordered=True))
471 |
472 | cat > cat1
473 | '''
474 | Output
475 |
476 | 0 False
477 | 1 False
478 | 2 True
479 | dtype: bool
480 | '''
481 | ```
482 |
483 |
484 |
485 | ### 2.6 Basic Binary Operations
486 | [go to top](#top)
487 |
488 |
489 | https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html#pandas.DataFrame.add
490 |
491 | #### **Arithmetic**
492 |
493 | ```python
494 | df.add(other)
495 | df.sub(other)
496 | df.mul(other)
497 | df.div(other)
498 | df.truediv(other) # True (float) division
499 | df.floordiv(other) # Floor (integer) division
500 | df.mod(other)
501 | df.pow(other)
502 | df.divmod(other) # Returns tuple of (quotient, remainder)
503 |
504 | df.radd(other) # Reverse
505 | df.rsub(other) # Reverse
506 |
507 | # You may specify fill-values for missing values too!
508 | df.add(other, fill_value=0)
509 | ```
510 |
511 | #### **Boolean Reductions**
512 |
513 | ```python
514 | (df > 0).all()
515 |
516 | # empty, any, all, bool all work.
517 |
518 | # You can also do comparisons! (Eg. ==, >, etc.)
519 | ```
520 |
521 |
522 |
523 | ### 2.7 Casting and Conversion
524 | [go to top](#top)
525 |
526 |
527 | ```python
528 | # Casting object to dtype
529 | df.astype(dtype)
530 | df.astype(dtype, copy=False) # Do not return a copy
531 |
532 | # Attempt to infer better dtypes for object columns (convert_objects() is removed in newer Pandas)
533 | df.infer_objects() # Infer better dtypes
534 | pd.to_numeric(df['col'], errors='coerce') # Unconvertibles become NaN (similarly, pd.to_datetime gives NaT)
535 | ```
536 |
537 |
538 |
539 | ### 2.8 Conditional Indexing
540 | [go to top](#top)
541 |
542 |
543 | So you remember that fancy indexing works?
544 |
545 | ```python
546 | # Now you can do it with conditions too!
547 | df[df > 0]
548 | df.where(df > 0)
549 | ```
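
Both of those keep the DataFrame's shape and fill the non-matching cells with `NaN`; `where` additionally lets you choose what the non-matching cells become. A small sketch:

```python
df.where(df > 0, 0)   # Non-matching cells become 0 instead of NaN
df[df > 0].fillna(0)  # Equivalent result via boolean masking + fillna
```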
550 |
551 |
552 |
553 | ### 2.9 IO
554 | [go to top](#top)
555 |
556 |
557 |
558 |
559 | | Format Type | Data Description | Reader | Writer |
560 | | :---------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
561 | | text | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | [read_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-csv-table) | [to_csv](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-store-in-csv) |
562 | | text | [JSON](http://www.json.org/) | [read_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-reader) | [to_json](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-json-writer) |
563 | | text | [HTML](https://en.wikipedia.org/wiki/HTML) | [read_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-read-html) | [to_html](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-html) |
564 | | text | Local clipboard | [read_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) | [to_clipboard](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-clipboard) |
565 | | binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [read_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-reader) | [to_excel](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-excel-writer) |
566 | | binary | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) | [to_hdf](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-hdf5) |
567 | | binary | [Feather Format](https://github.com/wesm/feather) | [read_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) | [to_feather](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-feather) |
568 | | binary | [Msgpack](http://msgpack.org/index.html) | [read_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) | [to_msgpack](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-msgpack) |
569 | | binary | [Stata](https://en.wikipedia.org/wiki/Stata) | [read_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-reader) | [to_stata](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-stata-writer) |
570 | | binary | [SAS](https://en.wikipedia.org/wiki/SAS_(software)) | [read_sas](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sas-reader) | |
571 | | binary | [Python Pickle Format](https://docs.python.org/3/library/pickle.html) | [read_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) | [to_pickle](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-pickle) |
572 | | SQL | [SQL](https://en.wikipedia.org/wiki/SQL) | [read_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) | [to_sql](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-sql) |
573 | | SQL | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery) | [read_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) | [to_gbq](https://pandas.pydata.org/pandas-docs/version/0.20/io.html#io-bigquery) |
574 |
575 | ```python
576 | # Custom Indexing
577 | pd.read_csv("file", index_col=['index_col_name'])
578 |
579 | # With converted datatypes
580 | pd.read_csv("file", dtype={'col': dtype})
581 |
582 | # Column names
583 | pd.read_csv("file", names=['1', 'b', 'etc'])
584 |
585 | # Skip rows
586 | pd.read_csv("file", skiprows=2)
587 | ```
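
Writing goes through the matching `to_*` methods. A small sketch (the file name is just an example):

```python
df.to_csv("out.csv", index=False)  # Don't write the index as a column
df = pd.read_csv("out.csv")        # Round-trip it back
```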
588 |
589 |
590 |
591 | ### 2.10 Plotting
592 | [go to top](#top)
593 |
594 |
595 | Source:
596 |
597 | ```python
598 | df.plot() # Line plot
599 | df.plot.bar() # Bar chart
600 | df.plot.bar(stacked=True) # Stacked bar chart
601 | df.plot.barh() # Horizontal bar chart
602 | df.plot.barh(stacked=True) # Horizontal stacked bar chart
603 | df.plot.hist(bins=20) # Plot histogram
604 | df.hist(bins=30) # Plot a separate histogram for each column
605 | df.plot.box() # Box plot
606 | df.plot.area() # Area plot
607 | df.plot.scatter(x='a', y='b') # Scatter plot
608 | df.plot.pie(subplots=True) # Pie plot
609 | ```
610 |
611 |
612 |
613 | ### 2.11 Sparse Data
614 | [go to top](#top)
615 |
616 |
617 | You can sparsify data to save space on disk or in the interpreter's memory!
618 |
619 | ```python
620 | # Sparsify
621 | sparse_obj = obj.to_sparse() # Default sparsifies NaN/missing
622 | sparse_obj = obj.to_sparse(fill_value=0) # Sparsify target value
623 |
624 | # Convert back
625 | sparse_obj.to_dense()
626 |
627 | # Properties
628 | sparse_obj.density
629 | ```
630 |
631 |
632 |
633 | ## 3. Series Operations
634 |
635 | ### 3.1 Manipulating Series Text
636 | [go to top](#top)
637 |
638 |
639 | Source:
640 |
641 | | 1 | **lower()**Converts strings in the Series/Index to lower case. |
642 | | ---- | ------------------------------------------------------------ |
643 | | 2 | **upper()**Converts strings in the Series/Index to upper case. |
644 | | 3 | **len()**Computes String length(). |
645 | | 4 | **strip()**Helps strip whitespace(including newline) from each string in the Series/index from both the sides. |
646 | | 5 | **split(' ')**Splits each string with the given pattern. |
647 | | 6 | **cat(sep=' ')**Concatenates the series/index elements with given separator. |
648 | | 7 | **get_dummies()**Returns the DataFrame with One-Hot Encoded values. |
649 | | 8 | **contains(pattern)**Returns a Boolean value True for each element if the substring contains in the element, else False. |
650 | | 9 | **replace(a,b)**Replaces the value **a** with the value **b**. |
651 | | 10 | **repeat(value)**Repeats each element with specified number of times. |
652 | | 11 | **count(pattern)**Returns count of appearance of pattern in each element. |
653 | | 12 | **startswith(pattern)**Returns true if the element in the Series/Index starts with the pattern. |
654 | | 13 | **endswith(pattern)**Returns true if the element in the Series/Index ends with the pattern. |
655 | | 14 | **find(pattern)**Returns the first position of the first occurrence of the pattern. |
656 | | 15 | **findall(pattern)**Returns a list of all occurrence of the pattern. |
657 | | 16 | **swapcase**Swaps the case lower/upper. |
658 | | 17 | **islower()**Checks whether all characters in each string in the Series/Index in lower case or not. Returns Boolean |
659 | | 18 | **isupper()**Checks whether all characters in each string in the Series/Index in upper case or not. Returns Boolean. |
660 | | 19 | **isnumeric()**Checks whether all characters in each string in the Series/Index are numeric. Returns Boolean. |
661 |
662 | #### **Example**
663 |
664 | ```python
665 | s.str.lower()
666 | ```
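
These are all used through the `.str` accessor, as in the example above. A few more small sketches (the Series contents are made up for illustration):

```python
s = pd.Series(['Methyl Dragon', 'toothless ', None])

s.str.strip()                          # Trims whitespace; missing values stay missing
s.str.contains('dragon', case=False)   # Boolean mask (NaN for missing entries)
s.str.split(' ')                       # Lists of tokens per element
```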
667 |
668 |
669 |
670 | ### 3.2 Time Series
671 | [go to top](#top)
672 |
673 |
674 | ```python
675 | # Get Current Time
676 | pd.Timestamp.now() # Get current time (pd.datetime is removed in newer Pandas)
677 |
678 | # Get Time from Timestamp
679 | pd.Timestamp('2019-03-01')
680 | pd.Timestamp(1587687575, unit='s')
681 |
682 | # Get a date range
683 | pd.date_range("11:00", "13:30", freq="H").time
684 | pd.date_range("11:00", "13:30", freq="30min").time # Different frequency
685 | # Output:
686 | # [datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0)
687 | # datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]
688 |
689 | # Convert Time Series to Timestamps
690 | pd.to_datetime(SOME_DATETIME_SERIES)
691 | ```
692 |
693 |
694 |
695 | ### 3.3 Time Deltas
696 | [go to top](#top)
697 |
698 |
699 | These are almost exactly like the datetime library's timedelta objects.
700 |
701 | ```python
702 | pd.Timedelta(6, unit='h')
703 | pd.Timedelta(days=-2)
704 | pd.Timedelta('2 days 2 hours 15 minutes 30 seconds') # Or even from a string!
705 |
706 | # Or from a series
707 | pd.to_timedelta(s)
708 | ```
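
And just like `datetime.timedelta`, they support arithmetic with Timestamps directly (the dates here are arbitrary):

```python
pd.Timestamp('2019-03-01') + pd.Timedelta(days=2)         # Timestamp('2019-03-03 00:00:00')
pd.Timestamp('2019-03-03') - pd.Timestamp('2019-03-01')   # Timedelta('2 days 00:00:00')
```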
709 |
710 |
711 |
712 | ## 4. DataFrame Operations
713 |
714 | ### 4.1 Preface
715 | [go to top](#top)
716 |
717 |
718 | Even though this section is supposed to be focused on DataFrames, a lot of these operations can be applied to Series and Panel objects as well! It's just that a large part of using Pandas is working with DataFrames.
719 |
720 | To get at least some brief understanding of your data you can
721 |
722 | ```python
723 | # Look at the first few rows of data
724 | df.head()
725 |
726 | # Look at essential details (like dimensions, data types, etc.)
727 | df.info()
728 | ```
729 |
730 |
731 |
732 | ### 4.2 Iterating Through DataFrames
733 | [go to top](#top)
734 |
735 |
736 | ```python
737 | df.iteritems() # (key, value) pairs (Get by columns)
738 | df.iterrows() # (index, series) pairs (Get by rows)
739 | df.itertuples() # Iterate over rows as named tuples
740 | ```
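
A small usage sketch (the DataFrame is invented for illustration):

```python
df = pd.DataFrame({'Name': ['methylDragon', 'toothless'], 'Rating': [10, 5]})

for column_name, column in df.iteritems():   # Called .items() in newer Pandas versions
    print(column_name, list(column))

for index, row in df.iterrows():
    print(index, row['Name'], row['Rating'])

for row in df.itertuples():                  # Usually the fastest of the three
    print(row.Index, row.Name, row.Rating)
```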
741 |
742 |
743 |
744 | ### 4.3 Sorting, Reindexing, and Renaming DataFrame Values
745 | [go to top](#top)
746 |
747 |
748 | ```python
749 | # Sort by Values
750 | df.sort_values('column_name', inplace=True) # Sort by values in column
751 |
752 | # Sort by Index
753 | df.sort_index(ascending=False) # Default is ascending=True
754 | df.sort_index(axis=1) # Sort by column index
755 |
756 | # Reset Index
757 | df.reset_index(inplace=True, drop=True) # Reset index, skip inserting old index as a column
758 |
759 | # Rename Columns
760 | df.rename(columns=newcol_names, inplace=True)
761 |
762 | # Rename Index
763 | df.rename(index={'index_element_1': 'new_name'})
764 |
765 | # Reindex
766 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html
767 | df.reindex(index=[1, 2, 3], columns=[1, 2, 3])
768 |
769 | # Reindex to match another dataframe
770 | df.reindex_like(df2)
771 | df.reindex_like(df2, method="ffill") # Fill missing values
772 | # pad/ffill: Forward fill
773 | # bfill/backfill: Backward fill
774 | # nearest: Nearest index value fill
775 | ```
776 |
777 |
778 |
779 | ### 4.4 Replacing DataFrame Values
780 | [go to top](#top)
781 |
782 |
783 | ```python
784 | # Replace strings with numbers
785 | df.replace(['Awful', 'Poor', 'OK', 'Acceptable', 'Perfect'], [0, 1, 2, 3, 4])
786 |
787 | # Replace using regex
788 | df.replace({'\n': ' '}, regex=True) # e.g. replace newline characters with spaces
789 |
790 | # Removing Substrings
791 | df['column_name'] = df['column_name'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
792 | ```
793 |
794 |
795 |
796 | ### 4.5 Function Application on DataFrames
797 | [go to top](#top)
798 |
799 |
800 | ```python
801 | # Apply function to all values in a scope
802 | df['column_name'].apply(function_name)
803 |
804 | # Apply function to all values in DataFrame
805 | df.applymap(function_name)
806 | ```
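
A small concrete sketch (the DataFrame and column name are made up):

```python
df = pd.DataFrame({'Rating': [10, 5, 2]})

df['Rating'].apply(lambda x: x * 2)   # One column: 20, 10, 4
df.applymap(lambda x: x * 2)          # Every cell of the DataFrame
```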
807 |
808 |
809 |
810 | ### 4.6 Descriptive Statistics
811 | [go to top](#top)
812 |
813 |
814 | You can do a bunch of basic statistical calculations on the columns (or rows) of a DataFrame!
815 |
816 | ```python
817 | # Sum along axis
818 | # axis=0 : down each column (one result per column)
819 | # axis=1 : across each row (one result per row)
820 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html
821 | df.sum() # Default axis is 0
822 | df.sum(axis=1)
823 | df.sum(axis=0, skipna=True, numeric_only=True, min_count=0)
824 |
825 | # Even more!
826 | df.count() # Number of non-null observations
827 | df.mean() # Mean of Values
828 | df.median() # Median of Values
829 | df.mode() # Mode of values
830 | df.std() # Standard Deviation of the Values
831 | df.min() # Minimum Value
832 | df.max() # Maximum Value
833 | df.abs() # Absolute Value
834 | df.prod() # Product of Values
835 | df.cumsum() # Cumulative Sum
836 | df.cumprod() # Cumulative Product
837 |
838 | # Or just call all of them at once!
839 | df.describe()
840 | ```
841 |
842 |
843 |
844 | ### 4.7 Statistical Methods
845 | [go to top](#top)
846 |
847 |
848 | ```python
849 | # Calculate percentage change
850 | df.pct_change() # Column wise
851 | df.pct_change(axis=1) # Row wise
852 |
853 | # Covariance
854 | s.cov(s2) # For series
855 | df.cov() # For frame (calculates covariance between all columns)
856 |
857 | # Correlation
858 | df.corr() # For frames
859 | df['col_1'].corr(df['col_2']) # For series
860 |
861 | # Data Ranking (Series)
862 | # https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.rank.html
863 | # Check the docs for tie-breaking methods
864 | # average, min, max, first (Default method='average')
865 | s.rank()
866 | ```
867 |
868 |
869 |
870 | ### 4.8 Window Functions
871 | [go to top](#top)
872 |
873 |
874 | ```python
875 | # Rolling Window
876 | df_rolling = df.rolling(window=3)
877 |
878 | # Now you can use the window!
879 | # You may use all the descriptive stats and statistical methods
880 | df_rolling.sum()
881 | df_rolling.mean()
882 | df_rolling.median()
883 | df_rolling.std()
884 | # and so on...
885 |
886 | # Expanding Window
887 | # (Yields the value of the statistic with all the data available up to that point in time)
888 | df_expanding = df.expanding(min_periods=1)
889 |
890 | # Exponential Weighted Functions
891 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html
892 | # You can specify decay, half-life, etc. Check the docs!
893 | df.ewm(span=10).mean() # ewm() needs a decay parameter (com, span, halflife, or alpha)
894 | ```
895 |
896 |
897 |
898 | ### 4.9 Data Aggregation
899 | [go to top](#top)
900 |
901 |
902 | ```python
903 | # Basically custom operations on windows!
904 | df_rolling.aggregate(FUNCTION) # On Whole DF
905 | df_rolling['col'].aggregate(FUNCTION) # On Single Column
906 | df_rolling[['col', 'col2']].aggregate(FUNCTION) # On Multiple Columns
907 |
908 | # Multiple functions (You'll get two columns as output)
909 | df_rolling.aggregate([FUNCTION_1, FUNCTION_2])
910 |
911 | # Multiple functions, on different columns
912 | df_rolling.aggregate({'col_1': FUNCTION_1, 'col_2': FUNCTION_2})
913 |
914 | # If you don't run it on a rolling window, it reduces the dimensionality of the data!
915 | df.aggregate(np.sum) # Sums the entire column
916 | ```
917 |
918 |
919 |
920 | ### 4.10 Dealing with Missing Data
921 | [go to top](#top)
922 |
923 |
924 | Null values can be NA, NaN, NaT, or None.
925 |
926 | - NaN: Not a Number
927 | - NaT: Not a Time
928 |
929 | ```python
930 | # Detect Missing Values
931 | df.isnull() # Gives True if value is null
932 | df.notnull() # Gives True if value is not null
933 |
934 | # Filling Missing Data With Scalar
935 | df.fillna(scalar_number)
936 |
937 | # Filling Missing Data
938 | # pad/ffill: Fills forward
939 | # bfill/backfill: Fills backwards
940 | df.fillna(method='pad')
941 |
942 | # Drop Missing Values
943 | # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
944 | df.dropna(axis=0, how='any') # Drop rows with any column containing None
945 | df.dropna(axis=0, how='all') # Drop rows with all columns containing None
946 | df.dropna(axis=0, thresh=2) # Keep only rows with at least 2 non-null values
947 | ```
948 |
949 |
950 |
951 | ### 4.11 GroupBy Operations
952 | [go to top](#top)
953 |
954 |
955 | Source:
956 |
957 | You can group data within your DataFrames in order to:
958 |
959 | - Split the DF into groups
960 | - Apply a function to each group
961 |   - Aggregation
962 |   - Transformation
963 |   - Filtration
964 | - Combine the results
965 |
966 | ```python
967 | # Group Data
968 | df_grouped = df.groupby('key') # By column
969 | df_grouped = df.groupby('key', axis=1) # Group columns instead of rows
970 | df_grouped = df.groupby(['col_1', 'col_2']) # Multi-Column Group
971 |
972 | # View the groups!
973 | df_grouped.groups
974 |
975 | # You can iterate through grouped dfs as well!
976 | for i in df_grouped:
977 | pass
978 |
979 | # Select a Single Group
980 | df_grouped.get_group('group_name')
981 |
982 | # Apply Aggregations
983 | df_grouped.agg(function)
984 | df_grouped.agg([function_1, function_2])
985 |
986 | # Apply Transformations
987 | # Transforms groups or columns inside the dataframe
988 | transformation_function = lambda x: (x - x.mean()) / x.std()*10
989 | df_grouped.transform(transformation_function)
990 |
991 | # Apply Filters
992 | # Works like the native Python filter(filtering_function, iterable) !
993 | df_grouped.filter(filtering_function)
994 | df_grouped.filter(lambda x: len(x) >= 3)
995 | ```
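
Here's a small end-to-end sketch tying these together (the data is invented for illustration):

```python
df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B'],
                   'Points': [10, 12, 8, 9, 11]})

grouped = df.groupby('Team')

grouped['Points'].agg(['mean', 'max'])
#            mean  max
# Team
# A     11.000000   12
# B      9.333333   11

grouped.filter(lambda g: len(g) >= 3)   # Keeps only Team B's rows
```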
996 |
997 |
998 |
999 | ### 4.12 Merging and Joining
1000 | [go to top](#top)
1001 |
1002 |
1003 | > Pandas provides a single function, **merge**, as the entry point for all standard database join operations between DataFrame objects −
1004 | >
1005 | > `pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True)`
1006 | >
1007 | > Here, we have used the following parameters −
1008 | >
1009 | > - **left** − A DataFrame object.
1010 | > - **right** − Another DataFrame object.
1011 | > - **on** − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
1012 | > - **left_on** − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
1013 | > - **right_on** − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
1014 | > - **left_index** − If **True,** use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
1015 | > - **right_index** − Same usage as **left_index** for the right DataFrame.
1016 | > - **how** − One of 'left', 'right', 'outer', 'inner'. Defaults to inner. Each method has been described below.
1017 | > - **sort** − Sort the result DataFrame by the join keys in lexicographical order. Defaults to True, setting to False will improve the performance substantially in many cases.
1018 | >
1019 | >
1020 |
1021 | ```python
1022 | # Code source: https://www.tutorialspoint.com/python_pandas/python_pandas_merging_joining.htm
1023 |
1024 | # Merge two DFs via key
1025 | left = pd.DataFrame({
1026 | 'id':[1,2,3,4,5],
1027 | 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
1028 | 'subject_id':['sub1','sub2','sub4','sub6','sub5']})
1029 | right = pd.DataFrame({
1030 | 'id':[1,2,3,4,5],
1031 | 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
1032 | 'subject_id':['sub2','sub4','sub3','sub6','sub5']})
1033 |
1034 | pd.merge(left,right,on='id')
1035 |
1036 | '''
1037 | OUTPUT
1038 |
1039 | Name_x id subject_id_x Name_y subject_id_y
1040 | 0 Alex 1 sub1 Billy sub2
1041 | 1 Amy 2 sub2 Brian sub4
1042 | 2 Allen 3 sub4 Bran sub3
1043 | 3 Alice 4 sub6 Bryce sub6
1044 | 4 Ayoung 5 sub5 Betty sub5
1045 | '''
1046 |
1047 | # Merge two DFs via multiple keys
1048 | pd.merge(left, right, on=['key_1', 'key_2']) # Rows without matching keys are discarded (inner join by default)
1049 |
1050 | # Merge using 'HOW'
1051 | '''
1052 | Merge Method SQL Equivalent Description
1053 | left LEFT OUTER JOIN Use keys from left object
1054 | right RIGHT OUTER JOIN Use keys from right object
1055 | outer FULL OUTER JOIN Use union of keys
1056 | inner INNER JOIN Use intersection of keys
1057 | '''
1058 | pd.merge(left, right, on='key', how='left')
1059 | ```
1060 |
1061 | Join Intuitions
1062 |
1063 | 
1064 |
1065 | Image source:
1066 |
1067 |
1068 |
1069 | ### 4.13 Concatenation
1070 | [go to top](#top)
1071 |
1072 |
1073 | > Pandas provides various facilities for easily combining together **Series, DataFrame**, and **Panel** objects.
1074 | >
1075 | > ` pd.concat(objs,axis=0,join='outer',join_axes=None, ignore_index=False)`
1076 | >
1077 | > - **objs** − This is a sequence or mapping of Series, DataFrame, or Panel objects.
1078 | > - **axis** − {0, 1, ...}, default 0. This is the axis to concatenate along.
1079 | > - **join** − {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on other axis(es). Outer for union and inner for intersection.
1080 | > - **ignore_index** − boolean, default False. If True, do not use the index values on the concatenation axis. The resulting axis will be labeled 0, ..., n - 1.
1081 | > - **join_axes** − This is the list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic.
1082 | >
1083 | >
1084 |
1085 | ```python
1086 | # Concatenate DFs
1087 | pd.concat([one, two]) # Adds the rows of two DFs together
1088 | pd.concat([one, two], keys=['x', 'y']) # This gives keys to each specific DF
1089 | pd.concat([one, two], ignore_index=True) # You can also make it ignore the original index
1090 |
1091 | # Concatenate using Append
1092 | one.append(two)
1093 | ```
1094 |
1095 |
1096 |
1097 | ## 5. EXTRA: Helpful Notes
1098 |
1099 | I couldn't find a suitable place to put this information, so I'll put it here:
1100 |
1101 | - Pivot tables, stacking, and unstacking
1102 |
1103 | - Package configuration
1104 |
1105 |
1106 |
1107 |
1108 | ```
1109 | . .
1110 | . |\-^-/| .
1111 | /| } O.=.O { |\
1112 | ```
1113 |
1114 |
1115 |
1116 | ------
1117 |
1118 | [![Buy me a coffee!](../assets/COFFEE BUTTON ヾ(°∇°^).png)](https://www.buymeacoffee.com/methylDragon)
1119 |
1120 |
--------------------------------------------------------------------------------
/Numpy/01 Numpy Basics.md:
--------------------------------------------------------------------------------
1 | # Numpy Basics
2 |
3 | Author: methylDragon
4 | Contains a syntax reference and code snippets for Numpy!
5 | It's a collection of code snippets and tutorials from everywhere all mashed together!
6 |
7 | ------
8 |
9 | ## Pre-Requisites
10 |
11 | ### Required
12 |
13 | - Python knowledge, this isn't a tutorial!
14 | - Numpy installed
15 | - I'll assume you've already run this line as well `import numpy as np`
16 |
17 |
18 |
19 | ## Table Of Contents
20 |
21 | 1. [Introduction](#1)
22 | 2. [Array Basics](#2)
23 | 2.1 [Configuring Numpy](#2.1)
24 | 2.2 [Numpy Data Types](#2.2)
25 | 2.3 [Creating Arrays](#2.3)
26 | 2.4 [Array Basics and Attributes](#2.4)
27 | 2.5 [Casting](#2.5)
28 | 2.6 [Some Array Methods](#2.6)
29 | 2.7 [Array Indexing](#2.7)
30 | 2.8 [Array Slicing](#2.8)
31 | 2.9 [Reshaping Arrays](#2.9)
32 | 2.10 [Array Concatenation and Splitting](#2.10)
33 | 2.11 [Array Arithmetic](#2.11)
34 | 2.12 [More Array Math](#2.12)
35 | 3. [Going Deeper With Arrays](#3)
36 | 3.1 [Broadcasting](#3.1)
37 | 3.2 [Vectorize](#3.2)
38 | 3.3 [Iterating Through Axes](#3.3)
39 | 3.4 [Modifying Output Directly](#3.4)
40 | 3.5 [Locating Elements](#3.5)
41 | 3.6 [Aggregations](#3.6)
42 | 3.7 [Comparisons](#3.7)
43 | 3.8 [Sorting Arrays](#3.8)
44 | 3.9 [Fancy Indexing](#3.9)
45 | 3.10 [Structured Arrays](#3.10)
46 | 4. [Matrices](#4)
47 | 4.1 [Linear Algebra Functions](#4.1)
48 | 5. [Numpy I/O](#5)
49 | 5.1 [Import from CSV](#5.1)
50 | 5.2 [Saving and Loading](#5.2)
51 |
52 |
53 |
54 |
55 | ## 1. Introduction
56 |
57 | > NumPy is the fundamental package for scientific computing with Python. It contains among other things:
58 | >
59 | > - a powerful N-dimensional array object
60 | > - sophisticated (broadcasting) functions
61 | > - tools for integrating C/C++ and Fortran code
62 | > - useful linear algebra, Fourier transform, and random number capabilities
63 | >
64 | > Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
65 | >
66 | > http://www.numpy.org/
67 |
68 | This document will list the most commonly used functions in Numpy, to serve as a reference when using it.
69 |
70 | It's especially useful because numpy is more efficient than native Python in terms of space usage and speed!
71 |
72 | The reason for that is because of how the arrays are stored:
73 |
74 | 
75 |
76 | Image source: https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html
77 |
78 | You can see that the Python list stores pointers and has to dereference them, but the Numpy array doesn't, because its elements are stored contiguously in one block of memory!
79 |
80 | The Python pointers are extra overhead, same with needing to dereference them.
81 |
82 | It's so useful you see it used in a lot of other packages like OpenCV, Scipy, and pandas!
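
If you want to see this for yourself, here's a rough sketch (a minimal illustration, not from the original notes; exact numbers vary by machine and Python version):

```python
import sys
import numpy as np

py_list = list(range(1000))
np_array = np.arange(1000)

# Memory: the list holds ~1000 pointers to separate int objects,
# the array holds the raw int64 values back-to-back
sys.getsizeof(py_list)   # ~8 KB of pointers alone, before counting the int objects
np_array.nbytes          # 8000 bytes total for the raw data

# Speed: vectorised operations avoid the Python-level loop entirely
np_array * 2                # One C-level loop
[x * 2 for x in py_list]    # A Python-level loop with per-element overhead
```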
83 |
84 | ---
85 |
86 | Install it!
87 |
88 | ```shell
89 | $ pip install numpy
90 | ```
91 |
92 | If you need additional help or need a refresher on the parameters, feel free to use:
93 |
94 | ```python
95 | help(np.FUNCTION_YOU_NEED_HELP_WITH)
96 | ```
97 |
98 | ---
99 |
100 | **Credits:**
101 |
102 | A lot of these notes I'm adapting from
103 |
104 | https://jakevdp.github.io/PythonDataScienceHandbook/index.html
105 |
106 | http://cs231n.github.io/python-numpy-tutorial/
107 |
108 | https://docs.scipy.org/doc/numpy-1.15.1/reference/
109 |
110 |
111 |
112 | ## 2. Array Basics
113 |
114 | The core, most important object in Numpy is the **ndarray**, which stands for n-dimensional array.
115 |
116 | > An [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its [`shape`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape), which is a [`tuple`](https://docs.python.org/dev/library/stdtypes.html#tuple)of *N* positive integers that specify the sizes of each dimension. The type of items in the array is specified by a separate [data-type object (dtype)](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html#arrays-dtypes), one of which is associated with each ndarray.
117 | >
118 | > As with other container objects in Python, the contents of an [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can be accessed and modified by [indexing or slicing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#arrays-indexing) the array (using, for example, *N* integers), and via the methods and attributes of the [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray).
119 | >
120 | > Different [`ndarrays`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) can share the same data, so that changes made in one [`ndarray`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html#numpy.ndarray) may be visible in another. That is, an ndarray can be a *“view”* to another ndarray, and the data it is referring to is taken care of by the *“base”* ndarray. ndarrays can also be views to memory owned by Python [`strings`](https://docs.python.org/dev/library/stdtypes.html#str) or objects implementing the `buffer` or [array](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.interface.html#arrays-interface) interfaces.
121 | >
122 | > https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html
123 |
124 |
125 |
126 | ### 2.1 Configuring Numpy
127 | [go to top](#top)
128 |
129 |
130 | ```python
131 | # Set printing precision
132 | np.set_printoptions(precision=2)
133 | ```
134 |
135 |
136 |
137 | ### 2.2 Numpy Data Types
138 | [go to top](#top)
139 |
140 |
141 | #### **List**
142 |
143 | | Data type | Description |
144 | | ------------ | ------------------------------------------------------------ |
145 | | `bool_` | Boolean (True or False) stored as a byte |
146 | | `int_` | Default integer type (same as C `long`; normally either `int64` or `int32`) |
147 | | `intc` | Identical to C `int` (normally `int32` or `int64`) |
148 | | `intp` | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`) |
149 | | `int8` | Byte (-128 to 127) |
150 | | `int16` | Integer (-32768 to 32767) |
151 | | `int32` | Integer (-2147483648 to 2147483647) |
152 | | `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
153 | | `uint8` | Unsigned integer (0 to 255) |
154 | | `uint16` | Unsigned integer (0 to 65535) |
155 | | `uint32` | Unsigned integer (0 to 4294967295) |
156 | | `uint64` | Unsigned integer (0 to 18446744073709551615) |
157 | | `float_` | Shorthand for `float64`. |
158 | | `float16` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
159 | | `float32` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
160 | | `float64` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
161 | | `complex_` | Shorthand for `complex128`. |
162 | | `complex64` | Complex number, represented by two 32-bit floats |
163 | | `complex128` | Complex number, represented by two 64-bit floats |
164 |
165 | #### **nan and inf**
166 |
167 | It's numpy's version of None and infinity!
168 |
169 | ```python
170 | np.nan
171 | np.inf
172 |
173 | # To check if something is nan or inf,
174 | np.isnan(x)
175 | np.isinf(x)
176 | ```
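
One gotcha worth knowing: `nan` never compares equal to anything, not even itself, which is exactly why `np.isnan` exists:

```python
np.nan == np.nan   # False!
np.isnan(np.nan)   # True
```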
177 |
178 |
179 |
180 | ### 2.3 Creating Arrays
181 | [go to top](#top)
182 |
183 |
184 | General note: Basically any of these functions takes a dtype parameter where you can state the data-type of the output.
185 |
186 | #### **From Python List**
187 |
188 | ```python
189 | # Basic
190 | np.array([1, 2, 3, 4, 5])
191 | # Out: array([1, 2, 3, 4, 5])
192 |
193 | # Upcast (ints are cast to float due to the common type constraint)
194 | np.array([1.1, 2, 3, 4, 5])
195 | # Out: array([1.1, 2., 3., 4., 5.])
196 |
197 | # Explicit type
198 | np.array([1, 2, 3, 4, 5], dtype='float32')
199 | # Out: array([1., 2., 3., 4., 5.], dtype=float32)
200 |
201 | # Multi-dimensional
202 | np.array([[1,2],[3,4],[5,6]])
203 | # Out: array([[1,2],
204 | # [3,4],
205 | # [5,6]])
206 | ```
207 |
208 | #### **From Scratch**
209 |
210 | **Filled Arrays**
211 |
212 | ```python
213 | # All zeroes
214 | np.zeros(5, dtype=int)
215 | # Out: array([0, 0, 0, 0, 0])
216 |
217 | # Multi-dimensional Zeros
218 | np.zeros((2, 2))
219 | # Out: array([[0., 0.],
220 | # [0., 0.]])
221 |
222 | # Ones
223 | np.ones((2, 2), dtype=float)
224 | # Out: array([[1., 1.],
225 | # [1., 1.]])
226 |
227 | # Filled array (It even works for non-standard numbers! AHAHAHA)
228 | np.full((2, 2), 'CH3')
229 | # array([['CH3', 'CH3'],
230 | #        ['CH3', 'CH3']], dtype='<U3')
231 | ```
346 | ### 2.4 Array Basics and Attributes
347 | [go to top](#top)
348 |
349 | #### **Shape and Index**
350 |
351 | It is important to get a proper understanding of the shape of numpy arrays!
352 |
353 | 
354 |
355 | [Image Source](https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html)
356 |
357 | The corresponding arrays will look like:
358 |
359 | ```python
360 | # 1D
361 | # Every 1D array can be treated as a column vector!
362 | [7, 2, 9, 10]
363 |
364 | # 2D
365 | [[5.2, 3.0, 4.5],
366 | [9.1, 0.1, 0.3]]
367 |
368 | # And so on
369 | ```
370 |
371 | Another way of looking at it is **matrix indexing**! Numpy goes by **i, j** from **2D arrays onwards**.
372 |
373 | If you want to think of it as x, and y, then axis 0 is y, and axis 1 is x. So the indexing is `(y, x)`, and `(i, j)`.
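
A tiny sketch to make the `(i, j)` convention concrete (using the same 2D example as above):

```python
a = np.array([[5.2, 3.0, 4.5],
              [9.1, 0.1, 0.3]])

a.shape    # (2, 3) -> 2 rows (axis 0 / i / "y"), 3 columns (axis 1 / j / "x")
a[1, 2]    # 0.3 -> row index 1, column index 2
```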
374 |
375 | > If you want to do matrix or vector operations, it is best to do it from at least a 2D array.
376 | >
377 | >
378 |
379 | 
380 |
381 | [Image Source](https://simple.wikipedia.org/wiki/Matrix_(mathematics))
382 |
383 | #### **Attributes**
384 |
385 |
386 | ```python
387 | # Suppose we create a 3 dimensional array
388 | example_array = np.random.randint(5, size=(2, 3, 4))
389 | # Out: array([[[0, 0, 3, 3],
390 | # [2, 1, 1, 3],
391 | # [2, 2, 4, 4]],
392 | #
393 | # [[2, 0, 1, 3],
394 | # [2, 3, 0, 1],
395 | # [2, 0, 1, 2]]])
396 |
397 | # Dimensions
398 | example_array.ndim # 3
399 |
400 | # Shape
401 | example_array.shape # (2, 3, 4) planes, rows, columns (for images, height, width, depth)
402 |
403 | # Total Elements
404 | example_array.size # 24 (which is 2 * 3 * 4)
405 |
406 | # Type
407 | example_array.dtype # dtype('int64')
408 |
409 | # Byte-size of each element
410 | example_array.itemsize # 8
411 |
412 | # Total byte-size
413 | example_array.nbytes # 192 (which is 2 * 3 * 4 * 8)
414 | ```
415 |
416 |
417 |
418 | ### 2.5 Casting
419 | [go to top](#top)
420 |
421 |
422 | ```python
423 | # Just use the .astype() method!
424 |
425 | np.array([True, True, True, False, False, False]).astype('int')
426 | # Out: array([1, 1, 1, 0, 0, 0])
427 | ```
428 | #### **Array to List**
429 |
430 | ```python
431 | np.array([1, 2, 3]).tolist() # [1, 2, 3] native Python list!
432 | ```
433 |
434 | #### **List to Array**
435 |
436 | ```python
437 | np.asarray([1, 2, 3]) # This can take list of tuples, tuples, etc.!
438 | ```
439 |
440 |
441 |
442 | ### 2.6 Some Array Methods
443 | [go to top](#top)
444 |
445 |
446 | There are really a lot of them!
447 |
448 | #### **Repeat and Tile**
449 |
450 | ```python
451 | a = [1, 2, 3]
452 |
453 | np.tile(a, 2) # array([1, 2, 3, 1, 2, 3])
454 | np.repeat(a, 2) # array([1, 1, 2, 2, 3, 3])
455 | ```
456 |
457 | #### **Get Unique**
458 |
459 | ```python
460 | a = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 4])
461 |
462 | np.unique(a, return_counts=True)
463 | # Out: (array([1, 2, 3, 4]), array([4, 3, 2, 1]))
464 | # (Unique set, Counts)
465 | ```
466 |
467 | #### **Rounding**
468 |
469 | ```python
470 | a = np.array([1.111, 2.222, 3.333, 4.444])
471 |
472 | np.around(a) # array([1., 2., 3., 4.])
473 | np.around(a, 2) # array([1.11, 2.22, 3.33, 4.44])
474 |
475 | b = np.array([12345])
476 |
477 | np.around(b, -1) # array([12340])
478 | np.around(b, -2) # array([12300])
479 | ```
480 |
481 | #### **Floor**
482 |
483 | ```python
484 | a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])
485 |
486 | np.floor(a) # array([1., 2., 3., 4., 5.])
487 | ```
488 |
489 | #### **Ceil**
490 |
491 | ```python
492 | a = np.array([1.111, 2.222, 3.333, 4.444, 5.555])
493 |
494 | np.ceil(a) # array([2., 3., 4., 5., 6.])
495 | ```
496 |
497 | #### **Count Non-Zeroes**
498 |
499 | ```python
500 | np.count_nonzero(array) # Gives you number of non-zero elements in the array
501 | ```
502 |
503 | #### **Digitize**
504 |
505 | ```python
506 | x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
507 | bins = np.array([0, 3, 6, 9])
508 |
509 | # Return index of the bin each element belongs to
510 | # You can use this together with take to get the digitized array!
511 | np.digitize(x, bins) # array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
512 | ```
513 |
514 | #### **Clip**
515 |
516 | Clip values
517 |
518 | ```python
519 | x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
520 |
521 | np.clip(x, 3, 8) # array([3, 3, 3, 3, 4, 5, 6, 7, 8, 8])
522 | ```
523 |
524 | #### **Histogram and Bincount**
525 |
526 | ```python
527 | x = np.array([1,1,2,2,2,4,4,5,6,6,6])
528 |
529 | np.bincount(x) # array([0, 2, 3, 0, 2, 1, 3])
530 | # How to read output:
531 | # 0 occurs 0 times
532 | # 1 occurs 2 times
533 | # 2 occurs 3 times and so on
534 |
535 | np.histogram(x, [0, 2, 4, 6, 8]) # (array([2, 3, 3, 3]), array([0, 2, 4, 6, 8]))
536 | # First array are the counts
537 | # Second array are the bins
538 | # In this case, the bottom of the bins are inclusive, the tops are not
539 | # Eg. [0, 2): 2
540 | # [2, 4): 3,
541 | # [4, 6): 3
542 | # [6, 8): 3
543 | ```
544 |
545 | #### **At**
546 |
547 | If you want to apply one of these functions in place at only a subset of an array's indices, use `.at()` (it's available on ufuncs)
548 |
549 | ```python
550 | np.some_ufunc.at(array, [0, 1]) # Pseudocode: any ufunc works here
551 |
552 | # Example
553 | x = np.array([1, 2, 3, 4])
554 | np.negative.at(x, [0, 1]) # This will mutate x
555 |
556 | # x is now array([-1, -2, 3, 4])
557 | ```
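One place `.at()` really matters is accumulating into repeated indices: plain fancy-index assignment only applies the operation once per unique index (because of buffering), while `.at()` applies it once per occurrence. A small sketch (the array names are just for illustration):

```python
import numpy as np

counts = np.zeros(4, dtype=int)
indices = np.array([0, 0, 1, 3, 3, 3])

# Fancy-indexed += only increments each unique index once
counts[indices] += 1
counts # array([1, 1, 0, 1])

# np.add.at increments once per occurrence of each index
counts = np.zeros(4, dtype=int)
np.add.at(counts, indices, 1)
counts # array([2, 1, 0, 3])
```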
558 |
559 |
560 |
561 | ### 2.7 Array Indexing
562 | [go to top](#top)
563 |
564 |
565 | Of course, you can modify elements once you've indexed them, just as you'd expect!
566 |
567 | **Note:** if you have an int array and you assign a float to one of its elements, the float will be cast to int (eg. 3.12 -> 3)
568 |
569 | ```python
570 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
571 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
572 | # Out: array([[[ 1, 2, 3],
573 | # [ 4, 5, 6],
574 | # [ 7, 8, 9]],
575 | #
576 | # [[10, 11, 12],
577 | # [13, 14, 15],
578 | # [16, 17, 18]]])
579 | ```
580 | #### **One-dimensional**
581 |
582 | Works just like native Python!
583 |
584 | ```python
585 | array[0]
586 | # Out: array([[1, 2, 3],
587 | # [4, 5, 6],
588 | # [7, 8, 9]])
589 |
590 | array[-1]
591 | # Out: array([[10, 11, 12],
592 | # [13, 14, 15],
593 | # [16, 17, 18]])
594 | ```
595 |
596 | #### **Multi-dimensional**
597 |
598 | ```python
599 | array[0, 0]
600 | # Out: array([1, 2, 3])
601 |
602 | array[0, 0, 0]
603 | # Out: 1
604 | ```
605 |
606 | #### **Conditional Indexing (Boolean Masks)**
607 |
608 | ```python
609 | a = np.array([1, 2, 3, 4, 5])
610 |
611 | a[a > 3] # array([4, 5])
612 | a[np.iscomplex(a)] # array([], dtype=int64)
613 | ```
614 |
615 |
616 |
617 | ### 2.8 Array Slicing
618 | [go to top](#top)
619 |
620 |
621 | **Note:** Unlike in native Python, slicing an array gives you an **array view**, not a copy! So if you alter the array view, it'll alter the original array!
622 |
623 | #### **One-dimensional**
624 |
625 | ```python
626 | array = np.arange(10)
627 | # Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
628 |
629 | # From start
630 | array[:5]
631 | # Out: array([0, 1, 2, 3, 4])
632 |
633 | # From end
634 | array[5:]
635 | # Out: array([5, 6, 7, 8, 9])
636 |
637 | # From middle
638 | array[4:7]
639 | # Out: array([4, 5, 6])
640 |
641 | # Every other element
642 | array[::2]
643 | # Out: array([0, 2, 4, 6, 8])
644 |
645 | # Every other element from index 1
646 | array[1::2]
647 | # Out: array([1, 3, 5, 7, 9])
648 |
649 | # Reversed
650 | array[::-1]
651 | # Out: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
652 |
653 | # Reversed, every other element from index 5
654 | array[5::-2]
655 | # Out: array([5, 3, 1])
656 | ```
657 |
658 | #### **Multi-dimensional**
659 |
660 | ```python
661 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
662 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
663 | # Out: array([[[ 1, 2, 3],
664 | # [ 4, 5, 6],
665 | # [ 7, 8, 9]],
666 | #
667 | # [[10, 11, 12],
668 | # [13, 14, 15],
669 | # [16, 17, 18]]])
670 |
671 | # First from start
672 | array[:1]
673 | # Out: array([[[1, 2, 3],
674 | # [4, 5, 6],
675 | # [7, 8, 9]]])
676 |
677 | # First from end
678 | array[1:]
679 | # Out: array([[[10, 11, 12],
680 | # [13, 14, 15],
681 | # [16, 17, 18]]])
682 |
683 | # First from start from first array from start as nested array
684 | array[:1, :1]
685 | # Out: array([[[1, 2, 3]]])
686 |
687 | # Get first element from every innermost array as nested array
688 | array[:, :, :1]
689 | # Out: array([[[ 1],
690 | # [ 4],
691 | # [ 7]],
692 | #
693 | # [[10],
694 | # [13],
695 | # [16]]])
696 |
697 | # Reverse innermost two layers
698 | array[:, ::-1, ::-1]
699 | # Out: array([[[ 9, 8, 7],
700 | # [ 6, 5, 4],
701 | # [ 3, 2, 1]],
702 | #
703 | # [[18, 17, 16],
704 | # [15, 14, 13],
705 | # [12, 11, 10]]])
706 | ```
707 |
708 | #### **Multi-dimensional Access**
709 |
710 | Sometimes you just want the columns or rows nicely shown as a one dimensional array instead of a nested one.
711 |
712 | **Note: They'll still be editable views!**
713 |
714 | Here's how to do it!
715 |
716 | ```python
717 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
718 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
719 | # Out: array([[[ 1, 2, 3],
720 | # [ 4, 5, 6],
721 | # [ 7, 8, 9]],
722 | #
723 | # [[10, 11, 12],
724 | # [13, 14, 15],
725 | # [16, 17, 18]]])
726 |
727 | # First column from first array
728 | array[0][:, 0]
729 | # Out: array([1, 4, 7])
730 |
731 | # First row from first array (also equivalent to array[0][0])
732 | array[0][0, :]
733 | # Out: array([1, 2, 3])
734 |
735 | # Nested array of first column from each array
736 | array[:, :, 0]
737 | # Out: array([[ 1, 4, 7],
738 | # [10, 13, 16]])
739 | ```
740 |
741 | #### **Array Views**
742 |
743 | Remember what I said about array views?
744 |
745 | ```python
746 | # Native Python
747 | a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
748 | b = a[:5] # [0, 1, 2, 3, 4]
749 |
750 | b[0] = 5 # b is now [5, 1, 2, 3, 4]
751 | a # But a is still [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
752 |
753 | # Numpy
754 | a = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
755 |
756 | b = a[:5] # array([0, 1, 2, 3, 4])
757 | b[0] = 5
758 | a # a is now array([5, 1, 2, 3, 4, 5, 6, 7, 8, 9])
759 | ```
760 |
761 | #### **Copying Instead of Views**
762 |
763 | ```python
764 | # Just use .copy() !
765 |
766 | a = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
767 |
768 | b = a[:5].copy() # array([0, 1, 2, 3, 4])
769 | b[0] = 5 # b is array([5, 1, 2, 3, 4])
770 | a # a is unchanged
771 | ```
772 |
773 |
774 |
775 | ### 2.9 Reshaping Arrays
776 | [go to top](#top)
777 |
778 |
779 | #### **Reshape**
780 |
781 | ```python
782 | array = np.arange(10) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
783 |
784 | # Reshape reshapes the arrays. Of course!
785 | # You can reshape the array into any n dimensions! Just make sure all the arguments multiplied equal the number of elements of your input array!
786 |
787 | array.reshape(10)
788 | # Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
789 |
790 | array.reshape(1, 10)
791 | # Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
792 |
793 | array.reshape(2, 5)
794 | # Out: array([[0, 1, 2, 3, 4],
795 | # [5, 6, 7, 8, 9]])
796 |
797 | array.reshape(1, 1, 5, 2)
798 | # Out: array([[[[0, 1],
799 | # [2, 3],
800 | # [4, 5],
801 | # [6, 7],
802 | # [8, 9]]]])
803 |
804 | # You can also pass -1 for one of the dimensions to have numpy figure out that size for you!
805 | array.reshape(-1, 5)
806 | # Out: array([[0, 1, 2, 3, 4],
807 | # [5, 6, 7, 8, 9]])
808 | ```
809 |
810 | #### **Reshaping with np.newaxis**
811 |
812 | ```python
813 | # Create as row
814 | array[np.newaxis, :] # Equivalent to array.reshape(1, 10)
815 | # Out: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
816 |
817 | # Create as column
818 | array[:, np.newaxis] # Equivalent to array.reshape(10, 1)
819 | # Out: array([[0],
820 | # [1],
821 | # [2],
822 | # [3],
823 | # [4],
824 | # [5],
825 | # [6],
826 | # [7],
827 | # [8],
828 | # [9]])
829 | ```
830 |
831 | #### **Flatten and Ravel**
832 |
833 | ```python
834 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
835 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
836 |
837 | ## Flatten creates a copy!
838 |
839 | # Note: flatten only exists as an array method; there is no np.flatten()
840 | array.flatten()
842 | # Out: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
843 |
844 | ## Ravel creates a view! Editing the ravelled array will edit the parent!
845 |
846 | # Equivalent
847 | array.ravel()
848 | np.ravel(array)
849 | # Out: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])
850 | ```
851 |
852 | #### **Squeeze**
853 |
854 | Remove single-dimensional entries (axes of length 1)
855 |
856 | ```python
857 | array = np.array([[[1]]])
858 |
859 | # Equivalent
860 | np.squeeze(array) # array(1), a 0-dimensional array
861 | array.squeeze() # array(1)
862 | ```
863 |
864 | #### **Transpose**
865 |
866 | ```python
867 | array = np.array([[1, 1], [2, 2]])
868 |
869 | # Equivalent
870 | array.T
871 | array.transpose()
872 | np.transpose(array)
873 | np.rollaxis(array, 1)
874 | np.swapaxes(array, 0, 1)
875 |
876 | # Out: array([[1, 2],
877 | # [1, 2]])
878 | ```
879 |
880 |
881 |
882 |
883 |
884 |
885 |
886 | ### 2.10 Array Concatenation and Splitting
887 | [go to top](#top)
888 |
889 |
890 | #### **Concatenating**
891 |
892 | ```python
893 | a = np.array([1, 2, 3])
894 | b = np.array([4, 5, 6])
895 | c = np.array([[7, 8, 9], [10, 11, 12]])
896 |
897 | np.concatenate([a, b])
898 | # Out: array([1, 2, 3, 4, 5, 6])
899 |
900 | # You can do it with more than two arrays
901 | np.concatenate([a, b, a, b])
902 | # Out: array([1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6])
903 |
904 | # Just make sure all inputs have the same number of dimensions (and matching shapes except along the concatenation axis)!
905 | np.concatenate([c, c, c])
906 | # Out: array([[ 7, 8, 9],
907 | # [10, 11, 12],
908 | # [ 7, 8, 9],
909 | # [10, 11, 12],
910 | # [ 7, 8, 9],
911 | # [10, 11, 12]])
912 |
913 | # You may even choose a different axis to concatenate along!
914 | np.concatenate([c, c, c], axis=1)
915 | # Out: array([[ 7, 8, 9, 7, 8, 9, 7, 8, 9],
916 | # [10, 11, 12, 10, 11, 12, 10, 11, 12]])
917 |
918 | # More examples
919 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
920 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
921 |
922 | np.concatenate([array, array], axis=0)
923 | # Out: array([[[ 1, 2, 3],
924 | # [ 4, 5, 6],
925 | # [ 7, 8, 9]],
926 | #
927 | # [[10, 11, 12],
928 | # [13, 14, 15],
929 | # [16, 17, 18]],
930 | #
931 | # [[ 1, 2, 3],
932 | # [ 4, 5, 6],
933 | # [ 7, 8, 9]],
934 | #
935 | # [[10, 11, 12],
936 | # [13, 14, 15],
937 | # [16, 17, 18]]])
938 |
939 | np.concatenate([array, array], axis=1)
940 | # Out: array([[[ 1, 2, 3],
941 | # [ 4, 5, 6],
942 | # [ 7, 8, 9],
943 | # [ 1, 2, 3],
944 | # [ 4, 5, 6],
945 | # [ 7, 8, 9]],
946 | #
947 | # [[10, 11, 12],
948 | # [13, 14, 15],
949 | # [16, 17, 18],
950 | # [10, 11, 12],
951 | # [13, 14, 15],
952 | # [16, 17, 18]]])
953 |
954 | np.concatenate([array, array], axis=2)
955 | # Out: array([[[ 1, 2, 3, 1, 2, 3],
956 | # [ 4, 5, 6, 4, 5, 6],
957 | # [ 7, 8, 9, 7, 8, 9]],
958 | #
959 | # [[10, 11, 12, 10, 11, 12],
960 | # [13, 14, 15, 13, 14, 15],
961 | # [16, 17, 18, 16, 17, 18]]])
962 | ```
963 |
964 | #### **Stacking**
965 |
966 | ```python
967 | a = np.array([1, 2, 3])
968 | b = np.array([4, 5, 6])
969 |
970 | # Vertical Stack
971 | np.vstack([a, b])
972 | # Out: array([[1, 2, 3],
973 | # [4, 5, 6]])
974 |
975 | # Horizontal Stack
976 | np.hstack([a, b])
977 | # Out: array([1, 2, 3, 4, 5, 6])
978 |
979 | # Third Axis Stack (Note how output is 3 dimensions)
980 | np.dstack([a, b])
981 | # Out: array([[[1, 4],
982 | # [2, 5],
983 | # [3, 6]]])
984 | ```
985 |
986 | #### **Splitting**
987 |
988 | ```python
989 | array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
990 |
991 | # Write the split indexes!
992 | a, b, c = np.split(array, [1, 2])
993 |
994 | a # array([0])
995 | b # array([1])
996 | c # array([2, 3, 4, 5, 6, 7, 8, 9])
997 |
998 | grid = np.arange(16).reshape((4, 4))
999 | # Out: array([[ 0, 1, 2, 3],
1000 | # [ 4, 5, 6, 7],
1001 | # [ 8, 9, 10, 11],
1002 | # [12, 13, 14, 15]])
1003 |
1004 | upper, lower = np.vsplit(grid, [2])
1005 |
1006 | upper # array([[0, 1, 2, 3], [4, 5, 6, 7]])
1007 | lower # array([[ 8, 9, 10, 11], [12, 13, 14, 15]])
1008 |
1009 | left, right = np.hsplit(grid, [2])
1010 |
1011 | left
1012 | # array([[ 0, 1],
1013 | # [ 4, 5],
1014 | # [ 8, 9],
1015 | # [12, 13]])
1016 |
1017 | right
1018 | # array([[ 2, 3],
1019 | # [ 6, 7],
1020 | # [10, 11],
1021 | # [14, 15]])
1022 |
1023 | # You can use dsplit also! But it only works on arrays of 3 dimensions or more
1024 | ```
1025 |
1026 |
1027 |
1028 | ### 2.11 Array Arithmetic
1029 | [go to top](#top)
1030 |
1031 |
1032 | ```python
1033 | array = np.arange(4) # array([0, 1, 2, 3])
1034 |
1035 | array + 5 # array([5, 6, 7, 8])
1036 | array - 5 # array([-5, -4, -3, -2])
1037 | array * 2 # array([0, 2, 4, 6])
1038 | array / 2 # array([0., 0.5, 1., 1.5])
1039 | array // 2 # array([0, 0, 1, 1])
1040 |
1041 | -array # array([0, -1, -2, -3])
1042 | array ** 2 # array([0, 1, 4, 9])
1043 | array % 2 # array([0, 1, 0, 1])
1044 |
1045 | # Equivalent
1046 | np.add(array, 5) # +
1047 | np.subtract(array, 5) # -
1048 | np.multiply(array, 2) # *
1049 | np.divide(array, 2) # /
1050 | np.floor_divide(array, 2) # //
1051 |
1052 | np.negative(array) # -
1053 | np.power(array, 2) # **
1054 | np.mod(array, 2) # %
1055 | ```
1056 |
1057 |
1058 |
1059 | ### 2.12 More Array Math
1060 | [go to top](#top)
1061 |
1062 |
1063 | ```python
1064 | array = np.array([0, -1, 2, -3, 4])
1065 | ```
1066 |
1067 |
1068 | #### **Abs**
1069 | ```python
1070 | abs(array) # array([0, 1, 2, 3, 4])
1071 | np.abs(array) # Same
1072 | np.absolute(array) # Same
1073 | ```
1074 |
1075 | #### **Complex Mod**
1076 | ```python
1077 | x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
1078 | np.abs(x) # array([ 5., 5., 2., 1.])
1079 | ```
1080 |
1081 | #### **Trigonometry**
1082 | ```python
1083 | theta = np.linspace(0, np.pi, 3) # array([ 0., 1.57079633, 3.14159265])
1084 |
1085 | np.sin(theta) # array([0.00000000e+00, 1.00000000e+00, 1.22464680e-16])
1086 | np.cos(theta) # array([1.00000000e+00, 6.12323400e-17,-1.00000000e+00])
1087 | np.tan(theta) # array([0.00000000e+00, 1.63312394e+16, -1.22464680e-16])
1088 |
1089 | # More Trigonometry
1090 | x = [-1, 0, 1] # By the way, YES, this is a Native Python list!
1091 |
1092 | np.arcsin(x) # array([-1.57079633, 0., 1.57079633]) turns it into a numpy array!
1093 | np.arccos(x) # You get what you expect
1094 | np.arctan(x) # Same here
1095 | ```
1096 |
1097 | #### **Exponents**
1098 | ```python
1099 | x = [1, 2, 3]
1100 |
1101 | # e^x
1102 | np.exp(x) # array([2.71828183, 7.3890561, 20.08553692])
1103 |
1104 | # 2^x
1105 | np.exp2(x) # array([2., 4., 8.])
1106 |
1107 | # 3^x
1108 | np.power(3, x) # array([3, 9, 27])
1109 | ```
1110 |
1111 | #### **Logarithms**
1112 | ```python
1113 | np.log(x) # ln
1114 | np.log2(x) # log base 2
1115 | np.log10(x) # log base 10
1116 |
1117 | # More accurate than the naive forms when x is very small
1118 | np.expm1(x) # exp(x) - 1
1119 | np.log1p(x) # log(1 + x)
1120 | ```
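To make that precision claim concrete, here's a small sketch comparing the naive forms with the dedicated functions for a tiny input (`tiny` is just an illustrative value):

```python
import numpy as np

tiny = 1e-15

# Naive forms lose precision because 1 + tiny gets rounded off in float64
np.log(1 + tiny)  # ~1.11e-15 (inexact)
np.exp(tiny) - 1  # ~1.11e-15 (inexact)

# Dedicated functions stay accurate for small inputs
np.log1p(tiny)    # 1e-15
np.expm1(tiny)    # 1e-15
```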
1121 |
1122 | #### **Reciprocal**
1123 |
1124 | ```python
1125 | np.reciprocal(x) # Element-wise 1/x (beware: on integer arrays the result is truncated to int)
1126 | ```
1127 |
1128 | #### **Return Range of Values**
1129 |
1130 | ```python
1131 | a = np.array([1, 2, 3, 4])
1132 |
1133 | np.ptp(a) # 3 (Maximum - Minimum)
1134 | ```
1135 |
1136 | #### **Standard Deviation and Variance**
1137 |
1138 | ```python
1139 | np.std(x) # Standard Deviation
1140 | np.var(x) # Variance
1141 | ```
1142 |
1143 | There's a lot more! Go look at the `scipy.special` package for a list of all of them!
1144 |
1145 |
1146 |
1147 | ## 3. Going Deeper With Arrays
1148 |
1149 | ### 3.1 Broadcasting
1150 | [go to top](#top)
1151 |
1152 |
1153 | 
1154 |
1155 | Image source: https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
1156 |
1157 | Broadcasting causes Numpy to pad or 'stretch' smaller arrays to allow them to operate on or with other larger arrays!
1158 |
1159 | > Broadcasting is possible if the following rules are satisfied:
1160 | >
1161 | > - Array with smaller **ndim** than the other is prepended with '1' in its shape.
1162 | > - Size in each dimension of the output shape is maximum of the input sizes in that dimension.
1163 | > - An input can be used in calculation, if its size in a particular dimension matches the output size or its value is exactly 1.
1164 | > - If an input has a dimension size of 1, the first data entry in that dimension is used for all calculations along that dimension.
1165 | >
1166 | > A set of arrays is said to be **broadcastable** if the above rules produce a valid result and one of the following is true:
1167 | >
1168 | > - Arrays have exactly the same shape.
1169 | > - Arrays have the same number of dimensions and the length of each dimension is either a common length or 1.
1170 | > - Array having too few dimensions can have its shape prepended with a dimension of length 1, so that the above stated property is true.
1171 | >
1172 | > https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
1173 |
1174 | **Example**
1175 |
1176 | This is the example in the picture above!
1177 |
1178 | 
1179 |
1180 | ```python
1181 | a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]])
1182 | b = np.array([1.0,2.0,3.0])
1183 |
1184 | a
1185 | # Out: array([[0., 0., 0.]
1186 | # [10., 10., 10.]
1187 | # [20., 20., 20.]
1188 | # [30., 30., 30.]])
1189 |
1190 | b
1191 | # Out: array([1., 2., 3.])
1192 |
1193 | a + b
1194 | # Out: array([[1., 2., 3.]
1195 | # [11., 12., 13.]
1196 | # [21., 22., 23.]
1197 | # [31., 32., 33.]])
1198 | ```
1199 |
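Here's another small sketch of the shape rules in action, stretching a column and a row against each other (`col` and `row` are just illustrative names):

```python
import numpy as np

col = np.arange(3).reshape(3, 1) # shape (3, 1)
row = np.arange(4)               # shape (4,), treated as (1, 4)

# The output shape takes the maximum size along each dimension: (3, 4)
col + row
# Out: array([[0, 1, 2, 3],
#             [1, 2, 3, 4],
#             [2, 3, 4, 5]])
```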
1200 | **Uses**
1201 |
1202 | Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
1203 |
1204 | **Centering An Array**
1205 |
1206 | ```python
1207 | X = np.random.random((10, 3))
1208 | Xmean = X.mean(axis=0) # Mean of each column, shape (3,)
1209 |
1210 | X_centered = X - Xmean # (10, 3) minus (3,) broadcasts across the rows
1211 | ```
1212 |
1213 | **Plotting a Two-Dimensional Array**
1214 |
1215 | ```python
1216 | # x and y have 50 steps from 0 to 5
1217 | x = np.linspace(0, 5, 50)
1218 | y = np.linspace(0, 5, 50)[:, np.newaxis]
1219 |
1220 | z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
1221 | ```
1222 |
1223 |
1224 |
1225 | ### 3.2 Vectorize
1226 | [go to top](#top)
1227 |
1228 |
1229 | You'll have noticed that all of the functions above can act on every element of an array without any explicit for-loops!
1230 |
1231 | You can get this ability for ANY function that you might want to write by using np.vectorize! (Just note that np.vectorize is mainly a convenience: under the hood it still loops in Python, so don't expect a big speedup.)
1232 |
1233 | **Vectorize**
1234 |
1235 | ```python
1236 | def my_add_n(a, n):
1237 | return a + n
1238 |
1239 | vfunc = np.vectorize(my_add_n)
1240 |
1241 | vfunc([0, 2, 4], 2) # array([2, 4, 6])
1242 |
1243 | # You may specify the output type explicitly as well
1244 | # Note: down-casting will occur if you specify an int otype but pass in floats!
1245 | vfunc_float = np.vectorize(my_add_n, otypes=[float]) # np.float is deprecated; use float or np.float64
1246 | ```
1247 | **Excluding Parameters**
1248 |
1249 | ```python
1250 | # You may also declare parameters that shouldn't be vectorized!
1251 | # Source: https://docs.scipy.org/doc/numpy-1.9.2/reference/generated/numpy.vectorize.html
1252 | def mypolyval(p, x):
1253 | _p = list(p)
1254 | res = _p.pop(0)
1255 | while _p:
1256 | res = res*x + _p.pop(0)
1257 | return res
1258 |
1259 | vpolyval = np.vectorize(mypolyval, excluded=['p'])
1260 |
1261 | # Think of this like x^2 + 2x + 3, then feed in x = 0, x = 1 successively
1262 | vpolyval(p=[1, 2, 3], x=[0, 1]) # array([3, 6])
1263 |
1264 | # Or you can state the exclusion inline
1265 | vpolyval.excluded.add(0)
1266 | vpolyval([1, 2, 3], x=[0, 1]) # array([3, 6])
1267 | ```
1268 |
1269 |
1270 |
1271 | ### 3.3 Iterating Through Axes
1272 | [go to top](#top)
1273 |
1274 |
1275 | You could use a for-loop, or you could use `np.apply_along_axis`:
1276 |
1277 | ```python
1278 | def state_max(x):
1279 | return np.max(x)
1280 |
1281 | np.apply_along_axis(state_max, axis=0, arr=array_to_parse)
1282 | ```
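For instance, a rough sketch on a small 2-D array (the names here are illustrative):

```python
import numpy as np

grid = np.array([[1, 5, 2],
                 [7, 0, 3]])

# Apply the function to each column (the 1-D slices along axis 0)...
np.apply_along_axis(np.max, axis=0, arr=grid) # array([7, 5, 3])

# ...or to each row (the 1-D slices along axis 1)
np.apply_along_axis(np.max, axis=1, arr=grid) # array([5, 7])
```

(For built-in reductions like max you'd normally just call `grid.max(axis=0)`; `apply_along_axis` earns its keep with custom functions.)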
1283 |
1284 |
1285 |
1286 | ### 3.4 Modifying Output Directly
1287 | [go to top](#top)
1288 |
1289 |
1290 | OK, so by now you've noticed that the functions above are more or less all vectorized functions. They're also called ufuncs (universal functions).
1291 |
1292 | Here are some nifty things you can do with them!
1293 |
1294 | So, for example, if you're dealing with a huge array
1295 |
1296 | ```python
1297 | a = np.arange(5)
1298 | b = np.empty(5)
1299 |
1300 | # Less efficient
1301 | b = np.multiply(a, 10) # This creates a temporary array before assigning it to b
1302 |
1303 | # More efficient
1304 | np.multiply(a, 10, out=b) # This modifies b directly! This also works for array views!
1305 | ```
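The `out=` argument also accepts array views, which lets you write results straight into part of a larger array without a temporary. A minimal sketch (the names are illustrative):

```python
import numpy as np

a = np.arange(5)  # array([0, 1, 2, 3, 4])
y = np.zeros(10)

# Write 2**a directly into every other slot of y
np.power(2, a, out=y[::2])
y
# Out: array([ 1., 0., 2., 0., 4., 0., 8., 0., 16., 0.])
```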
1306 |
1307 |
1308 |
1309 | ### 3.5 Locating Elements
1310 | [go to top](#top)
1311 |
1312 |
1313 | #### **Where**
1314 |
1315 | ```python
1316 | a = np.array([1, 2, 3, 4, 5])
1317 |
1318 | b = np.where(a > 3) # (array([3, 4]),) It's the locations of the satisfied conditions!
1319 | ```
1320 |
1321 | #### **Take**
1322 |
1323 | ```python
1324 | a.take(b) # array([[4, 5]])
1325 | ```
1326 |
1327 | #### **Where Cases**
1328 |
1329 | ```python
1330 | a = np.array([1, 2, 3, 4, 5])
1331 |
1332 | b = np.where(a > 3, "NO", "YES") # array(['YES', 'YES', 'YES', 'NO', 'NO'], dtype='<U3')
1333 | ```
1334 |
1335 |
1336 |
1351 | ### 3.6 Aggregations
1352 | [go to top](#top)
1353 |
1354 |
1355 | #### **Reduce**
1356 |
1357 | ```python
1358 | x = np.array([1, 2, 3, 4])
1359 |
1360 | np.add.reduce(x) # 10 (which is 1 + 2 + 3 + 4)
1361 | np.multiply.reduce(x) # 24 (which is 1 * 2 * 3 * 4)
1362 | ```
1363 |
1364 | #### **Accumulate**
1365 |
1366 | Reduce, but show each step of the way!
1367 |
1368 | ```python
1369 | x = np.array([1, 2, 3, 4])
1370 |
1371 | np.add.accumulate(x) # array([1, 3, 6, 10])
1372 | np.multiply.accumulate(x) # array([1, 2, 6, 24])
1373 | ```
1374 |
1375 | #### **Cumsum**
1376 |
1377 | Cumulative sum
1378 |
1379 | ```python
1380 | x = np.array([1, 2, 3, 4])
1381 |
1382 | # Equivalent
1383 | np.cumsum(x)
1384 | x.cumsum()
1385 | np.add.accumulate(x)
1386 | ```
1387 |
1388 | #### **Outer Product**
1389 |
1390 | The outer product of two vectors u and v is the matrix u vᵀ, whose (i, j) entry is u[i] * v[j]!
1391 |
1392 | 
1393 |
1394 | Image source: https://en.wikipedia.org/wiki/Outer_product
1395 |
1396 | ```python
1397 | x = np.array([1, 2, 3, 4])
1398 |
1399 | np.multiply.outer(x, x)
1400 | # Out: array([[ 1, 2, 3, 4],
1401 | # [ 2, 4, 6, 8],
1402 | # [ 3, 6, 9, 12],
1403 | # [ 4, 8, 12, 16]])
1404 | ```
1405 |
1406 | #### **Sum**
1407 |
1408 | ```python
1409 | np.sum(np.array([1, 2, 3, 4])) # 10
1410 |
1411 | # Beware! A ragged input becomes an object array of lists, and summing it concatenates the lists
1412 | np.sum(np.array([[1, 2, 3, 4], [1, 2]], dtype=object)) # [1, 2, 3, 4, 1, 2]
1413 | ```
1414 |
1415 | #### **Min and Max**
1416 |
1417 | ```python
1418 | np.min(x) # Gives smallest element in array
1419 | np.max(x) # Gives largest element in array
1420 |
1421 | # You can specify the axis!
1422 | array = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
1423 | [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
1424 |
1425 | np.min(array, axis=0)
1426 | # Out: array([[1, 2, 3],
1427 | # [4, 5, 6],
1428 | # [7, 8, 9]])
1429 |
1430 | np.min(array, axis=1)
1431 | # Out: array([[ 1, 2, 3],
1432 | # [10, 11, 12]])
1433 |
1434 | np.min(array, axis=2)
1435 | # Out: array([[ 1, 4, 7],
1436 | # [10, 13, 16]])
1437 |
1438 | # Same applies to max
1439 | ```
1440 |
1441 | #### **Mean**
1442 |
1443 | ```python
1444 | np.mean(x)
1445 | ```
1446 |
1447 | #### **Full List**
1448 |
1449 | | Function Name | NaN-safe Version | Description |
1450 | | --------------- | ------------------ | ----------------------------------------- |
1451 | | `np.sum` | `np.nansum` | Compute sum of elements |
1452 | | `np.prod` | `np.nanprod` | Compute product of elements |
1453 | | `np.mean` | `np.nanmean` | Compute mean of elements |
1454 | | `np.std` | `np.nanstd` | Compute standard deviation |
1455 | | `np.var` | `np.nanvar` | Compute variance |
1456 | | `np.min` | `np.nanmin` | Find minimum value |
1457 | | `np.max` | `np.nanmax` | Find maximum value |
1458 | | `np.argmin` | `np.nanargmin` | Find index of minimum value |
1459 | | `np.argmax` | `np.nanargmax` | Find index of maximum value |
1460 | | `np.median` | `np.nanmedian` | Compute median of elements |
1461 | | `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
1462 | | `np.any` | N/A | Evaluate whether any elements are true |
1463 | | `np.all` | N/A | Evaluate whether all elements are true |
1464 |
1465 | **Note:** These are methods you can call on the array itself as well!
1466 |
1467 | ```python
1468 | x = np.array([1, 2, 3, 4])
1469 |
1470 | # Equivalent!
1471 | np.sum(x)
1472 | x.sum()
1473 |
1474 | # It even works for the arguments!
1475 | high_dim_array.sum(axis = 2) # And so on!
1476 | ```
1477 |
1478 |
1479 |
1480 | ### 3.7 Comparisons
1481 | [go to top](#top)
1482 |
1483 |
1484 | #### **Boolean Comparisons**
1485 |
1486 | ```python
1487 | a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
1488 |
1489 | b = a > 4
1490 | # Out: array([False, False, False, False, False, True, True, True, True, True])
1491 |
1492 | # This works for all the conditional operators!
1493 | # ==
1494 | # !=
1495 | # > , >=
1496 | # < , <=
1497 | ```
1498 |
1499 | #### **Maximum and Minimum**
1500 |
1501 | Note: These are **not** max and min! Those reduce a single array; maximum and minimum compare two arrays element-wise.
1502 |
1503 | ```python
1504 | a = np.array([1, 2, 3, 4, 5])
1505 | b = np.array([5, 4, 3, 2, 1])
1506 |
1507 | np.maximum(a, b) # array([5, 4, 3, 4, 5])
1508 | np.minimum(a, b) # array([1, 2, 3, 2, 1])
1509 | ```
1510 |
1511 | #### **Any and All**
1512 |
1513 | You can use Any and All too!
1514 |
1515 | ```python
1516 | np.any(x > 5)
1517 | np.all(x < 0)
1518 | ```
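Since comparisons give you Boolean arrays, you can also count how many elements satisfy a condition. A quick sketch (`x` here is just an illustrative array):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.any(x > 8)           # True
np.all(x < 5)           # False

# Count the True entries in a Boolean mask
np.count_nonzero(x > 5) # 4
np.sum(x > 5)           # 4 (True counts as 1)
```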
1519 |
1520 |
1521 |
1522 | ### 3.8 Sorting Arrays
1523 | [go to top](#top)
1524 |
1525 |
1526 | np.sort uses quicksort by default, though mergesort and heapsort are also available (via the `kind` argument).
1527 |
1528 | #### **Sort**
1529 |
1530 | ```python
1531 | x = np.array([2, 1, 4, 3, 5])
1532 |
1533 | # Does not mutate x
1534 | np.sort(x) # array([1, 2, 3, 4, 5])
1535 |
1536 | # Return the indices that would sort x instead
1537 | np.argsort(x) # array([1, 0, 3, 2, 4])
1538 |
1539 | # Mutates x in place (and returns None)
1540 | x.sort() # x is now array([1, 2, 3, 4, 5])
1541 | ```
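The index array from argsort is handy for reordering other arrays the same way, a common pattern sketched below (the names and values are illustrative):

```python
import numpy as np

names  = np.array(['c', 'a', 'd', 'b'])
scores = np.array([3, 1, 4, 2])

order = np.argsort(scores) # array([1, 3, 0, 2])

scores[order] # array([1, 2, 3, 4])
names[order]  # array(['a', 'b', 'c', 'd'], dtype='<U1')
```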
1542 |
1543 | #### **Sort Along Axes**
1544 |
1545 | ```python
1546 | array = np.array([[[9, 2, 1], [4, 2, 6], [17, 8, 9]],
1547 | [[190, 11, 12], [13, 14, 115], [16, 17, 18]]])
1548 |
1549 | np.sort(array, axis=0)
1550 | # Out: array([[[ 9, 2, 1],
1551 | # [ 4, 2, 6],
1552 | # [ 16, 8, 9]],
1553 | #
1554 | # [[190, 11, 12],
1555 | # [ 13, 14, 115],
1556 | # [ 17, 17, 18]]])
1557 |
1558 | np.sort(array, axis = 1)
1559 | # Out: array([[[ 4, 2, 1],
1560 | # [ 9, 2, 6],
1561 | # [ 17, 8, 9]],
1562 | #
1563 | # [[ 13, 11, 12],
1564 | # [ 16, 14, 18],
1565 | # [190, 17, 115]]])
1566 | ```
1567 |
1568 | #### **Partial Sorts**
1569 |
1570 | ```python
1571 | x = np.array([7, 2, 3, 1, 6, 5, 4])
1572 |
1573 | # First 3 are smallest, the rest are in arbitrary order
1574 | # This also works for the multiple axes like in the previous example
1575 | np.partition(x, 3, axis = 0) # array([2, 1, 3, 4, 6, 5, 7])
1576 | ```
1577 |
1578 |
1579 |
1580 | ### 3.9 Fancy Indexing
1581 | [go to top](#top)
1582 |
1583 |
1584 | We know how to index, slice, and apply Boolean masks (conditional indexing), but we can pass whole arrays of indices too!
1585 |
1586 | ```python
1587 | x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
1588 |
1589 | [x[3], x[4], x[8]] # [4, 5, 9]
1590 |
1591 | ind = [3, 4, 8]
1592 | x[ind] # array([4, 5, 9])
1593 |
1594 | # This is particularly useful because fancy indexing allows you to RESHAPE the array!
1595 | ind = np.array([[3, 4], [8, 0]])
1596 | x[ind]
1597 | # array([[4, 5],
1598 | # [9, 1]])
1599 |
1600 | # You can also do it in multiple dimensions
1601 | x = np.array([[1, 2],
1602 | [3, 4]])
1603 |
1604 | row = np.array([0, 1]) # Row indices
1605 | col = np.array([0, 1]) # Column indices within those rows
1606 |
1607 | x[row, col] # array([1, 4])
1608 |
1609 | # Also works with broadcasting
1610 | x[row[:, np.newaxis], col]
1611 | # Out: array([[1, 2],
1612 | # [3, 4]])
1613 | ```
1614 |
1615 | #### **Combined Indexing**
1616 |
1617 | Combine fancy indexing with normal indexing!
1618 |
1619 | ```python
1620 | x = np.arange(12).reshape(3, 4)
1621 | # Out: array([[ 0, 1, 2, 3],
1622 | # [ 4, 5, 6, 7],
1623 | # [ 8, 9, 10, 11]])
1624 |
1625 | x[2, [2, 0, 1]] # array([10, 8, 9])
1626 | x[1:, [2, 0, 1]]
1627 | # Out: array([[6, 4, 5],
1628 | # [10, 8, 9]])
1629 | ```
1630 |
1631 |
1632 |
1633 | ### 3.10 Structured Arrays
1634 | [go to top](#top)
1635 |
1636 |
1637 | Arrays of mixed type!
1638 |
1639 | Source: https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html
1640 |
1641 | ```python
1642 | data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
1643 | 'formats':('U10', 'i4', 'f8')})
1644 |
1645 | data.dtype # [('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
1646 | ```
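A minimal sketch of how such a structured array is typically filled and queried, following the handbook source cited above (the names and values come from that source):

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})

# Fill the fields column by column
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Access a whole field, a single record, or filter with a Boolean mask
data['name']                   # array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')
data[0]                        # ('Alice', 25, 55.)
data[data['age'] < 30]['name'] # array(['Alice', 'Doug'], dtype='<U10')
```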
1667 |
1668 |
1669 | ## 4. Matrices
1670 |
1671 | Matrices are strictly 2-dimensional ndarrays!
1672 |
1673 | You create them in much the same way as regular arrays, using the `numpy.matlib` module.
1674 |
1675 | ```python
1676 | import numpy.matlib
1677 |
1678 | matlib.empty()
1679 | matlib.zeros()
1680 | matlib.ones()
1681 | matlib.eye()
1682 | matlib.identity()
1683 | matlib.rand()
1684 |
1685 | # You can even use
1686 | np.asmatrix(some_numpy_array)
1687 |
1688 | # Useful methods
1689 | .diagonal() # Get the diagonal (as a 1xN matrix)
1690 |
1691 | # You can sort them, and do general ndarray stuff with them as well!
1692 | ```
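As a quick illustrative sketch of the matrix type in action (note that plain ndarrays are generally preferred over np.matrix these days):

```python
import numpy as np

m = np.asmatrix(np.array([[1, 2],
                          [3, 4]]))

m.diagonal() # matrix([[1, 4]])
m.T          # matrix([[1, 3],
             #         [2, 4]])

# Careful: * on matrices is MATRIX multiplication, unlike on ndarrays
m * m
# Out: matrix([[ 7, 10],
#              [15, 22]])
```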
1693 |
1694 | ### 4.1 Linear Algebra Functions
1695 | [go to top](#top)
1696 |
1697 |
1698 | ```python
1699 | np.dot() # Get dot product of two arrays
1700 | np.vdot() # Get dot product of two vectors
1701 |
1702 | np.inner() # Get inner product of two arrays
1703 | np.matmul() # Matrix multiplication
1704 |
1705 | np.linalg.det() # Determinant
1706 | np.linalg.inv() # Find Inverse matrix
1707 |
1708 | np.linalg.solve() # Solve system of linear equations
1709 | ```
1710 |
1711 | **Special Note: Dot Product and Multiply**
1712 |
1713 | There are shorthand operators for these operations!
1714 |
1715 | ```python
1716 | # Suppose we have two 2-D arrays A and B
1717 |
1718 | # np.multiply(A, B)
1719 | A * B # Element-wise multiplication
1720 |
1721 | # np.matmul(A, B) (equivalent to np.dot(A, B) for 2-D arrays)
1722 | A @ B # Matrix multiplication
1723 |
1724 | # Careful: if A and B are np.matrix objects, A * B is matrix multiplication instead!
1723 | ```
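And here's a small worked sketch of solving a linear system with these functions (the system itself is just an illustration):

```python
import numpy as np

# Solve:  3x + 1y = 9
#         1x + 2y = 8
A = np.array([[3, 1],
              [1, 2]])
b = np.array([9, 8])

np.linalg.det(A)       # ~5.0, so A is invertible
solution = np.linalg.solve(A, b)
solution               # array([2., 3.])

A @ solution           # array([9., 8.]) (recovers b)
```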
1724 |
1725 |
1726 |
1727 | ## 5. Numpy I/O
1728 |
1729 | ### 5.1 Import from CSV
1730 | [go to top](#top)
1731 |
1732 |
1733 | https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.genfromtxt.html
1734 |
1735 | ```python
1736 | path = 'path_to_csv'
1737 | data = np.genfromtxt(path,
1738 | delimiter=',',
1739 | skip_header=1, # Number of lines to skip at beginning
1740 | filling_values=-999, # Value to use when data is missing
1741 | dtype='float')
1742 |
1743 | # If you set dtype to None, the column types are inferred and each row comes back as a tuple (in a structured array)!
1744 | (18., 8, 307., 130, 3504, 12. , 70, 1, b'"some_string_stuff"')
1745 | ```
1746 |
1747 |
1748 |
1749 | ### 5.2 Saving and Loading
1750 | [go to top](#top)
1751 |
1752 |
1753 | ```python
1754 | # Save One Array
1755 | np.save('data.npy', array)
1756 |
1757 | # Save Multiple Arrays
1758 | np.savez('data_mult.npz', a=array_a, b=array_b)
1759 |
1760 | # Load
1761 | single = np.load('data.npy')
1762 | mult = np.load('data_mult.npz')
1763 |
1764 | a = mult['a']
1765 | b = mult['b']
1766 | ```
1767 |
1768 | **Save and Load as txt**
1769 |
1770 | ```python
1771 | np.savetxt('out.txt', array)
1772 |
1773 | np.loadtxt('out.txt')
1774 | ```
1775 |
1776 |
1777 |
1778 | ```
1779 | . .
1780 | . |\-^-/| .
1781 | /| } O.=.O { |\
1782 | ```
1783 |
1784 |
1785 |
1786 | ------
1787 |
1788 | [![Coffee Button](../assets/COFFEE BUTTON ヾ(°∇°^).png)](https://www.buymeacoffee.com/methylDragon)
1789 |
--------------------------------------------------------------------------------