├── README.md
└── pandas.md


/pandas.md:
--------------------------------------------------------------------------------
  1 | ## Creating data
  2 | 
  3 | ### DataFrame
  4 | A dataframe is a table.
  5 | It contains an array of individual entries,
  6 | each of which has a certain value.
  7 | Each entry corresponds to a row and a column.
  8 | 
  9 | ```
 10 | pd.DataFrame({'Yes': [50, 21], 'No': [131,2]})
 11 | ```
 12 | 
 13 | ![image](https://user-images.githubusercontent.com/95273765/219207825-cad209bd-8cbc-4cec-93a5-489bffdae851.png)
 14 | 
 15 | ```
 16 | pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
 17 |               'Sue': ['Pretty good.', 'Bland.']},
 18 |              index=['Product A', 'Product B'])
 19 | ```
 20 | 
 21 | ![image](https://user-images.githubusercontent.com/95273765/219208281-719abedb-6bcf-48ae-90f3-04b2fa742b8b.png)
 22 | 
 23 | ### Series
 24 | A series, by contrast, is a sequence of data values.
 25 | If a dataframe is a table, a series is a list.
 26 | 
 27 | ![image](https://user-images.githubusercontent.com/95273765/219208506-31e638b7-0a5e-4e1e-a8e6-5499dc379aba.png)
 28 | 
 29 | A series is, in essence, a single column of a dataframe.
 30 | So we can assign row labels to the series the same way as before, using an `index` parameter.
 31 | However, a series does not have column names; it only has one overall `name`.
 32 | 
 33 | ![image](https://user-images.githubusercontent.com/95273765/219208837-93449377-a30c-484c-8350-923e58f8c2c8.png)
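A minimal sketch of building a series with row labels and a name. The sales figures, labels, and name below are made up for illustration.

```python
import pandas as pd

# Hypothetical values; the labels and the name are illustrative only.
sales = pd.Series([30, 35, 40],
                  index=['2015 Sales', '2016 Sales', '2017 Sales'],
                  name='Product A')

print(sales)
print(sales.name)           # the single overall name of the series
print(sales['2016 Sales'])  # look up a value by its row label
```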
 34 | 
 35 | ## Reading data files
 36 | Read CSV file:
 37 | ``` python
 38 | wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")
 39 | ```
 40 | 
 41 | We can use the `shape` attribute to check how large the resulting DataFrame is:
 42 | ``` python
 43 | wine_reviews.shape
 44 | 
 45 | # returns a (rows, columns) tuple: the number of records and the number of columns
 46 | ```
 47 | 
 48 | We can examine the contents of the resulting DataFrame using the `head()` method, which returns the first five rows by default.
 49 | ``` python
 50 | wine_reviews.head()
 51 | ```
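To try these calls without the wine data set, the sketch below reads a small CSV from memory. The rows in `csv_text` and the name `sample` are made up, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# A tiny in-memory CSV stands in for a file on disk (hypothetical data).
csv_text = """country,points,price
Italy,87,15.0
Portugal,87,
US,90,14.0
France,88,24.0
Spain,85,12.0
Chile,86,9.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.head())   # first five rows by default
print(sample.head(2))  # first two rows
```

The empty `price` field in the second row is read as a missing value (`NaN`).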
 52 | 
 53 | ## Native accessors
 54 | In Python, we can access a property of an object by accessing it as an attribute.
 55 | A `book` object, for example, might have a `title` property, which we can access with `book.title`.
 56 | 
 57 | Columns of a DataFrame work in much the same way. Hence, to access the `country` column of a `reviews` DataFrame, we can use:
 58 | ```
 59 | reviews.country
 60 | ```
 61 | 
 62 | ![image](https://user-images.githubusercontent.com/95273765/219213679-7daeb44a-5917-42df-94a9-24ed9e0dcfa0.png)
 63 | 
 64 | We can also use the indexing (`[]`) operator:
 65 | ```
 66 | reviews['country']
 67 | ```
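A short sketch comparing the two forms on a made-up DataFrame (`reviews_demo` and its contents are assumptions for illustration). The indexing operator also handles column names that are not valid Python identifiers, which attribute access cannot do.

```python
import pandas as pd

# Hypothetical data; 'tasting notes' deliberately contains a space.
reviews_demo = pd.DataFrame({'country': ['Italy', 'US'],
                             'tasting notes': ['Ripe and fruity', 'Tart and snappy']})

# Both forms return the same Series.
same = reviews_demo.country.equals(reviews_demo['country'])
print(same)

# Only the indexing operator works for a name containing a space.
notes = reviews_demo['tasting notes']

# A second [] then selects a single value from the resulting Series.
first_country = reviews_demo['country'][0]
print(first_country)
```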
 68 | 
 69 | ## Indexing in pandas
 70 | The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem.
 71 | 
 72 | ### Index-based selection
 73 | Pandas indexing works in one of two paradigms.
 74 | The first is index-based selection, done with `iloc`:
 75 | selecting data based on its numerical position in the data.
 76 | The second is label-based selection, done with `loc`. Both `loc` and `iloc` are row-first, column-second.
 77 | 
 78 | `loc` and `iloc` are two important indexing methods in Pandas that allow selecting rows and columns of a DataFrame.
 79 | 
 80 | `loc` is used to select data based on labels of rows or columns. It takes the form `df.loc[row_label, column_label]`.
 81 | The `row_label` and `column_label` can be a single label, a list of labels, or a slice of labels.
 82 | 
 83 | ``` python
 84 | import pandas as pd
 85 | 
 86 | data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
 87 |         'age': [25, 28, 19, 31, 22],
 88 |         'gender': ['F', 'M', 'M', 'M', 'F']}
 89 | 
 90 | df = pd.DataFrame(data)
 91 | df.set_index('name', inplace=True)
 92 | 
 93 | print(df.loc['Bob', 'age']) # Output: 28
 94 | ```
 95 | 
 96 | `iloc` is used to select data based on the integer position of rows or columns.
 97 | It takes the form `df.iloc[row_index, column_index]`.
 98 | 
 99 | The `row_index` and `column_index` can be a single index, a list of indexes, or a slice of indexes.
100 | 
101 | ``` python
102 | import pandas as pd
103 | 
104 | data = {'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
105 |         'age': [25, 28, 19, 31, 22],
106 |         'gender': ['F', 'M', 'M', 'M', 'F']}
107 | 
108 | df = pd.DataFrame(data)
109 | df.set_index('name', inplace=True)
110 | 
111 | print(df.iloc[1, 0]) # Output: 28 (row 1 is Bob; column 0 is age, since name is now the index)
112 | ```
113 | 
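114 | On its own, the `:` operator, which also comes from native Python, means 'everything'. Combined with a number it denotes a range: `:3` below selects the first three rows of the first column.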
115 | ``` python
116 | reviews.iloc[:3, 0]
117 | ```
118 | 
119 | ![image](https://user-images.githubusercontent.com/95273765/219215404-05bb3082-5b2a-4f1f-a381-f251c6ac2b87.png)
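Slices behave slightly differently in the two accessors: `iloc` excludes the end position, as native Python does, while `loc` includes the end label. A minimal sketch, reusing the names-and-ages data from above under the assumed name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({'age': [25, 28, 19, 31, 22],
                        'gender': ['F', 'M', 'M', 'M', 'F']},
                       index=['Alice', 'Bob', 'Charlie', 'David', 'Emily'])

# iloc: positions 0, 1 and 2 (position 3 is excluded).
by_position = df_demo.iloc[0:3, 0]

# loc: labels 'Alice' through 'David', with 'David' included.
by_label = df_demo.loc['Alice':'David', 'age']

print(len(by_position))
print(len(by_label))
```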
120 | 
121 | ## Manipulating the index
122 | Label-based selection derives its power from the labels in the index. Critically, the index we use is not immutable. We can manipulate the index in any way we see fit.
123 | 
124 | The `set_index()` method can be used to do the job.
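A minimal sketch of `set_index()` on a made-up DataFrame (`people` and `by_name` are illustrative names). Without `inplace=True`, the method returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

people = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                       'age': [25, 28, 19]})

# set_index returns a new DataFrame; the original keeps 'name' as a column.
by_name = people.set_index('name')

print(by_name.loc['Bob', 'age'])  # label-based lookup now uses the names
print(list(people.columns))
```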
125 | 
126 | ## Info about the data
127 | A DataFrame object has a method called `info()` that gives us more information about the data set.
128 | 
129 | ``` python
130 | df.info()  # prints the summary itself and returns None, so no print() is needed
131 | ```
132 | 
133 | The `info()` method also tells us how many non-null values are present in each column.
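If the non-null counts are needed as data rather than as printed text, one option is `notna().sum()`. This is a sketch on a made-up DataFrame named `pokemon`; it is an alternative to `info()`, not something `info()` itself returns.

```python
import pandas as pd

# Hypothetical sample with missing values in 'Type 2'.
pokemon = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander', 'Squirtle'],
                        'Type 2': ['Poison', None, None]})

pokemon.info()                    # prints the summary; returns None
non_null = pokemon.notna().sum()  # non-null count per column, as a Series
print(non_null)
```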
134 | 
135 | ## Get columns
136 | ``` python
137 | print(df[['Name', 'Type 1', 'HP']])
138 | ```
139 | 
140 | ``` python
141 | for index, row in df.iterrows():
142 |   print(index, row['Name'])
143 | ```
144 | 
145 | ## Conditional getting
146 | ``` python
147 | df.loc[df['Type 1'] == 'Grass']
148 | ```
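Conditional selection works in two steps: the comparison produces a boolean series, and `loc` keeps the rows where it is `True`. A small sketch on illustrative data (`df_demo` is an assumed name):

```python
import pandas as pd

# Small illustrative sample, not the full data set.
df_demo = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander', 'Squirtle', 'Oddish'],
                        'Type 1': ['Grass', 'Fire', 'Water', 'Grass'],
                        'HP': [45, 39, 44, 45]})

mask = df_demo['Type 1'] == 'Grass'  # boolean Series, one value per row
grass = df_demo.loc[mask]            # keeps the rows where the mask is True

print(mask.tolist())
print(grass['Name'].tolist())
```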
149 | 
150 | ## Sort values
151 | ``` python
152 | df.sort_values(['Name', 'HP'], ascending=[True, False])
153 | ```
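With a list of columns, rows are sorted by the first column and ties are broken by the next; `ascending` takes one flag per column. A sketch on made-up values (`df_demo` and `ordered` are assumed names):

```python
import pandas as pd

# Hypothetical values with repeated names, so the second sort key matters.
df_demo = pd.DataFrame({'Name': ['Pidgey', 'Abra', 'Pidgey', 'Abra'],
                        'HP': [40, 25, 45, 30]})

# Name ascending, then HP descending within each name.
ordered = df_demo.sort_values(['Name', 'HP'], ascending=[True, False])

print(ordered)
```

`sort_values` returns a new DataFrame; `df_demo` itself keeps its original order.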
154 | 
155 | ## Make changes
156 | ``` python
157 | df['Total'] = df['HP'] + df['Attack']
158 | ```
159 | 
160 | Drop a column:
161 | 
162 | ``` python
163 | df = df.drop(columns=['Total'])
164 | ```
165 | 
166 | Another way to compute the total, selecting the columns to sum by position:
167 | 
168 | ``` python
169 | df['Total'] = df.iloc[:, 4:9].sum(axis=1)
170 | ```
171 | 
172 | Reorder the columns so that the last column comes right after the first four:
173 | 
174 | ``` python
175 | cols = list(df.columns.values)
176 | df = df[cols[0:4] + [cols[-1]]+cols[4:12]]
177 | ```
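The same two steps on a smaller made-up DataFrame, where the positions are easier to follow. The column layout and values are assumptions for illustration and differ from the positions used above.

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander'],
                        'HP': [45, 39],
                        'Attack': [49, 52],
                        'Defense': [49, 43]})

# Sum the columns at positions 1 and 2 (HP, Attack); the slice end, 3, is excluded.
df_demo['Total'] = df_demo.iloc[:, 1:3].sum(axis=1)

# Move the new last column to sit right after 'Name'.
cols = list(df_demo.columns)
df_demo = df_demo[cols[0:1] + [cols[-1]] + cols[1:-1]]

print(list(df_demo.columns))
print(df_demo['Total'].tolist())
```

Selecting by position is fragile: if columns are added or reordered, the slice silently picks different columns.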
178 | 
179 | ## Convert to csv
180 | `df.to_csv('modified.csv')` writes the DataFrame to a CSV file. Pass `index=False` to leave out the row labels.
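A sketch of a write-then-read round trip. To avoid touching the disk, an `io.StringIO` buffer stands in for the file path; `df_demo` and its values are made up.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander'], 'HP': [45, 39]})

buffer = io.StringIO()               # stands in for a path such as 'modified.csv'
df_demo.to_csv(buffer, index=False)  # index=False leaves the row labels out
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```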
181 | 
182 | ## Filtering data
183 | ``` python
184 | df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]
185 | ```
186 | 
187 | Exclude rows whose `Name` contains a given substring:
188 | 
189 | ``` python
190 | df.loc[~df['Name'].str.contains('Mega')]
191 | ```
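Conditions are combined with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses because these operators bind more tightly than `==`. A sketch on illustrative data (`df_demo` is an assumed name):

```python
import pandas as pd

# Small illustrative sample.
df_demo = pd.DataFrame({'Name': ['Venusaur', 'Mega Venusaur', 'Charizard', 'Oddish'],
                        'Type 1': ['Grass', 'Grass', 'Fire', 'Grass'],
                        'Type 2': ['Poison', 'Poison', 'Flying', 'Poison']})

# Both conditions must hold.
grass_poison = df_demo.loc[(df_demo['Type 1'] == 'Grass') & (df_demo['Type 2'] == 'Poison')]

# Either condition may hold.
fire_or_flying = df_demo.loc[(df_demo['Type 1'] == 'Fire') | (df_demo['Type 2'] == 'Flying')]

# ~ inverts the mask: keep names that do NOT contain 'Mega'.
no_mega = df_demo.loc[~df_demo['Name'].str.contains('Mega')]

print(grass_poison['Name'].tolist())
print(no_mega['Name'].tolist())
```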
192 | 
193 | ## Reset index
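194 | `new_df = new_df.reset_index()` renumbers the rows from 0, here for a filtered DataFrame `new_df`. The old index is kept as a column unless `drop=True` is passed.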
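A sketch of why the reset is useful after filtering: the filtered rows keep their original labels, which leaves gaps. The names `df_demo`, `filtered`, `kept`, and `dropped` and the values are assumptions for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander', 'Squirtle'],
                        'HP': [45, 39, 44]})

filtered = df_demo.loc[df_demo['HP'] > 40]  # keeps the original labels 0 and 2
print(filtered.index.tolist())

kept = filtered.reset_index()              # old labels move into an 'index' column
dropped = filtered.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))
print(dropped.index.tolist())
```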
195 | 
196 | ## Regex filtering
197 | ``` python
198 | df.loc[df['Type 1'].str.contains('Fire|Grass', regex=True)]
199 | ```
200 | 
201 | Ignore case:
202 | 
203 | ``` python
204 | df.loc[df['Type 1'].str.contains('fire|grass', flags=re.I, regex=True)]  # requires `import re`
205 | ```
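A self-contained sketch comparing the case-sensitive and case-insensitive matches. The mixed-case values in `df_demo` are made up to make the difference visible.

```python
import re
import pandas as pd

# Hypothetical sample with inconsistent capitalisation.
df_demo = pd.DataFrame({'Name': ['Bulbasaur', 'Charmander', 'Squirtle', 'Oddish'],
                        'Type 1': ['Grass', 'fire', 'Water', 'GRASS']})

pattern = 'fire|grass'  # regex alternation: matches either word

# Case-sensitive by default: only the lowercase 'fire' matches.
case_sensitive = df_demo.loc[df_demo['Type 1'].str.contains(pattern, regex=True)]

# flags=re.I makes the match ignore case.
ignore_case = df_demo.loc[df_demo['Type 1'].str.contains(pattern, flags=re.I, regex=True)]

print(case_sensitive['Name'].tolist())
print(ignore_case['Name'].tolist())
```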
206 | 


--------------------------------------------------------------------------------