├── Data Manipulation Practice ├── Other techniques & practices │ ├── README.md │ ├── Regular Expression & Web Scraping.ipynb │ └── Word Cloud.ipynb ├── Pandas & Numpy Practice │ ├── 101 Pandas Exercises.ipynb │ ├── Exhaustive Introduction to Pandas (1).ipynb │ ├── Exhaustive Introduction to Pandas (2).ipynb │ └── README.md ├── Python codes for implementing SQL queries │ ├── Python code for SQL queries.ipynb │ ├── README.md │ └── database_relationship.png └── README.md ├── Data Visualization ├── Matplotlib.ipynb ├── README.md └── Seaborn_Visualization.ipynb ├── LeetCode_Algorithm and Data Structure ├── #13_Roman to Integer.ipynb ├── #14_Longest Common Prefix.ipynb ├── #1_Two Sum.ipynb ├── #20_Valid Parentheses.ipynb ├── #26_Remove Duplicates from Sorted Array.ipynb ├── #53_Maximum Subarray.ipynb ├── #7 Reverse Interger.ipynb ├── #88_Merge Sorted Array.ipynb └── README.md └── README.md /Data Manipulation Practice/Other techniques & practices/README.md: -------------------------------------------------------------------------------- 1 | # Other techniques & practice 2 | 3 | This is a record of my practice using Python for other techniques, including web scraping, regular expressions, and word clouds. 4 | 5 | ## Getting Started 6 | 7 | ### Prerequisites 8 | 9 | For web scraping practice, the `pandas` and `numpy` packages are required. 10 | 11 | For word cloud practice, the `wordcloud` package is required. 12 | 13 | For regular expression practice, the `re` module is used; it ships with Python's standard library, so nothing needs to be installed. 14 | 15 | #### Install packages 16 | 17 | Personally, I recommend using Anaconda, which makes it easy to manage packages. 18 | * [anaconda free download link](https://www.anaconda.com/distribution/#download-section) 19 | 20 | After installing Anaconda, you can easily install Python packages from the terminal. 
21 | ``` 22 | conda install package_name 23 | ``` 24 | 25 | ## Authors 26 | 27 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 28 | 29 | 30 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Other techniques & practices/Regular Expression & Web Scraping.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Regex" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import re\n", 17 | "from bs4 import BeautifulSoup\n", 18 | "import requests" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "data_re = open(\"mock_data.csv\",\"r\")\n", 28 | "file = data_re.read()\n", 29 | "file_temp = file.split(\"\\n\")\n", 30 | "file_no_header = \"\"\n", 31 | "for i in file_temp[1:]:\n", 32 | " file_no_header = file_no_header+\"\\n\"+i\n", 33 | "data_re.close()" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 4, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "data": { 43 | "text/plain": [ 44 | "'birthday,phone,names,lat_long,email,zip,city'" 45 | ] 46 | }, 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "output_type": "execute_result" 50 | } 51 | ], 52 | "source": [ 53 | "file.split(\"\\n\")[0]" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "### 1 To transform the column birthday from European to US data format while leaving the rest as is." 
61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 5, 66 | "metadata": { 67 | "scrolled": true 68 | }, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "\n", 75 | "2020-07-05,1-560-294-4480,Leslie H. Howard,\"-80.7931, 157.73725\",magna.Nam@Praesentinterdumligula.com,05-449,Campina Grande\n", 76 | "2019-11-06,1-616-403-7121,Kyra T. Wynn,\"-54.65661, 87.93458\",amet.ornare@uterosnon.net,1975,Bagh\n", 77 | "2020-11-23,1-258-160-9496,Darius G. Huff,\"-56.23283, -116.1583\",ipsum@erat.co.\n" 78 | ] 79 | } 80 | ], 81 | "source": [ 82 | "file_1 = re.sub(r\"([0-9]{2})\\.([0-9]{2})\\.([0-9]{4})\",r\"\\3-\\2-\\1\",file_no_header)\n", 83 | "print(file_1[:300])" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "### 2 To strip everything BUT the email column." 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 6, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "\n", 103 | "magna.Nam@Praesentinterdumligula.com\n", 104 | "amet.ornare@uterosnon.net\n", 105 | "ipsum@erat.co.uk\n", 106 | "nunc.est.mollis@auctor.com\n", 107 | "neque.venenatis@Phasellus.ca\n", 108 | "Proin.mi@a.co.uk\n", 109 | "gravida.non.sollicitudin@odioNam.net\n", 110 | "Cras@dict\n" 111 | ] 112 | } 113 | ], 114 | "source": [ 115 | "file_2 = re.sub(r\".+,.+,.+,.+,([a-zA-Z0-9\\._]+@[a-zA-Z0-9]+\\..*),.+,.+\",r\"\\1\",file_no_header)\n", 116 | "print(file_2[:200])" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "### 3 - To convert all rows to \"name [TAB] birthday\" only (and strip the rest). The birthday column should be in the US format. 
([TAB]s will allow you to copy and paste its result into Excel.)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 7, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "\n", 136 | "2020-07-05\tLeslie H. Howard\n", 137 | "2019-11-06\tKyra T. Wynn\n", 138 | "2020-11-23\tDarius G. Huff\n", 139 | "2020-05-07\tFlynn M. Rodriguez\n", 140 | "2020-09-25\tDamon K. Potts\n", 141 | "2019-12-30\tOdessa U. Stewart\n", 142 | "2019-03-04\tRuby M. Noble\n", 143 | "2020-12-17\t\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "file_3 = re.sub(r\"([0-9]{2})\\.([0-9]{2})\\.([0-9]{4}),.+,([a-zA-Z]+\\s[A-Z]{1}\\.\\s[a-zA-Z]+),.+,.+,.+,.+\",r\"\\3-\\2-\\1\\t\\4\",file_no_header)\n", 149 | "print(file_3[:200])\n" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "### 4 - To strip everything BUT lat_long AND reorder its entries to be \"long [TAB] lat\". ([TAB]s will allow you to copy and paste its result into Excel.)" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 8, 162 | "metadata": {}, 163 | "outputs": [ 164 | { 165 | "name": "stdout", 166 | "output_type": "stream", 167 | "text": [ 168 | "\n", 169 | "157.73725\t-80.7931\n", 170 | "87.93458\t-54.65661\n", 171 | "-116.1583\t-56.23283\n", 172 | "-93.02564\t50.02477\n", 173 | "91.87029\t34.94064\n", 174 | "96.3\n" 175 | ] 176 | } 177 | ], 178 | "source": [ 179 | "file_4 = re.sub(r\".+,.+,.+,\\\"(.+),\\s(.+)\\\",.+,.+,.+\",r\"\\2\\t\\1\",file_no_header)\n", 180 | "print(file_4[:100])" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "# Web Scraping" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 19, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [ 196 | "# send a browser-like User-Agent header to get access\n", 197 | "agent = {'User-Agent': 'Mozilla/5.0'}\n", 198 | "page = 
requests.get('https://www.usnews.com/',headers=agent)\n", 199 | "web_content = page.content\n", 200 | "soup = BeautifulSoup(web_content,'html.parser')" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 21, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "# Find top Stories\n", 210 | "\n", 211 | "top_header = soup.find(string=re.compile(\"Top Stories\"))\n", 212 | "top_content = top_header.parent.parent" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 38, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/plain": [ 223 | "bs4.element.NavigableString" 224 | ] 225 | }, 226 | "execution_count": 38, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "type(top_header)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 37, 238 | "metadata": {}, 239 | "outputs": [ 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "'Trump Trial Adjourns Before Crucial Vote'" 244 | ] 245 | }, 246 | "execution_count": 37, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "top_header.parent.parent.h3.a.string" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 11, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "name": "stdout", 262 | "output_type": "stream", 263 | "text": [ 264 | "https://www.usnews.com/news/world-report/articles/2020-02-03/china-accuses-us-of-spreading-fear-as-coronavirus-death-toll-rises\n" 265 | ] 266 | } 267 | ], 268 | "source": [ 269 | "# Read and print the URL of the _second_ current top story to the screen\n", 270 | "links=top_content.h3.a['href']\n", 271 | "top_h3=top_content.findAll('h3')\n", 272 | "link_list=[]\n", 273 | "for tag in top_h3:\n", 274 | " for link in tag.findAll('a', href=True):\n", 275 | " link_list.append(link['href'])\n", 276 | "\n", 277 | "print(link_list[1])" 278 | ] 279 | }, 280 | { 281 
| "cell_type": "code", 282 | "execution_count": 12, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "# Navigate to the second top story‘s url\n", 287 | "top_second = requests.get(link_list[1],headers=agent)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": 13, 293 | "metadata": {}, 294 | "outputs": [ 295 | { 296 | "name": "stdout", 297 | "output_type": "stream", 298 | "text": [ 299 | "China Accuses U.S. of Spreading Fear as Coronavirus Death Toll Rises \n" 300 | ] 301 | } 302 | ], 303 | "source": [ 304 | "# Read and print the header\n", 305 | "soup_2 = BeautifulSoup(top_second.content,'html.parser')\n", 306 | "header_second = soup_2.findAll('h1')\n", 307 | "print(header_second[0].text)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 14, 313 | "metadata": {}, 314 | "outputs": [ 315 | { 316 | "name": "stdout", 317 | "output_type": "stream", 318 | "text": [ 319 | "China Accuses U.S. of Spreading Fear as Coronavirus Death Toll Rises \n", 320 | "\n", 321 | " Evacuees board an evacuation flight for EU nationals, Feb. 2, 2020, at Wuhan Tianhe International Airport in Wuhan in central China's Hubei Province. (Arek Rataj/AP-File) \n", 322 | "\n", 323 | " The U.S. is overreacting and spreading fear about the coronavirus outbreak, China's Foreign Ministry said Monday. 
\n", 324 | "\n", 325 | " SEE: \n" 326 | ] 327 | } 328 | ], 329 | "source": [ 330 | "# Read and print the header & the first 3 sentences\n", 331 | "main_body = soup_2.find('main')\n", 332 | "main_content = main_body.find('div',id='usn-toc-content')\n", 333 | "body = main_content.findAll('p')\n", 334 | "\n", 335 | "three_list = []\n", 336 | "for p in body:\n", 337 | " three_list.append(p.text)\n", 338 | "print(header_second[0].text,\"\\n\\n\",three_list[0],\"\\n\\n\",three_list[1],\"\\n\\n\",three_list[2])" 339 | ] 340 | } 341 | ], 342 | "metadata": { 343 | "kernelspec": { 344 | "display_name": "Python 3", 345 | "language": "python", 346 | "name": "python3" 347 | }, 348 | "language_info": { 349 | "codemirror_mode": { 350 | "name": "ipython", 351 | "version": 3 352 | }, 353 | "file_extension": ".py", 354 | "mimetype": "text/x-python", 355 | "name": "python", 356 | "nbconvert_exporter": "python", 357 | "pygments_lexer": "ipython3", 358 | "version": "3.7.4" 359 | } 360 | }, 361 | "nbformat": 4, 362 | "nbformat_minor": 2 363 | } 364 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Other techniques & practices/Word Cloud.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np" 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 3, 16 | "metadata": {}, 17 | "outputs": [], 18 | "source": [ 19 | "from wordcloud import WordCloud, STOPWORDS\n", 20 | "from PIL import Image\n", 21 | "import numpy as np\n", 22 | "import urllib\n", 23 | "import requests\n", 24 | "import matplotlib.pyplot as plt" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "\n", 34 | "words = 'access guest guest apartment 
area area bathroom bed bed bed bed bed bedroom block coffee coffee coffee coffee entrance entry francisco free garden guest home house kettle kettle kitchen kitchen kitchen kitchen kitchen kitchenliving located microwave neighborhood new park parking place privacy private queen room san separate seperate shared space space space street suite time welcome'\n", 35 | "mask = np.array(Image.open(requests.get('http://www.clker.com/cliparts/O/i/x/Y/q/P/yellow-house-hi.png', stream=True).raw))\n", 36 | "\n", 37 | "# # This function takes in your text and your mask and generates a wordcloud. \n", 38 | "# def generate_wordcloud(words, mask):\n", 39 | "# word_cloud = WordCloud(width = 512, height = 512, background_color='white', stopwords=STOPWORDS, mask=mask).generate(words)\n", 40 | "# plt.figure(figsize=(10,8),facecolor = 'white', edgecolor='blue')\n", 41 | "# plt.imshow(word_cloud)\n", 42 | "# plt.axis('off')\n", 43 | "# plt.tight_layout(pad=0)\n", 44 | "# plt.show()" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 11, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "text/plain": [ 55 | "'ab,dd'" 56 | ] 57 | }, 58 | "execution_count": 11, 59 | "metadata": {}, 60 | "output_type": "execute_result" 61 | } 62 | ], 63 | "source": [ 64 | "'ab'+ ','+'dd'" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 22, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "data": { 74 | "image/png": "\n", 75 | "text/plain": [ 76 | "
" 77 | ] 78 | }, 79 | "metadata": { 80 | "needs_background": "light" 81 | }, 82 | "output_type": "display_data" 83 | } 84 | ], 85 | "source": [ 86 | "# mask is to detemine the shape of the word cloud\n", 87 | "words = 'access guest guest apartment area area bathroom bed bed bed bed bed bedroom block coffee coffee coffee coffee entrance entry francisco free garden guest home house kettle kettle kitchen kitchen kitchen kitchen kitchen kitchenliving located microwave neighborhood new park parking place privacy private queen room san separate seperate shared space space space street suite time welcome'\n", 88 | "mask = np.array(Image.open(requests.get('http://www.clker.com/cliparts/O/i/x/Y/q/P/yellow-house-hi.png', stream=True).raw))\n", 89 | "alice_mask = np.array(Image.open(image_file))\n", 90 | "wd=WordCloud(width = 512, height = 512, background_color='white', stopwords=STOPWORDS, mask=mask)\n", 91 | "wd.generate(words)\n", 92 | "plt.imshow(wd)\n", 93 | "plt.axis('off')\n", 94 | "plt.tight_layout(pad=0)" 95 | ] 96 | } 97 | ], 98 | "metadata": { 99 | "kernelspec": { 100 | "display_name": "Python 3", 101 | "language": "python", 102 | "name": "python3" 103 | }, 104 | "language_info": { 105 | "codemirror_mode": { 106 | "name": "ipython", 107 | "version": 3 108 | }, 109 | "file_extension": ".py", 110 | "mimetype": "text/x-python", 111 | "name": "python", 112 | "nbconvert_exporter": "python", 113 | "pygments_lexer": "ipython3", 114 | "version": "3.7.4" 115 | } 116 | }, 117 | "nbformat": 4, 118 | "nbformat_minor": 2 119 | } 120 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Pandas & Numpy Practice/101 Pandas Exercises.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Pandas Practice\n", 8 | "### Practice Pandas using questions from online" 9 | ] 10 | }, 11 | { 12 | 
"cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": {}, 15 | "outputs": [], 16 | "source": [ 17 | "# Q1\n", 18 | "import pandas as pd\n", 19 | "import numpy as np" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Transform objects to series\n", 27 | "##### `np.arange(start, stop, step)` normally takes three parameters\n", 28 | "##### With a single argument `n`, it starts at `0`, steps by `1`, and stops at `n - 1`\n" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 4, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "mylist = list('Han')\n", 38 | "myarr = np.arange(26)\n", 39 | "mydict = dict(zip(mylist, myarr))\n", 40 | "\n", 41 | "myarr1 = np.array(mylist)\n", 42 | "myarr2 = np.array(myarr)\n", 43 | "myarr3 = np.array(mydict)\n", 44 | "\n", 45 | "ser1 = pd.Series(mylist)\n", 46 | "ser2 = pd.Series(myarr)\n", 47 | "ser3 = pd.Series(mydict)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "## Transform series to dataframe" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "#### `zip(a,b)` pairs the elements of `a` and `b` into tuples; both `a` and `b` must be iterable.\n", 62 | "#### `DataFrame.reset_index()` turns the original index into a column." 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 16, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/html": [ 73 | "
\n", 74 | "\n", 87 | "\n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | "
index0
0H0
1a1
2n2
\n", 113 | "
" 114 | ], 115 | "text/plain": [ 116 | " index 0\n", 117 | "0 H 0\n", 118 | "1 a 1\n", 119 | "2 n 2" 120 | ] 121 | }, 122 | "execution_count": 16, 123 | "metadata": {}, 124 | "output_type": "execute_result" 125 | } 126 | ], 127 | "source": [ 128 | "mylist = list('Han')\n", 129 | "myarr = np.arange(26)\n", 130 | "mydict = dict(zip(mylist, myarr))\n", 131 | "ser = pd.Series(mydict)\n", 132 | "\n", 133 | "df1 = ser.to_frame()\n", 134 | "df1 = df1.reset_index()\n", 135 | "df1" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "## Combine two series to form a dataframe\n", 143 | "#### `pd.Series(list)` will break down list by elements." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 22, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "data": { 153 | "text/html": [ 154 | "
\n", 155 | "\n", 168 | "\n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | "
01
0H0
1a1
2n2
\n", 194 | "
" 195 | ], 196 | "text/plain": [ 197 | " 0 1\n", 198 | "0 H 0\n", 199 | "1 a 1\n", 200 | "2 n 2" 201 | ] 202 | }, 203 | "execution_count": 22, 204 | "metadata": {}, 205 | "output_type": "execute_result" 206 | } 207 | ], 208 | "source": [ 209 | "import numpy as np\n", 210 | "ser1 = pd.Series(list('Han'))\n", 211 | "ser2 = pd.Series(np.arange(3))\n", 212 | "\n", 213 | "# solution 1\n", 214 | "dfQ4_1 = pd.DataFrame({'col1':ser1,'col2':ser2})\n", 215 | "\n", 216 | "# solution 2\n", 217 | "dfQ4_2 = pd.concat([ser1,ser2],axis=1)\n", 218 | "dfQ4_2" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "## Assign name to the series's index" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 24, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "data": { 235 | "text/plain": [ 236 | "'ser_name'" 237 | ] 238 | }, 239 | "execution_count": 24, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "ser = pd.Series(list('Han'))\n", 246 | "\n", 247 | "ser.name=\"ser_name\"\n", 248 | "ser.name" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "## Get the items of series A not present in series B\n", 256 | "##### `~` inverts a boolean result, turning `True` into `False` and `False` into `True`.\n", 257 | "##### `ser1.isin(ser2)` returns a boolean Series holding `True` or `False` for each position in `ser1`." 
258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 28, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "data": { 267 | "text/plain": [ 268 | "0 False\n", 269 | "1 False\n", 270 | "2 False\n", 271 | "3 True\n", 272 | "4 True\n", 273 | "5 False\n", 274 | "6 False\n", 275 | "dtype: bool" 276 | ] 277 | }, 278 | "execution_count": 28, 279 | "metadata": {}, 280 | "output_type": "execute_result" 281 | } 282 | ], 283 | "source": [ 284 | "ser1 = pd.Series([1, 2, 3, 4, 5])\n", 285 | "ser2 = pd.Series([4, 5, 6, 7, 8])\n", 286 | "\n", 287 | "ser1[~ser1.isin(ser2)]\n", 288 | "ser1.isin(ser2)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "## Get the items not common to both series A and series B\n", 296 | "##### `np.union1d(ser1,ser2)` find union of two groups\n", 297 | "#### `np.intersect1d(ser1,ser2)` find intersection of two groups" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 29, 303 | "metadata": { 304 | "collapsed": true 305 | }, 306 | "outputs": [ 307 | { 308 | "data": { 309 | "text/plain": [ 310 | "0 1\n", 311 | "1 2\n", 312 | "2 3\n", 313 | "5 6\n", 314 | "6 7\n", 315 | "7 8\n", 316 | "dtype: int64" 317 | ] 318 | }, 319 | "execution_count": 29, 320 | "metadata": {}, 321 | "output_type": "execute_result" 322 | } 323 | ], 324 | "source": [ 325 | "ser1 = pd.Series([1, 2, 3, 4, 5])\n", 326 | "ser2 = pd.Series([4, 5, 6, 7, 8])\n", 327 | "\n", 328 | "ser_u = pd.Series(np.union1d(ser1,ser2))\n", 329 | "ser_i = pd.Series(np.intersect1d(ser1,ser2))\n", 330 | "ser_u[~ser_u.isin(ser_i)]" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 30, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "data": { 340 | "text/plain": [ 341 | "0 1\n", 342 | "1 2\n", 343 | "Name: col1, dtype: int64" 344 | ] 345 | }, 346 | "execution_count": 30, 347 | "metadata": {}, 348 | "output_type": "execute_result" 349 | } 350 | ], 351 | "source": [ 352 | 
"dfQ7 = pd.DataFrame({'col1':[1,2,3,4,5],\"col2\":[3,4,5,6,7]})\n", 353 | "dfQ7[\"col1\"][~dfQ7[\"col1\"].isin(dfQ7[\"col2\"])]" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [ 360 | "## The minimum, 25th percentile, median, 75th, and max of a numeric series" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "#### `np.random.normal(loc, scale, size)` generate Gaussian distribution numbers\n", 368 | "#### `np.percentile(Series, percentage)` can find the percentile for a series of numbers." 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": 51, 374 | "metadata": {}, 375 | "outputs": [ 376 | { 377 | "data": { 378 | "text/plain": [ 379 | "array([ 2.79446817, 7.40264008, 10.18758552, 14.01731645, 18.88070387])" 380 | ] 381 | }, 382 | "execution_count": 51, 383 | "metadata": {}, 384 | "output_type": "execute_result" 385 | } 386 | ], 387 | "source": [ 388 | "serQ8 = pd.Series(np.random.normal(10, 5, 25))\n", 389 | "np.percentile(serQ8,[0,25,50,75,100])" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "## Dataframe solution" 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": 60, 402 | "metadata": {}, 403 | "outputs": [ 404 | { 405 | "data": { 406 | "text/plain": [ 407 | "numpy.float64" 408 | ] 409 | }, 410 | "execution_count": 60, 411 | "metadata": {}, 412 | "output_type": "execute_result" 413 | } 414 | ], 415 | "source": [ 416 | "dfQ8 = pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9],'col2':[10,20,30,40,50,60,70,80,90]})\n", 417 | "np.percentile(dfQ8[\"col1\"],100)" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "## Get frequency counts of unique items of a series\n", 425 | "##### `np.take()` take out the elements one by one.\n", 426 | "##### if a is a array, `a.take(m,1)` means take the mth number of each row; 
`a.take(m,0)` takes mth row." 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 52, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "serQ9 = pd.Series(np.take(list('abcdefgh'),np.random.randint(8,size=30)))" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 61, 441 | "metadata": {}, 442 | "outputs": [ 443 | { 444 | "data": { 445 | "text/plain": [ 446 | "array(['n', 'n', 'a', 'a', 'n', 'a', 'a', 'a'], dtype='\n", 606 | "\n", 619 | "\n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | "
col1col2
01a
12a
23a
04b
15b
26b
\n", 660 | "" 661 | ], 662 | "text/plain": [ 663 | " col1 col2\n", 664 | "0 1 a\n", 665 | "1 2 a\n", 666 | "2 3 a\n", 667 | "0 4 b\n", 668 | "1 5 b\n", 669 | "2 6 b" 670 | ] 671 | }, 672 | "execution_count": 5, 673 | "metadata": {}, 674 | "output_type": "execute_result" 675 | } 676 | ], 677 | "source": [ 678 | "ser1Q15 = pd.Series(range(5))\n", 679 | "ser2Q15 = pd.Series(list('abcde'))\n", 680 | "\n", 681 | "ser1Q15.append(ser2Q15)\n", 682 | "df1Q15 = pd.concat([ser1Q15,ser2Q15],axis=1)\n", 683 | "\n", 684 | "df2Q15 = pd.DataFrame({'col1':[1,2,3],'col2':['a','a','a']})\n", 685 | "df3Q15 = pd.DataFrame({'col1':[4,5,6],'col2':['b','b','b']})\n", 686 | "pd.concat([df2Q15,df3Q15],axis = 0)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": {}, 692 | "source": [ 693 | "## Q16 get the positions of items of series A in another series B\n", 694 | "##### `pd.Index(series)` builds an Index from the values of the series\n", 695 | "##### `get_loc(i)` returns the position of `i` in that Index" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": 39, 701 | "metadata": {}, 702 | "outputs": [ 703 | { 704 | "ename": "SyntaxError", 705 | "evalue": "invalid syntax (, line 5)", 706 | "output_type": "error", 707 | "traceback": [ 708 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m5\u001b[0m\n\u001b[0;31m pd.Index(ser1Q16).get_loc(i) for i in ser2Q16\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" 709 | ] 710 | } 711 | ], 712 | "source": [ 713 | "# Q16 get the positions of items of series A in another series B\n", 714 | "ser1Q16 = pd.Series([1,4,5,6,7,10])\n", 715 | "ser2Q16 = pd.Series([4,5,10])\n", 716 | "\n", 717 | "\n", 718 | "[pd.Index(ser1Q16).get_loc(i) for i in ser2Q16]\n" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "## Q17 compute the mean squared error on a truth and predicted series\n", 726 | "##### `**` is the exponentiation operator" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | 
"execution_count": 3, 732 | "metadata": {}, 733 | "outputs": [ 734 | { 735 | "data": { 736 | "text/plain": [ 737 | "0.2518839544044974" 738 | ] 739 | }, 740 | "execution_count": 3, 741 | "metadata": {}, 742 | "output_type": "execute_result" 743 | } 744 | ], 745 | "source": [ 746 | "truth = pd.Series(range(10))\n", 747 | "pred = pd.Series(range(10)) + np.random.random(10)\n", 748 | "\n", 749 | "np.mean((truth - pred)**2)\n" 750 | ] 751 | }, 752 | { 753 | "cell_type": "markdown", 754 | "metadata": {}, 755 | "source": [ 756 | "## Q18 convert the first character of each element in a series to uppercase\n", 757 | "##### Difference between map and apply\n", 758 | "##### `map` (named after Python's built-in `map`) is used on a Series and works element by element\n", 759 | "##### `applymap` is applied to every element of a DataFrame\n", 760 | "##### `apply` works along an axis of a DataFrame (each column by default)" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": 8, 766 | "metadata": {}, 767 | "outputs": [ 768 | { 769 | "data": { 770 | "text/plain": [ 771 | "0 How\n", 772 | "1 To\n", 773 | "2 Kick\n", 774 | "3 Ass?\n", 775 | "dtype: object" 776 | ] 777 | }, 778 | "execution_count": 8, 779 | "metadata": {}, 780 | "output_type": "execute_result" 781 | } 782 | ], 783 | "source": [ 784 | "serQ18 = pd.Series(['how', 'to', 'kick', 'ass?'])\n", 785 | "\n", 786 | "# solution 1\n", 787 | "serQ18.map(lambda x:x.title())\n", 788 | "\n", 789 | "# solution 2\n", 790 | "serQ18.map(lambda x:x[0].upper() + x[1:])\n", 791 | "\n", 792 | "# solution 3\n", 793 | "pd.Series([i.title() for i in serQ18])" 794 | ] 795 | }, 796 | { 797 | "cell_type": "markdown", 798 | "metadata": {}, 799 | "source": [ 800 | "## Q19 calculate the number of characters in each word in a series\n" 801 | ] 802 | }, 803 | { 804 | "cell_type": "code", 805 | "execution_count": 10, 806 | "metadata": {}, 807 | "outputs": [ 808 | { 809 | "data": { 810 | "text/plain": [ 811 | "0 3\n", 812 | "1 2\n", 813 | "2 2\n", 814 | "dtype: int64" 815 | ] 816 | }, 817 | "execution_count": 10, 818 | "metadata": {}, 819 | "output_type": "execute_result" 
820 | } 821 | ], 822 | "source": [ 823 | "serQ19 = pd.Series(['how','to','my'])\n", 824 | "\n", 825 | "serQ19.map(lambda x:len(x))" 826 | ] 827 | }, 828 | { 829 | "cell_type": "markdown", 830 | "metadata": {}, 831 | "source": [ 832 | "## Q20 compute difference of differences between consecutive numbers of a series\n", 833 | "###### `diff()` computes a[n] - a[n-1]\n" 834 | ] 835 | }, 836 | { 837 | "cell_type": "code", 838 | "execution_count": 13, 839 | "metadata": {}, 840 | "outputs": [ 841 | { 842 | "name": "stdout", 843 | "output_type": "stream", 844 | "text": [ 845 | "[nan, nan, -1.0, 2.0, -2.0, 1.0, 1.0]\n" 846 | ] 847 | } 848 | ], 849 | "source": [ 850 | "serQ20 = pd.Series([3,5,6,9,10,12,15])\n", 851 | "\n", 852 | "print(serQ20.diff().diff().tolist())" 853 | ] 854 | }, 855 | { 856 | "cell_type": "markdown", 857 | "metadata": {}, 858 | "source": [ 859 | "## Q21 convert a series of date-strings to a timeseries\n" 860 | ] 861 | }, 862 | { 863 | "cell_type": "code", 864 | "execution_count": 18, 865 | "metadata": {}, 866 | "outputs": [ 867 | { 868 | "data": { 869 | "text/plain": [ 870 | "0 2010-01-01 00:00:00\n", 871 | "1 2011-02-02 00:00:00\n", 872 | "2 2012-03-03 00:00:00\n", 873 | "3 2013-04-04 00:00:00\n", 874 | "4 2014-05-05 00:00:00\n", 875 | "5 2015-06-06 12:20:00\n", 876 | "dtype: datetime64[ns]" 877 | ] 878 | }, 879 | "execution_count": 18, 880 | "metadata": {}, 881 | "output_type": "execute_result" 882 | } 883 | ], 884 | "source": [ 885 | "serQ21 = pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])\n", 886 | "\n", 887 | "pd.to_datetime(serQ21)" 888 | ] 889 | }, 890 | { 891 | "cell_type": "markdown", 892 | "metadata": {}, 893 | "source": [ 894 | "## Q22 get the day of month, week number, day of year and day of week from a series of date strings\n" 895 | ] 896 | }, 897 | { 898 | "cell_type": "code", 899 | "execution_count": 21, 900 | "metadata": {}, 901 | "outputs": [], 902 | "source": [ 903 | "# not finished yet\n", 904 | "serQ22 
= pd.Series(['01 Jan 2010', '02-02-2011', '20120303', '2013/04/04', '2014-05-05', '2015-06-06T12:20'])\n", 905 | "\n", 906 | "from dateutil.parser import parse\n", 907 | "# parse each date string into a datetime\n", 908 | "ser_ts = serQ22.map(lambda x: parse(x))\n" 909 | ] 910 | }, 911 | { 912 | "cell_type": "markdown", 913 | "metadata": {}, 914 | "source": [ 915 | "## Q23 convert year-month string to dates corresponding to the 4th day of the month\n" 916 | ] 917 | }, 918 | { 919 | "cell_type": "code", 920 | "execution_count": 24, 921 | "metadata": {}, 922 | "outputs": [ 923 | { 924 | "data": { 925 | "text/plain": [ 926 | "0 2010-01-02\n", 927 | "1 2011-02-02\n", 928 | "2 2012-03-02\n", 929 | "dtype: datetime64[ns]" 930 | ] 931 | }, 932 | "execution_count": 24, 933 | "metadata": {}, 934 | "output_type": "execute_result" 935 | } 936 | ], 937 | "source": [ 938 | "serQ23 = pd.Series(['Jan 2010', 'Feb 2011', 'Mar 2012'])\n", 939 | "\n", 940 | "serQ23_ts = serQ23.map(lambda x: parse(x))" 941 | ] 942 | }, 943 | { 944 | "cell_type": "markdown", 945 | "metadata": {}, 946 | "source": [ 947 | "## Q24 filter words that contain at least 2 vowels from a series\n", 948 | "##### `Counter` returns a dictionary-like object that counts how many times each letter occurs\n", 949 | "##### `get(key, default)` looks up the value stored under `key` in a dictionary; `default` is returned when the key is missing" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 33, 955 | "metadata": {}, 956 | "outputs": [ 957 | { 958 | "data": { 959 | "text/plain": [ 960 | "0 Apple\n", 961 | "1 Orange\n", 962 | "4 Money\n", 963 | "dtype: object" 964 | ] 965 | }, 966 | "execution_count": 33, 967 | "metadata": {}, 968 | "output_type": "execute_result" 969 | } 970 | ], 971 | "source": [ 972 | "serQ24 = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])\n", 973 | "\n", 974 | "from collections import Counter\n", 975 | "serQ24[serQ24.map(lambda x: sum(Counter(x.lower()).get(i,0) for i in list('aeiou')) >=2)]" 976 | ] 977 | }, 978 | { 979 | "cell_type": "markdown", 980 | "metadata": {}, 981 | "source": [ 982 | "## Q25 filter valid 
emails from a series\n" 983 | ] 984 | }, 985 | { 986 | "cell_type": "code", 987 | "execution_count": 26, 988 | "metadata": {}, 989 | "outputs": [ 990 | { 991 | "ename": "NameError", 992 | "evalue": "name 'x' is not defined", 993 | "output_type": "error", 994 | "traceback": [ 995 | "---------------------------------------------------------------------------", 996 | "NameError Traceback (most recent call last)", 997 | " in \n----> 1 [Counter(x.lower()).get(i, 0) for i in list('aeiou')]\n", 998 | " in (.0)\n----> 1 [Counter(x.lower()).get(i, 0) for i in list('aeiou')]\n", 999 | "NameError: name 'x' is not defined" 1000 | ] 1001 | } 1002 | ], 1003 | "source": [ 1004 | "emails = 
pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])\n" 1005 | ] 1006 | }, 1007 | { 1008 | "cell_type": "markdown", 1009 | "metadata": {}, 1010 | "source": [ 1011 | "## Q26 get the mean of a series grouped by another series\n", 1012 | "##### np.random.choice(list, int) randomly draws int elements from the list\n", 1013 | "##### np.linspace(start, stop, num) creates an evenly spaced sequence (arithmetic progression)\n", 1014 | "##### tolist() converts a series to a list\n" 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "code", 1019 | "execution_count": 41, 1020 | "metadata": {}, 1021 | "outputs": [ 1022 | { 1023 | "name": "stdout", 1024 | "output_type": "stream", 1025 | "text": [ 1026 | "[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]\n", 1027 | "['apple', 'apple', 'carrot', 'banana', 'carrot', 'banana', 'carrot', 'apple', 'carrot', 'carrot']\n" 1028 | ] 1029 | }, 1030 | { 1031 | "data": { 1032 | "text/plain": [ 1033 | "Index(['apple', 'banana', 'carrot'], dtype='object')" 1034 | ] 1035 | }, 1036 | "execution_count": 41, 1037 | "metadata": {}, 1038 | "output_type": "execute_result" 1039 | } 1040 | ], 1041 | "source": [ 1042 | "fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))\n", 1043 | "\n", 1044 | "weights = pd.Series(np.linspace(1, 10, 10))\n", 1045 | "\n", 1046 | "print(weights.tolist())\n", 1047 | "print(fruit.tolist())\n", 1048 | "\n", 1049 | "weights.groupby(fruit).mean()" 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "markdown", 1054 | "metadata": {}, 1055 | "source": [ 1056 | "## Q27 compute the Euclidean distance between two series\n", 1057 | "##### np.linalg.norm() computes the norm\n" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "code", 1062 | "execution_count": 49, 1063 | "metadata": {}, 1064 | "outputs": [ 1065 | { 1066 | "data": { 1067 | "text/plain": [ 1068 | "18.16590212458495" 1069 | ] 1070 | }, 1071 | "execution_count": 49, 1072 | "metadata": {}, 1073 | "output_type": "execute_result" 1074 | } 1075 | ], 1076 | "source": [ 1077 | "p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 
10])\n", 1078 | "q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])\n", 1079 | "dist = sum((p-q)**2)**.5\n", 1080 | "dist = np.linalg.norm(p-q)\n", 1081 | "\n", 1082 | "dist" 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "markdown", 1087 | "metadata": {}, 1088 | "source": [ 1089 | "## Q28 find all the local maxima (or peaks) in a numeric series\n", 1090 | "##### sign() returns 1, 0 and -1 for positive numbers, zero and negative numbers respectively\n", 1091 | "##### np.where(condition, x, y): returns x where the condition holds and y otherwise\n" 1092 | ] 1093 | }, 1094 | { 1095 | "cell_type": "code", 1096 | "execution_count": 52, 1097 | "metadata": {}, 1098 | "outputs": [ 1099 | { 1100 | "data": { 1101 | "text/plain": [ 1102 | "array([1, 5, 7])" 1103 | ] 1104 | }, 1105 | "execution_count": 52, 1106 | "metadata": {}, 1107 | "output_type": "execute_result" 1108 | } 1109 | ], 1110 | "source": [ 1111 | "serQ28 = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])\n", 1112 | "dd = np.diff(np.sign(np.diff(serQ28)))\n", 1113 | "\n", 1114 | "peak_locs = np.where(dd == -2)[0] + 1\n", 1115 | "\n", 1116 | "peak_locs" 1117 | ] 1118 | }, 1119 | { 1120 | "cell_type": "markdown", 1121 | "metadata": {}, 1122 | "source": [ 1123 | "## Q29 replace missing spaces in a string with the least frequent character" 1124 | ] 1125 | }, 1126 | { 1127 | "cell_type": "code", 1128 | "execution_count": 69, 1129 | "metadata": {}, 1130 | "outputs": [ 1131 | { 1132 | "data": { 1133 | "text/plain": [ 1134 | "'g'" 1135 | ] 1136 | }, 1137 | "execution_count": 69, 1138 | "metadata": {}, 1139 | "output_type": "execute_result" 1140 | } 1141 | ], 1142 | "source": [ 1143 | "my_str = 'dbc deb abed gade'\n", 1144 | "serQ29 = pd.Series(list('dbc deb abed gade'))\n", 1145 | "freqQ29 = serQ29.value_counts()\n", 1146 | "least_freqQ29 = freqQ29.dropna().index[-1]\n", 1147 | "least_freqQ29" 1148 | ] 1149 | }, 1150 | { 1151 | "cell_type": "markdown", 1152 | "metadata": {}, 1153 | "source": [ 1154 | "## Q30 create a TimeSeries starting ‘2000-01-01’ and 10 weekends (Saturdays) after that having random numbers as values\n" 1155 | 
] 1156 | }, 1157 | { 1158 | "cell_type": "code", 1159 | "execution_count": null, 1160 | "metadata": {}, 1161 | "outputs": [], 1162 | "source": [ 1163 | "# I don't fully understand the question\n", 1164 | "serQ30 = pd.Series(np.random.randint(1,10,10), pd.date_range('2000-01-01', periods=10, freq='W-SAT'))" 1165 | ] 1166 | }, 1167 | { 1168 | "cell_type": "markdown", 1169 | "metadata": {}, 1170 | "source": [ 1171 | "## Q31 fill an intermittent time series so all missing dates show up with values of previous non-missing date" 1172 | ] 1173 | }, 1174 | { 1175 | "cell_type": "code", 1176 | "execution_count": null, 1177 | "metadata": {}, 1178 | "outputs": [], 1179 | "source": [ 1180 | "# 31 fill an intermittent time series so all missing dates show up with values of previous non-missing date\n", 1181 | "# resample is used for re-sampling (still unsure how to apply it here)\n" 1182 | ] 1183 | }, 1184 | { 1185 | "cell_type": "markdown", 1186 | "metadata": {}, 1187 | "source": [ 1188 | "## Q32 compute the autocorrelations of a numeric series\n", 1189 | "##### np.random.normal(loc, scale, size): loc is the center (mean) of the distribution; scale is the standard deviation\n" 1190 | ] 1191 | }, 1192 | { 1193 | "cell_type": "code", 1194 | "execution_count": 5, 1195 | "metadata": {}, 1196 | "outputs": [], 1197 | "source": [ 1198 | "serQ32 = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))\n" 1199 | ] 1200 | }, 1201 | { 1202 | "cell_type": "markdown", 1203 | "metadata": {}, 1204 | "source": [ 1205 | "## Q33 How to import only every nth row from a csv file to create a dataframe?\n", 1206 | "##### Mainly uses chunksize" 1207 | ] 1208 | }, 1209 | { 1210 | "cell_type": "code", 1211 | "execution_count": 7, 1212 | "metadata": {}, 1213 | "outputs": [], 1214 | "source": [ 1215 | "df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', chunksize=50)\n", 1216 | "# solution 1: keep the first row of every 50-row chunk (DataFrame.append was removed in pandas 2.0; use pd.concat there)\n", 1217 | "df2 = pd.DataFrame()\n", 1218 | "for chunk in df:\n", 1219 | " df2 = df2.append(chunk.iloc[0,:])" 1220 | ] 1221 | }, 1222 | { 1223 | "cell_type": "markdown", 1224 | 
"metadata": {}, 1225 | "source": [ 1226 | "## Q34 change column values when importing csv to a dataframe" 1227 | ] 1228 | }, 1229 | { 1230 | "cell_type": "code", 1231 | "execution_count": null, 1232 | "metadata": {}, 1233 | "outputs": [], 1234 | "source": [] 1235 | } 1236 | ], 1237 | "metadata": { 1238 | "kernelspec": { 1239 | "display_name": "Python 3", 1240 | "language": "python", 1241 | "name": "python3" 1242 | }, 1243 | "language_info": { 1244 | "codemirror_mode": { 1245 | "name": "ipython", 1246 | "version": 3 1247 | }, 1248 | "file_extension": ".py", 1249 | "mimetype": "text/x-python", 1250 | "name": "python", 1251 | "nbconvert_exporter": "python", 1252 | "pygments_lexer": "ipython3", 1253 | "version": "3.7.3" 1254 | } 1255 | }, 1256 | "nbformat": 4, 1257 | "nbformat_minor": 2 1258 | } 1259 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Pandas & Numpy Practice/Exhaustive Introduction to Pandas (1).ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 68, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 69, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "data_origin = pd.read_csv(\"Data_pandas.csv\")\n", 19 | "data = data_origin.copy()" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 70, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "data": { 29 | "text/html": [ 30 | "
" 73 | ], 74 | "text/plain": [ 75 | " ID Name LogInTime LogOutTime PurchaseNum Location \\\n", 76 | "0 101 Jack 2018/2/10 13:10 2018/2/10 14:30 1 Home \n", 77 | "\n", 78 | " PurchaseAmount Saving \n", 79 | "0 150.0 1500 " 80 | ] 81 | }, 82 | "execution_count": 70, 83 | "metadata": {}, 84 | "output_type": "execute_result" 85 | } 86 | ], 87 | "source": [ 88 | "data.head(1)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 71, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "data": { 98 | "text/html": [ 99 | "
" 142 | ], 143 | "text/plain": [ 144 | " ID Name LogInTime LogOutTime PurchaseNum Location \\\n", 145 | "18 103 Tom 2018/11/11 15:25 2018/11/11 19:45 1 Home \n", 146 | "\n", 147 | " PurchaseAmount Saving \n", 148 | "18 50.0 3045 " 149 | ] 150 | }, 151 | "execution_count": 71, 152 | "metadata": {}, 153 | "output_type": "execute_result" 154 | } 155 | ], 156 | "source": [ 157 | "data.tail(1)" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 72, 163 | "metadata": {}, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "\n", 170 | "RangeIndex: 19 entries, 0 to 18\n", 171 | "Data columns (total 8 columns):\n", 172 | "ID 19 non-null int64\n", 173 | "Name 19 non-null object\n", 174 | "LogInTime 19 non-null object\n", 175 | "LogOutTime 19 non-null object\n", 176 | "PurchaseNum 19 non-null int64\n", 177 | "Location 19 non-null object\n", 178 | "PurchaseAmount 15 non-null float64\n", 179 | "Saving 19 non-null int64\n", 180 | "dtypes: float64(1), int64(3), object(4)\n", 181 | "memory usage: 1.3+ KB\n" 182 | ] 183 | } 184 | ], 185 | "source": [ 186 | "data.info()" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 73, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/html": [ 197 | "
" 281 | ], 282 | "text/plain": [ 283 | " ID PurchaseNum PurchaseAmount Saving\n", 284 | "count 19.000000 19.000000 15.000000 19.000000\n", 285 | "mean 101.947368 1.473684 78.200000 1878.789474\n", 286 | "std 0.911268 1.218762 58.419909 1113.136988\n", 287 | "min 101.000000 0.000000 20.000000 670.000000\n", 288 | "25% 101.000000 1.000000 40.000000 1016.000000\n", 289 | "50% 102.000000 1.000000 50.000000 1350.000000\n", 290 | "75% 103.000000 2.000000 97.500000 3095.000000\n", 291 | "max 103.000000 4.000000 200.000000 3500.000000" 292 | ] 293 | }, 294 | "execution_count": 73, 295 | "metadata": {}, 296 | "output_type": "execute_result" 297 | } 298 | ], 299 | "source": [ 300 | "data.describe()" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 74, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "data": { 310 | "text/plain": [ 311 | "RangeIndex(start=0, stop=19, step=1)" 312 | ] 313 | }, 314 | "execution_count": 74, 315 | "metadata": {}, 316 | "output_type": "execute_result" 317 | } 318 | ], 319 | "source": [ 320 | "data.index" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 75, 326 | "metadata": {}, 327 | "outputs": [ 328 | { 329 | "data": { 330 | "text/plain": [ 331 | "Index(['ID', 'Name', 'LogInTime', 'LogOutTime', 'PurchaseNum', 'Location',\n", 332 | " 'PurchaseAmount', 'Saving'],\n", 333 | " dtype='object')" 334 | ] 335 | }, 336 | "execution_count": 75, 337 | "metadata": {}, 338 | "output_type": "execute_result" 339 | } 340 | ], 341 | "source": [ 342 | "data.columns" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "# Difference between `iloc[]` and `loc[]`" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "## `loc[]`\n", 357 | "### Only accepts labels (a single label, a list of labels, or a label slice) and boolean arrays" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 
364 | "## `iloc[]`\n", 365 | "### Only accepts integer positions (a single integer, a list of integers, or an integer slice) and boolean arrays" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "# Select a single value" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 76, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [ 381 | "# In `iloc[]`, a slice includes the start position but excludes the end position.\n", 382 | "# Positions start from 0." 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "## Using position to select a single value" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": 77, 395 | "metadata": {}, 396 | "outputs": [ 397 | { 398 | "data": { 399 | "text/html": [ 400 | "
" 465 | ], 466 | "text/plain": [ 467 | " ID Name LogInTime LogOutTime PurchaseNum Location \\\n", 468 | "0 101 Jack 2018/2/10 13:10 2018/2/10 14:30 1 Home \n", 469 | "1 101 Jack 2018/2/15 18:00 2018/2/15 19:00 0 Home \n", 470 | "2 101 Jack 2018/2/15 20:00 2018/2/15 22:30 0 Home \n", 471 | "\n", 472 | " PurchaseAmount Saving \n", 473 | "0 150.0 1500 \n", 474 | "1 NaN 1350 \n", 475 | "2 NaN 1350 " 476 | ] 477 | }, 478 | "execution_count": 77, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "data.head(3)" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": {}, 490 | "source": [ 491 | "#### Select the data point using range index\n" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 78, 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "data": { 501 | "text/html": [ 502 | "
" 531 | ], 532 | "text/plain": [ 533 | " LogInTime\n", 534 | "1 2018/2/15 18:00" 535 | ] 536 | }, 537 | "execution_count": 78, 538 | "metadata": {}, 539 | "output_type": "execute_result" 540 | } 541 | ], 542 | "source": [ 543 | "data.iloc[1:2,2:3]" 544 | ] 545 | }, 546 | { 547 | "cell_type": "markdown", 548 | "metadata": {}, 549 | "source": [ 550 | "#### Select the data point using single integer\n" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": 79, 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "data": { 560 | "text/plain": [ 561 | "'2018/2/15 18:00'" 562 | ] 563 | }, 564 | "execution_count": 79, 565 | "metadata": {}, 566 | "output_type": "execute_result" 567 | } 568 | ], 569 | "source": [ 570 | "data.iloc[1,2]" 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "metadata": {}, 576 | "source": [ 577 | "#### Select the data point using list of integers\n" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 80, 583 | "metadata": {}, 584 | "outputs": [ 585 | { 586 | "data": { 587 | "text/html": [ 588 | "
" 617 | ], 618 | "text/plain": [ 619 | " LogInTime\n", 620 | "1 2018/2/15 18:00" 621 | ] 622 | }, 623 | "execution_count": 80, 624 | "metadata": {}, 625 | "output_type": "execute_result" 626 | } 627 | ], 628 | "source": [ 629 | "data.iloc[[1],[2]]" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "## Using column names & row indices to select a single value" 637 | ] 638 | }, 639 | { 640 | "cell_type": "code", 641 | "execution_count": 81, 642 | "metadata": {}, 643 | "outputs": [], 644 | "source": [ 645 | "# Row index starts from 0\n", 646 | "# This selects a value by column name and row index label, not by row number. The two coincide here only because the default row index starts from 0 and increases by 1 for each row.\n" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": {}, 652 | "source": [ 653 | "#### Select the data point using a single label" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": 85, 659 | "metadata": { 660 | "scrolled": true 661 | }, 662 | "outputs": [ 663 | { 664 | "data": { 665 | "text/plain": [ 666 | "'2018/2/15 18:00'" 667 | ] 668 | }, 669 | "execution_count": 85, 670 | "metadata": {}, 671 | "output_type": "execute_result" 672 | } 673 | ], 674 | "source": [ 675 | "data.loc[\"index_1\",\"LogInTime\"]" 676 | ] 677 | }, 678 | { 679 | "cell_type": "code", 680 | "execution_count": 86, 681 | "metadata": {}, 682 | "outputs": [], 683 | "source": [ 684 | "data = data_origin" 685 | ] 686 | }, 687 | { 688 | "cell_type": "markdown", 689 | "metadata": {}, 690 | "source": [ 691 | "#### Select data points by labels when the row index is default\n" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": 87, 697 | "metadata": {}, 698 | "outputs": [ 699 | { 700 | "data": { 701 | "text/plain": [ 702 | "'2018/2/15 18:00'" 703 | ] 704 | }, 705 | "execution_count": 87, 706 | "metadata": {}, 707 | "output_type": "execute_result" 708 | } 709 | ], 710 | "source": [ 711 | 
"data.loc[1,\"LogInTime\"]" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": 88, 717 | "metadata": {}, 718 | "outputs": [ 719 | { 720 | "data": { 721 | "text/plain": [ 722 | "RangeIndex(start=0, stop=19, step=1)" 723 | ] 724 | }, 725 | "execution_count": 88, 726 | "metadata": {}, 727 | "output_type": "execute_result" 728 | } 729 | ], 730 | "source": [ 731 | "data.index" 732 | ] 733 | }, 734 | { 735 | "cell_type": "code", 736 | "execution_count": 89, 737 | "metadata": {}, 738 | "outputs": [], 739 | "source": [ 740 | "# Set new index\n", 741 | "new_index = []\n", 742 | "for i in range(len(data)):\n", 743 | " new_index.append(\"index_\" + str(i))\n", 744 | "data.index = new_index" 745 | ] 746 | }, 747 | { 748 | "cell_type": "markdown", 749 | "metadata": {}, 750 | "source": [ 751 | "#### Select data points when row index is not default value\n" 752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": 91, 757 | "metadata": {}, 758 | "outputs": [ 759 | { 760 | "data": { 761 | "text/plain": [ 762 | "'2018/2/15 18:00'" 763 | ] 764 | }, 765 | "execution_count": 91, 766 | "metadata": {}, 767 | "output_type": "execute_result" 768 | } 769 | ], 770 | "source": [ 771 | "data.loc[\"index_1\",\"LogInTime\"]" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "#### The \"number\" label doesn't work now\n" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": null, 784 | "metadata": {}, 785 | "outputs": [], 786 | "source": [ 787 | "# Cannot run after assigning new value to row indices\n", 788 | "# data.loc[1,\"LogInTime\"]" 789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": 45, 794 | "metadata": {}, 795 | "outputs": [ 796 | { 797 | "data": { 798 | "text/plain": [ 799 | "'Jack'" 800 | ] 801 | }, 802 | "execution_count": 45, 803 | "metadata": {}, 804 | "output_type": "execute_result" 805 | } 806 | ], 807 | "source": [ 808 | "# Avoid this 
method\n", 809 | "# https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-view-versus-copy\n", 810 | "# When assigning a value this way, pandas may raise a SettingWithCopyWarning\n", 811 | "\n", 812 | "data[\"Name\"][2]" 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": {}, 818 | "source": [ 819 | "#### Select the data point using a range index (slice) of labels" 820 | ] 821 | }, 822 | { 823 | "cell_type": "code", 824 | "execution_count": 93, 825 | "metadata": {}, 826 | "outputs": [ 827 | { 828 | "data": { 829 | "text/html": [ 830 | "
" 859 | ], 860 | "text/plain": [ 861 | " LogInTime\n", 862 | "index_1 2018/2/15 18:00" 863 | ] 864 | }, 865 | "execution_count": 93, 866 | "metadata": {}, 867 | "output_type": "execute_result" 868 | } 869 | ], 870 | "source": [ 871 | "data.loc[\"index_1\":\"index_1\",\"LogInTime\":\"LogInTime\"]" 872 | ] 873 | }, 874 | { 875 | "cell_type": "markdown", 876 | "metadata": {}, 877 | "source": [ 878 | "#### Select the data point using a list of labels" 879 | ] 880 | }, 881 | { 882 | "cell_type": "code", 883 | "execution_count": 94, 884 | "metadata": {}, 885 | "outputs": [ 886 | { 887 | "data": { 888 | "text/html": [ 889 | "
" 918 | ], 919 | "text/plain": [ 920 | " LogInTime\n", 921 | "index_1 2018/2/15 18:00" 922 | ] 923 | }, 924 | "execution_count": 94, 925 | "metadata": {}, 926 | "output_type": "execute_result" 927 | } 928 | ], 929 | "source": [ 930 | "data.loc[[\"index_1\"],[\"LogInTime\"]]" 931 | ] 932 | }, 933 | { 934 | "cell_type": "markdown", 935 | "metadata": {}, 936 | "source": [ 937 | "## Using a mix of position & column names/row index to select single value" 938 | ] 939 | }, 940 | { 941 | "cell_type": "code", 942 | "execution_count": null, 943 | "metadata": {}, 944 | "outputs": [], 945 | "source": [ 946 | "# Try to avoid this method\n", 947 | "data[\"LogInTime\"][5]" 948 | ] 949 | }, 950 | { 951 | "cell_type": "code", 952 | "execution_count": 27, 953 | "metadata": {}, 954 | "outputs": [ 955 | { 956 | "data": { 957 | "text/html": [ 958 | "
" 1067 | ], 1068 | "text/plain": [ 1069 | " ID Name LogInTime LogOutTime PurchaseNum Location \\\n", 1070 | "index_0 101 Jack 2018/2/10 13:10 2018/2/10 14:30 1 Home \n", 1071 | "index_1 101 Jack 2018/2/15 18:00 2018/2/15 19:00 0 Home \n", 1072 | "index_2 101 Jack 2018/2/15 20:00 2018/2/15 22:30 0 Home \n", 1073 | "index_3 101 Jack 2018/3/11 10:50 2018/3/11 15:00 2 Outside \n", 1074 | "index_4 101 Jack 2018/3/11 15:15 2018/3/11 18:00 1 Home \n", 1075 | "index_5 101 Jack 2018/5/10 18:00 2018/5/10 21:00 4 Home \n", 1076 | "index_6 101 Jack 2018/7/12 17:40 2018/7/12 18:20 2 Work \n", 1077 | "\n", 1078 | " PurchaseAmount Saving \n", 1079 | "index_0 150.0 1500 \n", 1080 | "index_1 NaN 1350 \n", 1081 | "index_2 NaN 1350 \n", 1082 | "index_3 190.0 1350 \n", 1083 | "index_4 30.0 1160 \n", 1084 | "index_5 100.0 1130 \n", 1085 | "index_6 28.0 1030 " 1086 | ] 1087 | }, 1088 | "execution_count": 27, 1089 | "metadata": {}, 1090 | "output_type": "execute_result" 1091 | } 1092 | ], 1093 | "source": [ 1094 | "data.head(7)" 1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "markdown", 1099 | "metadata": {}, 1100 | "source": [ 1101 | "#### Select the data point in 2nd row and LogInTime" 1102 | ] 1103 | }, 1104 | { 1105 | "cell_type": "code", 1106 | "execution_count": 95, 1107 | "metadata": {}, 1108 | "outputs": [ 1109 | { 1110 | "data": { 1111 | "text/plain": [ 1112 | "index_1 2018/2/15 18:00\n", 1113 | "Name: LogInTime, dtype: object" 1114 | ] 1115 | }, 1116 | "execution_count": 95, 1117 | "metadata": {}, 1118 | "output_type": "execute_result" 1119 | } 1120 | ], 1121 | "source": [ 1122 | "data.iloc[1:2,:].loc[:,\"LogInTime\"]" 1123 | ] 1124 | }, 1125 | { 1126 | "cell_type": "code", 1127 | "execution_count": 97, 1128 | "metadata": {}, 1129 | "outputs": [ 1130 | { 1131 | "data": { 1132 | "text/html": [ 1133 | "
" 1162 | ], 1163 | "text/plain": [ 1164 | " LogInTime\n", 1165 | "index_1 2018/2/15 18:00" 1166 | ] 1167 | }, 1168 | "execution_count": 97, 1169 | "metadata": {}, 1170 | "output_type": "execute_result" 1171 | } 1172 | ], 1173 | "source": [ 1174 | "data.loc[\"index_1\":\"index_1\",:].iloc[:,2:3]" 1175 | ] 1176 | }, 1177 | { 1178 | "cell_type": "markdown", 1179 | "metadata": {}, 1180 | "source": [ 1181 | "### Try NOT to do this!" 1182 | ] 1183 | }, 1184 | { 1185 | "cell_type": "markdown", 1186 | "metadata": {}, 1187 | "source": [ 1188 | "#### Even though it works sometimes.\n" 1189 | ] 1190 | }, 1191 | { 1192 | "cell_type": "code", 1193 | "execution_count": 99, 1194 | "metadata": {}, 1195 | "outputs": [ 1196 | { 1197 | "name": "stderr", 1198 | "output_type": "stream", 1199 | "text": [ 1200 | "/Users/lush/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", 1201 | "A value is trying to be set on a copy of a slice from a DataFrame\n", 1202 | "\n", 1203 | "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", 1204 | " \"\"\"Entry point for launching an IPython kernel.\n" 1205 | ] 1206 | } 1207 | ], 1208 | "source": [ 1209 | "data[\"LogInTime\"][1] = 5" 1210 | ] 1211 | } 1212 | ], 1213 | "metadata": { 1214 | "kernelspec": { 1215 | "display_name": "Python 3", 1216 | "language": "python", 1217 | "name": "python3" 1218 | }, 1219 | "language_info": { 1220 | "codemirror_mode": { 1221 | "name": "ipython", 1222 | "version": 3 1223 | }, 1224 | "file_extension": ".py", 1225 | "mimetype": "text/x-python", 1226 | "name": "python", 1227 | "nbconvert_exporter": "python", 1228 | "pygments_lexer": "ipython3", 1229 | "version": "3.7.4" 1230 | } 1231 | }, 1232 | "nbformat": 4, 1233 | "nbformat_minor": 2 1234 | } 1235 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Pandas & Numpy 
Practice/README.md: -------------------------------------------------------------------------------- 1 | # Pandas & Numpy practice 2 | 3 | This is my practice with the `pandas` and `numpy` packages, both of which are essential for data manipulation and data analysis. 4 | 5 | ## Getting Started 6 | 7 | ### Prerequisites 8 | 9 | For this practice, the `pandas` and `numpy` packages are required. 10 | 11 | 12 | ## General Guide 13 | The following describes the practice notebooks in this folder: 14 | * "101 pandas practice" contains my solutions to [101 Pandas Exercises](https://www.machinelearningplus.com/python/101-pandas-exercises-python/) 15 | * "Exhaustive Introduction to Pandas in Python" contains the full code for my blog on [Medium](https://medium.com/@lushuhan95). 16 | 17 | ## Authors 18 | 19 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 20 | 21 | 22 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Python codes for implementing SQL queries/Python code for SQL queries.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 4, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np\n", 11 | "import sqlite3\n", 12 | "import matplotlib.pyplot as plt \n", 13 | "import matplotlib.image as mpimg \n", 14 | "from scipy import misc\n", 15 | "from PIL import Image" 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### Creating connection between Python and Sqlite3" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 5, 28 | "metadata": {}, 29 | "outputs": [], 30 | "source": [ 31 | "conn = sqlite3.connect(\"local.db\")\n", 32 | "cursor = conn.cursor()" 33 | ] 34 | }, 
35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "### Creating Database" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 6, 45 | "metadata": {}, 46 | "outputs": [ 47 | { 48 | "name": "stdout", 49 | "output_type": "stream", 50 | "text": [ 51 | "[(1, 122, '989319-457', '2014-04-08', 3813.33, 3813.33, 0, 3, '2014-05-08', '2014-05-07'), (2, 123, '263253241', '2014-04-10', 40.2, 40.2, 0, 3, '2014-05-10', '2014-05-14'), (3, 123, '963253234', '2014-04-13', 138.75, 138.75, 0, 3, '2014-05-13', '2014-05-09'), (4, 123, '2-000-2993', '2014-04-16', 144.7, 144.7, 0, 3, '2014-05-16', '2014-05-12'), (5, 123, '963253251', '2014-04-16', 15.5, 15.5, 0, 3, '2014-05-16', '2014-05-11')]\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "cursor.execute(\"SELECT * FROM invoices LIMIT 5\")\n", 57 | "query_results_temp = cursor.fetchall()\n", 58 | "print(query_results_temp)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 7, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "general_ledger_accounts = pd.read_sql(sql=\"SELECT * FROM general_ledger_accounts\",con=conn)\n", 68 | "terms = pd.read_sql(sql=\"SELECT * FROM terms\",con=conn)\n", 69 | "vendors = pd.read_sql(sql=\"SELECT * FROM vendors\",con=conn)\n", 70 | "invoices = pd.read_sql(sql=\"SELECT * FROM invoices\",con=conn)\n", 71 | "invoice_line_items = pd.read_sql(sql=\"SELECT * FROM invoice_line_items\",con=conn)\n", 72 | "vendor_contacts = pd.read_sql(sql=\"SELECT * FROM vendor_contacts\",con=conn)\n", 73 | "invoice_archive = pd.read_sql(sql=\"SELECT * FROM invoice_archive\",con=conn)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "# Basic SQL query" 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "### Order of executing SQL query\n", 88 | "### FROM - WHERE - GROUP BY - HAVING - SELECT - ORDER BY - LIMIT" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 
93 | "execution_count": 8, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "# Relationship of the database\n", 98 | "# im = Image.open('database_relationship.png')\n", 99 | "# im.show()" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "### Basic query with `FROM` &`WHERE` & `LIMIT`" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 9, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "data": { 116 | "text/html": [ 117 | "
\n", 118 | "\n", 131 | "\n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | "
\n", 176 | "
" 177 | ], 178 | "text/plain": [ 179 | " invoice_id vendor_id invoice_number invoice_date invoice_total \\\n", 180 | "0 78 121 97/522 2014-06-28 1962.13 \n", 181 | "1 106 110 0-2060 2014-07-24 23517.58 \n", 182 | "\n", 183 | " payment_total credit_total terms_id invoice_due_date payment_date \n", 184 | "0 1762.13 200.00 3 2014-07-28 2014-07-30 \n", 185 | "1 21221.63 2295.95 3 2014-08-23 2014-08-27 " 186 | ] 187 | }, 188 | "execution_count": 9, 189 | "metadata": {}, 190 | "output_type": "execute_result" 191 | } 192 | ], 193 | "source": [ 194 | "cursor.execute(\"\"\"\n", 195 | "SELECT * \n", 196 | "FROM invoices \n", 197 | "WHERE payment_total >10 and credit_total >50\n", 198 | "LIMIT 2\n", 199 | "\"\"\")\n", 200 | "query_results_temp = cursor.fetchall()\n", 201 | "df_temp = pd.DataFrame(query_results_temp)\n", 202 | "df_temp.columns=(invoices[(invoices[\"payment_total\"]>10)].columns) # For displaying the column names\n", 203 | "df_temp" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 10, 209 | "metadata": { 210 | "scrolled": true 211 | }, 212 | "outputs": [ 213 | { 214 | "data": { 215 | "text/html": [ 216 | "
\n", 217 | "\n", 230 | "\n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | "
\n", 275 | "
" 276 | ], 277 | "text/plain": [ 278 | " invoice_id vendor_id invoice_number invoice_date invoice_total \\\n", 279 | "77 78 121 97/522 2014-06-28 1962.13 \n", 280 | "105 106 110 0-2060 2014-07-24 23517.58 \n", 281 | "\n", 282 | " payment_total credit_total terms_id invoice_due_date payment_date \n", 283 | "77 1762.13 200.00 3 2014-07-28 2014-07-30 \n", 284 | "105 21221.63 2295.95 3 2014-08-23 2014-08-27 " 285 | ] 286 | }, 287 | "execution_count": 10, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "invoices[(invoices[\"payment_total\"]>10) \\\n", 294 | " &(invoices[\"credit_total\"]>50)].head(2)\n" 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "### Basic query with `WHERE` & `IN` & select specific columns\n" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 11, 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "data": { 311 | "text/html": [ 312 | "
\n", 313 | "\n", 326 | "\n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | "
\n", 350 | "
" 351 | ], 352 | "text/plain": [ 353 | " 0 1 2\n", 354 | "0 1 989319-457 3813.33\n", 355 | "1 2 263253241 40.20" 356 | ] 357 | }, 358 | "execution_count": 11, 359 | "metadata": {}, 360 | "output_type": "execute_result" 361 | } 362 | ], 363 | "source": [ 364 | "cursor.execute(\"\"\"\n", 365 | "SELECT \n", 366 | "invoice_id,\n", 367 | "invoice_number,\n", 368 | "invoice_total\n", 369 | "FROM invoices \n", 370 | "WHERE payment_total NOT IN (0,15.5,40.5) \n", 371 | "ORDER BY invoice_id\n", 372 | "LIMIT 2\n", 373 | "\"\"\")\n", 374 | "query_results_temp = cursor.fetchall()\n", 375 | "df_temp = pd.DataFrame(query_results_temp)\n", 376 | "\n", 377 | "df_temp" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 12, 383 | "metadata": { 384 | "scrolled": true 385 | }, 386 | "outputs": [ 387 | { 388 | "data": { 389 | "text/html": [ 390 | "
\n", 391 | "\n", 404 | "\n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | "
\n", 428 | "
" 429 | ], 430 | "text/plain": [ 431 | " invoice_id invoice_number invoice_total\n", 432 | "0 1 989319-457 3813.33\n", 433 | "1 2 263253241 40.20" 434 | ] 435 | }, 436 | "execution_count": 12, 437 | "metadata": {}, 438 | "output_type": "execute_result" 439 | } 440 | ], 441 | "source": [ 442 | "invoices[~invoices[\"payment_total\"].isin([0,15.5,40.5])] \\\n", 443 | ".sort_values(\"invoice_id\")[[\"invoice_id\",\"invoice_number\" \\\n", 444 | " ,\"invoice_total\"]].head(2)\n" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "### Basic query with`ORDER BY` using different order `DESC` or `ASC`" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 13, 457 | "metadata": {}, 458 | "outputs": [ 459 | { 460 | "data": { 461 | "text/html": [ 462 | "
\n", 463 | "\n", 476 | "\n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | "
\n", 500 | "
" 501 | ], 502 | "text/plain": [ 503 | " 0 1 2\n", 504 | "0 59 4-314-3057 13.75\n", 505 | "1 5 963253251 15.50" 506 | ] 507 | }, 508 | "execution_count": 13, 509 | "metadata": {}, 510 | "output_type": "execute_result" 511 | } 512 | ], 513 | "source": [ 514 | "cursor.execute(\"\"\"\n", 515 | "SELECT \n", 516 | "invoice_id,\n", 517 | "invoice_number,\n", 518 | "invoice_total\n", 519 | "FROM invoices \n", 520 | "WHERE payment_total > 10\n", 521 | "ORDER BY invoice_total, invoice_id DESC\n", 522 | "LIMIT 2\n", 523 | "\"\"\")\n", 524 | "query_results_temp = cursor.fetchall()\n", 525 | "df_temp = pd.DataFrame(query_results_temp)\n", 526 | "\n", 527 | "df_temp" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 14, 533 | "metadata": {}, 534 | "outputs": [ 535 | { 536 | "data": { 537 | "text/html": [ 538 | "
\n", 539 | "\n", 552 | "\n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | "
\n", 576 | "
" 577 | ], 578 | "text/plain": [ 579 | " invoice_id invoice_number invoice_total\n", 580 | "58 59 4-314-3057 13.75\n", 581 | "4 5 963253251 15.50" 582 | ] 583 | }, 584 | "execution_count": 14, 585 | "metadata": {}, 586 | "output_type": "execute_result" 587 | } 588 | ], 589 | "source": [ 590 | "invoices[(invoices[\"payment_total\"]>10)].sort_values(by= \\\n", 591 | "[\"invoice_total\",\"invoice_id\"],ascending = [True,False]) \\\n", 592 | "[[\"invoice_id\",\"invoice_number\",\"invoice_total\"]].head(2)\n" 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": {}, 598 | "source": [ 599 | "### Basic query with `WHERE` & `LIMIT` & `ORDER BY` & specific multiple columns" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 15, 605 | "metadata": {}, 606 | "outputs": [ 607 | { 608 | "data": { 609 | "text/html": [ 610 | "
\n", 611 | "\n", 624 | "\n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | "
\n", 648 | "
" 649 | ], 650 | "text/plain": [ 651 | " 0 1 2\n", 652 | "0 1 989319-457 3813.33\n", 653 | "1 2 263253241 40.20" 654 | ] 655 | }, 656 | "execution_count": 15, 657 | "metadata": {}, 658 | "output_type": "execute_result" 659 | } 660 | ], 661 | "source": [ 662 | "cursor.execute(\"\"\"\n", 663 | "SELECT \n", 664 | "invoice_id,\n", 665 | "invoice_number,\n", 666 | "invoice_total\n", 667 | "FROM invoices \n", 668 | "WHERE payment_total >5 \n", 669 | "ORDER BY invoice_id\n", 670 | "LIMIT 2\n", 671 | "\"\"\")\n", 672 | "query_results_temp = cursor.fetchall()\n", 673 | "df_temp = pd.DataFrame(query_results_temp)\n", 674 | "df_temp" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 16, 680 | "metadata": {}, 681 | "outputs": [ 682 | { 683 | "data": { 684 | "text/html": [ 685 | "
\n", 686 | "\n", 699 | "\n", 700 | " \n", 701 | " \n", 702 | " \n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | "
\n", 723 | "
" 724 | ], 725 | "text/plain": [ 726 | " invoice_id invoice_number invoice_total\n", 727 | "0 1 989319-457 3813.33\n", 728 | "1 2 263253241 40.20" 729 | ] 730 | }, 731 | "execution_count": 16, 732 | "metadata": {}, 733 | "output_type": "execute_result" 734 | } 735 | ], 736 | "source": [ 737 | "invoices[(invoices[\"payment_total\"]>5)].sort_values(by=[\"invoice_id\"]) \\\n", 738 | "[[\"invoice_id\",\"invoice_number\",\"invoice_total\"]].head(2)" 739 | ] 740 | }, 741 | { 742 | "cell_type": "markdown", 743 | "metadata": {}, 744 | "source": [ 745 | "### Basic query with`UNION` & Other functions" 746 | ] 747 | }, 748 | { 749 | "cell_type": "code", 750 | "execution_count": 17, 751 | "metadata": {}, 752 | "outputs": [ 753 | { 754 | "data": { 755 | "text/html": [ 756 | "
\n", 757 | "\n", 770 | "\n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 779 | " \n", 780 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | "
\n", 796 | "
" 797 | ], 798 | "text/plain": [ 799 | " 0 1\n", 800 | "0 1 3813.33\n", 801 | "1 3 138.75\n", 802 | "2 4 144.70" 803 | ] 804 | }, 805 | "execution_count": 17, 806 | "metadata": {}, 807 | "output_type": "execute_result" 808 | } 809 | ], 810 | "source": [ 811 | "cursor.execute(\"\"\"\n", 812 | "SELECT \n", 813 | "invoice_id,\n", 814 | "payment_total\n", 815 | "FROM invoices \n", 816 | "WHERE payment_total >50 \n", 817 | "UNION \n", 818 | "SELECT \n", 819 | "invoice_id,\n", 820 | "payment_total\n", 821 | "FROM invoices \n", 822 | "WHERE payment_total <30 \n", 823 | "ORDER BY invoice_id ASC\n", 824 | "LIMIT 3\n", 825 | "\"\"\")\n", 826 | "query_results_temp = cursor.fetchall()\n", 827 | "df_temp = pd.DataFrame(query_results_temp)\n", 828 | "df_temp" 829 | ] 830 | }, 831 | { 832 | "cell_type": "code", 833 | "execution_count": 18, 834 | "metadata": {}, 835 | "outputs": [ 836 | { 837 | "data": { 838 | "text/html": [ 839 | "
\n", 840 | "\n", 853 | "\n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | "
\n", 879 | "
" 880 | ], 881 | "text/plain": [ 882 | " invoice_id payment_total\n", 883 | "0 1 3813.33\n", 884 | "2 3 138.75\n", 885 | "3 4 144.70" 886 | ] 887 | }, 888 | "execution_count": 18, 889 | "metadata": {}, 890 | "output_type": "execute_result" 891 | } 892 | ], 893 | "source": [ 894 | "pd.concat([invoices[invoices[\"payment_total\"]>50][[\"invoice_id\",\"payment_total\"]], \\\n", 895 | "invoices[invoices[\"payment_total\"]<30][[\"invoice_id\",\"payment_total\"]]],sort=False,axis=0). \\\n", 896 | "sort_values(\"invoice_id\",ascending = True).head(3)" 897 | ] 898 | }, 899 | { 900 | "cell_type": "markdown", 901 | "metadata": {}, 902 | "source": [ 903 | "# Aggregation SQL query" 904 | ] 905 | }, 906 | { 907 | "cell_type": "markdown", 908 | "metadata": {}, 909 | "source": [ 910 | "### Simple aggregation query with `GROUP BY`&`SUM` & `ORDER BY` & `LIMIT`" 911 | ] 912 | }, 913 | { 914 | "cell_type": "code", 915 | "execution_count": 19, 916 | "metadata": {}, 917 | "outputs": [ 918 | { 919 | "data": { 920 | "text/html": [ 921 | "
\n", 922 | "\n", 935 | "\n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | " \n", 954 | " \n", 955 | "
\n", 956 | "
" 957 | ], 958 | "text/plain": [ 959 | " 0 1\n", 960 | "0 34 116.54\n", 961 | "1 34 1083.58" 962 | ] 963 | }, 964 | "execution_count": 19, 965 | "metadata": {}, 966 | "output_type": "execute_result" 967 | } 968 | ], 969 | "source": [ 970 | "cursor.execute(\"\"\"\n", 971 | "SELECT \n", 972 | "vendor_id,\n", 973 | "SUM(invoice_total)\n", 974 | "FROM invoices \n", 975 | "GROUP BY vendor_id,invoice_date\n", 976 | "ORDER BY vendor_id\n", 977 | "LIMIT 2\n", 978 | "\"\"\")\n", 979 | "query_results_temp = cursor.fetchall()\n", 980 | "df_temp = pd.DataFrame(query_results_temp)\n", 981 | "df_temp" 982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": 20, 987 | "metadata": {}, 988 | "outputs": [ 989 | { 990 | "data": { 991 | "text/html": [ 992 | "
\n", 993 | "\n", 1006 | "\n", 1007 | " \n", 1008 | " \n", 1009 | " \n", 1010 | " \n", 1011 | " \n", 1012 | " \n", 1013 | " \n", 1014 | " \n", 1015 | " \n", 1016 | " \n", 1017 | " \n", 1018 | " \n", 1019 | " \n", 1020 | " \n", 1021 | " \n", 1022 | " \n", 1023 | " \n", 1024 | " \n", 1025 | " \n", 1026 | " \n", 1027 | " \n", 1028 | " \n", 1029 | "
\n", 1030 | "
" 1031 | ], 1032 | "text/plain": [ 1033 | " vendor_id invoice_date invoice_total\n", 1034 | "0 34 2014-05-07 116.54\n", 1035 | "1 34 2014-06-09 1083.58" 1036 | ] 1037 | }, 1038 | "execution_count": 20, 1039 | "metadata": {}, 1040 | "output_type": "execute_result" 1041 | } 1042 | ], 1043 | "source": [ 1044 | "invoices.groupby([\"vendor_id\",\"invoice_date\"]).agg({\"invoice_total\":\"sum\"}) \\\n", 1045 | ".reset_index().sort_values(by=[\"vendor_id\"]).head(2)" 1046 | ] 1047 | }, 1048 | { 1049 | "cell_type": "markdown", 1050 | "metadata": {}, 1051 | "source": [ 1052 | "### Simple aggregation query with `SUM` & `ORDER BY` & `LIMIT` & `WHERE` & `HAVING`" 1053 | ] 1054 | }, 1055 | { 1056 | "cell_type": "code", 1057 | "execution_count": 21, 1058 | "metadata": {}, 1059 | "outputs": [ 1060 | { 1061 | "data": { 1062 | "text/html": [ 1063 | "
\n", 1064 | "\n", 1077 | "\n", 1078 | " \n", 1079 | " \n", 1080 | " \n", 1081 | " \n", 1082 | " \n", 1083 | " \n", 1084 | " \n", 1085 | " \n", 1086 | " \n", 1087 | " \n", 1088 | " \n", 1089 | " \n", 1090 | " \n", 1091 | " \n", 1092 | " \n", 1093 | " \n", 1094 | " \n", 1095 | " \n", 1096 | " \n", 1097 | " \n", 1098 | " \n", 1099 | " \n", 1100 | "
\n", 1101 | "
" 1102 | ], 1103 | "text/plain": [ 1104 | " 0 1 2\n", 1105 | "0 34 2014-06-09 1083.58\n", 1106 | "1 48 2014-05-03 856.92" 1107 | ] 1108 | }, 1109 | "execution_count": 21, 1110 | "metadata": {}, 1111 | "output_type": "execute_result" 1112 | } 1113 | ], 1114 | "source": [ 1115 | "cursor.execute(\"\"\"\n", 1116 | "SELECT \n", 1117 | "vendor_id,\n", 1118 | "invoice_date,\n", 1119 | "SUM(invoice_total)\n", 1120 | "FROM invoices \n", 1121 | "WHERE payment_total >500 \n", 1122 | "GROUP BY vendor_id,invoice_date\n", 1123 | "HAVING SUM(invoice_total) > 50\n", 1124 | "ORDER BY vendor_id\n", 1125 | "LIMIT 2\n", 1126 | "\"\"\")\n", 1127 | "query_results_temp = cursor.fetchall()\n", 1128 | "df_temp = pd.DataFrame(query_results_temp)\n", 1129 | "df_temp" 1130 | ] 1131 | }, 1132 | { 1133 | "cell_type": "code", 1134 | "execution_count": 22, 1135 | "metadata": {}, 1136 | "outputs": [ 1137 | { 1138 | "data": { 1139 | "text/html": [ 1140 | "
\n", 1141 | "\n", 1154 | "\n", 1155 | " \n", 1156 | " \n", 1157 | " \n", 1158 | " \n", 1159 | " \n", 1160 | " \n", 1161 | " \n", 1162 | " \n", 1163 | " \n", 1164 | " \n", 1165 | " \n", 1166 | " \n", 1167 | " \n", 1168 | " \n", 1169 | " \n", 1170 | " \n", 1171 | " \n", 1172 | " \n", 1173 | " \n", 1174 | " \n", 1175 | " \n", 1176 | " \n", 1177 | "
\n", 1178 | "
" 1179 | ], 1180 | "text/plain": [ 1181 | " vendor_id invoice_date invoice_total\n", 1182 | "0 34 2014-06-09 1083.58\n", 1183 | "1 48 2014-05-03 856.92" 1184 | ] 1185 | }, 1186 | "execution_count": 22, 1187 | "metadata": {}, 1188 | "output_type": "execute_result" 1189 | } 1190 | ], 1191 | "source": [ 1192 | "temp = \\\n", 1193 | "invoices[(invoices[\"payment_total\"]>500)].groupby([\"vendor_id\",\"invoice_date\"]) \\\n", 1194 | ".agg({\"invoice_total\":\"sum\"}).reset_index()\n", 1195 | "temp[(temp[\"invoice_total\"]>50)].sort_values(by=[\"vendor_id\"]).head(2)" 1196 | ] 1197 | }, 1198 | { 1199 | "cell_type": "markdown", 1200 | "metadata": {}, 1201 | "source": [ 1202 | "### Aggregation query with `COUNT` & `AVG` for one column & `HAVING` & `WHERE` & select specific columns\n", 1203 | "#### Select only one aggregation term from the column using `.loc[]`" 1204 | ] 1205 | }, 1206 | { 1207 | "cell_type": "code", 1208 | "execution_count": 23, 1209 | "metadata": {}, 1210 | "outputs": [ 1211 | { 1212 | "data": { 1213 | "text/html": [ 1214 | "
\n", 1215 | "\n", 1228 | "\n", 1229 | " \n", 1230 | " \n", 1231 | " \n", 1232 | " \n", 1233 | " \n", 1234 | " \n", 1235 | " \n", 1236 | " \n", 1237 | " \n", 1238 | " \n", 1239 | " \n", 1240 | " \n", 1241 | " \n", 1242 | " \n", 1243 | " \n", 1244 | " \n", 1245 | " \n", 1246 | " \n", 1247 | " \n", 1248 | " \n", 1249 | " \n", 1250 | " \n", 1251 | " \n", 1252 | " \n", 1253 | " \n", 1254 | "
\n", 1255 | "
" 1256 | ], 1257 | "text/plain": [ 1258 | " 0 1 2 3\n", 1259 | "0 123 2014-05-31 2 226.875\n", 1260 | "1 123 2014-06-11 2 33.500" 1261 | ] 1262 | }, 1263 | "execution_count": 23, 1264 | "metadata": {}, 1265 | "output_type": "execute_result" 1266 | } 1267 | ], 1268 | "source": [ 1269 | "cursor.execute(\"\"\"\n", 1270 | "SELECT \n", 1271 | "vendor_id,\n", 1272 | "invoice_date,\n", 1273 | "COUNT(invoice_total),\n", 1274 | "AVG(invoice_total)\n", 1275 | "FROM invoices \n", 1276 | "WHERE payment_total >8 \n", 1277 | "GROUP BY vendor_id,invoice_date\n", 1278 | "HAVING COUNT(invoice_total) > 1\n", 1279 | "ORDER BY COUNT(invoice_total), vendor_id\n", 1280 | "LIMIT 2\n", 1281 | "\"\"\")\n", 1282 | "query_results_temp = cursor.fetchall()\n", 1283 | "df_temp = pd.DataFrame(query_results_temp)\n", 1284 | "df_temp" 1285 | ] 1286 | }, 1287 | { 1288 | "cell_type": "code", 1289 | "execution_count": 24, 1290 | "metadata": {}, 1291 | "outputs": [ 1292 | { 1293 | "data": { 1294 | "text/html": [ 1295 | "
\n", 1296 | "\n", 1309 | "\n", 1310 | " \n", 1311 | " \n", 1312 | " \n", 1313 | " \n", 1314 | " \n", 1315 | " \n", 1316 | " \n", 1317 | " \n", 1318 | " \n", 1319 | " \n", 1320 | " \n", 1321 | " \n", 1322 | " \n", 1323 | " \n", 1324 | " \n", 1325 | " \n", 1326 | " \n", 1327 | " \n", 1328 | " \n", 1329 | " \n", 1330 | " \n", 1331 | " \n", 1332 | " \n", 1333 | " \n", 1334 | " \n", 1335 | " \n", 1336 | " \n", 1337 | " \n", 1338 | " \n", 1339 | " \n", 1340 | " \n", 1341 | "
\n", 1342 | "
" 1343 | ], 1344 | "text/plain": [ 1345 | " vendor_id invoice_date invoice_total \n", 1346 | " size mean\n", 1347 | "73 123 2014-05-31 2 226.875\n", 1348 | "76 123 2014-06-11 2 33.500" 1349 | ] 1350 | }, 1351 | "execution_count": 24, 1352 | "metadata": {}, 1353 | "output_type": "execute_result" 1354 | } 1355 | ], 1356 | "source": [ 1357 | "temp = \\\n", 1358 | "invoices[(invoices[\"payment_total\"] > 8)].groupby( \\\n", 1359 | " [\"vendor_id\",\"invoice_date\"]).agg({\"invoice_total\":['size','mean']}) \\\n", 1360 | " .reset_index().loc[:,[(\"vendor_id\",\"\"),(\"invoice_date\",\"\"), \\\n", 1361 | " (\"invoice_total\",\"size\"),(\"invoice_total\",\"mean\")]]\n", 1362 | "\n", 1363 | "temp[temp[(\"invoice_total\",\"size\")]>1].sort_values \\\n", 1364 | "(by=[(\"invoice_total\",\"size\"),(\"vendor_id\",\"\")]).head(2)\n" 1365 | ] 1366 | }, 1367 | { 1368 | "cell_type": "markdown", 1369 | "metadata": {}, 1370 | "source": [ 1371 | "# JOIN SQL query" 1372 | ] 1373 | }, 1374 | { 1375 | "cell_type": "markdown", 1376 | "metadata": {}, 1377 | "source": [ 1378 | "### Simple `INNER` JOIN" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "code", 1383 | "execution_count": 25, 1384 | "metadata": {}, 1385 | "outputs": [ 1386 | { 1387 | "data": { 1388 | "text/html": [ 1389 | "
\n", 1390 | "\n", 1403 | "\n", 1404 | " \n", 1405 | " \n", 1406 | " \n", 1407 | " \n", 1408 | " \n", 1409 | " \n", 1410 | " \n", 1411 | " \n", 1412 | " \n", 1413 | " \n", 1414 | " \n", 1415 | " \n", 1416 | " \n", 1417 | " \n", 1418 | " \n", 1419 | " \n", 1420 | " \n", 1421 | " \n", 1422 | " \n", 1423 | "
\n", 1424 | "
" 1425 | ], 1426 | "text/plain": [ 1427 | " 0 1\n", 1428 | "0 Abbey Office Furnishings 2014-07-05\n", 1429 | "1 Bertelsmann Industry Svcs. Inc 2014-06-18" 1430 | ] 1431 | }, 1432 | "execution_count": 25, 1433 | "metadata": {}, 1434 | "output_type": "execute_result" 1435 | } 1436 | ], 1437 | "source": [ 1438 | "cursor.execute(\"\"\"\n", 1439 | "SELECT \n", 1440 | "v.vendor_name,\n", 1441 | "i.invoice_date\n", 1442 | "FROM vendors v\n", 1443 | "INNER JOIN invoices i ON v.vendor_id = i.vendor_id\n", 1444 | "ORDER BY v.vendor_name, i.invoice_date\n", 1445 | "LIMIT 2\n", 1446 | "\"\"\")\n", 1447 | "query_results_temp = cursor.fetchall()\n", 1448 | "df_temp = pd.DataFrame(query_results_temp)\n", 1449 | "df_temp" 1450 | ] 1451 | }, 1452 | { 1453 | "cell_type": "code", 1454 | "execution_count": 26, 1455 | "metadata": {}, 1456 | "outputs": [ 1457 | { 1458 | "data": { 1459 | "text/html": [ 1460 | "
\n", 1461 | "\n", 1474 | "\n", 1475 | " \n", 1476 | " \n", 1477 | " \n", 1478 | " \n", 1479 | " \n", 1480 | " \n", 1481 | " \n", 1482 | " \n", 1483 | " \n", 1484 | " \n", 1485 | " \n", 1486 | " \n", 1487 | " \n", 1488 | " \n", 1489 | " \n", 1490 | " \n", 1491 | " \n", 1492 | " \n", 1493 | " \n", 1494 | "
\n", 1495 | "
" 1496 | ], 1497 | "text/plain": [ 1498 | " vendor_name invoice_date\n", 1499 | "18 Abbey Office Furnishings 2014-07-05\n", 1500 | "28 Bertelsmann Industry Svcs. Inc 2014-06-18" 1501 | ] 1502 | }, 1503 | "execution_count": 26, 1504 | "metadata": {}, 1505 | "output_type": "execute_result" 1506 | } 1507 | ], 1508 | "source": [ 1509 | "vendors.merge(invoices,how=\"inner\",on=\"vendor_id\").sort_values(by=[\"vendor_name\",\"invoice_date\"]) \\\n", 1510 | "[[\"vendor_name\",\"invoice_date\"]].head(2)" 1511 | ] 1512 | }, 1513 | { 1514 | "cell_type": "markdown", 1515 | "metadata": {}, 1516 | "source": [ 1517 | "### Simple `INNER` & `LEFT` JOIN" 1518 | ] 1519 | }, 1520 | { 1521 | "cell_type": "code", 1522 | "execution_count": 27, 1523 | "metadata": {}, 1524 | "outputs": [ 1525 | { 1526 | "data": { 1527 | "text/html": [ 1528 | "
\n", 1529 | "\n", 1542 | "\n", 1543 | " \n", 1544 | " \n", 1545 | " \n", 1546 | " \n", 1547 | " \n", 1548 | " \n", 1549 | " \n", 1550 | " \n", 1551 | " \n", 1552 | " \n", 1553 | " \n", 1554 | " \n", 1555 | " \n", 1556 | " \n", 1557 | " \n", 1558 | " \n", 1559 | " \n", 1560 | " \n", 1561 | " \n", 1562 | " \n", 1563 | " \n", 1564 | " \n", 1565 | " \n", 1566 | " \n", 1567 | " \n", 1568 | "
\n", 1569 | "
" 1570 | ], 1571 | "text/plain": [ 1572 | " 0 1 2 3\n", 1573 | "0 Abbey Office Furnishings 2014-07-05 17.50 17.50\n", 1574 | "1 Bertelsmann Industry Svcs. Inc 2014-06-18 6940.25 6940.25" 1575 | ] 1576 | }, 1577 | "execution_count": 27, 1578 | "metadata": {}, 1579 | "output_type": "execute_result" 1580 | } 1581 | ], 1582 | "source": [ 1583 | "cursor.execute(\"\"\"\n", 1584 | "SELECT \n", 1585 | "v.vendor_name,\n", 1586 | "i.invoice_date,\n", 1587 | "i.invoice_total,\n", 1588 | "ili.line_item_amount\n", 1589 | "FROM vendors v\n", 1590 | "INNER JOIN invoices i ON v.vendor_id = i.vendor_id\n", 1591 | "LEFT JOIN invoice_line_items ili ON i.invoice_id = ili.invoice_id\n", 1592 | "ORDER BY v.vendor_name, i.invoice_date\n", 1593 | "LIMIT 2\n", 1594 | "\"\"\")\n", 1595 | "query_results_temp = cursor.fetchall()\n", 1596 | "df_temp = pd.DataFrame(query_results_temp)\n", 1597 | "df_temp" 1598 | ] 1599 | }, 1600 | { 1601 | "cell_type": "code", 1602 | "execution_count": 28, 1603 | "metadata": {}, 1604 | "outputs": [ 1605 | { 1606 | "data": { 1607 | "text/html": [ 1608 | "
\n", 1609 | "\n", 1622 | "\n", 1623 | " \n", 1624 | " \n", 1625 | " \n", 1626 | " \n", 1627 | " \n", 1628 | " \n", 1629 | " \n", 1630 | " \n", 1631 | " \n", 1632 | " \n", 1633 | " \n", 1634 | " \n", 1635 | " \n", 1636 | " \n", 1637 | " \n", 1638 | " \n", 1639 | " \n", 1640 | " \n", 1641 | " \n", 1642 | " \n", 1643 | " \n", 1644 | " \n", 1645 | " \n", 1646 | " \n", 1647 | " \n", 1648 | "
\n", 1649 | "
" 1650 | ], 1651 | "text/plain": [ 1652 | " vendor_name invoice_date invoice_total \\\n", 1653 | "18 Abbey Office Furnishings 2014-07-05 17.50 \n", 1654 | "31 Bertelsmann Industry Svcs. Inc 2014-06-18 6940.25 \n", 1655 | "\n", 1656 | " line_item_amount \n", 1657 | "18 17.50 \n", 1658 | "31 6940.25 " 1659 | ] 1660 | }, 1661 | "execution_count": 28, 1662 | "metadata": {}, 1663 | "output_type": "execute_result" 1664 | } 1665 | ], 1666 | "source": [ 1667 | "vendors.merge(invoices,how=\"inner\",on=\"vendor_id\"). \\\n", 1668 | "merge(invoice_line_items,how=\"left\",on=\"invoice_id\") \\\n", 1669 | "[[\"vendor_name\",\"invoice_date\",\"invoice_total\",\"line_item_amount\"]]. \\\n", 1670 | "sort_values(by=[\"vendor_name\",\"invoice_date\"]).head(2)" 1671 | ] 1672 | } 1673 | ], 1674 | "metadata": { 1675 | "kernelspec": { 1676 | "display_name": "Python 3", 1677 | "language": "python", 1678 | "name": "python3" 1679 | }, 1680 | "language_info": { 1681 | "codemirror_mode": { 1682 | "name": "ipython", 1683 | "version": 3 1684 | }, 1685 | "file_extension": ".py", 1686 | "mimetype": "text/x-python", 1687 | "name": "python", 1688 | "nbconvert_exporter": "python", 1689 | "pygments_lexer": "ipython3", 1690 | "version": "3.7.4" 1691 | } 1692 | }, 1693 | "nbformat": 4, 1694 | "nbformat_minor": 2 1695 | } 1696 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Python codes for implementing SQL queries/README.md: -------------------------------------------------------------------------------- 1 | # Python codes for implementing SQL queries 2 | 3 | This is the practice for rewriting SQL queries in Python using packages `pandas` and `numpy`. 4 | 5 | ## Prerequisites 6 | 7 | For this practice, `pandas`, `numpy` and `sqlite3` packages are required. 
8 | 9 | ## General Guide 10 | This is the full code for my blog post: [Reproducing SQL Queries in Python Codes](https://medium.com/swlh/reproducing-sql-queries-in-python-codes-35d90f716b1a) 11 | The "database_relationship" picture shows the relationships between the different databases. 12 | 13 | "Python code for SQL queries" contains the practice of reproducing SQL queries in Python code. 14 | 15 | ## Authors 16 | 17 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 18 | 19 | 20 | -------------------------------------------------------------------------------- /Data Manipulation Practice/Python codes for implementing SQL queries/database_relationship.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lush9516/Daily-Practice-for-Coding/bb672451256fe355df8b78d4c43a7b25decee7c8/Data Manipulation Practice/Python codes for implementing SQL queries/database_relationship.png -------------------------------------------------------------------------------- /Data Manipulation Practice/README.md: -------------------------------------------------------------------------------- 1 | # Data Manipulation 2 | 3 | This is a record of all my coding practice, including Data Manipulation, Data Structure and Algorithm, and Data Visualization.
4 | 5 | This is a list of all the practice: 6 | * [Pandas & Numpy practice](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/Data%20Manipulation%20Practice/Pandas%20%26%20Numpy%20Practice) 7 | * [Python(pandas) codes for implementing SQL queries](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/Data%20Manipulation%20Practice/Python%20codes%20for%20implementing%20SQL%20queries) 8 | * [Other techniques & practice](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/Data%20Manipulation%20Practice/Other%20techniques%20%26%20practices) 9 | 10 | ## Getting Started 11 | 12 | ### Prerequisites 13 | 14 | For the Pandas & Numpy practice and the SQL queries practice, the `pandas` and `numpy` packages are required. 15 | 16 | For the other techniques, the `wordcloud` package is required. 17 | 18 | #### Install packages 19 | 20 | Personally, I recommend using Anaconda, which makes it easy to manage packages. 21 | * [anaconda free download link](https://www.anaconda.com/distribution/#download-section) 22 | 23 | After installing Anaconda, you can easily install Python packages from the terminal. 24 | ``` 25 | conda install package_name 26 | ``` 27 | 28 | #### Load packages 29 | ``` 30 | import pandas as pd 31 | import numpy as np 32 | import matplotlib.pyplot as plt 33 | import seaborn 34 | ``` 35 | 36 | ## Author 37 | 38 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 39 | 40 | 41 | -------------------------------------------------------------------------------- /Data Visualization/README.md: -------------------------------------------------------------------------------- 1 | # Python_Basic_Visualization 2 | 3 | This is a record of practice on data visualization using Python. 4 | 5 | This record mainly contains practice using `matplotlib.pyplot` and `seaborn`.
6 | 7 | ## Getting Started 8 | 9 | ### Prerequisites 10 | 11 | For this practice, the `pandas`, `numpy`, `matplotlib.pyplot` and `seaborn` packages are required. 12 | 13 | 14 | #### Install packages 15 | 16 | Personally, I recommend using Anaconda, which makes it easy to manage packages. 17 | * [anaconda free download link](https://www.anaconda.com/distribution/#download-section) 18 | 19 | After installing Anaconda, you can easily install Python packages from the terminal. 20 | ``` 21 | conda install package_name 22 | ``` 23 | 24 | #### Load packages 25 | ``` 26 | import pandas as pd 27 | import numpy as np 28 | import matplotlib.pyplot as plt 29 | import seaborn 30 | ``` 31 | 32 | ## General guidance 33 | This practice covers basic data visualization methods, including line plots, bar plots, histograms, etc. 34 | 35 | I mainly use the `seaborn` package to plot heatmaps. 36 | 37 | ## Authors 38 | 39 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 40 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#13_Roman to Integer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "import numpy as np" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "#### For example, two is written as II in Roman numeral, just two one's added together. Twelve is written as, XII, which is simply X + II. The number twenty seven is written as XXVII, which is XX + V + II.\n", 18 | "\n", 19 | "#### Roman numerals are usually written largest to smallest from left to right. However, the numeral for four is not IIII. Instead, the number four is written as IV. 
Because the one is before the five we subtract it making four. The same principle applies to the number nine, which is written as IX. There are six instances where subtraction is used:\n", 20 | "\n", 21 | "#### I can be placed before V (5) and X (10) to make 4 and 9. \n", 22 | "#### X can be placed before L (50) and C (100) to make 40 and 90. \n", 23 | "#### C can be placed before D (500) and M (1000) to make 400 and 900.\n", 24 | "#### Given a roman numeral, convert it to an integer. Input is guaranteed to be within the range from 1 to 3999." 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "text/plain": [ 35 | "2004" 36 | ] 37 | }, 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "output_type": "execute_result" 41 | } 42 | ], 43 | "source": [ 44 | "s='IVMM'\n", 45 | "class Solution:\n", 46 | "    def romanToInt(self, s: str) -> int:\n", 47 | "        values={'I':1,'V':5,'X':10,'L':50,'C':100,'D':500,'M':1000,\\\n", 48 | "                'IV':4,'IX':9,'XL':40,'XC':90,'CD':400,'CM':900}\n", 49 | "        result = 0\n", 50 | "        i=0\n", 51 | "        while i < len(s):\n", 52 | "            if i==len(s)-1:\n", 53 | "                result+=values[s[i]]\n", 54 | "                i += 1\n", 55 | "            else:\n", 56 | "                if s[i]+s[i+1] in values:\n", 57 | "                    result += values[s[i]+s[i+1]]\n", 58 | "                    i +=2\n", 59 | "                else:\n", 60 | "                    result += values[s[i]]\n", 61 | "                    i+=1\n", 62 | "        return result\n", 63 | "\n", 64 | "# Runtime: 44 ms, faster than 97.17% of Python3 online submissions for Roman to Integer.\n", 65 | "Solution().romanToInt(s)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "### Key point:\n", 73 | "#### To deal with `IV` etc. \n", 74 | "#### I put these two-character pairs into the dictionary along with the single symbols and check for a pair first at each step of the conversion."
75 | ] 76 | } 77 | ], 78 | "metadata": { 79 | "kernelspec": { 80 | "display_name": "Python 3", 81 | "language": "python", 82 | "name": "python3" 83 | }, 84 | "language_info": { 85 | "codemirror_mode": { 86 | "name": "ipython", 87 | "version": 3 88 | }, 89 | "file_extension": ".py", 90 | "mimetype": "text/x-python", 91 | "name": "python", 92 | "nbconvert_exporter": "python", 93 | "pygments_lexer": "ipython3", 94 | "version": "3.7.3" 95 | } 96 | }, 97 | "nbformat": 4, 98 | "nbformat_minor": 2 99 | } 100 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#14_Longest Common Prefix.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Write a function to find the longest common prefix string amongst an array of strings.\n", 8 | "\n", 9 | "### If there is no common prefix, return an empty string \"\"." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 21, 15 | "metadata": {}, 16 | "outputs": [ 17 | { 18 | "data": { 19 | "text/plain": [ 20 | "'lush'" 21 | ] 22 | }, 23 | "execution_count": 21, 24 | "metadata": {}, 25 | "output_type": "execute_result" 26 | } 27 | ], 28 | "source": [ 29 | "s_list = ['lushuhan','lushu','lush']\n", 30 | "class Solution:\n", 31 | "    def longestCommonPrefix(self, strs) -> str:\n", 32 | "        loc=0\n", 33 | "        n = len(strs)\n", 34 | "        if n == 0: return ''\n", 35 | "        min_word,max_word = strs[0],strs[0] \n", 36 | "        for i in strs:\n", 37 | "            if min_word > i: min_word = i\n", 38 | "            if max_word < i: max_word = i\n", 39 | "        \n", 40 | "        iteration = min(len(min_word),len(max_word))\n", 41 | "        for j in range(iteration):\n", 42 | "            if min_word[j] == max_word[j]:\n", 43 | "                loc+=1\n", 44 | "            else: return min_word[:loc]\n", 45 | "        return min_word[:loc]\n", 46 | "Solution().longestCommonPrefix(s_list)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "### Key Point:\n", 54 | "#### Strings, like numbers, have an order and can be compared, so a list of strings has a smallest and a largest element.\n", 55 | "#### If the `min()` = `max()` then all the values are equal; more generally, any prefix shared by the smallest and largest strings is shared by every string in between" 56 | ] 57 | } 58 | ], 59 | "metadata": { 60 | "kernelspec": { 61 | "display_name": "Python 3", 62 | "language": "python", 63 | "name": "python3" 64 | }, 65 | "language_info": { 66 | "codemirror_mode": { 67 | "name": "ipython", 68 | "version": 3 69 | }, 70 | "file_extension": ".py", 71 | "mimetype": "text/x-python", 72 | "name": "python", 73 | "nbconvert_exporter": "python", 74 | "pygments_lexer": "ipython3", 75 | "version": "3.7.3" 76 | } 77 | }, 78 | "nbformat": 4, 79 | "nbformat_minor": 2 80 | } 81 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#1_Two Sum.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | 
"metadata": {}, 6 | "source": [ 7 | "### Given an array of integers, return indices of the two numbers such that they add up to a specific target." 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np\n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "class Solution:\n", 27 | " def twoSum(self, nums: List[int], target: int) -> List[int]:\n", 28 | " dict={}\n", 29 | "\n", 30 | " for i in range(len(nums)):\n", 31 | " diff = target - nums[i]\n", 32 | " if diff in dict:\n", 33 | " return [dict[diff],i]\n", 34 | " else:\n", 35 | " dict[nums[i]] = i" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "### Key point: `dictionary`\n", 43 | "#### put difference between each element and target in the dict\n", 44 | "#### WHILE putting the difference, also examine whether this element match other's differences" 45 | ] 46 | } 47 | ], 48 | "metadata": { 49 | "kernelspec": { 50 | "display_name": "Python 3", 51 | "language": "python", 52 | "name": "python3" 53 | }, 54 | "language_info": { 55 | "codemirror_mode": { 56 | "name": "ipython", 57 | "version": 3 58 | }, 59 | "file_extension": ".py", 60 | "mimetype": "text/x-python", 61 | "name": "python", 62 | "nbconvert_exporter": "python", 63 | "pygments_lexer": "ipython3", 64 | "version": "3.7.3" 65 | } 66 | }, 67 | "nbformat": 4, 68 | "nbformat_minor": 2 69 | } 70 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#20_Valid Parentheses.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Given a string containing just the characters '(', ')', '{', '}', '[' and 
']', determine if the input string is valid.\n", 8 | "\n", 9 | "### An input string is valid if:\n", 10 | "\n", 11 | "### Open brackets must be closed by the same type of brackets.\n", 12 | "### Open brackets must be closed in the correct order." 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 5, 18 | "metadata": {}, 19 | "outputs": [ 20 | { 21 | "data": { 22 | "text/plain": [ 23 | "False" 24 | ] 25 | }, 26 | "execution_count": 5, 27 | "metadata": {}, 28 | "output_type": "execute_result" 29 | } 30 | ], 31 | "source": [ 32 | "s='({}})'\n", 33 | "class Solution:\n", 34 | "    def isValid(self, s: str) -> bool:\n", 35 | "        if len(s) ==0: return True\n", 36 | "        pairs={'(':')','{':'}','[':']'}\n", 37 | "        stack=[]\n", 38 | "        \n", 39 | "        for i in range(len(s)):\n", 40 | "            if s[i] in pairs:\n", 41 | "                stack.append(pairs[s[i]])\n", 42 | "            elif len(stack)==0: return False\n", 43 | "            elif s[i] != stack[-1]: return False\n", 44 | "\n", 45 | "            elif s[i] == stack[-1]:\n", 46 | "                stack.pop()\n", 47 | "        \n", 48 | "        if len(stack) == 0: return True\n", 49 | "        else: return False\n", 50 | "Solution().isValid(s)\n", 51 | "# Runtime: 24 ms, faster than 99.67% of Python3 online submissions for Valid Parentheses.\n", 52 | "# Memory Usage: 12.9 MB, less than 100.00% of Python3 online submissions for Valid Parentheses." 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "### Key Point\n", 60 | "#### Use a stack: push the expected closing bracket for each opening bracket, pop it when the current closing bracket matches the top, and at the end examine whether any elements are left\n", 61 | "### Important:\n", 62 | "### Consider every possible situation and edge case!!!!"
63 | ] 64 | } 65 | ], 66 | "metadata": { 67 | "kernelspec": { 68 | "display_name": "Python 3", 69 | "language": "python", 70 | "name": "python3" 71 | }, 72 | "language_info": { 73 | "codemirror_mode": { 74 | "name": "ipython", 75 | "version": 3 76 | }, 77 | "file_extension": ".py", 78 | "mimetype": "text/x-python", 79 | "name": "python", 80 | "nbconvert_exporter": "python", 81 | "pygments_lexer": "ipython3", 82 | "version": "3.7.3" 83 | } 84 | }, 85 | "nbformat": 4, 86 | "nbformat_minor": 2 87 | } 88 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#26_Remove Duplicates from Sorted Array.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#### Given a sorted array nums, remove the duplicates in-place such that each element appear only once and return the new length.\n", 8 | "\n", 9 | "#### Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [] 18 | } 19 | ], 20 | "metadata": { 21 | "kernelspec": { 22 | "display_name": "Python 3", 23 | "language": "python", 24 | "name": "python3" 25 | }, 26 | "language_info": { 27 | "codemirror_mode": { 28 | "name": "ipython", 29 | "version": 3 30 | }, 31 | "file_extension": ".py", 32 | "mimetype": "text/x-python", 33 | "name": "python", 34 | "nbconvert_exporter": "python", 35 | "pygments_lexer": "ipython3", 36 | "version": "3.7.3" 37 | } 38 | }, 39 | "nbformat": 4, 40 | "nbformat_minor": 2 41 | } 42 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#53_Maximum Subarray.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Given an integer array nums, find the contiguous subarray\n", 8 | "### (containing at least one number) which has the largest sum and return its sum.\n", 9 | "### " 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [ 17 | { 18 | "data": { 19 | "text/plain": [ 20 | "6" 21 | ] 22 | }, 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "output_type": "execute_result" 26 | } 27 | ], 28 | "source": [ 29 | "nums=[-2,1,-3,4,-1,2,1,-5,4]\n", 30 | "class Solution:\n", 31 | " def maxSubArray(self, nums) -> int:\n", 32 | " if len(nums)==0: return 0\n", 33 | " length=len(nums)\n", 34 | " sum = None\n", 35 | " temp=0\n", 36 | " for i in nums:\n", 37 | " temp = max(i,temp+i)\n", 38 | " if sum is None or temp > sum:\n", 39 | " sum=temp\n", 40 | " return sum\n", 41 | " \n", 42 | "Solution.maxSubArray(0,nums)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "### Key Point\n", 50 | "#### When deciding whether to add a value, 
compare the previous running sum plus the value with \n", 51 | "#### the value alone. If the previous sum is negative it can only lower the total, so the subarray restarts at the current value." 52 | ] 53 | } 54 | ], 55 | "metadata": { 56 | "kernelspec": { 57 | "display_name": "Python 3", 58 | "language": "python", 59 | "name": "python3" 60 | }, 61 | "language_info": { 62 | "codemirror_mode": { 63 | "name": "ipython", 64 | "version": 3 65 | }, 66 | "file_extension": ".py", 67 | "mimetype": "text/x-python", 68 | "name": "python", 69 | "nbconvert_exporter": "python", 70 | "pygments_lexer": "ipython3", 71 | "version": "3.7.3" 72 | } 73 | }, 74 | "nbformat": 4, 75 | "nbformat_minor": 2 76 | } 77 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#7 Reverse Interger.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Given a 32-bit signed integer, reverse digits of an integer." 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 11, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "123" 19 | ] 20 | }, 21 | "execution_count": 11, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "x=321\n", 28 | "class Solution:\n", 29 | "    def reverse(self, x: int) -> int:\n", 30 | "        result = 0\n", 31 | "        neg_flag = 1\n", 32 | "        if x<0:  # reverse the absolute value and restore the sign at the end\n", 33 | "            neg_flag = -1\n", 34 | "            x = -x\n", 35 | "        while x>0:\n", 36 | "            result =result*10 + x%10\n", 37 | "            x=x//10\n", 38 | "        result *= neg_flag\n", 39 | "        return result if -2**31 <= result <= 2**31 - 1 else 0  # 0 when the result overflows 32 bits\n", 40 | "Solution().reverse(x)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "### " 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 22, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/plain": [ 58 | "False" 59 | ] 60 | }, 61 | "execution_count": 22, 62 | "metadata": {}, 63 | 
"output_type": "execute_result" 64 | } 65 | ], 66 | "source": [ 67 | "x = 1534236469\n", 68 | "x> (2**31) - 1" 69 | ] 70 | } 71 | ], 72 | "metadata": { 73 | "kernelspec": { 74 | "display_name": "Python 3", 75 | "language": "python", 76 | "name": "python3" 77 | }, 78 | "language_info": { 79 | "codemirror_mode": { 80 | "name": "ipython", 81 | "version": 3 82 | }, 83 | "file_extension": ".py", 84 | "mimetype": "text/x-python", 85 | "name": "python", 86 | "nbconvert_exporter": "python", 87 | "pygments_lexer": "ipython3", 88 | "version": "3.7.3" 89 | } 90 | }, 91 | "nbformat": 4, 92 | "nbformat_minor": 2 93 | } 94 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/#88_Merge Sorted Array.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as one sorted array." 
8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [] 16 | } 17 | ], 18 | "metadata": { 19 | "kernelspec": { 20 | "display_name": "Python 3", 21 | "language": "python", 22 | "name": "python3" 23 | }, 24 | "language_info": { 25 | "codemirror_mode": { 26 | "name": "ipython", 27 | "version": 3 28 | }, 29 | "file_extension": ".py", 30 | "mimetype": "text/x-python", 31 | "name": "python", 32 | "nbconvert_exporter": "python", 33 | "pygments_lexer": "ipython3", 34 | "version": "3.7.3" 35 | } 36 | }, 37 | "nbformat": 4, 38 | "nbformat_minor": 2 39 | } 40 | -------------------------------------------------------------------------------- /LeetCode_Algorithm and Data Structure/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Practice on Algorithm and Data Structure 3 | 4 | This is a record of my coding practice on algorithm and data structure questions from LeetCode and HackerRank. 5 | 6 | 7 | ## Getting Started 8 | 9 | ### Prerequisites 10 | 11 | For this practice, I mainly use Python's built-in functions, so no additional packages are required. 12 | 13 | ## Authors 14 | 15 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 16 | 17 | 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Daily Practice for Coding 2 | 3 | This is a record of all my coding practice, including Data Manipulation, Data Structure and Algorithm, and Data Visualization.
4 | 5 | This is a list of all the practice: 6 | * [Data Manipulation](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/Data%20Manipulation%20Practice) 7 | * [Data Structure and Algorithm](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/LeetCode_Algorithm%20and%20Data%20Structure) 8 | * [Data Visualization](https://github.com/lush9516/Daily-Practice-for-Coding/tree/master/Data%20Visualization) 9 | 10 | ## Data Manipulation 11 | This part contains the following: 12 | * Rewriting SQL queries in Python code (mainly with the `pandas` package). Here is the related [blog](https://medium.com/swlh/reproducing-sql-queries-in-python-codes-35d90f716b1a). 13 | * Data manipulation using the `pandas` and `numpy` packages. Here is the related [blog](https://medium.com/swlh/exhaustive-introduction-to-pandas-in-python-cdfd9d3846f2). 14 | * Other techniques, such as web scraping, wordcloud, etc. 15 | 16 | ## Data Visualization 17 | This part contains my practice on data visualization using the `matplotlib` and `seaborn` packages in Python. 18 | 19 | ## Data Structure and Algorithm 20 | This part contains my practice records on Data Structure and Algorithm using LeetCode questions. 21 | 22 | ## Author 23 | 24 | **Han(Shuhan) Lu** - *Initial work* - [LinkedIn page](https://www.linkedin.com/in/shuhan-lu/) - [Medium page](https://medium.com/@lushuhan95) 25 | 26 | 27 | --------------------------------------------------------------------------------