├── .gitignore ├── README.md ├── db ├── food.csv ├── new_and_specials.csv ├── orders.csv └── users.csv └── src.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | src-Copy1.ipynb -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Food-Recommendation-System-Python 2 | 3 | 4 | This is the recommendation engine that will be used in building the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether or not this is actually implemented in all the canteens of IIT Kanpur (given the potential for fraud & cyber-attacks), I will still complete the platform. 5 | 6 | Also, I will be open-sourcing the app so that any campus can implement a cash-less & integrated system of ordering food across the whole campus. After all, what good are IITs if our canteens still keep track of student accounts on paper registers! 7 | 8 | ## Build Instructions 9 | 10 | - ```git clone https://github.com/gsunit/Food-Recommendation-System-Pyhton.git``` 11 | - Run the Jupyter Notebook `src.ipynb` 12 | 13 | 14 | 15 | ## Demographic Filtering 16 | 17 | Suggesting the items that are well-received and popular among users in general. The most trending items and the items with the best ratings rise to the top and get shortlisted for recommendation. 18 | 19 | 20 | ```python 21 | import pandas as pd 22 | import numpy as np 23 | 24 | # Import the db of food items across all canteens registered on the platform 25 | df1 = pd.read_csv('./db/food.csv') 26 | df1.columns = ['food_id', 'title', 'canteen_id', 'price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags'] 27 | 28 | df1 29 | ``` 30 | 31 | 32 | 33 | 34 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy |
| 4 | 5 | Paneer Tikka | 1 | 60 | 50 | Punjabi | 4.6 | 30 | veg, healthy |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy |
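The shortlists in the next section come from an IMDB-style weighted rating (the same formula as `weighted_rating` in `src.ipynb`). A self-contained sketch on the six items above, with the data inlined so the snippet does not need the CSV:

```python
import pandas as pd

# The six items from db/food.csv, inlined for a self-contained example
df = pd.DataFrame({
    'title': ['Lala Maggi', 'Cheese Maggi', 'Masala Maggi',
              'Veg Maggi', 'Paneer Tikka', 'Chicken Tikka'],
    'num_orders': [35, 40, 10, 25, 50, 40],
    'avg_rating': [3.9, 3.8, 3.0, 2.5, 4.6, 4.2],
    'num_rating': [10, 15, 10, 5, 30, 28],
})

C = df['avg_rating'].mean()          # mean rating across all items
m = df['num_rating'].quantile(0.6)   # minimum votes to qualify: 60th percentile

# IMDB-style weighted rating: pulls low-vote items toward the global mean C
def weighted_rating(x, m=m, C=C):
    v, R = x['num_rating'], x['avg_rating']
    return v / (v + m) * R + m / (m + v) * C

# Only items with enough ratings qualify for the top-rated list
q_items = df[df['num_rating'] >= m].copy()
q_items['score'] = q_items.apply(weighted_rating, axis=1)
top_rated_items = q_items.sort_values('score', ascending=False)
pop_items = df.sort_values('num_orders', ascending=False)
print(top_rated_items[['title', 'score']].head(3))
```

Running this reproduces the scores shown in the results tables below (e.g. Paneer Tikka ≈ 4.288889).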
127 | 128 | 129 | ## Results of Demographic Filtering 130 | ```python 131 | top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head() 132 | pop_items[['title', 'num_orders']].head() 133 | ``` 134 | 135 | 136 | 137 | 138 |
| | title | num_rating | avg_rating | score |
| --- | --- | --- | --- | --- |
| 4 | Paneer Tikka | 30 | 4.6 | 4.288889 |
| 5 | Chicken Tikka | 28 | 4.2 | 4.013953 |
| 1 | Cheese Maggi | 15 | 3.8 | 3.733333 |
174 | 175 | 176 | 177 | 178 | 179 | 180 |
| | title | num_orders |
| --- | --- | --- |
| 4 | Paneer Tikka | 50 |
| 1 | Cheese Maggi | 40 |
| 5 | Chicken Tikka | 40 |
| 0 | Lala Maggi | 35 |
| 3 | Veg Maggi | 25 |
218 | 219 | 220 | 221 | ## Content Based Filtering 222 | 223 | A more personalised recommendation: we analyse the user's past orders and suggest back similar items. 224 | 225 | Also, since each person has a "home canteen", the user should be notified of any new items added to the menu by the vendor. 226 | 227 | We will use `CountVectorizer` from Scikit-Learn to find the similarity between items based on their title, category and tags. To bring all these properties of each item together, we create a "soup" for each item: a processed string formed from the constituent words of its tags, title and category. 228 | 229 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy | veg spicy lala maggi |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg | veg cheese maggi |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy | veg spicy masala maggi |
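The soup strings above are built by a small helper (`create_soup` in `src.ipynb`): split the tags, title and category into lowercase words, then de-duplicate while keeping first-seen order. A minimal standalone version:

```python
# Build the "soup" for one item: tag, title and category words, lowercased
# and de-duplicated while preserving first-seen order
def create_soup(item):
    words = item['tags'].lower().split(', ')
    words.extend(item['title'].lower().split())
    words.extend(item['category'].lower().split())
    return " ".join(sorted(set(words), key=words.index))

item = {'title': 'Lala Maggi', 'category': 'maggi', 'tags': 'veg, spicy'}
print(create_soup(item))  # veg spicy lala maggi
```

Note how the duplicate word "maggi" (from both title and category) appears only once in the soup, matching the table above.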
289 | 290 | 291 | ## Using CountVectorizer from Scikit-Learn 292 | 293 | ```python 294 | # Import CountVectorizer and create the count matrix 295 | from sklearn.feature_extraction.text import CountVectorizer 296 | count = CountVectorizer(stop_words='english') 297 | 298 | # df1['soup'] 299 | count_matrix = count.fit_transform(df1['soup']) 300 | 301 | # Compute the Cosine Similarity matrix based on the count_matrix 302 | from sklearn.metrics.pairwise import cosine_similarity 303 | cosine_sim = cosine_similarity(count_matrix, count_matrix) 304 | ``` 305 | 306 | 307 | ## Sample Recommendation 308 | 309 | ```python 310 | df1.loc[get_recommendations(title="Paneer Tikka")] 311 | ``` 312 | 313 | 314 | 315 | 316 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy | nonveg healthy spicy chicken tikka punjabi |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy | veg healthy maggi |
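The `get_recommendations` helper used above is defined in the notebook: it sorts one row of the cosine-similarity matrix and returns the indices of the two nearest dishes (excluding the dish itself). Because every word within a soup is unique, the CountVectorizer + `cosine_similarity` pipeline reduces here to a binary set cosine, so the same ranking can be sketched without any dependencies beyond the standard library:

```python
import math

# Soups for the six items, in DataFrame row order
soups = [
    'veg spicy lala maggi',                        # 0 Lala Maggi
    'veg cheese maggi',                            # 1 Cheese Maggi
    'veg spicy masala maggi',                      # 2 Masala Maggi
    'veg healthy maggi',                           # 3 Veg Maggi
    'veg healthy paneer tikka punjabi',            # 4 Paneer Tikka
    'nonveg healthy spicy chicken tikka punjabi',  # 5 Chicken Tikka
]

def cosine(a, b):
    # Binary bag-of-words cosine: |A ∩ B| / sqrt(|A| * |B|)
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / math.sqrt(len(sa) * len(sb))

def get_recommendations(idx, n=2):
    # Score every other dish against the one at `idx`, highest first
    scores = [(i, cosine(soups[idx], s)) for i, s in enumerate(soups) if i != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:n]]

print(get_recommendations(4))  # Paneer Tikka -> [5, 3]
```

For Paneer Tikka (row 4) this returns rows 5 and 3 — Chicken Tikka and Veg Maggi — matching the sample recommendation table above.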
363 | 364 | 365 | 366 | 367 | 368 | ### After all the hard work, we finally get the recommendations 369 | 370 | 371 | ```python 372 | personalised_recomms(orders, df1, current_user, columns) 373 | get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns) 374 | get_top_rated_items(top_rated_items, df1, columns) 375 | get_popular_items(pop_items, df1, columns).head(3) 376 | ``` 377 | 378 | 379 | 380 | 381 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Veg Maggi | 1 | 30 | based on your past orders |
| 1 | Paneer Tikka | 1 | 60 | based on your past orders |
| 2 | Chicken Tikka | 1 | 80 | based on your past orders |
417 | 418 | 419 | 420 | 421 | 422 | 423 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Cheese Maggi | 1 | 25 | new/today's special item in your home canteen |
445 | 446 | 447 | 448 | 449 | 450 | 451 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Paneer Tikka | 1 | 60 | top rated items across canteens |
| 1 | Chicken Tikka | 1 | 80 | top rated items across canteens |
| 2 | Cheese Maggi | 1 | 25 | top rated items across canteens |
488 | 489 | 490 | 491 | 492 | 493 | 494 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Paneer Tikka | 1 | 60 | most popular items across canteens |
| 1 | Cheese Maggi | 1 | 25 | most popular items across canteens |
| 2 | Chicken Tikka | 1 | 80 | most popular items across canteens |
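The personalised list above is seeded by the user's most recent orders: `personalised_recomms` in `src.ipynb` first collects a few of the user's orders, then runs each ordered dish through the similarity lookup. A sketch of that first step, using hypothetical in-memory rows in place of `db/orders.csv`:

```python
# Hypothetical in-memory order rows standing in for db/orders.csv
orders = [
    {'order_id': 1, 'user_id': 2, 'food_id': 5},
    {'order_id': 2, 'user_id': 3, 'food_id': 5},
    {'order_id': 3, 'user_id': 2, 'food_id': 6},
    {'order_id': 4, 'user_id': 1, 'food_id': 5},
    {'order_id': 5, 'user_id': 2, 'food_id': 5},
]

# First num_orders orders belonging to this user; their food_ids seed the
# content-based recommendations (mirrors get_latest_user_orders in src.ipynb)
def get_latest_user_orders(user_id, orders, num_orders=3):
    food_ids = [o['food_id'] for o in orders if o['user_id'] == user_id]
    return food_ids[:num_orders]

seed_food_ids = set(get_latest_user_orders(2, orders))
print(seed_food_ids)
```

Each seed `food_id` is then mapped to its row index and fed to `get_recommendations`, and the combined, de-duplicated results become the "based on your past orders" table.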
531 | 532 | 533 | 534 | These are just simple algorithms to make personalised & general recommendations to users. We can easily use collaborative filtering or incorporate neural networks to make our prediction even better. However, these are more computationally intensive methods. Kinda overkill, IMO! Let's build that app first, then move on to other features! 535 | 536 | #### Star the repository and send in your PRs if you think the engine needs any improvement or help me implement some more advanced features. 537 | -------------------------------------------------------------------------------- /db/food.csv: -------------------------------------------------------------------------------- 1 | food_id,title,canteen_id,price,num_orders,category,avg_rating,num_rating,tags 2 | 1,Lala Maggi,1,30,35,maggi,3.9,10,"veg, spicy" 3 | 2,Cheese Maggi,1,25,40,maggi,3.8,15,"veg" 4 | 3,Masala Maggi,1,25,10,maggi,3,10,"veg, spicy" 5 | 4,Veg Maggi,1,30,25,maggi,2.5,5,"veg, healthy" 6 | 5,Paneer Tikka,1,60,50,Punjabi,4.6,30,"veg, healthy" 7 | 6,Chicken Tikka,1,80,40,Punjabi,4.2,28,"nonveg, healthy, spicy" -------------------------------------------------------------------------------- /db/new_and_specials.csv: -------------------------------------------------------------------------------- 1 | specials_id,canteen_id,food_id,date,type 2 | 1,1,2,2019-6-28,new 3 | 2,2,5,2019-6-28,special -------------------------------------------------------------------------------- /db/orders.csv: -------------------------------------------------------------------------------- 1 | order_id,user_id,food_id,canteen_id,date_time,status,amount 2 | 1,2,5,1,2019-06-28 9:26:03,served,60 3 | 2,3,5,1,2019-06-29 9:26:03,served,60 4 | 3,2,6,1,2019-06-30 9:26:03,served,80 5 | 4,1,5,1,2019-07-01 9:26:03,served,60 6 | 5,2,5,1,2019-07-02 9:26:03,served,60 7 | 6,2,5,1,2019-07-03 9:26:03,served,60 -------------------------------------------------------------------------------- /db/users.csv: 
-------------------------------------------------------------------------------- 1 | user_id,name,email,roll_no,hall,home_canteen 2 | 1,test_1,test1@test.com,1,12,1 3 | 2,test_2,test2@test.com,2,12,1 4 | 3,test_3,test3@test.com,3,12,2 5 | 4,test_4,test4@test.com,4,12,1 6 | 5,test_5,test5@test.com,5,12,1 -------------------------------------------------------------------------------- /src.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from IPython.core.interactiveshell import InteractiveShell\n", 10 | "InteractiveShell.ast_node_interactivity = \"all\"" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "# Lunchbox App ML Engine\n", 18 | "\n", 19 | "This is the Recommendation Engine that will be used in building the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether or not this is actually implemented in all the canteens of IIT Kanpur, given the potential for frauds & cyber-attacks, I will complete the platform.\n", 20 | "\n", 21 | "Also, I would be open-sourcing the app so that any campus can implement a cash-less & integrated system of ordering food across their whole campus. After all, what good are IITs for if our canteens still keep track of student accounts on paper registers!" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## Demographic Filtering\n", 29 | "\n", 30 | "Suggesting the users items that were well-received and are popular among the users, in general. Most trending items and items with the best rating rise to the top and get shortlisted for recommendation." 
31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "data": { 40 | "text/html": [ 41 | "
\n", 42 | "\n", 55 | "\n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtags
01Lala Maggi13035maggi3.910veg, spicy
12Cheese Maggi12540maggi3.815veg
23Masala Maggi12510maggi3.010veg, spicy
34Veg Maggi13025maggi2.55veg, healthy
45Paneer Tikka16050Punjabi4.630veg, healthy
56Chicken Tikka18040Punjabi4.228nonveg, healthy, spicy
\n", 145 | "
" 146 | ], 147 | "text/plain": [ 148 | " food_id title canteen_id price num_orders category avg_rating \\\n", 149 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n", 150 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n", 151 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n", 152 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n", 153 | "4 5 Paneer Tikka 1 60 50 Punjabi 4.6 \n", 154 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n", 155 | "\n", 156 | " num_rating tags \n", 157 | "0 10 veg, spicy \n", 158 | "1 15 veg \n", 159 | "2 10 veg, spicy \n", 160 | "3 5 veg, healthy \n", 161 | "4 30 veg, healthy \n", 162 | "5 28 nonveg, healthy, spicy " 163 | ] 164 | }, 165 | "execution_count": 2, 166 | "metadata": {}, 167 | "output_type": "execute_result" 168 | } 169 | ], 170 | "source": [ 171 | "import pandas as pd \n", 172 | "import numpy as np\n", 173 | "\n", 174 | "# Importing db of food items across all canteens registered on the platform\n", 175 | "df1=pd.read_csv('./db/food.csv')\n", 176 | "df1.columns = ['food_id','title','canteen_id','price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags']\n", 177 | "\n", 178 | "df1" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 3, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "# mean of average ratings of all items\n", 188 | "C= df1['avg_rating'].mean()\n", 189 | "\n", 190 | "# the minimum number of votes required to appear in recommendation list, i.e, 60th percentile among 'num_rating'\n", 191 | "m= df1['num_rating'].quantile(0.6)\n", 192 | "\n", 193 | "# items that qualify the criteria of minimum num of votes\n", 194 | "q_items = df1.copy().loc[df1['num_rating'] >= m]\n", 195 | "\n", 196 | "# Calculation of weighted rating based on the IMDB formula\n", 197 | "def weighted_rating(x, m=m, C=C):\n", 198 | " v = x['num_rating']\n", 199 | " R = x['avg_rating']\n", 200 | " return (v/(v+m) * R) + (m/(m+v) * C)\n", 201 | "\n", 202 | "# Applying weighted_rating to qualified items\n", 203 | "q_items['score'] = 
q_items.apply(weighted_rating, axis=1)\n", 204 | "\n", 205 | "# Shortlisting the top rated items and popular items\n", 206 | "top_rated_items = q_items.sort_values('score', ascending=False)\n", 207 | "pop_items= df1.sort_values('num_orders', ascending=False)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 4, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | "
titlenum_ratingavg_ratingscore
4Paneer Tikka304.64.288889
5Chicken Tikka284.24.013953
1Cheese Maggi153.83.733333
\n", 266 | "
" 267 | ], 268 | "text/plain": [ 269 | " title num_rating avg_rating score\n", 270 | "4 Paneer Tikka 30 4.6 4.288889\n", 271 | "5 Chicken Tikka 28 4.2 4.013953\n", 272 | "1 Cheese Maggi 15 3.8 3.733333" 273 | ] 274 | }, 275 | "execution_count": 4, 276 | "metadata": {}, 277 | "output_type": "execute_result" 278 | }, 279 | { 280 | "data": { 281 | "text/html": [ 282 | "
\n", 283 | "\n", 296 | "\n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | "
titlenum_orders
4Paneer Tikka50
1Cheese Maggi40
5Chicken Tikka40
0Lala Maggi35
3Veg Maggi25
\n", 332 | "
" 333 | ], 334 | "text/plain": [ 335 | " title num_orders\n", 336 | "4 Paneer Tikka 50\n", 337 | "1 Cheese Maggi 40\n", 338 | "5 Chicken Tikka 40\n", 339 | "0 Lala Maggi 35\n", 340 | "3 Veg Maggi 25" 341 | ] 342 | }, 343 | "execution_count": 4, 344 | "metadata": {}, 345 | "output_type": "execute_result" 346 | } 347 | ], 348 | "source": [ 349 | "# Display results of demographic filtering\n", 350 | "top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head()\n", 351 | "pop_items[['title', 'num_orders']].head()" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "## Content Based Filtering\n", 359 | "\n", 360 | "A bit more personalised recommendation. We will be analysing the past orders of the user and suggesting back those items which are similar.\n", 361 | "\n", 362 | "Also, since each person has a \"home canteen\", the user should be notified any new items included in the menu by the vendor.\n", 363 | "\n", 364 | "We will be using Count Vectorizer from Scikit-Learn to find similarity between items based on their title, category and tags. To bring all these properties of each item together we create a \"soup\" of tags. \"Soup\" is a processed string correspnding to each item, formed using constituent words of tags, tile and category." 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 5, 370 | "metadata": {}, 371 | "outputs": [ 372 | { 373 | "data": { 374 | "text/html": [ 375 | "
\n", 376 | "\n", 389 | "\n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtagssoup
01Lala Maggi13035maggi3.910veg, spicyveg spicy lala maggi
12Cheese Maggi12540maggi3.815vegveg cheese maggi
23Masala Maggi12510maggi3.010veg, spicyveg spicy masala maggi
\n", 447 | "
" 448 | ], 449 | "text/plain": [ 450 | " food_id title canteen_id price num_orders category avg_rating \\\n", 451 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n", 452 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n", 453 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n", 454 | "\n", 455 | " num_rating tags soup \n", 456 | "0 10 veg, spicy veg spicy lala maggi \n", 457 | "1 15 veg veg cheese maggi \n", 458 | "2 10 veg, spicy veg spicy masala maggi " 459 | ] 460 | }, 461 | "execution_count": 5, 462 | "metadata": {}, 463 | "output_type": "execute_result" 464 | } 465 | ], 466 | "source": [ 467 | "# TODO: clean data\n", 468 | "\n", 469 | "# Creating soup string for each item\n", 470 | "def create_soup(x): \n", 471 | " tags = x['tags'].lower().split(', ')\n", 472 | " tags.extend(x['title'].lower().split())\n", 473 | " tags.extend(x['category'].lower().split())\n", 474 | " return \" \".join(sorted(set(tags), key=tags.index))\n", 475 | "\n", 476 | "df1['soup'] = df1.apply(create_soup, axis=1)\n", 477 | "df1.head(3)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 6, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "# Import CountVectorizer and create the count matrix\n", 487 | "from sklearn.feature_extraction.text import CountVectorizer\n", 488 | "count = CountVectorizer(stop_words='english')\n", 489 | "\n", 490 | "# df1['soup']\n", 491 | "count_matrix = count.fit_transform(df1['soup'])\n", 492 | "\n", 493 | "# Compute the Cosine Similarity matrix based on the count_matrix\n", 494 | "from sklearn.metrics.pairwise import cosine_similarity\n", 495 | "cosine_sim = cosine_similarity(count_matrix, count_matrix)\n", 496 | "\n", 497 | "indices_from_title = pd.Series(df1.index, index=df1['title'])\n", 498 | "indices_from_food_id = pd.Series(df1.index, index=df1['food_id'])" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 7, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "# Function that takes in food title or 
food id as input and outputs most similar dishes \n", 508 | "def get_recommendations(title=\"\", cosine_sim=cosine_sim, idx=-1):\n", 509 | " # Get the index of the item that matches the title\n", 510 | " if idx == -1 and title != \"\":\n", 511 | " idx = indices_from_title[title]\n", 512 | "\n", 513 | " # Get the pairwise similarity scores of all dishes with that dish\n", 514 | " sim_scores = list(enumerate(cosine_sim[idx]))\n", 515 | "\n", 516 | " # Sort the dishes based on the similarity scores\n", 517 | " sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)\n", 518 | " \n", 519 | " # Get the scores of the 2 most similar dishes\n", 520 | " sim_scores = sim_scores[1:3]\n", 521 | "\n", 522 | " # Get the food indices\n", 523 | " food_indices = [i[0] for i in sim_scores]\n", 524 | "\n", 525 | " # Return the indices of the 2 most similar dishes\n", 526 | " return food_indices" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": 8, 532 | "metadata": {}, 533 | "outputs": [ 534 | { 535 | "data": { 536 | "text/html": [ 537 |
\n", 538 | "\n", 551 | "\n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtagssoup
56Chicken Tikka18040Punjabi4.228nonveg, healthy, spicynonveg healthy spicy chicken tikka punjabi
34Veg Maggi13025maggi2.55veg, healthyveg healthy maggi
\n", 596 | "
" 597 | ], 598 | "text/plain": [ 599 | " food_id title canteen_id price num_orders category avg_rating \\\n", 600 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n", 601 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n", 602 | "\n", 603 | " num_rating tags \\\n", 604 | "5 28 nonveg, healthy, spicy \n", 605 | "3 5 veg, healthy \n", 606 | "\n", 607 | " soup \n", 608 | "5 nonveg healthy spicy chicken tikka punjabi \n", 609 | "3 veg healthy maggi " 610 | ] 611 | }, 612 | "execution_count": 8, 613 | "metadata": {}, 614 | "output_type": "execute_result" 615 | } 616 | ], 617 | "source": [ 618 | "df1.loc[get_recommendations(title=\"Paneer Tikka\")]" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": {}, 624 | "source": [ 625 | "We will now some functions, some of which are utility functions, others are actually the functions which will help get personalised recommendations for current user." 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": 9, 631 | "metadata": {}, 632 | "outputs": [], 633 | "source": [ 634 | "# fetch few past orders of a user, based on which personalized recommendations are to be made\n", 635 | "def get_latest_user_orders(user_id, orders, num_orders=3):\n", 636 | " counter = num_orders\n", 637 | " order_indices = []\n", 638 | " \n", 639 | " for index, row in orders[['user_id']].iterrows():\n", 640 | " if row.user_id == user_id:\n", 641 | " counter = counter -1\n", 642 | " order_indices.append(index)\n", 643 | " if counter == 0:\n", 644 | " break\n", 645 | " \n", 646 | " return order_indices\n", 647 | "\n", 648 | "# utility function that returns a DataFrame given the food_indices to be recommended\n", 649 | "def get_recomms_df(food_indices, df1, columns, comment):\n", 650 | " row = 0\n", 651 | " df = pd.DataFrame(columns=columns)\n", 652 | " \n", 653 | " for i in food_indices:\n", 654 | " df.loc[row] = df1[['title', 'canteen_id', 'price']].loc[i]\n", 655 | " df.loc[row].comment = comment\n", 656 | " row = row+1\n", 657 | " 
return df\n", 658 | "\n", 659 | "# return food_indices for accomplishing personalized recommendation using Count Vectorizer\n", 660 | "def personalised_recomms(orders, df1, user_id, columns, comment=\"based on your past orders\"):\n", 661 | " order_indices = get_latest_user_orders(user_id, orders)\n", 662 | " food_ids = []\n", 663 | " food_indices = []\n", 664 | " recomm_indices = []\n", 665 | " \n", 666 | " for i in order_indices:\n", 667 | " food_ids.append(orders.loc[i].food_id)\n", 668 | " for i in food_ids:\n", 669 | " food_indices.append(indices_from_food_id[i])\n", 670 | " for i in food_indices:\n", 671 | " recomm_indices.extend(get_recommendations(idx=i))\n", 672 | " \n", 673 | " return get_recomms_df(set(recomm_indices), df1, columns, comment)\n", 674 | "\n", 675 | "# Simply fetch new items added by vendor or today's special at home canteen\n", 676 | "def get_new_and_specials_recomms(new_and_specials, users, df1, canteen_id, columns, comment=\"new/today's special item in your home canteen\"):\n", 677 | " food_indices = []\n", 678 | " \n", 679 | " for index, row in new_and_specials[['canteen_id']].iterrows():\n", 680 | " if row.canteen_id == canteen_id:\n", 681 | " food_indices.append(indices_from_food_id[new_and_specials.loc[index].food_id])\n", 682 | " \n", 683 | " return get_recomms_df(set(food_indices), df1, columns, comment)\n", 684 | "\n", 685 | "# utility function to get the home canteen given a user id\n", 686 | "def get_user_home_canteen(users, user_id):\n", 687 | " for index, row in users[['user_id']].iterrows():\n", 688 | " if row.user_id == user_id:\n", 689 | " return users.loc[index].home_canteen\n", 690 | " return -1\n", 691 | "\n", 692 | "# fetch items from previously calculated top_rated_items list\n", 693 | "def get_top_rated_items(top_rated_items, df1, columns, comment=\"top rated items across canteens\"):\n", 694 | " food_indices = []\n", 695 | " \n", 696 | " for index, row in top_rated_items.iterrows():\n", 697 | " 
food_indices.append(indices_from_food_id[top_rated_items.loc[index].food_id])\n", 698 | " \n", 699 | " return get_recomms_df(food_indices, df1, columns, comment)\n", 700 | "\n", 701 | "# fetch items from previously calculated pop_items list\n", 702 | "def get_popular_items(pop_items, df1, columns, comment=\"most popular items across canteens\"):\n", 703 | " food_indices = []\n", 704 | " \n", 705 | " for index, row in pop_items.iterrows():\n", 706 | " food_indices.append(indices_from_food_id[pop_items.loc[index].food_id])\n", 707 | " \n", 708 | " return get_recomms_df(food_indices, df1, columns, comment)\n", 709 | " " 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": {}, 715 | "source": [ 716 | "### After all the hard work, we finally get the recommendations" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": 11, 722 | "metadata": {}, 723 | "outputs": [ 724 | { 725 | "data": { 726 | "text/html": [ 727 | "
\n", 728 | "\n", 741 | "\n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | "
titlecanteen_idpricecomment
0Veg Maggi130based on your past orders
1Paneer Tikka160based on your past orders
2Chicken Tikka180based on your past orders
\n", 775 | "
" 776 | ], 777 | "text/plain": [ 778 | " title canteen_id price comment\n", 779 | "0 Veg Maggi 1 30 based on your past orders\n", 780 | "1 Paneer Tikka 1 60 based on your past orders\n", 781 | "2 Chicken Tikka 1 80 based on your past orders" 782 | ] 783 | }, 784 | "execution_count": 11, 785 | "metadata": {}, 786 | "output_type": "execute_result" 787 | }, 788 | { 789 | "data": { 790 | "text/html": [ 791 | "
\n", 792 | "\n", 805 | "\n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | "
titlecanteen_idpricecomment
0Cheese Maggi125new/today's special item in your home canteen
\n", 825 | "
" 826 | ], 827 | "text/plain": [ 828 | " title canteen_id price \\\n", 829 | "0 Cheese Maggi 1 25 \n", 830 | "\n", 831 | " comment \n", 832 | "0 new/today's special item in your home canteen " 833 | ] 834 | }, 835 | "execution_count": 11, 836 | "metadata": {}, 837 | "output_type": "execute_result" 838 | }, 839 | { 840 | "data": { 841 | "text/html": [ 842 | "
\n", 843 | "\n", 856 | "\n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | "
titlecanteen_idpricecomment
0Paneer Tikka160top rated items across canteens
1Chicken Tikka180top rated items across canteens
2Cheese Maggi125top rated items across canteens
\n", 890 | "
" 891 | ], 892 | "text/plain": [ 893 | " title canteen_id price comment\n", 894 | "0 Paneer Tikka 1 60 top rated items across canteens\n", 895 | "1 Chicken Tikka 1 80 top rated items across canteens\n", 896 | "2 Cheese Maggi 1 25 top rated items across canteens" 897 | ] 898 | }, 899 | "execution_count": 11, 900 | "metadata": {}, 901 | "output_type": "execute_result" 902 | }, 903 | { 904 | "data": { 905 | "text/html": [ 906 | "
\n", 907 | "\n", 920 | "\n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | " \n", 928 | " \n", 929 | " \n", 930 | " \n", 931 | " \n", 932 | " \n", 933 | " \n", 934 | " \n", 935 | " \n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | "
titlecanteen_idpricecomment
0Paneer Tikka160most popular items across canteens
1Cheese Maggi125most popular items across canteens
2Chicken Tikka180most popular items across canteens
\n", 954 | "
" 955 | ], 956 | "text/plain": [ 957 | " title canteen_id price comment\n", 958 | "0 Paneer Tikka 1 60 most popular items across canteens\n", 959 | "1 Cheese Maggi 1 25 most popular items across canteens\n", 960 | "2 Chicken Tikka 1 80 most popular items across canteens" 961 | ] 962 | }, 963 | "execution_count": 11, 964 | "metadata": {}, 965 | "output_type": "execute_result" 966 | } 967 | ], 968 | "source": [ 969 | "orders = pd.read_csv('./db/orders.csv')\n", 970 | "new_and_specials = pd.read_csv('./db/new_and_specials.csv')\n", 971 | "users = pd.read_csv('./db/users.csv')\n", 972 | "\n", 973 | "columns = ['title', 'canteen_id', 'price', 'comment']\n", 974 | "current_user = 2\n", 975 | "current_canteen = get_user_home_canteen(users, current_user)\n", 976 | "\n", 977 | "\n", 978 | "personalised_recomms(orders, df1, current_user, columns)\n", 979 | "get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns)\n", 980 | "get_top_rated_items(top_rated_items, df1, columns)\n", 981 | "get_popular_items(pop_items, df1, columns).head(3)" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "metadata": {}, 987 | "source": [ 988 | "These are just simple algorithms to make personalised and even general recommendations to users. We can easily use collaborative filtering or incorporate neural networks to make our prediction even better. However, these are more computationally intensive methods. Kinda overkill, IMO! Let's build that app first! " 989 | ] 990 | }, 991 | { 992 | "cell_type": "markdown", 993 | "metadata": {}, 994 | "source": [ 995 | "#### Star the repository and send in your PRs if you think the engine needs any improvement or helping me implement some more advanced features." 
996 | ] 997 | } 998 | ], 999 | "metadata": { 1000 | "kernelspec": { 1001 | "display_name": "Python 3", 1002 | "language": "python", 1003 | "name": "python3" 1004 | }, 1005 | "language_info": { 1006 | "codemirror_mode": { 1007 | "name": "ipython", 1008 | "version": 3 1009 | }, 1010 | "file_extension": ".py", 1011 | "mimetype": "text/x-python", 1012 | "name": "python", 1013 | "nbconvert_exporter": "python", 1014 | "pygments_lexer": "ipython3", 1015 | "version": "3.6.8" 1016 | } 1017 | }, 1018 | "nbformat": 4, 1019 | "nbformat_minor": 2 1020 | } 1021 | --------------------------------------------------------------------------------