├── .gitignore ├── README.md ├── db ├── food.csv ├── new_and_specials.csv ├── orders.csv └── users.csv └── src.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | src-Copy1.ipynb -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Food-Recommendation-System-Python 2 | 3 | 4 | This is the recommendation engine that will be used in building the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether or not this is actually implemented in all the canteens of IIT Kanpur (given the potential for fraud & cyber-attacks), I will still complete the platform. 5 | 6 | Also, I will be open-sourcing the app so that any campus can implement a cash-less & integrated system of ordering food across the whole campus. After all, what good are IITs if our canteens still keep track of student accounts on paper registers! 7 | 8 | ## Build Instructions 9 | 10 | - ```git clone https://github.com/gsunit/Food-Recommendation-System-Pyhton.git``` 11 | - Run the Jupyter Notebook `src.ipynb` 12 | 13 | 14 | 15 | ## Demographic Filtering 16 | 17 | Suggesting the items that are well-received and popular among users in general. The most trending items and the items with the best ratings rise to the top and get shortlisted for recommendation. 18 | 19 | 20 | ```python 21 | import pandas as pd 22 | import numpy as np 23 | 24 | # Import the db of food items across all canteens registered on the platform 25 | df1 = pd.read_csv('./db/food.csv') 26 | df1.columns = ['food_id', 'title', 'canteen_id', 'price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags'] 27 | 28 | df1 29 | ``` 30 | 31 | 32 | 33 | 34 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy |
| 4 | 5 | Paneer Tikka | 1 | 60 | 50 | Punjabi | 4.6 | 30 | veg, healthy |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy |
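The shortlists in the next section come from an IMDB-style weighted rating (the same formula as `weighted_rating` in `src.ipynb`). A self-contained sketch on the six items above, with the data inlined so the snippet does not need the CSV:

```python
import pandas as pd

# The six items from db/food.csv, inlined for a self-contained example
df = pd.DataFrame({
    'title': ['Lala Maggi', 'Cheese Maggi', 'Masala Maggi',
              'Veg Maggi', 'Paneer Tikka', 'Chicken Tikka'],
    'num_orders': [35, 40, 10, 25, 50, 40],
    'avg_rating': [3.9, 3.8, 3.0, 2.5, 4.6, 4.2],
    'num_rating': [10, 15, 10, 5, 30, 28],
})

C = df['avg_rating'].mean()          # mean rating across all items
m = df['num_rating'].quantile(0.6)   # minimum votes to qualify: 60th percentile

# IMDB-style weighted rating: pulls low-vote items toward the global mean C
def weighted_rating(x, m=m, C=C):
    v, R = x['num_rating'], x['avg_rating']
    return v / (v + m) * R + m / (m + v) * C

# Only items with enough ratings qualify for the top-rated list
q_items = df[df['num_rating'] >= m].copy()
q_items['score'] = q_items.apply(weighted_rating, axis=1)
top_rated_items = q_items.sort_values('score', ascending=False)
pop_items = df.sort_values('num_orders', ascending=False)
print(top_rated_items[['title', 'score']].head(3))
```

Running this reproduces the scores shown in the results tables below (e.g. Paneer Tikka ≈ 4.288889).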
127 | 128 | 129 | ## Results of Demographic Filtering 130 | ```python 131 | top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head() 132 | pop_items[['title', 'num_orders']].head() 133 | ``` 134 | 135 | 136 | 137 | 138 |
| | title | num_rating | avg_rating | score |
| --- | --- | --- | --- | --- |
| 4 | Paneer Tikka | 30 | 4.6 | 4.288889 |
| 5 | Chicken Tikka | 28 | 4.2 | 4.013953 |
| 1 | Cheese Maggi | 15 | 3.8 | 3.733333 |
174 | 175 | 176 | 177 | 178 | 179 | 180 |
| | title | num_orders |
| --- | --- | --- |
| 4 | Paneer Tikka | 50 |
| 1 | Cheese Maggi | 40 |
| 5 | Chicken Tikka | 40 |
| 0 | Lala Maggi | 35 |
| 3 | Veg Maggi | 25 |
218 | 219 | 220 | 221 | ## Content Based Filtering 222 | 223 | A more personalised recommendation: we analyse the user's past orders and suggest back similar items. 224 | 225 | Also, since each person has a "home canteen", the user should be notified of any new items added to the menu by the vendor. 226 | 227 | We will use `CountVectorizer` from Scikit-Learn to find the similarity between items based on their title, category and tags. To bring all these properties of each item together, we create a "soup" for each item: a processed string formed from the constituent words of its tags, title and category. 228 | 229 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy | veg spicy lala maggi |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg | veg cheese maggi |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy | veg spicy masala maggi |
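The soup strings above are built by a small helper (`create_soup` in `src.ipynb`): split the tags, title and category into lowercase words, then de-duplicate while keeping first-seen order. A minimal standalone version:

```python
# Build the "soup" for one item: tag, title and category words, lowercased
# and de-duplicated while preserving first-seen order
def create_soup(item):
    words = item['tags'].lower().split(', ')
    words.extend(item['title'].lower().split())
    words.extend(item['category'].lower().split())
    return " ".join(sorted(set(words), key=words.index))

item = {'title': 'Lala Maggi', 'category': 'maggi', 'tags': 'veg, spicy'}
print(create_soup(item))  # veg spicy lala maggi
```

Note how the duplicate word "maggi" (from both title and category) appears only once in the soup, matching the table above.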
289 | 290 | 291 | ## Using CountVectorizer from Scikit-Learn 292 | 293 | ```python 294 | # Import CountVectorizer and create the count matrix 295 | from sklearn.feature_extraction.text import CountVectorizer 296 | count = CountVectorizer(stop_words='english') 297 | 298 | # df1['soup'] 299 | count_matrix = count.fit_transform(df1['soup']) 300 | 301 | # Compute the Cosine Similarity matrix based on the count_matrix 302 | from sklearn.metrics.pairwise import cosine_similarity 303 | cosine_sim = cosine_similarity(count_matrix, count_matrix) 304 | ``` 305 | 306 | 307 | ## Sample Recommendation 308 | 309 | ```python 310 | df1.loc[get_recommendations(title="Paneer Tikka")] 311 | ``` 312 | 313 | 314 | 315 | 316 |
| | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy | nonveg healthy spicy chicken tikka punjabi |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy | veg healthy maggi |
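The `get_recommendations` helper used above is defined in the notebook: it sorts one row of the cosine-similarity matrix and returns the indices of the two nearest dishes (excluding the dish itself). Because every word within a soup is unique, the CountVectorizer + `cosine_similarity` pipeline reduces here to a binary set cosine, so the same ranking can be sketched without any dependencies beyond the standard library:

```python
import math

# Soups for the six items, in DataFrame row order
soups = [
    'veg spicy lala maggi',                        # 0 Lala Maggi
    'veg cheese maggi',                            # 1 Cheese Maggi
    'veg spicy masala maggi',                      # 2 Masala Maggi
    'veg healthy maggi',                           # 3 Veg Maggi
    'veg healthy paneer tikka punjabi',            # 4 Paneer Tikka
    'nonveg healthy spicy chicken tikka punjabi',  # 5 Chicken Tikka
]

def cosine(a, b):
    # Binary bag-of-words cosine: |A ∩ B| / sqrt(|A| * |B|)
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / math.sqrt(len(sa) * len(sb))

def get_recommendations(idx, n=2):
    # Score every other dish against the one at `idx`, highest first
    scores = [(i, cosine(soups[idx], s)) for i, s in enumerate(soups) if i != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scores[:n]]

print(get_recommendations(4))  # Paneer Tikka -> [5, 3]
```

For Paneer Tikka (row 4) this returns rows 5 and 3 — Chicken Tikka and Veg Maggi — matching the sample recommendation table above.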
363 | 364 | 365 | 366 | 367 | 368 | ### After all the hard work, we finally get the recommendations 369 | 370 | 371 | ```python 372 | personalised_recomms(orders, df1, current_user, columns) 373 | get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns) 374 | get_top_rated_items(top_rated_items, df1, columns) 375 | get_popular_items(pop_items, df1, columns).head(3) 376 | ``` 377 | 378 | 379 | 380 | 381 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Veg Maggi | 1 | 30 | based on your past orders |
| 1 | Paneer Tikka | 1 | 60 | based on your past orders |
| 2 | Chicken Tikka | 1 | 80 | based on your past orders |
417 | 418 | 419 | 420 | 421 | 422 | 423 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Cheese Maggi | 1 | 25 | new/today's special item in your home canteen |
445 | 446 | 447 | 448 | 449 | 450 | 451 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Paneer Tikka | 1 | 60 | top rated items across canteens |
| 1 | Chicken Tikka | 1 | 80 | top rated items across canteens |
| 2 | Cheese Maggi | 1 | 25 | top rated items across canteens |
488 | 489 | 490 | 491 | 492 | 493 | 494 |
| | title | canteen_id | price | comment |
| --- | --- | --- | --- | --- |
| 0 | Paneer Tikka | 1 | 60 | most popular items across canteens |
| 1 | Cheese Maggi | 1 | 25 | most popular items across canteens |
| 2 | Chicken Tikka | 1 | 80 | most popular items across canteens |
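The personalised list above is seeded by the user's most recent orders: `personalised_recomms` in `src.ipynb` first collects a few of the user's orders, then runs each ordered dish through the similarity lookup. A sketch of that first step, using hypothetical in-memory rows in place of `db/orders.csv`:

```python
# Hypothetical in-memory order rows standing in for db/orders.csv
orders = [
    {'order_id': 1, 'user_id': 2, 'food_id': 5},
    {'order_id': 2, 'user_id': 3, 'food_id': 5},
    {'order_id': 3, 'user_id': 2, 'food_id': 6},
    {'order_id': 4, 'user_id': 1, 'food_id': 5},
    {'order_id': 5, 'user_id': 2, 'food_id': 5},
]

# First num_orders orders belonging to this user; their food_ids seed the
# content-based recommendations (mirrors get_latest_user_orders in src.ipynb)
def get_latest_user_orders(user_id, orders, num_orders=3):
    food_ids = [o['food_id'] for o in orders if o['user_id'] == user_id]
    return food_ids[:num_orders]

seed_food_ids = set(get_latest_user_orders(2, orders))
print(seed_food_ids)
```

Each seed `food_id` is then mapped to its row index and fed to `get_recommendations`, and the combined, de-duplicated results become the "based on your past orders" table.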
531 | 532 | 533 | 534 | These are just simple algorithms to make personalised & general recommendations to users. We can easily use collaborative filtering or incorporate neural networks to make our prediction even better. However, these are more computationally intensive methods. Kinda overkill, IMO! Let's build that app first, then move on to other features! 535 | 536 | #### Star the repository and send in your PRs if you think the engine needs any improvement or help me implement some more advanced features. 537 | -------------------------------------------------------------------------------- /db/food.csv: -------------------------------------------------------------------------------- 1 | food_id,title,canteen_id,price,num_orders,category,avg_rating,num_rating,tags 2 | 1,Lala Maggi,1,30,35,maggi,3.9,10,"veg, spicy" 3 | 2,Cheese Maggi,1,25,40,maggi,3.8,15,"veg" 4 | 3,Masala Maggi,1,25,10,maggi,3,10,"veg, spicy" 5 | 4,Veg Maggi,1,30,25,maggi,2.5,5,"veg, healthy" 6 | 5,Paneer Tikka,1,60,50,Punjabi,4.6,30,"veg, healthy" 7 | 6,Chicken Tikka,1,80,40,Punjabi,4.2,28,"nonveg, healthy, spicy" -------------------------------------------------------------------------------- /db/new_and_specials.csv: -------------------------------------------------------------------------------- 1 | specials_id,canteen_id,food_id,date,type 2 | 1,1,2,2019-6-28,new 3 | 2,2,5,2019-6-28,special -------------------------------------------------------------------------------- /db/orders.csv: -------------------------------------------------------------------------------- 1 | order_id,user_id,food_id,canteen_id,date_time,status,amount 2 | 1,2,5,1,2019-06-28 9:26:03,served,60 3 | 2,3,5,1,2019-06-29 9:26:03,served,60 4 | 3,2,6,1,2019-06-30 9:26:03,served,80 5 | 4,1,5,1,2019-07-01 9:26:03,served,60 6 | 5,2,5,1,2019-07-02 9:26:03,served,60 7 | 6,2,5,1,2019-07-03 9:26:03,served,60 -------------------------------------------------------------------------------- /db/users.csv: 
-------------------------------------------------------------------------------- 1 | user_id,name,email,roll_no,hall,home_canteen 2 | 1,test_1,test1@test.com,1,12,1 3 | 2,test_2,test2@test.com,2,12,1 4 | 3,test_3,test3@test.com,3,12,2 5 | 4,test_4,test4@test.com,4,12,1 6 | 5,test_5,test5@test.com,5,12,1 -------------------------------------------------------------------------------- /src.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from IPython.core.interactiveshell import InteractiveShell\n", 10 | "InteractiveShell.ast_node_interactivity = \"all\"" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "# Lunchbox App ML Engine\n", 18 | "\n", 19 | "This is the Recommendation Engine that will be used in building the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether or not this is actually implemented in all the canteens of IIT Kanpur, given the potential for frauds & cyber-attacks, I will complete the platform.\n", 20 | "\n", 21 | "Also, I would be open-sourcing the app so that any campus can implement a cash-less & integrated system of ordering food across their whole campus. After all, what good are IITs for if our canteens still keep track of student accounts on paper registers!" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## Demographic Filtering\n", 29 | "\n", 30 | "Suggesting the users items that were well-received and are popular among the users, in general. Most trending items and items with the best rating rise to the top and get shortlisted for recommendation." 
31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "data": { 40 | "text/html": [ 41 | "
\n", 42 | "\n", 55 | "\n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtags
01Lala Maggi13035maggi3.910veg, spicy
12Cheese Maggi12540maggi3.815veg
23Masala Maggi12510maggi3.010veg, spicy
34Veg Maggi13025maggi2.55veg, healthy
45Paneer Tikka16050Punjabi4.630veg, healthy
56Chicken Tikka18040Punjabi4.228nonveg, healthy, spicy
\n", 145 | "
" 146 | ], 147 | "text/plain": [ 148 | " food_id title canteen_id price num_orders category avg_rating \\\n", 149 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n", 150 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n", 151 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n", 152 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n", 153 | "4 5 Paneer Tikka 1 60 50 Punjabi 4.6 \n", 154 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n", 155 | "\n", 156 | " num_rating tags \n", 157 | "0 10 veg, spicy \n", 158 | "1 15 veg \n", 159 | "2 10 veg, spicy \n", 160 | "3 5 veg, healthy \n", 161 | "4 30 veg, healthy \n", 162 | "5 28 nonveg, healthy, spicy " 163 | ] 164 | }, 165 | "execution_count": 2, 166 | "metadata": {}, 167 | "output_type": "execute_result" 168 | } 169 | ], 170 | "source": [ 171 | "import pandas as pd \n", 172 | "import numpy as np\n", 173 | "\n", 174 | "# Importing db of food items across all canteens registered on the platform\n", 175 | "df1=pd.read_csv('./db/food.csv')\n", 176 | "df1.columns = ['food_id','title','canteen_id','price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags']\n", 177 | "\n", 178 | "df1" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 3, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "# mean of average ratings of all items\n", 188 | "C= df1['avg_rating'].mean()\n", 189 | "\n", 190 | "# the minimum number of votes required to appear in recommendation list, i.e, 60th percentile among 'num_rating'\n", 191 | "m= df1['num_rating'].quantile(0.6)\n", 192 | "\n", 193 | "# items that qualify the criteria of minimum num of votes\n", 194 | "q_items = df1.copy().loc[df1['num_rating'] >= m]\n", 195 | "\n", 196 | "# Calculation of weighted rating based on the IMDB formula\n", 197 | "def weighted_rating(x, m=m, C=C):\n", 198 | " v = x['num_rating']\n", 199 | " R = x['avg_rating']\n", 200 | " return (v/(v+m) * R) + (m/(m+v) * C)\n", 201 | "\n", 202 | "# Applying weighted_rating to qualified items\n", 203 | "q_items['score'] = 
q_items.apply(weighted_rating, axis=1)\n", 204 | "\n", 205 | "# Shortlisting the top rated items and popular items\n", 206 | "top_rated_items = q_items.sort_values('score', ascending=False)\n", 207 | "pop_items= df1.sort_values('num_orders', ascending=False)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 4, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "data": { 217 | "text/html": [ 218 | "
\n", 219 | "\n", 232 | "\n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | "
titlenum_ratingavg_ratingscore
4Paneer Tikka304.64.288889
5Chicken Tikka284.24.013953
1Cheese Maggi153.83.733333
\n", 266 | "
" 267 | ], 268 | "text/plain": [ 269 | " title num_rating avg_rating score\n", 270 | "4 Paneer Tikka 30 4.6 4.288889\n", 271 | "5 Chicken Tikka 28 4.2 4.013953\n", 272 | "1 Cheese Maggi 15 3.8 3.733333" 273 | ] 274 | }, 275 | "execution_count": 4, 276 | "metadata": {}, 277 | "output_type": "execute_result" 278 | }, 279 | { 280 | "data": { 281 | "text/html": [ 282 | "
\n", 283 | "\n", 296 | "\n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | "
titlenum_orders
4Paneer Tikka50
1Cheese Maggi40
5Chicken Tikka40
0Lala Maggi35
3Veg Maggi25
\n", 332 | "
" 333 | ], 334 | "text/plain": [ 335 | " title num_orders\n", 336 | "4 Paneer Tikka 50\n", 337 | "1 Cheese Maggi 40\n", 338 | "5 Chicken Tikka 40\n", 339 | "0 Lala Maggi 35\n", 340 | "3 Veg Maggi 25" 341 | ] 342 | }, 343 | "execution_count": 4, 344 | "metadata": {}, 345 | "output_type": "execute_result" 346 | } 347 | ], 348 | "source": [ 349 | "# Display results of demographic filtering\n", 350 | "top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head()\n", 351 | "pop_items[['title', 'num_orders']].head()" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "## Content Based Filtering\n", 359 | "\n", 360 | "A bit more personalised recommendation. We will be analysing the past orders of the user and suggesting back those items which are similar.\n", 361 | "\n", 362 | "Also, since each person has a \"home canteen\", the user should be notified any new items included in the menu by the vendor.\n", 363 | "\n", 364 | "We will be using Count Vectorizer from Scikit-Learn to find similarity between items based on their title, category and tags. To bring all these properties of each item together we create a \"soup\" of tags. \"Soup\" is a processed string correspnding to each item, formed using constituent words of tags, tile and category." 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 5, 370 | "metadata": {}, 371 | "outputs": [ 372 | { 373 | "data": { 374 | "text/html": [ 375 | "
\n", 376 | "\n", 389 | "\n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtagssoup
01Lala Maggi13035maggi3.910veg, spicyveg spicy lala maggi
12Cheese Maggi12540maggi3.815vegveg cheese maggi
23Masala Maggi12510maggi3.010veg, spicyveg spicy masala maggi
\n", 447 | "
" 448 | ], 449 | "text/plain": [ 450 | " food_id title canteen_id price num_orders category avg_rating \\\n", 451 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n", 452 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n", 453 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n", 454 | "\n", 455 | " num_rating tags soup \n", 456 | "0 10 veg, spicy veg spicy lala maggi \n", 457 | "1 15 veg veg cheese maggi \n", 458 | "2 10 veg, spicy veg spicy masala maggi " 459 | ] 460 | }, 461 | "execution_count": 5, 462 | "metadata": {}, 463 | "output_type": "execute_result" 464 | } 465 | ], 466 | "source": [ 467 | "# TODO: clean data\n", 468 | "\n", 469 | "# Creating soup string for each item\n", 470 | "def create_soup(x): \n", 471 | " tags = x['tags'].lower().split(', ')\n", 472 | " tags.extend(x['title'].lower().split())\n", 473 | " tags.extend(x['category'].lower().split())\n", 474 | " return \" \".join(sorted(set(tags), key=tags.index))\n", 475 | "\n", 476 | "df1['soup'] = df1.apply(create_soup, axis=1)\n", 477 | "df1.head(3)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": 6, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [ 486 | "# Import CountVectorizer and create the count matrix\n", 487 | "from sklearn.feature_extraction.text import CountVectorizer\n", 488 | "count = CountVectorizer(stop_words='english')\n", 489 | "\n", 490 | "# df1['soup']\n", 491 | "count_matrix = count.fit_transform(df1['soup'])\n", 492 | "\n", 493 | "# Compute the Cosine Similarity matrix based on the count_matrix\n", 494 | "from sklearn.metrics.pairwise import cosine_similarity\n", 495 | "cosine_sim = cosine_similarity(count_matrix, count_matrix)\n", 496 | "\n", 497 | "indices_from_title = pd.Series(df1.index, index=df1['title'])\n", 498 | "indices_from_food_id = pd.Series(df1.index, index=df1['food_id'])" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 7, 504 | "metadata": {}, 505 | "outputs": [], 506 | "source": [ 507 | "# Function that takes in food title or 
food id as input and outputs most similar dishes \n", 508 | "def get_recommendations(title=\"\", cosine_sim=cosine_sim, idx=-1):\n", 509 | " # Get the index of the item that matches the title\n", 510 | " if idx == -1 and title != \"\":\n", 511 | " idx = indices_from_title[title]\n", 512 | "\n", 513 | " # Get the pairwise similarity scores of all dishes with that dish\n", 514 | " sim_scores = list(enumerate(cosine_sim[idx]))\n", 515 | "\n", 516 | " # Sort the dishes based on the similarity scores\n", 517 | " sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)\n", 518 | " \n", 519 | " # Get the scores of the 2 most similar dishes\n", 520 | " sim_scores = sim_scores[1:3]\n", 521 | "\n", 522 | " # Get the food indices\n", 523 | " food_indices = [i[0] for i in sim_scores]\n", 524 | "\n", 525 | " # Return the indices of the 2 most similar dishes\n", 526 | " return food_indices" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": 8, 532 | "metadata": {}, 533 | "outputs": [ 534 | { 535 | "data": { 536 | "text/html": [ 537 |
\n", 538 | "\n", 551 | "\n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | "
food_idtitlecanteen_idpricenum_orderscategoryavg_ratingnum_ratingtagssoup
56Chicken Tikka18040Punjabi4.228nonveg, healthy, spicynonveg healthy spicy chicken tikka punjabi
34Veg Maggi13025maggi2.55veg, healthyveg healthy maggi
\n", 596 | "
" 597 | ], 598 | "text/plain": [ 599 | " food_id title canteen_id price num_orders category avg_rating \\\n", 600 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n", 601 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n", 602 | "\n", 603 | " num_rating tags \\\n", 604 | "5 28 nonveg, healthy, spicy \n", 605 | "3 5 veg, healthy \n", 606 | "\n", 607 | " soup \n", 608 | "5 nonveg healthy spicy chicken tikka punjabi \n", 609 | "3 veg healthy maggi " 610 | ] 611 | }, 612 | "execution_count": 8, 613 | "metadata": {}, 614 | "output_type": "execute_result" 615 | } 616 | ], 617 | "source": [ 618 | "df1.loc[get_recommendations(title=\"Paneer Tikka\")]" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "metadata": {}, 624 | "source": [ 625 | "We will now some functions, some of which are utility functions, others are actually the functions which will help get personalised recommendations for current user." 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": 9, 631 | "metadata": {}, 632 | "outputs": [], 633 | "source": [ 634 | "# fetch few past orders of a user, based on which personalized recommendations are to be made\n", 635 | "def get_latest_user_orders(user_id, orders, num_orders=3):\n", 636 | " counter = num_orders\n", 637 | " order_indices = []\n", 638 | " \n", 639 | " for index, row in orders[['user_id']].iterrows():\n", 640 | " if row.user_id == user_id:\n", 641 | " counter = counter -1\n", 642 | " order_indices.append(index)\n", 643 | " if counter == 0:\n", 644 | " break\n", 645 | " \n", 646 | " return order_indices\n", 647 | "\n", 648 | "# utility function that returns a DataFrame given the food_indices to be recommended\n", 649 | "def get_recomms_df(food_indices, df1, columns, comment):\n", 650 | " row = 0\n", 651 | " df = pd.DataFrame(columns=columns)\n", 652 | " \n", 653 | " for i in food_indices:\n", 654 | " df.loc[row] = df1[['title', 'canteen_id', 'price']].loc[i]\n", 655 | " df.loc[row].comment = comment\n", 656 | " row = row+1\n", 657 | " 
return df\n", 658 | "\n", 659 | "# return food_indices for accomplishing personalized recommendation using Count Vectorizer\n", 660 | "def personalised_recomms(orders, df1, user_id, columns, comment=\"based on your past orders\"):\n", 661 | " order_indices = get_latest_user_orders(user_id, orders)\n", 662 | " food_ids = []\n", 663 | " food_indices = []\n", 664 | " recomm_indices = []\n", 665 | " \n", 666 | " for i in order_indices:\n", 667 | " food_ids.append(orders.loc[i].food_id)\n", 668 | " for i in food_ids:\n", 669 | " food_indices.append(indices_from_food_id[i])\n", 670 | " for i in food_indices:\n", 671 | " recomm_indices.extend(get_recommendations(idx=i))\n", 672 | " \n", 673 | " return get_recomms_df(set(recomm_indices), df1, columns, comment)\n", 674 | "\n", 675 | "# Simply fetch new items added by vendor or today's special at home canteen\n", 676 | "def get_new_and_specials_recomms(new_and_specials, users, df1, canteen_id, columns, comment=\"new/today's special item in your home canteen\"):\n", 677 | " food_indices = []\n", 678 | " \n", 679 | " for index, row in new_and_specials[['canteen_id']].iterrows():\n", 680 | " if row.canteen_id == canteen_id:\n", 681 | " food_indices.append(indices_from_food_id[new_and_specials.loc[index].food_id])\n", 682 | " \n", 683 | " return get_recomms_df(set(food_indices), df1, columns, comment)\n", 684 | "\n", 685 | "# utility function to get the home canteen given a user id\n", 686 | "def get_user_home_canteen(users, user_id):\n", 687 | " for index, row in users[['user_id']].iterrows():\n", 688 | " if row.user_id == user_id:\n", 689 | " return users.loc[index].home_canteen\n", 690 | " return -1\n", 691 | "\n", 692 | "# fetch items from previously calculated top_rated_items list\n", 693 | "def get_top_rated_items(top_rated_items, df1, columns, comment=\"top rated items across canteens\"):\n", 694 | " food_indices = []\n", 695 | " \n", 696 | " for index, row in top_rated_items.iterrows():\n", 697 | " 
food_indices.append(indices_from_food_id[top_rated_items.loc[index].food_id])\n", 698 | " \n", 699 | " return get_recomms_df(food_indices, df1, columns, comment)\n", 700 | "\n", 701 | "# fetch items from previously calculated pop_items list\n", 702 | "def get_popular_items(pop_items, df1, columns, comment=\"most popular items across canteens\"):\n", 703 | " food_indices = []\n", 704 | " \n", 705 | " for index, row in pop_items.iterrows():\n", 706 | " food_indices.append(indices_from_food_id[pop_items.loc[index].food_id])\n", 707 | " \n", 708 | " return get_recomms_df(food_indices, df1, columns, comment)\n", 709 | " " 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": {}, 715 | "source": [ 716 | "### After all the hard work, we finally get the recommendations" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": 11, 722 | "metadata": {}, 723 | "outputs": [ 724 | { 725 | "data": { 726 | "text/html": [ 727 | "
\n", 728 | "\n", 741 | "\n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | "
titlecanteen_idpricecomment
0Veg Maggi130based on your past orders
1Paneer Tikka160based on your past orders
2Chicken Tikka180based on your past orders
\n", 775 | "
" 776 | ], 777 | "text/plain": [ 778 | " title canteen_id price comment\n", 779 | "0 Veg Maggi 1 30 based on your past orders\n", 780 | "1 Paneer Tikka 1 60 based on your past orders\n", 781 | "2 Chicken Tikka 1 80 based on your past orders" 782 | ] 783 | }, 784 | "execution_count": 11, 785 | "metadata": {}, 786 | "output_type": "execute_result" 787 | }, 788 | { 789 | "data": { 790 | "text/html": [ 791 | "
\n", 792 | "\n", 805 | "\n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | "
titlecanteen_idpricecomment
0Cheese Maggi125new/today's special item in your home canteen
\n", 825 | "
" 826 | ], 827 | "text/plain": [ 828 | " title canteen_id price \\\n", 829 | "0 Cheese Maggi 1 25 \n", 830 | "\n", 831 | " comment \n", 832 | "0 new/today's special item in your home canteen " 833 | ] 834 | }, 835 | "execution_count": 11, 836 | "metadata": {}, 837 | "output_type": "execute_result" 838 | }, 839 | { 840 | "data": { 841 | "text/html": [ 842 | "
\n", 843 | "\n", 856 | "\n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | "
titlecanteen_idpricecomment
0Paneer Tikka160top rated items across canteens
1Chicken Tikka180top rated items across canteens
2Cheese Maggi125top rated items across canteens
\n", 890 | "
" 891 | ], 892 | "text/plain": [ 893 | " title canteen_id price comment\n", 894 | "0 Paneer Tikka 1 60 top rated items across canteens\n", 895 | "1 Chicken Tikka 1 80 top rated items across canteens\n", 896 | "2 Cheese Maggi 1 25 top rated items across canteens" 897 | ] 898 | }, 899 | "execution_count": 11, 900 | "metadata": {}, 901 | "output_type": "execute_result" 902 | }, 903 | { 904 | "data": { 905 | "text/html": [ 906 | "
\n", 907 | "\n", 920 | "\n", 921 | " \n", 922 | " \n", 923 | " \n", 924 | " \n", 925 | " \n", 926 | " \n", 927 | " \n", 928 | " \n", 929 | " \n", 930 | " \n", 931 | " \n", 932 | " \n", 933 | " \n", 934 | " \n", 935 | " \n", 936 | " \n", 937 | " \n", 938 | " \n", 939 | " \n", 940 | " \n", 941 | " \n", 942 | " \n", 943 | " \n", 944 | " \n", 945 | " \n", 946 | " \n", 947 | " \n", 948 | " \n", 949 | " \n", 950 | " \n", 951 | " \n", 952 | " \n", 953 | "
titlecanteen_idpricecomment
0Paneer Tikka160most popular items across canteens
1Cheese Maggi125most popular items across canteens
2Chicken Tikka180most popular items across canteens
\n", 954 | "
" 955 | ], 956 | "text/plain": [ 957 | " title canteen_id price comment\n", 958 | "0 Paneer Tikka 1 60 most popular items across canteens\n", 959 | "1 Cheese Maggi 1 25 most popular items across canteens\n", 960 | "2 Chicken Tikka 1 80 most popular items across canteens" 961 | ] 962 | }, 963 | "execution_count": 11, 964 | "metadata": {}, 965 | "output_type": "execute_result" 966 | } 967 | ], 968 | "source": [ 969 | "orders = pd.read_csv('./db/orders.csv')\n", 970 | "new_and_specials = pd.read_csv('./db/new_and_specials.csv')\n", 971 | "users = pd.read_csv('./db/users.csv')\n", 972 | "\n", 973 | "columns = ['title', 'canteen_id', 'price', 'comment']\n", 974 | "current_user = 2\n", 975 | "current_canteen = get_user_home_canteen(users, current_user)\n", 976 | "\n", 977 | "\n", 978 | "personalised_recomms(orders, df1, current_user, columns)\n", 979 | "get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns)\n", 980 | "get_top_rated_items(top_rated_items, df1, columns)\n", 981 | "get_popular_items(pop_items, df1, columns).head(3)" 982 | ] 983 | }, 984 | { 985 | "cell_type": "markdown", 986 | "metadata": {}, 987 | "source": [ 988 | "These are just simple algorithms to make personalised and even general recommendations to users. We can easily use collaborative filtering or incorporate neural networks to make our prediction even better. However, these are more computationally intensive methods. Kinda overkill, IMO! Let's build that app first! " 989 | ] 990 | }, 991 | { 992 | "cell_type": "markdown", 993 | "metadata": {}, 994 | "source": [ 995 | "#### Star the repository and send in your PRs if you think the engine needs any improvement or helping me implement some more advanced features." 
996 | ] 997 | } 998 | ], 999 | "metadata": { 1000 | "kernelspec": { 1001 | "display_name": "Python 3", 1002 | "language": "python", 1003 | "name": "python3" 1004 | }, 1005 | "language_info": { 1006 | "codemirror_mode": { 1007 | "name": "ipython", 1008 | "version": 3 1009 | }, 1010 | "file_extension": ".py", 1011 | "mimetype": "text/x-python", 1012 | "name": "python", 1013 | "nbconvert_exporter": "python", 1014 | "pygments_lexer": "ipython3", 1015 | "version": "3.6.8" 1016 | } 1017 | }, 1018 | "nbformat": 4, 1019 | "nbformat_minor": 2 1020 | } 1021 | --------------------------------------------------------------------------------