├── .gitignore
├── README.md
├── db
│   ├── food.csv
│   ├── new_and_specials.csv
│   ├── orders.csv
│   └── users.csv
└── src.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints
2 | src-Copy1.ipynb
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Food-Recommendation-System-Python
2 |
3 |
This is the recommendation engine that will power the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether it is actually deployed in all the canteens of IIT Kanpur (given the potential for fraud & cyber-attacks), I will still complete the platform.
5 |
Also, I will be open-sourcing the app so that any campus can implement a cash-less, integrated system for ordering food across the whole campus. After all, what good are IITs if our canteens still keep track of student accounts in paper registers!
7 |
8 | ## Build instructions
9 |
- `git clone https://github.com/gsunit/Food-Recommendation-System-Pyhton.git`
11 | - Run the Jupyter Notebook `src.ipynb`
12 |
13 |
14 |
15 | ## Demographic Filtering
16 |
Suggesting items that are well-received and popular among users in general. The most trending items and the items with the best ratings rise to the top and get shortlisted for recommendation.
18 |
19 |
```python
import pandas as pd
import numpy as np

# Importing db of food items across all canteens registered on the platform
df1 = pd.read_csv('./db/food.csv')
df1.columns = ['food_id', 'title', 'canteen_id', 'price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags']

df1
```

|   | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags |
|---|---------|-------|------------|-------|------------|----------|------------|------------|------|
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy |
| 4 | 5 | Paneer Tikka | 1 | 60 | 50 | Punjabi | 4.6 | 30 | veg, healthy |
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy |

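The shortlists used below (`top_rated_items`, `pop_items`) come from the IMDB weighted-rating formula, as computed in `src.ipynb`. A self-contained sketch, with the relevant columns of `db/food.csv` inlined for illustration:

```python
import pandas as pd

# Inlined copy of the ratings data from db/food.csv
df1 = pd.DataFrame({
    'title': ['Lala Maggi', 'Cheese Maggi', 'Masala Maggi',
              'Veg Maggi', 'Paneer Tikka', 'Chicken Tikka'],
    'num_orders': [35, 40, 10, 25, 50, 40],
    'avg_rating': [3.9, 3.8, 3.0, 2.5, 4.6, 4.2],
    'num_rating': [10, 15, 10, 5, 30, 28],
})

# C: mean rating across all items; m: minimum votes to qualify (60th percentile)
C = df1['avg_rating'].mean()
m = df1['num_rating'].quantile(0.6)

# Only items with enough ratings are considered for the top-rated list
q_items = df1.loc[df1['num_rating'] >= m].copy()

# IMDB weighted rating: items with few votes are pulled toward the global mean C
def weighted_rating(x, m=m, C=C):
    v, R = x['num_rating'], x['avg_rating']
    return v / (v + m) * R + m / (m + v) * C

q_items['score'] = q_items.apply(weighted_rating, axis=1)
top_rated_items = q_items.sort_values('score', ascending=False)
pop_items = df1.sort_values('num_orders', ascending=False)
```

With this data, `m` works out to 15 votes, so only three items qualify for the top-rated list; popularity is simply a sort by `num_orders`.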
129 | ## Results of demographic filtering
```python
top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head()
pop_items[['title', 'num_orders']].head()
```

|   | title | num_rating | avg_rating | score |
|---|-------|------------|------------|-------|
| 4 | Paneer Tikka | 30 | 4.6 | 4.288889 |
| 5 | Chicken Tikka | 28 | 4.2 | 4.013953 |
| 1 | Cheese Maggi | 15 | 3.8 | 3.733333 |

|   | title | num_orders |
|---|-------|------------|
| 4 | Paneer Tikka | 50 |
| 1 | Cheese Maggi | 40 |
| 5 | Chicken Tikka | 40 |
| 0 | Lala Maggi | 35 |
| 3 | Veg Maggi | 25 |

## Content Based Filtering

A bit more personalised recommendation: we analyse the past orders of the user and suggest similar items.

Also, since each person has a "home canteen", the user should be notified of any new items added to the menu by the vendor.

We will use CountVectorizer from Scikit-Learn to find the similarity between items based on their title, category and tags. To bring all these properties of each item together, we create a "soup" of tags. The "soup" is a processed string corresponding to each item, formed from the constituent words of its tags, title and category.
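The soup construction, as implemented in `src.ipynb` (shown here with two rows of `db/food.csv` inlined so the snippet runs on its own):

```python
import pandas as pd

# Two rows mirroring db/food.csv
df1 = pd.DataFrame({
    'title': ['Lala Maggi', 'Chicken Tikka'],
    'category': ['maggi', 'Punjabi'],
    'tags': ['veg, spicy', 'nonveg, healthy, spicy'],
})

# Lowercase the tags, title and category of an item and join the
# unique words (first occurrence kept) into a single string
def create_soup(x):
    words = x['tags'].lower().split(', ')
    words.extend(x['title'].lower().split())
    words.extend(x['category'].lower().split())
    return " ".join(sorted(set(words), key=words.index))

df1['soup'] = df1.apply(create_soup, axis=1)
```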

|   | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
|---|---------|-------|------------|-------|------------|----------|------------|------------|------|------|
| 0 | 1 | Lala Maggi | 1 | 30 | 35 | maggi | 3.9 | 10 | veg, spicy | veg spicy lala maggi |
| 1 | 2 | Cheese Maggi | 1 | 25 | 40 | maggi | 3.8 | 15 | veg | veg cheese maggi |
| 2 | 3 | Masala Maggi | 1 | 25 | 10 | maggi | 3.0 | 10 | veg, spicy | veg spicy masala maggi |

291 | ## Using CountVectorizer from Scikit-Learn
292 |
```python
# Import CountVectorizer and create the count matrix
from sklearn.feature_extraction.text import CountVectorizer
count = CountVectorizer(stop_words='english')

count_matrix = count.fit_transform(df1['soup'])

# Compute the cosine similarity matrix based on the count matrix
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(count_matrix, count_matrix)
```
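The `get_recommendations` helper used in the next cell is defined in `src.ipynb`; here is a self-contained sketch (soups inlined as already computed for `db/food.csv`, returning the two nearest neighbours by cosine similarity):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Item titles and their precomputed soups, derived from db/food.csv
df1 = pd.DataFrame({
    'title': ['Lala Maggi', 'Cheese Maggi', 'Masala Maggi',
              'Veg Maggi', 'Paneer Tikka', 'Chicken Tikka'],
    'soup': ['veg spicy lala maggi', 'veg cheese maggi',
             'veg spicy masala maggi', 'veg healthy maggi',
             'veg healthy paneer tikka punjabi',
             'nonveg healthy spicy chicken tikka punjabi'],
})

count_matrix = CountVectorizer(stop_words='english').fit_transform(df1['soup'])
cosine_sim = cosine_similarity(count_matrix, count_matrix)
indices_from_title = pd.Series(df1.index, index=df1['title'])

def get_recommendations(title="", cosine_sim=cosine_sim, idx=-1):
    # Resolve the row index of the query item
    if idx == -1 and title != "":
        idx = indices_from_title[title]
    # Rank all items by cosine similarity to the query item
    sim_scores = sorted(enumerate(cosine_sim[idx]),
                        key=lambda x: x[1], reverse=True)
    # Skip the item itself and keep the two most similar dishes
    return [i for i, _ in sim_scores[1:3]]
```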
305 |
306 |
307 | ## Sample Recommendation
308 |
```python
df1.loc[get_recommendations(title="Paneer Tikka")]
```
312 |

|   | food_id | title | canteen_id | price | num_orders | category | avg_rating | num_rating | tags | soup |
|---|---------|-------|------------|-------|------------|----------|------------|------------|------|------|
| 5 | 6 | Chicken Tikka | 1 | 80 | 40 | Punjabi | 4.2 | 28 | nonveg, healthy, spicy | nonveg healthy spicy chicken tikka punjabi |
| 3 | 4 | Veg Maggi | 1 | 30 | 25 | maggi | 2.5 | 5 | veg, healthy | veg healthy maggi |

368 | ### After all the hard work, we finally get the recommendations
369 |
370 |
```python
personalised_recomms(orders, df1, current_user, columns)
get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns)
get_top_rated_items(top_rated_items, df1, columns)
get_popular_items(pop_items, df1, columns).head(3)
```
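The helpers invoked above are defined in `src.ipynb`. A condensed, illustrative sketch of two of them, with samples of `db/orders.csv` and `db/food.csv` inlined (`get_latest_user_orders` is simplified here to a vectorised lookup):

```python
import pandas as pd

# Inlined samples of db/orders.csv and db/food.csv
orders = pd.DataFrame({'user_id': [2, 3, 2, 1, 2, 2],
                       'food_id': [5, 5, 6, 5, 5, 5]})
df1 = pd.DataFrame({
    'title': ['Lala Maggi', 'Cheese Maggi', 'Masala Maggi',
              'Veg Maggi', 'Paneer Tikka', 'Chicken Tikka'],
    'canteen_id': [1, 1, 1, 1, 1, 1],
    'price': [30, 25, 25, 30, 60, 80],
})
columns = ['title', 'canteen_id', 'price', 'comment']

# Indices of the first few orders placed by a user
def get_latest_user_orders(user_id, orders, num_orders=3):
    return orders.index[orders['user_id'] == user_id][:num_orders].tolist()

# Wrap a list of item row indices into a presentable recommendations DataFrame
def get_recomms_df(food_indices, df1, columns, comment):
    df = pd.DataFrame(columns=columns)
    for row, i in enumerate(food_indices):
        item = df1.loc[i]
        df.loc[row] = [item['title'], item['canteen_id'], item['price'], comment]
    return df

recs = get_recomms_df([3, 4, 5], df1, columns, "based on your past orders")
```

`personalised_recomms` chains these together: it maps the user's latest orders to item indices, feeds each through `get_recommendations`, and wraps the deduplicated result with `get_recomms_df`.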

|   | title | canteen_id | price | comment |
|---|-------|------------|-------|---------|
| 0 | Veg Maggi | 1 | 30 | based on your past orders |
| 1 | Paneer Tikka | 1 | 60 | based on your past orders |
| 2 | Chicken Tikka | 1 | 80 | based on your past orders |

|   | title | canteen_id | price | comment |
|---|-------|------------|-------|---------|
| 0 | Cheese Maggi | 1 | 25 | new/today's special item in your home canteen |

|   | title | canteen_id | price | comment |
|---|-------|------------|-------|---------|
| 0 | Paneer Tikka | 1 | 60 | top rated items across canteens |
| 1 | Chicken Tikka | 1 | 80 | top rated items across canteens |
| 2 | Cheese Maggi | 1 | 25 | top rated items across canteens |

|   | title | canteen_id | price | comment |
|---|-------|------------|-------|---------|
| 0 | Paneer Tikka | 1 | 60 | most popular items across canteens |
| 1 | Cheese Maggi | 1 | 25 | most popular items across canteens |
| 2 | Chicken Tikka | 1 | 80 | most popular items across canteens |

These are just simple algorithms for making personalised & general recommendations to users. We could easily use collaborative filtering or incorporate neural networks to make the predictions even better, but those methods are more computationally intensive. Kinda overkill, IMO! Let's build the app first, then move on to other features!
535 |
#### Star the repository and send in your PRs if you think the engine needs improvement, or help me implement some more advanced features.
537 |
--------------------------------------------------------------------------------
/db/food.csv:
--------------------------------------------------------------------------------
1 | food_id,title,canteen_id,price,num_orders,category,avg_rating,num_rating,tags
2 | 1,Lala Maggi,1,30,35,maggi,3.9,10,"veg, spicy"
3 | 2,Cheese Maggi,1,25,40,maggi,3.8,15,"veg"
4 | 3,Masala Maggi,1,25,10,maggi,3,10,"veg, spicy"
5 | 4,Veg Maggi,1,30,25,maggi,2.5,5,"veg, healthy"
6 | 5,Paneer Tikka,1,60,50,Punjabi,4.6,30,"veg, healthy"
7 | 6,Chicken Tikka,1,80,40,Punjabi,4.2,28,"nonveg, healthy, spicy"
--------------------------------------------------------------------------------
/db/new_and_specials.csv:
--------------------------------------------------------------------------------
1 | specials_id,canteen_id,food_id,date,type
2 | 1,1,2,2019-6-28,new
3 | 2,2,5,2019-6-28,special
--------------------------------------------------------------------------------
/db/orders.csv:
--------------------------------------------------------------------------------
1 | order_id,user_id,food_id,canteen_id,date_time,status,amount
2 | 1,2,5,1,2019-06-28 9:26:03,served,60
3 | 2,3,5,1,2019-06-29 9:26:03,served,60
4 | 3,2,6,1,2019-06-30 9:26:03,served,80
5 | 4,1,5,1,2019-07-01 9:26:03,served,60
6 | 5,2,5,1,2019-07-02 9:26:03,served,60
7 | 6,2,5,1,2019-07-03 9:26:03,served,60
--------------------------------------------------------------------------------
/db/users.csv:
--------------------------------------------------------------------------------
1 | user_id,name,email,roll_no,hall,home_canteen
2 | 1,test_1,test1@test.com,1,12,1
3 | 2,test_2,test2@test.com,2,12,1
4 | 3,test_3,test3@test.com,3,12,2
5 | 4,test_4,test4@test.com,4,12,1
6 | 5,test_5,test5@test.com,5,12,1
--------------------------------------------------------------------------------
/src.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "from IPython.core.interactiveshell import InteractiveShell\n",
10 | "InteractiveShell.ast_node_interactivity = \"all\""
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "# Lunchbox App ML Engine\n",
18 | "\n",
19 | "This is the Recommendation Engine that will be used in building the Lunchbox App, a platform for ordering food and keeping track of user expenditure and canteen sales. Regardless of whether or not this is actually implemented in all the canteens of IIT Kanpur, given the potential for frauds & cyber-attacks, I will complete the platform.\n",
20 | "\n",
21 | "Also, I would be open-sourcing the app so that any campus can implement a cash-less & integrated system of ordering food across their whole campus. After all, what good are IITs for if our canteens still keep track of student accounts on paper registers!"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
28 | "## Demographic Filtering\n",
29 | "\n",
30 | "Suggesting the users items that were well-received and are popular among the users, in general. Most trending items and items with the best rating rise to the top and get shortlisted for recommendation."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {},
37 | "outputs": [
38 | {
39 | "data": {
40 | "text/html": [
41 | "\n",
42 | "\n",
55 | "
\n",
56 | " \n",
57 | " \n",
58 | " | \n",
59 | " food_id | \n",
60 | " title | \n",
61 | " canteen_id | \n",
62 | " price | \n",
63 | " num_orders | \n",
64 | " category | \n",
65 | " avg_rating | \n",
66 | " num_rating | \n",
67 | " tags | \n",
68 | "
\n",
69 | " \n",
70 | " \n",
71 | " \n",
72 | " 0 | \n",
73 | " 1 | \n",
74 | " Lala Maggi | \n",
75 | " 1 | \n",
76 | " 30 | \n",
77 | " 35 | \n",
78 | " maggi | \n",
79 | " 3.9 | \n",
80 | " 10 | \n",
81 | " veg, spicy | \n",
82 | "
\n",
83 | " \n",
84 | " 1 | \n",
85 | " 2 | \n",
86 | " Cheese Maggi | \n",
87 | " 1 | \n",
88 | " 25 | \n",
89 | " 40 | \n",
90 | " maggi | \n",
91 | " 3.8 | \n",
92 | " 15 | \n",
93 | " veg | \n",
94 | "
\n",
95 | " \n",
96 | " 2 | \n",
97 | " 3 | \n",
98 | " Masala Maggi | \n",
99 | " 1 | \n",
100 | " 25 | \n",
101 | " 10 | \n",
102 | " maggi | \n",
103 | " 3.0 | \n",
104 | " 10 | \n",
105 | " veg, spicy | \n",
106 | "
\n",
107 | " \n",
108 | " 3 | \n",
109 | " 4 | \n",
110 | " Veg Maggi | \n",
111 | " 1 | \n",
112 | " 30 | \n",
113 | " 25 | \n",
114 | " maggi | \n",
115 | " 2.5 | \n",
116 | " 5 | \n",
117 | " veg, healthy | \n",
118 | "
\n",
119 | " \n",
120 | " 4 | \n",
121 | " 5 | \n",
122 | " Paneer Tikka | \n",
123 | " 1 | \n",
124 | " 60 | \n",
125 | " 50 | \n",
126 | " Punjabi | \n",
127 | " 4.6 | \n",
128 | " 30 | \n",
129 | " veg, healthy | \n",
130 | "
\n",
131 | " \n",
132 | " 5 | \n",
133 | " 6 | \n",
134 | " Chicken Tikka | \n",
135 | " 1 | \n",
136 | " 80 | \n",
137 | " 40 | \n",
138 | " Punjabi | \n",
139 | " 4.2 | \n",
140 | " 28 | \n",
141 | " nonveg, healthy, spicy | \n",
142 | "
\n",
143 | " \n",
144 | "
\n",
145 | "
"
146 | ],
147 | "text/plain": [
148 | " food_id title canteen_id price num_orders category avg_rating \\\n",
149 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n",
150 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n",
151 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n",
152 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n",
153 | "4 5 Paneer Tikka 1 60 50 Punjabi 4.6 \n",
154 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n",
155 | "\n",
156 | " num_rating tags \n",
157 | "0 10 veg, spicy \n",
158 | "1 15 veg \n",
159 | "2 10 veg, spicy \n",
160 | "3 5 veg, healthy \n",
161 | "4 30 veg, healthy \n",
162 | "5 28 nonveg, healthy, spicy "
163 | ]
164 | },
165 | "execution_count": 2,
166 | "metadata": {},
167 | "output_type": "execute_result"
168 | }
169 | ],
170 | "source": [
171 | "import pandas as pd \n",
172 | "import numpy as np\n",
173 | "\n",
174 | "# Importing db of food items across all canteens registered on the platform\n",
175 | "df1=pd.read_csv('./db/food.csv')\n",
176 | "df1.columns = ['food_id','title','canteen_id','price', 'num_orders', 'category', 'avg_rating', 'num_rating', 'tags']\n",
177 | "\n",
178 | "df1"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": 3,
184 | "metadata": {},
185 | "outputs": [],
186 | "source": [
187 | "# mean of average ratings of all items\n",
188 | "C= df1['avg_rating'].mean()\n",
189 | "\n",
190 | "# the minimum number of votes required to appear in recommendation list, i.e, 60th percentile among 'num_rating'\n",
191 | "m= df1['num_rating'].quantile(0.6)\n",
192 | "\n",
193 | "# items that qualify the criteria of minimum num of votes\n",
194 | "q_items = df1.copy().loc[df1['num_rating'] >= m]\n",
195 | "\n",
196 | "# Calculation of weighted rating based on the IMDB formula\n",
197 | "def weighted_rating(x, m=m, C=C):\n",
198 | " v = x['num_rating']\n",
199 | " R = x['avg_rating']\n",
200 | " return (v/(v+m) * R) + (m/(m+v) * C)\n",
201 | "\n",
202 | "# Applying weighted_rating to qualified items\n",
203 | "q_items['score'] = q_items.apply(weighted_rating, axis=1)\n",
204 | "\n",
205 | "# Shortlisting the top rated items and popular items\n",
206 | "top_rated_items = q_items.sort_values('score', ascending=False)\n",
207 | "pop_items= df1.sort_values('num_orders', ascending=False)"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 4,
213 | "metadata": {},
214 | "outputs": [
215 | {
216 | "data": {
217 | "text/html": [
218 | "\n",
219 | "\n",
232 | "
\n",
233 | " \n",
234 | " \n",
235 | " | \n",
236 | " title | \n",
237 | " num_rating | \n",
238 | " avg_rating | \n",
239 | " score | \n",
240 | "
\n",
241 | " \n",
242 | " \n",
243 | " \n",
244 | " 4 | \n",
245 | " Paneer Tikka | \n",
246 | " 30 | \n",
247 | " 4.6 | \n",
248 | " 4.288889 | \n",
249 | "
\n",
250 | " \n",
251 | " 5 | \n",
252 | " Chicken Tikka | \n",
253 | " 28 | \n",
254 | " 4.2 | \n",
255 | " 4.013953 | \n",
256 | "
\n",
257 | " \n",
258 | " 1 | \n",
259 | " Cheese Maggi | \n",
260 | " 15 | \n",
261 | " 3.8 | \n",
262 | " 3.733333 | \n",
263 | "
\n",
264 | " \n",
265 | "
\n",
266 | "
"
267 | ],
268 | "text/plain": [
269 | " title num_rating avg_rating score\n",
270 | "4 Paneer Tikka 30 4.6 4.288889\n",
271 | "5 Chicken Tikka 28 4.2 4.013953\n",
272 | "1 Cheese Maggi 15 3.8 3.733333"
273 | ]
274 | },
275 | "execution_count": 4,
276 | "metadata": {},
277 | "output_type": "execute_result"
278 | },
279 | {
280 | "data": {
281 | "text/html": [
282 | "\n",
283 | "\n",
296 | "
\n",
297 | " \n",
298 | " \n",
299 | " | \n",
300 | " title | \n",
301 | " num_orders | \n",
302 | "
\n",
303 | " \n",
304 | " \n",
305 | " \n",
306 | " 4 | \n",
307 | " Paneer Tikka | \n",
308 | " 50 | \n",
309 | "
\n",
310 | " \n",
311 | " 1 | \n",
312 | " Cheese Maggi | \n",
313 | " 40 | \n",
314 | "
\n",
315 | " \n",
316 | " 5 | \n",
317 | " Chicken Tikka | \n",
318 | " 40 | \n",
319 | "
\n",
320 | " \n",
321 | " 0 | \n",
322 | " Lala Maggi | \n",
323 | " 35 | \n",
324 | "
\n",
325 | " \n",
326 | " 3 | \n",
327 | " Veg Maggi | \n",
328 | " 25 | \n",
329 | "
\n",
330 | " \n",
331 | "
\n",
332 | "
"
333 | ],
334 | "text/plain": [
335 | " title num_orders\n",
336 | "4 Paneer Tikka 50\n",
337 | "1 Cheese Maggi 40\n",
338 | "5 Chicken Tikka 40\n",
339 | "0 Lala Maggi 35\n",
340 | "3 Veg Maggi 25"
341 | ]
342 | },
343 | "execution_count": 4,
344 | "metadata": {},
345 | "output_type": "execute_result"
346 | }
347 | ],
348 | "source": [
349 | "# Display results of demographic filtering\n",
350 | "top_rated_items[['title', 'num_rating', 'avg_rating', 'score']].head()\n",
351 | "pop_items[['title', 'num_orders']].head()"
352 | ]
353 | },
354 | {
355 | "cell_type": "markdown",
356 | "metadata": {},
357 | "source": [
358 | "## Content Based Filtering\n",
359 | "\n",
360 | "A bit more personalised recommendation. We will be analysing the past orders of the user and suggesting back those items which are similar.\n",
361 | "\n",
362 | "Also, since each person has a \"home canteen\", the user should be notified any new items included in the menu by the vendor.\n",
363 | "\n",
364 | "We will be using Count Vectorizer from Scikit-Learn to find similarity between items based on their title, category and tags. To bring all these properties of each item together we create a \"soup\" of tags. \"Soup\" is a processed string correspnding to each item, formed using constituent words of tags, tile and category."
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": 5,
370 | "metadata": {},
371 | "outputs": [
372 | {
373 | "data": {
374 | "text/html": [
375 | "\n",
376 | "\n",
389 | "
\n",
390 | " \n",
391 | " \n",
392 | " | \n",
393 | " food_id | \n",
394 | " title | \n",
395 | " canteen_id | \n",
396 | " price | \n",
397 | " num_orders | \n",
398 | " category | \n",
399 | " avg_rating | \n",
400 | " num_rating | \n",
401 | " tags | \n",
402 | " soup | \n",
403 | "
\n",
404 | " \n",
405 | " \n",
406 | " \n",
407 | " 0 | \n",
408 | " 1 | \n",
409 | " Lala Maggi | \n",
410 | " 1 | \n",
411 | " 30 | \n",
412 | " 35 | \n",
413 | " maggi | \n",
414 | " 3.9 | \n",
415 | " 10 | \n",
416 | " veg, spicy | \n",
417 | " veg spicy lala maggi | \n",
418 | "
\n",
419 | " \n",
420 | " 1 | \n",
421 | " 2 | \n",
422 | " Cheese Maggi | \n",
423 | " 1 | \n",
424 | " 25 | \n",
425 | " 40 | \n",
426 | " maggi | \n",
427 | " 3.8 | \n",
428 | " 15 | \n",
429 | " veg | \n",
430 | " veg cheese maggi | \n",
431 | "
\n",
432 | " \n",
433 | " 2 | \n",
434 | " 3 | \n",
435 | " Masala Maggi | \n",
436 | " 1 | \n",
437 | " 25 | \n",
438 | " 10 | \n",
439 | " maggi | \n",
440 | " 3.0 | \n",
441 | " 10 | \n",
442 | " veg, spicy | \n",
443 | " veg spicy masala maggi | \n",
444 | "
\n",
445 | " \n",
446 | "
\n",
447 | "
"
448 | ],
449 | "text/plain": [
450 | " food_id title canteen_id price num_orders category avg_rating \\\n",
451 | "0 1 Lala Maggi 1 30 35 maggi 3.9 \n",
452 | "1 2 Cheese Maggi 1 25 40 maggi 3.8 \n",
453 | "2 3 Masala Maggi 1 25 10 maggi 3.0 \n",
454 | "\n",
455 | " num_rating tags soup \n",
456 | "0 10 veg, spicy veg spicy lala maggi \n",
457 | "1 15 veg veg cheese maggi \n",
458 | "2 10 veg, spicy veg spicy masala maggi "
459 | ]
460 | },
461 | "execution_count": 5,
462 | "metadata": {},
463 | "output_type": "execute_result"
464 | }
465 | ],
466 | "source": [
467 | "# TODO: clean data\n",
468 | "\n",
469 | "# Creating soup string for each item\n",
470 | "def create_soup(x): \n",
471 | " tags = x['tags'].lower().split(', ')\n",
472 | " tags.extend(x['title'].lower().split())\n",
473 | " tags.extend(x['category'].lower().split())\n",
474 | " return \" \".join(sorted(set(tags), key=tags.index))\n",
475 | "\n",
476 | "df1['soup'] = df1.apply(create_soup, axis=1)\n",
477 | "df1.head(3)"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "execution_count": 6,
483 | "metadata": {},
484 | "outputs": [],
485 | "source": [
486 | "# Import CountVectorizer and create the count matrix\n",
487 | "from sklearn.feature_extraction.text import CountVectorizer\n",
488 | "count = CountVectorizer(stop_words='english')\n",
489 | "\n",
490 | "# df1['soup']\n",
491 | "count_matrix = count.fit_transform(df1['soup'])\n",
492 | "\n",
493 | "# Compute the Cosine Similarity matrix based on the count_matrix\n",
494 | "from sklearn.metrics.pairwise import cosine_similarity\n",
495 | "cosine_sim = cosine_similarity(count_matrix, count_matrix)\n",
496 | "\n",
497 | "indices_from_title = pd.Series(df1.index, index=df1['title'])\n",
498 | "indices_from_food_id = pd.Series(df1.index, index=df1['food_id'])"
499 | ]
500 | },
501 | {
502 | "cell_type": "code",
503 | "execution_count": 7,
504 | "metadata": {},
505 | "outputs": [],
506 | "source": [
507 | "# Function that takes in food title or food id as input and outputs most similar dishes \n",
508 | "def get_recommendations(title=\"\", cosine_sim=cosine_sim, idx=-1):\n",
509 | " # Get the index of the item that matches the title\n",
510 | " if idx == -1 and title != \"\":\n",
511 | " idx = indices_from_title[title]\n",
512 | "\n",
513 | " # Get the pairwsie similarity scores of all dishes with that dish\n",
514 | " sim_scores = list(enumerate(cosine_sim[idx]))\n",
515 | "\n",
516 | " # Sort the dishes based on the similarity scores\n",
517 | " sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)\n",
518 | " \n",
519 | " # Get the scores of the 10 most similar dishes\n",
520 | " sim_scores = sim_scores[1:3]\n",
521 | "\n",
522 | " # Get the food indices\n",
523 | " food_indices = [i[0] for i in sim_scores]\n",
524 | "\n",
525 | " # Return the top 10 most similar dishes\n",
526 | " return food_indices"
527 | ]
528 | },
529 | {
530 | "cell_type": "code",
531 | "execution_count": 8,
532 | "metadata": {},
533 | "outputs": [
534 | {
535 | "data": {
536 | "text/html": [
537 | "\n",
538 | "\n",
551 | "
\n",
552 | " \n",
553 | " \n",
554 | " | \n",
555 | " food_id | \n",
556 | " title | \n",
557 | " canteen_id | \n",
558 | " price | \n",
559 | " num_orders | \n",
560 | " category | \n",
561 | " avg_rating | \n",
562 | " num_rating | \n",
563 | " tags | \n",
564 | " soup | \n",
565 | "
\n",
566 | " \n",
567 | " \n",
568 | " \n",
569 | " 5 | \n",
570 | " 6 | \n",
571 | " Chicken Tikka | \n",
572 | " 1 | \n",
573 | " 80 | \n",
574 | " 40 | \n",
575 | " Punjabi | \n",
576 | " 4.2 | \n",
577 | " 28 | \n",
578 | " nonveg, healthy, spicy | \n",
579 | " nonveg healthy spicy chicken tikka punjabi | \n",
580 | "
\n",
581 | " \n",
582 | " 3 | \n",
583 | " 4 | \n",
584 | " Veg Maggi | \n",
585 | " 1 | \n",
586 | " 30 | \n",
587 | " 25 | \n",
588 | " maggi | \n",
589 | " 2.5 | \n",
590 | " 5 | \n",
591 | " veg, healthy | \n",
592 | " veg healthy maggi | \n",
593 | "
\n",
594 | " \n",
595 | "
\n",
596 | "
"
597 | ],
598 | "text/plain": [
599 | " food_id title canteen_id price num_orders category avg_rating \\\n",
600 | "5 6 Chicken Tikka 1 80 40 Punjabi 4.2 \n",
601 | "3 4 Veg Maggi 1 30 25 maggi 2.5 \n",
602 | "\n",
603 | " num_rating tags \\\n",
604 | "5 28 nonveg, healthy, spicy \n",
605 | "3 5 veg, healthy \n",
606 | "\n",
607 | " soup \n",
608 | "5 nonveg healthy spicy chicken tikka punjabi \n",
609 | "3 veg healthy maggi "
610 | ]
611 | },
612 | "execution_count": 8,
613 | "metadata": {},
614 | "output_type": "execute_result"
615 | }
616 | ],
617 | "source": [
618 | "df1.loc[get_recommendations(title=\"Paneer Tikka\")]"
619 | ]
620 | },
621 | {
622 | "cell_type": "markdown",
623 | "metadata": {},
624 | "source": [
625 | "We will now some functions, some of which are utility functions, others are actually the functions which will help get personalised recommendations for current user."
626 | ]
627 | },
628 | {
629 | "cell_type": "code",
630 | "execution_count": 9,
631 | "metadata": {},
632 | "outputs": [],
633 | "source": [
634 | "# fetch few past orders of a user, based on which personalized recommendations are to be made\n",
635 | "def get_latest_user_orders(user_id, orders, num_orders=3):\n",
636 | " counter = num_orders\n",
637 | " order_indices = []\n",
638 | " \n",
639 | " for index, row in orders[['user_id']].iterrows():\n",
640 | " if row.user_id == user_id:\n",
641 | " counter = counter -1\n",
642 | " order_indices.append(index)\n",
643 | " if counter == 0:\n",
644 | " break\n",
645 | " \n",
646 | " return order_indices\n",
647 | "\n",
648 | "# utility function that returns a DataFrame given the food_indices to be recommended\n",
649 | "def get_recomms_df(food_indices, df1, columns, comment):\n",
650 | " row = 0\n",
651 | " df = pd.DataFrame(columns=columns)\n",
652 | " \n",
653 | " for i in food_indices:\n",
654 | " df.loc[row] = df1[['title', 'canteen_id', 'price']].loc[i]\n",
655 | " df.loc[row].comment = comment\n",
656 | " row = row+1\n",
657 | " return df\n",
658 | "\n",
659 | "# return food_indices for accomplishing personalized recommendation using Count Vectorizer\n",
660 | "def personalised_recomms(orders, df1, user_id, columns, comment=\"based on your past orders\"):\n",
661 | " order_indices = get_latest_user_orders(user_id, orders)\n",
662 | " food_ids = []\n",
663 | " food_indices = []\n",
664 | " recomm_indices = []\n",
665 | " \n",
666 | " for i in order_indices:\n",
667 | " food_ids.append(orders.loc[i].food_id)\n",
668 | " for i in food_ids:\n",
669 | " food_indices.append(indices_from_food_id[i])\n",
670 | " for i in food_indices:\n",
671 | " recomm_indices.extend(get_recommendations(idx=i))\n",
672 | " \n",
673 | " return get_recomms_df(set(recomm_indices), df1, columns, comment)\n",
674 | "\n",
675 | "# Simply fetch new items added by vendor or today's special at home canteen\n",
676 | "def get_new_and_specials_recomms(new_and_specials, users, df1, canteen_id, columns, comment=\"new/today's special item in your home canteen\"):\n",
677 | " food_indices = []\n",
678 | " \n",
679 | " for index, row in new_and_specials[['canteen_id']].iterrows():\n",
680 | " if row.canteen_id == canteen_id:\n",
681 | " food_indices.append(indices_from_food_id[new_and_specials.loc[index].food_id])\n",
682 | " \n",
683 | " return get_recomms_df(set(food_indices), df1, columns, comment)\n",
684 | "\n",
685 | "# utility function to get the home canteen given a user id\n",
686 | "def get_user_home_canteen(users, user_id):\n",
687 | " for index, row in users[['user_id']].iterrows():\n",
688 | " if row.user_id == user_id:\n",
689 | " return users.loc[index].home_canteen\n",
690 | " return -1\n",
691 | "\n",
692 | "# fetch items from previously calculated top_rated_items list\n",
693 | "def get_top_rated_items(top_rated_items, df1, columns, comment=\"top rated items across canteens\"):\n",
694 | " food_indices = []\n",
695 | " \n",
696 | " for index, row in top_rated_items.iterrows():\n",
697 | " food_indices.append(indices_from_food_id[top_rated_items.loc[index].food_id])\n",
698 | " \n",
699 | " return get_recomms_df(food_indices, df1, columns, comment)\n",
700 | "\n",
701 | "# fetch items from previously calculated pop_items list\n",
702 | "def get_popular_items(pop_items, df1, columns, comment=\"most popular items across canteens\"):\n",
703 | " food_indices = []\n",
704 | " \n",
705 | " for index, row in pop_items.iterrows():\n",
706 | " food_indices.append(indices_from_food_id[pop_items.loc[index].food_id])\n",
707 | " \n",
708 | " return get_recomms_df(food_indices, df1, columns, comment)\n",
709 | " "
710 | ]
711 | },
712 | {
713 | "cell_type": "markdown",
714 | "metadata": {},
715 | "source": [
716 | "### After all the hard work, we finally get the recommendations"
717 | ]
718 | },
719 | {
720 | "cell_type": "code",
721 | "execution_count": 11,
722 | "metadata": {},
723 | "outputs": [
724 | {
725 | "data": {
726 | "text/html": [
727 | "\n",
728 | "\n",
741 | "
\n",
742 | " \n",
743 | " \n",
744 | " | \n",
745 | " title | \n",
746 | " canteen_id | \n",
747 | " price | \n",
748 | " comment | \n",
749 | "
\n",
750 | " \n",
751 | " \n",
752 | " \n",
753 | " 0 | \n",
754 | " Veg Maggi | \n",
755 | " 1 | \n",
756 | " 30 | \n",
757 | " based on your past orders | \n",
758 | "
\n",
759 | " \n",
760 | " 1 | \n",
761 | " Paneer Tikka | \n",
762 | " 1 | \n",
763 | " 60 | \n",
764 | " based on your past orders | \n",
765 | "
\n",
766 | " \n",
767 | " 2 | \n",
768 | " Chicken Tikka | \n",
769 | " 1 | \n",
770 | " 80 | \n",
771 | " based on your past orders | \n",
772 | "
\n",
773 | " \n",
774 | "
\n",
775 | "
"
776 | ],
777 | "text/plain": [
778 | " title canteen_id price comment\n",
779 | "0 Veg Maggi 1 30 based on your past orders\n",
780 | "1 Paneer Tikka 1 60 based on your past orders\n",
781 | "2 Chicken Tikka 1 80 based on your past orders"
782 | ]
783 | },
784 | "execution_count": 11,
785 | "metadata": {},
786 | "output_type": "execute_result"
787 | },
788 | {
789 | "data": {
790 | "text/html": [
791 | "\n",
792 | "\n",
805 | "
\n",
806 | " \n",
807 | " \n",
808 | " | \n",
809 | " title | \n",
810 | " canteen_id | \n",
811 | " price | \n",
812 | " comment | \n",
813 | "
\n",
814 | " \n",
815 | " \n",
816 | " \n",
817 | " 0 | \n",
818 | " Cheese Maggi | \n",
819 | " 1 | \n",
820 | " 25 | \n",
821 | " new/today's special item in your home canteen | \n",
822 | "
\n",
823 | " \n",
824 | "
\n",
825 | "
"
826 | ],
827 | "text/plain": [
828 | " title canteen_id price \\\n",
829 | "0 Cheese Maggi 1 25 \n",
830 | "\n",
831 | " comment \n",
832 | "0 new/today's special item in your home canteen "
833 | ]
834 | },
835 | "execution_count": 11,
836 | "metadata": {},
837 | "output_type": "execute_result"
838 | },
839 | {
840 | "data": {
841 | "text/html": [
842 | "<table border=\"1\" class=\"dataframe\">\n",
843 | "  <thead>\n",
844 | "    <tr style=\"text-align: right;\">\n",
845 | "      <th></th>\n",
846 | "      <th>title</th>\n",
847 | "      <th>canteen_id</th>\n",
848 | "      <th>price</th>\n",
849 | "      <th>comment</th>\n",
850 | "    </tr>\n",
851 | "  </thead>\n",
852 | "  <tbody>\n",
853 | "    <tr>\n",
854 | "      <th>0</th>\n",
855 | "      <td>Paneer Tikka</td>\n",
856 | "      <td>1</td>\n",
857 | "      <td>60</td>\n",
858 | "      <td>top rated items across canteens</td>\n",
859 | "    </tr>\n",
860 | "    <tr>\n",
861 | "      <th>1</th>\n",
862 | "      <td>Chicken Tikka</td>\n",
863 | "      <td>1</td>\n",
864 | "      <td>80</td>\n",
865 | "      <td>top rated items across canteens</td>\n",
866 | "    </tr>\n",
867 | "    <tr>\n",
868 | "      <th>2</th>\n",
869 | "      <td>Cheese Maggi</td>\n",
870 | "      <td>1</td>\n",
871 | "      <td>25</td>\n",
872 | "      <td>top rated items across canteens</td>\n",
873 | "    </tr>\n",
874 | "  </tbody>\n",
875 | "</table>"
891 | ],
892 | "text/plain": [
893 | " title canteen_id price comment\n",
894 | "0 Paneer Tikka 1 60 top rated items across canteens\n",
895 | "1 Chicken Tikka 1 80 top rated items across canteens\n",
896 | "2 Cheese Maggi 1 25 top rated items across canteens"
897 | ]
898 | },
899 | "execution_count": 11,
900 | "metadata": {},
901 | "output_type": "execute_result"
902 | },
903 | {
904 | "data": {
905 | "text/html": [
906 | "<table border=\"1\" class=\"dataframe\">\n",
907 | "  <thead>\n",
908 | "    <tr style=\"text-align: right;\">\n",
909 | "      <th></th>\n",
910 | "      <th>title</th>\n",
911 | "      <th>canteen_id</th>\n",
912 | "      <th>price</th>\n",
913 | "      <th>comment</th>\n",
914 | "    </tr>\n",
915 | "  </thead>\n",
916 | "  <tbody>\n",
917 | "    <tr>\n",
918 | "      <th>0</th>\n",
919 | "      <td>Paneer Tikka</td>\n",
920 | "      <td>1</td>\n",
921 | "      <td>60</td>\n",
922 | "      <td>most popular items across canteens</td>\n",
923 | "    </tr>\n",
924 | "    <tr>\n",
925 | "      <th>1</th>\n",
926 | "      <td>Cheese Maggi</td>\n",
927 | "      <td>1</td>\n",
928 | "      <td>25</td>\n",
929 | "      <td>most popular items across canteens</td>\n",
930 | "    </tr>\n",
931 | "    <tr>\n",
932 | "      <th>2</th>\n",
933 | "      <td>Chicken Tikka</td>\n",
934 | "      <td>1</td>\n",
935 | "      <td>80</td>\n",
936 | "      <td>most popular items across canteens</td>\n",
937 | "    </tr>\n",
938 | "  </tbody>\n",
939 | "</table>"
955 | ],
956 | "text/plain": [
957 | " title canteen_id price comment\n",
958 | "0 Paneer Tikka 1 60 most popular items across canteens\n",
959 | "1 Cheese Maggi 1 25 most popular items across canteens\n",
960 | "2 Chicken Tikka 1 80 most popular items across canteens"
961 | ]
962 | },
963 | "execution_count": 11,
964 | "metadata": {},
965 | "output_type": "execute_result"
966 | }
967 | ],
968 | "source": [
969 | "orders = pd.read_csv('./db/orders.csv')\n",
970 | "new_and_specials = pd.read_csv('./db/new_and_specials.csv')\n",
971 | "users = pd.read_csv('./db/users.csv')\n",
972 | "\n",
973 | "columns = ['title', 'canteen_id', 'price', 'comment']\n",
974 | "current_user = 2\n",
975 | "current_canteen = get_user_home_canteen(users, current_user)\n",
976 | "\n",
977 | "\n",
978 | "personalised_recomms(orders, df1, current_user, columns)\n",
979 | "get_new_and_specials_recomms(new_and_specials, users, df1, current_canteen, columns)\n",
980 | "get_top_rated_items(top_rated_items, df1, columns)\n",
981 | "get_popular_items(pop_items, df1, columns).head(3)"
982 | ]
983 | },
984 | {
985 | "cell_type": "markdown",
986 | "metadata": {},
987 | "source": [
988 | "These are simple algorithms for making both personalised and general recommendations to users. We could use collaborative filtering, or even neural networks, to make the predictions better still, but those methods are much more computationally intensive. Kinda overkill, IMO! Let's build that app first!"
989 | ]
990 | },
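991 | {
992 | "cell_type": "markdown",
993 | "metadata": {},
994 | "source": [
995 | "For reference, here is a minimal sketch of the item-based collaborative filtering mentioned above, using cosine similarity between item columns of a user-item matrix. The `ratings` matrix below is made-up toy data, not values from this repo's `db` files:\n",
996 | "\n",
997 | "```python\n",
998 | "import numpy as np\n",
999 | "\n",
1000 | "# Toy user-item matrix (rows = users, cols = items); placeholder values\n",
1001 | "ratings = np.array([[5, 3, 0, 1],\n",
1002 | "                    [4, 0, 0, 1],\n",
1003 | "                    [1, 1, 0, 5],\n",
1004 | "                    [0, 0, 5, 4]], dtype=float)\n",
1005 | "\n",
1006 | "# Cosine similarity between item columns\n",
1007 | "norms = np.linalg.norm(ratings, axis=0)\n",
1008 | "sim = (ratings.T @ ratings) / np.outer(norms, norms)\n",
1009 | "\n",
1010 | "# Score items for user 0 by similarity-weighted sum, masking items already ordered\n",
1011 | "user = ratings[0]\n",
1012 | "scores = sim @ user\n",
1013 | "scores[user > 0] = -np.inf\n",
1014 | "recommended = int(np.argmax(scores))  # index of the suggested item\n",
1015 | "```"
1016 | ]
1017 | },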
991 | {
992 | "cell_type": "markdown",
993 | "metadata": {},
994 | "source": [
995 | "#### Star the repository and send in your PRs if you think the engine needs improvement, or to help me implement more advanced features."
996 | ]
997 | }
998 | ],
999 | "metadata": {
1000 | "kernelspec": {
1001 | "display_name": "Python 3",
1002 | "language": "python",
1003 | "name": "python3"
1004 | },
1005 | "language_info": {
1006 | "codemirror_mode": {
1007 | "name": "ipython",
1008 | "version": 3
1009 | },
1010 | "file_extension": ".py",
1011 | "mimetype": "text/x-python",
1012 | "name": "python",
1013 | "nbconvert_exporter": "python",
1014 | "pygments_lexer": "ipython3",
1015 | "version": "3.6.8"
1016 | }
1017 | },
1018 | "nbformat": 4,
1019 | "nbformat_minor": 2
1020 | }
1021 |
--------------------------------------------------------------------------------