├── dictionary.png
├── data_distributions.png
├── Business presentation.pdf
├── iFood Data Analyst Case.pdf
├── README.md
└── classification_model.ipynb
/dictionary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/dictionary.png
--------------------------------------------------------------------------------
/data_distributions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/data_distributions.png
--------------------------------------------------------------------------------
/Business presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/Business presentation.pdf
--------------------------------------------------------------------------------
/iFood Data Analyst Case.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/iFood Data Analyst Case.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ifood-data-business-analyst-test
2 | ## last update 19/02/2020
3 |
4 | This case is used for hiring Data Analysts for the iFood Brain team. Instructions are in the PDF file.
5 |
6 | If you are interested in joining our team, or getting to know more about iFood and our team, feel free to send an e-mail to "ifoodbrain_hiring@ifood.com.br".
7 |
--------------------------------------------------------------------------------
/classification_model.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# iFood CRM Data Analyst Case - Part2\n",
8 | "\n",
9 | "### Description\n",
10 | "\n",
11 | "The objective of the team is to build a predictive model that will produce the highest profit for the\n",
12 | "next direct marketing campaign, scheduled for the next month. \n",
13 | "The new campaign, the sixth, aims at\n",
14 | "selling a new gadget to the Customer Database. \n",
15 | "To build the model, a pilot campaign involving **2.240 customers** was carried out. \n",
16 | "The customers were selected at random and contacted by phone regarding the acquisition of the gadget. \n",
17 | "During the following months, customers who bought the offer were properly labeled. \n",
18 | "The total cost of the sample campaign was 6.720MU and the revenue generated by the customers who accepted the offer was 3.674MU. \n",
19 | "Globally the campaign had a profit of -3.046MU. \n",
20 | "The success rate of the campaign was 15%. \n",
21 | "\n",
22 | "The objective of the team is to develop a model that predicts customer behavior and to apply it to the rest of the customer base.\n",
23 | "Hopefully the model will allow the company to cherry pick the customers that are most likely to\n",
24 | "purchase the offer while leaving out the non-respondents, making the next campaign highly\n",
25 | "profitable. Moreover, other than maximizing the profit of the campaign, the CMO is interested in\n",
26 | "understanding the characteristic features of those customers who are willing to buy the\n",
27 | "gadget.\n",
28 | "\n",
29 | "### Key Objectives are:\n",
30 | "\n",
31 | "1. Explore the data – don’t just plot means and counts. Provide insights, define cause and\n",
32 | "effect. Provide a better understanding of the characteristic features of respondents;\n",
33 | "2. Propose and describe a customer segmentation based on customers' behaviors;\n",
34 | "3. Create a predictive model which allows the company to maximize the profit of the next\n",
35 | "marketing campaign.\n",
36 | "4. Whatever else you think is necessary.\n",
37 | "\n",
38 | "### Deliverables:\n",
39 | "\n",
40 | "1. Data Exploration;\n",
41 | "2. Segmentation;\n",
42 | "3. Classification Model;\n",
43 | "4. A short business presentation.\n",
44 | "\n",
45 | "### Data Dictionary and Notes\n",
46 | "\n",
47 | "At the bottom of the notebook\n",
48 | "\n",
49 | "---"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 102,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "import pandas as pd\n",
59 | "import numpy as np\n",
60 | "import seaborn as sns\n",
61 | "import matplotlib.pyplot as plt\n",
62 | "from sklearn.model_selection import train_test_split\n",
63 | "from sklearn.ensemble import RandomForestClassifier\n",
64 | "from sklearn.model_selection import GridSearchCV\n",
65 | "from sklearn import metrics"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 72,
71 | "metadata": {},
72 | "outputs": [],
73 | "source": [
74 | "# Read Dataset\n",
75 | "ifood_df = pd.read_csv('ifood_df.csv')\n",
76 | "\n",
77 | "# Split dataset into features and labels\n",
78 | "features = ifood_df.drop('Response', axis=1)\n",
79 | "labels = ifood_df.Response\n",
80 | "\n",
81 | "# Split dataset into training set (60%) and test set (40%)\n",
82 | "X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.40, random_state=5)"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "## Generating the Model \n"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 73,
95 | "metadata": {},
96 | "outputs": [
97 | {
98 | "output_type": "stream",
99 | "name": "stdout",
100 | "text": "Fitting 5 folds for each of 36 candidates, totalling 180 fits\n[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed: 1.7min finished\n"
101 | },
102 | {
103 | "output_type": "execute_result",
104 | "data": {
105 | "text/plain": "GridSearchCV(cv=5, error_score=nan,\n estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,\n class_weight=None,\n criterion='gini', max_depth=None,\n max_features='auto',\n max_leaf_nodes=None,\n max_samples=None,\n min_impurity_decrease=0.0,\n min_impurity_split=None,\n min_samples_leaf=1,\n min_samples_split=2,\n min_weight_fraction_leaf=0.0,\n n_estimators=100, n_jobs=None,\n oob_score=False, random_state=5,\n verbose=0, warm_start=False),\n iid='deprecated', n_jobs=None,\n param_grid={'criterion': ['gini'], 'max_depth': [None, 3, 5, 8],\n 'max_features': ['auto'],\n 'min_samples_split': [2, 3, 4],\n 'n_estimators': [50, 100, 200]},\n pre_dispatch='2*n_jobs', refit=True, return_train_score=False,\n scoring=None, verbose=1)"
106 | },
107 | "metadata": {},
108 | "execution_count": 73
109 | }
110 | ],
111 | "source": [
112 | "# Using Grid Search to find the best parameters\n",
113 | "param_grid = { \n",
114 | " 'n_estimators': [50, 100, 200],\n",
115 | " 'max_features': ['auto'],  # same as 'sqrt' for classifiers; 'auto' was removed in scikit-learn 1.3\n",
116 | " 'max_depth' : [None,3,5,8],\n",
117 | " 'criterion' :['gini'],\n",
118 | " 'min_samples_split':[2,3,4]\n",
119 | "}\n",
120 | "\n",
121 | "# Train RF models with 5-fold cross-validation\n",
122 | "rf_models = GridSearchCV(RandomForestClassifier(random_state=5), param_grid=param_grid, cv=5, verbose=1)\n",
123 | "rf_models.fit(X_train, y_train)"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 74,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "output_type": "stream",
133 | "name": "stdout",
134 | "text": "Accuracy: 0.8707482993197279\n"
135 | }
136 | ],
137 | "source": [
138 | "# Get the predictions\n",
139 | "predictions = rf_models.predict(X_test)\n",
140 | "\n",
141 | "# Print the model accuracy: how often is the classifier correct?\n",
142 | "print(\"Accuracy:\", metrics.accuracy_score(y_test, predictions))"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 101,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "output_type": "execute_result",
152 | "data": {
153 | "text/plain": "",
154 | "text/html": "<table>\n<thead>\n<tr><th></th><th>features</th><th>importance</th></tr>\n</thead>\n<tbody>\n<tr><th>3</th><td>Recency</td><td>8.475428</td></tr>\n<tr><th>37</th><td>AcceptedCmpOverall</td><td>8.151117</td></tr>\n<tr><th>24</th><td>Customer_Days</td><td>7.842309</td></tr>\n<tr><th>0</th><td>Income</td><td>5.412607</td></tr>\n<tr><th>35</th><td>MntTotal</td><td>5.098285</td></tr>\n<tr><th>4</th><td>MntWines</td><td>4.935032</td></tr>\n<tr><th>36</th><td>MntRegularProds</td><td>4.713860</td></tr>\n<tr><th>6</th><td>MntMeatProducts</td><td>4.602892</td></tr>\n<tr><th>18</th><td>AcceptedCmp1</td><td>4.353262</td></tr>\n<tr><th>23</th><td>Age</td><td>4.002047</td></tr>\n</tbody>\n</table>"
155 | },
156 | "metadata": {},
157 | "execution_count": 101
158 | }
159 | ],
160 | "source": [
161 | "# Print Feature Importance\n",
162 | "feature_importance = pd.DataFrame(data={\"features\":X_test.columns, \"importance\":rf_models.best_estimator_.feature_importances_*100})\n",
163 | "feature_importance.sort_values('importance', ascending=False).head(10).style.background_gradient(cmap='coolwarm', low=1, high=0)"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "metadata": {},
170 | "outputs": [],
171 | "source": []
172 | }
173 | ],
174 | "metadata": {
175 | "language_info": {
176 | "codemirror_mode": {
177 | "name": "ipython",
178 | "version": 3
179 | },
180 | "file_extension": ".py",
181 | "mimetype": "text/x-python",
182 | "name": "python",
183 | "nbconvert_exporter": "python",
184 | "pygments_lexer": "ipython3",
185 | "version": "3.7.6-final"
186 | },
187 | "orig_nbformat": 2,
188 | "kernelspec": {
189 | "name": "python3",
190 | "display_name": "Python 3"
191 | }
192 | },
193 | "nbformat": 4,
194 | "nbformat_minor": 2
195 | }
--------------------------------------------------------------------------------
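The notebook reports accuracy, while the case description asks for the highest campaign profit. A minimal sketch of a profit-based evaluation follows, using only the figures quoted in the description (2.240 contacts, 6.720MU cost, 3.674MU revenue). The revenue of 11 MU per acceptance is an assumption: the description gives only the total, and 3674 / 11 = 334 acceptances is about 15% of 2240. The function names `break_even_probability` and `campaign_profit` are hypothetical and do not appear in the notebook.

```python
# Figures quoted in the case description (MU = monetary units).
CONTACTED = 2240
TOTAL_COST = 6720.0
TOTAL_REVENUE = 3674.0

# Cost of one phone contact, derived from the totals above: 6720 / 2240.
COST_PER_CONTACT = TOTAL_COST / CONTACTED

# ASSUMPTION: revenue per accepted offer. Not stated in the description;
# 11 MU is chosen because 3674 / 11 = 334 acceptances (about 15% of 2240).
REVENUE_PER_ACCEPTANCE = 11.0


def break_even_probability(revenue_per_acceptance, cost_per_contact=COST_PER_CONTACT):
    """Acceptance probability at which the expected profit of one contact is zero."""
    return cost_per_contact / revenue_per_acceptance


def campaign_profit(y_true, y_prob, threshold, revenue_per_acceptance,
                    cost_per_contact=COST_PER_CONTACT):
    """Profit when only customers with predicted probability >= threshold are contacted."""
    profit = 0.0
    for accepted, prob in zip(y_true, y_prob):
        if prob >= threshold:
            # Every contact costs money; only acceptances (label 1) bring revenue.
            profit += revenue_per_acceptance * accepted - cost_per_contact
    return profit


# Sanity check against the pilot: contacting everyone (threshold 0) with
# 334 acceptances among 2240 customers reproduces the reported loss.
pilot_labels = [1] * 334 + [0] * (CONTACTED - 334)
pilot_probs = [1.0] * CONTACTED
pilot_profit = campaign_profit(pilot_labels, pilot_probs, 0.0, REVENUE_PER_ACCEPTANCE)

print("Cost per contact:", COST_PER_CONTACT)
print("Break-even probability:", round(break_even_probability(REVENUE_PER_ACCEPTANCE), 4))
print("Pilot profit when contacting everyone:", pilot_profit)
```

In the notebook, the probabilities could come from `rf_models.predict_proba(X_test)[:, 1]` with `y_test` as the labels. Under the assumed revenue, contacting a customer only pays off in expectation when the predicted acceptance probability exceeds about 0.27, so comparing `campaign_profit` across a few thresholds may be more informative than accuracy alone.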