├── dictionary.png
├── data_distributions.png
├── Business presentation.pdf
├── iFood Data Analyst Case.pdf
├── README.md
└── classification_model.ipynb

/dictionary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/dictionary.png
--------------------------------------------------------------------------------
/data_distributions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/data_distributions.png
--------------------------------------------------------------------------------
/Business presentation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/Business presentation.pdf
--------------------------------------------------------------------------------
/iFood Data Analyst Case.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nailson/ifood-data-business-analyst-test/HEAD/iFood Data Analyst Case.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ifood-data-business-analyst-test
2 | ## last update 19/02/2020
3 |

4 | This case is used for hiring Data Analysts for the iFood Brain team. Instructions are in the pdf file.

5 |
6 | If you are interested in joining our team, or getting to know more about iFood and our team, feel free to send an e-mail to "ifoodbrain_hiring@ifood.com.br".

7 |
--------------------------------------------------------------------------------
/classification_model.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# iFood CRM Data Analyst Case - Part 2\n",
8 | "\n",
9 | "### Description\n",
10 | "\n",
11 | "The objective of the team is to build a predictive model that will produce the highest profit for the\n",
12 | "next direct marketing campaign, scheduled for the next month. \n",
13 | "The new campaign, the sixth, aims at\n",
14 | "selling a new gadget to the Customer Database. \n",
15 | "To build the model, a pilot campaign involving **2.240 customers** was carried out. \n",
16 | "The customers were selected at random and contacted by phone regarding the acquisition of the gadget. \n",
17 | "During the following months, customers who bought the offer were properly labeled. \n",
18 | "The total cost of the sample campaign was 6.720MU and the revenue generated by the customers who accepted the offer was 3.674MU. \n",
19 | "Globally the campaign had a profit of -3.046MU. \n",
20 | "The success rate of the campaign was 15%. \n",
21 | "\n",
22 | "The objective of the team is to develop a model that predicts customer behavior and to apply it to the rest of the customer base.\n",
23 | "Hopefully the model will allow the company to cherry pick the customers that are most likely to\n",
24 | "purchase the offer while leaving out the non-respondents, making the next campaign highly\n",
25 | "profitable. Moreover, other than maximizing the profit of the campaign, the CMO is interested in\n",
26 | "understanding the characteristic features of those customers who are willing to buy the\n",
27 | "gadget.\n",
28 | "\n",
29 | "### Key Objectives are:\n",
30 | "\n",
31 | "1. Explore the data – don’t just plot means and counts. Provide insights, define cause and\n",
32 | "effect. Provide a better understanding of the characteristic features of respondents;\n",
33 | "2. Propose and describe a customer segmentation based on customers' behaviors;\n",
34 | "3. Create a predictive model which allows the company to maximize the profit of the next\n",
35 | "marketing campaign.\n",
36 | "4. Whatever else you think is necessary.\n",
37 | "\n",
38 | "### Deliverables:\n",
39 | "\n",
40 | "1. Data Exploration;\n",
41 | "2. Segmentation;\n",
42 | "3. Classification Model;\n",
43 | "4. A short business presentation.\n",
44 | "\n",
45 | "### Data Dictionary and Notes\n",
46 | "\n",
47 | "At the bottom of the notebook\n",
48 | "\n",
49 | "---"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 102,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "import pandas as pd\n",
59 | "import numpy as np\n",
60 | "import seaborn as sns\n",
61 | "import matplotlib.pyplot as plt\n",
62 | "from sklearn.model_selection import train_test_split\n",
63 | "from sklearn.ensemble import RandomForestClassifier\n",
64 | "from sklearn.model_selection import GridSearchCV\n",
65 | "from sklearn import metrics"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 72,
71 | "metadata": {},
72 | "outputs": [],
73 | "source": [
74 | "# Read the dataset\n",
75 | "ifood_df = pd.read_csv('ifood_df.csv')\n",
76 | "\n",
77 | "# Split the dataset into features and labels\n",
78 | "features = ifood_df.drop('Response', axis=1)\n",
79 | "labels = ifood_df.Response\n",
80 | "\n",
81 | "# Split the dataset into a training set (60%) and a test set (40%)\n",
82 | "X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.40, random_state=5)"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "metadata": {},
88 | "source": [
89 | "## Generating the Model \n"
90 | ]
91 | },
92 | {
93 | "cell_type": "code",
94 | "execution_count": 73,
95 | "metadata": {},
96 | "outputs": [
97 | {
98 | "output_type": "stream",
99 | "name": "stdout",
100 | "text": "Fitting 5 folds for each of 36 candidates, totalling 180 fits\n[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n[Parallel(n_jobs=1)]: Done 180 out of 180 | elapsed: 1.7min finished\n"
101 | },
102 | {
103 | "output_type": "execute_result",
104 | "data": {
105 | "text/plain": "GridSearchCV(cv=5, error_score=nan,\n estimator=RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,\n class_weight=None,\n criterion='gini', max_depth=None,\n max_features='auto',\n max_leaf_nodes=None,\n max_samples=None,\n min_impurity_decrease=0.0,\n min_impurity_split=None,\n min_samples_leaf=1,\n min_samples_split=2,\n min_weight_fraction_leaf=0.0,\n n_estimators=100, n_jobs=None,\n oob_score=False, random_state=5,\n verbose=0, warm_start=False),\n iid='deprecated', n_jobs=None,\n param_grid={'criterion': ['gini'], 'max_depth': [None, 3, 5, 8],\n 'max_features': ['auto'],\n 'min_samples_split': [2, 3, 4],\n 'n_estimators': [50, 100, 200]},\n pre_dispatch='2*n_jobs', refit=True, return_train_score=False,\n scoring=None, verbose=1)"
106 | },
107 | "metadata": {},
108 | "execution_count": 73
109 | }
110 | ],
111 | "source": [
112 | "# Use grid search to find the best hyperparameters\n",
113 | "param_grid = {\n",
114 | "    'n_estimators': [50, 100, 200],\n",
115 | "    'max_features': ['auto'],\n",
116 | "    'max_depth': [None, 3, 5, 8],\n",
117 | "    'criterion': ['gini'],\n",
118 | "    'min_samples_split': [2, 3, 4]\n",
119 | "}\n",
120 | "\n",
121 | "# Train the RF models with 5-fold cross-validation\n",
122 | "rf_models = GridSearchCV(RandomForestClassifier(random_state=5), param_grid=param_grid, cv=5, verbose=1)\n",
123 | "rf_models.fit(X_train, y_train)"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 74,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "output_type": "stream",
133 | "name": "stdout",
134 | "text": "Accuracy: 0.8707482993197279\n"
135 | }
136 | ],
137 | "source": [
138 | "# Get the predictions\n",
139 | "predictions = rf_models.predict(X_test)\n",
140 | "\n",
141 | "# Print the model accuracy: how often is the classifier correct?\n",
142 | "print(\"Accuracy:\", metrics.accuracy_score(y_test, predictions))"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 101,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "output_type": "execute_result",
152 | "data": {
153 | "text/plain": "",
154 | "text/html": "

	features	importance
3	Recency	8.475428
37	AcceptedCmpOverall	8.151117
24	Customer_Days	7.842309
0	Income	5.412607
35	MntTotal	5.098285
4	MntWines	4.935032
36	MntRegularProds	4.713860
6	MntMeatProducts	4.602892
18	AcceptedCmp1	4.353262
23	Age	4.002047

"
155 | },
156 | "metadata": {},
157 | "execution_count": 101
158 | }
159 | ],
160 | "source": [
161 | "# Print feature importance (scaled to percentages)\n",
162 | "feature_importance = pd.DataFrame(data={\"features\": X_test.columns, \"importance\": rf_models.best_estimator_.feature_importances_*100})\n",
163 | "feature_importance.sort_values('importance', ascending=False).head(10).style.background_gradient(cmap='coolwarm', low=1, high=0)"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "metadata": {},
170 | "outputs": [],
171 | "source": []
172 | }
173 | ],
174 | "metadata": {
175 | "language_info": {
176 | "codemirror_mode": {
177 | "name": "ipython",
178 | "version": 3
179 | },
180 | "file_extension": ".py",
181 | "mimetype": "text/x-python",
182 | "name": "python",
183 | "nbconvert_exporter": "python",
184 | "pygments_lexer": "ipython3",
185 | "version": "3.7.6-final"
186 | },
187 | "orig_nbformat": 2,
188 | "kernelspec": {
189 | "name": "python3",
190 | "display_name": "Python 3"
191 | }
192 | },
193 | "nbformat": 4,
194 | "nbformat_minor": 2
195 | }
--------------------------------------------------------------------------------
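
Note on evaluating the model against the stated objective: the case description asks for a model that maximizes campaign profit, while the notebook reports only accuracy. With a success rate of about 15%, a classifier that never predicts a purchase would already score roughly 85% accuracy, so accuracy alone says little about profit. The sketch below shows one possible profit-based evaluation. It assumes a flat cost of 3 MU per contact (6720 MU / 2240 customers, from the pilot figures) and a revenue of 11 MU per acceptance. The revenue figure is an assumption, obtained by dividing 3674 MU by roughly 334 responders, a count the case does not state. `campaign_profit` and `best_threshold` are hypothetical helper names, not part of the notebook.

```python
import numpy as np

# Unit economics derived from the pilot figures in the case description.
COST_PER_CONTACT = 6720 / 2240   # 3.0 MU per customer contacted
REVENUE_PER_ACCEPT = 11.0        # assumption: 3674 MU / ~334 responders


def campaign_profit(y_true, contacted):
    """Profit in MU when only the customers flagged in `contacted` are called."""
    y_true = np.asarray(y_true, dtype=bool)
    contacted = np.asarray(contacted, dtype=bool)
    n_contacted = contacted.sum()
    n_accepted = (y_true & contacted).sum()
    return float(n_accepted * REVENUE_PER_ACCEPT - n_contacted * COST_PER_CONTACT)


def best_threshold(y_true, proba, thresholds=np.linspace(0.05, 0.95, 19)):
    """Return the (threshold, profit) pair with the highest profit on a grid of cut-offs."""
    proba = np.asarray(proba, dtype=float)
    # A customer is contacted when the predicted purchase probability reaches the cut-off.
    profits = [campaign_profit(y_true, proba >= t) for t in thresholds]
    best = int(np.argmax(profits))
    return float(thresholds[best]), profits[best]
```

With the fitted grid search from the notebook, `proba = rf_models.predict_proba(X_test)[:, 1]` followed by `best_threshold(y_test, proba)` would give the cut-off with the highest profit on the test split; choosing the cut-off on a separate validation split would give a less optimistic estimate. Under the assumptions above, contacting all 2240 pilot customers with 334 responders reproduces the reported loss of -3046 MU.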