└── README.md


/README.md:
--------------------------------------------------------------------------------
## Overview
Create a model to predict the result of a game between team 1 and team 2, based on which team is home and which is away, and on whether the game is a friendly.

The two approaches below have been used to make predictions from the following inputs:
- Input: Home team, Away team, Tournament type (World Cup, Friendly, Other)

### Approach 1: Polynomial approach
The model should use the following features:
- Rank of home team
- Rank of away team
- Tournament type

Two models are built:
- Model 1: Predict how many goals the home team scores
- Model 2: Predict how many goals the away team scores

### Approach 2: Logistic approach
Feature engineering: from the home team’s perspective, determine whether the game is a win, loss, or draw (W, L, D)

### Context
A more detailed explanation and history of the rankings is available [here](https://en.wikipedia.org/wiki/FIFA_World_Rankings)

An explanation of the ranking procedure is available [here](https://www.fifa.com/fifa-world-ranking/procedure/men.html)

### Dataset Columns
Some features are available on the FIFA [ranking page](https://www.fifa.com/fifa-world-ranking/ranking-table/men/index.html)

- Rank
- Country Abbreviation
- Total Points
- Previous Points
- Rank Change
- Average Previous Years Points
- Average Previous Years Points Weighted (50%)
- Average 2 Years Ago Points
- Average 2 Years Ago Points Weighted (30%)
- Average 3 Years Ago Points
- Average 3 Years Ago Points Weighted (20%)
- Confederation
- Date - date of the match
- Home_team - the name of the home team
- Away_team - the name of the away team
- Home_score - full-time home team score including extra time, not including penalty-shootouts
- Away_score - full-time away team score including extra time, not including penalty-shootouts
- Tournament - the name of the tournament
- City - the name of the city/town/administrative unit where the match was played
- Country - the name of the country where the match was played
- Neutral - TRUE/FALSE column indicating whether the match was played at a neutral venue

### Deliverables
- Perform Exploratory Data Analysis
- Perform any necessary feature engineering
- Check for multicollinearity
- Build models
- Cross-validate the models
- Compute RMSE
- Create residual plots for the models, and assess their heteroscedasticity using Bartlett’s test

### Dataset
The dataset and glossary to use for this project can be found [here](https://drive.google.com/open?id=1BYUqaEEnFtAe5lvzJh9lpVpR2MAvERUc)

--------------------------------------------------------------------------------
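
Two of the steps listed in this README, the W/L/D feature for Approach 2 and the RMSE deliverable, can be sketched in plain Python. This is only a minimal illustration, not the project's actual implementation: the helper names `match_result` and `rmse` are hypothetical, and the rows and predictions are made-up values rather than data from the dataset. Only the `Home_score` and `Away_score` column names come from the column list.

```python
import math


def match_result(home_score, away_score):
    """Label a match from the home team's perspective: 'W', 'L' or 'D'."""
    if home_score > away_score:
        return "W"
    if home_score < away_score:
        return "L"
    return "D"


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example rows (NOT taken from the dataset), keyed by the
# Home_score / Away_score column names listed in this README.
matches = [
    {"Home_score": 2, "Away_score": 1},
    {"Home_score": 0, "Away_score": 0},
    {"Home_score": 1, "Away_score": 3},
]

# Approach 2 target: result as seen by the home team.
labels = [match_result(m["Home_score"], m["Away_score"]) for m in matches]
print(labels)  # ['W', 'D', 'L']

# Hypothetical predictions standing in for Model 1 (home-team goals).
actual_home_goals = [m["Home_score"] for m in matches]
predicted_home_goals = [1.5, 0.5, 1.0]
print(round(rmse(actual_home_goals, predicted_home_goals), 4))  # 0.4082
```

With the real data, the same labelling rule would be applied to every row to build the target for the logistic model, and the RMSE would be computed separately for the home-goal and away-goal models.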