├── Marketing Mix Modelling analysis.Rmd
├── Marketing_Mix_Modelling_analysis.md
├── Marketing_Mix_Modelling_analysis_files
│   └── figure-html
│       ├── Adstock-1.png
│       ├── Plot Ads-1.png
│       ├── Pricing-1.png
│       ├── Seasoning-1.png
│       ├── Sterling distribution-1.png
│       ├── sample_time_series.png
│       ├── sterling-distribution-1.png
│       ├── unnamed-chunk-10-1.png
│       ├── unnamed-chunk-11-1.png
│       ├── unnamed-chunk-12-1.png
│       ├── unnamed-chunk-13-1.png
│       ├── unnamed-chunk-14-1.png
│       ├── unnamed-chunk-15-1.png
│       ├── unnamed-chunk-20-1.png
│       ├── unnamed-chunk-21-1.png
│       ├── unnamed-chunk-3-1.png
│       ├── unnamed-chunk-4-1.png
│       ├── unnamed-chunk-5-1.png
│       ├── unnamed-chunk-6-1.png
│       └── unnamed-chunk-7-1.png
└── README.md
/Marketing Mix Modelling analysis.Rmd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing Mix Modelling analysis.Rmd
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis.md:
--------------------------------------------------------------------------------
1 | # Marketing Mix Modelling Analysis
2 | Lefteris Nikolidakis
3 | 20 March 2016
4 |
5 |
6 | In this statistical analysis I fit a multiple regression model to simulated Sales and marketing series data (i.e. Advertising, Distribution, Pricing) to estimate the impact of various marketing tactics on sales and then forecast the impact of future sets of tactics.
7 |
8 | The initial discovery of relationships is done with a training set while a test set is used for evaluating whether the discovered relationships hold.
9 |
10 | Transformation and adjustment techniques are also applied (Adstocks, log-log, adding ordinal predictors) so that the data would meet the assumptions of the statistical inference procedure and to reduce the variability of forecasts.
11 |
12 |
13 | ### Loading and description of Data
14 |
15 | Below I load the tables from the Excel file.
16 |
17 |
18 |
19 |
20 | ```r
21 | #created a pricing variable in excel equal with values/volume
22 | #create data frames
23 | require(XLConnect)
24 | wb = loadWorkbook("data.xls")
25 | MData = readWorksheet(wb, sheet = 1, header = TRUE)
26 | names(MData)[c(1,2,6,7,8,9)] <- c("Week_Ending", "Brand_Sales","Press_Spend", "HWivesTVR", "Kids_TVR","SponsorTVR")
27 | ```
28 |
29 |
30 | The dataset:
31 |
32 | **Brand Sales Volume Kg**: Cumulative sales volume in kg up to the week-ending date for a Brand's product
33 |
34 | **Brand Sales Volume £**: Cumulative sales value in £ up to the week-ending date
35 |
36 | **Brand Pricing**: Average price during the week ending on that date
37 |
38 | **Brand Sterling Wtd distribution**: Distribution of the brand's product across several interconnected intermediaries along the way, such as wholesalers, distributors, agents and retailers.
39 |
40 | **Promotion distribution**: Three types of marketing activities executed by the business/brand in interconnected intermediaries. This type of promotion affects only the distribution variable, which subsequently affects Sales.
41 |
42 | **Housewives, Kids and Sponsorship TVRs**: Three TV advertising variables that measure the total of all rating points during the brand's advertising campaign.
43 |
44 | **TV Spend £**: Money spent by the Brand/Business on TV advertising during the week ending on that date.
45 |
46 | **Press Spend £**: Money spent by the Brand/Business on Press advertising during the week ending on that date.
47 |
48 |
49 | ### Training and Test set partition
50 |
51 | In time series analysis and forecasting it is advisable to hold out a portion at the left or right end of the sample for **testing**, i.e. not to use this part during learning, but to use it to check how well the forecasting model predicts our data. The rest of the sample is the **training set**, which is used to build the model.
52 |
53 | Example: Time series Training-Test set partition
54 |
55 |
56 |
57 | In our dataset the training-set will be all data from `2004` to `2006`, and the test-set the `2007` data.
58 |
59 |
60 | ```r
61 | TrainData <- MData[4:160,]
62 | TestData <- MData[161:length(MData[,1]),]
63 | ```
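
The row indices above depend on the layout of the spreadsheet. As a minimal sketch of the same idea, the split can also be expressed as a date cutoff. The data frame `demo`, its values and the cutoff date are assumptions made for illustration; they are not taken from the original data.

```r
# Synthetic weekly data standing in for MData (hypothetical values).
set.seed(1)
weeks <- seq(as.Date("2004-01-04"), as.Date("2007-12-30"), by = "week")
demo <- data.frame(Week_Ending = weeks,
                   Brand_Sales = rnorm(length(weeks), mean = 20000, sd = 3000))

# Chronological split: rows before the cutoff train the model, the rest test it.
cutoff <- as.Date("2007-01-01")
train <- demo[demo$Week_Ending <  cutoff, ]
test  <- demo[demo$Week_Ending >= cutoff, ]

# Every row lands in exactly one set, and the test set is strictly later in time.
stopifnot(nrow(train) + nrow(test) == nrow(demo))
stopifnot(max(train$Week_Ending) < min(test$Week_Ending))
```

Splitting by date rather than at random keeps future observations out of the training data, which is the point of a time series hold-out.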
64 |
65 |
66 | ### Brand's Sales Performance
67 |
68 | I will first check how Sales behave over time and their distribution:
69 |
70 |
71 |
72 |
73 | ```r
74 | library(ggplot2)
75 | require(gridExtra)
76 | plot1 <- ggplot(TrainData, aes(x=Week_Ending, y=Brand_Sales)) + geom_line() + geom_smooth(method=lm) + ggtitle("Time Series plot and Trend for the Sales")
77 | trendfit <- lm(Brand_Sales ~ Week_Ending, TrainData)
78 | plot2 <- ggplot(TrainData, aes(Brand_Sales)) + geom_histogram(bins=25) + ggtitle("Histogram for Sales")
79 | grid.arrange(plot1, plot2, widths = c(3,2) , heights=2 , ncol=2, nrow=1)
80 | ```
81 |
82 | 
83 |
84 | - The histogram shows that our response variable is **right skewed**, so some transformations are needed for the data to meet the regression assumptions.
85 |
86 | - The time series plot also shows an increasing **trend** over time.
87 |
88 |
89 |
90 |
91 | ```r
92 | corMatrix <-cor(TrainData[,c("Brand_Sales","Week_ending", "Prices", "Distribution", "HWivesTVR", "Kids_TVR", "SponsorTVR", "Press_Spend")])
93 | library(plotrix)
94 | color2D.matplot(corMatrix, show.values = 2, axes=FALSE, xlab="",ylab="", main="Association matrix for all Variables")
95 | axis(1,at=c(seq(from=0.5,to=7.5,by=1)),labels=colnames(corMatrix), cex.axis=0.55)
96 | axis(2,at=c(seq(from=0.5,to=7.5,by=1)),labels=rev(colnames(corMatrix)), las=2, cex.axis=0.55)
97 | ```
98 |
99 | 
100 |
101 | From the correlation matrix we can notice that:
102 |
103 | - The Sales variable is strongly correlated with **Prices**, **Distribution** and **Time** (Week_ending).
104 |
105 | - All correlation coefficients between **Advertising** and Sales are low. This is expected since the ads' carry-over effect has not yet been considered.
106 |
107 | - Distribution and Time are highly correlated, hence we suspect that the observed trend is possibly causally related to the brand's Distribution.
108 |
109 |
110 | ### Pricing
111 |
112 | The plot below shows that price changes of the 9th brand variant have a strong negative impact on sales. The relationship also appears to be monomial, of the form $y = ax^k$, rather than linear, so I transformed the x variable accordingly:
113 |
114 |
115 | ```r
116 | ##use log-log transformation to calculate the power (k) of the monomial:
117 | fit<-lm(log(Brand_Sales)~log(Prices), data=TrainData)
118 | ##the fitted coefficients are b0=17.8686 (intercept) and k=-3.7977 (slope)
119 | ##That means that the relationship is
120 | ## log(y)=17.8686 -3.7977*log(x) <=>
121 | ## y = e^(17.8686 -3.7977*log(x)) <=>
122 | ## y = e^17.8686 * e^log(x^(-3.7977)) <=>
123 | ## y = 57575057 * x^(-3.7977)
124 | fit<-lm(Brand_Sales ~ 0 + I(Prices^(-3.7977)), data=TrainData)
125 | ## the coefficient estimated with the monomial formula is almost equal with the log-log formula
126 |
127 | ggplot(TrainData, aes(x=Prices, y=(Brand_Sales))) + geom_point() + geom_smooth(method=lm, formula=y~I(x^(-3.7977))) + xlab("Prices in £") + ylab("Sales Volume") +ggtitle("Fitted line in Sales vs Prices")
128 | ```
129 |
130 | 
131 |
132 |
133 | ### Seasonality
134 |
135 | The plot below shows that sales increase drastically over the **Christmas** period each year: they rise in the weeks starting from the 18th to the 28th of December, and fall in the weeks starting from the 29th of December to the 3rd of January.
136 |
137 | There is also a strong seasonal effect over the **Sales period** of September. Nevertheless, the respective pricing seasonal graph clearly shows that the September seasonality is explained quite well by the `Prices` variable, and therefore we will ignore it.
138 |
139 | 
140 |
141 | This Seasonality can be estimated by inserting a **three level** ordinal predictor variable in the model:
142 |
143 | 
144 |
145 | ```
146 | ## Analysis of Variance Table
147 | ##
148 | ## Response: TrainData$Brand_Sales
149 | ## Df Sum Sq Mean Sq F value Pr(>F)
150 | ## TrainData$Season 3 1.8492e+10 6163912092 470.87 < 2.2e-16 ***
151 | ## Residuals 154 2.0159e+09 13090411
152 | ## ---
153 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
154 | ```
155 |
156 | From the graph and the ANOVA p-value, we conclude that the ordinal variable explains the seasonality in sales quite well, so we are confident in using it in our marketing mix model.
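
The chunk that builds `Season` is not shown in this document. The sketch below is one way such a variable could be derived from week start dates, using the date windows described above. The level coding (-1 for the post-Christmas dip, 0 for normal weeks, 1 for the Christmas peak) is an assumption, as are the example dates.

```r
# Hypothetical week start dates around one Christmas period.
week_start <- seq(as.Date("2005-11-28"), as.Date("2006-01-16"), by = "week")

# Month and day as an integer, e.g. 19 December -> 1219, 2 January -> 102.
md <- as.integer(format(week_start, "%m%d"))

peak <- md >= 1218 & md <= 1228   # weeks starting 18-28 December
dip  <- md >= 1229 | md <= 103    # weeks starting 29 December - 3 January

# Assumed coding: -1 = dip, 0 = normal week, 1 = Christmas peak.
Season <- factor(ifelse(peak, 1, ifelse(dip, -1, 0)), levels = c(-1, 0, 1))
data.frame(week_start, Season)
```

With `-1` as the first level, `lm()` treats the dip weeks as the baseline, which would produce coefficient names of the form `Season0` and `Season1`.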
157 |
158 | ### Distribution
159 |
160 | The scatterplot below (2nd plot) shows that `Distribution` significantly affects the Brand's Sales. Also, assuming that the relationship between the two variables is linear, the fitted line's slope is approximately equal to the **trend's slope** coefficient, which confirms our suspicion that the trend we identified in the Sales volume is explained by the Distribution variable.
161 |
162 |
163 | ```r
164 | ## Plot Sales against distribution
165 | plot1 <- ggplot(TrainData, aes(x=Week_Ending, y=Distribution)) + geom_line() + xlab("Dates") + ylab("Distribution") + ggtitle("Distribution over Time") + geom_smooth(method=lm)
166 |
167 | plot2 <- ggplot(TrainData, aes(x=Distribution, y=Brand_Sales)) + geom_point() + geom_smooth(method=lm) +
168 | xlab("Distribution") +
169 | ylab("Sales Volume") + ggtitle("Sales-Distribution Plot") + scale_y_continuous(labels=format_si())
170 |
171 | grid.arrange(plot1, plot2, widths = c(3,2) , heights=2 , ncol=2, nrow=1)
172 | ```
173 |
174 |
175 | 
176 |
177 |
178 | #### Ongoing Marketing mix modeling
179 |
180 | Before testing the media variables I will fit and check the model with the variables we explored till now:
181 |
182 | Response Variable:
183 |
184 | * Brand's Sales Volume (Continuous)
185 |
186 | Predictor Variables:
187 |
188 | 1. Price of the product (Continuous)
189 |
190 | 2. Distribution of products (Continuous)
191 |
192 | 3. Seasonality (Ordinal)
193 |
194 |
195 |
196 | ```r
197 | OnGoing <- lm(Brand_Sales ~ I(Prices^(-3.7977)) + Distribution + Season, data=TrainData)
198 | summary(OnGoing)
199 | ```
200 |
201 | ```
202 | ##
203 | ## Call:
204 | ## lm(formula = Brand_Sales ~ I(Prices^(-3.7977)) + Distribution +
205 | ## Season, data = TrainData)
206 | ##
207 | ## Residuals:
208 | ## Min 1Q Median 3Q Max
209 | ## -3215.1 -852.4 89.1 778.0 3299.2
210 | ##
211 | ## Coefficients:
212 | ## Estimate Std. Error t value Pr(>|t|)
213 | ## (Intercept) -8.753e+03 8.133e+02 -10.763 < 2e-16 ***
214 | ## I(Prices^(-3.7977)) 5.389e+07 2.333e+06 23.093 < 2e-16 ***
215 | ## Distribution 1.370e+02 9.823e+00 13.948 < 2e-16 ***
216 | ## Season0 2.898e+03 6.430e+02 4.507 1.3e-05 ***
217 | ## Season1 1.209e+04 9.631e+02 12.555 < 2e-16 ***
218 | ## ---
219 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
220 | ##
221 | ## Residual standard error: 1260 on 152 degrees of freedom
222 | ## Multiple R-squared: 0.8983, Adjusted R-squared: 0.8956
223 | ## F-statistic: 335.6 on 4 and 152 DF, p-value: < 2.2e-16
224 | ```
225 |
226 | All p-values for the regression coefficients are smaller than the significance level α=0.05.
227 |
228 | Also, the Adjusted R-squared for the current model is equal to 0.896, which indicates that
229 | **89.6%** of the variation in `Brand_Sales` is explained by the current model.
230 |
231 | ### Media
232 |
233 | TV advertising affects present and future sales due to its carry-over effect, whereby the impact of advertising on sales can occur during a subsequent time period. This aspect can be controlled with an adstock transformation, which measures the decaying and diminishing returns effect of advertising throughout the weeks.
234 |
235 | Assuming that the transformed ad variable depends linearly only on its own previous value, I will use the autoregressive (AR1) formula for calculating the advertising adstocks:
236 |
237 | $$A_t = X_t + \lambda A_{t-1}$$
238 |
239 | where $A_t$ is the transformed advertising value (adstock) in time period $t$, $X_t$ is the value of the advertising variable at time $t$, $A_{t-1}$ is the adstock in the previous period $t-1$, and $\lambda$ is the adstock rate, a coefficient derived through the least squares method.
240 |
241 | The advertising variables with carry-over effects that will be transformed to adstocks, are _Housewives_ and _Kids_ and _Sponsorships_ TVRs.
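
As a small worked example of the recursion, the sketch below applies an adstock rate of 0.5 to a made-up TVR series. The helper name `adstock` and the input values are assumptions for illustration. It relies on `stats::filter()` with `method = "recursive"`, which computes each output as the current input plus the rate times the previous output.

```r
# Hypothetical helper: geometric adstock with a single decay rate.
# stats::filter is called with its namespace because dplyr masks filter().
adstock <- function(x, rate) {
  as.numeric(stats::filter(x, filter = rate, method = "recursive"))
}

tvr <- c(100, 0, 0, 50, 0)   # made-up weekly TVRs
adstock(tvr, rate = 0.5)
# 100.00  50.00  25.00  62.50  31.25
```

The burst of 100 TVRs in week 1 is still worth 25 in week 3, and the second burst in week 4 adds to what remains of the first (50 + 0.5 * 25 = 62.5).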
242 |
243 |
244 | ```r
245 | Sales <- TrainData$Brand_Sales
246 | HWivesTVR <- TrainData$HWivesTVR
247 | Kids_TVR <- TrainData$Kids_TVR
248 | SponsorTVR <- TrainData$SponsorTVR
249 |
250 | ggplot(TrainData, aes(Week_Ending)) +
251 | geom_line(aes(y = HWivesTVR, colour = "HWivesTVR")) +
252 | geom_line(aes(y = Kids_TVR, colour = "Kids_TVR")) +
253 | geom_line(aes(y = SponsorTVR, colour = "SponsorTVR")) + ylab("Advertising TVRs") +
254 | ggtitle("Ads' TVRs over time")
255 | ```
256 |
257 | 
258 |
259 | The Housewives and Kids TVRs are highly correlated with each other and also from the correlation matrix we can see that both affect Sales approximately equally (cor coefficients≈0.2). So since they are both measured in the same unit (TVR), I will merge them to one variable by adding each week's respective values of the two ads.
260 |
261 |
262 | ```r
263 | Kids_and_Housewives <- HWivesTVR+Kids_TVR
264 | TrainData["Kids_and_Housewives"] <- Kids_and_Housewives
265 | ```
266 |
267 |
268 |
269 | #### Adstock transformation:
270 |
271 | First I will find the optimum adstock rate for the transformation. The approach used here is to fit separate regression models for a range of candidate adstock rates; the optimum rate is the one that yields the minimum MSE and the largest R².
272 |
273 |
274 | ```r
275 | ###Create adstocks for Kids_and_Housewives
276 |
277 | ##First find best adstock rate
278 | AdstockRate <- seq(0.1, 1, 0.01)
279 | TrainData$AdstockedKids_and_Housewives = numeric(length(Kids_and_Housewives))
280 | TrainData$AdstockedKids_and_Housewives[1] = Kids_and_Housewives[1]
281 | comb <- data.frame(AdstockRate, sigmas = rep(NA, times = length(AdstockRate)), r.squared = rep(NA, times = length(AdstockRate)))
282 |
283 | for (i in 1:length(AdstockRate)){
284 | for(j in 2:length(Kids_and_Housewives)){
285 | TrainData$AdstockedKids_and_Housewives[j] = Kids_and_Housewives[j] + AdstockRate[i] * TrainData$AdstockedKids_and_Housewives[j-1]
286 | #each advertising value (volume) is transformed to equal the value itself plus a fraction
287 | #of the previous transformed value.
288 | }
289 | modFit = lm(Brand_Sales ~ I(Prices^(-3.7977)) + Distribution + Season + AdstockedKids_and_Housewives, data=TrainData)
290 | comb[i,2] = summary(modFit)$sigma
291 | comb[i,3] = summary(modFit)$r.squared
292 | }
293 |
294 | ##check if min MSE is accompanied with the highest R Squared coefficient of determination.
295 | all.equal(comb[comb$sigmas == min(comb$sigmas),1], comb[comb$r.squared == max(comb$r.squared),1])
296 | ##the optimal Adstock Rate
297 | fitted_AdRate <- comb[comb$sigmas == min(comb$sigmas),1]
298 |
299 | for(j in 2:length(Kids_and_Housewives)){
300 | TrainData$AdstockedKids_and_Housewives[j] = Kids_and_Housewives[j] + fitted_AdRate * TrainData$AdstockedKids_and_Housewives[j-1]
301 | #each advertising value (volume) is transformed to equal the value itself plus a fraction
302 | #of the previous transformed value.
303 | }
304 |
305 | ggplot(TrainData, aes(Week_Ending)) +
306 | geom_line(aes(y = Kids_and_Housewives, colour = "Kids_and_Housewives")) +
307 | geom_line(aes(y = AdstockedKids_and_Housewives, colour = "AdstockedKids_and_Housewives")) + ylab("Advertising TVRs") + ggtitle("Adstock vs. Advertising over time")
308 | ```
309 |
310 | 
311 |
312 |
313 |
314 | Comparing the two lines in the graph shows how adstock captures the memory effect of advertising, carried over from the start of each advertising burst.
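
The grid search over adstock rates can also be written compactly as a function applied to each candidate rate. The sketch below does this on simulated data, where the true rate is known, to check that the search recovers it. All values (`true_rate`, the TVR bursts, the noise level) are assumptions for illustration, and the grid stops at 0.95 rather than 1.

```r
# Hypothetical helper: geometric adstock with a single decay rate.
adstock <- function(x, rate) as.numeric(stats::filter(x, rate, method = "recursive"))

# Simulated weekly TVRs with sporadic bursts, and sales driven by their adstock.
set.seed(7)
n <- 150
tvr <- ifelse(runif(n) < 0.2, runif(n, 50, 200), 0)
true_rate <- 0.6
sales <- 1000 + 3 * adstock(tvr, true_rate) + rnorm(n, sd = 20)

# Fit one regression per candidate rate and keep its residual standard error.
rates <- seq(0.1, 0.95, by = 0.01)
sigma <- vapply(rates,
                function(r) summary(lm(sales ~ adstock(tvr, r)))$sigma,
                numeric(1))

# The selected rate is the one with the smallest residual standard error.
best_rate <- rates[which.min(sigma)]
best_rate
```

Because every candidate model has the same response and the same number of coefficients, the rate that minimises the residual standard error is also the one that maximises R², so the two criteria always agree in this kind of search.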
315 |
316 |
317 | ```r
318 | ##The model
319 | Modfit <- lm(Brand_Sales ~ I(Prices^(-3.7977)) + Distribution + Season + AdstockedKids_and_Housewives + AdstockedSponsorTVR, data=TrainData)
320 |
321 | summary(Modfit)
322 | ```
323 |
324 | ```
325 | ##
326 | ## Call:
327 | ## lm(formula = Brand_Sales ~ I(Prices^(-3.7977)) + Distribution +
328 | ## Season + AdstockedKids_and_Housewives + AdstockedSponsorTVR,
329 | ## data = TrainData)
330 | ##
331 | ## Residuals:
332 | ## Min 1Q Median 3Q Max
333 | ## -2798.7 -812.6 33.3 848.4 3247.7
334 | ##
335 | ## Coefficients:
336 | ## Estimate Std. Error t value Pr(>|t|)
337 | ## (Intercept) -8.723e+03 7.825e+02 -11.148 < 2e-16 ***
338 | ## I(Prices^(-3.7977)) 5.290e+07 2.256e+06 23.450 < 2e-16 ***
339 | ## Distribution 1.330e+02 9.493e+00 14.008 < 2e-16 ***
340 | ## Season0 2.809e+03 6.186e+02 4.541 1.14e-05 ***
341 | ## Season1 1.224e+04 9.259e+02 13.222 < 2e-16 ***
342 | ## AdstockedKids_and_Housewives 2.629e+00 6.914e-01 3.802 0.000208 ***
343 | ## AdstockedSponsorTVR -9.115e+00 1.123e+01 -0.812 0.418302
344 | ## ---
345 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
346 | ##
347 | ## Residual standard error: 1210 on 150 degrees of freedom
348 | ## Multiple R-squared: 0.9074, Adjusted R-squared: 0.9037
349 | ## F-statistic: 245.1 on 6 and 150 DF, p-value: < 2.2e-16
350 | ```
351 |
352 | The p-value for the coefficient of the Kids/Housewives adstock is below the significance level α=0.05. On the other hand, the p-value for the coefficient of TV sponsorship is very high, so it should be excluded from the model.
353 |
354 | Also, when comparing the R² and residual standard error of the current and previous models, we can see that the adstocked variables explain only about **0.81** percentage points of extra variability in `Brand_Sales` (adjusted R² rises from 0.8956 to 0.9037) and that the residual standard error falls only slightly, from 1260 to 1210.
355 |
356 | #### Press
357 |
358 | Press advertising is described in our data only as money spent in each respective week, and most of the values are equal to zero. Hence I will replace this variable with a 3-level ordinal variable indicating the size of the investment. The coefficient for each level estimates the average increase in Sales volume when a Press investment within a specific cost range occurs, with the rest of the predictors' values held fixed.
359 |
360 | I will also assume that there is no carry-over component in Press advertising. In other words, each week's press ad investments only affect sales of that same week.
361 |
362 |
363 | ```r
364 | ##Most press efforts are equal to zero:
365 | ggplot(TrainData, aes(Press_Spend,Brand_Sales)) + geom_point() + ggtitle("Sales vs £ spent in Press Advertising")
366 | ```
367 |
368 | 
369 |
370 | ```r
371 | ##Ordinal
372 | TrainData$PressFactor <- rep(0, length(TrainData$Press_Spend))
373 | TrainData$PressFactor[TrainData$Press_Spend>0 & TrainData$Press_Spend<=10000] <- 1
374 | TrainData$PressFactor[TrainData$Press_Spend>10000] <- 2
375 | TrainData$PressFactor <- as.factor(TrainData$PressFactor)
376 |
377 | modFit <- lm(Brand_Sales ~ I(Prices^(-3.7977)) + Distribution + Season + AdstockedKids_and_Housewives + PressFactor, data=TrainData)
378 | summary(modFit)
379 | ```
380 |
381 | ```
382 | ##
383 | ## Call:
384 | ## lm(formula = Brand_Sales ~ I(Prices^(-3.7977)) + Distribution +
385 | ## Season + AdstockedKids_and_Housewives + PressFactor, data = TrainData)
386 | ##
387 | ## Residuals:
388 | ## Min 1Q Median 3Q Max
389 | ## -2731.5 -879.8 -5.0 772.6 3322.7
390 | ##
391 | ## Coefficients:
392 | ## Estimate Std. Error t value Pr(>|t|)
393 | ## (Intercept) -8.686e+03 7.910e+02 -10.981 < 2e-16 ***
394 | ## I(Prices^(-3.7977)) 5.282e+07 2.273e+06 23.242 < 2e-16 ***
395 | ## Distribution 1.321e+02 9.661e+00 13.678 < 2e-16 ***
396 | ## Season0 2.780e+03 6.210e+02 4.476 1.5e-05 ***
397 | ## Season1 1.223e+04 9.292e+02 13.161 < 2e-16 ***
398 | ## AdstockedKids_and_Housewives 2.602e+00 7.025e-01 3.703 0.000299 ***
399 | ## PressFactor1 4.309e+02 6.273e+02 0.687 0.493231
400 | ## PressFactor2 -1.544e+02 7.212e+02 -0.214 0.830725
401 | ## ---
402 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
403 | ##
404 | ## Residual standard error: 1214 on 149 degrees of freedom
405 | ## Multiple R-squared: 0.9073, Adjusted R-squared: 0.903
406 | ## F-statistic: 208.4 on 7 and 149 DF, p-value: < 2.2e-16
407 | ```
408 |
409 | The ordinal Press variable should be excluded from the model since all of its coefficients have p-values above the significance level.
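
For reference, the same three spend bands can be produced in one step with `cut()`. This is a minimal sketch on made-up spend values; the vector `press_spend` is an assumption for illustration. With the default `right = TRUE`, the intervals are (-Inf, 0], (0, 10000] and (10000, Inf), which matches the conditions used above.

```r
# Made-up weekly press spend in £, including the boundary values.
press_spend <- c(0, 0, 2500, 10000, 10001, 40000)

# Bin into: no spend (0), up to £10,000 (1), above £10,000 (2).
press_factor <- cut(press_spend,
                    breaks = c(-Inf, 0, 10000, Inf),
                    labels = c("0", "1", "2"))
table(press_factor)
```

The result is an unordered factor, like the one created with `as.factor()` above, so `lm()` uses treatment contrasts and reports one coefficient per non-zero band. Passing `ordered_result = TRUE` would instead make `lm()` use polynomial contrasts by default, which changes how the coefficients are read.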
410 |
411 | ### Final Model
412 |
413 |
414 | ```r
415 | FinModel <- lm(Brand_Sales ~ I(Prices^(-3.7977)) + Distribution + Season + AdstockedKids_and_Housewives, data=TrainData)
416 | summary(FinModel)
417 | ```
418 |
419 | ```
420 | ##
421 | ## Call:
422 | ## lm(formula = Brand_Sales ~ I(Prices^(-3.7977)) + Distribution +
423 | ## Season + AdstockedKids_and_Housewives, data = TrainData)
424 | ##
425 | ## Residuals:
426 | ## Min 1Q Median 3Q Max
427 | ## -2766.2 -875.1 -19.8 814.7 3295.1
428 | ##
429 | ## Coefficients:
430 | ## Estimate Std. Error t value Pr(>|t|)
431 | ## (Intercept) -8.761e+03 7.802e+02 -11.229 < 2e-16 ***
432 | ## I(Prices^(-3.7977)) 5.291e+07 2.254e+06 23.479 < 2e-16 ***
433 | ## Distribution 1.333e+02 9.474e+00 14.072 < 2e-16 ***
434 | ## Season0 2.792e+03 6.175e+02 4.521 1.24e-05 ***
435 | ## Season1 1.222e+04 9.246e+02 13.220 < 2e-16 ***
436 | ## AdstockedKids_and_Housewives 2.594e+00 6.893e-01 3.764 0.000239 ***
437 | ## ---
438 | ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
439 | ##
440 | ## Residual standard error: 1208 on 151 degrees of freedom
441 | ## Multiple R-squared: 0.907, Adjusted R-squared: 0.9039
442 | ## F-statistic: 294.6 on 5 and 151 DF, p-value: < 2.2e-16
443 | ```
444 |
445 |
446 | The Adjusted R² for the final model is **0.904**.
447 |
448 | Also, all p-values for the coefficients are very small and the residual standard error is *1208*.
449 |
450 | ### Model Diagnostics
451 |
452 | Before forecasting the data in the test set, we need to assess the validity of the model and examine whether the multiple regression assumptions are met.
453 |
454 | When looking at the graphical diagnostic results below we conclude that:
455 |
456 | - The variance of the residuals is approximately equal over the whole period (see Residuals Time Series plot).
457 |
458 | - The points are not fully randomly arranged over time since weak autoregressive trends are still present (see "Residuals Time Series" plot and "Autocorrelation for 2006 residuals" plot).
459 |
460 | - There are three residual points with high values (not outliers) and relatively high leverage (see "Residuals vs Leverage" plot). Two of these three values fall within the Christmas period (see "Residuals' Time series" plot), hence we should improve the explanatory power of the custom `Season` variable.
461 |
462 | - The errors are approximately normally distributed, with a slight right skewness present (see Histogram and Q-Q plot).
463 |
464 | - In the range of ten to thirteen thousand sales volume, a considerable share of the errors is negative (see "Residuals vs Fitted" diagnostic plot). To explain this pattern we should explore our dataset further or add extra predictors.
465 |
466 | 
467 |
468 | 
469 |
470 | 
471 |
472 |
473 | This final model can be used to forecast future Sales given future tactics, by plugging into the regression formula the respective values of the model's variables:
474 |
475 | 1. The time period of interest
476 |
477 | 2. Planned prices for the product in the respective period
478 |
479 | 3. The estimated product distribution for the respective period
480 |
481 | 4. TVRs of the planned investments in TV advertising
482 |
483 | 5. The respective seasonal ordinal values, if the period includes the Christmas season.
484 |
485 |
486 | ## Model Testing
487 |
488 | Here we will run our model on the unseen 2007 dataset and, by comparing the predicted with the observed Sales for 2007, determine whether the model's predictions are sufficiently accurate.
489 |
490 |
491 |
492 |
493 |
494 |
495 |
496 |
497 |
498 |
499 | ```r
500 | TestData$PredictedSales_2007 <- FinModel$coef[1] + FinModel$coef[2]*Prices_Transformed + FinModel$coef[3]*Distribution + FinModel$coef[4]*Season0 + FinModel$coef[5]*Season1 + FinModel$coef[6]*Adstocked_Kids_Housewives
501 | ```
502 |
503 | Looking at the comparison plots below, we conclude that our model forecasts future sales quite well.
504 |
505 | 
506 |
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/Adstock-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/Adstock-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/Plot Ads-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/Plot Ads-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/Pricing-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/Pricing-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/Seasoning-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/Seasoning-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/Sterling distribution-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/Sterling distribution-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/sample_time_series.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/sample_time_series.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/sterling-distribution-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/sterling-distribution-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-10-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-10-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-11-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-11-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-12-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-12-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-13-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-13-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-14-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-14-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-15-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-15-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-20-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-20-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-21-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-21-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-3-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-3-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-4-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-4-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-5-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-5-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-6-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-6-1.png
--------------------------------------------------------------------------------
/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-7-1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tab114/Marketing_Mixed_Modelling_Analysis/3b178611ed410526f4ccce3003f96b9f39cda937/Marketing_Mix_Modelling_analysis_files/figure-html/unnamed-chunk-7-1.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | In this statistical analysis I fit a multiple regression model to simulated `Sales Volume` (response variable) and marketing series data (i.e. `Advertising`, `Distribution`, `Pricing`) to estimate the impact of various marketing tactics on sales and then forecast the impact of future sets of tactics.
2 |
3 | The initial discovery of relationships is done with a training set while a test set is used for evaluating whether the discovered relationships hold.
4 |
5 | Transformation and adjustment techniques are also applied (Adstocks, log-log, adding ordinal predictors) so that the data would meet the assumptions of the statistical inference procedure and to reduce the variability of forecasts.
6 |
7 | Marketing_Mix_Modelling_analysis.md demonstrates the full analysis.
8 |
9 |
--------------------------------------------------------------------------------