├── .gitignore
├── Markov-Chain.Rproj
├── README.md
├── campaign_attribution.csv
├── img
    ├── AttributionApproach.PNG
    ├── MarkovChain.PNG
    ├── g_budgetallocation.png
    ├── g_channelperformance.png
    ├── g_markovheuristics.png
    └── markov_graph.png
├── markov_addons
    ├── README.md
    ├── markov_deciding_path_duration.r
    ├── markov_null_paths_comparison.R
    ├── markov_one_path_channels.r
    ├── markov_transition_matrix_visual.r
    └── markov_transition_simulation.r
├── markov_chain_attribution.R
├── markov_higher_order.R
├── results_visualization
    ├── campaign_attribution.csv
    └── markov_chain_visualization.R
└── sample_datasets
    ├── README.md
    ├── budget_sample_daily.csv
    ├── campaign_attribution.csv
    └── campaign_data.csv

/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 |
--------------------------------------------------------------------------------
/Markov-Chain.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Marketing-Attribution_Markov-Chain
2 | R-based marketing attribution and campaign budget optimization using Markov Chains
3 |
4 | All of the concepts briefly described below have corresponding links in the reference section. There you can find the articles and papers that I think do a particularly good job of introducing and explaining this topic.
5 |
6 | #### If you are looking for an attribution method that does not require customer paths, check out my other repository about relative weight analysis | https://github.com/MatCyt/relative-weight-analysis
7 |
8 | ### Marketing Attribution
9 |
10 | Marketing attribution tries to answer one of the key questions in the advertising world: how exactly did my campaigns and actions contribute to the results I see? In a more [formal definition](https://en.wikipedia.org/wiki/Attribution_(marketing)):
11 |
12 | > attribution is the identification of a set of user actions ("events" or "touchpoints") that contribute in some manner to a desired outcome, and then the assignment of a value to each of these events. Marketing attribution provides a level of understanding of what combination of events in what particular order influence individuals to engage in a desired behavior, typically referred to as a conversion.
13 |
14 | The original, popular approach tries to solve this problem with a set of heuristics: attributing all conversions to the last or first touchpoint (last-touch or first-touch), dividing the glory equally among channels (linear attribution), or giving more credit to the more recent ones (time decay attribution). Below you can find a short visual summary, and the last section has a reference discussing them in detail.
15 |
16 |
17 |
19 |
37 |
38 |
43 |
45 |
110 |
112 |
114 |
154 |
156 |
157 | ### Possible improvements
158 |
159 | Customer journeys that do not end in a conversion can last for dozens of days (limited by cookie lifetime) and include many different touchpoints. You may want to cut them off after X touchpoints or X days, following specific business logic.
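
One way to apply such a cut-off is sketched below in base R. It is a minimal illustration under assumptions, not part of the repository's pipeline: the `events` toy data, the `truncate_journeys()` helper and its `max_days` / `max_touches` parameters are hypothetical, and the 23-day default simply echoes the reference line drawn in `markov_addons/markov_deciding_path_duration.r`.

```r
# Hypothetical toy touchpoint log; column names follow df_split in markov_chain_attribution.R
events <- data.frame(
  cookie  = c("a", "a", "a", "a", "b", "b"),
  time    = as.POSIXct(c("2018-07-01 10:00:00", "2018-07-10 10:00:00",
                         "2018-07-28 10:00:00", "2018-07-30 10:00:00",
                         "2018-07-02 09:00:00", "2018-07-03 09:00:00"), tz = "UTC"),
  channel = c("Facebook", "Instagram", "Paid Search", "Facebook",
              "Online Video", "Paid Search"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only touchpoints within `max_days` of each cookie's
# last touch, then at most the last `max_touches` of those.
truncate_journeys <- function(df, max_days = 23, max_touches = 3) {
  df <- df[order(df$cookie, df$time), ]
  # time of the last touch per cookie, in seconds since the epoch
  last_touch <- ave(as.numeric(df$time), df$cookie, FUN = max)
  age_days <- (last_touch - as.numeric(df$time)) / 86400
  df <- df[age_days <= max_days, ]
  # position counted from the end of each journey (1 = most recent touch)
  rev_pos <- ave(seq_along(df$cookie), df$cookie,
                 FUN = function(i) rev(seq_along(i)))
  df[rev_pos <= max_touches, ]
}

trimmed <- truncate_journeys(events)
paths <- tapply(trimmed$channel, trimmed$cookie, paste, collapse = " > ")
print(paths)
```

With these toy values, cookie `a` loses its first touchpoint (29 days before its last touch), so its path becomes `Instagram > Paid Search > Facebook`.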
160 |
161 | You may want to validate the Markov results with accuracy measures based on prediction results, so that they can be compared with other allocation measurement methods.
162 |
163 | As Sergey mentions in his post, single-channel paths are undervalued by default in the current calculation method. You may want to double-check this impact and calculate Markov results separately for single-channel and multi-channel paths.
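
The undervaluation can be seen on a tiny example. The sketch below is a hand-rolled first-order removal-effect calculation in base R, written only for illustration; it does not claim to mirror the internals of `ChannelAttribution`, and the toy `paths` and the `conversion_prob()` helper are hypothetical.

```r
# Hypothetical toy paths: channel A converts alone, B and C only convert together
paths <- c(rep("A", 10), rep("B > C", 10))

# Count first-order transitions, adding (start) and (conversion) states
steps <- lapply(strsplit(paths, " > ", fixed = TRUE),
                function(p) c("(start)", p, "(conversion)"))
from <- unlist(lapply(steps, function(s) head(s, -1)))
to   <- unlist(lapply(steps, function(s) tail(s, -1)))
states <- unique(c(from, to))
counts <- table(factor(from, levels = states), factor(to, levels = states))
P <- matrix(as.numeric(counts), nrow = length(states),
            dimnames = list(states, states))
rs <- rowSums(P)
P[rs > 0, ] <- P[rs > 0, ] / rs[rs > 0]  # row-normalise into probabilities

# Probability of reaching (conversion) from (start); a removed channel
# swallows every journey that enters it.
conversion_prob <- function(P, removed = NULL) {
  transient <- setdiff(rownames(P), "(conversion)")
  Q <- P[transient, transient, drop = FALSE]
  r <- P[transient, "(conversion)"]
  if (!is.null(removed)) {
    Q[removed, ] <- 0
    Q[, removed] <- 0
    r[removed] <- 0
  }
  x <- solve(diag(length(transient)) - Q, r)
  unname(x[transient == "(start)"])
}

base_prob <- conversion_prob(P)
channels <- setdiff(states, c("(start)", "(conversion)"))
removal <- sapply(channels, function(ch) 1 - conversion_prob(P, ch) / base_prob)
attributed <- removal / sum(removal) * length(paths)  # every toy path converts
print(round(attributed, 2))
```

Channel A closes 10 of the 20 conversions on its own, yet all three channels get the same removal effect (0.5) and therefore about 6.67 conversions each.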
164 |
165 | ### Markov Chain - Links and materials
166 | Above all, if you read only one thing, make it the two posts on the Analyze Core blog by Sergey Bryl:
167 |
168 | [Part 1 - Introduction to the topic in digital marketing context](https://analyzecore.com/2016/08/03/attribution-model-r-part-1/)
169 |
170 | [Part 2 - Great R implementation of Markov Chain](https://analyzecore.com/2017/05/31/marketing-multi-channel-attribution-model-r-part-2-practical-issues/)
171 |
172 | *R libraries that can be applied to this problem*
173 | * [ChannelAttribution](https://cran.r-project.org/web/packages/ChannelAttribution/ChannelAttribution.pdf)
174 | * [markovchain](https://cran.r-project.org/web/packages/markovchain/index.html)
175 | * [clickstream](https://cran.r-project.org/web/packages/clickstream/clickstream.pdf)
176 |
177 | *Additional Resources*
178 |
179 | 1) [Heuristics Models Overview 1](https://www.snapapp.com/blog/marketing-attribution-models/)
180 | 2) [Heuristics Models Overview 2](https://www.referralsaasquatch.com/marketing-attribution/)
181 | 3) [Model Based Attribution - Overview](https://www.slideshare.net/MarketingFestival/lucie-sperkova-pioneering-multichannel-attribution-for-the-lack-of-comprehensive-solutions)
182 | 4) [Graphical introduction to Markov Chain](http://setosa.io/ev/markov-chains/)
183 | 5) [Markov Chains & Google Analytics Connection with R](https://stuifbergen.com/2016/11/conversion-attribution-markov-model-r/)
184 | 6) [Validating Markov Chains](https://amunategui.github.io/markov-chains/index.html)
185 |
186 | *Whitepapers*
187 |
188 | 1) [Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2343077)
189 | 2) [Are Web Users Really Markovian?](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.232.5927&rep=rep1&type=pdf)
190 | 3) [Modeling Online Browsing and Path Analysis Using Clickstream Data](https://www.andrew.cmu.edu/user/alm3/papers/purchase%20conversion.pdf)
191 |
192 |
--------------------------------------------------------------------------------
/campaign_attribution.csv:
--------------------------------------------------------------------------------
1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA
2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136
3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953
4 | Online Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804
5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214
6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833
7 |
--------------------------------------------------------------------------------
/img/AttributionApproach.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/AttributionApproach.PNG
--------------------------------------------------------------------------------
/img/MarkovChain.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/MarkovChain.PNG
--------------------------------------------------------------------------------
/img/g_budgetallocation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_budgetallocation.png
--------------------------------------------------------------------------------
/img/g_channelperformance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_channelperformance.png
--------------------------------------------------------------------------------
/img/g_markovheuristics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_markovheuristics.png
--------------------------------------------------------------------------------
/img/markov_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/markov_graph.png
--------------------------------------------------------------------------------
/markov_addons/README.md:
--------------------------------------------------------------------------------
1 | # Datasets description
2 | 1) Campaign dataset
3 |
4 | 2) Budget
5 | daily budget data
6 |
7 | 3) Campaign attribution
8 | for the outcome visualization mainly
9 |
--------------------------------------------------------------------------------
/markov_addons/markov_deciding_path_duration.r:
--------------------------------------------------------------------------------
1 |
2 | # Necessary background and code taken from:
3 | # https://analyzecore.com/2017/05/31/marketing-multi-channel-attribution-model-r-part-2-practical-issues/
4 |
5 | # How to set the right time window for customer journey duration.
6 | # We can visualize the day count capturing 95% of durations as in the code below.
7 |
8 | # It gets more complex if we want to build a forward-looking probabilistic model.
9 | # In that case we also need to handle the paths that have not completed by the current date
10 | # (code and explanation in the body of the article)
11 |
12 | df_multi_paths_tl <- df_split %>%
13 | group_by(cookie) %>%
14 | mutate(date = as.Date(time)) %>%
15 | summarise(path = paste(channel, collapse = ' > '),
16 | first_touch_date = min(date),
17 | last_touch_date = max(date),
18 | tot_time_lapse = round(as.numeric(last_touch_date - first_touch_date)),
19 | conversion = sum(conversion)) %>%
20 | ungroup()
21 |
22 | # distribution plot
23 | ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
24 | theme_minimal() +
25 | geom_histogram(fill = '#4e79a7', binwidth = 1)
26 |
27 | # cumulative distribution plot
28 | ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
29 | theme_minimal() +
30 | stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
31 | geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
32 | geom_vline(xintercept = 23, color = '#e15759', size = 1.5, linetype = 2)
33 |
34 |
--------------------------------------------------------------------------------
/markov_addons/markov_null_paths_comparison.R:
--------------------------------------------------------------------------------
1 | # With the Markov chain approach and the ChannelAttribution package, remember to include the paths
2 | # that did not end in a conversion. Below is a comparison of the results for both approaches, with visualized output.
3 |
4 |
5 | ## Load libraries
6 | if (!require("pacman")) install.packages("pacman")
7 | pacman::p_load(data.table, ggplot2, dplyr, ChannelAttribution)
8 |
9 |
10 | ## Markov Attribution without taking null conversion into consideration
11 | df_paths_conv = df_split %>%
12 | group_by(path_id) %>%
13 | arrange(time) %>%
14 | summarise(path = paste(channel, collapse = ">"),
15 | total_conversions = sum(conversion)) %>%
16 | ungroup()
17 |
18 |
19 | markov_attribution_conv <- markov_model(df_paths_conv,
20 | var_path = "path",
21 | var_conv = "total_conversions",
22 | var_value = NULL,
23 | var_null = NULL,
24 | out_more = TRUE)
25 |
26 |
27 |
28 | ## Markov Attribution including null conversion
29 | df_paths_null = df_split %>%
30 | group_by(path_id) %>%
31 | arrange(time) %>%
32 | summarise(path = paste(channel, collapse = ">"),
33 | total_conversions = sum(conversion)) %>%
34 | ungroup() %>%
35 | mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # adding the null column
36 |
37 |
38 | markov_attribution_null <- markov_model(df_paths_null,
39 | var_path = "path",
40 | var_conv = "total_conversions",
41 | var_value = NULL,
42 | var_null = "null_conversion", # adding the null variable
43 | out_more = TRUE)
44 |
45 |
46 | ## Comparing results
47 | conv_result = markov_attribution_conv$result
48 | null_result = markov_attribution_null$result
49 |
50 | colnames(null_result) = c("channel_name","null_included")
51 | colnames(conv_result) = c("channel_name", "null_omitted")
52 |
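53 | compare_approach =
54 | null_result %>%
55 | left_join(conv_result, by = "channel_name") # explicit join key
56 |
57 | df_gcompare = melt(as.data.table(compare_approach), id.vars = "channel_name") # data.table::melt expects a data.table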
58 |
59 | g_compare_null <- ggplot(df_gcompare, aes(x = channel_name, y = value, fill = variable)) +
60 | geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) +
61 | expand_limits(y=10000) +
62 | scale_fill_manual(labels = c("Null Included", "Null Omitted"), values = c("deepskyblue", "goldenrod1")) +
63 | theme_minimal() +
64 | theme(axis.text.x = element_text(angle = 30, hjust = 0.6)) +
65 | theme(panel.grid.major.x = element_blank()) +
66 | geom_text(aes(label = round(value, 0)),
67 | fontface = "bold", size = 3.5,
68 | vjust = -0.5, position = position_dodge(width = 0.75)) +
69 | labs(x = "", y = "Conversions") +
70 | ggtitle("Including vs omitting null conversion paths") +
71 | theme(plot.title = element_text(hjust = 0.5))
72 |
73 |
74 |
--------------------------------------------------------------------------------
/markov_addons/markov_one_path_channels.r:
--------------------------------------------------------------------------------
1 | # Compare single-channel and multi-channel paths
2 |
3 | # The Markov model underestimates the conversions brought by single-channel paths (Channel -> Conversion).
4 | # Check whether any channel has a particularly high share of single-channel paths.
5 |
6 | ##### one- and multi-channel paths #####
7 | df_path_1_clean <- df_split %>%
8 | group_by(cookie) %>%
9 | mutate(uniq_channel_tag = ifelse(length(unique(channel)) == 1, TRUE, FALSE)) %>%
10 | ungroup()
11 |
12 | df_path_1_clean_uniq <- df_path_1_clean %>%
13 | filter(uniq_channel_tag == TRUE) %>%
14 | select(-uniq_channel_tag)
15 |
16 | df_path_1_clean_multi <- df_path_1_clean %>%
17 | filter(uniq_channel_tag == FALSE) %>%
18 | select(-uniq_channel_tag)
19 |
20 | ### experiment ###
21 | # attribution model for all paths
22 | df_all_paths <- df_path_1_clean %>%
23 | group_by(cookie) %>%
24 | summarise(path = paste(channel, collapse = ' > '),
25 | conversion = sum(conversion)) %>%
26 | ungroup() %>%
27 | filter(conversion == 1)
28 |
29 | mod_attrib <- markov_model(df_all_paths,
30 | var_path = 'path',
31 | var_conv = 'conversion',
32 | out_more = TRUE)
33 | mod_attrib$removal_effects
34 | mod_attrib$result
35 | d_all <- data.frame(mod_attrib$result)
36 |
37 | # attribution model for split multi-channel and unique-channel paths
38 | df_multi_paths <- df_path_1_clean_multi %>%
39 | group_by(cookie) %>%
40 | summarise(path = paste(channel, collapse = ' > '),
41 | conversion = sum(conversion)) %>%
42 | ungroup() %>%
43 | filter(conversion == 1)
44 |
45 | mod_attrib_alt <- markov_model(df_multi_paths,
46 | var_path = 'path',
47 | var_conv = 'conversion',
48 | out_more = TRUE)
49 | mod_attrib_alt$removal_effects
50 | mod_attrib_alt$result
51 |
52 | # adding unique paths
53 | df_uniq_paths <- df_path_1_clean_uniq %>%
54 | filter(conversion == 1) %>%
55 | group_by(channel) %>%
56 | summarise(conversions = sum(conversion)) %>%
57 | ungroup()
58 |
59 | d_multi <- data.frame(mod_attrib_alt$result)
60 |
61 | d_split <- full_join(d_multi, df_uniq_paths, by = c('channel_name' = 'channel')) %>%
62 | mutate(result = rowSums(cbind(total_conversions, conversions), na.rm = TRUE)) # a channel missing from either side counts as 0
63 |
64 | sum(d_all$total_conversions)
65 | sum(d_split$result)
66 |
--------------------------------------------------------------------------------
/markov_addons/markov_transition_matrix_visual.r:
--------------------------------------------------------------------------------
1 | ### Vis 5 - Great Visualization of Markov Chain transition matrix
2 |
3 | # Code and background from:
4 | # https://analyzecore.com/2016/08/03/attribution-model-r-part-1/
5 |
6 | # transition matrix heatmap for "real" data
7 | df_plot_trans <- markov_attribution$transition_matrix
8 |
9 | cols <- c("#e7f0fa", "#c9e2f6", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e", "#eec73a",
10 | "#e29421", "#e29421", "#f05336", "#ce472e")
11 | t <- max(df_plot_trans$transition_probability)
12 |
13 | ggplot(df_plot_trans, aes(y = channel_from, x = channel_to, fill = transition_probability)) +
14 | theme_minimal() +
15 | geom_tile(colour = "white", width = .9, height = .9) +
16 | scale_fill_gradientn(colours = cols, limits = c(0, t),
17 | breaks = seq(0, t, by = t/4),
18 | labels = c("0", round(t/4*1, 2), round(t/4*2, 2), round(t/4*3, 2), round(t/4*4, 2)),
19 | guide = guide_colourbar(ticks = T, nbin = 50, barheight = .5, label = T, barwidth = 10)) +
20 | geom_text(aes(label = round(transition_probability, 2)), fontface = "bold", size = 4) +
21 | theme(legend.position = 'bottom',
22 | legend.direction = "horizontal",
23 | panel.grid.major = element_blank(),
24 | panel.grid.minor = element_blank(),
25 | plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8),
26 | axis.title.x = element_text(size = 24, face = "bold"),
27 | axis.title.y = element_text(size = 24, face = "bold"),
28 | axis.text.y = element_text(size = 8, face = "bold", color = 'black'),
29 | axis.text.x = element_text(size = 8, angle = 90, hjust = 0.5, vjust = 0.5, face = "plain")) +
30 | ggtitle("Transition matrix heatmap")
31 |
--------------------------------------------------------------------------------
/markov_addons/markov_transition_simulation.r:
--------------------------------------------------------------------------------
1 | # Building a small simulation of states after n steps on markov chain
2 |
3 | # Code and additional background originally from:
4 | # https://analyzecore.com/2016/08/03/attribution-model-r-part-1/
5 |
6 | library(expm)
7 |
8 | ##### modeling states and conversions #####
9 | # dummy rows for the start and absorbing states; must exist before they are appended below
10 | df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
11 | channel_to = c('(start)', '(conversion)', '(null)'),
12 | n = c(0, 0, 0),
13 | tot_n = c(0, 0, 0),
14 | perc = c(0, 1, 1))
15 |
16 | # transition matrix preprocessing
17 | trans_matrix_complete <- markov_attribution$transition_matrix
18 | trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy %>%
19 | mutate(transition_probability = perc) %>%
20 | select(channel_from, channel_to, transition_probability))
21 | trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to, levels = c(levels(trans_matrix_complete$channel_from)))
22 | trans_matrix_complete <- dcast(trans_matrix_complete, channel_from ~ channel_to, value.var = 'transition_probability')
23 | trans_matrix_complete[is.na(trans_matrix_complete)] <- 0
24 | rownames(trans_matrix_complete) <- trans_matrix_complete$channel_from
25 | trans_matrix_complete <- as.matrix(trans_matrix_complete[, -1])
26 |
27 |
28 | # creating empty matrix for modeling
29 | model_mtrx <- matrix(data = 0,
30 | nrow = nrow(trans_matrix_complete), ncol = 1,
31 | dimnames = list(c(rownames(trans_matrix_complete)), '(start)'))
32 | # adding modeling number of visits
33 | model_mtrx['(start)', ] <- 1000
34 |
35 | c(model_mtrx) %*% (trans_matrix_complete %^% 5) # after 5 steps
36 | c(model_mtrx) %*% (trans_matrix_complete %^% 100000) # after 100000 steps
37 |
38 |
39 |
--------------------------------------------------------------------------------
/markov_chain_attribution.R:
--------------------------------------------------------------------------------
1 | ### Load Libraries ----
2 |
3 | if (!require("pacman")) install.packages("pacman")
4 | pacman::p_load(data.table, dplyr, ChannelAttribution, ggplot2, readr)
5 |
6 |
7 | ### Load Datasets ----
8 |
9 | campaign_data = fread(".../campaign_data.csv")
10 | campaign_budget_daily = fread(".../budget_sample_daily.csv")
11 |
12 | ### Prepare the files - Split Paths ----
13 | df_split = campaign_data %>%
14 | group_by(cookie) %>%
15 | arrange(time) %>%
16 | mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>%
17 | ungroup() %>%
18 | mutate(path_id = paste0(cookie, path_no))
19 |
20 |
21 | ### Prepare the file - Create the paths ----
22 | df_paths = df_split %>%
23 | group_by(path_id) %>%
24 | arrange(time) %>%
25 | summarise(path = paste(channel, collapse = ">"),
26 | total_conversions = sum(conversion)) %>%
27 | ungroup() %>%
28 | mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # flag paths that did not lead to a conversion
29 |
30 | ### Markov Chain and Heuristic Models ----
31 | markov_attribution <- markov_model(df_paths,
32 | var_path = "path",
33 | var_conv = "total_conversions",
34 | var_value = NULL,
35 | order = 2, # higher order markov chain
36 | var_null = "null_conversion",
37 | out_more = TRUE)
38 |
39 |
40 | heuristic_attribution <- heuristic_models(df_paths,
41 | var_path = "path",
42 | var_conv = "total_conversions")
43 |
44 |
45 |
46 | ### Prepare final joint dataset ----
47 |
48 | # Join attribution results
49 | all_model_results = merge(markov_attribution$result, heuristic_attribution)
50 |
51 | # Aggregate budget
52 | campaign_budget_total = as.data.table(
53 | campaign_budget_daily %>%
54 | group_by(channel) %>%
55 | summarise(total_cost = round(sum(cost), 1))
56 | )
57 |
58 | # Join into final results
59 | campaign_attribution = merge(all_model_results, campaign_budget_total,
60 | by.x = "channel_name", by.y = "channel")
61 |
62 | #### Calculate ROAS and CPA
63 | campaign_attribution =
64 | campaign_attribution %>%
65 | mutate(chanel_weight = (total_conversions / sum(total_conversions)),
66 | cost_weight = (total_cost / sum(total_cost)),
67 | roas = chanel_weight / cost_weight,
68 | optimal_budget = total_cost * roas,
69 | CPA = total_cost / total_conversions)
70 |
71 | # Change the name of markov results column
72 | names(campaign_attribution)[names(campaign_attribution) == "total_conversions"] = "markov_result"
73 |
74 | # Save the outputs
75 | write_csv(campaign_attribution, ".../campaign_attribution.csv")
76 |
77 |
--------------------------------------------------------------------------------
/markov_higher_order.R:
--------------------------------------------------------------------------------
1 | # A Markov chain can be of lower or higher order: the number of previous steps taken into account when calculating the effect.
2 | # Below is a simple comparison, using ChannelAttribution, of the results for different orders.
3 |
4 |
5 | # Libraries
6 | if (!require("pacman")) install.packages("pacman")
7 | pacman::p_load(data.table, ggplot2, dplyr, knitr, kableExtra, ChannelAttribution) # markov_model() comes from ChannelAttribution
8 |
9 |
10 | ### Higher order markov ----
11 |
12 | # Calculate markov chains
13 |
14 | markov_order1 = markov_model(df_paths,
15 | var_path = "path",
16 | var_conv = "total_conversions",
17 | var_value = NULL,
18 | var_null = NULL,
19 | out_more = TRUE,
20 | order = 1)
21 |
22 | markov_order2 = markov_model(df_paths,
23 | var_path = "path",
24 | var_conv = "total_conversions",
25 | var_value = NULL,
26 | var_null = NULL,
27 | out_more = TRUE,
28 | order = 2)
29 |
30 | markov_order3 = markov_model(df_paths,
31 | var_path = "path",
32 | var_conv = "total_conversions",
33 | var_value = NULL,
34 | var_null = NULL,
35 | out_more = TRUE,
36 | order = 3)
37 |
38 |
39 | ### Compare results ----
40 |
41 | # Merge results
42 | markov_results1 = markov_order1$result
43 | markov_results2 = markov_order2$result
44 | markov_results3 = markov_order3$result
45 |
46 | order_comparison =
47 | markov_results1 %>%
48 | left_join(markov_results2, by = 'channel_name') %>%
49 | left_join(markov_results3, by = 'channel_name') %>%
50 | arrange(desc(total_conversions.x))
51 |
52 | # change column names
53 | colnames(order_comparison) = c('channel_name', 'order1', 'order2', 'order3')
54 |
55 | # round all numeric columns
56 | nums = vapply(order_comparison, is.numeric, FUN.VALUE = logical(1))
57 | order_comparison[,nums] = round(order_comparison[,nums], 0)
58 |
59 |
60 | kable(order_comparison) %>%
61 | kable_styling(bootstrap_options = c("bordered", "hover"), full_width = F) %>%
62 | column_spec(1, bold = T, background = "aliceblue")
63 |
64 |
--------------------------------------------------------------------------------
/results_visualization/campaign_attribution.csv:
--------------------------------------------------------------------------------
1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA
2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136
3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953
4 | Online Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804
5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214
6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833
7 |
--------------------------------------------------------------------------------
/results_visualization/markov_chain_visualization.R:
--------------------------------------------------------------------------------
1 | ### Visualize the results
2 |
3 | #libraries
4 | if (!require("pacman")) install.packages("pacman")
5 | pacman::p_load(data.table,ggplot2,dplyr, visNetwork)
6 |
7 | campaign_attribution = fread("campaign_attribution.csv") # relative path; assumes the working directory is results_visualization/
8 | str(campaign_attribution)
9 |
10 | ## Vis 1 - Campaign attribution - conversions ----
11 |
12 |
13 | # Re-order the factors for channel names - for proper order of the bars
14 | df_g1 = campaign_attribution[order(-campaign_attribution$markov_result), ]
15 | df_g1$channel_name = factor(df_g1$channel_name, levels = c("Facebook", "Instagram", "Paid Search", "Online Video", "Online Display"))
16 |
17 | # Create an ordered graph showing conversions attributed to each channel
18 | g_channel_performance <- ggplot(df_g1, aes(x = channel_name, y = markov_result, fill = channel_name)) +
19 | geom_bar(stat = "identity", width = 0.6) +
20 | ylim(0, 7000) +
21 | scale_fill_manual(values = c("#CE2D4F",
22 | "#A14DA0",
23 | "#9D79BC",
24 | "#7F96FF",
25 | "#A9CEF4")) +
26 | theme_minimal() +
27 | theme(axis.text.x = element_text(size = 9, angle = 30, hjust = 0.6, face = "bold")) +
28 | theme(panel.grid.major.x = element_blank()) +
29 | theme(plot.title = element_text(hjust = 0.5)) +
30 | geom_text(aes(label = round(markov_result, 0)), fontface = "bold", size = 4, vjust = -1) +
31 | labs(x = "", y = "Conversions") +
32 | ggtitle("Channel Performance") +
33 | guides(fill = "none")
34 |
35 | g_channel_performance
36 |
37 | ## Vis 2 - Visualize optimal budget allocation - ROAS based ----
38 | # Compare current budget allocation with the one suggested by Markov attribution
39 |
40 | # Create melted dataset for budget comparison
41 | df_g2 = campaign_attribution[, c("channel_name", "total_cost", "optimal_budget")]
42 | df_g2 = melt(df_g2, id = "channel_name")
43 |
44 | # Create double bar chart
45 | g_budget_allocation <- ggplot(df_g2, aes(x = channel_name, y = value, fill = variable)) +
46 | geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) +
47 | scale_fill_manual(labels = c("Current Budget", "Optimal Budget"), values = c("#FFD166", "#04A777")) +
48 | theme_minimal() +
49 | theme(axis.text.x = element_text(size = 10, angle = 30, hjust = 0.6, face = "bold")) +
50 | theme(panel.grid.major.x = element_blank()) +
51 | geom_text(aes(label = round(value, 0)),
52 | fontface = "bold", size = 3.5,
53 | vjust = -0.5, position = position_dodge(width = 0.75)) +
54 | labs(x = "", y = "Budget $") +
55 | ggtitle("Budget Allocation") +
56 | theme(plot.title = element_text(hjust = 0.5))
57 |
58 | g_budget_allocation
59 |
60 | ## Vis 3 - Compare Markov Chain attribution and heuristics models ----
61 |
62 |
63 | # Create df for comparing heuristic models and markov results
64 | df_g3 = campaign_attribution[, c("channel_name", "markov_result", "first_touch", "last_touch", "linear_touch")]
65 | df_g3 = melt(df_g3, id = "channel_name")
66 |
67 |
68 | g_model_comparison <- ggplot(df_g3, aes(x = channel_name, y = value, fill = variable)) +
69 | geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) +
70 | scale_fill_manual(labels = c("Markov Model", "First Touch", "Last Touch", "Linear"),
71 | values = c("#e65368",
72 | "#4e74ff",
73 | "#87BFFF",
74 | "#3BCEAC")) +
75 | theme_minimal() +
76 | theme(axis.text.x = element_text(size = 9, angle = 30, hjust = 0.6, face = "bold")) +
77 | theme(panel.grid.major.x = element_blank()) +
78 | labs(x = "", y = "Conversions") +
79 | ggtitle("Markov vs Heuristics") +
80 | theme(plot.title = element_text(hjust = 0.5))
81 |
82 | g_model_comparison
83 |
84 |
85 | ## Vis 4 - Markov network graph ----
86 |
87 | # Calculate transition matrix from markov chain - ChannelAttribution package
88 |
89 | trans_matrix_prob = markov_attribution$transition_matrix
90 | trans_matrix_prob[, c(1,2)] = lapply(trans_matrix_prob[, c(1,2)], as.character)
91 |
92 |
93 | ### Visualize the matrix ----
94 | edges <-
95 | data.frame(
96 | from = trans_matrix_prob$channel_from,
97 | to = trans_matrix_prob$channel_to,
98 | label = round(trans_matrix_prob$transition_probability, 2),
99 | font.size = trans_matrix_prob$transition_probability * 100,
100 | width = trans_matrix_prob$transition_probability * 15,
101 | shadow = TRUE,
102 | arrows = "to",
103 | color = list(color = "#95cbee", highlight = "red")
104 | )
105 |
106 | nodes <- tibble(id = c( c(trans_matrix_prob$channel_from), c(trans_matrix_prob$channel_to) )) %>%
107 | distinct(id) %>%
108 | arrange(id) %>%
109 | mutate(
110 | label = id,
111 | color = ifelse(
112 | label %in% c('(start)', '(conversion)'),
113 | '#4ab04a',
114 | ifelse(label == '(null)', '#ce472e', '#ffd73e')
115 | ),
116 | shadow = TRUE,
117 | shape = "box"
118 | )
119 |
120 | visNetwork(nodes,
121 | edges,
122 | height = "2000px",
123 | width = "100%",
124 | main = "Markov Chain Visualized") %>%
125 | visIgraphLayout(randomSeed = 123) %>%
126 | visNodes(size = 5) %>%
127 | visOptions(highlightNearest = TRUE)
128 |
129 |
130 |
--------------------------------------------------------------------------------
/sample_datasets/README.md:
--------------------------------------------------------------------------------
1 | ## Datasets description
2 | Here you can find the two sample datasets needed to calculate Markov attribution and budget allocation.
3 |
4 | ### Campaign dataset - cookie level
5 |
6 | The main dataset resembling a (simplified) real data coming from digital marketing campaigns on cookie tracking level.
7 | To decrease the sample dataset size it has the 4 key variables necessary for markov chain analysis:
8 |
9 | **cookie** - unique identifier of a user/session. In online ads, cookie lifetime usually varies from 30 to 90 days.
10 | Cookie campaign data serves as a log of each user's actions and maps their digital touchpoints and interactions with content over a period of time. The distribution of cookies in this file is based on actual campaign data.
11 |
12 | **timestamp** - the time of a particular interaction with an ad
13 |
14 | **interaction** - type of interaction between a cookie and an ad. Typically it consists of impressions, clicks and conversions, but additional metrics might be defined by a particular ad-serving company
15 |
16 | **conversion** - binary column indicating whether a particular visit ended in a conversion. Created from the interaction variable. The conversion rate is low and based on an actual campaign. Typically around 0.9% to 2.0% of all journeys lead to a successful conversion [[1]](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2343077).
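
As a rough illustration of how the conversion column can be derived from the interaction variable, here is a minimal base R sketch. The toy rows and the interaction labels (`"impression"`, `"click"`, `"conversion"`) are assumptions made for this example and are not taken from `campaign_data.csv`.

```r
# Hypothetical toy log: column names follow the description above,
# the values are made up for illustration only.
campaign_log <- data.frame(
  cookie = c("a1", "a1", "b2", "c3", "c3", "c3"),
  timestamp = as.POSIXct(c(
    "2018-07-01 10:00:00", "2018-07-02 12:30:00",
    "2018-07-01 09:15:00", "2018-07-03 18:00:00",
    "2018-07-04 08:45:00", "2018-07-05 20:10:00"
  ), tz = "UTC"),
  interaction = c("impression", "conversion", "impression",
                  "impression", "click", "impression"),
  stringsAsFactors = FALSE
)

# Binary flag derived from the interaction type
# (assumes conversions are labelled "conversion").
campaign_log$conversion <- as.integer(campaign_log$interaction == "conversion")

# Share of cookies (journeys) with at least one conversion.
converted_by_cookie <- tapply(campaign_log$conversion, campaign_log$cookie, max)
conversion_rate <- mean(converted_by_cookie)
```

In this toy log one of three cookies converts. In the sample dataset the share should be much lower, in line with the range quoted above.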
17 |
18 | Real campaign data will have many more variables, such as geographic location, campaign name, or information about the browser, creative, device type or system. While these can bring new dimensions to campaign analytics, they are not necessary for the Markov chain attribution model.
19 |
20 |
21 | ### Budget
22 |
23 | Simulated daily cost values for the whole campaign, calculated as a function of the number of impressions.
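
To make the relationship between cost and impressions concrete, here is a small base R sketch using the first five rows of `budget_sample_daily.csv`, with costs rounded to four decimals. The derived names `cpm` and `channel_cost` are introduced only for this example, and the commented `read.csv` path is an assumption about where the file sits relative to the working directory.

```r
# First five rows of budget_sample_daily.csv (2018-07-01), costs rounded.
# To use the full file instead (assumed relative path):
# budget_daily <- read.csv("sample_datasets/budget_sample_daily.csv")
budget_daily <- data.frame(
  day = as.Date(rep("2018-07-01", 5)),
  channel = c("Facebook", "Instagram", "Online Display",
              "Online Video", "Paid Search"),
  impressions = c(7576, 3350, 3769, 2364, 4992),
  cost = c(26.516, 13.4, 16.9605, 11.82, 27.456)
)

# Cost per thousand impressions (CPM) for each channel-day.
budget_daily$cpm <- budget_daily$cost / budget_daily$impressions * 1000

# Total cost per channel; on the full file this sums over all days.
channel_cost <- aggregate(cost ~ channel, data = budget_daily, FUN = sum)
```

A per-channel total like `channel_cost` is the kind of figure a budget allocation step would compare against attributed conversions.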
24 |
25 |
26 |
--------------------------------------------------------------------------------
/sample_datasets/budget_sample_daily.csv:
--------------------------------------------------------------------------------
1 | day,channel,impressions,cost
2 | 2018-07-01,Facebook,7576,26.516000000000002
3 | 2018-07-01,Instagram,3350,13.399999999999999
4 | 2018-07-01,Online Display,3769,16.960499999999996
5 | 2018-07-01,Online Video,2364,11.82
6 | 2018-07-01,Paid Search,4992,27.456000000000003
7 | 2018-07-02,Facebook,9482,56.891999999999996
8 | 2018-07-02,Instagram,4120,26.78
9 | 2018-07-02,Online Display,3724,13.034
10 | 2018-07-02,Online Video,3786,15.143999999999998
11 | 2018-07-02,Paid Search,8806,39.626999999999995
12 | 2018-07-03,Facebook,8447,42.235
13 | 2018-07-03,Instagram,3715,20.4325
14 | 2018-07-03,Online Display,3741,22.445999999999998
15 | 2018-07-03,Online Video,3613,23.484500000000004
16 | 2018-07-03,Paid Search,13099,45.8465
17 | 2018-07-04,Facebook,9313,37.251999999999995
18 | 2018-07-04,Instagram,4018,18.081
19 | 2018-07-04,Online Display,3735,18.675
20 | 2018-07-04,Online Video,3471,19.090500000000002
21 | 2018-07-04,Paid Search,13305,79.83
22 | 2018-07-05,Facebook,8944,58.136
23 | 2018-07-05,Instagram,3927,13.7445
24 | 2018-07-05,Online Display,14,0.056
25 | 2018-07-05,Online Video,3418,15.380999999999998
26 | 2018-07-05,Paid Search,8972,44.86
27 | 2018-07-06,Facebook,8683,47.7565
28 | 2018-07-06,Instagram,3641,21.845999999999997
29 | 2018-07-06,Online Display,2744,17.836000000000002
30 | 2018-07-06,Online Video,2402,8.407
31 | 2018-07-06,Paid Search,5981,23.924
32 | 2018-07-07,Facebook,9341,42.034499999999994
33 | 2018-07-07,Instagram,4106,20.53
34 | 2018-07-07,Online Display,3366,18.512999999999998
35 | 2018-07-07,Online Video,3647,21.881999999999998
36 | 2018-07-07,Paid Search,7583,49.289500000000004
37 | 2018-07-08,Facebook,9019,31.566499999999998
38 | 2018-07-08,Instagram,3877,15.508
39 | 2018-07-08,Online Display,12,0.05399999999999999
40 | 2018-07-08,Online Video,4542,22.71
41 | 2018-07-08,Paid Search,10160,55.88
42 | 2018-07-09,Facebook,8232,49.391999999999996
43 | 2018-07-09,Instagram,3380,21.97
44 | 2018-07-09,Online Display,4125,14.437499999999998
45 | 2018-07-09,Online Video,5927,23.708
46 | 2018-07-09,Paid Search,8218,36.981
47 | 2018-07-10,Facebook,8632,43.16
48 | 2018-07-10,Instagram,3709,20.399500000000003
49 | 2018-07-10,Online Display,15,0.09
50 | 2018-07-10,Online Video,9672,62.868
51 | 2018-07-10,Paid Search,10463,36.62049999999999
52 | 2018-07-11,Facebook,8579,34.316
53 | 2018-07-11,Instagram,3600,16.2
54 | 2018-07-11,Online Display,5402,27.01
55 | 2018-07-11,Online Video,10058,55.319
56 | 2018-07-11,Paid Search,11147,66.88199999999999
57 | 2018-07-12,Facebook,9197,59.7805
58 | 2018-07-12,Instagram,3871,13.548499999999999
59 | 2018-07-12,Online Display,5687,22.747999999999998
60 | 2018-07-12,Online Video,10598,47.690999999999995
61 | 2018-07-12,Paid Search,10917,54.585
62 | 2018-07-13,Facebook,8863,48.746500000000005
63 | 2018-07-13,Instagram,3729,22.374
64 | 2018-07-13,Online Display,4550,29.575
65 | 2018-07-13,Online Video,5534,19.369
66 | 2018-07-13,Paid Search,8882,35.52799999999999
67 | 2018-07-14,Facebook,10190,45.855
68 | 2018-07-14,Instagram,4303,21.515
69 | 2018-07-14,Online Display,4524,24.882
70 | 2018-07-14,Online Video,7232,43.391999999999996
71 | 2018-07-14,Paid Search,9581,62.276500000000006
72 | 2018-07-15,Facebook,9931,34.7585
73 | 2018-07-15,Instagram,4152,16.608
74 | 2018-07-15,Online Display,4890,22.004999999999995
75 | 2018-07-15,Online Video,9810,49.05
76 | 2018-07-15,Paid Search,10255,56.4025
77 | 2018-07-16,Facebook,10956,65.736
78 | 2018-07-16,Instagram,4760,30.94
79 | 2018-07-16,Online Display,5046,17.660999999999998
80 | 2018-07-16,Online Video,8811,35.244
81 | 2018-07-16,Paid Search,9962,44.829
82 | 2018-07-17,Facebook,10762,53.81
83 | 2018-07-17,Instagram,4575,25.1625
84 | 2018-07-17,Online Display,5123,30.737999999999996
85 | 2018-07-17,Online Video,9298,60.437000000000005
86 | 2018-07-17,Paid Search,9887,34.6045
87 | 2018-07-18,Facebook,11280,45.12
88 | 2018-07-18,Instagram,4950,22.275
89 | 2018-07-18,Online Display,5289,26.445
90 | 2018-07-18,Online Video,11094,61.017
91 | 2018-07-18,Paid Search,8685,52.10999999999999
92 | 2018-07-19,Facebook,9685,62.9525
93 | 2018-07-19,Instagram,4152,14.532
94 | 2018-07-19,Online Display,4471,17.884
95 | 2018-07-19,Online Video,10654,47.943
96 | 2018-07-19,Paid Search,8262,41.31
97 | 2018-07-20,Facebook,9293,51.11150000000001
98 | 2018-07-20,Instagram,4104,24.624
99 | 2018-07-20,Online Display,3940,25.61
100 | 2018-07-20,Online Video,7592,26.572
101 | 2018-07-20,Paid Search,7310,29.239999999999995
102 | 2018-07-21,Facebook,9574,43.08299999999999
103 | 2018-07-21,Instagram,4085,20.425
104 | 2018-07-21,Online Display,4102,22.561000000000003
105 | 2018-07-21,Online Video,9287,55.722
106 | 2018-07-21,Paid Search,7833,50.914500000000004
107 | 2018-07-22,Facebook,10427,36.494499999999995
108 | 2018-07-22,Instagram,4461,17.843999999999998
109 | 2018-07-22,Online Display,4599,20.6955
110 | 2018-07-22,Online Video,12488,62.44
111 | 2018-07-22,Paid Search,5131,28.220500000000005
112 | 2018-07-23,Facebook,11438,68.628
113 | 2018-07-23,Instagram,4891,31.7915
114 | 2018-07-23,Online Display,4561,15.963499999999998
115 | 2018-07-23,Online Video,9490,37.96
116 | 2018-07-23,Paid Search,4876,21.941999999999997
117 | 2018-07-24,Facebook,9167,45.835
118 | 2018-07-24,Instagram,3989,21.939500000000002
119 | 2018-07-24,Online Display,3987,23.921999999999997
120 | 2018-07-24,Online Video,4876,31.694
121 | 2018-07-24,Paid Search,4389,15.3615
122 | 2018-07-25,Facebook,8499,33.995999999999995
123 | 2018-07-25,Instagram,3559,16.0155
124 | 2018-07-25,Online Display,3908,19.54
125 | 2018-07-25,Online Video,9272,50.996
126 | 2018-07-25,Paid Search,4650,27.9
127 | 2018-07-26,Facebook,9950,64.675
128 | 2018-07-26,Instagram,4210,14.735
129 | 2018-07-26,Online Display,3935,15.739999999999998
130 | 2018-07-26,Online Video,7222,32.498999999999995
131 | 2018-07-26,Paid Search,3750,18.75
132 | 2018-07-27,Facebook,9777,53.773500000000006
133 | 2018-07-27,Instagram,4253,25.518
134 | 2018-07-27,Online Display,3781,24.576500000000003
135 | 2018-07-27,Online Video,1805,6.3175
136 | 2018-07-27,Paid Search,3667,14.668
137 | 2018-07-28,Facebook,9990,44.955
138 | 2018-07-28,Instagram,4315,21.575
139 | 2018-07-28,Online Display,4336,23.848000000000003
140 | 2018-07-28,Online Video,2179,13.074
141 | 2018-07-28,Paid Search,5747,37.3555
142 | 2018-07-29,Facebook,12811,44.8385
143 | 2018-07-29,Instagram,5452,21.808
144 | 2018-07-29,Online Display,4736,21.311999999999998
145 | 2018-07-29,Online Video,5488,27.44
146 | 2018-07-29,Paid Search,6415,35.282500000000006
147 | 2018-07-30,Facebook,13033,78.19800000000001
148 | 2018-07-30,Instagram,5301,34.4565
149 | 2018-07-30,Online Display,15,0.0525
150 | 2018-07-30,Online Video,744,2.976
151 | 2018-07-30,Paid Search,3923,17.653499999999998
152 | 2018-07-31,Facebook,6015,30.075
153 | 2018-07-31,Instagram,2651,14.580500000000002
154 | 2018-07-31,Online Display,11,0.066
155 | 2018-07-31,Online Video,27,0.17550000000000002
156 | 2018-07-31,Paid Search,289,1.0115
157 |
--------------------------------------------------------------------------------
/sample_datasets/campaign_attribution.csv:
--------------------------------------------------------------------------------
1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA
2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136
3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953
4 | Online Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804
5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214
6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833
7 |
--------------------------------------------------------------------------------