├── .gitignore
├── Markov-Chain.Rproj
├── README.md
├── campaign_attribution.csv
├── img
│   ├── AttributionApproach.PNG
│   ├── MarkovChain.PNG
│   ├── g_budgetallocation.png
│   ├── g_channelperformance.png
│   ├── g_markovheuristics.png
│   └── markov_graph.png
├── markov_addons
│   ├── README.md
│   ├── markov_deciding_path_duration.r
│   ├── markov_null_paths_comparison.R
│   ├── markov_one_path_channels.r
│   ├── markov_transition_matrix_visual.r
│   └── markov_transition_simulation.r
├── markov_chain_attribution.R
├── markov_higher_order.R
├── results_visualization
│   ├── campaign_attribution.csv
│   └── markov_chain_visualization.R
└── sample_datasets
    ├── README.md
    ├── budget_sample_daily.csv
    ├── campaign_attribution.csv
    └── campaign_data.csv


/.gitignore:
--------------------------------------------------------------------------------
1 | .Rproj.user
2 | .Rhistory
3 | .RData
4 | .Ruserdata
5 |


--------------------------------------------------------------------------------
/Markov-Chain.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 |
3 | RestoreWorkspace: Default
4 | SaveWorkspace: Default
5 | AlwaysSaveHistory: Default
6 |
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 |
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 |


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Marketing-Attribution_Markov-Chain
2 | R-based marketing attribution and campaign budget optimization using Markov Chains
3 |
4 | All of the concepts described briefly below have corresponding links in the reference section. There you can find the articles and papers that I think do a particularly good job of introducing and explaining this topic.
5 |
6 | #### If you are looking for an attribution method that does not require customer paths, check out my other repository about relative weight analysis | https://github.com/MatCyt/relative-weight-analysis
7 |
8 | ### Marketing Attribution
9 |
10 | Marketing attribution tries to answer one of the key questions in the advertising world: how exactly did my campaigns and actions contribute to the results I see? A more [formal definition](https://en.wikipedia.org/wiki/Attribution_(marketing)):
11 |
12 | > attribution is the identification of a set of user actions ("events" or "touchpoints") that contribute in some manner to a desired outcome, and then the assignment of a value to each of these events. Marketing attribution provides a level of understanding of what combination of events in what particular order influence individuals to engage in a desired behavior, typically referred to as a conversion.
13 |
14 | The original, popular approach tries to solve this problem with a set of heuristics: attributing all conversions to the last or first touchpoint (last-touch or first-touch), dividing the credit equally among channels (linear attribution), or giving more credit to more recent touchpoints (time decay attribution). Below you can find a short visual summary; the last section links to references discussing them in detail.
15 |
16 |

17 | Attribution Models 19 |
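To make the heuristics above concrete, here is a minimal base R sketch of first-touch, last-touch and linear attribution. The three paths are hypothetical toy data, not taken from this repository's datasets, and the helper `credit()` is introduced only for this illustration. The repository itself obtains these numbers from `heuristic_models()` in the ChannelAttribution package.

```r
# Toy converting paths (hypothetical data, one conversion per path)
paths <- c("Facebook>Instagram>Paid Search",
           "Instagram",
           "Paid Search>Facebook")

# Split each path string into its ordered touchpoints
touches  <- strsplit(paths, ">", fixed = TRUE)
channels <- sort(unique(unlist(touches)))

# Hypothetical helper: sum per-channel credit into a named vector covering all channels
credit <- function(ch, w) {
  out <- setNames(numeric(length(channels)), channels)
  agg <- tapply(w, ch, sum)
  out[names(agg)] <- agg
  out
}

# First-touch and last-touch: the whole conversion goes to one touchpoint
first_touch <- credit(vapply(touches, function(p) p[1], character(1)),
                      rep(1, length(touches)))
last_touch  <- credit(vapply(touches, function(p) p[length(p)], character(1)),
                      rep(1, length(touches)))

# Linear: each conversion is split equally across the touchpoints of its path
linear_touch <- credit(unlist(touches),
                       unlist(lapply(touches, function(p) rep(1 / length(p), length(p)))))

heuristics <- data.frame(channel_name = channels, first_touch, last_touch,
                         linear_touch, row.names = NULL)
print(heuristics)
```

Whatever the rule, the credited conversions always add up to the number of observed conversions (three here); the heuristics differ only in how that total is divided.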

20 |
21 | While all of them are relatively easy to understand and implement (and above all better than nothing), they are also too simplistic to be accurate. The attribution problem can, however, be addressed with more accurate data-driven models, including:
22 |
23 | * logistic regressions
24 | * VAR models
25 | * Shapley value
26 | * Regression-based models (dominance analysis and relative weight analysis)
27 | * multivariate time-series models
28 | * Markov chains
29 |
30 | This repository tackles the attribution challenge using the popular Markov Chain approach.
31 |
32 | ### Markov Chain - Introduction
33 |
34 | A Markov Chain essentially translates a series of events into a set of states (the events themselves) and transition probabilities between them (the chance of moving from one event to another or staying in the current event).
35 |
36 |

37 | Markov Chain graph 38 |
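The idea of states and transition probabilities can be sketched in a few lines of base R. The three states and all probabilities below are made up for illustration, and `n_steps()` is a hypothetical helper name.

```r
# Hypothetical three-state chain; the probabilities are invented for illustration
states <- c("A", "B", "C")
P <- matrix(c(0.5, 0.3, 0.2,   # from A: stay in A, move to B, move to C
              0.1, 0.6, 0.3,   # from B
              0.0, 0.0, 1.0),  # C is absorbing: once reached, the chain stays there
            nrow = 3, byrow = TRUE, dimnames = list(from = states, to = states))

# Each row is a probability distribution over the next state
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))

# Distribution over states after n steps, given a starting distribution
n_steps <- function(start, P, n) {
  dist <- start
  for (i in seq_len(n)) dist <- dist %*% P
  dist
}

start   <- c(A = 1, B = 0, C = 0)
after_2 <- n_steps(start, P, 2)
print(round(after_2, 3))
```

Starting in A, the chain is in A, B and C after two steps with probability 0.28, 0.33 and 0.39 respectively. In the attribution setting the absorbing states play the role of "conversion" and "no conversion".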

39 |
40 | In our marketing attribution problem, the Markov Chain maps onto the concept of the customer journey. Each touchpoint (online ad, landing page, etc.) represents a state, with conversion or no conversion being the final outcome of the journey. Based on cookie-level data tracking customer actions online, we can calculate the transition probabilities between touchpoints. The resulting transition matrix can be represented as a Markov graph.
41 |
42 |

43 | Campaign Graph 45 |
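As a rough illustration of how transition probabilities can be estimated from journeys, the base R sketch below counts consecutive touchpoint pairs and normalises each row. The four paths are hypothetical, and the state labels `(start)`, `(conversion)` and `(null)` are assumptions chosen to mirror the naming used in the add-on scripts; the ChannelAttribution package does this estimation internally.

```r
# Hypothetical journeys: TRUE means the path ended with a conversion
paths     <- c("Facebook>Instagram", "Instagram", "Facebook>Paid Search", "Paid Search")
converted <- c(TRUE, TRUE, FALSE, FALSE)

# Wrap each path with an artificial start state and its final outcome
full <- Map(function(p, conv) {
  c("(start)", strsplit(p, ">", fixed = TRUE)[[1]],
    if (conv) "(conversion)" else "(null)")
}, paths, converted)

# Collect every consecutive (from, to) pair across all journeys
from <- unlist(lapply(full, function(s) s[-length(s)]))
to   <- unlist(lapply(full, function(s) s[-1]))

# Count the transitions and normalise each row to probabilities
counts <- table(from, to)
trans  <- prop.table(counts, margin = 1)
print(round(trans, 2))
```

In this toy data half of the journeys start on Facebook, so the estimated probability of moving from `(start)` to Facebook is 0.5, and every journey that reaches Instagram converts.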

46 |
47 | Finally, the attribution itself is calculated within the Markov Chain by the removal effect. [Explaining it simply](https://www.analyticsvidhya.com/blog/2018/01/channel-attribution-modeling-using-markov-chains-in-r/):
48 |
49 | > Removal effect principle says that if we want to find the contribution of each channel in the customer journey, we can do so by removing each channel and see how many conversions are happening without that channel being in place
50 |
51 | ### Markov Chain - R implementation
52 |
53 | The best article describing the implementation of Markov Chains in R definitely comes from Sergey Bryl's [Analyze Core](https://analyzecore.com/2017/05/31/marketing-multi-channel-attribution-model-r-part-2-practical-issues/) blog. My main code in *markov_chain_attribution.R* builds on his main steps. You can test it yourself using the sample datasets; their description is included in their separate folder.
54 |
55 | Markov Chain attribution is calculated on a cookie-level dataset containing time information. There are two particularly important parts of the code:
56 |
57 | First, after loading the campaign data, we have to create a path (touchpoint journey) for each distinct cookie and record whether it ended with a conversion or not, which is important information to include. The end result is a string of touchpoints (e.g. Channel1 > Channel5 > Channel7 > Conversion).
58 | ``` R
59 | ### Prepare the files - Split Paths ----
60 | df_split = campaign_data %>%
61 |   group_by(cookie) %>%
62 |   arrange(time) %>%
63 |   mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>%
64 |   ungroup() %>%
65 |   mutate(path_id = paste0(cookie, path_no))
66 |
67 |
68 | ### Prepare the file - Create the paths ----
69 | df_paths = df_split %>%
70 |   group_by(path_id) %>%
71 |   arrange(time) %>%
72 |   summarise(path = paste(channel, collapse = ">"),
73 |             total_conversions = sum(conversion)) %>%
74 |   ungroup() %>%
75 |   mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # paths that have not led to conversion
76 | ```
77 |
78 | Then we can calculate the actual Markov Chain results using the [Channel Attribution](https://cran.r-project.org/web/packages/ChannelAttribution/ChannelAttribution.pdf) package available in R. There are several pieces of information that we need to pass to the function.
79 | * *var_path* and *var_conv* specify the appropriate columns (path and binary conversion) from the input dataframe
80 | * *order* indicates how many steps back we look when calculating the current transition probability. You can compare the results of different orders in markov_higher_order.R and read more in the reference links. Web users are not considered purely Markovian [1](https://dl.acm.org/citation.cfm?id=2187919), therefore an order of 2 or 3 is typically applied to similar problems
81 | * *var_null* specifies the column containing binary values for paths that have not ended with conversion
82 | * *out_more* returns transition probabilities if set to TRUE.
83 |
84 | We can also calculate heuristics-based results in order to compare them with Markov Chain attribution.
85 |
86 | ``` R
87 | ### Markov Chain and Heuristic Models ----
88 | markov_attribution <- markov_model(df_paths,
89 |                                    var_path = "path",
90 |                                    var_conv = "total_conversions",
91 |                                    var_value = NULL,
92 |                                    order = 2, # higher order markov chain
93 |                                    var_null = "null_conversion", # column name passed as a string
94 |                                    out_more = TRUE)
95 |
96 |
97 | heuristic_attribution <- heuristic_models(df_paths,
98 |                                           var_path = "path",
99 |                                           var_conv = "total_conversions")
100 |
101 | ```
102 |
103 |
104 | ### Markov Chain - Attribution Result and Heuristics comparisons
105 |
106 | The following graphs show the conversions attributed to each channel by the Markov model and a comparison of its results with the heuristic approaches. All visualization code can be found in results_visualization, together with the dataset necessary to run it.
107 |
108 |
109 |

110 | Channel Attribution 112 | Channel Comparison 114 |
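The Markov numbers in these graphs come from the removal effect described earlier. The sketch below illustrates that principle on a hypothetical first-order chain in base R. It is only an illustration under stated assumptions: the channels `C1` to `C3`, the probabilities, and the helpers `conv_prob()` and `removal_effect()` are invented here, and this is not how the ChannelAttribution package computes its results internally.

```r
# Hypothetical first-order chain; transient states are the start and the channels
transient <- c("(start)", "C1", "C2", "C3")
Q <- matrix(0, 4, 4, dimnames = list(transient, transient))  # transient -> transient
Q["(start)", c("C1", "C2")] <- c(2 / 3, 1 / 3)
Q["C1", "C2"] <- 0.5   # the other half of C1 traffic ends in (null)
Q["C2", "C3"] <- 1
r_conv <- c("(start)" = 0, C1 = 0, C2 = 0, C3 = 0.5)  # transient -> (conversion)

# Probability of eventually converting when starting from (start):
# solve (I - Q) b = r_conv, the standard absorbing-chain result
conv_prob <- function(Q, r_conv) {
  b <- solve(diag(nrow(Q)) - Q, r_conv)
  unname(b[1])  # (start) is always the first state
}

# Removing a channel: dropping its row and column means every transition
# into it is lost, which is equivalent to sending that traffic to (null)
removal_effect <- function(channel) {
  keep <- setdiff(transient, channel)
  1 - conv_prob(Q[keep, keep], r_conv[keep]) / conv_prob(Q, r_conv)
}

channels <- c("C1", "C2", "C3")
re <- vapply(channels, removal_effect, numeric(1))

# Share one observed conversion in proportion to the removal effects
attributed <- 1 * re / sum(re)
print(rbind(removal_effect = re, attributed = attributed))
```

In this toy chain every converting journey passes through C2 and C3, so removing either one eliminates all conversions (removal effect 1), while removing C1 only halves them (removal effect 0.5). Normalising gives C1 a 20% share and C2 and C3 40% each.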

115 |
116 |
117 | ### Markov Chain and budget allocation
118 |
119 | Based on the Markov Chain attribution results we can calculate not only attributed conversions but also a better allocation of the campaign budget. To do so, we calculate *Return on Ad Spend* using the following formula
120 |
121 | **ROAS = Channel Conversion Weight / Channel Budget Weight**
122 |
123 | Channel Conversion Weight and Channel Budget Weight are the ratios of the channel's conversions to total conversions and of the channel's cost to total campaign cost. ROAS > 100% (a ratio above 1) indicates that the channel receives a smaller share of the budget than its share of conversions, i.e. it is undervalued. With this metric in place we can move on to the campaign budget recommendation through:
124 |
125 | **Proposed budget = Current budget x ROAS**
126 |
127 | We calculate all of this by merging the aggregated budget data with the results of the Markov Chain modelling:
128 |
129 | ```R
130 | # Aggregate budget
131 | campaign_budget_total = as.data.table(
132 |   campaign_budget_daily %>%
133 |     group_by(channel) %>%
134 |     summarise(total_cost = round(sum(cost), 1))
135 | )
136 |
137 | # Join into final results
138 | campaign_attribution = merge(all_model_results, campaign_budget_total,
139 |                              by.x = "channel_name", by.y = "channel")
140 |
141 | #### Calculate ROAS and CPA
142 | campaign_attribution =
143 |   campaign_attribution %>%
144 |   mutate(chanel_weight = (total_conversions / sum(total_conversions)),
145 |          cost_weight = (total_cost / sum(total_cost)),
146 |          roas = chanel_weight / cost_weight,
147 |          optimal_budget = total_cost * roas,
148 |          CPA = total_cost / total_conversions)
149 | ```
150 |
151 | The graph below shows the comparison of current versus recommended budget based on the Markov-driven allocation. Instagram seems to be undervalued as a channel, in contrast to Online Video and Paid Search.
152 |
153 |
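The ROAS arithmetic can be checked by hand in base R. The `markov_result` and `total_cost` values below are copied from the `campaign_attribution.csv` file in this repository; the data frame name `campaign` is only used for this sketch.

```r
# Values copied from campaign_attribution.csv in this repository
campaign <- data.frame(
  channel_name  = c("Facebook", "Instagram", "Online Display", "Online Video", "Paid Search"),
  markov_result = c(5940.39092991379, 3665.366410561627, 2212.9446308119614,
                    3250.793023843018, 4543.505004869603),
  total_cost    = c(1481.7, 641.2, 554.9, 991.8, 1187.1)
)

# Share of conversions and share of spend per channel
conversion_weight <- campaign$markov_result / sum(campaign$markov_result)
cost_weight       <- campaign$total_cost / sum(campaign$total_cost)

# ROAS as defined above, and the proposed budget derived from it
campaign$roas           <- conversion_weight / cost_weight
campaign$optimal_budget <- campaign$total_cost * campaign$roas

print(campaign[, c("channel_name", "roas", "optimal_budget")])
```

Instagram holds roughly 18.7% of the attributed conversions but only about 13.2% of the spend, which gives a ROAS of about 1.42 and a proposed budget of about 907.6 instead of 641.2. Because the proposed budget equals the conversion weight times the total spend, the proposed budgets always add up to the current total budget: the formula only redistributes the existing spend.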

154 | Budget Allocation
156 |
157 | ### Possible improvements
158 |
159 | A customer journey that does not end with a conversion can last for dozens of days (limited by cookie lifetime) and contain many different touchpoints. We may want to cut it off after X touchpoints or days, following specific business logic.
160 |
161 | You may want to validate the Markov results through accuracy measures based on prediction results, in order to compare them with other allocation measurement methods.
162 |
163 | As Sergey mentions in his post, single-channel paths are undervalued by default in the current calculation method. You may want to double check this impact and calculate the Markov results separately for single-channel and multi-channel paths.
164 |
165 | ### Markov Chain - Links and materials
166 | Above all, if you read only one thing, make it these two posts from the Analyze Core blog by Sergey Bryl
167 |
168 | [Part 1 - Introduction to the topic in digital marketing context](https://analyzecore.com/2016/08/03/attribution-model-r-part-1/)
169 |
170 | [Part 2 - Great R implementation of Markov Chain](https://analyzecore.com/2017/05/31/marketing-multi-channel-attribution-model-r-part-2-practical-issues/)
171 |
172 | *R libraries that can be applied to this problem*
173 | * [ChannelAttribution](https://cran.r-project.org/web/packages/ChannelAttribution/ChannelAttribution.pdf)
174 | * [markovchain](https://cran.r-project.org/web/packages/markovchain/index.html)
175 | * [clickstream](https://cran.r-project.org/web/packages/clickstream/clickstream.pdf)
176 |
177 | *Additional Resources*
178 |
179 | 1) [Heuristics Models Overview 1](https://www.snapapp.com/blog/marketing-attribution-models/)
180 | 2) [Heuristics Models Overview 2](https://www.referralsaasquatch.com/marketing-attribution/)
181 | 3) [Model Based Attribution - Overview](https://www.slideshare.net/MarketingFestival/lucie-sperkova-pioneering-multichannel-attribution-for-the-lack-of-comprehensive-solutions)
182 | 4) [Graphical introduction 
to Markov Chain](http://setosa.io/ev/markov-chains/) 183 | 5) [Markov Chains & Google Analytics Connection with R](https://stuifbergen.com/2016/11/conversion-attribution-markov-model-r/) 184 | 6) [Validating Markov Chains](https://amunategui.github.io/markov-chains/index.html) 185 | 186 | *Whitepapers* 187 | 188 | 1) [Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2343077) 189 | 2) [Are Web Users Really Markovian?](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.232.5927&rep=rep1&type=pdf) 190 | 3) [Modeling Online Browsing and Path Analysis Using Clickstream Data](https://www.andrew.cmu.edu/user/alm3/papers/purchase%20conversion.pdf) 191 | 192 | -------------------------------------------------------------------------------- /campaign_attribution.csv: -------------------------------------------------------------------------------- 1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA 2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136 3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953 4 | Online Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804 5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214 6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833 7 | -------------------------------------------------------------------------------- 
/img/AttributionApproach.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/AttributionApproach.PNG -------------------------------------------------------------------------------- /img/MarkovChain.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/MarkovChain.PNG -------------------------------------------------------------------------------- /img/g_budgetallocation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_budgetallocation.png -------------------------------------------------------------------------------- /img/g_channelperformance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_channelperformance.png -------------------------------------------------------------------------------- /img/g_markovheuristics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/g_markovheuristics.png -------------------------------------------------------------------------------- /img/markov_graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MatCyt/Markov-Chain/1c7aadf1d70208d41010e7b24f5f94f961328274/img/markov_graph.png -------------------------------------------------------------------------------- /markov_addons/README.md: -------------------------------------------------------------------------------- 1 | # Datasets description 2 | 1) 
Campaign dataset
3 |
4 | 2) Budget
5 | daily budget data
6 |
7 | 3) Campaign attribution
8 | for the outcome visualization mainly
9 |


--------------------------------------------------------------------------------
/markov_addons/markov_deciding_path_duration.r:
--------------------------------------------------------------------------------
1 |
2 | # Necessary background and code taken from:
3 | # https://analyzecore.com/2017/05/31/marketing-multi-channel-attribution-model-r-part-2-practical-issues/
4 |
5 | # How to set the right time window for customer journey duration.
6 | # We can visualize the day count capturing 95% of durations as in the code below.
7 |
8 | # It gets more complex if we want to build a forward-looking probabilistic model.
9 | # In that case we also have to handle the paths that have not completed by the current date
10 | # (code and explanation in the body of the article)
11 |
12 | df_multi_paths_tl <- df_split %>%
13 |   group_by(cookie) %>%
14 |   mutate(date = as.Date(time)) %>%
15 |   summarise(path = paste(channel, collapse = ' > '),
16 |             first_touch_date = min(date),
17 |             last_touch_date = max(date),
18 |             tot_time_lapse = round(as.numeric(last_touch_date - first_touch_date)),
19 |             conversion = sum(conversion)) %>%
20 |   ungroup()
21 |
22 | # distribution plot
23 | ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
24 |   theme_minimal() +
25 |   geom_histogram(fill = '#4e79a7', binwidth = 1)
26 |
27 | # cumulative distribution plot
28 | ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
29 |   theme_minimal() +
30 |   stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
31 |   geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
32 |   geom_vline(xintercept = 23, color = '#e15759', size = 1.5, linetype = 2)
33 |
34 |


--------------------------------------------------------------------------------
/markov_addons/markov_null_paths_comparison.R:
--------------------------------------------------------------------------------
1 | # In the markov chain approach and the ChannelAttribution package we should remember to include the paths
2 | # that did not end in conversion. Below is a comparison of results for both approaches with visualized output.
3 |
4 |
5 | ## Load libraries
6 | if (!require("pacman")) install.packages("pacman")
7 | pacman::p_load(data.table, ggplot2, dplyr, ChannelAttribution)
8 |
9 |
10 | ## Markov Attribution without taking null conversion into consideration
11 | df_paths_conv = df_split %>%
12 |   group_by(path_id) %>%
13 |   arrange(time) %>%
14 |   summarise(path = paste(channel, collapse = ">"),
15 |             total_conversions = sum(conversion)) %>%
16 |   ungroup()
17 |
18 |
19 | markov_attribution_conv <- markov_model(df_paths_conv,
20 |                                         var_path = "path",
21 |                                         var_conv = "total_conversions",
22 |                                         var_value = NULL,
23 |                                         var_null = NULL,
24 |                                         out_more = TRUE)
25 |
26 |
27 |
28 | ## Markov Attribution including null conversion
29 | df_paths_null = df_split %>%
30 |   group_by(path_id) %>%
31 |   arrange(time) %>%
32 |   summarise(path = paste(channel, collapse = ">"),
33 |             total_conversions = sum(conversion)) %>%
34 |   ungroup() %>%
35 |   mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # adding the null column
36 |
37 |
38 | markov_attribution_null <- markov_model(df_paths_null,
39 |                                         var_path = "path",
40 |                                         var_conv = "total_conversions",
41 |                                         var_value = NULL,
42 |                                         var_null = "null_conversion", # adding the null variable
43 |                                         out_more = TRUE)
44 |
45 |
46 | ## Comparing results
47 | conv_result = markov_attribution_conv$result
48 | null_result = markov_attribution_null$result
49 |
50 | colnames(null_result) = c("channel_name","null_included")
51 | colnames(conv_result) = c("channel_name", "null_omitted")
52 |
53 | compare_approach =
54 |   null_result %>%
55 |   left_join(conv_result)
56 |
57 | df_gcompare = melt(compare_approach, id = "channel_name")
58 |
59 | g_compare_null <- 
ggplot(df_gcompare, aes(x = channel_name, y = value, fill = variable)) +
60 |   geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) +
61 |   expand_limits(y=10000) +
62 |   scale_fill_manual(labels = c("Null Included", "Null Omitted"), values = c("deepskyblue", "goldenrod1")) +
63 |   theme_minimal() +
64 |   theme(axis.text.x = element_text(angle = 30, hjust = 0.6)) +
65 |   theme(panel.grid.major.x = element_blank()) +
66 |   geom_text(aes(label = round(value, 0)),
67 |             fontface = "bold", size = 3.5,
68 |             vjust = -0.5, position = position_dodge(width = 0.75)) +
69 |   labs(x = "", y = "Conversions") +
70 |   ggtitle("Including vs omitting null conversion paths") +
71 |   theme(plot.title = element_text(hjust = 0.5))
72 |
73 |
74 |


--------------------------------------------------------------------------------
/markov_addons/markov_one_path_channels.r:
--------------------------------------------------------------------------------
1 | # Compare one-path and multi-path channels
2 |
3 | # The Markov model underestimates the conversions brought by one-path channels (Channel -> Conversion)
4 | # Check whether any channel has a particularly high one-path proportion
5 |
6 | ##### one- and multi-channel paths #####
7 | df_path_1_clean <- df_split %>%
8 |   group_by(cookie) %>%
9 |   mutate(uniq_channel_tag = ifelse(length(unique(channel)) == 1, TRUE, FALSE)) %>%
10 |   ungroup()
11 |
12 | df_path_1_clean_uniq <- df_path_1_clean %>%
13 |   filter(uniq_channel_tag == TRUE) %>%
14 |   select(-uniq_channel_tag)
15 |
16 | df_path_1_clean_multi <- df_path_1_clean %>%
17 |   filter(uniq_channel_tag == FALSE) %>%
18 |   select(-uniq_channel_tag)
19 |
20 | ### experiment ###
21 | # attribution model for all paths
22 | df_all_paths <- df_path_1_clean %>%
23 |   group_by(cookie) %>%
24 |   summarise(path = paste(channel, collapse = ' > '),
25 |             conversion = sum(conversion)) %>%
26 |   ungroup() %>%
27 |   filter(conversion == 1)
28 |
29 | mod_attrib <- markov_model(df_all_paths,
30 |                            var_path = 'path',
31 |                            var_conv = 'conversion',
32 |                            out_more = TRUE)
33 | mod_attrib$removal_effects
34 | mod_attrib$result
35 | d_all <- data.frame(mod_attrib$result)
36 |
37 | # attribution model for split multi-channel and unique-channel paths
38 | df_multi_paths <- df_path_1_clean_multi %>%
39 |   group_by(cookie) %>%
40 |   summarise(path = paste(channel, collapse = ' > '),
41 |             conversion = sum(conversion)) %>%
42 |   ungroup() %>%
43 |   filter(conversion == 1)
44 |
45 | mod_attrib_alt <- markov_model(df_multi_paths,
46 |                                var_path = 'path',
47 |                                var_conv = 'conversion',
48 |                                out_more = TRUE)
49 | mod_attrib_alt$removal_effects
50 | mod_attrib_alt$result
51 |
52 | # adding unique paths
53 | df_uniq_paths <- df_path_1_clean_uniq %>%
54 |   filter(conversion == 1) %>%
55 |   group_by(channel) %>%
56 |   summarise(conversions = sum(conversion)) %>%
57 |   ungroup()
58 |
59 | d_multi <- data.frame(mod_attrib_alt$result)
60 |
61 | # coalesce() treats a channel missing from one side of the join as 0 instead of NA
62 | d_split <- full_join(d_multi, df_uniq_paths, by = c('channel_name' = 'channel')) %>%
63 |   mutate(result = coalesce(total_conversions, 0) + coalesce(conversions, 0))
64 |
65 | sum(d_all$total_conversions)
66 | sum(d_split$result)
67 |
-------------------------------------------------------------------------------- /markov_addons/markov_transition_matrix_visual.r: -------------------------------------------------------------------------------- 1 | ### Vis 5 - Great Visualization of Markov Chain transition matrix 2 | 3 | # Code and background from: 4 | # https://analyzecore.com/2016/08/03/attribution-model-r-part-1/ 5 | 6 | # transition matrix heatmap for "real" data 7 | df_plot_trans <- markov_attribution$transition_matrix 8 | 9 | cols <- c("#e7f0fa", "#c9e2f6", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e", "#eec73a", 10 | "#e29421", "#e29421", "#f05336", "#ce472e") 11 | t <- max(df_plot_trans$transition_probability) 12 | 13 | ggplot(df_plot_trans, aes(y = channel_from, x = channel_to, fill = transition_probability)) + 14 | theme_minimal() + 15 | geom_tile(colour = "white", width = .9, height = .9) + 16 | scale_fill_gradientn(colours = cols, limits = c(0, t), 17 | breaks = seq(0, t, by = t/4), 18 | labels = c("0", round(t/4*1, 2), round(t/4*2, 2), round(t/4*3, 2), round(t/4*4, 2)), 19 | guide = guide_colourbar(ticks = T, nbin = 50, barheight = .5, label = T, barwidth = 10)) + 20 | geom_text(aes(label = round(transition_probability, 2)), fontface = "bold", size = 4) + 21 | theme(legend.position = 'bottom', 22 | legend.direction = "horizontal", 23 | panel.grid.major = element_blank(), 24 | panel.grid.minor = element_blank(), 25 | plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8), 26 | axis.title.x = element_text(size = 24, face = "bold"), 27 | axis.title.y = element_text(size = 24, face = "bold"), 28 | axis.text.y = element_text(size = 8, face = "bold", color = 'black'), 29 | axis.text.x = element_text(size = 8, angle = 90, hjust = 0.5, vjust = 0.5, face = "plain")) + 30 | ggtitle("Transition matrix heatmap") 31 | -------------------------------------------------------------------------------- /markov_addons/markov_transition_simulation.r: 
--------------------------------------------------------------------------------
1 | # Building a small simulation of states after n steps on markov chain
2 |
3 | # Code and additional background originally from:
4 | # https://analyzecore.com/2016/08/03/attribution-model-r-part-1/
5 |
6 | library(expm)
7 |
8 | # dummy rows for the start and absorbing states; defined first because the preprocessing below uses them
9 | df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
10 |                        channel_to = c('(start)', '(conversion)', '(null)'),
11 |                        n = c(0, 0, 0),
12 |                        tot_n = c(0, 0, 0),
13 |                        perc = c(0, 1, 1))
14 |
15 | ##### modeling states and conversions #####
16 | # transition matrix preprocessing
17 | trans_matrix_complete <- markov_attribution$transition_matrix
18 | trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy %>%
19 |                                  mutate(transition_probability = perc) %>%
20 |                                  select(channel_from, channel_to, transition_probability))
21 | trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to, levels = c(levels(trans_matrix_complete$channel_from)))
22 | trans_matrix_complete <- dcast(trans_matrix_complete, channel_from ~ channel_to, value.var = 'transition_probability')
23 | trans_matrix_complete[is.na(trans_matrix_complete)] <- 0
24 | rownames(trans_matrix_complete) <- trans_matrix_complete$channel_from
25 | trans_matrix_complete <- as.matrix(trans_matrix_complete[, -1])
26 |
27 |
28 | # creating empty matrix for modeling
29 | model_mtrx <- matrix(data = 0,
30 |                      nrow = nrow(trans_matrix_complete), ncol = 1,
31 |                      dimnames = list(c(rownames(trans_matrix_complete)), '(start)'))
32 | # adding modeling number of visits
33 | model_mtrx['(start)', ] <- 1000
34 |
35 | c(model_mtrx) %*% (trans_matrix_complete %^% 5) # after 5 steps
36 | c(model_mtrx) %*% (trans_matrix_complete %^% 100000) # after 100000 steps
37 |
38 |


--------------------------------------------------------------------------------
/markov_chain_attribution.R:
--------------------------------------------------------------------------------
1 | ### Load Libraries 
---- 2 | 3 | if (!require("pacman")) install.packages("pacman") 4 | pacman::p_load(data.table, dplyr, ChannelAttribution, ggplot2, readr) 5 | 6 | 7 | ### Load Datasets ---- 8 | 9 | campaign_data = fread(".../campaign_data.csv") 10 | campaign_budget_daily = fread(".../budget_sample_daily.csv") 11 | 12 | ### Prepare the files - Split Paths ---- 13 | df_split = campaign_data %>% 14 | group_by(cookie) %>% 15 | arrange(time) %>% 16 | mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>% 17 | ungroup() %>% 18 | mutate(path_id = paste0(cookie, path_no)) 19 | 20 | 21 | ### Prepare the file - Create the paths ---- 22 | df_paths = df_split %>% 23 | group_by(path_id) %>% 24 | arrange(time) %>% 25 | summarise(path = paste(channel, collapse = ">"), 26 | total_conversions = sum(conversion)) %>% 27 | ungroup() %>% 28 | mutate(null_conversion = ifelse(total_conversions == 1, 0, 1)) # adding information about path that have not led to conversion 29 | 30 | ### Markov Chain and Heuristic Models ---- 31 | markov_attribution <- markov_model(df_paths, 32 | var_path = "path", 33 | var_conv = "total_conversions", 34 | var_value = NULL, 35 | order = 2, # higher order markov chain 36 | var_null = "null_conversion", 37 | out_more = TRUE) 38 | 39 | 40 | heuristic_attribution <- heuristic_models(df_paths, 41 | var_path = "path", 42 | var_conv = "total_conversions") 43 | 44 | 45 | 46 | ### Prepare final joint dataset ---- 47 | 48 | # Join attribution results 49 | all_model_results = merge(markov_attribution$result, heuristic_attribution) 50 | 51 | # Aggregate budget 52 | campaign_budget_total = as.data.table( 53 | campaign_budget_daily %>% 54 | group_by(channel) %>% 55 | summarise(total_cost = round(sum(cost), 1)) 56 | ) 57 | 58 | # Join into final results 59 | campaign_attribution = merge(all_model_results, campaign_budget_total, 60 | by.x = "channel_name", by.y = "channel") 61 | 62 | #### Calculate ROAS and CPA 63 | campaign_attribution = 64 | 
campaign_attribution %>%
65 |   mutate(chanel_weight = (total_conversions / sum(total_conversions)),
66 |          cost_weight = (total_cost / sum(total_cost)),
67 |          roas = chanel_weight / cost_weight,
68 |          optimal_budget = total_cost * roas,
69 |          CPA = total_cost / total_conversions)
70 |
71 | # Change the name of markov results column
72 | names(campaign_attribution)[names(campaign_attribution) == "total_conversions"] = "markov_result"
73 |
74 | # Save the outputs
75 | write_csv(campaign_attribution, ".../campaign_attribution.csv")
76 |
77 |


--------------------------------------------------------------------------------
/markov_higher_order.R:
--------------------------------------------------------------------------------
1 | # A Markov Chain can operate on a lower or higher order - the number of steps taken back when calculating the effect
2 | # Below is a simple comparison, using ChannelAttribution, of the attributed conversions across different orders
3 |
4 |
5 | # Libraries
6 | if (!require("pacman")) install.packages("pacman")
7 | pacman::p_load(data.table, ggplot2, dplyr, knitr, kableExtra, ChannelAttribution) # ChannelAttribution provides markov_model()
8 |
9 |
10 | ### Higher order markov ----
11 |
12 | # Calculate markov chains
13 |
14 | markov_order1 = markov_model(df_paths,
15 |                              var_path = "path",
16 |                              var_conv = "total_conversions",
17 |                              var_value = NULL,
18 |                              var_null = NULL,
19 |                              out_more = TRUE,
20 |                              order = 1)
21 |
22 | markov_order2 = markov_model(df_paths,
23 |                              var_path = "path",
24 |                              var_conv = "total_conversions",
25 |                              var_value = NULL,
26 |                              var_null = NULL,
27 |                              out_more = TRUE,
28 |                              order = 2)
29 |
30 | markov_order3 = markov_model(df_paths,
31 |                              var_path = "path",
32 |                              var_conv = "total_conversions",
33 |                              var_value = NULL,
34 |                              var_null = NULL,
35 |                              out_more = TRUE,
36 |                              order = 3)
37 |
38 |
39 | ### Compare results ----
40 |
41 | # Merge results
42 | markov_results1 = markov_order1$result
43 | markov_results2 = markov_order2$result
44 | markov_results3 = markov_order3$result
45 |
46 | order_comparison =
47 |   markov_results1 %>%
48 |   
left_join(markov_results2, by = 'channel_name') %>% 49 | left_join(markov_results3, by = 'channel_name') %>% 50 | arrange(desc(total_conversions.x)) 51 | 52 | # change column names 53 | colnames(order_comparison) = c('channel_name', 'order1', 'order2', 'order3') 54 | 55 | # round all numeric columns 56 | nums = vapply(order_comparison, is.numeric, FUN.VALUE = logical(1)) 57 | order_comparison[,nums] = round(order_comparison[,nums], 0) 58 | 59 | 60 | kable(order_comparison) %>% 61 | kable_styling(bootstrap_options = c("bordered", "hover"), full_width = F) %>% 62 | column_spec(1, bold = T, background = "aliceblue") 63 | 64 | -------------------------------------------------------------------------------- /results_visualization/campaign_attribution.csv: -------------------------------------------------------------------------------- 1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA 2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136 3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953 4 | Online Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804 5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214 6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833 7 | -------------------------------------------------------------------------------- /results_visualization/markov_chain_visualization.R: 
-------------------------------------------------------------------------------- 1 | ### Visualize the results 2 | 3 | # Libraries 4 | if (!require("pacman")) install.packages("pacman") 5 | pacman::p_load(data.table, ggplot2, dplyr, visNetwork) 6 | 7 | campaign_attribution = fread("C:\\Users\\matcyt\\Desktop\\Markov-Chain\\campaign_attribution.csv") 8 | str(campaign_attribution) 9 | 10 | ## Vis 1 - Campaign attribution - conversions ---- 11 | 12 | 13 | # Re-order the factors for channel names - bars sorted by descending markov_result 14 | df_g1 = campaign_attribution[order(-campaign_attribution$markov_result), ] 15 | df_g1$channel_name = factor(df_g1$channel_name, levels = df_g1$channel_name) 16 | 17 | # Create an ordered graph showing conversions attributed to each channel 18 | g_channel_performance <- ggplot(df_g1, aes(x = channel_name, y = markov_result, fill = channel_name)) + 19 | geom_bar(stat = "identity", width = 0.6) + 20 | ylim(0, 7000) + 21 | scale_fill_manual(values = c("#CE2D4F", 22 | "#A14DA0", 23 | "#9D79BC", 24 | "#7F96FF", 25 | "#A9CEF4")) + 26 | theme_minimal() + 27 | theme(axis.text.x = element_text(size = 9, angle = 30, hjust = 0.6, face = "bold")) + 28 | theme(panel.grid.major.x = element_blank()) + 29 | theme(plot.title = element_text(hjust = 0.5)) + 30 | geom_text(aes(label = round(markov_result, 0)), fontface = "bold", size = 4, vjust = -1) + 31 | labs(x = "", y = "Conversions") + 32 | ggtitle("Channel Performance") + 33 | guides(fill = "none") 34 | 35 | g_channel_performance 36 | 37 | ## Vis 2 - Visualize optimal budget allocation - ROAS based ---- 38 | # Compare current budget allocation with the one suggested by Markov attribution 39 | 40 | # Create melted dataset for budget comparison 41 | df_g2 = campaign_attribution[, c("channel_name", "total_cost", "optimal_budget")] 42 | df_g2 = melt(df_g2, id = "channel_name") 43 | 44 | # Create double bar chart 45 | g_budget_allocation <- ggplot(df_g2, aes(x 
= channel_name, y = value, fill = variable)) + 46 | geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) + 47 | scale_fill_manual(labels = c("Current Budget", "Optimal Budget"), values = c("#FFD166", "#04A777")) + 48 | theme_minimal() + 49 | theme(axis.text.x = element_text(size = 10, angle = 30, hjust = 0.6, face = "bold")) + 50 | theme(panel.grid.major.x = element_blank()) + 51 | geom_text(aes(label = round(value, 0)), 52 | fontface = "bold", size = 3.5, 53 | vjust = -0.5, position = position_dodge(width = 0.7)) + 54 | labs(x = "", y = "Budget $") + 55 | ggtitle("Budget Allocation") + 56 | theme(plot.title = element_text(hjust = 0.5)) 57 | 58 | g_budget_allocation 59 | 60 | ## Vis 3 - Compare Markov Chain attribution and heuristics models ---- 61 | 62 | 63 | # Create df for comparing heuristic models and markov results 64 | df_g3 = campaign_attribution[, c("channel_name", "markov_result", "first_touch", "last_touch", "linear_touch")] 65 | df_g3 = melt(df_g3, id = "channel_name") 66 | 67 | 68 | g_model_comparison <- ggplot(df_g3, aes(x = channel_name, y = value, fill = variable)) + 69 | geom_bar(stat = "identity", width = 0.6, position = position_dodge(width = 0.7)) + 70 | scale_fill_manual(labels = c("Markov Model", "First Touch", "Last Touch", "Linear"), 71 | values = c("#e65368", 72 | "#4e74ff", 73 | "#87BFFF", 74 | "#3BCEAC")) + 75 | theme_minimal() + 76 | theme(axis.text.x = element_text(size = 9, angle = 30, hjust = 0.6, face = "bold")) + 77 | theme(panel.grid.major.x = element_blank()) + 78 | labs(x = "", y = "Conversions") + 79 | ggtitle("Markov vs Heuristics") + 80 | theme(plot.title = element_text(hjust = 0.5)) 81 | 82 | g_model_comparison 83 | 84 | 85 | ## Vis 4 - Markov network graph ---- 86 | 87 | # Take the transition matrix from the markov chain output - ChannelAttribution package (markov_attribution is assumed to be already in the workspace) 88 | 89 | trans_matrix_prob = markov_attribution$transition_matrix 90 | trans_matrix_prob[, c(1,2)] = lapply(trans_matrix_prob[, c(1,2)], as.character) 91 
| 92 | 93 | ### Visualize the matrix ---- 94 | edges <- 95 | data.frame( 96 | from = trans_matrix_prob$channel_from, 97 | to = trans_matrix_prob$channel_to, 98 | label = round(trans_matrix_prob$transition_probability, 2), 99 | font.size = trans_matrix_prob$transition_probability * 100, 100 | width = trans_matrix_prob$transition_probability * 15, 101 | shadow = TRUE, 102 | arrows = "to", 103 | color = list(color = "#95cbee", highlight = "red") 104 | ) 105 | 106 | nodes <- tibble(id = c( c(trans_matrix_prob$channel_from), c(trans_matrix_prob$channel_to) )) %>% 107 | distinct(id) %>% 108 | arrange(id) %>% 109 | mutate( 110 | label = id, 111 | color = ifelse( 112 | label %in% c('(start)', '(conversion)'), 113 | '#4ab04a', 114 | ifelse(label == '(null)', '#ce472e', '#ffd73e') 115 | ), 116 | shadow = TRUE, 117 | shape = "box" 118 | ) 119 | 120 | visNetwork(nodes, 121 | edges, 122 | height = "2000px", 123 | width = "100%", 124 | main = "Markov Chain Visualized") %>% 125 | visIgraphLayout(randomSeed = 123) %>% 126 | visNodes(size = 5) %>% 127 | visOptions(highlightNearest = TRUE) 128 | 129 | 130 | -------------------------------------------------------------------------------- /sample_datasets/README.md: -------------------------------------------------------------------------------- 1 | ## Datasets description 2 | Here you can find two sample datasets needed to calculate Markov attribution and budget allocation. 3 | 4 | ### Campaign dataset - cookie level 5 | 6 | The main dataset resembles (simplified) real data from digital marketing campaigns at the cookie-tracking level. 7 | To keep the sample dataset small, it contains only the 4 key variables necessary for Markov chain analysis: 8 | 9 | **cookie** - unique identifier of a user/session. Cookie lifetime in online ads usually varies from 30 to 90 days. 10 | Cookie campaign data serves as a log of each user's actions and maps their digital touchpoints and interactions with content over a period of time. 
The distribution of cookies in this file is based on actual campaign data. 11 | 12 | **timestamp** - the time of a particular interaction with an ad 13 | 14 | **interaction** - type of interaction between a cookie and an ad. Typically it consists of impressions, clicks and conversions, but additional metrics might be defined by a particular ad serving company 15 | 16 | **conversion** - binary column indicating whether a particular visit ended in a conversion or not. Created from the interaction variable. The conversion rate is low and based on an actual campaign. Typically around 0.9% to 2.0% of all journeys lead to a successful conversion [[1]](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2343077). 17 | 18 | Real campaign data will have many more variables, such as geographic location, campaign name, or information about browser, creative, device type or system. While these can add new dimensions to campaign analytics, they are not necessary for the Markov chain attribution model. 19 | 20 | 21 | ### Budget 22 | 23 | Simulated daily cost values for the whole campaign, calculated as a function of the number of impressions. 
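The daily budget file has one row per day and channel, so it presumably needs to be collapsed to one total per channel before it can be compared with the attribution results. Below is a minimal sketch of that step in base R, using four rows taken from `budget_sample_daily.csv`. The object names `budget_daily` and `budget_total` and the columns `total_impressions` and `cost_per_impression` are illustrative assumptions, not names used by the repository scripts; `total_cost` mirrors the column name in `campaign_attribution.csv`.

```r
# Four rows with the same columns as budget_sample_daily.csv
# (in practice the full file would be read with read.csv or fread)
budget_daily <- data.frame(
  day = as.Date(c("2018-07-01", "2018-07-01", "2018-07-02", "2018-07-02")),
  channel = c("Facebook", "Instagram", "Facebook", "Instagram"),
  impressions = c(7576, 3350, 9482, 4120),
  cost = c(26.516, 13.4, 56.892, 26.78)
)

# Sum impressions and cost per channel (base R, no extra packages)
budget_total <- aggregate(cbind(impressions, cost) ~ channel,
                          data = budget_daily, FUN = sum)

# Illustrative column names; only total_cost matches campaign_attribution.csv
names(budget_total) <- c("channel_name", "total_impressions", "total_cost")

# Effective cost per impression for each channel over the aggregated period
budget_total$cost_per_impression <-
  budget_total$total_cost / budget_total$total_impressions

print(budget_total)
```

The resulting `channel_name` and `total_cost` columns could then be joined to the attribution table by channel name.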
24 | 25 | 26 | -------------------------------------------------------------------------------- /sample_datasets/budget_sample_daily.csv: -------------------------------------------------------------------------------- 1 | day,channel,impressions,cost 2 | 2018-07-01,Facebook,7576,26.516000000000002 3 | 2018-07-01,Instagram,3350,13.399999999999999 4 | 2018-07-01,Online Display,3769,16.960499999999996 5 | 2018-07-01,Online Video,2364,11.82 6 | 2018-07-01,Paid Search,4992,27.456000000000003 7 | 2018-07-02,Facebook,9482,56.891999999999996 8 | 2018-07-02,Instagram,4120,26.78 9 | 2018-07-02,Online Display,3724,13.034 10 | 2018-07-02,Online Video,3786,15.143999999999998 11 | 2018-07-02,Paid Search,8806,39.626999999999995 12 | 2018-07-03,Facebook,8447,42.235 13 | 2018-07-03,Instagram,3715,20.4325 14 | 2018-07-03,Online Display,3741,22.445999999999998 15 | 2018-07-03,Online Video,3613,23.484500000000004 16 | 2018-07-03,Paid Search,13099,45.8465 17 | 2018-07-04,Facebook,9313,37.251999999999995 18 | 2018-07-04,Instagram,4018,18.081 19 | 2018-07-04,Online Display,3735,18.675 20 | 2018-07-04,Online Video,3471,19.090500000000002 21 | 2018-07-04,Paid Search,13305,79.83 22 | 2018-07-05,Facebook,8944,58.136 23 | 2018-07-05,Instagram,3927,13.7445 24 | 2018-07-05,Online Display,14,0.056 25 | 2018-07-05,Online Video,3418,15.380999999999998 26 | 2018-07-05,Paid Search,8972,44.86 27 | 2018-07-06,Facebook,8683,47.7565 28 | 2018-07-06,Instagram,3641,21.845999999999997 29 | 2018-07-06,Online Display,2744,17.836000000000002 30 | 2018-07-06,Online Video,2402,8.407 31 | 2018-07-06,Paid Search,5981,23.924 32 | 2018-07-07,Facebook,9341,42.034499999999994 33 | 2018-07-07,Instagram,4106,20.53 34 | 2018-07-07,Online Display,3366,18.512999999999998 35 | 2018-07-07,Online Video,3647,21.881999999999998 36 | 2018-07-07,Paid Search,7583,49.289500000000004 37 | 2018-07-08,Facebook,9019,31.566499999999998 38 | 2018-07-08,Instagram,3877,15.508 39 | 2018-07-08,Online Display,12,0.05399999999999999 40 | 
2018-07-08,Online Video,4542,22.71 41 | 2018-07-08,Paid Search,10160,55.88 42 | 2018-07-09,Facebook,8232,49.391999999999996 43 | 2018-07-09,Instagram,3380,21.97 44 | 2018-07-09,Online Display,4125,14.437499999999998 45 | 2018-07-09,Online Video,5927,23.708 46 | 2018-07-09,Paid Search,8218,36.981 47 | 2018-07-10,Facebook,8632,43.16 48 | 2018-07-10,Instagram,3709,20.399500000000003 49 | 2018-07-10,Online Display,15,0.09 50 | 2018-07-10,Online Video,9672,62.868 51 | 2018-07-10,Paid Search,10463,36.62049999999999 52 | 2018-07-11,Facebook,8579,34.316 53 | 2018-07-11,Instagram,3600,16.2 54 | 2018-07-11,Online Display,5402,27.01 55 | 2018-07-11,Online Video,10058,55.319 56 | 2018-07-11,Paid Search,11147,66.88199999999999 57 | 2018-07-12,Facebook,9197,59.7805 58 | 2018-07-12,Instagram,3871,13.548499999999999 59 | 2018-07-12,Online Display,5687,22.747999999999998 60 | 2018-07-12,Online Video,10598,47.690999999999995 61 | 2018-07-12,Paid Search,10917,54.585 62 | 2018-07-13,Facebook,8863,48.746500000000005 63 | 2018-07-13,Instagram,3729,22.374 64 | 2018-07-13,Online Display,4550,29.575 65 | 2018-07-13,Online Video,5534,19.369 66 | 2018-07-13,Paid Search,8882,35.52799999999999 67 | 2018-07-14,Facebook,10190,45.855 68 | 2018-07-14,Instagram,4303,21.515 69 | 2018-07-14,Online Display,4524,24.882 70 | 2018-07-14,Online Video,7232,43.391999999999996 71 | 2018-07-14,Paid Search,9581,62.276500000000006 72 | 2018-07-15,Facebook,9931,34.7585 73 | 2018-07-15,Instagram,4152,16.608 74 | 2018-07-15,Online Display,4890,22.004999999999995 75 | 2018-07-15,Online Video,9810,49.05 76 | 2018-07-15,Paid Search,10255,56.4025 77 | 2018-07-16,Facebook,10956,65.736 78 | 2018-07-16,Instagram,4760,30.94 79 | 2018-07-16,Online Display,5046,17.660999999999998 80 | 2018-07-16,Online Video,8811,35.244 81 | 2018-07-16,Paid Search,9962,44.829 82 | 2018-07-17,Facebook,10762,53.81 83 | 2018-07-17,Instagram,4575,25.1625 84 | 2018-07-17,Online Display,5123,30.737999999999996 85 | 2018-07-17,Online 
Video,9298,60.437000000000005 86 | 2018-07-17,Paid Search,9887,34.6045 87 | 2018-07-18,Facebook,11280,45.12 88 | 2018-07-18,Instagram,4950,22.275 89 | 2018-07-18,Online Display,5289,26.445 90 | 2018-07-18,Online Video,11094,61.017 91 | 2018-07-18,Paid Search,8685,52.10999999999999 92 | 2018-07-19,Facebook,9685,62.9525 93 | 2018-07-19,Instagram,4152,14.532 94 | 2018-07-19,Online Display,4471,17.884 95 | 2018-07-19,Online Video,10654,47.943 96 | 2018-07-19,Paid Search,8262,41.31 97 | 2018-07-20,Facebook,9293,51.11150000000001 98 | 2018-07-20,Instagram,4104,24.624 99 | 2018-07-20,Online Display,3940,25.61 100 | 2018-07-20,Online Video,7592,26.572 101 | 2018-07-20,Paid Search,7310,29.239999999999995 102 | 2018-07-21,Facebook,9574,43.08299999999999 103 | 2018-07-21,Instagram,4085,20.425 104 | 2018-07-21,Online Display,4102,22.561000000000003 105 | 2018-07-21,Online Video,9287,55.722 106 | 2018-07-21,Paid Search,7833,50.914500000000004 107 | 2018-07-22,Facebook,10427,36.494499999999995 108 | 2018-07-22,Instagram,4461,17.843999999999998 109 | 2018-07-22,Online Display,4599,20.6955 110 | 2018-07-22,Online Video,12488,62.44 111 | 2018-07-22,Paid Search,5131,28.220500000000005 112 | 2018-07-23,Facebook,11438,68.628 113 | 2018-07-23,Instagram,4891,31.7915 114 | 2018-07-23,Online Display,4561,15.963499999999998 115 | 2018-07-23,Online Video,9490,37.96 116 | 2018-07-23,Paid Search,4876,21.941999999999997 117 | 2018-07-24,Facebook,9167,45.835 118 | 2018-07-24,Instagram,3989,21.939500000000002 119 | 2018-07-24,Online Display,3987,23.921999999999997 120 | 2018-07-24,Online Video,4876,31.694 121 | 2018-07-24,Paid Search,4389,15.3615 122 | 2018-07-25,Facebook,8499,33.995999999999995 123 | 2018-07-25,Instagram,3559,16.0155 124 | 2018-07-25,Online Display,3908,19.54 125 | 2018-07-25,Online Video,9272,50.996 126 | 2018-07-25,Paid Search,4650,27.9 127 | 2018-07-26,Facebook,9950,64.675 128 | 2018-07-26,Instagram,4210,14.735 129 | 2018-07-26,Online Display,3935,15.739999999999998 130 | 
2018-07-26,Online Video,7222,32.498999999999995 131 | 2018-07-26,Paid Search,3750,18.75 132 | 2018-07-27,Facebook,9777,53.773500000000006 133 | 2018-07-27,Instagram,4253,25.518 134 | 2018-07-27,Online Display,3781,24.576500000000003 135 | 2018-07-27,Online Video,1805,6.3175 136 | 2018-07-27,Paid Search,3667,14.668 137 | 2018-07-28,Facebook,9990,44.955 138 | 2018-07-28,Instagram,4315,21.575 139 | 2018-07-28,Online Display,4336,23.848000000000003 140 | 2018-07-28,Online Video,2179,13.074 141 | 2018-07-28,Paid Search,5747,37.3555 142 | 2018-07-29,Facebook,12811,44.8385 143 | 2018-07-29,Instagram,5452,21.808 144 | 2018-07-29,Online Display,4736,21.311999999999998 145 | 2018-07-29,Online Video,5488,27.44 146 | 2018-07-29,Paid Search,6415,35.282500000000006 147 | 2018-07-30,Facebook,13033,78.19800000000001 148 | 2018-07-30,Instagram,5301,34.4565 149 | 2018-07-30,Online Display,15,0.0525 150 | 2018-07-30,Online Video,744,2.976 151 | 2018-07-30,Paid Search,3923,17.653499999999998 152 | 2018-07-31,Facebook,6015,30.075 153 | 2018-07-31,Instagram,2651,14.580500000000002 154 | 2018-07-31,Online Display,11,0.066 155 | 2018-07-31,Online Video,27,0.17550000000000002 156 | 2018-07-31,Paid Search,289,1.0115 157 | -------------------------------------------------------------------------------- /sample_datasets/campaign_attribution.csv: -------------------------------------------------------------------------------- 1 | channel_name,markov_result,first_touch,last_touch,linear_touch,total_cost,chanel_weight,cost_weight,roas,optimal_budget,CPA 2 | Facebook,5940.39092991379,5908,6052,5957.793872194312,1481.7,0.30288027991198646,0.3050836988078325,0.9927776577232534,1470.9986554485447,0.24942802880845136 3 | Instagram,3665.366410561627,2634,2555,2570.135474419468,641.2,0.18688453630559462,0.13202380217019788,1.4155366928811313,907.6421274753815,0.17493476181600953 4 | Online 
Display,2212.9446308119614,2271,2246,2234.8089611535465,554.9,0.11283050174944992,0.1142545349723063,0.9875363089683789,547.9838978465534,0.2507518680195804 5 | Online Video,3250.793023843018,3803,3992,3937.3872710913865,991.8,0.16574685279370918,0.20421273704367163,0.811638172981657,804.9827399632073,0.3050947854033214 6 | Paid Search,4543.505004869603,4997,4768,4912.87442114129,1187.1,0.23165782923925982,0.2444252270059917,0.9477656299101282,1125.0925792663131,0.26127406016449833 7 | --------------------------------------------------------------------------------