├── Advanced features
│   ├── Analytic Functions.md
│   ├── Custom_moving_mau_window.sql
│   ├── Hourly Player Session Distribution in local time.sql
│   ├── SessionCounter example.sql
│   ├── json_parse.sql
│   ├── lastUiName.sql
│   ├── time_slice_concurrency.sql
│   ├── top_percent_spenders.sql
│   └── transactions_per_clientVersion.sql
├── KPIs
│   ├── ARPDAU.sql
│   ├── ARPDAU_and_revenue_with_smartads.sql
│   ├── First_IAP_by_transactionName.sql
│   ├── IAP_per_platform_per_day.sql
│   ├── Mission Starters vs Completers.sql
│   ├── MissionCompletion.sql
│   ├── New_Player_Retention_Chart.sql
│   ├── New_Player_Retention_Matrix.sql
│   ├── Non causal funnel.sql
│   ├── Revenue KPIs by Days Since Install (24 hour Boundaries, not UTC).sql
│   ├── Time_spent_per_level.sql
│   ├── average_mission length.sql
│   ├── average_session_length_by_session_number.sql
│   ├── campaignImpact.sql
│   ├── campaign_conversion_with_window.sql
│   ├── churn.sql
│   ├── count_days_until_first-purchase.sql
│   ├── first_time_deposits.sql
│   ├── first_two_purchases_matrix.sql
│   ├── hours_until_first_purchase.sql
│   ├── in_game_test_event.sql
│   ├── median_and_average_session_times.sql
│   ├── missionCompletion.png
│   ├── percentage_spenders.sql
│   ├── retention_by_country.sql
│   ├── retention_for_min_event_date_only.sql
│   ├── revenue_per_day_per_platform.sql
│   ├── revenue_per_hour.sql
│   ├── revenue_per_item.sql
│   ├── smartads_and_iap_ltv.sql
│   ├── spender_percentage.sql
│   └── virtual_currency_spent_on_item.sql
├── LICENSE
├── Paradigms
│   ├── New Players by Time of day in local time.sql
│   ├── Percentage spenders over time.sql
│   ├── SessionLength differences.sql
│   ├── ab_testing_variants.sql
│   ├── concurrency.sql
│   ├── inventory_one_user.sql
│   └── weekly_retention.sql
├── QA
│   ├── Ghostevents_vs_gameplay_events.sql
│   ├── event_order.sql
│   ├── events_per_dau.sql
│   ├── seconds_between_events.sql
│   └── sessionlength_distribution.sql
└── README.md

/Advanced features/Analytic Functions.md:
--------------------------------------------------------------------------------

# Analytic functions

When your data is spread over multiple rows, you sometimes want to know about something that is not reported on the event you are looking at. Traditionally, in large databases with many tables, the go-to way to resolve this was to join the data back onto itself. With our data warehouse running on Vertica, however, this is not the best approach. Analytic functions usually give you a way to enrich the rows you are looking at rather than joining onto a different dataset.

An analytic function adds a column to every row of the output that holds the result of the function. These functions are easy to recognise in the select list because they contain the `OVER` keyword.

### Intention
In this tutorial we aim to explain what analytic functions are and how they can be used in the DeltaDNA context. All example queries are written with data mining on our demo game in mind, [you can visit this tool by clicking here.](https://www.deltadna.net/demo-account/demo-game/live/data-mining)

## Example
As a first example, say you want to know the average amount of revenue generated per spender by the time they first finish mission 5 ("First Time User Forest #5").
First we get all the data we need in scope: the transactions (only the events with a convertedProductAmount) and the mission 5 finishes.
We can do this with the following WHERE clause:
```sql
where (eventName = 'missionCompleted' and missionName ='First Time User Forest #5')
or convertedProductAmount is not null
```
Next we want the total amount spent in game up to each point in time. We can get this by summing up the convertedProductAmount per user in chronological order (using the eventID):
```sql
SUM(convertedProductAmount) OVER (partition by userId order by eventID) as cumulative_revenue
```
Combining these gives a result set that we can do another pass over to select all missionCompleted events:
```sql
with data as (select eventName,userId,
SUM(convertedProductAmount) OVER (partition by userId order by eventID) as cumulative_revenue
from events
where (eventName = 'missionCompleted' and missionName ='First Time User Forest #5')
or convertedProductAmount is not null)
select * from data
where eventName = 'missionCompleted'
```
Since users can finish that mission multiple times, and we don't want users that have never spent any money at all, we take just the lowest non-null value of cumulative_revenue for each user; we do this in a second subquery (or CTE using the WITH statement). Then we take the average of revenue_first_mission_5_complete in the last part of the query.
```sql
WITH DATA AS
  (SELECT eventName,
          userId,
          SUM(convertedProductAmount) OVER (partition BY userId ORDER BY eventID) AS cumulative_revenue
   FROM EVENTS
   WHERE (eventName = 'missionCompleted'
          AND missionName ='First Time User Forest #5')
     OR convertedProductAmount IS NOT NULL)
,spenderAggregates AS
  (SELECT userId,
          min(cumulative_revenue) AS revenue_first_mission_5_complete
   FROM DATA
   WHERE eventName = 'missionCompleted'
     AND cumulative_revenue IS NOT NULL
   GROUP BY userId)
SELECT avg(revenue_first_mission_5_complete) AS RESULT
FROM spenderAggregates
```


## Functions

Some default aggregate functions such as `SUM()` and `COUNT()` can be used as analytic functions as well. You can find the complete list of functions [on the official Vertica documentation page.](https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/SQLReferenceManual/Functions/Analytic/AnalyticFunctions.htm)

### RANK/DENSE_RANK
If you want to add a number to each event based on the order of the values in the order clause, you can use the RANK function. For example, if you want to know what the most popular second event a user sends is, you can number the events using rank and report on them as follows.
```sql
WITH DATA AS
  (SELECT userId,
          eventName,
          rank() over (partition BY userId ORDER BY eventTimestamp) AS eventNumber
   FROM EVENTS
   WHERE gaUserStartdate = eventDate)
SELECT eventName,
       count(*) AS occurrences
FROM DATA
WHERE eventNumber = 2
GROUP BY eventName
ORDER BY count(*) DESC
```
If two events come in at the same time, however, this causes a gap and the user will never get an event with eventNumber 2, just like the third prize is skipped when there is a shared second place in a contest. If you don't want this you can use `DENSE_RANK`, which gives each row a number that is either the same as or one higher than that of the previous row.
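As a minimal sketch of the difference (not taken from the demo game), you can put both functions side by side on the same ordering and compare the numbers they produce:
```sql
SELECT userId,
       eventName,
       rank() over (partition BY userId ORDER BY eventTimestamp) AS rankNumber,
       dense_rank() over (partition BY userId ORDER BY eventTimestamp) AS denseRankNumber
FROM events
```
If a user's second and third events share a timestamp, `rank()` numbers the first four events 1, 2, 2, 4 while `dense_rank()` numbers them 1, 2, 2, 3.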
### LAG/LEAD
The `LAG()` function can be used to get the value of a column on the previous row. For example, if you want to get the time between starting a mission and completing it, you can do the following:
```sql
WITH DATA AS
  (SELECT eventTimestamp,
          eventName,
          missionName,
          lag(eventTimestamp) over(partition BY userId ORDER BY eventTimestamp) AS previousTimestamp
   FROM EVENTS
   WHERE eventName IN ('missionStarted',
                       'missionCompleted') )
SELECT missionName,
       avg(eventTimestamp - previousTimestamp) AS 'Average Completion Time'
FROM DATA
WHERE eventName = 'missionCompleted'
GROUP BY missionName
ORDER BY missionName
```
By looking only at the missionCompleted events, and assuming that you can't complete a mission without starting it first, you get the interval between these two timestamps.

`LEAD()` works the opposite way around: it gets data from the next row instead. We did not use it here since there is the possibility of not completing the mission.

### LAST_VALUE/FIRST_VALUE
`LAST_VALUE()` is probably one of the most useful functions. It can be used to get the last known value of a parameter; you would usually combine the parameter with the `IGNORE NULLS` keyword so that rows where the parameter is null get the last known value instead of a copy of the null.
```sql
WITH DATA AS
  (SELECT last_value(clientVersion IGNORE NULLS) over (partition BY userId ORDER BY eventId) AS currentClientversion,
          userId,
          eventTimestamp,
          eventName,
          realCurrencyAmount,
          realCurrencyType,
          convertedProductAmount
   FROM EVENTS
   WHERE clientVersion IS NOT NULL
     OR eventName = 'transaction')
SELECT *
FROM DATA
WHERE eventName = 'transaction'
```
`FIRST_VALUE()` works the exact opposite way and will give you the first value ever seen for that user.

### CONDITIONAL_CHANGE_EVENT
The `CONDITIONAL_CHANGE_EVENT()` function holds a counter that increments whenever the expression changes from one row to the next. If you order your events by eventID you can use this, for example, to create a session counter. Since the first session gets the number 0, you might want to add one to it to get a 1-based number where session 1 is the first session.
```sql
CONDITIONAL_CHANGE_EVENT(sessionid) over (partition BY userId ORDER BY eventTimestamp)+1 AS sessionCounter
```
# WORK IN PROGRESS


### PERCENTILE
`PERCENTILE_CONT()` returns the value at a given percentile of the ordered rows; it takes a `WITHIN GROUP (ORDER BY ...)` clause next to the `OVER` clause. See `top_percent_spenders.sql` in this folder for an example that finds the cut-off value for the top 1% of spenders.

### MEDIAN
`MEDIAN()` returns the middle value of the rows in each partition and is equivalent to `PERCENTILE_CONT(0.5)`. See `median_and_average_session_times.sql` in the KPIs folder for an example.


## Partitioning
Within the over statement you can specify what you want to group by; there will be a separate counter for each unique value. Without partitioning you just get the aggregate over all rows in scope.

The following adds the total aggregate over all events to every row, without any grouping at all; this can be convenient if you want to calculate a percentage.
```sql
sum(convertedProductAmount) over ()
```
If you specify a parameter, however, you create a separate counter that gives you the total per unique value of that parameter. For example, userID or sessionID work well here:
```sql
sum(convertedProductAmount) over (partition by userID)
```
In the example above you get the total revenue generated by that user added to all their events.
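As a sketch of how the two forms combine (the column names follow the snippets above), an empty `over ()` next to a partitioned sum gives each user's share of the total revenue:
```sql
select distinct userId,
       sum(convertedProductAmount) over (partition by userID) as userRevenue,
       round(sum(convertedProductAmount) over (partition by userID)
             / nullif(sum(convertedProductAmount) over (), 0) * 100, 2) as percentageOfTotal
from events
where convertedProductAmount is not null
```
The `nullif` guards against dividing by zero when there is no revenue in scope at all.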
## Ordering
Next to partitioning you can also specify the order in which the analytic function is applied. Some functions, such as `RANK()` and `MEDIAN()`, even require you to specify an order since they are driven by the row order.

The following enriches each row with the cumulative revenue for that user up to the time the event was received.
```sql
sum(convertedProductAmount) over (partition by userID order by eventTimestamp)
```


## Windowing
Within the over clause you can also specify a window frame, which limits the rows around the current row that are fed into the function. The minimum form is an empty `over ()`, which uses the whole partition. A frame can be row based (for example `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` for a moving window of seven rows) or range based (`RANGE BETWEEN ...`, which works on the values in the order clause rather than on row positions). When you specify an order without a frame, the default is `RANGE UNBOUNDED PRECEDING AND CURRENT ROW`, which is what makes the cumulative sums above run up to the current row.


### Extra things to mention
TODO:
- sessionCounter (conditional_change_event)
- count(case)

--------------------------------------------------------------------------------
/Advanced features/Custom_moving_mau_window.sql:
--------------------------------------------------------------------------------

--In this query we create a time slice by extrapolating the days between two fixed dates.
--This is then joined onto the user activity of x days in the past, and then we count the unique number of users.
WITH dates AS
  (SELECT DAY::date
   FROM
     (SELECT (CURRENT_DATE-30)::TIMESTAMP AS date_range
      UNION SELECT CURRENT_DATE::TIMESTAMP) AS ts timeseries DAY AS '1 days' over (ORDER BY date_range)),
     activity AS
  (SELECT DISTINCT event_date,
                   user_id
   FROM fact_user_sessions_day
   WHERE event_date> CURRENT_DATE -62)
SELECT DAY,
       count(DISTINCT user_id)
FROM dates d
FULL OUTER JOIN activity a ON a.event_date BETWEEN d.day-30 AND d.day
WHERE DAY IS NOT NULL
GROUP BY DAY
ORDER BY DAY

--------------------------------------------------------------------------------
/Advanced features/Hourly Player Session Distribution in local time.sql:
--------------------------------------------------------------------------------

--Get the number of active users per hour based on the local timestamp.
with firstValues as (
select userId
, eventTimestamp as startTs
, min(timezoneOffset) as tzo
from events
where eventName = 'gameStarted'
group by userId, eventTimestamp)
, results as (
select userID
, TIMESTAMPADD(minute, case when CHAR_LENGTH(TZO)=5 then CAST(substring(tzo, 1,1)||'1' as INTEGER) * --get positive vs negative tz offset
  (CAST(substring(tzo, 2,2) as INTEGER)*60 + cast(substring(tzo, 4,2) as INTEGER)) else null --get tz offset in minutes
  end ,startTs ) as localEventTimestamp
from firstValues)
select date_part('hour', localEventTimestamp) as LocalHour
, count(distinct userID) as UniqueUsers
from results
where localEventTimestamp is not null
group by 1
order by 1

--------------------------------------------------------------------------------
/Advanced features/SessionCounter example.sql:
--------------------------------------------------------------------------------

--Add a session counter to all events and use it to show the totals per xth session.
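--Note: CONDITIONAL_CHANGE_EVENT starts counting at 0, so sessionCounter 0 below is the first session.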
WITH DATA AS
  (SELECT eventName,
          userId,
          sessionId,
          CONDITIONAL_CHANGE_EVENT(sessionid) over (partition BY userId
                                                    ORDER BY eventTimestamp) AS sessionCounter
   FROM events
   WHERE userId IN
       (SELECT userId
        FROM events
        WHERE eventname = 'newPlayer')--added to remove sessions from people that started playing outside of the data retention window
)
SELECT eventName,
       count(*) AS events,
       count(DISTINCT userId) AS users,
       round(count(*) / count(DISTINCT userId),2.0) AS 'ev per user in first session'
FROM DATA
WHERE sessionCounter = 0
GROUP BY eventName

--------------------------------------------------------------------------------
/Advanced features/json_parse.sql:
--------------------------------------------------------------------------------

--We suggest not using JSON data objects as strings when setting up your event structure. However some parameters, such as the Google transactionReceipt, are JSON strings anyway.
--Vertica is able to extract a map object from the JSON string, and with maplookup you can retrieve the value for the selected key.
select
maplookup(MapJSONExtractor(transactionReceipt), 'packageName') as PackageName,
transactionReceipt
from events
where transactionServer = 'GOOGLE'
and revenueValidated = 1

--------------------------------------------------------------------------------
/Advanced features/lastUiName.sql:
--------------------------------------------------------------------------------

--Find the last UiName per user and compare these for current players and players that haven't been playing for a while
WITH DATA AS
  (SELECT userId,
          eventTimestamp,
          first_value(UIName
                      IGNORE nulls) over (partition BY userId
                                          ORDER BY eventTimestamp DESC) AS lastValue --get the last ever value by ordering backwards
   FROM EVENTS
   WHERE sessionId IS NOT NULL --exclude non-gameplay events such as the sending of a notification
) ,aggregates AS
  (SELECT userId,
          max(eventTimestamp)::date AS last_seen_date,
          MAX(lastValue) AS lastUiName
   FROM DATA
   GROUP BY userId)
SELECT lastUiName,
       count(CASE
                 WHEN last_seen_date> CURRENT_DATE-7 THEN 1
                 ELSE NULL
             END) currentPlayers,
       count(CASE
                 WHEN last_seen_date< CURRENT_DATE-7 THEN 1
                 ELSE NULL
             END) lapsedPlayers
FROM aggregates
GROUP BY lastUiName

--------------------------------------------------------------------------------
/Advanced features/time_slice_concurrency.sql:
--------------------------------------------------------------------------------

--Concurrent users in the data.
--Create a dataset with timestamps between two points at regular intervals.
--Join sessions onto these intervals if they start before and end after that point in time.
--Count the number of sessions on these intervals.
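--Note: the TIMESERIES clause below generates one row per 10-minute slice; change the '10 minutes' interval to adjust the resolution.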
5 | with timeslots as (select slice_time 6 | from (select (CURRENT_DATE-10)::timestamp as date_range union select now())as ts 7 | timeseries slice_time as '10 minutes' over (order by date_range)), 8 | sessions as (select userid, sessionid, min(eventTimestamp) as sessionStart, max(eventTimestamp) as sessionEnd 9 | FROM events 10 | WHERE eventTimestamp between (CURRENT_DATE-10) and now() 11 | and sessionId is not null 12 | group by 1,2) 13 | select 14 | slice_time as time, 15 | count(sessionId) as activeSessions 16 | from timeslots as ts 17 | left join sessions s 18 | on ts.slice_time >s.sessionStart and ts.slice_time < s.sessionEnd 19 | group by ts.slice_time 20 | order by ts.slice_time desc; 21 | -------------------------------------------------------------------------------- /Advanced features/top_percent_spenders.sql: -------------------------------------------------------------------------------- 1 | --Get the tipping point for when users would be in the top 1% of the group of spenders. 2 | --Then use this to find the 1% of the spenders. 3 | WITH spenders AS 4 | (SELECT user_Id, 5 | totalRealCurrencySpent, 6 | PERCENTILE_CONT(.01) WITHIN GROUP(ORDER BY totalRealCurrencySpent DESC) OVER (PARTITION BY 1) AS top1Percent 7 | FROM user_metrics 8 | WHERE totalRealCurrencySpent>0 --only select spenders 9 | ) 10 | SELECT count(*) AS spenders, 11 | count(CASE 12 | WHEN totalRealCurrencySpent>=top1Percent THEN 1 13 | ELSE NULL 14 | END) AS top1PercentSpenders 15 | FROM spenders 16 | -------------------------------------------------------------------------------- /Advanced features/transactions_per_clientVersion.sql: -------------------------------------------------------------------------------- 1 | --Since the clientVersion is sent in on the gameStarted event at the start of every session but not on each event we can't just filter events by clientVersion in one go. 2 | --However we can extrapolate the clientVersion using an analytic function (with the OVER statement) 3 | with data as ( 4 | select last_value(clientVersion ignore nulls) over (partition by userId order by eventId) as currentClientversion 5 | ,userId 6 | ,eventTimestamp 7 | ,eventName 8 | ,realCurrencyAmount 9 | ,realCurrencyType 10 | ,convertedProductAmount 11 | from events 12 | where clientVersion is not null 13 | or eventName = 'transaction') 14 | select * 15 | from data 16 | where eventName = 'transaction' 17 | -------------------------------------------------------------------------------- /KPIs/ARPDAU.sql: -------------------------------------------------------------------------------- 1 | --ARPDAU, Average Revenue Per Daily Active User 2 | --specifically for currencies with an exponent of 2. 3 | select event_date, 4 | round(sum(revenue)/100/count(distinct user_id),4)::float as ARPDAU 5 | from fact_user_sessions_day 6 | where event_date between current_date -30 and current_date 7 | group by event_date 8 | order by event_date desc; 9 | -------------------------------------------------------------------------------- /KPIs/ARPDAU_and_revenue_with_smartads.sql: -------------------------------------------------------------------------------- 1 | --arpdau and revenue for IAP and SmartAds revenue 2 | --This query only works if the default currency of your game is USD 3 | --Keep in mind that the SmartAds revenue is an estimate based on previous performance. 
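--adEcpm is the estimated eCPM in cents per 1000 impressions, so adEcpm/1000 approximates the revenue in cents of a single impression.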
4 | select eventdate, 5 | round( 6 | (sum(convertedproductAmount)+sum(adEcpm/1000))/100/count(distinct userid) 7 | ,4)::float as ARPDAU, 8 | (sum(convertedproductAmount)+sum(adEcpm)/1000)/100::float as 'revenue for both IAP and SmartAds' 9 | from events 10 | where eventdate between current_date -30 and current_date 11 | group by eventdate 12 | order by eventdate desc 13 | -------------------------------------------------------------------------------- /KPIs/First_IAP_by_transactionName.sql: -------------------------------------------------------------------------------- 1 | --First deposit by transactionName 2 | with transactions as ( 3 | select 4 | userId, eventTimestamp,transactionName, convertedproductAmount, revenueValidated, 5 | rank() over (partition by userid order by eventTimestamp) as transactionNumber 6 | from events 7 | where convertedProductAmount>0 and revenueValidated in (0,1) 8 | ) 9 | select 10 | transactionName, 11 | count(*) as 'IAP count', 12 | sum(convertedproductAmount)/100::float as Revenue 13 | from transactions 14 | where transactionNumber = 1 15 | group by transactionName 16 | order by 2 desc 17 | -------------------------------------------------------------------------------- /KPIs/IAP_per_platform_per_day.sql: -------------------------------------------------------------------------------- 1 | -- Show userCount, spenderCount, purchaseCount and revenue per platform per day since launch 2 | -- Uses 24 hour time difference from timestamp rather than day difference from date 3 | WITH data AS ( 4 | SELECT 5 | e.userid, 6 | First_value(eventtimestamp) over (PARTITION BY userid ORDER BY eventtimestamp) AS "firstEventTimestamp", 7 | ROUND(Datediff(hh, First_value(eventtimestamp) OVER (PARTITION BY userid ORDER BY eventtimestamp), eventtimestamp) / 24 + 1, 2.0) :: INTEGER AS DayNumber, 8 | e.eventtimestamp, 9 | e.eventname, 10 | e.platform, 11 | convertedproductamount, 12 | revenuevalidated 13 | FROM events e 14 | WHERE 15 | e.eventname IN ( 'newPlayer', 'gameStarted', 'transaction' ) 16 | AND gauserstartdate > current_date - 31) 17 | 18 | SELECT daynumber, 19 | COUNT(DISTINCT(CASE WHEN platform LIKE 'IOS%' THEN userid END)) AS "iOS Users", 20 | COUNT(DISTINCT(CASE WHEN platform LIKE 'IOS%' 21 | AND eventname = 'transaction' 22 | AND revenuevalidated IN ( 0, 1 ) 23 | AND convertedproductamount > 0 THEN userid END)) AS "iOS Spenders", 24 | COUNT(CASE WHEN platform LIKE 'IOS%' 25 | AND eventname = 'transaction' 26 | AND revenuevalidated IN ( 0, 1 ) 27 | AND convertedproductamount > 0 THEN convertedproductamount END) AS "iOS Purchases", 28 | SUM(CASE WHEN platform LIKE 'IOS%' 29 | AND eventname = 'transaction' 30 | AND revenuevalidated IN ( 0, 1 ) 31 | AND convertedproductamount > 0 THEN convertedproductamount END) AS "iOS Revenue", 32 | COUNT(DISTINCT(CASE WHEN platform LIKE 'ANDROID%' THEN userid END)) AS "Android Users", 33 | COUNT(DISTINCT(CASE WHEN platform LIKE 'ANDROID%' 34 | AND eventname = 'transaction' 35 | AND revenuevalidated IN ( 0, 1 ) 36 | AND convertedproductamount > 0 THEN userid END)) AS "Android Spenders", 37 | COUNT(CASE WHEN platform LIKE 'ANDROID%' 38 | AND eventname = 'transaction' 39 | AND revenuevalidated IN ( 0, 1 ) 40 | AND convertedproductamount > 0 THEN convertedproductamount END) AS "Android Purchases", 41 | SUM(CASE WHEN platform LIKE 'ANDROID%' 42 | AND eventname = 'transaction' 43 | AND revenuevalidated IN ( 0, 1 ) 44 | AND convertedproductamount > 0 THEN convertedproductamount END) AS "Android Revenue" 45 | FROM data 46 | GROUP BY daynumber 
ORDER BY daynumber;

--------------------------------------------------------------------------------
/KPIs/Mission Starters vs Completers.sql:
--------------------------------------------------------------------------------

--Mission starts vs mission completes: this query shows what percentage of players get stuck or churn on a mission.
--Stack the started-but-not-completed players together with the players completing a mission and you will see each group's share of the total.
WITH DATA AS
  (SELECT userId,
          missionName,
          eventName,
          max(CASE WHEN eventName = 'missionStarted' THEN 1 ELSE NULL END) over (partition BY userId, missionName) AS missionStartedFlag,
          max(CASE WHEN eventName = 'missionCompleted' THEN 1 ELSE NULL END) over (partition BY userId, missionName) AS missionCompletedFlag
   FROM events
   WHERE missionName IS NOT NULL), nonCompletionData AS
  (SELECT missionName,
          count(DISTINCT CASE WHEN missionStartedFlag = 1 THEN userId ELSE NULL END) AS players,
          count(DISTINCT CASE WHEN missionStartedFlag = 1
                  AND missionCompletedFlag = 1 THEN userId ELSE NULL END) AS completedPlayers,
          count(DISTINCT CASE WHEN missionStartedFlag = 1
                  AND missionCompletedFlag IS NULL THEN userId ELSE NULL END) AS n00bs
   FROM DATA
   GROUP BY missionName)
SELECT missionName AS "Mission Name",
       players AS "Players starting mission",
       completedPlayers AS "Number of players completing mission",
       n00bs AS "players started but not completed mission",
       n00bs/nullif(players,0) AS Ratio --NULLIF since players could theoretically be 0 and we would rather divide by null
FROM nonCompletionData
ORDER BY missionName;

--------------------------------------------------------------------------------
/KPIs/MissionCompletion.sql:
--------------------------------------------------------------------------------

--Mission completion and drop off:
--First we select the users who start playing their first mission within scope,
--then we add counters for all mission end events and mission start events.
4 | --The sum of failed completed and abandoned should add up to be equal to the number of started 5 | --this can be visualised by stacking them within data mining and getting a line of mission starts 6 | --Here we assume missionID is a numeric value sent in as a string and the first mission is '0' 7 | WITH starters AS 8 | (SELECT userId 9 | FROM events 10 | WHERE missionId ='0'-- number of your first mission 11 | AND eventName = 'missionStarted') 12 | SELECT missionId, 13 | missionName, 14 | count(CASE WHEN eventName = 'missionStarted' THEN 1 ELSE NULL END) AS started, 15 | count(CASE WHEN eventName = 'missionFailed' THEN 1 ELSE NULL END) AS failed, 16 | count(CASE WHEN eventName = 'missionCompleted' THEN 1 ELSE NULL END) AS completed, 17 | count(CASE WHEN eventName = 'missionAbandoned' THEN 1 ELSE NULL END) AS abandoned 18 | FROM events 19 | WHERE userId IN 20 | (SELECT userId 21 | FROM starters) 22 | AND eventLevel = 0 23 | AND missionId is not null 24 | GROUP BY missionId, 25 | missionName 26 | ORDER BY CAST (missionId AS NUMERIC); 27 | -------------------------------------------------------------------------------- /KPIs/New_Player_Retention_Chart.sql: -------------------------------------------------------------------------------- 1 | with days_data as 2 | (select user_id 3 | , player_start_date as install_date 4 | , event_date 5 | from fact_user_sessions_day 6 | where player_start_date >= CURRENT_DATE - 30 7 | group by user_id, player_start_date, event_date 8 | ) 9 | , aggregated_data as 10 | ( select DATEDIFF ('day', install_date, event_date) as DaysSinceInstall 11 | , count(distinct user_id) as UniqueUsers 12 | from days_data 13 | group by 1 14 | ) 15 | select DaysSinceInstall, UniqueUsers 16 | , ROUND(UniqueUsers / FIRST_VALUE(UniqueUsers) OVER (order by DaysSinceInstall) *100,2)::float as Percentage 17 | from aggregated_data 18 | order by DaysSinceInstall -------------------------------------------------------------------------------- /KPIs/New_Player_Retention_Matrix.sql: -------------------------------------------------------------------------------- 1 | ---------------------------- 2 | -- New Player Retention Matrix 3 | with days_data as 4 | (select user_id 5 | , player_start_date as install_date 6 | , event_date 7 | from fact_user_sessions_day 8 | where player_start_date >= CURRENT_DATE - 30 9 | group by user_id, player_start_date, event_date 10 | ) 11 | select install_date 12 | , NULLIFZERO (count (distinct case when event_date - install_date = 0 then user_id else null end)) as "installs" 13 | , NULLIFZERO (count (distinct case when event_date - install_date = 1 then user_id else null end)) as "1" 14 | , NULLIFZERO (count (distinct case when event_date - install_date = 2 then user_id else null end)) as "2" 15 | , NULLIFZERO (count (distinct case when event_date - install_date = 3 then user_id else null end)) as "3" 16 | , NULLIFZERO (count (distinct case when event_date - install_date = 4 then user_id else null end)) as "4" 17 | , NULLIFZERO (count (distinct case when event_date - install_date = 5 then user_id else null end)) as "5" 18 | , NULLIFZERO (count (distinct case when event_date - install_date = 6 then user_id else null end)) as "6" 19 | , NULLIFZERO (count (distinct case when event_date - install_date = 7 then user_id else null end)) as "7" 20 | , NULLIFZERO (count (distinct case when event_date - install_date = 8 then user_id else null end)) as "8" 21 | , NULLIFZERO (count (distinct case when event_date - install_date = 9 then user_id else null end)) as "9" 22 | 
, NULLIFZERO (count (distinct case when event_date - install_date = 10 then user_id else null end)) as "10" 23 | , NULLIFZERO (count (distinct case when event_date - install_date = 11 then user_id else null end)) as "11" 24 | , NULLIFZERO (count (distinct case when event_date - install_date = 12 then user_id else null end)) as "12" 25 | , NULLIFZERO (count (distinct case when event_date - install_date = 13 then user_id else null end)) as "13" 26 | , NULLIFZERO (count (distinct case when event_date - install_date = 14 then user_id else null end)) as "14" 27 | , NULLIFZERO (count (distinct case when event_date - install_date = 21 then user_id else null end)) as "21" 28 | , NULLIFZERO (count (distinct case when event_date - install_date = 30 then user_id else null end)) as "30" 29 | from days_data 30 | where CURRENT_DATE - event_date between 0 and 30 31 | group by 1 32 | order by 1 desc -------------------------------------------------------------------------------- /KPIs/Non causal funnel.sql: -------------------------------------------------------------------------------- 1 | --The funnel tool enforces causality which means that what happens in step 1 needs to happen before step 2 for that user. 2 | --This query looks for users who started any mission on the first day of playing and then looks at the number of users who did this and also completed a mission on day 2 of playing. This day 1 and day 2 filtering enforces causality as well but this could be left out or dictated by the game (if conditions for step 1 always precede conditions for step 2) 3 | WITH step1 AS 4 | (SELECT DISTINCT userId 5 | FROM EVENTS 6 | WHERE eventDate = gaUserStartDate 7 | AND eventname = 'missionStarted'), 8 | step2 AS 9 | (SELECT DISTINCT userId 10 | FROM EVENTS 11 | WHERE userId IN 12 | (SELECT * 13 | FROM step1) 14 | AND eventDate - gaUserStartDate = 1 15 | AND eventName = 'missionCompleted') 16 | SELECT 'step1' AS stepName, 17 | count(*) AS users 18 | FROM step1 19 | UNION 20 | SELECT 'step2', 21 | count(*) 22 | FROM step2 23 | ORDER BY stepName 24 | -------------------------------------------------------------------------------- /KPIs/Revenue KPIs by Days Since Install (24 hour Boundaries, not UTC).sql: -------------------------------------------------------------------------------- 1 | -- Get the KPI's in 24 hour intervals since install rather than UTC days 2 | with data as ( 3 | select e.userID 4 | , first_value(eventTimestamp) over (partition by userID order by eventTimestamp) as "firstEventTimestamp" 5 | -- Calculate the Day Number based on number of 24 hour blocks since timestamp on first event, (i.e. 
Not UTC day boundaries)
, round(datediff(hh,first_value(eventTimestamp) over (partition by userID order by eventTimestamp),eventTimestamp) / 24 +1 ,2.0):: integer as DayNumber
, e.eventTimestamp
, e.eventName
, e.platform
, convertedProductAmount
, revenueValidated
from events e
where e.eventName in ('newPlayer', 'gameStarted', 'transaction')
and gaUserStartDate > CURRENT_DATE -31
)
select DayNumber
, count(distinct(case when platform like 'IOS%' then userID end )) as "iOS Users"
, count(distinct(case when platform like 'IOS%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then userID end )) as "iOS Spenders"
, count(case when platform like 'IOS%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then convertedProductAmount end ) as "iOS Purchases"
, sum(case when platform like 'IOS%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then convertedProductAmount end ) as "iOS Revenue"
, count(distinct(case when platform like 'ANDROID%' then userID end )) as "Android Users"
, count(distinct(case when platform like 'ANDROID%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then userID end )) as "Android Spenders"
, count(case when platform like 'ANDROID%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then convertedProductAmount end ) as "Android Purchases"
, sum(case when platform like 'ANDROID%' and eventName = 'transaction' and revenueValidated in (0,1) and convertedProductAmount > 0 then convertedProductAmount end ) as "Android Revenue"
from data
group by DayNumber
order by DayNumber ;

--------------------------------------------------------------------------------
/KPIs/Time_spent_per_level.sql:
--------------------------------------------------------------------------------

--In the data CTE we add the last known value of the userLevel parameter to all the other events and bring the interesting columns into scope.
--Next we group all the data by the last known value of the userLevel and create the aggregates we are interested in.
WITH DATA AS
  (SELECT eventName,
          userId,
          last_value(userLevel IGNORE NULLS) over (partition BY userId
                                                   ORDER BY eventId) AS current_level,
          eventDate,
          msSinceLastEvent
   FROM EVENTS)
SELECT coalesce(current_level, 0) AS LEVEL,--replace not known yet with level 0
       sum(msSinceLastEvent)/1000 AS 'Seconds Spent On Level',
       count(DISTINCT userId) AS 'Number of users',
       count(DISTINCT eventDate) AS 'Total days spent',
       count(DISTINCT eventDate)/count(DISTINCT userId) AS 'average days spent',
       sum(msSinceLastEvent)/1000/count(DISTINCT userId) AS 'average seconds in game spent on level'
FROM DATA
GROUP BY current_level
ORDER BY current_level

--------------------------------------------------------------------------------
/KPIs/average_mission length.sql:
--------------------------------------------------------------------------------

--Average mission length.
--In the first CTE, missions, we add a counter that keeps track of which mission attempt we are in; it is incremented after the start of a mission and after the end of a mission (hence the DESC order in the over clause).
--In the second CTE we add up the time for each missionGroup.
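--Because of the DESC ordering, a missionCompleted event falls into the same missionGroup as the gameplay events since the matching missionStarted, so summing their msSinceLastEvent gives the time from mission start to completion.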
--In the final select we take only the missionCompleted events and report the time it took from starting the mission to completing it.
WITH missions AS
  (SELECT userId,
          sessionId,
          eventName,
          missionId,
          userLevel,
          eventTimestamp,
          msSinceLastEvent,
          count(CASE WHEN eventName IN('missionCompleted', 'missionStarted') THEN 1 ELSE NULL END) over (partition BY userID
                                                                                                         ORDER BY eventTimestamp DESC) missionGroup
   FROM events), missionCompletedEvents AS
  (SELECT userId,
          eventName,
          missionGroup,
          missionID,
          sum(msSinceLastEvent) over (partition BY missionGroup, userId
                                      ORDER BY eventTimestamp) AS MissionCompletedTime
   FROM missions
)
SELECT missionID,
       avg(missionCompletedTime)/1000 AS AverageMissionTimeInSeconds
FROM missionCompletedEvents
where eventName = 'missionCompleted'
GROUP BY missionID
ORDER BY cast(missionId AS NUMERIC)

--------------------------------------------------------------------------------
/KPIs/average_session_length_by_session_number.sql:
--------------------------------------------------------------------------------

-- Shows average session length in minutes and session count, grouped by the session number

-- Get session data
-- Conditional change event pulls out the sessionid only when it has changed since the last occurrence
WITH sessions AS (
  SELECT
    userid,
    sessionid,
    eventtimestamp,
    eventname,
    mssincelastevent,
    CONDITIONAL_CHANGE_EVENT(sessionid) OVER (PARTITION BY userid ORDER BY eventid) + 1 AS SessionCounter
  FROM events WHERE sessionid IS NOT NULL
  AND mssincelastevent IS NOT NULL
),
-- Get aggregate data from the session data
session_aggregates AS (
  SELECT
    Max(userid) AS userID,
    Min(eventtimestamp),
    sessionid,
    sessioncounter,
    SUM(mssincelastevent) AS msSessionLength,
    Count(*) AS eventCount
  FROM sessions
  GROUP BY sessionid,
           sessioncounter
)
-- Further aggregate data
SELECT
  sessioncounter as sessionNumber,
  Count(*) AS sessionsCounted,
  Round(Avg(mssessionlength) / 60000, 2.0) :: FLOAT AS avgSessionLengthMinutes
FROM session_aggregates
GROUP BY sessioncounter
ORDER BY sessioncounter;

--------------------------------------------------------------------------------
/KPIs/campaignImpact.sql:
--------------------------------------------------------------------------------

with sends as(
select userid
,eventdate
,campaignName
,communicationState
,eventtimestamp as sendtime
from events
where eventname = 'outOfGameSend'
and campaignName !=''
)
,gameStarts as(
select userid
,eventtimestamp as startTime
from events
where eventname = 'gameStarted'
)
,missionStarts as(
select userid
,eventtimestamp as missionTime
from events
where eventname = 'missionStarted'
)
,IAPs as(
select userid
,eventtimestamp as iapTime
from events
where convertedProductAmount > 0 and revenueValidated < 2
)
,sendOpens as(
select s.userid
,eventdate
,campaignName
,sendTime
,communicationState
,min(startTime) as nextLogin
,min(missionTime) as nextMission
,min(iapTime) as nextIAP
from sends s left join gameStarts g on s.userid = g.userid and startTime between sendTime and sendTime + interval '24 HOURS'
left join missionStarts m on s.userid = m.userid and missionTime between sendTime and sendTime + interval '24 HOURS'
left join IAPs i on s.userid = i.userid and iapTime between sendTime and sendTime + interval '24 HOURS'
group by s.userid,eventdate,campaignName,sendTime,communicationState
)
select eventDate
,campaignName
,count(distinct userid) as Delivery
--,count(distinct case when communicationState = 'SENT' then userid end) as Sends
,count(distinct case when communicationState = 'FAIL' then userid end) as Bounces
,count(distinct case when communicationState = 'SENT' and timestampdiff(HH,sendTime,nextLogin) < 24 then userid end) "App Open"
,count(distinct case when communicationState = 'SENT' and timestampdiff(HH,sendTime,nextMission) < 24 then userid end) "Game Play"
,count(distinct case when communicationState = 'SENT' and timestampdiff(HH,sendTime,nextIAP) < 24 then userid end) "IAP"
from sendOpens
group by eventDate,campaignName
order by eventDate desc,campaignName

--------------------------------------------------------------------------------
/KPIs/campaign_conversion_with_window.sql:
--------------------------------------------------------------------------------

--Get all the messages sent in a certain step on a certain day and link these to the conversion events (a gameStarted) in the following 7 days.
--Measure the occurrence per day for the campaign.
with in_campaign as (
select userID,
eventDate,
COALESCE(stepID,1) as stepID,
min(eventTimestamp) as stepTime,
count(userID) over (partition by COALESCE(stepID,1), eventDate order by min(eventTimestamp)) as total_participants
from events
where eventName = 'outOfGameSend'
and communicationState = 'SENT'
and campaignID = 100--the ID of the campaign
and stepType = 'STANDARD'
group by userID, COALESCE(stepID,1), eventDate
)
select c.stepTime::date as 'Date', c.stepID AS Step, max(total_participants) as Participants, count(distinct e.userID) as 'Occurred', round(count(distinct e.userID)/max(total_participants)*100,2)::float as '%Occurrence'
from in_campaign c left join events e on e.userID = c.userID
and e.eventTimestamp > c.stepTime
and e.eventTimestamp < c.stepTime + interval '7 day'
and e.eventName = 'gameStarted'
group by c.stepID, c.stepTime::date
order by c.stepTime::date, c.stepID, Participants

--------------------------------------------------------------------------------
/KPIs/churn.sql:
--------------------------------------------------------------------------------

-- User churn and rolling retention based on Day 28
-- Rolling retention is the inverse of churn.
-- Keep in mind that new data can come in and churn will go down over time.
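-- Here a user counts as day-28 retained when they have any activity 28 or more days after their player start date.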
4 | with activity_data as (select 5 | player_start_date, event_date, 6 | event_date-player_start_date as n, 7 | user_id 8 | from fact_user_sessions_day 9 | group by player_start_date, event_date, user_id 10 | ) 11 | select 12 | player_start_date, 13 | count(distinct user_id) as installs, 14 | count(distinct case when N>=28 then user_id end) as d28Retained, 15 | count(distinct case when N>=28 then user_id end)/count(distinct user_id) * 100 as d28RollingRetention, 16 | 100-(count(distinct case when N>=28 then user_id end)/count(distinct user_id) * 100) as d28Churn 17 | from activity_data 18 | where player_start_date > current_date -60 and event_date > current_date -60 19 | group by player_start_date 20 | order by player_start_date 21 | -------------------------------------------------------------------------------- /KPIs/count_days_until_first-purchase.sql: -------------------------------------------------------------------------------- 1 | -- Count of Days Until First Purchase 2 | WITH userPurchases AS ( 3 | SELECT 4 | user_id, 5 | datediff(day, player_start_date, event_date) daysSinceInstall, 6 | row_number() OVER (PARTITION BY user_id ORDER BY event_date) spendNumber 7 | FROM fact_user_sessions_day 8 | WHERE revenue > 0 9 | ) 10 | 11 | SELECT 12 | daysSinceInstall AS daysUntilFirstPurchase, 13 | COUNT(DISTINCT user_id) AS userCount 14 | FROM userPurchases 15 | WHERE spendNumber = 1 AND daysSinceInstall >= 0 16 | GROUP BY daysSinceInstall 17 | ORDER BY daysSinceInstall ASC -------------------------------------------------------------------------------- /KPIs/first_time_deposits.sql: -------------------------------------------------------------------------------- 1 | /* get the date of the first payment and count the number of users */ 2 | select firstPaymentTimestamp::date as spend_date, 3 | count(*) as spenders 4 | from user_metrics 5 | where firstPaymentTimestamp is not null 6 | group by 1 7 | order by 1 desc 8 | -------------------------------------------------------------------------------- /KPIs/first_two_purchases_matrix.sql: -------------------------------------------------------------------------------- 1 | with data as ( 2 | select 3 | userId, 4 | realcurrencyAmount, 5 | rank() over (partition by userId order by eventId) as transactionNumber 6 | from events 7 | where gaUserStartDate > (select min(eventDate) from events)--only get starters within time window 8 | and revenueValidated in (0,1) 9 | and realcurrencyAmount is not null 10 | ) 11 | ,userData as ( 12 | select 13 | userId, 14 | min(case when transactionNumber =1 then realcurrencyAmount end) as firstTransaction, 15 | min(case when transactionNumber =2 then realcurrencyAmount end) as secondTransaction 16 | from data 17 | group by userId) 18 | select 19 | firstTransaction, 20 | secondTransaction, 21 | count(*) as users 22 | from userData 23 | group by firstTransaction,secondTransaction 24 | order by firstTransaction,secondTransaction 25 | 26 | -------------------------------------------------------------------------------- /KPIs/hours_until_first_purchase.sql: -------------------------------------------------------------------------------- 1 | -- Hours Until First Purchase and Revenue of that Purchase 2 | -- Gets all transactions that are transaction, newPlayer or gameStarted 3 | -- Gets the hour difference between the first event and the first transaction event 4 | WITH transactions AS ( 5 | SELECT 6 | userid, 7 | eventid, 8 | eventlevel, 9 | eventtimestamp, 10 | First_value(eventtimestamp) OVER (PARTITION BY userid ORDER BY 
eventid) AS firstEventTimestamp,
    eventname,
    revenuevalidated,
    convertedproductamount,
    COUNT(convertedproductamount) OVER (PARTITION BY userid ORDER BY eventid) AS transactionNumber
  FROM events
  WHERE
    eventname IN ( 'transaction', 'newPlayer', 'gameStarted' )
    AND gauserstartdate > (SELECT Min(eventtimestamp) FROM EVENTS)
)

SELECT
  userid AS userId,
  firsteventtimestamp AS userFirstSeen,
  eventtimestamp AS firstPurchaseTimestamp,
  Datediff('hour', firsteventtimestamp, eventtimestamp) AS hoursSinceInstall,
  ROUND(convertedproductamount / 100, 2.0) :: FLOAT AS "Revenue USD"
FROM transactions
WHERE
  eventname = 'transaction'
  AND transactionnumber = 1
  AND revenuevalidated IN ( 0, 1 )
  AND convertedproductamount IS NOT NULL
ORDER BY
  userid,
  eventid,
  transactionnumber;

--------------------------------------------------------------------------------
/KPIs/in_game_test_event.sql:
--------------------------------------------------------------------------------

WITH relevant_data AS (
    SELECT
      userId,
      eventDate,
      responseEngagementName,
      eventTimestamp,
      responseVariantName,
      responseMessageSequence,
      eventName,
      --Check if the next event for this user is the test event
      lead(eventName = 'gameStarted') OVER ( userWindow ) AS nextEventIsTestEvent,
      --Get the time of the following event
      lead(eventTimestamp) OVER ( userWindow ) AS nextTimestamp
    FROM events
    WHERE ((eventName = 'engageResponse' and responseEngagementName = 'A/B test campaign')
           OR (eventName = 'gameStarted'))
    WINDOW userWindow AS (PARTITION BY userId ORDER BY eventTimestamp )
), results AS (
    SELECT
      eventDate AS Date,
      responseVariantName AS Variant,
      responseMessageSequence AS Step,
      count(DISTINCT userID) AS Participants,
      -- Get the number of people who reach the test event within the interval
      count(DISTINCT CASE WHEN nextEventIsTestEvent AND (nextTimestamp - eventTimestamp) <= '1 days' :: INTERVAL
        THEN userId END) AS Occurred
    FROM relevant_data
    WHERE (eventName = 'engageResponse')
    GROUP BY eventDate, responseVariantName, responseMessageSequence
)
SELECT
  *,
  round(Occurred / Participants * 100, 2) :: FLOAT AS "Occurred %"
FROM results
ORDER BY Date, Variant, Step

--------------------------------------------------------------------------------
/KPIs/median_and_average_session_times.sql:
--------------------------------------------------------------------------------

--Get the median and average session times.
--The median times can only be retrieved with an analytic function.
WITH DATA AS
  (SELECT min(eventDate) AS eventDate,
          sessionId,
          userId,
          sum(msSinceLastEvent) AS sessionDurationMs,
          count(*) AS eventCount
   FROM EVENTS
   GROUP BY sessionId,
            userId) ,medianValues as
  (SELECT *, MEDIAN(sessionDurationMs) OVER (PARTITION BY eventDate) AS medianSessionTime
   FROM DATA
   WHERE eventCount>1-- exclude sessions with just one event
)
SELECT eventDate,
       round(avg(sessionDurationMs)/1000,2.0) AS 'Mean session time in seconds',
       (avg(sessionDurationMs)::varchar||'ms')::interval AS 'Mean session time as interval',
       round(medianSessionTime/1000, 2.0) AS 'Median session time in seconds',
       (medianSessionTime::varchar||'ms')::interval AS 'Median session time as interval',
count(DISTINCT userId) AS 'Sample size users',
       count(DISTINCT sessionId) AS 'Sample size sessions'
FROM medianValues
GROUP BY eventDate,
         medianSessionTime
ORDER BY eventDate DESC

--------------------------------------------------------------------------------
/KPIs/missionCompletion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/deltaDNA/sql-cookbook/3b55f905879ac4486cc0d77e5fc71b1d6b8da515/KPIs/missionCompletion.png

--------------------------------------------------------------------------------
/KPIs/percentage_spenders.sql:
--------------------------------------------------------------------------------

--Select the total revenue per user and use that to calculate the percentage of spenders.
--This number is in the default measure charts but it can serve as a basis for a different query.
with data as (select user_Id, sum(revenue) as rv
from fact_user_sessions_day
group by 1
)
select
count(case when rv=0 then 1 else null end) as nonspenders,
count(nullif(rv,0)) as Spenders,
count(*) as totalUsers,
round(count(nullif(rv,0))/count(*)*100, 4.0) as percentageSpending
from data

--------------------------------------------------------------------------------
/KPIs/retention_by_country.sql:
--------------------------------------------------------------------------------

with retention_data as (
select
user_country,
count(distinct user_id) as installs,
count(distinct case when event_date-player_start_date = 7 then user_id end) as retainedD7
from fact_user_sessions_day
where user_country is not null
and player_start_date < current_date -7
group by user_country
)
select *, retainedD7/installs as d7Retention
from retention_data
order by retainedD7/installs desc

--------------------------------------------------------------------------------
/KPIs/retention_for_min_event_date_only.sql:
--------------------------------------------------------------------------------

with DATA AS
  ( SELECT min(event_date) as event_date,
           player_start_date,
           user_id,
           DATEDIFF(DAY,player_start_date,min(event_date)) AS N
   FROM fact_user_sessions_day
   where DATEDIFF(DAY,player_start_date,current_date)<36
   group by player_start_date,
            user_id, session_Id
), retention AS
  ( SELECT player_start_date,
           COUNT (DISTINCT CASE WHEN n = 0 THEN user_id ELSE NULL END) AS installs,
           COUNT (DISTINCT CASE WHEN n = 1 THEN user_id ELSE NULL END) AS 'day 1 retention',
           COUNT (DISTINCT CASE WHEN n = 7 THEN user_id ELSE NULL END) AS 'day 7 retention',
           COUNT (DISTINCT CASE WHEN n = 14 THEN user_id ELSE NULL END) AS 'day 14 retention',
           COUNT (DISTINCT CASE WHEN n = 30 THEN user_id ELSE NULL END) AS 'day 30 retention'
   FROM DATA
   GROUP BY player_start_date)
SELECT *,
       round("day 1 retention"/"installs"*100,2.0) as 'D1%',
       round("day 7 retention"/"installs"*100,2.0) as 'D7%',
       round("day 14 retention"/"installs"*100,2.0) as 'D14%',
       round("day 30 retention"/"installs"*100,2.0) as 'D30%'
FROM retention
ORDER BY player_start_date DESC

--------------------------------------------------------------------------------
/KPIs/revenue_per_day_per_platform.sql:
--------------------------------------------------------------------------------

-- Revenue per Day per mobile Platform
-- Easily adaptable to show dollar revenue or add/adjust platforms
SELECT
  event_date,
  SUM(CASE WHEN platform LIKE 'ANDROID%' THEN revenue END) AS "Android revenueInPence",
  count(DISTINCT(CASE WHEN platform LIKE 'ANDROID%' THEN user_id END)) AS "Android userCount",
  sum(CASE WHEN platform LIKE 'IOS%' THEN revenue END) AS "iOS revenueInPence",
  count(DISTINCT(CASE WHEN platform LIKE 'IOS%' THEN user_id END)) AS "iOS userCount"
FROM fact_user_sessions_day
WHERE
  event_date BETWEEN CURRENT_DATE - 31 AND CURRENT_DATE
  AND event_date = player_start_date
GROUP BY event_date
ORDER BY event_date DESC;

--------------------------------------------------------------------------------
/KPIs/revenue_per_hour.sql:
--------------------------------------------------------------------------------

/* Get the timestamp and round it down to the hour,
then get the revenue for this hour, the total number of users as well as the total number of spenders within this window.
*/
select date_trunc('hour', eventTimestamp) as time,
to_char(sum(convertedproductamount)/100, '$999,999,999,999.00') as revenue,
count(distinct userid) active_users,
count(distinct case when convertedproductamount>0 then userId end) as spenders
from events
where revenuevalidated not in (2,3)
and eventName in ('gameStarted', 'transaction')
and eventTimestamp between now() - interval '100 hours' and now()
group by 1 order by 1 desc

--------------------------------------------------------------------------------
/KPIs/revenue_per_item.sql:
--------------------------------------------------------------------------------

--Real currency spent, grouped by item.
--Combine different child events from the transaction event.
WITH items as
  (SELECT productAmount, ItemName, mainEventId
   FROM events
   WHERE eventName = 'transaction'
     AND productCategory = 'ITEM'),
     spendings AS
  (SELECT i.ItemName,
          i.productAmount,
          i.mainEventId,
          e.convertedProductAmount
   FROM events AS e
   INNER JOIN items AS i ON e.mainEventId = i.mainEventId
   WHERE e.productCategory = 'REAL_CURRENCY'
   ORDER BY itemName)
SELECT itemName,
       sum(productAmount) as 'items sold',
       round(sum(convertedProductAmount)/100,2.0) as revenue,
       count(*) as sales
FROM spendings
GROUP BY itemName
order by "revenue" desc

--------------------------------------------------------------------------------
/KPIs/smartads_and_iap_ltv.sql:
--------------------------------------------------------------------------------

--Get the total LTV (predicted SmartAds LTV + confirmed IAP LTV).
--Only works for data within the data retention window, hence the gaUserStartDate filter.
--Keep in mind the predictedAdRevenue is only a prediction.
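--adEcpm/100000 converts the estimated eCPM (cents per 1000 impressions) into dollars for a single impression.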
4 | with data as ( 5 | select 6 | gaUserStartdate, 7 | count(distinct userId) as userCount, 8 | sum(case when revenueValidated in (0,1) then convertedProductAmount/100 end) as IapRevenue, 9 | sum(case when eventname = 'adClosed' and adstatus = 'Success' then adEcpm/100000 end) as PredictedAdRevenue 10 | from events 11 | where gaUserStartDate>= (select min(eventDate) from events) 12 | group by gaUserStartdate) 13 | select 14 | gaUserStartDate as 'install date', 15 | round(IapRevenue/userCount,4)::float as 'IAP LTV', 16 | round(PredictedAdRevenue/userCount,4)::float as 'Predicted Ad LTV', 17 | round((PredictedAdRevenue+IapRevenue)/userCount,4)::float as 'total LTV', 18 | userCount as 'Number of players' 19 | from data 20 | order by gaUserStartDate 21 | -------------------------------------------------------------------------------- /KPIs/spender_percentage.sql: -------------------------------------------------------------------------------- 1 | /*Gets the number of players who have ever spent and the number of players active. 2 | */ 3 | with dailyAggregates as ( 4 | select event_date, 5 | user_id, 6 | sum(revenue) as revenue, 7 | sum(sum(revenue)) over (partition by user_id order by event_date) as cumulative_revenue 8 | from fact_user_sessions_day 9 | group by event_date, user_id 10 | ) 11 | select event_date, 12 | count(distinct user_id) as DAU, 13 | count(distinct case when revenue>0 then user_id end) as spenders_today, 14 | count(distinct case when revenue>0 then user_id end)/count(distinct user_id) as percentage_spending_today, 15 | count(distinct case when cumulative_revenue>0 then user_id end) as active_spenders, 16 | count(distinct case when cumulative_revenue>0 then user_id end)/count(distinct user_id) as percentage_spenders_active 17 | from dailyAggregates 18 | group by 1 19 | order by 1 desc 20 | -------------------------------------------------------------------------------- /KPIs/virtual_currency_spent_on_item.sql: -------------------------------------------------------------------------------- 1 | --Virtual currency spent, grouped by item. 2 | --combine different child events from transaction event 3 | --this will have to be grouped as a pivot table to be used in a data mining query 4 | WITH items as 5 | (SELECT productAmount, ItemName, mainEventId 6 | FROM events 7 | WHERE eventName = 'transaction' 8 | AND productCategory = 'ITEM'), 9 | spendings AS 10 | (SELECT i.ItemName, 11 | i.productAmount, 12 | i.mainEventId, 13 | e.virtualCurrencyName, 14 | e.virtualCurrencyAmount 15 | FROM events AS e 16 | INNER JOIN items AS i ON e.mainEventId = i.mainEventId 17 | WHERE e.productCategory = 'VIRTUAL_CURRENCY' 18 | ORDER BY itemName) 19 | SELECT itemName, 20 | virtualCurrencyName, 21 | sum(virtualCurrencyAmount) 22 | FROM spendings 23 | GROUP BY itemName, 24 | virtualCurrencyName 25 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 
14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 
134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
-------------------------------------------------------------------------------- /Paradigms/New Players by Time of day in local time.sql: --------------------------------------------------------------------------------
-- In the firstValues CTE we get the first timestamp we have seen for a user and the timezoneOffset on the day they were first seen.
-- Next we add a number of minutes extracted from the timezone definition, so we are adding the value -90 when converting '-0130'.
-- We end up with a row per install containing the timestamp the user was first seen (localEventTimestamp) and the timezoneOffset in minutes.
WITH firstValues AS
  (SELECT userId,
          min(eventTimestamp) AS startTs,
          min(timezoneOffset) AS tzo
   FROM events
   WHERE eventDate = gaUserStartDate
   GROUP BY userId),
results AS
  (SELECT tzo,
          startTs,
          TIMESTAMPADD(MINUTE, CASE
              WHEN CHAR_LENGTH(tzo)=5 THEN CAST(substring(tzo, 1,1)||'1' AS INTEGER) * -- get positive vs negative tz offset
                   (CAST(substring(tzo, 2,2) AS INTEGER)*60 + CAST(substring(tzo, 4,2) AS INTEGER))
              ELSE NULL
          END, startTs) AS localEventTimestamp, -- tz offset in minutes applied to the first timestamp
          CASE
              WHEN CHAR_LENGTH(tzo)=5 THEN CAST(substring(tzo, 1,1)||'1' AS INTEGER) * -- get positive vs negative tz offset
                   (CAST(substring(tzo, 2,2) AS INTEGER)*60 + CAST(substring(tzo, 4,2) AS INTEGER))
              ELSE NULL
          END AS minutes -- tz offset in minutes
   FROM firstValues)
SELECT *
FROM results
-------------------------------------------------------------------------------- /Paradigms/Percentage spenders over time.sql: --------------------------------------------------------------------------------
-- Calculates the % of all players that are spenders.
-- The first CTE pulls the date range.
-- The second CTE pulls player start dates and first payment dates.
-- The final part of the query pulls it all together.
WITH dates AS
  (SELECT DAY::date
   FROM
     (SELECT (CURRENT_DATE-30)::TIMESTAMP AS date_range
      UNION SELECT CURRENT_DATE::TIMESTAMP) AS ts timeseries DAY AS '1 days' over (ORDER BY date_range))
, players AS
  (SELECT user_id
        , MIN(event_date) as startDate
        , MIN(case when revenue > 0 then event_date end) as firstPaymentDate
   FROM fact_user_sessions_day
   GROUP BY user_id)
SELECT DAY
     , count(DISTINCT p.user_id) AS Players
     , count(DISTINCT case when p.firstPaymentDate <= DAY then p.user_id end) as Payers
     , ROUND((count(DISTINCT case when p.firstPaymentDate <= DAY then p.user_id end) / count(DISTINCT p.user_id))*100,2) AS "% Spenders"
FROM dates d
LEFT JOIN players p on p.startDate <= d.day
WHERE DAY IS NOT NULL
GROUP BY DAY
ORDER BY DAY;
-------------------------------------------------------------------------------- /Paradigms/SessionLength differences.sql: --------------------------------------------------------------------------------
--In older SDKs the session lengths can be very long if an app has been backgrounded.
--As of SDK 4.1 background time is limited to a maximum of 5 minutes; backgrounding an app for longer than that starts a new session.
--This query applies to older SDKs: it compares the raw session length with a compensated length that excludes any gaps longer than 5 minutes.
WITH data AS
  (SELECT sessionId,
          sum(msSinceLastEvent) AS sessionLength,
          sum(CASE
                WHEN msSinceLastEvent < 300000 THEN msSinceLastEvent
                ELSE NULL
              END) AS compensatedSessionLength,
          count(*) AS eventCount
   FROM events
   GROUP BY sessionId)
SELECT sessionId,
       (sessionLength || ' ms')::interval AS sessionLength,
       (compensatedSessionLength || ' ms')::interval AS compensatedSessionLength,
       eventCount
FROM data
-------------------------------------------------------------------------------- /Paradigms/ab_testing_variants.sql: --------------------------------------------------------------------------------
--Measure the effect of an engagement campaign on revenue.
--In the data CTE we use the last_value function to populate lastEngagementName and lastVariantName with the last known value for that user; the partitioning and ordering are defined in the WINDOW clause.
--Then we take the transaction events that occurred after the engagementResponse event that populated responseEngagementName (and the value of lastVariantName) in the first place.
--Now we have all transaction events and all revenue generated by each variant; using the pivot feature you can create a new column for each unique value of lastVariantName and plot the revenue by eventDate.
with data as (
    select eventName, userId, eventDate, eventTimestamp, convertedProductAmount,
           last_value(responseEngagementName ignore nulls) over user_window_chronological as lastEngagementName,
           last_value(responseVariantName ignore nulls) over user_window_chronological as lastVariantName
    from events
    where (eventName = 'transaction' and revenueValidated in (0,1)) or responseEngagementName = 'X' -- replace X with the name of the engagement
    WINDOW user_window_chronological as (partition by userId order by eventTimestamp)
)
select
    eventDate,
    lastVariantName,
    (sum(convertedProductAmount)/100)::float as revenueInUSD,
    sum(sum(convertedProductAmount)/100) over (partition by lastVariantName order by eventDate)::float as cumulativeRevenueInUSD,
    count(*) as transactions,
    count(distinct userId) as convertedUserCount
from data
where lastEngagementName is not null
and eventDate <= current_date
and eventName = 'transaction'
group by eventDate, lastVariantName, lastEngagementName
order by eventDate, lastEngagementName
-------------------------------------------------------------------------------- /Paradigms/concurrency.sql: --------------------------------------------------------------------------------
--Get the number of users who were in a session in each one-minute timeslot:
with timeslots as (
    select slice_time
    from (
        select now() - interval '6 hours' as date_range
        union
        select now()
    ) as ts timeseries slice_time as '1 minutes' over (order by date_range)
), data as (
    select userId, sessionId, min(eventTimestamp) as startTime, max(eventTimestamp) as endTime
    from events
    where eventTimestamp between now() - interval '6 hours' and now()
    group by 1, 2
)
select slice_time, count(distinct userId)
from timeslots ts
left join data d
on ts.slice_time between d.startTime
and d.endTime
group by slice_time
order by slice_time desc
-------------------------------------------------------------------------------- /Paradigms/inventory_one_user.sql: --------------------------------------------------------------------------------
--This query looks at the transactions a single user made over time; the currentlyInInventory column shows the amount of each item the user holds at that point in time.
SELECT CASE
           WHEN sessionId IS NOT NULL THEN 'client'
           ELSE 'server'
       END AS eventSource, -- check where events come from; sessionless events are usually ghost events coming from a server
       eventTimestamp,
       conditional_change_event(mainEventId) over (partition BY userId
                                                   ORDER BY eventId)+1 AS transactionNumber, -- numbers the transactions
       productName,
       sum(CASE
             WHEN transactionVector = 'RECEIVED' THEN productAmount
             ELSE -productAmount
           END) over (partition BY userId, productName
                      ORDER BY eventId) AS currentlyInInventory, -- the amount available to the user at this point in time for this productName
       transactionName, -- the name of the transaction, telling you something about why the change happened
       productAmount,
       transactionVector
FROM events
WHERE userId = 'userId goes here'
  AND productAmount IS NOT NULL -- include all events that report something about the productAmount; these are reported only on event level 1
  AND eventLevel = 1 -- only select events at event level 1
ORDER BY eventId
-------------------------------------------------------------------------------- /Paradigms/weekly_retention.sql: --------------------------------------------------------------------------------
with user_windows as (
    select user_id
         , trunc(event_date, 'D')::date as week_commencing
         , datediff(day, trunc(event_date, 'D'), lead(trunc(event_date, 'D'), 1) over (partition by user_id order by trunc(event_date, 'D'))) as next_week
         , datediff(day, trunc(event_date, 'D'), lag(trunc(event_date, 'D'), 1) over (partition by user_id order by trunc(event_date, 'D'))) as last_week
         , min(trunc(player_start_date, 'D')) as first_week
    from fact_user_sessions_day
    where event_date > current_date - interval '90 days'
    group by user_id, trunc(event_date, 'D')
    order by week_commencing
)
select (case when first_week=week_commencing or last_week=-14 then week_commencing else week_commencing+interval '7 days' end)::date as w
     , sum(case when next_week=7 and first_week!=week_commencing then 1 else 0 end) current
     , sum(case when coalesce(next_week,0)!=7 and first_week!=week_commencing then -1 else 0 end) churned
     , sum(case when first_week=week_commencing then 1 else 0 end) new_players
     , sum(case when last_week = -14 then 1 else 0 end) returning
from user_windows
where week_commencing=0 THEN 1
                ELSE NULL
            END) AS 'rightOrderEvents',
       count(DISTINCT CASE
                          WHEN secondsdiff<0 THEN userId
                          ELSE NULL
                      END) AS 'wrongOrderUsers',
       count(DISTINCT userId) AS userCount,
       count(*) AS eventCount
FROM eventData
GROUP BY eventDate)
SELECT eventDate,
       wrongOrderEvents,
       eventCount,
       wrongOrderUsers,
       userCount,
       round(wrongOrderEvents/eventCount*100,2.0) AS 'percentage of events in wrong order',
       round(wrongOrderUsers/userCount*100,2.0) AS 'percentage of users with events in wrong order'
FROM aggregates
ORDER BY eventDate DESC
-------------------------------------------------------------------------------- /QA/events_per_dau.sql: --------------------------------------------------------------------------------
--Get the number of events per daily active user and the sample size. Should be below 100 on average.

SELECT eventDate,
       count(*)/count(DISTINCT userId) AS 'Events per DAU',
       count(DISTINCT userId) AS 'Sample Size'
FROM events
GROUP BY eventDate
ORDER BY eventDate DESC
-------------------------------------------------------------------------------- /QA/seconds_between_events.sql: --------------------------------------------------------------------------------
--Get the average number of seconds between events in the first 5 minutes of gameplay.
WITH first_day_events AS
  (SELECT userId,
          gaUserStartDate,
          eventName,
          msSinceLastEvent,
          min(eventTimestamp) over (partition BY userId
                                    ORDER BY eventId) AS startTimestamp,
          eventTimestamp
   FROM events
   WHERE eventDate = gaUserStartDate
     AND sessionId IS NOT NULL),
firstMinutes AS
  (SELECT userId,
          gaUserStartDate,
          msSinceLastEvent
   FROM first_day_events
   WHERE eventTimestamp - startTimestamp <= interval '5 minute' -- check the first 5 minutes
  )
SELECT gaUserStartDate AS 'Date',
       avg(msSinceLastEvent/1000) AS 'Average number of seconds between events',
       count(*) as 'Number of events',
       count(distinct userId) as 'Number of users'
FROM firstMinutes
GROUP BY 1
ORDER BY 1 DESC
-------------------------------------------------------------------------------- /QA/sessionlength_distribution.sql: --------------------------------------------------------------------------------
--Count session lengths in the last 2 weeks and band the results into 5-minute buckets.
WITH sessionData as
  (SELECT userId, sessionId, sum(msSinceLastEvent) AS duration, count(*) AS eventCount
   FROM events
   WHERE eventDate > CURRENT_DATE-14 -- only the last 14 days
   GROUP BY userId, sessionId)
SELECT round(duration/60000/5,0.0)*5 AS durationBandMinutes, -- convert to minutes, divide by 5, round, then multiply by 5 again
       count(*)
FROM sessionData
WHERE duration IS NOT NULL
GROUP BY 1
ORDER BY 1
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
# sql-cookbook
Common SQL recipes and best practices

## Purpose

This repository aims to provide a number of best practices and ideas on how to construct queries that provide interesting data in a way that works well with our platform.

Use these queries as you please for educational purposes and adjust them to suit your specific needs.

Everyone is welcome to contribute: just fork the project and file a pull request.

## Reference

These queries can be run within the deltaDNA platform in the data mining section. Alternatively, by appending either _live or _dev to the table names, you can use these queries in your favourite DB tool via direct access as well (see the short example below).

[Data mining reference](http://docs.deltadna.com/reference/analyze/data-mining/ "data mining").

[Direct SQL Access reference](http://docs.deltadna.com/reference/analyze/direct-sql-access/ "direct access").
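For example, here is a minimal sketch of the same DAU count in both contexts (the query itself is only illustrative):

```sql
-- In data mining, tables are scoped to the current environment:
select eventDate, count(distinct userId) as dau
from events
group by eventDate;

-- Over direct access the environment is part of the table name:
select eventDate, count(distinct userId) as dau
from events_live -- or events_dev for the dev environment
group by eventDate;
```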
The queries are executed by an HP Vertica DBMS; the SQL dialect is therefore defined by Vertica and described in the following documentation:

[HP Vertica documentation](https://my.vertica.com/docs/8.0.x/HTML/#Authoring/SQLReferenceManual/SQLReferenceManual.htm "Vertica Docs").


## Prerequisites

We assume some basic knowledge of SQL select statements and a basic knowledge of the data structure within the DeltaDNA platform: what events you are sending in and what they denote.

In this cookbook we aim to show how SQL queries can best be used to query the data within the DeltaDNA platform. The theory applies both to the data mining feature and to direct access.

### Data Mining
Our data mining feature enables you to write an SQL query that retrieves data from the exposed tables and shows you the result on screen. These results can then be shown as a graph, a pie chart or as plain data. The queries can be stored and used in custom dashboards and email reports. You can even use data mining queries to create a list of userIDs that you can then use within the platform as a target list.

### Direct Access
Direct Access is the name of our direct database connection. We provide an endpoint where you can connect directly to the data warehouse as if it were a simple PostgreSQL (Postgres) database. This way you can connect to the data warehouse via tools like Tableau or dBeaver, or directly from Python or R. This allows very flexible use of the data, but it does require some more work on your part. Direct access is not intended for mirroring the data somewhere else; if you'd like to do that, please download the raw data exports from our S3 archive and use those instead.

### Data mining and Direct Access, the differences
Data mining is part of the platform and is linked to the game and the environment. Within data mining you'll have the events table, which is the table for the environment you're in (dev or live) for your game. Direct access connects to the game; this means you don't have a single events table but both the events_dev and the events_live tables.

### The data warehouse
The data warehouse consists of a Vertica database. We currently expose this via direct access as a Postgres database, since the SQL dialect is similar but Postgres is supported by more clients than Vertica.

Vertica is a column store database that is especially good at analytics. This is a good thing, but there are some properties to keep in mind.

There are a lot of columns, which is why the events table is also called a wide table. For the way the data is stored this is not an issue, since only the rows that have a value in a column are stored for that column; naturally there will be quite a few empty values when the data is represented as a table. And because all events are stored in this one table, there will be a lot of rows.

When querying the data it makes a big difference how many columns are within scope. The columns you are not interested in should not be queried. This makes `select count(*) from events` usually faster than `select * from events limit 1`. When you select a row with all columns anyway, you'll quickly find that it is hard to spot the values you are actually interested in, since you will keep having to scroll horizontally.

## Tips
Use analytic functions instead of joins.
- Example use case: report each user's last known clientVersion (clientVersionLast).

Get the last clientVersion known for a user from the user_metrics table and join it back onto the events table on the user id:

```sql
select events.EventName, user_metrics.fieldClientVersionLast, count(distinct userId) as users
from events, user_metrics
where user_metrics.user_id = events.userId
and eventDate > current_date-14
group by events.EventName, user_metrics.fieldClientVersionLast
```

Instead you can use an analytic function to find the last value of the field clientVersion for that user.
```sql
with dataCTE as (select eventName,
    last_value(ClientVersion ignore nulls) over (partition by userId order by eventId) as lastClientVersion, userId
    from events
    where eventDate > current_date-14
)
select eventName, lastClientVersion, count(distinct userId) as users
from dataCTE
group by eventName, lastClientVersion
```

## Query Performance

A column store database like Vertica stores a table by splitting the data into multiple indexed data stores, where a data store is a chunk split up by column name and, within that, by the partitions the table is defined to have. In addition, a data store can be ordered to improve performance for certain types of queries.

If you are able to define a subset of items in the column or partition definition you can massively increase query execution performance.

Selecting only the columns you are interested in means that instead of retrieving the data for all columns, only the relevant columns are touched. Try running:
```
select eventTimestamp, userid, eventName from events order by 1
```
Instead of:
```
select * from events order by eventTimestamp
```
In the case of the events table it helps to limit the number of dates your query is looking at:
```
where eventDate = current_date - 10
```
Or limit the number of events to look at:
```
where eventName = 'gameStarted'
```
Or define the users you are interested in:
```
where userId in ('ThisUser', 'AnotherUser')
```
Alternatively you can sample the data based on the userId, using a simple operation that tells you whether a user should be included in your search, like only looking at users starting with an 'a':
```
where left(userID, 1) = 'a'
```

Another, more random way to query a subset of users is to hash the userId (which returns a number) and take a modulo of the outcome, giving you a pseudo-random number per user. Keeping only the rows where this returns 0 gives you 1/100th of all users and makes the query nearly 100x as fast. Using this when debugging a complicated query is recommended. (Protip: try changing the 100 to 10000 for a smaller subset, or changing the 0 to a different number to test with a different sample!)
```
where mod(hash(userId), 100)=0
```

**Spooling**

The next step in running any query is returning the results. This depends on bandwidth and might cause performance issues on a slow internet connection. Returning 100 rows will not be a problem, and most DB tools like dBeaver, as well as data mining, will limit the number of returned rows to something below 1000. When running queries from R and Python this is an easy pitfall, especially when running on your laptop via a mobile data connection in the park.
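If you do pull data into a script, it can help to cap the row count explicitly while you develop the query. A minimal sketch (the 7-day window and 1000-row cap are arbitrary choices for illustration, not platform defaults):

```sql
-- Cap the result set so a mistake doesn't spool millions of rows
-- over a slow connection:
select eventDate, eventName, userId
from events
where eventDate > current_date - 7
limit 1000
```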

**Filtering in a query**

Filter as early as possible in a query to make the next step quicker.
So instead of:
```
with data as (select eventName, userid, platform from events)
select * from data
where eventName = 'gameStarted'
```
Do:

```
with data as (select eventName, userid, platform from events
              where eventName = 'gameStarted')
select * from data
```
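Putting the performance tips together, here is a sketch that pushes an event filter, a date window and the hash-based sample into the CTE, so the outer query only touches a small fraction of the data (the event name and the 14-day window are illustrative):

```sql
with data as (
    select eventDate, userId
    from events
    where eventName = 'gameStarted'      -- filter events early
      and eventDate > current_date - 14  -- limit the date range
      and mod(hash(userId), 100) = 0     -- ~1% pseudo-random sample of users
)
select eventDate, count(distinct userId) as sampled_dau
from data
group by eventDate
order by eventDate desc
```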