├── ga3toga4-2.png
├── LICENSE
├── src
│   ├── ga4-to-ga3-query-source-medium-agg.sql
│   ├── ga4-to-ga3-query.sql
│   ├── process_ga3.py
│   └── ua.py
└── README.md

/ga3toga4-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/locomotive-agency/GA3toGA4/HEAD/ga3toga4-2.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 LOCOMOTIVE®

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/src/ga4-to-ga3-query-source-medium-agg.sql:
--------------------------------------------------------------------------------
-- GA4 to GA3 Format query.
-- This query is developed for readability and not necessarily efficiency.
-- {prior_days} sets how many days back to pull.
-- The ga4Raw FROM clause reads `{project_id}.{dataset_id}.{table_prefix}*`; fill in these placeholders to match your GA4 export dataset.
-- {conversion_event} sets the GA4 event counted as a conversion.


-- Pulls raw data from GA4 table
WITH

ga4Raw AS (

  SELECT
    *
  FROM `{project_id}.{dataset_id}.{table_prefix}*`
  WHERE PARSE_DATE('%Y%m%d', _table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {prior_days} DAY) AND CURRENT_DATE()

),


-- Formats GA4 data into a flat table by user, date, and event name.
ga4Flat AS (
  SELECT
    PARSE_DATE("%Y%m%d", event_date) event_date,
    FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(event_timestamp)) AS event_time,
    FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_time,

    -- Type of Event
    event_name,

    -- User ID
    user_pseudo_id,

    -- User Data
    MAX(device.category) device_category,
    MAX(device.language) device_language,
    MAX(device.web_info.browser) device_browser,
    MAX(geo.country) geo_country,
    MAX(geo.region) geo_region,
    MAX(geo.city) geo_city,

    -- Session Data
    IF(event_name = "session_start", 1, 0) entrance,
    IF(event_name = "first_visit", 1, 0) new_user,
    MAX(IF(params.key = 'ga_session_id', params.value.int_value, null)) ga_session_id,
    MAX(IF(params.key = 'ga_session_number', params.value.int_value, null)) ga_session_number,
    CAST(MAX(IF(params.key = 'session_engaged', params.value.string_value, null)) as int64) session_engaged,
    MAX(IF(params.key = 'page_title', params.value.string_value, null)) page_title,
    MAX(IF(params.key = 'page_location' AND event_name = 'page_view', params.value.string_value, null)) pageview_location,
    MAX(IF(params.key = 'page_location' AND event_name = 'session_start', params.value.string_value, null)) landing_page,
    MAX(IF(params.key = "engagement_time_msec", params.value.int_value/1000, 0)) AS engagment_time_sec,

    -- Referral Data
    MAX(traffic_source.source) utm_source,
    MAX(traffic_source.medium) utm_medium,

    -- Ecommerce Data
    MAX(IF(event_name IN ('purchase', 'ecommerce_purchase', 'in_app_purchase', 'app_store_subscription_convert','app_store_subscription_renew','refund'), ecommerce.transaction_id, null)) ecommerce_transaction_id,
    MAX(IF(ecommerce.purchase_revenue IS NOT NULL, ecommerce.purchase_revenue, 0)) ecommerce_purchase_revenue,

    -- Conversions
    -- Matching on the page_location parameter counts each {conversion_event} event once.
    COUNTIF(event_name = '{conversion_event}' AND params.key = "page_location") AS conversions

  FROM ga4Raw, UNNEST(event_params) AS params

  GROUP BY event_date, event_name, event_timestamp, user_pseudo_id, user_first_touch_timestamp
),


-- Aggregates flat data into session-focused data.
ga4Sessions AS (
  SELECT

    -- Dimensions
    event_date,
    MAX(utm_source) utm_source,
    MAX(utm_medium) utm_medium,

    -- Usage Metrics
    COUNT(DISTINCT user_pseudo_id) users,
    COUNT(pageview_location) page_views,
    COUNT(DISTINCT pageview_location) unique_page_views,
    SUM(entrance) entrance,
    SUM(new_user) new_user,
    COUNT(DISTINCT ga_session_id) sessions,
    SUM(engagment_time_sec) engagment_time_sec,

    -- Goal Metrics
    SUM(conversions) conversions,
    SUM(ecommerce_purchase_revenue) ecommerce_revenue,
    COUNT(ecommerce_transaction_id) ecommerce_transactions

  FROM ga4Flat
  GROUP BY event_date, ga_session_id, user_pseudo_id

)


-- Main Query: Formats into final GA3-ish report

SELECT

  event_date date,

  -- Channel Info
  IF(utm_source IS NOT NULL, utm_source, "(direct)") utm_source,
  IF(utm_medium IS NOT NULL, utm_medium, "(none)") utm_medium,

  -- Aggregated Metrics
  SUM(users) users,
  SUM(new_user) new_users,
  SUM(entrance) entrances,
  SUM(sessions) sessions,
  SUM(page_views) page_views,
  SUM(unique_page_views) unique_page_views,
  ROUND(SAFE_DIVIDE(SUM(engagment_time_sec), SUM(sessions)), 2) engagment_time_sec_per_session,
  SUM(conversions) conversions,
  ROUND(SUM(ecommerce_revenue), 2) ecommerce_revenue,
  SUM(ecommerce_transactions) ecommerce_transactions

FROM ga4Sessions
GROUP BY event_date, utm_source, utm_medium
ORDER BY sessions DESC

--------------------------------------------------------------------------------
/src/ga4-to-ga3-query.sql:
--------------------------------------------------------------------------------
-- GA4 to GA3 Format query.
-- This query is developed for readability and not necessarily efficiency.
-- {prior_days} sets how many days back to pull.
-- The ga4Raw FROM clause reads `{project_id}.{dataset_id}.{table_prefix}*`; fill in these placeholders to match your GA4 export dataset.
-- {conversion_event} sets the GA4 event counted as a conversion.


-- Pulls raw data from GA4 table
WITH

ga4Raw AS (

  SELECT
    *
  FROM `{project_id}.{dataset_id}.{table_prefix}*`
  WHERE PARSE_DATE('%Y%m%d', _table_suffix) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {prior_days} DAY) AND CURRENT_DATE()

),


-- Formats GA4 data into a flat table by user, date, and event name.
ga4Flat AS (
  SELECT
    PARSE_DATE("%Y%m%d", event_date) event_date,
    FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(event_timestamp)) AS event_time,
    FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MICROS(user_first_touch_timestamp)) AS user_first_touch_time,

    -- Type of Event
    event_name,

    -- User ID
    user_pseudo_id,

    -- User Data
    MAX(device.category) device_category,
    MAX(device.language) device_language,
    MAX(device.web_info.browser) device_browser,
    MAX(geo.country) geo_country,
    MAX(geo.region) geo_region,
    MAX(geo.city) geo_city,

    -- Session Data
    IF(event_name = "session_start", 1, 0) entrance,
    IF(event_name = "first_visit", 1, 0) new_user,
    MAX(IF(params.key = 'ga_session_id', params.value.int_value, null)) ga_session_id,
    MAX(IF(params.key = 'ga_session_number', params.value.int_value, null)) ga_session_number,
    CAST(MAX(IF(params.key = 'session_engaged', params.value.string_value, null)) as int64) session_engaged,
    MAX(IF(params.key = 'page_title', params.value.string_value, null)) page_title,
    MAX(IF(params.key = 'page_location' AND event_name = 'page_view', params.value.string_value, null)) pageview_location,
    MAX(IF(params.key = 'page_location' AND event_name = 'session_start', params.value.string_value, null)) landing_page,
    MAX(IF(params.key = "engagement_time_msec", params.value.int_value/1000, 0)) AS engagment_time_sec,

    -- Referral Data
    MAX(IF(params.key = 'source', params.value.string_value, null)) utm_source,
    MAX(IF(params.key = 'medium', params.value.string_value, null)) utm_medium,
    MAX(IF(params.key = 'campaign', params.value.string_value, null)) utm_campaign,

    -- Ecommerce Data
    MAX(ecommerce.transaction_id) ecommerce_transaction_id,
    MAX(IF(ecommerce.purchase_revenue IS NOT NULL, ecommerce.purchase_revenue, 0)) ecommerce_purchase_revenue,

    -- Conversions
    -- Matching on the page_location parameter counts each {conversion_event} event once.
    COUNTIF(event_name = '{conversion_event}' AND params.key = "page_location") AS conversions

  FROM ga4Raw, UNNEST(event_params) AS params

  GROUP BY event_date, event_name, event_timestamp, user_pseudo_id, user_first_touch_timestamp
),


-- Aggregates flat data into session-focused data.
ga4Sessions AS (
  SELECT

    -- Dimensions
    event_date,
    MAX(IF(landing_page IS NOT NULL, landing_page, "(not set)")) landing_page,
    MAX(geo_country) country,
    MAX(geo_region) region,
    MAX(geo_city) city,
    MAX(utm_source) utm_source,
    MAX(utm_medium) utm_medium,
    MAX(utm_campaign) utm_campaign,

    -- Usage Metrics
    COUNT(DISTINCT user_pseudo_id) users,
    COUNT(pageview_location) page_views,
    COUNT(DISTINCT pageview_location) unique_page_views,
    SUM(entrance) entrance,
    SUM(new_user) new_user,
    COUNT(DISTINCT ga_session_id) sessions,
    SUM(engagment_time_sec) engagment_time_sec,

    -- Goal Metrics
    SUM(conversions) conversions,
    SUM(ecommerce_purchase_revenue) ecommerce_revenue,
    COUNT(DISTINCT ecommerce_transaction_id) ecommerce_transactions

  FROM ga4Flat
  GROUP BY event_date, ga_session_id, user_pseudo_id

)


-- Main Query: Formats into final GA3-ish report

SELECT

  event_date date,
  landing_page,

  -- Geography
  IF(country<>'', country, "(not set)") country,
  IF(region<>'', region, "(not set)") region,
  IF(city<>'', city, "(not set)") city,

  -- Channel Info
  IF(utm_source IS NOT NULL, utm_source, "(direct)") utm_source,
  IF(utm_medium IS NOT NULL, utm_medium, "(none)") utm_medium,
  IF(utm_campaign IS NOT NULL, utm_campaign, "(not set)") utm_campaign,

  -- Aggregated Metrics
  SUM(users) users,
  SUM(new_user) new_users,
  SUM(entrance) entrances,
  SUM(sessions) sessions,
  SUM(page_views) page_views,
  SUM(unique_page_views) unique_page_views,
  ROUND(SAFE_DIVIDE(SUM(engagment_time_sec), SUM(sessions)), 2) engagment_time_sec_per_session,
  SUM(conversions) conversions,
  ROUND(SUM(ecommerce_revenue), 2) ecommerce_revenue,
  SUM(ecommerce_transactions) ecommerce_transactions

FROM ga4Sessions
GROUP BY event_date, landing_page, country, region, city, utm_source, utm_medium, utm_campaign
ORDER BY sessions DESC

--------------------------------------------------------------------------------
/src/process_ga3.py:
--------------------------------------------------------------------------------
from lib.ua import UniversalAnalytics, AnalyticsQuery, AnalyticsReport
from google.cloud import bigquery
from typing import Any, Dict, List, Optional, Union
from tqdm import tqdm
import datetime
import pandas as pd


def get_ga3(
    to_table_id: str,
    client: bigquery.client.Client,
    ga3_view_id: str,
    pull_start_date: str,
    goal_metric: str = "ga:goalCompletionsAll",
) -> None:

    def get_report(
        ua: UniversalAnalytics,
        ga3_view_id: str,
        pull_start_date: Union[str, datetime.date],
        pull_end_date: Union[str, datetime.date],
        pageToken: str,
    ) -> Dict[str, Any]:
        """Fetches one page of the GA3 report for the given date range."""

        dimensions = [
            'ga:date',
            'ga:hostname',
            'ga:landingPagePath',
            'ga:country',
            'ga:region',
            'ga:city',
            'ga:source',
            'ga:medium',
            'ga:campaign',
        ]

        metrics = [
            'ga:users',
            'ga:newUsers',
            'ga:entrances',
            'ga:sessions',
            'ga:pageviews',
            'ga:uniquePageviews',
            'ga:timeOnPage',
            goal_metric,
            'ga:transactionRevenue',
            'ga:transactions',
        ]

        query = (
            AnalyticsQuery(ua, ga3_view_id)
            .date_range([(pull_start_date, pull_end_date)])
            .dimensions(dimensions)
            .metrics(metrics)
            .page_size(10000)
            .page_token(pageToken)
            .sampling_level("LARGE")
        )
        response = query.get().raw
        return response

    def get_token(
        response: Dict[str, Any],
    ) -> Optional[str]:
        """Returns the first report's nextPageToken, or None on the last page."""
        for report in response.get('reports', []):
            return report.get('nextPageToken', None)
        return None

    def dict_transfer(
        response: Dict[str, Any],
        mylist: List[Dict[str, Union[str, float, int]]],
    ) -> None:
        """Flattens report rows into dicts keyed by GA3 name and appends them to mylist."""
        for report in response.get('reports', []):
            columnHeader = report.get('columnHeader', {})
            dimensionHeaders = columnHeader.get('dimensions', [])
            metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
            rows = report.get('data', {}).get('rows', [])
            for row in rows:
                row_dict = {}
                dimensions = row.get('dimensions', [])
                dateRangeValues = row.get('metrics', [])

                for header, dimension in zip(dimensionHeaders, dimensions):
                    row_dict[header] = dimension

                for values in dateRangeValues:
                    for metric, value in zip(metricHeaders, values.get('values')):
                        # Metric values arrive as strings: parse whole numbers as int, the rest as float.
                        try:
                            row_dict[metric.get('name')] = int(value)
                        except ValueError:
                            row_dict[metric.get('name')] = float(value)
                mylist.append(row_dict)

    # Service account key file used by the Colab notebook.
    ua = UniversalAnalytics('/content/service.json')

    # GA3 data is pulled up to the day before the earliest date already in the target table.
    sql = """SELECT MIN(date) AS first_date FROM `{to_table_id}`""".format(to_table_id=to_table_id)

    result = client.query(sql)
    df2 = result.to_dataframe()
    first_date = df2.iloc[0]['first_date']
    pull_end_date = first_date - datetime.timedelta(days=1)

    def last_day_of_month(any_day):
        next_month = any_day.replace(day=28) + datetime.timedelta(days=4)
        return next_month - datetime.timedelta(days=next_month.day)

    def monthlist(begin, end):
        """Splits [begin, end] into (first_day, last_day) 'YYYY-MM-DD' pairs, one per calendar month."""
        begin = datetime.datetime.strptime(begin, "%Y-%m-%d")
        end = datetime.datetime.combine(end, datetime.time.min)

        result = []
        while True:
            if begin.month == 12:
                next_month = begin.replace(year=begin.year + 1, month=1, day=1)
            else:
                next_month = begin.replace(month=begin.month + 1, day=1)
            if next_month > end:
                break
            result.append((begin.strftime("%Y-%m-%d"), last_day_of_month(begin).strftime("%Y-%m-%d")))
            begin = next_month
        result.append((begin.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")))
        return result

    date_list = monthlist(pull_start_date, pull_end_date)

    # Pull the most recent month first.
    date_list.reverse()

    print(date_list)

    for date_range in tqdm(date_list, desc="Loading GA3 data"):
        mylist = []
        pageToken = "0"

        # Page through the report until the API stops returning a nextPageToken.
        while pageToken is not None:
            response = get_report(ua, ga3_view_id, date_range[0], date_range[1], pageToken)
            pageToken = get_token(response)
            dict_transfer(response, mylist)

        # A month without GA3 rows would yield a DataFrame with none of the expected columns.
        if not mylist:
            continue

        df = pd.DataFrame(mylist)

        # Build a full landing page URL (https:// + hostname + path), leaving "(not set)" as-is.
        has_path = df['ga:landingPagePath'] != "(not set)"
        df.loc[has_path, 'ga:landingPagePath'] = (
            'https://'
            + df.loc[has_path, 'ga:hostname']
            + df.loc[has_path, 'ga:landingPagePath'].astype(str)
        )
        df['ga:date'] = pd.to_datetime(df['ga:date'], format='%Y%m%d').dt.date

        # GA3 name -> target table column, in target column order.
        order = {
            'ga:date': 'date',
            'ga:landingPagePath': 'landing_page',
            'ga:country': 'country',
            'ga:region': 'region',
            'ga:city': 'city',
            'ga:source': 'utm_source',
            'ga:medium': 'utm_medium',
            'ga:campaign': 'utm_campaign',
            'ga:users': 'users',
            'ga:newUsers': 'new_users',
            'ga:entrances': 'entrances',
            'ga:sessions': 'sessions',
            'ga:pageviews': 'page_views',
            'ga:uniquePageviews': 'unique_page_views',
            'ga:timeOnPage': 'engagment_time_sec_per_session',
            goal_metric: 'conversions',
            'ga:transactionRevenue': 'ecommerce_revenue',
            'ga:transactions': 'ecommerce_transactions',
        }

        df = df[list(order)].rename(columns=order)

        print("Uploading to BigQuery")

        job_config = bigquery.LoadJobConfig(
            write_disposition="WRITE_APPEND",
            schema=[
                bigquery.SchemaField("date", "DATE"),
            ],
        )

        job = client.load_table_from_dataframe(
            df,
            destination=to_table_id,
            job_config=job_config,
        )

        # Wait for the load job to finish.
        job.result()

        print("Query results loaded to the table {}".format(to_table_id))

--------------------------------------------------------------------------------
/src/ua.py:
--------------------------------------------------------------------------------
import os
import datetime
import re
from typing import List, Optional, Tuple, Union

import googleapiclient.discovery
import google.auth


class UniversalAnalytics:
    def __init__(self, service_account_file: str,
                 project_name: Optional[str] = None):
        self.service_account_file = service_account_file
        self.project_name = project_name

    @property
    def service(self) -> googleapiclient.discovery.Resource:
        """Builds the discovery document for Universal Analytics.

        This is a simple facade for the Google API client discovery builder. For
        full details, refer to the following documentation:
        https://googleapis.github.io/google-api-python-client/docs/epy/googleapiclient.discovery-module.html#build
        """
        return googleapiclient.discovery.build(
            'analyticsreporting',
            'v4',
            credentials=self.get_service_account_credentials(),
            cache_discovery=False)

    def get_service_account_credentials(self):
        """Loads read-only Analytics credentials from the service account file.

        Falls back to the file named in GOOGLE_APPLICATION_CREDENTIALS when no
        service account file is given.
        """
        SCOPES = [
            "https://www.googleapis.com/auth/analytics.readonly",
        ]
        cred_kwargs = {"scopes": SCOPES}
        if self.project_name:
            cred_kwargs["quota_project_id"] = self.project_name

        credentials, _ = google.auth.load_credentials_from_file(
            self.service_account_file or os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'),
            **cred_kwargs
        )
        return credentials

    def query(self, view_id: str) -> 'AnalyticsQuery':
        return AnalyticsQuery(self, view_id)


class AnalyticsQuery:
    def __init__(self, ua: UniversalAnalytics, view_id: str):
        self.ua = ua
        # Default request: ga:users from 91 days ago through yesterday.
        self.raw = {
            'reportRequests': [{
                'viewId': view_id,
                'dateRanges': [{
                    'startDate': self.iso_date(self.days_ago(91)),
                    'endDate': self.iso_date(self.days_ago(1))
                }],
                'metrics': [{'expression': 'ga:users'}],
                'samplingLevel': 'LARGE'
            }]
        }

    @staticmethod
    def iso_date(date: Union[str, datetime.date]) -> str:
        """Returns the date as a 'YYYY-MM-DD' string, raising ValueError if it is not valid ISO format."""
        if isinstance(date, datetime.date):
            date = date.isoformat()
        iso_regex = r'^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[1-2][0-9]|3[0-1])$'
        if not re.fullmatch(iso_regex, date):
            raise ValueError(f'The specified date is not in valid ISO format '
                             f'"YYYY-MM-DD": "{date}".')
        return date

    @staticmethod
    def days_ago(days: int) -> datetime.date:
        today = datetime.date.today()
        return today - datetime.timedelta(days=abs(days))

    def date_range(
        self,
        date_ranges: List[Tuple[Union[str, datetime.date],
                                Union[str, datetime.date]]]
    ) -> 'AnalyticsQuery':
        """Sets the date ranges to report on.

        Args:
            date_ranges:
                A list of tuples of date ranges for which to query the report.
                Each tuple must contain two dates, given either as
                datetime.date objects or as ISO strings (e.g. '2022-04-22').

        Returns:
            An updated AnalyticsQuery object.
        """
        query_date_ranges = []
        for dr in date_ranges:
            start_date, end_date = dr
            start_date = self.iso_date(start_date)
            end_date = self.iso_date(end_date)
            query_date_ranges.append({
                'startDate': start_date,
                'endDate': end_date
            })
        self.raw['reportRequests'][0]['dateRanges'] = query_date_ranges

        return self

    def dimensions(self, dimensions: List[str]) -> 'AnalyticsQuery':
        """Sets the dimensions to fetch.

        Args:
            dimensions:
                Dimensions you would like to report on. The name of the
                dimension must start with 'ga:'. Refer to the following
                documentation for the full list of available dimensions:
                https://ga-dev-tools.web.app/dimensions-metrics-explorer/

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['dimensions'] = [{'name': dim} for dim in dimensions]
        return self

    def metrics(self, metrics: List[str]) -> 'AnalyticsQuery':
        """Sets the metrics to fetch.

        Args:
            metrics:
                Metrics you would like to report on. The name of the metric
                must start with 'ga:'. Refer to the following documentation
                for the full list of available metrics:
                https://ga-dev-tools.web.app/dimensions-metrics-explorer/

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['metrics'] = [{'expression': metric} for metric in metrics]
        return self

    def segment(self, segments: List[str]) -> 'AnalyticsQuery':
        """Sets the segments to apply to the query.

        Args:
            segments:
                Segments you would like to report with. The ID of the segment
                must start with 'gaid::'. You can use the following tool to
                identify segment IDs:
                https://ga-dev-tools.web.app/query-explorer/

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['segments'] = [{'segmentId': seg} for seg in segments]
        return self

    def page_size(self, page_size: int) -> 'AnalyticsQuery':
        """Sets the maximum number of rows returned per page.

        Args:
            page_size:
                Specifies the maximum number of returned rows for a query.
                The API returns a maximum of 100,000 rows per request, no
                matter how many you ask for.

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['pageSize'] = page_size
        return self

    def page_token(self, page_token: str) -> 'AnalyticsQuery':
        """Sets which page of results to fetch.

        Args:
            page_token:
                A continuation token to get the next page of results.

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['pageToken'] = page_token
        return self

    def sampling_level(self, sampling_level: str) -> 'AnalyticsQuery':
        """Sets the desired sampling level.

        Args:
            sampling_level:
                Desired sample size: 'DEFAULT', 'SMALL', or 'LARGE'.

        Returns:
            An updated AnalyticsQuery object.
        """
        self.raw['reportRequests'][0]['samplingLevel'] = sampling_level
        return self

    def get(self) -> 'AnalyticsReport':
        """Executes the request with reports().batchGet and wraps the response."""
        raw_report = self.ua.service.reports().batchGet(body=self.raw).execute()
        return AnalyticsReport(raw_report, self)


class AnalyticsReport:
    def __init__(self, raw: dict, query: AnalyticsQuery):
        self.raw = raw
        self.query = query

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# GA3toGA4
A Python tool that formats GA4 data in BigQuery to match historical GA3 data, so the two can be combined by backfilling with GA3 data.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1X7bWFcxsoe2WJ892vbbpsERNurv28k7S?usp=sharing)


![Logo for GA3toGA4](https://github.com/locomotive-agency/GA3toGA4/blob/7bae90e97605ea0de97e649eb6a45541ebfe5b31/ga3toga4-2.png)


View the Dashboard



## About

### Welcome to the GA3 to GA4 tool

:warning: **Warning**: This tool should be considered alpha software and is likely to break. It has been tested on a handful of sites and is shared as a proof of concept. Use at your own risk. While we will not support this open-source tool, please submit issues so we can make it better.

GA3 to GA4 creates a BigQuery table, fills it with available GA4 data, and formats that data to be compatible with historical GA3 data pulled from the Google Analytics Reporting API v4. It is designed to run in Google Colab and automates most of the process.

This notebook provides three main functions:
1. Create a table with selected GA4 data formatted in the GA3 style.
2. Set up a scheduled query for a daily recurring pull of new GA4 data.
3. Backfill the formatted GA4 data with historical GA3 data.
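The first function relies on the SQL templates in `src/`, whose `{project_id}`, `{dataset_id}`, `{table_prefix}`, `{prior_days}` and `{conversion_event}` placeholders use Python `str.format` syntax (`process_ga3.py` builds its own SQL the same way). The notebook code that fills them is not part of this repository, so the sketch below is only an assumption about that step: it renders a template and refuses to run if any placeholder is left unfilled. The helper name `render_ga4_query` and the example values are hypothetical.

```python
import string
from typing import Any


def render_ga4_query(template: str, **params: Any) -> str:
    """Fill the {placeholder} fields of a SQL template (hypothetical helper).

    Raises KeyError if the template uses a placeholder that is missing from
    params, so a half-configured query is never sent to BigQuery.
    """
    # string.Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = fields - set(params)
    if missing:
        raise KeyError(f"missing SQL parameters: {sorted(missing)}")
    return template.format(**params)


# Example with made-up project and dataset names.
template = "SELECT * FROM `{project_id}.{dataset_id}.{table_prefix}*` -- last {prior_days} days"
print(render_ga4_query(template, project_id="my-project", dataset_id="analytics_123",
                       table_prefix="events_", prior_days=30))
```

Because the templates go through `str.format`, any literal `{` or `}` you add to the SQL would have to be doubled (`{{`, `}}`).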


#### Pulled GA3 & GA4 Data

| GA3                   | GA4                             |
| ----------------------|---------------------------------|
| ga:date               | date                            |
| ga:hostname           | landing_page                    |
| ga:landingPagePath    | landing_page                    |
| ga:country            | country                         |
| ga:region             | region                          |
| ga:city               | city                            |
| ga:source             | utm_source                      |
| ga:medium             | utm_medium                      |
| ga:campaign           | utm_campaign                    |
| ga:users              | users                           |
| ga:newUsers           | new_users                       |
| ga:entrances          | entrances                       |
| ga:sessions           | sessions                        |
| ga:pageviews          | page_views                      |
| ga:uniquePageviews    | unique_page_views               |
| ga:timeOnPage         | engagment_time_sec_per_session  |
| *goal_metric*         | *conversions*                   |
| ga:transactionRevenue | ecommerce_revenue               |
| ga:transactions       | ecommerce_transactions          |

The table above shows which GA3 data is matched with which GA4 data by default. `ga:hostname` and `ga:landingPagePath` are combined into a single full URL in `landing_page`. You can change the mapping by editing ga4-to-ga3-query.sql and process_ga3.py.

**Note**: goal_metric is a placeholder for whichever GA3 goal metric you want to pull. It defaults to ga:goalCompletionsAll.
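To make the table concrete, here is a minimal pure-Python sketch of what `process_ga3.py` does to each GA3 row with pandas: it joins `ga:hostname` and `ga:landingPagePath` into an `https://` URL (leaving `(not set)` untouched), parses `ga:date` from `YYYYMMDD`, and renames GA3 names to the target columns. `GA3_TO_TARGET` and `to_target_row` are hypothetical names, and unlike the real script, which requires every column, this sketch simply keeps whichever mapped keys are present.

```python
import datetime
from typing import Dict

# GA3 name -> target column name, following the table above.
# The goal metric is added per call because it is configurable.
GA3_TO_TARGET: Dict[str, str] = {
    "ga:date": "date",
    "ga:landingPagePath": "landing_page",
    "ga:country": "country",
    "ga:region": "region",
    "ga:city": "city",
    "ga:source": "utm_source",
    "ga:medium": "utm_medium",
    "ga:campaign": "utm_campaign",
    "ga:users": "users",
    "ga:newUsers": "new_users",
    "ga:entrances": "entrances",
    "ga:sessions": "sessions",
    "ga:pageviews": "page_views",
    "ga:uniquePageviews": "unique_page_views",
    "ga:timeOnPage": "engagment_time_sec_per_session",
    "ga:transactionRevenue": "ecommerce_revenue",
    "ga:transactions": "ecommerce_transactions",
}


def to_target_row(row: Dict[str, object], goal_metric: str = "ga:goalCompletionsAll") -> Dict[str, object]:
    """Convert one GA3 report row, keyed by GA3 names, to target column names (hypothetical helper)."""
    converted = dict(row)
    path = converted.get("ga:landingPagePath")
    if path is not None and path != "(not set)":
        # Mirror process_ga3.py: https:// + hostname + landing page path.
        converted["ga:landingPagePath"] = f"https://{converted.get('ga:hostname', '')}{path}"
    if "ga:date" in converted:
        # GA3 reports dates as YYYYMMDD strings.
        converted["ga:date"] = datetime.datetime.strptime(str(converted["ga:date"]), "%Y%m%d").date()
    mapping = {**GA3_TO_TARGET, goal_metric: "conversions"}
    # Keep only mapped columns; ga:hostname has no column of its own.
    return {mapping[key]: value for key, value in converted.items() if key in mapping}
```

For example, a row with `ga:hostname` of `example.com` and `ga:landingPagePath` of `/blog/` ends up with `landing_page` set to `https://example.com/blog/`.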



## How To Use

### Google Cloud project set-up

This tool assumes that you already have a Google Cloud project. This project will contain your source dataset (raw GA4 data from a daily export) and your target dataset (selected GA4 data formatted in the GA3 style), which this tool creates.

Ensure that BigQuery Daily Export is set up from your chosen GA4 property to your Google Cloud project and that you have a table with your raw GA4 data. If you don't, follow the steps [here](https://support.google.com/analytics/answer/9358801?hl=en) to set it up. Set it up as a Daily Export only, **without** Streaming Export: Streaming Export creates additional `events_intraday_` tables, which would also match the query's `{table_prefix}*` wildcard and whose suffixes cannot be parsed as dates.

### Part 1: GA3 to GA4 SQL Workflow

First, open the notebook we have shared in Google Colab: https://colab.research.google.com/drive/1X7bWFcxsoe2WJ892vbbpsERNurv28k7S?usp=sharing

You can create your own editable copy of the notebook by going to File > Save a copy...

#### Install BigQuery Data Transfer
This is needed to set up the recurring queries.

You will need to restart the notebook runtime after running this cell.

To do so, go to Runtime > Restart runtime or press Ctrl+M, then the period key.

![Restart runtime location](https://user-images.githubusercontent.com/89162068/171264419-eadd41e1-34ab-4896-99fe-b66f6eba9c8f.png)

#### Load needed libraries

Run this cell to load the libraries the tool needs.

#### Sign in and Authenticate

Run the cells and enter the name of your Google Cloud project. You will be asked to sign in: choose the account that has access to your Google Cloud project, which should be the same account you are using in Colab.

These cells give access to the Google Drive of the account you sign into, set up a service account, and enable the required API permissions within Google Cloud.
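The service account these cells create is identified by the `client_email` field of its JSON key file, and that address is what you add as a user in GA3. Assuming the key is saved at `/content/service.json` (the path `process_ga3.py` loads it from), a small helper like the one below can print it; `service_account_email` is a hypothetical name, not part of the notebook.

```python
import json
from pathlib import Path


def service_account_email(key_path: str = "/content/service.json") -> str:
    """Return the client_email of a Google service account JSON key file (hypothetical helper)."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Service account keys carry "type": "service_account"; user credentials do not.
    if data.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    email = data.get("client_email")
    if not email:
        raise ValueError(f"{key_path} has no client_email field")
    return email
```

In a notebook cell, `print(service_account_email())` shows the address to paste into GA3.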

You then need to grant the created service account read access to your GA3 data.

To do this, go to Google Analytics, select the GA3 view that you want to use, click Admin in the bottom left, and under the property settings select Property Access Management. There, click the blue button in the top right to add a user, and add the service account email with Viewer permissions.

![Adding service account to GA3 property visual](https://user-images.githubusercontent.com/89162068/171265347-3b0dd2cf-b9ab-4f5b-a06f-caac99a9e170.png)

#### Specify configuration parameters for the Source and Target Datasets

For the BigQuery From Table (source dataset), enter the following:

* `from_dataset_id` : Name of the dataset in your project that holds your GA4 data.
* `from_table_prefix` : Prefix of the tables in the source dataset. This is usually `events_`.
* `from_conversion_event` : Optional. The GA4 conversion event you want to pull.
* `from_initial_pull_prior_days` : How many days of GA4 data to pull initially, counting back from today.

For the BigQuery To Table (target dataset), enter the following:
* `to_dataset_id` : Name of the dataset to create or use in your project to hold the modified GA4 data.
* `to_table_name` : Name of the table to create in the target dataset.

#### Authenticate BigQuery

Run this cell to authenticate the BigQuery client.

#### Build Initial Table
Run this cell to pull GA4 data and create the target dataset and table. It will error if a table with the specified name already exists in that location.

#### Set Up Daily Recurring Pull
Run this cell to set up the daily scheduled query in Google Cloud. If it errors, check that you restarted the runtime after [installing BigQuery Data Transfer](#install-bigquery-data-transfer).

#### Review Data
Run this cell to pull data from the target dataset for review.

### Part 2: Backfill GA3 data

#### Specify configuration parameters for the GA3 View

Enter the following:

* `ga3_view_id` : The ID of the GA3 view whose data you want to backfill your GA4 data with.
* `pull_start_date` : The start of the GA3 timeframe you want to pull, as `YYYY-MM-DD`.
* `goal_metric` : The GA3 goal metric you want to pull (defaults to `ga:goalCompletionsAll`).

#### Backfill GA3 Data
Run this cell to backfill your GA4 table with GA3 data. GA3 data is pulled from `pull_start_date` up to the day before the earliest date already in the target table.

**Note**: If you want to add a conversion metric, you will need to add it to the metrics list and to the `order` dictionary in lib/process_ga3.py.

#### Review Data
Run this cell to pull data from the target dataset for review.
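Behind the **Backfill GA3 Data** step, `process_ga3.py` splits the period from `pull_start_date` to the day before the earliest date in the target table into calendar months and requests them newest first. The sketch below reimplements that split with the standard library only, assuming both bounds are inclusive; `month_ranges` is a hypothetical name, not a function the repository exports.

```python
import calendar
import datetime
from typing import List, Tuple


def month_ranges(start: datetime.date, end: datetime.date) -> List[Tuple[str, str]]:
    """Split the inclusive range [start, end] into per-month (first, last) ISO date pairs, newest month first."""
    if start > end:
        return []
    ranges = []
    current = start
    while current <= end:
        # calendar.monthrange returns (weekday of day 1, number of days in the month).
        last_day = calendar.monthrange(current.year, current.month)[1]
        month_end = min(datetime.date(current.year, current.month, last_day), end)
        ranges.append((current.isoformat(), month_end.isoformat()))
        current = month_end + datetime.timedelta(days=1)
    # Newest month first, as in process_ga3.py.
    ranges.reverse()
    return ranges
```

For example, with a `pull_start_date` of 2021-11-15 and an earliest GA4 date of 2022-01-11, the end date is 2022-01-10, giving the ranges 2022-01-01 to 2022-01-10, 2021-12-01 to 2021-12-31, and 2021-11-15 to 2021-11-30. Unlike the script's loop, this sketch returns an empty list rather than an inverted range when the start date falls after the end date.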



## Acknowledgements
* Thank you to [JR](https://github.com/jroakes), [Joe](https://github.com/joejoinerr), and [Savannah](https://github.com/SavannahCasto) for their work creating this tool.
* Thanks to Derek Perkins of [Nozzle](https://nozzle.io/) for SQL code review.
* This tool was created by [LOCOMOTIVE.agency](https://locomotive.agency/) and released as open source for the SEO community.



## Feedback
If you find errors or areas for improvement, or have any concerns, please open an issue on this repo.
If you want to change something, please make a pull request.
We would love to feature dashboards users have made with the data. Please let us know if you have any to share!


--------------------------------------------------------------------------------