├── .gitignore
├── LICENSE.txt
├── README.md
├── build
│   └── lib
│       └── gapandas4
│           ├── __init__.py
│           └── gapandas4.py
├── dist
│   ├── gapandas4-0.3-py3-none-any.whl
│   └── gapandas4-0.3.tar.gz
├── gapandas4.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── requires.txt
│   └── top_level.txt
├── gapandas4
│   ├── __init__.py
│   └── gapandas4.py
├── requirements.txt
├── setup.cfg
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
venv
.idea
example.py
client_secrets.json

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
MIT License
Copyright (c) 2018 YOUR NAME
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# GAPandas4
GAPandas4 is a Python package for querying the Google Analytics Data API for GA4 and displaying the results in a Pandas dataframe. It is the successor to the [GAPandas](https://practicaldatascience.co.uk/data-science/how-to-access-google-analytics-data-in-pandas-using-gapandas) package, which did the same thing for GA3 (Universal Analytics). GAPandas4 is a wrapper around the official Google Analytics Data API package that simplifies imports and queries, so you need far less code.

### Before you start
To use GAPandas4 you will first need to [create a Google Service Account](https://practicaldatascience.co.uk/data-engineering/how-to-create-a-google-service-account-client-secrets-json-key) with access to the Google Analytics Data API and export a client secrets JSON keyfile to use for authentication. You'll also need to add the service account email address as a user on the Google Analytics 4 property you wish to access, and note the property ID to use in your queries.

### Installation
You can install GAPandas4 in either of two ways: directly from GitHub, or from PyPI using Pip. Run only one of the commands below; the first installs from GitHub and the second from PyPI.

```commandline
pip3 install git+https://github.com/practical-data-science/gapandas4.git
pip3 install gapandas4
```

### Usage examples
GAPandas4 has been written to let you use as little code as possible. Unlike the previous version of GAPandas for Universal Analytics, which used a payload based on a Python dictionary, GAPandas4 uses Protobuf (Protocol Buffer) request objects, the same format the API itself uses.

#### Report
The `query()` function sends a protobuf request payload to the API. It supports several report types via the `report_type` argument.
Standard reports use `report_type="report"`, which is also the default, and their data are returned as a Pandas dataframe. The other supported values are `"batch_report"`, `"pivot"`, `"batch_pivot"`, and `"realtime"`.

```python
import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

report_request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
)

df = gp.query(service_account, report_request, report_type="report")
print(df.head())
```

#### Batch report
If you construct a protobuf payload using `BatchRunReportsRequest()`, you can pass up to five requests at once. The results are returned as a list of Pandas dataframes, so you will need to access each one by its index.

```python
import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

batch_report_request = gp.BatchRunReportsRequest(
    property=f"properties/{property_id}",
    requests=[
        gp.RunReportRequest(
            dimensions=[
                gp.Dimension(name="country"),
                gp.Dimension(name="city")
            ],
            metrics=[
                gp.Metric(name="activeUsers")
            ],
            date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")]
        ),
        gp.RunReportRequest(
            dimensions=[
                gp.Dimension(name="country"),
                gp.Dimension(name="city")
            ],
            metrics=[
                gp.Metric(name="activeUsers")
            ],
            date_ranges=[gp.DateRange(start_date="2022-06-02", end_date="2022-06-02")]
        )
    ]
)

df = gp.query(service_account, batch_report_request, report_type="batch_report")
print(df[0].head())
print(df[1].head())
```

#### Pivot report
Constructing a report using `RunPivotReportRequest()` will return pivoted data in a single Pandas dataframe.

```python
import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

pivot_request = gp.RunPivotReportRequest(
    property=f"properties/{property_id}",
    dimensions=[gp.Dimension(name="country"),
                gp.Dimension(name="browser")],
    metrics=[gp.Metric(name="sessions")],
    date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
    pivots=[
        gp.Pivot(
            field_names=["country"],
            limit=5,
            order_bys=[
                gp.OrderBy(
                    dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                )
            ],
        ),
        gp.Pivot(
            field_names=["browser"],
            offset=0,
            limit=5,
            order_bys=[
                gp.OrderBy(
                    metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                )
            ],
        ),
    ],
)

df = gp.query(service_account, pivot_request, report_type="pivot")
print(df.head())
```

#### Batch pivot report
Constructing a payload using `BatchRunPivotReportsRequest()` lets you run up to five pivot reports at once. These are returned as a list of Pandas dataframes.
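The five-request limit applies to each batch call, so a longer list of pivot (or standard) report requests has to be split across several batches. Below is a minimal sketch of that split. `chunk_requests()` is a hypothetical helper, not part of GAPandas4; because it only slices a list, plain strings stand in for `RunPivotReportRequest` objects so the example runs without credentials.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical helper (not part of GAPandas4). Each batch call accepts at most
# five requests, so a longer list is split into consecutive groups of five.
MAX_BATCH_SIZE = 5


def chunk_requests(requests: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `requests`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield list(requests[start:start + size])


# Strings standing in for seven RunPivotReportRequest objects.
requests = [f"request_{i}" for i in range(7)]
batches = list(chunk_requests(requests))
print([len(batch) for batch in batches])  # [5, 2]
```

Each batch could then be passed as the `requests` argument of its own `gp.BatchRunPivotReportsRequest(property=f"properties/{property_id}", requests=batch)` and sent with `gp.query(service_account, ..., report_type="batch_pivot")`, extending a single list with the dataframes returned by each call.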

```python
import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

batch_pivot_request = gp.BatchRunPivotReportsRequest(
    property=f"properties/{property_id}",
    requests=[
        gp.RunPivotReportRequest(
            dimensions=[gp.Dimension(name="country"),
                        gp.Dimension(name="browser")],
            metrics=[gp.Metric(name="sessions")],
            date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
            pivots=[
                gp.Pivot(
                    field_names=["country"],
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                        )
                    ],
                ),
                gp.Pivot(
                    field_names=["browser"],
                    offset=0,
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                        )
                    ],
                ),
            ],
        ),
        gp.RunPivotReportRequest(
            dimensions=[gp.Dimension(name="country"),
                        gp.Dimension(name="browser")],
            metrics=[gp.Metric(name="sessions")],
            date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
            pivots=[
                gp.Pivot(
                    field_names=["country"],
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                        )
                    ],
                ),
                gp.Pivot(
                    field_names=["browser"],
                    offset=0,
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                        )
                    ],
                ),
            ],
        )
    ]
)

df = gp.query(service_account, batch_pivot_request, report_type="batch_pivot")
print(df[0].head())
print(df[1].head())
```

#### Metadata
The `get_metadata()` function returns a Pandas dataframe describing every dimension and metric available in the Google Analytics 4 property, with its API name, UI name, description, data type, and whether it is a custom definition.
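In the metadata, each metric's data type is recorded (for example `TYPE_INTEGER` or `TYPE_FLOAT`), while dimensions are listed as `STRING`. This is worth knowing because `query()` copies every dimension and metric value into the dataframe as the string the API returns, so numeric columns need converting before you do arithmetic on them. The sketch below converts metric values in plain Python using a mapping from metric API name to data type, such as one you could build from the `API Name` and `Data type` columns of the metadata. `cast_metric_values()` and the sample rows are hypothetical, and treating `TYPE_INTEGER` as `int` and every other metric type as `float` is a simplifying assumption.

```python
from typing import Dict, List

# Hypothetical helper (not part of GAPandas4). `metric_types` maps a metric's
# API name to its data type name, e.g. from the "Data type" column returned by
# get_metadata(). Assumption: TYPE_INTEGER becomes int, every other metric type
# becomes float, and columns missing from the mapping (dimensions) stay strings.


def cast_metric_values(
    headers: List[str], rows: List[List[str]], metric_types: Dict[str, str]
) -> List[List[object]]:
    """Return a new list of rows with metric values converted from strings."""
    converted = []
    for row in rows:
        new_row = []
        for header, value in zip(headers, row):
            data_type = metric_types.get(header)
            if data_type is None:
                new_row.append(value)  # dimension: keep the original string
            elif data_type == "TYPE_INTEGER":
                new_row.append(int(value))
            else:
                new_row.append(float(value))  # floats, currency, durations, ...
        converted.append(new_row)
    return converted


# Illustrative data shaped like the headers and rows GAPandas4 builds internally.
headers = ["country", "activeUsers", "averageSessionDuration"]
rows = [["United Kingdom", "120", "95.5"], ["France", "42", "61.25"]]
metric_types = {"activeUsers": "TYPE_INTEGER", "averageSessionDuration": "TYPE_SECONDS"}

print(cast_metric_values(headers, rows, metric_types))
# [['United Kingdom', 120, 95.5], ['France', 42, 61.25]]
```

On a GAPandas4 dataframe, a similar result is usually simpler to get column by column with `pd.to_numeric()`.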

```python
import gapandas4 as gp

metadata = gp.get_metadata('client_secrets.json', 'xxxxxxxxx')
print(metadata)
```

### Current features
- Support for all current API functionality, including `RunReportRequest`, `BatchRunReportsRequest`,
  `RunPivotReportRequest`, `BatchRunPivotReportsRequest`, `RunRealtimeReportRequest`, and `GetMetadataRequest`.
- Returns data in a Pandas dataframe, or a list of Pandas dataframes.
--------------------------------------------------------------------------------
/build/lib/gapandas4/__init__.py:
--------------------------------------------------------------------------------
from .gapandas4 import query
from .gapandas4 import get_metadata

from google.analytics.data_v1beta.types import DateRange
from google.analytics.data_v1beta.types import Dimension
from google.analytics.data_v1beta.types import Metric
from google.analytics.data_v1beta.types import OrderBy
from google.analytics.data_v1beta.types import Filter
from google.analytics.data_v1beta.types import Pivot
from google.analytics.data_v1beta.types import FilterExpression
from google.analytics.data_v1beta.types import FilterExpressionList
from google.analytics.data_v1beta.types import RunReportRequest
from google.analytics.data_v1beta.types import BatchRunReportsRequest
from google.analytics.data_v1beta.types import RunPivotReportRequest
from google.analytics.data_v1beta.types import BatchRunPivotReportsRequest
from google.analytics.data_v1beta.types import RunRealtimeReportRequest

--------------------------------------------------------------------------------
/build/lib/gapandas4/gapandas4.py:
--------------------------------------------------------------------------------
"""
GAPandas4
"""

import os
import pandas as pd
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import MetricType
from google.analytics.data_v1beta.types import GetMetadataRequest
from google.analytics.data_v1beta.types import DateRange
from google.analytics.data_v1beta.types import Dimension
from google.analytics.data_v1beta.types import Metric
from google.analytics.data_v1beta.types import OrderBy
from google.analytics.data_v1beta.types import Filter
from google.analytics.data_v1beta.types import Pivot
from google.analytics.data_v1beta.types import FilterExpression
from google.analytics.data_v1beta.types import FilterExpressionList
from google.analytics.data_v1beta.types import RunReportRequest
from google.analytics.data_v1beta.types import BatchRunReportsRequest
from google.analytics.data_v1beta.types import RunPivotReportRequest
from google.analytics.data_v1beta.types import BatchRunPivotReportsRequest
from google.analytics.data_v1beta.types import RunRealtimeReportRequest


def _get_client(service_account):
    """Create a connection using a service account.

    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile

    Returns:
        client (object): Google Analytics Data API client

    Raises:
        FileNotFoundError: If the client secrets JSON keyfile does not exist.
    """

    # Fail fast with a clear error if the keyfile is missing.
    if not os.path.isfile(service_account):
        raise FileNotFoundError(
            f"Google Service Account client secrets JSON key file does not exist: {service_account}"
        )

    # The Google client library reads the keyfile path from this environment variable.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = service_account
    client = BetaAnalyticsDataClient()
    return client


def _get_request(service_account, request, report_type="report"):
    """Pass a request to the API and return a response.

    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile
        request (protobuf): API request in Protocol Buffer format.
        report_type (string): Report type (report, batch_report, pivot, batch_pivot, or realtime)

    Returns:
        response: API response.
    """

    client = _get_client(service_account)

    if report_type == "realtime":
        response = client.run_realtime_report(request)

    elif report_type == "pivot":
        response = client.run_pivot_report(request)

    elif report_type == "batch_pivot":
        response = client.batch_run_pivot_reports(request)

    elif report_type == "batch_report":
        response = client.batch_run_reports(request)

    else:
        response = client.run_report(request)

    return response


def _get_headers(response):
    """Return a Python list of dimension and metric header names from the Protobuf response.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        headers (list): List of column header names.
    """

    headers = []

    for header in response.dimension_headers:
        headers.append(header.name)

    for header in response.metric_headers:
        headers.append(header.name)

    return headers


def _get_rows(response):
    """Return a Python list of row value lists from the Protobuf response.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        rows (list): List of rows.

    """

    rows = []
    for _row in response.rows:
        row = []
        for dimension in _row.dimension_values:
            row.append(dimension.value)
        for metric in _row.metric_values:
            row.append(metric.value)
        rows.append(row)
    return rows


def _to_dataframe(response):
    """Returns a Pandas dataframe of results.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        df (dataframe): Pandas dataframe created from response.
    """

    headers = _get_headers(response)
    rows = _get_rows(response)
    df = pd.DataFrame(rows, columns=headers)
    return df


def _batch_to_dataframe_list(response):
    """Return a list of dataframes of results from a batchRunReports query.

    Args:
        response (object): Response object from a batchRunReports query.

    Returns:
        output (list): List of Pandas dataframes of results.
    """

    output = []
    for report in response.reports:
        output.append(_to_dataframe(report))
    return output


def _batch_pivot_to_dataframe_list(response):
    """Return a list of dataframes of results from a batchRunPivotReports query.

    Args:
        response (object): Response object from a batchRunPivotReports query.

    Returns:
        output (list): List of Pandas dataframes of results.
    """

    output = []
    for report in response.pivot_reports:
        output.append(_to_dataframe(report))
    return output


def _handle_response(response):
    """Use the kind to determine the type of report requested and reformat the output to a Pandas dataframe.

    Args:
        response (object): Protobuf response object from the Google Analytics Data API.

    Returns:
        output (dataframe, or list of dataframes): Return a single dataframe for runReport, runPivotReport,
            or runRealtimeReport
            or a list of dataframes for batchRunReports and batchRunPivotReports.
    """

    if response.kind == "analyticsData#runReport":
        return _to_dataframe(response)
    if response.kind == "analyticsData#batchRunReports":
        return _batch_to_dataframe_list(response)
    if response.kind == "analyticsData#runPivotReport":
        return _to_dataframe(response)
    if response.kind == "analyticsData#batchRunPivotReports":
        return _batch_pivot_to_dataframe_list(response)
    if response.kind == "analyticsData#runRealtimeReport":
        return _to_dataframe(response)
    else:
        print('Unsupported')


def query(service_account, request, report_type="report"):
    """Return Pandas formatted data for a Google Analytics Data API query.

    Args:
        service_account (string): Path to Google Service Account client secrets JSON key file
        request (protobuf): Google Analytics Data API protocol buffer request
        report_type (string): Report type (report, batch_report, pivot, batch_pivot, or realtime)

    Returns:
        output (dataframe, or list of dataframes): Return a single dataframe for runReport, runPivotReport,
            or runRealtimeReport
            or a list of dataframes for batchRunReports and batchRunPivotReports.
    """

    response = _get_request(service_account, request, report_type)
    output = _handle_response(response)
    return output


def get_metadata(service_account, property_id):
    """Return metadata for the Google Analytics property.
    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile
        property_id (string): Google Analytics 4 property ID

    Returns:
        df (dataframe): Pandas dataframe of metadata for the property.
    """

    client = _get_client(service_account)
    request = GetMetadataRequest(name=f"properties/{property_id}/metadata")
    response = client.get_metadata(request)

    metadata = []
    for dimension in response.dimensions:
        metadata.append({
            "Type": "Dimension",
            "Data type": "STRING",
            "API Name": dimension.api_name,
            "UI Name": dimension.ui_name,
            "Description": dimension.description,
            "Custom definition": dimension.custom_definition
        })

    for metric in response.metrics:
        metadata.append({
            "Type": "Metric",
            "Data type": MetricType(metric.type_).name,
            "API Name": metric.api_name,
            "UI Name": metric.ui_name,
            "Description": metric.description,
            "Custom definition": metric.custom_definition
        })

    return pd.DataFrame(metadata).sort_values(by=['Type', 'API Name']).drop_duplicates()

--------------------------------------------------------------------------------
/dist/gapandas4-0.3-py3-none-any.whl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practical-data-science/gapandas4/06ee7224524e5c906e7d8baeb1af817ac059b170/dist/gapandas4-0.3-py3-none-any.whl
--------------------------------------------------------------------------------
/dist/gapandas4-0.3.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practical-data-science/gapandas4/06ee7224524e5c906e7d8baeb1af817ac059b170/dist/gapandas4-0.3.tar.gz
--------------------------------------------------------------------------------
/gapandas4.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
Metadata-Version: 2.1
Name: gapandas4
Version: 0.3
Summary: GAPandas4 is a Python package for accessing the Google Analytics Data API for GA4 using Pandas
Home-page: https://github.com/practicaldatascience/gapandas4
Author: Matt Clarke
Author-email: matt@practicaldatascience.co.uk
License: MIT
Download-URL: https://github.com/practicaldatascience/gapandas4/archive/master.zip
Description: # GAPandas4
        GAPandas4 is a Python package for querying the Google Analytics Data API for GA4 and displaying the results in a Pandas dataframe. It is the successor to the [GAPandas](https://practicaldatascience.co.uk/data-science/how-to-access-google-analytics-data-in-pandas-using-gapandas) package, which did the same thing for GA3 or Universal Analytics. GAPandas4 is a wrapper around the official Google Analytics Data API package and simplifies imports and queries, requiring far less code.
        
        ### Before you start
        In order to use GAPandas4 you will first need to [create a Google Service Account](https://practicaldatascience.co.uk/data-engineering/how-to-create-a-google-service-account-client-secrets-json-key) with access to the Google Analytics Data API and export a client secrets JSON keyfile to use for authentication. You'll also need to add the service account email address as a user on the Google Analytics 4 property you wish to access, and you'll need to note the property ID to use in your queries.
        
        ### Installation
        As this is currently in alpha, there's no Pip package yet; however, you can install the code into your Python environment directly from GitHub using the command below. It will run fine in a Jupyter notebook, a Python IDE, or a Python script.
        
        ```commandline
        pip3 install git+https://github.com/practical-data-science/gapandas4.git
        ```
        
        ### Usage
        GAPandas4 has been written to allow you to use as little code as possible. Unlike the previous version of GAPandas for Universal Analytics, which used a payload based on a Python dictionary, GAPandas4 now uses a Protobuf (Protocol Buffer) payload as used in the API itself.
        
        ### Report
        The `query()` function is used to send a protobuf API payload to the API. The function supports various report types
        via the `report_type` argument. Standard reports are handled using `report_type="report"`, but this is also the
        default. Data are returned as a Pandas dataframe.
        
        ```python
        import gapandas4 as gp
        
        service_account = 'client_secrets.json'
        property_id = 'xxxxxxxxx'
        
        report_request = gp.RunReportRequest(
            property=f"properties/{property_id}",
            dimensions=[
                gp.Dimension(name="country"),
                gp.Dimension(name="city")
            ],
            metrics=[
                gp.Metric(name="activeUsers")
            ],
            date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
        )
        
        df = gp.query(service_account, report_request, report_type="report")
        print(df.head())
        ```
        
        ### Batch report
        If you construct a protobuf payload using `BatchRunReportsRequest()` you can pass up to five requests at once. These
        are returned as a list of Pandas dataframes, so you will need to access them using their index.
        
        ```python
        import gapandas4 as gp
        
        service_account = 'client_secrets.json'
        property_id = 'xxxxxxxxx'
        
        
        batch_report_request = gp.BatchRunReportsRequest(
            property=f"properties/{property_id}",
            requests=[
                gp.RunReportRequest(
                    dimensions=[
                        gp.Dimension(name="country"),
                        gp.Dimension(name="city")
                    ],
                    metrics=[
                        gp.Metric(name="activeUsers")
                    ],
                    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")]
                ),
                gp.RunReportRequest(
                    dimensions=[
                        gp.Dimension(name="country"),
                        gp.Dimension(name="city")
                    ],
                    metrics=[
                        gp.Metric(name="activeUsers")
                    ],
                    date_ranges=[gp.DateRange(start_date="2022-06-02", end_date="2022-06-02")]
                )
            ]
        )
        
        df = gp.query(service_account, batch_report_request, report_type="batch_report")
        print(df[0].head())
        print(df[1].head())
        ```
        
        ### Pivot report
        Constructing a report using `RunPivotReportRequest()` will return pivoted data in a single Pandas dataframe.
        
        ```python
        import gapandas4 as gp
        
        service_account = 'client_secrets.json'
        property_id = 'xxxxxxxxx'
        
        pivot_request = gp.RunPivotReportRequest(
            property=f"properties/{property_id}",
            dimensions=[gp.Dimension(name="country"),
                        gp.Dimension(name="browser")],
            metrics=[gp.Metric(name="sessions")],
            date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
            pivots=[
                gp.Pivot(
                    field_names=["country"],
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                        )
                    ],
                ),
                gp.Pivot(
                    field_names=["browser"],
                    offset=0,
                    limit=5,
                    order_bys=[
                        gp.OrderBy(
                            metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                        )
                    ],
                ),
            ],
        )
        
        df = gp.query(service_account, pivot_request, report_type="pivot")
        print(df.head())
        ```
        
        ### Batch pivot report
        Constructing a payload using `BatchRunPivotReportsRequest()` will allow you to run up to five pivot reports. These
        are returned as a list of Pandas dataframes.
        
        ```python
        import gapandas4 as gp
        
        service_account = 'client_secrets.json'
        property_id = 'xxxxxxxxx'
        
        batch_pivot_request = gp.BatchRunPivotReportsRequest(
            property=f"properties/{property_id}",
            requests=[
                gp.RunPivotReportRequest(
                    dimensions=[gp.Dimension(name="country"),
                                gp.Dimension(name="browser")],
                    metrics=[gp.Metric(name="sessions")],
                    date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
                    pivots=[
                        gp.Pivot(
                            field_names=["country"],
                            limit=5,
                            order_bys=[
                                gp.OrderBy(
                                    dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                                )
                            ],
                        ),
                        gp.Pivot(
                            field_names=["browser"],
                            offset=0,
                            limit=5,
                            order_bys=[
                                gp.OrderBy(
                                    metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                                )
                            ],
                        ),
                    ],
                ),
                gp.RunPivotReportRequest(
                    dimensions=[gp.Dimension(name="country"),
                                gp.Dimension(name="browser")],
                    metrics=[gp.Metric(name="sessions")],
                    date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
                    pivots=[
                        gp.Pivot(
                            field_names=["country"],
                            limit=5,
                            order_bys=[
                                gp.OrderBy(
                                    dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                                )
                            ],
                        ),
                        gp.Pivot(
                            field_names=["browser"],
                            offset=0,
                            limit=5,
                            order_bys=[
                                gp.OrderBy(
                                    metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                                )
                            ],
                        ),
                    ],
                )
            ]
        )
        
        df = gp.query(service_account, batch_pivot_request, report_type="batch_pivot")
        print(df[0].head())
        print(df[1].head())
        
        ```
        
        #### Metadata
        The `get_metadata()` function will return all metadata on dimensions and metrics within the Google Analytics 4 property.
        
        ```python
        metadata = gp.get_metadata(service_account, property_id)
        print(metadata)
        ```
        
        ### Current features
        - Support for all current API functionality including `RunReportRequest`, `BatchRunReportsRequest`,
        `RunPivotReportRequest`, `BatchRunPivotReportsRequest`, `RunRealtimeReportRequest`, and `GetMetadataRequest`.
        - Returns data in a Pandas dataframe, or a list of Pandas dataframes.
Keywords: python,google analytics,ga,pandas,universal analytics,gapandas,ga4
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown

--------------------------------------------------------------------------------
/gapandas4.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
LICENSE.txt
README.md
setup.cfg
setup.py
gapandas4/__init__.py
gapandas4/gapandas4.py
gapandas4.egg-info/PKG-INFO
gapandas4.egg-info/SOURCES.txt
gapandas4.egg-info/dependency_links.txt
gapandas4.egg-info/requires.txt
gapandas4.egg-info/top_level.txt
--------------------------------------------------------------------------------
/gapandas4.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/gapandas4.egg-info/requires.txt:
--------------------------------------------------------------------------------
google-analytics-data
pandas

--------------------------------------------------------------------------------
/gapandas4.egg-info/top_level.txt:
--------------------------------------------------------------------------------
gapandas4

--------------------------------------------------------------------------------
/gapandas4/__init__.py:
--------------------------------------------------------------------------------
from .gapandas4 import query
from .gapandas4 import get_metadata

from google.analytics.data_v1beta.types import DateRange
from google.analytics.data_v1beta.types import Dimension
from google.analytics.data_v1beta.types import Metric
from google.analytics.data_v1beta.types import OrderBy
from google.analytics.data_v1beta.types import Filter
from google.analytics.data_v1beta.types import Pivot
from google.analytics.data_v1beta.types import FilterExpression
from google.analytics.data_v1beta.types import FilterExpressionList
from google.analytics.data_v1beta.types import RunReportRequest
from google.analytics.data_v1beta.types import BatchRunReportsRequest
from google.analytics.data_v1beta.types import RunPivotReportRequest
from google.analytics.data_v1beta.types import BatchRunPivotReportsRequest
from google.analytics.data_v1beta.types import RunRealtimeReportRequest

--------------------------------------------------------------------------------
/gapandas4/gapandas4.py:
--------------------------------------------------------------------------------
"""
GAPandas4
"""

import os
import pandas as pd
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import MetricType
from google.analytics.data_v1beta.types import GetMetadataRequest
from google.analytics.data_v1beta.types import DateRange
from google.analytics.data_v1beta.types import Dimension
from google.analytics.data_v1beta.types import Metric
from google.analytics.data_v1beta.types import OrderBy
from google.analytics.data_v1beta.types import Filter
from google.analytics.data_v1beta.types import Pivot
from google.analytics.data_v1beta.types import FilterExpression
from google.analytics.data_v1beta.types import FilterExpressionList
from google.analytics.data_v1beta.types import RunReportRequest
from google.analytics.data_v1beta.types import BatchRunReportsRequest
from google.analytics.data_v1beta.types import RunPivotReportRequest
from google.analytics.data_v1beta.types import BatchRunPivotReportsRequest
from google.analytics.data_v1beta.types import RunRealtimeReportRequest


def _get_client(service_account):
    """Create a connection using a service account.

    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile

    Returns:
        client (object): Google Analytics Data API client

    Raises:
        FileNotFoundError: If the client secrets JSON keyfile does not exist.
    """

    # Fail fast with a clear error if the keyfile is missing.
    if not os.path.isfile(service_account):
        raise FileNotFoundError(
            f"Google Service Account client secrets JSON key file does not exist: {service_account}"
        )

    # The Google client library reads the keyfile path from this environment variable.
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = service_account
    client = BetaAnalyticsDataClient()
    return client


def _get_request(service_account, request, report_type="report"):
    """Pass a request to the API and return a response.

    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile
        request (protobuf): API request in Protocol Buffer format.
        report_type (string): Report type (report, batch_report, pivot, batch_pivot, or realtime)

    Returns:
        response: API response.
    """

    client = _get_client(service_account)

    if report_type == "realtime":
        response = client.run_realtime_report(request)

    elif report_type == "pivot":
        response = client.run_pivot_report(request)

    elif report_type == "batch_pivot":
        response = client.batch_run_pivot_reports(request)

    elif report_type == "batch_report":
        response = client.batch_run_reports(request)

    else:
        # Any other report_type, including the default "report", is sent as runReport.
        response = client.run_report(request)

    return response


def _get_headers(response):
    """Return a Python list of dimension and metric header names from the Protobuf response.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        headers (list): List of column header names.
    """

    headers = []

    for header in response.dimension_headers:
        headers.append(header.name)

    for header in response.metric_headers:
        headers.append(header.name)

    return headers


def _get_rows(response):
    """Return a Python list of row value lists from the Protobuf response.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        rows (list): List of rows.
    """

    rows = []
    for _row in response.rows:
        row = []
        # The API returns every value as a string, metrics included.
        for dimension in _row.dimension_values:
            row.append(dimension.value)
        for metric in _row.metric_values:
            row.append(metric.value)
        rows.append(row)
    return rows


def _to_dataframe(response):
    """Returns a Pandas dataframe of results.

    Args:
        response (object): Google Analytics Data API response

    Returns:
        df (dataframe): Pandas dataframe created from response.
    """

    headers = _get_headers(response)
    rows = _get_rows(response)
    df = pd.DataFrame(rows, columns=headers)
    return df


def _batch_to_dataframe_list(response):
    """Return a list of dataframes of results from a batchRunReports query.

    Args:
        response (object): Response object from a batchRunReports query.

    Returns:
        output (list): List of Pandas dataframes of results.
    """

    output = []
    for report in response.reports:
        output.append(_to_dataframe(report))
    return output


def _batch_pivot_to_dataframe_list(response):
    """Return a list of dataframes of results from a batchRunPivotReports query.

    Args:
        response (object): Response object from a batchRunPivotReports query.

    Returns:
        output (list): List of Pandas dataframes of results.
    """

    output = []
    for report in response.pivot_reports:
        output.append(_to_dataframe(report))
    return output


def _handle_response(response):
    """Use the response kind to determine the report type and convert the output to Pandas dataframes.

    Args:
        response (object): Protobuf response object from the Google Analytics Data API.

    Returns:
        output (dataframe, or list of dataframes): A single dataframe for runReport, runPivotReport,
        or runRealtimeReport, or a list of dataframes for batchRunReports and batchRunPivotReports.
    """

    if response.kind == "analyticsData#runReport":
        return _to_dataframe(response)
    if response.kind == "analyticsData#batchRunReports":
        return _batch_to_dataframe_list(response)
    if response.kind == "analyticsData#runPivotReport":
        return _to_dataframe(response)
    if response.kind == "analyticsData#batchRunPivotReports":
        return _batch_pivot_to_dataframe_list(response)
    if response.kind == "analyticsData#runRealtimeReport":
        return _to_dataframe(response)

    # Raise rather than printing and silently returning None.
    raise ValueError(f'Unsupported response kind: {response.kind}')


def query(service_account, request, report_type="report"):
    """Return Pandas formatted data for a Google Analytics Data API query.

    Args:
        service_account (string): Path to Google Service Account client secrets JSON key file
        request (protobuf): Google Analytics Data API protocol buffer request
        report_type (string): Report type (report, batch_report, pivot, batch_pivot, or realtime)

    Returns:
        output (dataframe, or list of dataframes): A single dataframe for runReport, runPivotReport,
        or runRealtimeReport, or a list of dataframes for batchRunReports and batchRunPivotReports.
    """

    response = _get_request(service_account, request, report_type)
    output = _handle_response(response)
    return output


def get_metadata(service_account, property_id):
    """Return metadata for the Google Analytics property.

    Args:
        service_account (string): Filepath to Google Service Account client secrets JSON keyfile
        property_id (string): Google Analytics 4 property ID

    Returns:
        df (dataframe): Pandas dataframe of metadata for the property.
    """

    client = _get_client(service_account)
    request = GetMetadataRequest(name=f"properties/{property_id}/metadata")
    response = client.get_metadata(request)

    metadata = []
    for dimension in response.dimensions:
        metadata.append({
            "Type": "Dimension",
            "Data type": "STRING",
            "API Name": dimension.api_name,
            "UI Name": dimension.ui_name,
            "Description": dimension.description,
            "Custom definition": dimension.custom_definition
        })

    for metric in response.metrics:
        metadata.append({
            "Type": "Metric",
            "Data type": MetricType(metric.type_).name,
            "API Name": metric.api_name,
            "UI Name": metric.ui_name,
            "Description": metric.description,
            "Custom definition": metric.custom_definition
        })

    return pd.DataFrame(metadata).sort_values(by=['Type', 'API Name']).drop_duplicates()


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pandas>=1.2.5
google-analytics-data

--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
[metadata]
description-file = README.md
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

from os import path
this_directory = path.abspath(path.dirname(__file__))
with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='gapandas4',
    packages=['gapandas4'],
    version='0.003',
    license='MIT',
    description='GAPandas4 is a Python package for accessing the Google Analytics Data API for GA4 using Pandas',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Matt Clarke',
    author_email='matt@practicaldatascience.co.uk',
    url='https://github.com/practicaldatascience/gapandas4',
    download_url='https://github.com/practicaldatascience/gapandas4/archive/master.zip',
    keywords=['python', 'google analytics', 'ga', 'pandas', 'universal analytics', 'gapandas', 'ga4'],
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Developers',
        'Topic :: Software Development :: Libraries :: Python Modules',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3.6',
    ],
    install_requires=['pandas', 'google-analytics-data']
)

--------------------------------------------------------------------------------
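Because `query()` only runs against a live GA4 property with valid credentials, the flattening performed by `_get_headers()` and `_get_rows()` is easiest to see offline. The sketch below is not part of the package: it assumes a hypothetical stand-in built with `types.SimpleNamespace` that mimics only the response fields those helpers read (`dimension_headers`, `metric_headers`, and `rows` with `dimension_values` and `metric_values`). The names `fake_response` and `flatten_response` and the sample values are illustrative.

```python
from types import SimpleNamespace


def _value(v):
    # Each dimension or metric value in a response exposes its content via `.value`.
    return SimpleNamespace(value=v)


def fake_response():
    """Hypothetical stand-in for a RunReportResponse (no API call is made)."""
    return SimpleNamespace(
        dimension_headers=[SimpleNamespace(name="country")],
        metric_headers=[SimpleNamespace(name="activeUsers")],
        rows=[
            SimpleNamespace(dimension_values=[_value("United Kingdom")],
                            metric_values=[_value("120")]),
            SimpleNamespace(dimension_values=[_value("France")],
                            metric_values=[_value("45")]),
        ],
    )


def flatten_response(response):
    """Mirror _get_headers() and _get_rows(): dimension columns first, then metrics."""
    headers = [h.name for h in response.dimension_headers]
    headers += [h.name for h in response.metric_headers]
    rows = [
        [d.value for d in row.dimension_values] + [m.value for m in row.metric_values]
        for row in response.rows
    ]
    return headers, rows


headers, rows = flatten_response(fake_response())
print(headers)  # ['country', 'activeUsers']
print(rows)     # [['United Kingdom', '120'], ['France', '45']]
```

Every cell is a string, metrics included, so a dataframe built with `pd.DataFrame(rows, columns=headers)` has `object` columns; converting the metric columns with `pd.to_numeric()` is usually needed before doing arithmetic on them.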