├── .gitignore ├── CHANGELOG.md ├── LICENSE ├── README.md ├── docs ├── google-adwords-account-api-center.png ├── google-adwords-account-page.png ├── google-adwords-account-preferences.png ├── google-adwords-api-authorization-code.png ├── google-adwords-api-consent.png └── google-adwords-api-token-application.png ├── google_ads_downloader ├── __init__.py ├── cli.py ├── config.py └── downloader.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info/ 2 | __pycache__ 3 | .idea 4 | .vscode 5 | .venv/ 6 | build 7 | dist 8 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## 4.1.0 (2019-09-03) 4 | 5 | - Add option to download keywords-performance reports 6 | 7 | ## 4.0.0 (2019-07-05) 8 | 9 | - Compatible with specifications: a unique identifier is an Ad ID + Ad Group ID. 10 | - Add option to ignore downloading of data related to removed campaigns 11 | 12 | **required changes** 13 | 14 | - The file format changed to `v5`. Adapt etl scripts that process the output data. 
15 | - Ad ID is no longer unique in any file 16 | - Ad performance datasets now include Ad Group Id 17 | 18 | ## 3.0.0 (2019-04-13) 19 | 20 | - Change MARA_XXX variables to functions to delay imports 21 | 22 | **required changes** 23 | 24 | - If used together with a mara project, update `mara-app` to `>=2.0.0` 25 | 26 | 27 | ## 2.1.0 28 | *2019-01-23* 29 | 30 | - Migrate to Adwords API version v201809 31 | - Update googleads-python-lib to 15.0.2 32 | 33 | ## 2.0.0 34 | *2018-08-19* 35 | 36 | - Rename package to google-ads-performance-downloader 37 | - Adapt code to reflect the renaming of Adwords to Google Ads 38 | - Update googleads-python-lib to 13.0.0 39 | 40 | **required changes** 41 | 42 | - use new package names in requirements.txt 43 | - adapt ETL to new output file names 44 | - adapt calls to download cli command 45 | 46 | 47 | ## 1.7.1 48 | *2018-05-02* 49 | 50 | - Moved to googleads version 11.0.1 51 | - Now uses google_auth_oauthlib 52 | 53 | ## 1.7.0 54 | *2018-03-12* 55 | - Download currency information for each account 56 | 57 | 58 | ## 1.6.1 59 | *2018-03-05* 60 | 61 | - Made API version configurable 62 | 63 | 64 | ## 1.6.0 65 | *2018-02-19* 66 | 67 | - Moved to googleads version 10.0.0 68 | 69 | ## 1.5.1 70 | *2018-01-29* 71 | 72 | - Allow for arbitrary characters in account / campaign / ad group labels 73 | 74 | 75 | ## 1.5.0 76 | *2018-01-19* 77 | 78 | - Allow for spaces in account / campaign / ad group labels 79 | - Retry in case of any error, not only HTTP 500 80 | 81 | 82 | ## 1.4.1 83 | *2017-11-21* 84 | 85 | - Updated googleads-python-lib to 9.0.0 86 | 87 | ## 1.4.0 88 | *2017-10-23* 89 | 90 | - Updated googleads-python-lib to 8.1.0 91 | 92 | 93 | ## 1.3.0 94 | *2017-10-10* 95 | 96 | - Updated googleads-python-lib to 8.0 97 | 98 | 99 | ## 1.2.0 100 | *2017-09-20* 101 | 102 | - Updated googleads-python-lib to 6.0 and use AdWords API version v201705 103 | - Added retry logic 104 | - Made the config and click commands discoverable
in [mara-app](https://github.com/mara/mara-app) >= 1.2.0 105 | 106 | **required changes** 107 | 108 | - The file format changed to `v3`. Adapt etl scripts that process the output data. 109 | 110 | 111 | ## 1.1.1 112 | *2017-06-30* 113 | 114 | - Updated googleads-python-lib to 6.0.0 115 | 116 | ## 1.1.0 117 | *2017-06-07* 118 | 119 | - Updated googleads-python-lib to 5.6.0 and use AdWords API version v201705 120 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2017 Mara contributors 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 
20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Google Ads Performance Downloader 2 | 3 | A Python script that downloads performance data and account structure from an [MCC account](https://ads.google.com/home/tools/manager-accounts/) to local files, using the Google Adwords API ([v201809](https://developers.google.com/adwords/api/docs/reference/release-notes/v201809)). 4 | 5 | The [mara Google ads performance pipeline](https://github.com/mara/google-ads-performance-pipeline) can then be used for loading and transforming the downloaded data into a dimensional schema. 6 | 7 | ## Resulting data 8 | By default, the script creates two data sets: 9 | 10 | 1. **Ad Performance** consists of measures such as impressions, clicks, conversions and cost. The script creates one file per day in a specified time range: 11 | 12 | data/2015/03/31/google-ads/ad-performance-v4.json.gz 13 | data/2015/04/01/google-ads/ad-performance-v4.json.gz 14 | 15 | For the last 30 days, the script always re-downloads the files because the data can still change (e.g. cost or attributed conversions). Beyond that, files are only downloaded when they do not yet exist. 16 | **Note**: If you are using an attribution window larger than 30 days, adjust the `redownload_window` config accordingly. 17 | 18 | The resulting JSON files contain arrays of dictionaries per ad, device and network: 19 | 20 | [ 21 | { 22 | "Day": "2015-03-31", 23 | "Ad ID": "69450572293", 24 | "Device": "Computers", 25 | "Network (with search partners)": "Google search", 26 | "Active View viewable impressions": "0", 27 | "Avg. position": "1.0", 28 | "Clicks": "1", 29 | "Conversions": "0.0", 30 | "Total conv. value": "0.0", 31 | "Converted clicks": "0", 32 | "Cost": "590000", 33 | "Impressions": "4" 34 | }, 35 | ..
36 | ] 37 | 38 | See [Ad Performance Report](https://developers.google.com/adwords/api/docs/appendix/reports/ad-performance-report) for documentation of the fields. 39 | 40 | 2. **Account Structure** information. This file is always overwritten by the script: 41 | 42 | data/google-ads-account-structure-v4.csv.gz 43 | 44 | Each line contains one ad together with its ad group, campaign and account: 45 | 46 | ad_id | 69450572293 47 | ad_name | Online Veiling Gent 48 | ad_group_id | 17837800573 49 | ad_group_name | Veiling Gent 50 | campaign_id | 254776453 51 | campaign_name | BE_NL_GEN_Auction_City_{e} 52 | account_id | 3470519330 53 | account_name | BE_Dutch_Search 54 | attributes | {"Target": "buyer", "Ad type": "Text ad", "Channel": "SEM", "Country": "Belgium", "Ad state": "disabled", "Language": "Dutch"} 55 | 56 | The `attributes` field contains a JSON representation of all [labels](https://support.google.com/adwords/answer/2475865) in `{Key=Value}` syntax that are attached to an ad or to one of its parents. For example, if an account has the label `{Channel=SEM}`, then all ads below it will have the attribute `"Channel": "SEM"`. 57 | 58 | **Note**: Labels on lower levels overwrite those from higher levels. 59 | 60 | ## Getting Started 61 | 62 | ### Prerequisites 63 | 64 | To use the Google Ads Performance Downloader you have to 65 | 66 | - consolidate all accounts of a company under a single [manager account](https://ads.google.com/home/tools/manager-accounts/) (also known as MCC), 67 | - not delete any ad, ad group, campaign or account (but disable them instead) so that you can relate past performance to structure data, 68 | - set up OAuth2 credentials to access the Adwords API. See [Set up your OAuth2 credentials](#set-up-your-oauth2-credentials) for the necessary steps. 69 | 70 | Optionally, you can apply labels on all hierarchy levels for segmenting the account structure.
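The `{Key=Value}` label convention and the rule that labels on lower levels overwrite those from higher levels can be illustrated with a small sketch. The names `LABEL_PATTERN`, `labels_to_attributes` and `merge_attributes` are hypothetical and chosen for this example only; the package's own label parsing may differ in detail (for instance in how it treats labels that do not follow the syntax).

```python
import re

# Hypothetical pattern for a single `{Key=Value}` label.
# Keys and values may contain spaces and other characters, but no braces.
LABEL_PATTERN = re.compile(r'\{([^={}]+)=([^{}]*)\}')


def labels_to_attributes(labels):
    """Turns a list of label names such as '{Channel=SEM}' into a dictionary."""
    attributes = {}
    for label in labels:
        match = LABEL_PATTERN.fullmatch(label.strip())
        if match:  # labels that do not follow the syntax are skipped
            key, value = match.groups()
            attributes[key.strip()] = value.strip()
    return attributes


def merge_attributes(*levels):
    """Merges attribute dictionaries ordered from the highest to the lowest level."""
    merged = {}
    for level in levels:
        merged.update(level)  # later (lower) levels overwrite earlier ones
    return merged


# Assumed example labels on three hierarchy levels
account = labels_to_attributes(['{Channel=SEM}', '{Country=Belgium}'])
campaign = labels_to_attributes(['{Language=Dutch}', 'not a key-value label'])
ad = labels_to_attributes(['{Country=Netherlands}'])

attributes = merge_attributes(account, campaign, ad)
print(attributes)
```

In this sketch the ad-level `Country` label replaces the one inherited from the account, while `Channel` and `Language` are passed down unchanged.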
71 | 72 | 73 | ### Installation 74 | 75 | The Google Ads Performance Downloader requires: 76 | 77 | Python (>= 3.6) 78 | googleads (==15.0.2) 79 | click (>=6.0) 80 | 81 | The easiest way to install google-ads-downloader is with pip: 82 | 83 | pip install git+https://github.com/mara/google-ads-performance-downloader.git 84 | 85 | In case you want to install it in a virtual environment: 86 | 87 | $ git clone git@github.com:mara/google-ads-performance-downloader.git google_ads_downloader 88 | $ cd google_ads_downloader 89 | $ python3 -m venv .venv 90 | $ .venv/bin/pip install . 91 | 92 | 93 | ### Set up your OAuth2 credentials 94 | 95 | **Note: If you cannot see the images, please deactivate your ad blocker** 96 | 97 | Create an `oauth2_client_id` and an `oauth2_client_secret` as described in the [Google Ads documentation](https://developers.google.com/adwords/api/docs/guides/authentication#installed). 98 | 99 | Log into your Google Ads account and go to account settings: 100 | ![Google Ads account page](docs/google-adwords-account-page.png) 101 | 102 | In the account settings you find the `client_customer_id` (the id of the manager account, aka MCC): 103 | ![Google Ads account preferences](docs/google-adwords-account-preferences.png) 104 | 105 | Fill out the developer details in the Adwords API Center.
This page also provides you with the `developer_token`: 106 | ![Google Ads API Center](docs/google-adwords-account-api-center.png) 107 | 108 | To get an access level beyond the test account, fill out the Adwords API Token application: 109 | ![Google Ads API Token application](docs/google-adwords-api-token-application.png) 110 | 111 | Once approved, get an OAuth2 refresh token for the Google Adwords API by calling (replace with your credentials): 112 | 113 | $ refresh-google-ads-api-oauth2-token --client_customer_id 123-456-7890 \ 114 | --developer_token ABCDEFEGHIJKL \ 115 | --oauth2_client_id 123456789-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com \ 116 | --oauth2_client_secret aBcDeFg 117 | 118 | This prompts you to visit a URL where you need to allow the OAuth2 credentials to access the API on your behalf. Navigate to the URL in a private browser session or an incognito window. Log in with the same Google account you use to access Google Ads, and then click **Allow** on the OAuth2 consent screen: 119 | 120 | ![Google Ads API consent screen](docs/google-adwords-api-consent.png) 121 | 122 | An authorization code is shown to you. Copy it, paste it into the command line where `refresh-google-ads-api-oauth2-token` is running, and press Enter.
The script should complete and display an offline `oauth2_refresh_token`: 123 | 124 | ![The authorization code](docs/google-adwords-api-authorization-code.png) 125 | 126 | ## Usage 127 | 128 | To run the Google Ads Performance Downloader call `download-google-ads-performance-data` with its config parameters: 129 | 130 | $ download-google-ads-performance-data --client_customer_id 123-456-7890 \ 131 | --developer_token ABCDEFEGHIJKL \ 132 | --oauth2_client_id 123456789-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com \ 133 | --oauth2_client_secret aBcDeFg \ 134 | --oauth2_refresh_token 1/acbd-efghijklmnopqrstuvwxyz \ 135 | --data_dir /tmp/google-ads 136 | 137 | 138 | All options: 139 | 140 | $ download-google-ads-performance-data --help 141 | Usage: download-google-ads-performance-data [OPTIONS] 142 | 143 | Downloads data. When options are not specified, then the defaults from 144 | config.py are used. 145 | 146 | Options: 147 | --client_customer_id TEXT The id of the manager account (MCC) that 148 | contains all the accounts for which data should 149 | be downloaded. Default: "123-456-7890" 150 | --developer_token TEXT The developer token that is used to access the 151 | Google Ads API. Default: "ABCDEFEGHIJKL" 152 | --oauth2_client_id TEXT The Oauth client id obtained from the Google Ads 153 | API center. Default: "123456789-abcdefghijklmno 154 | pqrstuvwxyz.apps.googleusercontent.com" 155 | --oauth2_client_secret TEXT The Oauth client secret obtained from the 156 | Google Ads API center. Default: "aBcDeFg" 157 | --oauth2_refresh_token TEXT The Oauth refresh token returned from the 158 | refresh-google-ads-api-oauth2-token script. 159 | Default: "1/acbd-efghijklmnopqrstuvwxyz" 160 | --data_dir TEXT The directory where result data is written to. 161 | Default: "/tmp/google-ads" 162 | --first_date TEXT The first day for which data is downloaded. 
163 | Default: "2015-01-01" 164 | --redownload_window TEXT The number of days for which the performance 165 | data will be redownloaded. Default: "30" 166 | --output_file_version TEXT A suffix that is added to output files, 167 | denoting a version of the data format. Default: 168 | "v4" 169 | --max_retries TEXT How often try retry at max in case of 500 170 | errors. Default: "5" 171 | --retry_backoff_factor TEXT How many seconds to wait between retries (is 172 | multiplied with retry count). Default: "5" 173 | --help Show this message and exit. 174 | -------------------------------------------------------------------------------- /docs/google-adwords-account-api-center.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-account-api-center.png -------------------------------------------------------------------------------- /docs/google-adwords-account-page.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-account-page.png -------------------------------------------------------------------------------- /docs/google-adwords-account-preferences.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-account-preferences.png -------------------------------------------------------------------------------- /docs/google-adwords-api-authorization-code.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-api-authorization-code.png -------------------------------------------------------------------------------- /docs/google-adwords-api-consent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-api-consent.png -------------------------------------------------------------------------------- /docs/google-adwords-api-token-application.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/project-a/google-ads-performance-downloader/bc1bf0ce57222ab496ca6e13d35ab7e9bae92a1d/docs/google-adwords-api-token-application.png -------------------------------------------------------------------------------- /google_ads_downloader/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | def MARA_CONFIG_MODULES(): 3 | from . import config, cli 4 | return [config] 5 | 6 | def MARA_CLICK_COMMANDS(): 7 | from . import config, cli 8 | return [cli.download_data, cli.refresh_oauth2_token] -------------------------------------------------------------------------------- /google_ads_downloader/cli.py: -------------------------------------------------------------------------------- 1 | """Command line interface for adwords downloader""" 2 | 3 | import click 4 | from google_ads_downloader import config 5 | from functools import partial 6 | 7 | 8 | def config_option(config_function): 9 | """Helper decorator that turns an option function into a cli option""" 10 | 11 | return lambda function: \ 12 | click.option('--' + config_function.__name__, 13 | help=f'{config_function.__doc__}. 
Default: "{config_function()}"') \ 14 | (function) 15 | 16 | 17 | def apply_options(kwargs): 18 | """Applies passed cli parameters to config.py""" 19 | for key, value in kwargs.items(): 20 | if value: setattr(config, key, partial(lambda v: v, value)) 21 | 22 | 23 | @click.command() 24 | @config_option(config.client_customer_id) 25 | @config_option(config.developer_token) 26 | @config_option(config.oauth2_client_id) 27 | @config_option(config.oauth2_client_secret) 28 | def refresh_oauth2_token(**kwargs): 29 | """ 30 | Creates a new OAuth2 token. 31 | When options are not specified, then the defaults from config.py are used. 32 | """ 33 | apply_options(kwargs) 34 | 35 | from google_ads_downloader import downloader 36 | downloader.refresh_oauth_token() 37 | 38 | 39 | @click.command() 40 | @config_option(config.client_customer_id) 41 | @config_option(config.developer_token) 42 | @config_option(config.oauth2_client_id) 43 | @config_option(config.oauth2_client_secret) 44 | @config_option(config.oauth2_refresh_token) 45 | @config_option(config.data_dir) 46 | @config_option(config.first_date) 47 | @config_option(config.redownload_window) 48 | @config_option(config.output_file_version) 49 | @config_option(config.max_retries) 50 | @config_option(config.retry_backoff_factor) 51 | def download_data(**kwargs): 52 | """ 53 | Downloads data. 54 | When options are not specified, then the defaults from config.py are used. 
55 | """ 56 | apply_options(kwargs) 57 | from google_ads_downloader import downloader 58 | downloader.download_data() 59 | -------------------------------------------------------------------------------- /google_ads_downloader/config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Configures access to Adwords API and where to store results 3 | """ 4 | from datetime import date 5 | 6 | 7 | def data_dir() -> str: 8 | """The directory where result data is written to""" 9 | return '/tmp/adwords' 10 | 11 | 12 | def first_date() -> str: 13 | """The first day for which data is downloaded""" 14 | return '2015-01-01' 15 | 16 | 17 | def client_customer_id() -> str: 18 | """The id of the manager account (MCC) that contains all the accounts for which data should be downloaded""" 19 | return '123-456-7890' 20 | 21 | 22 | def developer_token() -> str: 23 | """The developer token that is used to access the Adwords API""" 24 | return 'ABCDEFEGHIJKL' 25 | 26 | 27 | def oauth2_client_id() -> str: 28 | """The Oauth client id obtained from the Adwords API center""" 29 | return '123456789-abcdefghijklmnopqrstuvwxyz.apps.googleusercontent.com' 30 | 31 | 32 | def oauth2_client_secret() -> str: 33 | """The Oauth client secret obtained from the Adwords API center""" 34 | return 'aBcDeFg' 35 | 36 | 37 | def oauth2_refresh_token() -> str: 38 | """The Oauth refresh token returned from the adwords-downloader-refresh-oauth2-token script""" 39 | return '1/acbd-efghijklmnopqrstuvwxyz' 40 | 41 | 42 | def api_version() -> str: 43 | """Which Adwords API version should be called""" 44 | return 'v201809' 45 | 46 | 47 | def redownload_window() -> str: 48 | """The number of days for which the performance data will be redownloaded""" 49 | return '30' 50 | 51 | 52 | def output_file_version() -> str: 53 | """A suffix that is added to output files, denoting a version of the data format""" 54 | return 'v5' 55 | 56 | 57 | def max_retries() -> int: 58 | """How 
often try retry at max in case of 500 errors""" 59 | return 5 60 | 61 | 62 | def retry_backoff_factor() -> int: 63 | """How many seconds to wait between retries (is multiplied with retry count)""" 64 | return 5 65 | 66 | 67 | def ignore_removed_campaigns() -> bool: 68 | """Whether to ignore campaigns with status 'REMOVED'""" 69 | return False 70 | 71 | 72 | def download_keywords_performance_reports() -> bool: 73 | """Whether to download keywords-performance reports""" 74 | return False 75 | -------------------------------------------------------------------------------- /google_ads_downloader/downloader.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import errno 3 | import gzip 4 | import http 5 | import csv 6 | import logging 7 | import re 8 | import shutil 9 | import sys 10 | import io 11 | import tempfile 12 | import json 13 | import time 14 | from enum import Enum 15 | from pathlib import Path 16 | 17 | from google_ads_downloader import config 18 | from googleads import adwords, oauth2, errors 19 | 20 | from google_auth_oauthlib.flow import InstalledAppFlow 21 | from oauthlib.oauth2.rfc6749.errors import InvalidGrantError 22 | 23 | 24 | class PerformanceReportType(Enum): 25 | """ A Google performance report type 26 | https://developers.google.com/adwords/api/docs/appendix/reports/ad-performance-report 27 | https://developers.google.com/adwords/api/docs/appendix/reports/adgroup-performance-report 28 | https://developers.google.com/adwords/api/docs/appendix/reports/campaign-performance-report 29 | https://developers.google.com/adwords/api/docs/appendix/reports/account-performance-report 30 | https://developers.google.com/adwords/api/docs/appendix/reports/keywords-performance-report 31 | """ 32 | AD_PERFORMANCE_REPORT = 'ad-performance' 33 | ADGROUP_PERFORMANCE_REPORT = 'adgroup-performance' 34 | CAMPAIGN_PERFORMANCE_REPORT = 'campaign-performance' 35 | ACCOUNT_PERFORMANCE_REPORT = 'account-performance' 
36 | KEYWORDS_PERFORMANCE_REPORT = 'keywords-performance' 37 | 38 | 39 | class AccountStructureType(Enum): 40 | AD_ACCOUNT_STRUCTURE = 'ads' 41 | KEYWORD_ACCOUNT_STRUCTURE = 'ads-keyword' 42 | 43 | 44 | class AdWordsApiClient(adwords.AdWordsClient): 45 | """A client for downloading data from the Google AdWords API""" 46 | 47 | def __init__(self): 48 | 49 | super().__init__( # __init__ returns None, so its result is not assigned 50 | developer_token=config.developer_token(), 51 | oauth2_client=oauth2.GoogleRefreshTokenClient( 52 | client_id=config.oauth2_client_id(), 53 | client_secret=config.oauth2_client_secret(), 54 | refresh_token=config.oauth2_refresh_token()), 55 | client_customer_id=config.client_customer_id()) 56 | self.client_customers = self._fetch_client_customers() 57 | 58 | def _fetch_managed_customer_page(self): 59 | """Fetches the data from the ManagedCustomerService containing the customer information 60 | https://developers.google.com/adwords/api/docs/reference/v201609/ManagedCustomerService.ManagedCustomerPage 61 | 62 | Returns: ManagedCustomerPage 63 | 64 | """ 65 | service = self.GetService(service_name='ManagedCustomerService') 66 | return service.get({'fields': ['CustomerId', 'Name', 'CanManageClients', 'AccountLabels', 'CurrencyCode']}) 67 | 68 | def _fetch_client_customers(self): 69 | """Fetches the client customers, including their names, account labels and currency codes, from 70 | the Google Ads API 71 | 72 | Returns: 73 | A dictionary of client_customers with 74 | {customer_id: {'Name': account_name, 'Labels': account_labels, 'Currency Code': currency_code}} 75 | 76 | """ 77 | managed_customer_page = self._fetch_managed_customer_page() 78 | client_customers = {} 79 | for managed_customer in managed_customer_page.entries: 80 | # Exclude manager customers 81 | # https://support.google.com/adwords/answer/6139186?hl=en 82 | if not managed_customer.canManageClients: 83 | account_labels = [] 84 | if hasattr(managed_customer, 'accountLabels'): 85 | account_labels = [x.name for x in
managed_customer.accountLabels] 86 | client_customers[managed_customer.customerId] = { 87 | 'Name': managed_customer.name, 88 | 'Labels': account_labels, 89 | 'Currency Code': managed_customer.currencyCode} 90 | return client_customers 91 | 92 | 93 | def download_data(): 94 | """Creates an AdWordsApiClient and downloads the data""" 95 | logging.basicConfig(level=logging.INFO, 96 | format='%(asctime)s - %(name)s - %(levelname)s - %(message)s') 97 | 98 | logging.info('Adwords API version: ' + str(config.api_version())) 99 | 100 | api_client = AdWordsApiClient() 101 | download_data_sets(api_client) 102 | 103 | 104 | def download_data_sets(api_client: AdWordsApiClient): 105 | """Downloads the ad performance and the account structure and, if configured, the keyword performance 106 | 107 | Args: 108 | api_client: AdWordsApiClient 109 | 110 | """ 111 | 112 | base_predicates = [{ 113 | 'field': 'Impressions', 114 | 'operator': 'GREATER_THAN', 115 | 'values': [0] 116 | }] 117 | 118 | if config.ignore_removed_campaigns(): 119 | base_predicates.append({ 120 | 'field': 'CampaignStatus', 121 | 'operator': 'NOT_EQUALS', 122 | 'values': ['REMOVED'] 123 | }) 124 | 125 | ads_performance_predicates = base_predicates.copy() 126 | ads_performance_predicates.append({'field': 'Status', 127 | 'operator': 'IN', 128 | 'values': ['ENABLED', 129 | 'PAUSED', 130 | 'DISABLED'] 131 | }) 132 | 133 | download_performance(api_client, 134 | PerformanceReportType.AD_PERFORMANCE_REPORT, 135 | fields=['Date', 'Id', 'AdGroupId', 'Device', 'AdNetworkType2', 136 | 'ActiveViewImpressions', 'AveragePosition', 137 | 'Clicks', 'Conversions', 'ConversionValue', 138 | 'Cost', 'Impressions'], 139 | predicates=ads_performance_predicates 140 | ) 141 | 142 | download_account_structure(api_client, 143 | AccountStructureType.AD_ACCOUNT_STRUCTURE, 144 | csv_header=['Ad Id', 'Ad', 'Ad Group Id', 'Ad Group', 'Campaign Id', 145 | 'Campaign', 'Customer Id', 'Customer Name', 'Attributes', 'Currency Code']) 146 | 147 | if
config.download_keywords_performance_reports(): 148 | keywords_performance_predicates = base_predicates.copy() 149 | keywords_performance_predicates.append({'field': 'Status', 150 | 'operator': 'IN', 151 | 'values': ['ENABLED', 152 | 'PAUSED', 153 | 'REMOVED'] 154 | }) 155 | 156 | download_performance(api_client, 157 | PerformanceReportType.KEYWORDS_PERFORMANCE_REPORT, 158 | fields=['Date', 'Id', 'AdGroupId', 'Device', 'AdNetworkType2', 159 | 'ActiveViewImpressions', 'AveragePosition', 160 | 'Clicks', 'Conversions', 'ConversionValue', 161 | 'Cost', 'Impressions'], 162 | predicates=keywords_performance_predicates 163 | ) 164 | download_account_structure(api_client, 165 | AccountStructureType.KEYWORD_ACCOUNT_STRUCTURE, 166 | csv_header=['Keyword Id', 'Keyword', 'Ad Group Id', 'Ad Group', 167 | 'Campaign Id', 'Campaign', 'Customer Id', 'Customer Name', 168 | 'Attributes', 'Currency Code']) 169 | 170 | 171 | def download_performance(api_client: AdWordsApiClient, 172 | performance_report_type: PerformanceReportType, 173 | fields: [str], 174 | predicates: [{}]): 175 | """Downloads the Google Ads performance reports and saves them as gzipped JSON files to disk 176 | 177 | Args: 178 | api_client: An AdWordsApiClient 179 | performance_report_type: A PerformanceReportType object 180 | fields: A list of fields to be included in the report 181 | predicates: A list of filters for the report 182 | """ 183 | client_customer_ids = api_client.client_customers.keys() 184 | 185 | first_date = datetime.datetime.strptime(config.first_date(), '%Y-%m-%d') 186 | last_date = datetime.datetime.now() - datetime.timedelta(days=1) 187 | current_date = last_date 188 | while current_date >= first_date: 189 | relative_filepath = Path('{date:%Y/%m/%d}/google-ads/{filename}_{version}.json.gz'.format( 190 | date=current_date, 191 | filename=performance_report_type.value, 192 | version=config.output_file_version())) 193 | filepath = ensure_data_directory(relative_filepath) 194 | 195 | if (not
filepath.is_file() 196 | or (last_date - current_date).days <= int(config.redownload_window())): 197 | report_list = get_performance_for_single_day(api_client, 198 | client_customer_ids, 199 | current_date, 200 | performance_report_type, 201 | fields, 202 | predicates) 203 | 204 | with tempfile.TemporaryDirectory() as tmp_dir: 205 | tmp_filepath = Path(tmp_dir, relative_filepath) 206 | tmp_filepath.parent.mkdir(exist_ok=True, parents=True) 207 | with gzip.open(str(tmp_filepath), 'wt') as tmp_ad_performance_file: 208 | tmp_ad_performance_file.write(json.dumps(report_list)) 209 | shutil.move(str(tmp_filepath), str(filepath)) 210 | current_date += datetime.timedelta(days=-1) 211 | 212 | 213 | def get_performance_for_single_day(api_client: AdWordsApiClient, 214 | client_customer_ids: [int], 215 | single_date: datetime, 216 | report_type: PerformanceReportType, 217 | fields: [], 218 | predicates: []) -> [{}]: 219 | """Downloads the performance for a list of clients for a given day 220 | 221 | Args: 222 | api_client: An AdWordsApiClient 223 | client_customer_ids: A list of client ids 224 | single_date: A single date as a datetime object 225 | report_type: A PerformanceReportType object 226 | fields: A list of fields to be included in the report 227 | predicates: A list of filters for the report 228 | 229 | Returns: 230 | A list containing dictionaries with the performance from the report 231 | """ 232 | report_list = [] 233 | logging.info( 234 | 'download google ads {} for {}'.format(report_type.value, single_date.strftime('%Y-%m-%d'))) 235 | for client_customer_id in client_customer_ids: 236 | api_client.SetClientCustomerId(client_customer_id) 237 | report = _download_adwords_report(api_client, 238 | current_date=single_date, 239 | report_type=report_type.name, 240 | fields=fields, 241 | predicates=predicates, 242 | ) 243 | report_list.extend(list(report)) 244 | return report_list 245 | 246 | 247 | def download_account_structure(api_client: AdWordsApiClient, 248 | 
account_structure_type: AccountStructureType, 249 | csv_header: [str] 250 | ): 251 | """Downloads the Google Ads account structure and saves it as a gzipped CSV file. 252 | 253 | Args: 254 | api_client: An AdWordsApiClient 255 | account_structure_type: The type of the account structure file (ad or keyword) 256 | csv_header: The list of columns to be included in the downloaded CSV file 257 | 258 | """ 259 | filename = Path('google-{account_structure_type}-account-structure_{version}.csv.gz'.format( 260 | account_structure_type=account_structure_type.value, 261 | version=config.output_file_version())) 262 | filepath = ensure_data_directory(filename) 263 | 264 | with tempfile.TemporaryDirectory() as tmp_dir: 265 | tmp_filepath = Path(tmp_dir, filename) 266 | with gzip.open(str(tmp_filepath), 'wt') as tmp_campaign_structure_file: 267 | writer = csv.writer(tmp_campaign_structure_file, delimiter="\t") 268 | writer.writerow(csv_header) 269 | for client_customer_id, client_customer in api_client.client_customers.items(): 270 | labels = json.dumps(client_customer['Labels']) 271 | client_customer_attributes = parse_labels(labels) 272 | client_customer_name = client_customer['Name'] 273 | 274 | campaign_attributes = get_campaign_attributes(api_client, client_customer_id) 275 | ad_group_attributes = get_ad_group_attributes(api_client, client_customer_id) 276 | 277 | if account_structure_type == AccountStructureType.AD_ACCOUNT_STRUCTURE: 278 | account_data = get_ad_data(api_client, client_customer_id) 279 | main_key_prefix = 'Ad' 280 | else: 281 | account_data = get_keyword_data(api_client, client_customer_id) 282 | main_key_prefix = 'Keyword' 283 | 284 | for account_data_dict in account_data: 285 | ad_id = account_data_dict[f'{main_key_prefix} ID'] 286 | campaign_id = account_data_dict['Campaign ID'] 287 | ad_group_id = account_data_dict['Ad group ID'] 288 | currency_code = client_customer['Currency Code'] 289 | attributes = {**client_customer_attributes, 290 |
**campaign_attributes.get(campaign_id, {}), 291 | **ad_group_attributes.get(ad_group_id, {}), 292 | **account_data_dict['attributes']} 293 | 294 | ad = [str(ad_id), 295 | account_data_dict[f'{main_key_prefix}'], 296 | str(ad_group_id), 297 | account_data_dict['Ad group'], 298 | str(campaign_id), 299 | account_data_dict['Campaign'], 300 | str(client_customer_id), 301 | client_customer_name, 302 | json.dumps(attributes), 303 | currency_code 304 | ] 305 | 306 | writer.writerow(ad) 307 | 308 | shutil.move(str(tmp_filepath), str(filepath)) 309 | 310 | 311 | def get_campaign_attributes(api_client: AdWordsApiClient, client_customer_id: int) -> {}: 312 | """Downloads the campaign attributes from the Google Ads API 313 | https://developers.google.com/adwords/api/docs/appendix/reports/campaign-performance-report 314 | 315 | Args: 316 | api_client: An AdWordsApiClient 317 | client_customer_id: A client customer id 318 | 319 | Returns: 320 | A dictionary mapping campaign ids to campaign attributes 321 | """ 322 | logging.info('get campaign attributes for account {}'.format(client_customer_id)) 323 | api_client.SetClientCustomerId(client_customer_id) 324 | report = _download_adwords_report(api_client, 325 | report_type='CAMPAIGN_PERFORMANCE_REPORT', 326 | fields=['CampaignId', 'Labels'], 327 | predicates={'field': 'CampaignStatus', 328 | 'operator': 'IN', 329 | 'values': ['ENABLED', 330 | 'PAUSED', 331 | 'REMOVED'] 332 | }) 333 | return {row['Campaign ID']: parse_labels(row['Labels']) for row in report} 334 | 335 | 336 | def get_ad_group_attributes(api_client: AdWordsApiClient, client_customer_id: int) -> {}: 337 | """Downloads the ad group attributes from the Google AdWords API 338 | https://developers.google.com/adwords/api/docs/appendix/reports/adgroup-performance-report 339 | 340 | Args: 341 | api_client: An AdWordsApiClient 342 | client_customer_id: A client customer id 343 | 344 | Returns: 345 | A dictionary mapping ad group ids to ad group attributes 346 | """ 347 |
logging.info('get ad group attributes for account {}'.format(client_customer_id)) 348 | api_client.SetClientCustomerId(client_customer_id) 349 | report = _download_adwords_report(api_client, 350 | report_type='ADGROUP_PERFORMANCE_REPORT', 351 | fields=['AdGroupId', 'Labels'], 352 | predicates={'field': 'AdGroupStatus', 353 | 'operator': 'IN', 354 | 'values': ['ENABLED', 355 | 'PAUSED', 356 | 'REMOVED'] 357 | }) 358 | 359 | return {row['Ad group ID']: parse_labels(row['Labels']) for row in report} 360 | 361 | 362 | def get_ad_data(api_client: AdWordsApiClient, client_customer_id: int) -> [{}]: 363 | """Downloads the ad data from the Google AdWords API for a given client_customer_id 364 | https://developers.google.com/adwords/api/docs/appendix/reports/ad-performance-report 365 | 366 | Args: 367 | api_client: An AdWordsApiClient 368 | client_customer_id: A client customer id 369 | 370 | Returns: 371 | A list of dictionaries with ad data 372 | """ 373 | logging.info('get ad data for account {}'.format(client_customer_id)) 374 | 375 | api_client.SetClientCustomerId(client_customer_id) 376 | 377 | predicates = [ 378 | { 379 | 'field': 'Status', 380 | 'operator': 'IN', 381 | 'values': ['ENABLED', 382 | 'PAUSED', 383 | 'DISABLED'] 384 | } 385 | ] 386 | 387 | if config.ignore_removed_campaigns(): 388 | predicates.append({ 389 | 'field': 'CampaignStatus', 390 | 'operator': 'NOT_EQUALS', 391 | 'values': 'REMOVED' 392 | }) 393 | 394 | report = _download_adwords_report(api_client, 395 | report_type='AD_PERFORMANCE_REPORT', 396 | fields=['Id', 'AdGroupId', 'AdGroupName', 397 | 'CampaignId', 'CampaignName', 398 | 'Labels', 'Headline', 'AdType', 399 | 'Status'], 400 | predicates=predicates) 401 | 402 | ad_data = [] 403 | for row in report: 404 | attributes = parse_labels(row['Labels']) 405 | if row['Ad type'] is not None: 406 | attributes = {**attributes, 'Ad type': row['Ad type']} 407 | if row['Ad state'] is not None: 408 | attributes = {**attributes, 'Ad state': row['Ad state']} 
409 | ad_data.append({**row, 'attributes': attributes}) 410 | 411 | return ad_data 412 | 413 | 414 | def get_keyword_data(api_client: AdWordsApiClient, client_customer_id: int) -> [{}]: 415 | """Downloads the keyword data from the Google AdWords API for a given client_customer_id 416 | https://developers.google.com/adwords/api/docs/appendix/reports/keywords-performance-report 417 | 418 | Args: 419 | api_client: An AdWordsApiClient 420 | client_customer_id: A client customer id 421 | 422 | Returns: 423 | A list of dictionaries with keyword data 424 | """ 425 | logging.info('get keyword data for account {}'.format(client_customer_id)) 426 | 427 | api_client.SetClientCustomerId(client_customer_id) 428 | 429 | predicates = [ 430 | { 431 | 'field': 'Status', 432 | 'operator': 'IN', 433 | 'values': ['ENABLED', 434 | 'PAUSED', 435 | 'REMOVED'] 436 | } 437 | ] 438 | 439 | if config.ignore_removed_campaigns(): 440 | predicates.append({ 441 | 'field': 'CampaignStatus', 442 | 'operator': 'NOT_EQUALS', 443 | 'values': 'REMOVED' 444 | }) 445 | 446 | report = _download_adwords_report(api_client, 447 | report_type='KEYWORDS_PERFORMANCE_REPORT', 448 | fields=['Id', 'AdGroupId', 'AdGroupName', 449 | 'CampaignId', 'CampaignName', 450 | 'Labels', 'Criteria', 'Status'], 451 | predicates=predicates) 452 | 453 | keyword_data = [] 454 | for row in report: 455 | attributes = parse_labels(row['Labels']) 456 | if row['Keyword state'] is not None: 457 | attributes = {**attributes, 'Keyword state': row['Keyword state']} 458 | keyword_data.append({**row, 'attributes': attributes}) 459 | 460 | return keyword_data 461 | 462 | 463 | def _download_adwords_report(api_client: AdWordsApiClient, 464 | report_type: str, 465 | fields: [str], 466 | predicates: {}, 467 | current_date: datetime = None) -> csv.DictReader: 468 | """Downloads a Google Ads report from the Google Ads API 469 | 470 | Args: 471 | api_client: An AdWordsApiClient 472 | report_type: The report type 473 |
https://developers.google.com/adwords/api/docs/appendix/reports 474 | fields: The selector fields 475 | https://developers.google.com/adwords/api/docs/appendix/selectorfields 476 | predicates: The predicate to filter by 477 | https://developers.google.com/adwords/api/docs/reference/v201609/CampaignService.Predicate 478 | current_date: datetime (optional), if none is specified today's date is assumed 479 | 480 | Returns: 481 | A csv.DictReader over the rows of the downloaded Google Ads report 482 | 483 | """ 484 | report_filter = { 485 | 'reportName': '{}_#'.format(report_type), 486 | 'dateRangeType': 'CUSTOM_DATE', 487 | 'reportType': report_type, 488 | 'downloadFormat': 'CSV', 489 | 'selector': { 490 | 'fields': fields, 491 | 'predicates': predicates 492 | } 493 | } 494 | 495 | if current_date is not None: 496 | date_str = current_date.strftime('%Y%m%d') 497 | report_filter['selector']['dateRange'] = { 498 | 'min': date_str, 499 | 'max': date_str 500 | } 501 | else: 502 | report_filter['dateRangeType'] = 'TODAY' 503 | 504 | report_downloader = api_client.GetReportDownloader(version=config.api_version()) 505 | 506 | retry_count = 0 507 | while True: 508 | retry_count += 1 509 | try: 510 | report = io.StringIO() 511 | report_downloader.DownloadReport(report_filter, 512 | output=report, 513 | skip_report_header=True, 514 | skip_column_header=False, 515 | skip_report_summary=True) 516 | report.seek(0) 517 | return csv.DictReader(report) 518 | except errors.AdWordsReportError as e: 519 | if retry_count < config.max_retries(): 520 | 521 | logging.warning(('Error HTTP #{e.code} Failed attempt #{retry_count} for report with settings:\n' 522 | '{report_filter}\n' 523 | 'Retrying...').format(e=e, retry_count=retry_count, 524 | report_filter=report_filter)) 525 | time.sleep(retry_count * config.retry_backoff_factor()) 526 | else: 527 | raise e 528 | except http.client.RemoteDisconnected as e: 529 | if retry_count < config.max_retries(): 530 | logging.warning(('Network error during attempt #{retry_count} for
report with settings:\n' 531 | '{report_filter}\n' 532 | 'Retrying...').format(retry_count=retry_count, 533 | report_filter=report_filter)) 534 | time.sleep(retry_count * config.retry_backoff_factor()) 535 | else: 536 | raise e 537 | 538 | 539 | class ClientConfigBuilder(object): 540 | """Helper class used to build a client config dict used in the OAuth 2.0 flow.""" 541 | 542 | _DEFAULT_AUTH_URI = 'https://accounts.google.com/o/oauth2/auth' 543 | _DEFAULT_TOKEN_URI = 'https://accounts.google.com/o/oauth2/token' 544 | CLIENT_TYPE_WEB = 'web' 545 | CLIENT_TYPE_INSTALLED_APP = 'installed' 546 | 547 | def __init__(self, client_type=None, client_id=None, client_secret=None, 548 | auth_uri=_DEFAULT_AUTH_URI, token_uri=_DEFAULT_TOKEN_URI): 549 | self.client_type = client_type 550 | self.client_id = client_id 551 | self.client_secret = client_secret 552 | self.auth_uri = auth_uri 553 | self.token_uri = token_uri 554 | 555 | def build(self): 556 | """Builds a client config dictionary used in the OAuth 2.0 flow.""" 557 | if all((self.client_type, self.client_id, self.client_secret, 558 | self.auth_uri, self.token_uri)): 559 | client_config = { 560 | self.client_type: { 561 | 'client_id': self.client_id, 562 | 'client_secret': self.client_secret, 563 | 'auth_uri': self.auth_uri, 564 | 'token_uri': self.token_uri 565 | } 566 | } 567 | else: 568 | raise ValueError('Required field is missing.') 569 | 570 | return client_config 571 | 572 | 573 | def refresh_oauth_token(): 574 | """Retrieve and display the access and refresh token.""" 575 | 576 | client_config = ClientConfigBuilder( 577 | client_type=ClientConfigBuilder.CLIENT_TYPE_WEB, client_id=config.oauth2_client_id(), 578 | client_secret=config.oauth2_client_secret()) 579 | flow = InstalledAppFlow.from_client_config(client_config.build(), 580 | scopes=['https://www.googleapis.com/auth/adwords']) 581 | flow.redirect_uri = 'urn:ietf:wg:oauth:2.0:oob' 582 | authorize_url, _ = flow.authorization_url(prompt='consent') 583 | 584 | 
print('Log into the Google Account you use to access your Google Ads account ' 585 | 'and go to the following URL: \n%s\n' % authorize_url) 586 | print('After approving the token enter the verification code (if specified).') 587 | code = input('Code: ').strip() 588 | try: 589 | flow.fetch_token(code=code) 590 | except InvalidGrantError as ex: 591 | print('Authentication has failed: %s' % ex) 592 | sys.exit(1) 593 | 594 | print('Access token: %s' % flow.credentials.token) 595 | print('Refresh token: %s' % flow.credentials.refresh_token) 596 | 597 | 598 | def parse_labels(labels: str) -> {str: str}: 599 | """Extracts labels from a string 600 | 601 | Args: 602 | labels: Labels as a JSON-encoded array of strings '["{key_1=value_1}", "{key_2=value_2}", ...]' 603 | 604 | Returns: 605 | A dictionary of labels with {key_1 : value_1, ...} format 606 | 607 | """ 608 | matches = re.findall("{([^=]+)=([^=]+)}", labels) 609 | labels = {x[0].strip().lower().title(): x[1].strip() for x in matches} 610 | return labels 611 | 612 | 613 | def ensure_data_directory(relative_path: Path = None) -> Path: 614 | """Checks if a directory in the data dir path exists.
Creates it if necessary 615 | 616 | Args: 617 | relative_path: A Path object pointing to a file relative to the data directory 618 | 619 | Returns: 620 | The absolute path Path object 621 | 622 | """ 623 | if relative_path is None: 624 | return Path(config.data_dir()) 625 | try: 626 | path = Path(config.data_dir(), relative_path) 627 | # if path points to a file, create parent directory instead 628 | if path.suffix: 629 | if not path.parent.exists(): 630 | path.parent.mkdir(exist_ok=True, parents=True) 631 | else: 632 | if not path.exists(): 633 | path.mkdir(exist_ok=True, parents=True) 634 | return path 635 | except OSError as exception: 636 | if exception.errno != errno.EEXIST: 637 | raise 638 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name='google-ads-performance-downloader', 5 | version='4.1.0', 6 | description="Downloads data from the Google Adwords Api to local files", 7 | 8 | install_requires=[ 9 | 'googleads==15.0.2', 10 | 'click>=6.0', 11 | 'wheel>=0.29' 12 | ], 13 | 14 | packages=find_packages(), 15 | 16 | author='Mara contributors', 17 | license='MIT', 18 | 19 | entry_points={ 20 | 'console_scripts': [ 21 | 'download-google-ads-performance-data=google_ads_downloader.cli:download_data', 22 | 'refresh-google-ads-api-oauth2-token=google_ads_downloader.cli:refresh_oauth2_token' 23 | ] 24 | } 25 | ) 26 | --------------------------------------------------------------------------------