├── .gitignore
├── AUTHORS
├── LICENSE
├── README.md
└── google_analytics_cookie.py

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
__init__.py
__init__.pyc
google_analytics_cookie.pyc

--------------------------------------------------------------------------------
/AUTHORS:
--------------------------------------------------------------------------------
https://github.com/RyOnLife/
https://github.com/tedtieken/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.


This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.

0. Additional Definitions.

As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.

"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.

An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.

A "Combined Work" is a work produced by combining or linking an
Application with the Library.
The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".

The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.

The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.

1. Exception to Section 3 of the GNU GPL.

You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.

2. Conveying Modified Versions.

If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:

   a) under this License, provided that you make a good faith effort to
   ensure that, in the event an Application does not supply the
   function or data, the facility still operates, and performs
   whatever part of its purpose remains meaningful, or

   b) under the GNU GPL, with none of the additional permissions of
   this License applicable to that copy.

3. Object Code Incorporating Material from Library Header Files.

The object code form of an Application may incorporate material from
a header file that is part of the Library.
You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:

   a) Give prominent notice with each copy of the object code that the
   Library is used in it and that the Library and its use are
   covered by this License.

   b) Accompany the object code with a copy of the GNU GPL and this license
   document.

4. Combined Works.

You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:

   a) Give prominent notice with each copy of the Combined Work that
   the Library is used in it and that the Library and its use are
   covered by this License.

   b) Accompany the Combined Work with a copy of the GNU GPL and this license
   document.

   c) For a Combined Work that displays copyright notices during
   execution, include the copyright notice for the Library among
   these notices, as well as a reference directing the user to the
   copies of the GNU GPL and this license document.

   d) Do one of the following:

       0) Convey the Minimal Corresponding Source under the terms of this
       License, and the Corresponding Application Code in a form
       suitable for, and under terms that permit, the user to
       recombine or relink the Application with a modified version of
       the Linked Version to produce a modified Combined Work, in the
       manner specified by section 6 of the GNU GPL for conveying
       Corresponding Source.

       1) Use a suitable shared library mechanism for linking with the
       Library. A suitable mechanism is one that (a) uses at run time
       a copy of the Library already present on the user's computer
       system, and (b) will operate properly with a modified version
       of the Library that is interface-compatible with the Linked
       Version.

   e) Provide Installation Information, but only if you would otherwise
   be required to provide such information under section 6 of the
   GNU GPL, and only to the extent that such information is
   necessary to install and execute a modified version of the
   Combined Work produced by recombining or relinking the
   Application with a modified version of the Linked Version. (If
   you use option 4d0, the Installation Information must accompany
   the Minimal Corresponding Source and Corresponding Application
   Code. If you use option 4d1, you must provide the Installation
   Information in the manner specified by section 6 of the GNU GPL
   for conveying Corresponding Source.)

5. Combined Libraries.

You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:

   a) Accompany the combined library with a copy of the same work based
   on the Library, uncombined with any other library facilities,
   conveyed under the terms of this License.

   b) Give prominent notice with the combined library that part of it
   is a work based on the Library, and explaining where to find the
   accompanying uncombined form of the same work.

6. Revised Versions of the GNU Lesser General Public License.

The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.

If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Py Google Analytics Cookie Parser
=====================================

[Py Google Analytics Cookie](https://github.com/ryonlife/py-google-analytics-cookie) is a simple class for parsing useful visitor and referral data from a Google Analytics cookie.

Use it during your sign-up process to permanently store information about where your customers came from, which opens the door to all sorts of useful metrics.
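
As a rough sketch of that sign-up use case, the parsed dictionaries can be flattened into one record and saved alongside the new account. The helper name `attribution_record` and the choice of fields are assumptions made for illustration and are not part of this module; the inputs are assumed to follow the key layout listed under Usage, and the sample values are made up.

```python
from datetime import datetime


def attribution_record(utma, utmz):
    """Flatten parsed utma/utmz dictionaries into one flat dict for storage.

    Hypothetical helper: missing values are expected to be None, as they
    are when a cookie is absent or malformed.
    """
    campaign = utmz.get('campaign_data') or {}
    first_visit = utma.get('first_visit_at')
    return {
        # Store the datetime as an ISO 8601 string so any backend can hold it
        'first_visit_at': first_visit.isoformat() if first_visit else None,
        'session_counter': utma.get('session_counter'),
        'source': campaign.get('source'),
        'medium': campaign.get('medium'),
        'campaign': campaign.get('name'),
        'term': campaign.get('term'),
        'content': campaign.get('content'),
    }


# Sample input shaped like the parser's output (values are illustrative)
utma = {
    'domain_hash': '174403709',
    'random_id': '475482016',
    'first_visit_at': datetime(2010, 9, 22, 18, 26, 16),
    'previous_visit_at': datetime(2010, 9, 22, 18, 26, 16),
    'current_visit_at': datetime(2010, 9, 22, 18, 26, 16),
    'session_counter': '1',
}
utmz = {
    'domain_hash': '174403709',
    'timestamp': '1285179976',
    'session_counter': '1',
    'campaign_number': '1',
    'campaign_data': {
        'source': 'stumbleupon.com',
        'name': '(referral)',
        'medium': 'referral',
        'term': None,
        'content': '/refer.php',
    },
}

record = attribution_record(utma, utmz)
```

How the record is persisted (ORM model, SQL insert, document store) is left to the application.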

Cookie Breakdown
----------------

[From utma to utmz](http://www.morevisibility.com/analyticsblog/from-__utma-to-__utmz-google-analytics-cookies.html) is a decent blog post that explains, at a high level, the various cookies Google Analytics uses to store data. The two most interesting (at least to me) are the utma and utmz cookies:

* utma is a persistent cookie that tracks the number of visits and the times of the first, previous, and current visits
* utmz keeps track of all the referral information, with useful nuggets like ad campaigns and the referring domain or search engine

Usage
-----

Instantiate an object with two keyword arguments, `utmz` and `utma`, holding the raw strings stored in the two cookies.

Here's an example that will work on Pylons:

    from path.to.google_analytics_cookie import GoogleAnalyticsCookie
    utmz = request.cookies['__utmz'] if '__utmz' in request.cookies else None
    utma = request.cookies['__utma'] if '__utma' in request.cookies else None
    gac = GoogleAnalyticsCookie(utmz=utmz, utma=utma)

In Django:

    from path.to.google_analytics_cookie import GoogleAnalyticsCookie
    utmz = request.COOKIES['__utmz'] if '__utmz' in request.COOKIES else None
    utma = request.COOKIES['__utma'] if '__utma' in request.COOKIES else None
    gac = GoogleAnalyticsCookie(utmz=utmz, utma=utma)

Your object will have utma and utmz attributes that are dictionaries with the following keys:

    gac.utma['domain_hash']
    gac.utma['random_id']
    gac.utma['first_visit_at']
    gac.utma['previous_visit_at']
    gac.utma['current_visit_at']
    gac.utma['session_counter']

    gac.utmz['domain_hash']
    gac.utmz['timestamp']
    gac.utmz['session_counter']
    gac.utmz['campaign_number']
    gac.utmz['campaign_data']['source']
    gac.utmz['campaign_data']['name']
    gac.utmz['campaign_data']['medium']
    gac.utmz['campaign_data']['term']
    gac.utmz['campaign_data']['content']

--------------------------------------------------------------------------------
/google_analytics_cookie.py:
--------------------------------------------------------------------------------
########################################################################################
#
# PYTHON GOOGLE ANALYTICS COOKIE PARSER
#
# Copyright (c) 2010, Ryan McKillen <@RyOnLife>.
# All Rights Reserved.
#
# Originally written for use on http://www.ubercab.com and inspired by the
# PHP Google Analytics Parser Class published by Joao Correia at
# http://joaocorreia.pt/google-analytics-scripts/google-analytics-php-cookie-parser/.
#
# This software is subject to the provisions of the GNU LGPL v3 license at
# http://www.gnu.org/licenses/lgpl-3.0.txt. A copy of the license should accompany this
# distribution. THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
# TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
#
########################################################################################

import re
import unittest
from datetime import datetime

class GoogleAnalyticsCookie(object):
    """ Parses the utma (visitor) and utmz (referral) Google Analytics cookies """
    utmz = dict(
        domain_hash = None,
        timestamp = None,
        session_counter = None,
        campaign_number = None,
        campaign_data = dict(
            source = None,
            name = None,
            medium = None,
            term = None,
            content = None
        )
    )
    utma = dict(
        domain_hash = None,
        random_id = None,
        first_visit_at = None,
        previous_visit_at = None,
        current_visit_at = None,
        session_counter = None
    )


    def __init__(self, utmz=None, utma=None):
        # Per-instance defaults, so instances never share the class-level dicts
        self.utmz = dict(
            domain_hash = None,
            timestamp = None,
            session_counter = None,
            campaign_number = None,
            campaign_data = dict(
                source = None,
                name = None,
                medium = None,
                term = None,
                content = None
            )
        )

        self.utma = dict(
            domain_hash = None,
            random_id = None,
            first_visit_at = None,
            previous_visit_at = None,
            current_visit_at = None,
            session_counter = None
        )
        if utmz:
            self.utmz = self.__parse_utmz(utmz)
        if utma:
            self.utma = self.__parse_utma(utma)

    def __parse_utmz(self, cookie):
        """ Parses the utmz cookie for referral information """
        parsed = cookie.split('.')
        if len(parsed) < 5:
            return self.utmz

        # Rejoin the campaign segment, since utmcsr or utmcct may contain a dot, e.g.
        # utmcsr=example.com
        parsed[4] = ".".join(parsed[4:])

        translations = dict(
            utmcsr = 'source',
            utmccn = 'name',
            utmcmd = 'medium',
            utmctr = 'term',
            utmcct = 'content'
        )

        parsed_campaign_data = self.utmz['campaign_data']

        for params in parsed[4].split('|'):
            # Split on the first '=' only, so values containing '=' stay intact
            key_value = params.split('=', 1)
            # dict.has_key() does not exist in Python 3; `in` works on both 2 and 3
            if len(key_value) == 2 and key_value[0] in translations:
                parsed_campaign_data[translations[key_value[0]]] = key_value[1]

        # Override campaign data when visitor comes from Google AdWords
        if re.search('gclid=', cookie):
            parsed_campaign_data = dict(
                source = 'google',
                name = None,
                medium = 'cpc',
                content = None,
                term = parsed_campaign_data['term']
            )

        return dict(
            domain_hash = parsed[0],
            timestamp = parsed[1],
            session_counter = parsed[2],
            campaign_number = parsed[3],
            campaign_data = parsed_campaign_data
        )

    def __parse_utma(self, cookie):
        """ Parses the utma cookie for visitor information """
        parsed = cookie.split('.')
        if len(parsed) != 6:
            return self.utma

        return dict(
            domain_hash = parsed[0],
            random_id = parsed[1],
            first_visit_at = datetime.fromtimestamp(float(parsed[2])),
            previous_visit_at = datetime.fromtimestamp(float(parsed[3])),
            current_visit_at = datetime.fromtimestamp(float(parsed[4])),
            session_counter = parsed[5]
        )

class TestGoogleAnalyticsCookie(unittest.TestCase):

    utmz_test = '174403709.1285179976.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)|utmctr=test'
    utmz_test2 = '81516565.1309300431.44.5.utmcsr=stumbleupon.com|utmccn=(referral)|utmcmd=referral|utmcct=/refer.php'
    utma = '174403709.475482016.1285179976.1285179976.1285179976.1'

    def test_parse_utmz(self):
        """ Should properly parse utmz cookie data """
        gac = GoogleAnalyticsCookie(utmz=self.utmz_test)
        self.assertEqual(gac.utmz['domain_hash'], '174403709')
        self.assertEqual(gac.utmz['timestamp'], '1285179976')
        self.assertEqual(gac.utmz['session_counter'], '1')
        self.assertEqual(gac.utmz['campaign_number'], '1')

        self.assertEqual(gac.utmz['campaign_data']['source'], '(direct)')
        self.assertEqual(gac.utmz['campaign_data']['name'], '(direct)')
        self.assertEqual(gac.utmz['campaign_data']['medium'], '(none)')
        self.assertEqual(gac.utmz['campaign_data']['term'], 'test')
        self.assertEqual(gac.utmz['campaign_data']['content'], None)

    def test_parse_utmz_referral_url(self):
        """ Should properly parse utmz cookie data when there are periods in the src and in the content"""
        gac = GoogleAnalyticsCookie(utmz=self.utmz_test2)
        self.assertEqual(gac.utmz['domain_hash'], '81516565')
        self.assertEqual(gac.utmz['timestamp'], '1309300431')
        self.assertEqual(gac.utmz['session_counter'], '44')
        self.assertEqual(gac.utmz['campaign_number'], '5')

        self.assertEqual(gac.utmz['campaign_data']['source'], 'stumbleupon.com')
        self.assertEqual(gac.utmz['campaign_data']['name'], '(referral)')
        self.assertEqual(gac.utmz['campaign_data']['medium'], 'referral')
        self.assertEqual(gac.utmz['campaign_data']['term'], None)
        self.assertEqual(gac.utmz['campaign_data']['content'], '/refer.php')


    def test_parse_utmz_gclid(self):
        """ Should override normal campaign parsing when a Google Adwords click is involved """
        gac = GoogleAnalyticsCookie(utmz=self.utmz_test + '|gclid=123')
        self.assertEqual(gac.utmz['campaign_data']['source'], 'google')
        self.assertEqual(gac.utmz['campaign_data']['name'], None)
        self.assertEqual(gac.utmz['campaign_data']['medium'], 'cpc')
        self.assertEqual(gac.utmz['campaign_data']['term'], 'test')
        self.assertEqual(gac.utmz['campaign_data']['content'], None)

    def test_parse_utma(self):
        """
        Should properly parse utma cookie data """
        gac = GoogleAnalyticsCookie(utma=self.utma)
        self.assertEqual(gac.utma['domain_hash'], '174403709')
        self.assertEqual(gac.utma['random_id'], '475482016')
        self.assertEqual(gac.utma['first_visit_at'], datetime.fromtimestamp(1285179976))
        self.assertEqual(gac.utma['previous_visit_at'], datetime.fromtimestamp(1285179976))
        self.assertEqual(gac.utma['current_visit_at'], datetime.fromtimestamp(1285179976))
        self.assertEqual(gac.utma['session_counter'], '1')

    def test_parse_no_cookie(self):
        """ Should key dictionaries with None values when cookies are missing """
        gac = GoogleAnalyticsCookie()
        self.assertEqual(gac.utmz['domain_hash'], None)
        self.assertEqual(gac.utmz['timestamp'], None)
        self.assertEqual(gac.utmz['session_counter'], None)
        self.assertEqual(gac.utmz['campaign_number'], None)
        self.assertEqual(gac.utmz['campaign_data']['source'], None)
        self.assertEqual(gac.utmz['campaign_data']['name'], None)
        self.assertEqual(gac.utmz['campaign_data']['medium'], None)
        self.assertEqual(gac.utmz['campaign_data']['term'], None)
        self.assertEqual(gac.utmz['campaign_data']['content'], None)
        self.assertEqual(gac.utma['domain_hash'], None)
        self.assertEqual(gac.utma['random_id'], None)
        self.assertEqual(gac.utma['first_visit_at'], None)
        self.assertEqual(gac.utma['previous_visit_at'], None)
        self.assertEqual(gac.utma['current_visit_at'], None)
        self.assertEqual(gac.utma['session_counter'], None)

    def test_parse_bad_cookie(self):
        """ Should key dictionaries with None values when cookies have bad data """
        gac = GoogleAnalyticsCookie(utmz='-', utma='-')
        self.assertEqual(gac.utmz['domain_hash'], None)
        self.assertEqual(gac.utmz['timestamp'], None)
        self.assertEqual(gac.utmz['session_counter'], None)
        self.assertEqual(gac.utmz['campaign_number'], None)
        self.assertEqual(gac.utmz['campaign_data']['source'], None)
        self.assertEqual(gac.utmz['campaign_data']['name'], None)
        self.assertEqual(gac.utmz['campaign_data']['medium'], None)
        self.assertEqual(gac.utmz['campaign_data']['term'], None)
        self.assertEqual(gac.utmz['campaign_data']['content'], None)
        self.assertEqual(gac.utma['domain_hash'], None)
        self.assertEqual(gac.utma['random_id'], None)
        self.assertEqual(gac.utma['first_visit_at'], None)
        self.assertEqual(gac.utma['previous_visit_at'], None)
        self.assertEqual(gac.utma['current_visit_at'], None)
        self.assertEqual(gac.utma['session_counter'], None)

if __name__ == '__main__':
    unittest.main()

--------------------------------------------------------------------------------
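
The split-then-rejoin step in `__parse_utmz` can also be written with the `maxsplit` argument of `str.split`, which leaves dots inside the campaign segment untouched. The following is a minimal Python 3 sketch under the same field layout the tests above exercise; `split_utmz` is a hypothetical standalone helper, not part of the module, and it returns the raw `utmc*` keys rather than the translated names.

```python
def split_utmz(cookie):
    """Split a raw __utmz value into its four header fields and campaign pairs.

    Returns None when the value has fewer than five dot-separated parts.
    """
    # maxsplit=4 stops after the fourth dot, so "utmcsr=example.com" stays whole
    parts = cookie.split('.', 4)
    if len(parts) < 5:
        return None
    domain_hash, timestamp, session_counter, campaign_number, campaign = parts

    pairs = {}
    for item in campaign.split('|'):
        # partition() splits on the first '=' and tolerates items without one
        key, sep, value = item.partition('=')
        if sep:
            pairs[key] = value
    return domain_hash, timestamp, session_counter, campaign_number, pairs


sample = ('81516565.1309300431.44.5.utmcsr=stumbleupon.com'
          '|utmccn=(referral)|utmcmd=referral|utmcct=/refer.php')
fields = split_utmz(sample)
```

Here `fields[4]` is a plain dict keyed by the raw cookie parameter names; mapping them to `source`, `name`, and so on would still go through a translation table like the one in the class.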