├── LICENSE
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2023 Analytics Debugger S.L.U.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![image](https://github.com/analytics-debugger/analytics-firewall/assets/1494564/2ef31c27-2260-4f0a-bffe-c20b4877f014)


# Analytics Firewall
This application enables anyone to set up a personalized public endpoint capable of receiving **Google Analytics 4 Payloads**, similar to an SGTM (*Server-side Google Tag Manager*) endpoint.

The primary objective of this tool is to ensure the collection of accurate, clean data for your GA4 implementation.

# Why this tool

It has been really frustrating working with clients to get around all the limitations of the new Google Analytics suite. I started building this last year to be able to export GA4 data to BigQuery without any limits (1M hits), so think of it as a replacement for the automatic export feature.

At the same time, while I was at SuperWeek, I thought I could also try to fix some of the tracking handicaps in BigQuery, like attribution. And since we are here, why not add some extra features, like Bot/Spam Scoring, Automated PII Data Scrubbing and Parallel Tracking? Yes, I know, too many things. Hopefully some people may like the project and end up helping.

# Stack Needed
Don't blame me, I'm using PHP 8+ (with some asynchronous support; I still need to learn more about PHP Fibers at this point). The main reason for using PHP is that it is the most widely available server-side language, so it should allow anyone to run this endpoint with the least possible effort.

Any ports to other languages will be welcome at any point.

# Current Features

## Big Query Exporter

The tool parses the **GA4 collect payload** and generates a JSON file/string following the *GA4 Big Query Format*, which can be imported directly into Big Query.

**Analytics Firewall** will take care of everything for you:

- It will calculate the current session attribution and apply it to every event occurring within the user session.
- It will retrieve the current geographical location data and map it to the corresponding geo section using the **GeoLite Free Database.**
- It will extract browser/device details from the User Agent/Client Hints to populate all the relevant data fields.
- It will handle the generation of internal events for "*session_start*" and "*first_visit*" automatically.

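
To make the idea more concrete, here is a minimal sketch of how a collect payload could be turned into a row shaped like the GA4 BigQuery export. It is not the project's code: it is written in Python for brevity (the tool itself targets PHP 8+), the function names `param_value` and `collect_to_row` are hypothetical, and the query parameter names (`en`, `cid`, `sid`, `dl`, `ep.*`, `epn.*`) are the ones commonly seen in GA4 web hits, so treat them as assumptions. Attribution, geo and device enrichment are left out, and `event_date` is computed in UTC here, whereas a real export would presumably use the property's time zone.

```python
import json
from datetime import datetime, timezone
from urllib.parse import parse_qsl


def param_value(raw: str, numeric: bool) -> dict:
    """Build the nested value record used by event_params in the export schema."""
    value = {"string_value": None, "int_value": None,
             "float_value": None, "double_value": None}
    if numeric:
        try:
            value["int_value"] = int(raw)
            return value
        except ValueError:
            pass
        try:
            value["double_value"] = float(raw)
            return value
        except ValueError:
            pass  # not a number after all: fall through and keep it as a string
    value["string_value"] = raw
    return value


def collect_to_row(query: str, timestamp_micros: int) -> dict:
    """Map one collect query string to a BigQuery-export-like event row."""
    hit = dict(parse_qsl(query, keep_blank_values=True))
    params = []
    if "dl" in hit:  # document location (assumed parameter name)
        params.append({"key": "page_location",
                       "value": param_value(hit["dl"], False)})
    if "sid" in hit:  # session id (assumed parameter name)
        params.append({"key": "ga_session_id",
                       "value": param_value(hit["sid"], True)})
    for key, raw in hit.items():
        if key.startswith("epn."):  # numeric event parameter
            params.append({"key": key[4:], "value": param_value(raw, True)})
        elif key.startswith("ep."):  # string event parameter
            params.append({"key": key[3:], "value": param_value(raw, False)})
    seconds = timestamp_micros // 1_000_000
    event_date = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d")
    return {
        "event_date": event_date,
        "event_timestamp": timestamp_micros,
        "event_name": hit.get("en"),
        "user_pseudo_id": hit.get("cid"),
        "event_params": params,
    }


if __name__ == "__main__":
    example = ("v=2&tid=G-XXXXXXX&cid=123.456&sid=1700000000&en=purchase"
               "&dl=https%3A%2F%2Fexample.com%2F&ep.currency=EUR&epn.value=19.99")
    # One JSON object per line is a format BigQuery load jobs can ingest.
    print(json.dumps(collect_to_row(example, 1_700_000_000_000_000)))
```

Keeping every value in the `string_value` / `int_value` / `float_value` / `double_value` record mirrors the export schema, so queries written against the automatic export should need few changes.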

Data is generated in real time, meaning that you can get real-time insights if you opt for a database that supports them.

![image](https://github.com/analytics-debugger/analytics-firewall/assets/1494564/7aa38637-3533-46cf-8fc4-417df66d1b1c)


## Measurement ID spoofing

You can specify a fictitious Measurement ID client-side to safeguard your website from bot crawlers that programmatically generate hits. The tool replaces the fake ID with the real one before forwarding, so bots that scrape the ID from your pages and send hits directly to Google never reach your real property.

## Parallel Tracking

Effortless parallel tracking implementation. Forward a copy of the hits to any account you want.

## Geo Ip Data
If you're just forwarding the events to Google Analytics, the system will pass through the user's IP address so the geo details keep working. If you're using the BigQuery Model Exporter, it will take care of getting the data.

## Browser Data
If you're just forwarding the events to Google Analytics, the system will override the User Agent and Client Hints headers of the hit sent to the Google Analytics server. If you're using the BigQuery Model Exporter, it will take care of guessing the browser/device details and passing them to the event data.

## Anti-Adblocker Payloads

Not sending the data to Google endpoints takes care of some ad blockers, but they may still check the payload for hints. Analytics Firewall will accept encoded/binary payloads, so it can get past ad blockers that inspect the payload as well.

## Middleware
Support is coming for middleware that can remap the Big Query JSON schema to other tools like ClickHouse, Snowplow, etc.

# Incoming Features

## Bot/Spam Scoring

Several rules are used to detect bot activity, and the system assigns a score based on each rule. Think of this like email spam filtering.
Then you can either tag the hits (using an event_parameter) or block them. Some examples of the rules to check:

- Is the current IP's ASN from a known non-residential provider?
- IP hit throttling (too many hits from a single IP)
- User Agent validity check
- Integration with third-party IP blacklists

## PII Scrubber
Filtering Personally Identifiable Information is important and hard at the same time. Analytics Firewall will be able to check the full payload to find %LIKE% strings in the values (e.g. email-like values) or specific parameters within URL-like values, and will scrub them out for you automatically.

## Data Sanity Checking

I bet some of you have, at some point, had someone send you a fake 1B transaction that ruined all your reports. Since the real Measurement ID can be hidden, we can even run some sanity checks, and nobody will be able to send data to your accounts again.

- For example, you could define a rule that blocks any transaction whose value is > 1.000.000.
- Hold a whitelist of event names, skipping the ones that are not on that list.
- Event parameter whitelists. Automatically remove parameters that shouldn't be on the current event.
- User properties. Automatically remove user properties that shouldn't be on the current event.

This guarantees that you won't get any data you don't want into your reports.

--------------------------------------------------------------------------------