├── .gitmodules
├── LICENSE
├── README.md
├── generate_list.py
└── phased_array_blocklist.txt

/.gitmodules:
--------------------------------------------------------------------------------
[submodule "tracker-radar"]
	path = tracker-radar
	url = https://github.com/duckduckgo/tracker-radar

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
BSD 3-Clause License

Copyright (c) 2020, rto
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Phased Array

## What?

A script that generates a privacy-focussed list of tracker domains that have been identified by [DuckDuckGo's Tracker Radar](https://spreadprivacy.com/duckduckgo-tracker-radar/) for use in ad blocker solutions like pi-hole.

## Requirements

This script requires Python >= 3.6.

## How?

```bash
git clone --recurse-submodules https://github.com/rto/phased-array.git phased-array
cd phased-array
python generate_list.py
```

You can customise the input directory, the output file pathname, and the sinkhole destination address via the command line.

You will likely also want to pick which [categories](https://github.com/duckduckgo/tracker-radar/blob/master/docs/CATEGORIES.md) you want to exclude from the list (see Limitations / Warnings below).

Setting `--destination-address` prefixes every domain with that address and a tab, which produces a `hosts`-formatted list.
```bash
python generate_list.py \
  --input-directory my-tracker/domains \
  --output-pathname /path/to/my-output.txt \
  --destination-address '127.0.0.1' \
  --exclude-uncategorized \
  --exclude-category CDN \
  --exclude-category 'Embedded Content' \
  --exclude-category 'Federated Login'
```

See `--help` for a full list of configuration options.

## Limitations / Warnings

_Striking a balance between privacy and usability is tough!_

Blocking by domain name can be a particularly blunt tool. The Tracker Radar data includes domains for many popular websites and apps that you may wish to use on a daily basis. If you exclude too few categories, the resulting list may cause 'undesirable behaviour', i.e. your favourite website/app may stop working.

**For example, github.com, google.com, paypal.com, etc. would all be blocked if we included every single domain.**

By default we exclude any domain that matches one or more of the following categories: CDN, Embedded Content, Federated Login, Non-tracking, Online Payment, SSO. Uncategorized domains are included unless you pass `--exclude-uncategorized`. Any `--exclude-category` options you pass replace this default set rather than adding to it.

Depending on your personal preference or concerns you may wish to filter on different [categories](https://github.com/duckduckgo/tracker-radar/blob/master/docs/CATEGORIES.md).

## Future improvements

- Generate different types of output (domains, hosts, regex)
- Improve the way that we filter domains in or out of the list

Any help on these gratefully received! :-)

## Source data

This project makes use of Tracker Radar data from DuckDuckGo, which is [licensed](https://raw.githubusercontent.com/duckduckgo/tracker-radar/master/LICENSE) under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
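
## Filtering rules

The exclusion rule described under Limitations / Warnings comes down to a set intersection on each domain's `categories` list. The sketch below restates that rule, together with the `hosts`-format line prefix, as small standalone functions so the behaviour can be checked in isolation. It is an illustration under assumptions, not part of the script: the names `effective_exclusions`, `should_include` and `format_line` are hypothetical and are not exported by `generate_list.py`, and the sample records are invented rather than taken from the Tracker Radar data.

```python
from typing import Iterable, Optional, Sequence, Set

# Assumed to mirror DEFAULT_EXCLUDED_CATEGORIES in generate_list.py.
DEFAULT_EXCLUDED_CATEGORIES = frozenset(
    {"CDN", "Embedded Content", "Federated Login", "Non-tracking", "Online Payment", "SSO"}
)


def effective_exclusions(cli_categories: Optional[Sequence[str]]) -> Set[str]:
    """Categories given via --exclude-category replace the defaults; they do not extend them."""
    return set(cli_categories or DEFAULT_EXCLUDED_CATEGORIES)


def should_include(categories: Iterable[str], excluded: Set[str], exclude_uncategorized: bool) -> bool:
    """Return True if a domain with these categories belongs in the blocklist."""
    cats = set(categories)
    if not cats:
        # Uncategorized domains are kept unless --exclude-uncategorized is set.
        return not exclude_uncategorized
    # Sharing even one category with the exclusion set drops the domain.
    return not (cats & excluded)


def format_line(domain: str, destination_address: str = "") -> str:
    """Plain domain by default; '<address>\\t<domain>' (hosts format) when an address is given."""
    prefix = f"{destination_address}\t" if destination_address else ""
    return f"{prefix}{domain}"


if __name__ == "__main__":
    excluded = effective_exclusions(None)
    # Hypothetical records shaped like Tracker Radar domain files.
    samples = [
        {"domain": "tracker.example", "categories": ["Advertising"]},
        {"domain": "cdn.example", "categories": ["CDN", "Advertising"]},
        {"domain": "mystery.example", "categories": []},
    ]
    for record in samples:
        if should_include(record["categories"], excluded, exclude_uncategorized=True):
            print(format_line(record["domain"], "0.0.0.0"))
```

Run as a script, this should print a single hosts-style line for `tracker.example`: `cdn.example` shares the `CDN` category with the default exclusions, and `mystery.example` has no categories while `exclude_uncategorized` is set.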

--------------------------------------------------------------------------------
/generate_list.py:
--------------------------------------------------------------------------------
import argparse
import datetime
import json
import os


INTRO_TEXT = """
# # PHASED ARRAY
#
# A privacy-focussed list of tracker domains that have been identified by
# DuckDuckGo's Tracker Radar for use in ad blocker solutions like pi-hole.
#
# At present this is a particularly blunt tool, blocking entire domains,
# rather than individual trackers. This may result in 'undesirable
# behaviour', i.e. your favourite website/app may stop working.
#
# Project website:
#
# - https://github.com/rto/phased-array
#
# Find out more about Tracker Radar at:
#
# - https://spreadprivacy.com/duckduckgo-tracker-radar/
# - https://github.com/duckduckgo/tracker-radar
#
# Find out more about Pi-hole at:
#
# - https://pi-hole.net

"""

# Domains tagged with any of these categories are skipped by default.
# Passing --exclude-category on the command line replaces this set.
DEFAULT_EXCLUDED_CATEGORIES = (
    "CDN",
    "Embedded Content",
    "Federated Login",
    "Non-tracking",
    "Online Payment",
    "SSO",
)


def parse_args():
    parser = argparse.ArgumentParser(
        description="Produce a domain blocklist or hosts file from DuckDuckGo's Tracker Radar"
    )
    parser.add_argument(
        "--input-directory", "-i",
        help="Path to a directory containing Tracker Radar domain files (*.json)",
        default="tracker-radar/domains",
    )
    parser.add_argument(
        "--output-pathname", "-o",
        help="Pathname of a file to write to",
        default="phased_array_blocklist.txt",
    )
    parser.add_argument(
        "--destination-address", "-d",
        help="Sinkhole destination address (sets output in hosts file format)",
        default="",
    )
    parser.add_argument(
        "--exclude-uncategorized", "-u",
        action="store_true",
        help="Exclude uncategorized domains",
        default=False,
    )
    parser.add_argument(
        "--exclude-category", "-e",
        dest="exclude_categories",
        action="append",
        help="Skip domains in this category; repeat to exclude several (replaces the defaults)",
        default=None,
    )
    return parser.parse_args()


def main():
    args = parse_args()

    file_count = 0
    domains_included = 0
    # User-supplied categories replace (rather than extend) the defaults.
    excluded_categories = set(args.exclude_categories or DEFAULT_EXCLUDED_CATEGORIES)
    # A destination address switches the output to hosts format: "<address>\t<domain>".
    line_prefix = f"{args.destination_address}\t" if args.destination_address else ""

    with open(args.output_pathname, "w", encoding="utf-8") as output_file:
        output_file.write(INTRO_TEXT)
        stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        output_file.write(f"# Blocklist generated: {stamp}\n#\n")
        output_file.write("# Exclude categories:\n")
        # Sort so the header is stable between runs (set iteration order is arbitrary).
        for category in sorted(excluded_categories):
            output_file.write(f"# - {category}\n")
        output_file.write(f"#\n# Exclude uncategorized:\n# - {args.exclude_uncategorized}\n\n")

        # os.scandir() yields entries in arbitrary order; sort by name for reproducible output.
        with os.scandir(args.input_directory) as entries:
            json_files = sorted(
                (entry for entry in entries if entry.is_file() and entry.name.endswith(".json")),
                key=lambda entry: entry.name,
            )

        for entry in json_files:
            file_count += 1
            with open(entry.path, "r", encoding="utf-8") as json_file:
                data = json.load(json_file)
            domain = data["domain"]
            categories = set(data["categories"])
            # Skip uncategorized domains only when asked to, and any domain that
            # shares at least one category with the exclusion set.
            if (
                (not categories and args.exclude_uncategorized)
                or (categories & excluded_categories)
            ):
                print(f"Skipping: {domain}")
                continue
            domains_included += 1
            print(f"Adding: {domain}")
            output_file.write(f"{line_prefix}{domain}\n")

    print(
        f"Added {domains_included} domains from {file_count} files "
        f"to {args.output_pathname}"
    )


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/phased_array_blocklist.txt:
--------------------------------------------------------------------------------

# # PHASED ARRAY
#
# A privacy-focussed list of tracker domains that have been identified by
# DuckDuckGo's Tracker Radar for use in ad blocker solutions like pi-hole.
#
# At present this is a particularly blunt tool, blocking entire domains,
# rather than individual trackers. This may result in 'undesirable
# behaviour', i.e. your favourite website/app may stop working.
#
# Project website:
#
# - https://github.com/rto/phased-array
#
# Find out more about Tracker Radar at:
#
# - https://spreadprivacy.com/duckduckgo-tracker-radar/
# - https://github.com/duckduckgo/tracker-radar
#
# Find out more about Pi-hole at:
#
# - https://pi-hole.net

# Blocklist generated: 2020-03-12 11:17:16
#
# Exclude categories:
# - Online Payment
# - Non-tracking
# - CDN
# - SSO
# - Embedded Content
# - Federated Login
#
# Exclude uncategorized:
# - True

abtasty.com
adstanding.com
tns-counter.ru
trueleadid.com
ctnsnet.com
yieldlove.com
exosrv.com
igodigital.com
listrakbi.com
en25.com
yadro.ru
bluecava.com
xplosion.de
myvisualiq.net
bfmio.com
clicktale.net
exoclick.com
trustedshops.com
adhaven.com
revcontent.com
nuggad.net
mookie1.com
cquotient.com
exdynsrv.com
tvpixel.com
webvisor.org
fwmrm.net
adsrvr.org
solocpm.com
adsco.re
erne.co
mxpnl.com
tremorhub.com
alexametrics.com
ipredictive.com
bidswitch.net
e-planning.net
hsadspixel.net
s-onetag.com
ezoic.net
btstatic.com
wt-safetag.com
criteo.net
branch.io
bidr.io
tubemogul.com
truefitcorp.com
scarabresearch.com
adgrx.com
netseer.com
ib-ibi.com
rambler.ru
gssprt.jp
2o7.net
blismedia.com
adthrive.com
distiltag.com
colossusssp.com
agkn.com
bounceexchange.com
ibillboard.com
dwin1.com
yieldoptimizer.com
treasuredata.com
twiago.com
parsely.com
w55c.net
mouseflow.com
mynativeplatform.com
doubleclick.net
ninthdecimal.com
samplicio.us
everesttech.net
districtm.ca
revjet.com
theadex.com
trustarc.com
realvu.net
adform.net
truste.com
districtm.io
company-target.com
fullstory.com
impactradius-event.com
rtmark.net
kargo.com
securedvisit.com
pubmatic.com
eum-appdynamics.com
mathtag.com
ywxi.net
adswizz.com
weborama.fr
itsup.com
fg8dgt.com
assoc-amazon.com
postrelease.com
marinsm.com
yieldmo.com
digitru.st
tsyndicate.com
lfstmedia.com
254a.com
newrelic.com
m6r.eu
googlesyndication.com
criteo.com
eyereturn.com
segment.com
ixiaa.com
demdex.net
clickagy.com
tynt.com
thrtle.com
stickyadstv.com
zemanta.com
siteimprove.com
snapchat.com
betweendigital.com
cxense.com
ru4.com
adtdp.com
adotmob.com
reson8.com
tapad.com
adriver.ru
statcounter.com
tribalfusion.com
smrtb.com
2mdn.net
moatads.com
volvelle.tech
mxptint.net
contextweb.com
indexww.com
appdynamics.com
sitescout.com
usabilla.com
ladsp.com
walmart.com
amung.us
sharethrough.com
micpn.com
adsafeprotected.com
tru.am
admixer.net
spotxchange.com
microad.jp
tiqcdn.com
histats.com
trackcmp.net
popads.net
lijit.com
googletagservices.com
navdmp.com
amazon-adsystem.com
survata.com
ebay.com
mediavine.com
bannerflow.com
aidata.io
storygize.net
hs-analytics.net
brealtime.com
adnium.com
tealiumiq.com
rmtag.com
dc-storm.com
33across.com
gemius.pl
listrak.com
ispot.tv
justpremium.com
ravenjs.com
pagefair.com
linksynergy.com
pro-market.net
contentabc.com
vindicosuite.com
omtrdc.net
betrad.com
yimg.jp
gumgum.com
adroll.com
ioam.de
xiti.com
fastly-insights.com
crsspxl.com
getclicky.com
doubleverify.com
hybrid.ai
wcfbc.net
zorosrv.com
sonobi.com
dyntrk.com
uncn.jp
googletagmanager.com
id5-sync.com
pingdom.net
tvsquared.com
ads-twitter.com
adrta.com
siteimproveanalytics.io
maxmind.com
mgid.com
adition.com
mixpanel.com
ligatus.com
rfihub.net
ad-m.asia
scorecardresearch.com
c3tag.com
adlightning.com
everestads.net
altitude-arena.com
clrstm.com
areyouahuman.com
exelator.com
adsafety.net
iasds01.com
casalemedia.com
mediabong.net
app.link
redditstatic.com
atdmt.com
ml314.com
perimeterx.net
admedo.com
apxlv.com
bizographics.com
nr-data.net
adfox.ru
deliverimp.com
deepintent.com
chartbeat.com
simpli.fi
cpx.to
blueconic.net
pushcrew.com
rezync.com
userreport.com
visualwebsiteoptimizer.com
tagcommander.com
adobedtm.com
meetrics.net
hsleadflows.net
netmng.com
insightexpressai.com
turn.com
hubapi.com
rundsp.com
weborama.com
rkdms.com
viglink.com
inspectlet.com
mfadsrvr.com
adsnative.com
sessioncam.com
alcmpn.com
adsymptotic.com
steelhousemedia.com
siteimproveanalytics.com
rutarget.ru
imrworldwide.com
gwallet.com
teads.tv
360yield.com
bluekai.com
omnitagjs.com
tidaltv.com
mediaplex.com
videohub.tv
go-mpulse.net
pswec.com
clickcertain.com
vertamedia.com
taboola.com
smartadserver.com
google-analytics.com
yottaa.net
deployads.com
o333o.com
googleadservices.com
avocet.io
ntv.io
realsrv.com
adkernel.com
contentsquare.net
mediawallahscript.com
adscale.de
hotjar.com
adnxs.com
intentiq.com
df-srv.de
crwdcntrl.net
loopme.me
everestjs.net
adentifi.com
rfihub.com
connexity.net
demandbase.com
adhigh.net
heapanalytics.com
xg4ken.com
foresee.com
truoptik.com
hellobar.com
ensighten.com
evidon.com
3lift.com
acuityplatform.com
pippio.com
yahoo.co.jp
juicyads.com
buysellads.com
adtech.de
yieldlab.net
maxymiser.net
attn.tv
s3xified.com
lkqd.net
amplitude.com
1dmp.io
trustx.org
exponential.com
openx.net
socdm.com
stackadapt.com
creative-serving.com
owneriq.net
media6degrees.com
keywee.co
onthe.io
sojern.com
semasio.net
krxd.net
cogocast.net
mrpdata.net
permutive.com
eyeota.net
lockerdome.com
dotomi.com
1rx.io
advertising.com
rubiconproject.com
media.net
innovid.com
rqtrk.eu
perfectmarket.com
mobileadtrading.com
ero-advertising.com
liadm.com
outbrain.com
undertone.com
emxdgt.com
springserve.com
technoratimedia.com
eloqua.com
commander1.com
thebrighttag.com
quantserve.com
dtscout.com

--------------------------------------------------------------------------------