├── .gitignore ├── Dockerfile ├── LICENSE ├── README.md ├── cookies.txt ├── ignore-list ├── pipeline.py ├── reddit.lua └── user-agents /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | *.pyc 3 | wget-lua 4 | wget-at 5 | STOP 6 | BANNED 7 | data/ 8 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM atdr.meo.ws/archiveteam/grab-base:gnutls 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | This is free and unencumbered software released into the public domain. 2 | 3 | Anyone is free to copy, modify, publish, use, compile, sell, or 4 | distribute this software, either in source code form or as a compiled 5 | binary, for any purpose, commercial or non-commercial, and by any 6 | means. 7 | 8 | In jurisdictions that recognize copyright laws, the author or authors 9 | of this software dedicate any and all copyright interest in the 10 | software to the public domain. We make this dedication for the benefit 11 | of the public at large and to the detriment of our heirs and 12 | successors. We intend this dedication to be an overt act of 13 | relinquishment in perpetuity of all present and future rights to this 14 | software under copyright law. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 23 | 24 | For more information, please refer to <http://unlicense.org/> 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # reddit-grab 2 | 3 | More information about the archiving project can be found on the ArchiveTeam wiki: [Reddit](https://wiki.archiveteam.org/index.php?title=Reddit) 4 | 5 | ## Setup instructions 6 | 7 | ### General instructions 8 | 9 | Data integrity is very important in Archive Team projects. Please note the following important rules: 10 | 11 | * [Do not use proxies or VPNs](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_use_whatever_internet_access_for_the_Warrior?). 12 | * Run the project using either the Warrior or the project-specific Docker container as listed below. [Do not modify project code](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#I'd_like_to_help_write_code_or_I_want_to_tweak_the_scripts_to_run_to_my_liking._Where_can_I_find_more_info?_Where_is_the_source_code_and_repository?). Compiling the project dependencies yourself is no longer supported. 13 | * You can share your tracker nickname(s) across machine(s) you personally operate, but not with machines operated by other users. Nickname sharing makes it harder to inspect data if a problem arises. 14 | * [Use clean internet connections](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_use_whatever_internet_access_for_the_Warrior?). 15 | * Only x64-based machines are supported. 
[ARM (used on Raspberry Pi and Apple Silicon Macs) is not currently supported](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Can_I_run_the_Warrior_on_ARM_or_some_other_unusual_architecture?). 16 | * See the [Archive Team Wiki](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior#Warrior_FAQ) for additional information. 17 | 18 | We strongly encourage you to join the IRC channel associated with this project in order to be informed about project updates and other important announcements, as well as to be reachable in the event of an issue. The Archive Team Wiki has [more information about IRC](https://wiki.archiveteam.org/index.php/Archiveteam:IRC). We can be found at hackint IRC [#shreddit](https://webirc.hackint.org/#irc://irc.hackint.org/#shreddit). 19 | 20 | **If you have any questions or issues during setup, please review the wiki pages or contact us on IRC for troubleshooting information.** 21 | 22 | ### Running the project 23 | 24 | #### Archive Team Warrior (recommended for most users) 25 | 26 | This and other archiving projects can easily be run using the [Archive Team Warrior](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior) virtual machine. Follow the [instructions on the Archive Team wiki](https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior) for installing the Warrior, and from the web interface running at `http://localhost:8001/`, enter the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like. Then, select the `Reddit` project in the Warrior interface. 27 | 28 | #### Project-specific Docker container (for more advanced users) 29 | 30 | Alternatively, more advanced users can also run projects using Docker. While users of the Warrior can switch between projects using a web interface, Docker containers are specific to each project. However, while the Warrior supports a maximum of 6 concurrent items, a Docker container supports a maximum of 20 concurrent items. The instructions below are a short overview. For more information and detailed explanations of the commands, follow the [Docker instructions on the Archive Team wiki](https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker). 31 | 32 | It is advised to use [Watchtower](https://github.com/containrrr/watchtower) to automatically update the project container: 33 | 34 | docker run -d --name watchtower --restart=unless-stopped -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --label-enable --cleanup --interval 3600 --include-restarting 35 | 36 | after which the project container can be run: 37 | 38 | docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --log-driver json-file --log-opt max-size=50m --restart=unless-stopped atdr.meo.ws/archiveteam/reddit-grab --concurrent 1 YOURNICKHERE 39 | 40 | Be sure to replace `YOURNICKHERE` with the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like. 41 | 42 | ### Supporting Archive Team 43 | 44 | Behind the scenes, Archive Team has infrastructure to run the projects and process the data. If you would like to help out with the costs of our infrastructure, a donation on our [Open Collective](https://opencollective.com/archiveteam) would be very welcome. 45 | 46 | ### Issues in the code 47 | 48 | If you notice a bug and want to file a bug report, please use the GitHub issues tracker. 49 | 50 | Are you a developer? Help write code for us! 
Look at our [developer documentation](https://wiki.archiveteam.org/index.php?title=Dev) for details. 51 | 52 | ### Other problems 53 | 54 | Have an issue not listed here? Join us on IRC and ask! We can be found at hackint IRC [#shreddit](https://webirc.hackint.org/#irc://irc.hackint.org/#shreddit). 55 | 56 | 57 | -------------------------------------------------------------------------------- /cookies.txt: -------------------------------------------------------------------------------- 1 | .reddit.com TRUE / FALSE 0 eu_cookie_v2 3 2 | .reddit.com TRUE / FALSE 0 over18 1 3 | .reddit.com TRUE / FALSE 0 _options %7B%22pref_quarantine_optin%22%3A%20true%2C%20%22pref_gated_sr_optin%22%3A%20true%7D 4 | -------------------------------------------------------------------------------- /ignore-list: -------------------------------------------------------------------------------- 1 | https://old.reddit.com/static/opensearch.xml 2 | https://reddit.com/static/pixel.png 3 | -------------------------------------------------------------------------------- /pipeline.py: -------------------------------------------------------------------------------- 1 | # encoding=utf8 2 | import datetime 3 | from distutils.version import StrictVersion 4 | import hashlib 5 | import os.path 6 | import random 7 | import re 8 | from seesaw.config import realize, NumberConfigValue 9 | from seesaw.externalprocess import ExternalProcess 10 | from seesaw.item import ItemInterpolation, ItemValue 11 | from seesaw.task import SimpleTask, LimitConcurrent 12 | from seesaw.tracker import GetItemFromTracker, PrepareStatsForTracker, \ 13 | UploadWithTracker, SendDoneToTracker 14 | import shutil 15 | import socket 16 | import subprocess 17 | import sys 18 | import time 19 | import string 20 | 21 | import seesaw 22 | from seesaw.externalprocess import WgetDownload 23 | from seesaw.pipeline import Pipeline 24 | from seesaw.project import Project 25 | from seesaw.util import find_executable 26 | 27 | from tornado import httpclient 28 | 29 | import requests 30 | import zstandard 31 | 32 | if StrictVersion(seesaw.__version__) < StrictVersion('0.8.5'): 33 | raise Exception('This pipeline needs seesaw version 0.8.5 or higher.') 34 | 35 | 36 | ########################################################################### 37 | # Find a useful Wget+Lua executable. 38 | # 39 | # WGET_AT will be set to the first path that 40 | # 1. does not crash with --version, and 41 | # 2. prints the required version string 42 | 43 | class HigherVersion: 44 | def __init__(self, expression, min_version): 45 | self._expression = re.compile(expression) 46 | self._min_version = min_version 47 | 48 | def search(self, text): 49 | for result in self._expression.findall(text): 50 | if result >= self._min_version: 51 | print('Found version {}.'.format(result)) 52 | return True 53 | 54 | WGET_AT = find_executable( 55 | 'Wget+AT', 56 | HigherVersion( 57 | r'(GNU Wget 1\.[0-9]{2}\.[0-9]{1}-at\.[0-9]{8}\.[0-9]{2})[^0-9a-zA-Z\.-_]', 58 | 'GNU Wget 1.21.3-at.20231213.03' 59 | ), 60 | [ 61 | './wget-at', 62 | '/home/warrior/data/wget-at-gnutls' 63 | ] 64 | ) 65 | 66 | if not WGET_AT: 67 | raise Exception('No usable Wget+At found.') 68 | 69 | 70 | ########################################################################### 71 | # The version number of this pipeline definition. 72 | # 73 | # Update this each time you make a non-cosmetic change. 74 | # It will be added to the WARC files and reported to the tracker. 
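# Note on item batching (illustrative annotation; the example values below are
# made up): the tracker hands out work in batches of up to MULTI_ITEM_SIZE
# items (see the 'multi=' tracker URL in the pipeline definition further down),
# and a batch arrives as one item_name with the individual names joined by NUL
# bytes. WgetArgs.realize() splits that string back into 'type:value' pairs:
#
#   example_item_name = 'post:abc123\0comment:def456\0url:https://i.redd.it/example.png'
#   for name in example_item_name.split('\0'):
#       item_type, item_value = name.split(':', 1)
#       # 'post' and 'comment' items are fetched via
#       # https://www.reddit.com/api/info.json?id=t3_<value> and ...?id=t1_<value>;
#       # 'url' items are fetched directly.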
75 | VERSION = '20240216.01' 76 | TRACKER_ID = 'reddit' 77 | TRACKER_HOST = 'legacy-api.arpa.li' 78 | MULTI_ITEM_SIZE = 100 79 | 80 | 81 | ########################################################################### 82 | # This section defines project-specific tasks. 83 | # 84 | # Simple tasks (tasks that do not need any concurrency) are based on the 85 | # SimpleTask class and have a process(item) method that is called for 86 | # each item. 87 | class CheckIP(SimpleTask): 88 | def __init__(self): 89 | SimpleTask.__init__(self, 'CheckIP') 90 | self._counter = 0 91 | 92 | def process(self, item): 93 | # NEW for 2014! Check if we are behind firewall/proxy 94 | 95 | if self._counter <= 0: 96 | item.log_output('Checking IP address.') 97 | ip_set = set() 98 | 99 | ip_set.add(socket.gethostbyname('twitter.com')) 100 | #ip_set.add(socket.gethostbyname('facebook.com')) 101 | ip_set.add(socket.gethostbyname('youtube.com')) 102 | ip_set.add(socket.gethostbyname('microsoft.com')) 103 | ip_set.add(socket.gethostbyname('icanhas.cheezburger.com')) 104 | ip_set.add(socket.gethostbyname('archiveteam.org')) 105 | 106 | if len(ip_set) != 5: 107 | item.log_output('Got IP addresses: {0}'.format(ip_set)) 108 | item.log_output( 109 | 'Are you behind a firewall/proxy? That is a big no-no!') 110 | raise Exception( 111 | 'Are you behind a firewall/proxy? That is a big no-no!') 112 | 113 | # Check only occasionally 114 | if self._counter <= 0: 115 | self._counter = 10 116 | else: 117 | self._counter -= 1 118 | 119 | 120 | class PrepareDirectories(SimpleTask): 121 | def __init__(self, warc_prefix): 122 | SimpleTask.__init__(self, 'PrepareDirectories') 123 | self.warc_prefix = warc_prefix 124 | 125 | def process(self, item): 126 | item_name = item['item_name'] 127 | item_name_hash = hashlib.sha1(item_name.encode('utf8')).hexdigest() 128 | escaped_item_name = item_name_hash 129 | dirname = '/'.join((item['data_dir'], escaped_item_name)) 130 | 131 | if os.path.isdir(dirname): 132 | shutil.rmtree(dirname) 133 | 134 | os.makedirs(dirname) 135 | 136 | item['item_dir'] = dirname 137 | item['warc_file_base'] = '-'.join([ 138 | self.warc_prefix, 139 | item_name_hash, 140 | time.strftime('%Y%m%d-%H%M%S') 141 | ]) 142 | 143 | open('%(item_dir)s/%(warc_file_base)s.warc.zst' % item, 'w').close() 144 | open('%(item_dir)s/%(warc_file_base)s_data.txt' % item, 'w').close() 145 | 146 | class MoveFiles(SimpleTask): 147 | def __init__(self): 148 | SimpleTask.__init__(self, 'MoveFiles') 149 | 150 | def process(self, item): 151 | os.rename('%(item_dir)s/%(warc_file_base)s.warc.zst' % item, 152 | '%(data_dir)s/%(warc_file_base)s.%(dict_project)s.%(dict_id)s.warc.zst' % item) 153 | os.rename('%(item_dir)s/%(warc_file_base)s_data.txt' % item, 154 | '%(data_dir)s/%(warc_file_base)s_data.txt' % item) 155 | 156 | shutil.rmtree('%(item_dir)s' % item) 157 | 158 | 159 | class SetBadUrls(SimpleTask): 160 | def __init__(self): 161 | SimpleTask.__init__(self, 'SetBadUrls') 162 | 163 | def process(self, item): 164 | item['item_name_original'] = item['item_name'] 165 | items = item['item_name'].split('\0') 166 | items_lower = [s.lower() for s in items] 167 | with open('%(item_dir)s/%(warc_file_base)s_bad-items.txt' % item, 'r') as f: 168 | for aborted_item in f: 169 | aborted_item = aborted_item.strip().lower() 170 | index = items_lower.index(aborted_item) 171 | item.log_output('Item {} is aborted.'.format(aborted_item)) 172 | items.pop(index) 173 | items_lower.pop(index) 174 | item['item_name'] = '\0'.join(items) 175 | 176 | 177 | class 
MaybeSendDoneToTracker(SendDoneToTracker): 178 | def enqueue(self, item): 179 | if len(item['item_name']) == 0: 180 | return self.complete_item(item) 181 | return super(MaybeSendDoneToTracker, self).enqueue(item) 182 | 183 | 184 | def get_hash(filename): 185 | with open(filename, 'rb') as in_file: 186 | return hashlib.sha1(in_file.read()).hexdigest() 187 | 188 | CWD = os.getcwd() 189 | PIPELINE_SHA1 = get_hash(os.path.join(CWD, 'pipeline.py')) 190 | LUA_SHA1 = get_hash(os.path.join(CWD, 'reddit.lua')) 191 | 192 | def stats_id_function(item): 193 | d = { 194 | 'pipeline_hash': PIPELINE_SHA1, 195 | 'lua_hash': LUA_SHA1, 196 | 'python_version': sys.version, 197 | } 198 | 199 | return d 200 | 201 | 202 | class ZstdDict(object): 203 | created = 0 204 | data = None 205 | 206 | @classmethod 207 | def get_dict(cls): 208 | if cls.data is not None and time.time() - cls.created < 1800: 209 | return cls.data 210 | response = requests.get( 211 | 'https://legacy-api.arpa.li/dictionary', 212 | params={ 213 | 'project': 'reddit' 214 | } 215 | ) 216 | response.raise_for_status() 217 | response = response.json() 218 | if cls.data is not None and response['id'] == cls.data['id']: 219 | cls.created = time.time() 220 | return cls.data 221 | print('Downloading latest dictionary.') 222 | response_dict = requests.get(response['url']) 223 | response_dict.raise_for_status() 224 | raw_data = response_dict.content 225 | if hashlib.sha256(raw_data).hexdigest() != response['sha256']: 226 | raise ValueError('Hash of downloaded dictionary does not match.') 227 | if raw_data[:4] == b'\x28\xB5\x2F\xFD': 228 | raw_data = zstandard.ZstdDecompressor().decompress(raw_data) 229 | cls.data = { 230 | 'id': response['id'], 231 | 'dict': raw_data 232 | } 233 | cls.created = time.time() 234 | return cls.data 235 | 236 | 237 | class WgetArgs(object): 238 | post_chars = string.digits + string.ascii_lowercase 239 | 240 | def int_to_str(self, i): 241 | d, m = divmod(i, 36) 242 | if d > 0: 243 | return self.int_to_str(d) + self.post_chars[m] 244 | return self.post_chars[m] 245 | 246 | def realize(self, item): 247 | with open('user-agents', 'r') as f: 248 | user_agent = random.choice(list(f)).strip() 249 | wget_args = [ 250 | WGET_AT, 251 | '-U', user_agent, 252 | '-nv', 253 | '--host-lookups', 'dns', 254 | '--hosts-file', '/dev/null', 255 | '--resolvconf-file', '/dev/null', 256 | '--dns-servers', '9.9.9.10,149.112.112.10,2620:fe::10,2620:fe::fe:10', 257 | '--reject-reserved-subnets', 258 | '--load-cookies', 'cookies.txt', 259 | '--content-on-error', 260 | '--no-http-keep-alive', 261 | '--lua-script', 'reddit.lua', 262 | '-o', ItemInterpolation('%(item_dir)s/wget.log'), 263 | '--no-check-certificate', 264 | '--output-document', ItemInterpolation('%(item_dir)s/wget.tmp'), 265 | '--truncate-output', 266 | '-e', 'robots=off', 267 | '--rotate-dns', 268 | '--recursive', '--level=inf', 269 | '--no-parent', 270 | '--page-requisites', 271 | '--timeout', '30', 272 | '--tries', 'inf', 273 | '--domains', 'reddit.com', 274 | '--span-hosts', 275 | '--waitretry', '30', 276 | '--warc-file', ItemInterpolation('%(item_dir)s/%(warc_file_base)s'), 277 | '--warc-header', 'operator: Archive Team', 278 | '--warc-header', 'x-wget-at-project-version: ' + VERSION, 279 | '--warc-header', 'x-wget-at-project-name: ' + TRACKER_ID, 280 | '--warc-dedup-url-agnostic', 281 | '--warc-compression-use-zstd', 282 | '--warc-zstd-dict-no-include', 283 | '--header', 'Accept-Language: en-US;q=0.9, en;q=0.8', 284 | '--secure-protocol', 'TLSv1_2', 285 | #'--ciphers', 
'+ECDHE-RSA:+AES-256-CBC:+SHA384' 286 | ] 287 | dict_data = ZstdDict.get_dict() 288 | with open(os.path.join(item['item_dir'], 'zstdict'), 'wb') as f: 289 | f.write(dict_data['dict']) 290 | item['dict_id'] = dict_data['id'] 291 | item['dict_project'] = 'reddit' 292 | wget_args.extend([ 293 | '--warc-zstd-dict', ItemInterpolation('%(item_dir)s/zstdict'), 294 | ]) 295 | 296 | for item_name in item['item_name'].split('\0'): 297 | wget_args.extend(['--warc-header', 'x-wget-at-project-item-name: '+item_name]) 298 | wget_args.append('item-name://'+item_name) 299 | item_type, item_value = item_name.split(':', 1) 300 | if item_type == 'post': 301 | wget_args.extend(['--warc-header', 'reddit-post: '+item_value]) 302 | wget_args.append('https://www.reddit.com/api/info.json?id=t3_'+item_value) 303 | elif item_type == 'comment': 304 | wget_args.extend(['--warc-header', 'reddit-comment: '+item_value]) 305 | wget_args.append('https://www.reddit.com/api/info.json?id=t1_'+item_value) 306 | elif item_type == 'url': 307 | wget_args.extend(['--warc-header', 'reddit-media-url: '+item_value]) 308 | wget_args.append(item_value) 309 | else: 310 | raise Exception('Unknown item') 311 | 312 | item['item_name_newline'] = item['item_name'].replace('\0', '\n') 313 | 314 | if 'bind_address' in globals(): 315 | wget_args.extend(['--bind-address', globals()['bind_address']]) 316 | print('') 317 | print('*** Wget will bind address at {0} ***'.format( 318 | globals()['bind_address'])) 319 | print('') 320 | 321 | return realize(wget_args, item) 322 | 323 | ########################################################################### 324 | # Initialize the project. 325 | # 326 | # This will be shown in the warrior management panel. The logo should not 327 | # be too big. The deadline is optional. 328 | project = Project( 329 | title='reddit', 330 | project_html=''' 331 | 332 |

reddit.com Website · Leaderboard
333 | Archiving everything from reddit.

334 | ''' 335 | ) 336 | 337 | pipeline = Pipeline( 338 | CheckIP(), 339 | GetItemFromTracker('http://{}/{}/multi={}/' 340 | .format(TRACKER_HOST, TRACKER_ID, MULTI_ITEM_SIZE), 341 | downloader, VERSION), 342 | PrepareDirectories(warc_prefix='reddit'), 343 | WgetDownload( 344 | WgetArgs(), 345 | max_tries=2, 346 | accept_on_exit_code=[0, 4, 8], 347 | env={ 348 | 'item_dir': ItemValue('item_dir'), 349 | 'item_names': ItemValue('item_name_newline'), 350 | 'warc_file_base': ItemValue('warc_file_base'), 351 | } 352 | ), 353 | SetBadUrls(), 354 | PrepareStatsForTracker( 355 | defaults={'downloader': downloader, 'version': VERSION}, 356 | file_groups={ 357 | 'data': [ 358 | ItemInterpolation('%(item_dir)s/%(warc_file_base)s.warc.zst') 359 | ] 360 | }, 361 | id_function=stats_id_function, 362 | ), 363 | MoveFiles(), 364 | LimitConcurrent(NumberConfigValue(min=1, max=20, default='20', 365 | name='shared:rsync_threads', title='Rsync threads', 366 | description='The maximum number of concurrent uploads.'), 367 | UploadWithTracker( 368 | 'http://%s/%s' % (TRACKER_HOST, TRACKER_ID), 369 | downloader=downloader, 370 | version=VERSION, 371 | files=[ 372 | ItemInterpolation('%(data_dir)s/%(warc_file_base)s.%(dict_project)s.%(dict_id)s.warc.zst'), 373 | ItemInterpolation('%(data_dir)s/%(warc_file_base)s_data.txt') 374 | ], 375 | rsync_target_source_path=ItemInterpolation('%(data_dir)s/'), 376 | rsync_extra_args=[ 377 | '--recursive', 378 | '--min-size', '1', 379 | '--no-compress', 380 | '--compress-level', '0' 381 | ] 382 | ), 383 | ), 384 | MaybeSendDoneToTracker( 385 | tracker_url='http://%s/%s' % (TRACKER_HOST, TRACKER_ID), 386 | stats=ItemValue('stats') 387 | ) 388 | ) 389 | -------------------------------------------------------------------------------- /reddit.lua: -------------------------------------------------------------------------------- 1 | local urlparse = require("socket.url") 2 | local http = require("socket.http") 3 | local cjson = require("cjson") 4 | local utf8 = require("utf8") 5 | 6 | local item_names = os.getenv('item_names') 7 | local item_dir = os.getenv('item_dir') 8 | local warc_file_base = os.getenv('warc_file_base') 9 | local item_type = nil 10 | local item_name = nil 11 | local item_value = nil 12 | 13 | local selftext = nil 14 | local retry_url = true 15 | 16 | local item_types = {} 17 | for s in string.gmatch(item_names, "([^\n]+)") do 18 | local t, n = string.match(s, "^([^:]+):(.+)$") 19 | item_types[n] = t 20 | end 21 | 22 | if urlparse == nil or http == nil then 23 | io.stdout:write("socket not correctly installed.\n") 24 | io.stdout:flush() 25 | abortgrab = true 26 | end 27 | 28 | local url_count = 0 29 | local tries = 0 30 | local downloaded = {} 31 | local addedtolist = {} 32 | local abortgrab = false 33 | local killgrab = false 34 | 35 | local posts = {} 36 | local requested_children = {} 37 | local is_crosspost = false 38 | 39 | local outlinks = {} 40 | local reddit_media_urls = {} 41 | 42 | local bad_items = {} 43 | 44 | for ignore in io.open("ignore-list", "r"):lines() do 45 | downloaded[ignore] = true 46 | end 47 | 48 | abort_item = function(item) 49 | abortgrab = true 50 | if not item then 51 | item = item_name 52 | end 53 | if not bad_items[item] then 54 | io.stdout:write("Aborting item " .. item .. 
".\n") 55 | io.stdout:flush() 56 | bad_items[item] = true 57 | end 58 | end 59 | 60 | kill_grab = function(item) 61 | io.stdout:write("Aborting crawling.\n") 62 | killgrab = true 63 | end 64 | 65 | read_file = function(file) 66 | if file then 67 | local f = assert(io.open(file)) 68 | local data = f:read("*all") 69 | f:close() 70 | return data 71 | else 72 | return "" 73 | end 74 | end 75 | 76 | processed = function(url) 77 | if downloaded[url] or addedtolist[url] then 78 | return true 79 | end 80 | return false 81 | end 82 | 83 | allowed = function(url, parenturl) 84 | if item_type == "url" then 85 | if url ~= item_value then 86 | reddit_media_urls["url:" .. url] = true 87 | return false 88 | end 89 | return true 90 | end 91 | 92 | --[[if string.match(url, "^https?://www%.reddit%.com/svc/") then 93 | return true 94 | end]] 95 | 96 | if string.match(url, "'+") 97 | or string.match(urlparse.unescape(url), "[<>\\%$%^%[%]%(%){}]") 98 | or string.match(url, "^https?://[^/]*reddit%.com/[^%?]+%?context=[0-9]+&depth=[0-9]+") 99 | or string.match(url, "^https?://[^/]*reddit%.com/[^%?]+%?depth=[0-9]+&context=[0-9]+") 100 | or string.match(url, "^https?://[^/]*reddit%.com/login") 101 | or string.match(url, "^https?://[^/]*reddit%.com/register") 102 | or string.match(url, "^https?://[^/]*reddit%.com/r/undefined/") 103 | or ( 104 | string.match(url, "%?sort=") 105 | and not string.match(url, "/svc/") 106 | ) 107 | or string.match(url, "%?limit=500$") 108 | or string.match(url, "%?ref=readnext$") 109 | or string.match(url, "/tailwind%-build%.css$") 110 | or string.match(url, "^https?://v%.redd%.it/.+%?source=fallback$") 111 | or string.match(url, "^https?://[^/]*reddit%.app%.link/") 112 | or string.match(url, "^https?://out%.reddit%.com/r/") 113 | or string.match(url, "^https?://old%.reddit%.com/gallery/") 114 | or string.match(url, "^https?://old%.reddit%.com/gold%?") 115 | or string.match(url, "^https?://[^/]+/over18.+dest=https%%3A%%2F%%2Fold%.reddit%.com") 116 | or string.match(url, "^https?://old%.[^%?]+%?utm_source=reddit") 117 | or string.match(url, "/%?context=1$") 118 | or string.match(url, '/"$') 119 | or string.match(url, "^https?://[^/]+/message/compose") 120 | or string.match(url, "www%.reddit%.com/avatar[/]?$") 121 | or ( 122 | string.match(url, "^https?://gateway%.reddit%.com/") 123 | and not string.match(url, "/morecomments/") 124 | ) 125 | or string.match(url, "/%.rss$") 126 | or ( 127 | parenturl 128 | and string.match(url, "^https?://amp%.reddit%.com/") 129 | ) 130 | or ( 131 | parenturl 132 | and string.match(url, "^https?://v%.redd%.it/[^/]+/HLSPlaylist%.m3u8") 133 | ) 134 | or ( 135 | item_type == "post" 136 | and ( 137 | string.match(url, "^https?://[^/]*reddit%.com/r/[^/]+/comments/[0-9a-z]+/[^/]+/[0-9a-z]+/?$") 138 | or string.match(url, "^https?://[^/]*reddit%.com/r/[^/]+/comments/[0-9a-z]+/[^/]+/[0-9a-z]+/?%?utm_source=") 139 | ) 140 | ) 141 | or ( 142 | parenturl 143 | and string.match(parenturl, "^https?://[^/]*reddit%.com/r/[^/]+/duplicates/") 144 | and string.match(url, "^https?://[^/]*reddit%.com/r/[^/]+/duplicates/") 145 | ) 146 | or ( 147 | parenturl 148 | and string.match(parenturl, "^https?://[^/]*reddit%.com/user/[^/]+/duplicates/") 149 | and string.match(url, "^https?://[^/]*reddit%.com/user/[^/]+/duplicates/") 150 | ) 151 | or ( 152 | parenturl 153 | and string.match(parenturl, "^https?://[^/]+/r/EASportsFC/") 154 | and string.match(url, "^https?://[^/]+/r/FIFA/") 155 | ) then 156 | return false 157 | end 158 | 159 | local tested = {} 160 | for s in 
string.gmatch(url, "([^/]+)") do 161 | if tested[s] == nil then 162 | tested[s] = 0 163 | end 164 | if tested[s] == 6 then 165 | return false 166 | end 167 | tested[s] = tested[s] + 1 168 | end 169 | 170 | if not ( 171 | string.match(url, "^https?://[^/]*redd%.it/") 172 | or string.match(url, "^https?://[^/]*reddit%.com/") 173 | or string.match(url, "^https?://[^/]*redditmedia%.com/") 174 | or string.match(url, "^https?://[^/]*redditstatic%.com/") 175 | ) then 176 | local temp = "" 177 | for c in string.gmatch(url, "(.)") do 178 | local b = string.byte(c) 179 | if b < 32 or b > 126 then 180 | c = string.format("%%%02X", b) 181 | end 182 | temp = temp .. c 183 | end 184 | url = temp 185 | outlinks[url] = true 186 | return false 187 | end 188 | 189 | if url .. "/" == parenturl then 190 | return false 191 | end 192 | 193 | if string.match(url, "^https?://gateway%.reddit%.com/desktopapi/v1/morecomments/") 194 | or string.match(url, "^https?://old%.reddit%.com/api/morechildren$") 195 | or string.match(url, "^https?://[^/]*reddit%.com/video/") then 196 | return true 197 | end 198 | 199 | if ( 200 | string.match(url, "^https?://[^/]*redditmedia%.com/") 201 | or string.match(url, "^https?://v%.redd%.it/") 202 | or string.match(url, "^https?://[^/]*reddit%.com/video/") 203 | or string.match(url, "^https?://i%.redd%.it/") 204 | or string.match(url, "^https?://[^%.]*preview%.redd%.it/.") 205 | ) 206 | and not string.match(item_type, "comment") 207 | and not string.match(url, "^https?://[^/]*redditmedia%.com/mediaembed/") 208 | and not is_crosspost then 209 | if parenturl 210 | and string.match(parenturl, "^https?://www%.reddit.com/api/info%.json%?id=t") 211 | and not string.match(url, "^https?://v%.redd%.it/") 212 | and not string.match(url, "^https?://[^/]*reddit%.com/video/") 213 | and not string.find(url, "thumbs.") then 214 | return false 215 | end 216 | if not string.match(url, "^https?://v%.redd%.it/") 217 | or string.match(url, "%.mp4$") 218 | or string.match(url, "%.ts$") then 219 | reddit_media_urls["url:" .. 
url] = true 220 | return false 221 | end 222 | return true 223 | end 224 | 225 | for s in string.gmatch(url, "([a-z0-9]+)") do 226 | if posts[s] then 227 | return true 228 | end 229 | end 230 | 231 | return false 232 | end 233 | 234 | wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict, reason) 235 | local url = urlpos["url"]["url"] 236 | local html = urlpos["link_expect_html"] 237 | 238 | if item_type == "comment" or item_type == "url" then 239 | return false 240 | end 241 | 242 | if string.match(url, "[<>\\%*%$;%^%[%],%(%){}]") 243 | or string.match(url, "^https?://[^/]*redditstatic%.com/") 244 | or string.match(url, "^https?://old%.reddit%.com/static/") 245 | or string.match(url, "^https?://www%.reddit%.com/static/") 246 | or string.match(url, "^https?://styles%.redditmedia%.com/") 247 | or string.match(url, "^https?://emoji%.redditmedia%.com/") 248 | or string.match(url, "/%.rss$") then 249 | return false 250 | end 251 | 252 | if string.match(parent["url"], "^https?://old%.reddit%.com/comments/[a-z0-9]+") then 253 | return true 254 | end 255 | 256 | url = string.gsub(url, "&amp;", "&") 257 | 258 | if not processed(url) 259 | and (allowed(url, parent["url"]) or (allowed(parent["url"]) and html == 0)) then 260 | addedtolist[url] = true 261 | return true 262 | end 263 | 264 | return false 265 | end 266 | 267 | wget.callbacks.get_urls = function(file, url, is_css, iri) 268 | local urls = {} 269 | local html = nil 270 | local no_more_svc = false 271 | 272 | downloaded[url] = true 273 | 274 | if abortgrab then 275 | return {} 276 | end 277 | 278 | local function check(urla) 279 | if no_more_svc 280 | and string.match(urla, "^https?://[^/]+/svc/") then 281 | return nil 282 | end 283 | local origurl = url 284 | local url = string.match(urla, "^([^#]+)") 285 | local url_ = string.match(url, "^(.-)%.?$") 286 | if not string.find(url, "old.reddit.com") then 287 | url_ = string.gsub( 288 | url_, "\\[uU]([0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])", 289 | function (s) 290 | return utf8.char(tonumber(s, 16)) 291 | end 292 | ) 293 | end 294 | while string.find(url_, "&amp;") do 295 | url_ = string.gsub(url_, "&amp;", "&") 296 | end 297 | if not processed(url_) 298 | and string.match(url_, "^https?://.+") 299 | and allowed(url_, origurl) 300 | and not (string.match(url_, "[^/]$") and processed(url_ .. 
"/")) then 301 | table.insert(urls, { url=url_ }) 302 | addedtolist[url_] = true 303 | addedtolist[url] = true 304 | end 305 | end 306 | 307 | local function checknewurl(newurl) 308 | if string.match(newurl, "^https?:////") then 309 | check(string.gsub(newurl, ":////", "://")) 310 | elseif string.match(newurl, "^https?://") then 311 | check(newurl) 312 | elseif string.match(newurl, "^https?:\\/\\?/") then 313 | check(string.gsub(newurl, "\\", "")) 314 | elseif string.match(newurl, "^\\/\\/") then 315 | checknewurl(string.gsub(newurl, "\\", "")) 316 | elseif string.match(newurl, "^//") then 317 | check(urlparse.absolute(url, newurl)) 318 | elseif string.match(newurl, "^\\/") then 319 | checknewurl(string.gsub(newurl, "\\", "")) 320 | elseif string.match(newurl, "^/") then 321 | check(urlparse.absolute(url, newurl)) 322 | elseif string.match(newurl, "^%.%./") then 323 | if string.match(url, "^https?://[^/]+/[^/]+/") then 324 | check(urlparse.absolute(url, newurl)) 325 | else 326 | checknewurl(string.match(newurl, "^%.%.(/.+)$")) 327 | end 328 | elseif string.match(newurl, "^%./") then 329 | check(urlparse.absolute(url, newurl)) 330 | end 331 | end 332 | 333 | local function checknewshorturl(newurl) 334 | if string.match(newurl, "^%?") then 335 | check(urlparse.absolute(url, newurl)) 336 | elseif not ( 337 | string.match(newurl, "^https?:\\?/\\?//?/?") 338 | or string.match(newurl, "^[/\\]") 339 | or string.match(newurl, "^%./") 340 | or string.match(newurl, "^[jJ]ava[sS]cript:") 341 | or string.match(newurl, "^[mM]ail[tT]o:") 342 | or string.match(newurl, "^vine:") 343 | or string.match(newurl, "^android%-app:") 344 | or string.match(newurl, "^ios%-app:") 345 | or string.match(newurl, "^data:") 346 | or string.match(newurl, "^irc:") 347 | or string.match(newurl, "^%${") 348 | ) then 349 | check(urlparse.absolute(url, newurl)) 350 | end 351 | end 352 | 353 | if string.match(url, "^https?://www%.reddit%.com/") 354 | and not string.match(url, "/api/") 355 | and not string.match(url, "^https?://[^/]+/svc/") then 356 | check(string.gsub(url, "^https?://www%.reddit%.com/", "https://old.reddit.com/")) 357 | end 358 | 359 | local match = string.match(url, "^https?://preview%.redd%.it/([a-zA-Z0-9]+%.[a-zA-Z0-9]+)") 360 | if match then 361 | check("https://i.redd.it/" .. match) 362 | end 363 | 364 | if string.match(url, "is_lit_ssr=") 365 | and not string.match(url, "/svc/shreddit/more%-comments/") then 366 | check(string.gsub(url, "([%?&]is_lit_ssr=)[a-z]+", "%1true")) 367 | check(string.gsub(url, "([%?&]is_lit_ssr=)[a-z]+", "%1false")) 368 | end 369 | 370 | if allowed(url) 371 | and status_code < 300 372 | and item_type ~= "url" 373 | and not string.match(url, "^https?://[^/]*redditmedia%.com/") 374 | and not string.match(url, "^https?://[^/]*redditstatic%.com/") 375 | and not string.match(url, "^https?://out%.reddit%.com/") 376 | and not string.match(url, "^https?://[^%.]*preview%.redd%.it/") 377 | and not string.match(url, "^https?://i%.redd%.it/") 378 | and not ( 379 | string.match(url, "^https?://v%.redd%.it/") 380 | and not string.match(url, "%.m3u8") 381 | and not string.match(url, "%.mpd") 382 | ) then 383 | html = read_file(file) 384 | --[[if string.match(url, "^https?://www%.reddit%.com/[^/]+/[^/]+/comments/[0-9a-z]+/[^/]+/[0-9a-z]*/?$") then 385 | check(url .. 
"?utm_source=reddit&utm_medium=web2x&context=3") 386 | end]] 387 | if string.match(url, "^https?://old%.reddit%.com/api/morechildren$") then 388 | html = string.gsub(html, '\\"', '"') 389 | elseif string.match(url, "^https?://old%.reddit%.com/r/[^/]+/comments/") 390 | or string.match(url, "^https?://old%.reddit%.com/r/[^/]+/duplicates/") then 391 | html = string.gsub(html, "%s*.-%s*%s*%s*", "") 392 | end 393 | if string.match(url, "^https?://old%.reddit%.com/") then 394 | for s in string.gmatch(html, "(return%s+morechildren%(this,%s*'[^']+',%s*'[^']+',%s*'[^']+',%s*'[^']+'%))") do 395 | local link_id, sort, children, limit_children = string.match(s, "%(this,%s*'([^']+)',%s*'([^']+)',%s*'([^']+)',%s*'([^']+)'%)$") 396 | local id = string.match(children, "^([^,]+)") 397 | local subreddit = string.match(html, 'data%-subreddit="([^"]+)"') 398 | local post_data = 399 | "link_id=" .. link_id .. 400 | "&sort=" .. sort .. 401 | "&children=" .. string.gsub(children, ",", "%%2C") .. 402 | "&id=t1_" .. id .. 403 | "&limit_children=" .. limit_children .. 404 | "&r=" .. subreddit .. 405 | "&renderstyle=html" 406 | if not requested_children[post_data] then 407 | requested_children[post_data] = true 408 | print("posting for modechildren with", post_data) 409 | table.insert(urls, { 410 | url="https://old.reddit.com/api/morechildren", 411 | post_data=post_data, 412 | headers={ 413 | ["Content-Type"]="application/x-www-form-urlencoded; charset=UTF-8", 414 | ["X-Requested-With"]="XMLHttpRequest" 415 | } 416 | }) 417 | end 418 | end 419 | --[[elseif string.match(url, "^https?://www%.reddit%.com/r/[^/]+/comments/[^/]") 420 | or string.match(url, "^https?://www%.reddit%.com/user/[^/]+/comments/[^/]") 421 | or string.match(url, "^https?://www%.reddit%.com/comments/[^/]") 422 | or string.match(url, "^https?://gateway%.reddit%.com/desktopapi/v1/morecomments/t3_[^%?]") then 423 | local comments_data = nil 424 | if string.match(url, "^https?://www%.reddit%.com/") then 425 | comments_data = string.match(html, '%s*window%.___r%s*=%s*({.+});%s*%s*