├── .dockerignore
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── data
│   └── .gitkeep
├── generate
│   ├── generate_goggle.py
│   ├── requirements.txt
│   └── top_domains.txt
├── narwhalizer.sh
├── netsec.env
├── patches
│   └── tsdb_naming.patch
├── refresh
│   ├── obtain_refresh_token.py
│   └── requirements.txt
├── scripts
│   ├── bot.py
│   ├── install.sh
│   ├── patch.sh
│   └── refresh.sh
└── template.env
/.dockerignore:
--------------------------------------------------------------------------------
1 | **/.git
2 | **/data
3 | *.env
4 | Dockerfile
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM alpine:3.16
2 | RUN apk add python3 git patch bash
3 | WORKDIR /app
4 | COPY . .
5 | RUN ./scripts/install.sh
6 | CMD ./narwhalizer.sh
7 |
8 | RUN addgroup app
9 | RUN adduser -D -G app -h /app app
10 | RUN chown -R app:app /app
11 | USER app
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright © 2022 Forces Unseen
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4 |
5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6 |
7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | build:
2 | 	docker build . -t narwhalizer
3 |
4 | run:
5 | 	docker run -it -v ${PWD}/data:/app/data --env-file $(env) narwhalizer
6 |
7 | dev:
8 | 	docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file $(env) narwhalizer
9 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # narwhalizer
2 |
3 |
4 |
5 |
6 |
7 | Goggles are a way to alter the ranking in a search engine. Brave is behind this technology and you should [read more about it](https://github.com/brave/goggles-quickstart). This tool lets you generate Goggles using your favorite subreddit(s). Forces Unseen uses this to generate [netsec-goggle](https://github.com/forcesunseen/netsec-goggle).
8 |
9 | You can also read more about this on Forces Unseen's blog: [Make Search Engines Great Again!](https://blog.forcesunseen.com/make-search-engines-great-again)
10 |
11 | ## Basic Usage
12 |
13 | 1. Build the container locally using the following command.
14 |
15 | ```
16 | docker build . -t narwhalizer
17 | ```
18 |
19 | 2. Create a `.env` file. Refer to `netsec.env` for what it needs to contain. The [Environment Variables](#environment-variables) section has more details about what each variable does. For a first run you can use the `netsec.env` file itself and just replace the values that say `REPLACE`. Refer to the [Reddit API Credentials](#reddit-api-credentials) section for obtaining credentials for Reddit.
20 |
21 | 3. Run the container using your `.env` file. The following command shows `netsec.env` being used.
22 |
23 | ```
24 | docker run -it -v ${PWD}/data:/app/data --env-file netsec.env narwhalizer
25 | ```
26 |
27 | The first run takes longer because timesearch first has to build a database for the subreddit; later updates are much quicker. After execution completes, the `data` directory will contain the subreddit database and the generated Goggle file, named according to `GOGGLE_FILENAME`. Every time the container runs, timesearch checks for updates and the Goggle file is regenerated. If you terminate the container or your computer crashes, timesearch will continue building the database from where it left off.
28 |
29 |
30 | ## Reddit API Credentials
31 |
32 | Timesearch requires Reddit API credentials and other identifiers. These steps only have to be completed one time.
33 |
34 | 1. Go to https://old.reddit.com/prefs/apps/
35 | 2. Create an application with the `script` type and the redirect URL set to `http://localhost:8081`.
36 | 3. Copy the application ID and secret that were generated. (The application ID is near the top, under the name of your application.)
37 | 4. Run the following command to obtain a refresh token. Replace the variables with the application ID and secret generated in the previous step.
38 |
39 | ```
40 | docker run -it -p 127.0.0.1:8081:8081 narwhalizer ./scripts/refresh.sh --app-id --app-secret
41 | ```
42 |
43 | After this you should be able to fill out the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables within your `.env` file.
44 |
45 | *Note: The script requests the following scopes: `read` and `wikiread`*
46 |
47 | ## Environment Variables
48 |
49 | Everything is controlled through environment variables. Reference `netsec.env` for a real example and/or use `template.env` if you want a more bare-bones starting point.
50 |
51 | ### Timesearch
52 |
53 | `USERAGENT` should be set to a description of your API usage. In this case we just left it as `narwhalizer`.
54 |
55 | `CONTACT_INFO` should be set to an email address or Reddit username.
56 |
57 | See the [Reddit API Credentials](#reddit-api-credentials) section to obtain values for the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables.
58 |
59 | **Example:**
60 | ```
61 | USERAGENT=narwhalizer
62 | CONTACT_INFO=
63 | APP_ID=
64 | APP_SECRET=
65 | APP_REFRESH=
66 | ```
67 |
68 | ### Goggle Metadata
69 |
70 | This ends up as metadata at the top of the Goggle. More details about these parameters can be found [here](https://github.com/brave/goggles-quickstart/blob/main/getting-started.md#goggles-syntax).
71 |
72 | **Example:**
73 | ```
74 | GOGGLE_NAME=Netsec
75 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. Primarily uses submissions and scoring from /r/netsec.
76 | GOGGLE_PUBLIC=true
77 | GOGGLE_AUTHOR=Forces Unseen
78 | GOGGLE_AVATAR=#01ebae
79 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle
80 | ```
81 |
82 | ### Goggle
83 |
84 | `SUBREDDITS` takes a comma delimited list of subreddits.
85 |
86 | `GOGGLE_FILENAME` sets the output filename of the Goggle.
87 |
88 | `GOGGLE_EXTRAS` allows you to include additional instructions in the final Goggle. Use `\n` to separate each instruction.
89 |
90 | **Example:**
91 | ```
92 | SUBREDDITS=netsec
93 | GOGGLE_FILENAME=netsec.goggle
94 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2
95 | ```
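
As a rough illustration of the `\n` convention, the sketch below mirrors what `generate/generate_goggle.py` does with `GOGGLE_EXTRAS`: the literal two-character sequence `\n` is replaced with a real newline so that each instruction lands on its own line. The helper name `expand_extras` is made up for this example and does not exist in the repository.

```python
def expand_extras(raw):
    """Turn a GOGGLE_EXTRAS value into one Goggle instruction per line."""
    if raw is None:
        return []
    # Env files carry a literal backslash followed by "n", not a newline.
    return [line for line in raw.replace("\\n", "\n").split("\n") if line]


example = r"$boost=2,site=github.io\n/docs/$boost=2"
for instruction in expand_extras(example):
    print(instruction)
```

Each printed line is one instruction as it would appear in the Goggle body.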
96 |
97 | ### Algorithm
98 |
99 | `SCORE_THRESHOLD` takes an integer for the minimum score of a submission to be included.
100 |
101 | `MIN_FREQUENCY` takes an integer for the minimum frequency of a domain to be included.
102 |
103 | `MIN_EPOCH_TIME` takes an integer representing Unix time; submissions created before it are not included. Even though older submissions are left out of the generated Goggle, timesearch still includes all submissions when building the database.
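
If you only know the cutoff as a calendar date, a small helper can convert it to Unix time. This is a minimal sketch that assumes the cutoff is midnight UTC; `to_epoch` is a hypothetical name, not part of the project.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day):
    """Return the Unix time for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Only keep submissions from 2020 onwards.
print(f"MIN_EPOCH_TIME={to_epoch(2020, 1, 1)}")  # MIN_EPOCH_TIME=1577836800
```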
104 |
105 | `TOP_DOMAINS_BEHAVIOR` takes one of four options: `exclude`, `include`, `discard`, or `downrank`.
106 |
107 | * **exclude** - top domains will be removed from the list of subreddit submissions.
108 | * **include** - top domains will be left in the list of subreddit submissions.
109 | * **discard** - top domains will be removed from the list of subreddit submissions and will also be marked as discard within the Goggle.
110 | * **downrank** - top domains will be removed from the list of subreddit submissions and will also be downranked using `TOP_DOMAINS_DOWNRANK_VALUE` within the Goggle.
111 |
112 | `TOP_DOMAINS_DOWNRANK_VALUE` takes an integer for the amount to downrank. This is only used when `TOP_DOMAINS_BEHAVIOR` is set to `downrank`.
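
The four behaviors can be summarized in a short sketch. It follows the rule formats used by `generate/generate_goggle.py` (`$discard,site=...` and `$downrank=N,site=...`), but the function names `top_domain_rules` and `keeps_top_domains` are invented for this illustration.

```python
def top_domain_rules(behavior, top_domains, downrank_value):
    """Return the Goggle lines emitted for the top domains list."""
    if behavior == "discard":
        return [f"$discard,site={d}" for d in top_domains]
    if behavior == "downrank":
        return [f"$downrank={downrank_value},site={d}" for d in top_domains]
    # "exclude" and "include" add no rules of their own.
    return []


def keeps_top_domains(behavior):
    """Only "include" leaves top domains in the boosted submission list."""
    return behavior == "include"


print(top_domain_rules("downrank", ["google.com", "youtube.com"], 2))
```

With `downrank` and a value of 2, this prints one `$downrank=2,site=...` rule per top domain.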
113 |
114 | **Example:**
115 | ```
116 | SCORE_THRESHOLD=20
117 | MIN_FREQUENCY=1
118 | MIN_EPOCH_TIME=0
119 | TOP_DOMAINS_BEHAVIOR=exclude
120 | TOP_DOMAINS_DOWNRANK_VALUE=2
121 | ```
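
To see how `SCORE_THRESHOLD` and `MIN_FREQUENCY` interact, here is a simplified sketch of the ranking step. It assumes submissions are already reduced to `(domain, score)` pairs, which is a shape chosen for the example; the real script applies the score filter in its SQL query and extracts domains from URLs. `rank_domains` is a hypothetical name.

```python
from collections import defaultdict


def rank_domains(submissions, score_threshold, min_frequency):
    """Rank domains by the total score of their qualifying submissions."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for domain, score in submissions:
        # Submissions below the score threshold are ignored entirely.
        if score >= score_threshold:
            totals[domain] += score
            counts[domain] += 1
    # Drop domains that appear in too few qualifying submissions.
    kept = {d: s for d, s in totals.items() if counts[d] >= min_frequency}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


sample = [("a.example", 50), ("a.example", 30), ("b.example", 25), ("c.example", 5)]
print(rank_domains(sample, score_threshold=20, min_frequency=2))
```

Raising `min_frequency` from 1 to 2 removes `b.example` here, because only one of its submissions meets the score threshold.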
122 |
123 | Don't worry too much if there are completely irrelevant domains within your list. It most likely won't have a big impact because the rules are applied to Brave's "expanded recall set", as explained below.
124 |
125 | >The instructions defined in a Goggle are not applied to Brave Search’s entire index, but to what we call the “expanded recall set,” which in turn is a function of the query. The set of candidate URLs can be in the tens of thousands, which is often more than enough to observe a noticeable effect; however, there are no guarantees that all possible URLs are surfaced (in search terminology, we have no guarantees on recall).
126 |
127 | >Goggles do not apply to the whole Brave Search index, but to the expanded recall set which is a function of the input query. So if the target pages aren’t in the recall set, or even be in the Brave Search index, they won’t be captured by the Goggle.
128 |
129 |
130 | ## Development
131 |
132 | The most common modifications can be made through environment variables, but if you want to modify the `generate/generate_goggle.py` script you can mount the `generate` directory and run the container using the modified script. This can be done using the following command.
133 |
134 | ```
135 | docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file netsec.env narwhalizer
136 | ```
137 |
138 | ### Make
139 |
140 | Here are some make commands for you lazy ones.
141 |
142 | ```
143 | make build
144 | make run env=netsec.env
145 | make dev env=netsec.env
146 | ```
147 |
148 | ## Thank You!
149 |
150 | We owe it to [Ethan Dalool](https://github.com/voussoir) for creating [timesearch](https://github.com/voussoir/timesearch) and [Jason Baumgartner](https://github.com/pushshift) for creating [pushshift.io](https://pushshift.io/). Also, without Brave this project wouldn't even exist.
151 |
152 | Thank you so much!
153 |
--------------------------------------------------------------------------------
/data/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/forcesunseen/narwhalizer/1f307d651a0d4c6bd4c0ecf88f5649f9b0ea8563/data/.gitkeep
--------------------------------------------------------------------------------
/generate/generate_goggle.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sqlite3
3 | import tldextract
4 | from collections import defaultdict
5 | from datetime import timezone, datetime
6 |
7 | # Environment variables
8 | APPLICATION_ROOT = os.environ.get("APPLICATION_ROOT")
9 | GOGGLE_NAME = os.environ.get('GOGGLE_NAME')
10 | GOGGLE_DESCRIPTION = os.environ.get('GOGGLE_DESCRIPTION')
11 | GOGGLE_PUBLIC = os.environ.get('GOGGLE_PUBLIC')
12 | GOGGLE_AUTHOR = os.environ.get('GOGGLE_AUTHOR')
13 | GOGGLE_AVATAR = os.environ.get('GOGGLE_AVATAR')
14 | GOGGLE_HOMEPAGE = os.environ.get('GOGGLE_HOMEPAGE')
15 | GOGGLE_EXTRAS = os.environ.get('GOGGLE_EXTRAS')
16 | GOGGLE_FILENAME = os.environ.get('GOGGLE_FILENAME')
17 | SUBREDDITS = os.environ.get('SUBREDDITS').split(',')
18 | SCORE_THRESHOLD = int(os.environ.get('SCORE_THRESHOLD'))
19 | MIN_EPOCH_TIME = int(os.environ.get('MIN_EPOCH_TIME'))
20 | MIN_FREQUENCY = int(os.environ.get('MIN_FREQUENCY'))
21 | TOP_DOMAINS_BEHAVIOR = os.environ.get('TOP_DOMAINS_BEHAVIOR').lower()
22 | TOP_DOMAINS_DOWNRANK_VALUE = int(os.environ.get('TOP_DOMAINS_DOWNRANK_VALUE'))
23 |
24 |
25 | def dict_factory(cursor, row):
26 |     """Return dicts for SQLite queries.
27 |     """
28 |     d = {}
29 |     for idx, col in enumerate(cursor.description):
30 |         d[col[0]] = row[idx]
31 |     return d
32 |
33 |
34 | def header():
35 |     """Generate the Goggle metadata.
36 |     """
37 |     return (
38 |         f"! name: {GOGGLE_NAME}\n"
39 |         f"! description: {GOGGLE_DESCRIPTION}\n"
40 |         f"! public: {GOGGLE_PUBLIC}\n"
41 |         f"! author: {GOGGLE_AUTHOR}\n"
42 |         f"! avatar: {GOGGLE_AVATAR}\n"
43 |         f"! homepage: {GOGGLE_HOMEPAGE}\n"
44 |     )
45 |
46 |
47 | def extras():
48 |     """Generate any extras to include in the Goggle.
49 |     """
50 |     comment = "! Goggle extras\n"
51 |     if GOGGLE_EXTRAS is not None:
52 |         extras = GOGGLE_EXTRAS.replace("\\n", "\n")
53 |         return comment + extras + '\n'
54 |     else:
55 |         return ""
56 |
57 |
58 | def boost(domain, amt):
59 |     """Boost a site by an integer amount.
60 |     """
61 |     return f'$boost={amt},site={domain}'
62 |
63 |
64 | def downrank(domain, amt):
65 |     """Downrank a site by an integer amount.
66 |     """
67 |     return f'$downrank={amt},site={domain}'
68 |
69 |
70 | def discard(domain):
71 |     """Discard a site.
72 |     """
73 |     return f'$discard,site={domain}'
74 |
75 |
76 | def top_domains_behavior():
77 |     """Return the configured top domains behavior, or None if unrecognized.
78 |     """
79 |     # Named in lowercase so it does not shadow the TOP_DOMAINS_BEHAVIOR setting
80 |     if TOP_DOMAINS_BEHAVIOR in ['exclude', 'discard', 'include', 'downrank']:
81 |         return TOP_DOMAINS_BEHAVIOR
82 |     return None
83 |
84 |
85 | def sort_domains(submissions):
86 |     """Parse URLs and return sorted list of domains with exclusions.
87 |     """
88 |     domains = defaultdict(lambda: 0)
89 |     domains_counts = defaultdict(lambda: 0)
90 |     for item in submissions:
91 |         score = item['score']
92 |         url = item['url']
93 |         if url is None:
94 |             continue
95 |
96 |         extracted = tldextract.extract(url)
97 |
98 |         # Double check that we have a real domain
99 |         # (attribute access instead of indexing, which depends on the tldextract version)
100 |         if extracted.domain and extracted.suffix:
101 |             domain = f'{extracted.domain}.{extracted.suffix}'.lower()
102 |             # Top domains are only counted when the behavior is "include"
103 |             if top_domains_behavior() == "include" or domain not in TOP_DOMAINS:
104 |                 domains[domain] += score
105 |                 domains_counts[domain] += 1
106 |
107 |     # Remove domains that don't meet the frequency requirements
108 |     for item in domains_counts:
109 |         count = domains_counts[item]
110 |         if count < MIN_FREQUENCY:
111 |             domains.pop(item)
112 |
113 |     # Sort domains by score
114 |     sorted_domains = sorted(
115 |         domains.items(), key=lambda item: item[1], reverse=True)
116 |     return sorted_domains
117 |
118 |
119 | def generate(domains):
120 |     """Generate rankings in Goggle format.
121 |     """
122 |     with open(f'{APPLICATION_ROOT}/data/{GOGGLE_FILENAME}', 'w') as target:
123 |         target.write(header())
124 |         target.write(f"! generated: {datetime.now(timezone.utc)}\n")
125 |         target.write('\n')
126 |         target.write(extras())
127 |
128 |         entries = len(domains)
129 |
130 |         if top_domains_behavior() == "discard":
131 |             for domain in TOP_DOMAINS:
132 |                 target.write('\n')
133 |                 target.write(discard(domain))
134 |
135 |         if top_domains_behavior() == "downrank":
136 |             for domain in TOP_DOMAINS:
137 |                 target.write('\n')
138 |                 target.write(downrank(domain, TOP_DOMAINS_DOWNRANK_VALUE))
139 |
140 |         # Split up the list into thirds and assign a boost of 4, 3, or 2
141 |         for item in range(len(domains)):
142 |             domain = domains[item][0]
143 |             place = item/entries
144 |             if place <= 0.33:
145 |                 target.write('\n')
146 |                 target.write(boost(domain, 4))
147 |             elif place <= 0.66:
148 |                 target.write('\n')
149 |                 target.write(boost(domain, 3))
150 |             else:
151 |                 target.write('\n')
152 |                 target.write(boost(domain, 2))
153 |     print(f'{GOGGLE_FILENAME} generated')
154 |
155 |
156 | # Store top domains in memory
157 | with open(f'{APPLICATION_ROOT}/generate/top_domains.txt', 'r') as top_domains_file:
158 |     TOP_DOMAINS = top_domains_file.read().splitlines()
159 |
160 | # Get data from SQLite database
161 | submissions = []
162 | for item in SUBREDDITS:
163 |     if len(item) > 0:
164 |         target_subreddit = item.lower()
165 |         con = sqlite3.connect(
166 |             f'{APPLICATION_ROOT}/data/subreddits/{target_subreddit}/{target_subreddit}.db')
167 |         con.row_factory = dict_factory
168 |         cur = con.cursor()
169 |         cur.execute(
170 |             'SELECT score,url FROM submissions WHERE score >= ? AND created >= ?', (SCORE_THRESHOLD, MIN_EPOCH_TIME))
171 |         submissions.extend(cur.fetchall())
172 |         con.close()
173 |
174 | # Sort domains and generate Goggle
175 | sorted_domains = sort_domains(submissions)
176 | generate(sorted_domains)
177 |
--------------------------------------------------------------------------------
/generate/requirements.txt:
--------------------------------------------------------------------------------
1 | tldextract
2 |
--------------------------------------------------------------------------------
/generate/top_domains.txt:
--------------------------------------------------------------------------------
1 | google.com
2 | youtube.com
3 | facebook.com
4 | akamaiedge.net
5 | netflix.com
6 | microsoft.com
7 | instagram.com
8 | twitter.com
9 | gtld-servers.net
10 | baidu.com
11 | linkedin.com
12 | akamai.net
13 | wikipedia.org
14 | apple.com
15 | amazonaws.com
16 | yahoo.com
17 | cloudflare.com
18 | bilibili.com
19 | amazon.com
20 | a-msedge.net
21 | qq.com
22 | live.com
23 | akadns.net
24 | googletagmanager.com
25 | wordpress.org
26 | bing.com
27 | netflix.net
28 | github.com
29 | whatsapp.com
30 | pinterest.com
31 | reddit.com
32 | office.com
33 | youtu.be
34 | trafficmanager.net
35 | microsoftonline.com
36 | l-msedge.net
37 | azure.com
38 | windowsupdate.com
39 | vimeo.com
40 | adobe.com
41 | zoom.us
42 | doubleclick.net
43 | zhihu.com
44 | fastly.net
45 | yandex.ru
46 | mail.ru
47 | googlevideo.com
48 | wordpress.com
49 | domaincontrol.com
50 | goo.gl
51 | vk.com
52 | bit.ly
53 | gandi.net
54 | nflxso.net
55 | s-msedge.net
56 | googleusercontent.com
57 | msn.com
58 | taobao.com
59 | weibo.com
60 | aaplimg.com
61 | sharepoint.com
62 | csdn.net
63 | t.co
64 | blogspot.com
65 | mozilla.org
66 | tiktok.com
67 | google.com.hk
68 | tumblr.com
69 | cloudapp.net
70 | paypal.com
71 | edgekey.net
72 | windows.net
73 | macromedia.com
74 | webex.com
75 | nytimes.com
76 | xvideos.com
77 | nih.gov
78 | office365.com
79 | apple-dns.net
80 | 163.com
81 | spotify.com
82 | sina.com.cn
83 | intuit.com
84 | yandex.net
85 | pornhub.com
86 | google-analytics.com
87 | europa.eu
88 | flickr.com
89 | dropbox.com
90 | skype.com
91 | twitch.tv
92 | canva.com
93 | attcompute.com
94 | yahoo.co.jp
95 | medium.com
96 | stackoverflow.com
97 | imdb.com
98 | ebay.com
99 | sohu.com
100 | gravatar.com
101 | googledomains.com
102 | jd.com
103 | opera.com
104 | naver.com
105 | lencr.org
106 | fandom.com
107 | icloud.com
108 | t-msedge.net
109 | cnn.com
110 | myfritz.net
111 | soundcloud.com
112 | outlook.com
113 | cloudfront.net
114 | aliexpress.com
115 | myshopify.com
116 | tmall.com
117 | t.me
118 | gvt1.com
119 | apache.org
120 | amazon.in
121 | fbcdn.net
122 | archive.org
123 | bbc.com
124 | forbes.com
125 | zemanta.com
126 | nic.ru
127 | theguardian.com
128 | bbc.co.uk
129 | quora.com
130 | cloudflare.net
131 | omtrdc.net
132 | wellsfargo.com
133 | force.com
134 | github.io
135 | abusix.zone
136 | w3.org
137 | msedge.net
138 | chaturbate.com
139 | xhamster.com
140 | office.net
141 | forms.gle
142 | digicert.com
143 | salesforce.com
144 | douban.com
145 | etsy.com
146 | indeed.com
147 | p-cdn.us
148 | sourceforge.net
149 | linode.com
150 | google.co.in
151 | duckduckgo.com
152 | sciencedirect.com
153 | spo-msedge.net
154 | miit.gov.cn
155 | imgur.com
156 | teamviewer.com
157 | amazon.co.uk
158 | creativecommons.org
159 | who.int
160 | 1688.com
161 | booking.com
162 | discord.com
163 | googlesyndication.com
164 | weebly.com
165 | dnsowl.com
166 | issuu.com
167 | twimg.com
168 | akamaihd.net
169 | youku.com
170 | spankbang.com
171 | wixsite.com
172 | cdc.gov
173 | researchgate.net
174 | roblox.com
175 | telegram.org
176 | flipkart.com
177 | reuters.com
178 | washingtonpost.com
179 | xnxx.com
180 | wikimedia.org
181 | oracle.com
182 | amazon.co.jp
183 | dailymail.co.uk
184 | tiktokcdn.com
185 | cedexis.net
186 | mangosip.ru
187 | registrar-servers.com
188 | tradingview.com
189 | harvard.edu
190 | bloomberg.com
191 | wsj.com
192 | blogger.com
193 | dnsmadeeasy.com
194 | hp.com
195 | tinyurl.com
196 | googleapis.com
197 | wix.com
198 | googleadservices.com
199 | worldfcdn.com
200 | alibaba.com
201 | akamaized.net
202 | ytimg.com
203 | mit.edu
204 | meraki.com
205 | kaspersky.com
206 | businessinsider.com
207 | aliyun.com
208 | shopify.com
209 | wa.me
210 | hwcdn.net
211 | go.com
212 | gvt2.com
213 | alipay.com
214 | wp.com
215 | dell.com
216 | samsung.com
217 | youtube-nocookie.com
218 | cisco.com
219 | epicgames.com
220 | cdninstagram.com
221 | ok.ru
222 | wiley.com
223 | cnbc.com
224 | freepik.com
225 | google.cn
226 | windows.com
227 | slideshare.net
228 | hao123.com
229 | ibm.com
230 | amazon.de
231 | azurewebsites.net
232 | demdex.net
233 | sogou.com
234 | php.net
235 | pixiv.net
236 | behance.net
237 | 360.cn
238 | slack.com
239 | edgesuite.net
240 | godaddy.com
241 | so.com
242 | nature.com
243 | coinmarketcap.com
244 | mediafire.com
245 | springer.com
246 | google.de
247 | walmart.com
248 | list-manage.com
249 | espn.com
250 | zendesk.com
251 | tencent.com
252 | aol.com
253 | adnxs.com
254 | grammarly.com
255 | binance.com
256 | stanford.edu
257 | fc2.com
258 | cnblogs.com
259 | weather.com
260 | un.org
261 | comcast.net
262 | o365filtering.com
263 | nasa.gov
264 | deepl.com
265 | elasticbeanstalk.com
266 | cnki.net
267 | gov.uk
268 | gnu.org
269 | app-measurement.com
270 | mega.nz
271 | doi.org
272 | db.com
273 | fiverr.com
274 | line.me
275 | att.net
276 | trello.com
277 | unsplash.com
278 | w3schools.com
279 | douyu.com
280 | salesforceliveagent.com
281 | nginx.org
282 | marriott.com
283 | name-services.com
284 | fincoad.com
285 | usatoday.com
286 | tiktokv.com
287 | azureedge.net
288 | cnet.com
289 | akam.net
290 | herokudns.com
291 | wsdvs.com
292 | ilovepdf.com
293 | dailymotion.com
294 | iqiyi.com
295 | hubspot.com
296 | google.co.uk
297 | sberdevices.ru
298 | stripe.com
299 | npr.org
300 | xcal.tv
301 | yelp.com
302 | telegraph.co.uk
303 | rakuten.co.jp
304 | linktr.ee
305 | eventbrite.com
306 | nginx.com
307 | steampowered.com
308 | surveymonkey.com
309 | rambler.ru
310 | cdngslb.com
311 | google.co.jp
312 | indiatimes.com
313 | instructure.com
314 | 2mdn.net
315 | shifen.com
316 | wetransfer.com
317 | mcafee.com
318 | amazon-adsystem.com
319 | goodreads.com
320 | pubmatic.com
321 | time.com
322 | huya.com
323 | deviantart.com
324 | casalemedia.com
325 | udemy.com
326 | cpanel.net
327 | rackspace.net
328 | ca.gov
329 | realsrv.com
330 | ampproject.org
331 | themeforest.net
332 | patreon.com
333 | snapchat.com
334 | ring.com
335 | tripadvisor.com
336 | userapi.com
337 | a2z.com
338 | google.fr
339 | criteo.com
340 | scorecardresearch.com
341 | gcdn.co
342 | foxnews.com
343 | zillow.com
344 | scribd.com
345 | avito.ru
346 | ted.com
347 | mailchimp.com
348 | bitly.com
349 | reg.ru
350 | cdn77.org
351 | gstatic.com
352 | healthline.com
353 | jomodns.com
354 | jianshu.com
355 | gosuslugi.ru
356 | stuvamac.com
357 | wired.com
358 | addtoany.com
359 | pki.goog
360 | manitu.net
361 | google.ru
362 | unity3d.com
363 | whatsapp.net
364 | independent.co.uk
365 | wildberries.ru
366 | huawei.com
367 | webmd.com
368 | pixabay.com
369 | incapdns.net
370 | nstld.com
371 | myspace.com
372 | livejournal.com
373 | ggpht.com
374 | shutterstock.com
375 | aparat.com
376 | addthis.com
377 | statista.com
378 | berkeley.edu
379 | stackexchange.com
380 | facebook.net
381 | xiaomi.com
382 | mysql.com
383 | digikala.com
384 | squarespace.com
385 | ft.com
386 | amazon.ca
387 | cpanel.com
388 | douyin.com
389 | google.ca
390 | craigslist.org
391 | youdao.com
392 | adobe.io
393 | okta.com
394 | intel.com
395 | g.page
396 | techcrunch.com
397 | huffingtonpost.com
398 | hupu.com
399 | latimes.com
400 | 3dmgame.com
401 | loc.gov
402 | fb.com
403 | digitalocean.com
404 | chase.com
405 | amzn.to
406 | hicloud.com
407 | f5silverline.com
408 | consultant.ru
409 | daum.net
410 | onlyfans.com
411 | appsflyer.com
412 | opendns.com
413 | savefrom.net
414 | hichina.com
415 | free.fr
416 | theverge.com
417 | alicdn.com
418 | aliyuncs.com
419 | atlassian.net
420 | taboola.com
421 | goskope.com
422 | cambridge.org
423 | pexels.com
424 | rzone.de
425 | f2pool.com
426 | ohthree.com
427 | speedtest.net
428 | amazonvideo.com
429 | kickstarter.com
430 | googletagservices.com
431 | investopedia.com
432 | rbxcdn.com
433 | tandfonline.com
434 | alibabadns.com
435 | b-msedge.net
436 | google.es
437 | beian.gov.cn
438 | ikea.com
439 | uol.com.br
440 | washington.edu
441 | roche.com
442 | huffpost.com
443 | rackspace.com
444 | cornell.edu
445 | debian.org
446 | giphy.com
447 | openx.net
448 | ozon.ru
449 | amazon.it
450 | hotstar.com
451 | prnewswire.com
452 | apple.news
453 | fedex.com
454 | upwork.com
455 | state.gov
456 | trustpilot.com
457 | feishu.cn
458 | att.com
459 | zoho.com
460 | homedepot.com
461 | oup.com
462 | safebrowsing.apple
463 | redd.it
464 | azure-dns.com
465 | saasprotection.com
466 | nicovideo.jp
467 | tistory.com
468 | britannica.com
469 | cbsnews.com
470 | no-ip.com
471 | android.com
472 | theatlantic.com
473 | nationalgeographic.com
474 | fda.gov
475 | ietf.org
476 | business.site
477 | hostgator.com
478 | yandex.com
479 | galaxydata.ru
480 | nbcnews.com
481 | lenovo.com.cn
482 | bankofamerica.com
483 | bidswitch.net
484 | google.com.tw
485 | dcinside.com
486 | cloudns.net
487 | eset.com
488 | mi.com
489 | buzzfeed.com
490 | sagepub.com
491 | maricopa.gov
492 | amazon.fr
493 | pinimg.com
494 | nypost.com
495 | mega.co.nz
496 | worldnic.com
497 | sentry.io
498 | cctv.com
499 | steamcommunity.com
500 | wikihow.com
501 | academia.edu
502 | root-servers.net
503 | sitemaps.org
504 | bandcamp.com
505 | name.com
506 | adriver.ru
507 | rlcdn.com
508 | usnews.com
509 | globo.com
510 | msftconnecttest.com
511 | hotjar.com
512 | dns.com
513 | primevideo.com
514 | ali213.net
515 | investing.com
516 | marketwatch.com
517 | dribbble.com
518 | bluehost.com
519 | aboutads.info
520 | mailchi.mp
521 | yahoodns.net
522 | disqus.com
523 | moatads.com
524 | rt.com
525 | usda.gov
526 | gmail.com
527 | box.com
528 | miui.com
529 | princeton.edu
530 | avast.com
531 | jimdo.com
532 | arnebrachhold.de
533 | mayoclinic.org
534 | hbr.org
535 | messenger.com
536 | nga.cn
537 | notion.so
538 | vkontakte.ru
539 | amazon.es
540 | rubiconproject.com
541 | coursera.org
542 | forter.com
543 | marca.com
544 | hugedomains.com
545 | statcounter.com
546 | herokuapp.com
547 | ea.com
548 | tbcache.com
549 | nr-data.net
550 | change.org
551 | igamecj.com
552 | mozilla.com
553 | nintendo.net
554 | merriam-webster.com
555 | nike.com
556 | ups.com
557 | unesco.org
558 | noaa.gov
559 | rbc.ru
560 | launchpad.net
561 | dmm.co.jp
562 | msftncsi.com
563 | trendmicro.com
564 | oath.cloud
565 | livedoor.jp
566 | stripchat.com
567 | akismet.com
568 | pbs.org
569 | nvidia.com
570 | ntp.org
571 | airbnb.com
572 | spaceweb.pro
573 | rackspaceclouddb.com
574 | smzdm.com
575 | usps.com
576 | arcgis.com
577 | bet365.com
578 | eastmoney.com
579 | adsrvr.org
580 | economist.com
581 | msn.cn
582 | umich.edu
583 | whitehouse.gov
584 | ngenix.net
585 | outbrain.com
586 | elsevier.com
587 | ubuntu.com
588 | chaoxing.com
589 | autodesk.com
590 | spcsdns.net
591 | e-msedge.net
592 | networkadvertising.org
593 | columbia.edu
594 | envato.com
595 | calendly.com
596 | verisign.com
597 | google.com.br
598 | bongacams.com
599 | businesswire.com
600 | epa.gov
601 | target.com
602 | ieee.org
603 | xinhuanet.com
604 | telegram.me
605 | atlassian.com
606 | varzesh3.com
607 | allaboutcookies.org
608 | ifeng.com
609 | accuweather.com
610 | y2mate.com
611 | northgrum.com
612 | wpengine.com
613 | irs.gov
614 | sorbs.net
615 | gamersky.com
616 | netease.com
617 | nessus.org
618 | exacttarget.com
619 | capitalone.com
620 | ptinews.com
621 | xiaohongshu.com
622 | geeksforgeeks.org
623 | redhat.com
624 | google.com.mx
625 | digg.com
626 | typepad.com
627 | 3lift.com
628 | roku.com
629 | ixigua.com
630 | namu.wiki
631 | bestbuy.com
632 | worldbank.org
633 | toutiao.com
634 | istockphoto.com
635 | ria.ru
636 | google.com.sg
637 | mts.ru
638 | hulu.com
639 | myqcloud.com
640 | fbs1-t-msedge.net
641 | 9gag.com
642 | media.net
643 | appspot.com
644 | shein.com
645 | b-cdn.net
646 | timeweb.ru
647 | typeform.com
648 | abc.net.au
649 | fortinet.net
650 | canada.ca
651 | rackcdn.com
652 | quizlet.com
653 | qualtrics.com
654 | about.com
655 | newrelic.com
656 | opensea.io
657 | netangels.ru
658 | yts.mx
659 | deloitte.com
660 | sciencemag.org
661 | atomile.com
662 | fb.me
663 | psu.edu
664 | arxiv.org
665 | gitlab.com
666 | domainmarket.com
667 | americanexpress.com
668 | media-amazon.com
669 | rncdn7.com
670 | visualstudio.com
671 | beeline.ru
672 | ucla.edu
673 | skyeng.link
674 | cbc.ca
675 | yale.edu
676 | tremorhub.com
677 | chinamobile.com
678 | reverso.net
679 | apigee.net
680 | dnspod.net
681 | ivi.ru
682 | dynect.net
683 | aiv-cdn.net
684 | ox.ac.uk
685 | vice.com
686 | dw.com
687 | lum-superproxy.io
688 | dreamhost.com
689 | iso.org
690 | upenn.edu
691 | sitescout.com
692 | constantcontact.com
693 | google.com.tr
694 | ameblo.jp
695 | cricbuzz.com
696 | figma.com
697 | zdnet.com
698 | imgsmail.ru
699 | tinkoff.ru
700 | nhentai.net
701 | c-msedge.net
702 | mashable.com
703 | smallpdf.com
704 | shaparak.ir
705 | hdfcbank.com
706 | psychologytoday.com
707 | aliexpress.ru
708 | vox.com
709 | rosintel.com
710 | jotform.com
711 | sophos.com
712 | fincoec.com
713 | eepurl.com
714 | shopee.tw
715 | lenovo.com
716 | mzstatic.com
717 | google.com.au
718 | nist.gov
719 | gofundme.com
720 | mathtag.com
721 | sciencedaily.com
722 | markmonitor.com
723 | checkpoint.com
724 | chsi.com.cn
725 | evernote.com
726 | aliyundrive.com
727 | playstation.com
728 | gitee.com
729 | bmj.com
730 | runoob.com
731 | dhl.com
732 | jdadelivers.com
733 | wisc.edu
734 | wiktionary.org
735 | nokia.com
736 | cogentco.com
737 | rayjump.com
738 | google.co.th
739 | ortb.net
740 | theconversation.com
741 | feedburner.com
742 | google.it
743 | getpocket.com
744 | newyorker.com
745 | ttlivecdn.com
746 | zerodha.com
747 | zhibo8.cc
748 | discord.gg
749 | apnews.com
750 | plos.org
751 | sharethrough.com
752 | uber.com
753 | elpais.com
754 | optimizely.com
755 | bitrix24.ru
756 | gearbest.com
757 | fortune.com
758 | trendyol.com
759 | biomedcentral.com
760 | bluekai.com
761 | wattpad.com
762 | ps.kz
763 | quantserve.com
764 | animeflv.net
765 | genius.com
766 | icourse163.org
767 | python.org
768 | medicalnewstoday.com
769 | ovscdns.com
770 | doubleverify.com
771 | paloaltonetworks.com
772 | docker.com
773 | weforum.org
774 | nba.com
775 | ertelecom.ru
776 | azurefd.net
777 | nest.com
778 | lazada.sg
779 | 1337x.to
780 | yimg.com
781 | alidns.com
782 | fastcompany.com
783 | mirtesen.ru
784 | mdpi.com
785 | ign.com
786 | zippyshare.com
787 | zhihuishu.com
788 | uci.edu
789 | mlb.com
790 | fontawesome.com
791 | jquery.com
792 | rtcfront.net
793 | va.gov
794 | engadget.com
795 | ovh.net
796 | advertising.com
797 | sfx.ms
798 | verizon.com
799 | 52pojie.cn
800 | umn.edu
801 | newsweek.com
802 | tds.net
803 | gihc.net
804 | ebay.co.uk
805 | kakao.com
806 | meetup.com
807 | avito.st
808 | thesun.co.uk
809 | as.com
810 | agkn.com
811 | vtb.ru
812 | nic.uk
813 | autohome.com.cn
814 | frontiersin.org
815 | sxyprn.com
816 | stumbleupon.com
817 | duolingo.com
818 | llnwi.net
819 | oreilly.com
820 | attn.tv
821 | gizmodo.com
822 | mckinsey.com
823 | spiegel.de
824 | jstor.org
825 | adsafeprotected.com
826 | apa.org
827 | online-metrix.net
828 | eporner.com
829 | inc.com
830 | crashlytics.com
831 | playstation.net
832 | pconline.com.cn
833 | expedia.com
834 | hilton.com
835 | comcast.com
836 | mirror.co.uk
837 | vmware.com
838 | swrve.com
839 | mercadolibre.com.mx
840 | example.com
841 | asus.com
842 | secureserver.net
843 | gismeteo.ru
844 | xfinity.com
845 | webs.com
846 | timeanddate.com
847 | discordapp.com
848 | gamespot.com
849 | myanimelist.net
850 | thepaper.cn
851 | google.com.sa
852 | dyndns.org
853 | cam.ac.uk
854 | zol.com.cn
855 | youporn.com
856 | sun.com
857 | jhu.edu
858 | corelux.net
859 | sahibinden.com
860 | wayfair.com
861 | criteo.net
862 | e-hentai.org
863 | emxdgt.com
864 | wp.pl
865 | disneyplus.com
866 | crunchyroll.com
867 | duckdns.org
868 | asana.com
869 | sberbank.ru
870 | tapad.com
871 | pikiran-rakyat.com
872 | people.com.cn
873 | everesttech.net
874 | smartadserver.com
875 | entrepreneur.com
876 | mwbsys.com
877 | utexas.edu
878 | cmu.edu
879 | fcuat.com
880 | ripn.net
881 | bidr.io
882 | oecd.org
883 | tripod.com
884 | youronlinechoices.com
885 | plesk.com
886 | photobucket.com
887 | huaban.com
888 | xing.com
889 | remove.bg
890 | namebrightdns.com
891 | realtor.com
892 | cdn-apple.com
893 | jb51.net
894 | hitomi.la
895 | hinet.net
896 | bbb.org
897 | coupang.com
898 | softonic.com
899 | acs.org
900 | exelator.com
901 | blog.jp
902 | glassdoor.com
903 | chess.com
904 | geocities.com
905 | onlinesbi.com
906 | imrworldwide.com
907 | teads.tv
908 | samsungcloudsolution.com
909 | tokopedia.com
910 | kunlunsl.com
911 | pewresearch.org
912 | aljazeera.com
913 | merchantlink.com
914 | tencent-cloud.net
915 | footprint.net
916 | xhamsterlive.com
917 | licdn.com
918 | dropboxusercontent.com
919 | icloud-content.com
920 | readmanganato.com
921 | canonical.com
922 | openstreetmap.org
923 | gotowebinar.com
924 | scientificamerican.com
925 | usgovcloudapi.net
926 | ed.gov
927 | ny.gov
928 | kaspersky-labs.com
929 | blackboard.com
930 | sectigo.com
931 | uchicago.edu
932 | sfgate.com
933 | shopee.co.id
934 | xueqiu.com
935 | nsone.net
936 | sky.com
937 | crwdcntrl.net
938 | substack.com
939 | nps.gov
940 | lowes.com
941 | hbo.com
942 | mercadolibre.com.ar
943 | adp.com
944 | elegantthemes.com
945 | dnsv1.com
946 | hootsuite.com
947 | amazon.com.mx
948 | 360doc.com
949 | samsungcloud.com
950 | ilive.cn
951 | guardian.co.uk
952 | 33across.com
953 | fmkorea.com
954 | xbox.com
955 | mihoyo.com
956 | wbx2.com
957 | chegg.com
958 | pcmag.com
959 | ys7.com
960 | lenta.ru
961 | fao.org
962 | applovin.com
963 | ecdns.net
964 | t-mobile.com
965 | fidelity.com
966 | gartner.com
967 | webrootcloudav.com
968 | immedia-semi.com
969 | getbootstrap.com
970 | detik.com
971 | allegro.pl
972 | weibo.cn
973 | qidian.com
974 | purdue.edu
975 | google.co.kr
976 | pnas.org
977 | ftc.gov
978 | soso.com
979 | braze.com
980 | indiegogo.com
981 | ucsd.edu
982 | news.com.au
983 | quillbot.com
984 | nyu.edu
985 | slate.com
986 | indiamart.com
987 | arstechnica.com
988 | onet.pl
989 | emailvision.net
990 | 360yield.com
991 | ucoz.ru
992 | thelancet.com
993 | cdn20.com
994 | op.gg
995 | bizjournals.com
996 | anchor.fm
997 | tribunnews.com
998 | google.com.ar
999 | branch.io
1000 | revopush.com
1001 |
--------------------------------------------------------------------------------
/narwhalizer.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | # This is where we have everything set up in the container
4 | export APPLICATION_ROOT=/app
5 |
6 | # Get ready
7 | source "$APPLICATION_ROOT"/venv/bin/activate
8 | cd "$APPLICATION_ROOT"/data
9 |
10 | # Update database for each subreddit using timesearch
11 | IFS=',' read -ra TARGET_SUBREDDITS <<< "$SUBREDDITS"
12 | for SUBREDDIT in "${TARGET_SUBREDDITS[@]}"; do
13 |     echo "Running timesearch for subreddit: $SUBREDDIT"
14 |     python "$APPLICATION_ROOT"/timesearch/timesearch.py get_submissions -r "$SUBREDDIT"
15 | done
16 |
17 | # Generate Goggle
18 | python "$APPLICATION_ROOT"/generate/generate_goggle.py
19 |
--------------------------------------------------------------------------------
/netsec.env:
--------------------------------------------------------------------------------
1 | # Timesearch
2 | USERAGENT=narwhalizer
3 | CONTACT_INFO=
4 | APP_ID=
5 | APP_SECRET=
6 | APP_REFRESH=
7 |
8 | # Goggle Metadata
9 | GOGGLE_NAME=Netsec
10 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. Primarily uses submissions and scoring from /r/netsec.
11 | GOGGLE_PUBLIC=true
12 | GOGGLE_AUTHOR=Forces Unseen
13 | GOGGLE_AVATAR=#01ebae
14 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle
15 |
16 | # Goggle
17 | SUBREDDITS=netsec
18 | GOGGLE_FILENAME=netsec.goggle
19 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2
20 |
21 | # Algorithm
22 | SCORE_THRESHOLD=20
23 | MIN_FREQUENCY=1
24 | MIN_EPOCH_TIME=0
25 | TOP_DOMAINS_BEHAVIOR=exclude
26 | TOP_DOMAINS_DOWNRANK_VALUE=2
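
Note on GOGGLE_EXTRAS: the value above packs several Goggle rules into one line, separated by the two literal characters `\n`. generate/generate_goggle.py is not shown in this part of the listing, so how it actually expands the value is an assumption. A minimal sketch of one way to do it, using a hypothetical helper `expand_extras`:

```python
def expand_extras(raw):
    """Split a GOGGLE_EXTRAS-style value into one rule per list item.

    Assumption: rules are separated by a literal backslash followed by 'n',
    as in netsec.env, not by a real newline character.
    """
    return [rule for rule in raw.split("\\n") if rule]


# Shortened sample in the same style as the netsec.env value.
sample = r"$boost=2,site=github.io\n/blog/$boost=2\n/docs/$boost=2"
for rule in expand_extras(sample):
    print(rule)
```

An empty GOGGLE_EXTRAS, as in template.env, yields an empty list, so nothing extra would be appended.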
--------------------------------------------------------------------------------
/patches/tsdb_naming.patch:
--------------------------------------------------------------------------------
1 | 20,24c20,24
2 | < '.\\{name}.db',
3 | < '.\\subreddits\\{name}\\{name}.db',
4 | < '.\\{name}\\{name}.db',
5 | < '.\\databases\\{name}.db',
6 | < '.\\subreddits\\{name}\\{name}.db',
7 | ---
8 | > '{name}.db',
9 | > 'subreddits/{name}/{name}.db',
10 | > '{name}/{name}.db',
11 | > 'databases/{name}.db',
12 | > 'subreddits/{name}/{name}.db',
13 | 27,31c27,31
14 | < '.\\@{name}.db',
15 | < '.\\users\\@{name}\\@{name}.db',
16 | < '.\\@{name}\\@{name}.db',
17 | < '.\\databases\\@{name}.db',
18 | < '.\\users\\@{name}\\@{name}.db',
19 | ---
20 | > '@{name}.db',
21 | > 'users/@{name}/@{name}.db',
22 | > '@{name}/@{name}.db',
23 | > 'databases/@{name}.db',
24 | > 'users/@{name}/@{name}.db',
25 |
--------------------------------------------------------------------------------
/refresh/obtain_refresh_token.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | '''
3 | Copyright (c) 2016, Bryce Boe
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 |
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 | list of conditions and the following disclaimer.
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 | this list of conditions and the following disclaimer in the documentation
13 | and/or other materials provided with the distribution.
14 |
15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
18 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
19 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
20 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
21 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
22 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
23 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
24 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 | '''
26 | import random
27 | import socket
28 | import sys
29 |
30 | import praw
31 |
32 |
33 | def main():
34 |     scopes = ["read", "wikiread"]
35 |
36 |     reddit = praw.Reddit(
37 |         redirect_uri="http://localhost:8081",
38 |         user_agent="narwhalizer",
39 |     )
40 |     state = str(random.randint(0, 65000))
41 |     url = reddit.auth.url(duration="permanent", scopes=scopes, state=state)
42 |     print(f"Now open this url in your browser: {url}")
43 |
44 |     client = receive_connection()
45 |     data = client.recv(1024).decode("utf-8")
46 |     param_tokens = data.split(" ", 2)[1].split("?", 1)[1].split("&")
47 |     params = {
48 |         key: value for (key, value) in [token.split("=", 1) for token in param_tokens]
49 |     }
50 |
51 |     if state != params["state"]:
52 |         send_message(
53 |             client,
54 |             f"State mismatch. Expected: {state} Received: {params['state']}",
55 |         )
56 |         return 1
57 |     elif "error" in params:
58 |         send_message(client, params["error"])
59 |         return 1
60 |
61 |     print(params["code"])
62 |     refresh_token = reddit.auth.authorize(params["code"])
63 |     send_message(client, f"Refresh token: {refresh_token}")
64 |     return 0
65 |
66 |
67 | def receive_connection():
68 |     """Wait for and then return a connected socket.
69 |
70 |     Listens on TCP port 8081 and waits for a single client.
71 |
72 |     """
73 |     server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
74 |     server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
75 |     server.bind(("0.0.0.0", 8081))
76 |     server.listen(1)
77 |     client = server.accept()[0]
78 |     server.close()
79 |     return client
80 |
81 |
82 | def send_message(client, message):
83 |     """Send message to client and close the connection."""
84 |     print(message)
85 |     client.send(f"HTTP/1.1 200 OK\r\n\r\n{message}".encode("utf-8"))
86 |     client.close()
87 |
88 |
89 | if __name__ == "__main__":
90 |     sys.exit(main())
91 |
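
Note on parsing the redirect: main() above pulls the query parameters out of the raw HTTP request by splitting strings by hand. As a sketch only, the same step could lean on the standard library's `urllib.parse`, which also percent-decodes values. The helper name `parse_redirect_params` is hypothetical and is not part of the script above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_redirect_params(raw_request):
    """Return the query parameters from a raw HTTP GET request as a dict.

    Only the first value of each parameter is kept, which is enough for
    the single-valued `state`, `code` and `error` parameters read in main().
    """
    request_line = raw_request.split("\r\n", 1)[0]  # e.g. "GET /?a=b HTTP/1.1"
    target = request_line.split(" ", 2)[1]          # e.g. "/?a=b"
    query = urlsplit(target).query                  # e.g. "a=b"
    return {key: values[0] for key, values in parse_qs(query).items()}


request = "GET /?state=123&code=abc%3Ddef HTTP/1.1\r\nHost: localhost:8081\r\n\r\n"
print(parse_redirect_params(request))
```

Like the hand-rolled version, this raises an exception on a request line that has no target, so a caller would still want to handle malformed input.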
--------------------------------------------------------------------------------
/refresh/requirements.txt:
--------------------------------------------------------------------------------
1 | praw
2 |
--------------------------------------------------------------------------------
/scripts/bot.py:
--------------------------------------------------------------------------------
1 | '''
2 | BSD 3-Clause License
3 |
4 | Copyright (c) 2022, Ethan Dalool aka voussoir
5 | All rights reserved.
6 |
7 | Redistribution and use in source and binary forms, with or without
8 | modification, are permitted provided that the following conditions are met:
9 |
10 | 1. Redistributions of source code must retain the above copyright notice, this
11 | list of conditions and the following disclaimer.
12 |
13 | 2. Redistributions in binary form must reproduce the above copyright notice,
14 | this list of conditions and the following disclaimer in the documentation
15 | and/or other materials provided with the distribution.
16 |
17 | 3. Neither the name of the copyright holder nor the names of its
18 | contributors may be used to endorse or promote products derived from
19 | this software without specific prior written permission.
20 |
21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
22 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
23 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
24 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
25 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
26 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
27 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
28 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
29 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
31 | '''
32 | import os
33 |
34 | '''
35 | bot.py template for PRAW4
36 |
37 | This file will be imported by all bots, and provides a standard way to log in.
38 |
39 | You should never place this file in a git repository or any place where it will
40 | get shared.
41 |
42 | The requirements for this file are:
43 |
44 | 1 A function `anonymous` with no arguments, which returns a `praw.Reddit`
45 | instance that has a Useragent but is otherwise anonymous / unauthenticated.
46 | This will be used in bots that need to make requests but don't need any
47 | permissions.
48 |
49 | 2 A function `login` with optional parameter `r`, which returns an
50 | authenticated Reddit instance.
51 | If `r` is provided, authenticate it.
52 | If not, create one using `anonymous` and authenticate that.
53 | Either way, return the instance when finished.
54 |
55 | The exact workings of these functions, and the existence of any other variables
56 | and functions are up to you.
57 |
58 | I suggest placing this file in a private directory and adding that directory to
59 | your `PYTHONPATH` environment variable. This makes it importable from anywhere.
60 |
61 | However, you may place it in your default Python library. An easy way to find
62 | this is by importing a standard library module and checking its location:
63 | >>> import os
64 | >>> os
65 |
66 |
67 | But placing the file in the standard library means you will have to copy it over
68 | when you upgrade Python.
69 |
70 | If you need multiple separate bots, I would suggest creating copies of this file
71 | with different names, and then using `import specialbot as bot` within the
72 | application, so that the rest of the interface can stay the same.
73 | '''
74 |
75 | # The USERAGENT is a short description of why you're using reddit's API.
76 | # You can make it as simple as "/u/myusername's bot for /r/mysubreddit".
77 | # It should include your username so that reddit can contact you if there
78 | # is a problem.
79 | USERAGENT = os.environ.get('USERAGENT')
80 |
81 | # CONTACT_INFO can be your reddit username, or your email address, or any other
82 | # means of contacting you. This is used for some bot programs but not all.
83 | CONTACT_INFO = os.environ.get('CONTACT_INFO')
84 |
85 | # It's time to get your OAuth credentials.
86 | # 1. Go to https://old.reddit.com/prefs/apps
87 | # 2. Click create a new app
88 | # 3. Give it any name you want
89 | # 4. Choose "script" type
90 | # 5. Description and About URI can be blank
91 | # 6. Put "http://localhost:8081" as the Redirect URI (this must match the
92 | # redirect_uri used by refresh/obtain_refresh_token.py)
93 | # 7. Now that you have created your app, write down the app ID (which appears
94 | # under its name), the secret, and the URI in the variables below:
95 | APP_ID = os.environ.get('APP_ID')
96 | APP_SECRET = os.environ.get('APP_SECRET')
97 | APP_URI = os.environ.get('APP_URI')
98 | # 8. Go to https://praw.readthedocs.io/en/latest/tutorials/refresh_token.html#obtaining-refresh-tokens
99 | # 9. Copy that script and save it to a .py file on your computer
100 | # 10. The instructions at the top of the script tell you to run two "EXPORT"
101 | # commands before running the script. This only works on Unix. If you are on
102 | # Windows, or simply don't want to bother with environment variables, ignore
103 | # that part of the instructions and instead add `client_id='XXXX'` and
104 | # `client_secret='XXXX'` into the praw.Reddit constructor that you see on
105 | # line 40 of that script. When I say XXXX I mean the values you just wrote
106 | # down.
107 | # 11. Run the script on your command line `python obtain_refresh_token.py`
108 | # 12. Write down the refresh token that it gives you:
109 | APP_REFRESH = os.environ.get('APP_REFRESH')
110 |
111 | ################################################################################
112 |
113 | import praw
114 |
115 | def anonymous():
116 |     r = praw.Reddit(
117 |         user_agent=USERAGENT,
118 |         client_id=APP_ID,
119 |         client_secret=APP_SECRET,
120 |     )
121 |     return r
122 |
123 | def login(r=None):
124 |     new_r = praw.Reddit(
125 |         user_agent=USERAGENT,
126 |         client_id=APP_ID,
127 |         client_secret=APP_SECRET,
128 |         refresh_token=APP_REFRESH,
129 |     )
130 |     if r:
131 |         r.__dict__.clear()
132 |         r.__dict__.update(new_r.__dict__)
133 |     return new_r
134 |
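
Note on configuration: bot.py reads its settings with `os.environ.get`, which returns `None` for anything that is unset, so a missing value only surfaces later inside PRAW. A minimal sketch of an up-front check follows. The helper `missing_settings` and the tuple `REQUIRED_SETTINGS` are hypothetical names, and treating all four variables as required for `login` is an assumption.

```python
import os

# Variable names taken from bot.py; which ones are strictly required is assumed.
REQUIRED_SETTINGS = ("USERAGENT", "APP_ID", "APP_SECRET", "APP_REFRESH")


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"USERAGENT": "narwhalizer", "APP_ID": "", "APP_SECRET": "x"}
print(missing_settings(example_env))
```

Passing a plain dict keeps the check easy to try without touching the real environment. Called with no argument, it inspects `os.environ`.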
--------------------------------------------------------------------------------
/scripts/install.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | mkdir data
4 |
5 | git clone https://github.com/voussoir/timesearch
6 | cd timesearch
7 | git checkout 01f2cdb
8 | cd ..
9 |
10 | python3 -m venv venv
11 | source venv/bin/activate
12 | pip install -r timesearch/requirements.txt
13 | pip install -r generate/requirements.txt
14 | pip install -r refresh/requirements.txt
15 |
16 | mv scripts/bot.py timesearch/
17 | ./scripts/patch.sh
18 |
--------------------------------------------------------------------------------
/scripts/patch.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | patch /app/timesearch/timesearch_modules/tsdb.py /app/patches/tsdb_naming.patch
4 |
--------------------------------------------------------------------------------
/scripts/refresh.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | while [[ $# -gt 0 ]] && [[ "$1" == "--"* ]] ;
4 | do
5 |     opt="$1";
6 |     shift;
7 |     case "$opt" in
8 |         "--" ) break;;
9 |         "--app-id" )
10 |             APP_ID="$1"; shift;;
11 |         "--app-secret" )
12 |             APP_SECRET="$1"; shift;;
13 |         *) echo >&2 "Invalid option: $opt"; exit 1;;
14 |     esac
15 | done
16 |
17 | export praw_client_id="$APP_ID"
18 | export praw_client_secret="$APP_SECRET"
19 |
20 | source venv/bin/activate
21 | ./refresh/obtain_refresh_token.py
--------------------------------------------------------------------------------
/template.env:
--------------------------------------------------------------------------------
1 | # Timesearch
2 | USERAGENT=narwhalizer
3 | CONTACT_INFO=
4 | APP_ID=
5 | APP_SECRET=
6 | APP_REFRESH=
7 |
8 | # Goggle Metadata
9 | GOGGLE_NAME=Test
10 | GOGGLE_DESCRIPTION=This is a test Goggle.
11 | GOGGLE_PUBLIC=false
12 | GOGGLE_AUTHOR=User
13 | GOGGLE_AVATAR=#ff80ed
14 | GOGGLE_HOMEPAGE=https://example.com
15 |
16 | # Goggle
17 | SUBREDDITS=netsec
18 | GOGGLE_FILENAME=output.goggle
19 | GOGGLE_EXTRAS=
20 |
21 | # Algorithm
22 | SCORE_THRESHOLD=20
23 | MIN_FREQUENCY=1
24 | MIN_EPOCH_TIME=0
25 | TOP_DOMAINS_BEHAVIOR=exclude
26 | TOP_DOMAINS_DOWNRANK_VALUE=2
27 |
--------------------------------------------------------------------------------