├── .dockerignore ├── Dockerfile ├── LICENSE ├── Makefile ├── README.md ├── data └── .gitkeep ├── generate ├── generate_goggle.py ├── requirements.txt └── top_domains.txt ├── narwhalizer.sh ├── netsec.env ├── patches └── tsdb_naming.patch ├── refresh ├── obtain_refresh_token.py └── requirements.txt ├── scripts ├── bot.py ├── install.sh ├── patch.sh └── refresh.sh └── template.env /.dockerignore: -------------------------------------------------------------------------------- 1 | **/.git 2 | **/data 3 | *.env 4 | Dockerfile -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM alpine:3.16 2 | RUN apk add python3 git patch bash 3 | WORKDIR /app 4 | COPY . . 5 | RUN ./scripts/install.sh 6 | CMD ./narwhalizer.sh 7 | 8 | RUN addgroup app 9 | RUN adduser -D -G app -h /app app 10 | RUN chown -R app:app /app 11 | USER app 12 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright © 2022 Forces Unseen 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | build: 2 | docker build . -t narwhalizer 3 | 4 | run: 5 | docker run -it -v ${PWD}/data:/app/data --env-file $(env) narwhalizer 6 | 7 | dev: 8 | docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file $(env) narwhalizer 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # narwhalizer 2 | 3 |

4 | 5 |

6 | 7 | Goggles are a way to alter the ranking in a search engine. Brave is behind this technology and you should [read more about it](https://github.com/brave/goggles-quickstart). This tool lets you generate Goggles using your favorite subreddit(s). Forces Unseen uses this to generate [netsec-goggle](https://github.com/forcesunseen/netsec-goggle). 8 | 9 | You can also read more about this on Forces Unseen's blog: [Make Search Engines Great Again!](https://blog.forcesunseen.com/make-search-engines-great-again) 10 | 11 | ## Basic Usage 12 | 13 | 1. Build the container locally using the following command. 14 | 15 | ``` 16 | docker build . -t narwhalizer 17 | ``` 18 | 19 | 2. Create an `.env` file. Refer to `netsec.env` for what you will need in there. The [Environment Variables](#environment-variables) section has more details about what each variable does. For your first time you can try it out with the `netsec.env` file by just replacing the variables that say `REPLACE`. Refer to the [Reddit API Credentials](#reddit-api-credentials) section for obtaining credentials for Reddit. 20 | 21 | 3. Run the container using your `.env` file. The following command shows `netsec.env` being used. 22 | 23 | ``` 24 | docker run -it -v ${PWD}/data:/app/data --env-file netsec.env narwhalizer 25 | ``` 26 | 27 | The first run will take longer since timesearch will have to build a database for the subreddit first. Afterwards updates will be much quicker. After execution completes the `data` directory will contain the subreddit database and generated `output.goggle` file. Every time the container is run, timesearch will check for updates and the `output.goggle` file will be re-generated. If at any point you terminate the container or your computer crashes, timesearch will continue from where it left off in building the database. 28 | 29 | 30 | ## Reddit API Credentials 31 | 32 | Timesearch requires Reddit API credentials and other identifiers. 
These steps only have to be completed one time. 33 | 34 | 1. Go to https://old.reddit.com/prefs/apps/ 35 | 2. Create an application with the `script` type and the redirect URL set to `http://localhost:8081`. 36 | 3. Copy the application ID and secret that were generated. (The application ID is near the top, under the name of your application) 37 | 4. Run the following command to obtain a refresh token. Replace the variables with the application ID and secret generated in the previous step. 38 | 39 | ``` 40 | docker run -it -p 127.0.0.1:8081:8081 narwhalizer ./scripts/refresh.sh --app-id --app-secret 41 | ``` 42 | 43 | After this you should be able to fill out the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables within your `.env` file. 44 | 45 | *Note: The script requests the following scopes: `read` and `wikiread`* 46 | 47 | ## Environment Variables 48 | 49 | Everything is controlled through environment variables. Reference `netsec.env` for a real example and/or use `template.env` if you want a more bare-bones starting point. 50 | 51 | ### Timesearch 52 | 53 | `USERAGENT` should be set to a description of your API usage. In this case we just left it as `narwhalizer`. 54 | 55 | `CONTACT_INFO` should be set to an email address or Reddit username. 56 | 57 | See the [Reddit API Credentials](#reddit-api-credentials) section to obtain values for the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables. 58 | 59 | **Example:** 60 | ``` 61 | USERAGENT=narwhalizer 62 | CONTACT_INFO= 63 | APP_ID= 64 | APP_SECRET= 65 | APP_REFRESH= 66 | ``` 67 | 68 | ### Goggle Metadata 69 | 70 | These values end up as metadata at the top of the Goggle. More details about these parameters can be found [here](https://github.com/brave/goggles-quickstart/blob/main/getting-started.md#goggles-syntax). 71 | 72 | **Example:** 73 | ``` 74 | GOGGLE_NAME=Netsec 75 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. 
Primarily uses submissions and scoring from /r/netsec. 76 | GOGGLE_PUBLIC=true 77 | GOGGLE_AUTHOR=Forces Unseen 78 | GOGGLE_AVATAR=#01ebae 79 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle 80 | ``` 81 | 82 | ### Goggle 83 | 84 | `SUBREDDITS` takes a comma-delimited list of subreddits. 85 | 86 | `GOGGLE_FILENAME` sets the output filename of the Goggle. 87 | 88 | `GOGGLE_EXTRAS` allows you to include additional instructions in the final Goggle. Use `\n` to separate each instruction. 89 | 90 | **Example:** 91 | ``` 92 | SUBREDDITS=netsec 93 | GOGGLE_FILENAME=netsec.goggle 94 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2 95 | ``` 96 | 97 | ### Algorithm 98 | 99 | `SCORE_THRESHOLD` takes an integer for the minimum score of a submission to be included. 100 | 101 | `MIN_FREQUENCY` takes an integer for the minimum number of included submissions a domain must appear in to be included. 102 | 103 | `MIN_EPOCH_TIME` takes an integer Unix timestamp; submissions created before it are not included. Even though older submissions will not be included in the generated Goggle, we decided to still include all submissions when building the database with timesearch. 104 | 105 | `TOP_DOMAINS_BEHAVIOR` takes one of four options: `exclude`, `include`, `discard`, or `downrank`. 106 | 107 | * **exclude** - top domains will be removed from the list of subreddit submissions. 108 | * **include** - top domains will be left in the list of subreddit submissions. 109 | * **discard** - top domains will be removed from the list of subreddit submissions and will also be marked as discard within the Goggle. 110 | * **downrank** - top domains will be removed from the list of subreddit submissions and will also be downranked using `TOP_DOMAINS_DOWNRANK_VALUE` within the Goggle. 
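
The four options above can be sketched in Python roughly as follows. This is a minimal illustration, not the code shipped in `generate/generate_goggle.py`: the helper names `top_domain_rules` and `keep_submission_domain` are hypothetical, and raising an error on an unknown option is an assumption made for the sketch. The rule strings follow the `$discard,site=...` and `$downrank=N,site=...` forms used by the generator script.

```python
def top_domain_rules(behavior, top_domains, downrank_value=2):
    """Return the Goggle lines written for the top domains list.

    Hypothetical helper for illustration only.
    """
    behavior = behavior.lower()
    if behavior == "discard":
        return [f"$discard,site={d}" for d in top_domains]
    if behavior == "downrank":
        return [f"$downrank={downrank_value},site={d}" for d in top_domains]
    if behavior in ("exclude", "include"):
        # Neither option writes any rule for the top domains themselves.
        return []
    # Assumption: fail loudly on a typo instead of silently ignoring it.
    raise ValueError(f"unknown TOP_DOMAINS_BEHAVIOR: {behavior}")


def keep_submission_domain(behavior, domain, top_domains):
    """Only `include` leaves top domains in the boosted submission list."""
    return behavior.lower() == "include" or domain not in top_domains


if __name__ == "__main__":
    top = ["google.com", "github.com"]
    for line in top_domain_rules("downrank", top, downrank_value=2):
        print(line)
    print(keep_submission_domain("exclude", "github.com", top))
```

With every option except `include`, a top domain never receives a boost from subreddit submissions; `discard` and `downrank` additionally write one rule per top domain into the Goggle.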
111 | 112 | `TOP_DOMAINS_DOWNRANK_VALUE` takes an integer for the amount to downrank. This is only used when `TOP_DOMAINS_BEHAVIOR` is set to `downrank`. 113 | 114 | **Example:** 115 | ``` 116 | SCORE_THRESHOLD=20 117 | MIN_FREQUENCY=1 118 | MIN_EPOCH_TIME=0 119 | TOP_DOMAINS_BEHAVIOR=exclude 120 | TOP_DOMAINS_DOWNRANK_VALUE=2 121 | ``` 122 | 123 | Don't worry too much if there are completely irrelevant domains within your list. It most likely won't have a big impact because the rules are applied to Brave's "expanded recall set", as explained below. 124 | 125 | >The instructions defined in a Goggle are not applied to Brave Search’s entire index, but to what we call the “expanded recall set,” which in turn is a function of the query. The set of candidate URLs can be in the tens of thousands, which is often more than enough to observe a noticeable effect; however, there are no guarantees that all possible URLs are surfaced (in search terminology, we have no guarantees on recall). 126 | 127 | >Goggles do not apply to the whole Brave Search index, but to the expanded recall set which is a function of the input query. So if the target pages aren’t in the recall set, or even be in the Brave Search index, they won’t be captured by the Goggle. 128 | 129 | 130 | ## Development 131 | 132 | The most common modifications can be made through environment variables, but if you want to modify the `generate/generate_goggle.py` script you can mount the `generate` directory and run the container using the modified script. This can be done using the following command. 133 | 134 | ``` 135 | docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file netsec.env narwhalizer 136 | ``` 137 | 138 | ### Make 139 | 140 | Here are some make commands for you lazy ones. 141 | 142 | ``` 143 | make build 144 | make run env=netsec.env 145 | make dev env=netsec.env 146 | ``` 147 | 148 | ## Thank You! 
149 | 150 | We owe it to [Ethan Dalool](https://github.com/voussoir) for creating [timesearch](https://github.com/voussoir/timesearch) and [Jason Baumgartner](https://github.com/pushshift) for creating [pushshift.io](https://pushshift.io/). Also, without Brave this project wouldn't even exist. 151 | 152 | Thank you so much! 153 | -------------------------------------------------------------------------------- /data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/forcesunseen/narwhalizer/1f307d651a0d4c6bd4c0ecf88f5649f9b0ea8563/data/.gitkeep -------------------------------------------------------------------------------- /generate/generate_goggle.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sqlite3 3 | import tldextract 4 | from collections import defaultdict 5 | from datetime import timezone, datetime 6 | 7 | # Environment variables 8 | APPLICATION_ROOT = os.environ.get("APPLICATION_ROOT") 9 | GOGGLE_NAME = os.environ.get('GOGGLE_NAME') 10 | GOGGLE_DESCRIPTION = os.environ.get('GOGGLE_DESCRIPTION') 11 | GOGGLE_PUBLIC = os.environ.get('GOGGLE_PUBLIC') 12 | GOGGLE_AUTHOR = os.environ.get('GOGGLE_AUTHOR') 13 | GOGGLE_AVATAR = os.environ.get('GOGGLE_AVATAR') 14 | GOGGLE_HOMEPAGE = os.environ.get('GOGGLE_HOMEPAGE') 15 | GOGGLE_EXTRAS = os.environ.get('GOGGLE_EXTRAS') 16 | GOGGLE_FILENAME = os.environ.get('GOGGLE_FILENAME') 17 | SUBREDDITS = os.environ.get('SUBREDDITS').split(',') 18 | SCORE_THRESHOLD = int(os.environ.get('SCORE_THRESHOLD')) 19 | MIN_EPOCH_TIME = int(os.environ.get('MIN_EPOCH_TIME')) 20 | MIN_FREQUENCY = int(os.environ.get('MIN_FREQUENCY')) 21 | TOP_DOMAINS_BEHAVIOR = os.environ.get('TOP_DOMAINS_BEHAVIOR').lower() 22 | TOP_DOMAINS_DOWNRANK_VALUE = int(os.environ.get('TOP_DOMAINS_DOWNRANK_VALUE')) 23 | 24 | 25 | def dict_factory(cursor, row): 26 | """Return dicts for SQLite queries. 
27 | """ 28 | d = {} 29 | for idx, col in enumerate(cursor.description): 30 | d[col[0]] = row[idx] 31 | return d 32 | 33 | 34 | def header(): 35 | """Generate the Goggle metadata. 36 | """ 37 | return ( 38 | f"! name: {GOGGLE_NAME}\n" 39 | f"! description: {GOGGLE_DESCRIPTION}\n" 40 | f"! public: {GOGGLE_PUBLIC}\n" 41 | f"! author: {GOGGLE_AUTHOR}\n" 42 | f"! avatar: {GOGGLE_AVATAR}\n" 43 | f"! homepage: {GOGGLE_HOMEPAGE}\n" 44 | ) 45 | 46 | 47 | def extras(): 48 | """Generate any extras to include in the Goggle. 49 | """ 50 | comment = "! Goggle extras\n" 51 | if GOGGLE_EXTRAS is not None: 52 | extras = GOGGLE_EXTRAS.replace("\\n", "\n") 53 | return comment + extras + '\n' 54 | else: 55 | return "" 56 | 57 | 58 | def boost(domain, amt): 59 | """Boost a site by an integer amount. 60 | """ 61 | return f'$boost={amt},site={domain}' 62 | 63 | 64 | def downrank(domain, amt): 65 | """Downrank a site by an integer amount. 66 | """ 67 | return f'$downrank={amt},site={domain}' 68 | 69 | 70 | def discard(domain): 71 | """Discard a site. 72 | """ 73 | return f'$discard,site={domain}' 74 | 75 | 76 | def TOP_DOMAINS_BEHAVIOR(): 77 | behavior = os.environ.get('TOP_DOMAINS_BEHAVIOR', '').lower()  # this def shadows the module-level string of the same name, so re-read the setting here 78 | return behavior if behavior in ['exclude', 'discard', 'include', 'downrank'] else None 79 | 80 | 81 | def sort_domains(submissions): 82 | """Parse URLs and return sorted list of domains with exclusions. 
83 | """ 84 | domains = defaultdict(lambda: 0) 85 | domains_counts = defaultdict(lambda: 0) 86 | for item in submissions: 87 | score = item['score'] 88 | url = item['url'] 89 | if url is None: 90 | pass 91 | else: 92 | extracted = tldextract.extract(url) 93 | 94 | # Double check that we have a real domain 95 | if extracted[1] and extracted[2]: 96 | domain = '.'.join(extracted[1:]).lower() 97 | # Check if we want to include domains from the top domains list 98 | if TOP_DOMAINS_BEHAVIOR() != "include": 99 | if domain not in TOP_DOMAINS: 100 | domains[domain] += score 101 | domains_counts[domain] += 1 102 | else: 103 | domains[domain] += score 104 | domains_counts[domain] += 1 105 | 106 | # Remove domains that don't meet the frequency requirements 107 | for item in domains_counts: 108 | count = domains_counts[item] 109 | if count < MIN_FREQUENCY: 110 | domains.pop(item) 111 | 112 | # Sort domains by score 113 | sorted_domains = sorted( 114 | domains.items(), key=lambda item: item[1], reverse=True) 115 | return sorted_domains 116 | 117 | 118 | def generate(domains): 119 | """Generate rankings in Goggle format. 120 | """ 121 | with open(f'{APPLICATION_ROOT}/data/{GOGGLE_FILENAME}', 'w') as target: 122 | target.write(header()) 123 | target.write(f"! 
generated: {datetime.now(timezone.utc)}\n") 124 | target.write('\n') 125 | target.write(extras()) 126 | 127 | entries = len(domains) 128 | 129 | if TOP_DOMAINS_BEHAVIOR() == "discard": 130 | for domain in TOP_DOMAINS: 131 | target.write('\n') 132 | target.write(discard(domain)) 133 | 134 | if TOP_DOMAINS_BEHAVIOR() == "downrank": 135 | for domain in TOP_DOMAINS: 136 | target.write('\n') 137 | target.write(downrank(domain, TOP_DOMAINS_DOWNRANK_VALUE)) 138 | 139 | # Split up the list into thirds and assign a boost of 4, 3, or 2 140 | for item in range(len(domains)): 141 | domain = domains[item][0] 142 | place = item/entries 143 | if place <= 0.33: 144 | target.write('\n') 145 | target.write(boost(domain, 4)) 146 | elif place <= 0.66: 147 | target.write('\n') 148 | target.write(boost(domain, 3)) 149 | else: 150 | target.write('\n') 151 | target.write(boost(domain, 2)) 152 | print(f'{GOGGLE_FILENAME} generated') 153 | 154 | 155 | # Store top domains in memory 156 | with open(f'{APPLICATION_ROOT}/generate/top_domains.txt', 'r') as top_domains_file: 157 | TOP_DOMAINS = top_domains_file.read().splitlines() 158 | 159 | # Get data from SQLite database 160 | submissions = [] 161 | for item in SUBREDDITS: 162 | if len(item) > 0: 163 | target_subreddit = item.lower() 164 | con = sqlite3.connect( 165 | f'{APPLICATION_ROOT}/data/subreddits/{target_subreddit}/{target_subreddit}.db') 166 | con.row_factory = dict_factory 167 | cur = con.cursor() 168 | cur.execute( 169 | 'SELECT score,url FROM submissions WHERE score >= ? 
AND created >= ?', (SCORE_THRESHOLD, MIN_EPOCH_TIME)) 170 | submissions.extend(cur.fetchall()) 171 | con.close() 172 | 173 | # Sort domains and generate Goggle 174 | sorted_domains = sort_domains(submissions) 175 | generate(sorted_domains) 176 | -------------------------------------------------------------------------------- /generate/requirements.txt: -------------------------------------------------------------------------------- 1 | tldextract 2 | -------------------------------------------------------------------------------- /generate/top_domains.txt: -------------------------------------------------------------------------------- 1 | google.com 2 | youtube.com 3 | facebook.com 4 | akamaiedge.net 5 | netflix.com 6 | microsoft.com 7 | instagram.com 8 | twitter.com 9 | gtld-servers.net 10 | baidu.com 11 | linkedin.com 12 | akamai.net 13 | wikipedia.org 14 | apple.com 15 | amazonaws.com 16 | yahoo.com 17 | cloudflare.com 18 | bilibili.com 19 | amazon.com 20 | a-msedge.net 21 | qq.com 22 | live.com 23 | akadns.net 24 | googletagmanager.com 25 | wordpress.org 26 | bing.com 27 | netflix.net 28 | github.com 29 | whatsapp.com 30 | pinterest.com 31 | reddit.com 32 | office.com 33 | youtu.be 34 | trafficmanager.net 35 | microsoftonline.com 36 | l-msedge.net 37 | azure.com 38 | windowsupdate.com 39 | vimeo.com 40 | adobe.com 41 | zoom.us 42 | doubleclick.net 43 | zhihu.com 44 | fastly.net 45 | yandex.ru 46 | mail.ru 47 | googlevideo.com 48 | wordpress.com 49 | domaincontrol.com 50 | goo.gl 51 | vk.com 52 | bit.ly 53 | gandi.net 54 | nflxso.net 55 | s-msedge.net 56 | googleusercontent.com 57 | msn.com 58 | taobao.com 59 | weibo.com 60 | aaplimg.com 61 | sharepoint.com 62 | csdn.net 63 | t.co 64 | blogspot.com 65 | mozilla.org 66 | tiktok.com 67 | google.com.hk 68 | tumblr.com 69 | cloudapp.net 70 | paypal.com 71 | edgekey.net 72 | windows.net 73 | macromedia.com 74 | webex.com 75 | nytimes.com 76 | xvideos.com 77 | nih.gov 78 | office365.com 79 | apple-dns.net 80 | 
163.com 81 | spotify.com 82 | sina.com.cn 83 | intuit.com 84 | yandex.net 85 | pornhub.com 86 | google-analytics.com 87 | europa.eu 88 | flickr.com 89 | dropbox.com 90 | skype.com 91 | twitch.tv 92 | canva.com 93 | attcompute.com 94 | yahoo.co.jp 95 | medium.com 96 | stackoverflow.com 97 | imdb.com 98 | ebay.com 99 | sohu.com 100 | gravatar.com 101 | googledomains.com 102 | jd.com 103 | opera.com 104 | naver.com 105 | lencr.org 106 | fandom.com 107 | icloud.com 108 | t-msedge.net 109 | cnn.com 110 | myfritz.net 111 | soundcloud.com 112 | outlook.com 113 | cloudfront.net 114 | aliexpress.com 115 | myshopify.com 116 | tmall.com 117 | t.me 118 | gvt1.com 119 | apache.org 120 | amazon.in 121 | fbcdn.net 122 | archive.org 123 | bbc.com 124 | forbes.com 125 | zemanta.com 126 | nic.ru 127 | theguardian.com 128 | bbc.co.uk 129 | quora.com 130 | cloudflare.net 131 | omtrdc.net 132 | wellsfargo.com 133 | force.com 134 | github.io 135 | abusix.zone 136 | w3.org 137 | msedge.net 138 | chaturbate.com 139 | xhamster.com 140 | office.net 141 | forms.gle 142 | digicert.com 143 | salesforce.com 144 | douban.com 145 | etsy.com 146 | indeed.com 147 | p-cdn.us 148 | sourceforge.net 149 | linode.com 150 | google.co.in 151 | duckduckgo.com 152 | sciencedirect.com 153 | spo-msedge.net 154 | miit.gov.cn 155 | imgur.com 156 | teamviewer.com 157 | amazon.co.uk 158 | creativecommons.org 159 | who.int 160 | 1688.com 161 | booking.com 162 | discord.com 163 | googlesyndication.com 164 | weebly.com 165 | dnsowl.com 166 | issuu.com 167 | twimg.com 168 | akamaihd.net 169 | youku.com 170 | spankbang.com 171 | wixsite.com 172 | cdc.gov 173 | researchgate.net 174 | roblox.com 175 | telegram.org 176 | flipkart.com 177 | reuters.com 178 | washingtonpost.com 179 | xnxx.com 180 | wikimedia.org 181 | oracle.com 182 | amazon.co.jp 183 | dailymail.co.uk 184 | tiktokcdn.com 185 | cedexis.net 186 | mangosip.ru 187 | registrar-servers.com 188 | tradingview.com 189 | harvard.edu 190 | bloomberg.com 191 | 
wsj.com 192 | blogger.com 193 | dnsmadeeasy.com 194 | hp.com 195 | tinyurl.com 196 | googleapis.com 197 | wix.com 198 | googleadservices.com 199 | worldfcdn.com 200 | alibaba.com 201 | akamaized.net 202 | ytimg.com 203 | mit.edu 204 | meraki.com 205 | kaspersky.com 206 | businessinsider.com 207 | aliyun.com 208 | shopify.com 209 | wa.me 210 | hwcdn.net 211 | go.com 212 | gvt2.com 213 | alipay.com 214 | wp.com 215 | dell.com 216 | samsung.com 217 | youtube-nocookie.com 218 | cisco.com 219 | epicgames.com 220 | cdninstagram.com 221 | ok.ru 222 | wiley.com 223 | cnbc.com 224 | freepik.com 225 | google.cn 226 | windows.com 227 | slideshare.net 228 | hao123.com 229 | ibm.com 230 | amazon.de 231 | azurewebsites.net 232 | demdex.net 233 | sogou.com 234 | php.net 235 | pixiv.net 236 | behance.net 237 | 360.cn 238 | slack.com 239 | edgesuite.net 240 | godaddy.com 241 | so.com 242 | nature.com 243 | coinmarketcap.com 244 | mediafire.com 245 | springer.com 246 | google.de 247 | walmart.com 248 | list-manage.com 249 | espn.com 250 | zendesk.com 251 | tencent.com 252 | aol.com 253 | adnxs.com 254 | grammarly.com 255 | binance.com 256 | stanford.edu 257 | fc2.com 258 | cnblogs.com 259 | weather.com 260 | un.org 261 | comcast.net 262 | o365filtering.com 263 | nasa.gov 264 | deepl.com 265 | elasticbeanstalk.com 266 | cnki.net 267 | gov.uk 268 | gnu.org 269 | app-measurement.com 270 | mega.nz 271 | doi.org 272 | db.com 273 | fiverr.com 274 | line.me 275 | att.net 276 | trello.com 277 | unsplash.com 278 | w3schools.com 279 | douyu.com 280 | salesforceliveagent.com 281 | nginx.org 282 | marriott.com 283 | name-services.com 284 | fincoad.com 285 | usatoday.com 286 | tiktokv.com 287 | azureedge.net 288 | cnet.com 289 | akam.net 290 | herokudns.com 291 | wsdvs.com 292 | ilovepdf.com 293 | dailymotion.com 294 | iqiyi.com 295 | hubspot.com 296 | google.co.uk 297 | sberdevices.ru 298 | stripe.com 299 | npr.org 300 | xcal.tv 301 | yelp.com 302 | telegraph.co.uk 303 | rakuten.co.jp 304 | 
linktr.ee 305 | eventbrite.com 306 | nginx.com 307 | steampowered.com 308 | surveymonkey.com 309 | rambler.ru 310 | cdngslb.com 311 | google.co.jp 312 | indiatimes.com 313 | instructure.com 314 | 2mdn.net 315 | shifen.com 316 | wetransfer.com 317 | mcafee.com 318 | amazon-adsystem.com 319 | goodreads.com 320 | pubmatic.com 321 | time.com 322 | huya.com 323 | deviantart.com 324 | casalemedia.com 325 | udemy.com 326 | cpanel.net 327 | rackspace.net 328 | ca.gov 329 | realsrv.com 330 | ampproject.org 331 | themeforest.net 332 | patreon.com 333 | snapchat.com 334 | ring.com 335 | tripadvisor.com 336 | userapi.com 337 | a2z.com 338 | google.fr 339 | criteo.com 340 | scorecardresearch.com 341 | gcdn.co 342 | foxnews.com 343 | zillow.com 344 | scribd.com 345 | avito.ru 346 | ted.com 347 | mailchimp.com 348 | bitly.com 349 | reg.ru 350 | cdn77.org 351 | gstatic.com 352 | healthline.com 353 | jomodns.com 354 | jianshu.com 355 | gosuslugi.ru 356 | stuvamac.com 357 | wired.com 358 | addtoany.com 359 | pki.goog 360 | manitu.net 361 | google.ru 362 | unity3d.com 363 | whatsapp.net 364 | independent.co.uk 365 | wildberries.ru 366 | huawei.com 367 | webmd.com 368 | pixabay.com 369 | incapdns.net 370 | nstld.com 371 | myspace.com 372 | livejournal.com 373 | ggpht.com 374 | shutterstock.com 375 | aparat.com 376 | addthis.com 377 | statista.com 378 | berkeley.edu 379 | stackexchange.com 380 | facebook.net 381 | xiaomi.com 382 | mysql.com 383 | digikala.com 384 | squarespace.com 385 | ft.com 386 | amazon.ca 387 | cpanel.com 388 | douyin.com 389 | google.ca 390 | craigslist.org 391 | youdao.com 392 | adobe.io 393 | okta.com 394 | intel.com 395 | g.page 396 | techcrunch.com 397 | huffingtonpost.com 398 | hupu.com 399 | latimes.com 400 | 3dmgame.com 401 | loc.gov 402 | fb.com 403 | digitalocean.com 404 | chase.com 405 | amzn.to 406 | hicloud.com 407 | f5silverline.com 408 | consultant.ru 409 | daum.net 410 | onlyfans.com 411 | appsflyer.com 412 | opendns.com 413 | savefrom.net 414 | 
hichina.com 415 | free.fr 416 | theverge.com 417 | alicdn.com 418 | aliyuncs.com 419 | atlassian.net 420 | taboola.com 421 | goskope.com 422 | cambridge.org 423 | pexels.com 424 | rzone.de 425 | f2pool.com 426 | ohthree.com 427 | speedtest.net 428 | amazonvideo.com 429 | kickstarter.com 430 | googletagservices.com 431 | investopedia.com 432 | rbxcdn.com 433 | tandfonline.com 434 | alibabadns.com 435 | b-msedge.net 436 | google.es 437 | beian.gov.cn 438 | ikea.com 439 | uol.com.br 440 | washington.edu 441 | roche.com 442 | huffpost.com 443 | rackspace.com 444 | cornell.edu 445 | debian.org 446 | giphy.com 447 | openx.net 448 | ozon.ru 449 | amazon.it 450 | hotstar.com 451 | prnewswire.com 452 | apple.news 453 | fedex.com 454 | upwork.com 455 | state.gov 456 | trustpilot.com 457 | feishu.cn 458 | att.com 459 | zoho.com 460 | homedepot.com 461 | oup.com 462 | safebrowsing.apple 463 | redd.it 464 | azure-dns.com 465 | saasprotection.com 466 | nicovideo.jp 467 | tistory.com 468 | britannica.com 469 | cbsnews.com 470 | no-ip.com 471 | android.com 472 | theatlantic.com 473 | nationalgeographic.com 474 | fda.gov 475 | ietf.org 476 | business.site 477 | hostgator.com 478 | yandex.com 479 | galaxydata.ru 480 | nbcnews.com 481 | lenovo.com.cn 482 | bankofamerica.com 483 | bidswitch.net 484 | google.com.tw 485 | dcinside.com 486 | cloudns.net 487 | eset.com 488 | mi.com 489 | buzzfeed.com 490 | sagepub.com 491 | maricopa.gov 492 | amazon.fr 493 | pinimg.com 494 | nypost.com 495 | mega.co.nz 496 | worldnic.com 497 | sentry.io 498 | cctv.com 499 | steamcommunity.com 500 | wikihow.com 501 | academia.edu 502 | root-servers.net 503 | sitemaps.org 504 | bandcamp.com 505 | name.com 506 | adriver.ru 507 | rlcdn.com 508 | usnews.com 509 | globo.com 510 | msftconnecttest.com 511 | hotjar.com 512 | dns.com 513 | primevideo.com 514 | ali213.net 515 | investing.com 516 | marketwatch.com 517 | dribbble.com 518 | bluehost.com 519 | aboutads.info 520 | mailchi.mp 521 | yahoodns.net 522 | 
disqus.com 523 | moatads.com 524 | rt.com 525 | usda.gov 526 | gmail.com 527 | box.com 528 | miui.com 529 | princeton.edu 530 | avast.com 531 | jimdo.com 532 | arnebrachhold.de 533 | mayoclinic.org 534 | hbr.org 535 | messenger.com 536 | nga.cn 537 | notion.so 538 | vkontakte.ru 539 | amazon.es 540 | rubiconproject.com 541 | coursera.org 542 | forter.com 543 | marca.com 544 | hugedomains.com 545 | statcounter.com 546 | herokuapp.com 547 | ea.com 548 | tbcache.com 549 | nr-data.net 550 | change.org 551 | igamecj.com 552 | mozilla.com 553 | nintendo.net 554 | merriam-webster.com 555 | nike.com 556 | ups.com 557 | unesco.org 558 | noaa.gov 559 | rbc.ru 560 | launchpad.net 561 | dmm.co.jp 562 | msftncsi.com 563 | trendmicro.com 564 | oath.cloud 565 | livedoor.jp 566 | stripchat.com 567 | akismet.com 568 | pbs.org 569 | nvidia.com 570 | ntp.org 571 | airbnb.com 572 | spaceweb.pro 573 | rackspaceclouddb.com 574 | smzdm.com 575 | usps.com 576 | arcgis.com 577 | bet365.com 578 | eastmoney.com 579 | adsrvr.org 580 | economist.com 581 | msn.cn 582 | umich.edu 583 | whitehouse.gov 584 | ngenix.net 585 | outbrain.com 586 | elsevier.com 587 | ubuntu.com 588 | chaoxing.com 589 | autodesk.com 590 | spcsdns.net 591 | e-msedge.net 592 | networkadvertising.org 593 | columbia.edu 594 | envato.com 595 | calendly.com 596 | verisign.com 597 | google.com.br 598 | bongacams.com 599 | businesswire.com 600 | epa.gov 601 | target.com 602 | ieee.org 603 | xinhuanet.com 604 | telegram.me 605 | atlassian.com 606 | varzesh3.com 607 | allaboutcookies.org 608 | ifeng.com 609 | accuweather.com 610 | y2mate.com 611 | northgrum.com 612 | wpengine.com 613 | irs.gov 614 | sorbs.net 615 | gamersky.com 616 | netease.com 617 | nessus.org 618 | exacttarget.com 619 | capitalone.com 620 | ptinews.com 621 | xiaohongshu.com 622 | geeksforgeeks.org 623 | redhat.com 624 | google.com.mx 625 | digg.com 626 | typepad.com 627 | 3lift.com 628 | roku.com 629 | ixigua.com 630 | namu.wiki 631 | bestbuy.com 632 | 
worldbank.org 633 | toutiao.com 634 | istockphoto.com 635 | ria.ru 636 | google.com.sg 637 | mts.ru 638 | hulu.com 639 | myqcloud.com 640 | fbs1-t-msedge.net 641 | 9gag.com 642 | media.net 643 | appspot.com 644 | shein.com 645 | b-cdn.net 646 | timeweb.ru 647 | typeform.com 648 | abc.net.au 649 | fortinet.net 650 | canada.ca 651 | rackcdn.com 652 | quizlet.com 653 | qualtrics.com 654 | about.com 655 | newrelic.com 656 | opensea.io 657 | netangels.ru 658 | yts.mx 659 | deloitte.com 660 | sciencemag.org 661 | atomile.com 662 | fb.me 663 | psu.edu 664 | arxiv.org 665 | gitlab.com 666 | domainmarket.com 667 | americanexpress.com 668 | media-amazon.com 669 | rncdn7.com 670 | visualstudio.com 671 | beeline.ru 672 | ucla.edu 673 | skyeng.link 674 | cbc.ca 675 | yale.edu 676 | tremorhub.com 677 | chinamobile.com 678 | reverso.net 679 | apigee.net 680 | dnspod.net 681 | ivi.ru 682 | dynect.net 683 | aiv-cdn.net 684 | ox.ac.uk 685 | vice.com 686 | dw.com 687 | lum-superproxy.io 688 | dreamhost.com 689 | iso.org 690 | upenn.edu 691 | sitescout.com 692 | constantcontact.com 693 | google.com.tr 694 | ameblo.jp 695 | cricbuzz.com 696 | figma.com 697 | zdnet.com 698 | imgsmail.ru 699 | tinkoff.ru 700 | nhentai.net 701 | c-msedge.net 702 | mashable.com 703 | smallpdf.com 704 | shaparak.ir 705 | hdfcbank.com 706 | psychologytoday.com 707 | aliexpress.ru 708 | vox.com 709 | rosintel.com 710 | jotform.com 711 | sophos.com 712 | fincoec.com 713 | eepurl.com 714 | shopee.tw 715 | lenovo.com 716 | mzstatic.com 717 | google.com.au 718 | nist.gov 719 | gofundme.com 720 | mathtag.com 721 | sciencedaily.com 722 | markmonitor.com 723 | checkpoint.com 724 | chsi.com.cn 725 | evernote.com 726 | aliyundrive.com 727 | playstation.com 728 | gitee.com 729 | bmj.com 730 | runoob.com 731 | dhl.com 732 | jdadelivers.com 733 | wisc.edu 734 | wiktionary.org 735 | nokia.com 736 | cogentco.com 737 | rayjump.com 738 | google.co.th 739 | ortb.net 740 | theconversation.com 741 | feedburner.com 742 | 
google.it 743 | getpocket.com 744 | newyorker.com 745 | ttlivecdn.com 746 | zerodha.com 747 | zhibo8.cc 748 | discord.gg 749 | apnews.com 750 | plos.org 751 | sharethrough.com 752 | uber.com 753 | elpais.com 754 | optimizely.com 755 | bitrix24.ru 756 | gearbest.com 757 | fortune.com 758 | trendyol.com 759 | biomedcentral.com 760 | bluekai.com 761 | wattpad.com 762 | ps.kz 763 | quantserve.com 764 | animeflv.net 765 | genius.com 766 | icourse163.org 767 | python.org 768 | medicalnewstoday.com 769 | ovscdns.com 770 | doubleverify.com 771 | paloaltonetworks.com 772 | docker.com 773 | weforum.org 774 | nba.com 775 | ertelecom.ru 776 | azurefd.net 777 | nest.com 778 | lazada.sg 779 | 1337x.to 780 | yimg.com 781 | alidns.com 782 | fastcompany.com 783 | mirtesen.ru 784 | mdpi.com 785 | ign.com 786 | zippyshare.com 787 | zhihuishu.com 788 | uci.edu 789 | mlb.com 790 | fontawesome.com 791 | jquery.com 792 | rtcfront.net 793 | va.gov 794 | engadget.com 795 | ovh.net 796 | advertising.com 797 | sfx.ms 798 | verizon.com 799 | 52pojie.cn 800 | umn.edu 801 | newsweek.com 802 | tds.net 803 | gihc.net 804 | ebay.co.uk 805 | kakao.com 806 | meetup.com 807 | avito.st 808 | thesun.co.uk 809 | as.com 810 | agkn.com 811 | vtb.ru 812 | nic.uk 813 | autohome.com.cn 814 | frontiersin.org 815 | sxyprn.com 816 | stumbleupon.com 817 | duolingo.com 818 | llnwi.net 819 | oreilly.com 820 | attn.tv 821 | gizmodo.com 822 | mckinsey.com 823 | spiegel.de 824 | jstor.org 825 | adsafeprotected.com 826 | apa.org 827 | online-metrix.net 828 | eporner.com 829 | inc.com 830 | crashlytics.com 831 | playstation.net 832 | pconline.com.cn 833 | expedia.com 834 | hilton.com 835 | comcast.com 836 | mirror.co.uk 837 | vmware.com 838 | swrve.com 839 | mercadolibre.com.mx 840 | example.com 841 | asus.com 842 | secureserver.net 843 | gismeteo.ru 844 | xfinity.com 845 | webs.com 846 | timeanddate.com 847 | discordapp.com 848 | gamespot.com 849 | myanimelist.net 850 | thepaper.cn 851 | google.com.sa 852 | dyndns.org 
853 | cam.ac.uk 854 | zol.com.cn 855 | youporn.com 856 | sun.com 857 | jhu.edu 858 | corelux.net 859 | sahibinden.com 860 | wayfair.com 861 | criteo.net 862 | e-hentai.org 863 | emxdgt.com 864 | wp.pl 865 | disneyplus.com 866 | crunchyroll.com 867 | duckdns.org 868 | asana.com 869 | sberbank.ru 870 | tapad.com 871 | pikiran-rakyat.com 872 | people.com.cn 873 | everesttech.net 874 | smartadserver.com 875 | entrepreneur.com 876 | mwbsys.com 877 | utexas.edu 878 | cmu.edu 879 | fcuat.com 880 | ripn.net 881 | bidr.io 882 | oecd.org 883 | tripod.com 884 | youronlinechoices.com 885 | plesk.com 886 | photobucket.com 887 | huaban.com 888 | xing.com 889 | remove.bg 890 | namebrightdns.com 891 | realtor.com 892 | cdn-apple.com 893 | jb51.net 894 | hitomi.la 895 | hinet.net 896 | bbb.org 897 | coupang.com 898 | softonic.com 899 | acs.org 900 | exelator.com 901 | blog.jp 902 | glassdoor.com 903 | chess.com 904 | geocities.com 905 | onlinesbi.com 906 | imrworldwide.com 907 | teads.tv 908 | samsungcloudsolution.com 909 | tokopedia.com 910 | kunlunsl.com 911 | pewresearch.org 912 | aljazeera.com 913 | merchantlink.com 914 | tencent-cloud.net 915 | footprint.net 916 | xhamsterlive.com 917 | licdn.com 918 | dropboxusercontent.com 919 | icloud-content.com 920 | readmanganato.com 921 | canonical.com 922 | openstreetmap.org 923 | gotowebinar.com 924 | scientificamerican.com 925 | usgovcloudapi.net 926 | ed.gov 927 | ny.gov 928 | kaspersky-labs.com 929 | blackboard.com 930 | sectigo.com 931 | uchicago.edu 932 | sfgate.com 933 | shopee.co.id 934 | xueqiu.com 935 | nsone.net 936 | sky.com 937 | crwdcntrl.net 938 | substack.com 939 | nps.gov 940 | lowes.com 941 | hbo.com 942 | mercadolibre.com.ar 943 | adp.com 944 | elegantthemes.com 945 | dnsv1.com 946 | hootsuite.com 947 | amazon.com.mx 948 | 360doc.com 949 | samsungcloud.com 950 | ilive.cn 951 | guardian.co.uk 952 | 33across.com 953 | fmkorea.com 954 | xbox.com 955 | mihoyo.com 956 | wbx2.com 957 | chegg.com 958 | pcmag.com 959 | 
ys7.com 960 | lenta.ru 961 | fao.org 962 | applovin.com 963 | ecdns.net 964 | t-mobile.com 965 | fidelity.com 966 | gartner.com 967 | webrootcloudav.com 968 | immedia-semi.com 969 | getbootstrap.com 970 | detik.com 971 | allegro.pl 972 | weibo.cn 973 | qidian.com 974 | purdue.edu 975 | google.co.kr 976 | pnas.org 977 | ftc.gov 978 | soso.com 979 | braze.com 980 | indiegogo.com 981 | ucsd.edu 982 | news.com.au 983 | quillbot.com 984 | nyu.edu 985 | slate.com 986 | indiamart.com 987 | arstechnica.com 988 | onet.pl 989 | emailvision.net 990 | 360yield.com 991 | ucoz.ru 992 | thelancet.com 993 | cdn20.com 994 | op.gg 995 | bizjournals.com 996 | anchor.fm 997 | tribunnews.com 998 | google.com.ar 999 | branch.io 1000 | revopush.com 1001 | -------------------------------------------------------------------------------- /narwhalizer.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # This is where we have everything set up in the container 4 | export APPLICATION_ROOT=/app 5 | 6 | # Get ready 7 | source "$APPLICATION_ROOT"/venv/bin/activate 8 | cd "$APPLICATION_ROOT"/data 9 | 10 | # Update database for each subreddit using timesearch 11 | IFS=',' read -ra TARGET_SUBREDDITS <<< "$SUBREDDITS" 12 | for SUBREDDIT in "${TARGET_SUBREDDITS[@]}"; do 13 | echo "Running timesearch for subreddit: $SUBREDDIT" 14 | python "$APPLICATION_ROOT"/timesearch/timesearch.py get_submissions -r "$SUBREDDIT" 15 | done 16 | 17 | # Generate Goggle 18 | python "$APPLICATION_ROOT"/generate/generate_goggle.py 19 | -------------------------------------------------------------------------------- /netsec.env: -------------------------------------------------------------------------------- 1 | # Timesearch 2 | USERAGENT=narwhalizer 3 | CONTACT_INFO= 4 | APP_ID= 5 | APP_SECRET= 6 | APP_REFRESH= 7 | 8 | # Goggle Metadata 9 | GOGGLE_NAME=Netsec 10 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. 
Primarily uses submissions and scoring from /r/netsec. 11 | GOGGLE_PUBLIC=true 12 | GOGGLE_AUTHOR=Forces Unseen 13 | GOGGLE_AVATAR=#01ebae 14 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle 15 | 16 | # Goggle 17 | SUBREDDITS=netsec 18 | GOGGLE_FILENAME=netsec.goggle 19 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2 20 | 21 | # Algorithm 22 | SCORE_THRESHOLD=20 23 | MIN_FREQUENCY=1 24 | MIN_EPOCH_TIME=0 25 | TOP_DOMAINS_BEHAVIOR=exclude 26 | TOP_DOMAINS_DOWNRANK_VALUE=2 -------------------------------------------------------------------------------- /patches/tsdb_naming.patch: -------------------------------------------------------------------------------- 1 | 20,24c20,24 2 | < '.\\{name}.db', 3 | < '.\\subreddits\\{name}\\{name}.db', 4 | < '.\\{name}\\{name}.db', 5 | < '.\\databases\\{name}.db', 6 | < '.\\subreddits\\{name}\\{name}.db', 7 | --- 8 | > '{name}.db', 9 | > 'subreddits/{name}/{name}.db', 10 | > '{name}/{name}.db', 11 | > 'databases/{name}.db', 12 | > 'subreddits/{name}/{name}.db', 13 | 27,31c27,31 14 | < '.\\@{name}.db', 15 | < '.\\users\\@{name}\\@{name}.db', 16 | < '.\\@{name}\\@{name}.db', 17 | < '.\\databases\\@{name}.db', 18 | < '.\\users\\@{name}\\@{name}.db', 19 | --- 20 | > '@{name}.db', 21 | > 'users/@{name}/@{name}.db', 22 | > '@{name}/@{name}.db', 23 | > 'databases/@{name}.db', 24 | > 'users/@{name}/@{name}.db', 25 | -------------------------------------------------------------------------------- /refresh/obtain_refresh_token.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | ''' 3 | Copyright (c) 2016, Bryce Boe 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. 
Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 2. Redistributions in binary form must reproduce the above copyright notice, 12 | this list of conditions and the following disclaimer in the documentation 13 | and/or other materials provided with the distribution. 14 | 15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 18 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 19 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 20 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 21 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 22 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 23 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 24 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 25 | ''' 26 | import random 27 | import socket 28 | import sys 29 | 30 | import praw 31 | 32 | 33 | def main(): 34 | scopes = ["read", "wikiread"] 35 | 36 | reddit = praw.Reddit( 37 | redirect_uri="http://localhost:8081", 38 | user_agent="narwhalizer", 39 | ) 40 | state = str(random.randint(0, 65000)) 41 | url = reddit.auth.url(duration="permanent", scopes=scopes, state=state) 42 | print(f"Now open this url in your browser: {url}") 43 | 44 | client = receive_connection() 45 | data = client.recv(1024).decode("utf-8") 46 | param_tokens = data.split(" ", 2)[1].split("?", 1)[1].split("&") 47 | params = { 48 | key: value for (key, value) in [token.split("=") for token in param_tokens] 49 | } 50 | 51 | if state != params["state"]: 52 | send_message( 53 | client, 54 | f"State mismatch. 
Expected: {state} Received: {params['state']}", 55 | ) 56 | return 1 57 | elif "error" in params: 58 | send_message(client, params["error"]) 59 | return 1 60 | 61 | print(params["code"]) 62 | refresh_token = reddit.auth.authorize(params["code"]) 63 | send_message(client, f"Refresh token: {refresh_token}") 64 | return 0 65 | 66 | 67 | def receive_connection(): 68 | """Wait for and then return a connected socket. 69 | 70 | Opens a TCP listening socket on port 8081, and waits for a single client. 71 | 72 | """ 73 | server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 74 | server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) 75 | server.bind(("0.0.0.0", 8081)) 76 | server.listen(1) 77 | client = server.accept()[0] 78 | server.close() 79 | return client 80 | 81 | 82 | def send_message(client, message): 83 | """Send message to client and close the connection.""" 84 | print(message) 85 | client.send(f"HTTP/1.1 200 OK\r\n\r\n{message}".encode("utf-8")) 86 | client.close() 87 | 88 | 89 | if __name__ == "__main__": 90 | sys.exit(main()) 91 | -------------------------------------------------------------------------------- /refresh/requirements.txt: -------------------------------------------------------------------------------- 1 | praw 2 | -------------------------------------------------------------------------------- /scripts/bot.py: -------------------------------------------------------------------------------- 1 | ''' 2 | BSD 3-Clause License 3 | 4 | Copyright (c) 2022, Ethan Dalool aka voussoir 5 | All rights reserved. 6 | 7 | Redistribution and use in source and binary forms, with or without 8 | modification, are permitted provided that the following conditions are met: 9 | 10 | 1. Redistributions of source code must retain the above copyright notice, this 11 | list of conditions and the following disclaimer. 12 | 13 | 2.
Redistributions in binary form must reproduce the above copyright notice, 14 | this list of conditions and the following disclaimer in the documentation 15 | and/or other materials provided with the distribution. 16 | 17 | 3. Neither the name of the copyright holder nor the names of its 18 | contributors may be used to endorse or promote products derived from 19 | this software without specific prior written permission. 20 | 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 22 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 23 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 24 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 25 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 26 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 27 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 28 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 29 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 31 | ''' 32 | import os 33 | 34 | ''' 35 | bot.py template for PRAW4 36 | 37 | This file will be imported by all bots, and provides a standard way to log in. 38 | 39 | You should never place this file in a git repository or any place where it will 40 | get shared. 41 | 42 | The requirements for this file are: 43 | 44 | 1 A function `anonymous` with no arguments, which returns a `praw.Reddit` 45 | instance that has a Useragent but is otherwise anonymous / unauthenticated. 46 | This will be used in bots that need to make requests but don't need any 47 | permissions. 48 | 49 | 2 A function `login` with optional parameter `r`, which returns an 50 | authenticated Reddit instance. 51 | If `r` is provided, authenticate it. 
52 | If not, create one using `anonymous` and authenticate that. 53 | Either way, return the instance when finished. 54 | 55 | The exact workings of these functions, and the existence of any other variables 56 | and functions are up to you. 57 | 58 | I suggest placing this file in a private directory and adding that directory to 59 | your `PYTHONPATH` environment variable. This makes it importable from anywhere. 60 | 61 | However, you may place it in your default Python library. An easy way to find 62 | this is by importing a standard library module and checking its location: 63 | >>> import os 64 | >>> os 65 | 66 | 67 | But placing the file in the standard library means you will have to copy it over 68 | when you upgrade Python. 69 | 70 | If you need multiple separate bots, I would suggest creating copies of this file 71 | with different names, and then using `import specialbot as bot` within the 72 | application, so that the rest of the interface can stay the same. 73 | ''' 74 | 75 | # The USERAGENT is a short description of why you're using reddit's API. 76 | # You can make it as simple as "/u/myusername's bot for /r/mysubreddit". 77 | # It should include your username so that reddit can contact you if there 78 | # is a problem. 79 | USERAGENT = os.environ.get('USERAGENT') 80 | 81 | # CONTACT_INFO can be your reddit username, or your email address, or any other 82 | # means of contacting you. This is used for some bot programs but not all. 83 | CONTACT_INFO = os.environ.get('CONTACT_INFO') 84 | 85 | # It's time to get your OAuth credentials. 86 | # 1. Go to https://old.reddit.com/prefs/apps 87 | # 2. Click create a new app 88 | # 3. Give it any name you want 89 | # 4. Choose "script" type 90 | # 5. Description and About URI can be blank 91 | # 6. Put "http://localhost:8080" as the Redirect URI 92 | # 7. 
Now that you have created your app, write down the app ID (which appears 93 | # under its name), the secret, and the URI (http://localhost:8080) in the 94 | # variables below: 95 | APP_ID = os.environ.get('APP_ID') 96 | APP_SECRET = os.environ.get('APP_SECRET') 97 | APP_URI = os.environ.get('APP_URI') 98 | # 8. Go to https://praw.readthedocs.io/en/latest/tutorials/refresh_token.html#obtaining-refresh-tokens 99 | # 9. Copy that script and save it to a .py file on your computer 100 | # 10. The instructions at the top of the script tell you to run two "EXPORT" 101 | # commands before running the script. This only works on Unix. If you are on 102 | # Windows, or simply don't want to bother with environment variables, ignore 103 | # that part of the instructions and instead add `client_id='XXXX'` and 104 | # `client_secret='XXXX'` into the praw.Reddit constructor that you see on 105 | # line 40 of that script. When I say XXXX I mean the values you just wrote 106 | # down. 107 | # 11. Run the script on your command line `python obtain_refresh_token.py` 108 | # 12. 
Write down the refresh token that it gives you: 109 | APP_REFRESH = os.environ.get('APP_REFRESH') 110 | 111 | ################################################################################ 112 | 113 | import praw 114 | 115 | def anonymous(): 116 | r = praw.Reddit( 117 | user_agent=USERAGENT, 118 | client_id=APP_ID, 119 | client_secret=APP_SECRET, 120 | ) 121 | return r 122 | 123 | def login(r=None): 124 | new_r = praw.Reddit( 125 | user_agent=USERAGENT, 126 | client_id=APP_ID, 127 | client_secret=APP_SECRET, 128 | refresh_token=APP_REFRESH, 129 | ) 130 | if r: 131 | r.__dict__.clear() 132 | r.__dict__.update(new_r.__dict__) 133 | return new_r 134 | -------------------------------------------------------------------------------- /scripts/install.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | mkdir data 4 | 5 | git clone https://github.com/voussoir/timesearch 6 | cd timesearch 7 | git checkout 01f2cdb 8 | cd .. 9 | 10 | python3 -m venv venv 11 | source venv/bin/activate 12 | pip install -r timesearch/requirements.txt 13 | pip install -r generate/requirements.txt 14 | pip install -r refresh/requirements.txt 15 | 16 | mv scripts/bot.py timesearch/ 17 | ./scripts/patch.sh 18 | -------------------------------------------------------------------------------- /scripts/patch.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | patch /app/timesearch/timesearch_modules/tsdb.py /app/patches/tsdb_naming.patch 4 | -------------------------------------------------------------------------------- /scripts/refresh.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | while [[ $# -gt 0 ]] && [[ "$1" == "--"* ]] ; 4 | do 5 | opt="$1"; 6 | shift; 7 | case "$opt" in 8 | "--" ) break 2;; 9 | "--app-id" ) 10 | APP_ID="$1"; shift;; 11 | "--app-secret" ) 12 | APP_SECRET="$1"; shift;; 13 | *) 
echo >&2 "Invalid option: $opt"; exit 1;; 14 | esac 15 | done 16 | 17 | export praw_client_id="$APP_ID" 18 | export praw_client_secret="$APP_SECRET" 19 | 20 | source venv/bin/activate 21 | ./refresh/obtain_refresh_token.py -------------------------------------------------------------------------------- /template.env: -------------------------------------------------------------------------------- 1 | # Timesearch 2 | USERAGENT=narwhalizer 3 | CONTACT_INFO= 4 | APP_ID= 5 | APP_SECRET= 6 | APP_REFRESH= 7 | 8 | # Goggle Metadata 9 | GOGGLE_NAME=Test 10 | GOGGLE_DESCRIPTION=This is a test Goggle. 11 | GOGGLE_PUBLIC=false 12 | GOGGLE_AUTHOR=User 13 | GOGGLE_AVATAR=#ff80ed 14 | GOGGLE_HOMEPAGE=https://example.com 15 | 16 | # Goggle 17 | SUBREDDITS=netsec 18 | GOGGLE_FILENAME=output.goggle 19 | GOGGLE_EXTRAS= 20 | 21 | # Algorithm 22 | SCORE_THRESHOLD=20 23 | MIN_FREQUENCY=1 24 | MIN_EPOCH_TIME=0 25 | TOP_DOMAINS_BEHAVIOR=exclude 26 | TOP_DOMAINS_DOWNRANK_VALUE=2 27 | --------------------------------------------------------------------------------
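The values in `template.env` reach the container as plain strings, so anything that consumes them has to split and convert them itself. The sketch below is a rough illustration of that step, not the code in `generate_goggle.py`: `load_settings` is a hypothetical helper, the fallback defaults are simply copied from `template.env`, and it assumes `GOGGLE_EXTRAS` separates rules with a literal backslash-n sequence, as the value in `netsec.env` suggests. Stripping whitespace around subreddit names is an addition here; `narwhalizer.sh` splits on commas only.

```python
import os


def load_settings(env=os.environ):
    """Hypothetical helper: turn the template.env strings into Python values."""
    # SUBREDDITS is a comma-separated list, mirroring the IFS=',' split in narwhalizer.sh.
    subreddits = [s.strip() for s in env.get("SUBREDDITS", "").split(",") if s.strip()]
    # Assumption: extra rules are joined by a literal backslash followed by "n".
    extras = [rule for rule in env.get("GOGGLE_EXTRAS", "").split("\\n") if rule]
    return {
        "subreddits": subreddits,
        "extras": extras,
        # Defaults below are taken from template.env, not from the generator itself.
        "score_threshold": int(env.get("SCORE_THRESHOLD", "20")),
        "min_frequency": int(env.get("MIN_FREQUENCY", "1")),
        "min_epoch_time": int(env.get("MIN_EPOCH_TIME", "0")),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before building the container.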