├── .dockerignore
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── data
    └── .gitkeep
├── generate
    ├── generate_goggle.py
    ├── requirements.txt
    └── top_domains.txt
├── narwhalizer.sh
├── netsec.env
├── patches
    └── tsdb_naming.patch
├── refresh
    ├── obtain_refresh_token.py
    └── requirements.txt
├── scripts
    ├── bot.py
    ├── install.sh
    ├── patch.sh
    └── refresh.sh
└── template.env


/.dockerignore:
--------------------------------------------------------------------------------
1 | **/.git
2 | **/data
3 | *.env
4 | Dockerfile


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM alpine:3.16
 2 | RUN apk add python3 git patch bash
 3 | WORKDIR /app
 4 | COPY . .
 5 | RUN ./scripts/install.sh
 6 | 
 7 | RUN addgroup app
 8 | RUN adduser -D -G app -h /app app
 9 | RUN chown -R app:app /app
10 | USER app
11 | CMD ./narwhalizer.sh
12 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright © 2022 Forces Unseen
2 | 
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4 | 
5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6 | 
7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | build:
2 | 	docker build . -t narwhalizer
3 | 
4 | run: 
5 | 	docker run -it -v ${PWD}/data:/app/data --env-file $(env) narwhalizer
6 | 
7 | dev:
8 | 	docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file $(env) narwhalizer
9 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # narwhalizer
  2 | 
  3 | <p align="center">
  4 |   <img src="https://i.snap.as/YLH9XsMZ.jpeg" width="450"/>
  5 | </p>
  6 | 
  7 | Goggles are a way to alter the ranking in a search engine. Brave is behind this technology and you should [read more about it](https://github.com/brave/goggles-quickstart). This tool lets you generate Goggles using your favorite subreddit(s). Forces Unseen uses this to generate [netsec-goggle](https://github.com/forcesunseen/netsec-goggle).
  8 | 
  9 | You can also read more about this on Forces Unseen's blog: [Make Search Engines Great Again!](https://blog.forcesunseen.com/make-search-engines-great-again)
 10 | 
 11 | ## Basic Usage
 12 | 
 13 | 1. Build the container locally using the following command.
 14 | 
 15 | ```
 16 | docker build . -t narwhalizer
 17 | ```
 18 | 
 19 | 2. Create an `.env` file. Refer to `netsec.env` for what you will need in there. The [Environment Variables](#environment-variables) section has more details about what each variable does. For your first time you can try it out with the `netsec.env` file by just replacing the variables that say `REPLACE`. Refer to the [Reddit API Credentials](#reddit-api-credentials) section for obtaining credentials for Reddit.
 20 | 
 21 | 3. Run the container using your `.env` file. The following command shows `netsec.env` being used.
 22 | 
 23 | ```
 24 | docker run -it -v ${PWD}/data:/app/data --env-file netsec.env narwhalizer
 25 | ```
 26 | 
 27 | The first run will take longer since timesearch has to build a database for the subreddit first; afterwards, updates will be much quicker. After execution completes, the `data` directory will contain the subreddit database and the generated Goggle, saved under the name set by `GOGGLE_FILENAME`. Every time the container is run, timesearch will check for updates and the Goggle file will be regenerated. If at any point you terminate the container or your computer crashes, timesearch will continue building the database from where it left off.
 28 | 
 29 | 
 30 | ## Reddit API Credentials
 31 | 
 32 | Timesearch requires Reddit API credentials and other identifiers. These steps only have to be completed one time.
 33 | 
 34 | 1. Go to https://old.reddit.com/prefs/apps/
 35 | 2. Create an application with the `script` type and the redirect URL set to `http://localhost:8081`.
 36 | 3. Copy the application ID and secret that were generated. (The application ID is near the top, under the name of your application.)
 37 | 4. Run the following command to obtain a refresh token. Replace the variables with the application ID and secret generated in the previous step.
 38 | 
 39 | ```
 40 | docker run -it -p 127.0.0.1:8081:8081 narwhalizer ./scripts/refresh.sh --app-id <replace> --app-secret <replace>
 41 | ```
 42 | 
 43 | After this you should be able to fill out the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables within your `.env` file.
 44 | 
 45 | *Note: The script requests the following scopes: `read` and `wikiread`*
 46 | 
 47 | ## Environment Variables
 48 | 
 49 | Everything is controlled through environment variables. Reference `netsec.env` for a real example and/or use `template.env` if you want a more bare-bones starting point.
 50 | 
 51 | ### Timesearch
 52 | 
 53 | `USERAGENT` should be set to a description of your API usage. In this case we just left it as `narwhalizer`.
 54 | 
 55 | `CONTACT_INFO` should be set to an email address or Reddit username.
 56 | 
 57 | See the [Reddit API Credentials](#reddit-api-credentials) section to obtain values for the `APP_ID`, `APP_SECRET`, and `APP_REFRESH` variables.
 58 | 
 59 | **Example:**
 60 | ```
 61 | USERAGENT=narwhalizer
 62 | CONTACT_INFO=<REPLACE_WITH_EMAIL_OR_REDDIT_USERNAME>
 63 | APP_ID=<REPLACE>
 64 | APP_SECRET=<REPLACE>
 65 | APP_REFRESH=<REPLACE>
 66 | ```
 67 | 
 68 | ### Goggle Metadata
 69 | 
 70 | These values end up as metadata at the top of the Goggle. More details about these parameters can be found [here](https://github.com/brave/goggles-quickstart/blob/main/getting-started.md#goggles-syntax).
 71 | 
 72 | **Example:**
 73 | ```
 74 | GOGGLE_NAME=Netsec
 75 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. Primarily uses submissions and scoring from /r/netsec.
 76 | GOGGLE_PUBLIC=true
 77 | GOGGLE_AUTHOR=Forces Unseen
 78 | GOGGLE_AVATAR=#01ebae
 79 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle
 80 | ```
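A minimal sketch of how these variables become Goggle metadata, mirroring `header()` in `generate/generate_goggle.py`. The helper name `build_header` and passing the values in a dict instead of reading the environment are assumptions for illustration.

```python
def build_header(meta: dict) -> str:
    """Render Goggle metadata lines (hypothetical helper mirroring header())."""
    # Same key order as header() in generate_goggle.py
    keys = ["name", "description", "public", "author", "avatar", "homepage"]
    return "".join(f"! {key}: {meta[key]}\n" for key in keys)


header = build_header({
    "name": "Netsec",
    "description": "Prioritizes domains popular with the information security community.",
    "public": "true",
    "author": "Forces Unseen",
    "avatar": "#01ebae",
    "homepage": "https://github.com/forcesunseen/netsec-goggle",
})
print(header, end="")
```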
 81 | 
 82 | ### Goggle
 83 | 
 84 | `SUBREDDITS` takes a comma-delimited list of subreddits.
 85 | 
 86 | `GOGGLE_FILENAME` sets the output filename of the Goggle.
 87 | 
 88 | `GOGGLE_EXTRAS` allows you to include additional instructions in the final Goggle. Use `\n` to separate each instruction.
 89 | 
 90 | **Example:**
 91 | ```
 92 | SUBREDDITS=netsec
 93 | GOGGLE_FILENAME=netsec.goggle
 94 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2
 95 | ```
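A small sketch of how `GOGGLE_EXTRAS` is expanded, modeled on `extras()` in `generate/generate_goggle.py`: the value is expected to contain a literal backslash followed by `n`, which is swapped for a real newline so each instruction lands on its own line. The helper name `expand_extras` is hypothetical.

```python
from typing import Optional


def expand_extras(raw: Optional[str]) -> str:
    """Turn a GOGGLE_EXTRAS-style value into Goggle instruction lines."""
    if raw is None:
        return ""  # nothing is written when the variable is unset
    # Replace the two-character sequence backslash + "n" with a real newline
    return "! Goggle extras\n" + raw.replace("\\n", "\n") + "\n"


# Raw string so the backslash-n stays literal, as the script expects to receive it
extras = expand_extras(r"$boost=2,site=github.io\n/blog/$boost=2")
print(extras, end="")
```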
 96 | 
 97 | ### Algorithm
 98 | 
 99 | `SCORE_THRESHOLD` takes an integer for the minimum score of a submission to be included.
100 | 
101 | `MIN_FREQUENCY` takes an integer for the minimum number of qualifying submissions that must link to a domain for it to be included.
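To illustrate how `SCORE_THRESHOLD` and `MIN_FREQUENCY` interact, here is a simplified sketch of the aggregation in `sort_domains()`. It assumes domains have already been extracted from the URLs (the script itself uses `tldextract` for that, and applies the score filter in its SQL query); the helper name `rank_domains` is hypothetical.

```python
from collections import defaultdict


def rank_domains(submissions, score_threshold, min_frequency):
    """Sum scores per domain, drop infrequent domains, sort by total score.

    `submissions` is a list of (score, domain) pairs with domains assumed to
    be already extracted from URLs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for score, domain in submissions:
        if score < score_threshold:
            continue  # the real script applies this filter in its SQL query
        totals[domain] += score
        counts[domain] += 1
    # Keep only domains that appear in at least `min_frequency` submissions
    kept = {d: s for d, s in totals.items() if counts[d] >= min_frequency}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


rows = [(120, "example.com"), (45, "example.org"), (30, "example.com"), (5, "example.net")]
print(rank_domains(rows, score_threshold=20, min_frequency=2))
# [('example.com', 150)]
```

With `min_frequency=1`, `example.org` (45) would also be kept, while `example.net` is dropped either way because its score is below the threshold.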
102 | 
103 | `MIN_EPOCH_TIME` takes an integer Unix timestamp; submissions created before it are not included. Even though older submissions are left out of the generated Goggle, we decided to still include all submissions when building the database with timesearch.
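One way to pick a `MIN_EPOCH_TIME` value is to convert a calendar date to Unix time with Python's standard library. The script compares this value against the `created` column of the timesearch database; the date below is only an example.

```python
from datetime import datetime, timezone

# Midnight UTC on 1 January 2020 (example date)
min_epoch_time = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())
print(f"MIN_EPOCH_TIME={min_epoch_time}")
# MIN_EPOCH_TIME=1577836800
```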
104 | 
105 | `TOP_DOMAINS_BEHAVIOR` takes one of four options: `exclude`, `include`, `discard`, or `downrank`.
106 | 
107 |    * **exclude** - top domains will be removed from the list of subreddit submissions.
108 |    * **include** - top domains will be left in the list of subreddit submissions.
109 |    * **discard** - top domains will be removed from the list of subreddit submissions and will also be marked as discard within the Goggle.
110 |    * **downrank** - top domains will be removed from the list of subreddit submissions and will also be downranked using `TOP_DOMAINS_DOWNRANK_VALUE` within the Goggle.
111 | 
112 | `TOP_DOMAINS_DOWNRANK_VALUE` takes an integer for the amount to downrank. This is only used when `TOP_DOMAINS_BEHAVIOR` is set to `downrank`.
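A sketch of the rules each `TOP_DOMAINS_BEHAVIOR` option adds to the Goggle, based on `generate()` in `generate/generate_goggle.py`. The helper name `top_domain_rules` is hypothetical, and raising on an unknown value is an assumption rather than documented behavior.

```python
def top_domain_rules(behavior, top_domains, downrank_value):
    """Return the Goggle lines added for the top-domains list."""
    if behavior not in ("exclude", "include", "discard", "downrank"):
        raise ValueError(f"unknown TOP_DOMAINS_BEHAVIOR: {behavior}")
    if behavior == "discard":
        return [f"$discard,site={d}" for d in top_domains]
    if behavior == "downrank":
        return [f"$downrank={downrank_value},site={d}" for d in top_domains]
    # "exclude" and "include" add no rules; they only decide whether top
    # domains may appear among the boosted subreddit domains
    return []


print(top_domain_rules("downrank", ["google.com", "youtube.com"], 2))
# ['$downrank=2,site=google.com', '$downrank=2,site=youtube.com']
```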
113 | 
114 | **Example:**
115 | ```
116 | SCORE_THRESHOLD=20
117 | MIN_FREQUENCY=1
118 | MIN_EPOCH_TIME=0
119 | TOP_DOMAINS_BEHAVIOR=exclude
120 | TOP_DOMAINS_DOWNRANK_VALUE=2
121 | ```
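After filtering, the script boosts the remaining domains according to where they fall in the score-sorted list: roughly the top third gets `$boost=4`, the middle third `$boost=3`, and the rest `$boost=2`. A minimal sketch of that step, following the final loop in `generate()` (the helper name `boost_rules` is hypothetical):

```python
def boost_rules(sorted_domains):
    """Assign $boost=4/3/2 by position in a score-sorted (domain, score) list."""
    total = len(sorted_domains)
    rules = []
    for index, (domain, _score) in enumerate(sorted_domains):
        place = index / total  # 0.0 for the top domain, approaching 1.0 at the bottom
        if place <= 0.33:
            amount = 4
        elif place <= 0.66:
            amount = 3
        else:
            amount = 2
        rules.append(f"$boost={amount},site={domain}")
    return rules


ranked = [(f"site{i}.example", 100 - i) for i in range(6)]
for rule in boost_rules(ranked):
    print(rule)
```

With six domains, positions 0 and 1 get 4, positions 2 and 3 get 3, and positions 4 and 5 get 2, since 2/6 (about 0.333) already exceeds the 0.33 cutoff.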
122 | 
123 | Don't worry too much if there are completely irrelevant domains within your list. It most likely won't have a big impact because the rules are applied to Brave's "expanded recall set", as explained below.
124 | 
125 | >The instructions defined in a Goggle are not applied to Brave Search’s entire index, but to what we call the “expanded recall set,” which in turn is a function of the query. The set of candidate URLs can be in the tens of thousands, which is often more than enough to observe a noticeable effect; however, there are no guarantees that all possible URLs are surfaced (in search terminology, we have no guarantees on recall).
126 | 
127 | >Goggles do not apply to the whole Brave Search index, but to the expanded recall set which is a function of the input query. So if the target pages aren’t in the recall set, or even be in the Brave Search index, they won’t be captured by the Goggle.
128 | 
129 | 
130 | ## Development
131 | 
132 | The most common modifications can be made through environment variables, but if you want to modify the `generate/generate_goggle.py` script, you can mount the `generate` directory and run the container using the modified script. This can be done using the following command.
133 | 
134 | ```
135 | docker run -it -v ${PWD}/data:/app/data -v ${PWD}/generate:/app/generate --env-file netsec.env narwhalizer
136 | ```
137 | 
138 | ### Make
139 | 
140 | Here are some make commands for you lazy ones.
141 | 
142 | ```
143 | make build
144 | make run env=netsec.env
145 | make dev env=netsec.env
146 | ```
147 | 
148 | ## Thank You!
149 | 
150 | We owe it to [Ethan Dalool](https://github.com/voussoir) for creating [timesearch](https://github.com/voussoir/timesearch) and [Jason Baumgartner](https://github.com/pushshift) for creating [pushshift.io](https://pushshift.io/). Also, without Brave this project wouldn't even exist.
151 | 
152 | Thank you so much!
153 | 


--------------------------------------------------------------------------------
/data/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/forcesunseen/narwhalizer/1f307d651a0d4c6bd4c0ecf88f5649f9b0ea8563/data/.gitkeep


--------------------------------------------------------------------------------
/generate/generate_goggle.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import sqlite3
  3 | import tldextract
  4 | from collections import defaultdict
  5 | from datetime import timezone, datetime
  6 | 
  7 | # Environment variables
  8 | APPLICATION_ROOT = os.environ.get("APPLICATION_ROOT")
  9 | GOGGLE_NAME = os.environ.get('GOGGLE_NAME')
 10 | GOGGLE_DESCRIPTION = os.environ.get('GOGGLE_DESCRIPTION')
 11 | GOGGLE_PUBLIC = os.environ.get('GOGGLE_PUBLIC')
 12 | GOGGLE_AUTHOR = os.environ.get('GOGGLE_AUTHOR')
 13 | GOGGLE_AVATAR = os.environ.get('GOGGLE_AVATAR')
 14 | GOGGLE_HOMEPAGE = os.environ.get('GOGGLE_HOMEPAGE')
 15 | GOGGLE_EXTRAS = os.environ.get('GOGGLE_EXTRAS')
 16 | GOGGLE_FILENAME = os.environ.get('GOGGLE_FILENAME')
 17 | SUBREDDITS = os.environ.get('SUBREDDITS').split(',')
 18 | SCORE_THRESHOLD = int(os.environ.get('SCORE_THRESHOLD'))
 19 | MIN_EPOCH_TIME = int(os.environ.get('MIN_EPOCH_TIME'))
 20 | MIN_FREQUENCY = int(os.environ.get('MIN_FREQUENCY'))
 21 | TOP_DOMAINS_BEHAVIOR = os.environ.get('TOP_DOMAINS_BEHAVIOR').lower()
 22 | TOP_DOMAINS_DOWNRANK_VALUE = int(os.environ.get('TOP_DOMAINS_DOWNRANK_VALUE'))
 23 | 
 24 | 
 25 | def dict_factory(cursor, row):
 26 |     """Return dicts for SQLite queries.
 27 |     """
 28 |     d = {}
 29 |     for idx, col in enumerate(cursor.description):
 30 |         d[col[0]] = row[idx]
 31 |     return d
 32 | 
 33 | 
 34 | def header():
 35 |     """Generate the Goggle metadata.
 36 |     """
 37 |     return (
 38 |         f"! name: {GOGGLE_NAME}\n"
 39 |         f"! description: {GOGGLE_DESCRIPTION}\n"
 40 |         f"! public: {GOGGLE_PUBLIC}\n"
 41 |         f"! author: {GOGGLE_AUTHOR}\n"
 42 |         f"! avatar: {GOGGLE_AVATAR}\n"
 43 |         f"! homepage: {GOGGLE_HOMEPAGE}\n"
 44 |     )
 45 | 
 46 | 
 47 | def extras():
 48 |     """Generate any extras to include in the Goggle.
 49 |     """
 50 |     comment = "! Goggle extras\n"
 51 |     if GOGGLE_EXTRAS is not None:
 52 |         extras = GOGGLE_EXTRAS.replace("\\n", "\n")
 53 |         return comment + extras + '\n'
 54 |     else:
 55 |         return ""
 56 | 
 57 | 
 58 | def boost(domain, amt):
 59 |     """Boost a site by an integer amount.
 60 |     """
 61 |     return f'$boost={amt},site={domain}'
 62 | 
 63 | 
 64 | def downrank(domain, amt):
 65 |     """Downrank a site by an integer amount.
 66 |     """
 67 |     return f'$downrank={amt},site={domain}'
 68 | 
 69 | 
 70 | def discard(domain):
 71 |     """Discard a site.
 72 |     """
 73 |     return f'$discard,site={domain}'
 74 | 
 75 | 
 76 | def top_domains_behavior():
 77 |     if TOP_DOMAINS_BEHAVIOR not in ('exclude', 'discard', 'include', 'downrank'):
 78 |         raise ValueError(f'Invalid TOP_DOMAINS_BEHAVIOR: {TOP_DOMAINS_BEHAVIOR}')
 79 |     return TOP_DOMAINS_BEHAVIOR
 80 | 
 81 | 
 82 | def sort_domains(submissions):
 83 |     """Parse URLs and return sorted list of domains with exclusions.
 84 |     """
 85 |     domains = defaultdict(lambda: 0)
 86 |     domains_counts = defaultdict(lambda: 0)
 87 |     for item in submissions:
 88 |         score = item['score']
 89 |         url = item['url']
 90 |         if url is None:
 91 |             continue
 92 |         extracted = tldextract.extract(url)
 93 | 
 94 |         # Double check that we have a real domain
 95 |         if extracted.domain and extracted.suffix:
 96 |             domain = f'{extracted.domain}.{extracted.suffix}'.lower()
 97 |             # Check if we want to include domains from the top domains list
 98 |             if top_domains_behavior() != "include":
 99 |                 if domain not in TOP_DOMAINS:
100 |                     domains[domain] += score
101 |                     domains_counts[domain] += 1
102 |             else:
103 |                 domains[domain] += score
104 |                 domains_counts[domain] += 1
105 | 
106 |     # Remove domains that don't meet the frequency requirements
107 |     for item in domains_counts:
108 |         count = domains_counts[item]
109 |         if count < MIN_FREQUENCY:
110 |             domains.pop(item)
111 | 
112 |     # Sort domains by score
113 |     sorted_domains = sorted(
114 |         domains.items(), key=lambda item: item[1], reverse=True)
115 |     return sorted_domains
116 | 
117 | 
118 | def generate(domains):
119 |     """Generate rankings in Goggle format.
120 |     """
121 |     with open(f'{APPLICATION_ROOT}/data/{GOGGLE_FILENAME}', 'w') as target:
122 |         target.write(header())
123 |         target.write(f"! generated: {datetime.now(timezone.utc)}\n")
124 |         target.write('\n')
125 |         target.write(extras())
126 | 
127 |         entries = len(domains)
128 | 
129 |         if top_domains_behavior() == "discard":
130 |             for domain in TOP_DOMAINS:
131 |                 target.write('\n')
132 |                 target.write(discard(domain))
133 | 
134 |         if top_domains_behavior() == "downrank":
135 |             for domain in TOP_DOMAINS:
136 |                 target.write('\n')
137 |                 target.write(downrank(domain, TOP_DOMAINS_DOWNRANK_VALUE))
138 | 
139 |         # Split up the list into thirds and assign a boost of 4, 3, or 2
140 |         for item in range(len(domains)):
141 |             domain = domains[item][0]
142 |             place = item/entries
143 |             if place <= 0.33:
144 |                 target.write('\n')
145 |                 target.write(boost(domain, 4))
146 |             elif place <= 0.66:
147 |                 target.write('\n')
148 |                 target.write(boost(domain, 3))
149 |             else:
150 |                 target.write('\n')
151 |                 target.write(boost(domain, 2))
152 |     print(f'{GOGGLE_FILENAME} generated')
153 | 
154 | 
155 | # Store top domains in memory
156 | with open(f'{APPLICATION_ROOT}/generate/top_domains.txt', 'r') as top_domains_file:
157 |     TOP_DOMAINS = top_domains_file.read().splitlines()
158 | 
159 | # Get data from SQLite database
160 | submissions = []
161 | for item in SUBREDDITS:
162 |     if item.strip():
163 |         target_subreddit = item.strip().lower()
164 |         con = sqlite3.connect(
165 |             f'{APPLICATION_ROOT}/data/subreddits/{target_subreddit}/{target_subreddit}.db')
166 |         con.row_factory = dict_factory
167 |         cur = con.cursor()
168 |         cur.execute(
169 |             'SELECT score,url FROM submissions WHERE score >= ? AND created >= ?', (SCORE_THRESHOLD, MIN_EPOCH_TIME))
170 |         submissions.extend(cur.fetchall())
171 |         con.close()
172 | 
173 | # Sort domains and generate Goggle
174 | sorted_domains = sort_domains(submissions)
175 | generate(sorted_domains)
176 | 


--------------------------------------------------------------------------------
/generate/requirements.txt:
--------------------------------------------------------------------------------
1 | tldextract
2 | 


--------------------------------------------------------------------------------
/generate/top_domains.txt:
--------------------------------------------------------------------------------
   1 | google.com
   2 | youtube.com
   3 | facebook.com
   4 | akamaiedge.net
   5 | netflix.com
   6 | microsoft.com
   7 | instagram.com
   8 | twitter.com
   9 | gtld-servers.net
  10 | baidu.com
  11 | linkedin.com
  12 | akamai.net
  13 | wikipedia.org
  14 | apple.com
  15 | amazonaws.com
  16 | yahoo.com
  17 | cloudflare.com
  18 | bilibili.com
  19 | amazon.com
  20 | a-msedge.net
  21 | qq.com
  22 | live.com
  23 | akadns.net
  24 | googletagmanager.com
  25 | wordpress.org
  26 | bing.com
  27 | netflix.net
  28 | github.com
  29 | whatsapp.com
  30 | pinterest.com
  31 | reddit.com
  32 | office.com
  33 | youtu.be
  34 | trafficmanager.net
  35 | microsoftonline.com
  36 | l-msedge.net
  37 | azure.com
  38 | windowsupdate.com
  39 | vimeo.com
  40 | adobe.com
  41 | zoom.us
  42 | doubleclick.net
  43 | zhihu.com
  44 | fastly.net
  45 | yandex.ru
  46 | mail.ru
  47 | googlevideo.com
  48 | wordpress.com
  49 | domaincontrol.com
  50 | goo.gl
  51 | vk.com
  52 | bit.ly
  53 | gandi.net
  54 | nflxso.net
  55 | s-msedge.net
  56 | googleusercontent.com
  57 | msn.com
  58 | taobao.com
  59 | weibo.com
  60 | aaplimg.com
  61 | sharepoint.com
  62 | csdn.net
  63 | t.co
  64 | blogspot.com
  65 | mozilla.org
  66 | tiktok.com
  67 | google.com.hk
  68 | tumblr.com
  69 | cloudapp.net
  70 | paypal.com
  71 | edgekey.net
  72 | windows.net
  73 | macromedia.com
  74 | webex.com
  75 | nytimes.com
  76 | xvideos.com
  77 | nih.gov
  78 | office365.com
  79 | apple-dns.net
  80 | 163.com
  81 | spotify.com
  82 | sina.com.cn
  83 | intuit.com
  84 | yandex.net
  85 | pornhub.com
  86 | google-analytics.com
  87 | europa.eu
  88 | flickr.com
  89 | dropbox.com
  90 | skype.com
  91 | twitch.tv
  92 | canva.com
  93 | attcompute.com
  94 | yahoo.co.jp
  95 | medium.com
  96 | stackoverflow.com
  97 | imdb.com
  98 | ebay.com
  99 | sohu.com
 100 | gravatar.com
 101 | googledomains.com
 102 | jd.com
 103 | opera.com
 104 | naver.com
 105 | lencr.org
 106 | fandom.com
 107 | icloud.com
 108 | t-msedge.net
 109 | cnn.com
 110 | myfritz.net
 111 | soundcloud.com
 112 | outlook.com
 113 | cloudfront.net
 114 | aliexpress.com
 115 | myshopify.com
 116 | tmall.com
 117 | t.me
 118 | gvt1.com
 119 | apache.org
 120 | amazon.in
 121 | fbcdn.net
 122 | archive.org
 123 | bbc.com
 124 | forbes.com
 125 | zemanta.com
 126 | nic.ru
 127 | theguardian.com
 128 | bbc.co.uk
 129 | quora.com
 130 | cloudflare.net
 131 | omtrdc.net
 132 | wellsfargo.com
 133 | force.com
 134 | github.io
 135 | abusix.zone
 136 | w3.org
 137 | msedge.net
 138 | chaturbate.com
 139 | xhamster.com
 140 | office.net
 141 | forms.gle
 142 | digicert.com
 143 | salesforce.com
 144 | douban.com
 145 | etsy.com
 146 | indeed.com
 147 | p-cdn.us
 148 | sourceforge.net
 149 | linode.com
 150 | google.co.in
 151 | duckduckgo.com
 152 | sciencedirect.com
 153 | spo-msedge.net
 154 | miit.gov.cn
 155 | imgur.com
 156 | teamviewer.com
 157 | amazon.co.uk
 158 | creativecommons.org
 159 | who.int
 160 | 1688.com
 161 | booking.com
 162 | discord.com
 163 | googlesyndication.com
 164 | weebly.com
 165 | dnsowl.com
 166 | issuu.com
 167 | twimg.com
 168 | akamaihd.net
 169 | youku.com
 170 | spankbang.com
 171 | wixsite.com
 172 | cdc.gov
 173 | researchgate.net
 174 | roblox.com
 175 | telegram.org
 176 | flipkart.com
 177 | reuters.com
 178 | washingtonpost.com
 179 | xnxx.com
 180 | wikimedia.org
 181 | oracle.com
 182 | amazon.co.jp
 183 | dailymail.co.uk
 184 | tiktokcdn.com
 185 | cedexis.net
 186 | mangosip.ru
 187 | registrar-servers.com
 188 | tradingview.com
 189 | harvard.edu
 190 | bloomberg.com
 191 | wsj.com
 192 | blogger.com
 193 | dnsmadeeasy.com
 194 | hp.com
 195 | tinyurl.com
 196 | googleapis.com
 197 | wix.com
 198 | googleadservices.com
 199 | worldfcdn.com
 200 | alibaba.com
 201 | akamaized.net
 202 | ytimg.com
 203 | mit.edu
 204 | meraki.com
 205 | kaspersky.com
 206 | businessinsider.com
 207 | aliyun.com
 208 | shopify.com
 209 | wa.me
 210 | hwcdn.net
 211 | go.com
 212 | gvt2.com
 213 | alipay.com
 214 | wp.com
 215 | dell.com
 216 | samsung.com
 217 | youtube-nocookie.com
 218 | cisco.com
 219 | epicgames.com
 220 | cdninstagram.com
 221 | ok.ru
 222 | wiley.com
 223 | cnbc.com
 224 | freepik.com
 225 | google.cn
 226 | windows.com
 227 | slideshare.net
 228 | hao123.com
 229 | ibm.com
 230 | amazon.de
 231 | azurewebsites.net
 232 | demdex.net
 233 | sogou.com
 234 | php.net
 235 | pixiv.net
 236 | behance.net
 237 | 360.cn
 238 | slack.com
 239 | edgesuite.net
 240 | godaddy.com
 241 | so.com
 242 | nature.com
 243 | coinmarketcap.com
 244 | mediafire.com
 245 | springer.com
 246 | google.de
 247 | walmart.com
 248 | list-manage.com
 249 | espn.com
 250 | zendesk.com
 251 | tencent.com
 252 | aol.com
 253 | adnxs.com
 254 | grammarly.com
 255 | binance.com
 256 | stanford.edu
 257 | fc2.com
 258 | cnblogs.com
 259 | weather.com
 260 | un.org
 261 | comcast.net
 262 | o365filtering.com
 263 | nasa.gov
 264 | deepl.com
 265 | elasticbeanstalk.com
 266 | cnki.net
 267 | gov.uk
 268 | gnu.org
 269 | app-measurement.com
 270 | mega.nz
 271 | doi.org
 272 | db.com
 273 | fiverr.com
 274 | line.me
 275 | att.net
 276 | trello.com
 277 | unsplash.com
 278 | w3schools.com
 279 | douyu.com
 280 | salesforceliveagent.com
 281 | nginx.org
 282 | marriott.com
 283 | name-services.com
 284 | fincoad.com
 285 | usatoday.com
 286 | tiktokv.com
 287 | azureedge.net
 288 | cnet.com
 289 | akam.net
 290 | herokudns.com
 291 | wsdvs.com
 292 | ilovepdf.com
 293 | dailymotion.com
 294 | iqiyi.com
 295 | hubspot.com
 296 | google.co.uk
 297 | sberdevices.ru
 298 | stripe.com
 299 | npr.org
 300 | xcal.tv
 301 | yelp.com
 302 | telegraph.co.uk
 303 | rakuten.co.jp
 304 | linktr.ee
 305 | eventbrite.com
 306 | nginx.com
 307 | steampowered.com
 308 | surveymonkey.com
 309 | rambler.ru
 310 | cdngslb.com
 311 | google.co.jp
 312 | indiatimes.com
 313 | instructure.com
 314 | 2mdn.net
 315 | shifen.com
 316 | wetransfer.com
 317 | mcafee.com
 318 | amazon-adsystem.com
 319 | goodreads.com
 320 | pubmatic.com
 321 | time.com
 322 | huya.com
 323 | deviantart.com
 324 | casalemedia.com
 325 | udemy.com
 326 | cpanel.net
 327 | rackspace.net
 328 | ca.gov
 329 | realsrv.com
 330 | ampproject.org
 331 | themeforest.net
 332 | patreon.com
 333 | snapchat.com
 334 | ring.com
 335 | tripadvisor.com
 336 | userapi.com
 337 | a2z.com
 338 | google.fr
 339 | criteo.com
 340 | scorecardresearch.com
 341 | gcdn.co
 342 | foxnews.com
 343 | zillow.com
 344 | scribd.com
 345 | avito.ru
 346 | ted.com
 347 | mailchimp.com
 348 | bitly.com
 349 | reg.ru
 350 | cdn77.org
 351 | gstatic.com
 352 | healthline.com
 353 | jomodns.com
 354 | jianshu.com
 355 | gosuslugi.ru
 356 | stuvamac.com
 357 | wired.com
 358 | addtoany.com
 359 | pki.goog
 360 | manitu.net
 361 | google.ru
 362 | unity3d.com
 363 | whatsapp.net
 364 | independent.co.uk
 365 | wildberries.ru
 366 | huawei.com
 367 | webmd.com
 368 | pixabay.com
 369 | incapdns.net
 370 | nstld.com
 371 | myspace.com
 372 | livejournal.com
 373 | ggpht.com
 374 | shutterstock.com
 375 | aparat.com
 376 | addthis.com
 377 | statista.com
 378 | berkeley.edu
 379 | stackexchange.com
 380 | facebook.net
 381 | xiaomi.com
 382 | mysql.com
 383 | digikala.com
 384 | squarespace.com
 385 | ft.com
 386 | amazon.ca
 387 | cpanel.com
 388 | douyin.com
 389 | google.ca
 390 | craigslist.org
 391 | youdao.com
 392 | adobe.io
 393 | okta.com
 394 | intel.com
 395 | g.page
 396 | techcrunch.com
 397 | huffingtonpost.com
 398 | hupu.com
 399 | latimes.com
 400 | 3dmgame.com
 401 | loc.gov
 402 | fb.com
 403 | digitalocean.com
 404 | chase.com
 405 | amzn.to
 406 | hicloud.com
 407 | f5silverline.com
 408 | consultant.ru
 409 | daum.net
 410 | onlyfans.com
 411 | appsflyer.com
 412 | opendns.com
 413 | savefrom.net
 414 | hichina.com
 415 | free.fr
 416 | theverge.com
 417 | alicdn.com
 418 | aliyuncs.com
 419 | atlassian.net
 420 | taboola.com
 421 | goskope.com
 422 | cambridge.org
 423 | pexels.com
 424 | rzone.de
 425 | f2pool.com
 426 | ohthree.com
 427 | speedtest.net
 428 | amazonvideo.com
 429 | kickstarter.com
 430 | googletagservices.com
 431 | investopedia.com
 432 | rbxcdn.com
 433 | tandfonline.com
 434 | alibabadns.com
 435 | b-msedge.net
 436 | google.es
 437 | beian.gov.cn
 438 | ikea.com
 439 | uol.com.br
 440 | washington.edu
 441 | roche.com
 442 | huffpost.com
 443 | rackspace.com
 444 | cornell.edu
 445 | debian.org
 446 | giphy.com
 447 | openx.net
 448 | ozon.ru
 449 | amazon.it
 450 | hotstar.com
 451 | prnewswire.com
 452 | apple.news
 453 | fedex.com
 454 | upwork.com
 455 | state.gov
 456 | trustpilot.com
 457 | feishu.cn
 458 | att.com
 459 | zoho.com
 460 | homedepot.com
 461 | oup.com
 462 | safebrowsing.apple
 463 | redd.it
 464 | azure-dns.com
 465 | saasprotection.com
 466 | nicovideo.jp
 467 | tistory.com
 468 | britannica.com
 469 | cbsnews.com
 470 | no-ip.com
 471 | android.com
 472 | theatlantic.com
 473 | nationalgeographic.com
 474 | fda.gov
 475 | ietf.org
 476 | business.site
 477 | hostgator.com
 478 | yandex.com
 479 | galaxydata.ru
 480 | nbcnews.com
 481 | lenovo.com.cn
 482 | bankofamerica.com
 483 | bidswitch.net
 484 | google.com.tw
 485 | dcinside.com
 486 | cloudns.net
 487 | eset.com
 488 | mi.com
 489 | buzzfeed.com
 490 | sagepub.com
 491 | maricopa.gov
 492 | amazon.fr
 493 | pinimg.com
 494 | nypost.com
 495 | mega.co.nz
 496 | worldnic.com
 497 | sentry.io
 498 | cctv.com
 499 | steamcommunity.com
 500 | wikihow.com
 501 | academia.edu
 502 | root-servers.net
 503 | sitemaps.org
 504 | bandcamp.com
 505 | name.com
 506 | adriver.ru
 507 | rlcdn.com
 508 | usnews.com
 509 | globo.com
 510 | msftconnecttest.com
 511 | hotjar.com
 512 | dns.com
 513 | primevideo.com
 514 | ali213.net
 515 | investing.com
 516 | marketwatch.com
 517 | dribbble.com
 518 | bluehost.com
 519 | aboutads.info
 520 | mailchi.mp
 521 | yahoodns.net
 522 | disqus.com
 523 | moatads.com
 524 | rt.com
 525 | usda.gov
 526 | gmail.com
 527 | box.com
 528 | miui.com
 529 | princeton.edu
 530 | avast.com
 531 | jimdo.com
 532 | arnebrachhold.de
 533 | mayoclinic.org
 534 | hbr.org
 535 | messenger.com
 536 | nga.cn
 537 | notion.so
 538 | vkontakte.ru
 539 | amazon.es
 540 | rubiconproject.com
 541 | coursera.org
 542 | forter.com
 543 | marca.com
 544 | hugedomains.com
 545 | statcounter.com
 546 | herokuapp.com
 547 | ea.com
 548 | tbcache.com
 549 | nr-data.net
 550 | change.org
 551 | igamecj.com
 552 | mozilla.com
 553 | nintendo.net
 554 | merriam-webster.com
 555 | nike.com
 556 | ups.com
 557 | unesco.org
 558 | noaa.gov
 559 | rbc.ru
 560 | launchpad.net
 561 | dmm.co.jp
 562 | msftncsi.com
 563 | trendmicro.com
 564 | oath.cloud
 565 | livedoor.jp
 566 | stripchat.com
 567 | akismet.com
 568 | pbs.org
 569 | nvidia.com
 570 | ntp.org
 571 | airbnb.com
 572 | spaceweb.pro
 573 | rackspaceclouddb.com
 574 | smzdm.com
 575 | usps.com
 576 | arcgis.com
 577 | bet365.com
 578 | eastmoney.com
 579 | adsrvr.org
 580 | economist.com
 581 | msn.cn
 582 | umich.edu
 583 | whitehouse.gov
 584 | ngenix.net
 585 | outbrain.com
 586 | elsevier.com
 587 | ubuntu.com
 588 | chaoxing.com
 589 | autodesk.com
 590 | spcsdns.net
 591 | e-msedge.net
 592 | networkadvertising.org
 593 | columbia.edu
 594 | envato.com
 595 | calendly.com
 596 | verisign.com
 597 | google.com.br
 598 | bongacams.com
 599 | businesswire.com
 600 | epa.gov
 601 | target.com
 602 | ieee.org
 603 | xinhuanet.com
 604 | telegram.me
 605 | atlassian.com
 606 | varzesh3.com
 607 | allaboutcookies.org
 608 | ifeng.com
 609 | accuweather.com
 610 | y2mate.com
 611 | northgrum.com
 612 | wpengine.com
 613 | irs.gov
 614 | sorbs.net
 615 | gamersky.com
 616 | netease.com
 617 | nessus.org
 618 | exacttarget.com
 619 | capitalone.com
 620 | ptinews.com
 621 | xiaohongshu.com
 622 | geeksforgeeks.org
 623 | redhat.com
 624 | google.com.mx
 625 | digg.com
 626 | typepad.com
 627 | 3lift.com
 628 | roku.com
 629 | ixigua.com
 630 | namu.wiki
 631 | bestbuy.com
 632 | worldbank.org
 633 | toutiao.com
 634 | istockphoto.com
 635 | ria.ru
 636 | google.com.sg
 637 | mts.ru
 638 | hulu.com
 639 | myqcloud.com
 640 | fbs1-t-msedge.net
 641 | 9gag.com
 642 | media.net
 643 | appspot.com
 644 | shein.com
 645 | b-cdn.net
 646 | timeweb.ru
 647 | typeform.com
 648 | abc.net.au
 649 | fortinet.net
 650 | canada.ca
 651 | rackcdn.com
 652 | quizlet.com
 653 | qualtrics.com
 654 | about.com
 655 | newrelic.com
 656 | opensea.io
 657 | netangels.ru
 658 | yts.mx
 659 | deloitte.com
 660 | sciencemag.org
 661 | atomile.com
 662 | fb.me
 663 | psu.edu
 664 | arxiv.org
 665 | gitlab.com
 666 | domainmarket.com
 667 | americanexpress.com
 668 | media-amazon.com
 669 | rncdn7.com
 670 | visualstudio.com
 671 | beeline.ru
 672 | ucla.edu
 673 | skyeng.link
 674 | cbc.ca
 675 | yale.edu
 676 | tremorhub.com
 677 | chinamobile.com
 678 | reverso.net
 679 | apigee.net
 680 | dnspod.net
 681 | ivi.ru
 682 | dynect.net
 683 | aiv-cdn.net
 684 | ox.ac.uk
 685 | vice.com
 686 | dw.com
 687 | lum-superproxy.io
 688 | dreamhost.com
 689 | iso.org
 690 | upenn.edu
 691 | sitescout.com
 692 | constantcontact.com
 693 | google.com.tr
 694 | ameblo.jp
 695 | cricbuzz.com
 696 | figma.com
 697 | zdnet.com
 698 | imgsmail.ru
 699 | tinkoff.ru
 700 | nhentai.net
 701 | c-msedge.net
 702 | mashable.com
 703 | smallpdf.com
 704 | shaparak.ir
 705 | hdfcbank.com
 706 | psychologytoday.com
 707 | aliexpress.ru
 708 | vox.com
 709 | rosintel.com
 710 | jotform.com
 711 | sophos.com
 712 | fincoec.com
 713 | eepurl.com
 714 | shopee.tw
 715 | lenovo.com
 716 | mzstatic.com
 717 | google.com.au
 718 | nist.gov
 719 | gofundme.com
 720 | mathtag.com
 721 | sciencedaily.com
 722 | markmonitor.com
 723 | checkpoint.com
 724 | chsi.com.cn
 725 | evernote.com
 726 | aliyundrive.com
 727 | playstation.com
 728 | gitee.com
 729 | bmj.com
 730 | runoob.com
 731 | dhl.com
 732 | jdadelivers.com
 733 | wisc.edu
 734 | wiktionary.org
 735 | nokia.com
 736 | cogentco.com
 737 | rayjump.com
 738 | google.co.th
 739 | ortb.net
 740 | theconversation.com
 741 | feedburner.com
 742 | google.it
 743 | getpocket.com
 744 | newyorker.com
 745 | ttlivecdn.com
 746 | zerodha.com
 747 | zhibo8.cc
 748 | discord.gg
 749 | apnews.com
 750 | plos.org
 751 | sharethrough.com
 752 | uber.com
 753 | elpais.com
 754 | optimizely.com
 755 | bitrix24.ru
 756 | gearbest.com
 757 | fortune.com
 758 | trendyol.com
 759 | biomedcentral.com
 760 | bluekai.com
 761 | wattpad.com
 762 | ps.kz
 763 | quantserve.com
 764 | animeflv.net
 765 | genius.com
 766 | icourse163.org
 767 | python.org
 768 | medicalnewstoday.com
 769 | ovscdns.com
 770 | doubleverify.com
 771 | paloaltonetworks.com
 772 | docker.com
 773 | weforum.org
 774 | nba.com
 775 | ertelecom.ru
 776 | azurefd.net
 777 | nest.com
 778 | lazada.sg
 779 | 1337x.to
 780 | yimg.com
 781 | alidns.com
 782 | fastcompany.com
 783 | mirtesen.ru
 784 | mdpi.com
 785 | ign.com
 786 | zippyshare.com
 787 | zhihuishu.com
 788 | uci.edu
 789 | mlb.com
 790 | fontawesome.com
 791 | jquery.com
 792 | rtcfront.net
 793 | va.gov
 794 | engadget.com
 795 | ovh.net
 796 | advertising.com
 797 | sfx.ms
 798 | verizon.com
 799 | 52pojie.cn
 800 | umn.edu
 801 | newsweek.com
 802 | tds.net
 803 | gihc.net
 804 | ebay.co.uk
 805 | kakao.com
 806 | meetup.com
 807 | avito.st
 808 | thesun.co.uk
 809 | as.com
 810 | agkn.com
 811 | vtb.ru
 812 | nic.uk
 813 | autohome.com.cn
 814 | frontiersin.org
 815 | sxyprn.com
 816 | stumbleupon.com
 817 | duolingo.com
 818 | llnwi.net
 819 | oreilly.com
 820 | attn.tv
 821 | gizmodo.com
 822 | mckinsey.com
 823 | spiegel.de
 824 | jstor.org
 825 | adsafeprotected.com
 826 | apa.org
 827 | online-metrix.net
 828 | eporner.com
 829 | inc.com
 830 | crashlytics.com
 831 | playstation.net
 832 | pconline.com.cn
 833 | expedia.com
 834 | hilton.com
 835 | comcast.com
 836 | mirror.co.uk
 837 | vmware.com
 838 | swrve.com
 839 | mercadolibre.com.mx
 840 | example.com
 841 | asus.com
 842 | secureserver.net
 843 | gismeteo.ru
 844 | xfinity.com
 845 | webs.com
 846 | timeanddate.com
 847 | discordapp.com
 848 | gamespot.com
 849 | myanimelist.net
 850 | thepaper.cn
 851 | google.com.sa
 852 | dyndns.org
 853 | cam.ac.uk
 854 | zol.com.cn
 855 | youporn.com
 856 | sun.com
 857 | jhu.edu
 858 | corelux.net
 859 | sahibinden.com
 860 | wayfair.com
 861 | criteo.net
 862 | e-hentai.org
 863 | emxdgt.com
 864 | wp.pl
 865 | disneyplus.com
 866 | crunchyroll.com
 867 | duckdns.org
 868 | asana.com
 869 | sberbank.ru
 870 | tapad.com
 871 | pikiran-rakyat.com
 872 | people.com.cn
 873 | everesttech.net
 874 | smartadserver.com
 875 | entrepreneur.com
 876 | mwbsys.com
 877 | utexas.edu
 878 | cmu.edu
 879 | fcuat.com
 880 | ripn.net
 881 | bidr.io
 882 | oecd.org
 883 | tripod.com
 884 | youronlinechoices.com
 885 | plesk.com
 886 | photobucket.com
 887 | huaban.com
 888 | xing.com
 889 | remove.bg
 890 | namebrightdns.com
 891 | realtor.com
 892 | cdn-apple.com
 893 | jb51.net
 894 | hitomi.la
 895 | hinet.net
 896 | bbb.org
 897 | coupang.com
 898 | softonic.com
 899 | acs.org
 900 | exelator.com
 901 | blog.jp
 902 | glassdoor.com
 903 | chess.com
 904 | geocities.com
 905 | onlinesbi.com
 906 | imrworldwide.com
 907 | teads.tv
 908 | samsungcloudsolution.com
 909 | tokopedia.com
 910 | kunlunsl.com
 911 | pewresearch.org
 912 | aljazeera.com
 913 | merchantlink.com
 914 | tencent-cloud.net
 915 | footprint.net
 916 | xhamsterlive.com
 917 | licdn.com
 918 | dropboxusercontent.com
 919 | icloud-content.com
 920 | readmanganato.com
 921 | canonical.com
 922 | openstreetmap.org
 923 | gotowebinar.com
 924 | scientificamerican.com
 925 | usgovcloudapi.net
 926 | ed.gov
 927 | ny.gov
 928 | kaspersky-labs.com
 929 | blackboard.com
 930 | sectigo.com
 931 | uchicago.edu
 932 | sfgate.com
 933 | shopee.co.id
 934 | xueqiu.com
 935 | nsone.net
 936 | sky.com
 937 | crwdcntrl.net
 938 | substack.com
 939 | nps.gov
 940 | lowes.com
 941 | hbo.com
 942 | mercadolibre.com.ar
 943 | adp.com
 944 | elegantthemes.com
 945 | dnsv1.com
 946 | hootsuite.com
 947 | amazon.com.mx
 948 | 360doc.com
 949 | samsungcloud.com
 950 | ilive.cn
 951 | guardian.co.uk
 952 | 33across.com
 953 | fmkorea.com
 954 | xbox.com
 955 | mihoyo.com
 956 | wbx2.com
 957 | chegg.com
 958 | pcmag.com
 959 | ys7.com
 960 | lenta.ru
 961 | fao.org
 962 | applovin.com
 963 | ecdns.net
 964 | t-mobile.com
 965 | fidelity.com
 966 | gartner.com
 967 | webrootcloudav.com
 968 | immedia-semi.com
 969 | getbootstrap.com
 970 | detik.com
 971 | allegro.pl
 972 | weibo.cn
 973 | qidian.com
 974 | purdue.edu
 975 | google.co.kr
 976 | pnas.org
 977 | ftc.gov
 978 | soso.com
 979 | braze.com
 980 | indiegogo.com
 981 | ucsd.edu
 982 | news.com.au
 983 | quillbot.com
 984 | nyu.edu
 985 | slate.com
 986 | indiamart.com
 987 | arstechnica.com
 988 | onet.pl
 989 | emailvision.net
 990 | 360yield.com
 991 | ucoz.ru
 992 | thelancet.com
 993 | cdn20.com
 994 | op.gg
 995 | bizjournals.com
 996 | anchor.fm
 997 | tribunnews.com
 998 | google.com.ar
 999 | branch.io
1000 | revopush.com
1001 | 
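
`top_domains.txt` lists very high-traffic domains. The `TOP_DOMAINS_BEHAVIOR` and `TOP_DOMAINS_DOWNRANK_VALUE` settings in the env files control how those domains are treated. `generate_goggle.py` is not reproduced here, so the sketch below is only one plausible reading of those settings, not the generator's actual logic. Under `exclude`, top domains are skipped. Under an assumed `downrank` mode, they get a Goggles `$downrank=N,site=...` rule instead. Every other domain gets a flat `$boost`, and its value is a placeholder for whatever score-based weighting the real generator applies. Both function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_top_domains(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty, lowercased domains from a top-domains list."""
    return {line.strip().lower() for line in lines if line.strip()}


def goggle_rules(
    domains: Iterable[str],
    top_domains: Set[str],
    behavior: str = "exclude",
    downrank_value: int = 2,
    boost_value: int = 1,
) -> List[str]:
    """Emit one Goggle rule per domain (hypothetical boost/downrank policy).

    "exclude" drops top domains entirely; "downrank" (an assumed mode name)
    emits a $downrank rule for them instead of a boost.
    """
    if behavior not in ("exclude", "downrank"):
        raise ValueError(f"unknown TOP_DOMAINS_BEHAVIOR: {behavior!r}")
    rules = []
    for domain in domains:
        if domain not in top_domains:
            rules.append(f"$boost={boost_value},site={domain}")
        elif behavior == "downrank":
            rules.append(f"$downrank={downrank_value},site={domain}")
        # With "exclude", a top domain produces no rule at all.
    return rules
```

Excluding is the blunter option. It removes generic giants like `google.com` from the Goggle entirely. Downranking still lets them appear, but below community-favoured sites.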


--------------------------------------------------------------------------------
/narwhalizer.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | # This is where we have everything set up in the container
 4 | export APPLICATION_ROOT=/app
 5 | 
 6 | # Get ready
 7 | source "$APPLICATION_ROOT"/venv/bin/activate
 8 | cd "$APPLICATION_ROOT"/data
 9 | 
10 | # Update database for each subreddit using timesearch
11 | IFS=',' read -ra TARGET_SUBREDDITS <<< "$SUBREDDITS"
12 | for SUBREDDIT in "${TARGET_SUBREDDITS[@]}"; do
13 | 	echo "Running timesearch for subreddit: $SUBREDDIT"
14 | 	python "$APPLICATION_ROOT"/timesearch/timesearch.py get_submissions -r "$SUBREDDIT"
15 | done
16 | 
17 | # Generate Goggle
18 | python "$APPLICATION_ROOT"/generate/generate_goggle.py
19 | 
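
The loop in `narwhalizer.sh` splits `SUBREDDITS` on commas with `IFS=',' read -ra` and runs timesearch once per name. Below is a rough Python equivalent, which could be useful if the driver ever moves out of bash. It deliberately differs in two ways. First, it trims whitespace and drops empty entries, so `netsec, ReverseEngineering` works. Second, `check=True` stops at the first failing run, whereas the bash loop carries on. The helper names are illustrative and are not part of the repository.

```python
import subprocess
from typing import List


def parse_subreddits(value: str) -> List[str]:
    """Split a comma-separated SUBREDDITS value, trimming spaces and dropping blanks."""
    return [name.strip() for name in value.split(",") if name.strip()]


def timesearch_commands(app_root: str, subreddits: List[str]) -> List[List[str]]:
    """Build one `timesearch.py get_submissions -r <subreddit>` argv per subreddit."""
    script = f"{app_root}/timesearch/timesearch.py"
    return [["python", script, "get_submissions", "-r", name] for name in subreddits]


def run_all(app_root: str, subreddits: List[str]) -> None:
    """Run timesearch for each subreddit, stopping at the first failure."""
    for cmd in timesearch_commands(app_root, subreddits):
        print(f"Running timesearch for subreddit: {cmd[-1]}")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

Building the argv lists separately from running them keeps the parsing testable without launching timesearch.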


--------------------------------------------------------------------------------
/netsec.env:
--------------------------------------------------------------------------------
 1 | # Timesearch
 2 | USERAGENT=narwhalizer
 3 | CONTACT_INFO=<REPLACE_WITH_EMAIL_OR_REDDIT_USERNAME>
 4 | APP_ID=<REPLACE>
 5 | APP_SECRET=<REPLACE>
 6 | APP_REFRESH=<REPLACE>
 7 | 
 8 | # Goggle Metadata
 9 | GOGGLE_NAME=Netsec
10 | GOGGLE_DESCRIPTION=Prioritizes domains popular with the information security community. Primarily uses submissions and scoring from /r/netsec.
11 | GOGGLE_PUBLIC=true
12 | GOGGLE_AUTHOR=Forces Unseen
13 | GOGGLE_AVATAR=#01ebae
14 | GOGGLE_HOMEPAGE=https://github.com/forcesunseen/netsec-goggle
15 | 
16 | # Goggle
17 | SUBREDDITS=netsec
18 | GOGGLE_FILENAME=netsec.goggle
19 | GOGGLE_EXTRAS=$boost=2,site=github.io\n$boost=2,site=github.com\n$boost=2,site=stackoverflow.com\n/blog.$boost=2\n/blog/$boost=2\n/docs.$boost=2\n/docs/$boost=2\n/doc/$boost=2\n/Doc/$boost=2\n/manual/$boost=2
20 | 
21 | # Algorithm
22 | SCORE_THRESHOLD=20
23 | MIN_FREQUENCY=1
24 | MIN_EPOCH_TIME=0
25 | TOP_DOMAINS_BEHAVIOR=exclude
26 | TOP_DOMAINS_DOWNRANK_VALUE=2
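
In `netsec.env`, `GOGGLE_EXTRAS` packs several Goggle instructions into one line, separated by a literal `\n` (a backslash followed by `n`). Some values, such as `GOGGLE_AVATAR=#01ebae`, also contain characters that look like comment markers. The sketch below shows one way to read such a file and expand the extras, assuming the generator treats `\n` as a separator. `generate_goggle.py` is not shown here, so `parse_env_file` and `split_extras` are hypothetical helpers, not its actual code.

```python
from typing import Dict, Iterable, List


def parse_env_file(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines.

    Values are kept verbatim: no quote stripping and no escape processing,
    so a value may itself contain '=' or '#'.
    """
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore malformed lines without '='
            env[key.strip()] = value
    return env


def split_extras(raw: str) -> List[str]:
    """Split GOGGLE_EXTRAS on the literal two-character sequence backslash-n."""
    return [part for part in raw.split("\\n") if part]
```

In this sketch, `#` starts a comment only at the beginning of a line, which keeps `#01ebae` intact. Splitting on the first `=` only keeps `$boost=2,site=github.io` whole.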


--------------------------------------------------------------------------------
/patches/tsdb_naming.patch:
--------------------------------------------------------------------------------
 1 | 20,24c20,24
 2 | <     '.\\{name}.db',
 3 | <     '.\\subreddits\\{name}\\{name}.db',
 4 | <     '.\\{name}\\{name}.db',
 5 | <     '.\\databases\\{name}.db',
 6 | <     '.\\subreddits\\{name}\\{name}.db',
 7 | ---
 8 | >     '{name}.db',
 9 | >     'subreddits/{name}/{name}.db',
10 | >     '{name}/{name}.db',
11 | >     'databases/{name}.db',
12 | >     'subreddits/{name}/{name}.db',
13 | 27,31c27,31
14 | <     '.\\@{name}.db',
15 | <     '.\\users\\@{name}\\@{name}.db',
16 | <     '.\\@{name}\\@{name}.db',
17 | <     '.\\databases\\@{name}.db',
18 | <     '.\\users\\@{name}\\@{name}.db',
19 | ---
20 | >     '@{name}.db',
21 | >     'users/@{name}/@{name}.db',
22 | >     '@{name}/@{name}.db',
23 | >     'databases/@{name}.db',
24 | >     'users/@{name}/@{name}.db',
25 | 
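
Upstream timesearch builds its database paths from Windows-style templates such as `'.\\subreddits\\{name}\\{name}.db'`. On Linux, which the Alpine container runs, a backslash is an ordinary filename character. That template therefore names a single oddly named file in the working directory rather than a nested path, and the patch swaps in forward-slash templates to fix this. The sketch below demonstrates the difference with `pathlib` and shows a separator-agnostic alternative. `components_on_posix` and `portable_db_path` are illustrative names, not timesearch functions.

```python
from pathlib import Path, PurePosixPath

# One template from the patch, before and after.
WINDOWS_TEMPLATE = '.\\subreddits\\{name}\\{name}.db'
POSIX_TEMPLATE = 'subreddits/{name}/{name}.db'


def components_on_posix(template: str, name: str) -> tuple:
    """Show how a POSIX system splits the expanded template into path parts."""
    return PurePosixPath(template.format(name=name)).parts


def portable_db_path(name: str) -> Path:
    """Join components with the host's own separator instead of hard-coding one."""
    return Path("subreddits", name, f"{name}.db")
```

Building paths from components, as `portable_db_path` does, would work on both platforms without a patch. That change would have to land upstream, though, which is presumably why a small patch against a pinned commit was used instead.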


--------------------------------------------------------------------------------
/refresh/obtain_refresh_token.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | '''
 3 | Copyright (c) 2016, Bryce Boe
 4 | All rights reserved.
 5 | 
 6 | Redistribution and use in source and binary forms, with or without
 7 | modification, are permitted provided that the following conditions are met:
 8 | 
 9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 |    this list of conditions and the following disclaimer in the documentation
13 |    and/or other materials provided with the distribution.
14 | 
15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
18 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
19 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
20 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
21 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
22 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
23 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
24 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 | '''
26 | import random
27 | import socket
28 | import sys
29 | 
30 | import praw
31 | 
32 | 
33 | def main():
34 |     scopes = ["read", "wikiread"]
35 | 
36 |     reddit = praw.Reddit(
37 |         redirect_uri="http://localhost:8081",
38 |         user_agent="narwhalizer",
39 |     )
40 |     state = str(random.randint(0, 65000))
41 |     url = reddit.auth.url(duration="permanent", scopes=scopes, state=state)
42 |     print(f"Now open this url in your browser: {url}")
43 | 
44 |     client = receive_connection()
45 |     data = client.recv(1024).decode("utf-8")
46 |     param_tokens = data.split(" ", 2)[1].split("?", 1)[1].split("&")
47 |     params = {
48 |         key: value for (key, value) in [token.split("=", 1) for token in param_tokens]
49 |     }
50 | 
51 |     if state != params["state"]:
52 |         send_message(
53 |             client,
54 |             f"State mismatch. Expected: {state} Received: {params['state']}",
55 |         )
56 |         return 1
57 |     elif "error" in params:
58 |         send_message(client, params["error"])
59 |         return 1
60 | 
61 |     print(params["code"])
62 |     refresh_token = reddit.auth.authorize(params["code"])
63 |     send_message(client, f"Refresh token: {refresh_token}")
64 |     return 0
65 | 
66 | 
67 | def receive_connection():
68 |     """Wait for and then return a connected socket.
69 | 
70 |     Opens a TCP connection on port 8081, and waits for a single client.
71 | 
72 |     """
73 |     server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
74 |     server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
75 |     server.bind(("0.0.0.0", 8081))
76 |     server.listen(1)
77 |     client = server.accept()[0]
78 |     server.close()
79 |     return client
80 | 
81 | 
82 | def send_message(client, message):
83 |     """Send message to client and close the connection."""
84 |     print(message)
85 |     client.send(f"HTTP/1.1 200 OK\r\n\r\n{message}".encode("utf-8"))
86 |     client.close()
87 | 
88 | 
89 | if __name__ == "__main__":
90 |     sys.exit(main())
91 | 
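
`main()` in `obtain_refresh_token.py` recovers the query parameters by splitting the raw request line on spaces, `?`, `&` and `=`. That works for Reddit's usual callback, but it does not percent-decode values. Here is a sketch that uses the standard library's `urllib.parse` instead (the `parse_callback` name is hypothetical):

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit


def parse_callback(request: bytes) -> Dict[str, str]:
    """Extract decoded query parameters from the first line of an HTTP request."""
    request_line = request.decode("utf-8").split("\r\n", 1)[0]
    # The request line looks like: GET /?state=123&code=abc HTTP/1.1
    _method, target, _version = request_line.split(" ", 2)
    # parse_qsl percent-decodes keys and values and splits each pair on the first '='.
    return dict(parse_qsl(urlsplit(target).query))
```

Inside `main()`, `params = parse_callback(client.recv(1024))` could replace the hand-written parsing lines. The state and error checks would stay as they are.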


--------------------------------------------------------------------------------
/refresh/requirements.txt:
--------------------------------------------------------------------------------
1 | praw
2 | 


--------------------------------------------------------------------------------
/scripts/bot.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | BSD 3-Clause License
  3 | 
  4 | Copyright (c) 2022, Ethan Dalool aka voussoir
  5 | All rights reserved.
  6 | 
  7 | Redistribution and use in source and binary forms, with or without
  8 | modification, are permitted provided that the following conditions are met:
  9 | 
 10 | 1. Redistributions of source code must retain the above copyright notice, this
 11 |    list of conditions and the following disclaimer.
 12 | 
 13 | 2. Redistributions in binary form must reproduce the above copyright notice,
 14 |    this list of conditions and the following disclaimer in the documentation
 15 |    and/or other materials provided with the distribution.
 16 | 
 17 | 3. Neither the name of the copyright holder nor the names of its
 18 |    contributors may be used to endorse or promote products derived from
 19 |    this software without specific prior written permission.
 20 | 
 21 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 22 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 23 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 24 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 25 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 26 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 27 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 28 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 29 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 31 | '''
 32 | import os
 33 | 
 34 | '''
 35 | bot.py template for PRAW4
 36 | 
 37 | This file will be imported by all bots, and provides a standard way to log in.
 38 | 
 39 | You should never place this file in a git repository or any place where it will
 40 | get shared.
 41 | 
 42 | The requirements for this file are:
 43 | 
 44 | 1   A function `anonymous` with no arguments, which returns a `praw.Reddit`
 45 |     instance that has a Useragent but is otherwise anonymous / unauthenticated.
 46 |     This will be used in bots that need to make requests but don't need any
 47 |     permissions.
 48 | 
 49 | 2   A function `login` with optional parameter `r`, which returns an
 50 |     authenticated Reddit instance.
 51 |     If `r` is provided, authenticate it.
 52 |     If not, create one using `anonymous` and authenticate that.
 53 |     Either way, return the instance when finished.
 54 | 
 55 | The exact workings of these functions, and the existence of any other variables
 56 | and functions are up to you.
 57 | 
 58 | I suggest placing this file in a private directory and adding that directory to
 59 | your `PYTHONPATH` environment variable. This makes it importable from anywhere.
 60 | 
 61 | However, you may place it in your default Python library. An easy way to find
 62 | this is by importing a standard library module and checking its location:
 63 | >>> import os
 64 | >>> os
 65 | <module 'os' from 'C:\\Python36\\lib\\os.py'>
 66 | 
 67 | But placing the file in the standard library means you will have to copy it over
 68 | when you upgrade Python.
 69 | 
 70 | If you need multiple separate bots, I would suggest creating copies of this file
 71 | with different names, and then using `import specialbot as bot` within the
 72 | application, so that the rest of the interface can stay the same.
 73 | '''
 74 | 
 75 | # The USERAGENT is a short description of why you're using reddit's API.
 76 | # You can make it as simple as "/u/myusername's bot for /r/mysubreddit".
 77 | # It should include your username so that reddit can contact you if there
 78 | # is a problem.
 79 | USERAGENT = os.environ.get('USERAGENT')
 80 | 
 81 | # CONTACT_INFO can be your reddit username, or your email address, or any other
 82 | # means of contacting you. This is used for some bot programs but not all.
 83 | CONTACT_INFO = os.environ.get('CONTACT_INFO')
 84 | 
 85 | # It's time to get your OAuth credentials.
 86 | #  1. Go to https://old.reddit.com/prefs/apps
 87 | #  2. Click create a new app
 88 | #  3. Give it any name you want
 89 | #  4. Choose "script" type
 90 | #  5. Description and About URI can be blank
 91 | #  6. Put "http://localhost:8081" as the Redirect URI
 92 | #  7. Now that you have created your app, write down the app ID (which appears
 93 | #     under its name), the secret, and the URI (http://localhost:8081) in the
 94 | #     variables below:
 95 | APP_ID = os.environ.get('APP_ID')
 96 | APP_SECRET = os.environ.get('APP_SECRET')
 97 | APP_URI = os.environ.get('APP_URI')
 98 | #  8. Go to https://praw.readthedocs.io/en/latest/tutorials/refresh_token.html#obtaining-refresh-tokens
 99 | #  9. Copy that script and save it to a .py file on your computer
100 | # 10. The instructions at the top of the script tell you to run two "EXPORT"
101 | #     commands before running the script. This only works on Unix. If you are on
102 | #     Windows, or simply don't want to bother with environment variables, ignore
103 | #     that part of the instructions and instead add `client_id='XXXX'` and
104 | #     `client_secret='XXXX'` into the praw.Reddit constructor that you see on
105 | #     line 40 of that script. When I say XXXX I mean the values you just wrote
106 | #     down.
107 | # 11. Run the script on your command line `python obtain_refresh_token.py`
108 | # 12. Write down the refresh token that it gives you:
109 | APP_REFRESH = os.environ.get('APP_REFRESH')
110 | 
111 | ################################################################################
112 | 
113 | import praw
114 | 
115 | def anonymous():
116 |     r = praw.Reddit(
117 |         user_agent=USERAGENT,
118 |         client_id=APP_ID,
119 |         client_secret=APP_SECRET,
120 |     )
121 |     return r
122 | 
123 | def login(r=None):
124 |     new_r = praw.Reddit(
125 |         user_agent=USERAGENT,
126 |         client_id=APP_ID,
127 |         client_secret=APP_SECRET,
128 |         refresh_token=APP_REFRESH,
129 |     )
130 |     if r:
131 |         r.__dict__.clear()
132 |         r.__dict__.update(new_r.__dict__)
133 |     return new_r
134 | 
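
`bot.py` reads its credentials with `os.environ.get`, which returns `None` for unset variables. As a result, a missing or placeholder `APP_REFRESH` only surfaces later as a PRAW authentication error. The sketch below shows a fail-fast check that could run before `login()`. `require_env` and `REQUIRED` are hypothetical names. The check treats the `<REPLACE...>` placeholders from `netsec.env` as missing. `CONTACT_INFO` and `APP_URI` are left optional, since the comments above say `CONTACT_INFO` is not used by every bot.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Settings that login() cannot work without; CONTACT_INFO and APP_URI are optional.
REQUIRED = ("USERAGENT", "APP_ID", "APP_SECRET", "APP_REFRESH")


def require_env(
    names: Iterable[str] = REQUIRED,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the named settings, raising if any is unset, blank, or a placeholder."""
    environ = os.environ if environ is None else environ
    names = list(names)  # allow generators; the names are iterated twice
    missing = [
        name
        for name in names
        if not environ.get(name, "").strip()
        or environ[name].startswith("<REPLACE")  # placeholder from netsec.env
    ]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Reporting every missing name at once saves a round of edit-and-rerun when several values were left blank in `template.env`.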


--------------------------------------------------------------------------------
/scripts/install.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | set -e
 3 | mkdir -p data
 4 | 
 5 | git clone https://github.com/voussoir/timesearch
 6 | cd timesearch
 7 | git checkout 01f2cdb
 8 | cd ..
 9 | 
10 | python3 -m venv venv
11 | source venv/bin/activate
12 | pip install -r timesearch/requirements.txt
13 | pip install -r generate/requirements.txt
14 | pip install -r refresh/requirements.txt
15 | 
16 | mv scripts/bot.py timesearch/
17 | ./scripts/patch.sh
18 | 


--------------------------------------------------------------------------------
/scripts/patch.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | patch /app/timesearch/timesearch_modules/tsdb.py /app/patches/tsdb_naming.patch
4 | 


--------------------------------------------------------------------------------
/scripts/refresh.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | while [[ $# -gt 0 ]] && [[ "$1" == "--"* ]] ;
 4 | do
 5 |     opt="$1";
 6 |     shift;
 7 |     case "$opt" in
 8 |         "--" ) break 2;;
 9 |         "--app-id" )
10 |            APP_ID="$1"; shift;;
11 |         "--app-secret" )
12 |            APP_SECRET="$1"; shift;;
13 |         *) echo >&2 "Invalid option: $opt"; exit 1;;
14 |     esac
15 | done
16 | 
17 | export praw_client_id=$APP_ID
18 | export praw_client_secret=$APP_SECRET
19 | 
20 | source venv/bin/activate
21 | ./refresh/obtain_refresh_token.py
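
`refresh.sh` hand-rolls its option loop and then exports `praw_client_id` and `praw_client_secret` before running the token script. PRAW reads these variables as configuration. If the wrapper were written in Python, `argparse` would provide the same interface, along with usage text and a clear error when an option is missing or unknown. Here is a sketch, with `refresh_env` as a hypothetical name:

```python
import argparse
from typing import Dict, List, Mapping


def refresh_env(argv: List[str], base_env: Mapping[str, str]) -> Dict[str, str]:
    """Parse --app-id/--app-secret and return an environment for the token script."""
    parser = argparse.ArgumentParser(description="Obtain a Reddit refresh token.")
    parser.add_argument("--app-id", required=True, help="Reddit app ID")
    parser.add_argument("--app-secret", required=True, help="Reddit app secret")
    args = parser.parse_args(argv)
    env = dict(base_env)  # copy, so the caller's environment is untouched
    # PRAW picks up praw_-prefixed environment variables as configuration.
    env["praw_client_id"] = args.app_id
    env["praw_client_secret"] = args.app_secret
    return env
```

The result can be passed to `subprocess.run(["./refresh/obtain_refresh_token.py"], env=refresh_env(sys.argv[1:], os.environ), check=True)`. This assumes the wrapper runs inside the activated venv, as `refresh.sh` does, so that the script's `python3` can import `praw`.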


--------------------------------------------------------------------------------
/template.env:
--------------------------------------------------------------------------------
 1 | # Timesearch
 2 | USERAGENT=narwhalizer
 3 | CONTACT_INFO=
 4 | APP_ID=
 5 | APP_SECRET=
 6 | APP_REFRESH=
 7 | 
 8 | # Goggle Metadata
 9 | GOGGLE_NAME=Test
10 | GOGGLE_DESCRIPTION=This is a test Goggle.
11 | GOGGLE_PUBLIC=false
12 | GOGGLE_AUTHOR=User
13 | GOGGLE_AVATAR=#ff80ed
14 | GOGGLE_HOMEPAGE=https://example.com
15 | 
16 | # Goggle
17 | SUBREDDITS=netsec
18 | GOGGLE_FILENAME=output.goggle
19 | GOGGLE_EXTRAS=
20 | 
21 | # Algorithm
22 | SCORE_THRESHOLD=20
23 | MIN_FREQUENCY=1
24 | MIN_EPOCH_TIME=0
25 | TOP_DOMAINS_BEHAVIOR=exclude
26 | TOP_DOMAINS_DOWNRANK_VALUE=2
27 | 


--------------------------------------------------------------------------------