├── README.md
├── backup.py
└── requirements.txt

/README.md:
--------------------------------------------------------------------------------
1 | # [BBS](https://github.com/net4people/bbs/issues)
2 | 
3 | ### Net4People BBS
4 | The BBS is an inclusive and multilingual forum for public discussion about Internet censorship circumvention. It is a place for **developers and researchers** to discuss and share information, techniques, and research. Feel free to write in your own language; we will translate. To start a discussion topic, [open a new issue](https://github.com/net4people/bbs/issues/new).
5 | 
6 | ### Net4People论坛
7 | 本BBS是一个包容的多语种论坛,用于公开讨论规避互联网审查的话题。欢迎各位**开发者和研究人员**讨论和分享有关互联网封锁的信息、技术及研究。欢迎你使用自己的语言,我们会翻译的。要发起一个讨论话题,请[创建一个新的issue](https://github.com/net4people/bbs/issues/new)。
8 | 
9 | ### Net4People BBS
10 | El BBS es un servicio inclusivo y multilingüe para la discusión pública acerca de las formas de elusión de la censura en Internet. Es un espacio para que **desarrolladores e investigadores** conversen y compartan información, técnicas y resultados. Si prefieres, escribe en tu propio idioma y lo trataremos de traducir. Para iniciar un nuevo tema de discusión, por favor [crea una nueva "issue"](https://github.com/net4people/bbs/issues/new).
11 | 
12 | ### Net4People serwis BBS
13 | Ten BBS jest otwartym i wielojęzycznym forum dla publicznej dyskusji na temat obchodzenia cenzury Internetowej. To miejsce, gdzie **programiści i badacze** mogą rozmawiać i dzielić informacje, sposoby, i wyniki badawcze. Jeśli wolisz, proszę pisz po swoim języku, a przetłumaczymy. Aby rozpocząć temat dyskusyjny, proszę [otwórz nowy issue](https://github.com/net4people/bbs/issues/new).
14 | 
15 | ### Net4People BBS
16 | Das BBS ist ein inklusives und vielsprachiges Forum für öffentliche Diskussion um Internetzensur und Zensurumgehung. Es ist ein Ort für **Entwickler und Forscher**, um Informationen, Techniken und Forschung zu teilen. Schreibe gerne in deiner Sprache; wir werden übersetzen. Um eine Diskussion zu starten, [starte ein "issue"](https://github.com/net4people/bbs/issues/new).
17 | 
18 | ### انجمن Net4People
19 | 
20 | BBS یک انجمن فراگیر و چند زبانه برای بحث و گفتگوی عمومی در مورد دور زدن سانسور اینترنت است. این مکانی برای **توسعه دهندگان و محققان** است تا بحث کنند و اطلاعات، فنون و تحقیقات را به اشتراک بگذارند. با خیال راحت به زبان خود بنویسید؛ ما ترجمه خواهیم کرد. برای شروع یک موضوع بحث، [یک مسئله ی جدید ایجاد کنید](https://github.com/net4people/bbs/issues/new).
21 | 
22 | ### Net4People BBS
23 | O BBS é um forum inclusivo e multilíngue para discussão pública sobre como se evadir da censura na Internet. É um lugar para **desenvolvedores e pesquisadores** discutirem e compartilharem informações, técnicas e pesquisas. Sinta-se à vontade para escrever em seu próprio idioma, pois nós traduziremos. Para iniciar um tópico de discussão, [abra um novo problema](https://github.com/net4people/bbs/issues/new).
24 | 
25 | ### Net4People BBS
26 | BBS adalah forum inklusif dan multibahasa untuk diskusi publik tentang pengelakan sensor internet. Forum ini merupakan tempat bagi para **pengembang dan peneliti** untuk berdiskusi dan berbagi informasi, teknik, dan penelitian. Jangan ragu untuk menulis dalam bahasamu sendiri; kami akan menerjemahkannya. Untuk memulai topik diskusi, [buka isu baru](https://github.com/net4people/bbs/issues/new).
27 | 
28 | ### Net4People ဘီဘီအက်စ်
29 | ဘီဘီအက်စ်ဆိုသည်မှာ အင်တာနက်ဆင်ဆာပိတ်ဆို့မှုများအား ကျော်ဖြတ်ခြင်းအတွက် ဆွေးနွေးနိုင်သည့် ဖိုရမ်တစ်ခုဖြစ်ပါသည်။ **သုတေသီတွေနဲ့ ဒီဗလိုပါတွေ** သတင်းအချက်အလက်၊ နည်းစနစ်နဲ့ စာတမ်းတွေ မျှဝေနိုင်
30 | သည့်နေရာတစ်ခုလည်းဖြစ်ပါသည်။သင်နားလည်တဲ့ ဘာသာစကားနဲ့ဝင်ရောက်ဆွေးနွေးနိုင်ပါသည်။ ကျွန်ုပ်တို့မှ ဘာသာပြန်ပေးပါမည်။
31 | အောက်က လင့်ကို နှိပ်ပြီးဆွေးနွေးမှုတစ်ခုစတင်နိုင်ပါသည်။
32 | [open a new issue](https://github.com/net4people/bbs/issues/new)
33 | 
34 | ### منتدى Net4People
35 | هَذَا الْمُنْتَدَى مَسَّاحَةٌ شَامِلَةٌ وَمُتَعَدِّدَةُ اللُّغَاتِ لِلنِّقَاشِ الْعَامِّ حَوْلَ تَجَاوُزِ رَقَابَةِ الإنترنت. يُمْكِنُ **لِلْمُطَوِّرِينَ وَالْبَاحِثِينَ** مُنَاقَشَةُ وَمُشَارَكَةُ الْمَعْلُومَاتِ، وَالتِّقْنِيَّاتِ، وَالْأَبْحَاثِ هُنَا. لَا تَتَرَدَّدْ/ي فِي الْكُتَّابَةِ بَلَغَتِك؛ سَنَقُومُ بِالتَّرْجَمَةِ. لِفَتْحِ نِقَاشِ جَديدٍ، [اِفْتَحْ/ي مُشَكَّلَةَ جَديدَةٍ](https://github.com/net4people/bbs/issues/new).
36 | 
37 | ----
38 | 
39 | [Archives of this forum](https://archive.org/search.php?query=source%3A%22https%3A%2F%2Fgithub.com%2Fnet4people%2Fbbs%22&sort=-date), made using the [backup.py](backup.py) script. To make your own backup, [create a personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) and run:
40 | 
./backup.py -u username:token net4people/bbs net4people_bbs.zip
41 |
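
The script depends on the packages in [requirements.txt](requirements.txt) (for example, `pip install -r requirements.txt`). Inside the archive, backup.py stores each issue, comment, reaction, and label as a JSON file under `issues/` and `labels/`, along with linked images and file attachments. As a rough sanity check of a finished backup, a sketch like the following (standard-library `zipfile` only; the archive name is the one from the example command above) counts the saved entries by top-level directory:

```python
#!/usr/bin/env python3
# Sketch: summarize a backup zip produced by backup.py by counting entries
# under each top-level directory ("issues", "labels", "files", image hosts).
import collections
import zipfile

counts = collections.Counter()
with zipfile.ZipFile("net4people_bbs.zip") as z:  # name from the command above
    for name in z.namelist():
        counts[name.split("/", 1)[0]] += 1

for top, n in counts.most_common():
    print(f"{n:6d}  {top}")
```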
--------------------------------------------------------------------------------
/backup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Usage: ./backup.py -u username:token net4people/bbs bbs-20201231.zip
4 | #
5 | # Downloads GitHub issues, comments, and labels using the GitHub REST API
6 | # (https://docs.github.com/en/free-pro-team@latest/rest). Saves output to a zip
7 | # file.
8 | #
9 | # The -u option controls authentication. You don't have to use it, but if you
10 | # don't, you will be limited to 60 API requests per hour. When you are
11 | # authenticated, you get 5000 API requests per hour. The "token" part is a
12 | # Personal Access Token, created at https://github.com/settings/tokens.
13 | # https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token
14 | # You don't have to enable any scopes for the token.
15 |
16 | import datetime
17 | import getopt
18 | import itertools
19 | import json
20 | import os
21 | import os.path
22 | import sys
23 | import tempfile
24 | import time
25 | import urllib.parse
26 | import zipfile
27 |
28 | import mistune
29 | import requests
30 |
31 | BASE_URL = "https://api.github.com/"
32 |
33 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/media-types
34 | MEDIATYPE = "application/vnd.github.v3+json"
35 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-repository-issues-preview-notices
36 | MEDIATYPE_REACTIONS = "application/vnd.github.squirrel-girl-preview+json"
37 |
38 | UNSET_ZIPINFO_DATE_TIME = zipfile.ZipInfo("").date_time
39 |
40 | def url_origin(url):
41 | components = urllib.parse.urlparse(url)
42 | return (components.scheme, components.netloc)
43 |
44 | def check_url_origin(base, url):
45 | assert url_origin(base) == url_origin(url), (base, url)
46 |
47 | def datetime_to_zip_time(d):
48 | return (d.year, d.month, d.day, d.hour, d.minute, d.second)
49 |
50 | def timestamp_to_zip_time(timestamp):
51 | return datetime_to_zip_time(datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ"))
52 |
53 | def http_date_to_zip_time(timestamp):
54 | # https://tools.ietf.org/html/rfc7231#section-7.1.1.1
55 | # We only support the IMF-fixdate format.
56 | return datetime_to_zip_time(datetime.datetime.strptime(timestamp, "%a, %d %b %Y %H:%M:%S GMT"))
57 |
58 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#rate-limiting
59 | # Returns a datetime at which the rate limit will be reset, or None if not
60 | # currently rate limited.
61 | def rate_limit_reset(r):
62 | # A rate-limited response is one that has status code 403, an
63 | # x-ratelimit-remaining header with a value of 0, and an x-ratelimit-reset
64 | # header.
65 | if r.status_code != 403:
66 | return None
67 |
68 | remaining = r.headers.get("x-ratelimit-remaining")
69 | if remaining is None:
70 | return None
71 | try:
72 | if int(remaining) > 0:
73 | return None
74 | except ValueError:
75 | return None
76 |
77 | # If x-ratelimit-remaining is set, assume x-ratelimit-reset is set.
78 |     reset = r.headers["x-ratelimit-reset"]
79 |     return datetime.datetime.utcfromtimestamp(int(reset))
80 |
81 | def response_datetime(r):
82 | dt = r.headers.get("date")
83 | return datetime.datetime.strptime(dt, "%a, %d %b %Y %X %Z")
84 |
85 | def get(sess, url, mediatype, params={}):
86 | # TODO: warn on 301 redirect? https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#http-redirects
87 |
88 | while True:
89 | print(url, end="", flush=True)
90 | try:
91 | headers = {}
92 | if mediatype is not None:
93 | headers["Accept"] = mediatype
94 | r = sess.get(url, params=params, headers=headers)
95 | except Exception as e:
96 | print(f" => {str(type(e))}", flush=True)
97 | raise
98 |
99 | print(f" => {r.status_code} {r.reason} {r.headers.get('x-ratelimit-used', '-')}/{r.headers.get('x-ratelimit-limit', '-')}", flush=True)
100 | reset = rate_limit_reset(r)
101 | if reset is not None:
102 | reset_seconds = (reset - response_datetime(r)).total_seconds()
103 | print(f"waiting {reset_seconds:.0f} s for rate limit, will resume at {reset.strftime('%Y-%m-%d %H:%M:%S')}", flush=True)
104 | time.sleep(reset_seconds)
105 | else:
106 | r.raise_for_status()
107 | return r
108 |
109 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#pagination
110 | # https://docs.github.com/en/free-pro-team@latest/guides/traversing-with-pagination
111 | def get_paginated(sess, url, mediatype, params={}):
112 | params = params.copy()
113 | try:
114 | del params["page"]
115 | except KeyError:
116 | pass
117 | params["per_page"] = "100"
118 |
119 | while True:
120 | r = get(sess, url, mediatype, params)
121 | yield r
122 |
123 | next_link = r.links.get("next")
124 | if next_link is None:
125 | break
126 | next_url = next_link["url"]
127 | # The API documentation instructs us to follow the "next" link without
128 | # interpretation, but at least ensure it refers to the same scheme and
129 | # host.
130 | check_url_origin(url, next_url)
131 |
132 | url = next_url
133 |
134 | # If zi.date_time is UNSET_ZIPINFO_DATE_TIME, then it will be replaced with the
135 | # value of the HTTP response's Last-Modified header, if present.
136 | def get_to_zipinfo(sess, url, z, zi, mediatype, params={}):
137 | r = get(sess, url, mediatype, params)
138 |
139 | if zi.date_time == UNSET_ZIPINFO_DATE_TIME:
140 | last_modified = r.headers.get("Last-Modified")
141 | if last_modified is not None:
142 | zi.date_time = http_date_to_zip_time(last_modified)
143 |
144 | with z.open(zi, mode="w") as f:
145 | for chunk in r.iter_content(4096):
146 | f.write(chunk)
147 |
148 | # Converts a list of path components into a string path, raising an exception if
149 | # any component contains a slash, is "." or "..", or is empty; or if the whole
150 | # path is empty. The checks are to prevent any file writes outside the
151 | # destination directory when the zip file is extracted. We rely on the
152 | # assumption that no other files in the zip file are symbolic links, which is
153 | # true because this program does not create symbolic links.
154 | def make_zip_file_path(*components):
155 | for component in components:
156 | if "/" in component:
157 | raise ValueError("path component contains a slash")
158 | if component == "":
159 | raise ValueError("path component is empty")
160 | if component == ".":
161 | raise ValueError("path component is a self directory reference")
162 | if component == "..":
163 | raise ValueError("path component is a parent directory reference")
164 | if not components:
165 | raise ValueError("path is empty")
166 | return "/".join(components)
167 |
168 | # Fallback to mistune 1.0 renderer if mistune 2.0 is not installed
169 | try:
170 | mistuneRenderer = mistune.HTMLRenderer
171 | except AttributeError:
172 | mistuneRenderer = mistune.Renderer
173 | # Custom mistune Renderer that stores a list of all links encountered. Only the first positional argument (the URL) is recorded, and the rendered HTML is discarded, so the signature differences between mistune 1.x and 2.x do not matter here.
174 | class LinkExtractionRenderer(mistuneRenderer):
175 | def __init__(self, **kwargs):
176 | super().__init__(**kwargs)
177 | self.links = []
178 |
179 | def autolink(self, link, is_email=False):
180 | self.links.append(link)
181 | return super().autolink(link, is_email)
182 |
183 | def image(self, src, title, alt_text):
184 | self.links.append(src)
185 | return super().image(src, title, alt_text)
186 |
187 | def link(self, link, title, content=None):
188 | self.links.append(link)
189 | return super().link(link, title, content)
190 |
191 | def markdown_extract_links(markdown):
192 | renderer = LinkExtractionRenderer()
193 | mistune.Markdown(renderer=renderer)(markdown) # Discard HTML output.
194 | return renderer.links
195 |
196 | # Return seq with prefix stripped if it has such a prefix, or else None.
197 | def strip_prefix(seq, prefix):
198 | if len(seq) < len(prefix):
199 | return None
200 | for a, b in zip(seq, prefix):
201 | if a != b:
202 | return None
203 | return seq[len(prefix):]
204 |
205 | def split_url_path(path):
206 | return tuple(urllib.parse.unquote(component) for component in path.split("/"))
207 |
208 | def strip_url_path_prefix(path, prefix):
209 | return strip_prefix(split_url_path(path), split_url_path(prefix))
210 |
211 | # If url is one we want to download, return a list of path components for the
212 | # path we want to store it at. Relies on the module-level owner and repo set in the __main__ block.
213 | def link_is_wanted(url):
214 | try:
215 | components = urllib.parse.urlparse(url)
216 | except ValueError:
217 | return None
218 |
219 | if components.scheme == "https" and components.netloc == "user-images.githubusercontent.com":
220 | subpath = strip_url_path_prefix(components.path, "")
221 | if subpath is not None:
222 | # Inline image.
223 | return ("user-images.githubusercontent.com", *subpath)
224 | if components.scheme == "https" and components.netloc == "github.com":
225 | for prefix in (f"/{owner}/{repo}/files", "/user-attachments/files"):
226 | subpath = strip_url_path_prefix(components.path, prefix)
227 | if subpath is not None:
228 | # File attachment.
229 | return ("files", *subpath)
230 | if components.scheme == "https" and components.netloc == "avatars.githubusercontent.com":
231 | path = components.path
232 | if components.query:
233 | # Avatar URLs often differ in the presence or absence of query
234 | # parameters. Save the query string with the path, just in case they
235 | # differ.
236 | path += "?" + components.query
237 | subpath = strip_url_path_prefix(path, "")
238 | if subpath is not None:
239 | # Avatar image.
240 | return ("avatars.githubusercontent.com", *subpath)
241 |
242 | def backup(owner, repo, z, username, token):
243 | paths_seen = set()
244 | # Calls make_zip_file_path, and additionally raises an exception if the path
245 | # has already been used.
246 | def check_path(*components):
247 | path = make_zip_file_path(*components)
248 | if path in paths_seen:
249 | raise ValueError(f"duplicate filename {path!a}")
250 | paths_seen.add(path)
251 | return path
252 |
253 | # Escape owner and repo suitably for use in a URL.
254 | owner = urllib.parse.quote(owner, safe="")
255 | repo = urllib.parse.quote(repo, safe="")
256 |
257 | now = datetime.datetime.utcnow()
258 | z.writestr(check_path("README"), f"""\
259 | Archive of the GitHub repository https://github.com/{owner}/{repo}/
260 | made {now.strftime("%Y-%m-%d %H:%M:%S")}.
261 | """)
262 |
263 | file_urls = set()
264 |
265 |     # HTTP Basic authentication for the API, if credentials were provided.
266 |     sess = requests.Session()
267 |     sess.auth = requests.auth.HTTPBasicAuth(username, token) if username is not None else None
268 |
269 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-repository-issues
270 | issues_url = urllib.parse.urlparse(BASE_URL)._replace(
271 | path=f"/repos/{owner}/{repo}/issues",
272 | ).geturl()
273 | for r in get_paginated(sess, issues_url, MEDIATYPE_REACTIONS, {"sort": "created", "direction": "asc"}):
274 | for issue in r.json():
275 | check_url_origin(BASE_URL, issue["url"])
276 | zi = zipfile.ZipInfo(check_path("issues", str(issue["id"]) + ".json"), timestamp_to_zip_time(issue["created_at"]))
277 | get_to_zipinfo(sess, issue["url"], z, zi, MEDIATYPE_REACTIONS)
278 |
279 | # Re-open the JSON file we just wrote, to parse it for links.
280 | with z.open(zi) as f:
281 | data = json.load(f)
282 | for link in itertools.chain(markdown_extract_links(data["body"] or ""), [data["user"]["avatar_url"]]):
283 | link = urllib.parse.urlunparse(urllib.parse.urlparse(link)._replace(fragment = None)) # Discard fragment.
284 | dest = link_is_wanted(link)
285 | if dest is not None:
286 | file_urls.add((dest, link))
287 |
288 | # There's no API for getting all reactions in a repository, so get
289 | # them per issue and per comment.
290 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/reactions#list-reactions-for-an-issue
291 | reactions_url = issue["reactions"]["url"]
292 | check_url_origin(BASE_URL, reactions_url)
293 | for r2 in get_paginated(sess, reactions_url, MEDIATYPE_REACTIONS):
294 | for reaction in r2.json():
295 | zi = zipfile.ZipInfo(check_path("issues", str(issue["id"]), "reactions", str(reaction["id"]) + ".json"), timestamp_to_zip_time(reaction["created_at"]))
296 | with z.open(zi, mode="w") as f:
297 | f.write(json.dumps(reaction).encode("utf-8"))
298 |
299 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-issue-comments-for-a-repository
300 | # Comments are linked to their parent issue via the issue_url field.
301 | comments_url = urllib.parse.urlparse(BASE_URL)._replace(
302 | path=f"/repos/{owner}/{repo}/issues/comments",
303 | ).geturl()
304 | for r in get_paginated(sess, comments_url, MEDIATYPE_REACTIONS):
305 | for comment in r.json():
306 | check_url_origin(BASE_URL, comment["url"])
307 | zi = zipfile.ZipInfo(check_path("issues", "comments", str(comment["id"]) + ".json"), timestamp_to_zip_time(comment["created_at"]))
308 | get_to_zipinfo(sess, comment["url"], z, zi, MEDIATYPE_REACTIONS)
309 |
310 | # Re-open the JSON file we just wrote, to parse it for links.
311 | with z.open(zi) as f:
312 | data = json.load(f)
313 | for link in itertools.chain(markdown_extract_links(data["body"] or ""), [data["user"]["avatar_url"]]):
314 | link = urllib.parse.urlunparse(urllib.parse.urlparse(link)._replace(fragment = None)) # Discard fragment.
315 | dest = link_is_wanted(link)
316 | if dest is not None:
317 | file_urls.add((dest, link))
318 |
319 | # There's no API for getting all reactions in a repository, so get
320 | # them per issue and per comment.
321 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/reactions#list-reactions-for-an-issue-comment
322 | reactions_url = comment["reactions"]["url"]
323 | check_url_origin(BASE_URL, reactions_url)
324 | for r2 in get_paginated(sess, reactions_url, MEDIATYPE_REACTIONS):
325 | for reaction in r2.json():
326 | zi = zipfile.ZipInfo(check_path("issues", "comments", str(comment["id"]), "reactions", str(reaction["id"]) + ".json"), timestamp_to_zip_time(reaction["created_at"]))
327 | with z.open(zi, mode="w") as f:
328 | f.write(json.dumps(reaction).encode("utf-8"))
329 |
330 | # TODO: comment edit history (if possible)
331 |
332 | labels_url = urllib.parse.urlparse(BASE_URL)._replace(
333 | path=f"/repos/{owner}/{repo}/labels",
334 | ).geturl()
335 | for r in get_paginated(sess, labels_url, MEDIATYPE):
336 | for label in r.json():
337 | check_url_origin(BASE_URL, label["url"])
338 | zi = zipfile.ZipInfo(check_path("labels", str(label["id"]) + ".json"))
339 | get_to_zipinfo(sess, label["url"], z, zi, MEDIATYPE)
340 |
341 | # A new session, without Basic auth, for downloading plain files.
342 | sess = requests.Session()
343 |
344 | for dest, url in sorted(file_urls):
345 | zi = zipfile.ZipInfo(check_path(*dest))
346 | get_to_zipinfo(sess, url, z, zi, None)
347 |
348 | if __name__ == "__main__":
349 |     # Authentication is optional; default to an unauthenticated session.
350 |     username, token = None, None
351 |     opts, (repo, zip_filename) = getopt.gnu_getopt(sys.argv[1:], "u:")
352 |     for o, a in opts:
353 |         if o == "-u":
354 |             username, token = a.split(":", 1)
355 |
356 | owner, repo = repo.split("/", 1)
357 |
358 | # Write to a temporary file, then rename to the requested name when
359 | # finished.
360 | with tempfile.NamedTemporaryFile(dir=os.path.dirname(zip_filename), suffix=".zip", delete=False) as f:
361 | try:
362 | with zipfile.ZipFile(f, mode="w") as z:
363 | backup(owner, repo, z, username, token)
364 | os.rename(f.name, zip_filename)
365 | except:
366 | # Delete output zip file on error.
367 | os.remove(f.name)
368 | raise
369 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | mistune >=2.0
2 | requests >=2.25
3 |
--------------------------------------------------------------------------------