├── README.md ├── backup.py └── requirements.txt /README.md: -------------------------------------------------------------------------------- 1 | # [BBS](https://github.com/net4people/bbs/issues) 2 | 3 | ### Net4People BBS 4 | The BBS is an inclusive and multilingual forum for public discussion about Internet censorship circumvention. It is a place for **developers and researchers** to discuss and share information, techniques, and research. Feel free to write in your own language; we will translate. To start a discussion topic, [open a new issue](https://github.com/net4people/bbs/issues/new). 5 | 6 | ### Net4People论坛 7 | 本BBS是一个包容的多语种论坛,用于公开讨论规避互联网审查的话题。欢迎各位**开发者和研究人员**讨论和分享有关互联网封锁的信息、技术及研究。欢迎你使用自己的语言,我们会翻译的。要发起一个讨论话题,请[创建一个新的issue](https://github.com/net4people/bbs/issues/new)。 8 | 9 | ### Net4People BBS 10 | El BBS es un servicio inclusivo y multilingüe para la discusión pública acerca de las formas de elusión de la censura en Internet. Es un espacio para que **desarrolladores e investigadores** conversen y compartan información, técnicas y resultados. Si prefieres, escribe en tu propio idioma y lo trataremos de traducir. Para iniciar un nuevo tema de discusión, por favor [crea una nueva "issue"](https://github.com/net4people/bbs/issues/new). 11 | 12 | ### Net4People serwis BBS 13 | Ten BBS jest otwartym i wielojęzycznym forum dla publicznej dyskusji na temat obchodzenia cenzury Internetowej. To miejsce, gdzie **programiści i badacze** mogą rozmawiać i dzielić informacje, sposoby, i wyniki badawcze. Jeśli wolisz, proszę pisz po swoim języku, a przetłumaczymy. Aby rozpocząć temat dyskusyjny, proszę [otwórz nowy issue](https://github.com/net4people/bbs/issues/new). 14 | 15 | ### Net4People BBS 16 | Das BBS ist ein inklusives und vielsprachiges Forum für öffentliche Diskussion um Internetzensur und Zensurumgehung. Es ist ein Ort für **Entwickler und Forscher**, um Informationen, Techniken und Forschung zu teilen. Schreibe gerne in deiner Sprache; wir werden übersetzen. Um eine Diskussion zu starten, [starte ein "issue"](https://github.com/net4people/bbs/issues/new). 17 | 18 | ### ‏انجمن Net4People‌ 19 | 20 | ‏BBS یک انجمن فراگیر و چند زبانه برای بحث و گفتگوی عمومی در مورد دور زدن سانسور اینترنت است. این مکانی برای **توسعه دهندگان و محققان** است تا بحث کنند و اطلاعات، فنون و تحقیقات را به اشتراک بگذارند. با خیال راحت به زبان خود بنویسید؛ ما ترجمه خواهیم کرد. برای شروع یک موضوع بحث، [یک مسئله ی جدید ایجاد کنید](https://github.com/net4people/bbs/issues/new).‌ 21 | 22 | ### Net4People BBS 23 | O BBS é um forum inclusivo e multilíngue para discussão pública sobre como se evadir da censura na Internet. É um lugar para **desenvolvedores e pesquisadores** discutirem e compartilharem informações, técnicas e pesquisas. Sinta-se à vontade para escrever em seu próprio idioma, pois nós traduziremos. Para iniciar um tópico de discussão, [abra um novo problema](https://github.com/net4people/bbs/issues/new). 24 | 25 | ### Net4People BBS 26 | BBS adalah forum inklusif dan multibahasa untuk diskusi publik tentang pengelakan sensor internet. Forum ini merupakan tempat bagi para **pengembang dan peneliti** untuk berdiskusi dan berbagi informasi, teknik, dan penelitian. Jangan ragu untuk menulis dalam bahasamu sendiri; kami akan menerjemahkannya. Untuk memulai topik diskusi, [buka isu baru](https://github.com/net4people/bbs/issues/new). 
27 | 28 | ### Net4People ဘီဘီအက်စ် 29 | ဘီဘီအက်စ်ဆိုသည်မှာ အင်တာနက်ဆင်ဆာပိတ်ဆို့မှုများအား ကျော်ဖြတ်ခြင်းအတွက် ဆွေးနွေးနိုင်သည့် ဖိုရမ်တစ်ခုဖြစ်ပါသည်။ **သုတေသီတွေနဲ့ ဒီဗလိုပါတွေ** သတင်းအချက်အလက်၊ နည်းစနစ်နဲ့ စာတမ်းတွေ မျှဝေနိုင် 30 | သည့်နေရာတစ်ခုလည်းဖြစ်ပါသည်။သင်နားလည်တဲ့ ဘာသာစကားနဲ့ဝင်ရောက်ဆွေးနွေးနိုင်ပါသည်။ ကျွန်ုပ်တို့မှ ဘာသာပြန်ပေးပါမည်။ 31 | အောက်က လင့်ကို နှိပ်ပြီးဆွေးနွေးမှုတစ်ခုစတင်နိုင်ပါသည်။ 32 | [open a new issue](https://github.com/net4people/bbs/issues/new) 33 | 34 | ### ‏منتدى Net4People‌ 35 | ‏هَذَا الْمُنْتَدَى مَسَّاحَةٌ شَامِلَةٌ وَمُتَعَدِّدَةُ اللُّغَاتِ لِلنِّقَاشِ الْعَامِّ حَوْلَ تَجَاوُزِ رَقَابَةِ الإنترنت. يُمْكِنُ **لِلْمُطَوِّرِينَ وَالْبَاحِثِينَ** مُنَاقَشَةُ وَمُشَارَكَةُ الْمَعْلُومَاتِ، وَالتِّقْنِيَّاتِ، وَالْأَبْحَاثِ هُنَا. لَا تَتَرَدَّدْ/ي فِي الْكُتَّابَةِ بَلَغَتِك؛ سَنَقُومُ بِالتَّرْجَمَةِ. لِفَتْحِ نِقَاشِ جَديدٍ، [اِفْتَحْ/ي مُشَكَّلَةَ جَديدَةٍ](https://github.com/net4people/bbs/issues/new).‌ 36 | 37 | ---- 38 | 39 | [Archives of this forum](https://archive.org/search.php?query=source%3A%22https%3A%2F%2Fgithub.com%2Fnet4people%2Fbbs%22&sort=-date), made using the [backup.py](backup.py) script. To make your own backup, [create a personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token) and run: 40 |
./backup.py -u username:token net4people/bbs net4people_bbs.zip
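backup.py (shown below) notes that unauthenticated requests are limited to 60 API requests per hour, while a personal access token raises the limit to 5,000. After installing the two dependencies listed in requirements.txt (mistune and requests), it can be worth confirming that a token is accepted before starting a long backup. The following is a minimal sketch, not part of this repository: it queries GitHub's standard `/rate_limit` endpoint with the same `username:token` pair that would be passed to `-u`; the placeholder names are illustrative.

```python
# Minimal sketch (not part of backup.py): verify a personal access token and
# report the remaining core API quota before running a full backup.
import requests

username, token = "username", "token"  # the same pair passed to -u above

r = requests.get("https://api.github.com/rate_limit", auth=(username, token))
r.raise_for_status()
core = r.json()["resources"]["core"]
print(f"{core['remaining']} of {core['limit']} core API requests remaining")
```

An authenticated call should report a limit of 5000; dropping the `auth=` argument shows the unauthenticated limit of 60 instead.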
41 | -------------------------------------------------------------------------------- /backup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Usage: ./backup.py -u username:token net4people/bbs bbs-20201231.zip 4 | # 5 | # Downloads GitHub issues, comments, and labels using the GitHub REST API 6 | # (https://docs.github.com/en/free-pro-team@latest/rest). Saves output to a zip 7 | # file. 8 | # 9 | # The -u option controls authentication. You don't have to use it, but if you 10 | # don't, you will be limited to 60 API requests per hour. When you are 11 | # authenticated, you get 5000 API requests per hour. The "token" part is a 12 | # Personal Access Token, created at https://github.com/settings/tokens. 13 | # https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token 14 | # You don't have to enable any scopes for the token. 15 | 16 | import datetime 17 | import getopt 18 | import itertools 19 | import json 20 | import os 21 | import os.path 22 | import sys 23 | import tempfile 24 | import time 25 | import urllib.parse 26 | import zipfile 27 | 28 | import mistune 29 | import requests 30 | 31 | BASE_URL = "https://api.github.com/" 32 | 33 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/media-types 34 | MEDIATYPE = "application/vnd.github.v3+json" 35 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-repository-issues-preview-notices 36 | MEDIATYPE_REACTIONS = "application/vnd.github.squirrel-girl-preview+json" 37 | 38 | UNSET_ZIPINFO_DATE_TIME = zipfile.ZipInfo("").date_time 39 | 40 | def url_origin(url): 41 | components = urllib.parse.urlparse(url) 42 | return (components.scheme, components.netloc) 43 | 44 | def check_url_origin(base, url): 45 | assert url_origin(base) == url_origin(url), (base, url) 46 | 47 | def datetime_to_zip_time(d): 48 | return (d.year, d.month, d.day, d.hour, d.minute, d.second) 49 | 50 | def timestamp_to_zip_time(timestamp): 51 | return datetime_to_zip_time(datetime.datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")) 52 | 53 | def http_date_to_zip_time(timestamp): 54 | # https://tools.ietf.org/html/rfc7231#section-7.1.1.1 55 | # We only support the IMF-fixdate format. 56 | return datetime_to_zip_time(datetime.datetime.strptime(timestamp, "%a, %d %b %Y %H:%M:%S GMT")) 57 | 58 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#rate-limiting 59 | # Returns a datetime at which the rate limit will be reset, or None if not 60 | # currently rate limited. 61 | def rate_limit_reset(r): 62 | # A rate-limited response is one that has status code 403, an 63 | # x-ratelimit-remaining header with a value of 0, and an x-ratelimit-reset 64 | # header. 65 | if r.status_code != 403: 66 | return None 67 | 68 | remaining = r.headers.get("x-ratelimit-remaining") 69 | if remaining is None: 70 | return None 71 | try: 72 | if int(remaining) > 0: 73 | return None 74 | except ValueError: 75 | return None 76 | 77 | # If x-ratelimit-remaining is set, assume x-ratelimit-reset is set. 78 | reset = r.headers["x-ratelimit-reset"] 79 | return datetime.datetime.utcfromtimestamp(int(r.headers["x-ratelimit-reset"])) 80 | 81 | def response_datetime(r): 82 | dt = r.headers.get("date") 83 | return datetime.datetime.strptime(dt, "%a, %d %b %Y %X %Z") 84 | 85 | def get(sess, url, mediatype, params={}): 86 | # TODO: warn on 301 redirect? 
https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#http-redirects 87 | 88 | while True: 89 | print(url, end="", flush=True) 90 | try: 91 | headers = {} 92 | if mediatype is not None: 93 | headers["Accept"] = mediatype 94 | r = sess.get(url, params=params, headers=headers) 95 | except Exception as e: 96 | print(f" => {str(type(e))}", flush=True) 97 | raise 98 | 99 | print(f" => {r.status_code} {r.reason} {r.headers.get('x-ratelimit-used', '-')}/{r.headers.get('x-ratelimit-limit', '-')}", flush=True) 100 | reset = rate_limit_reset(r) 101 | if reset is not None: 102 | reset_seconds = (reset - response_datetime(r)).total_seconds() 103 | print(f"waiting {reset_seconds:.0f} s for rate limit, will resume at {reset.strftime('%Y-%m-%d %H:%M:%S')}", flush=True) 104 | time.sleep(reset_seconds) 105 | else: 106 | r.raise_for_status() 107 | return r 108 | 109 | # https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#pagination 110 | # https://docs.github.com/en/free-pro-team@latest/guides/traversing-with-pagination 111 | def get_paginated(sess, url, mediatype, params={}): 112 | params = params.copy() 113 | try: 114 | del params["page"] 115 | except KeyError: 116 | pass 117 | params["per_page"] = "100" 118 | 119 | while True: 120 | r = get(sess, url, mediatype, params) 121 | yield r 122 | 123 | next_link = r.links.get("next") 124 | if next_link is None: 125 | break 126 | next_url = next_link["url"] 127 | # The API documentation instructs us to follow the "next" link without 128 | # interpretation, but at least ensure it refers to the same scheme and 129 | # host. 130 | check_url_origin(url, next_url) 131 | 132 | url = next_url 133 | 134 | # If zi.date_time is UNSET_ZIPINFO_DATE_TIME, then it will be replaced with the 135 | # value of the HTTP response's Last-Modified header, if present. 136 | def get_to_zipinfo(sess, url, z, zi, mediatype, params={}): 137 | r = get(sess, url, mediatype, params) 138 | 139 | if zi.date_time == UNSET_ZIPINFO_DATE_TIME: 140 | last_modified = r.headers.get("Last-Modified") 141 | if last_modified is not None: 142 | zi.date_time = http_date_to_zip_time(last_modified) 143 | 144 | with z.open(zi, mode="w") as f: 145 | for chunk in r.iter_content(4096): 146 | f.write(chunk) 147 | 148 | # Converts a list of path components into a string path, raising an exception if 149 | # any component contains a slash, is "." or "..", or is empty; or if the whole 150 | # path is empty. The checks are to prevent any file writes outside the 151 | # destination directory when the zip file is extracted. We rely on the 152 | # assumption that no other files in the zip file are symbolic links, which is 153 | # true because this program does not create symbolic links. 
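# For illustration (these calls are not made by the script itself; they just
# show the behavior described above):
#   make_zip_file_path("issues", "123.json")      -> "issues/123.json"
#   make_zip_file_path("issues", "..", "x.json")  -> raises ValueError
#   make_zip_file_path("a/b")                     -> raises ValueError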
154 | def make_zip_file_path(*components): 155 | for component in components: 156 | if "/" in component: 157 | raise ValueError("path component contains a slash") 158 | if component == "": 159 | raise ValueError("path component is empty") 160 | if component == ".": 161 | raise ValueError("path component is a self directory reference") 162 | if component == "..": 163 | raise ValueError("path component is a parent directory reference") 164 | if not components: 165 | raise ValueError("path is empty") 166 | return "/".join(components) 167 | 168 | # Fallback to mistune 1.0 renderer if mistune 2.0 is not installed 169 | try: 170 | mistuneRenderer = mistune.HTMLRenderer 171 | except AttributeError: 172 | mistuneRenderer = mistune.Renderer 173 | # Custom mistune.Renderer that stores a list of all links encountered. 174 | class LinkExtractionRenderer(mistuneRenderer): 175 | def __init__(self, **kwargs): 176 | super().__init__(**kwargs) 177 | self.links = [] 178 | 179 | def autolink(self, link, is_email=False): 180 | self.links.append(link) 181 | return super().autolink(link, is_email) 182 | 183 | def image(self, src, title, alt_text): 184 | self.links.append(src) 185 | return super().image(src, title, alt_text) 186 | 187 | def link(self, link, title, content=None): 188 | self.links.append(link) 189 | return super().link(link, title, content) 190 | 191 | def markdown_extract_links(markdown): 192 | renderer = LinkExtractionRenderer() 193 | mistune.Markdown(renderer=renderer)(markdown) # Discard HTML output. 194 | return renderer.links 195 | 196 | # Return seq with prefix stripped if it has such a prefix, or else None. 197 | def strip_prefix(seq, prefix): 198 | if len(seq) < len(prefix): 199 | return None 200 | for a, b in zip(seq, prefix): 201 | if a != b: 202 | return None 203 | return seq[len(prefix):] 204 | 205 | def split_url_path(path): 206 | return tuple(urllib.parse.unquote(component) for component in path.split("/")) 207 | 208 | def strip_url_path_prefix(path, prefix): 209 | return strip_prefix(split_url_path(path), split_url_path(prefix)) 210 | 211 | # If url is one we want to download, return a list of path components for the 212 | # path we want to store it at. 213 | def link_is_wanted(url): 214 | try: 215 | components = urllib.parse.urlparse(url) 216 | except ValueError: 217 | return None 218 | 219 | if components.scheme == "https" and components.netloc == "user-images.githubusercontent.com": 220 | subpath = strip_url_path_prefix(components.path, "") 221 | if subpath is not None: 222 | # Inline image. 223 | return ("user-images.githubusercontent.com", *subpath) 224 | if components.scheme == "https" and components.netloc == "github.com": 225 | for prefix in (f"/{owner}/{repo}/files", "/user-attachments/files"): 226 | subpath = strip_url_path_prefix(components.path, prefix) 227 | if subpath is not None: 228 | # File attachment. 229 | return ("files", *subpath) 230 | if components.scheme == "https" and components.netloc == "avatars.githubusercontent.com": 231 | path = components.path 232 | if components.query: 233 | # Avatar URLs often differ in the presence or absence of query 234 | # parameters. Save the query string with the path, just in case they 235 | # differ. 236 | path += "?" + components.query 237 | subpath = strip_url_path_prefix(path, "") 238 | if subpath is not None: 239 | # Avatar image. 
240 | return ("avatars.githubusercontent.com", *subpath) 241 | 242 | def backup(owner, repo, z, username, token): 243 | paths_seen = set() 244 | # Calls make_zip_file_path, and additionally raises an exception if the path 245 | # has already been used. 246 | def check_path(*components): 247 | path = make_zip_file_path(*components) 248 | if path in paths_seen: 249 | raise ValueError(f"duplicate filename {path!a}") 250 | paths_seen.add(path) 251 | return path 252 | 253 | # Escape owner and repo suitably for use in a URL. 254 | owner = urllib.parse.quote(owner, safe="") 255 | repo = urllib.parse.quote(repo, safe="") 256 | 257 | now = datetime.datetime.utcnow() 258 | z.writestr(check_path("README"), f"""\ 259 | Archive of the GitHub repository https://github.com/{owner}/{repo}/ 260 | made {now.strftime("%Y-%m-%d %H:%M:%S")}. 261 | """) 262 | 263 | file_urls = set() 264 | 265 | # HTTP Basic authentication for API. 266 | sess = requests.Session() 267 | sess.auth = requests.auth.HTTPBasicAuth(username, token) 268 | 269 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-repository-issues 270 | issues_url = urllib.parse.urlparse(BASE_URL)._replace( 271 | path=f"/repos/{owner}/{repo}/issues", 272 | ).geturl() 273 | for r in get_paginated(sess, issues_url, MEDIATYPE_REACTIONS, {"sort": "created", "direction": "asc"}): 274 | for issue in r.json(): 275 | check_url_origin(BASE_URL, issue["url"]) 276 | zi = zipfile.ZipInfo(check_path("issues", str(issue["id"]) + ".json"), timestamp_to_zip_time(issue["created_at"])) 277 | get_to_zipinfo(sess, issue["url"], z, zi, MEDIATYPE_REACTIONS) 278 | 279 | # Re-open the JSON file we just wrote, to parse it for links. 280 | with z.open(zi) as f: 281 | data = json.load(f) 282 | for link in itertools.chain(markdown_extract_links(data["body"] or ""), [data["user"]["avatar_url"]]): 283 | link = urllib.parse.urlunparse(urllib.parse.urlparse(link)._replace(fragment = None)) # Discard fragment. 284 | dest = link_is_wanted(link) 285 | if dest is not None: 286 | file_urls.add((dest, link)) 287 | 288 | # There's no API for getting all reactions in a repository, so get 289 | # them per issue and per comment. 290 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/reactions#list-reactions-for-an-issue 291 | reactions_url = issue["reactions"]["url"] 292 | check_url_origin(BASE_URL, reactions_url) 293 | for r2 in get_paginated(sess, reactions_url, MEDIATYPE_REACTIONS): 294 | for reaction in r2.json(): 295 | zi = zipfile.ZipInfo(check_path("issues", str(issue["id"]), "reactions", str(reaction["id"]) + ".json"), timestamp_to_zip_time(reaction["created_at"])) 296 | with z.open(zi, mode="w") as f: 297 | f.write(json.dumps(reaction).encode("utf-8")) 298 | 299 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/issues#list-issue-comments-for-a-repository 300 | # Comments are linked to their parent issue via the issue_url field. 301 | comments_url = urllib.parse.urlparse(BASE_URL)._replace( 302 | path=f"/repos/{owner}/{repo}/issues/comments", 303 | ).geturl() 304 | for r in get_paginated(sess, comments_url, MEDIATYPE_REACTIONS): 305 | for comment in r.json(): 306 | check_url_origin(BASE_URL, comment["url"]) 307 | zi = zipfile.ZipInfo(check_path("issues", "comments", str(comment["id"]) + ".json"), timestamp_to_zip_time(comment["created_at"])) 308 | get_to_zipinfo(sess, comment["url"], z, zi, MEDIATYPE_REACTIONS) 309 | 310 | # Re-open the JSON file we just wrote, to parse it for links. 
311 | with z.open(zi) as f: 312 | data = json.load(f) 313 | for link in itertools.chain(markdown_extract_links(data["body"] or ""), [data["user"]["avatar_url"]]): 314 | link = urllib.parse.urlunparse(urllib.parse.urlparse(link)._replace(fragment = None)) # Discard fragment. 315 | dest = link_is_wanted(link) 316 | if dest is not None: 317 | file_urls.add((dest, link)) 318 | 319 | # There's no API for getting all reactions in a repository, so get 320 | # them per issue and per comment. 321 | # https://docs.github.com/en/free-pro-team@latest/rest/reference/reactions#list-reactions-for-an-issue-comment 322 | reactions_url = comment["reactions"]["url"] 323 | check_url_origin(BASE_URL, reactions_url) 324 | for r2 in get_paginated(sess, reactions_url, MEDIATYPE_REACTIONS): 325 | for reaction in r2.json(): 326 | zi = zipfile.ZipInfo(check_path("issues", "comments", str(comment["id"]), "reactions", str(reaction["id"]) + ".json"), timestamp_to_zip_time(reaction["created_at"])) 327 | with z.open(zi, mode="w") as f: 328 | f.write(json.dumps(reaction).encode("utf-8")) 329 | 330 | # TODO: comment edit history (if possible) 331 | 332 | labels_url = urllib.parse.urlparse(BASE_URL)._replace( 333 | path=f"/repos/{owner}/{repo}/labels", 334 | ).geturl() 335 | for r in get_paginated(sess, labels_url, MEDIATYPE): 336 | for label in r.json(): 337 | check_url_origin(BASE_URL, label["url"]) 338 | zi = zipfile.ZipInfo(check_path("labels", str(label["id"]) + ".json")) 339 | get_to_zipinfo(sess, label["url"], z, zi, MEDIATYPE) 340 | 341 | # A new session, without Basic auth, for downloading plain files. 342 | sess = requests.Session() 343 | 344 | for dest, url in sorted(file_urls): 345 | zi = zipfile.ZipInfo(check_path(*dest)) 346 | get_to_zipinfo(sess, url, z, zi, None) 347 | 348 | if __name__ == "__main__": 349 | opts, (repo, zip_filename) = getopt.gnu_getopt(sys.argv[1:], "u:") 350 | for o, a in opts: 351 | if o == "-u": 352 | username, token = a.split(":", 1) 353 | elif o in ("-h", "--help"): 354 | pass 355 | 356 | owner, repo = repo.split("/", 1) 357 | 358 | # Write to a temporary file, then rename to the requested name when 359 | # finished. 360 | with tempfile.NamedTemporaryFile(dir=os.path.dirname(zip_filename), suffix=".zip", delete=False) as f: 361 | try: 362 | with zipfile.ZipFile(f, mode="w") as z: 363 | backup(owner, repo, z, username, token) 364 | os.rename(f.name, zip_filename) 365 | except: 366 | # Delete output zip file on error. 367 | os.remove(f.name) 368 | raise 369 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | mistune >=2.0 2 | requests >=2.25 3 | --------------------------------------------------------------------------------
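The archive written by backup.py has a fixed layout: a top-level README, one issues/<issue id>.json file per issue, per-issue reactions under issues/<issue id>/reactions/, one issues/comments/<comment id>.json file per comment (with reactions alongside), labels/<label id>.json, and any downloaded attachments and avatar images. As the comment in the script notes, each saved comment carries an issue_url field pointing at its parent issue, so threads can be reassembled offline. The sketch below is illustrative and not part of the repository; it assumes only that layout and the standard fields (url, issue_url, number, title, created_at) present in the saved API responses, and takes the path of a backup zip as its only argument.

```python
#!/usr/bin/env python3
# Minimal sketch (not part of backup.py): list the issue threads contained in
# an archive produced by backup.py, pairing comments with their parent issues
# via the issue_url field.
import json
import sys
import zipfile

with zipfile.ZipFile(sys.argv[1]) as z:  # e.g. net4people_bbs.zip
    issues = {}    # issue API URL -> issue object
    comments = {}  # issue API URL -> list of comment objects
    for name in z.namelist():
        parts = name.split("/")
        if parts[:2] == ["issues", "comments"] and len(parts) == 3:
            # issues/comments/<comment id>.json
            with z.open(name) as f:
                c = json.load(f)
            comments.setdefault(c["issue_url"], []).append(c)
        elif parts[0] == "issues" and len(parts) == 2:
            # issues/<issue id>.json
            with z.open(name) as f:
                issue = json.load(f)
            issues[issue["url"]] = issue

    for url, issue in sorted(issues.items(), key=lambda kv: kv[1]["created_at"]):
        thread = sorted(comments.get(url, []), key=lambda c: c["created_at"])
        print(f"#{issue['number']} {issue['title']} ({len(thread)} comments)")
```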