├── .gitignore
├── LICENSE
├── README.md
├── bin
│   └── update_pages.py
├── pyscp
│   ├── __init__.py
│   ├── core.py
│   ├── orm.py
│   ├── resources
│   │   ├── cover.png
│   │   ├── pages
│   │   │   ├── cover.xhtml
│   │   │   ├── intro.xhtml
│   │   │   ├── license.xhtml
│   │   │   └── title.xhtml
│   │   ├── stafflist.txt
│   │   ├── stylesheet.css
│   │   └── templates
│   │       ├── container.xml
│   │       ├── content.opf
│   │       ├── page.xhtml
│   │       └── toc.ncx
│   ├── snapshot.py
│   ├── stats
│   │   ├── __init__.py
│   │   ├── counters.py
│   │   ├── filters.py
│   │   ├── scalars.py
│   │   └── updater.py
│   ├── utils.py
│   └── wikidot.py
├── setup.py
└── tests
    └── test_core.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Created by .ignore support plugin (hsz.mobi)
2 | *.sublime-*
3 | .idea/
4 | *.log
5 | *.pass
6 | .project
7 | */.coverage
8 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015 anqxyr
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # pyscp
2 | 
3 | **pyscp** is a Python library for interacting with wikidot-hosted websites. The library is mainly intended for use by the administrative staff of the www.scp-wiki.net website, and has a host of features exclusive to it. However, the majority of the core functionality should be applicable to any wikidot-based site.
4 | 
5 | ## Installation
6 | 
7 | Download the latest code, open the containing folder, and run the following command:
8 | ```
9 | pip install . --user
10 | ```
11 | Done.
12 | 
13 | ## Examples
14 | 
15 | ### Accessing Pages
16 | 
17 | ```python
18 | import pyscp
19 | 
20 | wiki = pyscp.wikidot.Wiki('www.scp-wiki.net')
21 | p = wiki('scp-837')
22 | print(
23 |     '"{}" has a rating of {}, {} revisions, and {} comments.'
24 |     .format(p.title, p.rating, len(p.history), len(p.comments)))
25 | ```
26 | ```
27 | "SCP-837: Multiplying Clay" has a rating of 108, 14 revisions, and 54 comments.
28 | ```
29 | 
30 | You can access other sites as well:
31 | 
32 | ```python
33 | ru_wiki = pyscp.wikidot.Wiki('scpfoundation.ru')
34 | p = ru_wiki('scp-837')
35 | print('"{}" was created by {} on {}.'.format(p.title, p.author, p.created))
36 | ```
37 | ```
38 | "SCP-837 - Глина умножения" was created by Gene R on 2012-12-26 11:12:13.
39 | ```
40 | 
41 | If the site doesn't use a custom domain, you can use the name of the site instead of the full URL. E.g. `Wiki('scpsandbox2')` is the same as `Wiki('scpsandbox2.wikidot.com')`.
42 | 
43 | ### Editing Pages
44 | 
45 | ```python
46 | 
47 | wiki = pyscp.wikidot.Wiki('scpsandbox2')
48 | wiki.auth('example_username', 'example_password')
49 | p = wiki('test')
50 | last_revision = p.history[-1].number
51 | p.edit(
52 |     source='= This is centered **text** that uses Wikidot markup.',
53 |     title="you can skip the title if you don't want to change it",
54 |     # you can leave out the comment too, but that'd be rude
55 |     comment='testing automated editing')
56 | print(p.text)  # see if it worked
57 | p.revert(last_revision)  # let's revert it back to what it was.
58 | ```
59 | ```
60 | This is centered text that uses Wikidot markup.
61 | ```
62 | 
63 | 
64 | ### Snapshots
65 | 
66 | When working with a large number of pages, it can be faster to create a snapshot of the site than to download the pages one by one. Snapshots are optimized to download a large amount of data in the shortest possible time using multithreading.
67 | 
68 | ```python
69 | import pyscp
70 | 
71 | creator = pyscp.snapshot.SnapshotCreator('www.scp-wiki.net', 'snapshot_file.db')
72 | creator.take_snapshot(forums=False)
73 | # that's where we wait half an hour for it to finish
74 | ```
75 | 
76 | Once a snapshot is created, you can use `snapshot.Wiki` to read pages the same way as in the first example:
77 | 
78 | ```python
79 | wiki = pyscp.snapshot.Wiki('www.scp-wiki.net', 'snapshot_file.db')
80 | p = wiki('scp-9005-2')
81 | print(
82 |     '"{}" has a rating of {}, was created by {}, and is awesome.'
83 |     .format(p.title, p.rating, p.author))
84 | print('Other pages by {}:'.format(p.author))
85 | for other in wiki.list_pages(author=p.author):
86 |     print(
87 |         '{} (rating: {}, created: {})'
88 |         .format(other.title, other.rating, other.created))
89 | ```
90 | ```
91 | Page "SCP-9005-2" has a rating of 80, was created by yellowdrakex, and is awesome.
92 | Other pages by yellowdrakex:
93 | ClusterfREDACTED (rating: 112, created: 2011-10-20 18:08:49)
94 | Dr Rights' Draft Box (rating: None, created: 2009-02-01 18:58:36)
95 | Dr. Rights' Personal Log (rating: 3, created: 2008-11-26 23:03:27)
96 | Dr. Rights' Personnel File (rating: 13, created: 2008-11-24 20:45:34)
97 | Fifteen To Sixteen (rating: 17, created: 2010-02-15 05:55:58)
98 | Great Short Story Concepts (rating: 1, created: 2010-06-03 19:26:06)
99 | RUN AWAY FOREVURRR (rating: 79, created: 2011-10-24 16:34:23)
100 | SCP-288: The "Stepford Marriage" Rings (rating: 56, created: 2008-11-27 07:47:01)
101 | SCP-291: Disassembler/Reassembler (rating: 113, created: 2008-11-24 20:11:11)
102 | ...
103 | ```
104 | 
--------------------------------------------------------------------------------
/bin/update_pages.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | 
3 | """
4 | Update wiki pages.
5 | 
6 | This script is used to update scp-wiki tale hubs and other such pages.
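
The credentials and update targets are wired up at the bottom of the module:
the script reads a Wikidot password from a local .pass file, authenticates as
a bot account, and then calls update_tale_hubs (the update_credit_hubs call
is left commented out).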
7 | """ 8 | 9 | ############################################################################### 10 | # Module Imports 11 | ############################################################################### 12 | 13 | import arrow 14 | import collections 15 | import logging 16 | import pyscp 17 | import re 18 | import string 19 | 20 | ############################################################################### 21 | 22 | log = logging.getLogger('pyscp') 23 | 24 | ############################################################################### 25 | 26 | TEMPLATE = """ 27 | [[# {name}]] 28 | [[div class="section"]] 29 | +++ {disp} 30 | [#top ⇑] 31 | {header} 32 | {body} 33 | [[/div]] 34 | 35 | """ 36 | 37 | ############################################################################### 38 | 39 | 40 | class Updater: 41 | 42 | def __init__(self, wiki, pages): 43 | self.wiki = wiki 44 | self.pages = pages 45 | 46 | def disp(self): 47 | return self.keys() 48 | 49 | def get_author(self, page): 50 | return page.build_attribution_string( 51 | user_formatter='[[user {}]]', separator=' _\n') 52 | 53 | def get_section(self, idx): 54 | name = self.keys()[idx] 55 | disp = self.disp()[idx] 56 | pages = [p for p in self.pages if self.keyfunc(p) == name] 57 | 58 | if pages: 59 | body = '\n'.join(map( 60 | self.format_page, sorted(pages, key=self.sortfunc))) 61 | else: 62 | body = self.NODATA 63 | 64 | return TEMPLATE.format( 65 | name=name.replace(' ', '-'), 66 | disp=disp, 67 | header=self.HEADER, 68 | body=body) 69 | 70 | def update(self, *targets): 71 | output = [''] 72 | for idx in range(len(self.keys())): 73 | section = self.get_section(idx) 74 | if len(output[-1]) + len(section) < 180000: 75 | output[-1] += section 76 | else: 77 | output.append(section) 78 | for idx, target in enumerate(targets): 79 | source = output[idx] if idx < len(output) else '' 80 | self.wiki(target).revert(0) 81 | self.wiki(target).edit(source, comment='automated update') 82 | log.info('{} {}'.format(target, len(source))) 83 | 84 | ############################################################################### 85 | 86 | 87 | class TaleUpdater(Updater): 88 | 89 | HEADER = '||~ Title||~ Author||~ Created||' 90 | NODATA = '||||||= **NO DATA AVAILABLE**||' 91 | 92 | def format_page(self, page=None): 93 | return '||[[[{}|]]]||{}||//{}//||\n||||||{}||'.format( 94 | page._body['fullname'], self.get_author(page), 95 | page.created[:10], page._body['preview']) 96 | 97 | def update(self, target): 98 | targets = [ 99 | 'component:tales-by-{}-{}'.format(target, i + 1) for i in range(5)] 100 | super().update(*targets) 101 | 102 | 103 | class TalesByTitle(TaleUpdater): 104 | 105 | def keys(self): 106 | return list(string.ascii_uppercase) + ['misc'] 107 | 108 | def keyfunc(self, page): 109 | if not page._body['title']: 110 | return 'misc' 111 | l = page._body['title'][0] 112 | return l.upper() if l.isalpha() else 'misc' 113 | 114 | def sortfunc(self, page): 115 | return page._body['title'].lower() 116 | 117 | 118 | class TalesByAuthor(TaleUpdater): 119 | 120 | def keys(self): 121 | return sorted(list(string.ascii_uppercase) + ['Dr', 'misc']) 122 | 123 | def keyfunc(self, page): 124 | templates = collections.defaultdict(lambda: '{user}') 125 | authors = page.build_attribution_string(templates).split(', ') 126 | author = authors[0] 127 | if re.match(r'Dr[^a-z]|Doctor|Doc[^a-z]', author): 128 | return 'Dr' 129 | elif author[0].isalpha(): 130 | return author[0].upper() 131 | else: 132 | return 'misc' 133 | 134 | def sortfunc(self, page): 135 | 
author = sorted(page.metadata.keys())[0] 136 | return author.lower() 137 | 138 | 139 | class TalesByDate(TaleUpdater): 140 | 141 | def disp(self): 142 | return [ 143 | arrow.get(i, 'YYYY-MM').format('MMMM YYYY') for i in self.keys()] 144 | 145 | def keys(self): 146 | return [i.format('YYYY-MM') for i in 147 | arrow.Arrow.range('month', arrow.get('2008-07'), arrow.now())] 148 | 149 | def keyfunc(self, page=None): 150 | return page.created[:7] 151 | 152 | def sortfunc(self, page): 153 | return page.created 154 | 155 | 156 | def update_tale_hubs(wiki): 157 | pages = list(wiki.list_pages( 158 | tags='tale -hub -_sys', 159 | body='title created_by created_at preview tags')) 160 | TalesByTitle(wiki, pages).update('title') 161 | TalesByAuthor(wiki, pages).update('author') 162 | TalesByDate(wiki, pages).update('date') 163 | 164 | ############################################################################### 165 | 166 | 167 | class CreditUpdater(Updater): 168 | 169 | HEADER = '' 170 | NODATA = '||||= **NO DATA AVAILABLE**||' 171 | 172 | def format_page(self, page): 173 | return '||[[[{}|{}]]]||{}||'.format( 174 | page._body['fullname'], 175 | page.title.replace('[', '').replace(']', ''), 176 | self.get_author(page)) 177 | 178 | def sortfunc(self, page): 179 | title = [] 180 | for word in re.split('([0-9]+)', page._body['title']): 181 | if word.isdigit(): 182 | title.append(int(word)) 183 | else: 184 | title.append(word.lower()) 185 | return title 186 | 187 | def update(self, target): 188 | super().update('component:credits-' + target) 189 | 190 | 191 | class SeriesCredits(CreditUpdater): 192 | 193 | def __init__(self, wiki, pages, series): 194 | super().__init__(wiki, pages) 195 | self.series = (series - 1) * 1000 196 | 197 | def keys(self): 198 | return ['{:03}-{:03}'.format(i or 2, i + 99) 199 | for i in range(self.series, self.series + 999, 100)] 200 | 201 | def keyfunc(self, page): 202 | num = re.search('[scp]+-([0-9]+)$', page._body['fullname']) 203 | if not num: 204 | return 205 | num = (int(num.group(1)) // 100) * 100 206 | return '{:03}-{:03}'.format(num or 2, num + 99) 207 | 208 | 209 | class MiscCredits(CreditUpdater): 210 | 211 | def __init__(self, wiki, pages): 212 | self.proposals = pyscp.wikidot.Wiki('scp-wiki')('scp-001').links 213 | super().__init__(wiki, pages) 214 | 215 | def keys(self): 216 | return 'proposals explained joke archived'.split() 217 | 218 | def disp(self): 219 | return [ 220 | '001 Proposals', 'Explained Phenomena', 221 | 'Joke Articles', 'Archived Articles'] 222 | 223 | def keyfunc(self, page): 224 | if page.url in self.proposals: 225 | return 'proposals' 226 | for tag in ('explained', 'joke', 'archived'): 227 | if tag in page.tags: 228 | return tag 229 | 230 | 231 | def update_credit_hubs(wiki): 232 | pages = list(wiki.list_pages( 233 | tag='scp', body='title created_by tags')) 234 | wiki = pyscp.wikidot.Wiki('scpsandbox2') 235 | with open('pyscp_bot.pass') as file: 236 | wiki.auth('jarvis-bot', file.read()) 237 | 238 | SeriesCredits(wiki, pages, 1).update('series1') 239 | SeriesCredits(wiki, pages, 2).update('series2') 240 | SeriesCredits(wiki, pages, 3).update('series3') 241 | MiscCredits(wiki, pages).update('misc') 242 | 243 | ############################################################################### 244 | 245 | wiki = pyscp.wikidot.Wiki('scp-wiki') 246 | with open('/media/hdd0/code/pyscp/bin/pyscp_bot.pass') as file: 247 | wiki.auth('jarvis-bot', file.read()) 248 | 249 | pyscp.utils.default_logging() 250 | #update_credit_hubs(wiki) 251 | 252 | 
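# rebuild the by-title, by-author, and by-date tale hubs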
update_tale_hubs(wiki) 253 | -------------------------------------------------------------------------------- /pyscp/__init__.py: -------------------------------------------------------------------------------- 1 | from pyscp import core, utils, snapshot, wikidot -------------------------------------------------------------------------------- /pyscp/core.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | Abstract Base Classes. 5 | 6 | pyscp builds most of its functionality on top of three large classes: Wiki, 7 | Page, and Thread. This module contains the abstract base classes for those 8 | three. The ABC-s define the abstact methods that each child must implement, 9 | as well as some common functionality that builds on top of the abstract 10 | methods. 11 | 12 | Each class inheriting from the ABC-s must implement its own realization of 13 | the abstract methods, and can also provide additional methods unique to it. 14 | 15 | This module also defines the named tuples for simple containers used by the 16 | three core classes, such as Revision or Vote. 17 | """ 18 | 19 | 20 | ############################################################################### 21 | # Module Imports 22 | ############################################################################### 23 | 24 | import abc 25 | import arrow 26 | import bs4 27 | import collections 28 | import functools 29 | import itertools 30 | import re 31 | import urllib.parse 32 | import logging 33 | 34 | import pyscp.utils 35 | 36 | ############################################################################### 37 | # Global Constants And Variables 38 | ############################################################################### 39 | 40 | log = logging.getLogger(__name__) 41 | 42 | ############################################################################### 43 | # Abstract Base Classes 44 | ############################################################################### 45 | 46 | 47 | class Page(metaclass=abc.ABCMeta): 48 | """ 49 | Page Abstract Base Class. 50 | 51 | Page object are wrappers around individual wiki-pages, and allow simple 52 | operations with them, such as retrieving the rating or the author. 53 | 54 | Each Page instance is attached to a specific instance of the Wiki class. 55 | The wiki may be used by the page to retrieve a list of titles or other 56 | similar wiki-wide information that may be used by the Page to, in turn, 57 | deduce some information about itself. 58 | 59 | Typically, the Page instances should not be created directly. Instead, 60 | calling an instance of a Wiki class will creating a Page instance 61 | attached to that wiki. 
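
    A minimal usage sketch, mirroring the README example (the concrete
    Page subclass returned depends on the Wiki implementation in use):

        wiki = pyscp.wikidot.Wiki('www.scp-wiki.net')
        page = wiki('scp-837')
        print(page.title, page.rating, len(page.history))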
62 | """ 63 | 64 | ########################################################################### 65 | # Special Methods 66 | ########################################################################### 67 | 68 | def __init__(self, wiki, url): 69 | self.url = url 70 | self._wiki = wiki 71 | 72 | def __repr__(self): 73 | return '{}.{}({}, {})'.format( 74 | self.__module__, self.__class__.__name__, 75 | repr(self.url), repr(self._wiki)) 76 | 77 | def __eq__(self, other): 78 | if not hasattr(other, 'url') or not hasattr(other, '_wiki'): 79 | return False 80 | return self.url == other.url and self._wiki is other._wiki 81 | 82 | ########################################################################### 83 | # Abstract Methods 84 | ########################################################################### 85 | 86 | @property 87 | @abc.abstractmethod 88 | def _pdata(self): 89 | """ 90 | Commonly used data about the page. 91 | 92 | This method should return a tuple, the first three elements of which 93 | are the id number of the page; the id number of the page's comments 94 | thread; and the html contents of the page. 95 | 96 | Any additional elements of the tuple are left to the discretion 97 | of the individual Page implimentations. 98 | """ 99 | pass 100 | 101 | @property 102 | @abc.abstractmethod 103 | def history(self): 104 | """ 105 | Revision history of the page. 106 | 107 | Should return a sorted list of Revision named tuples. 108 | """ 109 | pass 110 | 111 | @property 112 | @abc.abstractmethod 113 | def votes(self): 114 | """ 115 | Page votes. 116 | 117 | Should return a list of Vote named tuples. 118 | """ 119 | pass 120 | 121 | @property 122 | @abc.abstractmethod 123 | def tags(self): 124 | """ 125 | Page tags. 126 | 127 | Should return a set of strings. 
128 | """ 129 | pass 130 | 131 | ########################################################################### 132 | # Internal Methods 133 | ########################################################################### 134 | 135 | @property 136 | def _id(self): 137 | """Unique ID number of the page.""" 138 | return self._pdata[0] 139 | 140 | @pyscp.utils.cached_property 141 | def _thread(self): 142 | """Thread object corresponding to the page's comments thread.""" 143 | return self._wiki.Thread(self._wiki, self._pdata[1]) 144 | 145 | @property 146 | def _raw_title(self): 147 | """Title as displayed on the page.""" 148 | title = self._soup.find(id='page-title') 149 | return title.text.strip() if title else '' 150 | 151 | @property 152 | def _raw_author(self): 153 | return self.history[0].user 154 | 155 | @property 156 | def _soup(self): 157 | """BeautifulSoup of the contents of the page.""" 158 | return bs4.BeautifulSoup(self.html, 'lxml') 159 | 160 | ########################################################################### 161 | # Properties 162 | ########################################################################### 163 | 164 | @property 165 | def html(self): 166 | """HTML contents of the page.""" 167 | return self._pdata[2] 168 | 169 | @property 170 | def posts(self): 171 | """List of the comments made on the page.""" 172 | return self._thread.posts 173 | 174 | @property 175 | def comments(self): 176 | """Alias for Page.posts.""" 177 | return self._thread.posts 178 | 179 | @property 180 | def text(self): 181 | """Plain text of the page.""" 182 | return self._soup.find(id='page-content').text 183 | 184 | @property 185 | def wordcount(self): 186 | """Number of words encountered on the page.""" 187 | return len(re.findall(r"[\w'█_-]+", self.text)) 188 | 189 | @property 190 | def images(self): 191 | """Number of images dislayed on the page.""" 192 | # TODO: needs more work. 193 | return [i['src'] for i in self._soup('img')] 194 | 195 | @property 196 | def name(self): 197 | return self.url.split('/')[-1] 198 | 199 | @property 200 | def title(self): 201 | """ 202 | Title of the page. 203 | 204 | In case of SCP articles, will include the title from the 'series' page. 205 | """ 206 | try: 207 | return '{}: {}'.format( 208 | self._raw_title, self._wiki.titles()[self.url]) 209 | except KeyError: 210 | return self._raw_title 211 | 212 | @property 213 | def created(self): 214 | """When was the page created.""" 215 | return self.history[0].time 216 | 217 | @property 218 | def metadata(self): 219 | """ 220 | Return page metadata. 221 | 222 | Authors in this case includes all users related to the creation 223 | and subsequent maintenance of the page. The values of the dict 224 | describe the user's relationship to the page. 225 | """ 226 | data = [i for i in self._wiki.metadata() if i.url == self.url] 227 | data = {i.user: i for i in data} 228 | 229 | if 'author' not in {i.role for i in data.values()}: 230 | meta = Metadata(self.url, self._raw_author, 'author', None) 231 | data[self._raw_author] = meta 232 | 233 | for k, v in data.items(): 234 | if v.role == 'author' and not v.date: 235 | data[k] = v._replace(date=self.created) 236 | 237 | return data 238 | 239 | @property 240 | def rating(self): 241 | """Rating of the page, excluding deleted accounts.""" 242 | return sum( 243 | v.value for v in self.votes if v.user != '(account deleted)') 244 | 245 | @property 246 | @pyscp.utils.listify() 247 | def links(self): 248 | """ 249 | Other pages linked from this one. 
250 | 251 | Returns an ordered list of unique urls. Off-site links or links to 252 | images are not included. 253 | """ 254 | unique = set() 255 | for element in self._soup.select('#page-content a'): 256 | href = element.get('href', None) 257 | if (not href or href[0] != '/' or # bad or absolute link 258 | href[-4:] in ('.png', '.jpg', '.gif')): 259 | continue 260 | url = self._wiki.site + href.rstrip('|') 261 | if url not in unique: 262 | unique.add(url) 263 | yield url 264 | 265 | @property 266 | def parent(self): 267 | """Parent of the current page.""" 268 | if not self.html: 269 | return None 270 | breadcrumb = self._soup.select('#breadcrumbs a') 271 | if breadcrumb: 272 | return self._wiki.site + breadcrumb[-1]['href'] 273 | 274 | @property 275 | def is_mainlist(self): 276 | """ 277 | Indicate whether the page is a mainlist scp article. 278 | 279 | This is an scp-wiki exclusive property. 280 | """ 281 | if 'scp-wiki' not in self._wiki.site: 282 | return False 283 | if 'scp' not in self.tags: 284 | return False 285 | return bool(re.search(r'/scp-[0-9]{3,4}$', self.url)) 286 | 287 | ########################################################################### 288 | # Methods 289 | ########################################################################### 290 | 291 | def build_attribution_string( 292 | self, templates=None, group_templates=None, separator=', ', 293 | user_formatter=None): 294 | """ 295 | Create an attribution string based on the page's metadata. 296 | 297 | This is a commonly needed operation. The result should be a nicely 298 | formatted, human-readable description of who was and is involved with 299 | the page, and in what role. 300 | """ 301 | roles = 'author rewrite translator maintainer'.split() 302 | 303 | if not templates: 304 | templates = {i: '{{user}} ({})'.format(i) for i in roles} 305 | 306 | items = list(self.metadata.values()) 307 | items.sort(key=lambda x: [roles.index(x.role), x.date]) 308 | 309 | # group users in the same role on the same date together 310 | itemdict = collections.OrderedDict() 311 | for i in items: 312 | user = user_formatter.format(i.user) if user_formatter else i.user 313 | key = (i.role, i.date) 314 | itemdict[key] = itemdict.get(key, []) + [user] 315 | 316 | output = [] 317 | 318 | for (role, date), users in itemdict.items(): 319 | 320 | hdate = arrow.get(date).humanize() if date else '' 321 | 322 | if group_templates and len(users) > 1: 323 | output.append( 324 | group_templates[role].format( 325 | date=date, 326 | hdate=hdate, 327 | users=', '.join(users[:-1]), 328 | last_user=users[-1])) 329 | else: 330 | for user in users: 331 | output.append( 332 | templates[role].format( 333 | date=date, hdate=hdate, user=user)) 334 | 335 | return separator.join(output) 336 | 337 | 338 | class Thread(metaclass=abc.ABCMeta): 339 | """ 340 | Thread Abstract Base Class. 341 | 342 | Thread objects represent individual forum threads. Most pages have a 343 | corresponding comments thread, accessible via Page._thread. 344 | """ 345 | 346 | def __init__(self, wiki, _id, title=None, description=None): 347 | self._wiki = wiki 348 | self._id, self.title, self.description = _id, title, description 349 | 350 | @abc.abstractmethod 351 | def posts(self): 352 | """Posts in this thread.""" 353 | pass 354 | 355 | 356 | class Wiki(metaclass=abc.ABCMeta): 357 | """ 358 | Wiki Abstract Base Class. 359 | 360 | Wiki objects provide wiki-wide functionality not limited to individual 361 | pages or threads. 
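
    A small sketch of typical wiki-level usage (the author name below is
    purely illustrative; list_pages is defined further down):

        wiki = pyscp.wikidot.Wiki('www.scp-wiki.net')
        for page in wiki.list_pages(author='some-user'):
            print(page.title, page.rating)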
362 | """ 363 | 364 | ########################################################################### 365 | # Class Attributes 366 | ########################################################################### 367 | 368 | # should point to the respective Page and Thread classes in each submodule. 369 | 370 | Page = Page 371 | Thread = Thread 372 | 373 | ########################################################################### 374 | # Special Methods 375 | ########################################################################### 376 | 377 | def __init__(self, site): 378 | parsed = urllib.parse.urlparse(site) 379 | netloc = parsed.netloc if parsed.netloc else parsed.path 380 | if '.' not in netloc: 381 | netloc += '.wikidot.com' 382 | self.site = urllib.parse.urlunparse(['http', netloc, '', '', '', '']) 383 | self._title_data = {} 384 | 385 | def __call__(self, name): 386 | url = name if self.site in name else '{}/{}'.format(self.site, name) 387 | url = url.replace(' ', '-').replace('_', '-').lower() 388 | return self.Page(self, url) 389 | 390 | ########################################################################### 391 | 392 | @functools.lru_cache(maxsize=1) 393 | def metadata(self): 394 | """ 395 | List page ownership metadata. 396 | 397 | This method is exclusive to the scp-wiki, and is used to fine-tune 398 | the page ownership information beyond what is possible with Wikidot. 399 | This allows a single page to have an author different from the user 400 | who created the zeroth revision of the page, or even have multiple 401 | users attached to the page in various roles. 402 | """ 403 | if 'scp-wiki' not in self.site: 404 | return [] 405 | soup = self('attribution-metadata')._soup 406 | results = [] 407 | for row in soup('tr')[1:]: 408 | name, user, type_, date = [i.text.strip() for i in row('td')] 409 | name = name.lower() 410 | url = '{}/{}'.format(self.site, name) 411 | results.append(pyscp.core.Metadata(url, user, type_, date)) 412 | return results 413 | 414 | def _update_titles(self): 415 | for name in ( 416 | 'scp-series', 'scp-series-2', 'scp-series-3', 'scp-series-4', 'scp-series-5', 417 | 'joke-scps', 'scp-ex', 'archived-scps'): 418 | page = self(name) 419 | try: 420 | soup = page._soup 421 | except: 422 | continue 423 | self._title_data[name] = soup 424 | 425 | @functools.lru_cache(maxsize=1) 426 | @pyscp.utils.ignore(value={}) 427 | @pyscp.utils.log_errors(logger=log.error) 428 | def titles(self): 429 | """Dict of url/title pairs for scp articles.""" 430 | if 'scp-wiki' not in self.site: 431 | return {} 432 | 433 | self._update_titles() 434 | 435 | elems = [i.select('ul > li') for i in self._title_data.values()] 436 | elems = list(itertools.chain(*elems)) 437 | try: 438 | elems += list(self('scp-001')._soup(class_='series')[1]('p')) 439 | except: 440 | pass 441 | 442 | titles = {} 443 | for elem in elems: 444 | 445 | sep = ' - ' if ' - ' in elem.text else ', ' 446 | try: 447 | url1 = self.site + elem.a['href'] 448 | skip, title = elem.text.split(sep, maxsplit=1) 449 | except (ValueError, TypeError): 450 | continue 451 | 452 | if title != '[ACCESS DENIED]': 453 | url2 = self.site + '/' + skip.lower() 454 | titles[url1] = titles[url2] = title 455 | 456 | return titles 457 | 458 | def list_pages(self, **kwargs): 459 | """Return pages matching the specified criteria.""" 460 | pages = self._list_pages_parsed(**kwargs) 461 | author = kwargs.pop('author', None) 462 | if not author: 463 | # if 'author' isn't specified, there's no need to check rewrites 464 | return pages 465 | 
include, exclude = set(), set() 466 | for meta in self.metadata(): 467 | if meta.user == author: 468 | # if username matches, include regardless of type 469 | include.add(meta.url) 470 | elif meta.role == 'author': 471 | # exclude only if override type is author. 472 | # if url has author and rewrite author, 473 | # it will appear in list_pages for both. 474 | exclude.add(meta.url) 475 | urls = {p.url for p in pages} | include - exclude 476 | # if no other options beside author were specified, 477 | # just return everything we can 478 | if not kwargs: 479 | return map(self, sorted(urls)) 480 | # otherwise, retrieve the list of urls without the author parameter 481 | # to check which urls we should return and in which order 482 | pages = self._list_pages_parsed(**kwargs) 483 | return [p for p in pages if p.url in urls] 484 | 485 | ############################################################################### 486 | # Named Tuple Containers 487 | ############################################################################### 488 | 489 | nt = collections.namedtuple 490 | Revision = nt('Revision', 'id number user time comment') 491 | Vote = nt('Vote', 'user value') 492 | Post = nt('Post', 'id title content user time parent') 493 | File = nt('File', 'url name filetype size') 494 | Metadata = nt('Metadata', 'url user role date') 495 | Category = nt('Category', 'id title description size') 496 | Image = nt('Image', 'url source status notes data') 497 | del nt 498 | -------------------------------------------------------------------------------- /pyscp/orm.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | ############################################################################### 4 | # Module Imports 5 | ############################################################################### 6 | 7 | import concurrent.futures 8 | import logging 9 | import peewee 10 | import queue 11 | 12 | from itertools import islice 13 | 14 | ############################################################################### 15 | # Global Constants And Variables 16 | ############################################################################### 17 | 18 | log = logging.getLogger('pyscp.orm') 19 | pool = concurrent.futures.ThreadPoolExecutor(max_workers=1) 20 | queue = queue.Queue() 21 | 22 | 23 | def queue_execution(fn, args=(), kw={}): 24 | queue.put(dict(fn=fn, args=args, kw=kw)) 25 | pool.submit(async_write) 26 | 27 | ############################################################################### 28 | # Database ORM Classes 29 | ############################################################################### 30 | 31 | db = peewee.Proxy() 32 | 33 | 34 | class BaseModel(peewee.Model): 35 | 36 | class Meta: 37 | database = db 38 | 39 | @classmethod 40 | def create(cls, **kw): 41 | queue_execution(fn=super().create, kw=kw) 42 | 43 | @classmethod 44 | def create_table(cls): 45 | if not hasattr(cls, '_id_cache'): 46 | cls._id_cache = [] 47 | queue_execution(fn=super().create_table, args=(True,)) 48 | 49 | @classmethod 50 | def insert_many(cls, data): 51 | data_iter = iter(data) 52 | chunk = list(islice(data_iter, 500)) 53 | while chunk: 54 | queue_execution( 55 | fn=lambda x: super(BaseModel, cls).insert_many(x).execute(), 56 | args=(chunk, )) 57 | chunk = list(islice(data_iter, 500)) 58 | 59 | @classmethod 60 | def convert_to_id(cls, data, key='user'): 61 | for row in data: 62 | if row[key] not in cls._id_cache: 63 | cls._id_cache.append(row[key]) 64 | 
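            # replace the raw value with its 1-based index in the id cache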
row[key] = cls._id_cache.index(row[key]) + 1 65 | yield row 66 | 67 | @classmethod 68 | def write_ids(cls, field_name): 69 | cls.insert_many([ 70 | {'id': cls._id_cache.index(value) + 1, field_name: value} 71 | for value in set(cls._id_cache)]) 72 | cls._id_cache.clear() 73 | 74 | 75 | class ForumCategory(BaseModel): 76 | title = peewee.CharField() 77 | description = peewee.TextField() 78 | 79 | 80 | class ForumThread(BaseModel): 81 | category = peewee.ForeignKeyField(ForumCategory, null=True) 82 | title = peewee.CharField(null=True) 83 | description = peewee.TextField(null=True) 84 | 85 | 86 | class Page(BaseModel): 87 | url = peewee.CharField(unique=True) 88 | html = peewee.TextField() 89 | thread = peewee.ForeignKeyField( 90 | ForumThread, related_name='page', null=True) 91 | 92 | 93 | class User(BaseModel): 94 | name = peewee.CharField(unique=True) 95 | 96 | 97 | class Revision(BaseModel): 98 | page = peewee.ForeignKeyField(Page, related_name='revisions', index=True) 99 | user = peewee.ForeignKeyField(User, related_name='revisions', index=True) 100 | number = peewee.IntegerField() 101 | time = peewee.DateTimeField() 102 | comment = peewee.CharField(null=True) 103 | 104 | 105 | class Vote(BaseModel): 106 | page = peewee.ForeignKeyField(Page, related_name='votes', index=True) 107 | user = peewee.ForeignKeyField(User, related_name='votes', index=True) 108 | value = peewee.IntegerField() 109 | 110 | 111 | class ForumPost(BaseModel): 112 | thread = peewee.ForeignKeyField( 113 | ForumThread, related_name='posts', index=True) 114 | user = peewee.ForeignKeyField(User, related_name='posts', index=True) 115 | parent = peewee.ForeignKeyField('self', null=True) 116 | title = peewee.CharField(null=True) 117 | time = peewee.DateTimeField() 118 | content = peewee.TextField() 119 | 120 | 121 | class Tag(BaseModel): 122 | name = peewee.CharField(unique=True) 123 | 124 | 125 | class PageTag(BaseModel): 126 | page = peewee.ForeignKeyField(Page, related_name='tags', index=True) 127 | tag = peewee.ForeignKeyField(Tag, related_name='pages', index=True) 128 | 129 | 130 | class OverrideType(BaseModel): 131 | name = peewee.CharField(unique=True) 132 | 133 | 134 | class Override(BaseModel): 135 | url = peewee.ForeignKeyField(Page, to_field=Page.url, index=True) 136 | user = peewee.ForeignKeyField(User, index=True) 137 | type = peewee.ForeignKeyField(OverrideType) 138 | 139 | 140 | class ImageStatus(BaseModel): 141 | name = peewee.CharField(unique=True) 142 | 143 | 144 | class Image(BaseModel): 145 | url = peewee.CharField(unique=True) 146 | source = peewee.CharField() 147 | data = peewee.BlobField() 148 | status = peewee.ForeignKeyField(ImageStatus) 149 | notes = peewee.TextField(null=True) 150 | 151 | ############################################################################### 152 | # Helper Functions 153 | ############################################################################### 154 | 155 | 156 | def async_write(buffer=[]): 157 | item = queue.get() 158 | buffer.append(item) 159 | if len(buffer) > 500 or queue.empty(): 160 | log.debug('Processing {} queue items.'.format(len(buffer))) 161 | with db.transaction(): 162 | write_buffer(buffer) 163 | buffer.clear() 164 | 165 | 166 | def write_buffer(buffer): 167 | for item in buffer: 168 | try: 169 | item['fn'](*item.get('args', ()), **item.get('kw', {})) 170 | except: 171 | log.exception( 172 | 'Exception while processing queue item: {}' 173 | .format(item)) 174 | queue.task_done() 175 | 176 | 177 | def create_tables(*tables): 178 | for table in tables: 
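        # each table is passed as a class-name string and resolved with eval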
179 | eval(table).create_table() 180 | 181 | 182 | def connect(dbpath): 183 | log.info('Connecting to the database at {}'.format(dbpath)) 184 | db.initialize(peewee.SqliteDatabase(dbpath)) 185 | db.connect() 186 | 187 | 188 | ############################################################################### 189 | # Macros 190 | ############################################################################### 191 | 192 | 193 | def votes_by_user(user): 194 | up, down = [], [] 195 | for vote in (Vote.select().join(User).where(User.name == user)): 196 | if vote.value == 1: 197 | up.append(vote.page.url) 198 | else: 199 | down.append(vote.page.url) 200 | return {'+': up, '-': down} 201 | -------------------------------------------------------------------------------- /pyscp/resources/cover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/anqxyr/pyscp/fc85c808495f8f47783db6fb12a79ce7727e919c/pyscp/resources/cover.png -------------------------------------------------------------------------------- /pyscp/resources/pages/cover.xhtml: -------------------------------------------------------------------------------- 1 |
Mankind in its present state has been around for a quarter of a million years, yet only the last 4,000 have been of any significance.
3 | So, what did we do for nearly 250,000 years? We huddled in caves and around small fires, fearful of the things that we didn't understand. It was more than explaining why the sun came up, it was the mystery of enormous birds with heads of men and rocks that came to life. So we called them 'gods' and 'demons', begged them to spare us, and prayed for salvation.
4 | In time, their numbers dwindled and ours rose. The world began to make more sense when there were fewer things to fear, yet the unexplained can never truly go away, as if the universe demands the absurd and impossible.
5 | Mankind must not go back to hiding in fear. No one else will protect us, and we must stand up for ourselves.
6 | While the rest of mankind dwells in the light, we must stand in the darkness to fight it, contain it, and shield it from the eyes of the public, so that others may live in a sane and normal world.
7 | We secure. We contain. We protect.
9 | — The Administrator
10 | This book contains the collected works of the SCP Foundation, a collaborative fiction writing website. All contents are licensed under the CC-BY-SA 3.0 license. The stories comprising the book are available online at www.scp-wiki.net.