├── .gitignore
├── LICENSE
├── README.md
├── pykcd
    └── __init__.py
├── requirements.txt
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
# Compiled
__pycache__/
*.py[cod]

# Distribution / packaging
.Python
env/
bin/
build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Installer logs
pip-log.txt
pip-delete-this-directory.txt
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2015 JacobLandau

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pykcd [![PyPI version](https://badge.fury.io/py/pykcd.svg)](https://badge.fury.io/py/pykcd)
Python interface for the XKCD API

## Usage

The strip object can be initialized like so:

    strip = pykcd.XKCDStrip(strip_number)

The full breadth of properties can be found using the built-in `help` function. Here is a sampling.

* Alt text

        In [1]: XKCDStrip(50).alt_text
        Out[1]: "Of course, Penny Arcade has already mocked themselves for this. They don't care."

* Image link

        In [2]: XKCDStrip(732).image_link
        Out[2]: 'http://imgs.xkcd.com/comics/hdtv.png'

* Downloading Strips

        In [3]: XKCDStrip(178).download_strip()
        100% [...................................] 18611 / 18611
        // Downloaded to 'XKCD Archive' in the working directory

## Under the Hood

Each XKCD strip, barring Strip #404 ([Funny funny](http://www.explainxkcd.com/wiki/index.php/404)), has a JSON document located at `http://www.xkcd.com/#/info.0.json`, where `#` is the strip number. This document contains the day, month, and year of publication, the strip transcript, the image hotlink, the alt text, and other details. Using the requests library, the document is fetched and parsed into a standard Python dictionary, from which each value can be accessed by its respective key.

Image links present a unique challenge in the case of large strips such as [Strip #802: Online Communities 2](http://www.explainxkcd.com/wiki/index.php/802:_Online_Communities_2), whose `img` key points to a thumbnail rather than the full-resolution image. The solution lies in the `link` key, which points to a page containing only the full-resolution image. The image URL is scraped from this page using BeautifulSoup and returned when the user asks for the image link of one of these large strips.

The wget package downloads strips to the 'XKCD Archive' folder in the working directory, which is created if it does not already exist. Before downloading, the code checks whether the file is already present. Downloaded files are named according to a "Number - Title" scheme. Any characters in the title that are invalid in Windows filenames are filtered out using a lambda function.

## Why?

Why not.

--------------------------------------------------------------------------------
/pykcd/__init__.py:
--------------------------------------------------------------------------------
import os
import shutil

import requests
import wget
from bs4 import BeautifulSoup


class XKCDStrip:
    '''
    Strip object for given XKCD strip
    '''

    def __init__(self, strip_num):
        '''
        Constructs the strip object

        :param strip_num: The number of the strip
        '''

        # The location of the API for the strip
        # Interpolates the strip number into the URL
        self.json_domain = 'http://www.xkcd.com/{}/info.0.json'.format(strip_num)
        self.user_agent = 'pykcd/1.0.0 (+https://github.com/JacobLandau/pykcd/)'

        # Creates the identifiers for each property,
        # so the value can be assigned later, when needed
        self.strip_num = strip_num
        self.transcript = None
        self.news = None
        self.title = None
        self.day = None
        self.month = None
        self.year = None
        self.link = None
        self.alt_text = None
        self.image_link = None

        # If the strip is No. 404, there is no API.
        # Thus, we continue the April Fools joke by
        # returning 'value not found' for each property
        if self.strip_num == 404:
            self.transcript = '404 - Transcript Not Found'
            self.news = '404 - News Not Found'
            self.title = '404 - Title Not Found'
            self.day = '404 - Day Not Found'
            self.month = '404 - Month Not Found'
            self.year = '404 - Year Not Found'
            self.link = '404 - Hyperlink Not Found'
            self.alt_text = '404 - Alt Text Not Found'
            self.image_link = '404 - Image Link Not Found'
        else:
            # This grabs the JSON document and parses it into a dictionary
            # The request is deferred to this branch because the JSON for
            # No. 404 doesn't exist and would fittingly return a 404 error
            self.url = requests.get(self.json_domain, headers={'user-agent': self.user_agent, 'content-type': 'application/json'})
            self.url.raise_for_status()
            self.strip = self.url.json()

            # Grabs the value for each from their dictionary entry
            self.transcript = self.strip['transcript']
            self.news = self.strip['news']
            self.title = self.strip['title']
            self.day = self.strip['day']
            self.month = self.strip['month']
            self.year = self.strip['year']
            self.link = self.strip['link']
            self.alt_text = self.strip['alt']

            # If the strip links to a large version, the image link for
            # the large version is scraped from the linked page instead
            # An empty link never contains 'large', so it falls through
            if 'large' in self.link:
                temp_soup = BeautifulSoup(requests.get(self.link, headers={'user-agent': self.user_agent, 'content-type': 'text/html'}).content, 'lxml')
                self.image_link = temp_soup.img.get('src')
            else:
                self.image_link = self.strip['img']

    @property
    def strip_num(self):
        '''
        The strip's number
        '''
        return self._strip_num

    @strip_num.setter
    def strip_num(self, strip_num):
        self._strip_num = strip_num

    @property
    def transcript(self):
        '''
        The strip's transcript
        '''
        return self._transcript

    @transcript.setter
    def transcript(self, transcript):
        self._transcript = transcript

    @property
    def news(self):
        '''
        The strip's news posting
        '''
        return self._news

    @news.setter
    def news(self, news):
        self._news = news

    @property
    def title(self):
        '''
        The strip's title
        '''
        return self._title

    @title.setter
    def title(self, title):
        self._title = title

    @property
    def day(self):
        '''
        Day the strip was published
        '''
        return self._day

    @day.setter
    def day(self, day):
        self._day = day

    @property
    def month(self):
        '''
        Month the strip was published
        '''
        return self._month

    @month.setter
    def month(self, month):
        self._month = month

    @property
    def year(self):
        '''
        Year the strip was published
        '''
        return self._year

    @year.setter
    def year(self, year):
        self._year = year

    @property
    def link(self):
        '''
        URL the strip hyperlinks to
        '''
        return self._link

    @link.setter
    def link(self, link):
        self._link = link

    @property
    def alt_text(self):
        '''
        The strip's alt text
        '''
        return self._alt_text

    @alt_text.setter
    def alt_text(self, alt_text):
        self._alt_text = alt_text

    @property
    def image_link(self):
        '''
        The strip's image link
        '''
        return self._image_link

    @image_link.setter
    def image_link(self, image_link):
        self._image_link = image_link

    def download_strip(self):
        '''
        Downloads the strip into the working directory
        '''

        # Creates the archive folder if it doesn't already exist
        os.makedirs('XKCD Archive', exist_ok=True)

        archive_directory = './XKCD Archive/'

        if self.strip_num == 404:
            # Strip 404 has no image, so an empty placeholder file is created
            with open('{}404 - Item Not Found'.format(archive_directory), mode='w'):
                pass
        else:
            try:
                # Gets the name of the strip, filtering out any characters
                # which are invalid for Windows filenaming conventions
                strip_title = ''.join(filter(lambda x: x not in '\\/:*?"<>|', self.title))

                image_link = self.image_link

                # The file name is the strip title plus the file extension
                # taken from the image link (also handles '.jpeg')
                file_name = strip_title + os.path.splitext(image_link)[1]

                # Checks if the strip is already downloaded
                # If it is, notifies the user
                # If it is not, downloads and renames it
                if os.path.exists(archive_directory + file_name) or os.path.exists(archive_directory + str(self.strip_num) + ' - ' + file_name):
                    print('-' * (shutil.get_terminal_size()[0] - 1))
                    print('{} ALREADY DOWNLOADED'.format(self.strip_num))
                else:
                    wget.download(image_link, archive_directory)
                    # The download is assumed to be saved under the last path
                    # component of the URL, whatever the URL prefix is
                    os.rename('{}{}'.format(archive_directory, os.path.basename(image_link)), '{}{} - {}'.format(archive_directory, self.strip_num, file_name))

            # Runs if the system throws a UnicodeEncodeError, which will only happen
            # when it tries to print a unicode character to the console
            # In that case, we substitute the usual line printed to the console
            # for a cheeky, console-safe stand-in
            except UnicodeEncodeError:
                print('\n{} - This title cannot be printed because unicode hates you.'.format(self.strip_num))

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
beautifulsoup4==4.4.1
requests==2.32.0
wget==2.2
lxml

--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup

setup(name='pykcd',
      version='1.0.1',
      description='Python interface/wrapper for the XKCD API',
      url='https://github.com/JacobLandau/pykcd',
      author='Jacob Landau',
      author_email='Jacob@PopcornFlicks.ca',
      license='MIT',
      packages=['pykcd'],
      install_requires=['BeautifulSoup4', 'requests', 'wget', 'lxml'],
      zip_safe=False)

--------------------------------------------------------------------------------
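
The "Number - Title" naming scheme described in the README can be sketched as a standalone helper. This is a minimal illustration, not part of pykcd: the function name `build_file_name` and the constant `WINDOWS_INVALID_CHARS` are hypothetical, and the title and second URL in the usage lines are made-up sample values. It assumes the file extension can be read from the path portion of the image URL.

```python
import os
from urllib.parse import urlparse

# Characters that Windows does not allow in file names (hypothetical constant).
WINDOWS_INVALID_CHARS = '\\/:*?"<>|'


def build_file_name(strip_num, title, image_link):
    """Return a 'Number - Title.ext' file name (hypothetical helper, not in pykcd)."""
    # Drop every character that is invalid in a Windows file name.
    safe_title = ''.join(ch for ch in title if ch not in WINDOWS_INVALID_CHARS)
    # Read the extension from the URL path only, so a query string is ignored
    # and extensions longer than three letters ('.jpeg') are kept whole.
    extension = os.path.splitext(urlparse(image_link).path)[1]
    return '{} - {}{}'.format(strip_num, safe_title, extension)


if __name__ == '__main__':
    # Sample values for illustration only.
    print(build_file_name(732, 'HDTV', 'http://imgs.xkcd.com/comics/hdtv.png'))
    print(build_file_name(1, 'A/B: C?', 'https://example.com/x/y.jpeg?size=2'))
```

Keeping the naming logic in a pure function like this makes it possible to check the sanitizing and extension handling without touching the network or the filesystem.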