├── .gitignore
├── LICENSE
├── README.md
├── pykcd
    └── __init__.py
├── requirements.txt
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Compiled
 2 | __pycache__/
 3 | *.py[cod]
 4 | 
 5 | # Distribution / packaging
 6 | .Python
 7 | env/
 8 | bin/
 9 | build/
10 | develop-eggs/
11 | dist/
12 | eggs/
13 | lib/
14 | lib64/
15 | parts/
16 | sdist/
17 | var/
18 | *.egg-info/
19 | .installed.cfg
20 | *.egg
21 | 
22 | # Installer logs
23 | pip-log.txt
24 | pip-delete-this-directory.txt


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2015 JacobLandau
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
23 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # pykcd [![PyPI version](https://badge.fury.io/py/pykcd.svg)](https://badge.fury.io/py/pykcd)
 2 | Python interface for the XKCD API
 3 | 
 4 | ## Usage
 5 | 
 6 | The strip object can be initialized like so:
 7 | 
 8 |     strip = pykcd.XKCDStrip(strip_number)
 9 | 
10 | The full range of properties can be listed with the help function (`help(pykcd.XKCDStrip)`). Here is a sample:
11 | 
12 | * Alt text
13 | 
14 |         In [1]: XKCDStrip(50).alt_text
15 |         Out[1]: "Of course, Penny Arcade has already mocked themselves for this. They don't care."
16 | 
17 | * Image link
18 | 
19 |         In [2]: XKCDStrip(732).image_link
20 |         Out[2]: 'http://imgs.xkcd.com/comics/hdtv.png'
21 | 
22 | * Downloading Strips
23 | 
24 |         In [3]: XKCDStrip(178).download_strip()
25 |         100% [...................................] 18611 / 18611
26 |         # Downloaded to ./XKCD Archive/ in the working directory
27 | 
28 | ## Under the Hood
29 | 
30 | Each XKCD strip, barring Strip #404 ([Funny funny](http://www.explainxkcd.com/wiki/index.php/404)), has a JSON document located at `www.xkcd.com/<number>/info.0.json`. This document contains the day, month and year of publication, the strip's transcript, the image hotlink, the alt text, and other details. Using the requests library, pykcd fetches this document and parses it into a standard Python dictionary, from which each piece of data can be accessed by its key.
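
A minimal sketch of the fetch-and-parse step described above. It swaps in the standard library's `urllib.request` and `json` modules for requests so it runs without third-party packages; `info_url`, `parse_strip`, `fetch_strip` and `FIELDS` are hypothetical names, not part of pykcd's API. The URL template, user agent and JSON keys are the ones pykcd's constructor uses.

```python
import json
import urllib.request

# URL template and user agent copied from pykcd's XKCDStrip constructor
API_TEMPLATE = 'http://www.xkcd.com/{}/info.0.json'
USER_AGENT = 'pykcd/1.0.0 (+https://github.com/JacobLandau/pykcd/)'

# Maps pykcd attribute names to the JSON keys they are read from
FIELDS = {
    'transcript': 'transcript',
    'news': 'news',
    'title': 'title',
    'day': 'day',
    'month': 'month',
    'year': 'year',
    'link': 'link',
    'alt_text': 'alt',
    'image_link': 'img',  # may be a scaled-down image for large strips
}


def info_url(strip_num):
    """Return the JSON API URL for a strip number."""
    return API_TEMPLATE.format(strip_num)


def parse_strip(json_text):
    """Parse an info.0.json payload into a dict keyed by pykcd's attribute names."""
    data = json.loads(json_text)
    return {attr: data[key] for attr, key in FIELDS.items()}


def fetch_strip(strip_num):
    """Fetch and parse one strip's metadata (requires network access)."""
    if strip_num == 404:
        # Strip #404 has no JSON document, so there is nothing to fetch
        raise ValueError('strip 404 has no info.0.json document')
    request = urllib.request.Request(info_url(strip_num),
                                     headers={'User-Agent': USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return parse_strip(response.read().decode('utf-8'))
```

For example, `fetch_strip(732)['alt_text']` would return strip #732's alt text (this needs network access). Keeping `parse_strip` separate from the network call lets the parsing be tested against a saved payload.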
31 | 
32 | Image links present a challenge for large strips such as [Strip #802: Online Communities 2](http://www.explainxkcd.com/wiki/index.php/802:_Online_Communities_2), whose `img` key points to a thumbnail rather than the full-resolution image. The solution lies in the `link` key, which points to a page containing only the full-resolution image. When the link contains "large", pykcd scrapes the image URL from that page using BeautifulSoup and returns it as the strip's image link.
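
The large-strip rule can be sketched as follows, assuming the first `<img>` tag carrying a `src` on the `link` page is the full-resolution image (pykcd makes essentially the same assumption with `temp_soup.img`). It uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup; `FirstImageParser`, `first_image_src`, `resolve_image_link` and the `fetch_html` callable are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class FirstImageParser(HTMLParser):
    """Record the src attribute of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == 'img' and self.src is None:
            self.src = dict(attrs).get('src')


def first_image_src(html):
    """Return the first <img> src in an HTML string, or None if there is none."""
    parser = FirstImageParser()
    parser.feed(html)
    parser.close()
    return parser.src


def resolve_image_link(strip, fetch_html):
    """Pick the full-resolution image link for a parsed strip dictionary.

    strip uses the raw JSON keys ('link', 'img'); fetch_html is a
    hypothetical callable that takes a URL and returns the page's HTML.
    """
    link = strip.get('link', '')
    if link and 'large' in link:
        # Same rule as pykcd: a link mentioning 'large' leads to the big image
        return first_image_src(fetch_html(link))
    return strip['img']
```

As in pykcd, the `src` value is returned unmodified, so a relative or protocol-relative URL on the scraped page would be passed through as-is.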
33 | 
34 | The wget package is used to download strips to the 'XKCD Archive' folder in the working directory, which is created if it does not already exist. Before downloading, pykcd checks whether the file is already present, and it names each file according to a "Number - Title" scheme. Any characters in the title that are invalid in Windows filenames are filtered out using a lambda function.
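
The naming and skip logic can be sketched as two small functions. `archive_file_name` and `target_path` are hypothetical helpers; the filtered character set matches pykcd's, but a generator expression replaces the `filter`/`lambda` pair, `os.path.splitext` replaces the last-four-characters slice used for the extension, and only the numbered file name is checked. The download itself is left out.

```python
import os

# Reserved characters pykcd filters out of titles for Windows compatibility
INVALID_CHARS = '\\/:*?"<>|'


def archive_file_name(strip_num, title, image_link):
    """Build a 'Number - Title.ext' name with Windows-unsafe characters removed."""
    safe_title = ''.join(ch for ch in title if ch not in INVALID_CHARS)
    extension = os.path.splitext(image_link)[1]  # e.g. '.png'
    return '{} - {}{}'.format(strip_num, safe_title, extension)


def target_path(archive_dir, strip_num, title, image_link):
    """Create archive_dir if needed; return (path, already_downloaded)."""
    os.makedirs(archive_dir, exist_ok=True)
    path = os.path.join(archive_dir, archive_file_name(strip_num, title, image_link))
    return path, os.path.exists(path)
```

A caller would skip the strip when `already_downloaded` is true and otherwise pass the path to something like `wget.download(image_link, out=path)`.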
35 | 
36 | ## Why?
37 | 
38 | Why not.
39 | 


--------------------------------------------------------------------------------
/pykcd/__init__.py:
--------------------------------------------------------------------------------
  1 | from bs4 import BeautifulSoup
  2 | import os, wget, requests, shutil
  3 | 
  4 | 
  5 | 
  6 | class XKCDStrip():
  7 |     '''
  8 |     Strip object for given XKCD strip
  9 |     '''
 10 | 
 11 | 
 12 |     def __init__(self, strip_num):
 13 |         '''
 14 |         Constructs the strip object
 15 | 
 16 |         :param strip_num: The number of the strip
 17 |         '''
 18 | 
 19 |         # The location of the API for the strip
 20 |         # Interpolates the strip number into the domain
 21 |         self.json_domain = 'http://www.xkcd.com/{}/info.0.json'.format(strip_num)
 22 |         self.user_agent = 'pykcd/1.0.0 (+https://github.com/JacobLandau/pykcd/)'
 23 | 
 24 |         # Creates the identifiers for each property,
 25 |         # so the value can be assigned later, when needed
 26 |         self.strip_num = strip_num
 27 |         self.transcript = None
 28 |         self.news = None
 29 |         self.title = None
 30 |         self.day = None
 31 |         self.month = None
 32 |         self.year = None
 33 |         self.link = None
 34 |         self.alt_text = None
 35 |         self.image_link = None
 36 | 
 37 |         # If the strip is No. 404, there is no API.
 38 |         # Thus, we continue the April Fools joke by
 39 |         # returning 'value not found' for each property
 40 |         if self.strip_num == 404:
 41 |             self.transcript = '404 - Transcript Not Found'
 42 |             self.news = '404 - News Not Found'
 43 |             self.title = '404 - Title Not Found'
 44 |             self.day = '404 - Day Not Found'
 45 |             self.month = '404 - Month Not Found'
 46 |             self.year = '404 - Year Not Found'
 47 |             self.link = '404 - Hyperlink Not Found'
 48 |             self.alt_text = '404 - Alt Text Not Found'
 49 |             self.image_link = '404 - Image Link Not Found'
 50 |         else:
 51 |             # This grabs the JSON document and parses it into a dictionary.
 52 |             # It is done only in this branch because the JSON for No. 404
 53 |             # doesn't exist; requesting it would fittingly return a 404 error.
 54 |             self.url = requests.get(self.json_domain, headers={'user-agent':self.user_agent, 'accept':'application/json'})
 55 |             self.url.raise_for_status()
 56 |             self.strip = self.url.json()
 57 | 
 58 |             # Grabs the value for each from their dictionary entry
 59 |             self.transcript = self.strip['transcript']
 60 |             self.news = self.strip['news']
 61 |             self.title = self.strip['title']
 62 |             self.day = self.strip['day']
 63 |             self.month = self.strip['month']
 64 |             self.year = self.strip['year']
 65 |             self.link = self.strip['link']
 66 |             self.alt_text = self.strip['alt']
 67 | 
 68 |             # If the strip has a hyperlink, it may lead to a larger version
 69 |             if self.link != '':
 70 |                 # If the strip links to a large version
 71 |                 # We get the image link for the large version instead
 72 |                 # Using a temporary web scraper
 73 |                 if 'large' in self.link:
 74 |                     temp_soup = BeautifulSoup(requests.get(self.link, headers={'user-agent':self.user_agent, 'accept':'text/html'}).content, 'lxml')
 75 |                     link = temp_soup.img
 76 |                     self.image_link = link.get('src')
 77 |                 else:
 78 |                     self.image_link = self.strip['img']
 79 |             else:
 80 |                 self.image_link = self.strip['img']
 81 | 
 82 |     @property
 83 |     def strip_num(self):
 84 |         '''
 85 |         The strip's number
 86 |         '''
 87 |         return self._strip_num
 88 | 
 89 |     @strip_num.setter
 90 |     def strip_num(self, strip_num):
 91 |         self._strip_num = strip_num
 92 | 
 93 |     @property
 94 |     def transcript(self):
 95 |         '''
 96 |         The strip's transcript
 97 |         '''
 98 |         return self._transcript
 99 | 
100 |     @transcript.setter
101 |     def transcript(self, transcript):
102 |         self._transcript = transcript
103 | 
104 |     @property
105 |     def news(self):
106 |         '''
107 |         The strip's news posting
108 |         '''
109 |         return self._news
110 | 
111 |     @news.setter
112 |     def news(self, news):
113 |         self._news = news
114 | 
115 |     @property
116 |     def title(self):
117 |         '''
118 |         The strip's title
119 |         '''
120 |         return self._title
121 | 
122 |     @title.setter
123 |     def title(self, title):
124 |         self._title = title
125 | 
126 |     @property
127 |     def day(self):
128 |         '''
129 |         Day strip was published
130 |         '''
131 |         return self._day
132 | 
133 |     @day.setter
134 |     def day(self, day):
135 |         self._day = day
136 | 
137 |     @property
138 |     def month(self):
139 |         '''
140 |         Month strip was published
141 |         '''
142 |         return self._month
143 | 
144 |     @month.setter
145 |     def month(self, month):
146 |         self._month = month
147 | 
148 |     @property
149 |     def year(self):
150 |         '''
151 |         Year strip was published
152 |         '''
153 |         return self._year
154 | 
155 |     @year.setter
156 |     def year(self, year):
157 |         self._year = year
158 | 
159 |     @property
160 |     def link(self):
161 |         '''
162 |         URL the strip is hyperlinked to
163 |         '''
164 |         return self._link
165 | 
166 |     @link.setter
167 |     def link(self, link):
168 |         self._link = link
169 | 
170 |     @property
171 |     def alt_text(self):
172 |         '''
173 |         The strip's alt text
174 |         '''
175 |         return self._alt_text
176 | 
177 |     @alt_text.setter
178 |     def alt_text(self, alt_text):
179 |         self._alt_text = alt_text
180 | 
181 |     @property
182 |     def image_link(self):
183 |         '''
184 |         The strip's image link
185 |         '''
186 |         return self._image_link
187 | 
188 |     @image_link.setter
189 |     def image_link(self, image_link):
190 |         self._image_link = image_link
191 | 
192 |     def download_strip(self):
193 |         '''
194 |         Downloads the strip into the 'XKCD Archive' folder in the working directory
195 |         '''
196 | 
197 |         # Creates the archive folder if it doesn't already exist
198 |         if os.path.exists('XKCD Archive'):
199 |             pass
200 |         else:
201 |             os.mkdir('XKCD Archive')
202 | 
203 |         archive_directory = './XKCD Archive/'
204 | 
205 |         if self.strip_num == 404:
206 |             file = open('{}404 - Item Not Found'.format(archive_directory), mode='w')
207 |             file.close()
208 |         else:
209 |             try:
210 |                 # Gets the name of the strip, filtering out any characters
211 |                 # which are invalid for Windows filenaming conventions
212 |                 strip_title = ''.join(filter(lambda x: x not in '\\/:*?"<>|', self.title))
213 | 
214 |                 image_link = self.image_link
215 | 
216 |                 # The file name will be the strip title plus the file extension as grabbed by the image link
217 |                 file_name = strip_title + os.path.splitext(image_link)[1]
218 | 
219 |                 # Checks if the strip is already downloaded:
220 |                 # if it is, notifies the user and skips the download;
221 |                 # if it is not, downloads and renames it
222 |                 if os.path.exists(archive_directory + file_name) or os.path.exists(archive_directory + str(self.strip_num) + ' - ' + file_name):
223 |                     print('-' * (shutil.get_terminal_size()[0] - 1))
224 |                     print('{} ALREADY DOWNLOADED'.format(self.strip_num))
225 |                 else:
226 |                     downloaded_path = wget.download(image_link, archive_directory)
227 |                     os.rename(downloaded_path, '{}{} - {}'.format(archive_directory, self.strip_num, file_name))
228 | 
229 |                 # Runs if the system throws a UnicodeEncodeError, which will only happen
230 |                 # when it tries to print a unicode character to the console
231 |                 # In that case, we substitute the usual line printed to the console
232 |                 # for a cheeky, console-safe stand-in
233 |             except UnicodeEncodeError:
234 |                 print('\n{} - This title cannot be printed because unicode hates you.'.format(self.strip_num))
235 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4==4.4.1
2 | requests==2.32.0
3 | wget==2.2
4 | lxml
5 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup
 2 | 
 3 | setup(name='pykcd',
 4 |     version = '1.0.1',
 5 |     description = 'Python interface/wrapper for the XKCD API',
 6 |     url = 'https://github.com/JacobLandau/pykcd',
 7 |     author = 'Jacob Landau',
 8 |     author_email = 'Jacob@PopcornFlicks.ca',
 9 |     license = 'MIT',
10 |     packages = ['pykcd'],
11 |     install_requires = ['BeautifulSoup4', 'requests', 'wget', 'lxml'],
12 |     zip_safe = False)
13 | 


--------------------------------------------------------------------------------