91 |
92 | See :class:`readability.ParserClient` docs for a complete list of
93 | available functionality.
94 |
95 |
96 | .. toctree::
97 |    :hidden:
98 |
99 |    Authentication
109 | Requests to the Parser API are not signed like an OAuth
110 | request. The Parser token is simply passed as a POST or GET
111 | parameter, depending on the request type. Be careful not to
112 | reveal this token: requests to the Parser API should not be
113 | made directly from the client device, but proxied through your
114 | own server so that the API token stays secret.
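A minimal sketch of the server-side half of that proxy, assuming Python 3's standard library. The server reads the token from the `READABILITY_PARSER_TOKEN` environment variable (the name the test suite uses) and appends it as a GET parameter, so the client device only ever sends the article URL to your own server. The `https://www.readability.com` host and the helper name `build_parser_request_url` are assumptions for illustration; only the `/api/content/v1/parser` path appears in these docs.

```python
import os
from urllib.parse import urlencode

# Assumption: the host is illustrative; the docs only give the path.
PARSER_ENDPOINT = "https://www.readability.com/api/content/v1/parser"


def build_parser_request_url(article_url, max_pages=None, endpoint=PARSER_ENDPOINT):
    """Build a Parser GET URL on the server, keeping the token off the client.

    Hypothetical helper: the token comes from READABILITY_PARSER_TOKEN,
    so it never has to ship with client-side code.
    """
    token = os.environ["READABILITY_PARSER_TOKEN"]  # raises KeyError if unset
    params = {"url": article_url, "token": token}
    if max_pages is not None:
        params["max_pages"] = max_pages
    # urlencode percent-encodes the article URL (":" and "/" included)
    return "{0}?{1}".format(endpoint, urlencode(params))
```

Your server would fetch this URL and relay the JSON body to the client, so the token never leaves the server.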
115 |
121 | Here's how to pull an article's content from the Readability Parser API:
122 |
151 | All responses are returned as JSON by default. You may also pass "?format=xml" in the URL to receive XML data instead.
152 | Parser resource (/parser) query parameters: token (string, required), url, id, max_pages.
161 | Retrieve the content status of an article. This is useful if you want to avoid POSTing a large HTML document: make a HEAD request on the resource and check the status of the article in the X-Article-Status header. If we've never seen the article before, we return a 404, which also means you should POST.
162 | HEAD /parser query parameters: token (string, required), url, id.
Response headers:
    X-Article-Id: The ID of the article within Readability.
    X-Article-Status: The status of the content in Readability. One of:
Confidence resource (/confidence) query parameters: url (string, required), callback.
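The HEAD-then-POST decision can be sketched as follows, assuming the `head` argument is any callable shaped like `requests.head(url, params=...)`. Injecting it keeps the logic testable without a network connection. `check_article_status` is a hypothetical name, and because the possible `X-Article-Status` values are not listed here, the sketch returns the header verbatim rather than interpreting it.

```python
def check_article_status(head, endpoint, token, article_url):
    """Return (article_id, status) for a known article, or None if unseen.

    Sketch only. None corresponds to the documented 404 ("we've never seen
    the article"), which means the caller should POST the content instead.
    """
    response = head(endpoint, params={"token": token, "url": article_url})
    if response.status_code == 404:
        return None
    if response.status_code >= 400:
        raise RuntimeError("HEAD request failed with HTTP %d" % response.status_code)
    # With requests, response.headers is a case-insensitive mapping
    return (response.headers.get("X-Article-Id"),
            response.headers.get("X-Article-Status"))
```

With `requests` installed, passing `requests.head` as the first argument would issue the real request. The client in this repository exposes the same check as `ParserClient.get_article_status(url=...)`, as exercised in `test_parser.py`.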
231 | Authentication failed or was not provided. Verify that you have sent valid credentials via HTTP Basic.
232 | A 'WWW-Authenticate' challenge header will be sent with this type of error response.

Readability v1 Parser API
106 | Authentication
108 | Quick Start
120 | Request
124 | GET /api/content/v1/parser?url=http://blog.readability.com/2011/02/step-up-be-heard-readability-ideas/&token=1b830931777ac7c2ac954e9f0d67df437175e66e
125 | Response
126 |
127 | HTTP/1.0 200 OK
128 | {
129 | "content": "<div class=\"article-text\">\n<p>I'm idling outside Diamante's, [snip] ...</p></div>",
130 | "domain": "www.gq.com",
131 | "author": "Rafi Kohan",
132 | "url": "http://www.gq.com/sports/profiles/201202/david-diamante-interview-cigar-lounge-brooklyn-new-jersey-nets?currentPage=all",
133 | "short_url": "http://rdd.me/g3jcb1sr",
134 | "title": "Blowing Smoke with Boxing's Big Voice",
135 | "excerpt": "I'm idling outside Diamante's, a cigar lounge in Fort Greene, waiting for David Diamante, and soon I smell him coming. It's late January but warm. A motorcycle growls down the Brooklyn side street,…",
136 | "direction": "ltr",
137 | "word_count": 2892,
138 | "total_pages": 1,
139 | "date_published": null,
140 | "dek": "Announcer <strong>David Diamante</strong>, the new voice of the New Jersey (soon Brooklyn) Nets, has been calling boxing matches for years. On the side, he owns a cigar lounge in the heart of Brooklyn. We talk with Diamante about his new gig and the fine art of cigars",
141 | "lead_image_url": "http://www.gq.com/images/entertainment/2012/02/david-diamante/diamante-628.jpg",
142 | "next_page_id": null,
143 | "rendered_pages": 1
144 | }
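The response body is ordinary JSON. The following minimal sketch uses only Python's `json` module to pull out a few of the fields shown above. `summarize_article` is a hypothetical helper, not part of the `readability` package. With the client library, calling `response.json()` on the object returned by `ParserClient.get_article` yields the same dictionary, as the tests do.

```python
import json


def summarize_article(body):
    """Pull a few fields out of a Parser API JSON response body."""
    article = json.loads(body)
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "word_count": article.get("word_count"),
        # Assumption: next_page_id is null when there is no further page.
        "has_next_page": article.get("next_page_id") is not None,
    }


sample = json.dumps({
    "title": "Blowing Smoke with Boxing's Big Voice",
    "author": "Rafi Kohan",
    "word_count": 2892,
    "next_page_id": None,
})
print(summarize_article(sample))
```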
145 |
146 | Data Formats
150 | Resources, Representations & Errors
Resources

/
    GET: Retrieve the base API URI (information about subresources).

/parser?token&url&id&max_pages
    GET: Parse an article.
        Request query parameters:
            token: string (required)
            url: The URL of an article to return the content for.
            id: The ID of an article to return the content for.
            max_pages: The maximum number of pages to parse and combine. Default is 25.
    HEAD: Retrieve the content status of an article.
        Request query parameters:
            token: string (required)
            url: The URL of an article to check.
            id: The ID of an article to check.
        Response headers:
            X-Article-Id: The ID of the article within Readability.
            X-Article-Status: The status of the content in Readability.

/confidence?url&callback
    GET: Detect the confidence with which Readability could parse a given URL. Does not require a token.
        Request query parameters:
            url: string (required). The URL of an article to return the confidence for.
            callback: The JSONP callback function name.

Representations

Example root representation. (application/json)
180 |
181 | {
182 | "resources": {
183 | "parser": {
184 | "description": "The Content Parser Resource",
185 | "href": "/api/content/v1/parser"
186 | }
187 | }
188 | }
189 |
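Because the root representation lists each subresource with an `href`, a client can look paths up instead of hard-coding them. The following minimal sketch uses only the standard `json` module. `resource_hrefs` is a hypothetical name.

```python
import json


def resource_hrefs(root_body):
    """Map each subresource name in the root representation to its href."""
    resources = json.loads(root_body)["resources"]
    return {name: info["href"] for name, info in resources.items()}


ROOT_BODY = """
{
  "resources": {
    "parser": {
      "description": "The Content Parser Resource",
      "href": "/api/content/v1/parser"
    }
  }
}
"""
print(resource_hrefs(ROOT_BODY))
```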
190 | Example article representation. (application/json)
191 |
192 | {
193 | "content": "<div class=\"article-text\">\n<p>I'm idling outside Diamante's, [snip] ...</p></div>",
194 | "domain": "www.gq.com",
195 | "author": "Rafi Kohan",
196 | "url": "http://www.gq.com/sports/profiles/201202/david-diamante-interview-cigar-lounge-brooklyn-new-jersey-nets?currentPage=all",
197 | "short_url": "http://rdd.me/g3jcb1sr",
198 | "title": "Blowing Smoke with Boxing's Big Voice",
199 | "excerpt": "I'm idling outside Diamante's, a cigar lounge in Fort Greene, waiting for David Diamante, and soon I smell him coming. It's late January but warm. A motorcycle growls down the Brooklyn side street,…",
200 | "direction": "ltr",
201 | "word_count": 2892,
202 | "total_pages": 1,
203 | "date_published": null,
204 | "dek": "Announcer <strong>David Diamante</strong>, the new voice of the New Jersey (soon Brooklyn) Nets, has been calling boxing matches for years. On the side, he owns a cigar lounge in the heart of Brooklyn. We talk with Diamante about his new gig and the fine art of cigars",
205 | "lead_image_url": "http://www.gq.com/images/entertainment/2012/02/david-diamante/diamante-628.jpg",
206 | "next_page_id": null,
207 | "rendered_pages": 1
208 | }
209 |
210 |
211 | Example confidence representation. (application/json)
212 |
213 | {
214 | "url": "http://www.gq.com/article/12",
215 | "confidence": 0.7
216 | }
217 |
218 |
219 | Example confidence representation as jsonp. (application/json)
220 |
221 | callback({
222 | "url": "http://www.gq.com/article/12",
223 | "confidence": 0.7
224 | });
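A consumer that receives the JSONP form has to strip the callback wrapper before parsing. The following minimal sketch assumes the payload is a single `name(...)` call with an optional trailing semicolon. `unwrap_jsonp` is a hypothetical helper. Python's `json` module, like any strict JSON parser, rejects a number written without a leading zero (such as `.7`), so the sample uses `0.7`.

```python
import json
import re

# One callback call: name( ...payload... ) with an optional trailing ";"
_JSONP_RE = re.compile(r"^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text):
    """Return (callback_name, parsed_payload) for a JSONP response body."""
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError("not a JSONP payload")
    return match.group(1), json.loads(match.group(2))


body = 'callback({\n    "url": "http://www.gq.com/article/12",\n    "confidence": 0.7\n});'
name, data = unwrap_jsonp(body)
print(name, data["confidence"])
```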
225 |
226 |
227 | Errors
400 Bad Request (application/json)
228 | The server could not understand your request. Verify that request parameters (and content, if any) are valid.
229 | 401 Authorization Required (application/json)
230 | 500 Internal Server Error (application/json)
235 | An unknown error has occurred.
236 | 404 Not Found (application/json)
237 | The resource that you requested does not exist.
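The client library returns the raw response and leaves status handling to the caller (the tests check `response.status_code` directly). The following minimal sketch turns the documented error codes into exceptions. `ParserAPIError` and `raise_for_parser_error` are hypothetical names. For the HEAD status check, a 404 is an expected answer ("POST the content") rather than a failure, so a caller would skip this helper there.

```python
class ParserAPIError(Exception):
    """Hypothetical exception type; not part of the readability package."""

    def __init__(self, status_code, message):
        super().__init__("HTTP {0}: {1}".format(status_code, message))
        self.status_code = status_code


# Short descriptions paraphrased from the documented errors.
ERROR_DESCRIPTIONS = {
    400: "Bad Request: verify that request parameters are valid.",
    401: "Authorization Required: authentication failed or was not provided.",
    404: "Not Found: the requested resource does not exist.",
    500: "Internal Server Error: an unknown error has occurred.",
}


def raise_for_parser_error(status_code):
    """Raise ParserAPIError for a documented error status; do nothing otherwise."""
    message = ERROR_DESCRIPTIONS.get(status_code)
    if message is not None:
        raise ParserAPIError(status_code, message)
```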
238 |
239 |
240 |
--------------------------------------------------------------------------------
/readability/tests/test_auth.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | # Bad hack. I only installed unittest2 locally in my virtualenv
4 | # for Python 2.6.7
5 | try:
6 | import unittest2 as unittest
7 | except ImportError:
8 | import unittest
9 |
10 |
11 | from readability import xauth
12 |
13 |
14 | class XAuthTestCase(unittest.TestCase):
15 | """
16 | Test XAuth functionality.
17 | """
18 | def test_bad_base_url(self):
19 | """
20 | If given a bad base url template, the request to the
21 | ACCESS_TOKEN_URL should fail and an exception be raised.
22 | """
23 | token = None
24 | with self.assertRaises(Exception):
25 | token = xauth(base_url_template='https://arc90.com/{0}')
26 | self.assertEqual(token, None)
27 |
28 | def test_bad_consumer_key(self):
29 | """
30 | If given a bad consumer key, the `xauth` method should raise
31 | an exception.
32 | """
33 | token = None
34 | with self.assertRaises(Exception):
35 | token = xauth(consumer_key='bad consumer key')
36 | self.assertEqual(token, None)
37 |
38 | def test_bad_consumer_secret(self):
39 | """
40 |         If given a bad consumer secret, the `xauth` method should raise
41 | an exception.
42 | """
43 | token = None
44 | with self.assertRaises(Exception):
45 | token = xauth(consumer_secret='bad consumer secret')
46 | self.assertEqual(token, None)
47 |
48 | def test_bad_username(self):
49 | """
50 | If given a bad username, an exception should be raised.
51 | """
52 | token = None
53 | with self.assertRaises(Exception):
54 | token = xauth(username='bad username')
55 | self.assertEqual(token, None)
56 |
57 | def test_bad_password(self):
58 | """
59 | If given a bad password, an exception should be raised.
60 | """
61 | token = None
62 | with self.assertRaises(Exception):
63 | token = xauth(password='badpassword')
64 | self.assertEqual(token, None)
65 |
66 | def test_successful_auth(self):
67 | """
68 | Test getting a token with proper creds
69 | """
70 | # Credentials should be set as environment variables when running tests
71 | token = xauth()
72 | self.assertEqual(len(token), 2)
73 |
74 |
75 | if __name__ == '__main__':
76 | unittest.main()
77 |
--------------------------------------------------------------------------------
/readability/tests/test_clients.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import os
3 | try:
4 | import unittest2 as unittest
5 | except ImportError:
6 | import unittest
7 | try:
8 | from unittest.mock import patch
9 | except ImportError as e:
10 | from mock import patch
11 |
12 | from readability import xauth, ReaderClient, ParserClient
13 |
14 |
15 | class ClientInitTest(unittest.TestCase):
16 | """
17 | Test that passing tokens to the constructor bypasses looking in ENV.
18 |
19 | """
20 | def setUp(self):
21 | self.env_cache = {}
22 | for var in ['READABILITY_PARSER_TOKEN', 'READABILITY_CONSUMER_KEY', 'READABILITY_CONSUMER_SECRET']:
23 | if var in os.environ:
24 | self.env_cache[var] = os.environ[var]
25 | del os.environ[var]
26 |
27 | def tearDown(self):
28 | for key, val in self.env_cache.items():
29 | os.environ[key] = val
30 |
31 | def test_reader(self):
32 | """
33 | Test that passing tokens to the constructor bypasses looking in ENV.
34 |
35 | """
36 | with patch('readability.core.required_from_env') as mock:
37 | ReaderClient(
38 | consumer_key='consumer_key',
39 | consumer_secret='consumer_secret',
40 |                 # Fake the xauth tokens, since this test won't make any real calls
41 | token_key='token_key',
42 | token_secret='token_secret')
43 | self.assertEqual(mock.call_count, 0)
44 |
45 | def test_parser(self):
46 | with patch('readability.core.required_from_env') as mock:
47 | ParserClient(token='token')
48 | self.assertEqual(mock.call_count, 0)
49 |
50 | class ReaderClientNoBookmarkTest(unittest.TestCase):
51 | """
52 | Tests for the Readability ReaderClient class that need no bookmarks.
53 | """
54 | def setUp(self):
55 | """
56 | Need to get a token for each test.
57 |
58 | """
59 | token_key, token_secret = xauth()
60 |         self.reader_client = ReaderClient(token_key=token_key, token_secret=token_secret)
61 |
62 | def test_get_article(self):
63 | """
64 | Test the `get_article` method.
65 | """
66 | article_id = 'orrspy2p'
67 | response = self.reader_client.get_article(article_id)
68 | self.assertEqual(response.status_code, 200)
69 |
70 | # spot check some keys
71 | some_expected_keys = set(['direction', 'title', 'url', 'excerpt',
72 | 'content', 'processed', 'short_url', 'date_published'])
73 | keys_set = set(response.json().keys())
74 | self.assertTrue(some_expected_keys.issubset(keys_set))
75 |
76 | def test_get_article_404(self):
77 | """
78 | Try getting an article that doesn't exist.
79 | """
80 | article_id = 'antidisestablishmentarianism'
81 | response = self.reader_client.get_article(article_id)
82 | self.assertEqual(response.status_code, 404)
83 |
84 | def test_get_user(self):
85 | """
86 | Test getting user data
87 | """
88 | user_response = self.reader_client.get_user()
89 | self.assertEqual(user_response.status_code, 200)
90 | some_expected_keys = set(['username', 'first_name', 'last_name',
91 | 'date_joined', 'email_into_address'])
92 | received_keys = set(user_response.json().keys())
93 | self.assertTrue(some_expected_keys.issubset(received_keys))
94 |
95 | def test_get_empty_tags(self):
96 | """
97 | Test getting an empty set of tags. Since there are no bookmarks
98 | present in this test, there should be no tags.
99 | """
100 | tag_response = self.reader_client.get_tags()
101 | self.assertEqual(tag_response.status_code, 200)
102 | response_json = tag_response.json()
103 | self.assertTrue('tags' in response_json)
104 | self.assertEqual(len(response_json['tags']), 0)
105 |
106 |
107 | class ReaderClientSingleBookmarkTest(unittest.TestCase):
108 | """
109 | Tests that only need one bookmark
110 | """
111 | def setUp(self):
112 | """
113 | Get a client and add a bookmark
114 | """
115 | token_key, token_secret = xauth()
116 | self.reader_client = ReaderClient(token_key=token_key, token_secret=token_secret)
117 | self.url = 'http://www.theatlantic.com/technology/archive/2013/01/the-never-before-told-story-of-the-worlds-first-computer-art-its-a-sexy-dame/267439/'
118 | add_response = self.reader_client.add_bookmark(self.url)
119 | self.assertTrue(add_response.status_code in [201, 202])
120 |
121 | def tearDown(self):
122 | """
123 | Remove all added bookmarks.
124 | """
125 | for bm in self.reader_client.get_bookmarks().json()['bookmarks']:
126 | del_response = self.reader_client.delete_bookmark(bm['id'])
127 | self.assertEqual(del_response.status_code, 204)
128 |
129 | def test_get_bookmark(self):
130 | """
131 | Test getting one bookmark by id
132 | """
133 | bookmark_id = self._get_bookmark_data()['id']
134 |
135 | bm_response = self.reader_client.get_bookmark(bookmark_id)
136 | self.assertEqual(bm_response.status_code, 200)
137 | some_expected_keys = set(['article', 'user_id', 'favorite', 'id'])
138 | received_keys = set(bm_response.json().keys())
139 | self.assertTrue(some_expected_keys.issubset(received_keys))
140 |
141 | def test_bookmark_tag_functionality(self):
142 | """
143 | Test adding, fetching and deleting tags on a bookmark.
144 | """
145 | bookmark_id = self._get_bookmark_data()['id']
146 |
147 | # test getting empty tags
148 | tag_response = self.reader_client.get_bookmark_tags(bookmark_id)
149 | self.assertEqual(tag_response.status_code, 200)
150 | self.assertEqual(len(tag_response.json()['tags']), 0)
151 |
152 | # test adding tags
153 | tags = ['tag', 'another tag']
154 | tag_string = ', '.join(tags)
155 | tag_add_response = \
156 | self.reader_client.add_tags_to_bookmark(bookmark_id, tag_string)
157 | self.assertEqual(tag_add_response.status_code, 202)
158 |
159 | # re-fetch tags. should have 2
160 | retag_response = self.reader_client.get_bookmark_tags(bookmark_id)
161 | self.assertEqual(retag_response.status_code, 200)
162 | self.assertEqual(len(retag_response.json()['tags']), 2)
163 | for tag in retag_response.json()['tags']:
164 | self.assertTrue(tag['text'] in tags)
165 |
166 | # test getting tags for user
167 | user_tag_resp = self.reader_client.get_tags()
168 | self.assertEqual(user_tag_resp.status_code, 200)
169 | self.assertEqual(len(user_tag_resp.json()['tags']), 2)
170 | for tag in user_tag_resp.json()['tags']:
171 | self.assertTrue(tag['text'] in tags)
172 |
173 | # test getting a single tag while we're here
174 | single_tag_resp = self.reader_client.get_tag(tag['id'])
175 | self.assertEqual(single_tag_resp.status_code, 200)
176 | self.assertTrue('applied_count' in single_tag_resp.json())
177 | self.assertTrue('id' in single_tag_resp.json())
178 | self.assertTrue('text' in single_tag_resp.json())
179 |
180 | # delete tags
181 | for tag in retag_response.json()['tags']:
182 | del_response = self.reader_client.delete_tag_from_bookmark(
183 | bookmark_id, tag['id'])
184 | self.assertEqual(del_response.status_code, 204)
185 |
186 | # check that tags are gone
187 | tag_response = self.reader_client.get_bookmark_tags(bookmark_id)
188 | self.assertEqual(tag_response.status_code, 200)
189 | self.assertEqual(len(tag_response.json()['tags']), 0)
190 |
191 | def _get_bookmark_data(self):
192 | """
193 | Convenience method to get a single bookmark's data.
194 | """
195 | bm_response = self.reader_client.get_bookmarks()
196 | self.assertEqual(bm_response.status_code, 200)
197 | bm_response_json = bm_response.json()
198 | self.assertTrue(len(bm_response_json['bookmarks']) > 0)
199 | return bm_response_json['bookmarks'][0]
200 |
201 |
202 | class ReaderClientMultipleBookmarkTest(unittest.TestCase):
203 | """
204 | Tests for bookmark functionality
205 | """
206 | def setUp(self):
207 | """
208 | Add a few bookmarks.
209 | """
210 | token_key, token_secret = xauth()
211 | self.reader_client = ReaderClient(token_key=token_key, token_secret=token_secret)
212 |
213 | self.urls = [
214 | 'http://www.theatlantic.com/technology/archive/2013/01/the-never-before-told-story-of-the-worlds-first-computer-art-its-a-sexy-dame/267439/',
215 | 'http://www.theatlantic.com/business/archive/2013/01/why-smart-poor-students-dont-apply-to-selective-colleges-and-how-to-fix-it/272490/',
216 | ]
217 |
218 | self.favorite_urls = [
219 | 'http://www.theatlantic.com/sexes/archive/2013/01/the-lonely-existence-of-mel-feit-mens-rights-advocate/267413/',
220 | 'http://www.theatlantic.com/technology/archive/2013/01/women-in-combat-an-idea-whose-time-has-come-aided-by-technology/272483/'
221 | ]
222 |
223 | self.archive_urls = [
224 | 'http://www.theatlantic.com/business/archive/2013/01/what-economics-can-and-cant-tell-us-about-the-legacy-of-legal-abortion/267459/',
225 | 'http://www.theatlantic.com/business/archive/2013/01/5-ways-to-understand-just-how-absurd-spains-26-unemployment-rate-is/272502/'
226 | ]
227 |
228 | self.all_urls = self.urls + self.favorite_urls + self.archive_urls
229 |
230 | for url in self.urls:
231 | response = self.reader_client.add_bookmark(url)
232 | self.assertTrue(response.status_code in [201, 202])
233 |
234 | for url in self.favorite_urls:
235 | response = self.reader_client.add_bookmark(url, favorite=True)
236 | self.assertTrue(response.status_code in [201, 202])
237 |
238 | for url in self.archive_urls:
239 | response = self.reader_client.add_bookmark(url, archive=True)
240 | self.assertTrue(response.status_code in [201, 202])
241 |
242 | def test_get_bookmarks(self):
243 | """
244 | Test getting all bookmarks
245 | """
246 | response = self.reader_client.get_bookmarks()
247 | self.assertEqual(response.status_code, 200)
248 | self.assertEqual(
249 | len(response.json()['bookmarks']), len(self.all_urls))
250 |
251 | # test favorite bookmarks
252 | response = self.reader_client.get_bookmarks(favorite=True)
253 | self.assertEqual(response.status_code, 200)
254 | self.assertEqual(
255 | len(response.json()['bookmarks']), len(self.favorite_urls))
256 | for bm in response.json()['bookmarks']:
257 | self.assertTrue(bm['article']['url'] in self.favorite_urls)
258 |
259 | # test archive bookmarks
260 | response = self.reader_client.get_bookmarks(archive=True)
261 | self.assertEqual(response.status_code, 200)
262 | self.assertEqual(
263 | len(response.json()['bookmarks']), len(self.archive_urls))
264 | for bm in response.json()['bookmarks']:
265 | self.assertTrue(bm['article']['url'] in self.archive_urls)
266 |
267 | def tearDown(self):
268 | """
269 | Remove all added bookmarks.
270 | """
271 | for bm in self.reader_client.get_bookmarks().json()['bookmarks']:
272 | del_response = self.reader_client.delete_bookmark(bm['id'])
273 | self.assertEqual(del_response.status_code, 204)
274 |
275 |
276 | if __name__ == '__main__':
277 | unittest.main(warnings='ignore')
278 |
--------------------------------------------------------------------------------
/readability/tests/test_parser.py:
--------------------------------------------------------------------------------
1 | try:
2 | import unittest2 as unittest
3 | except ImportError:
4 | import unittest
5 |
6 | from readability import ParserClient
7 | from readability.clients import DEFAULT_PARSER_URL_TEMPLATE
8 | from readability.core import required_from_env
9 | from readability.tests import load_test_content
10 |
11 | class ParserClientTest(unittest.TestCase):
12 | """
13 | Test case for the Parser Client
14 | """
15 | def setUp(self):
16 | self.parser_token = required_from_env('READABILITY_PARSER_TOKEN')
17 | self.parser_client = ParserClient(token=self.parser_token)
18 | self.test_url = 'https://en.wikipedia.org/wiki/Mark_Twain'
19 |
20 | def test_generate_url(self):
21 | """
22 | Test the clients ability to generate urls to endpoints.
23 | """
24 | # Test root resource
25 | expected_url = DEFAULT_PARSER_URL_TEMPLATE.format('')
26 | expected_url = '{}?token={}'.format(expected_url, self.parser_token)
27 | generated_url = self.parser_client._generate_url('')
28 | self.assertEqual(generated_url, expected_url)
29 |
30 | # Test parser resource
31 | expected_url = '{base_url}?token={token}&url=http%3A%2F%2Fwww.google.biz%2Fblog.html'.format(
32 | base_url=DEFAULT_PARSER_URL_TEMPLATE.format('parser'),
33 | token=self.parser_token)
34 | params = {'url': 'http://www.google.biz/blog.html'}
35 | generated_url = self.parser_client._generate_url(
36 | 'parser', query_params=params)
37 |
38 | self.assertEqual(generated_url, expected_url)
39 |
40 | def test_get_root(self):
41 | """
42 | Test the client's ability to hit the root endpoint.
43 | """
44 | response = self.parser_client.get_root()
45 |
46 | expected_keys = set(['resources', ])
47 | self.assertEqual(set(response.json().keys()), expected_keys)
48 |
49 | def test_get_confidence(self):
50 | """
51 | Test the client's ability to hit the confidence endpoint.
52 | """
53 | # hit without an article_id or url. Should get an error.
54 | response = self.parser_client.get_confidence()
55 | self.assertEqual(response.status_code, 400)
56 |
57 | expected_keys = set(['url', 'confidence'])
58 |
59 | response = self.parser_client.get_confidence(url=self.test_url)
60 | self.assertEqual(response.status_code, 200)
61 | self.assertEqual(set(response.json().keys()), expected_keys)
62 |         # confidence for Wikipedia should be at least .5
63 | self.assertTrue(response.json()['confidence'] >= .5)
64 |
65 | def test_get_article_status(self):
66 | """
67 | Test the client's ability to hit the parser endpoint with a HEAD
68 | """
69 | # hit without an article_id or url. Should get an error.
70 | response = self.parser_client.get_confidence()
71 | self.assertEqual(response.status_code, 400)
72 |
73 | response = self.parser_client.get_article_status(url=self.test_url)
74 | self.assertEqual(response.status_code, 200)
75 | self.assertTrue(response.headers.get('x-article-status') is not None)
76 | self.assertTrue(response.headers.get('x-article-id') is not None)
77 |
78 | def test_get_article(self):
79 | """
80 | Test the client's ability to hit the parser endpoint with a GET
81 | """
82 | # test with incorrect params
83 | response = self.parser_client.get_article()
84 | self.assertEqual(response.status_code, 400)
85 |
86 | response = self.parser_client.get_article(url=self.test_url)
87 | self.assertEqual(response.status_code, 200)
88 |
89 | some_expected_keys = set(['content', 'domain', 'author', 'word_count',
90 | 'title', 'total_pages'])
91 | self.assertTrue(
92 | some_expected_keys.issubset(set(response.json().keys())))
93 |
94 | def test_post_article_content(self):
95 | """
96 | Test the client's ability to hit the parser endpoint with a POST
97 | request.
98 | """
99 | content = load_test_content('content/test_post_content.html')
100 | url = 'http://thisisaurlthatdoesntmatterbutmustbepassedanyway.com/article.html'
101 | response = self.parser_client.post_article_content(content, url)
102 | self.assertEqual(response.status_code, 200)
103 |
104 |
105 | if __name__ == '__main__':
106 | unittest.main()
107 |
--------------------------------------------------------------------------------
/readability/tests/test_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # Bad hack. I only installed unittest2 locally in my virtualenv
3 | # for Python 2.6.7
4 | try:
5 | import unittest2 as unittest
6 | except ImportError:
7 | import unittest
8 |
10 | from datetime import datetime
11 |
12 | from readability.utils import \
13 | cast_datetime_filter, cast_integer_filter, filter_args_to_dict
14 |
15 |
16 | class CastDatetimeFilterTestCase(unittest.TestCase):
17 | """
18 | Tests for the `cast_datetime_filter` function.
19 | """
20 | def test_int(self):
21 | """
22 | Pass an int. Should raise a `ValueError`
23 | """
24 | with self.assertRaises(ValueError):
25 | cast_datetime_filter(1)
26 |
27 | def test_non_iso_string(self):
28 | """
29 | Pass a string that's not in ISO format. Should get a string back
30 | that's in ISO format.
31 | """
32 | date_string = '08-03-2010'
33 | expected_iso = cast_datetime_filter(date_string)
34 | self.assertEqual(expected_iso, '2010-08-03T00:00:00')
35 |
36 | def test_datetime_object(self):
37 | """
38 | Pass a datetime object. Should get a string back in ISO format.
39 | """
40 | now = datetime.now()
41 | expected_output = now.isoformat()
42 | actual_output = cast_datetime_filter(now)
43 | self.assertEqual(actual_output, expected_output)
44 |
45 |
46 | class CastIntegerFilter(unittest.TestCase):
47 | """
48 | Test for the `cast_integer_filter` function.
49 | """
50 | def test_int(self):
51 | """
52 | Pass an int. Should get it back.
53 | """
54 | value_to_cast = 1
55 | output = cast_integer_filter(value_to_cast)
56 | self.assertEqual(value_to_cast, output)
57 |
58 | def test_false(self):
59 | """
60 | Pass a boolean False. Should get a 0 back.
61 | """
62 | output = cast_integer_filter(False)
63 | expected_output = 0
64 | self.assertEqual(output, expected_output)
65 |
66 | def test_true(self):
67 | """
68 | Pass a boolean True. Should get a 1 back.
69 | """
70 | output = cast_integer_filter(True)
71 | expected_output = 1
72 | self.assertEqual(output, expected_output)
73 |
74 | def test_numeric_string(self):
75 | """
76 | Pass a numeric string. Should get the integer version back.
77 | """
78 | numeric_string = '123'
79 | expected_output = 123
80 | output = cast_integer_filter(numeric_string)
81 | self.assertEqual(expected_output, output)
82 |
83 |
84 | class FilterArgsToDictTestCase(unittest.TestCase):
85 | """
86 | Test for the `filter_args_to_dict` function.
87 | """
88 | def test_all_bad_filter_keys(self):
89 | """
90 |         Pass a dict whose keys are not in the acceptable filter list.
91 |
92 | Should get an empty dict back.
93 | """
94 | filters = {
95 | 'date_deleted': '08-08-2010',
96 | 'date_updated': '08-08-2011',
97 | 'liked': 1
98 | }
99 |
100 | acceptable_filters = ['favorite', 'archive']
101 | expected_empty = filter_args_to_dict(filters, acceptable_filters)
102 | self.assertEqual(expected_empty, {})
103 |
104 | def test_some_bad_filter_keys(self):
105 | """
106 | Pass a mixture of good and bad filter keys.
107 | """
108 | filters = {
109 | 'favorite': True,
110 | 'archive': False
111 | }
112 | bad_filters = {
113 | 'date_deleted': '08-08-2010',
114 | 'date_updated': '08-08-2011',
115 | 'liked': 1
116 | }
117 | acceptable_filter_keys = ['favorite', 'archive']
118 |
119 | # add bad filters to filters dict
120 | filters.update(bad_filters)
121 | filter_dict = filter_args_to_dict(filters, acceptable_filter_keys)
122 | self.assertEqual(set(filter_dict.keys()), set(acceptable_filter_keys))
123 |
124 | def test_casting_of_integer_filters(self):
125 | """
126 | Pass keys that correspond to integer filters.
127 | """
128 | filters = {
129 | 'favorite': True,
130 | 'archive': False
131 | }
132 | acceptable_filter_keys = filters.keys()
133 | filter_dict = filter_args_to_dict(filters, acceptable_filter_keys)
134 | self.assertEqual(set(filter_dict.keys()), set(acceptable_filter_keys))
135 | self.assertEqual(filter_dict['favorite'], 1)
136 | self.assertEqual(filter_dict['archive'], 0)
137 |
138 | def test_casting_of_datetime_filters(self):
139 | """
140 | Pass keys that correspond to datetime filters.
141 | """
142 | now = datetime.now()
143 | filters = {
144 | 'archived_since': '08-08-2010',
145 | 'favorited_since': now
146 | }
147 | acceptable_filter_keys = filters.keys()
148 | filter_dict = filter_args_to_dict(filters, acceptable_filter_keys)
149 | self.assertEqual(set(filter_dict.keys()), set(acceptable_filter_keys))
150 | self.assertEqual(filter_dict['archived_since'], '2010-08-08T00:00:00')
151 | self.assertEqual(filter_dict['favorited_since'], now.isoformat())
152 |
153 |
154 | if __name__ == '__main__':
155 | unittest.main()
156 |
--------------------------------------------------------------------------------
/readability/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 |
3 | """
4 | readability.utils
5 | ~~~~~~~~~~~~~~~~~
6 |
7 | This module provides various utils to the rest of the package.
8 |
9 | """
10 |
11 | import logging
12 |
13 | from datetime import datetime
14 |
15 | from dateutil.parser import parse as parse_datetime
16 |
17 |
18 | logger = logging.getLogger(__name__)
19 |
20 |
21 | # map of filter names to a data type. This is used to map names to a
22 | # casting function when needed.
23 | filter_type_map = {
24 | 'added_since': 'datetime',
25 | 'added_until': 'datetime',
26 | 'archive': 'int',
27 | 'archived_since': 'datetime',
28 | 'archived_until': 'datetime',
29 | 'exclude_accessibility': 'string',
30 | 'favorite': 'int',
31 | 'favorited_since': 'datetime',
32 | 'favorited_until': 'datetime',
33 | 'domain': 'string',
34 | 'only_delete': 'int',
35 | 'opened_since': 'datetime',
36 | 'opened_until': 'datetime',
37 | 'order': 'string',
38 | 'page': 'int',
39 | 'per_page': 'int',
40 | 'tags': 'string',
41 | 'updated_since': 'datetime',
42 | 'updated_until': 'datetime',
43 | }
44 |
45 |
46 | def cast_datetime_filter(value):
47 | """Cast a datetime filter value.
48 |
49 |     :param value: a date string (parsed with `dateutil`) or a `datetime`
50 |         object, to be converted to an ISO 8601 string.
51 |
52 | """
53 | if isinstance(value, str):
54 | dtime = parse_datetime(value)
55 |
56 | elif isinstance(value, datetime):
57 | dtime = value
58 | else:
59 | raise ValueError('Received value of type {0}'.format(type(value)))
60 |
61 | return dtime.isoformat()
62 |
63 |
64 | def cast_integer_filter(value):
65 | """Cast an integer filter value.
66 |
67 |     These are usually booleans in Python, but they need to be sent as
68 |     1s and 0s to the API.
69 |
70 |     :param value: boolean value that needs to be cast to an int
71 | """
72 | return int(value)
73 |
74 |
75 | def filter_args_to_dict(filter_dict, accepted_filter_keys=()):
76 | """Cast and validate filter args.
77 |
78 | :param filter_dict: Filter kwargs
79 | :param accepted_filter_keys: List of keys that are acceptable to use.
80 |
81 | """
82 | out_dict = {}
83 | for k, v in filter_dict.items():
84 | # make sure that the filter k is acceptable
85 | # and that there is a value associated with the key
86 | if k not in accepted_filter_keys or v is None:
87 | logger.debug(
88 | 'Filter was not in accepted_filter_keys or value is None.')
89 | # skip it
90 | continue
91 | filter_type = filter_type_map.get(k, None)
92 |
93 | if filter_type is None:
94 |             logger.debug('Filter key not found in map.')
95 | # hmm, this was an acceptable filter type but not in the map...
96 | # Going to skip it.
97 | continue
98 |
99 |         # map of casting functions to filter types
100 | filter_cast_map = {
101 | 'int': cast_integer_filter,
102 | 'datetime': cast_datetime_filter
103 | }
104 | cast_function = filter_cast_map.get(filter_type, None)
105 |
106 | # if we get a cast function, call it with v. If not, just use v.
107 | if cast_function:
108 | out_value = cast_function(v)
109 | else:
110 | out_value = v
111 | out_dict[k] = out_value
112 |
113 | return out_dict
114 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | import sys
4 |
5 | from setuptools import setup
6 |
7 | required = [
8 | 'pytest',
9 | 'requests',
10 | 'requests_oauthlib',
11 | 'httplib2==0.19.0',
12 | 'python-dateutil',
13 | ]
14 |
15 | # Python 2 dependencies
16 | if sys.version_info[0] == 2:
17 | required += [
18 | 'mock',
19 | ]
20 |
21 | setup(
22 | name='readability-api',
23 | version='1.0.2',
24 | description='Python client for the Readability Reader and Parser APIs.',
25 | long_description=open('README.rst').read(),
26 | author='The Readability Team',
27 | author_email='philip@readability.com',
28 | url='https://github.com/arc90/python-readability-api',
29 | packages=['readability'],
30 | install_requires=required,
31 | license='MIT',
32 | classifiers=(
33 | 'Development Status :: 5 - Production/Stable',
34 | 'Intended Audience :: Developers',
35 | 'Natural Language :: English',
36 | 'License :: OSI Approved :: MIT License',
37 | 'Programming Language :: Python',
38 | 'Programming Language :: Python :: 2.7',
39 | 'Programming Language :: Python :: 3.5',
40 | 'Programming Language :: Python :: Implementation :: PyPy',
41 | ),
42 | )
43 |
--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [tox]
2 | envlist = py27, py35, pypy, pypy3
3 |
4 | [testenv]
5 | commands = py.test
6 | deps =
7 | pytest
8 | requests
9 | requests_oauthlib
10 | httplib2==0.9.1
11 | python-dateutil
12 | mock
13 | passenv =
14 | READABILITY_CONSUMER_KEY
15 | READABILITY_CONSUMER_SECRET
16 | READABILITY_PARSER_TOKEN
17 | READABILITY_PASSWORD
18 | READABILITY_USERNAME
19 |
--------------------------------------------------------------------------------