├── Procfile
├── requirements.txt
├── .gitignore
├── LICENSE
├── crossdomain.py
├── README.md
├── app.py
└── templates
    └── main.html

/Procfile:
--------------------------------------------------------------------------------
web: newrelic-admin run-program gunicorn app:app

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
Flask==0.10.1
HackerNews
beautifulsoup4
gunicorn
requests
python-binary-memcached
newrelic

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.env

*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64
__pycache__

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

*.komodoproject
.komodotools/

.DS_Store

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2013 Karan Goel

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/crossdomain.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

from datetime import timedelta
from flask import make_response, request, current_app
from functools import update_wrapper


# Decorator that attaches CORS headers to a Flask view's response.
# Note: `basestring` exists only on Python 2.
def crossdomain(origin=None, methods=['GET'], headers=None,
                max_age=21600, attach_to_all=True,
                automatic_options=True):
    if methods is not None:
        methods = ', '.join(sorted(x.upper() for x in methods))
    if headers is not None and not isinstance(headers, basestring):
        headers = ', '.join(x.upper() for x in headers)
    if not isinstance(origin, basestring):
        origin = ', '.join(origin)
    if isinstance(max_age, timedelta):
        max_age = max_age.total_seconds()

    def get_methods():
        if methods is not None:
            return methods

        options_resp = current_app.make_default_options_response()
        return options_resp.headers['allow']

    def decorator(f):
        def wrapped_function(*args, **kwargs):
            if automatic_options and request.method == 'OPTIONS':
                resp = current_app.make_default_options_response()
            else:
                resp = make_response(f(*args, **kwargs))
            if not attach_to_all and request.method != 'OPTIONS':
                return resp

            h = resp.headers

            h['Access-Control-Allow-Origin'] = origin
            h['Access-Control-Allow-Methods'] = get_methods()
            h['Access-Control-Max-Age'] = str(max_age)
            if headers is not None:
                h['Access-Control-Allow-Headers'] = headers
            return resp

        f.provide_automatic_options = False
        return update_wrapper(wrapped_function, f)
    return decorator

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
HNify
=====

![HNify](https://raw.github.com/karan/HackerNewsAPI/master/HN.jpg)

Unofficial REST API for [Hacker News](https://news.ycombinator.com/). Built using [HackerNewsAPI](https://github.com/karan/HackerNewsAPI).

Now uses memcached for increased performance!

Start
=====

    $ brew install memcached            # install memcached
    $ pip install -r requirements.txt   # install dependencies
    $ memcached -vv                     # start memcached server
    $ python app.py                     # start the api

Deploy to Heroku
=====

    $ pip install -r requirements.txt   # install dependencies
    $ heroku create
    $ heroku addons:add memcachedcloud
    $ heroku addons:add newrelic
    $ (git add, git commit)
    $ git push heroku master

If you get an error on the memcached line, see the following [help article](https://devcenter.heroku.com/articles/config-vars).
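
The config vars matter because `app.py` reads the memcached servers and credentials from `MEMCACHEDCLOUD_SERVERS`, `MEMCACHEDCLOUD_USERNAME` and `MEMCACHEDCLOUD_PASSWORD`, and calls `.split(',')` on the first one without checking that it is set. A minimal Python 3 sketch of a more forgiving lookup follows. The helper name `memcached_settings` and the local fallback `127.0.0.1:11211` (memcached's default port) are assumptions for illustration and are not part of HNify.

```python
import os

# Assumption: fall back to a memcached server on its default local port.
DEFAULT_SERVERS = "127.0.0.1:11211"


def memcached_settings(environ=os.environ):
    """Collect the MEMCACHEDCLOUD_* settings (hypothetical helper)."""
    # An unset or empty variable falls back to the local default.
    raw = environ.get("MEMCACHEDCLOUD_SERVERS") or DEFAULT_SERVERS
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    return {
        "servers": servers,
        # Credentials stay None when the config vars are not defined.
        "username": environ.get("MEMCACHEDCLOUD_USERNAME"),
        "password": environ.get("MEMCACHEDCLOUD_PASSWORD"),
    }
```

The returned values could then be passed to the `memcache.Client(...)` call in `app.py` in the same order it already uses (servers, username, password).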

Usage
==========

**Base URL:** [http://hnify.herokuapp.com](http://hnify.herokuapp.com)

**Output:** JSON

### Get stories from top page

#### `GET /get/top`

**Parameters:**

| Name | Type | Description |
| ---- | ---- | ----------- |
| `limit` | integer | Return at most this many stories (defaults to 30 when omitted) |

### Get stories from newest page

#### `GET /get/newest`

**Parameters:**

| Name | Type | Description |
| ---- | ---- | ----------- |
| `limit` | integer | Return at most this many stories (defaults to 30 when omitted) |

### Get stories from best page

#### `GET /get/best`

**Parameters:**

| Name | Type | Description |
| ---- | ---- | ----------- |
| `limit` | integer | Return at most this many stories (defaults to 30 when omitted) |

### Currently trending topics on HN

#### `GET /get/trends`

### Get comments from story id

#### `GET /get/comments/<story_id>/`

--------

### Example

    karan:$ curl -i http://hnify.herokuapp.com/get/newest
    HTTP/1.1 200 OK
    Content-Type: application/json
    Date: Tue, 29 Oct 2013 06:23:39 GMT
    Server: gunicorn/18.0
    Content-Length: 16562
    Connection: keep-alive

    {
      "stories": [
        {
          "comments_link": "http://news.ycombinator.com/item?id=6632337",
          "domain": "independent.co.uk",
          "is_self": false,
          "link": "http://www.independent.co.uk/news/science/lifi-breakthrough-internet-connections-using-light-bulbs-are-250-times-faster-than-broadband-8909320.html",
          "num_comments": 0,
          "points": 1,
          "published_time": "1 minute ago",
          "rank": 1,
          "story_id": 6632337,
          "submitter": "yapcguy",
          "submitter_profile": "http://news.ycombinator.com/user?id=yapcguy",
          "title": "Li-Fi: Internet connections using light bulbs are 250 x faster than broadband"
        },
        {
          "comments_link": "http://news.ycombinator.com/item?id=6632335",
          "domain": "github.com",
          "is_self": false,
          "link": "https://github.com/postmodern/chruby",
          "num_comments": 0,
          "points": 2,
          "published_time": "1 minute ago",
          "rank": 2,
          "story_id": 6632335,
          "submitter": "michaelrkn",
          "submitter_profile": "http://news.ycombinator.com/user?id=michaelrkn",
          "title": "Chruby: a lightweight, elegant RVM alternative"
        },
        <-- snip -->
        ]
    }


Donations
=============

If HNify has helped you in any way, and you'd like to help the developer, please consider donating.

**- BTC: [19dLDL4ax7xRmMiGDAbkizh6WA6Yei2zP5](http://i.imgur.com/bAQgKLN.png)**

**- Gittip: [https://www.gittip.com/karan/](https://www.gittip.com/karan/)**

**- Flattr: [https://flattr.com/profile/thekarangoel](https://flattr.com/profile/thekarangoel)**


Contribute
========

If you want to add any new features, or improve existing ones, feel free to send a pull request!
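
The story endpoints listed under Usage can be called from any HTTP client. The sketch below, written for Python 3, builds the request URL and pulls the titles out of a response body shaped like the one in the example. The helper names `stories_url` and `titles` are hypothetical and not part of HNify; only the base URL, the paths, the `limit` parameter and the `stories`/`title` keys come from this README.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://hnify.herokuapp.com"  # base URL from the Usage section
STORY_PAGES = ("top", "newest", "best")  # pages documented above


def stories_url(page, limit=None):
    """Build the URL for one of the story endpoints (hypothetical helper)."""
    if page not in STORY_PAGES:
        raise ValueError("unknown page: %r" % (page,))
    url = "%s/get/%s" % (BASE_URL, page)
    if limit is not None:
        # `limit` is the only documented query parameter.
        url += "?" + urlencode({"limit": int(limit)})
    return url


def titles(body):
    """Extract story titles from a JSON response body (hypothetical helper)."""
    return [story["title"] for story in json.loads(body)["stories"]]
```

To fetch live data, the URL can be passed to `urllib.request.urlopen` and the decoded body handed to `titles`. That step is left out here because it depends on the hosted instance being reachable.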

--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

import sys
sys.setrecursionlimit(1000)

import time
import re
from collections import Counter
import os
from crossdomain import crossdomain

from hn import *
# abort is used by get_stories() for unknown story types
from flask import Flask, abort, jsonify, make_response, render_template, redirect, request
import bmemcached as memcache


app = Flask(__name__)

# cache time to live in seconds
timeout = 600

mc = memcache.Client(os.environ.get('MEMCACHEDCLOUD_SERVERS').split(','),
                     os.environ.get('MEMCACHEDCLOUD_USERNAME'),
                     os.environ.get('MEMCACHEDCLOUD_PASSWORD'))

mc.set('top', None, time=timeout)
mc.set('best', None, time=timeout)
mc.set('newest', None, time=timeout)
mc.set('trends', None, time=timeout)

stopwords = ["a","able","about","across","after","all","almost","also","am",
             "among","an","and","any","are","as","at","be","because","been",
             "but","by","can","cannot","could","dear","did","do","does",
             "either","else","ever","every","for","from","get","got","had",
             "has","have","he","her","hers","him","his","how","however","i",
             "if","in","into","is","it","its","just","least","let","like",
             "likely","may","me","might","most","must","my","neither","no",
             "nor","not","of","off","often","on","only","or","other","our",
             "own","rather","said","say","says","she","should","since","so",
             "some","than","that","the","their","them","then","there","these",
             "they","this","tis","to","too","twas","us","wants","was","we",
             "were","what","when","where","which","while","who","whom","why",
             "will","with","would","yet","you","your", 'show hn', 'ask hn',
             'hn', 'show', 'ask']


@app.route('/')
@crossdomain(origin='*')
def index():
    '''
    This page is displayed when index page is requested.
    '''
    return render_template('main.html')


@app.route('/get/<story_type>/', methods=['GET'])
@app.route('/get/<story_type>', methods=['GET'])
@crossdomain(origin='*')
def get_stories(story_type):
    '''
    Returns stories from the requested page of HN.
    story_type is one of:
    \ttop
    \tnewest
    \tbest
    '''
    story_type = str(story_type)
    limit = request.args.get('limit')
    limit = int(limit) if limit is not None else 30

    temp_cache = mc.get(story_type) # get the cache from memory

    if temp_cache is not None and len(temp_cache['stories']) >= limit:
        # we have enough in cache already
        return jsonify({'stories': temp_cache['stories'][:limit]})
    else:
        hn = HN()
        if story_type == 'top':
            stories = [story for story in hn.get_stories(limit=limit)]
        elif story_type in ['newest', 'best']:
            stories = [story for story in hn.get_stories(story_type=story_type, limit=limit)]
        else:
            abort(404)
        mc.set(story_type, {'stories': serialize(stories)}, time=timeout)
        return jsonify(mc.get(story_type))


@app.route('/get/comments/<story_id>', methods=['GET'])
@app.route('/get/comments/<story_id>/', methods=['GET'])
@crossdomain(origin='*')
def comments(story_id):
    story_id = int(story_id)
    memcache_key = "%s_comments" % (story_id)

    temp_cache = mc.get(memcache_key) # get the cache from memory
    result = []

    if temp_cache is None:
        story = Story.fromid(story_id)
        comments = story.get_comments()
        for comment in comments:
            result.append({
                "comment_id": comment.comment_id,
                "level": comment.level,
                "user": comment.user,
                "time_ago": comment.time_ago,
                "body": comment.body,
                "body_html": comment.body_html
            })
        mc.set(memcache_key, {'comments': result}, time=timeout)
    return jsonify(mc.get(memcache_key))


@app.route('/get/trends', methods=['GET'])
@crossdomain(origin='*')
def trends():
    '''
    Returns currently trending topics.
    '''
    temp_cache = mc.get('trends') # get the cache from memory
    if temp_cache is not None:
        return jsonify(temp_cache)
    else:
        hn = HN()
        mc.set('trends', {'trends': get_trends()}, time=timeout)
        return jsonify(mc.get('trends'))


def get_trends():
    '''
    Returns a list of trending topics on HN.
    '''
    hn = HN()

    titles = [story.title for story in hn.get_stories(limit=90)]

    one_grams = [] # list of 1-grams
    two_grams = [] # list of 2-grams

    # Single word regex
    one_word_pat = re.compile('[A-Z][A-Za-z.]+')
    # Two consecutive word @ http://regex101.com/r/xE2vT0
    two_word_pat = re.compile('(?=((? 1]

def serialize(stories):
    '''
    Takes a list of Story objects and returns a list of dict's.
    '''
    result = []

    for story in stories:
        result.append(
            {
                "comments_link": story.comments_link,
                "domain": story.domain,
                "is_self": story.is_self,
                "link": story.link,
                "num_comments": story.num_comments,
                "points": story.points,
                "published_time": story.published_time,
                "rank": story.rank,
                "story_id": story.story_id,
                "submitter": story.submitter,
                "submitter_profile": story.submitter_profile,
                "title": story.title
            }
        )
    return result


@app.errorhandler(404)
def not_found(error):
    '''
    Returns a jsonified 404 error message instead of an HTTP 404 error.
    '''
    return make_response(jsonify({ 'error': '404 not found' }), 404)


@app.errorhandler(503)
def service_unavailable(error):
    '''
    Returns a jsonified 503 error message instead of an HTTP 503 error.
    '''
    return make_response(jsonify({ 'error': '503 something wrong' }), 503)


@app.errorhandler(500)
def internal_error(error):
    '''
    Returns a jsonified 500 error message instead of an HTTP 500 error.
    '''
    return make_response(jsonify({ 'error': '500 something wrong' }), 500)


if __name__ == '__main__':
    app.run(debug=True)

--------------------------------------------------------------------------------
/templates/main.html:
--------------------------------------------------------------------------------
HNify - Hacker News API

HNify

HNify

Unofficial REST API for Hacker News. Built using HackerNewsAPI.

Now uses memcached for increased performance!

Start

$ brew install memcached            # install memcached
$ pip install -r requirements.txt   # install dependencies
$ memcached -vv                     # start memcached server
$ python app.py                     # start the api

Deploy to Heroku

$ pip install -r requirements.txt   # install dependencies
$ heroku create
$ heroku addons:add memcachedcloud
$ heroku addons:add newrelic
$ (git add, git commit)
$ git push heroku master

If you get an error on the memcached line, see the following help article.

Usage

Base URL: http://hnify.herokuapp.com

Output: JSON

Get stories from top page

GET /get/top

Parameters:

Name Type Description
limit integer Return at most this many stories (defaults to 30 when omitted)

Get stories from newest page

GET /get/newest

Parameters:

Name Type Description
limit integer Return at most this many stories (defaults to 30 when omitted)

Get stories from best page

GET /get/best

Parameters:

Name Type Description
limit integer Return at most this many stories (defaults to 30 when omitted)

Currently trending topics on HN

GET /get/trends

Get comments from story id

GET /get/comments/<story_id>/

Example

karan:$ curl -i http://hnify.herokuapp.com/get/newest
HTTP/1.1 200 OK
Content-Type: application/json
Date: Tue, 29 Oct 2013 06:23:39 GMT
Server: gunicorn/18.0
Content-Length: 16562
Connection: keep-alive

{
  "stories": [
    {
      "comments_link": "http://news.ycombinator.com/item?id=6632337",
      "domain": "independent.co.uk",
      "is_self": false,
      "link": "http://www.independent.co.uk/news/science/lifi-breakthrough-internet-connections-using-light-bulbs-are-250-times-faster-than-broadband-8909320.html",
      "num_comments": 0,
      "points": 1,
      "published_time": "1 minute ago",
      "rank": 1,
      "story_id": 6632337,
      "submitter": "yapcguy",
      "submitter_profile": "http://news.ycombinator.com/user?id=yapcguy",
      "title": "Li-Fi: Internet connections using light bulbs are 250 x faster than broadband"
    },
    {
      "comments_link": "http://news.ycombinator.com/item?id=6632335",
      "domain": "github.com",
      "is_self": false,
      "link": "https://github.com/postmodern/chruby",
      "num_comments": 0,
      "points": 2,
      "published_time": "1 minute ago",
      "rank": 2,
      "story_id": 6632335,
      "submitter": "michaelrkn",
      "submitter_profile": "http://news.ycombinator.com/user?id=michaelrkn",
      "title": "Chruby: a lightweight, elegant RVM alternative"
    },
    <-- snip -->
    ]
}

Donations

If HNify has helped you in any way, and you'd like to help the developer, please consider donating.

- BTC: 19dLDL4ax7xRmMiGDAbkizh6WA6Yei2zP5

- Gittip: https://www.gittip.com/karan/

- Flattr: https://flattr.com/profile/thekarangoel

Contribute

If you want to add any new features, or improve existing ones, feel free to send a pull request!

--------------------------------------------------------------------------------