├── README.md
├── news.tmpl
└── rss2json.py

/README.md:
--------------------------------------------------------------------------------
RSS Scraper
===========

This is a simple tool written in Python showing how to consume an RSS feed and output it as JSON.

It is best to use a [virtualenv](http://www.virtualenv.org/en/latest/index.html) for managing dependencies. Anyone who
already knows this tool should have no problem installing the required libraries.

Dependencies on Linux
---------------------

Open a terminal or SSH into a remote machine as root, or with an account that is a member of the sudoers group. On
many machines you can execute pip immediately. If it is not installed, get it using easy_install.

If you are root:

    easy_install pip

If you need sudo:

    sudo easy_install pip

Next install the dependencies using pip (rss2json.py imports both feedparser and jinja2):

    sudo pip install feedparser jinja2

Dependencies on Windows
-----------------------
Open an administrator command prompt. Run easy_install to check if it is configured.

Install pip by executing the following command if you don't have pip already:

    easy_install pip

Next install the dependencies using pip (rss2json.py imports both feedparser and jinja2):

    pip install feedparser jinja2

Usage
-----

To generate the JSON output, run:

    python rss2json.py

The script renders news.tmpl and writes the result to ../news.json.

--------------------------------------------------------------------------------
/news.tmpl:
--------------------------------------------------------------------------------
callback1001(
{
    "items":
    [
        {% for entry in feed %}
        {
            "text": "{{ entry.link }}",
            "id": "{{ entry.id }}",
            "strapline": "date",
            "published": "{{ entry.published }}",
            "summary": "{{ entry.summary }}",
            "title": "{{ entry.title }}",
            "summary_detail": "{{ entry.summary_detail.value }}",
            "leaf": "true"
        }{% if not loop.last %},{% endif %}
        {% endfor %}
    ]
} );

--------------------------------------------------------------------------------
/rss2json.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
import io

import feedparser

from jinja2 import Environment
from jinja2.loaders import FileSystemLoader

FEED_URL = 'http://www.astrazeneca.com/cs/Satellite?c=AZ_Placeholder_C&childpagename=astrazeneca%2FAZ_Placeholder_C%2FRSS&cid=1277293175128&p=1277293173773&packedargs=item-alias%3DLatestPressReleasesHubBlock%26item-context%3DHome&pagename=AZ%2FWrapper'


def render_template(data, template_name, filters=None):
    """Render data using a jinja2 template and return the result as text"""
    # Templates are looked up relative to the current working directory.
    env = Environment(loader=FileSystemLoader('.'))

    if filters is not None:
        # items() works on both Python 2 and Python 3 (iteritems() does not).
        for key, value in filters.items():
            env.filters[key] = value

    template = env.get_template(template_name)
    return template.render(feed=data)


def main():
    feed = feedparser.parse(FEED_URL)

    # Avoid naming the result "json", which would shadow the standard module.
    rendered = render_template(feed.entries, 'news.tmpl')

    # io.open encodes the text as UTF-8 on both Python 2 and Python 3.
    with io.open('../news.json', 'w', encoding='utf-8') as output:
        output.write(rendered)


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
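
The template above interpolates feed values directly into quoted strings, so a title or summary containing a double quote, a backslash, or a newline would produce output that is not valid JSON. A minimal sketch of one alternative, assuming the same field names as news.tmpl: build the payload as plain Python data and let the standard `json` module handle escaping. The helper name `entries_to_jsonp` is hypothetical and not part of this repository; it only relies on each entry supporting dictionary-style `get`, which feedparser entries do.

```python
import json


def entries_to_jsonp(entries, callback="callback1001"):
    """Serialize feed entries into the JSONP shape used by news.tmpl.

    Hypothetical helper: `entries` is any iterable of dict-like objects,
    for example the `entries` list returned by feedparser.parse().
    """
    items = []
    for entry in entries:
        # Missing fields fall back to an empty string.
        detail = entry.get("summary_detail") or {}
        items.append({
            "text": entry.get("link", ""),
            "id": entry.get("id", ""),
            "strapline": "date",
            "published": entry.get("published", ""),
            "summary": entry.get("summary", ""),
            "title": entry.get("title", ""),
            "summary_detail": detail.get("value", ""),
            "leaf": "true",
        })

    # json.dumps escapes quotes, backslashes and control characters,
    # and never emits a trailing comma.
    return "%s(%s);" % (callback, json.dumps({"items": items}, indent=2))
```

With this approach, `main()` could call `entries_to_jsonp(feed.entries)` in place of `render_template(...)`, which would remove the jinja2 dependency at the cost of no longer having an editable template file.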