8 | The user endpoint is not a content resource per se, but lets you get
9 | information about your API usage. As this is sensitive information,
10 | requests must be authorized. Parameters are not required.
11 |
35 | {% endblock %}
--------------------------------------------------------------------------------
/CHANGES.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 |
3 | ## 25. Apr 2013
4 | * [FIXED] URLs for sub-departments now working correctly
5 |
6 | ## 21. Apr 2013
7 | * [FIXED] Keyword IDs are now URL-safe
8 |
9 | ## 02. Jan 2013
10 | * [IMPROVED] API explorer performance
11 |
12 | ## 21. Dec 2012
13 | * [IMPROVED] General documentation expanded
14 | * [IMPROVED] Cosmetic changes at developer portal
15 | * [DEPRECATED] App Gallery removed, now only at our blog
16 |
17 | ## 20. Dec 2012
18 | * [FIXED] Server errors at /content resolved
19 | * [FIXED] UTF8 support for all parameters
20 | * [IMPROVED] Performance optimizations for /author
21 | * [ADDED] Id field for non-content searches
22 | * [ADDED] Href field for authors, if available
23 |
24 | ## 05. Dec 2012
25 | * [ADDED] Uuid field for content objects
26 |
27 | ## 04. Dec 2012
28 | * [FIXED] Response header gets proper encoding
29 |
30 | ## 30. Nov 2012
31 | * [FIXED] Related articles now have proper titles
32 |
33 | ## 23. Nov 2012
34 | * [ADDED] Initial beta release
--------------------------------------------------------------------------------
/src/zeit/api/schemas/solrmsg.rnc:
--------------------------------------------------------------------------------
1 | #
2 | # zeit.api.schemas.solrmsg
3 | # ~~~~~~~~~~~~~~~~~~~~~~~~
4 | #
5 | # RelaxNG schema for messages to update articles in the Solr index.
6 | # For more info see: http://wiki.apache.org/solr/UpdateXmlMessages
7 | #
8 | # Copyright: (c) 2013 by ZEIT ONLINE.
9 | # License: BSD, see LICENSE.md for more details.
10 | #
11 |
12 |
13 | default namespace = ""
14 |
15 | start =
16 | element doc {
17 | element field { attribute name { 'related' }, text }*,
18 | element field { attribute name { 'keyword' }, text }*,
19 | element field { attribute name { 'author' }, text }+,
20 | element field { attribute name { 'product' }, text }?,
21 | element field { attribute name { 'release_date' }, text }?,
22 | element field { attribute name { 'department' }, text }*,
23 | element field { attribute name { 'sub_department' }, text }*,
24 | element field { attribute name { 'self' }, text },
25 | element field { attribute name { 'subtitle' }, text }?,
26 | element field { attribute name { 'supertitle' }, text }?,
27 | element field { attribute name { 'title' }, text }?,
28 | element field { attribute name { 'body' }, text }
29 | }
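As an illustration of what the schema above accepts, the following Python sketch builds a minimal `doc` message with the standard library's `xml.etree.ElementTree`. The helper name `build_doc` and all sample values are hypothetical; only the element name, the `name` attribute values and their order are taken from the schema.

```python
import xml.etree.ElementTree as ET

# Field order as required by the RelaxNG schema above.
FIELD_ORDER = ['related', 'keyword', 'author', 'product', 'release_date',
               'department', 'sub_department', 'self', 'subtitle',
               'supertitle', 'title', 'body']


def build_doc(fields):
    """Build a <doc> element from a dict mapping field names to value lists.

    Hypothetical helper: it only puts the fields into schema order. It does
    not validate cardinalities (at least one author, exactly one self and
    one body).
    """
    doc = ET.Element('doc')
    for name in FIELD_ORDER:
        for value in fields.get(name, []):
            field = ET.SubElement(doc, 'field', {'name': name})
            field.text = value
    return doc


# Made-up sample values, given in arbitrary order on purpose.
doc = build_doc({
    'body': ['Example body text.'],
    'author': ['Jane Doe'],
    'self': ['http://example.org/article'],
})
print(ET.tostring(doc).decode('utf-8'))
```

A real update message would still need to be validated against the schema itself, since the sketch only takes care of element order.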
--------------------------------------------------------------------------------
/dev.cfg:
--------------------------------------------------------------------------------
1 | [buildout]
2 | extends = buildout.cfg
3 | parts +=
4 | solr-server
5 | solr-schema
6 | solr-conf
7 | tests
8 |
9 | [api-wsgi]
10 | app_name = api_dev
11 |
12 | [doc-wsgi]
13 | app_name = doc_dev
14 |
15 | [solr-package]
16 | recipe = gocept.download
17 | url = http://archive.apache.org/dist/lucene/solr/3.6.2/apache-solr-3.6.2.tgz
18 | md5sum = e9c51f51265b070062a9d8ed50b84647
19 |
20 | [solr-schema]
21 | recipe = collective.recipe.rsync
22 | source = ${buildout:directory}/src/zeit/api/schemas/schema.xml
23 | target = ${buildout:directory}/parts/solr-server/solr/conf/schema.xml
24 |
25 | [solr-conf]
26 | recipe = collective.recipe.rsync
27 | source = ${buildout:directory}/src/zeit/api/schemas/solrconfig.xml
28 | target = ${buildout:directory}/parts/solr-server/solr/conf/solrconfig.xml
29 |
30 | [solr-server]
31 | recipe = collective.recipe.solrinstance
32 | solr-location = ${solr-package:location}
33 | index = name:uid type:string indexed:true stored:true required:true
34 | schema-template = ${buildout:directory}/src/zeit/api/schemas/schema_.xml
35 | vardir = ${deployment:root}/var/${:_buildout_section_name_}
36 | logdir = ${deployment:log-directory}
37 | script = ${deployment:rc-directory}/${:_buildout_section_name_}
38 | autoCommitMaxTime = 1000
39 |
40 | [tests]
41 | recipe = pbp.recipe.noserunner
42 | eggs = ${eggs:eggs}
43 | defaults = -v --nocapture
--------------------------------------------------------------------------------
/LICENCE.md:
--------------------------------------------------------------------------------
1 | Copyright (c) 2013 by ZEIT ONLINE GmbH and contributors.
2 |
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 |
8 | * Redistributions of source code must retain the above copyright notice, this
9 | list of conditions and the following disclaimer.
10 |
11 | * Redistributions in binary form must reproduce the above copyright notice,
12 | this list of conditions and the following disclaimer in the documentation
13 | and/or other materials provided with the distribution.
14 |
15 | * Neither the name of the ZEIT ONLINE GmbH nor the names of its contributors
16 | may be used to endorse or promote products derived from this software
17 | without specific prior written permission.
18 |
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
20 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
21 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | DISCLAIMED. IN NO EVENT SHALL ZEIT ONLINE BE LIABLE FOR ANY DIRECT,
23 | INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
24 | BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
25 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
26 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
27 | OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
28 | ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/src/zeit/api/schemas/database.sql:
--------------------------------------------------------------------------------
1 | /*
2 | zeit.api.schemas.database
3 | ~~~~~~~~~~~~~~~~~~~~~~~~~
4 |
5 | These are the initialization statements for the main SQLite database.
6 |
7 | Copyright: (c) 2013 by ZEIT ONLINE.
8 | License: BSD, see LICENSE.md for more details.
9 | */
10 |
11 | CREATE TABLE IF NOT EXISTS author
12 | (
13 | href CHAR(256),
14 | id CHAR(256) NOT NULL PRIMARY KEY,
15 | type CHAR(32) NOT NULL,
16 | uri CHAR(288) NOT NULL,
17 | value CHAR(256) NOT NULL
18 | );
19 |
20 | CREATE TABLE IF NOT EXISTS client
21 | (
22 | api_key CHAR(64) NOT NULL PRIMARY KEY,
23 | tier CHAR(32) NOT NULL,
24 | name CHAR(128),
25 | email CHAR(128),
26 | requests UNSIGNED INTEGER,
27 | reset UNSIGNED INTEGER
28 | );
29 |
30 | CREATE TABLE IF NOT EXISTS department
31 | (
32 | href CHAR(128),
33 | id CHAR(64) NOT NULL PRIMARY KEY,
34 | parent CHAR(64),
35 | uri CHAR(96) NOT NULL,
36 | value CHAR(64) NOT NULL,
37 | FOREIGN KEY(parent) REFERENCES department(id)
38 | );
39 |
40 | CREATE TABLE IF NOT EXISTS keyword
41 | (
42 | href CHAR(128),
43 | id CHAR(64) NOT NULL PRIMARY KEY,
44 | lexical CHAR(64) NOT NULL,
45 | score UNSIGNED INTEGER,
46 | type CHAR(32) NOT NULL,
47 | uri CHAR(96) NOT NULL,
48 | value CHAR(64) NOT NULL
49 | );
50 |
51 | CREATE TABLE IF NOT EXISTS product
52 | (
53 | href CHAR(128),
54 | id CHAR(32) NOT NULL PRIMARY KEY,
55 | uri CHAR(64) NOT NULL,
56 | value CHAR(64) NOT NULL
57 | );
58 |
59 | CREATE TABLE IF NOT EXISTS series
60 | (
61 | href CHAR(128),
62 | id CHAR(64) NOT NULL PRIMARY KEY,
63 | name CHAR(128) NOT NULL,
64 | uri CHAR(96) NOT NULL,
65 | value CHAR(128) NOT NULL
66 | );
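To see how the `client` table is meant to be used, here is a small sketch that creates it in an in-memory SQLite database, counts one request and reads the counters back. The queries mirror the ones in `access.py`; the API key `demo-key` and the contact details are made-up sample values, not real data.

```python
import sqlite3
import time

# The client table as defined in the schema above.
CLIENT_TABLE = """
CREATE TABLE IF NOT EXISTS client
(
    api_key CHAR(64) NOT NULL PRIMARY KEY,
    tier CHAR(32) NOT NULL,
    name CHAR(128),
    email CHAR(128),
    requests UNSIGNED INTEGER,
    reset UNSIGNED INTEGER
);
"""

db = sqlite3.connect(':memory:')  # throwaway database for illustration
db.execute(CLIENT_TABLE)

# 'demo-key' and the contact details are hypothetical sample values.
db.execute('INSERT INTO client VALUES (?, ?, ?, ?, ?, ?);',
           ('demo-key', 'free', 'Demo', 'demo@example.org', 0,
            int(time.time())))

# Count one request, then read the counters back.
db.execute('UPDATE client SET requests=requests + 1 WHERE api_key=?;',
           ('demo-key',))
row = db.execute('SELECT requests, reset, tier FROM client WHERE api_key=?;',
                 ('demo-key',)).fetchone()
print(row[0], row[2])
```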
--------------------------------------------------------------------------------
/src/zeit/api/templates/licence.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block docTitle %}Licence{% endblock %}
3 | {% block docBody %}
4 |
5 |
6 | ZEIT ONLINE provides the API (Application Programming Interface) as a
7 | public BETA version. The API covers the content of the ZEIT archive
8 | since 1946 as well as content from ZEIT ONLINE in machine-readable
9 | form. The API is intended to make non-commercial research, analysis and
10 | visualization of this content easier. We would be glad to present
11 | projects built on the API on these pages; please let us know at
12 | api@zeit.de.
13 |
14 |
15 | Storing and outputting the full text of articles is currently not
16 | possible. Please note that the contributions of our authors are
17 | protected by copyright. If you are planning a project based on
18 | full-text reuse, we kindly ask you to
19 | contact us. The
20 |
21 | General Terms of Use of ZEIT ONLINE apply. If you have any questions,
22 | we are happy to help.
23 |
24 |
25 | We look forward to a fruitful dialogue with everyone interested in the API.
26 | For suggestions, technical questions or ideas, you can reach us at
27 | api@zeit.de or
28 |
29 | @zeitonline_dev; for questions about commercial use and licensing,
30 | contact online-syndication@zeit.de.
31 |
32 |
33 | {% endblock %}
34 |
--------------------------------------------------------------------------------
/src/zeit/api/access.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.access
4 | ~~~~~~~~~~~~~~~
5 |
6 | This module provides functionality for regulating API access.
7 |
8 | Copyright: (c) 2013 by ZEIT ONLINE.
9 | License: BSD, see LICENSE.md for more details.
10 | """
11 | import time
12 |
13 | from flask import g, current_app
14 |
15 | from . import exception
16 |
17 |
18 | class Verifictaion(object):
19 | """Context manager class for API key validation and usage tracking."""
20 |
21 | def __enter__(self):
22 | """Verify key and quota. Raise exception if either fails."""
23 | if not hasattr(g, 'api_key'):
24 | raise exception.unauthorized()
25 |
26 | query = 'SELECT requests,reset,tier FROM client WHERE api_key=?;'
27 | client = g.db.execute(query, (g.api_key,)).fetchone()
28 |
29 | if not client:
30 | raise exception.unauthorized()
31 |
32 | requests = client[0]
33 | reset = client[1]
34 | quota = current_app.config['ACCESS_TIERS'][client[2]]
35 | timeframe = current_app.config['ACCESS_TIMEFRAME']
36 |
37 | if (int(time.time()) - reset) // timeframe > 0:
38 | reset += ((int(time.time()) - reset) // timeframe) * timeframe
39 | requests = 0
40 | query = ('UPDATE OR IGNORE client SET reset=?, requests=? '
41 | 'WHERE api_key=?;')
42 | g.db.execute(query, (reset, requests, g.api_key))
43 |
44 | if quota <= requests:
45 | raise exception.too_many_requests()
46 |
47 | def __exit__(self, type, value, traceback):
48 | """Increase request counter before closing context."""
49 | query = ('UPDATE OR IGNORE client SET requests=requests + 1 '
50 | 'WHERE api_key=?;')
51 | g.db.execute(query, (g.api_key,))
52 |
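The quota window handling in `__enter__` can be read in isolation as a small pure function. The sketch below is a simplified restatement, not the module's actual API: the name `roll_window` is hypothetical, and the current time is passed in as a parameter so the behaviour is easy to check.

```python
def roll_window(reset, requests, now, timeframe):
    """Return (reset, requests) after accounting for elapsed time frames.

    Hypothetical helper restating the window logic of __enter__.
    """
    elapsed = (now - reset) // timeframe  # whole time frames since reset
    if elapsed > 0:
        # Advance reset by whole frames so that windows stay aligned,
        # and start counting requests from zero again.
        return reset + elapsed * timeframe, 0
    return reset, requests


DAY = 86400  # ACCESS_TIMEFRAME from settings.py

print(roll_window(0, 42, 100, DAY))          # still inside the first window
print(roll_window(0, 42, 2 * DAY + 5, DAY))  # two full windows later
```

Advancing `reset` by whole multiples of the time frame, rather than setting it to the current time, keeps every client's window boundaries fixed relative to the first reset.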
--------------------------------------------------------------------------------
/src/zeit/api/settings.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.settings
4 | ~~~~~~~~~~~~~~~~~
5 |
6 | This module contains config classes for different build scenarios.
7 |
8 | Copyright: (c) 2013 by ZEIT ONLINE.
9 | License: BSD, see LICENSE.md for more details.
10 | """
11 |
12 |
13 | class Config(object):
14 |
15 | ACCESS_TIMEFRAME = 86400
16 | ACCESS_TIERS = {'free': 10000, 'pro': 50000, 'max': 1000000}
17 |
18 | SCHEMA = '/schemas/database.sql'
19 | DATABASE = '/var/lib/zon-api/data.db'
20 | PRODUCT_ALPHABET = ''
21 | SERIES_ALPHABET = ''
22 | KEYWORD_ALPHABET = ''
23 | DEPARTMENT_ALPHABET = ''
24 | RECAPTCHA_PRIVATE_KEY = ''
25 | RECAPTCHA_PUBLIC_KEY = ''
26 |
27 | try:
28 | import private
29 | PRODUCT_ALPHABET = private.PRODUCT_ALPHABET
30 | SERIES_ALPHABET = private.SERIES_ALPHABET
31 | KEYWORD_ALPHABET = private.KEYWORD_ALPHABET
32 | DEPARTMENT_ALPHABET = private.DEPARTMENT_ALPHABET
33 | except (ImportError, AttributeError):
34 | pass
35 |
36 |
37 | class ProductionConfig(Config):
38 |
39 | DOC_URL = 'https://developer.zeit.de'
40 | API_URL = 'https://api.zeit.de'
41 | SOLR_URL = 'http://127.0.0.1:8983/solr'
42 |
43 | try:
44 | import private
45 | RECAPTCHA_PRIVATE_KEY = private.RECAPTCHA_PRIVATE_PROD
46 | RECAPTCHA_PUBLIC_KEY = private.RECAPTCHA_PUBLIC_PROD
47 | except (ImportError, AttributeError):
48 | pass
49 |
50 |
51 | class DevelopmentConfig(Config):
52 |
53 | DOC_URL = 'http://localhost:9090'
54 | API_URL = 'http://localhost:9091'
55 | SOLR_URL = 'http://developer.zeit.de:8983/solr'
56 |
57 | try:
58 | import private
59 | RECAPTCHA_PRIVATE_KEY = private.RECAPTCHA_PRIVATE_DEVEL
60 | RECAPTCHA_PUBLIC_KEY = private.RECAPTCHA_PUBLIC_DEVEL
61 | except (ImportError, AttributeError):
62 | pass
63 |
64 |
65 | class LocalConfig(DevelopmentConfig):
66 |
67 | API_PORT = 5000
68 | DOC_PORT = 5001
69 | SERVERNAME = '127.0.0.1'
70 | API_URL = 'http://%s:%d' % (SERVERNAME, API_PORT)
71 | DOC_URL = 'http://%s:%d' % (SERVERNAME, DOC_PORT)
72 |
73 |
74 | class TestingConfig(LocalConfig):
75 |
76 | TESTING = True
77 |
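The config classes rely on plain class inheritance: a subclass sees every uppercase attribute of its parents and may override it. The sketch below uses trimmed-down copies of the classes above to show which values a `TestingConfig` ends up with. The helper `uppercase_settings` is hypothetical; it imitates how Flask's `config.from_object` collects uppercase attributes, without requiring Flask.

```python
class Config(object):
    ACCESS_TIMEFRAME = 86400
    ACCESS_TIERS = {'free': 10000, 'pro': 50000, 'max': 1000000}


class LocalConfig(Config):
    API_PORT = 5000
    SERVERNAME = '127.0.0.1'
    API_URL = 'http://%s:%d' % (SERVERNAME, API_PORT)


class TestingConfig(LocalConfig):
    TESTING = True


def uppercase_settings(config):
    """Collect all uppercase attributes of a config class into a dict.

    Hypothetical helper, similar in spirit to Flask's from_object.
    """
    return dict((key, getattr(config, key))
                for key in dir(config) if key.isupper())


settings = uppercase_settings(TestingConfig)
print(settings['API_URL'])
print(settings['ACCESS_TIERS']['free'])
```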
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Content-API
2 |
3 | ZEIT ONLINE content API: middleware built on [Flask](http://flask.pocoo.org/)
4 | and [Solr](http://lucene.apache.org/solr/)
5 |
6 |
7 | ## Frequently asked questions
8 |
9 | ### What is this?
10 | This is the ZEIT ONLINE content API project as available at
11 | https://developer.zeit.de. It enables you to access articles and corresponding
12 | metadata from the ZEIT newspaper archive, as well as recent articles from ZEIT
13 | and ZEIT ONLINE.
14 |
15 | ### Is it ready?
16 | It is still in public beta. We are working on improving the quality of our data
17 | as well as the stability of our code. You are welcome to test and experiment
18 | with the API, take a look at the code, file a bug report or request a feature.
19 |
20 | ### Where are the docs?
21 | You can find the documentation on our developer portal at
22 | https://developer.zeit.de.
23 |
24 | ### Can I run the API locally?
25 | Yes, but it is tailored to the infrastructure we have here at ZEIT ONLINE,
26 | so it will require some adaptation for other scenarios. See below for
27 | instructions.
28 |
29 | ### How can I get in touch?
30 | We would love to hear your ideas and feedback. Join the discussion in the
31 | [issues section](http://github.com/ZeitOnline/content-api/issues), contact us
32 | on Twitter at [zeitonline_dev](http://twitter.com/zeitonline_dev) or send an
33 | email to [api@zeit.de](mailto:api@zeit.de).
34 |
35 |
36 | ## Getting started
37 |
38 | Check out the repository, change to the project folder and then bootstrap the
39 | project like this:
40 | ```bash
41 | $ ./bootstrap.sh # requires python2.7 and virtualenv to be installed
42 | ```
43 |
44 | Run the server like this:
45 | ```bash
46 | $ bin/api
47 | $ bin/doc
48 | ```
49 |
50 | ## Local Solr
51 |
52 | Run buildout with the development configuration:
53 | ```bash
54 | $ bin/buildout -c dev.cfg
55 | ```
56 |
57 | This will install dependencies, generate executables for the API, the developer
58 | portal and the test suite, and download a Solr package. To start Solr
59 | locally, switch to parts/solr-server and enter:
60 | ```bash
61 | $ java -jar start.jar
62 | ```
63 |
64 | With the search server running, you can now start the test suite:
65 | ```bash
66 | $ bin/tests
67 | ```
68 |
--------------------------------------------------------------------------------
/src/zeit/api/application.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.application
4 | ~~~~~~~~~~~~~~~~~~~~
5 |
6 | This module contains factory functions that return WSGI-compatible app
7 | instances configured for deployment or testing, and local run functions
8 | that do not require a full WSGI server.
9 |
10 | Copyright: (c) 2013 by ZEIT ONLINE.
11 | License: BSD, see LICENSE.md for more details.
12 | """
13 |
14 | import flask
15 |
16 | from . import settings, blueprints
17 |
18 |
19 | def make_app(blueprint, config):
20 | """Configure a flask instance with a given blueprint and configuration."""
21 | app = flask.Flask(import_name=__name__)
22 | app.register_blueprint(blueprint)
23 | app.url_map.strict_slashes = False
24 | app.config.from_object(config)
25 | return app
26 |
27 |
28 | def run_local_api():
29 | """Run an api server instance on a local development server."""
30 | cfg = settings.LocalConfig()
31 | app = make_app(blueprints.api_server, settings.LocalConfig)
32 | app.run(host=cfg.SERVERNAME, port=cfg.API_PORT, debug=True)
33 |
34 |
35 | def run_local_doc():
36 | """Run a developer portal instance on a local development server."""
37 | cfg = settings.LocalConfig()
38 | app = make_app(blueprints.developer_portal, settings.LocalConfig)
39 | app.run(host=cfg.SERVERNAME, port=cfg.DOC_PORT, debug=True)
40 |
41 |
42 | def test_client_factory():
43 | """Return a client instance for automated testing."""
44 | app = make_app(blueprints.api_server, settings.TestingConfig)
45 | return app.test_client()
46 |
47 |
48 | def api_factory(global_config, **local_conf):
49 | """Return an api server instance configured for production."""
50 | return make_app(blueprints.api_server, settings.ProductionConfig)
51 |
52 |
53 | def doc_factory(global_config, **local_conf):
54 | """Return a developer portal instance configured for production."""
55 | return make_app(blueprints.developer_portal, settings.ProductionConfig)
56 |
57 |
58 | def api_dev_factory(global_config, **local_conf):
59 | """Return an api server instance configured for development."""
60 | return make_app(blueprints.api_server, settings.DevelopmentConfig)
61 |
62 |
63 | def doc_dev_factory(global_config, **local_conf):
64 | """Return a developer portal instance configured for development."""
65 | return make_app(blueprints.developer_portal, settings.DevelopmentConfig)
66 |
--------------------------------------------------------------------------------
/src/zeit/api/util.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.util
4 | ~~~~~~~~~~~~~
5 |
6 | This module provides helper functions used throughout the application.
7 |
8 | Copyright: (c) 2013 by ZEIT ONLINE.
9 | License: BSD, see LICENSE.md for more details.
10 | """
11 |
12 | import re
13 | import urllib
14 | import urlparse
15 |
16 |
17 | def append_to_csv(csv, new):
18 | """Append a new value to a comma-separated value list stored as a string."""
19 |
20 | if csv == '':
21 | return new
22 | arr = csv.split(',')
23 | arr.append(new)
24 | return ','.join(arr)
25 |
26 |
27 | def csv_to_list(csv):
28 | """Yield the non-empty entries of a comma-separated string."""
29 |
30 | for val in csv.split(','):
31 | if val != '':
32 | yield val
33 |
34 |
35 | def dict_by_list(dic, arr):
36 | """Filter a dictionary by a list of valid keys."""
37 |
38 | for key in dic:
39 | if key in arr:
40 | yield key, dic[key]
41 |
42 |
43 | def ensure_prefix(string, prefix):
44 | """Ensure that a string is prefixed by another string."""
45 |
46 | if string[:len(prefix)] == prefix:
47 | return string
48 | return prefix + string
49 |
50 |
51 | def iri_to_uri(iri):
52 | """Convert an Internationalized Resource Identifier to a URI."""
53 |
54 | parts = urlparse.urlparse(iri)
55 | return urlparse.urlunparse(
56 | part.encode('idna') if parti == 1 else
57 | url_encode_non_ascii(part.encode('utf-8'))
58 | for parti, part in enumerate(parts)
59 | )
60 |
61 |
62 | def save_xpath(element, xpath, fallback=''):
63 | """Safely return the first result of an xpath expression."""
64 |
65 | try:
66 | return element.xpath(xpath)[0]
67 | except IndexError:
68 | return fallback
69 |
70 |
71 | def url_encode(data):
72 | """Safely encode a dictionary to a URL compatible string."""
73 |
74 | param = dict()
75 | for key, val in data.iteritems():
76 | if key == 'facet.field':
77 | param[key] = list(csv_to_list(val))
78 | elif isinstance(val, int):
79 | param[key] = '%d' % val
80 | else:
81 | param[key] = val
82 | return urllib.urlencode(param, True)
83 |
84 |
85 | def url_encode_non_ascii(href):
86 | """URL-encode non ascii characters."""
87 |
88 | function = lambda c: '%%%02x' % ord(c.group(0))
89 | return re.sub('[\x80-\xFF]', function, href)
90 |
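The special handling of `facet.field` in `url_encode` is easiest to understand from its output. The module above targets Python 2 (`urllib.urlencode`, `iteritems`); the sketch below is a Python 3 port made for illustration only, with `csv_to_list` returning a list instead of a generator and the integer branch left out, because `urlencode` converts integers on its own.

```python
from urllib.parse import urlencode  # Python 3 home of urllib.urlencode


def csv_to_list(csv):
    """Return the non-empty entries of a comma-separated string."""
    return [val for val in csv.split(',') if val != '']


def url_encode(data):
    """Encode a dict as a query string, repeating facet.field per value."""
    param = {}
    for key, val in data.items():
        if key == 'facet.field':
            # A comma-separated value becomes one parameter per entry.
            param[key] = csv_to_list(val)
        else:
            param[key] = val
    # doseq=True emits a separate key=value pair for each list entry.
    return urlencode(param, doseq=True)


query = url_encode({'facet.field': 'author,,keyword', 'rows': 10})
print(query)
```

Solr expects multiple facet fields as repeated `facet.field` parameters, which is why the comma-separated value is split before encoding.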
--------------------------------------------------------------------------------
/src/zeit/api/templates/index.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block docTitle %}ZEIT ONLINE Content API{% endblock %}
3 | {% block docBody %}
4 |
5 |
Visualization by Gregor Aisch
6 |
7 |
8 |
9 | Willkommen zu der Beta-Version der ZEIT ONLINE Content API.
10 | Über diese Schnittstelle kann auf Inhalte und zugehörige Metadaten aus
11 | dem Archiv der ZEIT seit 1946 und von ZEIT ONLINE zugegriffen werden.
12 | Wir laden alle an Programmierung, Datenanalyse und Visualisierung
13 | Interessierten ein, den Index aus mehreren hunderttausenden Artikeln zu
14 | erkunden. Die Benutzung der Schnittstelle ist für nicht-kommerzielle
15 | Anwendungsfälle kostenlos. Bitte beachten Sie in jedem Falle die
16 | Nutzungsbedingungen.
17 |
18 |
19 | Um gleich mit dem Erstellen von Anwendungen oder Auswertungen zu beginnen,
20 | können Sie sich unter Quickstart
21 | einen API-Key freischalten lassen. Der
22 | Explorer ermöglicht Ihnen die
23 | Befragung der Schnittstelle über ein einfaches Web-Interface. Mehr über die
24 | Funktionen und Endpunkte der API erfahren Sie in der englischen
25 | Dokumentation.
26 |
27 |
28 | Wir laden Sie ein, uns Ihre Umsetzungen, Experimente und
29 | Erfahrungen mit der API via Email
30 | oder
31 | Twitter mitzuteilen. Aktuelle Informationen und spannende Projekte mit
32 | der API finden Sie auf dem ZEIT ONLINE
33 | Dev-Blog.
34 |
35 |
36 |
37 | Welcome to the beta version of the ZEIT ONLINE content API. Via the API you
38 | can access articles and corresponding metadata from the ZEIT archive dating
39 | back to 1946 and from ZEIT ONLINE. We invite everybody interested in
40 | programming, data analysis and visualization to explore the index of
41 | hundreds of thousands of articles. The API can be used for free for
42 | non-commercial use cases. Please refer to the
43 | terms of use (German only at the
44 | moment).
45 |
46 |
47 | To get started building apps or analysing data, just request an API key
48 | in our quickstart guide. The
49 | explorer will allow you to query
50 | the API via a simple web interface. Full
51 | documentation of the endpoints and their capabilities is also available.
52 |
53 |
54 | We invite you to share your apps, experiments and experiences with the API
55 | with us via email and
56 | Twitter.
57 | More news and a roundup of projects using our API can be found at the
58 | ZEIT ONLINE Dev-Blog.
59 |
64 | This endpoint provides a pre-filtered search for all articles written for a
65 | specific series. Deeper queries, pagination and partial field selection can
66 | be used to narrow down the match list.
67 |
8 | This endpoint lets you search for publication products by name. The product
9 | field of an article provides information about where the article was first
10 | published. An asterisk can serve as a wildcard or whitespace in your search
11 | query.
12 |
69 | This endpoint provides a pre-filtered search for all articles published in a
70 | specific product. Deeper queries, pagination and partial field selection can
71 | be used to narrow down the match list.
72 |
7 | This endpoint lets you search for authors by their name. The result is a list
8 | of possible matches with a displayable value, a URI to an article overview
9 | within the API and, if available, a link to all articles by this author on
10 | www.zeit.de.
11 |
67 | This endpoint provides a pre-filtered search for all articles written by a
68 | specific author. Deeper queries, pagination and partial field selection can
69 | be used to narrow down the match list.
70 |
126 |
127 | {% endblock %}
128 |
--------------------------------------------------------------------------------
/src/zeit/api/exception.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.exception
4 | ~~~~~~~~~~~~~~~~~~
5 |
6 | This module contains custom JSON exceptions. Not to be confused with the
7 | similarly named flask module 'exceptions'. Exceptions correspond to
8 | standard HTTP status codes.
9 |
10 | Copyright: (c) 2013 by ZEIT ONLINE.
11 | License: BSD, see LICENSE.md for more details.
12 | """
13 |
14 | from flask.exceptions import JSONHTTPException
15 | from werkzeug.exceptions import (escape, NotFound, MethodNotAllowed,
16 | Unauthorized, BadRequest, InternalServerError, ServiceUnavailable)
17 | from flask import request
18 |
19 |
20 | class JSONTooManyRequests(JSONHTTPException):
21 | code = 429
22 | description = 'You have reached your request quota.'
23 |
24 | @property
25 | def name(self):
26 | return 'Too Many Requests'
27 |
28 |
29 | class JSONUnauthorized(JSONHTTPException, Unauthorized):
30 | description = 'The provided api key seems to be invalid.'
31 |
32 |
33 | class JSONEndpointNotFound(JSONHTTPException, NotFound):
34 | description = 'The requested endpoint is not defined.'
35 |
36 |
37 | class JSONResourceNotFound(JSONHTTPException, NotFound):
38 | description = 'The requested resource can not be found.'
39 |
40 |
41 | class JSONMethodNotAllowed(JSONHTTPException, MethodNotAllowed):
42 | def get_description(self, environ):
43 | m = escape(environ.get('REQUEST_METHOD', 'GET'))
44 | return 'The method %s is not allowed for the requested URL.' % m
45 |
46 |
47 | class JSONBadRequest(JSONHTTPException, BadRequest):
48 | description = 'The request cannot be fulfilled due to bad syntax.'
49 |
50 |
51 | class JSONInternalServerError(JSONHTTPException, InternalServerError):
52 | description = 'Due to an internal error the request could not be fulfilled.'
53 |
54 |
55 | class JSONServiceUnavailable(JSONHTTPException, ServiceUnavailable):
56 | description = 'The service is currently unavailable.'
57 |
58 |
59 | def too_many_requests(error=None):
60 | excp = JSONTooManyRequests()
61 | message = '%d: %s' % (excp.code, error or excp.description)
62 | print >> request.environ['wsgi.errors'], message
63 | return excp
64 |
65 |
66 | def unauthorized(error=None):
67 | excp = JSONUnauthorized()
68 | message = '%d: %s' % (excp.code, error or excp.description)
69 | print >> request.environ['wsgi.errors'], message
70 | return excp
71 |
72 |
73 | def endpoint_not_found(error=None):
74 | excp = JSONEndpointNotFound()
75 | message = '%d: %s' % (excp.code, error or excp.description)
76 | print >> request.environ['wsgi.errors'], message
77 | return excp
78 |
79 |
80 | def resource_not_found(error=None):
81 | excp = JSONResourceNotFound()
82 | message = '%d: %s' % (excp.code, error or excp.description)
83 | print >> request.environ['wsgi.errors'], message
84 | return excp
85 |
86 |
87 | def method_not_allowed(error=None):
88 | excp = JSONMethodNotAllowed()
89 | message = '%d: %s' % (excp.code, error or excp.description)
90 | print >> request.environ['wsgi.errors'], message
91 | return excp
92 |
93 |
94 | def bad_request(error=None):
95 | excp = JSONBadRequest()
96 | message = '%d: %s' % (excp.code, error or excp.description)
97 | print >> request.environ['wsgi.errors'], message
98 | return excp
99 |
100 |
101 | def internal_server_error(error=None):
102 | excp = JSONInternalServerError()
103 | message = '%d: %s' % (excp.code, error or excp.description)
104 | print >> request.environ['wsgi.errors'], message
105 | return excp
106 |
107 |
108 | def service_unavailable(error=None):
109 | excp = JSONServiceUnavailable()
110 | message = '%d: %s' % (excp.code, error or excp.description)
111 | print >> request.environ['wsgi.errors'], message
112 | return excp
113 |
--------------------------------------------------------------------------------
/src/zeit/api/templates/docs/department.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block docTitle %}Department{% endblock %}
3 | {% block docBody %}
4 |
5 |
Search all departments
6 |
7 |
8 | Departments correspond to the sections of our homepage and to those of the
9 | printed issues. There are top-level departments and sub-departments. Both are
10 | accessible via this endpoint. Sub-departments have a link to their parent
11 | attached to them. An asterisk can serve as a wildcard or whitespace in your query.
12 |
66 | This endpoint provides a pre-filtered search for all articles belonging to a
67 | specific department. Deeper queries, pagination and partial field selection
68 | can be used to narrow down the match list.
69 |
8 | This endpoint lets you search for keywords that semantically summarize
9 | articles. An asterisk can serve as a wildcard or whitespace in your search
10 | query. Keywords are weighted by their overall frequency with a score from 0
11 | to 100. Possible keyword types are: location, person, organisation or issue.
12 |
70 | This endpoint provides a pre-filtered search for all articles tagged with a
71 | specific keyword. Deeper queries, pagination and partial field selection can
72 | be used to narrow down the match list.
73 |
133 |
134 | {% endblock %}
135 |
--------------------------------------------------------------------------------
/src/zeit/api/metadata.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | zeit.api.metadata
4 | ~~~~~~~~~~~~~~~~~
5 |
6 | This module contains methods that maintain metadata entities by parsing XML
7 | files and updating the API's SQLite database.
8 |
9 | Copyright: (c) 2013 by ZEIT ONLINE.
10 | License: BSD, see LICENSE.md for more details.
11 | """
12 |
13 | import urllib
14 |
15 | from flask import g, current_app
16 | from lxml import etree
17 |
18 | from .util import iri_to_uri, save_xpath
19 |
20 |
21 | def __update_product(product):
22 | """Update the given product entity."""
23 | product_id = save_xpath(product, './@id').lower()
24 | uri = '%s/product/%s' % (current_app.config['API_URL'], product_id)
25 | value = save_xpath(product, 'text()')
26 | href = save_xpath(product, './@href')
27 | query = 'REPLACE INTO product VALUES (?, ?, ?, ?);'
28 | g.db.execute(query, (href, product_id, uri, value))
29 |
30 |
31 | def __update_series(series):
32 | """Update the given series entity."""
33 | series_id = save_xpath(series, './@url')
34 | uri = '%s/series/%s' % (current_app.config['API_URL'], series_id)
35 | value = save_xpath(series, './@title')
36 | name = save_xpath(series, './@serienname')
37 | query = 'REPLACE INTO series VALUES (?, ?, ?, ?, ?);'
38 | href = 'http://www.zeit.de/serie/%s' % series_id
39 | g.db.execute(query, (href, series_id, name, uri, value))
40 |
41 |
42 | def __update_keyword(keyword, ranks, types):
43 |     """Update the given keyword entity."""
44 |     kw_id = save_xpath(keyword, './@url_value')
45 |     uri = '%s/keyword/%s' % (current_app.config['API_URL'], kw_id)
46 |     value = save_xpath(keyword, 'text()')
47 |     lexical = save_xpath(keyword, './@lexical_value')
48 |     kw_type = save_xpath(keyword, './@type')
49 |     kw_type = 'subject' if kw_type in ['free', 'topic'] else kw_type.lower()
50 |     score = (ranks.index(int(save_xpath(keyword, './@freq'))) + 1)
51 |     score = int(100.0 / len(ranks) * score)
52 |     href = 'http://www.zeit.de/schlagworte/%s/%s/index' % (types[kw_type], kw_id)
53 |     query = 'REPLACE INTO keyword VALUES (?, ?, ?, ?, ?, ?, ?);'
54 |     g.db.execute(query, (href, kw_id, lexical, score, kw_type, uri, value))
55 |
56 |
57 | def __update_department(department):
58 |     """Update the given department entity."""
59 |     dept_id = save_xpath(department, './@label')
60 |     if dept_id in ['startseite']:
61 |         return
62 |     uri = '%s/department/%s' % (current_app.config['API_URL'], dept_id)
63 |     value = save_xpath(department, 'text()')
64 |     href = save_xpath(department, './@href')[19:].split('/', 1)[0]
65 |     parent = href if href != dept_id else ''
66 |     path = parent + '/' + dept_id if parent else dept_id
67 |     href = 'http://www.zeit.de/%s/index' % path
68 |     query = 'REPLACE INTO department VALUES (?, ?, ?, ?, ?);'
69 |     g.db.execute(query, (href, dept_id, parent, uri, value))
70 |
71 |
72 | def __update_author(author):
73 |     """Update the given author entity."""
74 |     value = save_xpath(author, './@name')
75 |     author_id = value.replace(' ', '-')
76 |     uri = '%s/author/%s' % (current_app.config['API_URL'], author_id)
77 |     initial = value.split(' ')[-1] or 'A'
78 |     href_raw = 'http://www.zeit.de/autoren/%s/%s/index.xml'
79 |     href = href_raw % (initial[0], value.replace(' ', '_'))
80 |     href = href if urllib.urlopen(iri_to_uri(href)).getcode() == 200 else ''
81 |     query = 'REPLACE INTO author VALUES (?, ?, ?, ?, ?);'
82 |     g.db.execute(query, (href, author_id, 'author', uri, value))
83 |
84 |
85 | def update():
86 |     """Update metadata of all categories and write changes to database."""
87 |     products = current_app.config['PRODUCT_ALPHABET']
88 |     for p in etree.parse(products).xpath('//product'):
89 |         __update_product(p)
90 |
91 |     series = current_app.config['SERIES_ALPHABET']
92 |     for s in etree.parse(series).xpath('//series'):
93 |         print s
94 |         __update_series(s)
95 |
96 |     keywords = current_app.config['KEYWORD_ALPHABET']
97 |     parsed_keywords = etree.parse(keywords).xpath('//tag')
98 |     ranks = sorted(set(int(save_xpath(k, './@freq')) for k in parsed_keywords))
99 |     types = {'location': 'orte', 'person': 'personen', 'subject': 'themen',
100 |              'organization': 'organisationen'}
101 |     for k in parsed_keywords:
102 |         print k
103 |         __update_keyword(k, ranks, types)
104 |
105 |     depts = current_app.config['DEPARTMENT_ALPHABET']
106 |     for d in etree.parse(depts).xpath('/lists/list[@id="sitemap"]//link'):
107 |         print d
108 |         __update_department(d)
109 |
110 |     url = '/select?q=*:*&facet=true&facet.field=author'
111 |     url += '&facet.limit=1000000&rows=0&facet.mincount=1'
112 |     authors = current_app.config['SOLR_URL'] + url
113 |     for a in etree.parse(authors).xpath('//lst[@name="author"]/int'):
114 |         print a
115 |         __update_author(a)
116 |
--------------------------------------------------------------------------------
/src/zeit/api/templates/docs/content.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | {% block docTitle %}Content{% endblock %}
3 | {% block docBody %}
4 |
5 |
Search for content
6 |
7 |
8 | This endpoint exposes an unfiltered search for articles. You can set search
9 | queries, paginate, sort and select a subset of the fields that should be
10 | returned. Articles that match your query are returned in the
11 | matches array with a reduced set of meta data. The full set of data
12 | is only available at /content/{id}.
13 |
{
50 | "matches": [
51 | {
52 | "subtitle": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
53 | sed diam nonumy eirmod tempor invidunt.",
54 | "uuid": "1111122299xxcc99aa",
55 | "title": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr.",
56 | "href": "http://www.zeit.de/lorem/ipsum/2012-12/Lorem-ipsum-dolor-sit",
57 | "release_date": "2011-12-01T14:40:00.000Z",
58 | "uri": "{{ api_url }}/content/1111122299xxcc99aa",
59 | "snippet": "tempor invidunt ut labore et dolore magna aliquyam
60 | erat, sed diam voluptua",
61 | "supertitle": "Lorem Ipsum",
62 | "teaser_title": "Lorem ipsum dolor sit amet, consetetur sadipscing.",
63 | "teaser_text": "Stet clita kasd gubergren, no sea takimata sanctus est
64 | Lorem ipsum dolor sit amet."
65 | }
66 | ],
67 | "found": 148,
68 | "limit": 1,
69 | "offset": 0
70 | }
71 |
72 |
73 |
74 |
Get content by ID
75 |
76 |
77 | Requesting a content object by its ID will get you all available data for
78 | that article. Partial field selection is available if not all fields are
79 | of interest.
80 |
6 | You can browse the API with this interactive explorer. Select an endpoint
7 | to see its available parameters and their default values. If you have
8 | requested an API key, it is
9 | inserted automatically in the form below.
10 |
Accessing our content requires an API key. To request one, simply
8 | sign up with your full name and a valid email address. A key will be
9 | generated for you right here on this page. At the moment we offer free
10 | API access with a limit of 10,000 requests per day.
127 | All requests to our API must be authorized, so we know who plays with our
128 | data. To keep URLs simple and clean, the key should be sent as an
129 | X-Authorization header attached to your HTTP request.
130 |
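As a minimal sketch of the header-based authorization described above, the following Python 3 snippet attaches the key to a request. The base URL and the key value are placeholders, not real values, and `build_request` is a hypothetical helper name.

```python
import urllib.request

API_URL = "http://api.example.org"  # hypothetical base URL, replace with the real one
API_KEY = "your-api-key"            # placeholder, use the key you signed up for


def build_request(endpoint, api_key=API_KEY, base_url=API_URL):
    """Return a Request for `endpoint` that carries the key as X-Authorization."""
    request = urllib.request.Request(base_url + endpoint)
    request.add_header("X-Authorization", api_key)
    return request


request = build_request("/content")
# urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
```

Sending the key as a header keeps it out of the URL, so links to API resources can be shared without leaking credentials.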
150 | The supported content types for now are JSON and
152 | JSONP, simple as that. The default is JSON. To get JSONP, a
153 | callback parameter specifying the function's name is required.
154 |
155 |
156 |
Example
157 |
GET /{endpoint}?callback=myCallbackName HTTP/1.1
158 | Response: myCallbackName({"result":"data"});
159 |
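JSONP is mainly meant for browsers, but a script that receives such a response can unwrap it by hand. The sketch below assumes the body has exactly the shape shown in the example, that is, the callback name, an opening parenthesis, the JSON payload, a closing parenthesis and an optional semicolon. `unwrap_jsonp` is a hypothetical helper, not part of the API.

```python
import json


def unwrap_jsonp(body, callback):
    """Strip the `callback(...)` wrapper from a JSONP body and parse the JSON."""
    text = body.strip()
    prefix = callback + "("
    if not text.startswith(prefix):
        raise ValueError("body does not start with %r" % prefix)
    # Drop the prefix and an optional trailing semicolon.
    text = text[len(prefix):].rstrip(";").rstrip()
    if not text.endswith(")"):
        raise ValueError("body is not terminated by ')'")
    return json.loads(text[:-1])


data = unwrap_jsonp('myCallbackName({"result":"data"});', "myCallbackName")
```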
160 |
161 |
Error handling
162 |
If there was something wrong with your request or if, for some reason, we
163 | dropped the ball, you will receive an appropriate HTTP status code. The
164 | body will contain a JSON-encoded description of what might have been the
165 | problem.
166 |
167 |
168 |
Example
169 |
HTTP/1.0 401 UNAUTHORIZED
170 | Content: {"description": "The provided API key seems to be invalid."}
171 |
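A client can pull the message out of such a response as sketched below. The only thing taken from the documentation is the `description` key in the JSON body. The helper name and the fallback to the raw body are assumptions for illustration.

```python
import json


def error_description(status_code, body):
    """Return the API's error description for a failed response, else None."""
    if status_code < 400:
        return None  # not an error response
    try:
        payload = json.loads(body)
    except ValueError:
        return body  # body was not JSON, fall back to the raw text
    if isinstance(payload, dict):
        return payload.get("description", body)
    return body


message = error_description(
    401, '{"description": "The provided API key seems to be invalid."}'
)
```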
172 |
173 |
174 |
175 |
Start exploring
176 |
177 | Now that you have learned the basics, head over to our API Explorer or dig into the
179 | documentation.
180 |
114 | The q parameter supports simple Solr query syntax for the
115 | /content endpoint and all /{ep}/{id} endpoints.
116 | Here are a few basics to get you started. For more details check out the
117 |
118 | Apache Solr documentation.
119 |
120 |
121 | Full text search: You can search the entire article text and all
122 | meta data simply by setting the query parameter to your search phrase.
123 | John+Fitzgerald+Kennedy will search for multiple tokens and
124 | "John%20Fitzgerald%20Kennedy" will search for the entire
125 | string.
126 |
127 |
128 | Field queries: All fields of an article can be queried individually.
129 | For example, to get articles that have the word "Kennedy" in their headline,
130 | you would search for title:"Kennedy".
131 |
132 |
133 | Boolean operators: To form a boolean expression of multiple queries,
134 | connect them with an AND or an OR. So if you
135 | want to search the subtitle for "Kennedy" as well, you can modify the query
136 | to title:"Kennedy" AND subtitle:"Kennedy".
137 |
138 |
139 | Range queries: You can specify a date range for your query using
140 | full ISO 8601 date syntax. Getting all Kennedy-related articles from
141 | the 60s would work like this:
142 | "Kennedy" AND release_date:[1960-01-01T00:00:00Z TO
143 | 1969-12-31T23:59:59.999Z].
144 |
145 |
146 | Non-content search: All other endpoints currently only support
147 | simple search phrases with asterisk wildcards. So, a search for Kennedy-related
148 | related keywords goes like this: /keyword?q=*Kennedy
149 |
150 |
151 |
152 |
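The query forms listed above can be assembled and URL-encoded in a few lines of Python 3. This is only a sketch: `field_query`, `date_range`, `all_of` and `content_url` are hypothetical helper names, and the path is relative to whatever base URL the API is served from.

```python
from urllib.parse import urlencode


def field_query(name, value):
    """Build a field query such as title:"Kennedy"."""
    return '%s:"%s"' % (name, value)


def date_range(name, start, end):
    """Build a range query such as release_date:[START TO END]."""
    return "%s:[%s TO %s]" % (name, start, end)


def all_of(*queries):
    """Join several queries with the boolean AND operator."""
    return " AND ".join(queries)


def content_url(q):
    """Return the relative /content URL with the query properly encoded."""
    return "/content?" + urlencode({"q": q})


sixties = all_of(
    '"Kennedy"',
    date_range("release_date", "1960-01-01T00:00:00Z", "1969-12-31T23:59:59.999Z"),
)
url = content_url(sixties)
```

Letting `urlencode` handle quotes, colons and brackets avoids hand-escaping mistakes in the query string.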
153 |
Pagination
154 |
155 |
156 | Search results are limited to 10 matches by default. You can increase this
157 | value with the limit parameter. To iterate over the result set,
158 | repeat your request with the offset parameter set to multiples
159 | of the limit.
160 |
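The iteration described above can be sketched as a small generator. It assumes the total number of results is read from the `found` field of a first response. `page_params` is a hypothetical helper name.

```python
def page_params(found, limit=10):
    """Yield the limit/offset pairs needed to walk through `found` results."""
    for offset in range(0, found, limit):
        yield {"limit": limit, "offset": offset}


# With 148 results and 50 matches per request, three requests are needed.
pages = list(page_params(148, limit=50))
```

Each dictionary can be merged into the request parameters of one call.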
161 |
162 |
163 |
164 |
Partial Selection
165 |
166 |
167 | By default, the API returns all available fields for your request. To speed
168 | things up, you can specify which fields the server should return using the
169 | fields parameter. If the request returns an array of matches,
170 | this setting only affects entries within that array. So, for example the
171 | field found cannot be deselected, as it always resides outside
172 | of the matches array.
173 |
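To illustrate the scope of the selection, the following Python 3 sketch imitates the described behaviour on the client side. It is not the server's implementation, and `select_fields` is a hypothetical name. It only shows that fields outside the matches array, such as `found`, stay untouched.

```python
def select_fields(response, fields):
    """Imitate partial selection: filter match entries, keep outer fields."""
    if "matches" not in response:
        # Single object: the selection applies to the object itself.
        return {key: value for key, value in response.items() if key in fields}
    selected = dict(response)
    selected["matches"] = [
        {key: value for key, value in match.items() if key in fields}
        for match in response["matches"]
    ]
    return selected


response = {
    "matches": [{"title": "Lorem ipsum", "uuid": "1111122299xxcc99aa"}],
    "found": 148,
    "limit": 1,
    "offset": 0,
}
reduced = select_fields(response, ["title"])
```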
174 |
175 |
176 |
177 |
Sorting
178 |
179 |
180 | Search results, namely those of the /content endpoint, can be
181 | sorted using the sort parameter. Any of the returned fields are
182 | sortable. Direction keywords are asc and desc for
183 | an ascending or descending sort order respectively. Multiple sort orders are
184 | accepted as a comma-separated list, for example: sort=release_date
185 | asc, uuid desc.
186 |
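A sort value in the comma-separated form shown above can be built as follows. This is a sketch in Python 3, and `sort_param` is a hypothetical helper name. Only the `asc` and `desc` keywords come from the documentation.

```python
from urllib.parse import urlencode


def sort_param(*orders):
    """Join (field, direction) pairs into a comma-separated sort value."""
    parts = []
    for field, direction in orders:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        parts.append("%s %s" % (field, direction))
    return ", ".join(parts)


sort_value = sort_param(("release_date", "asc"), ("uuid", "desc"))
query_string = urlencode({"sort": sort_value})  # encoded for use in a URL
```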
187 |
188 |
189 |
190 |
Faceting
191 |
192 |
193 | The content endpoint also provides a faceting interface. You can display
194 | facets for specific fields and for date ranges. The
195 | facet_field parameter is used to get a frequency distribution
196 | for the different values of a field. The facet_date parameter
197 | returns counts for the distribution over a specified date range. The
198 | faceting results are limited by your search query, but not by pagination
199 | parameters.
200 |
201 |
202 |
203 |
204 |
Parameter
205 |
Possible values
206 |
207 |
208 |
facet_field
209 |
keyword, author, series, department, product or any combination
210 |
211 |
212 |
facet_date
213 |
1day, 7day, 1month, 1year, 10year or any numerical variation