├── __init__.py ├── .coveragerc ├── tests ├── __init__.py ├── test_exceptions.py ├── test_client.py └── test_wrapper.py ├── setup.cfg ├── .bumpversion.cfg ├── tox.ini ├── MANIFEST.in ├── docs ├── source │ ├── installation.rst │ ├── index.rst │ ├── conf.py │ └── usage.rst └── Makefile ├── requirements.txt ├── AUTHORS.md ├── .travis.yml ├── .gitignore ├── scrapyd_api ├── __init__.py ├── exceptions.py ├── compat.py ├── constants.py ├── client.py └── wrapper.py ├── LICENSE ├── setup.py ├── Makefile ├── HISTORY.md ├── CONTRIBUTING.md └── README.md /__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.coveragerc: -------------------------------------------------------------------------------- 1 | [report] 2 | omit = 3 | scrapyd_api/compat.py 4 | -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [easy_install] 2 | zip_ok = false 3 | 4 | [wheel] 5 | universal = 1 6 | -------------------------------------------------------------------------------- /.bumpversion.cfg: -------------------------------------------------------------------------------- 1 | [bumpversion] 2 | current_version = 2.1.2 3 | commit = True 4 | tag = True 5 | 6 | [bumpversion:file:README.md] 7 | 8 | [bumpversion:file:scrapyd_api/__init__.py] 9 | 10 | [bumpversion:file:setup.py] 11 | 12 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = 3 | py26, 4 | py27, 5 | py33, 6 | py34 7 | 8 | [testenv] 9 | setenv = 10 | PYTHONPATH = {toxinidir}:{toxinidir}/python-scrapyd-api 11 | commands = python setup.py test 12 | deps = 13 | -r{toxinidir}/requirements.txt 14 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include AUTHORS.md 2 | include CONTRIBUTING.md 3 | include HISTORY.md 4 | include LICENSE 5 | include README.md 6 | 7 | recursive-include tests * 8 | recursive-exclude * __pycache__ 9 | recursive-exclude * *.py[co] 10 | 11 | recursive-include docs *.rst conf.py Makefile 12 | -------------------------------------------------------------------------------- /docs/source/installation.rst: -------------------------------------------------------------------------------- 1 | ============ 2 | Installation 3 | ============ 4 | 5 | The package is available via the Python Package Index and can be installed in 6 | the usual ways:: 7 | 8 | $ easy_install python-scrapyd-api 9 | 10 | or:: 11 | 12 | $ pip install python-scrapyd-api 13 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # Functionality 2 | requests==2.4.1 3 | 4 | # Development 5 | wheel 6 | twine 7 | bumpversion 8 | 9 | # Docs 10 | sphinx==1.2.3 11 | sphinx_rtd_theme==0.1.6 12 | 13 | # Testing 14 | pytest==2.6.2 15 | pytest-sugar==0.3.4 16 | pytest-cov==1.8.0 17 | mock==1.0.1 18 | 
responses==0.2.2 19 | -------------------------------------------------------------------------------- /tests/test_exceptions.py: -------------------------------------------------------------------------------- 1 | from scrapyd_api.exceptions import ScrapydError 2 | 3 | 4 | def test_scrapyd_error(): 5 | err = ScrapydError() 6 | assert repr(err) == 'ScrapydError("Scrapyd Error")' 7 | err_with_detail = ScrapydError(detail='Something went wrong') 8 | assert repr(err_with_detail) == 'ScrapydError("Something went wrong")' 9 | -------------------------------------------------------------------------------- /AUTHORS.md: -------------------------------------------------------------------------------- 1 | # Authors 2 | 3 | ## Initial Work 4 | 5 | * Darian Moody ([mail](mailto:mail@djm.org.uk), [github](https://github.com/djm)) 6 | 7 | ## Contributors 8 | 9 | * Bruce Tang ([github](https://github.com/BruceDone)) 10 | * Tomas Linhart ([github](https://github.com/tlinhart)) 11 | * Serhii Berebko ([github](https://github.com/serbernar)) 12 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # Config file for automatic testing at travis-ci.org 2 | # and providing coverage reporting at coveralls.io 3 | 4 | language: python 5 | 6 | python: 7 | - "2.6" 8 | - "2.7" 9 | - "3.3" 10 | - "3.4" 11 | 12 | install: 13 | - pip install -r requirements.txt 14 | - pip install coveralls 15 | script: 16 | py.test --cov scrapyd_api tests/ 17 | after_success: 18 | coveralls 19 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | 3 | # Mac OSX 4 | .DS_Store 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Packages 10 | *.egg 11 | *.egg-info 12 | dist 13 | build 14 | eggs 15 | parts 16 | bin 17 | var 18 | sdist 19 | develop-eggs 20 | .installed.cfg 21 | lib 22 | lib64 23 | 24 | # Installer logs 25 | pip-log.txt 26 | 27 | # Unit test / coverage reports 28 | .coverage 29 | .tox 30 | nosetests.xml 31 | htmlcov 32 | 33 | # Translations 34 | *.mo 35 | 36 | # Complexity 37 | output/*.html 38 | output/*/index.html 39 | 40 | # Sphinx 41 | docs/_build 42 | 43 | # PyCharm 44 | .idea 45 | -------------------------------------------------------------------------------- /scrapyd_api/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | from __future__ import unicode_literals 4 | 5 | from .constants import ( 6 | FINISHED, 7 | PENDING, 8 | RUNNING 9 | ) 10 | from .exceptions import ScrapydError 11 | from .wrapper import ScrapydAPI 12 | 13 | __author__ = 'Darian Moody' 14 | __email__ = 'mail@djm.org.uk' 15 | __version__ = '2.1.2' 16 | __license__ = 'BSD 2-Clause' 17 | __copyright__ = 'Copyright 2014 Darian Moody' 18 | 19 | VERSION = __version__ 20 | 21 | __all__ = ['ScrapydError', 'ScrapydAPI', 'FINISHED', 'PENDING', 'RUNNING'] 22 | -------------------------------------------------------------------------------- /scrapyd_api/exceptions.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | 4 | class ScrapydError(Exception): 5 | """ 6 | Base class for Scrapyd API exceptions. 
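    An explicit ``detail`` message passed to the constructor takes
    precedence over the class-level ``default_detail`` fallback.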
7 | """ 8 | default_detail = 'Scrapyd Error' 9 | 10 | def __init__(self, detail=None): 11 | self.detail = detail or self.default_detail 12 | 13 | def __str__(self): 14 | return self.detail 15 | 16 | def __repr__(self): 17 | return '{0}("{1}")'.format(self.__class__.__name__, self.detail) 18 | 19 | 20 | class ScrapydResponseError(ScrapydError): 21 | 22 | default_detail = 'Scrapyd Response Error' 23 | -------------------------------------------------------------------------------- /scrapyd_api/compat.py: -------------------------------------------------------------------------------- 1 | # This library is a cut down version of the `six` package; it 2 | # is designed to contain exactly & only what we need to support 3 | # Python 2 & 3 concurrently. 4 | import sys 5 | 6 | PY2 = sys.version_info[0] == 2 7 | PY3 = sys.version_info[0] == 3 8 | 9 | 10 | if PY3: 11 | import io 12 | StringIO = io.StringIO 13 | 14 | def iteritems(d, **kw): 15 | return iter(d.items(**kw)) 16 | else: # PY2 17 | import StringIO 18 | StringIO = StringIO.StringIO 19 | 20 | def iteritems(d, **kw): 21 | return iter(d.iteritems(**kw)) 22 | 23 | try: 24 | # Python 3 25 | from urllib.parse import urljoin 26 | except ImportError: 27 | # Python 2 28 | from urlparse import urljoin 29 | -------------------------------------------------------------------------------- /docs/source/index.rst: -------------------------------------------------------------------------------- 1 | .. complexity documentation master file, created by 2 | sphinx-quickstart on Tue Jul 9 22:26:36 2013. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | Documentation for python-scrapyd-api 7 | ==================================== 8 | 9 | ``python-scrapyd-api`` is a very simple Python wrapper for working with 10 | Scrapyd_'s API_; it allows a Python application to talk to, and therefore 11 | control, the Scrapy Daemon. 12 | 13 | It is built on top of the Requests_ library and supports Python 2.6, 2.7, 3.3 14 | & 3.4. 15 | 16 | .. _Scrapyd: https://github.com/scrapy/scrapyd 17 | .. _API: http://scrapyd.readthedocs.org/en/latest/api.html 18 | .. _Requests: http://python-requests.org 19 | 20 | 21 | Contents 22 | -------- 23 | 24 | .. 
toctree:: 25 | :maxdepth: 2 26 | 27 | installation 28 | usage 29 | -------------------------------------------------------------------------------- /scrapyd_api/constants.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | ADD_VERSION_ENDPOINT = 'add_version' 4 | CANCEL_ENDPOINT = 'cancel' 5 | DELETE_PROJECT_ENDPOINT = 'delete_project' 6 | DELETE_VERSION_ENDPOINT = 'delete_version' 7 | LIST_JOBS_ENDPOINT = 'list_jobs' 8 | LIST_PROJECTS_ENDPOINT = 'list_projects' 9 | LIST_SPIDERS_ENDPOINT = 'list_spiders' 10 | LIST_VERSIONS_ENDPOINT = 'list_versions' 11 | SCHEDULE_ENDPOINT = 'schedule' 12 | DAEMON_STATUS_ENDPOINT = 'daemonstatus' 13 | 14 | DEFAULT_ENDPOINTS = { 15 | ADD_VERSION_ENDPOINT: '/addversion.json', 16 | CANCEL_ENDPOINT: '/cancel.json', 17 | DELETE_PROJECT_ENDPOINT: '/delproject.json', 18 | DELETE_VERSION_ENDPOINT: '/delversion.json', 19 | LIST_JOBS_ENDPOINT: '/listjobs.json', 20 | LIST_PROJECTS_ENDPOINT: '/listprojects.json', 21 | LIST_SPIDERS_ENDPOINT: '/listspiders.json', 22 | LIST_VERSIONS_ENDPOINT: '/listversions.json', 23 | SCHEDULE_ENDPOINT: '/schedule.json', 24 | DAEMON_STATUS_ENDPOINT: '/daemonstatus.json' 25 | } 26 | 27 | FINISHED = 'finished' 28 | PENDING = 'pending' 29 | RUNNING = 'running' 30 | 31 | JOB_STATES = [FINISHED, PENDING, RUNNING] 32 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Darian Moody All rights reserved. 2 | 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 4 | 5 | Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 6 | 7 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 8 | -------------------------------------------------------------------------------- /scrapyd_api/client.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | from requests import Session 4 | 5 | from .exceptions import ScrapydResponseError 6 | 7 | 8 | class Client(Session): 9 | """ 10 | The client is a thin wrapper around the requests Session class which 11 | allows us to wrap the response handler so that we can handle it in a 12 | Scrapyd-specific way. 13 | """ 14 | 15 | def _handle_response(self, response): 16 | """ 17 | Handles the response received from Scrapyd. 
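        Raises ScrapydResponseError for a non-2xx response, for a body that
        does not parse as JSON, and for an explicit 'error' status; otherwise
        returns the parsed JSON payload with its 'status' key removed.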
18 | """ 19 | if not response.ok: 20 | raise ScrapydResponseError( 21 | "Scrapyd returned a {0} error: {1}".format( 22 | response.status_code, 23 | response.text)) 24 | 25 | try: 26 | json = response.json() 27 | except ValueError: 28 | raise ScrapydResponseError("Scrapyd returned an invalid JSON " 29 | "response: {0}".format(response.text)) 30 | if json['status'] == 'ok': 31 | json.pop('status') 32 | return json 33 | elif json['status'] == 'error': 34 | raise ScrapydResponseError(json['message']) 35 | 36 | def request(self, *args, **kwargs): 37 | response = super(Client, self).request(*args, **kwargs) 38 | return self._handle_response(response) 39 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | import sys 4 | from setuptools import setup 5 | from setuptools.command.test import test as TestCommand 6 | 7 | 8 | readme = open('README.md').read() 9 | history = open('HISTORY.md').read() 10 | 11 | 12 | class PyTest(TestCommand): 13 | 14 | def finalize_options(self): 15 | TestCommand.finalize_options(self) 16 | self.test_args = ['tests'] 17 | self.test_suite = True 18 | 19 | def run_tests(self): 20 | import pytest 21 | errno = pytest.main(self.test_args) 22 | sys.exit(errno) 23 | 24 | 25 | if sys.argv[-1] == 'publish': 26 | print("Use `make release` instead.") 27 | sys.exit() 28 | 29 | 30 | setup( 31 | name='python-scrapyd-api', 32 | version='2.1.2', 33 | description='A Python wrapper for working with the Scrapyd API', 34 | keywords='python-scrapyd-api scrapyd scrapy api wrapper', 35 | long_description=readme + '\n\n' + history, 36 | long_description_content_type='text/markdown', 37 | author='Darian Moody', 38 | author_email='mail@djm.org.uk', 39 | url='https://github.com/djm/python-scrapyd-api', 40 | packages=[ 41 | 'scrapyd_api', 42 | ], 43 | package_dir={ 44 | 'scrapyd_api': 'scrapyd_api' 45 | }, 46 | include_package_data=True, 47 | setup_requires=['setuptools>=38.6.0'], 48 | install_requires=[ 49 | 'requests' 50 | ], 51 | license="BSD", 52 | zip_safe=False, 53 | classifiers=[ 54 | 'Development Status :: 4 - Beta', 55 | 'Intended Audience :: Developers', 56 | 'License :: OSI Approved :: BSD License', 57 | 'Natural Language :: English', 58 | "Programming Language :: Python :: 2", 59 | 'Programming Language :: Python :: 2.6', 60 | 'Programming Language :: Python :: 2.7', 61 | 'Programming Language :: Python :: 3', 62 | 'Programming Language :: Python :: 3.3', 63 | 'Programming Language :: Python :: 3.4', 64 | 'Topic :: Internet :: WWW/HTTP', 65 | ], 66 | cmdclass={ 67 | 'test': PyTest 68 | } 69 | ) 70 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean-pyc clean-build docs clean 2 | 3 | help: 4 | @echo "clean-build - remove build artifacts" 5 | @echo "clean-pyc - remove Python file artifacts" 6 | @echo "test - run tests quickly with the default Python" 7 | @echo "test-all - run tests on every Python version with tox" 8 | @echo "coverage - check code coverage quickly with the default Python" 9 | @echo "docs - generate Sphinx HTML documentation, including API docs" 10 | @echo "release - package and upload a release" 11 | @echo "test-release - using TestPyPI" 12 | @echo "dist - package" 13 | 14 | clean: clean-build clean-pyc 15 | rm -fr htmlcov/ 16 | 17 | clean-build: 18 | rm -fr 
build/ 19 | rm -fr dist/ 20 | rm -fr *.egg-info 21 | 22 | clean-pyc: 23 | find . -name '*.pyc' -exec rm -f {} + 24 | find . -name '*.pyo' -exec rm -f {} + 25 | find . -name '*~' -exec rm -f {} + 26 | 27 | test: 28 | python setup.py test 29 | 30 | test-all: 31 | tox 32 | 33 | coverage: 34 | py.test --cov-report html --cov scrapyd_api tests/ 35 | open htmlcov/index.html 36 | 37 | docs: 38 | $(MAKE) -C docs clean 39 | $(MAKE) -C docs html 40 | open docs/build/html/index.html 41 | 42 | release: clean 43 | python3 setup.py sdist bdist_wheel 44 | echo "You will be asked for auth TWICE, one for tar, one for wheel - until MD bug is resolved" 45 | # Tar must go first due to markdown render related bug with wheel 46 | # When wheel==0.31 is released, this can change to just one line with dist/* 47 | twine upload --repository-url https://upload.pypi.org/legacy/ dist/*.tar.gz 48 | twine upload --repository-url https://upload.pypi.org/legacy/ dist/*.whl 49 | 50 | test-release: clean 51 | python3 setup.py sdist bdist_wheel 52 | echo "You will be asked for auth TWICE, one for tar, one for wheel - until MD bug is resolved" 53 | # Tar must go first due to markdown render related bug with wheel 54 | # When wheel==0.31 is released, this can change to just one line with dist/* 55 | twine upload --repository-url https://test.pypi.org/legacy/ dist/*.tar.gz 56 | twine upload --repository-url https://test.pypi.org/legacy/ dist/*.whl 57 | 58 | dist: clean 59 | python3 setup.py sdist bdist_wheel 60 | ls -l dist 61 | -------------------------------------------------------------------------------- /HISTORY.md: -------------------------------------------------------------------------------- 1 | # History 2 | 3 | ## 2.1.1 (2018-04-01) 4 | 5 | * Base set of docs converted to markdown (README, AUTHORS, CONTRIBUTING, HISTORY) 6 | 7 | ## 2.1.0 (2018-03-31) 8 | 9 | * Introduces the `timeout` keyword argument, which allows the caller to specify 10 | a timeout after which requests to the scrapyd server give up. This works as 11 | per the underlying `requests` library, and raises `requests.exceptions.Timeout` 12 | when the timeout is exceeded. See docs for usage. 13 | 14 | 15 | ## 2.0.1 (2016-02-27) 16 | 17 | v2.0.0 shipped with docs which were slightly out of date for the cancel 18 | endpoint, this release corrects that. 19 | 20 | ## 2.0.0 (2016-02-27) 21 | 22 | Why Version 2? This package has been production ready and stable in use 23 | for over a year now, so it's ready to commit to a stable API /w semver. 24 | Version 1 has deliberately been skipped to make it absolutely clear that 25 | this release contains a breaking change: 26 | 27 | Breaking changes: 28 | 29 | * The cancel job endpoint now returns the previous state of the successfully 30 | cancelled spider rather than a simple boolean True/False. This change was 31 | made because: 32 | a) the boolean return was relatively useless and actually hiding data the 33 | scrapyd API passes us as part of the cancel endpoint response. 34 | b) before this change, the method would have returned `True` only if the 35 | cancelled job was previously running, and this resulted in us incorrectly 36 | reporting `False` when a *pending* job was cancelled. 37 | This may require no changes to your codebase but nevertheless it is a change 38 | in a public API, thus the requirement for major version bumping. 39 | 40 | Other changes: 41 | 42 | * The cancel job endpoint now accepts a `signal` keyword argument which is 43 | the termination signal Scrapyd uses to cancel the spider job. 
If not 44 | specified, the value is not sent to the scrapyd endpoint at all, therefore 45 | allows scrapyd control over which default signal gets used (currently `TERM`). 46 | 47 | 48 | ## 0.2.0 (2015-01-14) 49 | 50 | * Added the new ``job_status`` method which can retrieve the job status of a 51 | specific job from a project. See docs for usage. 52 | * Increased and improved test coverage. 53 | 54 | ## 0.1.0 (2014-09-16) 55 | 56 | * First release on PyPI. 57 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing 2 | 3 | `python-scrapyd-api` is free & open-source software and therefore every little 4 | bit helps. Whether you're simply correcting a typo or bringing the release 5 | up-to-date with 3rd party changes, all help is welcome and very appreciated. 6 | 7 | 8 | ## Reporting Bugs 9 | 10 | Please report bugs by utilising [Github Issues][issues]. Simply check if your issue 11 | exists first, and if not, submit a new issue. 12 | 13 | [issues]: https://github.com/djm/python-scrapyd-api/issues 14 | 15 | If you are reporting a bug: 16 | 17 | * Detailed steps to reproduce the bug. 18 | * Your operating system name and any versions of software (if applicable). 19 | * Include any details about your local setup that might be helpful in 20 | troubleshooting. 21 | * A pull request would be most appreciated but even just submitting the bug 22 | is very helpful, thanks! 23 | 24 | ## Pull Request Guidelines 25 | 26 | Before you submit a pull request, check that it meets these guidelines: 27 | 28 | 1. The pull request should include tests, especially if fixing a regression. 29 | 2. If the pull request adds functionality, the docs should be updated to 30 | document that functionality. 31 | 3. The pull request should work for Python 2.6, 2.7, 3.3 and 3.4. 32 | Check [TravisCI][travis] and make sure that the tests pass for all supported Python versions. 33 | 34 | [travis]: https://travis-ci.org/djm/python-scrapyd-api/pull_requests 35 | 36 | ## Submitting Code 37 | 38 | Ready to contribute? Here's how to set up `python-scrapyd-api` for local development. 39 | 40 | 1. Fork the `python-scrapyd-api` repo on GitHub. 41 | 42 | 2. Clone your fork locally: 43 | 44 | $ git clone git@github.com:your_name_here/python-scrapyd-api.git 45 | 46 | 3. Install your local copy into a `virtualenv`. Assuming you have `virtualenvwrapper` installed, this is how you set up your fork for local development: 47 | 48 | $ mkvirtualenv python-scrapyd-api 49 | $ cd python-scrapyd-api/ 50 | $ python setup.py develop 51 | 52 | 4. Install the requirements needed to develop on `python-scrapyd-api`. That 53 | includes doc writing and testing tools: 54 | 55 | $ pip install -r requirements.txt 56 | 57 | 58 | 5. Create a branch for local development: 59 | 60 | $ git checkout -b name-of-your-bugfix-or-feature 61 | 62 | Now you can make your changes locally. 63 | 64 | 6. When you're done making changes, check the following things: 65 | 66 | a. That your changes pass the flake8 linter (use common sense though): 67 | 68 | $ pip install flake8 69 | $ flake8 python-scrapyd-api tests 70 | 71 | b. That the tests still run: 72 | 73 | $ python setup.py test 74 | 75 | c. That the tests run for all supported versions of Python. This requires tox and having the various versions of Python installed: 76 | 77 | $ pip install tox 78 | $ tox 79 | 80 | 7. Add yourself to the `AUTHORS.md` file as a contributor. 
81 | 82 | 8. Commit your changes and push your branch to GitHub. Please use a suitable 83 | git commit message (summary line, two line breaks, detailed description): 84 | 85 | $ git add . 86 | $ git commit 87 | $ git push origin name-of-your-bugfix-or-feature 88 | 89 | 9. Submit a pull request through the GitHub website. 90 | 91 | 92 | ## How to run the tests 93 | 94 | To run the tests: 95 | 96 | ```bash 97 | $ python setup.py test 98 | # or use PyTest directly: 99 | $ py.test 100 | ``` 101 | 102 | To see coverage: 103 | 104 | ```bash 105 | # In the terminal: 106 | $ py.test --cov scrapyd_api tests/ 107 | # As a browseable HTML report: 108 | $ make coverage 109 | 110 | ``` 111 | 112 | ## Other development commands 113 | 114 | Please run `make help` or see the [Makefile][makefile] for other development related commands. 115 | 116 | [makefile]: https://github.com/djm/python-scrapyd-api/blob/master/Makefile 117 | -------------------------------------------------------------------------------- /tests/test_client.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | import responses 4 | import pytest 5 | 6 | from scrapyd_api.client import Client 7 | from scrapyd_api.exceptions import ScrapydResponseError 8 | 9 | 10 | SCRAPYD_RESPONSE_OK = { 11 | 'status': 'ok', 12 | 'example': 'Test', 13 | 'another-example': 'Another Test', 14 | } 15 | 16 | SCRAPYD_RESPONSE_ERROR = { 17 | 'status': 'error', 18 | 'message': 'Test Error' 19 | } 20 | 21 | URL = 'http://localhost/' 22 | AUTH = ('username', 'password') 23 | 24 | OK_JSON = '{"status": "ok", "key": "value"}' 25 | ERROR_JSON = '{"status": "error", "message": "some-error"}' 26 | MALFORMED_JSON = 'this-aint-json' 27 | 28 | 29 | @responses.activate 30 | def test_get_handle_ok_response(): 31 | """ 32 | Test that a GET request uses the requests lib properly. 33 | """ 34 | non_authed_client = Client() 35 | responses.add(responses.GET, URL, body=OK_JSON, status=200) 36 | non_authed_client.get(URL) 37 | assert len(responses.calls) == 1 38 | call = responses.calls[0] 39 | assert call.request.url == URL 40 | assert call.response.json() == json.loads(OK_JSON) 41 | # Test with some query string params. 42 | test_params = {'test': 'params'} 43 | url_with_query = URL + '?test=params' 44 | responses.add(responses.GET, url_with_query, body=OK_JSON, status=200) 45 | non_authed_client.get(URL, params=test_params) 46 | assert len(responses.calls) == 2 47 | call = responses.calls[1] 48 | assert call.response.json() == json.loads(OK_JSON) 49 | 50 | 51 | @responses.activate 52 | def test_post_handle_ok_response(): 53 | """ 54 | Test that a POST request uses the requests lib properly. 55 | """ 56 | non_authed_client = Client() 57 | responses.add(responses.POST, URL, body=OK_JSON, status=200) 58 | test_data = {'test': 'json'} 59 | non_authed_client.post(URL, data=test_data) 60 | assert len(responses.calls) == 1 61 | call = responses.calls[0] 62 | assert call.response.json() == json.loads(OK_JSON) 63 | 64 | 65 | @responses.activate 66 | def test_handle_http_error_response(): 67 | """ 68 | Test that an 'Error' response from Scrapyd handles as desired. 
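    A non-2xx HTTP response should surface as a ScrapydResponseError whose
    message mentions the returned status code.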
69 | """ 70 | non_authed_client = Client() 71 | responses.add(responses.GET, URL, body=MALFORMED_JSON, status=500) 72 | with pytest.raises(ScrapydResponseError) as excinfo: 73 | non_authed_client.get(URL) 74 | assert '500 error' in str(excinfo.value) 75 | 76 | 77 | @responses.activate 78 | def test_non_or_invalid_json_response_errors(): 79 | """ 80 | Test that a response from Scrapyd that does not parse as 81 | valid JSON raises the correct exception. 82 | """ 83 | non_authed_client = Client() 84 | responses.add(responses.GET, URL, body=MALFORMED_JSON, status=200) 85 | with pytest.raises(ScrapydResponseError) as excinfo: 86 | non_authed_client.get(URL) 87 | assert 'invalid JSON' in str(excinfo.value) 88 | 89 | 90 | @responses.activate 91 | def test_scrapyd_error_response(): 92 | """ 93 | Test that a response from Scrapyd that does not parse as 94 | valid JSON raises the correct exception. 95 | """ 96 | non_authed_client = Client() 97 | responses.add(responses.GET, URL, body=ERROR_JSON, status=200) 98 | with pytest.raises(ScrapydResponseError) as excinfo: 99 | non_authed_client.get(URL) 100 | assert 'some-error' in str(excinfo.value) 101 | 102 | 103 | @responses.activate 104 | def test_with_auth(): 105 | """ 106 | Test attaching basic auth creds results in correct headers. 107 | """ 108 | authed_client = Client() 109 | authed_client.auth = AUTH 110 | # Test with just a URL call. 111 | responses.add(responses.GET, URL, body=OK_JSON, status=200) 112 | authed_client.get(URL) 113 | assert len(responses.calls) == 1 114 | call = responses.calls[0] 115 | assert 'Authorization' in call.request.headers 116 | assert 'Basic' in call.request.headers['Authorization'] 117 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # python-scrapyd-api 2 | 3 | [![The PyPI version](https://badge.fury.io/py/python-scrapyd-api.png)][pypi] [![Build status on Travis-CI](https://travis-ci.org/djm/python-scrapyd-api.png?branch=master)](https://travis-ci.org/djm/python-scrapyd-api) [![Coverage status on Coveralls](https://coveralls.io/repos/djm/python-scrapyd-api/badge.png)](https://coveralls.io/r/djm/python-scrapyd-api) [![Documentation status on ReadTheDocs](https://readthedocs.org/projects/python-scrapyd-api/badge/?version=latest)][docs] 4 | 5 | A Python wrapper for working with [Scrapyd][scrapyd]'s [API][scrapyd-api-docs]. 6 | 7 | Current released version: 2.1.2 (see [history][history]). 8 | 9 | Allows a Python application to talk to, and therefore control, the 10 | [Scrapy][scrapy] daemon: [Scrapyd][scrapyd]. 
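For instance, a wrapper pointed at a locally running daemon can be created in a
couple of lines (the host, credentials and timeout below are placeholder values;
both `auth` and `timeout` are optional keyword arguments):

```python
>>> from scrapyd_api import ScrapydAPI
>>> scrapyd = ScrapydAPI('http://localhost:6800',
...                      auth=('username', 'password'),  # HTTP basic auth
...                      timeout=10)  # seconds; requests.exceptions.Timeout is raised when exceeded
>>> scrapyd.list_projects()
[u'ecom_project', u'estate_agent_project', u'car_project']
```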
11 | 12 | * Supports Python 2.6, 2.7, 3.3 & 3.4 13 | * Free software: BSD license 14 | * [Full documentation][docs] 15 | * On the [Python Package Index (PyPI)][pypi] 16 | * Scrapyd's [API Documentation][scrapyd-api-docs] 17 | 18 | [scrapy]: http://scrapy.org/ 19 | [scrapyd]: https://github.com/scrapy/scrapyd 20 | [scrapyd-api-docs]: http://scrapyd.readthedocs.org/en/latest/api.html 21 | [history]: https://github.com/djm/python-scrapyd-api/blob/master/HISTORY.md 22 | [pypi]: https://pypi.python.org/pypi/python-scrapyd-api/ 23 | [docs]: http://python-scrapyd-api.readthedocs.org/en/latest/ 24 | 25 | ## Install 26 | 27 | Easiest installation is via `pip`: 28 | 29 | ```bash 30 | pip install python-scrapyd-api 31 | ``` 32 | 33 | ## Quick Usage 34 | 35 | Please refer to the [full documentation][docs] for more detailed usage but to get you started: 36 | 37 | ```python 38 | >>> from scrapyd_api import ScrapydAPI 39 | >>> scrapyd = ScrapydAPI('http://localhost:6800') 40 | ``` 41 | 42 | **Add a project** egg as a new version: 43 | 44 | ```python 45 | >>> egg = open('some_egg.egg', 'rb') 46 | >>> scrapyd.add_version('project_name', 'version_name', egg) 47 | # Returns the number of spiders in the project. 48 | 3 49 | >>> egg.close() 50 | ``` 51 | 52 | **Cancel a scheduled job**: 53 | 54 | ```python 55 | >>> scrapyd.cancel('project_name', '14a6599ef67111e38a0e080027880ca6') 56 | # Returns the "previous state" of the job before it was cancelled: 'running' or 'pending'. 57 | 'running' 58 | ``` 59 | 60 | **Delete a project** and all sibling versions: 61 | 62 | ```python 63 | >>> scrapyd.delete_project('project_name') 64 | # Returns True if the request was met with an OK response. 65 | True 66 | ``` 67 | 68 | **Delete a version** of a project: 69 | 70 | ```python 71 | >>> scrapyd.delete_version('project_name', 'version_name') 72 | # Returns True if the request was met with an OK response. 73 | True 74 | ``` 75 | 76 | **Request status** of a job: 77 | 78 | ```python 79 | >>> scrapyd.job_status('project_name', '14a6599ef67111e38a0e080027880ca6') 80 | # Returns 'running', 'pending', 'finished' or '' for unknown state. 81 | 'running' 82 | ``` 83 | 84 | **List all jobs** registered: 85 | 86 | ```python 87 | >>> scrapyd.list_jobs('project_name') 88 | # Returns a dictionary of running, finished and pending job lists. 
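# (Job ids below are shortened for readability.)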
89 | { 90 | 'pending': [ 91 | { 92 | u'id': u'24c35...f12ae', 93 | u'spider': u'spider_name' 94 | }, 95 | ], 96 | 'running': [ 97 | { 98 | u'id': u'14a65...b27ce', 99 | u'spider': u'spider_name', 100 | u'start_time': u'2014-06-17 22:45:31.975358' 101 | }, 102 | ], 103 | 'finished': [ 104 | { 105 | u'id': u'34c23...b21ba', 106 | u'spider': u'spider_name', 107 | u'start_time': u'2014-06-17 22:45:31.975358', 108 | u'end_time': u'2014-06-23 14:01:18.209680' 109 | } 110 | ] 111 | } 112 | ``` 113 | 114 | **List all projects** registered: 115 | 116 | ```python 117 | >>> scrapyd.list_projects() 118 | [u'ecom_project', u'estate_agent_project', u'car_project'] 119 | ``` 120 | 121 | **Displays the load status of a service** registered: 122 | 123 | ```python 124 | >>> scrapyd.daemon_status() 125 | {u'finished': 0, u'running': 0, u'pending': 0, u'node_name': u'ScrapyMachine'} 126 | ``` 127 | 128 | **List all spiders** available to a given project: 129 | 130 | ```python 131 | >>> scrapyd.list_spiders('project_name') 132 | [u'raw_spider', u'js_enhanced_spider', u'selenium_spider'] 133 | ``` 134 | 135 | **List all versions** registered to a given project: 136 | 137 | ```python 138 | >>> scrapyd.list_versions('project_name'): 139 | [u'345', u'346', u'347', u'348'] 140 | ``` 141 | 142 | **Schedule a job** to run with a specific spider: 143 | 144 | ```python 145 | # Schedule a job to run with a specific spider. 146 | >>> scrapyd.schedule('project_name', 'spider_name') 147 | # Returns the Scrapyd job id. 148 | u'14a6599ef67111e38a0e080027880ca6' 149 | ``` 150 | 151 | **Schedule a job** to run while passing override settings: 152 | 153 | ```python 154 | >>> settings = {'DOWNLOAD_DELAY': 2} 155 | >>> scrapyd.schedule('project_name', 'spider_name', settings=settings) 156 | u'25b6588ef67333e38a0e080027880de7' 157 | ``` 158 | 159 | **Schedule a job** to run while passing extra attributes to spider initialisation: 160 | 161 | ```python 162 | >>> scrapyd.schedule('project_name', 'spider_name', extra_attribute='value') 163 | # NB: 'project', 'spider' and 'settings' are reserved kwargs for this 164 | # method and therefore these names should be avoided when trying to pass 165 | # extra attributes to the spider init. 166 | u'25b6588ef67333e38a0e080027880de7' 167 | ``` 168 | 169 | 170 | ## Setting up the project to contribute code 171 | 172 | Please see [CONTRIBUTING.md][contributing]. This will guide you through our pull request 173 | guidelines, project setup and testing requirements. 174 | 175 | [contributing]: https://github.com/djm/python-scrapyd-api/blob/master/CONTRIBUTING.md 176 | 177 | ## License 178 | 179 | 2-clause BSD. See the full [LICENSE][license]. 180 | 181 | [license]: https://github.com/djm/python-scrapyd-api/blob/master/LICENSE 182 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = build 9 | 10 | # User-friendly check for sphinx-build 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. 
If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) 13 | endif 14 | 15 | # Internal variables. 16 | PAPEROPT_a4 = -D latex_paper_size=a4 17 | PAPEROPT_letter = -D latex_paper_size=letter 18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source 19 | # the i18n builder cannot share the environment and doctrees with the others 20 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source 21 | 22 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 23 | 24 | help: 25 | @echo "Please use \`make ' where is one of" 26 | @echo " html to make standalone HTML files" 27 | @echo " dirhtml to make HTML files named index.html in directories" 28 | @echo " singlehtml to make a single large HTML file" 29 | @echo " pickle to make pickle files" 30 | @echo " json to make JSON files" 31 | @echo " htmlhelp to make HTML files and a HTML help project" 32 | @echo " qthelp to make HTML files and a qthelp project" 33 | @echo " devhelp to make HTML files and a Devhelp project" 34 | @echo " epub to make an epub" 35 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 36 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 37 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 38 | @echo " text to make text files" 39 | @echo " man to make manual pages" 40 | @echo " texinfo to make Texinfo files" 41 | @echo " info to make Texinfo files and run them through makeinfo" 42 | @echo " gettext to make PO message catalogs" 43 | @echo " changes to make an overview of all changed/added/deprecated items" 44 | @echo " xml to make Docutils-native XML files" 45 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 46 | @echo " linkcheck to check all external links for integrity" 47 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 48 | 49 | clean: 50 | rm -rf $(BUILDDIR)/* 51 | 52 | html: 53 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 54 | @echo 55 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 56 | 57 | dirhtml: 58 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 59 | @echo 60 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 61 | 62 | singlehtml: 63 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 64 | @echo 65 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 66 | 67 | pickle: 68 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 69 | @echo 70 | @echo "Build finished; now you can process the pickle files." 71 | 72 | json: 73 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 74 | @echo 75 | @echo "Build finished; now you can process the JSON files." 76 | 77 | htmlhelp: 78 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 79 | @echo 80 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 81 | ".hhp project file in $(BUILDDIR)/htmlhelp." 
82 | 83 | qthelp: 84 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 85 | @echo 86 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 87 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 88 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/python-scrapyd-api.qhcp" 89 | @echo "To view the help file:" 90 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/python-scrapyd-api.qhc" 91 | 92 | devhelp: 93 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 94 | @echo 95 | @echo "Build finished." 96 | @echo "To view the help file:" 97 | @echo "# mkdir -p $$HOME/.local/share/devhelp/python-scrapyd-api" 98 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/python-scrapyd-api" 99 | @echo "# devhelp" 100 | 101 | epub: 102 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 103 | @echo 104 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 105 | 106 | latex: 107 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 108 | @echo 109 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 110 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 111 | "(use \`make latexpdf' here to do that automatically)." 112 | 113 | latexpdf: 114 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 115 | @echo "Running LaTeX files through pdflatex..." 116 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 117 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 118 | 119 | latexpdfja: 120 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 121 | @echo "Running LaTeX files through platex and dvipdfmx..." 122 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 123 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 124 | 125 | text: 126 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 127 | @echo 128 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 129 | 130 | man: 131 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 132 | @echo 133 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 134 | 135 | texinfo: 136 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 137 | @echo 138 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 139 | @echo "Run \`make' in that directory to run these through makeinfo" \ 140 | "(use \`make info' here to do that automatically)." 141 | 142 | info: 143 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 144 | @echo "Running Texinfo files through makeinfo..." 145 | make -C $(BUILDDIR)/texinfo info 146 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 147 | 148 | gettext: 149 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 150 | @echo 151 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 152 | 153 | changes: 154 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 155 | @echo 156 | @echo "The overview file is in $(BUILDDIR)/changes." 157 | 158 | linkcheck: 159 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 160 | @echo 161 | @echo "Link check complete; look for any errors in the above output " \ 162 | "or in $(BUILDDIR)/linkcheck/output.txt." 163 | 164 | doctest: 165 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 166 | @echo "Testing of doctests in the sources finished, look at the " \ 167 | "results in $(BUILDDIR)/doctest/output.txt." 
168 | 169 | xml: 170 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 171 | @echo 172 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 173 | 174 | pseudoxml: 175 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 176 | @echo 177 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 178 | -------------------------------------------------------------------------------- /scrapyd_api/wrapper.py: -------------------------------------------------------------------------------- 1 | from __future__ import unicode_literals 2 | 3 | from copy import deepcopy 4 | 5 | from . import constants 6 | from .client import Client 7 | from .compat import ( 8 | iteritems, 9 | urljoin 10 | ) 11 | 12 | 13 | class ScrapydAPI(object): 14 | """ 15 | Provides a thin Pythonic wrapper around the Scrapyd API. The public methods 16 | come in two types: first class, those that wrap a Scrapyd API endpoint 17 | directly; and derived, those that use a one or more Scrapyd API endpoint(s) 18 | to provide functionality that is unique to this wrapper. 19 | """ 20 | 21 | def __init__(self, target='http://localhost:6800', auth=None, 22 | endpoints=None, client=None, timeout=None): 23 | """ 24 | Instantiates the ScrapydAPI wrapper for use. 25 | 26 | Args: 27 | target (str): the hostname/port to hit with requests. 28 | auth (str, str): a 2-item tuple containing user/pass details. Only 29 | used when `client` is not passed. 30 | endpoints: a dictionary of custom endpoints to apply on top of 31 | the pre-existing defaults. 32 | client: a pre-instantiated requests-like client. By default, we use 33 | our own client. Override for your own needs. 34 | timeout: timeout for client requests in seconds, either as a float 35 | or a (connect timeout, read timeout) tuple 36 | 37 | """ 38 | if endpoints is None: 39 | endpoints = {} 40 | 41 | if client is None: 42 | client = Client() 43 | client.auth = auth 44 | 45 | self.target = target 46 | self.client = client 47 | self.timeout = timeout 48 | self.endpoints = deepcopy(constants.DEFAULT_ENDPOINTS) 49 | self.endpoints.update(endpoints) 50 | 51 | def _build_url(self, endpoint): 52 | """ 53 | Builds the absolute URL using the target and desired endpoint. 54 | """ 55 | try: 56 | path = self.endpoints[endpoint] 57 | except KeyError: 58 | msg = 'Unknown endpoint `{0}`' 59 | raise ValueError(msg.format(endpoint)) 60 | absolute_url = urljoin(self.target, path) 61 | return absolute_url 62 | 63 | def add_version(self, project, version, egg): 64 | """ 65 | Adds a new project egg to the Scrapyd service. First class, maps to 66 | Scrapyd's add version endpoint. 67 | """ 68 | url = self._build_url(constants.ADD_VERSION_ENDPOINT) 69 | data = { 70 | 'project': project, 71 | 'version': version 72 | } 73 | files = { 74 | 'egg': egg 75 | } 76 | json = self.client.post(url, data=data, files=files, 77 | timeout=self.timeout) 78 | return json['spiders'] 79 | 80 | def cancel(self, project, job, signal=None): 81 | """ 82 | Cancels a job from a specific project. First class, maps to 83 | Scrapyd's cancel job endpoint. 84 | """ 85 | url = self._build_url(constants.CANCEL_ENDPOINT) 86 | data = { 87 | 'project': project, 88 | 'job': job, 89 | } 90 | if signal is not None: 91 | data['signal'] = signal 92 | json = self.client.post(url, data=data, timeout=self.timeout) 93 | return json['prevstate'] 94 | 95 | def delete_project(self, project): 96 | """ 97 | Deletes all versions of a project. First class, maps to Scrapyd's 98 | delete project endpoint. 
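        Returns True once Scrapyd acknowledges the deletion with an 'ok'
        response; any error response is raised by the client as a
        ScrapydResponseError.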
99 | """ 100 | url = self._build_url(constants.DELETE_PROJECT_ENDPOINT) 101 | data = { 102 | 'project': project, 103 | } 104 | self.client.post(url, data=data, timeout=self.timeout) 105 | return True 106 | 107 | def delete_version(self, project, version): 108 | """ 109 | Deletes a specific version of a project. First class, maps to 110 | Scrapyd's delete version endpoint. 111 | """ 112 | url = self._build_url(constants.DELETE_VERSION_ENDPOINT) 113 | data = { 114 | 'project': project, 115 | 'version': version 116 | } 117 | self.client.post(url, data=data, timeout=self.timeout) 118 | return True 119 | 120 | def job_status(self, project, job_id): 121 | """ 122 | Retrieves the 'status' of a specific job specified by its id. Derived, 123 | utilises Scrapyd's list jobs endpoint to provide the answer. 124 | """ 125 | all_jobs = self.list_jobs(project) 126 | for state in constants.JOB_STATES: 127 | job_ids = [job['id'] for job in all_jobs[state]] 128 | if job_id in job_ids: 129 | return state 130 | return '' # Job not found, state unknown. 131 | 132 | def list_jobs(self, project): 133 | """ 134 | Lists all known jobs for a project. First class, maps to Scrapyd's 135 | list jobs endpoint. 136 | """ 137 | url = self._build_url(constants.LIST_JOBS_ENDPOINT) 138 | params = {'project': project} 139 | jobs = self.client.get(url, params=params, timeout=self.timeout) 140 | return jobs 141 | 142 | def list_projects(self): 143 | """ 144 | Lists all deployed projects. First class, maps to Scrapyd's 145 | list projects endpoint. 146 | """ 147 | url = self._build_url(constants.LIST_PROJECTS_ENDPOINT) 148 | json = self.client.get(url, timeout=self.timeout) 149 | return json['projects'] 150 | 151 | def list_spiders(self, project): 152 | """ 153 | Lists all known spiders for a specific project. First class, maps 154 | to Scrapyd's list spiders endpoint. 155 | """ 156 | url = self._build_url(constants.LIST_SPIDERS_ENDPOINT) 157 | params = {'project': project} 158 | json = self.client.get(url, params=params, timeout=self.timeout) 159 | return json['spiders'] 160 | 161 | def list_versions(self, project): 162 | """ 163 | Lists all deployed versions of a specific project. First class, maps 164 | to Scrapyd's list versions endpoint. 165 | """ 166 | url = self._build_url(constants.LIST_VERSIONS_ENDPOINT) 167 | params = {'project': project} 168 | json = self.client.get(url, params=params, timeout=self.timeout) 169 | return json['versions'] 170 | 171 | def schedule(self, project, spider, settings=None, **kwargs): 172 | """ 173 | Schedules a spider from a specific project to run. First class, maps 174 | to Scrapyd's scheduling endpoint. 175 | """ 176 | 177 | url = self._build_url(constants.SCHEDULE_ENDPOINT) 178 | data = { 179 | 'project': project, 180 | 'spider': spider 181 | } 182 | data.update(kwargs) 183 | if settings: 184 | setting_params = [] 185 | for setting_name, value in iteritems(settings): 186 | setting_params.append('{0}={1}'.format(setting_name, value)) 187 | data['setting'] = setting_params 188 | json = self.client.post(url, data=data, timeout=self.timeout) 189 | return json['jobid'] 190 | 191 | def daemon_status(self): 192 | """ 193 | Displays the load status of a service. 
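        The returned dict reports the number of pending, running and finished
        jobs on the daemon, along with its node name.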
194 | :rtype: dict 195 | """ 196 | url = self._build_url(constants.DAEMON_STATUS_ENDPOINT) 197 | json = self.client.get(url, timeout=self.timeout) 198 | return json 199 | -------------------------------------------------------------------------------- /docs/source/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # complexity documentation build configuration file, created by 5 | # sphinx-quickstart on Tue Jul 9 22:26:36 2013. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | import sys 17 | import os 18 | 19 | # If extensions (or modules to document with autodoc) are in another 20 | # directory, add these directories to sys.path here. If the directory is 21 | # relative to the documentation root, use os.path.abspath to make it 22 | # absolute, like shown here. 23 | #sys.path.insert(0, os.path.abspath('.')) 24 | 25 | # Get the project root dir, which is the parent dir of this 26 | cwd = os.getcwd() 27 | project_root = os.path.dirname(cwd) 28 | 29 | # Insert the project root dir as the first element in the PYTHONPATH. 30 | # This lets us ensure that the source package is imported, and that its 31 | # version is used. 32 | sys.path.insert(0, project_root) 33 | 34 | import scrapyd_api 35 | 36 | # -- General configuration --------------------------------------------- 37 | 38 | # If your documentation needs a minimal Sphinx version, state it here. 39 | #needs_sphinx = '1.0' 40 | 41 | # Add any Sphinx extension module names here, as strings. They can be 42 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 43 | extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode'] 44 | 45 | # Add any paths that contain templates here, relative to this directory. 46 | templates_path = ['_templates'] 47 | 48 | # The suffix of source filenames. 49 | source_suffix = '.rst' 50 | 51 | # The encoding of source files. 52 | #source_encoding = 'utf-8-sig' 53 | 54 | # The master toctree document. 55 | master_doc = 'index' 56 | 57 | # General information about the project. 58 | project = u'Python Scrapyd API' 59 | copyright = u'2014, Darian Moody' 60 | 61 | # The version info for the project you're documenting, acts as replacement 62 | # for |version| and |release|, also used in various other places throughout 63 | # the built documents. 64 | # 65 | # The short X.Y version. 66 | version = scrapyd_api.__version__ 67 | # The full version, including alpha/beta/rc tags. 68 | release = scrapyd_api.__version__ 69 | 70 | # The language for content autogenerated by Sphinx. Refer to documentation 71 | # for a list of supported languages. 72 | #language = None 73 | 74 | # There are two options for replacing |today|: either, you set today to 75 | # some non-false value, then it is used: 76 | #today = '' 77 | # Else, today_fmt is used as the format for a strftime call. 78 | #today_fmt = '%B %d, %Y' 79 | 80 | # List of patterns, relative to source directory, that match files and 81 | # directories to ignore when looking for source files. 82 | exclude_patterns = ['_build'] 83 | 84 | # The reST default role (used for this markup: `text`) to use for all 85 | # documents. 
86 | #default_role = None 87 | 88 | # If true, '()' will be appended to :func: etc. cross-reference text. 89 | #add_function_parentheses = True 90 | 91 | # If true, the current module name will be prepended to all description 92 | # unit titles (such as .. function::). 93 | #add_module_names = True 94 | 95 | # If true, sectionauthor and moduleauthor directives will be shown in the 96 | # output. They are ignored by default. 97 | #show_authors = False 98 | 99 | # The name of the Pygments (syntax highlighting) style to use. 100 | pygments_style = 'sphinx' 101 | 102 | # A list of ignored prefixes for module index sorting. 103 | #modindex_common_prefix = [] 104 | 105 | # If true, keep warnings as "system message" paragraphs in the built 106 | # documents. 107 | #keep_warnings = False 108 | 109 | 110 | # -- Options for HTML output ------------------------------------------- 111 | 112 | # The theme to use for HTML and HTML Help pages. See the documentation for 113 | # a list of builtin themes. 114 | 115 | import sphinx_rtd_theme 116 | 117 | html_theme = "sphinx_rtd_theme" 118 | html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] 119 | 120 | # Theme options are theme-specific and customize the look and feel of a 121 | # theme further. For a list of options available for each theme, see the 122 | # documentation. 123 | #html_theme_options = {} 124 | 125 | # The name for this set of Sphinx documents. If None, it defaults to 126 | # " v documentation". 127 | #html_title = None 128 | 129 | # A shorter title for the navigation bar. Default is the same as 130 | # html_title. 131 | #html_short_title = None 132 | 133 | # The name of an image file (relative to this directory) to place at the 134 | # top of the sidebar. 135 | #html_logo = None 136 | 137 | # The name of an image file (within the static path) to use as favicon 138 | # of the docs. This file should be a Windows icon file (.ico) being 139 | # 16x16 or 32x32 pixels large. 140 | #html_favicon = None 141 | 142 | # Add any paths that contain custom static files (such as style sheets) 143 | # here, relative to this directory. They are copied after the builtin 144 | # static files, so a file named "default.css" will overwrite the builtin 145 | # "default.css". 146 | html_static_path = ['_static'] 147 | 148 | # If not '', a 'Last updated on:' timestamp is inserted at every page 149 | # bottom, using the given strftime format. 150 | #html_last_updated_fmt = '%b %d, %Y' 151 | 152 | # If true, SmartyPants will be used to convert quotes and dashes to 153 | # typographically correct entities. 154 | #html_use_smartypants = True 155 | 156 | # Custom sidebar templates, maps document names to template names. 157 | #html_sidebars = {} 158 | 159 | # Additional templates that should be rendered to pages, maps page names 160 | # to template names. 161 | #html_additional_pages = {} 162 | 163 | # If false, no module index is generated. 164 | #html_domain_indices = True 165 | 166 | # If false, no index is generated. 167 | #html_use_index = True 168 | 169 | # If true, the index is split into individual pages for each letter. 170 | #html_split_index = False 171 | 172 | # If true, links to the reST sources are added to the pages. 173 | #html_show_sourcelink = True 174 | 175 | # If true, "Created using Sphinx" is shown in the HTML footer. 176 | # Default is True. 177 | #html_show_sphinx = True 178 | 179 | # If true, "(C) Copyright ..." is shown in the HTML footer. 180 | # Default is True. 
181 | #html_show_copyright = True 182 | 183 | # If true, an OpenSearch description file will be output, and all pages 184 | # will contain a tag referring to it. The value of this option 185 | # must be the base URL from which the finished HTML is served. 186 | #html_use_opensearch = '' 187 | 188 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 189 | #html_file_suffix = None 190 | 191 | # Output file base name for HTML help builder. 192 | htmlhelp_basename = 'python-scrapyd-api-doc' 193 | 194 | 195 | # -- Options for LaTeX output ------------------------------------------ 196 | 197 | latex_elements = { 198 | # The paper size ('letterpaper' or 'a4paper'). 199 | #'papersize': 'letterpaper', 200 | 201 | # The font size ('10pt', '11pt' or '12pt'). 202 | #'pointsize': '10pt', 203 | 204 | # Additional stuff for the LaTeX preamble. 205 | #'preamble': '', 206 | } 207 | 208 | # Grouping the document tree into LaTeX files. List of tuples 209 | # (source start file, target name, title, author, documentclass 210 | # [howto/manual]). 211 | latex_documents = [ 212 | ('index', 'python-scrapyd-api.tex', 213 | u'Scrapyd API Documentation', 214 | u'Darian Moody', 'manual'), 215 | ] 216 | 217 | # The name of an image file (relative to this directory) to place at 218 | # the top of the title page. 219 | #latex_logo = None 220 | 221 | # For "manual" documents, if this is true, then toplevel headings 222 | # are parts, not chapters. 223 | #latex_use_parts = False 224 | 225 | # If true, show page references after internal links. 226 | #latex_show_pagerefs = False 227 | 228 | # If true, show URL addresses after external links. 229 | #latex_show_urls = False 230 | 231 | # Documents to append as an appendix to all manuals. 232 | #latex_appendices = [] 233 | 234 | # If false, no module index is generated. 235 | #latex_domain_indices = True 236 | 237 | 238 | # -- Options for manual page output ------------------------------------ 239 | 240 | # One entry per manual page. List of tuples 241 | # (source start file, name, description, authors, manual section). 242 | man_pages = [ 243 | ('index', 'python-scrapyd-api', 244 | u'Scrapyd API Documentation', 245 | [u'Darian Moody'], 1) 246 | ] 247 | 248 | # If true, show URL addresses after external links. 249 | #man_show_urls = False 250 | 251 | 252 | # -- Options for Texinfo output ---------------------------------------- 253 | 254 | # Grouping the document tree into Texinfo files. List of tuples 255 | # (source start file, target name, title, author, 256 | # dir menu entry, description, category) 257 | texinfo_documents = [ 258 | ('index', 'python-scrapyd-api', 259 | u'Scrapyd API Documentation', 260 | u'Darian Moody', 261 | 'python-scrapyd-api', 262 | "A Python wrapper for working with Scrapyd's API.", 263 | 'Miscellaneous'), 264 | ] 265 | 266 | # Documents to append as an appendix to all manuals. 267 | #texinfo_appendices = [] 268 | 269 | # If false, no module index is generated. 270 | #texinfo_domain_indices = True 271 | 272 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 273 | #texinfo_show_urls = 'footnote' 274 | 275 | # If true, do not generate a @detailmenu in the "Top" node's menu. 
#texinfo_no_detailmenu = False

--------------------------------------------------------------------------------
/tests/test_wrapper.py:
--------------------------------------------------------------------------------
import pytest
from mock import MagicMock
from requests import Timeout

from scrapyd_api.compat import StringIO
from scrapyd_api.constants import (
    ADD_VERSION_ENDPOINT,
    CANCEL_ENDPOINT,
    FINISHED,
    PENDING
)
from scrapyd_api.wrapper import ScrapydAPI

HOST_URL = 'http://localhost'
AUTH = ('username', 'password')
PROJECT = 'project'
VERSION = '45'
SPIDER = 'spider'
JOB = 'd131dd02c5e6eec4693d9a0698aff95c'


def test_auth_gets_applied_when_client_is_not_supplied():
    """
    Auth details should get correctly passed to the client
    when no client is provided.
    """
    api = ScrapydAPI(HOST_URL, auth=AUTH)
    assert api.client.auth == AUTH


def test_auth_doesnt_get_applied_when_client_is_supplied():
    """
    Auth details should not get set on a passed client.
    Instantiated clients should handle auth themselves.
    """
    mock_client = MagicMock()
    api = ScrapydAPI(HOST_URL, auth=AUTH, client=mock_client)
    assert api.client.auth != AUTH


def test_build_url_with_default_endpoints():
    """
    The absolute URL constructor should form the correct URL when
    the client is relying on the default endpoints.
    """
    api = ScrapydAPI('http://localhost')
    url = api._build_url(ADD_VERSION_ENDPOINT)
    assert url == 'http://localhost/addversion.json'
    # Test trailing slash on target.
    api = ScrapydAPI('http://localhost/')
    url = api._build_url(ADD_VERSION_ENDPOINT)
    assert url == 'http://localhost/addversion.json'


def test_build_url_with_custom_endpoints():
    """
    The absolute URL constructor should form the correct URL when
    the client has custom endpoints passed in.
    """
    custom_endpoints = {
        ADD_VERSION_ENDPOINT: '/addversion-custom.json'
    }
    api = ScrapydAPI('http://localhost', endpoints=custom_endpoints)
    url = api._build_url(ADD_VERSION_ENDPOINT)
    assert url == 'http://localhost/addversion-custom.json'
    # Test trailing slash on target.
    api = ScrapydAPI('http://localhost/', endpoints=custom_endpoints)
    url = api._build_url(ADD_VERSION_ENDPOINT)
    assert url == 'http://localhost/addversion-custom.json'
    # Test that endpoints that were not overridden by the custom_endpoints
    # still work as the defaults.
    url = api._build_url(CANCEL_ENDPOINT)
    assert url == 'http://localhost/cancel.json'


def test_build_url_with_non_existent_endpoint_errors():
    """
    Supplying _build_url with an endpoint that does not exist in
    the endpoints dictionary should result in a ValueError.
    """
    api = ScrapydAPI(HOST_URL)
    with pytest.raises(ValueError):
        api._build_url('does-not-exist')


def test_add_version():
    mock_client = MagicMock()
    mock_client.post.return_value = {
        'spiders': 3
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    test_egg = StringIO('Test egg')
    rtn = api.add_version(PROJECT, VERSION, test_egg)
    assert rtn == 3  # The number of spiders uploaded.
    mock_client.post.assert_called_with(
        'http://localhost/addversion.json',
        data={
            'project': PROJECT,
            'version': VERSION
        },
        files={
            'egg': test_egg
        },
        timeout=None
    )


def test_cancelling_running_job():
    mock_client = MagicMock()
    mock_client.post.return_value = {
        'prevstate': 'running',
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.cancel(PROJECT, JOB)
    assert rtn == 'running'
    mock_client.post.assert_called_with(
        'http://localhost/cancel.json',
        data={
            'project': PROJECT,
            'job': JOB
        },
        timeout=None
    )


def test_cancelling_pending_job():
    mock_client = MagicMock()
    mock_client.post.return_value = {
        'prevstate': 'pending',
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.cancel(PROJECT, JOB)
    assert rtn == 'pending'
    mock_client.post.assert_called_with(
        'http://localhost/cancel.json',
        data={
            'project': PROJECT,
            'job': JOB
        },
        timeout=None
    )


def test_cancelling_with_specific_signal():
    mock_client = MagicMock()
    mock_client.post.return_value = {
        'prevstate': 'running',
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.cancel(PROJECT, JOB, signal='TERM')
    assert rtn == 'running'
    mock_client.post.assert_called_with(
        'http://localhost/cancel.json',
        data={
            'project': PROJECT,
            'job': JOB,
            'signal': 'TERM'
        },
        timeout=None
    )


def test_delete_project():
    mock_client = MagicMock()
    mock_client.post.return_value = {}
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.delete_project(PROJECT)
    assert rtn is True
    mock_client.post.assert_called_with(
        'http://localhost/delproject.json',
        data={
            'project': PROJECT,
        },
        timeout=None
    )


def test_delete_version():
    mock_client = MagicMock()
    mock_client.post.return_value = {}
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.delete_version(PROJECT, VERSION)
    assert rtn is True
    mock_client.post.assert_called_with(
        'http://localhost/delversion.json',
        data={
            'project': PROJECT,
            'version': VERSION
        },
        timeout=None
    )


def test_job_status():
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'pending': [{'id': 'abc'}, {'id': 'def'}],
        'running': [],
        'finished': [{'id': 'ghi'}],
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    expected_results = (
        ('abc', PENDING),
        ('def', PENDING),
        ('ghi', FINISHED),
        ('xyz', '')
    )
    for job_id, expected_result in expected_results:
        rtn = api.job_status(PROJECT, job_id)
        assert rtn == expected_result


def test_list_jobs():
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'pending': [{'id': 'abc'}, {'id': 'def'}],
        'running': [],
        'finished': [{'id': 'ghi'}],
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.list_jobs(PROJECT)
    assert len(rtn) == 3
    assert sorted(rtn.keys()) == ['finished', 'pending', 'running']
    assert rtn['pending'] == [{'id': 'abc'}, {'id': 'def'}]
    assert rtn['finished'] == [{'id': 'ghi'}]
    assert rtn['running'] == []
    mock_client.get.assert_called_with(
        'http://localhost/listjobs.json',
        params={
            'project': PROJECT,
        },
        timeout=None
    )


def test_list_projects():
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'projects': ['test', 'test2']
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.list_projects()
    assert rtn == ['test', 'test2']
    mock_client.get.assert_called_with(
        'http://localhost/listprojects.json',
        timeout=None
    )


def test_list_spiders():
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'spiders': ['spider', 'spider2']
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.list_spiders(PROJECT)

    assert rtn == ['spider', 'spider2']
    mock_client.get.assert_called_with(
        'http://localhost/listspiders.json',
        params={
            'project': PROJECT,
        },
        timeout=None
    )


def test_list_versions():
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'versions': ['version', 'version2']
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.list_versions(PROJECT)
    assert rtn == ['version', 'version2']
    mock_client.get.assert_called_with(
        'http://localhost/listversions.json',
        params={
            'project': PROJECT,
        },
        timeout=None
    )


def test_schedule():
    mock_client = MagicMock()
    job_id = 'ce54b67080280d1ec69821bcb6a88393'
    settings = {
        'BOT_NAME': 'Firefox',
        'DOWNLOAD_DELAY': 2
    }
    kwargs = {'extra_detail': 'Test'}
    mock_client.post.return_value = {
        'jobid': job_id
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.schedule(PROJECT, SPIDER, settings=settings, **kwargs)
    assert rtn == job_id
    args, kwargs = mock_client.post.call_args
    assert len(args) == 1
    assert args[0] == 'http://localhost/schedule.json'
    assert len(kwargs) == 2
    assert 'data' in kwargs
    data_kw = kwargs['data']
    assert 'project' in data_kw
    assert data_kw['project'] == PROJECT
    assert 'extra_detail' in data_kw
    assert data_kw['extra_detail'] == 'Test'
    assert 'setting' in data_kw
    assert sorted(data_kw['setting']) == ['BOT_NAME=Firefox',
                                          'DOWNLOAD_DELAY=2']
    assert 'spider' in data_kw
    assert data_kw['spider'] == SPIDER


def test_request_timeout():
    """
    The client should raise an exception when the server does not respond
    within the time limit.
    """
    api = ScrapydAPI('http://httpbin.org/delay/5', timeout=1)
    with pytest.raises(Timeout):
        api.client.get(api.target, timeout=api.timeout)


def test_daemon_status():
    # Use a stubbed client so the test does not depend on a live Scrapyd
    # instance; the payload mirrors the documented daemonstatus response.
    mock_client = MagicMock()
    mock_client.get.return_value = {
        'finished': 0,
        'running': 0,
        'pending': 0,
        'node_name': 'node'
    }
    api = ScrapydAPI(HOST_URL, client=mock_client)
    rtn = api.daemon_status()
    assert isinstance(rtn, dict)
    assert 'finished' in rtn
    assert 'running' in rtn
    assert 'pending' in rtn
    assert 'node_name' in rtn
    assert isinstance(rtn['finished'], int)

--------------------------------------------------------------------------------
/docs/source/usage.rst:
--------------------------------------------------------------------------------
==================
Usage Instructions
==================

Quick Usage
-----------

Please see the README_ for quick usage instructions.

.. _README: https://github.com/djm/python-scrapyd-api/blob/master/README.md

Instantiating the wrapper
-------------------------

The wrapper is the core component which allows you to talk to Scrapyd's API,
and in most cases it will be the first and only point of interaction with
this package.

.. code-block:: python

    from scrapyd_api import ScrapydAPI
    scrapyd = ScrapydAPI('http://localhost:6800')

Where ``http://localhost:6800`` is the absolute URI to the location of the
service, including which port Scrapyd's API is running on.

Note that while it is usually better to be explicit, if you are running Scrapyd
on the same machine and with the default port then the wrapper can be
instantiated with no arguments at all.

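
For completeness, that zero-argument form is sketched below; it assumes a
local Scrapyd instance listening on its default port, so adjust the target if
your setup differs.

.. code-block:: python

    from scrapyd_api import ScrapydAPI

    # Assumes Scrapyd is reachable at its default local address.
    scrapyd = ScrapydAPI()
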

You may have further special requirements, for example one of the following:

- you may require HTTP Basic Authentication for your connections to Scrapyd.
- you may have changed the default endpoints for the various API actions.
- you may need to swap out the default connection client/handler.
- you may want to provide a timeout for client requests so the program does not
  hang indefinitely in case the server is not responding.

Providing HTTP Basic Auth credentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``auth`` parameter can be passed during instantiation. The value itself
should be a tuple containing two strings: the username and password required
to successfully authenticate:

.. code-block:: python

    credentials = ('admin-username', 'admin-p4ssw0rd')
    scrapyd = ScrapydAPI('http://example.com:6800/scrapyd/', auth=credentials)

.. note::
    If you pass the ``client`` argument explained below, the ``auth``
    argument's value will be ignored. The ``auth`` argument is only meant as
    a shortcut for setting the credentials on the built-in client; therefore,
    when passing your own you will have to handle this yourself.

Supplying custom endpoints for the various API actions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You might have changed the location of the *add version* action for whatever
reason; here is how you would go about overriding that so the wrapper contacts
the correct endpoint:

.. code-block:: python

    from scrapyd_api.constants import ADD_VERSION_ENDPOINT

    custom_endpoints = {
        ADD_VERSION_ENDPOINT: '/changed-add-version-location.json'
    }
    scrapyd = ScrapydAPI('http://localhost:6800', endpoints=custom_endpoints)

The code example above only overrides the *add version* endpoint location;
all other endpoints therefore keep their default values. Simply add extra
endpoint keys to the dict you pass in to override further endpoints; the keys
can either:

- be imported from ``scrapyd_api.constants`` as per the example.
- or simply be set as strings; see the constants module for correct usage.

Replacing the client used
~~~~~~~~~~~~~~~~~~~~~~~~~

When no ``client`` argument is passed, the wrapper uses the default client
which can be found at ``scrapyd_api.client.Client``. This default client is
effectively a small modification of Requests_' ``Session`` client which
knows how to handle errors from Scrapyd in a more graceful fashion by raising
a ``ScrapydResponseError`` exception.

.. _Requests: http://python-requests.org

If you have custom authentication requirements or other issues which the
default client does not solve then you can create your own client class,
instantiate it and then pass it in to the constructor. This may be as simple
as subclassing ``scrapyd_api.client.Client`` and modifying its functionality,
or it may require building your own Requests' ``Session``-like class. It would
be done like so:

.. code-block:: python

    new_client = SomeNewClient()
    scrapyd = ScrapydAPI('http://localhost:6800', client=new_client)


At the very minimum the client object should support:

- the ``.get()`` and ``.post()`` methods, which should accept Requests-like args.
- the responses being parsed in a similar fashion to the
  ``scrapyd_api.client.Client._handle_response`` method, which has the ability
  to load the JSON returned and check the "status" which gets sent from
  Scrapyd, raising the ``ScrapydResponseError`` exception as required.

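
As a concrete illustration, here is a minimal sketch of a custom client that
adds an extra header to every request. It assumes, as described above, that
``scrapyd_api.client.Client`` behaves like a Requests ``Session`` (so its
``request`` method accepts the usual ``Session`` keyword arguments); the class
name, header name and token value are purely illustrative.

.. code-block:: python

    from scrapyd_api import ScrapydAPI
    from scrapyd_api.client import Client


    class HeaderAuthClient(Client):
        """Illustrative client which sends a custom auth header."""

        def __init__(self, api_token, *args, **kwargs):
            super(HeaderAuthClient, self).__init__(*args, **kwargs)
            self.api_token = api_token

        def request(self, method, url, **kwargs):
            # Inject the header, then defer to the normal client behaviour
            # (including its Scrapyd-aware response handling).
            headers = kwargs.setdefault('headers', {})
            headers['X-Api-Token'] = self.api_token
            return super(HeaderAuthClient, self).request(method, url, **kwargs)


    scrapyd = ScrapydAPI('http://localhost:6800',
                         client=HeaderAuthClient('s3cret-t0ken'))

Because the subclass leaves ``.get()``, ``.post()`` and the response handling
untouched, it still satisfies the minimum requirements listed above.
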

Setting timeout for the requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, client requests do not time out unless a timeout value is set
explicitly. Thus, if the server is not responding, your code may hang
indefinitely. You can tell the client to stop waiting for a response after
a given number of seconds with the ``timeout`` parameter provided during
instantiation of the wrapper:

.. code-block:: python

    scrapyd = ScrapydAPI('http://example.com:6800/scrapyd/', timeout=5)

The value should be a float or a (connect timeout, read timeout) tuple. It will
be supplied to every request to the server. Additional information can be found
in the `Requests documentation`_.

.. _Requests documentation: http://docs.python-requests.org/en/master/user/advanced/#timeouts

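
For example, to use the tuple form and allow only a few seconds to establish
the connection but longer for Scrapyd to produce its response (the numbers
here are purely illustrative):

.. code-block:: python

    # 3.05 seconds to connect, 30 seconds to read the response.
    scrapyd = ScrapydAPI('http://example.com:6800/scrapyd/', timeout=(3.05, 30))
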

Calling the API
---------------

The Scrapyd API has a number of different actions designed to enable the
full control and automation of the daemon itself, and this package provides
a wrapper for *all* of those.

Add a version
~~~~~~~~~~~~~

.. method:: ScrapydAPI.add_version(project, version, egg)

Uploads a new version of a project. See the `add version endpoint`_ on Scrapyd's
documentation.

.. _add version endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#addversion-json

**Arguments**:

- **project** *(string)* The name of the project.
- **version** *(string)* The name of the new version you are uploading.
- **egg** *(file object)* The Python egg you wish to upload as the project, as a pre-opened file.

**Returns**: *(int)* The number of spiders found in the uploaded project; this is
the only useful information returned by Scrapyd as part of this call.

.. code-block:: python

    >>> with open('some-egg.egg', 'rb') as egg:
    ...     scrapyd.add_version('project_name', 'version_name', egg)
    3

Cancel a job
~~~~~~~~~~~~

.. method:: ScrapydAPI.cancel(project, job, signal=None)

Cancels a running or pending job with an optionally supplied termination signal.
A job in this regard is a previously scheduled run of a specific spider. See the
`cancel endpoint`_ on Scrapyd's documentation.

.. _cancel endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#cancel-json

**Arguments**:

- **project** *(string)* The name of the project the job belongs to.
- **job** *(string)* The ID of the job (which was reported back on scheduling).
- **signal** *(optional - string or int)* The termination signal to use. If one is not provided, this field is not sent, allowing Scrapyd to pick the default.

**Returns**: *(string)* ``'running'`` if the cancelled job was active, or ``'pending'`` if it was waiting to run.

.. code-block:: python

    >>> scrapyd.cancel('project_name', 'a3cb2..4efc1')
    'running'
    >>> scrapyd.cancel('project_name', 'b3ea2..3acc2', signal='TERM')
    'pending'

Delete a project
~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.delete_project(project)

Deletes all versions of an entire project, including all spiders within
those versions. See the `delete project endpoint`_ on Scrapyd's documentation.

.. _delete project endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#delproject-json

**Arguments**:

- **project** *(string)* The name of the project to delete.

**Returns**: *(bool)* Always ``True``; an exception is raised for other outcomes.

.. code-block:: python

    >>> scrapyd.delete_project('project_name')
    True

Delete a version of a project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.delete_version(project, version)

Deletes a specific version of a project and all spiders within that version.
See the `delete version endpoint`_ on Scrapyd's documentation.

.. _delete version endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#delversion-json

**Arguments**:

- **project** *(string)* The name of the project which the version belongs to.
- **version** *(string)* The name of the version you wish to delete.

**Returns**: *(bool)* Always ``True``; an exception is raised for other outcomes.

.. code-block:: python

    >>> scrapyd.delete_version('project_name', 'version_name')
    True

Retrieve the status of a specific job
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.job_status(project, job_id)

.. versionadded:: 0.2

Returns the job status for a single job. The status returned can be one of:
``''``, ``'running'``, ``'pending'`` or ``'finished'``. The empty string is
returned if the job ID could not be found and the status is therefore unknown.

**Arguments**:

- **project** *(string)* The name of the project which the job belongs to.
- **job_id** *(string)* The ID of the job you wish to check the status of.

**Returns**: *(string)* The status of the job, if known.

.. note::
    Scrapyd does not support an endpoint for this specific action. This
    method's result is derived from the list jobs endpoint, and therefore
    this is a helper method/shortcut provided by this wrapper itself. This is
    why the call requires the ``project`` argument, as the list jobs endpoint
    underlying this method also requires it.


.. code-block:: python

    >>> scrapyd.job_status('project_name', 'ac32a..bc21')
    'running'

If you wish, the various strings defining job state can be imported from
the ``scrapyd_api`` module itself for use in comparisons, e.g.:

.. code-block:: python

    from scrapyd_api import RUNNING, FINISHED, PENDING

    state = scrapyd.job_status('project_name', 'ac32a..bc21')
    if state == RUNNING:
        print('Job is running')

List all jobs for a project
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.list_jobs(project)

Lists all running, finished & pending spider jobs for a given project. See the
`list jobs endpoint`_ on Scrapyd's documentation.

.. _list jobs endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#listjobs-json

**Arguments**:

- **project** *(string)* The name of the project to list jobs for.

**Returns**: *(dict)* A dictionary with keys ``pending``, ``running`` and
``finished``, each containing a list of job dicts. Each job dict has keys for
the ``id`` and the name of the ``spider`` which ran the job.

.. code-block:: python

    >>> scrapyd.list_jobs('project_name')
    {
        'pending': [
            {
                u'id': u'24c35...f12ae',
                u'spider': u'spider_name'
            },
        ],
        'running': [
            {
                u'id': u'14a65...b27ce',
                u'spider': u'spider_name',
                u'start_time': u'2014-06-17 22:45:31.975358'
            },
        ],
        'finished': [
            {
                u'id': u'34c23...b21ba',
                u'spider': u'spider_name',
                u'start_time': u'2014-06-17 22:45:31.975358',
                u'end_time': u'2014-06-23 14:01:18.209680'
            }
        ]
    }

List all projects
~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.list_projects()

Lists all available projects. See the `list projects endpoint`_ on Scrapyd's
documentation.

.. _list projects endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#listprojects-json

**Arguments**:

- This method takes no arguments.

**Returns**: *(list)* A list of strings denoting the names of the projects
that are available.

.. code-block:: python

    >>> scrapyd.list_projects()
    [u'ecom_project', u'estate_agent_project', u'car_project']

List all spiders in a project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.list_spiders(project)

Lists all spiders available to a given project. See the `list spiders
endpoint`_ on Scrapyd's documentation.

.. _list spiders endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#listspiders-json

**Arguments**:

- **project** *(string)* The name of the project to list spiders for.

**Returns**: *(list)* A list of strings denoting the names of the spiders
available to the project.

.. code-block:: python

    >>> scrapyd.list_spiders('project_name')
    [u'raw_spider', u'js_enhanced_spider', u'selenium_spider']

List all versions of a project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.list_versions(project)

Lists all available versions of a given project. See the `list
versions endpoint`_ on Scrapyd's documentation.

.. _list versions endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#listversions-json

**Arguments**:

- **project** *(string)* The name of the project to list versions for.

**Returns**: *(list)* A list of strings denoting all available version names for
the requested project.

.. code-block:: python

    >>> scrapyd.list_versions('project_name')
    [u'345', u'346', u'347', u'348']

Schedule a job to run
~~~~~~~~~~~~~~~~~~~~~

.. method:: ScrapydAPI.schedule(project, spider, settings=None, **kwargs)

The main action method which actually causes scraping to start. This
action schedules a given spider to run immediately if there are no concurrent
jobs or as soon as possible once the current jobs are complete (this is a
Scrapyd setting).

There is currently no built-in ability in Scrapyd to schedule a spider for a
specific time, but this can be handled client-side by simply firing off the
request at the desired time.

See the `schedule endpoint`_ on Scrapyd's documentation.

.. _schedule endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#schedule-json

**Arguments**:

- **project** *(string)* The name of the project that owns the spider.
- **spider** *(string)* The name of the spider you wish to run.
- **settings** *(dict)* A dictionary of Scrapy settings keys you wish to
  override for this run.
- **kwargs** Any extra parameters you would like to pass to the spider's
  constructor/init method; see the sketch below for how these reach the spider.

**Returns**: *(string)* The Job ID of the newly created run.

.. code-block:: python

    # Schedule a job to run now sans extra parameters.
    >>> scrapyd.schedule('project_name', 'spider_name')
    u'14a6599ef67111e38a0e080027880ca6'
    # Schedule a job to run now with overridden settings.
    >>> settings = {'DOWNLOAD_DELAY': 2}
    >>> scrapyd.schedule('project_name', 'spider_name', settings=settings)
    u'23b5688df67111e38a0e080027880ca6'
    # Schedule a job to run now while passing init parameters.
    >>> scrapyd.schedule('project_name', 'spider_name', extra_init_param='value')
    u'14a6599ef67111e38a0e080027880ca6'

.. note::
    'project', 'spider' and 'settings' are reserved kwargs for this method and
    therefore these names should be avoided when trying to pass extra
    attributes to the spider init.

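
To illustrate how those extra keyword arguments surface on the Scrapy side,
here is a minimal sketch of a spider whose ``__init__`` accepts the
``extra_init_param`` used in the examples above. The spider itself is
illustrative and not part of this package; it simply follows Scrapy's usual
convention of receiving scheduler arguments as constructor keyword arguments.

.. code-block:: python

    import scrapy


    class SpiderName(scrapy.Spider):
        name = 'spider_name'

        def __init__(self, extra_init_param=None, *args, **kwargs):
            super(SpiderName, self).__init__(*args, **kwargs)
            # Anything passed to ScrapydAPI.schedule() as an extra keyword
            # argument arrives here when Scrapyd starts the crawl.
            self.extra_init_param = extra_init_param
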

Handling Exceptions
-------------------

As this library relies on the Requests_ library to handle HTTP connections,
the exceptions raised by Requests itself for such things as hard connection
errors and timeouts can be found in the `Requests exceptions documentation`_.

.. _Requests: http://python-requests.org
.. _Requests exceptions documentation: http://docs.python-requests.org/en/latest/api/?highlight=exceptions#exceptions

However, when the problem is an error returned by Scrapyd itself, a
``scrapyd_api.exceptions.ScrapydResponseError`` will be raised with the
applicable error message sent back from the Scrapyd API.

This works by simply checking the ``status`` key of the JSON Scrapyd returns
and raising the exception with the accompanying ``message`` value, allowing
the developer to debug the response.

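
Putting this together, a minimal sketch of guarding a call against both
transport-level failures and Scrapyd-reported errors might look like the
following; the ``ConnectionError``/``Timeout`` handling assumes you are using
the default Requests-based client, and the project and spider names are
placeholders.

.. code-block:: python

    from requests.exceptions import ConnectionError, Timeout

    from scrapyd_api import ScrapydAPI
    from scrapyd_api.exceptions import ScrapydResponseError

    scrapyd = ScrapydAPI('http://localhost:6800', timeout=10)

    try:
        job_id = scrapyd.schedule('project_name', 'spider_name')
    except ScrapydResponseError as exc:
        # Scrapyd answered, but reported an error status; the exception
        # carries the message Scrapyd sent back.
        print('Scrapyd returned an error: {0}'.format(exc))
    except (ConnectionError, Timeout):
        # Raised by Requests when the daemon is unreachable or too slow.
        print('Could not reach Scrapyd.')
    else:
        print('Scheduled job {0}'.format(job_id))
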

Daemon status
~~~~~~~~~~~~~

.. method:: ScrapydAPI.daemon_status()

Retrieves the load status of the service. See the `daemonstatus endpoint`_ on Scrapyd's
documentation.

.. _daemonstatus endpoint: http://scrapyd.readthedocs.org/en/latest/api.html#daemonstatus-json

**Arguments**:

- This method takes no arguments.

**Returns**: *(dict)* A dictionary with keys ``pending``, ``running``,
``finished`` and ``node_name``.

.. code-block:: python

    >>> scrapyd.daemon_status()
    {u'finished': 0, u'running': 0, u'pending': 0, u'node_name': u'ScrapyMachine'}

--------------------------------------------------------------------------------