├── src ├── bandersnatch_filter_plugins │ ├── __init__.py │ ├── prerelease_name.py │ ├── latest_name.py │ ├── regex_name.py │ ├── filename_name.py │ ├── allowlist_name.py │ └── blocklist_name.py ├── bandersnatch │ ├── tests │ │ ├── sample │ │ ├── mock_config.py │ │ ├── ci.conf │ │ ├── ci-swift.conf │ │ ├── test_sync.py │ │ ├── test_package.py │ │ ├── plugins │ │ │ ├── test_prerelease_name.py │ │ │ ├── test_regex_name.py │ │ │ ├── test_latest_release.py │ │ │ ├── test_filename.py │ │ │ ├── test_allowlist_name.py │ │ │ └── test_blocklist_name.py │ │ ├── test_master.py │ │ ├── test_utils.py │ │ ├── conftest.py │ │ ├── test_main.py │ │ ├── test_filter.py │ │ ├── test_delete.py │ │ ├── test_configuration.py │ │ └── test_verify.py │ ├── __main__.py │ ├── log.py │ ├── __init__.py │ ├── errors.py │ ├── default.conf │ ├── unittest.conf │ ├── utils.py │ ├── delete.py │ ├── configuration.py │ ├── package.py │ ├── main.py │ ├── verify.py │ └── filter.py ├── bandersnatch_storage_plugins │ └── __init__.py ├── test_tools │ └── test_xmlrpc.py └── runner.py ├── pytest.ini ├── setup.py ├── requirements_swift.txt ├── docs ├── modules.rst ├── installation.md ├── bandersnatch_storage_plugins.rst ├── index.rst ├── bandersnatch_filter_plugins.rst ├── bandersnatch.rst ├── mirror_configuration.md └── filtering_configuration.md ├── .coveragerc ├── .pyup.yml ├── .flake8 ├── requirements_test.txt ├── requirements.txt ├── bootstrap.sh ├── .github └── workflows │ ├── docker_upload.yml │ ├── docker_readme.yml │ ├── pypi_upload.yml │ └── ci.yml ├── mypy.ini ├── .readthedocs.yml ├── requirements_docs.txt ├── Dockerfile ├── .pre-commit-config.yaml ├── tox.ini ├── .travis.yml ├── .gitignore ├── MAINTAINERS.md ├── DEVELOPMENT.md ├── setup.cfg ├── test_runner.py └── README.md /src/bandersnatch_filter_plugins/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- 
/src/bandersnatch/tests/sample: -------------------------------------------------------------------------------- 1 | I am a sample! 2 | -------------------------------------------------------------------------------- /src/bandersnatch_storage_plugins/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /pytest.ini: -------------------------------------------------------------------------------- 1 | [pytest] 2 | log_cli_level = DEBUG 3 | log_level = DEBUG 4 | -------------------------------------------------------------------------------- /src/bandersnatch/__main__.py: -------------------------------------------------------------------------------- 1 | from . import main 2 | 3 | exit(main.main()) 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | from setuptools import setup 3 | 4 | setup() 5 | -------------------------------------------------------------------------------- /requirements_swift.txt: -------------------------------------------------------------------------------- 1 | keystoneauth1==4.2.1 2 | openstackclient==4.0.0 3 | python-swiftclient==3.10.0 4 | -------------------------------------------------------------------------------- /docs/modules.rst: -------------------------------------------------------------------------------- 1 | bandersnatch 2 | ============ 3 | 4 | .. 
toctree:: 5 | :maxdepth: 4 6 | 7 | bandersnatch 8 | bandersnatch_filter_plugins 9 | bandersnatch_storage_plugins 10 | -------------------------------------------------------------------------------- /.coveragerc: -------------------------------------------------------------------------------- 1 | [run] 2 | branch = True 3 | source = 4 | bandersnatch 5 | bandersnatch_storage_plugins 6 | 7 | [report] 8 | precision = 2 9 | omit = */apache_*.py 10 | */buildout.py 11 | */release.py 12 | */log.py 13 | -------------------------------------------------------------------------------- /.pyup.yml: -------------------------------------------------------------------------------- 1 | requirements: 2 | - requirements.txt: 3 | update: all 4 | pin: True 5 | - requirements_docs.txt: 6 | update: all 7 | pin: True 8 | - requirements_test.txt: 9 | update: all 10 | pin: True 11 | - setup.cfg: 12 | update: False 13 | -------------------------------------------------------------------------------- /.flake8: -------------------------------------------------------------------------------- 1 | [flake8] 2 | select = B,C,E,F,P,W 3 | max_line_length = 88 4 | # E722 is a duplicate of B001. 5 | # P207 is a duplicate of B003. 
6 | # W503 is against PEP8 7 | ignore = E722, P207, W503 8 | max-complexity = 20 9 | exclude = 10 | build, 11 | dist, 12 | __pycache__, 13 | *.pyc, 14 | .git, 15 | .tox 16 | -------------------------------------------------------------------------------- /requirements_test.txt: -------------------------------------------------------------------------------- 1 | asynctest==0.13.0 2 | async-timeout==3.0.1 3 | black==19.10b0 4 | codecov==2.1.8 5 | coverage==5.2.1 6 | flake8==3.8.3 7 | flake8-bugbear==20.1.4 8 | freezegun==0.3.15 9 | pre-commit==2.6.0 10 | pytest==6.0.1 11 | pytest-asyncio==0.14.0 12 | pytest-timeout==1.4.2 13 | setuptools==49.6.0 14 | tox==3.19.0 15 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.6.2 2 | aiohttp-xmlrpc==0.8.1 3 | async-timeout==3.0.1 4 | attrs==19.3.0 5 | chardet==3.0.4 6 | filelock==3.0.12 7 | idna==2.10 8 | importlib_resources==3.0.0; python_version < '3.7' 9 | lxml==4.5.2 10 | multidict==4.7.6 11 | packaging==20.4 12 | pyparsing==2.4.7 13 | setuptools==49.6.0 14 | six==1.15.0 15 | yarl==1.5.1 16 | -------------------------------------------------------------------------------- /bootstrap.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [ ! -e bin/python3 ]; then 4 | virtualenv --python=python3.6 . 5 | fi 6 | if [ ! 
-e bin/buildout ]; then 7 | bin/pip install zc.buildout 8 | bin/pip install virtualenv 9 | fi 10 | bin/pip install --upgrade zc.buildout==2.11.1 11 | bin/pip install --upgrade setuptools==38.5.2 12 | bin/pip install --upgrade virtualenv==15.1.0 13 | bin/buildout 14 | -------------------------------------------------------------------------------- /docs/installation.md: -------------------------------------------------------------------------------- 1 | ## Installation 2 | 3 | The following instructions will place the bandersnatch executable in a 4 | virtualenv under `bandersnatch/bin/bandersnatch`. 5 | 6 | - bandersnatch **requires** `>= Python 3.6` 7 | 8 | 9 | ### pip 10 | 11 | This installs the latest stable, released version. 12 | 13 | *(>= 3.6.1 required)* 14 | 15 | ``` 16 | $ python3.6 -m venv bandersnatch 17 | $ bandersnatch/bin/pip install bandersnatch 18 | $ bandersnatch/bin/bandersnatch --help 19 | ``` 20 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/mock_config.py: -------------------------------------------------------------------------------- 1 | from bandersnatch.configuration import BandersnatchConfig 2 | 3 | 4 | def mock_config(contents: str, filename: str = "test.conf") -> BandersnatchConfig: 5 | """ 6 | Creates a config file with contents and loads them into a 7 | BandersnatchConfig instance. 
8 | """ 9 | with open(filename, "w") as fd: 10 | fd.write(contents) 11 | 12 | instance = BandersnatchConfig() 13 | instance.config_file = filename 14 | instance.load_configuration() 15 | return instance 16 | -------------------------------------------------------------------------------- /.github/workflows/docker_upload.yml: -------------------------------------------------------------------------------- 1 | name: bandersnatch_docker_upload 2 | 3 | on: 4 | push: 5 | branches: 6 | - master 7 | 8 | jobs: 9 | build: 10 | runs-on: ubuntu-latest 11 | steps: 12 | - uses: actions/checkout@master 13 | - name: Publish to Docker Registry 14 | uses: elgohr/Publish-Docker-Github-Action@master 15 | with: 16 | name: pypa/bandersnatch 17 | username: ${{ secrets.DOCKER_USERNAME }} 18 | password: ${{ secrets.DOCKER_PASSWORD }} 19 | snapshot: true 20 | tag_names: true 21 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/ci.conf: -------------------------------------------------------------------------------- 1 | ; Config for the Travis CI Integration Test that hits PyPI 2 | 3 | [mirror] 4 | directory = /tmp/pypi 5 | json = true 6 | cleanup = true 7 | master = https://pypi.org 8 | timeout = 60 9 | global-timeout = 18000 10 | workers = 3 11 | hash-index = true 12 | stop-on-error = true 13 | storage-backend = filesystem 14 | verifiers = 3 15 | keep_index_versions = 2 16 | 17 | [plugins] 18 | enabled = 19 | allowlist_project 20 | 21 | [allowlist] 22 | packages = 23 | ACMPlus 24 | black 25 | peerme 26 | pyaib 27 | 28 | ; vim: set ft=cfg: 29 | -------------------------------------------------------------------------------- /src/test_tools/test_xmlrpc.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | """ 4 | Quick tool to test xmlrpc queries from bandersnatch 5 | - Note this is >= 3.7 only due to asyncio.run 6 | """ 7 | 8 | import asyncio 9 | 10 | from 
bandersnatch.master import Master 11 | 12 | 13 | async def main() -> int: 14 | async with Master("https://pypi.org") as master: 15 | all_packages = await master.all_packages() 16 | print(f"PyPI returned {len(all_packages)} PyPI packages via xmlrpc") 17 | return 0 18 | 19 | 20 | if __name__ == "__main__": 21 | exit(asyncio.run(main())) # type: ignore 22 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/ci-swift.conf: -------------------------------------------------------------------------------- 1 | ; Config for the Travis CI Integration Test that hits PyPI 2 | 3 | [mirror] 4 | directory = /tmp/pypi 5 | json = true 6 | master = https://pypi.org 7 | timeout = 60 8 | global-timeout = 18000 9 | workers = 3 10 | hash-index = true 11 | stop-on-error = true 12 | storage-backend = swift 13 | verifiers = 3 14 | keep_index_versions = 2 15 | 16 | [swift] 17 | default_container = bandersnatch 18 | 19 | [plugins] 20 | enabled = 21 | allowlist_project 22 | 23 | [allowlist] 24 | packages = 25 | black 26 | peerme 27 | pyaib 28 | 29 | ; vim: set ft=cfg: 30 | -------------------------------------------------------------------------------- /mypy.ini: -------------------------------------------------------------------------------- 1 | [mypy] 2 | python_version = 3.6 3 | check_untyped_defs = True 4 | disallow_incomplete_defs = True 5 | disallow_untyped_defs = True 6 | # Until pytest + plugins type their decorators need this disabled 7 | disallow_untyped_decorators = False 8 | ignore_missing_imports = True 9 | no_implicit_optional = True 10 | pretty = True 11 | show_error_context = True 12 | sqlite_cache = True 13 | warn_no_return = True 14 | warn_redundant_casts = True 15 | warn_return_any = True 16 | warn_unreachable = True 17 | warn_unused_ignores = True 18 | 19 | # TODO: Enable PEP420 for bandersnatch 20 | namespace_packages = False 21 | -------------------------------------------------------------------------------- 
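The `ci.conf` and `ci-swift.conf` files above use Python's INI dialect, so the multi-line `enabled` and `packages` values come back as indented continuation lines that need splitting. A minimal sketch with the standard-library `configparser` (the section contents here are abbreviated, not the full CI config):

```python
import configparser

# Abbreviated excerpt of the ci.conf style shown above
CI_CONF = """
[mirror]
directory = /tmp/pypi
workers = 3
storage-backend = filesystem

[plugins]
enabled =
    allowlist_project

[allowlist]
packages =
    black
    peerme
"""

config = configparser.ConfigParser()
config.read_string(CI_CONF)

# Scalar options have typed accessors
workers = config.getint("mirror", "workers")

# Multi-line values arrive as one newline-joined string; split() yields the list
enabled = config.get("plugins", "enabled").split()
packages = config.get("allowlist", "packages").split()

print(workers, enabled, packages)  # 3 ['allowlist_project'] ['black', 'peerme']
```

This is why the config files can list one package per indented line: `configparser` folds continuation lines into the option's value, and a plain `split()` recovers the entries.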
/.readthedocs.yml: -------------------------------------------------------------------------------- 1 | # Read the Docs configuration file 2 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details 3 | 4 | version: 2 5 | 6 | sphinx: 7 | builder: html 8 | configuration: docs/conf.py 9 | # It's probably better if we have slightly outdated but known good documentation 10 | # up than having (possibly very) broken documentation. 11 | fail_on_warning: true 12 | 13 | formats: 14 | - pdf 15 | - htmlzip 16 | - epub 17 | 18 | python: 19 | version: 3.7 20 | install: 21 | - method: pip 22 | path: . 23 | - requirements: requirements_docs.txt # By extension, this installs requirements_swift.txt too 24 | -------------------------------------------------------------------------------- /.github/workflows/docker_readme.yml: -------------------------------------------------------------------------------- 1 | name: bandersnatch_docker_readme 2 | 3 | on: 4 | push: 5 | branches: 6 | - master 7 | paths: 8 | - README.md 9 | - .github/workflows/docker_readme.yml 10 | 11 | jobs: 12 | dockerHubDescription: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - uses: actions/checkout@master 16 | - name: Publish README to Docker Hub Description 17 | uses: peter-evans/dockerhub-description@v2.2.0 18 | env: 19 | DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USERNAME }} 20 | DOCKERHUB_PASSWORD: ${{ secrets.DOCKER_PASSWORD }} 21 | DOCKERHUB_REPOSITORY: pypa/bandersnatch 22 | -------------------------------------------------------------------------------- /src/bandersnatch/log.py: -------------------------------------------------------------------------------- 1 | # This is mainly factored out into a separate module so I can ignore it in 2 | # coverage analysis. Unfortunately this is really hard to test as the Python 3 | # logging module won't allow reasonable teardown. 
:( 4 | import logging 5 | from typing import Any 6 | 7 | 8 | def setup_logging(args: Any) -> logging.StreamHandler: 9 | ch = logging.StreamHandler() 10 | formatter = logging.Formatter("%(asctime)s %(levelname)s: %(message)s") 11 | ch.setFormatter(formatter) 12 | logger = logging.getLogger("bandersnatch") 13 | logger.setLevel(logging.DEBUG if args.debug else logging.INFO) 14 | logger.addHandler(ch) 15 | return ch 16 | -------------------------------------------------------------------------------- /requirements_docs.txt: -------------------------------------------------------------------------------- 1 | docutils==0.16 2 | pyparsing==2.4.7 3 | python-dateutil==2.8.1 4 | packaging==20.4 5 | requests==2.24.0 6 | six==1.15.0 7 | sphinx==3.2.1 8 | recommonmark==0.6.0 9 | xmlrpc2==0.3.1 10 | 11 | git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme 12 | git+https://github.com/python/python-docs-theme.git#egg=python-docs-theme 13 | 14 | # This is needed since autodoc imports all bandersnatch packages and modules 15 | # so imports must not fail or its containing module will NOT be documented. 16 | # Also, the missing swift dependencies will cause the doc build to fail since 17 | # autodoc will raise a warning due to the import failure. 
18 | -r requirements_swift.txt 19 | -------------------------------------------------------------------------------- /src/bandersnatch/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | __copyright__ = "2010-2020, PyPA" 4 | 5 | from typing import NamedTuple 6 | 7 | 8 | class _VersionInfo(NamedTuple): 9 | major: int 10 | minor: int 11 | micro: int 12 | releaselevel: str 13 | serial: int 14 | 15 | @property 16 | def version_str(self) -> str: 17 | release_level = f".{self.releaselevel}" if self.releaselevel else "" 18 | return f"{self.major}.{self.minor}.{self.micro}{release_level}" 19 | 20 | 21 | __version_info__ = _VersionInfo( 22 | major=4, 23 | minor=1, 24 | micro=1, 25 | releaselevel="", 26 | serial=0, # Not currently in use with Bandersnatch versioning 27 | ) 28 | __version__ = __version_info__.version_str 29 | -------------------------------------------------------------------------------- /docs/bandersnatch_storage_plugins.rst: -------------------------------------------------------------------------------- 1 | bandersnatch_storage_plugins package 2 | ==================================== 3 | 4 | Package contents 5 | ---------------- 6 | 7 | .. automodule:: bandersnatch_storage_plugins 8 | :members: 9 | :undoc-members: 10 | :show-inheritance: 11 | 12 | Submodules 13 | ---------- 14 | 15 | bandersnatch_storage_plugins.filesystem module 16 | ---------------------------------------------- 17 | 18 | .. automodule:: bandersnatch_storage_plugins.filesystem 19 | :members: 20 | :undoc-members: 21 | :show-inheritance: 22 | 23 | bandersnatch_storage_plugins.swift module 24 | ----------------------------------------- 25 | 26 | .. 
automodule:: bandersnatch_storage_plugins.swift 27 | :members: 28 | :undoc-members: 29 | :show-inheritance: 30 | -------------------------------------------------------------------------------- /src/bandersnatch/errors.py: -------------------------------------------------------------------------------- 1 | class PackageNotFound(Exception): 2 | """We asked for package metadata from PyPI and it wasn't available""" 3 | 4 | def __init__(self, package_name: str) -> None: 5 | super().__init__() 6 | self.package_name = package_name 7 | 8 | def __str__(self) -> str: 9 | return f"{self.package_name} no longer exists on PyPI" 10 | 11 | 12 | class StaleMetadata(Exception): 13 | """We attempted to retrieve metadata from PyPI, but it was stale.""" 14 | 15 | def __init__(self, package_name: str, attempts: int) -> None: 16 | super().__init__() 17 | self.package_name = package_name 18 | self.attempts = attempts 19 | 20 | def __str__(self) -> str: 21 | return f"Stale serial for {self.package_name} after {self.attempts} attempts" 22 | -------------------------------------------------------------------------------- /.github/workflows/pypi_upload.yml: -------------------------------------------------------------------------------- 1 | name: bandersnatch_pypi_upload 2 | 3 | on: 4 | release: 5 | types: created 6 | 7 | jobs: 8 | build: 9 | name: bandersnatch PyPI Upload 10 | runs-on: ubuntu-latest 11 | 12 | steps: 13 | - uses: actions/checkout@v1 14 | 15 | - name: Set up Python 3.7 16 | uses: actions/setup-python@v1 17 | with: 18 | python-version: 3.7 19 | 20 | - name: Install latest pip, setuptools, twine + wheel 21 | run: | 22 | python -m pip install --upgrade pip setuptools twine wheel 23 | 24 | - name: Build wheels 25 | run: | 26 | python setup.py bdist_wheel 27 | python setup.py sdist 28 | 29 | - name: Upload to PyPI via Twine 30 | env: 31 | TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }} 32 | run: | 33 | twine upload --verbose -u '__token__' dist/* 34 | 
-------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM python:3 2 | 3 | RUN mkdir /bandersnatch 4 | RUN mkdir /conf && chmod 777 /conf 5 | ADD setup.cfg /bandersnatch 6 | ADD setup.py /bandersnatch 7 | ADD requirements.txt /bandersnatch 8 | ADD requirements_swift.txt /bandersnatch 9 | ADD README.md /bandersnatch 10 | ADD CHANGES.md /bandersnatch 11 | COPY src /bandersnatch/src 12 | 13 | # OPTIONAL: Include a config file 14 | # Remember to bind mount the "directory" in bandersnatch.conf 15 | # Recommended to bind mount /conf - `runner.py` defaults to looking for /conf/bandersnatch.conf 16 | # ADD bandersnatch.conf /etc 17 | 18 | RUN pip --no-cache-dir install --upgrade pip setuptools wheel 19 | RUN pip --no-cache-dir install --upgrade -r /bandersnatch/requirements.txt -r /bandersnatch/requirements_swift.txt 20 | RUN pip --no-cache-dir -v install /bandersnatch/[swift] 21 | 22 | CMD ["python", "/bandersnatch/src/runner.py", "3600"] 23 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | repos: 2 | - repo: https://github.com/pre-commit/pre-commit-hooks 3 | rev: v3.1.0 4 | hooks: 5 | - id: trailing-whitespace 6 | - id: end-of-file-fixer 7 | - id: check-yaml 8 | - id: check-added-large-files 9 | exclude: ^docs/conf.py$ 10 | - repo: https://github.com/ambv/black 11 | rev: 19.10b0 12 | hooks: 13 | - id: black 14 | args: [--target-version, py36] 15 | - repo: https://github.com/asottile/pyupgrade 16 | rev: v2.4.4 17 | hooks: 18 | - id: pyupgrade 19 | args: [--py36-plus] 20 | - repo: https://github.com/asottile/seed-isort-config 21 | rev: v2.1.1 22 | hooks: 23 | - id: seed-isort-config 24 | args: [--application-directories, '.:src'] 25 | - repo: https://github.com/pre-commit/mirrors-isort 26 | rev: v4.3.21 27 | 
hooks: 28 | - id: isort 29 | - repo: https://github.com/pre-commit/mirrors-mypy 30 | rev: v0.770 31 | hooks: 32 | - id: mypy 33 | exclude: (docs/.*) 34 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | .. documentation master file 2 | You can adapt this file completely to your liking, but it should at least 3 | contain the root `toctree` directive. 4 | 5 | Bandersnatch documentation 6 | ========================== 7 | 8 | bandersnatch is a PyPI mirror client according to `PEP 381` 9 | http://www.python.org/dev/peps/pep-0381/. 10 | 11 | Bandersnatch hits the XMLRPC API of pypi.org to get all packages with their serial, 12 | or only the packages changed since the last run's serial. bandersnatch then uses the JSON API 13 | of PyPI to get shasums and release file paths to download and work out where 14 | to lay out the package files on a POSIX file system. 15 | 16 | As of 4.0 bandersnatch: 17 | - Is fully asyncio based (mainly via aiohttp) 18 | - Only stores PEP 503 normalized package names for the /simple API 19 | - Only stores JSON in a normalized package name path too 20 | 21 | Contents: 22 | 23 | .. 
toctree:: 24 | :maxdepth: 3 25 | 26 | installation 27 | mirror_configuration 28 | filtering_configuration 29 | CONTRIBUTING 30 | modules 31 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: bandersnatch_ci 2 | 3 | on: [push, pull_request] 4 | 5 | jobs: 6 | build: 7 | name: bandersnatch CI python ${{ matrix.python-version }} on ${{matrix.os}} 8 | runs-on: ${{ matrix.os }} 9 | strategy: 10 | matrix: 11 | python-version: [3.6, 3.7, 3.8] 12 | os: [macOS-latest, ubuntu-latest, windows-latest] 13 | 14 | steps: 15 | - uses: actions/checkout@v1 16 | 17 | - name: Set up Python ${{ matrix.python-version }} 18 | uses: actions/setup-python@v1 19 | with: 20 | python-version: ${{ matrix.python-version }} 21 | 22 | - name: Install latest pip, setuptools + tox 23 | run: | 24 | python -m pip install --upgrade pip setuptools tox 25 | 26 | - name: Install base bandersnatch requirements 27 | run: | 28 | python -m pip install -r requirements.txt 29 | 30 | - name: Run Unittests 31 | env: 32 | TOXENV: py3 33 | run: | 34 | python test_runner.py 35 | 36 | - name: Run Integration Test 37 | env: 38 | TOXENV: INTEGRATION 39 | run: | 40 | python -m pip install . 
41 | python test_runner.py 42 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = lint,py3 3 | 4 | [testenv] 5 | passenv = CI TRAVIS TRAVIS_* 6 | commands = 7 | coverage run -m pytest {posargs} 8 | coverage report -m 9 | coverage html 10 | codecov 11 | deps = -r requirements_test.txt 12 | extras = swift 13 | 14 | [testenv:doc_build] 15 | basepython=python3 16 | commands = 17 | {envpython} {envbindir}/sphinx-build -a -W -b html docs docs/html 18 | changedir = {toxinidir} 19 | deps = 20 | -r requirements_docs.txt 21 | sphinx-rtd-theme 22 | 23 | extras = doc_build 24 | passenv = SSH_AUTH_SOCK 25 | setenv = 26 | SPHINX_THEME='pypa' 27 | 28 | [testenv:lint] 29 | basepython=python3 30 | skip_install=True 31 | deps = -r requirements_test.txt 32 | commands= 33 | pre-commit run --all-files --show-diff-on-failure 34 | flake8 35 | 36 | [isort] 37 | atomic = true 38 | not_skip = __init__.py 39 | line_length = 88 40 | multi_line_output = 3 41 | known_third_party = aiohttp,aiohttp_xmlrpc,asynctest,filelock,freezegun,importlib_resources,packaging,pytest,setuptools 42 | known_first_party = bandersnatch,bandersnatch_filter_plugins,bandersnatch_storage_plugins 43 | force_grid_wrap = 0 44 | use_parentheses=True 45 | include_trailing_comma = True 46 | -------------------------------------------------------------------------------- /src/bandersnatch_filter_plugins/prerelease_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import re 3 | from typing import Dict, List, Pattern 4 | 5 | from bandersnatch.filter import FilterReleasePlugin 6 | 7 | logger = logging.getLogger("bandersnatch") 8 | 9 | 10 | class PreReleaseFilter(FilterReleasePlugin): 11 | """ 12 | Filters releases considered pre-releases. 
13 | """ 14 | 15 | name = "prerelease_release" 16 | PRERELEASE_PATTERNS = ( 17 | r".+rc\d+$", 18 | r".+a(lpha)?\d+$", 19 | r".+b(eta)?\d+$", 20 | r".+dev\d+$", 21 | ) 22 | patterns: List[Pattern] = [] 23 | 24 | def initialize_plugin(self) -> None: 25 | """ 26 | Initialize the plugin reading patterns from the config. 27 | """ 28 | if not self.patterns: 29 | self.patterns = [ 30 | re.compile(pattern_string) 31 | for pattern_string in self.PRERELEASE_PATTERNS 32 | ] 33 | logger.info(f"Initialized prerelease plugin with {self.patterns}") 34 | 35 | def filter(self, metadata: Dict) -> bool: 36 | """ 37 | Returns False if version fails the filter, i.e. follows a prerelease pattern 38 | """ 39 | version = metadata["version"] 40 | return not any(pattern.match(version) for pattern in self.patterns) 41 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_sync.py: -------------------------------------------------------------------------------- 1 | from os import sep 2 | 3 | import asynctest 4 | import pytest 5 | 6 | from bandersnatch import utils 7 | from bandersnatch.mirror import BandersnatchMirror 8 | 9 | 10 | @pytest.mark.asyncio 11 | async def test_sync_specific_packages(mirror: BandersnatchMirror) -> None: 12 | FAKE_SERIAL = b"112233" 13 | with open("status", "wb") as f: 14 | f.write(FAKE_SERIAL) 15 | # Package names should be normalized by synchronize() 16 | specific_packages = ["Foo"] 17 | mirror.master.all_packages = asynctest.CoroutineMock( # type: ignore 18 | return_value={"foo": 1} 19 | ) 20 | mirror.json_save = True 21 | # Recall bootstrap so we have the json dirs 22 | mirror._bootstrap() 23 | await mirror.synchronize(specific_packages) 24 | 25 | assert """\ 26 | json{0}foo 27 | packages{0}2.7{0}f{0}foo{0}foo.whl 28 | packages{0}any{0}f{0}foo{0}foo.zip 29 | pypi{0}foo{0}json 30 | simple{0}foo{0}index.html 31 | simple{0}index.html""".format( 32 | sep 33 | ) == utils.find( 34 | mirror.webdir, dirs=False 35 | 
) 36 | 37 | assert ( 38 | open("web{0}simple{0}index.html".format(sep)).read() 39 | == """\ 40 | 41 | 42 | 43 | Simple Index 44 | 45 | 46 | foo
47 | 48 | """ 49 | ) 50 | # The "sync" method shouldn't update the serial 51 | assert open("status", "rb").read() == FAKE_SERIAL 52 | -------------------------------------------------------------------------------- /src/runner.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # Simple Script to replace cron for Docker 4 | 5 | import argparse 6 | import sys 7 | from subprocess import CalledProcessError, run 8 | from time import sleep, time 9 | 10 | 11 | def main() -> int: 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument( 14 | "-c", 15 | "--config", 16 | default="/conf/bandersnatch.conf", 17 | help="Configuration location", 18 | ) 19 | parser.add_argument("interval", help="Time in seconds between runs", type=int) 20 | args = parser.parse_args() 21 | 22 | print(f"Running bandersnatch every {args.interval}s", file=sys.stderr) 23 | try: 24 | while True: 25 | start_time = time() 26 | 27 | try: 28 | cmd = [ 29 | sys.executable, 30 | "-m", 31 | "bandersnatch.main", 32 | "--config", 33 | args.config, 34 | "mirror", 35 | ] 36 | run(cmd, check=True) 37 | except CalledProcessError as cpe: 38 | return cpe.returncode 39 | 40 | run_time = time() - start_time 41 | if run_time < args.interval: 42 | sleep_time = args.interval - run_time 43 | print(f"Sleeping for {sleep_time}s", file=sys.stderr) 44 | sleep(sleep_time) 45 | except KeyboardInterrupt: 46 | pass 47 | 48 | return 0 49 | 50 | 51 | if __name__ == "__main__": 52 | sys.exit(main()) 53 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | 3 | env: 4 | global: 5 | secure: 
"d2LR/WLe9ZragzMEXGGjkFGoPiVm8ilCc41CEhnsJNRUzDo5crfZY1OTzE45CIx3kpVYgUNUveb37Q2WMYI9X7xJp4gY+gxEOWG4EMefkXu2IwUBzwKaqA0oU1pYcnm5FlU8MoBL3HBWagTvGG7Q90YR4s/78HJTDrwHjTAlZctyh5O89vHq8yjDDa/FIwEytns3U8FsVp5IGe+vvDBsbrlFgW0kGG2ayc2bO0i9wof0RF9J7gre5zrKg9h80AHxbmprZ9hhjsYPj3cgEniaQn5dFxf7k3YfkPMvr2h9HHdHPucF+KRiux2/UvQ/CPeSpZGmcC+YzHcgliKK/bkl2MHZDQJ78V+vhJKchbZ+3iVyuFYbhggE5nmUjDMpthnfhraGGIPc9ZYwwKTYhLMc2AlcBLu3+cLAAcCT7gl4ArZQT6+1jXrMApulPIHqxsnxwTKxBPyuq0M7w5TJJMpgXGy4l5xUO/z8FYAQ1+rBHif3Sy36Sh2w0cAAKCz46dow5ZdpXhxUurA9VeCkQRb0D/fg59N/KoAnsjbbbgUyg3zjsxF8OiLMqyOTnagecFzCMUjT8yT0cknEz8oCwznEbP3seqzGevTzmC8yXXAhAFeaGwdh8t7WkOT5I50cOZ+bNgnyi1nsiEBzaGDNEVtd+uI91Ij5GTT54ZRoXBxaS0k=" 6 | 7 | matrix: 8 | fast_finish: true 9 | include: 10 | - python: 3.8 11 | env: TOXENV=doc_build 12 | - python: 3.8 13 | env: TOXENV=lint 14 | - python: 3.8 15 | env: TOXENV=py3 16 | - python: nightly 17 | env: TOXENV=INTEGRATION 18 | - python: nightly 19 | env: TOXENV=py3 20 | allow_failures: 21 | - python: nightly 22 | env: TOXENV=INTEGRATION 23 | - python: nightly 24 | env: TOXENV=py3 25 | 26 | install: 27 | - pip install --upgrade pip setuptools 28 | - pip install -r requirements.txt -r requirements_test.txt 29 | - pip install . 
30 | 31 | script: 32 | - python test_runner.py 33 | 34 | notifications: 35 | irc: 36 | channels: 37 | - "chat.freenode.net#bandersnatch" 38 | 39 | cache: 40 | directories: 41 | - $HOME/.cache/pip 42 | - $HOME/.cache/pre-commit 43 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_package.py: -------------------------------------------------------------------------------- 1 | import asynctest 2 | import pytest 3 | from _pytest.capture import CaptureFixture 4 | 5 | from bandersnatch.errors import PackageNotFound, StaleMetadata 6 | from bandersnatch.master import Master, StalePage 7 | from bandersnatch.package import Package 8 | 9 | 10 | def test_package_accessors(package: Package) -> None: 11 | assert package.info == {"name": "Foo", "version": "0.1"} 12 | assert package.last_serial == 654_321 13 | assert list(package.releases.keys()) == ["0.1"] 14 | assert len(package.release_files) == 2 15 | for f in package.release_files: 16 | assert "filename" in f 17 | assert "digests" in f 18 | 19 | 20 | @pytest.mark.asyncio 21 | async def test_package_update_metadata_gives_up_after_3_stale_responses( 22 | caplog: CaptureFixture, master: Master 23 | ) -> None: 24 | master.get_package_metadata = asynctest.CoroutineMock( # type: ignore 25 | side_effect=StalePage 26 | ) 27 | package = Package("foo", serial=11) 28 | 29 | with pytest.raises(StaleMetadata): 30 | await package.update_metadata(master, attempts=3) 31 | assert master.get_package_metadata.await_count == 3 # type: ignore 32 | assert "not updating. 
Giving up" in caplog.text 33 | 34 | 35 | @pytest.mark.asyncio 36 | async def test_package_not_found(caplog: CaptureFixture, master: Master) -> None: 37 | pkg_name = "foo" 38 | master.get_package_metadata = asynctest.CoroutineMock( # type: ignore 39 | side_effect=PackageNotFound(pkg_name) 40 | ) 41 | package = Package(pkg_name, serial=11) 42 | 43 | with pytest.raises(PackageNotFound): 44 | await package.update_metadata(master) 45 | assert "foo no longer exists on PyPI" in caplog.text 46 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | test.conf 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | doc/html/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # IPython Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # dotenv 80 | .env 81 | 82 | # virtualenv 83 | venv/ 84 | ENV/ 85 | 86 | # Spyder project settings 87 | .spyderproject 88 | 89 | # Rope project settings 90 | .ropeproject 91 | 92 | # VSCode project settings 93 | .vscode/ 94 | 95 | /.coverage 96 | .tox/ 97 | artifacts/*.xml 98 | UNKNOWN.egg-info/ 99 | 100 | /.idea 101 | 102 | .mypy_cache 103 | /.pytest_cache 104 | /docs/html 105 | 106 | # MonkeyType 107 | monkeytype.sqlite3 108 | 109 | # Integration test 110 | mirrored-files 111 | -------------------------------------------------------------------------------- /docs/bandersnatch_filter_plugins.rst: -------------------------------------------------------------------------------- 1 | bandersnatch_filter_plugins package 2 | =================================== 3 | 4 | Package contents 5 | ---------------- 6 | 7 | .. automodule:: bandersnatch_filter_plugins 8 | :members: 9 | :undoc-members: 10 | :show-inheritance: 11 | 12 | Submodules 13 | ---------- 14 | 15 | bandersnatch_filter_plugins.blocklist_name module 16 | ------------------------------------------------- 17 | 18 | .. 
automodule:: bandersnatch_filter_plugins.blocklist_name 19 | :members: 20 | :undoc-members: 21 | :show-inheritance: 22 | 23 | bandersnatch_filter_plugins.filename_name module 24 | ------------------------------------------------ 25 | 26 | .. automodule:: bandersnatch_filter_plugins.filename_name 27 | :members: 28 | :undoc-members: 29 | :show-inheritance: 30 | 31 | bandersnatch_filter_plugins.latest_name module 32 | ---------------------------------------------- 33 | 34 | .. automodule:: bandersnatch_filter_plugins.latest_name 35 | :members: 36 | :undoc-members: 37 | :show-inheritance: 38 | 39 | bandersnatch_filter_plugins.metadata_filter module 40 | -------------------------------------------------- 41 | 42 | .. automodule:: bandersnatch_filter_plugins.metadata_filter 43 | :members: 44 | :undoc-members: 45 | :show-inheritance: 46 | 47 | bandersnatch_filter_plugins.prerelease_name module 48 | -------------------------------------------------- 49 | 50 | .. automodule:: bandersnatch_filter_plugins.prerelease_name 51 | :members: 52 | :undoc-members: 53 | :show-inheritance: 54 | 55 | bandersnatch_filter_plugins.regex_name module 56 | --------------------------------------------- 57 | 58 | .. automodule:: bandersnatch_filter_plugins.regex_name 59 | :members: 60 | :undoc-members: 61 | :show-inheritance: 62 | 63 | bandersnatch_filter_plugins.allowlist_name module 64 | ------------------------------------------------- 65 | 66 | .. automodule:: bandersnatch_filter_plugins.allowlist_name 67 | :members: 68 | :undoc-members: 69 | :show-inheritance: 70 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintaining Bandersnatch 2 | 3 | This document sets out the roles, processes and responsibilities `bandersnatch` 4 | maintainers hold and can conduct. 
5 | 6 | ## Summary of being a Maintainer of `bandersnatch` 7 | 8 | - **Issue Triage** 9 | - Assesses if the Issue is accurate + reproducible 10 | - If the issue is a feature request, assesses if it fits the *bandersnatch mission* 11 | - **PR Merging** 12 | - Assesses Pull Requests for suitability and adherence to the *bandersnatch mission* 13 | - It is preferred that big changes be pulled in from *branches* via *Pull Requests* 14 | - Peer reviewed by another maintainer 15 | - **Releases** 16 | - You will have **"the commit bit"** access 17 | 18 | ### Links to key mentioned files 19 | 20 | - Change Log: [CHANGES.md](https://github.com/pypa/bandersnatch/blob/master/CHANGES.md) 21 | - Mission Statement: Can be found in bandersnatch's [README.md](https://github.com/pypa/bandersnatch/blob/master/README.md) 22 | - Readme File: [README.md](https://github.com/pypa/bandersnatch/blob/master/README.md) 23 | - Semantic Versioning: [PEP 440 Semantic](https://www.python.org/dev/peps/pep-0440/#semantic-versioning) 24 | 25 | ## Processes 26 | 27 | ### Evaluating Issues and Pull Requests 28 | 29 | Please always think of the mission of bandersnatch. We should simply mirror in a 30 | way compatible with a PEP 381 mirror. Simple is always better than complex and all *bug* 31 | issues need to be reproducible for our developers. 32 | 33 | #### pyup.io 34 | - Remember it's not perfect 35 | - It does not take into account modules' pinned dependencies 36 | - e.g. If requests wants *urllib3<1.25*, *pyup.io* can still try to update it 37 | - Until we have **CI** that effectively runs `pip freeze` from time to time we 38 | should recheck the minimal deps that we pin in `requirements.txt` 39 | 40 | ### Releasing to PyPI 41 | Every maintainer can release to PyPI. A maintainer should have the agreement of 42 | two or more Maintainers that it is a suitable time for a release.
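The `pip freeze` recheck mentioned above boils down to comparing pinned versions against what is actually installed. A minimal standard-library sketch (the `check_pins` helper and its pin dict are hypothetical illustrations, not part of bandersnatch):

```python
from importlib import metadata


def check_pins(pins: dict) -> dict:
    """Compare a {package: pinned_version} mapping against installed versions."""
    mismatches = {}
    for pkg, pinned in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            installed = None  # not installed at all
        if installed != pinned:
            mismatches[pkg] = (pinned, installed)
    return mismatches


# Any entry returned here is a pin worth re-checking before a release
print(check_pins({"pip": "0.0.1"}))
```

Every returned entry pairs the pinned version with the installed one (or `None`), which is the signal that a pin in `requirements.txt` needs review.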
43 | 44 | #### Release Process 45 | 46 | - Update `src/bandersnatch/__init__.py` version 47 | - Update the Change Log with difference from the last release 48 | - Push / Merge to Master 49 | - Create a GitHub Release 50 | - Tag with the semantic version number 51 | - Build a `sdist` + `wheel` 52 | - Use `twine` to upload to PyPI 53 | -------------------------------------------------------------------------------- /docs/bandersnatch.rst: -------------------------------------------------------------------------------- 1 | bandersnatch package 2 | ==================== 3 | 4 | Package contents 5 | ---------------- 6 | 7 | .. automodule:: bandersnatch 8 | :members: 9 | :undoc-members: 10 | :show-inheritance: 11 | 12 | Submodules 13 | ---------- 14 | 15 | bandersnatch.configuration module 16 | --------------------------------- 17 | 18 | .. automodule:: bandersnatch.configuration 19 | :members: 20 | :undoc-members: 21 | :show-inheritance: 22 | 23 | bandersnatch.delete module 24 | -------------------------- 25 | 26 | .. automodule:: bandersnatch.delete 27 | :members: 28 | :undoc-members: 29 | :show-inheritance: 30 | 31 | bandersnatch.filter module 32 | -------------------------- 33 | 34 | .. automodule:: bandersnatch.filter 35 | :members: 36 | :undoc-members: 37 | :show-inheritance: 38 | 39 | bandersnatch.log module 40 | ----------------------- 41 | 42 | .. automodule:: bandersnatch.log 43 | :members: 44 | :undoc-members: 45 | :show-inheritance: 46 | 47 | bandersnatch.main module 48 | ------------------------ 49 | 50 | .. automodule:: bandersnatch.main 51 | :members: 52 | :undoc-members: 53 | :show-inheritance: 54 | 55 | bandersnatch.master module 56 | -------------------------- 57 | 58 | .. automodule:: bandersnatch.master 59 | :members: 60 | :undoc-members: 61 | :show-inheritance: 62 | 63 | bandersnatch.mirror module 64 | -------------------------- 65 | 66 | .. 
automodule:: bandersnatch.mirror 67 | :members: 68 | :undoc-members: 69 | :show-inheritance: 70 | 71 | bandersnatch.package module 72 | --------------------------- 73 | 74 | .. automodule:: bandersnatch.package 75 | :members: 76 | :undoc-members: 77 | :show-inheritance: 78 | 79 | bandersnatch.storage module 80 | --------------------------- 81 | 82 | .. automodule:: bandersnatch.storage 83 | :members: 84 | :undoc-members: 85 | :show-inheritance: 86 | 87 | bandersnatch.utils module 88 | ------------------------- 89 | 90 | .. automodule:: bandersnatch.utils 91 | :members: 92 | :undoc-members: 93 | :show-inheritance: 94 | 95 | bandersnatch.verify module 96 | -------------------------- 97 | 98 | .. automodule:: bandersnatch.verify 99 | :members: 100 | :undoc-members: 101 | :show-inheritance: 102 | -------------------------------------------------------------------------------- /src/bandersnatch_filter_plugins/latest_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from operator import itemgetter 3 | from typing import Dict, Sequence, Tuple, Union 4 | 5 | from packaging.version import LegacyVersion, Version, parse 6 | 7 | from bandersnatch.filter import FilterReleasePlugin 8 | 9 | logger = logging.getLogger("bandersnatch") 10 | 11 | 12 | class LatestReleaseFilter(FilterReleasePlugin): 13 | """ 14 | Plugin to download only latest releases 15 | """ 16 | 17 | name = "latest_release" 18 | keep = 0 # by default, keep 'em all 19 | latest: Sequence[str] = [] 20 | 21 | def initialize_plugin(self) -> None: 22 | """ 23 | Initialize the plugin reading patterns from the config. 
24 | """ 25 | if self.keep: 26 | return 27 | 28 | try: 29 | self.keep = int(self.configuration["latest_release"]["keep"]) 30 | except KeyError: 31 | return 32 | except ValueError: 33 | return 34 | if self.keep > 0: 35 | logger.info(f"Initialized latest releases plugin with keep={self.keep}") 36 | 37 | def filter(self, metadata: Dict) -> bool: 38 | """ 39 | Returns False if version fails the filter, i.e. is not a latest/current release 40 | """ 41 | if self.keep == 0: 42 | return True 43 | 44 | if not self.latest: 45 | info = metadata["info"] 46 | releases = metadata["releases"] 47 | versions = list(releases.keys()) 48 | before = len(versions) 49 | 50 | if before <= self.keep: 51 | # not enough releases: do nothing 52 | return True 53 | 54 | versions_pair = map(lambda v: (parse(v), v), versions) 55 | latest_sorted: Sequence[Tuple[Union[LegacyVersion, Version], str]] = sorted( 56 | versions_pair 57 | )[ 58 | -self.keep : # noqa: E203 59 | ] 60 | self.latest = list(map(itemgetter(1), latest_sorted)) 61 | 62 | current_version = info.get("version") 63 | if current_version and (current_version not in self.latest): 64 | # never remove the stable/official version 65 | self.latest[0] = current_version 66 | 67 | version = metadata["version"] 68 | return version in self.latest 69 | -------------------------------------------------------------------------------- /DEVELOPMENT.md: -------------------------------------------------------------------------------- 1 | # bandersnatch development 2 | 3 | So you want to help out? **Awesome**. Go you! 4 | 5 | ## Getting Started 6 | 7 | We use GitHub. To get started I'd suggest visiting https://guides.github.com/ 8 | 9 | ### Pre Install 10 | 11 | Please make sure your system has the following: 12 | 13 | - Python 3.6.1 or greater 14 | - git cli client 15 | 16 | Also ensure you can authenticate with GitHub via SSH Keys or HTTPS.
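A quick way to confirm that authentication works (these are GitHub's standard endpoints; note that `ssh -T` deliberately exits non-zero even on success, hence the `|| true`):

```shell
# SSH: prints a greeting with your username if your key is registered.
# BatchMode avoids interactive prompts if no key is available.
ssh -o BatchMode=yes -T git@github.com || true

# HTTPS: lists the repository's HEAD commit if GitHub is reachable
git ls-remote https://github.com/pypa/bandersnatch.git HEAD || echo "could not reach GitHub over HTTPS"
```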
17 | 18 | ### Checkout `bandersnatch` 19 | 20 | Let's now cd to where we want the code and clone the repo: 21 | 22 | - `cd somewhere` 23 | - `git clone git@github.com:pypa/bandersnatch.git` 24 | 25 | ### Development venv 26 | 27 | One way to develop and install all the dependencies of bandersnatch is to use a venv. 28 | 29 | - Let's create one and upgrade `pip` and `setuptools`. 30 | 31 | ``` 32 | python3 -m venv /path/to/venv 33 | /path/to/venv/bin/pip install --upgrade pip setuptools 34 | ``` 35 | 36 | - Then we should install the dependencies to the venv: 37 | 38 | ``` 39 | /path/to/venv/bin/pip install -r requirements.txt 40 | /path/to/venv/bin/pip install -r requirements_test.txt 41 | ``` 42 | 43 | - To verify any changes in the documentation: 44 | 45 | **NOTICE:** This effectively installs `requirements_swift.txt` *and* `requirements_docs.txt` 46 | since the dependencies are needed by autodoc, which imports all of bandersnatch during 47 | documentation building. So pip will install **a lot** of dependencies. 48 | 49 | ``` 50 | /path/to/venv/bin/pip install -r requirements_docs.txt 51 | ``` 52 | 53 | - Finally install bandersnatch in editable mode: 54 | 55 | ``` 56 | /path/to/venv/bin/pip install -e . 57 | ``` 58 | 59 | ## Running Bandersnatch 60 | 61 | You will need to customize `src/bandersnatch/default.conf` and run via the following: 62 | 63 | **WARNING: Bandersnatch will go off and sync from pypi.org and use disk space!** 64 | 65 | ``` 66 | cd bandersnatch 67 | /path/to/venv/bin/pip install --upgrade . 68 | /path/to/venv/bin/bandersnatch --help 69 | 70 | /path/to/venv/bin/bandersnatch -c src/bandersnatch/default.conf mirror 71 | ``` 72 | 73 | ## Running Unit Tests 74 | 75 | We use tox to run tests. `tox.ini` has the options needed, so running tests is very easy.
76 | 77 | ``` 78 | cd bandersnatch 79 | /path/to/venv/bin/tox [-vv] 80 | ``` 81 | 82 | You want to see: 83 | ``` 84 | py36: commands succeeded 85 | congratulations :) 86 | ``` 87 | 88 | 89 | ## Making a release 90 | 91 | *To be completed - @cooper has never used zc.buildout* 92 | 93 | * Tests green? 94 | * run `bin/fullrelease` 95 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/plugins/test_prerelease_name.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | from pathlib import Path 4 | from tempfile import TemporaryDirectory 5 | from unittest import TestCase 6 | 7 | from mock_config import mock_config 8 | 9 | import bandersnatch.filter 10 | from bandersnatch.master import Master 11 | from bandersnatch.mirror import BandersnatchMirror 12 | from bandersnatch.package import Package 13 | from bandersnatch_filter_plugins import prerelease_name 14 | 15 | 16 | class BasePluginTestCase(TestCase): 17 | 18 | tempdir = None 19 | cwd = None 20 | 21 | def setUp(self) -> None: 22 | self.cwd = os.getcwd() 23 | self.tempdir = TemporaryDirectory() 24 | os.chdir(self.tempdir.name) 25 | 26 | def tearDown(self) -> None: 27 | if self.tempdir: 28 | assert self.cwd 29 | os.chdir(self.cwd) 30 | self.tempdir.cleanup() 31 | self.tempdir = None 32 | 33 | 34 | class TestRegexReleaseFilter(BasePluginTestCase): 35 | 36 | config_contents = """\ 37 | [plugins] 38 | enabled = 39 | prerelease_release 40 | """ 41 | 42 | def test_plugin_includes_predefined_patterns(self) -> None: 43 | mock_config(self.config_contents) 44 | 45 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 46 | 47 | assert any( 48 | type(plugin) == prerelease_name.PreReleaseFilter for plugin in plugins 49 | ) 50 | plugin = next( 51 | plugin 52 | for plugin in plugins 53 | if isinstance(plugin, prerelease_name.PreReleaseFilter) 54 | ) 55 | expected_patterns = [ 56 | 
re.compile(pattern_string) for pattern_string in plugin.PRERELEASE_PATTERNS 57 | ] 58 | assert plugin.patterns == expected_patterns 59 | 60 | def test_plugin_check_match(self) -> None: 61 | mock_config(self.config_contents) 62 | 63 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 64 | pkg = Package("foo", serial=1) 65 | pkg._metadata = { 66 | "info": {"name": "foo", "version": "1.2.0"}, 67 | "releases": { 68 | "1.2.0alpha1": {}, 69 | "1.2.0a2": {}, 70 | "1.2.0beta1": {}, 71 | "1.2.0b2": {}, 72 | "1.2.0rc1": {}, 73 | "1.2.0": {}, 74 | }, 75 | } 76 | 77 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 78 | 79 | assert pkg.releases == {"1.2.0": {}} 80 | -------------------------------------------------------------------------------- /src/bandersnatch_filter_plugins/regex_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import re 3 | from typing import Dict, List, Pattern 4 | 5 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin 6 | 7 | logger = logging.getLogger("bandersnatch") 8 | 9 | 10 | class RegexReleaseFilter(FilterReleasePlugin): 11 | """ 12 | Filters releases based on regex patterns defined by the user. 13 | """ 14 | 15 | name = "regex_release" 16 | # Has to be iterable to ensure it works with any() 17 | patterns: List[Pattern] = [] 18 | 19 | def initialize_plugin(self) -> None: 20 | """ 21 | Initialize the plugin reading patterns from the config. 22 | """ 23 | # TODO: should retrieving the plugin's config be part of the base class?
24 | try: 25 | config = self.configuration["filter_regex"]["releases"] 26 | except KeyError: 27 | return 28 | else: 29 | if not self.patterns: 30 | pattern_strings = [pattern for pattern in config.split("\n") if pattern] 31 | self.patterns = [ 32 | re.compile(pattern_string) for pattern_string in pattern_strings 33 | ] 34 | 35 | logger.info(f"Initialized regex release plugin with {self.patterns}") 36 | 37 | def filter(self, metadata: Dict) -> bool: 38 | """ 39 | Returns False if version fails the filter, i.e. follows a regex pattern 40 | """ 41 | version = metadata["version"] 42 | return not any(pattern.match(version) for pattern in self.patterns) 43 | 44 | 45 | class RegexProjectFilter(FilterProjectPlugin): 46 | """ 47 | Filters projects based on regex patterns defined by the user. 48 | """ 49 | 50 | name = "regex_project" 51 | # Has to be iterable to ensure it works with any() 52 | patterns: List[Pattern] = [] 53 | 54 | def initialize_plugin(self) -> None: 55 | """ 56 | Initialize the plugin reading patterns from the config. 57 | """ 58 | try: 59 | config = self.configuration["filter_regex"]["packages"] 60 | except KeyError: 61 | return 62 | else: 63 | if not self.patterns: 64 | pattern_strings = [pattern for pattern in config.split("\n") if pattern] 65 | self.patterns = [ 66 | re.compile(pattern_string) for pattern_string in pattern_strings 67 | ] 68 | 69 | logger.info(f"Initialized regex project plugin with {self.patterns}") 70 | 71 | def filter(self, metadata: Dict) -> bool: 72 | return not self.check_match(name=metadata["info"]["name"]) 73 | 74 | def check_match(self, name: str) -> bool: # type: ignore[override] 75 | """ 76 | Check if a project name matches any of the specified patterns. 77 | 78 | Parameters 79 | ========== 80 | name: str 81 | Project name 82 | 83 | Returns 84 | ======= 85 | bool: 86 | True if it matches, False otherwise.
87 | """ 88 | return any(pattern.match(name) for pattern in self.patterns) 89 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_master.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | from tempfile import gettempdir 3 | 4 | import asynctest 5 | import pytest 6 | 7 | import bandersnatch 8 | from bandersnatch.master import Master, StalePage, XmlRpcError 9 | 10 | 11 | def test_disallow_http() -> None: 12 | with pytest.raises(ValueError): 13 | Master("http://pypi.example.com") 14 | 15 | 16 | def test_rpc_url(master: Master) -> None: 17 | assert master.xmlrpc_url == "https://pypi.example.com/pypi" 18 | 19 | 20 | @pytest.mark.asyncio 21 | async def test_all_packages(master: Master) -> None: 22 | expected = [["aiohttp", "", "", "", "69"]] 23 | master.rpc = asynctest.CoroutineMock(return_value=expected) # type: ignore 24 | packages = await master.all_packages() 25 | assert expected == packages 26 | 27 | 28 | @pytest.mark.asyncio 29 | async def test_all_packages_raises(master: Master) -> None: 30 | master.rpc = asynctest.CoroutineMock(return_value=[]) # type: ignore 31 | with pytest.raises(XmlRpcError): 32 | await master.all_packages() 33 | 34 | 35 | @pytest.mark.asyncio 36 | async def test_changed_packages_no_changes(master: Master) -> None: 37 | master.rpc = asynctest.CoroutineMock(return_value=None) # type: ignore 38 | changes = await master.changed_packages(4) 39 | assert changes == {} 40 | 41 | 42 | @pytest.mark.asyncio 43 | async def test_changed_packages_with_changes(master: Master) -> None: 44 | list_of_package_changes = [ 45 | ("foobar", "1", 0, "added", 17), 46 | ("baz", "2", 1, "updated", 18), 47 | ("foobar", "1", 0, "changed", 20), 48 | # The server usually just hands out monotonically increasing serials in the 49 | # changelog. This verifies that we don't fail even with garbage input.
50 | ("foobar", "1", 0, "changed", 19), 51 | ] 52 | master.rpc = asynctest.CoroutineMock( # type: ignore 53 | return_value=list_of_package_changes 54 | ) 55 | changes = await master.changed_packages(4) 56 | assert changes == {"baz": 18, "foobar": 20} 57 | 58 | 59 | @pytest.mark.asyncio 60 | async def test_master_raises_if_serial_too_small(master: Master) -> None: 61 | get_ag = master.get("/asdf", 10) 62 | with pytest.raises(StalePage): 63 | await get_ag.asend(None) 64 | assert master.session.request.called 65 | 66 | 67 | @pytest.mark.asyncio 68 | async def test_master_doesnt_raise_if_serial_equal(master: Master) -> None: 69 | get_ag = master.get("/asdf", 1) 70 | await get_ag.asend(None) 71 | 72 | 73 | @pytest.mark.asyncio 74 | async def test_master_url_fetch(master: Master) -> None: 75 | fetch_path = Path(gettempdir()) / "unittest_url_fetch" 76 | await master.url_fetch("https://unittest.org/asdf", fetch_path) 77 | assert master.session.get.called 78 | 79 | 80 | @pytest.mark.asyncio 81 | async def test_xmlrpc_user_agent(master: Master) -> None: 82 | client = await master._gen_xmlrpc_client() 83 | assert f"bandersnatch {bandersnatch.__version__}" in client.headers["User-Agent"] 84 | 85 | 86 | @pytest.mark.asyncio 87 | async def test_session_raise_for_status(master: Master) -> None: 88 | patcher = asynctest.patch("aiohttp.ClientSession", autospec=True) 89 | with patcher as create_session: 90 | async with master: 91 | pass 92 | assert len(create_session.call_args_list) == 1 93 | assert create_session.call_args_list[0][1]["raise_for_status"] 94 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/plugins/test_regex_name.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | from pathlib import Path 4 | from tempfile import TemporaryDirectory 5 | from unittest import TestCase 6 | 7 | from mock_config import mock_config 8 | 9 | import bandersnatch.filter 10 
| from bandersnatch.master import Master 11 | from bandersnatch.mirror import BandersnatchMirror 12 | from bandersnatch.package import Package 13 | from bandersnatch_filter_plugins import regex_name 14 | 15 | 16 | class BasePluginTestCase(TestCase): 17 | 18 | tempdir = None 19 | cwd = None 20 | 21 | def setUp(self) -> None: 22 | self.cwd = os.getcwd() 23 | self.tempdir = TemporaryDirectory() 24 | os.chdir(self.tempdir.name) 25 | 26 | def tearDown(self) -> None: 27 | if self.tempdir: 28 | assert self.cwd 29 | os.chdir(self.cwd) 30 | self.tempdir.cleanup() 31 | self.tempdir = None 32 | 33 | 34 | class TestRegexReleaseFilter(BasePluginTestCase): 35 | 36 | config_contents = """\ 37 | [plugins] 38 | enabled = 39 | regex_release 40 | 41 | [filter_regex] 42 | releases = 43 | .+rc\\d$ 44 | .+alpha\\d$ 45 | """ 46 | 47 | def test_plugin_compiles_patterns(self) -> None: 48 | mock_config(self.config_contents) 49 | 50 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 51 | 52 | assert any(type(plugin) == regex_name.RegexReleaseFilter for plugin in plugins) 53 | plugin = next( 54 | plugin 55 | for plugin in plugins 56 | if isinstance(plugin, regex_name.RegexReleaseFilter) 57 | ) 58 | assert plugin.patterns == [re.compile(r".+rc\d$"), re.compile(r".+alpha\d$")] 59 | 60 | def test_plugin_check_match(self) -> None: 61 | mock_config(self.config_contents) 62 | 63 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 64 | pkg = Package("foo", 1) 65 | pkg._metadata = { 66 | "info": {"name": "foo", "version": "foo-1.2.0"}, 67 | "releases": {"foo-1.2.0rc2": {}, "foo-1.2.0": {}, "foo-1.2.0alpha2": {}}, 68 | } 69 | 70 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 71 | 72 | assert pkg.releases == {"foo-1.2.0": {}} 73 | 74 | 75 | class TestRegexProjectFilter(BasePluginTestCase): 76 | 77 | config_contents = """\ 78 | [plugins] 79 | enabled = 80 | regex_project 81 | 82 | [filter_regex] 83 | packages = 84 | .+-evil$ 85 | 
.+-neutral$ 86 | """ 87 | 88 | def test_plugin_compiles_patterns(self) -> None: 89 | mock_config(self.config_contents) 90 | 91 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 92 | 93 | assert any(type(plugin) == regex_name.RegexProjectFilter for plugin in plugins) 94 | plugin = next( 95 | plugin 96 | for plugin in plugins 97 | if isinstance(plugin, regex_name.RegexProjectFilter) 98 | ) 99 | assert plugin.patterns == [re.compile(r".+-evil$"), re.compile(r".+-neutral$")] 100 | 101 | def test_plugin_check_match(self) -> None: 102 | mock_config(self.config_contents) 103 | 104 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 105 | mirror.packages_to_sync = {"foo-good": "", "foo-evil": "", "foo-neutral": ""} 106 | mirror._filter_packages() 107 | 108 | assert list(mirror.packages_to_sync.keys()) == ["foo-good"] 109 | -------------------------------------------------------------------------------- /src/bandersnatch_filter_plugins/filename_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from typing import Dict, List 3 | 4 | from bandersnatch.filter import FilterReleaseFilePlugin 5 | 6 | logger = logging.getLogger("bandersnatch") 7 | 8 | 9 | class ExcludePlatformFilter(FilterReleaseFilePlugin): 10 | """ 11 | Filters release files for excluded platforms, based on filename patterns and package types defined by the user.
12 | """ 13 | 14 | name = "exclude_platform" 15 | 16 | _patterns: List[str] = [] 17 | _packagetypes: List[str] = [] 18 | 19 | _windowsPlatformTypes = [".win32", "-win32", "win_amd64", "win-amd64"] 20 | 21 | _linuxPlatformTypes = [ 22 | "linux-i686", # PEP 425 23 | "linux-x86_64", # PEP 425 24 | "linux_armv7l", # https://github.com/pypa/warehouse/pull/2010 25 | "linux_armv6l", # https://github.com/pypa/warehouse/pull/2012 26 | "manylinux1_i686", # PEP 513 27 | "manylinux1_x86_64", # PEP 513 28 | "manylinux2010_i686", # PEP 571 29 | "manylinux2010_x86_64", # PEP 571 30 | ] 31 | 32 | def initialize_plugin(self) -> None: 33 | """ 34 | Initialize the plugin reading patterns from the config. 35 | """ 36 | if self._patterns or self._packagetypes: 37 | logger.debug( 38 | "Skipping initialization of Exclude Platform plugin. " 39 | + "Already initialized" 40 | ) 41 | return 42 | 43 | try: 44 | tags = self.blocklist["platforms"].split() 45 | except KeyError: 46 | logger.error(f"Plugin {self.name}: missing platforms= setting") 47 | return 48 | 49 | for platform in tags: 50 | lplatform = platform.lower() 51 | 52 | if lplatform in ("windows", "win"): 53 | # PEP 425 54 | # see also setuptools/package_index.py 55 | self._patterns.extend(self._windowsPlatformTypes) 56 | # PEP 527 57 | self._packagetypes.extend(["bdist_msi", "bdist_wininst"]) 58 | 59 | elif lplatform in ("macos", "macosx"): 60 | self._patterns.extend(["macosx_", "macosx-"]) 61 | self._packagetypes.extend(["bdist_dmg"]) 62 | 63 | elif lplatform in ("freebsd",): 64 | # concerns only very few files 65 | self._patterns.extend([".freebsd", "-freebsd"]) 66 | 67 | elif lplatform in ("linux",): 68 | self._patterns.extend(self._linuxPlatformTypes) 69 | self._packagetypes.extend(["bdist_rpm"]) 70 | 71 | # check for platform specific architectures 72 | elif lplatform in self._windowsPlatformTypes: 73 | self._patterns.extend([lplatform]) 74 | 75 | elif lplatform in self._linuxPlatformTypes: 76 | self._patterns.extend([lplatform])
77 | 78 | logger.info(f"Initialized {self.name} plugin with {self._patterns!r}") 79 | 80 | def filter(self, metadata: Dict) -> bool: 81 | """ 82 | Returns False if file matches any of the filename patterns 83 | """ 84 | file = metadata["release_file"] 85 | return not self._check_match(file) 86 | 87 | def _check_match(self, file_desc: Dict) -> bool: 88 | """ 89 | Check if a release version matches any of the specified patterns. 90 | 91 | Parameters 92 | ========== 93 | file_desc: Dict 94 | file description entry 95 | 96 | Returns 97 | ======= 98 | bool: 99 | True if it matches, False otherwise. 100 | """ 101 | 102 | # source dist: never filter out 103 | pt = file_desc.get("packagetype") 104 | if pt == "sdist": 105 | return False 106 | 107 | # Windows installer 108 | if pt in self._packagetypes: 109 | return True 110 | 111 | fn = file_desc["filename"] 112 | for i in self._patterns: 113 | if i in fn: 114 | return True 115 | 116 | return False 117 | -------------------------------------------------------------------------------- /src/bandersnatch/default.conf: -------------------------------------------------------------------------------- 1 | [mirror] 2 | ; The directory where the mirror data will be stored. 3 | directory = /srv/pypi 4 | ; Save JSON metadata into the web tree: 5 | ; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME 6 | json = false 7 | 8 | ; Cleanup legacy non PEP 503 normalized named simple directories 9 | cleanup = false 10 | 11 | ; The PyPI server which will be mirrored. 12 | ; master = https://test.python.org 13 | ; scheme for PyPI server MUST be https 14 | master = https://pypi.org 15 | 16 | ; The network socket timeout to use for all connections. This is set to a 17 | ; somewhat aggressively low value: rather fail quickly temporarily and re-run 18 | ; the client soon instead of having a process hang infinitely and have TCP not 19 | ; catching up for ages. 
20 | timeout = 10 21 | 22 | ; The global-timeout sets aiohttp's total timeout for its coroutines 23 | ; This is set incredibly high by default as aiohttp coroutines need to be 24 | ; equipped to handle mirroring large PyPI packages on slow connections. 25 | global-timeout = 1800 26 | 27 | ; Number of worker threads to use for parallel downloads. 28 | ; Recommendations for worker thread setting: 29 | ; - leave the default of 3 to avoid overloading the pypi master 30 | ; - official servers located in data centers could run 10 workers 31 | ; - anything beyond 10 is probably unreasonable and avoided by bandersnatch 32 | workers = 3 33 | 34 | ; Whether to hash package indexes 35 | ; Note that package index directory hashing is incompatible with pip, and so 36 | ; this should only be used in an environment where it is behind an application 37 | ; that can translate URIs to filesystem locations. For example, with the 38 | ; following Apache RewriteRule: 39 | ; RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/ 40 | ; RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3 41 | ; OR 42 | ; following nginx rewrite rules: 43 | ; rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last; 44 | ; rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last; 45 | ; Setting this to true would put the package 'abc' index in simple/a/abc. 46 | ; Recommended setting: the default of false for full pip/pypi compatibility. 47 | hash-index = false 48 | 49 | ; Whether to stop a sync quickly after an error is found or whether to continue 50 | ; syncing but not marking the sync as successful. Value should be "true" or 51 | ; "false". 52 | stop-on-error = false 53 | 54 | ; The storage backend that will be used to save data and metadata while 55 | ; mirroring packages. By default, use the filesystem backend. Other options 56 | ; currently include: 'swift' 57 | storage-backend = filesystem 58 | 59 | ; Advanced logging configuration.
Uncomment and set to the location of a 60 | ; python logging format logging config file. 61 | ; log-config = /etc/bandersnatch-log.conf 62 | 63 | ; Generate index pages with absolute urls rather than relative links. This is 64 | ; generally not necessary, but was added for the official internal PyPI mirror, 65 | ; which requires serving packages from https://files.pythonhosted.org 66 | ; root_uri = https://example.com 67 | 68 | ; Number of consumers which verify metadata 69 | verifiers = 3 70 | 71 | ; Number of prior simple index.html to store. Used as a safeguard against 72 | ; upstream changes generating blank index.html files. Prior versions are 73 | ; stored as "versions/index_<serial>_<timestamp>.html" and the current 74 | ; index.html will be a symlink to the latest version. 75 | ; If set to 0 no prior versions are stored and index.html is the latest version. 76 | ; If unset defaults to 0. 77 | ; keep_index_versions = 0 78 | 79 | ; vim: set ft=cfg: 80 | 81 | ; Configure a file to write out the list of files downloaded during the mirror. 82 | ; This is useful for situations when mirroring to offline systems where a process 83 | ; is required to only sync new files to the upstream mirror. 84 | ; The file will be named as set in the diff-file setting, and overwritten unless the 85 | ; diff-append-epoch setting is set to true. If this is true, the epoch date will 86 | ; be appended to the filename (i.e. /path/to/diff-1568129735) 87 | ; diff-file = /srv/pypi/mirrored-files 88 | ; diff-append-epoch = true 89 | -------------------------------------------------------------------------------- /src/bandersnatch/unittest.conf: -------------------------------------------------------------------------------- 1 | [mirror] 2 | ; The directory where the mirror data will be stored.
3 | directory = /srv/pypi 4 | ; Save JSON metadata into the web tree: 5 | ; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME 6 | json = false 7 | 8 | ; Cleanup legacy non PEP 503 normalized named simple directories 9 | cleanup = false 10 | 11 | ; The PyPI server which will be mirrored. 12 | ; master = https://test.python.org 13 | ; scheme for PyPI server MUST be https 14 | master = https://pypi.org 15 | 16 | ; The network socket timeout to use for all connections. This is set to a 17 | ; somewhat aggressively low value: rather fail quickly temporarily and re-run 18 | ; the client soon instead of having a process hang infinitely and have TCP not 19 | ; catching up for ages. 20 | timeout = 10 21 | global-timeout = 18000 22 | 23 | ; Number of worker threads to use for parallel downloads. 24 | ; Recommendations for worker thread setting: 25 | ; - leave the default of 3 to avoid overloading the pypi master 26 | ; - official servers located in data centers could run 10 workers 27 | ; - anything beyond 10 is probably unreasonable and avoided by bandersnatch 28 | workers = 3 29 | 30 | ; Whether to hash package indexes 31 | ; Note that package index directory hashing is incompatible with pip, and so 32 | ; this should only be used in an environment where it is behind an application 33 | ; that can translate URIs to filesystem locations. For example, with the 34 | ; following Apache RewriteRule: 35 | ; RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/ 36 | ; RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3 37 | ; OR 38 | ; following nginx rewrite rules: 39 | ; rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last; 40 | ; rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last; 41 | ; Setting this to true would put the package 'abc' index in simple/a/abc. 42 | ; Recommended setting: the default of false for full pip/pypi compatibility. 
43 | hash-index = false 44 | 45 | ; Whether to stop a sync quickly after an error is found or whether to continue 46 | ; syncing but not marking the sync as successful. Value should be "true" or 47 | ; "false". 48 | stop-on-error = false 49 | 50 | storage-backend = filesystem 51 | ; Advanced logging configuration. Uncomment and set to the location of a 52 | ; python logging format logging config file. 53 | ; log-config = /etc/bandersnatch-log.conf 54 | 55 | ; Generate index pages with absolute urls rather than relative links. This is 56 | ; generally not necessary, but was added for the official internal PyPI mirror, 57 | ; which requires serving packages from https://files.pythonhosted.org 58 | ; root_uri = https://example.com 59 | 60 | ; Number of consumers which verify metadata 61 | verifiers = 3 62 | 63 | ; Number of prior simple index.html to store. Used as a safeguard against 64 | ; upstream changes generating blank index.html files. Prior versions are 65 | ; stored as "versions/index_<serial>_<timestamp>.html" and the current 66 | ; index.html will be a symlink to the latest version. 67 | ; If set to 0 no prior versions are stored and index.html is the latest version. 68 | ; If unset defaults to 0. 69 | ; keep_index_versions = 0 70 | 71 | ; Configure a file to write out the list of files downloaded during the mirror. 72 | ; This is useful for situations when mirroring to offline systems where a process 73 | ; is required to only sync new files to the upstream mirror. 74 | ; The file will be named as set in the diff-file, and overwritten unless the 75 | ; diff-append-epoch setting is set to true. If this is true, the epoch date will 76 | ; be appended to the filename (e.g. /path/to/diff-1568129735) 77 | ; You can also indicate a section and key of the config file that is storing the 78 | ; directory to keep the diff file, separated by an _ character, 79 | ; e.g.
{{mirror_directory}} 80 | diff-file = {{mirror_directory}}/mirrored-files 81 | diff-append-epoch = false 82 | 83 | ; Enable filtering plugins 84 | [plugins] 85 | ; Enable all or specific plugins - e.g. allowlist_project 86 | enabled = all 87 | 88 | [blocklist] 89 | ; List of PyPI packages not to sync - Useful if malicious packages are mirrored 90 | packages = 91 | example1 92 | example2 93 | 94 | ; vim: set ft=cfg: 95 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path 3 | import re 4 | from pathlib import Path 5 | from tempfile import NamedTemporaryFile, TemporaryDirectory, gettempdir 6 | from typing import Set 7 | 8 | import aiohttp 9 | import pytest 10 | from _pytest.monkeypatch import MonkeyPatch 11 | 12 | from bandersnatch.utils import ( # isort:skip 13 | bandersnatch_safe_name, 14 | convert_url_to_path, 15 | hash, 16 | recursive_find_files, 17 | rewrite, 18 | unlink_parent_dir, 19 | user_agent, 20 | WINDOWS, 21 | ) 22 | 23 | 24 | def test_convert_url_to_path() -> None: 25 | assert ( 26 | "packages/8f/1a/1aa000db9c5a799b676227e845d2b64fe725328e05e3d3b30036f" 27 | + "50eb316/peerme-1.0.0-py36-none-any.whl" 28 | == convert_url_to_path( 29 | "https://files.pythonhosted.org/packages/8f/1a/1aa000db9c5a799b67" 30 | + "6227e845d2b64fe725328e05e3d3b30036f50eb316/" 31 | + "peerme-1.0.0-py36-none-any.whl" 32 | ) 33 | ) 34 | 35 | 36 | def test_hash() -> None: 37 | expected_md5 = "b2855c4a4340dad73d9d870630390885" 38 | expected_sha256 = "a2a5e3823bf4cccfaad4e2f0fbabe72bc8c3cf78bc51eb396b5c7af99e17f07a" 39 | with NamedTemporaryFile(delete=False) as ntf: 40 | ntf_path = Path(ntf.name) 41 | ntf.close() 42 | try: 43 | with ntf_path.open("w") as ntfp: 44 | ntfp.write("Unittest File for hashing Fun!") 45 | 46 | assert hash(ntf_path, function="md5") == expected_md5 47 | assert hash(ntf_path, 
function="sha256") == expected_sha256 48 | assert hash(ntf_path) == expected_sha256 49 | finally: 50 | if ntf_path.exists(): 51 | ntf_path.unlink() 52 | 53 | 54 | def test_find_files() -> None: 55 | with TemporaryDirectory() as td: 56 | td_path = Path(td) 57 | td_sub_path = td_path / "aDir" 58 | td_sub_path.mkdir() 59 | 60 | expected_found_files = {td_path / "file1", td_sub_path / "file2"} 61 | for afile in expected_found_files: 62 | with afile.open("w") as afp: 63 | afp.write("PyPA ftw!") 64 | 65 | found_files: Set[Path] = set() 66 | recursive_find_files(found_files, td_path) 67 | assert found_files == expected_found_files 68 | 69 | 70 | def test_rewrite(tmpdir: Path, monkeypatch: MonkeyPatch) -> None: 71 | monkeypatch.chdir(tmpdir) 72 | with open("sample", "w") as f: 73 | f.write("bsdf") 74 | with rewrite("sample") as f: # type: ignore 75 | f.write("csdf") 76 | assert open("sample").read() == "csdf" 77 | mode = os.stat("sample").st_mode 78 | # chmod doesn't work on windows machines. Permissions are pinned at 666 79 | if not WINDOWS: 80 | assert oct(mode) == "0o100644" 81 | 82 | 83 | def test_rewrite_fails(tmpdir: Path, monkeypatch: MonkeyPatch) -> None: 84 | monkeypatch.chdir(tmpdir) 85 | with open("sample", "w") as f: 86 | f.write("bsdf") 87 | with pytest.raises(Exception): 88 | with rewrite("sample") as f: # type: ignore 89 | f.write("csdf") 90 | raise Exception() 91 | assert open("sample").read() == "bsdf" # type: ignore 92 | 93 | 94 | def test_rewrite_nonexisting_file(tmpdir: Path, monkeypatch: MonkeyPatch) -> None: 95 | monkeypatch.chdir(tmpdir) 96 | with rewrite("sample", "w") as f: 97 | f.write("csdf") 98 | with open("sample") as f: 99 | assert f.read() == "csdf" 100 | 101 | 102 | def test_unlink_parent_dir() -> None: 103 | adir = Path(gettempdir()) / f"tb.{os.getpid()}" 104 | adir.mkdir() 105 | afile = adir / "file1" 106 | afile.touch() 107 | unlink_parent_dir(afile) 108 | assert not adir.exists() 109 | 110 | 111 | def test_user_agent() -> None: 112 | 
assert re.match( 113 | r"bandersnatch/[0-9]\.[0-9]\.[0-9]\.?d?e?v?[0-9]? \(.*\) " 114 | + fr"\(aiohttp {aiohttp.__version__}\)", 115 | user_agent(), 116 | ) 117 | 118 | 119 | def test_bandersnatch_safe_name() -> None: 120 | bad_name = "Flake_8_Fake" 121 | assert "flake-8-fake" == bandersnatch_safe_name(bad_name) 122 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | author = Christian Theune 3 | author_email = ct@flyingcircus.io 4 | classifiers = 5 | Programming Language :: Python :: 3 :: Only 6 | Programming Language :: Python :: 3.6 7 | Programming Language :: Python :: 3.7 8 | Programming Language :: Python :: 3.8 9 | description = Mirroring tool that implements the client (mirror) side of PEP 381 10 | long_description = file:README.md 11 | long_description_content_type = text/markdown 12 | license = Academic Free License, version 3 13 | license_file = LICENSE 14 | name = bandersnatch 15 | project_urls = 16 | Source Code = https://github.com/pypa/bandersnatch 17 | Change Log = https://github.com/pypa/bandersnatch/CHANGES.md 18 | url = https://github.com/pypa/bandersnatch/ 19 | version = 4.1.1 20 | 21 | [options] 22 | install_requires = 23 | aiohttp-xmlrpc 24 | aiohttp 25 | filelock 26 | importlib_resources; python_version < '3.7' 27 | packaging 28 | setuptools>40.0.0 29 | package_dir = 30 | =src 31 | packages = find: 32 | python_requires = >=3.6 33 | 34 | [options.packages.find] 35 | where=src 36 | 37 | [options.package_data] 38 | bandersnatch = *.conf 39 | 40 | [options.entry_points] 41 | bandersnatch_storage_plugins.v1.backend = 42 | swift_plugin = bandersnatch_storage_plugins.swift:SwiftStorage 43 | filesystem_plugin = bandersnatch_storage_plugins.filesystem:FilesystemStorage 44 | 45 | # This entrypoint group must match the value of bandersnatch.filter.PROJECT_PLUGIN_RESOURCE 46 | 
bandersnatch_filter_plugins.v2.project = 47 | blacklist_project = bandersnatch_filter_plugins.blocklist_name:BlockListProject 48 | whitelist_project = bandersnatch_filter_plugins.allowlist_name:AllowListProject 49 | regex_project = bandersnatch_filter_plugins.regex_name:RegexProjectFilter 50 | 51 | # This entrypoint group must match the value of bandersnatch.filter.METADATA_PLUGIN_RESOURCE 52 | bandersnatch_filter_plugins.v2.metadata = 53 | regex_project_metadata = bandersnatch_filter_plugins.metadata_filter:RegexProjectMetadataFilter 54 | 55 | # This entrypoint group must match the value of bandersnatch.filter.RELEASE_PLUGIN_RESOURCE 56 | bandersnatch_filter_plugins.v2.release = 57 | blacklist_release = bandersnatch_filter_plugins.blocklist_name:BlockListRelease 58 | prerelease_release = bandersnatch_filter_plugins.prerelease_name:PreReleaseFilter 59 | regex_release = bandersnatch_filter_plugins.regex_name:RegexReleaseFilter 60 | latest_release = bandersnatch_filter_plugins.latest_name:LatestReleaseFilter 61 | allowlist_release = bandersnatch_filter_plugins.allowlist_name:AllowListRelease 62 | 63 | # This entrypoint group must match the value of bandersnatch.filter.RELEASE_FILE_PLUGIN_RESOURCE 64 | bandersnatch_filter_plugins.v2.release_file = 65 | regex_release_file_metadata = bandersnatch_filter_plugins.metadata_filter:RegexReleaseFileMetadataFilter 66 | version_range_release_file_metadata = bandersnatch_filter_plugins.metadata_filter:VersionRangeReleaseFileMetadataFilter 67 | exclude_platform = bandersnatch_filter_plugins.filename_name:ExcludePlatformFilter 68 | 69 | console_scripts = 70 | bandersnatch = bandersnatch.main:main 71 | 72 | [options.extras_require] 73 | safety_db = 74 | bandersnatch_safety_db 75 | 76 | test = 77 | coverage 78 | freezegun 79 | flake8 80 | flake8-bugbear 81 | pytest 82 | pytest-timeout 83 | pytest-cache 84 | 85 | doc_build = 86 | docutils 87 | sphinx 88 | sphinx_bootstrap_theme 89 | guzzle_sphinx_theme 90 | sphinx_rtd_theme 91 | 
recommonmark 92 | # git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme 93 | # git+https://github.com/python/python-docs-theme.git#egg=python-docs-theme 94 | 95 | swift = 96 | keystoneauth1 97 | openstackclient 98 | python-swiftclient 99 | 100 | [isort] 101 | atomic = true 102 | not_skip = __init__.py 103 | line_length = 88 104 | multi_line_output = 3 105 | known_third_party = _pytest,aiohttp,aiohttp_xmlrpc,asynctest,filelock,freezegun,keystoneauth1,mock_config,packaging,pkg_resources,pytest,setuptools,swiftclient 106 | known_first_party = bandersnatch,bandersnatch_filter_plugins,bandersnatch_storage_plugins 107 | force_grid_wrap = 0 108 | use_parentheses=True 109 | include_trailing_comma = True 110 | -------------------------------------------------------------------------------- /test_runner.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """ 4 | bandersnatch CI run script - Will either drive `tox` or run an Integration Test 5 | - Rewritten in Python for easier dev contributions + Windows support 6 | 7 | Integration Tests will go off and hit PyPI + pull whitelisted packages 8 | then check for expected outputs to exist 9 | """ 10 | 11 | from configparser import ConfigParser 12 | from os import environ 13 | from pathlib import Path 14 | from shutil import rmtree, which 15 | from subprocess import run 16 | from sys import exit 17 | from tempfile import gettempdir 18 | 19 | from src.bandersnatch.utils import hash 20 | 21 | BANDERSNATCH_EXE = Path( 22 | which("bandersnatch") or which("bandersnatch.exe") or "bandersnatch" 23 | ) 24 | CI_CONFIG = Path("src/bandersnatch/tests/ci.conf") 25 | EOP = "[CI ERROR]:" 26 | MIRROR_ROOT = Path(f"{gettempdir()}/pypi") 27 | MIRROR_BASE = MIRROR_ROOT / "web" 28 | TGZ_SHA256 = "bc9430dae93f8bc53728773545cbb646a6b5327f98de31bdd6e1a2b2c6e805a9" 29 | TOX_EXE = Path(which("tox") or "tox") 30 | 31 | # Make Global so we can check exists before delete 32
| A_BLACK_WHL = ( 33 | MIRROR_BASE 34 | / "packages" 35 | / "30" 36 | / "62" 37 | / "cf549544a5fe990bbaeca21e9c419501b2de7a701ab0afb377bc81676600" 38 | / "black-19.3b0-py36-none-any.whl" 39 | ) 40 | 41 | 42 | def check_ci() -> int: 43 | black_index = MIRROR_BASE / "simple/b/black/index.html" 44 | peerme_index = MIRROR_BASE / "simple/p/peerme/index.html" 45 | peerme_json = MIRROR_BASE / "json/peerme" 46 | peerme_tgz = ( 47 | MIRROR_BASE 48 | / "packages" 49 | / "8f" 50 | / "1a" 51 | / "1aa000db9c5a799b676227e845d2b64fe725328e05e3d3b30036f50eb316" 52 | / "peerme-1.0.0-py36-none-any.whl" 53 | ) 54 | 55 | if not peerme_index.exists(): 56 | print(f"{EOP} No peerme simple API index exists @ {peerme_index}") 57 | return 69 58 | 59 | if not peerme_json.exists(): 60 | print(f"{EOP} No peerme JSON API file exists @ {peerme_json}") 61 | return 70 62 | 63 | if not peerme_tgz.exists(): 64 | print(f"{EOP} No peerme tgz file exists @ {peerme_tgz}") 65 | return 71 66 | 67 | peerme_tgz_sha256 = hash(str(peerme_tgz)) 68 | if peerme_tgz_sha256 != TGZ_SHA256: 69 | print(f"{EOP} Bad peerme 1.0.0 sha256: {peerme_tgz_sha256} != {TGZ_SHA256}") 70 | return 72 71 | 72 | if black_index.exists(): 73 | print(f"{EOP} {black_index} exists ... delete failed?") 74 | return 73 75 | 76 | if A_BLACK_WHL.exists(): 77 | print(f"{EOP} {A_BLACK_WHL} exists ... 
delete failed?") 78 | return 74 79 | 80 | rmtree(MIRROR_ROOT) 81 | 82 | print("Bandersnatch PyPI CI finished successfully!") 83 | return 0 84 | 85 | 86 | def do_ci(conf: Path) -> int: 87 | if not conf.exists(): 88 | print(f"CI config {conf} does not exist for bandersnatch run") 89 | return 2 90 | 91 | print("Starting CI bandersnatch mirror ...") 92 | cmds = (str(BANDERSNATCH_EXE), "--config", str(conf), "--debug", "mirror") 93 | print(f"bandersnatch cmd: {' '.join(cmds)}") 94 | run(cmds, check=True) 95 | 96 | print(f"Checking if {A_BLACK_WHL} exists") 97 | if not A_BLACK_WHL.exists(): 98 | print(f"{EOP} {A_BLACK_WHL} does not exist after mirroring ...") 99 | return 68 100 | 101 | print("Starting to delete black from mirror ...") 102 | del_cmds = ( 103 | str(BANDERSNATCH_EXE), 104 | "--config", 105 | str(conf), 106 | "--debug", 107 | "delete", 108 | "black", 109 | ) 110 | print(f"bandersnatch delete cmd: {' '.join(del_cmds)}") 111 | run(del_cmds, check=True) 112 | 113 | return check_ci() 114 | 115 | 116 | def platform_config() -> Path: 117 | """Ensure the CI_CONFIG is correct for the platform we're running on""" 118 | platform_ci_conf = MIRROR_ROOT / "ci.conf" 119 | cp = ConfigParser() 120 | cp.read(str(CI_CONFIG)) 121 | 122 | print(f"Setting CI directory={MIRROR_ROOT}") 123 | cp["mirror"]["directory"] = str(MIRROR_ROOT) 124 | 125 | with platform_ci_conf.open("w") as pccfp: 126 | cp.write(pccfp) 127 | 128 | return platform_ci_conf 129 | 130 | 131 | def main() -> int: 132 | if "TOXENV" not in environ: 133 | print("No TOXENV set.
Exiting!") 134 | return 1 135 | 136 | if environ["TOXENV"] != "INTEGRATION": 137 | return run((str(TOX_EXE),)).returncode 138 | else: 139 | print("Running Integration tests due to TOXENV set to INTEGRATION") 140 | MIRROR_ROOT.mkdir(exist_ok=True) 141 | return do_ci(platform_config()) 142 | 143 | 144 | if __name__ == "__main__": 145 | exit(main()) 146 | -------------------------------------------------------------------------------- /src/bandersnatch/utils.py: -------------------------------------------------------------------------------- 1 | import contextlib 2 | import hashlib 3 | import logging 4 | import os 5 | import os.path 6 | import platform 7 | import re 8 | import shutil 9 | import sys 10 | import tempfile 11 | from datetime import datetime 12 | from pathlib import Path 13 | from typing import IO, Any, Generator, List, Set, Union 14 | from urllib.parse import urlparse 15 | 16 | import aiohttp 17 | 18 | from . import __version__ 19 | 20 | logger = logging.getLogger(__name__) 21 | 22 | 23 | def user_agent() -> str: 24 | template = "bandersnatch/{version} ({python}, {system})" 25 | template += f" (aiohttp {aiohttp.__version__})" 26 | version = __version__ 27 | python = sys.implementation.name 28 | python += " {}.{}.{}-{}{}".format(*sys.version_info) 29 | uname = platform.uname() 30 | system = " ".join([uname.system, uname.machine]) 31 | return template.format(**locals()) 32 | 33 | 34 | SAFE_NAME_REGEX = re.compile(r"[^A-Za-z0-9.]+") 35 | USER_AGENT = user_agent() 36 | WINDOWS = bool(platform.system() == "Windows") 37 | 38 | 39 | def make_time_stamp() -> str: 40 | """Helper function that returns a timestamp suitable for use 41 | in a filename on any OS""" 42 | return f"{datetime.utcnow().isoformat()}Z".replace(":", "") 43 | 44 | 45 | def convert_url_to_path(url: str) -> str: 46 | return urlparse(url).path[1:] 47 | 48 | 49 | def hash(path: Path, function: str = "sha256") -> str: 50 | h = getattr(hashlib, function)() 51 | with open(path, "rb") as f: 52 | while
True: 53 | chunk = f.read(128 * 1024) 54 | if not chunk: 55 | break 56 | h.update(chunk) 57 | return str(h.hexdigest()) 58 | 59 | 60 | def find(root: Union[Path, str], dirs: bool = True) -> str: 61 | """A test helper simulating 'find'. 62 | 63 | Iterates over directories and filenames, given as relative paths to the 64 | root. 65 | 66 | """ 67 | # TODO: account for alternative backends 68 | if isinstance(root, str): 69 | root = Path(root) 70 | 71 | results: List[Path] = [] 72 | for dirpath, dirnames, filenames in os.walk(root): 73 | names = filenames 74 | if dirs: 75 | names += dirnames 76 | for name in names: 77 | results.append(Path(dirpath) / name) 78 | results.sort() 79 | return "\n".join(str(result.relative_to(root)) for result in results) 80 | 81 | 82 | @contextlib.contextmanager 83 | def rewrite( 84 | filepath: Union[str, Path], mode: str = "w", **kw: Any 85 | ) -> Generator[IO, None, None]: 86 | """Rewrite an existing file atomically so that programs running in 87 | parallel don't hit race conditions while reading.""" 88 | # TODO: Account for alternative backends 89 | if isinstance(filepath, str): 90 | base_dir = os.path.dirname(filepath) 91 | filename = os.path.basename(filepath) 92 | else: 93 | base_dir = str(filepath.parent) 94 | filename = filepath.name 95 | 96 | # Change naming format to be more friendly with distributed POSIX 97 | # filesystems like GlusterFS that hash based on filename 98 | # GlusterFS ignores '.' at the start of filenames and this avoids rehashing 99 | with tempfile.NamedTemporaryFile( 100 | mode=mode, prefix=f".{filename}.", delete=False, dir=base_dir, **kw 101 | ) as f: 102 | filepath_tmp = f.name 103 | yield f 104 | 105 | if not os.path.exists(filepath_tmp): 106 | # Allow our clients to remove the file in case they don't want it to be 107 | # put in place, but also don't want to error out.
108 | return 109 | os.chmod(filepath_tmp, 0o100644) 110 | shutil.move(filepath_tmp, filepath) 111 | 112 | 113 | def recursive_find_files(files: Set[Path], base_dir: Path) -> None: 114 | dirs = [d for d in base_dir.iterdir() if d.is_dir()] 115 | files.update([x for x in base_dir.iterdir() if x.is_file()]) 116 | for directory in dirs: 117 | recursive_find_files(files, directory) 118 | 119 | 120 | def unlink_parent_dir(path: Path) -> None: 121 | """ Remove a file and if the dir is empty remove it """ 122 | logger.info(f"unlink {str(path)}") 123 | path.unlink() 124 | 125 | parent_path = path.parent 126 | try: 127 | parent_path.rmdir() 128 | logger.info(f"rmdir {str(parent_path)}") 129 | except OSError as oe: 130 | logger.debug(f"Did not remove {str(parent_path)}: {str(oe)}") 131 | 132 | 133 | def bandersnatch_safe_name(name: str) -> str: 134 | """Convert an arbitrary string to a standard distribution name 135 | Any runs of non-alphanumeric/. characters are replaced with a single '-'. 136 | 137 | - This was copied from `pkg_resources` (part of `setuptools`) 138 | 139 | bandersnatch also lower cases the returned name 140 | """ 141 | return SAFE_NAME_REGEX.sub("-", name).lower() 142 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/plugins/test_latest_release.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from tempfile import TemporaryDirectory 4 | from unittest import TestCase 5 | 6 | from mock_config import mock_config 7 | 8 | import bandersnatch.filter 9 | from bandersnatch.master import Master 10 | from bandersnatch.mirror import BandersnatchMirror 11 | from bandersnatch.package import Package 12 | from bandersnatch_filter_plugins import latest_name 13 | 14 | 15 | class BasePluginTestCase(TestCase): 16 | 17 | tempdir = None 18 | cwd = None 19 | 20 | def setUp(self) -> None: 21 | self.cwd = os.getcwd() 22 | self.tempdir = 
TemporaryDirectory() 23 | os.chdir(self.tempdir.name) 24 | 25 | def tearDown(self) -> None: 26 | if self.tempdir: 27 | assert self.cwd 28 | os.chdir(self.cwd) 29 | self.tempdir.cleanup() 30 | self.tempdir = None 31 | 32 | 33 | class TestLatestReleaseFilter(BasePluginTestCase): 34 | 35 | config_contents = """\ 36 | [plugins] 37 | enabled = 38 | latest_release 39 | 40 | [latest_release] 41 | keep = 2 42 | """ 43 | 44 | def test_plugin_compiles_patterns(self) -> None: 45 | mock_config(self.config_contents) 46 | 47 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 48 | 49 | assert any( 50 | type(plugin) == latest_name.LatestReleaseFilter for plugin in plugins 51 | ) 52 | plugin = next( 53 | plugin 54 | for plugin in plugins 55 | if isinstance(plugin, latest_name.LatestReleaseFilter) 56 | ) 57 | assert plugin.keep == 2 58 | 59 | def test_latest_releases_keep_latest(self) -> None: 60 | mock_config(self.config_contents) 61 | 62 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 63 | pkg = Package("foo", 1) 64 | pkg._metadata = { 65 | "info": {"name": "foo", "version": "2.0.0"}, 66 | "releases": { 67 | "1.0.0": {}, 68 | "1.1.0": {}, 69 | "1.1.1": {}, 70 | "1.1.2": {}, 71 | "1.1.3": {}, 72 | "2.0.0": {}, 73 | }, 74 | } 75 | 76 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 77 | 78 | assert pkg.releases == {"1.1.3": {}, "2.0.0": {}} 79 | 80 | def test_latest_releases_keep_stable(self) -> None: 81 | mock_config(self.config_contents) 82 | 83 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 84 | pkg = Package("foo", 1) 85 | pkg._metadata = { 86 | "info": {"name": "foo", "version": "2.0.0"}, # stable version 87 | "releases": { 88 | "1.0.0": {}, 89 | "1.1.0": {}, 90 | "1.1.1": {}, 91 | "1.1.2": {}, 92 | "1.1.3": {}, 93 | "2.0.0": {}, # <= stable version, keep it 94 | "2.0.1b1": {}, 95 | "2.0.1b2": {}, # <= most recent, keep it 96 | }, 97 | } 98 | 99 | 
pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 100 | 101 | assert pkg.releases == {"2.0.1b2": {}, "2.0.0": {}} 102 | 103 | 104 | class TestLatestReleaseFilterUninitialized(BasePluginTestCase): 105 | 106 | config_contents = """\ 107 | [plugins] 108 | enabled = 109 | latest_release 110 | """ 111 | 112 | def test_plugin_compiles_patterns(self) -> None: 113 | mock_config(self.config_contents) 114 | 115 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 116 | 117 | assert any( 118 | type(plugin) == latest_name.LatestReleaseFilter for plugin in plugins 119 | ) 120 | plugin = next( 121 | plugin 122 | for plugin in plugins 123 | if isinstance(plugin, latest_name.LatestReleaseFilter) 124 | ) 125 | assert plugin.keep == 0 126 | 127 | def test_latest_releases_uninitialized(self) -> None: 128 | mock_config(self.config_contents) 129 | 130 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 131 | pkg = Package("foo", 1) 132 | pkg._metadata = { 133 | "info": {"name": "foo", "version": "2.0.0"}, 134 | "releases": { 135 | "1.0.0": {}, 136 | "1.1.0": {}, 137 | "1.1.1": {}, 138 | "1.1.2": {}, 139 | "1.1.3": {}, 140 | "2.0.0": {}, 141 | }, 142 | } 143 | 144 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 145 | 146 | assert pkg.releases == { 147 | "1.0.0": {}, 148 | "1.1.0": {}, 149 | "1.1.1": {}, 150 | "1.1.2": {}, 151 | "1.1.3": {}, 152 | "2.0.0": {}, 153 | } 154 | -------------------------------------------------------------------------------- /src/bandersnatch/delete.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import asyncio 4 | import concurrent.futures 5 | import logging 6 | from argparse import Namespace 7 | from configparser import ConfigParser 8 | from json import JSONDecodeError, load 9 | from pathlib import Path 10 | from typing import Awaitable, List 11 | from urllib.parse import urlparse 12 | 13 | from 
packaging.utils import canonicalize_name 14 | 15 | from .master import Master 16 | from .storage import storage_backend_plugins 17 | from .verify import get_latest_json 18 | 19 | logger = logging.getLogger(__name__) 20 | 21 | 22 | def delete_path(blob_path: Path, dry_run: bool = False) -> int: 23 | storage_backend = next(iter(storage_backend_plugins())) 24 | if dry_run: 25 | logger.info(f" rm {blob_path}") 26 | if not storage_backend.exists(blob_path): 27 | logger.debug(f"{blob_path} does not exist. Skipping") 28 | return 0 29 | try: 30 | storage_backend.delete(blob_path, dry_run=dry_run) 31 | except FileNotFoundError: 32 | # Due to using threads in executors we sometimes have a 33 | # race condition if canonicalize_name == passed in name 34 | pass 35 | except OSError: 36 | logger.exception(f"Unable to delete {blob_path}") 37 | return 1 38 | return 0 39 | 40 | 41 | async def delete_packages(config: ConfigParser, args: Namespace, master: Master) -> int: 42 | loop = asyncio.get_event_loop() 43 | workers = args.workers or config.getint("mirror", "workers") 44 | executor = concurrent.futures.ThreadPoolExecutor(max_workers=workers) 45 | storage_backend = next( 46 | iter(storage_backend_plugins(config=config, clear_cache=True)) 47 | ) 48 | web_base_path = storage_backend.web_base_path 49 | json_base_path = storage_backend.json_base_path 50 | pypi_base_path = storage_backend.pypi_base_path 51 | simple_path = storage_backend.simple_base_path 52 | 53 | delete_coros: List[Awaitable] = [] 54 | for package in args.pypi_packages: 55 | canon_name = canonicalize_name(package) 56 | need_nc_paths = canon_name != package 57 | json_full_path = json_base_path / canon_name 58 | json_full_path_nc = json_base_path / package if need_nc_paths else None 59 | legacy_json_path = pypi_base_path / canon_name 60 | logger.debug(f"Looking up {canon_name} metadata @ {json_full_path}") 61 | 62 | if not storage_backend.exists(json_full_path): 63 | if args.dry_run: 64 | logger.error( 65 | f"Skipping 
{json_full_path} as dry run and no JSON file exists" 66 | ) 67 | continue 68 | 69 | logger.error(f"{json_full_path} does not exist. Pulling from PyPI") 70 | await get_latest_json(master, json_full_path, config, executor, False) 71 | if not json_full_path.exists(): 72 | logger.error(f"Unable to HTTP get JSON for {json_full_path}") 73 | continue 74 | 75 | with storage_backend.open_file(json_full_path, text=True) as jfp: 76 | try: 77 | package_data = load(jfp) 78 | except JSONDecodeError: 79 | logger.exception(f"Skipping {canon_name} @ {json_full_path}") 80 | continue 81 | 82 | for _release, blobs in package_data["releases"].items(): 83 | for blob in blobs: 84 | url_parts = urlparse(blob["url"]) 85 | blob_path = web_base_path / url_parts.path[1:] 86 | delete_coros.append( 87 | loop.run_in_executor(executor, delete_path, blob_path, args.dry_run) 88 | ) 89 | 90 | # Attempt to delete json, normal simple path + hash simple path 91 | package_simple_path = simple_path / canon_name 92 | package_simple_path_nc = simple_path / package if need_nc_paths else None 93 | package_hash_path = simple_path / canon_name[0] / canon_name 94 | package_hash_path_nc = ( 95 | simple_path / canon_name[0] / package if need_nc_paths else None 96 | ) 97 | # Try cleanup non canon name if they differ 98 | for package_path in ( 99 | json_full_path, 100 | legacy_json_path, 101 | package_simple_path, 102 | package_simple_path_nc, 103 | package_hash_path, 104 | package_hash_path_nc, 105 | json_full_path_nc, 106 | ): 107 | if not package_path: 108 | continue 109 | 110 | delete_coros.append( 111 | loop.run_in_executor(executor, delete_path, package_path, args.dry_run) 112 | ) 113 | 114 | if args.dry_run: 115 | logger.info("-- bandersnatch delete DRY RUN --") 116 | if delete_coros: 117 | logger.info(f"Attempting to remove {len(delete_coros)} files") 118 | return sum(await asyncio.gather(*delete_coros)) 119 | return 0 120 | -------------------------------------------------------------------------------- 
/src/bandersnatch/tests/conftest.py: -------------------------------------------------------------------------------- 1 | # flake8: noqa 2 | 3 | import unittest.mock as mock 4 | from pathlib import Path 5 | from typing import TYPE_CHECKING, Any, Dict 6 | 7 | import pytest 8 | from _pytest.capture import CaptureFixture 9 | from _pytest.fixtures import FixtureRequest 10 | from _pytest.monkeypatch import MonkeyPatch 11 | import asynctest 12 | 13 | if TYPE_CHECKING: 14 | from bandersnatch.mirror import BandersnatchMirror 15 | from bandersnatch.master import Master 16 | from bandersnatch.package import Package 17 | 18 | 19 | @pytest.fixture(autouse=True) 20 | def stop_std_logging(request: FixtureRequest, capfd: CaptureFixture) -> None: 21 | patcher = mock.patch("bandersnatch.log.setup_logging") 22 | patcher.start() 23 | 24 | def tearDown() -> None: 25 | patcher.stop() 26 | 27 | request.addfinalizer(tearDown) 28 | 29 | 30 | async def _nosleep(*args: Any) -> None: 31 | pass 32 | 33 | 34 | @pytest.fixture(autouse=True) 35 | def never_sleep(request: FixtureRequest) -> None: 36 | patcher = mock.patch("asyncio.sleep", _nosleep) 37 | patcher.start() 38 | 39 | def tearDown() -> None: 40 | patcher.stop() 41 | 42 | request.addfinalizer(tearDown) 43 | 44 | 45 | @pytest.fixture 46 | def package(package_json: dict) -> "Package": 47 | from bandersnatch.package import Package 48 | 49 | pkg = Package(package_json["info"]["name"], serial=11) 50 | pkg._metadata = package_json 51 | return pkg 52 | 53 | 54 | @pytest.fixture 55 | def package_json() -> Dict[str, Any]: 56 | return { 57 | "info": {"name": "Foo", "version": "0.1"}, 58 | "last_serial": 654_321, 59 | "releases": { 60 | "0.1": [ 61 | { 62 | "url": "https://pypi.example.com/packages/any/f/foo/foo.zip", 63 | "filename": "foo.zip", 64 | "digests": { 65 | "md5": "6bd3ddc295176f4dca196b5eb2c4d858", 66 | "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", 67 | }, 68 | "md5_digest":
"b6bcb391b040c4468262706faf9d3cce", 69 | }, 70 | { 71 | "url": "https://pypi.example.com/packages/2.7/f/foo/foo.whl", 72 | "filename": "foo.whl", 73 | "digests": { 74 | "md5": "6bd3ddc295176f4dca196b5eb2c4d858", 75 | "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", 76 | }, 77 | "md5_digest": "6bd3ddc295176f4dca196b5eb2c4d858", 78 | }, 79 | ] 80 | }, 81 | } 82 | 83 | 84 | @pytest.fixture 85 | def master(package_json: Dict[str, Any]) -> "Master": 86 | from bandersnatch.master import Master 87 | 88 | class FakeReader: 89 | async def read(self, *args: Any) -> bytes: 90 | return b"" 91 | 92 | class FakeAiohttpClient: 93 | headers = {"X-PYPI-LAST-SERIAL": "1"} 94 | 95 | async def __aenter__(self) -> "FakeAiohttpClient": 96 | return self 97 | 98 | async def __aexit__(self, *args: Any) -> None: 99 | pass 100 | 101 | @property 102 | def content(self) -> "FakeReader": 103 | return FakeReader() 104 | 105 | async def json(self, *args: Any) -> Dict[str, Any]: 106 | return package_json 107 | 108 | master = Master("https://pypi.example.com") 109 | master.rpc = mock.Mock() # type: ignore 110 | master.session = asynctest.MagicMock() 111 | master.session.get = asynctest.MagicMock(return_value=FakeAiohttpClient()) 112 | master.session.request = asynctest.MagicMock(return_value=FakeAiohttpClient()) 113 | return master 114 | 115 | 116 | @pytest.fixture 117 | def mirror( 118 | tmpdir: Path, master: "Master", monkeypatch: MonkeyPatch 119 | ) -> "BandersnatchMirror": 120 | monkeypatch.chdir(tmpdir) 121 | from bandersnatch.mirror import BandersnatchMirror 122 | 123 | return BandersnatchMirror(tmpdir, master) 124 | 125 | 126 | @pytest.fixture 127 | def mirror_hash_index( 128 | tmpdir: Path, master: "Master", monkeypatch: MonkeyPatch 129 | ) -> "BandersnatchMirror": 130 | monkeypatch.chdir(tmpdir) 131 | from bandersnatch.mirror import BandersnatchMirror 132 | 133 | return BandersnatchMirror(tmpdir, master, hash_index=True) 134 | 135 | 136 | @pytest.fixture 137 | 
def mirror_mock(request: FixtureRequest) -> mock.MagicMock: 138 | patcher = mock.patch("bandersnatch.mirror.BandersnatchMirror") 139 | mirror: mock.MagicMock = patcher.start() 140 | 141 | def tearDown() -> None: 142 | patcher.stop() 143 | 144 | request.addfinalizer(tearDown) 145 | return mirror 146 | 147 | 148 | @pytest.fixture 149 | def logging_mock(request: FixtureRequest) -> mock.MagicMock: 150 | patcher = mock.patch("logging.config.fileConfig") 151 | logger: mock.MagicMock = patcher.start() 152 | 153 | def tearDown() -> None: 154 | patcher.stop() 155 | 156 | request.addfinalizer(tearDown) 157 | return logger 158 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_main.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import configparser 3 | import sys 4 | import tempfile 5 | import unittest.mock as mock 6 | from pathlib import Path 7 | from typing import TYPE_CHECKING, Any, Dict 8 | 9 | import pytest 10 | from _pytest.capture import CaptureFixture 11 | from _pytest.logging import LogCaptureFixture 12 | 13 | import bandersnatch.mirror 14 | import bandersnatch.storage 15 | from bandersnatch.configuration import Singleton 16 | from bandersnatch.main import main 17 | 18 | if TYPE_CHECKING: 19 | from bandersnatch.mirror import BandersnatchMirror 20 | 21 | 22 | async def empty_dict(*args: Any, **kwargs: Any) -> Dict: 23 | return {} 24 | 25 | 26 | def setup() -> None: 27 | """ simple setup function to clear Singleton._instances before each test""" 28 | Singleton._instances = {} 29 | 30 | 31 | def test_main_help(capfd: CaptureFixture) -> None: 32 | sys.argv = ["bandersnatch", "--help"] 33 | with pytest.raises(SystemExit): 34 | main(asyncio.new_event_loop()) 35 | out, err = capfd.readouterr() 36 | assert out.startswith("usage: bandersnatch") 37 | assert "" == err 38 | 39 | 40 | def test_main_create_config(caplog: LogCaptureFixture, tmpdir: Path) -> None: 
41 | sys.argv = ["bandersnatch", "-c", str(tmpdir / "bandersnatch.conf"), "mirror"] 42 | assert main(asyncio.new_event_loop()) == 1 43 | assert "creating default config" in caplog.text 44 | conf_path = Path(tmpdir) / "bandersnatch.conf" 45 | assert conf_path.exists() 46 | 47 | 48 | def test_main_cant_create_config(caplog: LogCaptureFixture, tmpdir: Path) -> None: 49 | sys.argv = [ 50 | "bandersnatch", 51 | "-c", 52 | str(tmpdir / "foo" / "bandersnatch.conf"), 53 | "mirror", 54 | ] 55 | assert main(asyncio.new_event_loop()) == 1 56 | assert "creating default config" in caplog.text 57 | assert "Could not create config file" in caplog.text 58 | conf_path = Path(tmpdir) / "bandersnatch.conf" 59 | assert not conf_path.exists() 60 | 61 | 62 | def test_main_reads_config_values(mirror_mock: mock.MagicMock, tmpdir: Path) -> None: 63 | base_config_path = Path(bandersnatch.__file__).parent / "unittest.conf" 64 | diff_file = Path(tempfile.gettempdir()) / "srv/pypi/mirrored-files" 65 | config_lines = [ 66 | f"diff-file = {diff_file.as_posix()}\n" 67 | if line.startswith("diff-file") 68 | else line 69 | for line in base_config_path.read_text().splitlines() 70 | ] 71 | config_path = tmpdir / "unittest.conf" 72 | config_path.write_text("\n".join(config_lines), encoding="utf-8") 73 | sys.argv = ["bandersnatch", "-c", str(config_path), "mirror"] 74 | assert config_path.exists() 75 | main(asyncio.new_event_loop()) 76 | (homedir, master), kwargs = mirror_mock.call_args_list[0] 77 | 78 | assert Path("/srv/pypi") == homedir 79 | assert isinstance(master, bandersnatch.master.Master) 80 | assert { 81 | "stop_on_error": False, 82 | "hash_index": False, 83 | "workers": 3, 84 | "root_uri": "", 85 | "json_save": False, 86 | "digest_name": "sha256", 87 | "keep_index_versions": 0, 88 | "storage_backend": "filesystem", 89 | "diff_file": diff_file, 90 | "diff_append_epoch": False, 91 | "diff_full_path": diff_file, 92 | "cleanup": False, 93 | } == kwargs 94 | 95 | 96 | def 
test_main_reads_custom_config_values( 97 | mirror_mock: "BandersnatchMirror", logging_mock: mock.MagicMock, customconfig: Path 98 | ) -> None: 99 | setup() 100 | conffile = str(customconfig / "bandersnatch.conf") 101 | sys.argv = ["bandersnatch", "-c", conffile, "mirror"] 102 | main(asyncio.new_event_loop()) 103 | (log_config, _kwargs) = logging_mock.call_args_list[0] 104 | assert log_config == (str(customconfig / "bandersnatch-log.conf"),) 105 | 106 | 107 | def test_main_throws_exception_on_unsupported_digest_name(customconfig: Path,) -> None: 108 | setup() 109 | conffile = str(customconfig / "bandersnatch.conf") 110 | parser = configparser.ConfigParser() 111 | parser.read(conffile) 112 | parser["mirror"]["digest_name"] = "foobar" 113 | del parser["mirror"]["log-config"] 114 | with open(conffile, "w") as fp: 115 | parser.write(fp) 116 | sys.argv = ["bandersnatch", "-c", conffile, "mirror"] 117 | 118 | with pytest.raises(ValueError) as e: 119 | main(asyncio.new_event_loop()) 120 | 121 | assert "foobar is not supported" in str(e.value) 122 | 123 | 124 | @pytest.fixture 125 | def customconfig(tmpdir: Path) -> Path: 126 | default_path = Path(bandersnatch.__file__).parent / "unittest.conf" 127 | with default_path.open("r") as dfp: 128 | config = dfp.read() 129 | config = config.replace("/srv/pypi", str(tmpdir / "pypi")) 130 | with open(str(tmpdir / "bandersnatch.conf"), "w") as f: 131 | f.write(config) 132 | config = config.replace("; log-config", "log-config") 133 | config = config.replace( 134 | "/etc/bandersnatch-log.conf", str(tmpdir / "bandersnatch-log.conf") 135 | ) 136 | with open(str(tmpdir / "bandersnatch.conf"), "w") as f: 137 | f.write(config) 138 | return tmpdir 139 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_filter.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import unittest 4 | from tempfile import TemporaryDirectory 
5 | from unittest import TestCase 6 | 7 | from mock_config import mock_config 8 | 9 | from bandersnatch.configuration import BandersnatchConfig 10 | 11 | from bandersnatch.filter import ( # isort:skip 12 | Filter, 13 | FilterProjectPlugin, 14 | FilterReleasePlugin, 15 | LoadedFilters, 16 | ) 17 | 18 | 19 | class TestBandersnatchFilter(TestCase): 20 | """ 21 | Tests for the bandersnatch filtering classes 22 | """ 23 | 24 | tempdir = None 25 | cwd = None 26 | 27 | def setUp(self) -> None: 28 | self.cwd = os.getcwd() 29 | self.tempdir = TemporaryDirectory() 30 | os.chdir(self.tempdir.name) 31 | sys.stderr.write(self.tempdir.name) 32 | sys.stderr.flush() 33 | 34 | def tearDown(self) -> None: 35 | if self.tempdir: 36 | assert self.cwd 37 | os.chdir(self.cwd) 38 | self.tempdir.cleanup() 39 | self.tempdir = None 40 | 41 | def test__filter_project_plugins__loads(self) -> None: 42 | mock_config( 43 | """\ 44 | [plugins] 45 | enabled = all 46 | """ 47 | ) 48 | builtin_plugin_names = [ 49 | "blocklist_project", 50 | "regex_project", 51 | "allowlist_project", 52 | ] 53 | 54 | plugins = LoadedFilters().filter_project_plugins() 55 | names = [plugin.name for plugin in plugins] 56 | for name in builtin_plugin_names: 57 | self.assertIn(name, names) 58 | 59 | def test__filter_release_plugins__loads(self) -> None: 60 | mock_config( 61 | """\ 62 | [plugins] 63 | enabled = all 64 | """ 65 | ) 66 | builtin_plugin_names = [ 67 | "blocklist_release", 68 | "prerelease_release", 69 | "regex_release", 70 | "latest_release", 71 | ] 72 | 73 | plugins = LoadedFilters().filter_release_plugins() 74 | names = [plugin.name for plugin in plugins] 75 | for name in builtin_plugin_names: 76 | self.assertIn(name, names) 77 | 78 | def test__filter_no_plugin(self) -> None: 79 | mock_config( 80 | """\ 81 | [plugins] 82 | enabled = 83 | """ 84 | ) 85 | 86 | plugins = LoadedFilters().filter_release_plugins() 87 | self.assertEqual(len(plugins), 0) 88 | 89 | plugins = LoadedFilters().filter_project_plugins() 
90 | self.assertEqual(len(plugins), 0) 91 | 92 | def test__filter_base_clases(self) -> None: 93 | """ 94 | Test the base filter classes 95 | """ 96 | 97 | plugin = Filter() 98 | self.assertEqual(plugin.name, "filter") 99 | try: 100 | plugin.initialize_plugin() 101 | error = False 102 | except Exception: 103 | error = True 104 | self.assertFalse(error) 105 | 106 | plugin = FilterReleasePlugin() 107 | self.assertIsInstance(plugin, Filter) 108 | self.assertEqual(plugin.name, "release_plugin") 109 | try: 110 | plugin.filter({}) 111 | error = False 112 | except Exception: 113 | error = True 114 | self.assertFalse(error) 115 | 116 | plugin = FilterProjectPlugin() 117 | self.assertIsInstance(plugin, Filter) 118 | self.assertEqual(plugin.name, "project_plugin") 119 | try: 120 | result = plugin.check_match(key="value") 121 | error = False 122 | self.assertIsInstance(result, bool) 123 | except Exception: 124 | error = True 125 | self.assertFalse(error) 126 | 127 | def test_deprecated_keys(self) -> None: 128 | with open("test.conf", "w") as f: 129 | f.write("[allowlist]\npackages=foo\n[blocklist]\npackages=bar\n") 130 | instance = BandersnatchConfig() 131 | instance.config_file = "test.conf" 132 | instance.load_configuration() 133 | plugin = Filter() 134 | assert plugin.allowlist.name == "allowlist" 135 | assert plugin.blocklist.name == "blocklist" 136 | 137 | def test__filter_project_blocklist_allowlist__pep503_normalize(self) -> None: 138 | mock_config( 139 | """\ 140 | [plugins] 141 | enabled = 142 | blocklist_project 143 | allowlist_project 144 | 145 | [blocklist] 146 | packages = 147 | SampleProject 148 | trove----classifiers 149 | 150 | [allowlist] 151 | packages = 152 | SampleProject 153 | trove----classifiers 154 | """ 155 | ) 156 | 157 | plugins = { 158 | plugin.name: plugin for plugin in LoadedFilters().filter_project_plugins() 159 | } 160 | 161 | self.assertTrue(plugins["blocklist_project"].check_match(name="sampleproject")) 162 | self.assertTrue( 163 | 
plugins["blocklist_project"].check_match(name="trove-classifiers") 164 | ) 165 | self.assertFalse(plugins["allowlist_project"].check_match(name="sampleproject")) 166 | self.assertFalse( 167 | plugins["allowlist_project"].check_match(name="trove-classifiers") 168 | ) 169 | 170 | 171 | if __name__ == "__main__": 172 | unittest.main() 173 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_delete.py: -------------------------------------------------------------------------------- 1 | import os 2 | from argparse import Namespace 3 | from configparser import ConfigParser 4 | from json import loads 5 | from pathlib import Path 6 | from tempfile import TemporaryDirectory 7 | from unittest.mock import patch 8 | from urllib.parse import urlparse 9 | 10 | import pytest 11 | 12 | from bandersnatch.delete import delete_packages, delete_path 13 | from bandersnatch.master import Master 14 | from bandersnatch.utils import find 15 | 16 | EXPECTED_WEB_BEFORE_DELETION = """\ 17 | json 18 | json{0}cooper 19 | json{0}unittest 20 | packages 21 | packages{0}69 22 | packages{0}69{0}cooper-6.9.tar.gz 23 | packages{0}69{0}unittest-6.9.tar.gz 24 | packages{0}7b 25 | packages{0}7b{0}cooper-6.9-py3-none-any.whl 26 | packages{0}7b{0}unittest-6.9-py3-none-any.whl 27 | pypi 28 | pypi{0}cooper 29 | pypi{0}cooper{0}json 30 | pypi{0}unittest 31 | pypi{0}unittest{0}json 32 | simple 33 | simple{0}cooper 34 | simple{0}cooper{0}index.html 35 | simple{0}unittest 36 | simple{0}unittest{0}index.html\ 37 | """.format( 38 | os.sep 39 | ) 40 | EXPECTED_WEB_AFTER_DELETION = """\ 41 | json 42 | packages 43 | packages{0}69 44 | packages{0}7b 45 | pypi 46 | simple\ 47 | """.format( 48 | os.sep 49 | ) 50 | MOCK_JSON_TEMPLATE = """{ 51 | "releases": { 52 | "6.9": [ 53 | {"url": "https://files.ph.org/packages/7b/PKGNAME-6.9-py3-none-any.whl"}, 54 | {"url": "https://files.ph.org/packages/69/PKGNAME-6.9.tar.gz"} 55 | ] 56 | } 57 | } 58 | """ 59 | 60 | 61 | 
def _fake_args() -> Namespace: 62 | return Namespace(dry_run=True, pypi_packages=["cooper", "unittest"], workers=0) 63 | 64 | 65 | def _fake_config() -> ConfigParser: 66 | cp = ConfigParser() 67 | cp.add_section("mirror") 68 | cp["mirror"]["directory"] = "/tmp/unittest" 69 | cp["mirror"]["workers"] = "1" 70 | cp["mirror"]["storage-backend"] = "filesystem" 71 | return cp 72 | 73 | 74 | def test_delete_path() -> None: 75 | with TemporaryDirectory() as td: 76 | td_path = Path(td) 77 | fake_path = td_path / "unittest-file.tgz" 78 | with patch("bandersnatch.delete.logger.info") as mock_log: 79 | assert delete_path(fake_path, True) == 0 80 | assert mock_log.call_count == 1 81 | 82 | with patch("bandersnatch.delete.logger.debug") as mock_log: 83 | assert delete_path(fake_path, False) == 0 84 | assert mock_log.call_count == 1 85 | 86 | fake_path.touch() 87 | # Remove file 88 | assert delete_path(fake_path, False) == 0 89 | # File should be gone - We should log that via debug 90 | with patch("bandersnatch.delete.logger.debug") as mock_log: 91 | assert delete_path(fake_path, False) == 0 92 | assert mock_log.call_count == 1 93 | 94 | 95 | @pytest.mark.asyncio 96 | async def test_delete_packages() -> None: 97 | args = _fake_args() 98 | config = _fake_config() 99 | master = Master("https://unittest.org") 100 | 101 | with TemporaryDirectory() as td: 102 | td_path = Path(td) 103 | config["mirror"]["directory"] = td 104 | web_path = td_path / "web" 105 | json_path = web_path / "json" 106 | json_path.mkdir(parents=True) 107 | pypi_path = web_path / "pypi" 108 | pypi_path.mkdir(parents=True) 109 | simple_path = web_path / "simple" 110 | 111 | # Setup web tree with some json, package index.html + fake blobs 112 | for package_name in args.pypi_packages: 113 | package_simple_path = simple_path / package_name 114 | package_simple_path.mkdir(parents=True) 115 | package_index_path = package_simple_path / "index.html" 116 | package_index_path.touch() 117 | 118 | package_json_str = 
MOCK_JSON_TEMPLATE.replace("PKGNAME", package_name) 119 | package_json_path = json_path / package_name 120 | with package_json_path.open("w") as pjfp: 121 | pjfp.write(package_json_str) 122 | legacy_json_path = pypi_path / package_name / "json" 123 | legacy_json_path.parent.mkdir() 124 | legacy_json_path.symlink_to(package_json_path) 125 | 126 | package_json = loads(package_json_str) 127 | for _version, blobs in package_json["releases"].items(): 128 | for blob in blobs: 129 | url_parts = urlparse(blob["url"]) 130 | blob_path = web_path / url_parts.path[1:] 131 | blob_path.parent.mkdir(parents=True, exist_ok=True) 132 | blob_path.touch() 133 | 134 | # See we have a correct mirror setup 135 | assert find(web_path) == EXPECTED_WEB_BEFORE_DELETION 136 | 137 | args.dry_run = True 138 | assert await delete_packages(config, args, master) == 0 139 | 140 | args.dry_run = False 141 | with patch("bandersnatch.delete.logger.info") as mock_log: 142 | assert await delete_packages(config, args, master) == 0 143 | assert mock_log.call_count == 1 144 | 145 | # See we've deleted it all 146 | assert find(web_path) == EXPECTED_WEB_AFTER_DELETION 147 | 148 | 149 | @pytest.mark.asyncio 150 | async def test_delete_packages_no_exist() -> None: 151 | args = _fake_args() 152 | master = Master("https://unittest.org") 153 | with patch("bandersnatch.delete.logger.error") as mock_log: 154 | assert await delete_packages(_fake_config(), args, master) == 0 155 | assert mock_log.call_count == len(args.pypi_packages) 156 | -------------------------------------------------------------------------------- /docs/mirror_configuration.md: -------------------------------------------------------------------------------- 1 | ## Mirror configuration 2 | 3 | The mirror configuration settings are in a configuration section of the configuration file 4 | named **\[mirror\]**. 5 | 6 | This section contains settings to specify how the mirroring software should operate. 
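Putting the individual settings documented below together, a minimal `[mirror]` section might look like this (values are illustrative, not requirements):

``` ini
[mirror]
; Where to store the mirrored files
directory = /srv/pypi
; Upstream server to mirror (must use https)
master = https://pypi.org
json = false
timeout = 10
workers = 3
```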
7 | 8 | ### directory 9 | 10 | The mirror directory setting is a string that specifies the directory to 11 | store the mirror files. 12 | 13 | The directory used must meet the following requirements: 14 | - The filesystem must be case-sensitive. 15 | - The filesystem must support large numbers of sub-directories. 16 | - The filesystem must support large numbers of files (inodes). 17 | 18 | Example: 19 | ``` ini 20 | [mirror] 21 | directory = /srv/pypi 22 | ``` 23 | 24 | ### json 25 | 26 | The mirror json setting is a boolean (true/false) setting that indicates that 27 | the json packaging metadata should be mirrored in addition to the packages. 28 | 29 | Example: 30 | ``` ini 31 | [mirror] 32 | json = false 33 | ``` 34 | 35 | ### master 36 | 37 | The master setting is a string containing the URL of the server which will be mirrored. 38 | 39 | The master URL must use the https: protocol. 40 | 41 | The default value is: https://pypi.org 42 | 43 | Example: 44 | ``` ini 45 | [mirror] 46 | master = https://pypi.org 47 | ``` 48 | 49 | ### timeout 50 | 51 | The timeout value is an integer that indicates the maximum number of seconds for web requests. 52 | 53 | The default value for this setting is 10 seconds. 54 | 55 | Example: 56 | ``` ini 57 | [mirror] 58 | timeout = 10 59 | ``` 60 | 61 | ### global-timeout 62 | 63 | The global-timeout value is an integer that indicates the maximum runtime of individual aiohttp coroutines. 64 | 65 | The default value for this setting is 18000 seconds, or 5 hours. 66 | 67 | Example: 68 | ```ini 69 | [mirror] 70 | global-timeout = 18000 71 | ``` 72 | 73 | ### workers 74 | 75 | The workers value is an integer from 1 to 10 that indicates the number of concurrent downloads. 76 | 77 | The default value is 3.
78 | 79 | Recommendations for the workers setting: 80 | - leave the default of 3 to avoid overloading the PyPI master 81 | - official servers located in data centers could run 10 workers 82 | - anything beyond 10 is probably unreasonable and is not allowed. 83 | 84 | ### hash-index 85 | 86 | The hash-index setting is a boolean (true/false) that determines whether hashed package index directories should be used. 87 | 88 | Recommended setting: keep the default of false for full pip/PyPI compatibility. 89 | 90 | ```eval_rst 91 | .. warning:: Package index directory hashing is incompatible with pip, and so this should only be used in an environment where it is behind an application that can translate URIs to filesystem locations. 92 | ``` 93 | 94 | #### Apache rewrite rules when using hash-index 95 | 96 | When using this setting with an Apache server, the server will need the following rewrite rules: 97 | 98 | ``` 99 | RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/ 100 | RewriteRule ^([^/])([^/]*)/([^/]+)$ /mirror/pypi/web/simple/$1/$1$2/$3 101 | ``` 102 | 103 | #### NGINX rewrite rules when using hash-index 104 | 105 | When using this setting with an NGINX server, the server will need the following rewrite rules: 106 | 107 | ``` 108 | rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last; 109 | rewrite ^/simple/([^/])([^/]*)/([^/]+)$ /simple/$1/$1$2/$3 last; 110 | ``` 111 | 112 | ### stop-on-error 113 | 114 | The stop-on-error setting is a boolean (true/false) setting that indicates if bandersnatch 115 | should stop immediately if it encounters an error. 116 | 117 | If this setting is false, bandersnatch will not stop when an error is encountered, but it will not 118 | mark the sync as successful when the sync is complete. 119 | 120 | ``` ini 121 | [mirror] 122 | stop-on-error = false 123 | ``` 124 | 125 | ### log-config 126 | 127 | The log-config setting is a string containing the filename of a Python logging configuration 128 | file.
129 | 130 | Example: 131 | ```ini 132 | [mirror] 133 | log-config = /etc/bandersnatch-log.conf 134 | ``` 135 | 136 | ### root_uri 137 | 138 | The root_uri setting is a string containing a URI which is used as the root for relative links. 139 | 140 | ``` eval_rst 141 | .. note:: This is generally not necessary, but was added for the official internal PyPI mirror, which requires serving packages from https://files.pythonhosted.org 142 | ``` 143 | 144 | Example: 145 | ```ini 146 | [mirror] 147 | root_uri = https://example.com 148 | ``` 149 | 150 | 151 | ### diff-file 152 | 153 | The diff-file setting is a string containing the filename used to log the files that were downloaded during the mirror run. 154 | This file can then be used to synchronize external disks or send the files through some other mechanism to offline systems. 155 | You can then sync the list of files to an attached drive or SSH destination such as a diode: 156 | ``` 157 | rsync -av --files-from=/srv/pypi/mirrored-files / /mnt/usb/ 158 | ``` 159 | 160 | You can also use this file list as an input to 7zip to create split archives for transfers, allowing you to size the files as needed: 161 | ``` 162 | 7za a -i@"/srv/pypi/mirrored-files" -spf -v100m path_to_new_zip.7z 163 | ``` 164 | 165 | Example: 166 | ```ini 167 | [mirror] 168 | diff-file = /srv/pypi/mirrored-files 169 | ``` 170 | 171 | 172 | 173 | ### diff-append-epoch 174 | 175 | The diff-append-epoch setting is a boolean (true/false) setting that indicates if the diff-file should be appended with the current epoch time. 176 | This can be used to track diffs over time so the diff file doesn't get clobbered each run. It is only used when diff-file is used.
177 | 178 | Example: 179 | ```ini 180 | [mirror] 181 | diff-append-epoch = true 182 | ``` 183 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/plugins/test_filename.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from tempfile import TemporaryDirectory 4 | from unittest import TestCase 5 | 6 | from mock_config import mock_config 7 | 8 | import bandersnatch.filter 9 | from bandersnatch.master import Master 10 | from bandersnatch.mirror import BandersnatchMirror 11 | from bandersnatch.package import Package 12 | from bandersnatch_filter_plugins import filename_name 13 | 14 | 15 | class BasePluginTestCase(TestCase): 16 | 17 | tempdir = None 18 | cwd = None 19 | 20 | def setUp(self) -> None: 21 | self.cwd = os.getcwd() 22 | self.tempdir = TemporaryDirectory() 23 | os.chdir(self.tempdir.name) 24 | 25 | def tearDown(self) -> None: 26 | if self.tempdir: 27 | assert self.cwd 28 | os.chdir(self.cwd) 29 | self.tempdir.cleanup() 30 | self.tempdir = None 31 | 32 | 33 | class TestExcludePlatformFilter(BasePluginTestCase): 34 | 35 | config_contents = """\ 36 | [plugins] 37 | enabled = 38 | exclude_platform 39 | 40 | [blocklist] 41 | platforms = 42 | windows 43 | freebsd 44 | macos 45 | linux_armv7l 46 | """ 47 | 48 | def test_plugin_compiles_patterns(self) -> None: 49 | mock_config(self.config_contents) 50 | 51 | plugins = bandersnatch.filter.LoadedFilters().filter_release_file_plugins() 52 | 53 | assert any( 54 | type(plugin) == filename_name.ExcludePlatformFilter for plugin in plugins 55 | ) 56 | 57 | def test_exclude_platform(self) -> None: 58 | """ 59 | Tests the platform filter for what it will keep and exclude 60 | based on the config provided.
It is expected to drop all windows, 61 | freebsd and macos packages while only dropping linux-armv7l from 62 | linux packages 63 | """ 64 | mock_config(self.config_contents) 65 | 66 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 67 | pkg = Package("foobar", 1) 68 | pkg._metadata = { 69 | "info": {"name": "foobar", "version": "1.0"}, 70 | "releases": { 71 | "1.0": [ 72 | { 73 | "packagetype": "sdist", 74 | "filename": "foobar-1.0-win32.tar.gz", 75 | "flag": "KEEP", 76 | }, 77 | { 78 | "packagetype": "bdist_msi", 79 | "filename": "foobar-1.0.msi", 80 | "flag": "DROP", 81 | }, 82 | { 83 | "packagetype": "bdist_wininst", 84 | "filename": "foobar-1.0.exe", 85 | "flag": "DROP", 86 | }, 87 | { 88 | "packagetype": "bdist_dmg", 89 | "filename": "foobar-1.0.dmg", 90 | "flag": "DROP", 91 | }, 92 | { 93 | "packagetype": "bdist_wheel", 94 | "filename": "foobar-1.0-win32.zip", 95 | "flag": "DROP", 96 | }, 97 | { 98 | "packagetype": "bdist_wheel", 99 | "filename": "foobar-1.0-linux.tar.gz", 100 | "flag": "KEEP", 101 | }, 102 | { 103 | "packagetype": "bdist_wheel", 104 | "filename": "foobar-1.0-manylinux1_i686.whl", 105 | "flag": "KEEP", 106 | }, 107 | { 108 | "packagetype": "bdist_wheel", 109 | "filename": "foobar-1.0-linux_armv7l.whl", 110 | "flag": "DROP", 111 | }, 112 | { 113 | "packagetype": "bdist_wheel", 114 | "filename": "foobar-1.0-macosx_10_14_x86_64.whl", 115 | "flag": "DROP", 116 | }, 117 | { 118 | "packagetype": "bdist_egg", 119 | "filename": "foobar-1.0-win_amd64.zip", 120 | "flag": "DROP", 121 | }, 122 | { 123 | "packagetype": "unknown", 124 | "filename": "foobar-1.0-unknown", 125 | "flag": "KEEP", 126 | }, 127 | ], 128 | "0.1": [ 129 | { 130 | "packagetype": "sdist", 131 | "filename": "foobar-0.1-win32.msi", 132 | "flag": "KEEP", 133 | }, 134 | { 135 | "packagetype": "bdist_wheel", 136 | "filename": "foobar-0.1-win32.whl", 137 | "flag": "DROP", 138 | }, 139 | ], 140 | "0.2": [ 141 | { 142 | "packagetype": "bdist_egg", 143 | "filename": 
"foobar-0.1-freebsd-6.0-RELEASE-i386.egg", 144 | "flag": "DROP", 145 | } 146 | ], 147 | }, 148 | } 149 | 150 | # count the files we should keep 151 | rv = pkg.releases.values() 152 | keep_count = sum(f["flag"] == "KEEP" for r in rv for f in r) 153 | 154 | pkg.filter_all_releases_files(mirror.filters.filter_release_file_plugins()) 155 | 156 | # we should have the same keep count and no drop 157 | rv = pkg.releases.values() 158 | assert sum(f["flag"] == "KEEP" for r in rv for f in r) == keep_count 159 | assert sum(f["flag"] == "DROP" for r in rv for f in r) == 0 160 | 161 | # the release "0.2" should have been deleted since there are no more files in it 162 | assert len(pkg.releases.keys()) == 2 163 | -------------------------------------------------------------------------------- /docs/filtering_configuration.md: -------------------------------------------------------------------------------- 1 | ## Mirror filtering 2 | 3 | _NOTE: All references to whitelist/blacklist are deprecated, and will be replaced with allowlist/blocklist in 5.0_ 4 | 5 | The mirror filter configuration settings are in the same configuration file as the mirror settings. 6 | There are different configuration sections for the different plugin types. 7 | 8 | Filtering plugin package lists need to use the **Raw PyPI Name** 9 | (non [PEP503](https://www.python.org/dev/peps/pep-0503/#normalized-names) normalized) 10 | in order to get filtered. 11 | 12 | E.g. to blacklist [ACMPlus](https://pypi.org/project/ACMPlus/) you'd need to 13 | use that *exact* casing in `bandersnatch.conf`. 14 | 15 | - A PR would be welcome fixing the normalization, but it would be an invasive change 16 | 17 | ### Plugins Enabling 18 | 19 | The plugins setting is a list of plugins to enable.
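For reference, the PEP 503 name normalization that these filter lists do *not* currently apply is defined in the PEP itself and can be sketched as:

```python
import re


def normalize(name: str) -> str:
    # PEP 503: lowercase and collapse runs of "-", "_", "." to a single "-"
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("ACMPlus"))  # -> acmplus
```

This is why `ACMPlus` must be listed with its exact raw casing: the filters compare against the un-normalized name.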
20 | 21 | Example (enable all installed filter plugins): 22 | 23 | - *Explicitly* enabling plugins is now **mandatory** for *activating plugins* 24 | - They will *do nothing* without activation 25 | 26 | Also, an enabled plugin will use its defaults if it is not configured in its respective section. 27 | 28 | ```ini 29 | [plugins] 30 | enabled = all 31 | ``` 32 | 33 | Example (only enable specific plugins): 34 | 35 | ```ini 36 | [plugins] 37 | enabled = 38 | blacklist_project 39 | whitelist_project 40 | ... 41 | ``` 42 | 43 | ### blacklist / whitelist filtering settings 44 | 45 | The blacklist / whitelist settings are in configuration sections named **\[blacklist\]** and **\[whitelist\]**. 46 | These sections provide settings to indicate packages, projects and releases that should / 47 | should not be mirrored from PyPI. 48 | 49 | This is useful to avoid syncing broken or malicious packages. 50 | 51 | ### packages 52 | 53 | The packages setting is a list of Python [PEP 440 version specifiers](https://www.python.org/dev/peps/pep-0440/#id51) of packages to not be mirrored. Enable version specifier filtering for whitelist and blacklist packages by enabling the 'blacklist_release' and 'allowlist_release' plugins, respectively. 54 | 55 | Any packages matching the version specifier for blacklist packages will not be downloaded. Any packages not matching the version specifier for whitelist packages will not be downloaded. 56 | 57 | Example: 58 | 59 | ```ini 60 | [plugins] 61 | enabled = 62 | blacklist_project 63 | blacklist_release 64 | whitelist_project 65 | allowlist_release 66 | 67 | [blacklist] 68 | packages = 69 | example1 70 | example2>=1.4.2,<1.9,!=1.5.*,!=1.6.* 71 | 72 | [whitelist] 73 | packages = 74 | black==18.5 75 | ptr 76 | ``` 77 | 78 | ### Metadata Filtering 79 | Packages and release files may be selected by filtering on specific metadata values.
80 | 81 | The general form of configuration entries is: 82 | 83 | ```ini 84 | [filter_some_metadata] 85 | tag:tag:path.to.object = 86 | matcha 87 | matchb 88 | ``` 89 | 90 | #### Project Regex Matching 91 | 92 | Filter projects to be synced based on regex matches against their raw metadata entries straight from the parsed downloaded JSON. 93 | 94 | Example: 95 | 96 | ```ini 97 | [regex_project_metadata] 98 | not-null:info.classifiers = 99 | .*Programming Language :: Python :: 2.* 100 | ``` 101 | 102 | Valid tags are `all`, `any`, `none`, `match-null`, `not-null`, with a default of `any:match-null`. 103 | 104 | All metadata provided by the JSON is available, including the `info`, `last_serial`, `releases`, etc. headings. 105 | 106 | 107 | #### Release File Regex Matching 108 | 109 | Filter release files to be downloaded for projects based on regex matches against the stored metadata entries for each release file. 110 | 111 | Example: 112 | 113 | ```ini 114 | [regex_release_file_metadata] 115 | any:release_file.packagetype = 116 | sdist 117 | bdist_wheel 118 | ``` 119 | 120 | Valid tags are the same as for projects. 121 | 122 | Metadata available to match consists of `info`, `release`, and `release_file` top level structures, with `info` 123 | containing the package-wide info, `release` containing the version of the release and `release_file` the metadata 124 | for an individual file for that release. 125 | 126 | 127 | ### Prerelease filtering 128 | 129 | Bandersnatch includes a plugin to filter out pre-releases of packages. To enable this plugin simply add `prerelease_release` to the enabled plugins list. 130 | 131 | ```ini 132 | [plugins] 133 | enabled = 134 | prerelease_release 135 | ``` 136 | 137 | ### Regex filtering 138 | 139 | Advanced users who would like finer control over which packages and releases to filter can use the regex Bandersnatch plugin.
140 | 141 | This plugin allows arbitrary regular expressions to be defined in the configuration; any package name or release version that matches will *not* be downloaded. 142 | 143 | The plugin can be activated for packages and releases separately. For example, to activate the project regex filter, simply add it to the configuration as before: 144 | 145 | ```ini 146 | [plugins] 147 | enabled = 148 | regex_project 149 | ``` 150 | 151 | If you'd like to filter releases using the regex filter, use `regex_release` instead. 152 | 153 | The regex plugin requires an extra section in the config to define the actual patterns to be used for filtering: 154 | 155 | ```ini 156 | [filter_regex] 157 | packages = 158 | .+-evil$ 159 | releases = 160 | .+alpha\d$ 161 | ``` 162 | 163 | Note that the same `filter_regex` section may include both a `packages` and a `releases` entry, each with any number of regular expressions. 164 | 165 | 166 | ### Platform-specific binaries filtering 167 | 168 | This filter allows advanced users who are not interested in Windows/macOS/Linux-specific binaries to skip mirroring the corresponding files. 169 | 170 | 171 | ```ini 172 | [plugins] 173 | enabled = 174 | exclude_platform 175 | [blacklist] 176 | platforms = 177 | windows 178 | ``` 179 | 180 | Available platforms are: `windows`, `macos`, `freebsd`, and `linux`. 181 | 182 | 183 | ### Keep only latest releases 184 | 185 | You can also keep only the latest releases based on the greatest [Version](https://packaging.pypa.io/en/latest/version/) numbers. 186 | 187 | ```ini 188 | [plugins] 189 | enabled = 190 | latest_release 191 | 192 | [latest_release] 193 | keep = 3 194 | ``` 195 | 196 | By default, the plugin does not filter out any release; you must add the `keep` setting to activate it. 197 | 198 | Be aware that this can break dependency requirements when a pinned version is no longer kept.
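The specifier-based release filtering described above boils down to a PEP 440 containment check. The sketch below shows that check using the `packaging` library (which bandersnatch itself depends on); `is_blocked` is a hypothetical helper for illustration, not part of bandersnatch's API:

```python
# Sketch of the version-specifier check behind the blacklist_release /
# blocklist_release plugin, using the `packaging` library.
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name
from packaging.version import Version

# A specifier line as it would appear under "[blacklist] packages ="
requirement = Requirement("example2>=1.4.2,<1.9,!=1.5.*,!=1.6.*")
# The plugins also match pre-releases against the specifier
requirement.specifier.prereleases = True


def is_blocked(name: str, version: str) -> bool:
    """Would blacklist_release drop this release? (illustrative helper)"""
    if canonicalize_name(name) != canonicalize_name(requirement.name):
        return False  # the specifier only applies to its own package
    return Version(version) in requirement.specifier


print(is_blocked("example2", "1.7.3"))  # True: inside the >=1.4.2,<1.9 window
print(is_blocked("example2", "1.5.1"))  # False: excluded by !=1.5.*
```

An allowlist works the same way with the logic inverted: releases *not* matching any configured specifier are the ones that get skipped.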
199 | -------------------------------------------------------------------------------- /src/bandersnatch/configuration.py: -------------------------------------------------------------------------------- 1 | """ 2 | Module containing classes to access the bandersnatch configuration file 3 | """ 4 | import configparser 5 | import logging 6 | import warnings 7 | from pathlib import Path 8 | from typing import Any, Dict, List, NamedTuple, Optional, Type 9 | 10 | try: 11 | import importlib.resources 12 | except ImportError: # pragma: no cover 13 | # For <=3.6 14 | import importlib 15 | import importlib_resources 16 | 17 | importlib.resources = importlib_resources 18 | 19 | 20 | logger = logging.getLogger("bandersnatch") 21 | 22 | 23 | class SetConfigValues(NamedTuple): 24 | json_save: bool 25 | root_uri: str 26 | diff_file_path: str 27 | diff_append_epoch: bool 28 | digest_name: str 29 | storage_backend_name: str 30 | cleanup: bool 31 | 32 | 33 | class Singleton(type): # pragma: no cover 34 | _instances: Dict["Singleton", Type] = {} 35 | 36 | def __call__(cls, *args: Any, **kwargs: Any) -> Type: 37 | if cls not in cls._instances: 38 | cls._instances[cls] = super().__call__(*args, **kwargs) 39 | return cls._instances[cls] 40 | 41 | 42 | class BandersnatchConfig(metaclass=Singleton): 43 | # Ensure we only show the deprecations once 44 | SHOWN_DEPRECATIONS = False 45 | 46 | def __init__(self, config_file: Optional[str] = None) -> None: 47 | """ 48 | Bandersnatch configuration class singleton 49 | 50 | This class is a singleton that parses the configuration once at the 51 | start time. 
52 | 53 | Parameters 54 | ========== 55 | config_file: str, optional 56 | Path to the configuration file to use 57 | """ 58 | self.found_deprecations: List[str] = [] 59 | with importlib.resources.path( # type: ignore 60 | "bandersnatch", "default.conf" 61 | ) as config_path: 62 | self.default_config_file = str(config_path) 63 | self.config_file = config_file 64 | self.load_configuration() 65 | self.check_for_deprecations() 66 | 67 | def check_for_deprecations(self) -> None: 68 | if self.SHOWN_DEPRECATIONS: 69 | return 70 | if self.config.has_section("whitelist") or self.config.has_section("blacklist"): 71 | err_msg = ( 72 | "whitelist/blacklist filter plugins will be renamed to " 73 | "allowlist_*/blocklist_* in version 5.0 " 74 | " - Documentation @ https://bandersnatch.readthedocs.io/" 75 | ) 76 | warnings.warn(err_msg, DeprecationWarning, stacklevel=2) 77 | logger.warning(err_msg) 78 | self.SHOWN_DEPRECATIONS = True 79 | 80 | def load_configuration(self) -> None: 81 | """ 82 | Read the configuration from a configuration file 83 | """ 84 | config_file = self.default_config_file 85 | if self.config_file: 86 | config_file = self.config_file 87 | self.config = configparser.ConfigParser(delimiters="=") 88 | self.config.optionxform = lambda option: option # type: ignore 89 | self.config.read(config_file) 90 | 91 | 92 | # 11-15, 84-89, 98-99, 117-118, 124-126, 144-149 93 | def validate_config_values(config: configparser.ConfigParser) -> SetConfigValues: 94 | try: 95 | json_save = config.getboolean("mirror", "json") 96 | except configparser.NoOptionError: 97 | logger.error( 98 | "Please update your config to include a json " 99 | + "boolean in the [mirror] section. 
Setting to False" 100 | ) 101 | json_save = False 102 | 103 | try: 104 | root_uri = config.get("mirror", "root_uri") 105 | except configparser.NoOptionError: 106 | root_uri = "" 107 | 108 | try: 109 | diff_file_path = config.get("mirror", "diff-file") 110 | except configparser.NoOptionError: 111 | diff_file_path = "" 112 | if "{{" in diff_file_path and "}}" in diff_file_path: 113 | diff_file_path = diff_file_path.replace("{{", "").replace("}}", "") 114 | diff_ref_section, _, diff_ref_key = diff_file_path.partition("_") 115 | try: 116 | diff_file_path = config.get(diff_ref_section, diff_ref_key) 117 | except (configparser.NoOptionError, configparser.NoSectionError): 118 | logger.error( 119 | "Invalid section reference in `diff-file` key. " 120 | "Please correct this error. Saving diff files in" 121 | " base mirror directory." 122 | ) 123 | diff_file_path = str( 124 | Path(config.get("mirror", "directory")) / "mirrored-files" 125 | ) 126 | 127 | try: 128 | diff_append_epoch = config.getboolean("mirror", "diff-append-epoch") 129 | except configparser.NoOptionError: 130 | diff_append_epoch = False 131 | 132 | try: 133 | logger.debug("Checking config for storage backend...") 134 | storage_backend_name = config.get("mirror", "storage-backend") 135 | logger.debug("Found storage backend in config!") 136 | except configparser.NoOptionError: 137 | storage_backend_name = "filesystem" 138 | logger.debug( 139 | "Failed to find storage backend in config, falling back to default!" 140 | ) 141 | logger.info(f"Selected storage backend: {storage_backend_name}") 142 | 143 | try: 144 | digest_name = config.get("mirror", "digest_name") 145 | except configparser.NoOptionError: 146 | digest_name = "sha256" 147 | if digest_name not in ("md5", "sha256"): 148 | raise ValueError( 149 | f"Supplied digest_name {digest_name} is not supported! Please " 150 | + "update digest_name to one of ('sha256', 'md5') in the [mirror] " 151 | + "section." 
152 | ) 153 | 154 | try: 155 | cleanup = config.getboolean("mirror", "cleanup") 156 | except configparser.NoOptionError: 157 | logger.debug( 158 | "bandersnatch is not cleaning up non PEP 503 normalized Simple " 159 | + "API directories" 160 | ) 161 | cleanup = False 162 | 163 | return SetConfigValues( 164 | json_save, 165 | root_uri, 166 | diff_file_path, 167 | diff_append_epoch, 168 | digest_name, 169 | storage_backend_name, 170 | cleanup, 171 | ) 172 | -------------------------------------------------------------------------------- /src/bandersnatch/package.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import logging 3 | from typing import TYPE_CHECKING, Any, Dict, List, Optional 4 | 5 | from packaging.utils import canonicalize_name 6 | 7 | from .errors import PackageNotFound, StaleMetadata 8 | from .master import StalePage 9 | 10 | if TYPE_CHECKING: # pragma: no cover 11 | from .filter import Filter 12 | from .master import Master 13 | 14 | # Bool to help us not spam the logs with certain log messages 15 | display_filter_log = True 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | class Package: 20 | def __init__(self, name: str, serial: int = 0) -> None: 21 | self.name = canonicalize_name(name) 22 | self.raw_name = name 23 | self.serial = serial 24 | 25 | self._metadata: Optional[Dict] = None 26 | 27 | @property 28 | def metadata(self) -> Dict[str, Any]: 29 | assert self._metadata is not None, "Must fetch metadata before accessing it" 30 | return self._metadata 31 | 32 | @property 33 | def info(self) -> Dict[str, Any]: 34 | return self.metadata["info"] # type: ignore 35 | 36 | @property 37 | def last_serial(self) -> int: 38 | return self.metadata["last_serial"] # type: ignore 39 | 40 | @property 41 | def releases(self) -> Dict[str, List]: 42 | return self.metadata["releases"] # type: ignore 43 | 44 | @property 45 | def release_files(self) -> List: 46 | release_files: List[Dict] = [] 47 | 48 | 
for release in self.releases.values(): 49 | release_files.extend(release) 50 | 51 | return release_files 52 | 53 | async def update_metadata(self, master: "Master", attempts: int = 3) -> None: 54 | tries = 0 55 | sleep_on_stale = 1 56 | 57 | while tries < attempts: 58 | try: 59 | logger.info( 60 | f"Fetching metadata for package: {self.name} (serial {self.serial})" 61 | ) 62 | self._metadata = await master.get_package_metadata( 63 | self.name, serial=self.serial 64 | ) 65 | return 66 | except PackageNotFound as e: 67 | logger.info(str(e)) 68 | raise 69 | except StalePage: 70 | tries += 1 71 | logger.error(f"Stale serial for package {self.name} - Attempt {tries}") 72 | if tries < attempts: 73 | logger.debug(f"Sleeping {sleep_on_stale}s to give CDN a chance") 74 | await asyncio.sleep(sleep_on_stale) 75 | sleep_on_stale *= 2 76 | continue 77 | logger.error( 78 | f"Stale serial for {self.name} ({self.serial}) " 79 | + "not updating. Giving up." 80 | ) 81 | raise StaleMetadata(package_name=self.name, attempts=attempts) 82 | 83 | def filter_metadata(self, metadata_filters: List["Filter"]) -> bool: 84 | """ 85 | Run the metadata filtering plugins 86 | """ 87 | global display_filter_log 88 | if not metadata_filters: 89 | if display_filter_log: 90 | logger.info( 91 | "No metadata filters are enabled. Skipping metadata filtering" 92 | ) 93 | display_filter_log = False 94 | return True 95 | 96 | return all(plugin.filter(self.metadata) for plugin in metadata_filters) 97 | 98 | def _filter_release( 99 | self, release_data: Dict, release_filters: List["Filter"] 100 | ) -> bool: 101 | """ 102 | Run the release filtering plugins 103 | """ 104 | global display_filter_log 105 | if not release_filters: 106 | if display_filter_log: 107 | logger.info( 108 | "No release filters are enabled. 
Skipping release filtering" 109 | ) 110 | display_filter_log = False 111 | return True 112 | 113 | return all(plugin.filter(release_data) for plugin in release_filters) 114 | 115 | def filter_all_releases(self, release_filters: List["Filter"]) -> bool: 116 | """ 117 | Filter releases, removing any that fail the filters 118 | """ 119 | releases = list(self.releases.keys()) 120 | for version in releases: 121 | if not self._filter_release( 122 | {"version": version, "releases": self.releases, "info": self.info}, 123 | release_filters, 124 | ): 125 | del self.releases[version] 126 | if releases: 127 | return True 128 | return False 129 | 130 | def _filter_release_file( 131 | self, metadata: Dict, release_file_filters: List["Filter"] 132 | ) -> bool: 133 | """ 134 | Run the release file filtering plugins 135 | """ 136 | global display_filter_log 137 | if not release_file_filters: 138 | if display_filter_log: 139 | logger.info( 140 | "No release file filters are enabled. Skipping release file filtering" # noqa: E501 141 | ) 142 | display_filter_log = False 143 | return True 144 | 145 | return all(plugin.filter(metadata) for plugin in release_file_filters) 146 | 147 | def filter_all_releases_files(self, release_file_filters: List["Filter"]) -> bool: 148 | """ 149 | Filter release files and remove empty releases after doing so.
150 | """ 151 | releases = list(self.releases.keys()) 152 | for version in releases: 153 | release_files = list(self.releases[version]) 154 | for rfindex in reversed(range(len(release_files))): 155 | if not self._filter_release_file( 156 | { 157 | "info": self.info, 158 | "release": version, 159 | "release_file": self.releases[version][rfindex], 160 | }, 161 | release_file_filters, 162 | ): 163 | del self.releases[version][rfindex] 164 | if not self.releases[version]: 165 | del self.releases[version] 166 | 167 | if releases: 168 | return True 169 | return False 170 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_configuration.py: -------------------------------------------------------------------------------- 1 | import configparser 2 | import os 3 | import unittest 4 | import warnings 5 | from tempfile import TemporaryDirectory 6 | from unittest import TestCase 7 | 8 | from bandersnatch.configuration import ( 9 | BandersnatchConfig, 10 | SetConfigValues, 11 | Singleton, 12 | validate_config_values, 13 | ) 14 | 15 | try: 16 | import importlib.resources 17 | except ImportError: # For 3.6 and lesser 18 | import importlib 19 | import importlib_resources 20 | 21 | importlib.resources = importlib_resources 22 | 23 | 24 | class TestBandersnatchConf(TestCase): 25 | """ 26 | Tests for the BandersnatchConf singleton class 27 | """ 28 | 29 | tempdir = None 30 | cwd = None 31 | 32 | def setUp(self) -> None: 33 | self.cwd = os.getcwd() 34 | self.tempdir = TemporaryDirectory() 35 | os.chdir(self.tempdir.name) 36 | # Hack to ensure each test gets fresh instance if needed 37 | # We have a dedicated test to ensure we're creating a singleton 38 | Singleton._instances = {} 39 | 40 | def tearDown(self) -> None: 41 | if self.tempdir: 42 | assert self.cwd 43 | os.chdir(self.cwd) 44 | self.tempdir.cleanup() 45 | self.tempdir = None 46 | 47 | def test_is_singleton(self) -> None: 48 | instance1 = BandersnatchConfig() 49 | 
instance2 = BandersnatchConfig() 50 | self.assertEqual(id(instance1), id(instance2)) 51 | 52 | def test_single_config__default__all_sections_present(self) -> None: 53 | with importlib.resources.path( # type: ignore 54 | "bandersnatch", "unittest.conf" 55 | ) as config_file: 56 | instance = BandersnatchConfig(str(config_file)) 57 | # All default values should at least be present and be the right types 58 | for section in ["mirror", "plugins", "blocklist"]: 59 | self.assertIn(section, instance.config.sections()) 60 | 61 | def test_single_config__default__mirror__setting_attributes(self) -> None: 62 | instance = BandersnatchConfig() 63 | options = [option for option in instance.config["mirror"]] 64 | options.sort() 65 | self.assertListEqual( 66 | options, 67 | [ 68 | "cleanup", 69 | "directory", 70 | "global-timeout", 71 | "hash-index", 72 | "json", 73 | "master", 74 | "stop-on-error", 75 | "storage-backend", 76 | "timeout", 77 | "verifiers", 78 | "workers", 79 | ], 80 | ) 81 | 82 | def test_single_config__default__mirror__setting__types(self) -> None: 83 | """ 84 | Make sure all default mirror settings will cast to the correct types 85 | """ 86 | instance = BandersnatchConfig() 87 | for option, option_type in [ 88 | ("directory", str), 89 | ("hash-index", bool), 90 | ("json", bool), 91 | ("master", str), 92 | ("stop-on-error", bool), 93 | ("storage-backend", str), 94 | ("timeout", int), 95 | ("global-timeout", int), 96 | ("workers", int), 97 | ]: 98 | self.assertIsInstance( 99 | option_type(instance.config["mirror"].get(option)), option_type 100 | ) 101 | 102 | def test_single_config_custom_setting_boolean(self) -> None: 103 | with open("test.conf", "w") as testconfig_handle: 104 | testconfig_handle.write("[mirror]\nhash-index=false\n") 105 | instance = BandersnatchConfig() 106 | instance.config_file = "test.conf" 107 | instance.load_configuration() 108 | self.assertFalse(instance.config["mirror"].getboolean("hash-index")) 109 | 110 | def
test_single_config_custom_setting_int(self) -> None: 111 | with open("test.conf", "w") as testconfig_handle: 112 | testconfig_handle.write("[mirror]\ntimeout=999\n") 113 | instance = BandersnatchConfig() 114 | instance.config_file = "test.conf" 115 | instance.load_configuration() 116 | self.assertEqual(int(instance.config["mirror"]["timeout"]), 999) 117 | 118 | def test_single_config_custom_setting_str(self) -> None: 119 | with open("test.conf", "w") as testconfig_handle: 120 | testconfig_handle.write("[mirror]\nmaster=https://foo.bar.baz\n") 121 | instance = BandersnatchConfig() 122 | instance.config_file = "test.conf" 123 | instance.load_configuration() 124 | self.assertEqual(instance.config["mirror"]["master"], "https://foo.bar.baz") 125 | 126 | def test_multiple_instances_custom_setting_str(self) -> None: 127 | with open("test.conf", "w") as testconfig_handle: 128 | testconfig_handle.write("[mirror]\nmaster=https://foo.bar.baz\n") 129 | instance1 = BandersnatchConfig() 130 | instance1.config_file = "test.conf" 131 | instance1.load_configuration() 132 | 133 | instance2 = BandersnatchConfig() 134 | self.assertEqual(instance2.config["mirror"]["master"], "https://foo.bar.baz") 135 | 136 | def test_validate_config_values(self) -> None: 137 | default_values = SetConfigValues( 138 | False, "", "", False, "sha256", "filesystem", False 139 | ) 140 | no_options_configparser = configparser.ConfigParser() 141 | no_options_configparser["mirror"] = {} 142 | self.assertEqual( 143 | default_values, validate_config_values(no_options_configparser) 144 | ) 145 | 146 | def test_deprecation_warning_raised(self) -> None: 147 | # Remove in 5.0 once we deprecate whitelist/blacklist 148 | 149 | config_file = "test.conf" 150 | instance = BandersnatchConfig() 151 | instance.config_file = config_file 152 | # Test no warning if new plugins used 153 | with open(config_file, "w") as f: 154 | f.write("[allowlist]\npackages=foo\n") 155 | instance.load_configuration() 156 | with 
warnings.catch_warnings(record=True) as w: 157 | instance.check_for_deprecations() 158 | self.assertEqual(len(w), 0) 159 | 160 | # Test warning if old plugins used 161 | instance.SHOWN_DEPRECATIONS = False 162 | with open(config_file, "w") as f: 163 | f.write("[whitelist]\npackages=foo\n") 164 | instance.load_configuration() 165 | with warnings.catch_warnings(record=True) as w: 166 | instance.check_for_deprecations() 167 | instance.check_for_deprecations() 168 | # Assert we only throw 1 warning 169 | self.assertEqual(len(w), 1) 170 | 171 | 172 | if __name__ == "__main__": 173 | unittest.main() 174 | -------------------------------------------------------------------------------- /src/bandersnatch_filter_plugins/allowlist_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from typing import Any, Dict, List, Set 3 | 4 | from packaging.requirements import Requirement 5 | from packaging.utils import canonicalize_name 6 | from packaging.version import InvalidVersion, Version 7 | 8 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin 9 | 10 | logger = logging.getLogger("bandersnatch") 11 | 12 | 13 | class AllowListProject(FilterProjectPlugin): 14 | name = "allowlist_project" 15 | deprecated_name = "whitelist_project" 16 | # Requires iterable default 17 | allowlist_package_names: List[str] = [] 18 | 19 | def initialize_plugin(self) -> None: 20 | """ 21 | Initialize the plugin 22 | """ 23 | # Generate a list of allowlisted packages from the configuration and 24 | # store it into self.allowlist_package_names attribute so this 25 | # operation doesn't end up in the fastpath. 
26 | if not self.allowlist_package_names: 27 | self.allowlist_package_names = self._determine_unfiltered_package_names() 28 | logger.info( 29 | f"Initialized project plugin {self.name}, filtering " 30 | + f"{self.allowlist_package_names}" 31 | ) 32 | 33 | def _determine_unfiltered_package_names(self) -> List[str]: 34 | """ 35 | Return a list of package names that should not be filtered, based on the 36 | configuration file. 37 | """ 38 | # This plugin only processes packages; if the line in the packages 39 | # configuration contains a PEP440 specifier it will be processed by the 40 | # allowlist release filter. So we need to remove any packages that 41 | # are not applicable for this plugin. 42 | unfiltered_packages: Set[str] = set() 43 | try: 44 | lines = self.allowlist["packages"] 45 | package_lines = lines.split("\n") 46 | except KeyError: 47 | package_lines = [] 48 | for package_line in package_lines: 49 | package_line = package_line.strip() 50 | if not package_line or package_line.startswith("#"): 51 | continue 52 | unfiltered_packages.add(canonicalize_name(Requirement(package_line).name)) 53 | return list(unfiltered_packages) 54 | 55 | def filter(self, metadata: Dict) -> bool: 56 | return not self.check_match(name=metadata["info"]["name"]) 57 | 58 | def check_match(self, **kwargs: Any) -> bool: 59 | """ 60 | Check if the package name is missing from the allowlisted projects 61 | in the configuration. 62 | 63 | Parameters 64 | ========== 65 | name: str 66 | The normalized package name of the package/project to check against 67 | the allowlist. 68 | 69 | Returns 70 | ======= 71 | bool: 72 | True if it matches (i.e. is not allowlisted), False otherwise.
73 | """ 74 | if not self.allowlist_package_names: 75 | return False 76 | 77 | name = kwargs.get("name", None) 78 | if not name: 79 | return False 80 | 81 | if canonicalize_name(name) in self.allowlist_package_names: 82 | logger.info(f"Package {name!r} is allowlisted") 83 | return False 84 | return True 85 | 86 | 87 | class AllowListRelease(FilterReleasePlugin): 88 | name = "allowlist_release" 89 | deprecated_name = "whitelist_release" 90 | # Requires iterable default 91 | allowlist_package_names: List[Requirement] = [] 92 | 93 | def initialize_plugin(self) -> None: 94 | """ 95 | Initialize the plugin 96 | """ 97 | # Generate a list of allowlisted packages from the configuration and 98 | # store it into self.allowlist_package_names attribute so this 99 | # operation doesn't end up in the fastpath. 100 | if not self.allowlist_package_names: 101 | self.allowlist_release_requirements = ( 102 | self._determine_filtered_package_requirements() 103 | ) 104 | logger.info( 105 | f"Initialized release plugin {self.name}, filtering " 106 | + f"{self.allowlist_release_requirements}" 107 | ) 108 | 109 | def _determine_filtered_package_requirements(self) -> List[Requirement]: 110 | """ 111 | Parse the configuration file for [allowlist]packages 112 | 113 | Returns 114 | ------- 115 | list of packaging.requirements.Requirement 116 | For all PEP440 package specifiers 117 | """ 118 | filtered_requirements: Set[Requirement] = set() 119 | try: 120 | lines = self.allowlist["packages"] 121 | package_lines = lines.split("\n") 122 | except KeyError: 123 | package_lines = [] 124 | for package_line in package_lines: 125 | package_line = package_line.strip() 126 | if not package_line or package_line.startswith("#"): 127 | continue 128 | requirement = Requirement(package_line) 129 | requirement.name = canonicalize_name(requirement.name) 130 | requirement.specifier.prereleases = True 131 | filtered_requirements.add(requirement) 132 | return list(filtered_requirements) 133 | 134 | def 
filter(self, metadata: Dict) -> bool: 135 | """ 136 | Returns False if version fails the filter, 137 | i.e. doesn't matches an allowlist version specifier 138 | """ 139 | name = metadata["info"]["name"] 140 | version = metadata["version"] 141 | return self._check_match(canonicalize_name(name), version) 142 | 143 | def _check_match(self, name: str, version_string: str) -> bool: 144 | """ 145 | Check if the package name and version matches against an allowlisted 146 | package version specifier. 147 | 148 | Parameters 149 | ========== 150 | name: str 151 | Package name 152 | 153 | version: str 154 | Package version 155 | 156 | Returns 157 | ======= 158 | bool: 159 | True if it matches, False otherwise. 160 | """ 161 | if not name or not version_string: 162 | return False 163 | 164 | try: 165 | version = Version(version_string) 166 | except InvalidVersion: 167 | logger.debug(f"Package {name}=={version_string} has an invalid version") 168 | return False 169 | for requirement in self.allowlist_release_requirements: 170 | if name != requirement.name: 171 | continue 172 | if version in requirement.specifier: 173 | logger.debug( 174 | f"MATCH: Release {name}=={version} matches specifier " 175 | f"{requirement.specifier}" 176 | ) 177 | return True 178 | return False 179 | -------------------------------------------------------------------------------- /src/bandersnatch/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import asyncio 3 | import logging 4 | import logging.config 5 | import shutil 6 | import sys 7 | from configparser import ConfigParser 8 | from pathlib import Path 9 | from tempfile import gettempdir 10 | from typing import Optional 11 | 12 | import bandersnatch.configuration 13 | import bandersnatch.delete 14 | import bandersnatch.log 15 | import bandersnatch.master 16 | import bandersnatch.mirror 17 | import bandersnatch.verify 18 | from bandersnatch.storage import storage_backend_plugins 19 | 20 
| logger = logging.getLogger(__name__) # pylint: disable=C0103 21 | 22 | 23 | # TODO: Work out why argparse.ArgumentParser causes type errors 24 | def _delete_parser(subparsers: argparse._SubParsersAction) -> None: 25 | d = subparsers.add_parser( 26 | "delete", 27 | help=( 28 | "Consult metadata (locally or remotely) and delete " 29 | + "entire package artifacts." 30 | ), 31 | ) 32 | d.add_argument( 33 | "--dry-run", 34 | action="store_true", 35 | default=False, 36 | help="Do not download or delete files", 37 | ) 38 | d.add_argument( 39 | "--workers", 40 | type=int, 41 | default=0, 42 | help="# of parallel iops [Defaults to bandersnatch.conf]", 43 | ) 44 | d.add_argument("pypi_packages", nargs="*") 45 | d.set_defaults(op="delete") 46 | 47 | 48 | def _mirror_parser(subparsers: argparse._SubParsersAction) -> None: 49 | m = subparsers.add_parser( 50 | "mirror", 51 | help="Performs a one-time synchronization with the PyPI master server.", 52 | ) 53 | m.add_argument( 54 | "--force-check", 55 | action="store_true", 56 | default=False, 57 | help=( 58 | "Force bandersnatch to reset the PyPI serial (move serial file to /tmp) to " 59 | + "perform a full sync" 60 | ), 61 | ) 62 | m.set_defaults(op="mirror") 63 | 64 | 65 | def _verify_parser(subparsers: argparse._SubParsersAction) -> None: 66 | v = subparsers.add_parser( 67 | "verify", help="Read in Metadata and check package file validity" 68 | ) 69 | v.add_argument( 70 | "--delete", 71 | action="store_true", 72 | default=False, 73 | help="Enable deletion of packages not active", 74 | ) 75 | v.add_argument( 76 | "--dry-run", 77 | action="store_true", 78 | default=False, 79 | help="Do not download or delete files", 80 | ) 81 | v.add_argument( 82 | "--json-update", 83 | action="store_true", 84 | default=False, 85 | help="Enable updating JSON from PyPI", 86 | ) 87 | v.add_argument( 88 | "--workers", 89 | type=int, 90 | default=0, 91 | help="# of parallel iops [Defaults to bandersnatch.conf]", 92 | 93 |
v.set_defaults(op="verify") 94 | 95 | 96 | def _sync_parser(subparsers: argparse._SubParsersAction) -> None: 97 | m = subparsers.add_parser( 98 | "sync", help="Synchronize specific packages with the PyPI master server.", 99 | ) 100 | m.add_argument( 101 | "packages", metavar="package", nargs="+", help="The name of package to sync", 102 | ) 103 | m.set_defaults(op="sync") 104 | 105 | 106 | async def async_main(args: argparse.Namespace, config: ConfigParser) -> int: 107 | if args.op.lower() == "delete": 108 | async with bandersnatch.master.Master( 109 | config.get("mirror", "master"), 110 | config.getfloat("mirror", "timeout"), 111 | config.getfloat("mirror", "global-timeout", fallback=None), 112 | ) as master: 113 | return await bandersnatch.delete.delete_packages(config, args, master) 114 | elif args.op.lower() == "verify": 115 | return await bandersnatch.verify.metadata_verify(config, args) 116 | elif args.op.lower() == "sync": 117 | return await bandersnatch.mirror.mirror(config, args.packages) 118 | 119 | if args.force_check: 120 | storage_plugin = next(iter(storage_backend_plugins())) 121 | status_file = ( 122 | storage_plugin.PATH_BACKEND(config.get("mirror", "directory")) / "status" 123 | ) 124 | if status_file.exists(): 125 | tmp_status_file = Path(gettempdir()) / "status" 126 | try: 127 | shutil.move(str(status_file), tmp_status_file) 128 | logger.debug( 129 | "Force bandersnatch to check everything against the master PyPI" 130 | + f" - status file moved to {tmp_status_file}" 131 | ) 132 | except OSError as e: 133 | logger.error( 134 | f"Could not move status file ({status_file} to " 135 | + f" {tmp_status_file}): {e}" 136 | ) 137 | else: 138 | logger.info( 139 | f"No status file to move ({status_file}) - Full sync will occur" 140 | ) 141 | 142 | return await bandersnatch.mirror.mirror(config) 143 | 144 | 145 | def main(loop: Optional[asyncio.AbstractEventLoop] = None) -> int: 146 | parser = argparse.ArgumentParser( 147 | description="PyPI PEP 381 mirroring 
client.", prog="bandersnatch" 148 | ) 149 | parser.add_argument( 150 | "--version", action="version", version=f"%(prog)s {bandersnatch.__version__}" 151 | ) 152 | parser.add_argument( 153 | "-c", 154 | "--config", 155 | default="/etc/bandersnatch.conf", 156 | help="use configuration file (default: %(default)s)", 157 | ) 158 | parser.add_argument( 159 | "--debug", 160 | action="store_true", 161 | default=False, 162 | help="Turn on extra logging (DEBUG level)", 163 | ) 164 | 165 | subparsers = parser.add_subparsers() 166 | _delete_parser(subparsers) 167 | _mirror_parser(subparsers) 168 | _verify_parser(subparsers) 169 | _sync_parser(subparsers) 170 | 171 | if len(sys.argv) < 2: 172 | parser.print_help() 173 | parser.exit() 174 | 175 | args = parser.parse_args() 176 | 177 | bandersnatch.log.setup_logging(args) 178 | 179 | # Prepare default config file if needed. 180 | config_path = Path(args.config) 181 | if not config_path.exists(): 182 | logger.warning(f"Config file '{args.config}' missing, creating default config.") 183 | logger.warning("Please review the config file, then run 'bandersnatch' again.") 184 | 185 | default_config_path = Path(__file__).parent / "default.conf" 186 | try: 187 | shutil.copy(default_config_path, args.config) 188 | except OSError as e: 189 | logger.error(f"Could not create config file: {e}") 190 | return 1 191 | 192 | config = bandersnatch.configuration.BandersnatchConfig( 193 | config_file=args.config 194 | ).config 195 | 196 | if config.has_option("mirror", "log-config"): 197 | logging.config.fileConfig(str(Path(config.get("mirror", "log-config")))) 198 | 199 | # TODO: Go to asyncio.run() when >= 3.7 200 | loop = loop or asyncio.get_event_loop() 201 | loop.set_debug(args.debug) 202 | try: 203 | return loop.run_until_complete(async_main(args, config)) 204 | finally: 205 | loop.close() 206 | 207 | 208 | if __name__ == "__main__": 209 | exit(main()) 210 | -------------------------------------------------------------------------------- 
/src/bandersnatch_filter_plugins/blocklist_name.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from typing import Any, Dict, List, Set 3 | 4 | from packaging.requirements import Requirement 5 | from packaging.utils import canonicalize_name 6 | from packaging.version import InvalidVersion, Version 7 | 8 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin 9 | 10 | logger = logging.getLogger("bandersnatch") 11 | 12 | 13 | class BlockListProject(FilterProjectPlugin): 14 | name = "blocklist_project" 15 | deprecated_name = "blacklist_project" 16 | # Requires iterable default 17 | blocklist_package_names: List[str] = [] 18 | 19 | def initialize_plugin(self) -> None: 20 | """ 21 | Initialize the plugin 22 | """ 23 | # Generate a list of blocklisted packages from the configuration and 24 | # store it into self.blocklist_package_names attribute so this 25 | # operation doesn't end up in the fastpath. 26 | if not self.blocklist_package_names: 27 | self.blocklist_package_names = self._determine_filtered_package_names() 28 | logger.info( 29 | f"Initialized project plugin {self.name}, filtering " 30 | + f"{self.blocklist_package_names}" 31 | ) 32 | 33 | def _determine_filtered_package_names(self) -> List[str]: 34 | """ 35 | Return a list of package names to be filtered, based on the configuration 36 | file. 37 | """ 38 | # This plugin only processes packages; if the line in the packages 39 | # configuration contains a PEP440 specifier it will be processed by the 40 | # blocklist release filter. So we need to remove any packages that 41 | # are not applicable for this plugin.
42 | filtered_packages: Set[str] = set() 43 | try: 44 | lines = self.blocklist["packages"] 45 | package_lines = lines.split("\n") 46 | except KeyError: 47 | package_lines = [] 48 | for package_line in package_lines: 49 | package_line = package_line.strip() 50 | if not package_line or package_line.startswith("#"): 51 | continue 52 | package_requirement = Requirement(package_line) 53 | if package_requirement.specifier: 54 | continue 55 | if package_requirement.name != package_line: 56 | logger.debug( 57 | "Package line %r does not requirement name %r", 58 | package_line, 59 | package_requirement.name, 60 | ) 61 | continue 62 | filtered_packages.add(canonicalize_name(package_requirement.name)) 63 | logger.debug("Project blocklist is %r", list(filtered_packages)) 64 | return list(filtered_packages) 65 | 66 | def filter(self, metadata: Dict) -> bool: 67 | return not self.check_match(name=metadata["info"]["name"]) 68 | 69 | def check_match(self, **kwargs: Any) -> bool: 70 | """ 71 | Check if the package name matches against a project that is blocklisted 72 | in the configuration. 73 | 74 | Parameters 75 | ========== 76 | name: str 77 | The normalized package name of the package/project to check against 78 | the blocklist. 79 | 80 | Returns 81 | ======= 82 | bool: 83 | True if it matches, False otherwise. 
84 | """ 85 | name = kwargs.get("name", None) 86 | if not name: 87 | return False 88 | 89 | if canonicalize_name(name) in self.blocklist_package_names: 90 | logger.info(f"Package {name!r} is blocklisted") 91 | return True 92 | return False 93 | 94 | 95 | class BlockListRelease(FilterReleasePlugin): 96 | name = "blocklist_release" 97 | deprecated_name = "blacklist_release" 98 | # Requires iterable default 99 | blocklist_release_requirements: List[Requirement] = [] 100 | 101 | def initialize_plugin(self) -> None: 102 | """ 103 | Initialize the plugin 104 | """ 105 | # Generate a list of blocklisted packages from the configuration and 106 | # store it into self.blocklist_release_requirements attribute so this 107 | # operation doesn't end up in the fastpath. 108 | if not self.blocklist_release_requirements: 109 | self.blocklist_release_requirements = ( 110 | self._determine_filtered_package_requirements() 111 | ) 112 | logger.info( 113 | f"Initialized release plugin {self.name}, filtering " 114 | + f"{self.blocklist_release_requirements}" 115 | ) 116 | 117 | def _determine_filtered_package_requirements(self) -> List[Requirement]: 118 | """ 119 | Parse the configuration file for [blocklist]packages 120 | 121 | Returns 122 | ------- 123 | list of packaging.requirements.Requirement 124 | For all PEP440 package specifiers 125 | """ 126 | filtered_requirements: Set[Requirement] = set() 127 | try: 128 | lines = self.blocklist["packages"] 129 | package_lines = lines.split("\n") 130 | except KeyError: 131 | package_lines = [] 132 | for package_line in package_lines: 133 | package_line = package_line.strip() 134 | if not package_line or package_line.startswith("#"): 135 | continue 136 | requirement = Requirement(package_line) 137 | requirement.name = canonicalize_name(requirement.name) 138 | requirement.specifier.prereleases = True 139 | filtered_requirements.add(requirement) 140 | return list(filtered_requirements) 141 | 142 | def filter(self, metadata: Dict) -> bool: 143 | """ 144 | Returns
False if version fails the filter, 145 | i.e. matches a blocklist version specifier 146 | """ 147 | name = metadata["info"]["name"] 148 | version = metadata["version"] 149 | return not self._check_match(canonicalize_name(name), version) 150 | 151 | def _check_match(self, name: str, version_string: str) -> bool: 152 | """ 153 | Check if the package name and version matches against a blocklisted 154 | package version specifier. 155 | 156 | Parameters 157 | ========== 158 | name: str 159 | Package name 160 | 161 | version: str 162 | Package version 163 | 164 | Returns 165 | ======= 166 | bool: 167 | True if it matches, False otherwise. 168 | """ 169 | if not name or not version_string: 170 | return False 171 | 172 | try: 173 | version = Version(version_string) 174 | except InvalidVersion: 175 | logger.debug(f"Package {name}=={version_string} has an invalid version") 176 | return False 177 | for requirement in self.blocklist_release_requirements: 178 | if name != requirement.name: 179 | continue 180 | if version in requirement.specifier: 181 | logger.debug( 182 | f"MATCH: Release {name}=={version} matches specifier " 183 | f"{requirement.specifier}" 184 | ) 185 | return True 186 | return False 187 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/plugins/test_allowlist_name.py: -------------------------------------------------------------------------------- 1 | import os 2 | from collections import defaultdict 3 | from pathlib import Path 4 | from tempfile import TemporaryDirectory 5 | from unittest import TestCase 6 | 7 | from mock_config import mock_config 8 | 9 | import bandersnatch.filter 10 | import bandersnatch.storage 11 | from bandersnatch.master import Master 12 | from bandersnatch.mirror import BandersnatchMirror 13 | from bandersnatch.package import Package 14 | 15 | 16 | class TestAllowListProject(TestCase): 17 | """ 18 | Tests for the bandersnatch filtering classes 19 | """ 20 | 21 | tempdir = None 
22 | cwd = None 23 | 24 | def setUp(self) -> None: 25 | self.cwd = os.getcwd() 26 | self.tempdir = TemporaryDirectory() 27 | bandersnatch.storage.loaded_storage_plugins = defaultdict(list) 28 | os.chdir(self.tempdir.name) 29 | 30 | def tearDown(self) -> None: 31 | if self.tempdir: 32 | assert self.cwd 33 | os.chdir(self.cwd) 34 | self.tempdir.cleanup() 35 | self.tempdir = None 36 | 37 | def test__plugin__loads__explicitly_enabled(self) -> None: 38 | mock_config( 39 | contents="""\ 40 | [plugins] 41 | enabled = 42 | allowlist_project 43 | """ 44 | ) 45 | 46 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 47 | names = [plugin.name for plugin in plugins] 48 | self.assertListEqual(names, ["allowlist_project"]) 49 | self.assertEqual(len(plugins), 1) 50 | 51 | def test__plugin__loads__default(self) -> None: 52 | mock_config( 53 | """\ 54 | [mirror] 55 | storage-backend = filesystem 56 | 57 | [plugins] 58 | """ 59 | ) 60 | 61 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 62 | names = [plugin.name for plugin in plugins] 63 | self.assertNotIn("allowlist_project", names) 64 | 65 | def test__filter__matches__package(self) -> None: 66 | mock_config( 67 | """\ 68 | [mirror] 69 | storage-backend = filesystem 70 | 71 | [plugins] 72 | enabled = 73 | allowlist_project 74 | 75 | [allowlist] 76 | packages = 77 | foo 78 | """ 79 | ) 80 | 81 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 82 | mirror.packages_to_sync = {"foo": ""} 83 | mirror._filter_packages() 84 | 85 | self.assertIn("foo", mirror.packages_to_sync.keys()) 86 | 87 | def test__filter__nomatch_package(self) -> None: 88 | mock_config( 89 | """\ 90 | [mirror] 91 | storage-backend = filesystem 92 | 93 | [plugins] 94 | enabled = 95 | allowlist_project 96 | 97 | [allowlist] 98 | packages = 99 | foo 100 | """ 101 | ) 102 | 103 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 104 | mirror.packages_to_sync = {"foo": "", 
"foo2": ""} 105 | mirror._filter_packages() 106 | 107 | self.assertIn("foo", mirror.packages_to_sync.keys()) 108 | self.assertNotIn("foo2", mirror.packages_to_sync.keys()) 109 | 110 | def test__filter__name_only(self) -> None: 111 | mock_config( 112 | """\ 113 | [mirror] 114 | storage-backend = filesystem 115 | 116 | [plugins] 117 | enabled = 118 | allowlist_project 119 | 120 | [allowlist] 121 | packages = 122 | foo==1.2.3 123 | """ 124 | ) 125 | 126 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 127 | mirror.packages_to_sync = {"foo": "", "foo2": ""} 128 | mirror._filter_packages() 129 | 130 | self.assertIn("foo", mirror.packages_to_sync.keys()) 131 | self.assertNotIn("foo2", mirror.packages_to_sync.keys()) 132 | 133 | def test__filter__varying__specifiers(self) -> None: 134 | mock_config( 135 | """\ 136 | [mirror] 137 | storage-backend = filesystem 138 | 139 | [plugins] 140 | enabled = 141 | allowlist_project 142 | 143 | [allowlist] 144 | packages = 145 | foo==1.2.3 146 | bar~=3.0,<=1.5 147 | """ 148 | ) 149 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 150 | mirror.packages_to_sync = { 151 | "foo": "", 152 | "bar": "", 153 | "snu": "", 154 | } 155 | mirror._filter_packages() 156 | 157 | self.assertEqual({"foo": "", "bar": ""}, mirror.packages_to_sync) 158 | 159 | 160 | class TestAllowlistRelease(TestCase): 161 | """ 162 | Tests for the bandersnatch filtering classes 163 | """ 164 | 165 | tempdir = None 166 | cwd = None 167 | 168 | def setUp(self) -> None: 169 | self.cwd = os.getcwd() 170 | self.tempdir = TemporaryDirectory() 171 | os.chdir(self.tempdir.name) 172 | 173 | def tearDown(self) -> None: 174 | if self.tempdir: 175 | assert self.cwd 176 | os.chdir(self.cwd) 177 | self.tempdir.cleanup() 178 | self.tempdir = None 179 | 180 | def test__plugin__loads__explicitly_enabled(self) -> None: 181 | mock_config( 182 | """\ 183 | [plugins] 184 | enabled = 185 | allowlist_release 186 | """ 187 | ) 188 | 189 | 
plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 190 | names = [plugin.name for plugin in plugins] 191 | self.assertListEqual(names, ["allowlist_release"]) 192 | self.assertEqual(len(plugins), 1) 193 | 194 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None: 195 | mock_config( 196 | """\ 197 | [plugins] 198 | enabled = 199 | allowlist_package 200 | """ 201 | ) 202 | 203 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 204 | names = [plugin.name for plugin in plugins] 205 | self.assertNotIn("allowlist_release", names) 206 | 207 | def test__filter__matches__release(self) -> None: 208 | mock_config( 209 | """\ 210 | [plugins] 211 | enabled = 212 | allowlist_release 213 | [allowlist] 214 | packages = 215 | foo==1.2.0 216 | """ 217 | ) 218 | 219 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 220 | pkg = Package("foo", 1) 221 | pkg._metadata = { 222 | "info": {"name": "foo"}, 223 | "releases": {"1.2.0": {}, "1.2.1": {}}, 224 | } 225 | 226 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 227 | 228 | self.assertEqual(pkg.releases, {"1.2.0": {}}) 229 | 230 | def test__dont__filter__prereleases(self) -> None: 231 | mock_config( 232 | """\ 233 | [plugins] 234 | enabled = 235 | allowlist_release 236 | [allowlist] 237 | packages = 238 | foo<=1.2.0 239 | """ 240 | ) 241 | 242 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 243 | pkg = Package("foo", 1) 244 | pkg._metadata = { 245 | "info": {"name": "foo"}, 246 | "releases": { 247 | "1.1.0a2": {}, 248 | "1.1.1beta1": {}, 249 | "1.2.0": {}, 250 | "1.2.1": {}, 251 | "1.2.2alpha3": {}, 252 | "1.2.3rc1": {}, 253 | }, 254 | } 255 | 256 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 257 | 258 | self.assertEqual(pkg.releases, {"1.1.0a2": {}, "1.1.1beta1": {}, "1.2.0": {}}) 259 | 260 | def test__casing__no__affect(self) -> None: 261 | mock_config( 262 | """\ 263 | [plugins] 264 | 
enabled = 265 | allowlist_release 266 | [allowlist] 267 | packages = 268 | Foo<=1.2.0 269 | """ 270 | ) 271 | 272 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 273 | pkg = Package("foo", 1) 274 | pkg._metadata = { 275 | "info": {"name": "foo"}, 276 | "releases": {"1.2.0": {}, "1.2.1": {}}, 277 | } 278 | 279 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 280 | 281 | self.assertEqual(pkg.releases, {"1.2.0": {}}) 282 | -------------------------------------------------------------------------------- /src/bandersnatch/tests/test_verify.py: -------------------------------------------------------------------------------- 1 | import configparser 2 | import os 3 | import sys 4 | import unittest.mock as mock 5 | from concurrent.futures import ThreadPoolExecutor 6 | from pathlib import Path 7 | from shutil import rmtree 8 | from tempfile import gettempdir 9 | from typing import Any, List 10 | 11 | import pytest 12 | from _pytest.monkeypatch import MonkeyPatch 13 | 14 | import bandersnatch 15 | from bandersnatch.master import Master 16 | from bandersnatch.utils import convert_url_to_path, find 17 | 18 | from bandersnatch.verify import ( # isort:skip 19 | get_latest_json, 20 | delete_unowned_files, 21 | metadata_verify, 22 | verify_producer, 23 | ) 24 | 25 | 26 | async def do_nothing(*args: Any, **kwargs: Any) -> None: 27 | pass 28 | 29 | 30 | def some_dirs(*args: Any, **kwargs: Any) -> List[str]: 31 | return ["/data/pypi/web/json/bandersnatch", "/data/pypi/web/json/black"] 32 | 33 | 34 | class FakeArgs: 35 | delete = True 36 | dry_run = True 37 | workers = 2 38 | 39 | 40 | class FakeConfig: 41 | def get(self, section: str, item: str) -> str: 42 | if section == "mirror": 43 | if item == "directory": 44 | return "/data/pypi" 45 | if item == "master": 46 | return "https://pypi.org/simple/" 47 | return "" 48 | 49 | def getfloat(self, section: str, item: str, fallback: float = 0.5) -> float: 50 | return 0.5 51 | 52 | 53 | # TODO: 
Support testing sharded simple dirs 54 | class FakeMirror: 55 | def __init__(self, entropy: str = "") -> None: 56 | self.mirror_base = Path(gettempdir()) / f"pypi_unittest_{os.getpid()}{entropy}" 57 | if self.mirror_base.exists(): 58 | return 59 | self.web_base = self.mirror_base / "web" 60 | self.web_base.mkdir(parents=True) 61 | self.json_path = self.web_base / "json" 62 | self.package_path = self.web_base / "packages" 63 | self.pypi_path = self.web_base / "pypi" 64 | self.simple_path = self.web_base / "simple" 65 | 66 | for web_dir in ( 67 | self.json_path, 68 | self.package_path, 69 | self.pypi_path, 70 | self.simple_path, 71 | ): 72 | web_dir.mkdir() 73 | 74 | self.pypi_packages = { 75 | "bandersnatch": { 76 | "bandersnatch-0.6.9": { 77 | "filename": "bandersnatch-0.6.9.tar.gz", 78 | "contents": "69", 79 | "sha256": "b35e87b5838011a3637be660e4238af9a55e4edc74404c990f7a558e7f416658", # noqa: E501 80 | "url": "https://test.pypi.org/packages/8f/1a/6969/bandersnatch-0.6.9.tar.gz", # noqa: E501 81 | } 82 | }, 83 | "black": { 84 | "black-2018.6.9": { 85 | "filename": "black-2018.6.9.tar.gz", 86 | "contents": "69", 87 | "sha256": "b35e87b5838011a3637be660e4238af9a55e4edc74404c990f7a558e7f416658", # noqa: E501 88 | "url": "https://test.pypi.org/packages/8f/1a/6969/black-2018.6.9.tar.gz", # noqa: E501 89 | }, 90 | "black-2019.6.9": { 91 | "filename": "black-2019.6.9.tar.gz", 92 | "contents": "1469", 93 | "sha256": "c896470f5975bd5dc7d173871faca19848855b01bacf3171e9424b8a993b528b", # noqa: E501 94 | "url": "https://test.pypi.org/packages/8f/1a/1aa0/black-2019.6.9.tar.gz", # noqa: E501 95 | }, 96 | }, 97 | } 98 | 99 | # Create each subdir of web 100 | self.setup_json() 101 | self.setup_simple() 102 | self.setup_packages() 103 | 104 | def clean_up(self) -> None: 105 | if self.mirror_base.exists(): 106 | rmtree(self.mirror_base) 107 | 108 | def setup_json(self) -> None: 109 | for pkg in self.pypi_packages.keys(): 110 | pkg_json = self.json_path / pkg 111 | pkg_json.touch() 
112 | pkg_legacy_json = self.pypi_path / pkg / "json" 113 | pkg_legacy_json.parent.mkdir() 114 | pkg_legacy_json.symlink_to(str(pkg_json)) 115 | 116 | def setup_packages(self) -> None: 117 | for _pkg, dists in self.pypi_packages.items(): 118 | for _version, metadata in dists.items(): 119 | dist_file = self.web_base / convert_url_to_path(metadata["url"]) 120 | dist_file.parent.mkdir(exist_ok=True, parents=True) 121 | with dist_file.open("w") as dfp: 122 | dfp.write(metadata["contents"]) 123 | 124 | def setup_simple(self) -> None: 125 | for pkg in self.pypi_packages.keys(): 126 | pkg_dir = self.simple_path / pkg 127 | pkg_dir.mkdir() 128 | index_path = pkg_dir / "index.html" 129 | index_path.touch() 130 | 131 | 132 | @pytest.mark.asyncio 133 | async def test_verify_producer(monkeypatch: MonkeyPatch) -> None: 134 | fm = FakeMirror("test_async_verify") 135 | fc = configparser.ConfigParser() 136 | fc["mirror"] = {} 137 | fc["mirror"]["verifiers"] = "2" 138 | master = Master("https://unittest.org") 139 | json_files = ["web/json/bandersnatch", "web/json/black"] 140 | monkeypatch.setattr(bandersnatch.verify, "verify", do_nothing) 141 | await verify_producer(master, fc, [], fm.mirror_base, json_files, mock.Mock(), None) 142 | 143 | 144 | def test_fake_mirror() -> None: 145 | expected_mirror_layout = """\ 146 | web 147 | web{0}json 148 | web{0}json{0}bandersnatch 149 | web{0}json{0}black 150 | web{0}packages 151 | web{0}packages{0}8f 152 | web{0}packages{0}8f{0}1a 153 | web{0}packages{0}8f{0}1a{0}1aa0 154 | web{0}packages{0}8f{0}1a{0}1aa0{0}black-2019.6.9.tar.gz 155 | web{0}packages{0}8f{0}1a{0}6969 156 | web{0}packages{0}8f{0}1a{0}6969{0}bandersnatch-0.6.9.tar.gz 157 | web{0}packages{0}8f{0}1a{0}6969{0}black-2018.6.9.tar.gz 158 | web{0}pypi 159 | web{0}pypi{0}bandersnatch 160 | web{0}pypi{0}bandersnatch{0}json 161 | web{0}pypi{0}black 162 | web{0}pypi{0}black{0}json 163 | web{0}simple 164 | web{0}simple{0}bandersnatch 165 | web{0}simple{0}bandersnatch{0}index.html 166 | 
web{0}simple{0}black 167 | web{0}simple{0}black{0}index.html""".format( 168 | os.sep 169 | ) 170 | fm = FakeMirror("_mirror_base_test") 171 | assert expected_mirror_layout == find(str(fm.mirror_base), True) 172 | fm.clean_up() 173 | 174 | 175 | @pytest.mark.asyncio 176 | async def test_delete_unowned_files() -> None: 177 | executor = ThreadPoolExecutor(max_workers=2) 178 | fm = FakeMirror("_test_delete_files") 179 | # Leave out black-2018.6.9.tar.gz so it gets deleted 180 | all_pkgs = [ 181 | fm.mirror_base / "web/packages/8f/1a/1aa0/black-2019.6.9.tar.gz", 182 | fm.mirror_base / "web/packages/8f/1a/6969/bandersnatch-0.6.9.tar.gz", 183 | ] 184 | await delete_unowned_files(fm.mirror_base, executor, all_pkgs, True) 185 | await delete_unowned_files(fm.mirror_base, executor, all_pkgs, False) 186 | deleted_path = fm.mirror_base / "web/packages/8f/1a/6969/black-2018.6.9.tar.gz" 187 | assert not deleted_path.exists() 188 | fm.clean_up() 189 | 190 | 191 | @pytest.mark.asyncio 192 | async def test_get_latest_json(monkeypatch: MonkeyPatch) -> None: 193 | config = FakeConfig() 194 | executor = ThreadPoolExecutor(max_workers=2) 195 | json_path = Path(gettempdir()) / f"unittest_{os.getpid()}.json" 196 | master = Master("https://unittest.org") 197 | master.url_fetch = do_nothing # type: ignore 198 | await get_latest_json(master, json_path, config, executor) # type: ignore 199 | 200 | 201 | @pytest.mark.asyncio 202 | async def test_metadata_verify(monkeypatch: MonkeyPatch) -> None: 203 | fa = FakeArgs() 204 | fc = FakeConfig() 205 | monkeypatch.setattr(bandersnatch.verify, "verify_producer", do_nothing) 206 | monkeypatch.setattr(bandersnatch.verify, "delete_unowned_files", do_nothing) 207 | monkeypatch.setattr(bandersnatch.verify.os, "listdir", some_dirs) 208 | await metadata_verify(fc, fa) # type: ignore 209 | 210 | 211 | if __name__ == "__main__": 212 | pytest.main(sys.argv) 213 | -------------------------------------------------------------------------------- 
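The digest check exercised above is the core of `verify()`: each mirrored file's sha256 is compared against the digest recorded in the package's JSON metadata, and mismatched files are deleted and re-fetched. A minimal stdlib sketch of that comparison (the real helper is `bandersnatch.utils.hash`; the chunk size here is an arbitrary choice):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Stream the file in chunks so large distribution files never need
    # to fit in memory, then return the hex digest that verify()
    # compares against jpkg["digests"]["sha256"].
    digest = hashlib.sha256()
    with path.open("rb") as fp:
        for chunk in iter(lambda: fp.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A file whose `sha256_of()` result differs from the recorded digest is treated as corrupt and downloaded again (outside of dry-run mode).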
/src/bandersnatch/tests/plugins/test_blocklist_name.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | from tempfile import TemporaryDirectory 4 | from unittest import TestCase 5 | 6 | from mock_config import mock_config 7 | 8 | import bandersnatch.filter 9 | from bandersnatch.master import Master 10 | from bandersnatch.mirror import BandersnatchMirror 11 | from bandersnatch.package import Package 12 | 13 | 14 | class TestBlockListProject(TestCase): 15 | """ 16 | Tests for the bandersnatch filtering classes 17 | """ 18 | 19 | tempdir = None 20 | cwd = None 21 | 22 | def setUp(self) -> None: 23 | self.cwd = os.getcwd() 24 | self.tempdir = TemporaryDirectory() 25 | os.chdir(self.tempdir.name) 26 | 27 | def tearDown(self) -> None: 28 | if self.tempdir: 29 | assert self.cwd 30 | os.chdir(self.cwd) 31 | self.tempdir.cleanup() 32 | self.tempdir = None 33 | 34 | def test__plugin__loads__explicitly_enabled(self) -> None: 35 | mock_config( 36 | """\ 37 | [plugins] 38 | enabled = 39 | blocklist_project 40 | """ 41 | ) 42 | 43 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 44 | names = [plugin.name for plugin in plugins] 45 | self.assertListEqual(names, ["blocklist_project"]) 46 | self.assertEqual(len(plugins), 1) 47 | 48 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None: 49 | mock_config( 50 | """\ 51 | [plugins] 52 | enabled = 53 | blocklist_release 54 | """ 55 | ) 56 | 57 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 58 | names = [plugin.name for plugin in plugins] 59 | self.assertNotIn("blocklist_project", names) 60 | 61 | def test__plugin__loads__default(self) -> None: 62 | mock_config( 63 | """\ 64 | [blocklist] 65 | """ 66 | ) 67 | 68 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins() 69 | names = [plugin.name for plugin in plugins] 70 | self.assertNotIn("blocklist_project", names) 71 | 72 | def 
test__filter__matches__package(self) -> None: 73 | mock_config( 74 | """\ 75 | [plugins] 76 | enabled = 77 | blocklist_project 78 | [blocklist] 79 | packages = 80 | foo 81 | """ 82 | ) 83 | 84 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 85 | mirror.packages_to_sync = {"foo": ""} 86 | mirror._filter_packages() 87 | 88 | self.assertNotIn("foo", mirror.packages_to_sync.keys()) 89 | 90 | def test__filter__nomatch_package(self) -> None: 91 | mock_config( 92 | """\ 93 | [blocklist] 94 | plugins = 95 | blocklist_project 96 | packages = 97 | foo 98 | """ 99 | ) 100 | 101 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 102 | mirror.packages_to_sync = {"foo2": ""} 103 | mirror._filter_packages() 104 | 105 | self.assertIn("foo2", mirror.packages_to_sync.keys()) 106 | 107 | def test__filter__name_only(self) -> None: 108 | mock_config( 109 | """\ 110 | [mirror] 111 | storage-backend = filesystem 112 | 113 | [plugins] 114 | enabled = 115 | blocklist_project 116 | 117 | [blocklist] 118 | packages = 119 | foo==1.2.3 120 | """ 121 | ) 122 | 123 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 124 | mirror.packages_to_sync = {"foo": "", "foo2": ""} 125 | mirror._filter_packages() 126 | 127 | self.assertIn("foo", mirror.packages_to_sync.keys()) 128 | self.assertIn("foo2", mirror.packages_to_sync.keys()) 129 | 130 | def test__filter__varying__specifiers(self) -> None: 131 | mock_config( 132 | """\ 133 | [mirror] 134 | storage-backend = filesystem 135 | 136 | [plugins] 137 | enabled = 138 | blocklist_project 139 | 140 | [blocklist] 141 | packages = 142 | foo==1.2.3 143 | bar~=3.0,<=1.5 144 | snu 145 | """ 146 | ) 147 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 148 | mirror.packages_to_sync = { 149 | "foo": "", 150 | "foo2": "", 151 | "bar": "", 152 | "snu": "", 153 | } 154 | mirror._filter_packages() 155 | 156 | self.assertEqual({"foo": "", "foo2": "", "bar": ""}, 
mirror.packages_to_sync) 157 | 158 | 159 | class TestBlockListRelease(TestCase): 160 | """ 161 | Tests for the bandersnatch filtering classes 162 | """ 163 | 164 | tempdir = None 165 | cwd = None 166 | 167 | def setUp(self) -> None: 168 | self.cwd = os.getcwd() 169 | self.tempdir = TemporaryDirectory() 170 | os.chdir(self.tempdir.name) 171 | 172 | def tearDown(self) -> None: 173 | if self.tempdir: 174 | assert self.cwd 175 | os.chdir(self.cwd) 176 | self.tempdir.cleanup() 177 | self.tempdir = None 178 | 179 | def test__plugin__loads__explicitly_enabled(self) -> None: 180 | mock_config( 181 | """\ 182 | [plugins] 183 | enabled = 184 | blocklist_release 185 | """ 186 | ) 187 | 188 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 189 | names = [plugin.name for plugin in plugins] 190 | self.assertListEqual(names, ["blocklist_release"]) 191 | self.assertEqual(len(plugins), 1) 192 | 193 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None: 194 | mock_config( 195 | """\ 196 | [plugins] 197 | enabled = 198 | blocklist_package 199 | """ 200 | ) 201 | 202 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins() 203 | names = [plugin.name for plugin in plugins] 204 | self.assertNotIn("blocklist_release", names) 205 | 206 | def test__filter__matches__release(self) -> None: 207 | mock_config( 208 | """\ 209 | [plugins] 210 | enabled = 211 | blocklist_release 212 | [blocklist] 213 | packages = 214 | foo==1.2.0 215 | """ 216 | ) 217 | 218 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 219 | pkg = Package("foo", 1) 220 | pkg._metadata = { 221 | "info": {"name": "foo"}, 222 | "releases": {"1.2.0": {}, "1.2.1": {}}, 223 | } 224 | 225 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 226 | 227 | self.assertEqual(pkg.releases, {"1.2.1": {}}) 228 | 229 | def test__dont__filter__prereleases(self) -> None: 230 | mock_config( 231 | """\ 232 | [plugins] 233 | enabled = 234 | 
blocklist_release 235 | [blocklist] 236 | packages = 237 | foo<=1.2.0 238 | """ 239 | ) 240 | 241 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 242 | pkg = Package("foo", 1) 243 | pkg._metadata = { 244 | "info": {"name": "foo"}, 245 | "releases": { 246 | "1.1.0a2": {}, 247 | "1.1.1beta1": {}, 248 | "1.2.0": {}, 249 | "1.2.1": {}, 250 | "1.2.2alpha3": {}, 251 | "1.2.3rc1": {}, 252 | }, 253 | } 254 | 255 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 256 | 257 | self.assertEqual(pkg.releases, {"1.2.1": {}, "1.2.2alpha3": {}, "1.2.3rc1": {}}) 258 | 259 | def test__casing__no__affect(self) -> None: 260 | mock_config( 261 | """\ 262 | [plugins] 263 | enabled = 264 | blocklist_release 265 | [blocklist] 266 | packages = 267 | Foo<=1.2.0 268 | """ 269 | ) 270 | 271 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com")) 272 | pkg = Package("foo", 1) 273 | pkg._metadata = { 274 | "info": {"name": "foo"}, 275 | "releases": {"1.2.0": {}, "1.2.1": {}}, 276 | } 277 | 278 | pkg.filter_all_releases(mirror.filters.filter_release_plugins()) 279 | 280 | self.assertEqual(pkg.releases, {"1.2.1": {}}) 281 | -------------------------------------------------------------------------------- /src/bandersnatch/verify.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import asyncio 3 | import concurrent.futures 4 | import json 5 | import logging 6 | import os 7 | import shutil 8 | from argparse import Namespace 9 | from asyncio.queues import Queue 10 | from configparser import ConfigParser 11 | from pathlib import Path 12 | from sys import stderr 13 | from typing import List, Optional, Set 14 | from urllib.parse import urlparse 15 | 16 | from .filter import LoadedFilters 17 | from .master import Master 18 | from .storage import storage_backend_plugins 19 | from .utils import convert_url_to_path, hash, recursive_find_files, unlink_parent_dir 20 | 21 | logger = 
logging.getLogger(__name__) 22 | 23 | 24 | async def get_latest_json( 25 | master: Master, 26 | json_path: Path, 27 | config: ConfigParser, 28 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None, 29 | delete_removed_packages: bool = False, 30 | ) -> None: 31 | url_parts = urlparse(config.get("mirror", "master")) 32 | url = f"{url_parts.scheme}://{url_parts.netloc}/pypi/{json_path.name}/json" 33 | logger.debug(f"Updating {json_path.name} json from {url}") 34 | new_json_path = json_path.parent / f"{json_path.name}.new" 35 | await master.url_fetch(url, new_json_path, executor) 36 | if new_json_path.exists(): 37 | shutil.move(str(new_json_path), json_path) 38 | else: 39 | logger.error( 40 | f"{str(new_json_path)} does not exist - Did not get new JSON metadata" 41 | ) 42 | if delete_removed_packages and json_path.exists(): 43 | logger.debug(f"Unlinking {json_path} - assuming it does not exist upstream") 44 | json_path.unlink() 45 | 46 | 47 | async def delete_unowned_files( 48 | mirror_base: Path, 49 | executor: concurrent.futures.ThreadPoolExecutor, 50 | all_package_files: List[Path], 51 | dry_run: bool, 52 | ) -> int: 53 | loop = asyncio.get_event_loop() 54 | packages_path = mirror_base / "web" / "packages" 55 | all_fs_files: Set[Path] = set() 56 | await loop.run_in_executor( 57 | executor, recursive_find_files, all_fs_files, packages_path 58 | ) 59 | 60 | all_package_files_set = set(all_package_files) 61 | unowned_files = all_fs_files - all_package_files_set 62 | logger.info( 63 | f"We have {len(all_package_files_set)} files. 
" 63 | + f"{len(unowned_files)} unowned files" 65 | ) 66 | if not unowned_files: 67 | logger.info(f"{mirror_base} has no files to delete") 68 | return 0 69 | 70 | if dry_run: 71 | print("[DRY RUN] Unowned file list:", file=stderr) 72 | for f in sorted(unowned_files): 73 | print(f) 74 | else: 75 | del_coros = [] 76 | for file_path in unowned_files: 77 | del_coros.append( 78 | loop.run_in_executor(executor, unlink_parent_dir, file_path) 79 | ) 80 | await asyncio.gather(*del_coros) 81 | 82 | return 0 83 | 84 | 85 | async def verify( 86 | master: Master, 87 | config: ConfigParser, 88 | json_file: str, 89 | mirror_base_path: Path, 90 | all_package_files: List[Path], 91 | args: argparse.Namespace, 92 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None, 93 | releases_key: str = "releases", 94 | ) -> None: 95 | json_base = mirror_base_path / "web" / "json" 96 | json_full_path = json_base / json_file 97 | loop = asyncio.get_event_loop() 98 | logger.info(f"Parsing {json_file}") 99 | 100 | if args.json_update: 101 | if not args.dry_run: 102 | await get_latest_json(master, json_full_path, config, executor, args.delete) 103 | else: 104 | logger.info(f"[DRY RUN] Would have grabbed latest json for {json_file}") 105 | 106 | if not json_full_path.exists(): 107 | logger.debug(f"Not trying to sync package as {json_full_path} does not exist") 108 | return 109 | 110 | try: 111 | with json_full_path.open("r") as jfp: 112 | pkg = json.load(jfp) 113 | except json.decoder.JSONDecodeError as jde: 114 | logger.error(f"Failed to load {json_full_path}: {jde} - skipping ...") 115 | return 116 | 117 | # Apply release filter plugins, as the Package class does 118 | for plugin in LoadedFilters().filter_release_plugins() or []: 119 | plugin.filter(pkg) 120 | 121 | for release_version in pkg[releases_key]: 122 | for jpkg in pkg[releases_key][release_version]: 123 | pkg_file = mirror_base_path / "web" / convert_url_to_path(jpkg["url"]) 124 | if not pkg_file.exists(): 125 | if args.dry_run: 126
| logger.info(f"{jpkg['url']} would be fetched") 127 | all_package_files.append(pkg_file) 128 | continue 129 | else: 130 | await master.url_fetch(jpkg["url"], pkg_file, executor) 131 | 132 | calc_sha256 = await loop.run_in_executor(executor, hash, str(pkg_file)) 133 | if calc_sha256 != jpkg["digests"]["sha256"]: 134 | if not args.dry_run: 135 | await loop.run_in_executor(None, pkg_file.unlink) 136 | await master.url_fetch(jpkg["url"], pkg_file, executor) 137 | else: 138 | logger.info( 139 | f"[DRY RUN] {jpkg['filename']} has a sha256 mismatch." 140 | ) 141 | 142 | all_package_files.append(pkg_file) 143 | 144 | logger.info(f"Finished validating {json_file}") 145 | 146 | 147 | async def verify_producer( 148 | master: Master, 149 | config: ConfigParser, 150 | all_package_files: List[Path], 151 | mirror_base_path: Path, 152 | json_files: List[str], 153 | args: argparse.Namespace, 154 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None, 155 | ) -> None: 156 | queue: asyncio.Queue = asyncio.Queue() 157 | for jf in json_files: 158 | await queue.put(jf) 159 | 160 | async def consume(q: Queue) -> None: 161 | while not q.empty(): 162 | json_file = await q.get() 163 | await verify( 164 | master, 165 | config, 166 | json_file, 167 | mirror_base_path, 168 | all_package_files, 169 | args, 170 | executor, 171 | ) 172 | 173 | # NOTE: each consumer must be a distinct coroutine object; gather() 174 | # deduplicates identical awaitables, so repeating one coroutine with 175 | # list multiplication would run only a single consumer. 176 | await asyncio.gather( 177 | *[ 178 | consume(queue) 179 | for _ in range(config.getint("mirror", "verifiers", fallback=3)) 180 | ] 181 | ) 182 | 183 | 184 | async def metadata_verify(config: ConfigParser, args: Namespace) -> int: 185 | """Crawl all saved JSON metadata (or fetch it) to check we have all packages; 186 | if delete is set, generate a diff of unowned files""" 187 | all_package_files: List[Path] = [] 188 | loop = asyncio.get_event_loop() 189 | 190 | storage_backend = next( 191 | iter(storage_backend_plugins(config=config, clear_cache=True)) 192 | ) 193 | 194 | mirror_base_path = storage_backend.PATH_BACKEND(config.get("mirror", "directory")) 195 | json_base =
mirror_base_path / "web" / "json" 190 | workers = args.workers or config.getint("mirror", "workers") 191 | executor = concurrent.futures.ThreadPoolExecutor(max_workers=workers) 192 | 193 | logger.info(f"Starting verify for {mirror_base_path} with {workers} workers") 194 | try: 195 | json_files = await loop.run_in_executor(executor, os.listdir, json_base) 196 | except FileNotFoundError as fnfe: 197 | logger.error(f"Metadata base dir {json_base} does not exist: {fnfe}") 198 | return 2 199 | if not json_files: 200 | logger.error("No JSON metadata files found. Cannot verify") 201 | return 3 202 | 203 | logger.debug(f"Found {len(json_files)} objects in {json_base}") 204 | logger.debug(f"Using a {workers} thread ThreadPoolExecutor") 205 | async with Master( 206 | config.get("mirror", "master"), 207 | config.getfloat("mirror", "timeout"), 208 | config.getfloat("mirror", "global-timeout", fallback=None), 209 | ) as master: 210 | await verify_producer( 211 | master, 212 | config, 213 | all_package_files, 214 | mirror_base_path, 215 | json_files, 216 | args, 217 | executor, 218 | ) 219 | 220 | if not args.delete: 221 | return 0 222 | 223 | return await delete_unowned_files( 224 | mirror_base_path, executor, all_package_files, args.dry_run 225 | ) 226 | -------------------------------------------------------------------------------- /src/bandersnatch/filter.py: -------------------------------------------------------------------------------- 1 | """ 2 | Blacklist management 3 | """ 4 | from collections import defaultdict 5 | from typing import TYPE_CHECKING, Any, Dict, List 6 | 7 | import pkg_resources 8 | 9 | from .configuration import BandersnatchConfig 10 | 11 | if TYPE_CHECKING: 12 | from configparser import SectionProxy 13 | 14 | 15 | # PLUGIN_API_REVISION is incremented whenever a plugin class is modified in a 16 | # backwards incompatible way. 
This prevents loading older 17 | # broken plugins that may be installed and would break due to changes to 18 | # the methods of the classes. 19 | PLUGIN_API_REVISION = 2 20 | PROJECT_PLUGIN_RESOURCE = f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.project" 21 | METADATA_PLUGIN_RESOURCE = ( 22 | f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.metadata" 23 | ) 24 | RELEASE_PLUGIN_RESOURCE = f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.release" 25 | RELEASE_FILE_PLUGIN_RESOURCE = ( 26 | f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.release_file" 27 | ) 28 | 29 | 30 | class Filter: 31 | """ 32 | Base Filter class 33 | """ 34 | 35 | name = "filter" 36 | deprecated_name: str = "" 37 | 38 | def __init__(self, *args: Any, **kwargs: Any) -> None: 39 | self.configuration = BandersnatchConfig().config 40 | if ( 41 | "plugins" not in self.configuration 42 | or "enabled" not in self.configuration["plugins"] 43 | ): 44 | return 45 | 46 | split_plugins = self.configuration["plugins"]["enabled"].split("\n") 47 | if ( 48 | "all" not in split_plugins 49 | and self.name not in split_plugins 50 | # NOTE: Remove after 5.0 51 | and not (self.deprecated_name and self.deprecated_name in split_plugins) 52 | ): 53 | return 54 | 55 | self.initialize_plugin() 56 | 57 | def initialize_plugin(self) -> None: 58 | """ 59 | Code to initialize the plugin 60 | """ 61 | # The initialize_plugin method is run once to initialize the plugin. This should 62 | # contain all code to set up the plugin. 63 | # This method is not run in the fast path and should be used to do things like 64 | # indexing filter databases, etc. that will speed up the operation of the filter 65 | # and check_match methods that are called in the fast path. 66 | pass 67 | 68 | def filter(self, metadata: dict) -> bool: 69 | """ 70 | Check if the plugin matches based on the package's metadata. 
 71 | 72 | Returns 73 | ======= 74 | bool: 75 | True if the values match a filter rule, False otherwise 76 | """ 77 | return False 78 | 79 | def check_match(self, **kwargs: Any) -> bool: 80 | """ 81 | Check if the plugin matches based on the arguments provided. 82 | 83 | Returns 84 | ======= 85 | bool: 86 | True if the values match a filter rule, False otherwise 87 | """ 88 | return False 89 | 90 | # NOTE: These two can be removed in 5.0 91 | @property 92 | def allowlist(self) -> "SectionProxy": 93 | return ( 94 | self.configuration["whitelist"] 95 | if self.configuration.has_section("whitelist") 96 | else self.configuration["allowlist"] 97 | ) 98 | 99 | @property 100 | def blocklist(self) -> "SectionProxy": 101 | return ( 102 | self.configuration["blacklist"] 103 | if self.configuration.has_section("blacklist") 104 | else self.configuration["blocklist"] 105 | ) 106 | 107 | 108 | class FilterProjectPlugin(Filter): 109 | """ 110 | Plugin that blocks sync operations for an entire project 111 | """ 112 | 113 | name = "project_plugin" 114 | 115 | 116 | class FilterMetadataPlugin(Filter): 117 | """ 118 | Plugin that blocks sync operations for an entire project based on info fields. 
119 | """ 120 | 121 | name = "metadata_plugin" 122 | 123 | 124 | class FilterReleasePlugin(Filter): 125 | """ 126 | Plugin that modifies the download of specific releases or dist files 127 | """ 128 | 129 | name = "release_plugin" 130 | 131 | 132 | class FilterReleaseFilePlugin(Filter): 133 | """ 134 | Plugin that modify the download of specific release or dist files 135 | """ 136 | 137 | name = "release_file_plugin" 138 | 139 | 140 | class LoadedFilters: 141 | """ 142 | A class to load all of the filters enabled 143 | """ 144 | 145 | ENTRYPOINT_GROUPS = [ 146 | PROJECT_PLUGIN_RESOURCE, 147 | METADATA_PLUGIN_RESOURCE, 148 | RELEASE_PLUGIN_RESOURCE, 149 | RELEASE_FILE_PLUGIN_RESOURCE, 150 | ] 151 | 152 | def __init__(self, load_all: bool = False) -> None: 153 | """ 154 | Loads and stores all of specified filters from the config file 155 | """ 156 | self.config = BandersnatchConfig().config 157 | self.loaded_filter_plugins: Dict[str, List["Filter"]] = defaultdict(list) 158 | self.enabled_plugins = self._load_enabled() 159 | if load_all: 160 | self._load_filters(self.ENTRYPOINT_GROUPS) 161 | 162 | def _load_enabled(self) -> List[str]: 163 | """ 164 | Reads the config and returns all the enabled plugins 165 | """ 166 | enabled_plugins: List[str] = [] 167 | try: 168 | config_plugins = self.config["plugins"]["enabled"] 169 | split_plugins = config_plugins.split("\n") 170 | if "all" in split_plugins: 171 | enabled_plugins = ["all"] 172 | else: 173 | for plugin in split_plugins: 174 | if not plugin: 175 | continue 176 | enabled_plugins.append(plugin) 177 | except KeyError: 178 | pass 179 | return enabled_plugins 180 | 181 | def _load_filters(self, groups: List[str]) -> None: 182 | """ 183 | Loads filters from the entry-point groups specified in groups 184 | """ 185 | for group in groups: 186 | plugins = set() 187 | for entry_point in pkg_resources.iter_entry_points(group=group): 188 | plugin_class = entry_point.load() 189 | plugin_instance = plugin_class() 190 | if ( 191 | 
"all" in self.enabled_plugins 192 | or plugin_instance.name in self.enabled_plugins 193 | or plugin_instance.deprecated_name in self.enabled_plugins 194 | ): 195 | plugins.add(plugin_instance) 196 | 197 | self.loaded_filter_plugins[group] = list(plugins) 198 | 199 | def filter_project_plugins(self) -> List[Filter]: 200 | """ 201 | Load and return the release filtering plugin objects 202 | 203 | Returns 204 | ------- 205 | list of bandersnatch.filter.Filter: 206 | List of objects derived from the bandersnatch.filter.Filter class 207 | """ 208 | if PROJECT_PLUGIN_RESOURCE not in self.loaded_filter_plugins: 209 | self._load_filters([PROJECT_PLUGIN_RESOURCE]) 210 | return self.loaded_filter_plugins[PROJECT_PLUGIN_RESOURCE] 211 | 212 | def filter_metadata_plugins(self) -> List[Filter]: 213 | """ 214 | Load and return the release filtering plugin objects 215 | 216 | Returns 217 | ------- 218 | list of bandersnatch.filter.Filter: 219 | List of objects derived from the bandersnatch.filter.Filter class 220 | """ 221 | if METADATA_PLUGIN_RESOURCE not in self.loaded_filter_plugins: 222 | self._load_filters([METADATA_PLUGIN_RESOURCE]) 223 | return self.loaded_filter_plugins[METADATA_PLUGIN_RESOURCE] 224 | 225 | def filter_release_plugins(self) -> List[Filter]: 226 | """ 227 | Load and return the release filtering plugin objects 228 | 229 | Returns 230 | ------- 231 | list of bandersnatch.filter.Filter: 232 | List of objects derived from the bandersnatch.filter.Filter class 233 | """ 234 | if RELEASE_PLUGIN_RESOURCE not in self.loaded_filter_plugins: 235 | self._load_filters([RELEASE_PLUGIN_RESOURCE]) 236 | return self.loaded_filter_plugins[RELEASE_PLUGIN_RESOURCE] 237 | 238 | def filter_release_file_plugins(self) -> List[Filter]: 239 | """ 240 | Load and return the release file filtering plugin objects 241 | 242 | Returns 243 | ------- 244 | list of bandersnatch.filter.Filter: 245 | List of objects derived from the bandersnatch.filter.Filter class 246 | """ 247 | if 
RELEASE_FILE_PLUGIN_RESOURCE not in self.loaded_filter_plugins: 248 | self._load_filters([RELEASE_FILE_PLUGIN_RESOURCE]) 249 | return self.loaded_filter_plugins[RELEASE_FILE_PLUGIN_RESOURCE] 250 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black) 2 | [![Build Status](https://travis-ci.org/pypa/bandersnatch.svg?branch=master)](https://travis-ci.org/pypa/bandersnatch) 3 | [![Actions Status](https://github.com/pypa/bandersnatch/workflows/bandersnatch_ci/badge.svg)](https://github.com/pypa/bandersnatch/actions) 4 | [![codecov.io](https://codecov.io/github/pypa/bandersnatch/coverage.svg?branch=master)](https://codecov.io/github/codecov/codecov-python) 5 | [![Documentation Status](https://readthedocs.org/projects/bandersnatch/badge/?version=latest)](http://bandersnatch.readthedocs.io/en/latest/?badge=latest) 6 | [![Updates](https://pyup.io/repos/github/pypa/bandersnatch/shield.svg)](https://pyup.io/repos/github/pypa/bandersnatch/) 7 | [![Downloads](https://pepy.tech/badge/bandersnatch)](https://pepy.tech/project/bandersnatch) 8 | 9 | ---- 10 | 11 | This is a PyPI mirror client implementing `PEP 381` + `PEP 503` 12 | http://www.python.org/dev/peps/pep-0381/. 13 | 14 | - bandersnatch >=4.0 supports *Linux*, *MacOSX* + *Windows* 15 | - [Documentation](https://bandersnatch.readthedocs.io/en/latest/) 16 | 17 | **bandersnatch maintainers** are looking for more **help**! Please refer to our 18 | [MAINTAINER](https://github.com/pypa/bandersnatch/blob/master/MAINTAINERS.md) 19 | documentation to see the roles and responsibilities. We would also 20 | ask that you read our **Mission Statement** to ensure it aligns with your thoughts for 21 | this project. 
22 | 23 | - If interested, contact @cooperlees 24 | 25 | `bandersnatch` has its dependencies kept up to date by **[pyup.io](https://pyup.io/)**! 26 | 27 | - If you'd like to have your dependencies kept up to date in your `requirements.txt` or `setup.cfg`, 28 | this is the service for you! 29 | 30 | ## Installation 31 | 32 | The following instructions will place the bandersnatch executable in a 33 | virtualenv under `bandersnatch/bin/bandersnatch`. 34 | 35 | - bandersnatch **requires** `Python >= 3.6.1` 36 | 37 | ### Docker 38 | 39 | This will pull the latest build. Please use a specific tag if desired. 40 | 41 | - The Docker image includes `/bandersnatch/src/runner.py` to periodically 42 | run a `bandersnatch mirror` 43 | - Run `/bandersnatch/src/runner.py --help` for usage 44 | - With Docker, we recommend bind mounting in a read-only `bandersnatch.conf` 45 | - Defaults to `/conf/bandersnatch.conf` 46 | 47 | ```shell 48 | docker pull pypa/bandersnatch 49 | docker run pypa/bandersnatch bandersnatch --help 50 | ``` 51 | 52 | ### pip 53 | 54 | This installs the latest stable, released version. 55 | 56 | ```shell 57 | python3.6 -m venv bandersnatch 58 | bandersnatch/bin/pip install bandersnatch 59 | bandersnatch/bin/bandersnatch --help 60 | ``` 61 | 62 | ## Quickstart 63 | 64 | - Run ``bandersnatch mirror`` - it will create an empty configuration file 65 | for you in ``/etc/bandersnatch.conf``. 66 | - Review ``/etc/bandersnatch.conf`` and adapt to your needs. 67 | - Run ``bandersnatch mirror`` again. It will populate your mirror with the 68 | current status of all PyPI packages. 69 | Current mirror package size can be seen here: https://pypi.org/stats/ 70 | - A ``blacklist`` or ``whitelist`` can be created to cut down your mirror size. 71 | You might want to [Analyze PyPI downloads](https://packaging.python.org/guides/analyzing-pypi-package-downloads/) 72 | to determine which packages to add to your list. 
73 | - Run ``bandersnatch mirror`` regularly to update your mirror with any 74 | intermediate changes. 75 | 76 | ### Webserver 77 | 78 | Configure your webserver to serve the ``web/`` sub-directory of the mirror. 79 | For nginx it should look something like this: 80 | 81 | ```conf 82 | server { 83 | listen 127.0.0.1:80; 84 | listen [::1]:80; 85 | server_name ; 86 | root /web; 87 | autoindex on; 88 | charset utf-8; 89 | } 90 | ``` 91 | 92 | * Note that it is a good idea to have your webserver publish the HTML index 93 | files correctly with UTF-8 as the charset. The index pages will work without 94 | it but if humans look at the pages the characters will end up looking funny. 95 | 96 | * Make sure that the webserver uses UTF-8 to look up unicode path names. nginx 97 | gets this right by default - not sure about others. 98 | 99 | 100 | ### Cron jobs 101 | 102 | You need to set up one cron job to run the mirror itself. 103 | 104 | Here's a sample that you could place in `/etc/cron.d/bandersnatch`: 105 | 106 | ``` 107 | LC_ALL=en_US.utf8 108 | */2 * * * * root bandersnatch mirror |& logger -t bandersnatch[mirror] 109 | ``` 110 | 111 | This assumes that you have a ``logger`` utility installed that will convert the 112 | output of the commands to syslog entries. 113 | 114 | 115 | ### Maintenance 116 | 117 | bandersnatch does not keep much local state in addition to the mirrored data. 118 | In general you can just keep rerunning `bandersnatch mirror` to make it fix 119 | errors. 120 | 121 | If you want to force bandersnatch to check everything against the master PyPI: 122 | 123 | * run `bandersnatch mirror --force-check` to move status files if they exist in your mirror directory in order to get a full sync. 124 | 125 | Be aware that full syncs likely take hours depending on PyPI's performance and your network latency and bandwidth. 
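The consistency checking behind a full sync, and behind `bandersnatch verify`, boils down to comparing each mirrored file against the sha256 digest recorded in its JSON metadata (see `verify()` in `src/bandersnatch/verify.py` above). A minimal standalone sketch of that check — the function names here are illustrative, not bandersnatch's importable API:

```python
import hashlib
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large dist files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fp:
        for chunk in iter(lambda: fp.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refetch(pkg_file: Path, expected_sha256: str) -> bool:
    """True when the mirrored file is missing or its digest mismatches the metadata."""
    return not pkg_file.exists() or sha256sum(pkg_file) != expected_sha256
```

When this check fails for a file, `verify()` deletes the local copy and fetches it again from the master.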
126 | 127 | #### Other Commands 128 | 129 | * `bandersnatch delete --help` - Allows you to specify package(s) to be removed from your mirror (*dangerous*) 130 | * `bandersnatch verify --help` - Crawls your repo and fixes any missed files + deletes any unowned files found (*dangerous*) 131 | 132 | ### Operational notes 133 | 134 | #### Case-sensitive filesystem needed 135 | 136 | You need to run bandersnatch on a case-sensitive filesystem. 137 | 138 | OS X natively does this OK even though the filesystem is not strictly 139 | case-sensitive and bandersnatch will work fine when running on OS X. However, 140 | tarring a bandersnatch data directory and moving it to, e.g. Linux with a 141 | case-sensitive filesystem will lead to inconsistencies. You can fix those by 142 | deleting the status files and having bandersnatch run a full check on your data. 143 | 144 | #### Windows requires elevated prompt 145 | 146 | Bandersnatch makes use of symbolic links. On Windows, the permission to create symlinks is turned off by default for non-admin users. In order to run bandersnatch on Windows either call it from an elevated command prompt (i.e. right-click, run as Administrator) or give yourself symlink permissions in the group policy editor. 147 | 148 | #### Many sub-directories needed 149 | 150 | PyPI has a quite extensive list of packages that we need to maintain in a 151 | flat directory. Filesystems with small limits on the number of sub-directories 152 | per directory can run into a problem like this: 153 | 154 | 2013-07-09 16:11:33,331 ERROR: Error syncing package: zweb@802449 155 | OSError: [Errno 31] Too many links: '../pypi/web/simple/zweb' 156 | 157 | Specifically, we recommend avoiding ext3. Ext4 and newer filesystems do not have 158 | the 32k sub-directory limitation. 159 | 160 | #### Client Compatibility 161 | 162 | A bandersnatch static mirror is compatible only with the "static", cacheable 163 | parts of PyPI that are needed to support package installation. 
It does not 164 | support the more dynamic APIs of PyPI that may be used by various clients for 165 | other purposes. 166 | 167 | An example of an unsupported API is [PyPI's XML-RPC interface](https://warehouse.readthedocs.io/api-reference/xml-rpc/), which is used when running `pip search`. 168 | 169 | ### Bandersnatch Mission 170 | The bandersnatch project strives to: 171 | - Mirror all static objects of the Python Package Index (https://pypi.org/) 172 | - bandersnatch's main goal is to support syncing from the main global index to local mirrors **only** 173 | - This will allow organizations to have lower latency access to PyPI and 174 | save bandwidth on their WAN connections and more importantly the PyPI CDN 175 | - Custom features and requests may be accepted if they can take a *plugin* form 176 | - e.g. refer to the `blacklist` and `whitelist` plugins 177 | 178 | ### Contact 179 | 180 | If you have questions or comments, please submit a bug report to 181 | https://github.com/pypa/bandersnatch/issues/new 182 | - IRC: #bandersnatch on *Freenode* (You can use [webchat](https://webchat.freenode.net/?channels=%23bandersnatch) if you don't have an IRC client) 183 | 184 | ### Code of Conduct 185 | 186 | Everyone interacting in the bandersnatch project's codebases, issue trackers, 187 | chat rooms, and mailing lists is expected to follow the 188 | [PSF Code of Conduct](https://github.com/pypa/.github/blob/main/CODE_OF_CONDUCT.md). 189 | 190 | ### Kudos 191 | 192 | This client is based on the original pep381client by *Martin v. Loewis*. 193 | 194 | *Richard Jones* was very patient answering questions at PyCon 2013 and made the 195 | protocol more reliable by implementing some PyPI enhancements. 196 | 197 | Thanks to *Christian Theune* for creating and maintaining `bandersnatch` for many years! 198 | --------------------------------------------------------------------------------
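As a footnote on the plugin machinery in `src/bandersnatch/filter.py` above: the `enabled` value under `[plugins]` is a newline-separated list, with `all` short-circuiting everything else. The parsing done by `LoadedFilters._load_enabled` can be reproduced in isolation like this (a sketch for illustration, not an importable bandersnatch API):

```python
from typing import List


def load_enabled(config_value: str) -> List[str]:
    """Mirror of LoadedFilters._load_enabled: split the config value on
    newlines, drop empty entries, and let "all" short-circuit the list."""
    split_plugins = config_value.split("\n")
    if "all" in split_plugins:
        return ["all"]
    return [plugin for plugin in split_plugins if plugin]
```

Blank lines in the config value (common when the list is written one plugin per line) are simply ignored, so a leading newline after `enabled =` is harmless.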