├── src
    ├── bandersnatch_filter_plugins
    │   ├── __init__.py
    │   ├── prerelease_name.py
    │   ├── latest_name.py
    │   ├── regex_name.py
    │   ├── filename_name.py
    │   ├── allowlist_name.py
    │   └── blocklist_name.py
    ├── bandersnatch
    │   ├── tests
    │   │   ├── sample
    │   │   ├── mock_config.py
    │   │   ├── ci.conf
    │   │   ├── ci-swift.conf
    │   │   ├── test_sync.py
    │   │   ├── test_package.py
    │   │   ├── plugins
    │   │   │   ├── test_prerelease_name.py
    │   │   │   ├── test_regex_name.py
    │   │   │   ├── test_latest_release.py
    │   │   │   ├── test_filename.py
    │   │   │   ├── test_allowlist_name.py
    │   │   │   └── test_blocklist_name.py
    │   │   ├── test_master.py
    │   │   ├── test_utils.py
    │   │   ├── conftest.py
    │   │   ├── test_main.py
    │   │   ├── test_filter.py
    │   │   ├── test_delete.py
    │   │   ├── test_configuration.py
    │   │   └── test_verify.py
    │   ├── __main__.py
    │   ├── log.py
    │   ├── __init__.py
    │   ├── errors.py
    │   ├── default.conf
    │   ├── unittest.conf
    │   ├── utils.py
    │   ├── delete.py
    │   ├── configuration.py
    │   ├── package.py
    │   ├── main.py
    │   ├── verify.py
    │   └── filter.py
    ├── bandersnatch_storage_plugins
    │   └── __init__.py
    ├── test_tools
    │   └── test_xmlrpc.py
    └── runner.py
├── pytest.ini
├── setup.py
├── requirements_swift.txt
├── docs
    ├── modules.rst
    ├── installation.md
    ├── bandersnatch_storage_plugins.rst
    ├── index.rst
    ├── bandersnatch_filter_plugins.rst
    ├── bandersnatch.rst
    ├── mirror_configuration.md
    └── filtering_configuration.md
├── .coveragerc
├── .pyup.yml
├── .flake8
├── requirements_test.txt
├── requirements.txt
├── bootstrap.sh
├── .github
    └── workflows
    │   ├── docker_upload.yml
    │   ├── docker_readme.yml
    │   ├── pypi_upload.yml
    │   └── ci.yml
├── mypy.ini
├── .readthedocs.yml
├── requirements_docs.txt
├── Dockerfile
├── .pre-commit-config.yaml
├── tox.ini
├── .travis.yml
├── .gitignore
├── MAINTAINERS.md
├── DEVELOPMENT.md
├── setup.cfg
├── test_runner.py
└── README.md
/src/bandersnatch_filter_plugins/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/sample:
--------------------------------------------------------------------------------
1 | I am a sample!
2 |
--------------------------------------------------------------------------------
/src/bandersnatch_storage_plugins/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/pytest.ini:
--------------------------------------------------------------------------------
1 | [pytest]
2 | log_cli_level = DEBUG
3 | log_level = DEBUG
4 |
--------------------------------------------------------------------------------
/src/bandersnatch/__main__.py:
--------------------------------------------------------------------------------
1 | from . import main
2 |
3 | exit(main.main())
4 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | from setuptools import setup
3 |
4 | setup()
5 |
--------------------------------------------------------------------------------
/requirements_swift.txt:
--------------------------------------------------------------------------------
1 | keystoneauth1==4.2.1
2 | openstackclient==4.0.0
3 | python-swiftclient==3.10.0
4 |
--------------------------------------------------------------------------------
/docs/modules.rst:
--------------------------------------------------------------------------------
1 | bandersnatch
2 | ============
3 |
4 | .. toctree::
5 |    :maxdepth: 4
6 |
7 |    bandersnatch
8 |    bandersnatch_filter_plugins
9 |    bandersnatch_storage_plugins
10 |
--------------------------------------------------------------------------------
/.coveragerc:
--------------------------------------------------------------------------------
1 | [run]
2 | branch = True
3 | source =
4 |     bandersnatch
5 |     bandersnatch_storage_plugins
6 |
7 | [report]
8 | precision = 2
9 | omit = */apache_*.py
10 |     */buildout.py
11 |     */release.py
12 |     */log.py
13 |
--------------------------------------------------------------------------------
/.pyup.yml:
--------------------------------------------------------------------------------
1 | requirements:
2 |   - requirements.txt:
3 |       update: all
4 |       pin: True
5 |   - requirements_docs.txt:
6 |       update: all
7 |       pin: True
8 |   - requirements_test.txt:
9 |       update: all
10 |       pin: True
11 |   - setup.cfg:
12 |       update: False
13 |
--------------------------------------------------------------------------------
/.flake8:
--------------------------------------------------------------------------------
1 | [flake8]
2 | select = B,C,E,F,P,W
3 | max_line_length = 88
4 | # E722 is a duplicate of B001.
5 | # P207 is a duplicate of B003.
6 | # W503 is against PEP8
7 | ignore = E722, P207, W503
8 | max-complexity = 20
9 | exclude =
10 |     build,
11 |     dist,
12 |     __pycache__,
13 |     *.pyc,
14 |     .git,
15 |     .tox
16 |
--------------------------------------------------------------------------------
/requirements_test.txt:
--------------------------------------------------------------------------------
1 | asynctest==0.13.0
2 | async-timeout==3.0.1
3 | black==19.10b0
4 | codecov==2.1.8
5 | coverage==5.2.1
6 | flake8==3.8.3
7 | flake8-bugbear==20.1.4
8 | freezegun==0.3.15
9 | pre-commit==2.6.0
10 | pytest==6.0.1
11 | pytest-asyncio==0.14.0
12 | pytest-timeout==1.4.2
13 | setuptools==49.6.0
14 | tox==3.19.0
15 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.6.2
2 | aiohttp-xmlrpc==0.8.1
3 | async-timeout==3.0.1
4 | attrs==19.3.0
5 | chardet==3.0.4
6 | filelock==3.0.12
7 | idna==2.10
8 | importlib_resources==3.0.0; python_version < '3.7'
9 | lxml==4.5.2
10 | multidict==4.7.6
11 | packaging==20.4
12 | pyparsing==2.4.7
13 | setuptools==49.6.0
14 | six==1.15.0
15 | yarl==1.5.1
16 |
--------------------------------------------------------------------------------
/bootstrap.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | if [ ! -e bin/python3 ]; then
4 | virtualenv --python=python3.6 .
5 | fi
6 | if [ ! -e bin/buildout ]; then
7 | bin/pip install zc.buildout
8 | bin/pip install virtualenv
9 | fi
10 | bin/pip install --upgrade zc.buildout==2.11.1
11 | bin/pip install --upgrade setuptools==38.5.2
12 | bin/pip install --upgrade virtualenv==15.1.0
13 | bin/buildout
14 |
--------------------------------------------------------------------------------
/docs/installation.md:
--------------------------------------------------------------------------------
1 | ## Installation
2 |
3 | The following instructions will place the bandersnatch executable in a
4 | virtualenv under `bandersnatch/bin/bandersnatch`.
5 |
6 | - bandersnatch **requires** `>= Python 3.6`
7 |
8 |
9 | ### pip
10 |
11 | This installs the latest stable, released version.
12 |
13 | *(>= 3.6.1 required)*
14 |
15 | ```
16 | $ python3.6 -m venv bandersnatch
17 | $ bandersnatch/bin/pip install bandersnatch
18 | $ bandersnatch/bin/bandersnatch --help
19 | ```
20 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/mock_config.py:
--------------------------------------------------------------------------------
1 | from bandersnatch.configuration import BandersnatchConfig
2 |
3 |
4 | def mock_config(contents: str, filename: str = "test.conf") -> BandersnatchConfig:
5 |     """
6 |     Creates a config file with contents and loads them into a
7 |     BandersnatchConfig instance.
8 |     """
9 |     with open(filename, "w") as fd:
10 |         fd.write(contents)
11 |
12 |     instance = BandersnatchConfig()
13 |     instance.config_file = filename
14 |     instance.load_configuration()
15 |     return instance
16 |
--------------------------------------------------------------------------------
/.github/workflows/docker_upload.yml:
--------------------------------------------------------------------------------
1 | name: bandersnatch_docker_upload
2 |
3 | on:
4 |   push:
5 |     branches:
6 |       - master
7 |
8 | jobs:
9 |   build:
10 |     runs-on: ubuntu-latest
11 |     steps:
12 |     - uses: actions/checkout@master
13 |     - name: Publish to Docker Registry
14 |       uses: elgohr/Publish-Docker-Github-Action@master
15 |       with:
16 |         name: pypa/bandersnatch
17 |         username: ${{ secrets.DOCKER_USERNAME }}
18 |         password: ${{ secrets.DOCKER_PASSWORD }}
19 |         snapshot: true
20 |         tag_names: true
21 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/ci.conf:
--------------------------------------------------------------------------------
1 | ; Config for the Travis CI Integration Test that hits PyPI
2 |
3 | [mirror]
4 | directory = /tmp/pypi
5 | json = true
6 | cleanup = true
7 | master = https://pypi.org
8 | timeout = 60
9 | global-timeout = 18000
10 | workers = 3
11 | hash-index = true
12 | stop-on-error = true
13 | storage-backend = filesystem
14 | verifiers = 3
15 | keep_index_versions = 2
16 |
17 | [plugins]
18 | enabled =
19 |     allowlist_project
20 |
21 | [allowlist]
22 | packages =
23 |     ACMPlus
24 |     black
25 |     peerme
26 |     pyaib
27 |
28 | ; vim: set ft=cfg:
29 |
--------------------------------------------------------------------------------
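The `[plugins]` and `[allowlist]` sections in ci.conf rely on INI continuation lines: each value sits on its own indented line under the key. A minimal sketch of reading such values, assuming the standard-library `configparser` is the parser; the sample text and the `split_multiline` helper are illustrative and not part of bandersnatch:

```python
import configparser
from typing import List

# Sample text modelled on ci.conf. Continuation lines must be indented,
# otherwise configparser treats them as new keys.
SAMPLE_CONF = """\
; Config for an integration test
[mirror]
directory = /tmp/pypi
workers = 3
hash-index = true

[plugins]
enabled =
    allowlist_project

[allowlist]
packages =
    ACMPlus
    black
    peerme
    pyaib
"""


def split_multiline(value: str) -> List[str]:
    """Split a multi-line option value into its non-empty entries (hypothetical helper)."""
    return [line.strip() for line in value.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

workers = parser.getint("mirror", "workers")
hash_index = parser.getboolean("mirror", "hash-index")
enabled_plugins = split_multiline(parser.get("plugins", "enabled"))
allowlisted = split_multiline(parser.get("allowlist", "packages"))

print(workers, hash_index, enabled_plugins, allowlisted)
```

Typed getters such as `getint` and `getboolean` keep the conversion next to the option name, which makes a malformed value fail early with a clear error.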
/src/test_tools/test_xmlrpc.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | """
4 | Quick tool to test xmlrpc queries from bandersnatch
5 | - Note this is >= 3.7 only due to asyncio.run
6 | """
7 |
8 | import asyncio
9 |
10 | from bandersnatch.master import Master
11 |
12 |
13 | async def main() -> int:
14 |     async with Master("https://pypi.org") as master:
15 |         all_packages = await master.all_packages()
16 |     print(f"PyPI returned {len(all_packages)} PyPI packages via xmlrpc")
17 |     return 0
18 |
19 |
20 | if __name__ == "__main__":
21 |     exit(asyncio.run(main())) # type: ignore
22 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/ci-swift.conf:
--------------------------------------------------------------------------------
1 | ; Config for the Travis CI Integration Test that hits PyPI
2 |
3 | [mirror]
4 | directory = /tmp/pypi
5 | json = true
6 | master = https://pypi.org
7 | timeout = 60
8 | global-timeout = 18000
9 | workers = 3
10 | hash-index = true
11 | stop-on-error = true
12 | storage-backend = swift
13 | verifiers = 3
14 | keep_index_versions = 2
15 |
16 | [swift]
17 | default_container = bandersnatch
18 |
19 | [plugins]
20 | enabled =
21 |     allowlist_project
22 |
23 | [allowlist]
24 | packages =
25 |     black
26 |     peerme
27 |     pyaib
28 |
29 | ; vim: set ft=cfg:
30 |
--------------------------------------------------------------------------------
/mypy.ini:
--------------------------------------------------------------------------------
1 | [mypy]
2 | python_version = 3.6
3 | check_untyped_defs = True
4 | disallow_incomplete_defs = True
5 | disallow_untyped_defs = True
6 | # Until pytest + plugins type their decorators need this disabled
7 | disallow_untyped_decorators = False
8 | ignore_missing_imports = True
9 | no_implicit_optional = True
10 | pretty = True
11 | show_error_context = True
12 | sqlite_cache = True
13 | warn_no_return = True
14 | warn_redundant_casts = True
15 | warn_return_any = True
16 | warn_unreachable = True
17 | warn_unused_ignores = True
18 |
19 | # TODO: Enable PEP420 for bandersnatch
20 | namespace_packages = False
21 |
--------------------------------------------------------------------------------
/.readthedocs.yml:
--------------------------------------------------------------------------------
1 | # Read the Docs configuration file
2 | # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
3 |
4 | version: 2
5 |
6 | sphinx:
7 |   builder: html
8 |   configuration: docs/conf.py
9 |   # It's probably better if we have slightly outdated but known good documentation
10 |   # up than having (possibly very) broken documentation.
11 |   fail_on_warning: true
12 |
13 | formats:
14 |   - pdf
15 |   - htmlzip
16 |   - epub
17 |
18 | python:
19 |   version: 3.7
20 |   install:
21 |     - method: pip
22 |       path: .
23 |     - requirements: requirements_docs.txt # By extension, this installs requirements_swift.txt too
24 |
--------------------------------------------------------------------------------
/.github/workflows/docker_readme.yml:
--------------------------------------------------------------------------------
1 | name: bandersnatch_docker_readme
2 |
3 | on:
4 |   push:
5 |     branches:
6 |       - master
7 |     paths:
8 |       - README.md
9 |       - .github/workflows/docker_readme.yml
10 |
11 | jobs:
12 |   dockerHubDescription:
13 |     runs-on: ubuntu-latest
14 |     steps:
15 |     - uses: actions/checkout@master
16 |     - name: Publish README to Docker Hub Description
17 |       uses: peter-evans/dockerhub-description@v2.2.0
18 |       env:
19 |         DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USERNAME }}
20 |         DOCKERHUB_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
21 |         DOCKERHUB_REPOSITORY: pypa/bandersnatch
22 |
--------------------------------------------------------------------------------
/src/bandersnatch/log.py:
--------------------------------------------------------------------------------
1 | # This is mainly factored out into a separate module so I can ignore it in
2 | # coverage analysis. Unfortunately this is really hard to test as the Python
3 | # logging module won't allow reasonable teardown. :(
4 | import logging
5 | from typing import Any
6 |
7 |
8 | def setup_logging(args: Any) -> logging.StreamHandler:
9 |     ch = logging.StreamHandler()
10 |     formatter = logging.Formatter("%(asctime)s %(levelname)s: %(message)s")
11 |     ch.setFormatter(formatter)
12 |     logger = logging.getLogger("bandersnatch")
13 |     logger.setLevel(logging.DEBUG if args.debug else logging.INFO)
14 |     logger.addHandler(ch)
15 |     return ch
16 |
--------------------------------------------------------------------------------
/requirements_docs.txt:
--------------------------------------------------------------------------------
1 | docutils==0.16
2 | pyparsing==2.4.7
3 | python-dateutil==2.8.1
4 | packaging==20.4
5 | requests==2.24.0
6 | six==1.15.0
7 | sphinx==3.2.1
8 | recommonmark==0.6.0
9 | xmlrpc2==0.3.1
10 |
11 | git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme
12 | git+https://github.com/python/python-docs-theme.git#egg=python-docs-theme
13 |
14 | # This is needed since autodoc imports all bandersnatch packages and modules
15 | # so imports must not fail or its containing module will NOT be documented.
16 | # Also, the missing swift dependencies will cause the doc build to fail since
17 | # autodoc will raise a warning due to the import failure.
18 | -r requirements_swift.txt
19 |
--------------------------------------------------------------------------------
/src/bandersnatch/__init__.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | __copyright__ = "2010-2020, PyPA"
4 |
5 | from typing import NamedTuple
6 |
7 |
8 | class _VersionInfo(NamedTuple):
9 |     major: int
10 |     minor: int
11 |     micro: int
12 |     releaselevel: str
13 |     serial: int
14 |
15 |     @property
16 |     def version_str(self) -> str:
17 |         release_level = f".{self.releaselevel}" if self.releaselevel else ""
18 |         return f"{self.major}.{self.minor}.{self.micro}{release_level}"
19 |
20 |
21 | __version_info__ = _VersionInfo(
22 |     major=4,
23 |     minor=1,
24 |     micro=1,
25 |     releaselevel="",
26 |     serial=0, # Not currently in use with Bandersnatch versioning
27 | )
28 | __version__ = __version_info__.version_str
29 |
--------------------------------------------------------------------------------
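The `version_str` property above appends the release level only when one is set. A small standalone sketch of that behaviour; the class is redefined here under the name `VersionInfo` so the block runs on its own, and the `dev0` release level is a made-up value for illustration:

```python
from typing import NamedTuple


class VersionInfo(NamedTuple):
    # Same fields as _VersionInfo in the listing; redefined so this runs standalone.
    major: int
    minor: int
    micro: int
    releaselevel: str
    serial: int

    @property
    def version_str(self) -> str:
        # An empty release level adds nothing; otherwise it is appended after a dot.
        release_level = f".{self.releaselevel}" if self.releaselevel else ""
        return f"{self.major}.{self.minor}.{self.micro}{release_level}"


final = VersionInfo(major=4, minor=1, micro=1, releaselevel="", serial=0)
dev = VersionInfo(major=4, minor=2, micro=0, releaselevel="dev0", serial=0)  # hypothetical

print(final.version_str)
print(dev.version_str)
```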
/docs/bandersnatch_storage_plugins.rst:
--------------------------------------------------------------------------------
1 | bandersnatch_storage_plugins package
2 | ====================================
3 |
4 | Package contents
5 | ----------------
6 |
7 | .. automodule:: bandersnatch_storage_plugins
8 |     :members:
9 |     :undoc-members:
10 |     :show-inheritance:
11 |
12 | Submodules
13 | ----------
14 |
15 | bandersnatch_storage_plugins.filesystem module
16 | ----------------------------------------------
17 |
18 | .. automodule:: bandersnatch_storage_plugins.filesystem
19 |     :members:
20 |     :undoc-members:
21 |     :show-inheritance:
22 |
23 | bandersnatch_storage_plugins.swift module
24 | -----------------------------------------
25 |
26 | .. automodule:: bandersnatch_storage_plugins.swift
27 |     :members:
28 |     :undoc-members:
29 |     :show-inheritance:
30 |
--------------------------------------------------------------------------------
/src/bandersnatch/errors.py:
--------------------------------------------------------------------------------
1 | class PackageNotFound(Exception):
2 |     """We asked for package metadata from PyPI and it wasn't available"""
3 |
4 |     def __init__(self, package_name: str) -> None:
5 |         super().__init__()
6 |         self.package_name = package_name
7 |
8 |     def __str__(self) -> str:
9 |         return f"{self.package_name} no longer exists on PyPI"
10 |
11 |
12 | class StaleMetadata(Exception):
13 |     """We attempted to retrieve metadata from PyPI, but it was stale."""
14 |
15 |     def __init__(self, package_name: str, attempts: int) -> None:
16 |         super().__init__()
17 |         self.package_name = package_name
18 |         self.attempts = attempts
19 |
20 |     def __str__(self) -> str:
21 |         return f"Stale serial for {self.package_name} after {self.attempts} attempts"
22 |
--------------------------------------------------------------------------------
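Both exceptions in errors.py carry their context as attributes and build the message in `__str__`. The following sketch shows how a caller might tell them apart. The two classes are copied from the listing so the block is self-contained; `describe_failure` is a hypothetical helper, not something bandersnatch provides:

```python
class PackageNotFound(Exception):
    """We asked for package metadata from PyPI and it wasn't available"""

    def __init__(self, package_name: str) -> None:
        super().__init__()
        self.package_name = package_name

    def __str__(self) -> str:
        return f"{self.package_name} no longer exists on PyPI"


class StaleMetadata(Exception):
    """We attempted to retrieve metadata from PyPI, but it was stale."""

    def __init__(self, package_name: str, attempts: int) -> None:
        super().__init__()
        self.package_name = package_name
        self.attempts = attempts

    def __str__(self) -> str:
        return f"Stale serial for {self.package_name} after {self.attempts} attempts"


def describe_failure(exc: Exception) -> str:
    """Hypothetical helper: map each error to a short action plus its message."""
    if isinstance(exc, PackageNotFound):
        return f"skip: {exc}"
    if isinstance(exc, StaleMetadata):
        return f"retry later: {exc}"
    raise exc


print(describe_failure(PackageNotFound("foo")))
print(describe_failure(StaleMetadata("foo", attempts=3)))
```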
/.github/workflows/pypi_upload.yml:
--------------------------------------------------------------------------------
1 | name: bandersnatch_pypi_upload
2 |
3 | on:
4 |   release:
5 |     types: created
6 |
7 | jobs:
8 |   build:
9 |     name: bandersnatch PyPI Upload
10 |     runs-on: ubuntu-latest
11 |
12 |     steps:
13 |     - uses: actions/checkout@v1
14 |
15 |     - name: Set up Python 3.7
16 |       uses: actions/setup-python@v1
17 |       with:
18 |         python-version: 3.7
19 |
20 |     - name: Install latest pip, setuptools + tox
21 |       run: |
22 |         python -m pip install --upgrade pip setuptools twine wheel
23 |
24 |     - name: Build wheels
25 |       run: |
26 |         python setup.py bdist_wheel
27 |         python setup.py sdist
28 |
29 |     - name: Upload to PyPI via Twine
30 |       env:
31 |         TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
32 |       run: |
33 |         twine upload --verbose -u '__token__' dist/*
34 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3
2 |
3 | RUN mkdir /bandersnatch
4 | RUN mkdir /conf && chmod 777 /conf
5 | ADD setup.cfg /bandersnatch
6 | ADD setup.py /bandersnatch
7 | ADD requirements.txt /bandersnatch
8 | ADD requirements_swift.txt /bandersnatch
9 | ADD README.md /bandersnatch
10 | ADD CHANGES.md /bandersnatch
11 | COPY src /bandersnatch/src
12 |
13 | # OPTIONAL: Include a config file
14 | # Remember to bind mount the "directory" in bandersnatch.conf
15 | # Recommended to bind mount /conf - `runner.py` defaults to look for /conf/bandersnatch.conf
16 | # ADD bandersnatch.conf /etc
17 |
18 | RUN pip --no-cache-dir install --upgrade pip setuptools wheel
19 | RUN pip --no-cache-dir install --upgrade -r /bandersnatch/requirements.txt -r /bandersnatch/requirements_swift.txt
20 | RUN pip --no-cache-dir -v install /bandersnatch/[swift]
21 |
22 | CMD ["python", "/bandersnatch/src/runner.py", "3600"]
23 |
--------------------------------------------------------------------------------
/.pre-commit-config.yaml:
--------------------------------------------------------------------------------
1 | repos:
2 |   - repo: https://github.com/pre-commit/pre-commit-hooks
3 |     rev: v3.1.0
4 |     hooks:
5 |       - id: trailing-whitespace
6 |       - id: end-of-file-fixer
7 |       - id: check-yaml
8 |       - id: check-added-large-files
9 |         exclude: ^docs/conf.py$
10 |   - repo: https://github.com/ambv/black
11 |     rev: 19.10b0
12 |     hooks:
13 |       - id: black
14 |         args: [--target-version, py36]
15 |   - repo: https://github.com/asottile/pyupgrade
16 |     rev: v2.4.4
17 |     hooks:
18 |       - id: pyupgrade
19 |         args: [--py36-plus]
20 |   - repo: https://github.com/asottile/seed-isort-config
21 |     rev: v2.1.1
22 |     hooks:
23 |       - id: seed-isort-config
24 |         args: [--application-directories, '.:src']
25 |   - repo: https://github.com/pre-commit/mirrors-isort
26 |     rev: v4.3.21
27 |     hooks:
28 |       - id: isort
29 |   - repo: https://github.com/pre-commit/mirrors-mypy
30 |     rev: v0.770
31 |     hooks:
32 |       - id: mypy
33 |         exclude: (docs/.*)
34 |
--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
1 | .. documentation master file
2 | You can adapt this file completely to your liking, but it should at least
3 | contain the root `toctree` directive.
4 |
5 | Bandersnatch documentation
6 | ==========================
7 |
8 | bandersnatch is a PyPI mirror client according to `PEP 381`
9 | http://www.python.org/dev/peps/pep-0381/.
10 |
11 | Bandersnatch hits the XMLRPC API of pypi.org to get all packages with their
12 | serials, or only the packages changed since the last run's serial. It then uses
13 | the JSON API of PyPI to get shasums and release file paths to download, and
14 | works out where to lay out the package files on a POSIX file system.
15 |
16 | As of 4.0 bandersnatch:
17 | - Is fully asyncio based (mainly via aiohttp)
18 | - Only stores PEP 503 normalized package names for the /simple API
19 | - Only stores JSON under the normalized package name path too
20 |
21 | Contents:
22 |
23 | .. toctree::
24 |    :maxdepth: 3
25 |
26 |    installation
27 |    mirror_configuration
28 |    filtering_configuration
29 |    CONTRIBUTING
30 |    modules
31 |
--------------------------------------------------------------------------------
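docs/index.rst states that only PEP 503 normalized package names are stored for the /simple API. The normalization rule itself is defined by PEP 503: runs of `-`, `_` and `.` collapse to a single `-`, and the result is lowercased. A minimal sketch of it follows; `normalize_name` and `simple_index_path` are illustrative names, and the path layout is inferred from the `simple/foo/index.html` entry asserted in test_sync.py rather than taken from bandersnatch's implementation:

```python
import re
from pathlib import PurePosixPath


def normalize_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_path(name: str) -> PurePosixPath:
    """Hypothetical helper: where a project's simple index page would live."""
    return PurePosixPath("simple") / normalize_name(name) / "index.html"


for raw in ("Foo", "Foo.Bar_baz", "zope.interface"):
    print(raw, "->", normalize_name(raw), "->", simple_index_path(raw))
```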
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
1 | name: bandersnatch_ci
2 |
3 | on: [push, pull_request]
4 |
5 | jobs:
6 |   build:
7 |     name: bandersnatch CI python ${{ matrix.python-version }} on ${{matrix.os}}
8 |     runs-on: ${{ matrix.os }}
9 |     strategy:
10 |       matrix:
11 |         python-version: [3.6, 3.7, 3.8]
12 |         os: [macOS-latest, ubuntu-latest, windows-latest]
13 |
14 |     steps:
15 |     - uses: actions/checkout@v1
16 |
17 |     - name: Set up Python ${{ matrix.python-version }}
18 |       uses: actions/setup-python@v1
19 |       with:
20 |         python-version: ${{ matrix.python-version }}
21 |
22 |     - name: Install latest pip, setuptools + tox
23 |       run: |
24 |         python -m pip install --upgrade pip setuptools tox
25 |
26 |     - name: Install base bandersnatch requirements
27 |       run: |
28 |         python -m pip install -r requirements.txt
29 |
30 |     - name: Run Unittests
31 |       env:
32 |         TOXENV: py3
33 |       run: |
34 |         python test_runner.py
35 |
36 |     - name: Run Integration Test
37 |       env:
38 |         TOXENV: INTEGRATION
39 |       run: |
40 |         python -m pip install .
41 |         python test_runner.py
42 |
--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [tox]
2 | envlist = lint,py3
3 |
4 | [testenv]
5 | passenv = CI TRAVIS TRAVIS_*
6 | commands =
7 |     coverage run -m pytest {posargs}
8 |     coverage report -m
9 |     coverage html
10 |     codecov
11 | deps = -r requirements_test.txt
12 | extras = swift
13 |
14 | [testenv:doc_build]
15 | basepython=python3
16 | commands =
17 |     {envpython} {envbindir}/sphinx-build -a -W -b html docs docs/html
18 | changedir = {toxinidir}
19 | deps =
20 |     -r requirements_docs.txt
21 |     sphinx-rtd-theme
22 |
23 | extras = doc_build
24 | passenv = SSH_AUTH_SOCK
25 | setenv =
26 |     SPHINX_THEME='pypa'
27 |
28 | [testenv:lint]
29 | basepython=python3
30 | skip_install=True
31 | deps = -r requirements_test.txt
32 | commands=
33 |     pre-commit run --all-files --show-diff-on-failure
34 |     flake8
35 |
36 | [isort]
37 | atomic = true
38 | not_skip = __init__.py
39 | line_length = 88
40 | multi_line_output = 3
41 | known_third_party = aiohttp,aiohttp_xmlrpc,asynctest,filelock,freezegun,importlib_resources,packaging,pytest,setuptools
42 | known_first_party = bandersnatch,bandersnatch_filter_plugins,bandersnatch_storage_plugins
43 | force_grid_wrap = 0
44 | use_parentheses=True
45 | include_trailing_comma = True
46 |
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/prerelease_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import re
3 | from typing import Dict, List, Pattern
4 |
5 | from bandersnatch.filter import FilterReleasePlugin
6 |
7 | logger = logging.getLogger("bandersnatch")
8 |
9 |
10 | class PreReleaseFilter(FilterReleasePlugin):
11 |     """
12 |     Filters releases considered pre-releases.
13 |     """
14 |
15 |     name = "prerelease_release"
16 |     PRERELEASE_PATTERNS = (
17 |         r".+rc\d+$",
18 |         r".+a(lpha)?\d+$",
19 |         r".+b(eta)?\d+$",
20 |         r".+dev\d+$",
21 |     )
22 |     patterns: List[Pattern] = []
23 |
24 |     def initialize_plugin(self) -> None:
25 |         """
26 |         Initialize the plugin reading patterns from the config.
27 |         """
28 |         if not self.patterns:
29 |             self.patterns = [
30 |                 re.compile(pattern_string)
31 |                 for pattern_string in self.PRERELEASE_PATTERNS
32 |             ]
33 |             logger.info(f"Initialized prerelease plugin with {self.patterns}")
34 |
35 |     def filter(self, metadata: Dict) -> bool:
36 |         """
37 |         Returns False if version fails the filter, i.e. follows a prerelease pattern
38 |         """
39 |         version = metadata["version"]
40 |         return not any(pattern.match(version) for pattern in self.patterns)
41 |
--------------------------------------------------------------------------------
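To see which version strings the four `PRERELEASE_PATTERNS` reject, the filter logic can be exercised without the plugin machinery. This is a standalone sketch: the patterns are copied from the listing, while `keep_release` and the sample versions are illustrative:

```python
import re

# Patterns copied from PreReleaseFilter.PRERELEASE_PATTERNS.
PRERELEASE_PATTERNS = (
    r".+rc\d+$",
    r".+a(lpha)?\d+$",
    r".+b(eta)?\d+$",
    r".+dev\d+$",
)
COMPILED = [re.compile(pattern) for pattern in PRERELEASE_PATTERNS]


def keep_release(version: str) -> bool:
    """Return False when the version looks like a pre-release, as filter() does."""
    return not any(pattern.match(version) for pattern in COMPILED)


for version in ("1.0", "1.0rc1", "2.0a3", "2.0beta1", "3.1.dev4", "1.0.post1"):
    print(f"{version:10} keep={keep_release(version)}")
```

Each pattern starts with `.+` and ends with `\d+$`, so a bare suffix without a trailing number (for example `1.0a`) is not treated as a pre-release by these expressions.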
/src/bandersnatch/tests/test_sync.py:
--------------------------------------------------------------------------------
1 | from os import sep
2 |
3 | import asynctest
4 | import pytest
5 |
6 | from bandersnatch import utils
7 | from bandersnatch.mirror import BandersnatchMirror
8 |
9 |
10 | @pytest.mark.asyncio
11 | async def test_sync_specific_packages(mirror: BandersnatchMirror) -> None:
12 |     FAKE_SERIAL = b"112233"
13 |     with open("status", "wb") as f:
14 |         f.write(FAKE_SERIAL)
15 |     # Package names should be normalized by synchronize()
16 |     specific_packages = ["Foo"]
17 |     mirror.master.all_packages = asynctest.CoroutineMock( # type: ignore
18 |         return_value={"foo": 1}
19 |     )
20 |     mirror.json_save = True
21 |     # Recall bootstrap so we have the json dirs
22 |     mirror._bootstrap()
23 |     await mirror.synchronize(specific_packages)
24 |
25 |     assert """\
26 | json{0}foo
27 | packages{0}2.7{0}f{0}foo{0}foo.whl
28 | packages{0}any{0}f{0}foo{0}foo.zip
29 | pypi{0}foo{0}json
30 | simple{0}foo{0}index.html
31 | simple{0}index.html""".format(
32 |         sep
33 |     ) == utils.find(
34 |         mirror.webdir, dirs=False
35 |     )
36 |
37 |     assert (
38 |         open("web{0}simple{0}index.html".format(sep)).read()
39 |         == """\
40 |
41 |
42 |
43 | Simple Index
44 |
45 |
46 | foo
47 |
48 | """
49 |     )
50 |     # The "sync" method shouldn't update the serial
51 |     assert open("status", "rb").read() == FAKE_SERIAL
52 |
--------------------------------------------------------------------------------
/src/runner.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | # Simple Script to replace cron for Docker
4 |
5 | import argparse
6 | import sys
7 | from subprocess import CalledProcessError, run
8 | from time import sleep, time
9 |
10 |
11 | def main() -> int:
12 |     parser = argparse.ArgumentParser()
13 |     parser.add_argument(
14 |         "-c",
15 |         "--config",
16 |         default="/conf/bandersnatch.conf",
17 |         help="Configuration location",
18 |     )
19 |     parser.add_argument("interval", help="Time in seconds between runs", type=int)
20 |     args = parser.parse_args()
21 |
22 |     print(f"Running bandersnatch every {args.interval}s", file=sys.stderr)
23 |     try:
24 |         while True:
25 |             start_time = time()
26 |
27 |             try:
28 |                 cmd = [
29 |                     sys.executable,
30 |                     "-m",
31 |                     "bandersnatch.main",
32 |                     "--config",
33 |                     args.config,
34 |                     "mirror",
35 |                 ]
36 |                 run(cmd, check=True)
37 |             except CalledProcessError as cpe:
38 |                 return cpe.returncode
39 |
40 |             run_time = time() - start_time
41 |             if run_time < args.interval:
42 |                 sleep_time = args.interval - run_time
43 |                 print(f"Sleeping for {sleep_time}s", file=sys.stderr)
44 |                 sleep(sleep_time)
45 |     except KeyboardInterrupt:
46 |         pass
47 |
48 |     return 0
49 |
50 |
51 | if __name__ == "__main__":
52 |     sys.exit(main())
53 |
--------------------------------------------------------------------------------
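The loop in runner.py sleeps only for whatever is left of the interval after a mirror run, so a run that overruns the interval is followed immediately by the next one. That arithmetic can be isolated as below; `remaining_sleep` is an illustrative name and does not exist in runner.py:

```python
def remaining_sleep(interval: float, run_time: float) -> float:
    """Seconds to wait after a run so that runs start about `interval` seconds apart.

    Returns 0.0 when the run took longer than the interval, which matches
    runner.py skipping the sleep in that case.
    """
    return max(0.0, interval - run_time)


# With the Dockerfile's default interval of 3600 seconds:
print(remaining_sleep(3600, 600))   # a 10 minute run leaves 50 minutes to sleep
print(remaining_sleep(3600, 4000))  # an overrunning run leaves nothing to sleep
```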
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: python
2 |
3 | env:
4 | global:
5 | secure: "d2LR/WLe9ZragzMEXGGjkFGoPiVm8ilCc41CEhnsJNRUzDo5crfZY1OTzE45CIx3kpVYgUNUveb37Q2WMYI9X7xJp4gY+gxEOWG4EMefkXu2IwUBzwKaqA0oU1pYcnm5FlU8MoBL3HBWagTvGG7Q90YR4s/78HJTDrwHjTAlZctyh5O89vHq8yjDDa/FIwEytns3U8FsVp5IGe+vvDBsbrlFgW0kGG2ayc2bO0i9wof0RF9J7gre5zrKg9h80AHxbmprZ9hhjsYPj3cgEniaQn5dFxf7k3YfkPMvr2h9HHdHPucF+KRiux2/UvQ/CPeSpZGmcC+YzHcgliKK/bkl2MHZDQJ78V+vhJKchbZ+3iVyuFYbhggE5nmUjDMpthnfhraGGIPc9ZYwwKTYhLMc2AlcBLu3+cLAAcCT7gl4ArZQT6+1jXrMApulPIHqxsnxwTKxBPyuq0M7w5TJJMpgXGy4l5xUO/z8FYAQ1+rBHif3Sy36Sh2w0cAAKCz46dow5ZdpXhxUurA9VeCkQRb0D/fg59N/KoAnsjbbbgUyg3zjsxF8OiLMqyOTnagecFzCMUjT8yT0cknEz8oCwznEbP3seqzGevTzmC8yXXAhAFeaGwdh8t7WkOT5I50cOZ+bNgnyi1nsiEBzaGDNEVtd+uI91Ij5GTT54ZRoXBxaS0k="
6 |
7 | matrix:
8 |   fast_finish: true
9 |   include:
10 |     - python: 3.8
11 |       env: TOXENV=doc_build
12 |     - python: 3.8
13 |       env: TOXENV=lint
14 |     - python: 3.8
15 |       env: TOXENV=py3
16 |     - python: nightly
17 |       env: TOXENV=INTEGRATION
18 |     - python: nightly
19 |       env: TOXENV=py3
20 |   allow_failures:
21 |     - python: nightly
22 |       env: TOXENV=INTEGRATION
23 |     - python: nightly
24 |       env: TOXENV=py3
25 |
26 | install:
27 |   - pip install --upgrade pip setuptools
28 |   - pip install -r requirements.txt -r requirements_test.txt
29 |   - pip install .
30 |
31 | script:
32 |   - python test_runner.py
33 |
34 | notifications:
35 |   irc:
36 |     channels:
37 |       - "chat.freenode.net#bandersnatch"
38 |
39 | cache:
40 |   directories:
41 |     - $HOME/.cache/pip
42 |     - $HOME/.cache/pre-commit
43 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_package.py:
--------------------------------------------------------------------------------
1 | import asynctest
2 | import pytest
3 | from _pytest.capture import CaptureFixture
4 |
5 | from bandersnatch.errors import PackageNotFound, StaleMetadata
6 | from bandersnatch.master import Master, StalePage
7 | from bandersnatch.package import Package
8 |
9 |
10 | def test_package_accessors(package: Package) -> None:
11 |     assert package.info == {"name": "Foo", "version": "0.1"}
12 |     assert package.last_serial == 654_321
13 |     assert list(package.releases.keys()) == ["0.1"]
14 |     assert len(package.release_files) == 2
15 |     for f in package.release_files:
16 |         assert "filename" in f
17 |         assert "digests" in f
18 |
19 |
20 | @pytest.mark.asyncio
21 | async def test_package_update_metadata_gives_up_after_3_stale_responses(
22 |     caplog: CaptureFixture, master: Master
23 | ) -> None:
24 |     master.get_package_metadata = asynctest.CoroutineMock( # type: ignore
25 |         side_effect=StalePage
26 |     )
27 |     package = Package("foo", serial=11)
28 |
29 |     with pytest.raises(StaleMetadata):
30 |         await package.update_metadata(master, attempts=3)
31 |     assert master.get_package_metadata.await_count == 3 # type: ignore
32 |     assert "not updating. Giving up" in caplog.text
33 |
34 |
35 | @pytest.mark.asyncio
36 | async def test_package_not_found(caplog: CaptureFixture, master: Master) -> None:
37 |     pkg_name = "foo"
38 |     master.get_package_metadata = asynctest.CoroutineMock( # type: ignore
39 |         side_effect=PackageNotFound(pkg_name)
40 |     )
41 |     package = Package(pkg_name, serial=11)
42 |
43 |     with pytest.raises(PackageNotFound):
44 |         await package.update_metadata(master)
45 |     assert "foo no longer exists on PyPI" in caplog.text
46 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 |
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 |
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 | test.conf
48 |
49 | # Translations
50 | *.mo
51 | *.pot
52 |
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 |
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 |
61 | # Scrapy stuff:
62 | .scrapy
63 |
64 | # Sphinx documentation
65 | doc/html/
66 |
67 | # PyBuilder
68 | target/
69 |
70 | # IPython Notebook
71 | .ipynb_checkpoints
72 |
73 | # pyenv
74 | .python-version
75 |
76 | # celery beat schedule file
77 | celerybeat-schedule
78 |
79 | # dotenv
80 | .env
81 |
82 | # virtualenv
83 | venv/
84 | ENV/
85 |
86 | # Spyder project settings
87 | .spyderproject
88 |
89 | # Rope project settings
90 | .ropeproject
91 |
92 | # VSCode project settings
93 | .vscode/
94 |
95 | /.coverage
96 | .tox/
97 | artifacts/*.xml
98 | UNKNOWN.egg-info/
99 |
100 | /.idea
101 |
102 | .mypy_cache
103 | /.pytest_cache
104 | /docs/html
105 |
106 | # MonkeyType
107 | monkeytype.sqlite3
108 |
109 | # Integration test
110 | mirrored-files
111 |
--------------------------------------------------------------------------------
/docs/bandersnatch_filter_plugins.rst:
--------------------------------------------------------------------------------
1 | bandersnatch_filter_plugins package
2 | ===================================
3 |
4 | Package contents
5 | ----------------
6 |
7 | .. automodule:: bandersnatch_filter_plugins
8 | :members:
9 | :undoc-members:
10 | :show-inheritance:
11 |
12 | Submodules
13 | ----------
14 |
15 | bandersnatch_filter_plugins.blocklist_name module
16 | -------------------------------------------------
17 |
18 | .. automodule:: bandersnatch_filter_plugins.blocklist_name
19 | :members:
20 | :undoc-members:
21 | :show-inheritance:
22 |
23 | bandersnatch_filter_plugins.filename_name module
24 | ------------------------------------------------
25 |
26 | .. automodule:: bandersnatch_filter_plugins.filename_name
27 | :members:
28 | :undoc-members:
29 | :show-inheritance:
30 |
31 | bandersnatch_filter_plugins.latest_name module
32 | ----------------------------------------------
33 |
34 | .. automodule:: bandersnatch_filter_plugins.latest_name
35 | :members:
36 | :undoc-members:
37 | :show-inheritance:
38 |
39 | bandersnatch_filter_plugins.metadata_filter module
40 | --------------------------------------------------
41 |
42 | .. automodule:: bandersnatch_filter_plugins.metadata_filter
43 | :members:
44 | :undoc-members:
45 | :show-inheritance:
46 |
47 | bandersnatch_filter_plugins.prerelease_name module
48 | --------------------------------------------------
49 |
50 | .. automodule:: bandersnatch_filter_plugins.prerelease_name
51 | :members:
52 | :undoc-members:
53 | :show-inheritance:
54 |
55 | bandersnatch_filter_plugins.regex_name module
56 | ---------------------------------------------
57 |
58 | .. automodule:: bandersnatch_filter_plugins.regex_name
59 | :members:
60 | :undoc-members:
61 | :show-inheritance:
62 |
63 | bandersnatch_filter_plugins.allowlist_name module
64 | -------------------------------------------------
65 |
66 | .. automodule:: bandersnatch_filter_plugins.allowlist_name
67 | :members:
68 | :undoc-members:
69 | :show-inheritance:
70 |
--------------------------------------------------------------------------------
/MAINTAINERS.md:
--------------------------------------------------------------------------------
1 | # Maintaining Bandersnatch
2 |
3 | This document sets out the roles, processes and responsibilities `bandersnatch`
4 | maintainers hold and can conduct.
5 |
6 | ## Summary of being a Maintainer of `bandersnatch`
7 |
8 | - **Issue Triage**
9 | - Assesses if the Issue is accurate + reproducible
10 | - If the issue is a feature request, assesses if it fits the *bandersnatch mission*
11 | - **PR Merging**
12 | - Assesses Pull Requests for suitability and adherence to the *bandersnatch mission*
13 | - It is preferred that big changes be pulled in from *branches* via *Pull Requests*
14 | - Peer reviewed by another maintainer
15 | - **Releases**
16 | - You will have **"the commit bit"** access
17 |
18 | ### Links to key mentioned files
19 |
20 | - Change Log: [CHANGES.md](https://github.com/pypa/bandersnatch/blob/master/CHANGES.md)
21 | - Mission Statement: Can be found in bandersnatch's [README.md](https://github.com/pypa/bandersnatch/blob/master/README.md)
22 | - Readme File: [README.md](https://github.com/pypa/bandersnatch/blob/master/README.md)
23 | - Semantic Versioning: [PEP 440 Semantic](https://www.python.org/dev/peps/pep-0440/#semantic-versioning)
24 |
25 | ## Processes
26 |
27 | ### Evaluating Issues and Pull Requests
28 |
29 | Please always think of the mission of bandersnatch. We should just mirror in a
30 | compatible way like a PEP381 mirror. Simple is always better than complex and all *bug*
31 | issues need to be reproducible for our developers.
32 |
33 | #### pyup.io
34 | - Remember it's not perfect
35 | - It does not take into account modules' pinned dependencies
36 | - e.g. If requests wants *urllib3<1.25* *pyup.io* can still try and update it
37 | - Until we have **CI** that effectively runs `pip freeze` from time to time we
38 | should recheck our minimal deps that we pin in `requirements.txt`
39 |
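The pinned-dependency caveat above can be sketched as a quick manual check. This is only an illustration: `version_tuple` and `violates_upper_bound` are hypothetical helpers, not part of bandersnatch or pyup.io, and they assume plain dotted numeric versions rather than full PEP 440 parsing.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a simple dotted version such as "1.25.3" into (1, 25, 3)."""
    return tuple(int(part) for part in version.split("."))


def violates_upper_bound(candidate: str, upper_bound: str) -> bool:
    """Return True if candidate is not strictly below upper_bound."""
    return version_tuple(candidate) >= version_tuple(upper_bound)


# requests wants urllib3<1.25, but an automated update proposes urllib3 1.25.3
print(violates_upper_bound("1.25.3", "1.25"))  # True: the update breaks the pin
print(violates_upper_bound("1.24.2", "1.25"))  # False: still within the pin
```

For real version strings (pre-releases, epochs, local versions) a proper PEP 440 parser is the safer choice; the tuple comparison here only covers the simple case from the example.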
40 | ### Releasing to PyPI
41 | Every maintainer can release to PyPI. A maintainer should have agreement of
42 | two or more Maintainers that it is a suitable time for a release.
43 |
44 | #### Release Process
45 |
46 | - Update `src/bandersnatch/__init__.py` version
47 | - Update the Change Log with difference from the last release
48 | - Push / Merge to Master
49 | - Create a GitHub Release
50 | - Tag with the semantic version number
51 | - Build a `sdist` + `wheel`
52 | - Use `twine` to upload to PyPI
53 |
--------------------------------------------------------------------------------
/docs/bandersnatch.rst:
--------------------------------------------------------------------------------
1 | bandersnatch package
2 | ====================
3 |
4 | Package contents
5 | ----------------
6 |
7 | .. automodule:: bandersnatch
8 | :members:
9 | :undoc-members:
10 | :show-inheritance:
11 |
12 | Submodules
13 | ----------
14 |
15 | bandersnatch.configuration module
16 | ---------------------------------
17 |
18 | .. automodule:: bandersnatch.configuration
19 | :members:
20 | :undoc-members:
21 | :show-inheritance:
22 |
23 | bandersnatch.delete module
24 | --------------------------
25 |
26 | .. automodule:: bandersnatch.delete
27 | :members:
28 | :undoc-members:
29 | :show-inheritance:
30 |
31 | bandersnatch.filter module
32 | --------------------------
33 |
34 | .. automodule:: bandersnatch.filter
35 | :members:
36 | :undoc-members:
37 | :show-inheritance:
38 |
39 | bandersnatch.log module
40 | -----------------------
41 |
42 | .. automodule:: bandersnatch.log
43 | :members:
44 | :undoc-members:
45 | :show-inheritance:
46 |
47 | bandersnatch.main module
48 | ------------------------
49 |
50 | .. automodule:: bandersnatch.main
51 | :members:
52 | :undoc-members:
53 | :show-inheritance:
54 |
55 | bandersnatch.master module
56 | --------------------------
57 |
58 | .. automodule:: bandersnatch.master
59 | :members:
60 | :undoc-members:
61 | :show-inheritance:
62 |
63 | bandersnatch.mirror module
64 | --------------------------
65 |
66 | .. automodule:: bandersnatch.mirror
67 | :members:
68 | :undoc-members:
69 | :show-inheritance:
70 |
71 | bandersnatch.package module
72 | ---------------------------
73 |
74 | .. automodule:: bandersnatch.package
75 | :members:
76 | :undoc-members:
77 | :show-inheritance:
78 |
79 | bandersnatch.storage module
80 | ---------------------------
81 |
82 | .. automodule:: bandersnatch.storage
83 | :members:
84 | :undoc-members:
85 | :show-inheritance:
86 |
87 | bandersnatch.utils module
88 | -------------------------
89 |
90 | .. automodule:: bandersnatch.utils
91 | :members:
92 | :undoc-members:
93 | :show-inheritance:
94 |
95 | bandersnatch.verify module
96 | --------------------------
97 |
98 | .. automodule:: bandersnatch.verify
99 | :members:
100 | :undoc-members:
101 | :show-inheritance:
102 |
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/latest_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | from operator import itemgetter
3 | from typing import Dict, Sequence, Tuple, Union
4 |
5 | from packaging.version import LegacyVersion, Version, parse
6 |
7 | from bandersnatch.filter import FilterReleasePlugin
8 |
9 | logger = logging.getLogger("bandersnatch")
10 |
11 |
12 | class LatestReleaseFilter(FilterReleasePlugin):
13 | """
14 | Plugin to download only latest releases
15 | """
16 |
17 | name = "latest_release"
18 | keep = 0 # by default, keep 'em all
19 | latest: Sequence[str] = []
20 |
21 | def initialize_plugin(self) -> None:
22 | """
23 | Initialize the plugin reading the keep setting from the config.
24 | """
25 | if self.keep:
26 | return
27 |
28 | try:
29 | self.keep = int(self.configuration["latest_release"]["keep"])
30 | except KeyError:
31 | return
32 | except ValueError:
33 | return
34 | if self.keep > 0:
35 | logger.info(f"Initialized latest releases plugin with keep={self.keep}")
36 |
37 | def filter(self, metadata: Dict) -> bool:
38 | """
39 | Returns False if version fails the filter, i.e. is not a latest/current release
40 | """
41 | if self.keep == 0:
42 | return True
43 |
44 | if not self.latest:
45 | info = metadata["info"]
46 | releases = metadata["releases"]
47 | versions = list(releases.keys())
48 | before = len(versions)
49 |
50 | if before <= self.keep:
51 | # not enough releases: do nothing
52 | return True
53 |
54 | versions_pair = map(lambda v: (parse(v), v), versions)
55 | latest_sorted: Sequence[Tuple[Union[LegacyVersion, Version], str]] = sorted(
56 | versions_pair
57 | )[
58 | -self.keep : # noqa: E203
59 | ]
60 | self.latest = list(map(itemgetter(1), latest_sorted))
61 |
62 | current_version = info.get("version")
63 | if current_version and (current_version not in self.latest):
64 | # never remove the stable/official version
65 | self.latest[0] = current_version
66 |
67 | version = metadata["version"]
68 | return version in self.latest
69 |
--------------------------------------------------------------------------------
/DEVELOPMENT.md:
--------------------------------------------------------------------------------
1 | # bandersnatch development
2 |
3 | So you want to help out? **Awesome**. Go you!
4 |
5 | ## Getting Started
6 |
7 | We use GitHub. To get started I'd suggest visiting https://guides.github.com/
8 |
9 | ### Pre Install
10 |
11 | Please make sure your system has the following:
12 |
13 | - Python 3.6.1 or greater
14 | - git cli client
15 |
16 | Also ensure you can authenticate with GitHub via SSH Keys or HTTPS.
17 |
18 | ### Checkout `bandersnatch`
19 |
20 | Let's now cd to where we want the code and clone the repo:
21 |
22 | - `cd somewhere`
23 | - `git clone git@github.com:pypa/bandersnatch.git`
24 |
25 | ### Development venv
26 |
27 | One way to develop and install all the dependencies of bandersnatch is to use a venv.
28 |
29 | - Let's create one and upgrade `pip` and `setuptools`.
30 |
31 | ```
32 | python3 -m venv /path/to/venv
33 | /path/to/venv/bin/pip install --upgrade pip setuptools
34 | ```
35 |
36 | - Then we should install the dependencies to the venv:
37 |
38 | ```
39 | /path/to/venv/bin/pip install -r requirements.txt
40 | /path/to/venv/bin/pip install -r requirements_test.txt
41 | ```
42 |
43 | - To verify any changes in the documentation:
44 |
45 | **NOTICE:** This effectively installs `requirements_swift.txt` *and* `requirements_docs.txt`
46 | since the dependencies are needed by autodoc, which imports all of bandersnatch during
47 | documentation building. So pip will install **a lot** of dependencies.
48 |
49 | ```
50 | /path/to/venv/bin/pip install -r requirements_docs.txt
51 | ```
52 |
53 | - Finally install bandersnatch in editable mode:
54 |
55 | ```
56 | /path/to/venv/bin/pip install -e .
57 | ```
58 |
59 | ## Running Bandersnatch
60 |
61 | You will need to customize `src/bandersnatch/default.conf` and run via the following:
62 |
63 | **WARNING: Bandersnatch will go off and sync from pypi.org and use disk space!**
64 |
65 | ```
66 | cd bandersnatch
67 | /path/to/venv/bin/pip install --upgrade .
68 | /path/to/venv/bin/bandersnatch --help
69 |
70 | /path/to/venv/bin/bandersnatch -c src/bandersnatch/default.conf mirror
71 | ```
72 |
73 | ## Running Unit Tests
74 |
75 | We use tox to run tests. `tox.ini` has the options needed, so running tests is very easy.
76 |
77 | ```
78 | cd bandersnatch
79 | /path/to/venv/bin/tox [-vv]
80 | ```
81 |
82 | You want to see:
83 | ```
84 | py36: commands succeeded
85 | congratulations :)
86 | ```
87 |
88 |
89 | ## Making a release
90 |
91 | *To be completed - @cooper has never used zc.buildout*
92 |
93 | * Tests green?
94 | * run `bin/fullrelease`
95 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_prerelease_name.py:
--------------------------------------------------------------------------------
1 | import os
2 | import re
3 | from pathlib import Path
4 | from tempfile import TemporaryDirectory
5 | from unittest import TestCase
6 |
7 | from mock_config import mock_config
8 |
9 | import bandersnatch.filter
10 | from bandersnatch.master import Master
11 | from bandersnatch.mirror import BandersnatchMirror
12 | from bandersnatch.package import Package
13 | from bandersnatch_filter_plugins import prerelease_name
14 |
15 |
16 | class BasePluginTestCase(TestCase):
17 |
18 | tempdir = None
19 | cwd = None
20 |
21 | def setUp(self) -> None:
22 | self.cwd = os.getcwd()
23 | self.tempdir = TemporaryDirectory()
24 | os.chdir(self.tempdir.name)
25 |
26 | def tearDown(self) -> None:
27 | if self.tempdir:
28 | assert self.cwd
29 | os.chdir(self.cwd)
30 | self.tempdir.cleanup()
31 | self.tempdir = None
32 |
33 |
34 | class TestRegexReleaseFilter(BasePluginTestCase):
35 |
36 | config_contents = """\
37 | [plugins]
38 | enabled =
39 | prerelease_release
40 | """
41 |
42 | def test_plugin_includes_predefined_patterns(self) -> None:
43 | mock_config(self.config_contents)
44 |
45 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
46 |
47 | assert any(
48 | type(plugin) == prerelease_name.PreReleaseFilter for plugin in plugins
49 | )
50 | plugin = next(
51 | plugin
52 | for plugin in plugins
53 | if isinstance(plugin, prerelease_name.PreReleaseFilter)
54 | )
55 | expected_patterns = [
56 | re.compile(pattern_string) for pattern_string in plugin.PRERELEASE_PATTERNS
57 | ]
58 | assert plugin.patterns == expected_patterns
59 |
60 | def test_plugin_check_match(self) -> None:
61 | mock_config(self.config_contents)
62 |
63 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
64 | pkg = Package("foo", serial=1)
65 | pkg._metadata = {
66 | "info": {"name": "foo", "version": "1.2.0"},
67 | "releases": {
68 | "1.2.0alpha1": {},
69 | "1.2.0a2": {},
70 | "1.2.0beta1": {},
71 | "1.2.0b2": {},
72 | "1.2.0rc1": {},
73 | "1.2.0": {},
74 | },
75 | }
76 |
77 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
78 |
79 | assert pkg.releases == {"1.2.0": {}}
80 |
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/regex_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | import re
3 | from typing import Dict, List, Pattern
4 |
5 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin
6 |
7 | logger = logging.getLogger("bandersnatch")
8 |
9 |
10 | class RegexReleaseFilter(FilterReleasePlugin):
11 | """
12 | Filters releases based on regex patterns defined by the user.
13 | """
14 |
15 | name = "regex_release"
16 | # Has to be iterable to ensure it works with any()
17 | patterns: List[Pattern] = []
18 |
19 | def initialize_plugin(self) -> None:
20 | """
21 | Initialize the plugin reading patterns from the config.
22 | """
23 | # TODO: should retrieving the plugin's config be part of the base class?
24 | try:
25 | config = self.configuration["filter_regex"]["releases"]
26 | except KeyError:
27 | return
28 | else:
29 | if not self.patterns:
30 | pattern_strings = [pattern for pattern in config.split("\n") if pattern]
31 | self.patterns = [
32 | re.compile(pattern_string) for pattern_string in pattern_strings
33 | ]
34 |
35 | logger.info(f"Initialized regex release plugin with {self.patterns}")
36 |
37 | def filter(self, metadata: Dict) -> bool:
38 | """
39 | Returns False if version fails the filter, i.e. follows a regex pattern
40 | """
41 | version = metadata["version"]
42 | return not any(pattern.match(version) for pattern in self.patterns)
43 |
44 |
45 | class RegexProjectFilter(FilterProjectPlugin):
46 | """
47 | Filters projects based on regex patterns defined by the user.
48 | """
49 |
50 | name = "regex_project"
51 | # Has to be iterable to ensure it works with any()
52 | patterns: List[Pattern] = []
53 |
54 | def initialize_plugin(self) -> None:
55 | """
56 | Initialize the plugin reading patterns from the config.
57 | """
58 | try:
59 | config = self.configuration["filter_regex"]["packages"]
60 | except KeyError:
61 | return
62 | else:
63 | if not self.patterns:
64 | pattern_strings = [pattern for pattern in config.split("\n") if pattern]
65 | self.patterns = [
66 | re.compile(pattern_string) for pattern_string in pattern_strings
67 | ]
68 |
69 | logger.info(f"Initialized regex project plugin with {self.patterns}")
70 |
71 | def filter(self, metadata: Dict) -> bool:
72 | return not self.check_match(name=metadata["info"]["name"])
73 |
74 | def check_match(self, name: str) -> bool: # type: ignore[override]
75 | """
76 | Check if a project name matches any of the specified patterns.
77 |
78 | Parameters
79 | ==========
80 | name: str
81 | Project name
82 |
83 | Returns
84 | =======
85 | bool:
86 | True if it matches, False otherwise.
87 | """
88 | return any(pattern.match(name) for pattern in self.patterns)
89 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_master.py:
--------------------------------------------------------------------------------
1 | from pathlib import Path
2 | from tempfile import gettempdir
3 |
4 | import asynctest
5 | import pytest
6 |
7 | import bandersnatch
8 | from bandersnatch.master import Master, StalePage, XmlRpcError
9 |
10 |
11 | def test_disallow_http() -> None:
12 | with pytest.raises(ValueError):
13 | Master("http://pypi.example.com")
14 |
15 |
16 | def test_rpc_url(master: Master) -> None:
17 | assert master.xmlrpc_url == "https://pypi.example.com/pypi"
18 |
19 |
20 | @pytest.mark.asyncio
21 | async def test_all_packages(master: Master) -> None:
22 | expected = [["aiohttp", "", "", "", "69"]]
23 | master.rpc = asynctest.CoroutineMock(return_value=expected) # type: ignore
24 | packages = await master.all_packages()
25 | assert expected == packages
26 |
27 |
28 | @pytest.mark.asyncio
29 | async def test_all_packages_raises(master: Master) -> None:
30 | master.rpc = asynctest.CoroutineMock(return_value=[]) # type: ignore
31 | with pytest.raises(XmlRpcError):
32 | await master.all_packages()
33 |
34 |
35 | @pytest.mark.asyncio
36 | async def test_changed_packages_no_changes(master: Master) -> None:
37 | master.rpc = asynctest.CoroutineMock(return_value=None) # type: ignore
38 | changes = await master.changed_packages(4)
39 | assert changes == {}
40 |
41 |
42 | @pytest.mark.asyncio
43 | async def test_changed_packages_with_changes(master: Master) -> None:
44 | list_of_package_changes = [
45 | ("foobar", "1", 0, "added", 17),
46 | ("baz", "2", 1, "updated", 18),
47 | ("foobar", "1", 0, "changed", 20),
48 | # The server usually just hands out monotonically increasing serials in the
49 | # changelog. This verifies that we don't fail even with garbage input.
50 | ("foobar", "1", 0, "changed", 19),
51 | ]
52 | master.rpc = asynctest.CoroutineMock( # type: ignore
53 | return_value=list_of_package_changes
54 | )
55 | changes = await master.changed_packages(4)
56 | assert changes == {"baz": 18, "foobar": 20}
57 |
58 |
59 | @pytest.mark.asyncio
60 | async def test_master_raises_if_serial_too_small(master: Master) -> None:
61 | get_ag = master.get("/asdf", 10)
62 | with pytest.raises(StalePage):
63 | await get_ag.asend(None)
64 | assert master.session.request.called
65 |
66 |
67 | @pytest.mark.asyncio
68 | async def test_master_doesnt_raise_if_serial_equal(master: Master) -> None:
69 | get_ag = master.get("/asdf", 1)
70 | await get_ag.asend(None)
71 |
72 |
73 | @pytest.mark.asyncio
74 | async def test_master_url_fetch(master: Master) -> None:
75 | fetch_path = Path(gettempdir()) / "unittest_url_fetch"
76 | await master.url_fetch("https://unittest.org/asdf", fetch_path)
77 | assert master.session.get.called
78 |
79 |
80 | @pytest.mark.asyncio
81 | async def test_xmlrpc_user_agent(master: Master) -> None:
82 | client = await master._gen_xmlrpc_client()
83 | assert f"bandersnatch {bandersnatch.__version__}" in client.headers["User-Agent"]
84 |
85 |
86 | @pytest.mark.asyncio
87 | async def test_session_raise_for_status(master: Master) -> None:
88 | patcher = asynctest.patch("aiohttp.ClientSession", autospec=True)
89 | with patcher as create_session:
90 | async with master:
91 | pass
92 | assert len(create_session.call_args_list) == 1
93 | assert create_session.call_args_list[0][1]["raise_for_status"]
94 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_regex_name.py:
--------------------------------------------------------------------------------
1 | import os
2 | import re
3 | from pathlib import Path
4 | from tempfile import TemporaryDirectory
5 | from unittest import TestCase
6 |
7 | from mock_config import mock_config
8 |
9 | import bandersnatch.filter
10 | from bandersnatch.master import Master
11 | from bandersnatch.mirror import BandersnatchMirror
12 | from bandersnatch.package import Package
13 | from bandersnatch_filter_plugins import regex_name
14 |
15 |
16 | class BasePluginTestCase(TestCase):
17 |
18 | tempdir = None
19 | cwd = None
20 |
21 | def setUp(self) -> None:
22 | self.cwd = os.getcwd()
23 | self.tempdir = TemporaryDirectory()
24 | os.chdir(self.tempdir.name)
25 |
26 | def tearDown(self) -> None:
27 | if self.tempdir:
28 | assert self.cwd
29 | os.chdir(self.cwd)
30 | self.tempdir.cleanup()
31 | self.tempdir = None
32 |
33 |
34 | class TestRegexReleaseFilter(BasePluginTestCase):
35 |
36 | config_contents = """\
37 | [plugins]
38 | enabled =
39 | regex_release
40 |
41 | [filter_regex]
42 | releases =
43 | .+rc\\d$
44 | .+alpha\\d$
45 | """
46 |
47 | def test_plugin_compiles_patterns(self) -> None:
48 | mock_config(self.config_contents)
49 |
50 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
51 |
52 | assert any(type(plugin) == regex_name.RegexReleaseFilter for plugin in plugins)
53 | plugin = next(
54 | plugin
55 | for plugin in plugins
56 | if isinstance(plugin, regex_name.RegexReleaseFilter)
57 | )
58 | assert plugin.patterns == [re.compile(r".+rc\d$"), re.compile(r".+alpha\d$")]
59 |
60 | def test_plugin_check_match(self) -> None:
61 | mock_config(self.config_contents)
62 |
63 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
64 | pkg = Package("foo", 1)
65 | pkg._metadata = {
66 | "info": {"name": "foo", "version": "foo-1.2.0"},
67 | "releases": {"foo-1.2.0rc2": {}, "foo-1.2.0": {}, "foo-1.2.0alpha2": {}},
68 | }
69 |
70 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
71 |
72 | assert pkg.releases == {"foo-1.2.0": {}}
73 |
74 |
75 | class TestRegexProjectFilter(BasePluginTestCase):
76 |
77 | config_contents = """\
78 | [plugins]
79 | enabled =
80 | regex_project
81 |
82 | [filter_regex]
83 | packages =
84 | .+-evil$
85 | .+-neutral$
86 | """
87 |
88 | def test_plugin_compiles_patterns(self) -> None:
89 | mock_config(self.config_contents)
90 |
91 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
92 |
93 | assert any(type(plugin) == regex_name.RegexProjectFilter for plugin in plugins)
94 | plugin = next(
95 | plugin
96 | for plugin in plugins
97 | if isinstance(plugin, regex_name.RegexProjectFilter)
98 | )
99 | assert plugin.patterns == [re.compile(r".+-evil$"), re.compile(r".+-neutral$")]
100 |
101 | def test_plugin_check_match(self) -> None:
102 | mock_config(self.config_contents)
103 |
104 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
105 | mirror.packages_to_sync = {"foo-good": "", "foo-evil": "", "foo-neutral": ""}
106 | mirror._filter_packages()
107 |
108 | assert list(mirror.packages_to_sync.keys()) == ["foo-good"]
109 |
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/filename_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | from typing import Dict, List
3 |
4 | from bandersnatch.filter import FilterReleaseFilePlugin
5 |
6 | logger = logging.getLogger("bandersnatch")
7 |
8 |
9 | class ExcludePlatformFilter(FilterReleaseFilePlugin):
10 | """
11 | Filters release files based on the platforms excluded by the user.
12 | """
13 |
14 | name = "exclude_platform"
15 |
16 | _patterns: List[str] = []
17 | _packagetypes: List[str] = []
18 |
19 | _windowsPlatformTypes = [".win32", "-win32", "win_amd64", "win-amd64"]
20 |
21 | _linuxPlatformTypes = [
22 | "linux-i686", # PEP 425
23 | "linux-x86_64", # PEP 425
24 | "linux_armv7l", # https://github.com/pypa/warehouse/pull/2010
25 | "linux_armv6l", # https://github.com/pypa/warehouse/pull/2012
26 | "manylinux1_i686", # PEP 513
27 | "manylinux1_x86_64", # PEP 513
28 | "manylinux2010_i686", # PEP 571
29 | "manylinux2010_x86_64", # PEP 571
30 | ]
31 |
32 | def initialize_plugin(self) -> None:
33 | """
34 | Initialize the plugin reading patterns from the config.
35 | """
36 | if self._patterns or self._packagetypes:
37 | logger.debug(
38 | "Skipping initialization of Exclude Platform plugin. "
39 | + "Already initialized"
40 | )
41 | return
42 |
43 | try:
44 | tags = self.blocklist["platforms"].split()
45 | except KeyError:
46 | logger.error(f"Plugin {self.name}: missing platforms= setting")
47 | return
48 |
49 | for platform in tags:
50 | lplatform = platform.lower()
51 |
52 | if lplatform in ("windows", "win"):
53 | # PEP 425
54 | # see also setuptools/package_index.py
55 | self._patterns.extend(self._windowsPlatformTypes)
56 | # PEP 527
57 | self._packagetypes.extend(["bdist_msi", "bdist_wininst"])
58 |
59 | elif lplatform in ("macos", "macosx"):
60 | self._patterns.extend(["macosx_", "macosx-"])
61 | self._packagetypes.extend(["bdist_dmg"])
62 |
63 | elif lplatform in ("freebsd",):
64 | # concerns only very few files
65 | self._patterns.extend([".freebsd", "-freebsd"])
66 |
67 | elif lplatform in ("linux",):
68 | self._patterns.extend(self._linuxPlatformTypes)
69 | self._packagetypes.extend(["bdist_rpm"])
70 |
71 | # check for platform specific architectures
72 | elif lplatform in self._windowsPlatformTypes:
73 | self._patterns.extend([lplatform])
74 |
75 | elif lplatform in self._linuxPlatformTypes:
76 | self._patterns.extend([lplatform])
77 |
78 | logger.info(f"Initialized {self.name} plugin with {self._patterns!r}")
79 |
80 | def filter(self, metadata: Dict) -> bool:
81 | """
82 | Returns False if file matches any of the filename patterns
83 | """
84 | file = metadata["release_file"]
85 | return not self._check_match(file)
86 |
87 | def _check_match(self, file_desc: Dict) -> bool:
88 | """
89 | Check if a release file matches any of the excluded platform patterns.
90 |
91 | Parameters
92 | ==========
93 | file_desc: Dict
94 | file description entry
95 |
96 | Returns
97 | =======
98 | bool:
99 | True if it matches, False otherwise.
100 | """
101 |
102 | # source dist: never filter out
103 | pt = file_desc.get("packagetype")
104 | if pt == "sdist":
105 | return False
106 |
107 | # Platform-specific installer package types (e.g. Windows installers)
108 | if pt in self._packagetypes:
109 | return True
110 |
111 | fn = file_desc["filename"]
112 | for i in self._patterns:
113 | if i in fn:
114 | return True
115 |
116 | return False
117 |
--------------------------------------------------------------------------------
/src/bandersnatch/default.conf:
--------------------------------------------------------------------------------
1 | [mirror]
2 | ; The directory where the mirror data will be stored.
3 | directory = /srv/pypi
4 | ; Save JSON metadata into the web tree:
5 | ; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME
6 | json = false
7 |
8 | ; Cleanup legacy non PEP 503 normalized named simple directories
9 | cleanup = false
10 |
11 | ; The PyPI server which will be mirrored.
12 | ; master = https://test.python.org
13 | ; scheme for PyPI server MUST be https
14 | master = https://pypi.org
15 |
16 | ; The network socket timeout to use for all connections. This is set to a
17 | ; somewhat aggressively low value: rather fail quickly temporarily and re-run
18 | ; the client soon instead of having a process hang infinitely and have TCP not
19 | ; catching up for ages.
20 | timeout = 10
21 |
22 | ; The global-timeout sets aiohttp total timeout for its coroutines
23 | ; This is set incredibly high by default as aiohttp coroutines need to be
24 | ; equipped to handle mirroring large PyPI packages on slow connections.
25 | global-timeout = 1800
26 |
27 | ; Number of worker threads to use for parallel downloads.
28 | ; Recommendations for worker thread setting:
29 | ; - leave the default of 3 to avoid overloading the pypi master
30 | ; - official servers located in data centers could run 10 workers
31 | ; - anything beyond 10 is probably unreasonable and avoided by bandersnatch
32 | workers = 3
33 |
34 | ; Whether to hash package indexes
35 | ; Note that package index directory hashing is incompatible with pip, and so
36 | ; this should only be used in an environment where it is behind an application
37 | ; that can translate URIs to filesystem locations. For example, with the
38 | ; following Apache RewriteRule:
39 | ; RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
40 | ; RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
41 | ; OR
42 | ; following nginx rewrite rules:
43 | ; rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
44 | ; rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
45 | ; Setting this to true would put the package 'abc' index in simple/a/abc.
46 | ; Recommended setting: the default of false for full pip/pypi compatibility.
47 | hash-index = false
48 |
49 | ; Whether to stop a sync quickly after an error is found or whether to continue
50 | ; syncing but not marking the sync as successful. Value should be "true" or
51 | ; "false".
52 | stop-on-error = false
53 |
54 | ; The storage backend that will be used to save data and metadata while
55 | ; mirroring packages. By default, use the filesystem backend. Other options
56 | ; currently include: 'swift'
57 | storage-backend = filesystem
58 |
59 | ; Advanced logging configuration. Uncomment and set to the location of a
60 | ; python logging format logging config file.
61 | ; log-config = /etc/bandersnatch-log.conf
62 |
63 | ; Generate index pages with absolute urls rather than relative links. This is
64 | ; generally not necessary, but was added for the official internal PyPI mirror,
65 | ; which requires serving packages from https://files.pythonhosted.org
66 | ; root_uri = https://example.com
67 |
68 | ; Number of consumers which verify metadata
69 | verifiers = 3
70 |
71 | ; Number of prior simple index.html to store. Used as a safeguard against
72 | ; upstream changes generating blank index.html files. Prior versions are
73 | ; stored as "versions/index_<serial>_<timestamp>.html" and the current
74 | ; index.html will be a symlink to the latest version.
75 | ; If set to 0 no prior versions are stored and index.html is the latest version.
76 | ; If unset defaults to 0.
77 | ; keep_index_versions = 0
78 |
79 | ; vim: set ft=cfg:
80 |
81 | ; Configure a file to write out the list of files downloaded during the mirror.
82 | ; This is useful for situations when mirroring to offline systems where a process
83 | ; is required to only sync new files to the upstream mirror.
84 | ; The file will be named as set in diff-file, and overwritten unless the
85 | ; diff-append-epoch setting is set to true. If this is true, the epoch date will
86 | ; be appended to the filename (i.e. /path/to/diff-1568129735)
87 | ; diff-file = /srv/pypi/mirrored-files
88 | ; diff-append-epoch = true
89 |
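The `hash-index` comment above describes where a package's simple index ends up in each layout. A minimal sketch of that mapping follows, assuming the layout is exactly as the comment states; `simple_index_dir` is a hypothetical helper written for illustration and is not part of bandersnatch.

```python
from pathlib import PurePosixPath


def simple_index_dir(package: str, hash_index: bool = False) -> PurePosixPath:
    """Return the relative directory holding a package's simple index.

    Hypothetical helper for illustration only; not a bandersnatch API.
    """
    base = PurePosixPath("simple")
    if hash_index:
        # Hashed layout: the first character of the name becomes a subdirectory.
        return base / package[0] / package
    # Default layout, which pip can consume directly.
    return base / package


print(simple_index_dir("abc"))                   # simple/abc
print(simple_index_dir("abc", hash_index=True))  # simple/a/abc
```

The rewrite rules quoted in the comment perform the reverse translation on the web server side, which is why the hashed layout needs a front end that can map request URIs to these directories.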
--------------------------------------------------------------------------------
/src/bandersnatch/unittest.conf:
--------------------------------------------------------------------------------
1 | [mirror]
2 | ; The directory where the mirror data will be stored.
3 | directory = /srv/pypi
4 | ; Save JSON metadata into the web tree:
5 | ; URL/pypi/PKG_NAME/json (Symlink) -> URL/json/PKG_NAME
6 | json = false
7 |
8 | ; Cleanup legacy non PEP 503 normalized named simple directories
9 | cleanup = false
10 |
11 | ; The PyPI server which will be mirrored.
12 | ; master = https://test.python.org
13 | ; scheme for PyPI server MUST be https
14 | master = https://pypi.org
15 |
16 | ; The network socket timeout to use for all connections. This is set to a
17 | ; somewhat aggressively low value: it is better to fail quickly and re-run
18 | ; the client soon than to have a process hang indefinitely while TCP fails
19 | ; to recover for ages.
20 | timeout = 10
21 | global-timeout = 18000
22 |
23 | ; Number of worker threads to use for parallel downloads.
24 | ; Recommendations for worker thread setting:
25 | ; - leave the default of 3 to avoid overloading the pypi master
26 | ; - official servers located in data centers could run 10 workers
27 | ; - anything beyond 10 is probably unreasonable and avoided by bandersnatch
28 | workers = 3
29 |
30 | ; Whether to hash package indexes
31 | ; Note that package index directory hashing is incompatible with pip, and so
32 | ; this should only be used in an environment where it is behind an application
33 | ; that can translate URIs to filesystem locations. For example, with the
34 | ; following Apache RewriteRule:
35 | ; RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
36 | ; RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
37 | ; OR
38 | ; following nginx rewrite rules:
39 | ; rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
40 | ; rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
41 | ; Setting this to true would put the package 'abc' index in simple/a/abc.
42 | ; Recommended setting: the default of false for full pip/pypi compatibility.
43 | hash-index = false
44 |
45 | ; Whether to stop a sync quickly after an error is found or whether to continue
46 | ; syncing but not marking the sync as successful. Value should be "true" or
47 | ; "false".
48 | stop-on-error = false
49 |
50 | storage-backend = filesystem
51 | ; Advanced logging configuration. Uncomment and set to the location of a
52 | ; python logging format logging config file.
53 | ; log-config = /etc/bandersnatch-log.conf
54 |
55 | ; Generate index pages with absolute urls rather than relative links. This is
56 | ; generally not necessary, but was added for the official internal PyPI mirror,
57 | ; which requires serving packages from https://files.pythonhosted.org
58 | ; root_uri = https://example.com
59 |
60 | ; Number of consumers which verify metadata
61 | verifiers = 3
62 |
63 | ; Number of prior simple index.html to store. Used as a safeguard against
64 | ; upstream changes generating blank index.html files. Prior versions are
65 | ; stored as "versions/index_<serial>_<timestamp>.html" and the current
66 | ; index.html will be a symlink to the latest version.
67 | ; If set to 0 no prior versions are stored and index.html is the latest version.
68 | ; If unset defaults to 0.
69 | ; keep_index_versions = 0
70 |
71 | ; Configure a file to write out the list of files downloaded during the mirror.
72 | ; This is useful for situations when mirroring to offline systems where a process
73 | ; is required to only sync new files to the upstream mirror.
74 | ; The file will be named as set in diff-file, and overwritten unless the
75 | ; diff-append-epoch setting is set to true. If this is true, the epoch date will
76 | ; be appended to the filename (i.e. /path/to/diff-1568129735)
77 | ; You can also indicate a section and key of the config file that is storing the
78 | ; directory to keep the diff file, separated by an _ character inside double
79 | ; curly braces, e.g. {{mirror_directory}}
80 | diff-file = {{mirror_directory}}/mirrored-files
81 | diff-append-epoch = false
82 |
83 | ; Enable filtering plugins
84 | [plugins]
85 | ; Enable all or specific plugins - e.g. allowlist_project
86 | enabled = all
87 |
88 | [blocklist]
89 | ; List of PyPI packages not to sync - Useful if malicious packages are mirrored
90 | packages =
91 | example1
92 | example2
93 |
94 | ; vim: set ft=cfg:
95 |
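The `diff-file` comment above describes a `{{section_key}}` placeholder that points at another config value. The sketch below shows one way such a placeholder could be expanded with the standard library. It is an assumption-laden illustration: `PLACEHOLDER` and `resolve_diff_file` are hypothetical names, and the real bandersnatch parsing may differ.

```python
import re
from configparser import ConfigParser

# Trimmed-down config carrying only the keys needed for the example.
SAMPLE = """\
[mirror]
directory = /srv/pypi
diff-file = {{mirror_directory}}/mirrored-files
"""

# Assumed placeholder grammar: {{<section>_<key>}}, split at the first underscore.
PLACEHOLDER = re.compile(r"\{\{\s*([a-z]+)_([a-z_-]+)\s*\}\}")


def resolve_diff_file(config: ConfigParser) -> str:
    """Expand a {{section_key}} reference in mirror.diff-file (hypothetical helper)."""
    raw = config.get("mirror", "diff-file")

    def substitute(match: "re.Match[str]") -> str:
        section, key = match.group(1), match.group(2)
        return config.get(section, key)

    return PLACEHOLDER.sub(substitute, raw)


config = ConfigParser()
config.read_string(SAMPLE)
print(resolve_diff_file(config))  # /srv/pypi/mirrored-files
```

Splitting at the first underscore keeps the sketch short, but it would misread section names that themselves contain an underscore.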
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import os.path
3 | import re
4 | from pathlib import Path
5 | from tempfile import NamedTemporaryFile, TemporaryDirectory, gettempdir
6 | from typing import Set
7 |
8 | import aiohttp
9 | import pytest
10 | from _pytest.monkeypatch import MonkeyPatch
11 |
12 | from bandersnatch.utils import ( # isort:skip
13 | bandersnatch_safe_name,
14 | convert_url_to_path,
15 | hash,
16 | recursive_find_files,
17 | rewrite,
18 | unlink_parent_dir,
19 | user_agent,
20 | WINDOWS,
21 | )
22 |
23 |
24 | def test_convert_url_to_path() -> None:
25 | assert (
26 | "packages/8f/1a/1aa000db9c5a799b676227e845d2b64fe725328e05e3d3b30036f"
27 | + "50eb316/peerme-1.0.0-py36-none-any.whl"
28 | == convert_url_to_path(
29 | "https://files.pythonhosted.org/packages/8f/1a/1aa000db9c5a799b67"
30 | + "6227e845d2b64fe725328e05e3d3b30036f50eb316/"
31 | + "peerme-1.0.0-py36-none-any.whl"
32 | )
33 | )
34 |
35 |
36 | def test_hash() -> None:
37 | expected_md5 = "b2855c4a4340dad73d9d870630390885"
38 | expected_sha256 = "a2a5e3823bf4cccfaad4e2f0fbabe72bc8c3cf78bc51eb396b5c7af99e17f07a"
39 | with NamedTemporaryFile(delete=False) as ntf:
40 | ntf_path = Path(ntf.name)
41 | ntf.close()
42 | try:
43 | with ntf_path.open("w") as ntfp:
44 | ntfp.write("Unittest File for hashing Fun!")
45 |
46 | assert hash(ntf_path, function="md5") == expected_md5
47 | assert hash(ntf_path, function="sha256") == expected_sha256
48 | assert hash(ntf_path) == expected_sha256
49 | finally:
50 | if ntf_path.exists():
51 | ntf_path.unlink()
52 |
53 |
54 | def test_find_files() -> None:
55 | with TemporaryDirectory() as td:
56 | td_path = Path(td)
57 | td_sub_path = td_path / "aDir"
58 | td_sub_path.mkdir()
59 |
60 | expected_found_files = {td_path / "file1", td_sub_path / "file2"}
61 | for afile in expected_found_files:
62 | with afile.open("w") as afp:
63 | afp.write("PyPA ftw!")
64 |
65 | found_files: Set[Path] = set()
66 | recursive_find_files(found_files, td_path)
67 | assert found_files == expected_found_files
68 |
69 |
70 | def test_rewrite(tmpdir: Path, monkeypatch: MonkeyPatch) -> None:
71 | monkeypatch.chdir(tmpdir)
72 | with open("sample", "w") as f:
73 | f.write("bsdf")
74 | with rewrite("sample") as f: # type: ignore
75 | f.write("csdf")
76 | assert open("sample").read() == "csdf"
77 | mode = os.stat("sample").st_mode
78 | # chmod doesn't work on windows machines. Permissions are pinned at 666
79 | if not WINDOWS:
80 | assert oct(mode) == "0o100644"
81 |
82 |
83 | def test_rewrite_fails(tmpdir: Path, monkeypatch: MonkeyPatch) -> None:
84 | monkeypatch.chdir(tmpdir)
85 | with open("sample", "w") as f:
86 | f.write("bsdf")
87 | with pytest.raises(Exception):
88 | with rewrite("sample") as f: # type: ignore
89 | f.write("csdf")
90 | raise Exception()
91 | assert open("sample").read() == "bsdf" # type: ignore
92 |
93 |
94 | def test_rewrite_nonexisting_file(tmpdir: Path, monkeypatch: MonkeyPatch) -> None:
95 | monkeypatch.chdir(tmpdir)
96 | with rewrite("sample", "w") as f:
97 | f.write("csdf")
98 | with open("sample") as f:
99 | assert f.read() == "csdf"
100 |
101 |
102 | def test_unlink_parent_dir() -> None:
103 | adir = Path(gettempdir()) / f"tb.{os.getpid()}"
104 | adir.mkdir()
105 | afile = adir / "file1"
106 | afile.touch()
107 | unlink_parent_dir(afile)
108 | assert not adir.exists()
109 |
110 |
111 | def test_user_agent() -> None:
112 | assert re.match(
113 | r"bandersnatch/[0-9]\.[0-9]\.[0-9]\.?d?e?v?[0-9]? \(.*\) "
114 | + fr"\(aiohttp {aiohttp.__version__}\)",
115 | user_agent(),
116 | )
117 |
118 |
119 | def test_bandersnatch_safe_name() -> None:
120 | bad_name = "Flake_8_Fake"
121 | assert "flake-8-fake" == bandersnatch_safe_name(bad_name)
122 |
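The normalization exercised by `test_bandersnatch_safe_name` can be tried in isolation. The sketch below copies the regular expression from `bandersnatch.utils` into a standalone function (named `safe_name` here only for the example) so it runs without bandersnatch installed.

```python
import re

# Same pattern as SAFE_NAME_REGEX in bandersnatch.utils.
SAFE_NAME_REGEX = re.compile(r"[^A-Za-z0-9.]+")


def safe_name(name: str) -> str:
    """Replace each run of characters outside [A-Za-z0-9.] with '-' and lower-case."""
    return SAFE_NAME_REGEX.sub("-", name).lower()


for raw in ("Flake_8_Fake", "zope.interface", "My__Odd  Name"):
    print(f"{raw!r} -> {safe_name(raw)!r}")
# 'Flake_8_Fake' -> 'flake-8-fake'
# 'zope.interface' -> 'zope.interface'
# 'My__Odd  Name' -> 'my-odd-name'
```

Dots survive this normalization, whereas `packaging.utils.canonicalize_name` (PEP 503) also collapses them into `-`, so the two functions can return different names for the same input.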
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | [metadata]
2 | author = Christian Theune
3 | author_email = ct@flyingcircus.io
4 | classifiers =
5 | Programming Language :: Python :: 3 :: Only
6 | Programming Language :: Python :: 3.6
7 | Programming Language :: Python :: 3.7
8 | Programming Language :: Python :: 3.8
9 | description = Mirroring tool that implements the client (mirror) side of PEP 381
10 | long_description = file:README.md
11 | long_description_content_type = text/markdown
12 | license = Academic Free License, version 3
13 | license_file = LICENSE
14 | name = bandersnatch
15 | project_urls =
16 | Source Code = https://github.com/pypa/bandersnatch
17 | Change Log = https://github.com/pypa/bandersnatch/CHANGES.md
18 | url = https://github.com/pypa/bandersnatch/
19 | version = 4.1.1
20 |
21 | [options]
22 | install_requires =
23 | aiohttp-xmlrpc
24 | aiohttp
25 | filelock
26 | importlib_resources; python_version < '3.7'
27 | packaging
28 | setuptools>40.0.0
29 | package_dir =
30 | =src
31 | packages = find:
32 | python_requires = >=3.6
33 |
34 | [options.packages.find]
35 | where=src
36 |
37 | [options.package_data]
38 | bandersnatch = *.conf
39 |
40 | [options.entry_points]
41 | bandersnatch_storage_plugins.v1.backend =
42 | swift_plugin = bandersnatch_storage_plugins.swift:SwiftStorage
43 | filesystem_plugin = bandersnatch_storage_plugins.filesystem:FilesystemStorage
44 |
45 | # This entrypoint group must match the value of bandersnatch.filter.PROJECT_PLUGIN_RESOURCE
46 | bandersnatch_filter_plugins.v2.project =
47 | blacklist_project = bandersnatch_filter_plugins.blocklist_name:BlockListProject
48 | whitelist_project = bandersnatch_filter_plugins.allowlist_name:AllowListProject
49 | regex_project = bandersnatch_filter_plugins.regex_name:RegexProjectFilter
50 |
51 | # This entrypoint group must match the value of bandersnatch.filter.METADATA_PLUGIN_RESOURCE
52 | bandersnatch_filter_plugins.v2.metadata =
53 | regex_project_metadata = bandersnatch_filter_plugins.metadata_filter:RegexProjectMetadataFilter
54 |
55 | # This entrypoint group must match the value of bandersnatch.filter.RELEASE_PLUGIN_RESOURCE
56 | bandersnatch_filter_plugins.v2.release =
57 | blacklist_release = bandersnatch_filter_plugins.blocklist_name:BlockListRelease
58 | prerelease_release = bandersnatch_filter_plugins.prerelease_name:PreReleaseFilter
59 | regex_release = bandersnatch_filter_plugins.regex_name:RegexReleaseFilter
60 | latest_release = bandersnatch_filter_plugins.latest_name:LatestReleaseFilter
61 | allowlist_release = bandersnatch_filter_plugins.allowlist_name:AllowListRelease
62 |
63 | # This entrypoint group must match the value of bandersnatch.filter.RELEASE_FILE_PLUGIN_RESOURCE
64 | bandersnatch_filter_plugins.v2.release_file =
65 | regex_release_file_metadata = bandersnatch_filter_plugins.metadata_filter:RegexReleaseFileMetadataFilter
66 | version_range_release_file_metadata = bandersnatch_filter_plugins.metadata_filter:VersionRangeReleaseFileMetadataFilter
67 | exclude_platform = bandersnatch_filter_plugins.filename_name:ExcludePlatformFilter
68 |
69 | console_scripts =
70 | bandersnatch = bandersnatch.main:main
71 |
72 | [options.extras_require]
73 | safety_db =
74 | bandersnatch_safety_db
75 |
76 | test =
77 | coverage
78 | freezegun
79 | flake8
80 | flake8-bugbear
81 | pytest
82 | pytest-timeout
83 | pytest-cache
84 |
85 | doc_build =
86 | docutils
87 | sphinx
88 | sphinx_bootstrap_theme
89 | guzzle_sphinx_theme
90 | sphinx_rtd_theme
91 | recommonmark
92 | # git+https://github.com/pypa/pypa-docs-theme.git#egg=pypa-docs-theme
93 | # git+https://github.com/python/python-docs-theme.git#egg=python-docs-theme
94 |
95 | swift =
96 | keystoneauth1
97 | openstackclient
98 | python-swiftclient
99 |
100 | [isort]
101 | atomic = true
102 | not_skip = __init__.py
103 | line_length = 88
104 | multi_line_output = 3
105 | known_third_party = _pytest,aiohttp,aiohttp_xmlrpc,asynctest,filelock,freezegun,keystoneauth1,mock_config,packaging,pkg_resources,pytest,setuptools,swiftclient
106 | known_first_party = bandersnatch,bandersnatch_filter_plugins,bandersnatch_storage_plugins
107 | force_grid_wrap = 0
108 | use_parentheses=True
109 | include_trailing_comma = True
110 |
--------------------------------------------------------------------------------
/test_runner.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | """
4 | bandersnatch CI run script - Will either drive `tox` or run an Integration Test
5 | - Rewritten in Python for easier dev contributions + Windows support
6 |
7 | Integration Tests will go off and hit PyPI + pull whitelisted packages
8 | then check for expected outputs to exist
9 | """
10 |
11 | from configparser import ConfigParser
12 | from os import environ
13 | from pathlib import Path
14 | from shutil import rmtree, which
15 | from subprocess import run
16 | from sys import exit
17 | from tempfile import gettempdir
18 |
19 | from src.bandersnatch.utils import hash
20 |
21 | BANDERSNATCH_EXE = Path(
22 | which("bandersnatch") or which("bandersnatch.exe") or "bandersnatch"
23 | )
24 | CI_CONFIG = Path("src/bandersnatch/tests/ci.conf")
25 | EOP = "[CI ERROR]:"
26 | MIRROR_ROOT = Path(f"{gettempdir()}/pypi")
27 | MIRROR_BASE = MIRROR_ROOT / "web"
28 | TGZ_SHA256 = "bc9430dae93f8bc53728773545cbb646a6b5327f98de31bdd6e1a2b2c6e805a9"
29 | TOX_EXE = Path(which("tox") or "tox")
30 |
31 | # Make Global so we can check exists before delete
32 | A_BLACK_WHL = (
33 | MIRROR_BASE
34 | / "packages"
35 | / "30"
36 | / "62"
37 | / "cf549544a5fe990bbaeca21e9c419501b2de7a701ab0afb377bc81676600"
38 | / "black-19.3b0-py36-none-any.whl"
39 | )
40 |
41 |
42 | def check_ci() -> int:
43 | black_index = MIRROR_BASE / "simple/b/black/index.html"
44 | peerme_index = MIRROR_BASE / "simple/p/peerme/index.html"
45 | peerme_json = MIRROR_BASE / "json/peerme"
46 | peerme_tgz = (
47 | MIRROR_BASE
48 | / "packages"
49 | / "8f"
50 | / "1a"
51 | / "1aa000db9c5a799b676227e845d2b64fe725328e05e3d3b30036f50eb316"
52 | / "peerme-1.0.0-py36-none-any.whl"
53 | )
54 |
55 | if not peerme_index.exists():
56 | print(f"{EOP} No peerme simple API index exists @ {peerme_index}")
57 | return 69
58 |
59 | if not peerme_json.exists():
60 | print(f"{EOP} No peerme JSON API file exists @ {peerme_json}")
61 | return 70
62 |
63 | if not peerme_tgz.exists():
64 | print(f"{EOP} No peerme tgz file exists @ {peerme_tgz}")
65 | return 71
66 |
67 | peerme_tgz_sha256 = hash(str(peerme_tgz))
68 | if peerme_tgz_sha256 != TGZ_SHA256:
69 | print(f"{EOP} Bad peerme 1.0.0 sha256: {peerme_tgz_sha256} != {TGZ_SHA256}")
70 | return 72
71 |
72 | if black_index.exists():
73 | print(f"{EOP} {black_index} exists ... delete failed?")
74 | return 73
75 |
76 | if A_BLACK_WHL.exists():
77 | print(f"{EOP} {A_BLACK_WHL} exists ... delete failed?")
78 | return 74
79 |
80 | rmtree(MIRROR_ROOT)
81 |
82 | print("Bandersnatch PyPI CI finished successfully!")
83 | return 0
84 |
85 |
86 | def do_ci(conf: Path) -> int:
87 | if not conf.exists():
88 | print(f"CI config {conf} does not exist for bandersnatch run")
89 | return 2
90 |
91 | print("Starting CI bandersnatch mirror ...")
92 | cmds = (str(BANDERSNATCH_EXE), "--config", str(conf), "--debug", "mirror")
93 | print(f"bandersnatch cmd: {' '.join(cmds)}")
94 | run(cmds, check=True)
95 |
96 | print(f"Checking if {A_BLACK_WHL} exists")
97 | if not A_BLACK_WHL.exists():
98 | print(f"{EOP} {A_BLACK_WHL} does not exist after mirroring ...")
99 | return 68
100 |
101 | print("Starting to delete black from mirror ...")
102 | del_cmds = (
103 | str(BANDERSNATCH_EXE),
104 | "--config",
105 | str(conf),
106 | "--debug",
107 | "delete",
108 | "black",
109 | )
110 | print(f"bandersnatch delete cmd: {' '.join(del_cmds)}")
111 | run(del_cmds, check=True)
112 |
113 | return check_ci()
114 |
115 |
116 | def platform_config() -> Path:
117 | """Ensure the CI_CONFIG is correct for the platform we're running on"""
118 | platform_ci_conf = MIRROR_ROOT / "ci.conf"
119 | cp = ConfigParser()
120 | cp.read(str(CI_CONFIG))
121 |
122 | print(f"Setting CI directory={MIRROR_ROOT}")
123 | cp["mirror"]["directory"] = str(MIRROR_ROOT)
124 |
125 | with platform_ci_conf.open("w") as pccfp:
126 | cp.write(pccfp)
127 |
128 | return platform_ci_conf
129 |
130 |
131 | def main() -> int:
132 | if "TOXENV" not in environ:
133 | print("No TOXENV set. Exiting!")
134 | return 1
135 |
136 | if environ["TOXENV"] != "INTEGRATION":
137 | return run((str(TOX_EXE),)).returncode
138 | else:
139 | print("Running Integration tests due to TOXENV set to INTEGRATION")
140 | MIRROR_ROOT.mkdir(exist_ok=True)
141 | return do_ci(platform_config())
142 |
143 |
144 | if __name__ == "__main__":
145 | exit(main())
146 |
--------------------------------------------------------------------------------
/src/bandersnatch/utils.py:
--------------------------------------------------------------------------------
1 | import contextlib
2 | import hashlib
3 | import logging
4 | import os
5 | import os.path
6 | import platform
7 | import re
8 | import shutil
9 | import sys
10 | import tempfile
11 | from datetime import datetime
12 | from pathlib import Path
13 | from typing import IO, Any, Generator, List, Set, Union
14 | from urllib.parse import urlparse
15 |
16 | import aiohttp
17 |
18 | from . import __version__
19 |
20 | logger = logging.getLogger(__name__)
21 |
22 |
23 | def user_agent() -> str:
24 | template = "bandersnatch/{version} ({python}, {system})"
25 | template += f" (aiohttp {aiohttp.__version__})"
26 | version = __version__
27 | python = sys.implementation.name
28 | python += " {}.{}.{}-{}{}".format(*sys.version_info)
29 | uname = platform.uname()
30 | system = " ".join([uname.system, uname.machine])
31 | return template.format(**locals())
32 |
33 |
34 | SAFE_NAME_REGEX = re.compile(r"[^A-Za-z0-9.]+")
35 | USER_AGENT = user_agent()
36 | WINDOWS = bool(platform.system() == "Windows")
37 |
38 |
39 | def make_time_stamp() -> str:
40 | """Helper function that returns a timestamp suitable for use
41 | in a filename on any OS"""
42 | return f"{datetime.utcnow().isoformat()}Z".replace(":", "")
43 |
44 |
45 | def convert_url_to_path(url: str) -> str:
46 | return urlparse(url).path[1:]
47 |
48 |
49 | def hash(path: Path, function: str = "sha256") -> str:
50 | h = getattr(hashlib, function)()
51 | with open(path, "rb") as f:
52 | while True:
53 | chunk = f.read(128 * 1024)
54 | if not chunk:
55 | break
56 | h.update(chunk)
57 | return str(h.hexdigest())
58 |
59 |
60 | def find(root: Union[Path, str], dirs: bool = True) -> str:
61 | """A test helper simulating 'find'.
62 |
63 | Iterates over directories and filenames, given as relative paths to the
64 | root.
65 |
66 | """
67 | # TODO: account for alternative backends
68 | if isinstance(root, str):
69 | root = Path(root)
70 |
71 | results: List[Path] = []
72 | for dirpath, dirnames, filenames in os.walk(root):
73 | names = filenames
74 | if dirs:
75 | names += dirnames
76 | for name in names:
77 | results.append(Path(dirpath) / name)
78 | results.sort()
79 | return "\n".join(str(result.relative_to(root)) for result in results)
80 |
81 |
82 | @contextlib.contextmanager
83 | def rewrite(
84 | filepath: Union[str, Path], mode: str = "w", **kw: Any
85 | ) -> Generator[IO, None, None]:
86 | """Rewrite an existing file atomically so that programs running in
87 | parallel do not hit race conditions while reading."""
88 | # TODO: Account for alternative backends
89 | if isinstance(filepath, str):
90 | base_dir = os.path.dirname(filepath)
91 | filename = os.path.basename(filepath)
92 | else:
93 | base_dir = str(filepath.parent)
94 | filename = filepath.name
95 |
96 | # Use a naming format that is friendlier to distributed POSIX
97 | # filesystems like GlusterFS that hash based on the filename.
98 | # GlusterFS ignores a leading '.' in filenames, which avoids rehashing.
99 | with tempfile.NamedTemporaryFile(
100 | mode=mode, prefix=f".{filename}.", delete=False, dir=base_dir, **kw
101 | ) as f:
102 | filepath_tmp = f.name
103 | yield f
104 |
105 | if not os.path.exists(filepath_tmp):
106 | # Allow callers to remove the temporary file when they do not want it
107 | # put in place but also do not want to raise an error.
108 | return
109 | os.chmod(filepath_tmp, 0o100644)
110 | shutil.move(filepath_tmp, filepath)
111 |
112 |
113 | def recursive_find_files(files: Set[Path], base_dir: Path) -> None:
114 | dirs = [d for d in base_dir.iterdir() if d.is_dir()]
115 | files.update([x for x in base_dir.iterdir() if x.is_file()])
116 | for directory in dirs:
117 | recursive_find_files(files, directory)
118 |
119 |
120 | def unlink_parent_dir(path: Path) -> None:
121 | """Remove a file and, if its parent directory is then empty, remove that too"""
122 | logger.info(f"unlink {str(path)}")
123 | path.unlink()
124 |
125 | parent_path = path.parent
126 | try:
127 | parent_path.rmdir()
128 | logger.info(f"rmdir {str(parent_path)}")
129 | except OSError as oe:
130 | logger.debug(f"Did not remove {str(parent_path)}: {str(oe)}")
131 |
132 |
133 | def bandersnatch_safe_name(name: str) -> str:
134 | """Convert an arbitrary string to a standard distribution name
135 | Any runs of non-alphanumeric/. characters are replaced with a single '-'.
136 |
137 | - This was copied from `pkg_resources` (part of `setuptools`)
138 |
139 | bandersnatch also lower cases the returned name
140 | """
141 | return SAFE_NAME_REGEX.sub("-", name).lower()
142 |
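The write-to-a-temporary-file-then-move pattern used by `rewrite()` above can be sketched on its own. The version below is a simplified variant rather than the bandersnatch implementation: `atomic_write` is a hypothetical name, it uses `os.replace` instead of `shutil.move`, it deletes the temporary file itself when the body raises, and it skips the `chmod` step.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator


@contextmanager
def atomic_write(target: Path) -> Iterator[IO[str]]:
    """Hypothetical simplified variant of rewrite(); not bandersnatch code."""
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    handle = tempfile.NamedTemporaryFile(
        mode="w", prefix=f".{target.name}.", dir=str(target.parent), delete=False
    )
    tmp_path = Path(handle.name)
    try:
        with handle:
            yield handle
    except BaseException:
        # The body failed: discard the partial file and keep the original.
        tmp_path.unlink()
        raise
    # Readers see either the old content or the new content, never a mix.
    os.replace(tmp_path, target)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample"
    sample.write_text("bsdf")
    with atomic_write(sample) as f:
        f.write("csdf")
    print(sample.read_text())  # csdf
```

Because the sketch omits the `chmod`, the replaced file keeps the restrictive permissions that `NamedTemporaryFile` assigns, which is one reason the original sets the mode explicitly before moving the file into place.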
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_latest_release.py:
--------------------------------------------------------------------------------
1 | import os
2 | from pathlib import Path
3 | from tempfile import TemporaryDirectory
4 | from unittest import TestCase
5 |
6 | from mock_config import mock_config
7 |
8 | import bandersnatch.filter
9 | from bandersnatch.master import Master
10 | from bandersnatch.mirror import BandersnatchMirror
11 | from bandersnatch.package import Package
12 | from bandersnatch_filter_plugins import latest_name
13 |
14 |
15 | class BasePluginTestCase(TestCase):
16 |
17 | tempdir = None
18 | cwd = None
19 |
20 | def setUp(self) -> None:
21 | self.cwd = os.getcwd()
22 | self.tempdir = TemporaryDirectory()
23 | os.chdir(self.tempdir.name)
24 |
25 | def tearDown(self) -> None:
26 | if self.tempdir:
27 | assert self.cwd
28 | os.chdir(self.cwd)
29 | self.tempdir.cleanup()
30 | self.tempdir = None
31 |
32 |
33 | class TestLatestReleaseFilter(BasePluginTestCase):
34 |
35 | config_contents = """\
36 | [plugins]
37 | enabled =
38 | latest_release
39 |
40 | [latest_release]
41 | keep = 2
42 | """
43 |
44 | def test_plugin_compiles_patterns(self) -> None:
45 | mock_config(self.config_contents)
46 |
47 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
48 |
49 | assert any(
50 | type(plugin) == latest_name.LatestReleaseFilter for plugin in plugins
51 | )
52 | plugin = next(
53 | plugin
54 | for plugin in plugins
55 | if isinstance(plugin, latest_name.LatestReleaseFilter)
56 | )
57 | assert plugin.keep == 2
58 |
59 | def test_latest_releases_keep_latest(self) -> None:
60 | mock_config(self.config_contents)
61 |
62 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
63 | pkg = Package("foo", 1)
64 | pkg._metadata = {
65 | "info": {"name": "foo", "version": "2.0.0"},
66 | "releases": {
67 | "1.0.0": {},
68 | "1.1.0": {},
69 | "1.1.1": {},
70 | "1.1.2": {},
71 | "1.1.3": {},
72 | "2.0.0": {},
73 | },
74 | }
75 |
76 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
77 |
78 | assert pkg.releases == {"1.1.3": {}, "2.0.0": {}}
79 |
80 | def test_latest_releases_keep_stable(self) -> None:
81 | mock_config(self.config_contents)
82 |
83 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
84 | pkg = Package("foo", 1)
85 | pkg._metadata = {
86 | "info": {"name": "foo", "version": "2.0.0"}, # stable version
87 | "releases": {
88 | "1.0.0": {},
89 | "1.1.0": {},
90 | "1.1.1": {},
91 | "1.1.2": {},
92 | "1.1.3": {},
93 | "2.0.0": {}, # <= stable version, keep it
94 | "2.0.1b1": {},
95 | "2.0.1b2": {}, # <= most recent, keep it
96 | },
97 | }
98 |
99 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
100 |
101 | assert pkg.releases == {"2.0.1b2": {}, "2.0.0": {}}
102 |
103 |
104 | class TestLatestReleaseFilterUninitialized(BasePluginTestCase):
105 |
106 | config_contents = """\
107 | [plugins]
108 | enabled =
109 | latest_release
110 | """
111 |
112 | def test_plugin_compiles_patterns(self) -> None:
113 | mock_config(self.config_contents)
114 |
115 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
116 |
117 | assert any(
118 | type(plugin) == latest_name.LatestReleaseFilter for plugin in plugins
119 | )
120 | plugin = next(
121 | plugin
122 | for plugin in plugins
123 | if isinstance(plugin, latest_name.LatestReleaseFilter)
124 | )
125 | assert plugin.keep == 0
126 |
127 | def test_latest_releases_uninitialized(self) -> None:
128 | mock_config(self.config_contents)
129 |
130 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
131 | pkg = Package("foo", 1)
132 | pkg._metadata = {
133 | "info": {"name": "foo", "version": "2.0.0"},
134 | "releases": {
135 | "1.0.0": {},
136 | "1.1.0": {},
137 | "1.1.1": {},
138 | "1.1.2": {},
139 | "1.1.3": {},
140 | "2.0.0": {},
141 | },
142 | }
143 |
144 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
145 |
146 | assert pkg.releases == {
147 | "1.0.0": {},
148 | "1.1.0": {},
149 | "1.1.1": {},
150 | "1.1.2": {},
151 | "1.1.3": {},
152 | "2.0.0": {},
153 | }
154 |
--------------------------------------------------------------------------------
/src/bandersnatch/delete.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 |
3 | import asyncio
4 | import concurrent.futures
5 | import logging
6 | from argparse import Namespace
7 | from configparser import ConfigParser
8 | from json import JSONDecodeError, load
9 | from pathlib import Path
10 | from typing import Awaitable, List
11 | from urllib.parse import urlparse
12 |
13 | from packaging.utils import canonicalize_name
14 |
15 | from .master import Master
16 | from .storage import storage_backend_plugins
17 | from .verify import get_latest_json
18 |
19 | logger = logging.getLogger(__name__)
20 |
21 |
22 | def delete_path(blob_path: Path, dry_run: bool = False) -> int:
23 | storage_backend = next(iter(storage_backend_plugins()))
24 | if dry_run:
25 | logger.info(f" rm {blob_path}")
26 | if not storage_backend.exists(blob_path):
27 | logger.debug(f"{blob_path} does not exist. Skipping")
28 | return 0
29 | try:
30 | storage_backend.delete(blob_path, dry_run=dry_run)
31 | except FileNotFoundError:
32 | # Due to using threads in executors we sometimes have a
33 | # race condition if canonicalize_name == passed in name
34 | pass
35 | except OSError:
36 | logger.exception(f"Unable to delete {blob_path}")
37 | return 1
38 | return 0
39 |
40 |
41 | async def delete_packages(config: ConfigParser, args: Namespace, master: Master) -> int:
42 | loop = asyncio.get_event_loop()
43 | workers = args.workers or config.getint("mirror", "workers")
44 | executor = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
45 | storage_backend = next(
46 | iter(storage_backend_plugins(config=config, clear_cache=True))
47 | )
48 | web_base_path = storage_backend.web_base_path
49 | json_base_path = storage_backend.json_base_path
50 | pypi_base_path = storage_backend.pypi_base_path
51 | simple_path = storage_backend.simple_base_path
52 |
53 | delete_coros: List[Awaitable] = []
54 | for package in args.pypi_packages:
55 | canon_name = canonicalize_name(package)
56 | need_nc_paths = canon_name != package
57 | json_full_path = json_base_path / canon_name
58 | json_full_path_nc = json_base_path / package if need_nc_paths else None
59 | legacy_json_path = pypi_base_path / canon_name
60 | logger.debug(f"Looking up {canon_name} metadata @ {json_full_path}")
61 |
62 | if not storage_backend.exists(json_full_path):
63 | if args.dry_run:
64 | logger.error(
65 | f"Skipping {json_full_path} as dry run and no JSON file exists"
66 | )
67 | continue
68 |
69 | logger.error(f"{json_full_path} does not exist. Pulling from PyPI")
70 | await get_latest_json(master, json_full_path, config, executor, False)
71 | if not json_full_path.exists():
72 | logger.error(f"Unable to HTTP get JSON for {json_full_path}")
73 | continue
74 |
75 | with storage_backend.open_file(json_full_path, text=True) as jfp:
76 | try:
77 | package_data = load(jfp)
78 | except JSONDecodeError:
79 | logger.exception(f"Skipping {canon_name} @ {json_full_path}")
80 | continue
81 |
82 | for _release, blobs in package_data["releases"].items():
83 | for blob in blobs:
84 | url_parts = urlparse(blob["url"])
85 | blob_path = web_base_path / url_parts.path[1:]
86 | delete_coros.append(
87 | loop.run_in_executor(executor, delete_path, blob_path, args.dry_run)
88 | )
89 |
90 | # Attempt to delete json, normal simple path + hash simple path
91 | package_simple_path = simple_path / canon_name
92 | package_simple_path_nc = simple_path / package if need_nc_paths else None
93 | package_hash_path = simple_path / canon_name[0] / canon_name
94 | package_hash_path_nc = (
95 | simple_path / canon_name[0] / package if need_nc_paths else None
96 | )
97 | # Try cleanup non canon name if they differ
98 | for package_path in (
99 | json_full_path,
100 | legacy_json_path,
101 | package_simple_path,
102 | package_simple_path_nc,
103 | package_hash_path,
104 | package_hash_path_nc,
105 | json_full_path_nc,
106 | ):
107 | if not package_path:
108 | continue
109 |
110 | delete_coros.append(
111 | loop.run_in_executor(executor, delete_path, package_path, args.dry_run)
112 | )
113 |
114 | if args.dry_run:
115 | logger.info("-- bandersnatch delete DRY RUN --")
116 | if delete_coros:
117 | logger.info(f"Attempting to remove {len(delete_coros)} files")
118 | return sum(await asyncio.gather(*delete_coros))
119 | return 0
120 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/conftest.py:
--------------------------------------------------------------------------------
1 | # flake8: noqa
2 |
3 | import unittest.mock as mock
4 | from pathlib import Path
5 | from typing import TYPE_CHECKING, Any, Dict
6 |
7 | import pytest
8 | from _pytest.capture import CaptureFixture
9 | from _pytest.fixtures import FixtureRequest
10 | from _pytest.monkeypatch import MonkeyPatch
11 | from asynctest import asynctest
12 |
13 | if TYPE_CHECKING:
14 | from bandersnatch.mirror import BandersnatchMirror
15 | from bandersnatch.master import Master
16 | from bandersnatch.package import Package
17 |
18 |
19 | @pytest.fixture(autouse=True)
20 | def stop_std_logging(request: FixtureRequest, capfd: CaptureFixture) -> None:
21 | patcher = mock.patch("bandersnatch.log.setup_logging")
22 | patcher.start()
23 |
24 | def tearDown() -> None:
25 | patcher.stop()
26 |
27 | request.addfinalizer(tearDown)
28 |
29 |
30 | async def _nosleep(*args: Any) -> None:
31 | pass
32 |
33 |
34 | @pytest.fixture(autouse=True)
35 | def never_sleep(request: FixtureRequest) -> None:
36 | patcher = mock.patch("asyncio.sleep", _nosleep)
37 | patcher.start()
38 |
39 | def tearDown() -> None:
40 | patcher.stop()
41 |
42 | request.addfinalizer(tearDown)
43 |
44 |
45 | @pytest.fixture
46 | def package(package_json: dict) -> "Package":
47 | from bandersnatch.package import Package
48 |
49 | pkg = Package(package_json["info"]["name"], serial=11)
50 | pkg._metadata = package_json
51 | return pkg
52 |
53 |
54 | @pytest.fixture
55 | def package_json() -> Dict[str, Any]:
56 | return {
57 | "info": {"name": "Foo", "version": "0.1"},
58 | "last_serial": 654_321,
59 | "releases": {
60 | "0.1": [
61 | {
62 | "url": "https://pypi.example.com/packages/any/f/foo/foo.zip",
63 | "filename": "foo.zip",
64 | "digests": {
65 | "md5": "6bd3ddc295176f4dca196b5eb2c4d858",
66 | "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
67 | },
68 | "md5_digest": "b6bcb391b040c4468262706faf9d3cce",
69 | },
70 | {
71 | "url": "https://pypi.example.com/packages/2.7/f/foo/foo.whl",
72 | "filename": "foo.whl",
73 | "digests": {
74 | "md5": "6bd3ddc295176f4dca196b5eb2c4d858",
75 | "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
76 | },
77 | "md5_digest": "6bd3ddc295176f4dca196b5eb2c4d858",
78 | },
79 | ]
80 | },
81 | }
82 |
83 |
84 | @pytest.fixture
85 | def master(package_json: Dict[str, Any]) -> "Master":
86 | from bandersnatch.master import Master
87 |
88 | class FakeReader:
89 | async def read(self, *args: Any) -> bytes:
90 | return b""
91 |
92 | class FakeAiohttpClient:
93 | headers = {"X-PYPI-LAST-SERIAL": "1"}
94 |
95 | async def __aenter__(self) -> "FakeAiohttpClient":
96 | return self
97 |
98 | async def __aexit__(self, *args: Any) -> None:
99 | pass
100 |
101 | @property
102 | def content(self) -> "FakeReader":
103 | return FakeReader()
104 |
105 | async def json(self, *args: Any) -> Dict[str, Any]:
106 | return package_json
107 |
108 | master = Master("https://pypi.example.com")
109 | master.rpc = mock.Mock() # type: ignore
110 | master.session = asynctest.MagicMock()
111 | master.session.get = asynctest.MagicMock(return_value=FakeAiohttpClient())
112 | master.session.request = asynctest.MagicMock(return_value=FakeAiohttpClient())
113 | return master
114 |
115 |
116 | @pytest.fixture
117 | def mirror(
118 | tmpdir: Path, master: "Master", monkeypatch: MonkeyPatch
119 | ) -> "BandersnatchMirror":
120 | monkeypatch.chdir(tmpdir)
121 | from bandersnatch.mirror import BandersnatchMirror
122 |
123 | return BandersnatchMirror(tmpdir, master)
124 |
125 |
126 | @pytest.fixture
127 | def mirror_hash_index(
128 | tmpdir: Path, master: "Master", monkeypatch: MonkeyPatch
129 | ) -> "BandersnatchMirror":
130 | monkeypatch.chdir(tmpdir)
131 | from bandersnatch.mirror import BandersnatchMirror
132 |
133 | return BandersnatchMirror(tmpdir, master, hash_index=True)
134 |
135 |
136 | @pytest.fixture
137 | def mirror_mock(request: FixtureRequest) -> mock.MagicMock:
138 | patcher = mock.patch("bandersnatch.mirror.BandersnatchMirror")
139 | mirror: mock.MagicMock = patcher.start()
140 |
141 | def tearDown() -> None:
142 | patcher.stop()
143 |
144 | request.addfinalizer(tearDown)
145 | return mirror
146 |
147 |
148 | @pytest.fixture
149 | def logging_mock(request: FixtureRequest) -> mock.MagicMock:
150 | patcher = mock.patch("logging.config.fileConfig")
151 | logger: mock.MagicMock = patcher.start()
152 |
153 | def tearDown() -> None:
154 | patcher.stop()
155 |
156 | request.addfinalizer(tearDown)
157 | return logger
158 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_main.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import configparser
3 | import sys
4 | import tempfile
5 | import unittest.mock as mock
6 | from pathlib import Path
7 | from typing import TYPE_CHECKING, Any, Dict
8 |
9 | import pytest
10 | from _pytest.capture import CaptureFixture
11 | from _pytest.logging import LogCaptureFixture
12 |
13 | import bandersnatch.mirror
14 | import bandersnatch.storage
15 | from bandersnatch.configuration import Singleton
16 | from bandersnatch.main import main
17 |
18 | if TYPE_CHECKING:
19 | from bandersnatch.mirror import BandersnatchMirror
20 |
21 |
22 | async def empty_dict(*args: Any, **kwargs: Any) -> Dict:
23 | return {}
24 |
25 |
26 | def setup() -> None:
27 | """ simple setup function to clear Singleton._instances before each test"""
28 | Singleton._instances = {}
29 |
30 |
31 | def test_main_help(capfd: CaptureFixture) -> None:
32 | sys.argv = ["bandersnatch", "--help"]
33 | with pytest.raises(SystemExit):
34 | main(asyncio.new_event_loop())
35 | out, err = capfd.readouterr()
36 | assert out.startswith("usage: bandersnatch")
37 | assert "" == err
38 |
39 |
40 | def test_main_create_config(caplog: LogCaptureFixture, tmpdir: Path) -> None:
41 | sys.argv = ["bandersnatch", "-c", str(tmpdir / "bandersnatch.conf"), "mirror"]
42 | assert main(asyncio.new_event_loop()) == 1
43 | assert "creating default config" in caplog.text
44 | conf_path = Path(tmpdir) / "bandersnatch.conf"
45 | assert conf_path.exists()
46 |
47 |
48 | def test_main_cant_create_config(caplog: LogCaptureFixture, tmpdir: Path) -> None:
49 | sys.argv = [
50 | "bandersnatch",
51 | "-c",
52 | str(tmpdir / "foo" / "bandersnatch.conf"),
53 | "mirror",
54 | ]
55 | assert main(asyncio.new_event_loop()) == 1
56 | assert "creating default config" in caplog.text
57 | assert "Could not create config file" in caplog.text
58 | conf_path = Path(tmpdir) / "bandersnatch.conf"
59 | assert not conf_path.exists()
60 |
61 |
62 | def test_main_reads_config_values(mirror_mock: mock.MagicMock, tmpdir: Path) -> None:
63 | base_config_path = Path(bandersnatch.__file__).parent / "unittest.conf"
64 | diff_file = Path(tempfile.gettempdir()) / "srv/pypi/mirrored-files"
65 | config_lines = [
66 | f"diff-file = {diff_file.as_posix()}\n"
67 | if line.startswith("diff-file")
68 | else line
69 | for line in base_config_path.read_text().splitlines()
70 | ]
71 | config_path = tmpdir / "unittest.conf"
72 | config_path.write_text("\n".join(config_lines), encoding="utf-8")
73 | sys.argv = ["bandersnatch", "-c", str(config_path), "mirror"]
74 | assert config_path.exists()
75 | main(asyncio.new_event_loop())
76 | (homedir, master), kwargs = mirror_mock.call_args_list[0]
77 |
78 | assert Path("/srv/pypi") == homedir
79 | assert isinstance(master, bandersnatch.master.Master)
80 | assert {
81 | "stop_on_error": False,
82 | "hash_index": False,
83 | "workers": 3,
84 | "root_uri": "",
85 | "json_save": False,
86 | "digest_name": "sha256",
87 | "keep_index_versions": 0,
88 | "storage_backend": "filesystem",
89 | "diff_file": diff_file,
90 | "diff_append_epoch": False,
91 | "diff_full_path": diff_file,
92 | "cleanup": False,
93 | } == kwargs
94 |
95 |
96 | def test_main_reads_custom_config_values(
97 | mirror_mock: "BandersnatchMirror", logging_mock: mock.MagicMock, customconfig: Path
98 | ) -> None:
99 | setup()
100 | conffile = str(customconfig / "bandersnatch.conf")
101 | sys.argv = ["bandersnatch", "-c", conffile, "mirror"]
102 | main(asyncio.new_event_loop())
103 | (log_config, _kwargs) = logging_mock.call_args_list[0]
104 | assert log_config == (str(customconfig / "bandersnatch-log.conf"),)
105 |
106 |
107 | def test_main_throws_exception_on_unsupported_digest_name(customconfig: Path,) -> None:
108 | setup()
109 | conffile = str(customconfig / "bandersnatch.conf")
110 | parser = configparser.ConfigParser()
111 | parser.read(conffile)
112 | parser["mirror"]["digest_name"] = "foobar"
113 | del parser["mirror"]["log-config"]
114 | with open(conffile, "w") as fp:
115 | parser.write(fp)
116 | sys.argv = ["bandersnatch", "-c", conffile, "mirror"]
117 |
118 | with pytest.raises(ValueError) as e:
119 | main(asyncio.new_event_loop())
120 |
121 | assert "foobar is not supported" in str(e.value)
122 |
123 |
124 | @pytest.fixture
125 | def customconfig(tmpdir: Path) -> Path:
126 | default_path = Path(bandersnatch.__file__).parent / "unittest.conf"
127 | with default_path.open("r") as dfp:
128 | config = dfp.read()
129 | config = config.replace("/srv/pypi", str(tmpdir / "pypi"))
130 | with open(str(tmpdir / "bandersnatch.conf"), "w") as f:
131 | f.write(config)
132 | config = config.replace("; log-config", "log-config")
133 | config = config.replace(
134 | "/etc/bandersnatch-log.conf", str(tmpdir / "bandersnatch-log.conf")
135 | )
136 | with open(str(tmpdir / "bandersnatch.conf"), "w") as f:
137 | f.write(config)
138 | return tmpdir
139 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_filter.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import unittest
4 | from tempfile import TemporaryDirectory
5 | from unittest import TestCase
6 |
7 | from mock_config import mock_config
8 |
9 | from bandersnatch.configuration import BandersnatchConfig
10 |
11 | from bandersnatch.filter import ( # isort:skip
12 | Filter,
13 | FilterProjectPlugin,
14 | FilterReleasePlugin,
15 | LoadedFilters,
16 | )
17 |
18 |
19 | class TestBandersnatchFilter(TestCase):
20 | """
21 | Tests for the bandersnatch filtering classes
22 | """
23 |
24 | tempdir = None
25 | cwd = None
26 |
27 | def setUp(self) -> None:
28 | self.cwd = os.getcwd()
29 | self.tempdir = TemporaryDirectory()
30 | os.chdir(self.tempdir.name)
31 | sys.stderr.write(self.tempdir.name)
32 | sys.stderr.flush()
33 |
34 | def tearDown(self) -> None:
35 | if self.tempdir:
36 | assert self.cwd
37 | os.chdir(self.cwd)
38 | self.tempdir.cleanup()
39 | self.tempdir = None
40 |
41 | def test__filter_project_plugins__loads(self) -> None:
42 | mock_config(
43 | """\
44 | [plugins]
45 | enabled = all
46 | """
47 | )
48 | builtin_plugin_names = [
49 | "blocklist_project",
50 | "regex_project",
51 | "allowlist_project",
52 | ]
53 |
54 | plugins = LoadedFilters().filter_project_plugins()
55 | names = [plugin.name for plugin in plugins]
56 | for name in builtin_plugin_names:
57 | self.assertIn(name, names)
58 |
59 | def test__filter_release_plugins__loads(self) -> None:
60 | mock_config(
61 | """\
62 | [plugins]
63 | enabled = all
64 | """
65 | )
66 | builtin_plugin_names = [
67 | "blocklist_release",
68 | "prerelease_release",
69 | "regex_release",
70 | "latest_release",
71 | ]
72 |
73 | plugins = LoadedFilters().filter_release_plugins()
74 | names = [plugin.name for plugin in plugins]
75 | for name in builtin_plugin_names:
76 | self.assertIn(name, names)
77 |
78 | def test__filter_no_plugin(self) -> None:
79 | mock_config(
80 | """\
81 | [plugins]
82 | enabled =
83 | """
84 | )
85 |
86 | plugins = LoadedFilters().filter_release_plugins()
87 | self.assertEqual(len(plugins), 0)
88 |
89 | plugins = LoadedFilters().filter_project_plugins()
90 | self.assertEqual(len(plugins), 0)
91 |
92 | def test__filter_base_clases(self) -> None:
93 | """
94 | Test the base filter classes
95 | """
96 |
97 | plugin = Filter()
98 | self.assertEqual(plugin.name, "filter")
99 | try:
100 | plugin.initialize_plugin()
101 | error = False
102 | except Exception:
103 | error = True
104 | self.assertFalse(error)
105 |
106 | plugin = FilterReleasePlugin()
107 | self.assertIsInstance(plugin, Filter)
108 | self.assertEqual(plugin.name, "release_plugin")
109 | try:
110 | plugin.filter({})
111 | error = False
112 | except Exception:
113 | error = True
114 | self.assertFalse(error)
115 |
116 | plugin = FilterProjectPlugin()
117 | self.assertIsInstance(plugin, Filter)
118 | self.assertEqual(plugin.name, "project_plugin")
119 | try:
120 | result = plugin.check_match(key="value")
121 | error = False
122 | self.assertIsInstance(result, bool)
123 | except Exception:
124 | error = True
125 | self.assertFalse(error)
126 |
127 | def test_deprecated_keys(self) -> None:
128 | with open("test.conf", "w") as f:
129 | f.write("[allowlist]\npackages=foo\n[blocklist]\npackages=bar\n")
130 | instance = BandersnatchConfig()
131 | instance.config_file = "test.conf"
132 | instance.load_configuration()
133 | plugin = Filter()
134 | assert plugin.allowlist.name == "allowlist"
135 | assert plugin.blocklist.name == "blocklist"
136 |
137 | def test__filter_project_blocklist_allowlist__pep503_normalize(self) -> None:
138 | mock_config(
139 | """\
140 | [plugins]
141 | enabled =
142 | blocklist_project
143 | allowlist_project
144 |
145 | [blocklist]
146 | packages =
147 | SampleProject
148 | trove----classifiers
149 |
150 | [allowlist]
151 | packages =
152 | SampleProject
153 | trove----classifiers
154 | """
155 | )
156 |
157 | plugins = {
158 | plugin.name: plugin for plugin in LoadedFilters().filter_project_plugins()
159 | }
160 |
161 | self.assertTrue(plugins["blocklist_project"].check_match(name="sampleproject"))
162 | self.assertTrue(
163 | plugins["blocklist_project"].check_match(name="trove-classifiers")
164 | )
165 | self.assertFalse(plugins["allowlist_project"].check_match(name="sampleproject"))
166 | self.assertFalse(
167 | plugins["allowlist_project"].check_match(name="trove-classifiers")
168 | )
169 |
170 |
171 | if __name__ == "__main__":
172 | unittest.main()
173 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_delete.py:
--------------------------------------------------------------------------------
1 | import os
2 | from argparse import Namespace
3 | from configparser import ConfigParser
4 | from json import loads
5 | from pathlib import Path
6 | from tempfile import TemporaryDirectory
7 | from unittest.mock import patch
8 | from urllib.parse import urlparse
9 |
10 | import pytest
11 |
12 | from bandersnatch.delete import delete_packages, delete_path
13 | from bandersnatch.master import Master
14 | from bandersnatch.utils import find
15 |
16 | EXPECTED_WEB_BEFORE_DELETION = """\
17 | json
18 | json{0}cooper
19 | json{0}unittest
20 | packages
21 | packages{0}69
22 | packages{0}69{0}cooper-6.9.tar.gz
23 | packages{0}69{0}unittest-6.9.tar.gz
24 | packages{0}7b
25 | packages{0}7b{0}cooper-6.9-py3-none-any.whl
26 | packages{0}7b{0}unittest-6.9-py3-none-any.whl
27 | pypi
28 | pypi{0}cooper
29 | pypi{0}cooper{0}json
30 | pypi{0}unittest
31 | pypi{0}unittest{0}json
32 | simple
33 | simple{0}cooper
34 | simple{0}cooper{0}index.html
35 | simple{0}unittest
36 | simple{0}unittest{0}index.html\
37 | """.format(
38 | os.sep
39 | )
40 | EXPECTED_WEB_AFTER_DELETION = """\
41 | json
42 | packages
43 | packages{0}69
44 | packages{0}7b
45 | pypi
46 | simple\
47 | """.format(
48 | os.sep
49 | )
50 | MOCK_JSON_TEMPLATE = """{
51 | "releases": {
52 | "6.9": [
53 | {"url": "https://files.ph.org/packages/7b/PKGNAME-6.9-py3-none-any.whl"},
54 | {"url": "https://files.ph.org/packages/69/PKGNAME-6.9.tar.gz"}
55 | ]
56 | }
57 | }
58 | """
59 |
60 |
61 | def _fake_args() -> Namespace:
62 | return Namespace(dry_run=True, pypi_packages=["cooper", "unittest"], workers=0)
63 |
64 |
65 | def _fake_config() -> ConfigParser:
66 | cp = ConfigParser()
67 | cp.add_section("mirror")
68 | cp["mirror"]["directory"] = "/tmp/unittest"
69 | cp["mirror"]["workers"] = "1"
70 | cp["mirror"]["storage-backend"] = "filesystem"
71 | return cp
72 |
73 |
74 | def test_delete_path() -> None:
75 | with TemporaryDirectory() as td:
76 | td_path = Path(td)
77 | fake_path = td_path / "unittest-file.tgz"
78 | with patch("bandersnatch.delete.logger.info") as mock_log:
79 | assert delete_path(fake_path, True) == 0
80 | assert mock_log.call_count == 1
81 |
82 | with patch("bandersnatch.delete.logger.debug") as mock_log:
83 | assert delete_path(fake_path, False) == 0
84 | assert mock_log.call_count == 1
85 |
86 | fake_path.touch()
87 | # Remove file
88 | assert delete_path(fake_path, False) == 0
89 | # File should be gone - We should log that via debug
90 | with patch("bandersnatch.delete.logger.debug") as mock_log:
91 | assert delete_path(fake_path, False) == 0
92 | assert mock_log.call_count == 1
93 |
94 |
95 | @pytest.mark.asyncio
96 | async def test_delete_packages() -> None:
97 | args = _fake_args()
98 | config = _fake_config()
99 | master = Master("https://unittest.org")
100 |
101 | with TemporaryDirectory() as td:
102 | td_path = Path(td)
103 | config["mirror"]["directory"] = td
104 | web_path = td_path / "web"
105 | json_path = web_path / "json"
106 | json_path.mkdir(parents=True)
107 | pypi_path = web_path / "pypi"
108 | pypi_path.mkdir(parents=True)
109 | simple_path = web_path / "simple"
110 |
111 | # Setup web tree with some json, package index.html + fake blobs
112 | for package_name in args.pypi_packages:
113 | package_simple_path = simple_path / package_name
114 | package_simple_path.mkdir(parents=True)
115 | package_index_path = package_simple_path / "index.html"
116 | package_index_path.touch()
117 |
118 | package_json_str = MOCK_JSON_TEMPLATE.replace("PKGNAME", package_name)
119 | package_json_path = json_path / package_name
120 | with package_json_path.open("w") as pjfp:
121 | pjfp.write(package_json_str)
122 | legacy_json_path = pypi_path / package_name / "json"
123 | legacy_json_path.parent.mkdir()
124 | legacy_json_path.symlink_to(package_json_path)
125 |
126 | package_json = loads(package_json_str)
127 | for _version, blobs in package_json["releases"].items():
128 | for blob in blobs:
129 | url_parts = urlparse(blob["url"])
130 | blob_path = web_path / url_parts.path[1:]
131 | blob_path.parent.mkdir(parents=True, exist_ok=True)
132 | blob_path.touch()
133 |
134 | # See we have a correct mirror setup
135 | assert find(web_path) == EXPECTED_WEB_BEFORE_DELETION
136 |
137 | args.dry_run = True
138 | assert await delete_packages(config, args, master) == 0
139 |
140 | args.dry_run = False
141 | with patch("bandersnatch.delete.logger.info") as mock_log:
142 | assert await delete_packages(config, args, master) == 0
143 | assert mock_log.call_count == 1
144 |
145 | # See we've deleted it all
146 | assert find(web_path) == EXPECTED_WEB_AFTER_DELETION
147 |
148 |
149 | @pytest.mark.asyncio
150 | async def test_delete_packages_no_exist() -> None:
151 | args = _fake_args()
152 | master = Master("https://unittest.org")
153 | with patch("bandersnatch.delete.logger.error") as mock_log:
154 | assert await delete_packages(_fake_config(), args, master) == 0
155 | assert mock_log.call_count == len(args.pypi_packages)
156 |
--------------------------------------------------------------------------------
/docs/mirror_configuration.md:
--------------------------------------------------------------------------------
1 | ## Mirror configuration
2 |
3 | The mirror configuration settings are in a configuration section of the configuration file
4 | named **\[mirror\]**.
5 |
6 | This section contains settings to specify how the mirroring software should operate.
7 |
8 | ### directory
9 |
10 | The mirror directory setting is a string that specifies the directory to
11 | store the mirror files.
12 |
13 | The directory used must meet the following requirements:
14 | - The filesystem must be case-sensitive.
15 | - The filesystem must support large numbers of sub-directories.
16 | - The filesystem must support large numbers of files (inodes).
17 |
18 | Example:
19 | ``` ini
20 | [mirror]
21 | directory = /srv/pypi
22 | ```
23 |
24 | ### json
25 |
26 | The mirror json setting is a boolean (true/false) setting that indicates that
27 | the json packaging metadata should be mirrored in addition to the packages.
28 |
29 | Example:
30 | ``` ini
31 | [mirror]
32 | json = false
33 | ```
34 |
35 | ### master
36 |
37 | The master setting is a string containing a url of the server which will be mirrored.
38 |
39 | The master url string must use https: protocol.
40 |
41 | The default value is: https://pypi.org
42 |
43 | Example:
44 | ``` ini
45 | [mirror]
46 | master = https://pypi.org
47 | ```
48 |
49 | ### timeout
50 |
51 | The timeout value is an integer that indicates the maximum number of seconds for web requests.
52 |
53 | The default value for this setting is 10 seconds.
54 |
55 | Example:
56 | ``` ini
57 | [mirror]
58 | timeout = 10
59 | ```
60 |
61 | ### global-timeout
62 |
63 | The global-timeout value is an integer that indicates the maximum runtime of individual aiohttp coroutines.
64 |
65 | The default value for this setting is 18000 seconds, or 5 hours.
66 |
67 | Example:
68 | ```ini
69 | [mirror]
70 | global-timeout = 18000
71 | ```
72 |
73 | ### workers
74 |
75 | The workers value is an integer from 1 to 10 that indicates the number of concurrent downloads.
76 |
77 | The default value is 3.
78 |
79 | Recommendations for the workers setting:
80 | - leave the default of 3 to avoid overloading the pypi master
81 | - official servers located in data centers could run 10 workers
82 | - anything beyond 10 is probably unreasonable and is not allowed.
83 |
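The range check described above can be sketched with the standard-library `configparser`. This is only an illustration, not bandersnatch's own validation code: the helper name `read_workers` and the inline sample config are assumptions made for the example.

```python
from configparser import ConfigParser

# Sample [mirror] section; the values mirror the documented defaults.
SAMPLE_CONFIG = """\
[mirror]
directory = /srv/pypi
timeout = 10
workers = 3
"""


def read_workers(config: ConfigParser) -> int:
    """Hypothetical helper: read mirror.workers and enforce the 1-10 range."""
    # Fall back to the documented default of 3 when the key is absent.
    workers = config.getint("mirror", "workers", fallback=3)
    if not 1 <= workers <= 10:
        raise ValueError(f"workers must be between 1 and 10, got {workers}")
    return workers


config = ConfigParser()
config.read_string(SAMPLE_CONFIG)
print(read_workers(config))
```

A value outside the range, such as `workers = 11`, makes the helper raise `ValueError` rather than silently clamping it.
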
84 | ### hash-index
85 |
86 | The hash-index is a boolean (true/false) to determine if package hashing should be used.
87 |
88 | The recommended setting is the default of false, for full pip/PyPI compatibility.
89 |
90 | ```eval_rst
91 | .. warning:: Package index directory hashing is incompatible with pip, and so this should only be used in an environment where it is behind an application that can translate URIs to filesystem locations.
92 | ```
93 |
94 | #### Apache rewrite rules when using hash-index
95 |
96 | When using this setting with an Apache server, the server will need the following rewrite rules:
97 |
98 | ```
99 | RewriteRule ^([^/])([^/]*)/$ /mirror/pypi/web/simple/$1/$1$2/
100 | RewriteRule ^([^/])([^/]*)/([^/]+)$/ /mirror/pypi/web/simple/$1/$1$2/$3
101 | ```
102 |
103 | #### NGINX rewrite rules when using hash-index
104 |
105 | When using this setting with an nginx server, the server will need the following rewrite rules:
106 |
107 | ```
108 | rewrite ^/simple/([^/])([^/]*)/$ /simple/$1/$1$2/ last;
109 | rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
110 | ```
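
The on-disk layout that such rewrite rules map requests onto can be sketched in Python. With hash-index enabled, a project's simple index lives under a sub-directory named after the first character of the project name, as in the `simple_path / canon_name[0] / canon_name` path built in `delete.py`. The function name `simple_index_dir` and the `/srv/pypi/web` root are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def simple_index_dir(web_root: str, name: str, hash_index: bool = False) -> PurePosixPath:
    """Hypothetical helper: where a project's simple index directory lives."""
    simple = PurePosixPath(web_root) / "simple"
    if hash_index:
        # Hashed layout: simple/<first character>/<name>
        return simple / name[0] / name
    # Default layout: simple/<name>
    return simple / name


print(simple_index_dir("/srv/pypi/web", "foo"))
print(simple_index_dir("/srv/pypi/web", "foo", hash_index=True))
```

pip requests `/simple/foo/`, so a server in front of a hashed mirror has to translate that request to the `simple/f/foo/` directory.
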
111 |
112 | ### stop-on-error
113 |
114 | The stop-on-error setting is a boolean (true/false) setting that indicates if bandersnatch
115 | should stop immediately if it encounters an error.
116 |
117 | If this setting is false it will not stop when an error is encountered but it will not
118 | mark the sync as successful when the sync is complete.
119 |
120 | ``` ini
121 | [mirror]
122 | stop-on-error = false
123 | ```
124 |
125 | ### log-config
126 |
127 | The log-config setting is a string containing the filename of a python logging configuration
128 | file.
129 |
130 | Example:
131 | ```ini
132 | [mirror]
133 | log-config = /etc/bandersnatch-log.conf
134 | ```
135 |
136 | ### root_uri
137 |
138 | The root_uri is a string containing a uri which is the root added to relative links.
139 |
140 | ``` eval_rst
141 | .. note:: This is generally not necessary, but was added for the official internal PyPI mirror, which requires serving packages from https://files.pythonhosted.org
142 | ```
143 |
144 | Example:
145 | ```ini
146 | [mirror]
147 | root_uri = https://example.com
148 | ```
149 |
150 |
151 | ### diff-file
152 |
153 | The diff file is a string containing the filename to log the files that were downloaded during the mirror.
154 | This file can then be used to synchronize external disks or send the files through some other mechanism to offline systems.
155 | You can then sync the list of files to an attached drive or ssh destination such as a diode:
156 | ```
157 | rsync -av --files-from=/srv/pypi/mirrored-files / /mnt/usb/
158 | ```
159 |
160 | You can also use this file list as an input to 7zip to create split archives for transfers, allowing you to size the files as needed:
161 | ```
162 | 7za a -i@"/srv/pypi/mirrored-files" -spf -v100m path_to_new_zip.7z
163 | ```
164 |
165 | Example:
166 | ```ini
167 | [mirror]
168 | diff-file = /srv/pypi/mirrored-files
169 | ```
170 |
171 |
172 |
173 | ### diff-append-epoch
174 |
175 | The diff-append-epoch setting is a boolean (true/false) that indicates whether the current epoch time should be appended to the diff-file name.
176 | This can be used to track diffs over time so that the diff file doesn't get clobbered on each run. It is only used when diff-file is set.
177 |
178 | Example:
179 | ```ini
180 | [mirror]
181 | diff-append-epoch = true
182 | ```
183 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_filename.py:
--------------------------------------------------------------------------------
1 | import os
2 | from pathlib import Path
3 | from tempfile import TemporaryDirectory
4 | from unittest import TestCase
5 |
6 | from mock_config import mock_config
7 |
8 | import bandersnatch.filter
9 | from bandersnatch.master import Master
10 | from bandersnatch.mirror import BandersnatchMirror
11 | from bandersnatch.package import Package
12 | from bandersnatch_filter_plugins import filename_name
13 |
14 |
15 | class BasePluginTestCase(TestCase):
16 |
17 | tempdir = None
18 | cwd = None
19 |
20 | def setUp(self) -> None:
21 | self.cwd = os.getcwd()
22 | self.tempdir = TemporaryDirectory()
23 | os.chdir(self.tempdir.name)
24 |
25 | def tearDown(self) -> None:
26 | if self.tempdir:
27 | assert self.cwd
28 | os.chdir(self.cwd)
29 | self.tempdir.cleanup()
30 | self.tempdir = None
31 |
32 |
33 | class TestExcludePlatformFilter(BasePluginTestCase):
34 |
35 | config_contents = """\
36 | [plugins]
37 | enabled =
38 | exclude_platform
39 |
40 | [blocklist]
41 | platforms =
42 | windows
43 | freebsd
44 | macos
45 | linux_armv7l
46 | """
47 |
48 | def test_plugin_compiles_patterns(self) -> None:
49 | mock_config(self.config_contents)
50 |
51 | plugins = bandersnatch.filter.LoadedFilters().filter_release_file_plugins()
52 |
53 | assert any(
54 | type(plugin) == filename_name.ExcludePlatformFilter for plugin in plugins
55 | )
56 |
57 | def test_exclude_platform(self) -> None:
58 | """
59 | Tests the platform filter for what it will keep and exclude
60 | based on the config provided. It is expected to drop all windows,
61 | freebsd and macos packages while only dropping linux_armv7l from
62 | linux packages.
63 | """
64 | mock_config(self.config_contents)
65 |
66 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
67 | pkg = Package("foobar", 1)
68 | pkg._metadata = {
69 | "info": {"name": "foobar", "version": "1.0"},
70 | "releases": {
71 | "1.0": [
72 | {
73 | "packagetype": "sdist",
74 | "filename": "foobar-1.0-win32.tar.gz",
75 | "flag": "KEEP",
76 | },
77 | {
78 | "packagetype": "bdist_msi",
79 | "filename": "foobar-1.0.msi",
80 | "flag": "DROP",
81 | },
82 | {
83 | "packagetype": "bdist_wininst",
84 | "filename": "foobar-1.0.exe",
85 | "flag": "DROP",
86 | },
87 | {
88 | "packagetype": "bdist_dmg",
89 | "filename": "foobar-1.0.dmg",
90 | "flag": "DROP",
91 | },
92 | {
93 | "packagetype": "bdist_wheel",
94 | "filename": "foobar-1.0-win32.zip",
95 | "flag": "DROP",
96 | },
97 | {
98 | "packagetype": "bdist_wheel",
99 | "filename": "foobar-1.0-linux.tar.gz",
100 | "flag": "KEEP",
101 | },
102 | {
103 | "packagetype": "bdist_wheel",
104 | "filename": "foobar-1.0-manylinux1_i686.whl",
105 | "flag": "KEEP",
106 | },
107 | {
108 | "packagetype": "bdist_wheel",
109 | "filename": "foobar-1.0-linux_armv7l.whl",
110 | "flag": "DROP",
111 | },
112 | {
113 | "packagetype": "bdist_wheel",
114 | "filename": "foobar-1.0-macosx_10_14_x86_64.whl",
115 | "flag": "DROP",
116 | },
117 | {
118 | "packagetype": "bdist_egg",
119 | "filename": "foobar-1.0-win_amd64.zip",
120 | "flag": "DROP",
121 | },
122 | {
123 | "packagetype": "unknown",
124 | "filename": "foobar-1.0-unknown",
125 | "flag": "KEEP",
126 | },
127 | ],
128 | "0.1": [
129 | {
130 | "packagetype": "sdist",
131 | "filename": "foobar-0.1-win32.msi",
132 | "flag": "KEEP",
133 | },
134 | {
135 | "packagetype": "bdist_wheel",
136 | "filename": "foobar-0.1-win32.whl",
137 | "flag": "DROP",
138 | },
139 | ],
140 | "0.2": [
141 | {
142 | "packagetype": "bdist_egg",
143 | "filename": "foobar-0.1-freebsd-6.0-RELEASE-i386.egg",
144 | "flag": "DROP",
145 | }
146 | ],
147 | },
148 | }
149 |
150 | # count the files we should keep
151 | rv = pkg.releases.values()
152 | keep_count = sum(f["flag"] == "KEEP" for r in rv for f in r)
153 |
154 | pkg.filter_all_releases_files(mirror.filters.filter_release_file_plugins())
155 |
156 | # we should have the same keep count and no drop
157 | rv = pkg.releases.values()
158 | assert sum(f["flag"] == "KEEP" for r in rv for f in r) == keep_count
159 | assert sum(f["flag"] == "DROP" for r in rv for f in r) == 0
160 |
161 | # the release "0.2" should have been deleted since there are no files left in it
162 | assert len(pkg.releases.keys()) == 2
163 |
--------------------------------------------------------------------------------
/docs/filtering_configuration.md:
--------------------------------------------------------------------------------
1 | ## Mirror filtering
2 |
3 | _NOTE: All references to whitelist/blacklist are deprecated, and will be replaced with allowlist/blocklist in 5.0_
4 |
5 | The mirror filter configuration settings are in the same configuration file as the mirror settings.
6 | There are different configuration sections for the different plugin types.
7 |
8 | Filtering plugin package lists need to use the **Raw PyPI Name**
9 | (non [PEP503](https://www.python.org/dev/peps/pep-0503/#normalized-names) normalized)
10 | in order to get filtered.
11 |
12 | E.g. to Blacklist [ACMPlus](https://pypi.org/project/ACMPlus/) you'd need to
13 | use that *exact* casing in `bandersnatch.conf`
14 |
15 | - A PR would be welcome fixing the normalization but it's an invasive PR
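
For reference, PEP 503 normalization lower-cases a name and collapses every run of `-`, `_` and `.` into a single `-`. The sketch below uses the regular expression given in PEP 503. It is not bandersnatch's own code, and the function name `normalize` is chosen only for illustration.

```python
import re


def normalize(name: str) -> str:
    """Return the PEP 503 normalized form of a project name."""
    # Collapse runs of '-', '_' and '.' into one '-', then lower-case.
    return re.sub(r"[-_.]+", "-", name).lower()


for raw in ("ACMPlus", "SampleProject", "trove----classifiers"):
    print(raw, "->", normalize(raw))
```

The raw name `ACMPlus` and its normalized form `acmplus` are different strings, which is why the exact casing matters wherever names are compared without normalization.
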
16 |
17 | ### Plugins Enabling
18 |
19 | The plugins setting is a list of plugins to enable.
20 |
21 | Example (enable all installed filter plugins):
22 |
23 | - *Explicitly* enabling plugins is now **mandatory** for *activating plugins*
24 | - They will *do nothing* without activation
25 |
26 | Also, enabling a plugin applies its defaults if it is not configured in its respective section.
27 |
28 | ```ini
29 | [plugins]
30 | enabled = all
31 | ```
32 |
33 | Example (only enable specific plugins):
34 |
35 | ```ini
36 | [plugins]
37 | enabled =
38 | blacklist_project
39 | whitelist_project
40 | ...
41 | ```
42 |
43 | ### blacklist / whitelist filtering settings
44 |
45 | The blacklist / whitelist settings are in configuration sections named **\[blacklist\]** and **\[whitelist\]**.
46 | These sections provide settings to indicate packages, projects and releases that should /
47 | should not be mirrored from PyPI.
48 |
49 | This is useful to avoid syncing broken or malicious packages.
50 |
51 | ### packages
52 |
53 | The packages setting is a list of Python [PEP 440 version specifiers](https://www.python.org/dev/peps/pep-0440/#id51) naming the packages, and optionally the releases, that each list applies to. Enable version specifier filtering for blacklist and whitelist packages by enabling the 'blacklist_release' and 'allowlist_release' plugins, respectively.
54 |
55 | Any packages matching the version specifier for blacklist packages will not be downloaded. Any packages not matching the version specifier for whitelist packages will not be downloaded.
56 |
57 | Example:
58 |
59 | ```ini
60 | [plugins]
61 | enabled =
62 | blacklist_project
63 | blacklist_release
64 | whitelist_project
65 | allowlist_release
66 |
67 | [blacklist]
68 | packages =
69 | example1
70 | example2>=1.4.2,<1.9,!=1.5.*,!=1.6.*
71 |
72 | [whitelist]
73 | packages =
74 | black==18.5
75 | ptr
76 | ```
77 |
78 | ### Metadata Filtering
79 | Packages and release files may be selected by filtering on specific metadata values.
80 |
81 | General form of configuration entries is:
82 |
83 | ```ini
84 | [filter_some_metadata]
85 | tag:tag:path.to.object =
86 | matcha
87 | matchb
88 | ```
89 |
90 | #### Project Regex Matching
91 |
92 | Filter projects to be synced based on regex matches against their raw metadata entries straight from parsed downloaded json.
93 |
94 | Example:
95 |
96 | ```ini
97 | [regex_project_metadata]
98 | not-null:info.classifiers =
99 | .*Programming Language :: Python :: 2.*
100 | ```
101 |
102 | Valid tags are `all`,`any`,`none`,`match-null`,`not-null`, with default of `any:match-null`
103 |
104 | All metadata provided by json is available, including `info`, `last_serial`, `releases`, etc. headings.
105 |
106 |
107 | #### Release File Regex Matching
108 |
109 | Filter release files to be downloaded for projects based on regex matches against the stored metadata entries for each release file.
110 |
111 | Example:
112 |
113 | ```ini
114 | [regex_release_file_metadata]
115 | any:release_file.packagetype =
116 | sdist
117 | bdist_wheel
118 | ```
119 |
120 | Valid tags are the same as for projects.
121 |
122 | Metadata available to match consists of `info`, `release`, and `release_file` top level structures, with `info`
123 | containing the package-wide info, `release` containing the version of the release and `release_file` the metadata
124 | for an individual file for that release.
125 |
126 |
127 | ### Prerelease filtering
128 |
129 | Bandersnatch includes a plugin to filter out pre-releases of packages. To enable this plugin simply add `prerelease_release` to the enabled plugins list.
130 |
131 | ```ini
132 | [plugins]
133 | enabled =
134 | prerelease_release
135 | ```
136 |
137 | ### Regex filtering
138 |
139 | Advanced users who would like finer control over which packages and releases to filter can use the regex Bandersnatch plugin.
140 |
141 | This plugin allows arbitrary regular expressions to be defined in the configuration; any package name or release version that matches will *not* be downloaded.
142 |
143 | The plugin can be activated for packages and releases separately. For example, to activate the project regex filter simply add it to the configuration as before:
144 |
145 | ```ini
146 | [plugins]
147 | enabled =
148 | regex_project
149 | ```
150 |
151 | If you'd like to filter releases using the regex filter use `regex_release` instead.
152 |
153 | The regex plugin requires an extra section in the config to define the actual patterns to use for filtering:
154 |
155 | ```ini
156 | [filter_regex]
157 | packages =
158 | .+-evil$
159 | releases =
160 | .+alpha\d$
161 | ```
162 |
163 | Note the same `filter_regex` section may include a `packages` and a `releases` entry with any number of regular expressions.
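
As a rough illustration of how the example patterns behave, the sketch below checks names and versions with the standard `re` module. It assumes the patterns are anchored at the start of the string, as `re.match` does; the `is_filtered` helper and the sample names are hypothetical and only demonstrate the regular expressions themselves, not the plugin's internals.

```python
import re

# The patterns from the example configuration above.
package_patterns = [re.compile(r".+-evil$")]
release_patterns = [re.compile(r".+alpha\d$")]


def is_filtered(value: str, patterns) -> bool:
    """Return True if any pattern matches from the start of the value."""
    return any(pattern.match(value) for pattern in patterns)


# Hypothetical package names and versions, chosen for illustration.
for name in ("requests-evil", "requests"):
    print(name, is_filtered(name, package_patterns))
for version in ("1.0alpha1", "1.0"):
    print(version, is_filtered(version, release_patterns))
```
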
164 |
165 |
166 | ### Platform-specific binaries filtering
167 |
168 | This filter allows advanced users not interested in Windows/macOS/Linux specific binaries to not mirror the corresponding files.
169 |
170 |
171 | ```ini
172 | [plugins]
173 | enabled =
174 | exclude_platform
175 | [blacklist]
176 | platforms =
177 | windows
178 | ```
179 |
180 | Available platforms are: `windows` `macos` `freebsd` `linux`.
181 |
182 |
183 | ### Keep only latest releases
184 |
185 | You can also keep only the latest releases based on greatest [Version](https://packaging.pypa.io/en/latest/version/) numbers.
186 |
187 | ```ini
188 | [plugins]
189 | enabled =
190 | latest_release
191 |
192 | [latest_release]
193 | keep = 3
194 | ```
195 |
196 | By default, the plugin does not filter out any release. You have to add the `keep` setting.
197 |
198 | Be aware that this can break requirements that depend on releases older than the ones kept.
199 |
--------------------------------------------------------------------------------
/src/bandersnatch/configuration.py:
--------------------------------------------------------------------------------
1 | """
2 | Module containing classes to access the bandersnatch configuration file
3 | """
4 | import configparser
5 | import logging
6 | import warnings
7 | from pathlib import Path
8 | from typing import Any, Dict, List, NamedTuple, Optional, Type
9 |
10 | try:
11 | import importlib.resources
12 | except ImportError: # pragma: no cover
13 | # For <=3.6
14 | import importlib
15 | import importlib_resources
16 |
17 | importlib.resources = importlib_resources
18 |
19 |
20 | logger = logging.getLogger("bandersnatch")
21 |
22 |
23 | class SetConfigValues(NamedTuple):
24 | json_save: bool
25 | root_uri: str
26 | diff_file_path: str
27 | diff_append_epoch: bool
28 | digest_name: str
29 | storage_backend_name: str
30 | cleanup: bool
31 |
32 |
33 | class Singleton(type): # pragma: no cover
34 | _instances: Dict["Singleton", Type] = {}
35 |
36 | def __call__(cls, *args: Any, **kwargs: Any) -> Type:
37 | if cls not in cls._instances:
38 | cls._instances[cls] = super().__call__(*args, **kwargs)
39 | return cls._instances[cls]
40 |
41 |
42 | class BandersnatchConfig(metaclass=Singleton):
43 | # Ensure we only show the deprecations once
44 | SHOWN_DEPRECATIONS = False
45 |
46 | def __init__(self, config_file: Optional[str] = None) -> None:
47 | """
48 | Bandersnatch configuration class singleton
49 |
50 | This class is a singleton that parses the configuration once at the
51 | start time.
52 |
53 | Parameters
54 | ==========
55 | config_file: str, optional
56 | Path to the configuration file to use
57 | """
58 | self.found_deprecations: List[str] = []
59 | with importlib.resources.path( # type: ignore
60 | "bandersnatch", "default.conf"
61 | ) as config_path:
62 | self.default_config_file = str(config_path)
63 | self.config_file = config_file
64 | self.load_configuration()
65 | self.check_for_deprecations()
66 |
67 | def check_for_deprecations(self) -> None:
68 | if self.SHOWN_DEPRECATIONS:
69 | return
70 | if self.config.has_section("whitelist") or self.config.has_section("blacklist"):
71 | err_msg = (
72 | "whitelist/blacklist filter plugins will be renamed to "
73 | "allowlist_*/blocklist_* in version 5.0 "
74 | " - Documentation @ https://bandersnatch.readthedocs.io/"
75 | )
76 | warnings.warn(err_msg, DeprecationWarning, stacklevel=2)
77 | logger.warning(err_msg)
78 | self.SHOWN_DEPRECATIONS = True
79 |
80 | def load_configuration(self) -> None:
81 | """
82 | Read the configuration from a configuration file
83 | """
84 | config_file = self.default_config_file
85 | if self.config_file:
86 | config_file = self.config_file
87 | self.config = configparser.ConfigParser(delimiters="=")
88 | self.config.optionxform = lambda option: option # type: ignore
89 | self.config.read(config_file)
90 |
91 |
92 | # 11-15, 84-89, 98-99, 117-118, 124-126, 144-149
93 | def validate_config_values(config: configparser.ConfigParser) -> SetConfigValues:
94 | try:
95 | json_save = config.getboolean("mirror", "json")
96 | except configparser.NoOptionError:
97 | logger.error(
98 | "Please update your config to include a json "
99 | + "boolean in the [mirror] section. Setting to False"
100 | )
101 | json_save = False
102 |
103 | try:
104 | root_uri = config.get("mirror", "root_uri")
105 | except configparser.NoOptionError:
106 | root_uri = ""
107 |
108 | try:
109 | diff_file_path = config.get("mirror", "diff-file")
110 | except configparser.NoOptionError:
111 | diff_file_path = ""
112 | if "{{" in diff_file_path and "}}" in diff_file_path:
113 | diff_file_path = diff_file_path.replace("{{", "").replace("}}", "")
114 | diff_ref_section, _, diff_ref_key = diff_file_path.partition("_")
115 | try:
116 | diff_file_path = config.get(diff_ref_section, diff_ref_key)
117 | except (configparser.NoOptionError, configparser.NoSectionError):
118 | logger.error(
119 | "Invalid section reference in `diff-file` key. "
120 | "Please correct this error. Saving diff files in"
121 | " base mirror directory."
122 | )
123 | diff_file_path = str(
124 | Path(config.get("mirror", "directory")) / "mirrored-files"
125 | )
126 |
127 | try:
128 | diff_append_epoch = config.getboolean("mirror", "diff-append-epoch")
129 | except configparser.NoOptionError:
130 | diff_append_epoch = False
131 |
132 | try:
133 | logger.debug("Checking config for storage backend...")
134 | storage_backend_name = config.get("mirror", "storage-backend")
135 | logger.debug("Found storage backend in config!")
136 | except configparser.NoOptionError:
137 | storage_backend_name = "filesystem"
138 | logger.debug(
139 | "Failed to find storage backend in config, falling back to default!"
140 | )
141 | logger.info(f"Selected storage backend: {storage_backend_name}")
142 |
143 | try:
144 | digest_name = config.get("mirror", "digest_name")
145 | except configparser.NoOptionError:
146 | digest_name = "sha256"
147 | if digest_name not in ("md5", "sha256"):
148 | raise ValueError(
149 | f"Supplied digest_name {digest_name} is not supported! Please "
150 | + "update digest_name to one of ('sha256', 'md5') in the [mirror] "
151 | + "section."
152 | )
153 |
154 | try:
155 | cleanup = config.getboolean("mirror", "cleanup")
156 | except configparser.NoOptionError:
157 | logger.debug(
158 | "bandersnatch is not cleaning up non PEP 503 normalized Simple "
159 | + "API directories"
160 | )
161 | cleanup = False
162 |
163 | return SetConfigValues(
164 | json_save,
165 | root_uri,
166 | diff_file_path,
167 | diff_append_epoch,
168 | digest_name,
169 | storage_backend_name,
170 | cleanup,
171 | )
172 |
--------------------------------------------------------------------------------
/src/bandersnatch/package.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import logging
3 | from typing import TYPE_CHECKING, Any, Dict, List, Optional
4 |
5 | from packaging.utils import canonicalize_name
6 |
7 | from .errors import PackageNotFound, StaleMetadata
8 | from .master import StalePage
9 |
10 | if TYPE_CHECKING: # pragma: no cover
11 | from .filter import Filter
12 | from .master import Master
13 |
14 | # Bool to help us not spam the logs with certain log messages
15 | display_filter_log = True
16 | logger = logging.getLogger(__name__)
17 |
18 |
19 | class Package:
20 | def __init__(self, name: str, serial: int = 0) -> None:
21 | self.name = canonicalize_name(name)
22 | self.raw_name = name
23 | self.serial = serial
24 |
25 | self._metadata: Optional[Dict] = None
26 |
27 | @property
28 | def metadata(self) -> Dict[str, Any]:
29 | assert self._metadata is not None, "Must fetch metadata before accessing it"
30 | return self._metadata
31 |
32 | @property
33 | def info(self) -> Dict[str, Any]:
34 | return self.metadata["info"] # type: ignore
35 |
36 | @property
37 | def last_serial(self) -> int:
38 | return self.metadata["last_serial"] # type: ignore
39 |
40 | @property
41 | def releases(self) -> Dict[str, List]:
42 | return self.metadata["releases"] # type: ignore
43 |
44 | @property
45 | def release_files(self) -> List:
46 | release_files: List[Dict] = []
47 |
48 | for release in self.releases.values():
49 | release_files.extend(release)
50 |
51 | return release_files
52 |
53 | async def update_metadata(self, master: "Master", attempts: int = 3) -> None:
54 | tries = 0
55 | sleep_on_stale = 1
56 |
57 | while tries < attempts:
58 | try:
59 | logger.info(
60 | f"Fetching metadata for package: {self.name} (serial {self.serial})"
61 | )
62 | self._metadata = await master.get_package_metadata(
63 | self.name, serial=self.serial
64 | )
65 | return
66 | except PackageNotFound as e:
67 | logger.info(str(e))
68 | raise
69 | except StalePage:
70 | tries += 1
71 | logger.error(f"Stale serial for package {self.name} - Attempt {tries}")
72 | if tries < attempts:
73 | logger.debug(f"Sleeping {sleep_on_stale}s to give CDN a chance")
74 | await asyncio.sleep(sleep_on_stale)
75 | sleep_on_stale *= 2
76 | continue
77 | logger.error(
78 | f"Stale serial for {self.name} ({self.serial}) "
79 | + "not updating. Giving up."
80 | )
81 | raise StaleMetadata(package_name=self.name, attempts=attempts)
82 |
83 | def filter_metadata(self, metadata_filters: List["Filter"]) -> bool:
84 | """
85 | Run the metadata filtering plugins
86 | """
87 | global display_filter_log
88 | if not metadata_filters:
89 | if display_filter_log:
90 | logger.info(
91 | "No metadata filters are enabled. Skipping metadata filtering"
92 | )
93 | display_filter_log = False
94 | return True
95 |
96 | return all(plugin.filter(self.metadata) for plugin in metadata_filters)
97 |
98 | def _filter_release(
99 | self, release_data: Dict, release_filters: List["Filter"]
100 | ) -> bool:
101 | """
102 | Run the release filtering plugins
103 | """
104 | global display_filter_log
105 | if not release_filters:
106 | if display_filter_log:
107 | logger.info(
108 | "No release filters are enabled. Skipping release filtering"
109 | )
110 | display_filter_log = False
111 | return True
112 |
113 | return all(plugin.filter(release_data) for plugin in release_filters)
114 |
115 | def filter_all_releases(self, release_filters: List["Filter"]) -> bool:
116 | """
117 | Filter releases and removes releases that fail the filters
118 | """
119 | releases = list(self.releases.keys())
120 | for version in releases:
121 | if not self._filter_release(
122 | {"version": version, "releases": self.releases, "info": self.info},
123 | release_filters,
124 | ):
125 | del self.releases[version]
126 | if releases:
127 | return True
128 | return False
129 |
130 | def _filter_release_file(
131 | self, metadata: Dict, release_file_filters: List["Filter"]
132 | ) -> bool:
133 | """
134 | Run the release file filtering plugins
135 | """
136 | global display_filter_log
137 | if not release_file_filters:
138 | if display_filter_log:
139 | logger.info(
140 | "No release file filters are enabled. Skipping release file filtering" # noqa: E501
141 | )
142 | display_filter_log = False
143 | return True
144 |
145 | return all(plugin.filter(metadata) for plugin in release_file_filters)
146 |
147 | def filter_all_releases_files(self, release_file_filters: List["Filter"]) -> bool:
148 | """
149 | Filter release files and remove empty releases after doing so.
150 | """
151 | releases = list(self.releases.keys())
152 | for version in releases:
153 | release_files = list(self.releases[version])
154 | for rfindex in reversed(range(len(release_files))):
155 | if not self._filter_release_file(
156 | {
157 | "info": self.info,
158 | "release": version,
159 | "release_file": self.releases[version][rfindex],
160 | },
161 | release_file_filters,
162 | ):
163 | del self.releases[version][rfindex]
164 | if not self.releases[version]:
165 | del self.releases[version]
166 |
167 | if releases:
168 | return True
169 | return False
170 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_configuration.py:
--------------------------------------------------------------------------------
1 | import configparser
2 | import os
3 | import unittest
4 | import warnings
5 | from tempfile import TemporaryDirectory
6 | from unittest import TestCase
7 |
8 | from bandersnatch.configuration import (
9 | BandersnatchConfig,
10 | SetConfigValues,
11 | Singleton,
12 | validate_config_values,
13 | )
14 |
15 | try:
16 | import importlib.resources
17 | except ImportError: # For 3.6 and lesser
18 | import importlib
19 | import importlib_resources
20 |
21 | importlib.resources = importlib_resources
22 |
23 |
24 | class TestBandersnatchConf(TestCase):
25 | """
26 | Tests for the BandersnatchConf singleton class
27 | """
28 |
29 | tempdir = None
30 | cwd = None
31 |
32 | def setUp(self) -> None:
33 | self.cwd = os.getcwd()
34 | self.tempdir = TemporaryDirectory()
35 | os.chdir(self.tempdir.name)
36 | # Hack to ensure each test gets fresh instance if needed
37 | # We have a dedicated test to ensure we're creating a singleton
38 | Singleton._instances = {}
39 |
40 | def tearDown(self) -> None:
41 | if self.tempdir:
42 | assert self.cwd
43 | os.chdir(self.cwd)
44 | self.tempdir.cleanup()
45 | self.tempdir = None
46 |
47 | def test_is_singleton(self) -> None:
48 | instance1 = BandersnatchConfig()
49 | instance2 = BandersnatchConfig()
50 | self.assertEqual(id(instance1), id(instance2))
51 |
52 | def test_single_config__default__all_sections_present(self) -> None:
53 | with importlib.resources.path( # type: ignore
54 | "bandersnatch", "unittest.conf"
55 | ) as config_file:
56 | instance = BandersnatchConfig(str(config_file))
57 | # All default values should at least be present and be the right types
58 | for section in ["mirror", "plugins", "blocklist"]:
59 | self.assertIn(section, instance.config.sections())
60 |
61 | def test_single_config__default__mirror__setting_attributes(self) -> None:
62 | instance = BandersnatchConfig()
63 | options = [option for option in instance.config["mirror"]]
64 | options.sort()
65 | self.assertListEqual(
66 | options,
67 | [
68 | "cleanup",
69 | "directory",
70 | "global-timeout",
71 | "hash-index",
72 | "json",
73 | "master",
74 | "stop-on-error",
75 | "storage-backend",
76 | "timeout",
77 | "verifiers",
78 | "workers",
79 | ],
80 | )
81 |
82 | def test_single_config__default__mirror__setting__types(self) -> None:
83 | """
84 | Make sure all default mirror settings will cast to the correct types
85 | """
86 | instance = BandersnatchConfig()
87 | for option, option_type in [
88 | ("directory", str),
89 | ("hash-index", bool),
90 | ("json", bool),
91 | ("master", str),
92 | ("stop-on-error", bool),
93 | ("storage-backend", str),
94 | ("timeout", int),
95 | ("global-timeout", int),
96 | ("workers", int),
97 | ]:
98 | self.assertIsInstance(
99 | option_type(instance.config["mirror"].get(option)), option_type
100 | )
101 |
102 | def test_single_config_custom_setting_boolean(self) -> None:
103 | with open("test.conf", "w") as testconfig_handle:
104 | testconfig_handle.write("[mirror]\nhash-index=false\n")
105 | instance = BandersnatchConfig()
106 | instance.config_file = "test.conf"
107 | instance.load_configuration()
108 | self.assertFalse(instance.config["mirror"].getboolean("hash-index"))
109 |
110 | def test_single_config_custom_setting_int(self) -> None:
111 | with open("test.conf", "w") as testconfig_handle:
112 | testconfig_handle.write("[mirror]\ntimeout=999\n")
113 | instance = BandersnatchConfig()
114 | instance.config_file = "test.conf"
115 | instance.load_configuration()
116 | self.assertEqual(int(instance.config["mirror"]["timeout"]), 999)
117 |
118 | def test_single_config_custom_setting_str(self) -> None:
119 | with open("test.conf", "w") as testconfig_handle:
120 | testconfig_handle.write("[mirror]\nmaster=https://foo.bar.baz\n")
121 | instance = BandersnatchConfig()
122 | instance.config_file = "test.conf"
123 | instance.load_configuration()
124 | self.assertEqual(instance.config["mirror"]["master"], "https://foo.bar.baz")
125 |
126 | def test_multiple_instances_custom_setting_str(self) -> None:
127 | with open("test.conf", "w") as testconfig_handle:
128 | testconfig_handle.write("[mirror]\nmaster=https://foo.bar.baz\n")
129 | instance1 = BandersnatchConfig()
130 | instance1.config_file = "test.conf"
131 | instance1.load_configuration()
132 |
133 | instance2 = BandersnatchConfig()
134 | self.assertEqual(instance2.config["mirror"]["master"], "https://foo.bar.baz")
135 |
136 | def test_validate_config_values(self) -> None:
137 | default_values = SetConfigValues(
138 | False, "", "", False, "sha256", "filesystem", False
139 | )
140 | no_options_configparser = configparser.ConfigParser()
141 | no_options_configparser["mirror"] = {}
142 | self.assertEqual(
143 | default_values, validate_config_values(no_options_configparser)
144 | )
145 |
146 | def test_deprecation_warning_raised(self) -> None:
147 | # Remove in 5.0 once we deprecate whitelist/blacklist
148 |
149 | config_file = "test.conf"
150 | instance = BandersnatchConfig()
151 | instance.config_file = config_file
152 | # Test no warning if new plugins used
153 | with open(config_file, "w") as f:
154 | f.write("[allowlist]\npackages=foo\n")
155 | instance.load_configuration()
156 | with warnings.catch_warnings(record=True) as w:
157 | instance.check_for_deprecations()
158 | self.assertEqual(len(w), 0)
159 |
160 | # Test warning if old plugins used
161 | instance.SHOWN_DEPRECATIONS = False
162 | with open(config_file, "w") as f:
163 | f.write("[whitelist]\npackages=foo\n")
164 | instance.load_configuration()
165 | with warnings.catch_warnings(record=True) as w:
166 | instance.check_for_deprecations()
167 | instance.check_for_deprecations()
168 | # Assert we only throw 1 warning
169 | self.assertEqual(len(w), 1)
170 |
171 |
172 | if __name__ == "__main__":
173 | unittest.main()
174 |
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/allowlist_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | from typing import Any, Dict, List, Set
3 |
4 | from packaging.requirements import Requirement
5 | from packaging.utils import canonicalize_name
6 | from packaging.version import InvalidVersion, Version
7 |
8 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin
9 |
10 | logger = logging.getLogger("bandersnatch")
11 |
12 |
13 | class AllowListProject(FilterProjectPlugin):
14 | name = "allowlist_project"
15 | deprecated_name = "whitelist_project"
16 | # Requires iterable default
17 | allowlist_package_names: List[str] = []
18 |
19 | def initialize_plugin(self) -> None:
20 | """
21 | Initialize the plugin
22 | """
23 | # Generate a list of allowlisted packages from the configuration and
24 | # store it into self.allowlist_package_names attribute so this
25 | # operation doesn't end up in the fastpath.
26 | if not self.allowlist_package_names:
27 | self.allowlist_package_names = self._determine_unfiltered_package_names()
28 | logger.info(
29 | f"Initialized project plugin {self.name}, filtering "
30 | + f"{self.allowlist_package_names}"
31 | )
32 |
33 | def _determine_unfiltered_package_names(self) -> List[str]:
34 | """
35 | Return a list of package names to be filtered based on the configuration
36 | file.
37 | """
38 | # This plugin only processes packages, if the line in the packages
39 | # configuration contains a PEP440 specifier it will be processed by the
40 | # allowlist release filter. So we need to remove any packages that
41 | # are not applicable for this plugin.
42 | unfiltered_packages: Set[str] = set()
43 | try:
44 | lines = self.allowlist["packages"]
45 | package_lines = lines.split("\n")
46 | except KeyError:
47 | package_lines = []
48 | for package_line in package_lines:
49 | package_line = package_line.strip()
50 | if not package_line or package_line.startswith("#"):
51 | continue
52 | unfiltered_packages.add(canonicalize_name(Requirement(package_line).name))
53 | return list(unfiltered_packages)
54 |
55 | def filter(self, metadata: Dict) -> bool:
56 | return not self.check_match(name=metadata["info"]["name"])
57 |
58 | def check_match(self, **kwargs: Any) -> bool:
59 | """
60 | Check if the package name is missing from the allowlist defined
61 | in the configuration.
62 |
63 | Parameters
64 | ==========
65 | name: str
66 | The normalized package name of the package/project to check against
67 | the allowlist.
68 |
69 | Returns
70 | =======
71 | bool:
72 | True if the package is not allowlisted (it will be filtered out), False otherwise.
73 | """
74 | if not self.allowlist_package_names:
75 | return False
76 |
77 | name = kwargs.get("name", None)
78 | if not name:
79 | return False
80 |
81 | if canonicalize_name(name) in self.allowlist_package_names:
82 | logger.info(f"Package {name!r} is allowlisted")
83 | return False
84 | return True
85 |
86 |
87 | class AllowListRelease(FilterReleasePlugin):
88 | name = "allowlist_release"
89 | deprecated_name = "whitelist_release"
90 | # Requires iterable default
91 | allowlist_package_names: List[Requirement] = []
92 |
93 | def initialize_plugin(self) -> None:
94 | """
95 | Initialize the plugin
96 | """
97 | # Generate a list of allowlisted packages from the configuration and
98 | # store it into self.allowlist_package_names attribute so this
99 | # operation doesn't end up in the fastpath.
100 | if not self.allowlist_package_names:
101 | self.allowlist_release_requirements = (
102 | self._determine_filtered_package_requirements()
103 | )
104 | logger.info(
105 | f"Initialized release plugin {self.name}, filtering "
106 | + f"{self.allowlist_release_requirements}"
107 | )
108 |
109 | def _determine_filtered_package_requirements(self) -> List[Requirement]:
110 | """
111 | Parse the configuration file for [allowlist]packages
112 |
113 | Returns
114 | -------
115 | list of packaging.requirements.Requirement
116 | For all PEP440 package specifiers
117 | """
118 | filtered_requirements: Set[Requirement] = set()
119 | try:
120 | lines = self.allowlist["packages"]
121 | package_lines = lines.split("\n")
122 | except KeyError:
123 | package_lines = []
124 | for package_line in package_lines:
125 | package_line = package_line.strip()
126 | if not package_line or package_line.startswith("#"):
127 | continue
128 | requirement = Requirement(package_line)
129 | requirement.name = canonicalize_name(requirement.name)
130 | requirement.specifier.prereleases = True
131 | filtered_requirements.add(requirement)
132 | return list(filtered_requirements)
133 |
134 | def filter(self, metadata: Dict) -> bool:
135 | """
136 | Returns False if version fails the filter,
137 | i.e. doesn't match an allowlist version specifier
138 | """
139 | name = metadata["info"]["name"]
140 | version = metadata["version"]
141 | return self._check_match(canonicalize_name(name), version)
142 |
143 | def _check_match(self, name: str, version_string: str) -> bool:
144 | """
145 | Check if the package name and version matches against an allowlisted
146 | package version specifier.
147 |
148 | Parameters
149 | ==========
150 | name: str
151 | Package name
152 |
153 | version: str
154 | Package version
155 |
156 | Returns
157 | =======
158 | bool:
159 | True if it matches, False otherwise.
160 | """
161 | if not name or not version_string:
162 | return False
163 |
164 | try:
165 | version = Version(version_string)
166 | except InvalidVersion:
167 | logger.debug(f"Package {name}=={version_string} has an invalid version")
168 | return False
169 | for requirement in self.allowlist_release_requirements:
170 | if name != requirement.name:
171 | continue
172 | if version in requirement.specifier:
173 | logger.debug(
174 | f"MATCH: Release {name}=={version} matches specifier "
175 | f"{requirement.specifier}"
176 | )
177 | return True
178 | return False
179 |
--------------------------------------------------------------------------------
/src/bandersnatch/main.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import asyncio
3 | import logging
4 | import logging.config
5 | import shutil
6 | import sys
7 | from configparser import ConfigParser
8 | from pathlib import Path
9 | from tempfile import gettempdir
10 | from typing import Optional
11 |
12 | import bandersnatch.configuration
13 | import bandersnatch.delete
14 | import bandersnatch.log
15 | import bandersnatch.master
16 | import bandersnatch.mirror
17 | import bandersnatch.verify
18 | from bandersnatch.storage import storage_backend_plugins
19 |
20 | logger = logging.getLogger(__name__) # pylint: disable=C0103
21 |
22 |
23 | # TODO: Work out why argparse.ArgumentParser causes type errors
24 | def _delete_parser(subparsers: argparse._SubParsersAction) -> None:
25 | d = subparsers.add_parser(
26 | "delete",
27 | help=(
28 | "Consult metadata (locally or remotely) and delete "
29 | + "entire package artifacts."
30 | ),
31 | )
32 | d.add_argument(
33 | "--dry-run",
34 | action="store_true",
35 | default=False,
36 | help="Do not download or delete files",
37 | )
38 | d.add_argument(
39 | "--workers",
40 | type=int,
41 | default=0,
42 | help="# of parallel iops [Defaults to bandersnatch.conf]",
43 | )
44 | d.add_argument("pypi_packages", nargs="*")
45 | d.set_defaults(op="delete")
46 |
47 |
48 | def _mirror_parser(subparsers: argparse._SubParsersAction) -> None:
49 | m = subparsers.add_parser(
50 | "mirror",
51 | help="Performs a one-time synchronization with the PyPI master server.",
52 | )
53 | m.add_argument(
54 | "--force-check",
55 | action="store_true",
56 | default=False,
57 | help=(
58 | "Force bandersnatch to reset the PyPI serial (move serial file to /tmp) to "
59 | + "perform a full sync"
60 | ),
61 | )
62 | m.set_defaults(op="mirror")
63 |
64 |
65 | def _verify_parser(subparsers: argparse._SubParsersAction) -> None:
66 | v = subparsers.add_parser(
67 | "verify", help="Read in Metadata and check package file validity"
68 | )
69 | v.add_argument(
70 | "--delete",
71 | action="store_true",
72 | default=False,
73 | help="Enable deletion of packages not active",
74 | )
75 | v.add_argument(
76 | "--dry-run",
77 | action="store_true",
78 | default=False,
79 | help="Do not download or delete files",
80 | )
81 | v.add_argument(
82 | "--json-update",
83 | action="store_true",
84 | default=False,
85 | help="Enable updating JSON from PyPI",
86 | )
87 | v.add_argument(
88 | "--workers",
89 | type=int,
90 | default=0,
91 | help="# of parallel iops [Defaults to bandersnatch.conf]",
92 | )
93 | v.set_defaults(op="verify")
94 |
95 |
96 | def _sync_parser(subparsers: argparse._SubParsersAction) -> None:
97 | m = subparsers.add_parser(
98 | "sync", help="Synchronize specific packages with the PyPI master server.",
99 | )
100 | m.add_argument(
101 | "packages", metavar="package", nargs="+", help="The name of package to sync",
102 | )
103 | m.set_defaults(op="sync")
104 |
105 |
106 | async def async_main(args: argparse.Namespace, config: ConfigParser) -> int:
107 | if args.op.lower() == "delete":
108 | async with bandersnatch.master.Master(
109 | config.get("mirror", "master"),
110 | config.getfloat("mirror", "timeout"),
111 | config.getfloat("mirror", "global-timeout", fallback=None),
112 | ) as master:
113 | return await bandersnatch.delete.delete_packages(config, args, master)
114 | elif args.op.lower() == "verify":
115 | return await bandersnatch.verify.metadata_verify(config, args)
116 | elif args.op.lower() == "sync":
117 | return await bandersnatch.mirror.mirror(config, args.packages)
118 |
119 | if args.force_check:
120 | storage_plugin = next(iter(storage_backend_plugins()))
121 | status_file = (
122 | storage_plugin.PATH_BACKEND(config.get("mirror", "directory")) / "status"
123 | )
124 | if status_file.exists():
125 | tmp_status_file = Path(gettempdir()) / "status"
126 | try:
127 | shutil.move(str(status_file), tmp_status_file)
128 | logger.debug(
129 | "Force bandersnatch to check everything against the master PyPI"
130 | + f" - status file moved to {tmp_status_file}"
131 | )
132 | except OSError as e:
133 | logger.error(
134 | f"Could not move status file ({status_file} to "
135 | + f"{tmp_status_file}): {e}"
136 | )
137 | else:
138 | logger.info(
139 | f"No status file to move ({status_file}) - Full sync will occur"
140 | )
141 |
142 | return await bandersnatch.mirror.mirror(config)
143 |
144 |
145 | def main(loop: Optional[asyncio.AbstractEventLoop] = None) -> int:
146 | parser = argparse.ArgumentParser(
147 | description="PyPI PEP 381 mirroring client.", prog="bandersnatch"
148 | )
149 | parser.add_argument(
150 | "--version", action="version", version=f"%(prog)s {bandersnatch.__version__}"
151 | )
152 | parser.add_argument(
153 | "-c",
154 | "--config",
155 | default="/etc/bandersnatch.conf",
156 | help="use configuration file (default: %(default)s)",
157 | )
158 | parser.add_argument(
159 | "--debug",
160 | action="store_true",
161 | default=False,
162 | help="Turn on extra logging (DEBUG level)",
163 | )
164 |
165 | subparsers = parser.add_subparsers()
166 | _delete_parser(subparsers)
167 | _mirror_parser(subparsers)
168 | _verify_parser(subparsers)
169 | _sync_parser(subparsers)
170 |
171 | if len(sys.argv) < 2:
172 | parser.print_help()
173 | parser.exit()
174 |
175 | args = parser.parse_args()
176 |
177 | bandersnatch.log.setup_logging(args)
178 |
179 | # Prepare default config file if needed.
180 | config_path = Path(args.config)
181 | if not config_path.exists():
182 | logger.warning(f"Config file '{args.config}' missing, creating default config.")
183 | logger.warning("Please review the config file, then run 'bandersnatch' again.")
184 |
185 | default_config_path = Path(__file__).parent / "default.conf"
186 | try:
187 | shutil.copy(default_config_path, args.config)
188 | except OSError as e:
189 | logger.error(f"Could not create config file: {e}")
190 | return 1
191 |
192 | config = bandersnatch.configuration.BandersnatchConfig(
193 | config_file=args.config
194 | ).config
195 |
196 | if config.has_option("mirror", "log-config"):
197 | logging.config.fileConfig(str(Path(config.get("mirror", "log-config"))))
198 |
199 | # TODO: Go to asyncio.run() when >= 3.7
200 | loop = loop or asyncio.get_event_loop()
201 | loop.set_debug(args.debug)
202 | try:
203 | return loop.run_until_complete(async_main(args, config))
204 | finally:
205 | loop.close()
206 |
207 |
208 | if __name__ == "__main__":
209 | sys.exit(main())
210 |
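The subcommand dispatch above relies on each subparser calling `set_defaults(op=...)`, so that `async_main` can branch on `args.op`. Below is a minimal, self-contained sketch of that pattern using only `argparse`. The `build_parser` and `dispatch` names and the returned strings are hypothetical stand-ins for illustration, not part of bandersnatch.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical miniature of the real parser: two subcommands only.
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers()

    mirror = subparsers.add_parser("mirror", help="Full mirror run")
    mirror.set_defaults(op="mirror")

    sync = subparsers.add_parser("sync", help="Sync specific packages")
    sync.add_argument("packages", metavar="package", nargs="+")
    sync.set_defaults(op="sync")
    return parser


def dispatch(argv: List[str]) -> str:
    # Each subparser stores its own "op" default, so one attribute
    # is enough to decide which code path to run.
    args = build_parser().parse_args(argv)
    if args.op == "sync":
        return "sync " + ",".join(args.packages)
    return "mirror"
```

If no subcommand is given, `args` has no `op` attribute at all, which is presumably why `main()` prints the help and exits when `sys.argv` has fewer than two entries.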
--------------------------------------------------------------------------------
/src/bandersnatch_filter_plugins/blocklist_name.py:
--------------------------------------------------------------------------------
1 | import logging
2 | from typing import Any, Dict, List, Set
3 |
4 | from packaging.requirements import Requirement
5 | from packaging.utils import canonicalize_name
6 | from packaging.version import InvalidVersion, Version
7 |
8 | from bandersnatch.filter import FilterProjectPlugin, FilterReleasePlugin
9 |
10 | logger = logging.getLogger("bandersnatch")
11 |
12 |
13 | class BlockListProject(FilterProjectPlugin):
14 | name = "blocklist_project"
15 | deprecated_name = "blacklist_project"
16 | # Requires iterable default
17 | blocklist_package_names: List[str] = []
18 |
19 | def initialize_plugin(self) -> None:
20 | """
21 | Initialize the plugin
22 | """
23 | # Generate a list of blocklisted packages from the configuration and
24 | # store it into self.blocklist_package_names attribute so this
25 | # operation doesn't end up in the fastpath.
26 | if not self.blocklist_package_names:
27 | self.blocklist_package_names = self._determine_filtered_package_names()
28 | logger.info(
29 | f"Initialized project plugin {self.name}, filtering "
30 | + f"{self.blocklist_package_names}"
31 | )
32 |
33 | def _determine_filtered_package_names(self) -> List[str]:
34 | """
35 | Return a list of package names to be filtered based on the configuration
36 | file.
37 | """
38 | # This plugin only processes packages, if the line in the packages
39 | # configuration contains a PEP440 specifier it will be processed by the
40 | # blocklist release filter. So we need to remove any packages that
41 | # are not applicable for this plugin.
42 | filtered_packages: Set[str] = set()
43 | try:
44 | lines = self.blocklist["packages"]
45 | package_lines = lines.split("\n")
46 | except KeyError:
47 | package_lines = []
48 | for package_line in package_lines:
49 | package_line = package_line.strip()
50 | if not package_line or package_line.startswith("#"):
51 | continue
52 | package_requirement = Requirement(package_line)
53 | if package_requirement.specifier:
54 | continue
55 | if package_requirement.name != package_line:
56 | logger.debug(
57 | "Package line %r does not match requirement name %r",
58 | package_line,
59 | package_requirement.name,
60 | )
61 | continue
62 | filtered_packages.add(canonicalize_name(package_requirement.name))
63 | logger.debug("Project blocklist is %r", list(filtered_packages))
64 | return list(filtered_packages)
65 |
66 | def filter(self, metadata: Dict) -> bool:
67 | return not self.check_match(name=metadata["info"]["name"])
68 |
69 | def check_match(self, **kwargs: Any) -> bool:
70 | """
71 | Check if the package name matches against a project that is blocklisted
72 | in the configuration.
73 |
74 | Parameters
75 | ==========
76 | name: str
77 | The normalized package name of the package/project to check against
78 | the blocklist.
79 |
80 | Returns
81 | =======
82 | bool:
83 | True if it matches, False otherwise.
84 | """
85 | name = kwargs.get("name", None)
86 | if not name:
87 | return False
88 |
89 | if canonicalize_name(name) in self.blocklist_package_names:
90 | logger.info(f"Package {name!r} is blocklisted")
91 | return True
92 | return False
93 |
94 |
95 | class BlockListRelease(FilterReleasePlugin):
96 | name = "blocklist_release"
97 | deprecated_name = "blacklist_release"
98 | # Requires iterable default
99 | blocklist_release_requirements: List[Requirement] = []
100 |
101 | def initialize_plugin(self) -> None:
102 | """
103 | Initialize the plugin
104 | """
105 | # Generate a list of blocklisted release requirements from the
106 | # configuration and store it in self.blocklist_release_requirements so
107 | # this operation doesn't end up in the fastpath.
108 | if not self.blocklist_release_requirements:
109 | self.blocklist_release_requirements = (
110 | self._determine_filtered_package_requirements()
111 | )
112 | logger.info(
113 | f"Initialized release plugin {self.name}, filtering "
114 | + f"{self.blocklist_release_requirements}"
115 | )
116 |
117 | def _determine_filtered_package_requirements(self) -> List[Requirement]:
118 | """
119 | Parse the configuration file for [blocklist]packages
120 |
121 | Returns
122 | -------
123 | list of packaging.requirements.Requirement
124 | For all PEP440 package specifiers
125 | """
126 | filtered_requirements: Set[Requirement] = set()
127 | try:
128 | lines = self.blocklist["packages"]
129 | package_lines = lines.split("\n")
130 | except KeyError:
131 | package_lines = []
132 | for package_line in package_lines:
133 | package_line = package_line.strip()
134 | if not package_line or package_line.startswith("#"):
135 | continue
136 | requirement = Requirement(package_line)
137 | requirement.name = canonicalize_name(requirement.name)
138 | requirement.specifier.prereleases = True
139 | filtered_requirements.add(requirement)
140 | return list(filtered_requirements)
141 |
142 | def filter(self, metadata: Dict) -> bool:
143 | """
144 | Returns False if version fails the filter,
145 | i.e. matches a blocklist version specifier
146 | """
147 | name = metadata["info"]["name"]
148 | version = metadata["version"]
149 | return not self._check_match(canonicalize_name(name), version)
150 |
151 | def _check_match(self, name: str, version_string: str) -> bool:
152 | """
153 | Check if the package name and version match a blocklisted
154 | package version specifier.
155 |
156 | Parameters
157 | ==========
158 | name: str
159 | Package name
160 |
161 | version_string: str
162 | Package version
163 |
164 | Returns
165 | =======
166 | bool:
167 | True if it matches, False otherwise.
168 | """
169 | if not name or not version_string:
170 | return False
171 |
172 | try:
173 | version = Version(version_string)
174 | except InvalidVersion:
175 | logger.debug(f"Package {name}=={version_string} has an invalid version")
176 | return False
177 | for requirement in self.blocklist_release_requirements:
178 | if name != requirement.name:
179 | continue
180 | if version in requirement.specifier:
181 | logger.debug(
182 | f"MATCH: Release {name}=={version} matches specifier "
183 | f"{requirement.specifier}"
184 | )
185 | return True
186 | return False
187 |
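The project filter above keeps only name-only lines from the `packages` option and compares canonical names. Below is a rough, standard-library-only sketch of that selection step. It is an approximation under stated assumptions: the real plugin uses `packaging.requirements.Requirement` and `packaging.utils.canonicalize_name`, whereas the `normalize` and `project_blocklist` helpers and the `_NAME_ONLY` pattern here are hypothetical stand-ins.

```python
import re
from typing import List

# Assumption: a "name-only" line is a bare distribution name with no
# specifier, extras, or markers (letters, digits, ".", "_", "-").
_NAME_ONLY = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, runs of -, _, . become one -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_blocklist(packages_value: str) -> List[str]:
    """Return the sorted, normalized names of name-only lines."""
    names = set()
    for line in packages_value.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if not _NAME_ONLY.match(line):
            continue  # lines with specifiers are left to the release filter
        names.add(normalize(line))
    return sorted(names)
```

With this split, a line such as `bar==1.2.3` never blocks the whole `bar` project. It only affects individual releases, which matches what `test__filter__name_only` in the blocklist tests expects.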
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_allowlist_name.py:
--------------------------------------------------------------------------------
1 | import os
2 | from collections import defaultdict
3 | from pathlib import Path
4 | from tempfile import TemporaryDirectory
5 | from unittest import TestCase
6 |
7 | from mock_config import mock_config
8 |
9 | import bandersnatch.filter
10 | import bandersnatch.storage
11 | from bandersnatch.master import Master
12 | from bandersnatch.mirror import BandersnatchMirror
13 | from bandersnatch.package import Package
14 |
15 |
16 | class TestAllowListProject(TestCase):
17 | """
18 | Tests for the bandersnatch filtering classes
19 | """
20 |
21 | tempdir = None
22 | cwd = None
23 |
24 | def setUp(self) -> None:
25 | self.cwd = os.getcwd()
26 | self.tempdir = TemporaryDirectory()
27 | bandersnatch.storage.loaded_storage_plugins = defaultdict(list)
28 | os.chdir(self.tempdir.name)
29 |
30 | def tearDown(self) -> None:
31 | if self.tempdir:
32 | assert self.cwd
33 | os.chdir(self.cwd)
34 | self.tempdir.cleanup()
35 | self.tempdir = None
36 |
37 | def test__plugin__loads__explicitly_enabled(self) -> None:
38 | mock_config(
39 | contents="""\
40 | [plugins]
41 | enabled =
42 | allowlist_project
43 | """
44 | )
45 |
46 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
47 | names = [plugin.name for plugin in plugins]
48 | self.assertListEqual(names, ["allowlist_project"])
49 | self.assertEqual(len(plugins), 1)
50 |
51 | def test__plugin__loads__default(self) -> None:
52 | mock_config(
53 | """\
54 | [mirror]
55 | storage-backend = filesystem
56 |
57 | [plugins]
58 | """
59 | )
60 |
61 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
62 | names = [plugin.name for plugin in plugins]
63 | self.assertNotIn("allowlist_project", names)
64 |
65 | def test__filter__matches__package(self) -> None:
66 | mock_config(
67 | """\
68 | [mirror]
69 | storage-backend = filesystem
70 |
71 | [plugins]
72 | enabled =
73 | allowlist_project
74 |
75 | [allowlist]
76 | packages =
77 | foo
78 | """
79 | )
80 |
81 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
82 | mirror.packages_to_sync = {"foo": ""}
83 | mirror._filter_packages()
84 |
85 | self.assertIn("foo", mirror.packages_to_sync.keys())
86 |
87 | def test__filter__nomatch_package(self) -> None:
88 | mock_config(
89 | """\
90 | [mirror]
91 | storage-backend = filesystem
92 |
93 | [plugins]
94 | enabled =
95 | allowlist_project
96 |
97 | [allowlist]
98 | packages =
99 | foo
100 | """
101 | )
102 |
103 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
104 | mirror.packages_to_sync = {"foo": "", "foo2": ""}
105 | mirror._filter_packages()
106 |
107 | self.assertIn("foo", mirror.packages_to_sync.keys())
108 | self.assertNotIn("foo2", mirror.packages_to_sync.keys())
109 |
110 | def test__filter__name_only(self) -> None:
111 | mock_config(
112 | """\
113 | [mirror]
114 | storage-backend = filesystem
115 |
116 | [plugins]
117 | enabled =
118 | allowlist_project
119 |
120 | [allowlist]
121 | packages =
122 | foo==1.2.3
123 | """
124 | )
125 |
126 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
127 | mirror.packages_to_sync = {"foo": "", "foo2": ""}
128 | mirror._filter_packages()
129 |
130 | self.assertIn("foo", mirror.packages_to_sync.keys())
131 | self.assertNotIn("foo2", mirror.packages_to_sync.keys())
132 |
133 | def test__filter__varying__specifiers(self) -> None:
134 | mock_config(
135 | """\
136 | [mirror]
137 | storage-backend = filesystem
138 |
139 | [plugins]
140 | enabled =
141 | allowlist_project
142 |
143 | [allowlist]
144 | packages =
145 | foo==1.2.3
146 | bar~=3.0,<=1.5
147 | """
148 | )
149 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
150 | mirror.packages_to_sync = {
151 | "foo": "",
152 | "bar": "",
153 | "snu": "",
154 | }
155 | mirror._filter_packages()
156 |
157 | self.assertEqual({"foo": "", "bar": ""}, mirror.packages_to_sync)
158 |
159 |
160 | class TestAllowlistRelease(TestCase):
161 | """
162 | Tests for the bandersnatch filtering classes
163 | """
164 |
165 | tempdir = None
166 | cwd = None
167 |
168 | def setUp(self) -> None:
169 | self.cwd = os.getcwd()
170 | self.tempdir = TemporaryDirectory()
171 | os.chdir(self.tempdir.name)
172 |
173 | def tearDown(self) -> None:
174 | if self.tempdir:
175 | assert self.cwd
176 | os.chdir(self.cwd)
177 | self.tempdir.cleanup()
178 | self.tempdir = None
179 |
180 | def test__plugin__loads__explicitly_enabled(self) -> None:
181 | mock_config(
182 | """\
183 | [plugins]
184 | enabled =
185 | allowlist_release
186 | """
187 | )
188 |
189 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
190 | names = [plugin.name for plugin in plugins]
191 | self.assertListEqual(names, ["allowlist_release"])
192 | self.assertEqual(len(plugins), 1)
193 |
194 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None:
195 | mock_config(
196 | """\
197 | [plugins]
198 | enabled =
199 | allowlist_package
200 | """
201 | )
202 |
203 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
204 | names = [plugin.name for plugin in plugins]
205 | self.assertNotIn("allowlist_release", names)
206 |
207 | def test__filter__matches__release(self) -> None:
208 | mock_config(
209 | """\
210 | [plugins]
211 | enabled =
212 | allowlist_release
213 | [allowlist]
214 | packages =
215 | foo==1.2.0
216 | """
217 | )
218 |
219 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
220 | pkg = Package("foo", 1)
221 | pkg._metadata = {
222 | "info": {"name": "foo"},
223 | "releases": {"1.2.0": {}, "1.2.1": {}},
224 | }
225 |
226 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
227 |
228 | self.assertEqual(pkg.releases, {"1.2.0": {}})
229 |
230 | def test__dont__filter__prereleases(self) -> None:
231 | mock_config(
232 | """\
233 | [plugins]
234 | enabled =
235 | allowlist_release
236 | [allowlist]
237 | packages =
238 | foo<=1.2.0
239 | """
240 | )
241 |
242 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
243 | pkg = Package("foo", 1)
244 | pkg._metadata = {
245 | "info": {"name": "foo"},
246 | "releases": {
247 | "1.1.0a2": {},
248 | "1.1.1beta1": {},
249 | "1.2.0": {},
250 | "1.2.1": {},
251 | "1.2.2alpha3": {},
252 | "1.2.3rc1": {},
253 | },
254 | }
255 |
256 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
257 |
258 | self.assertEqual(pkg.releases, {"1.1.0a2": {}, "1.1.1beta1": {}, "1.2.0": {}})
259 |
260 | def test__casing__no__affect(self) -> None:
261 | mock_config(
262 | """\
263 | [plugins]
264 | enabled =
265 | allowlist_release
266 | [allowlist]
267 | packages =
268 | Foo<=1.2.0
269 | """
270 | )
271 |
272 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
273 | pkg = Package("foo", 1)
274 | pkg._metadata = {
275 | "info": {"name": "foo"},
276 | "releases": {"1.2.0": {}, "1.2.1": {}},
277 | }
278 |
279 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
280 |
281 | self.assertEqual(pkg.releases, {"1.2.0": {}})
282 |
--------------------------------------------------------------------------------
/src/bandersnatch/tests/test_verify.py:
--------------------------------------------------------------------------------
1 | import configparser
2 | import os
3 | import sys
4 | import unittest.mock as mock
5 | from concurrent.futures import ThreadPoolExecutor
6 | from pathlib import Path
7 | from shutil import rmtree
8 | from tempfile import gettempdir
9 | from typing import Any, List
10 |
11 | import pytest
12 | from _pytest.monkeypatch import MonkeyPatch
13 |
14 | import bandersnatch
15 | from bandersnatch.master import Master
16 | from bandersnatch.utils import convert_url_to_path, find
17 |
18 | from bandersnatch.verify import ( # isort:skip
19 | get_latest_json,
20 | delete_unowned_files,
21 | metadata_verify,
22 | verify_producer,
23 | )
24 |
25 |
26 | async def do_nothing(*args: Any, **kwargs: Any) -> None:
27 | pass
28 |
29 |
30 | def some_dirs(*args: Any, **kwargs: Any) -> List[str]:
31 | return ["/data/pypi/web/json/bandersnatch", "/data/pypi/web/json/black"]
32 |
33 |
34 | class FakeArgs:
35 | delete = True
36 | dry_run = True
37 | workers = 2
38 |
39 |
40 | class FakeConfig:
41 | def get(self, section: str, item: str) -> str:
42 | if section == "mirror":
43 | if item == "directory":
44 | return "/data/pypi"
45 | if item == "master":
46 | return "https://pypi.org/simple/"
47 | return ""
48 |
49 | def getfloat(self, section: str, item: str, fallback: float = 0.5) -> float:
50 | return 0.5
51 |
52 |
53 | # TODO: Support testing sharded simple dirs
54 | class FakeMirror:
55 | def __init__(self, entropy: str = "") -> None:
56 | self.mirror_base = Path(gettempdir()) / f"pypi_unittest_{os.getpid()}{entropy}"
57 | if self.mirror_base.exists():
58 | return
59 | self.web_base = self.mirror_base / "web"
60 | self.web_base.mkdir(parents=True)
61 | self.json_path = self.web_base / "json"
62 | self.package_path = self.web_base / "packages"
63 | self.pypi_path = self.web_base / "pypi"
64 | self.simple_path = self.web_base / "simple"
65 |
66 | for web_dir in (
67 | self.json_path,
68 | self.package_path,
69 | self.pypi_path,
70 | self.simple_path,
71 | ):
72 | web_dir.mkdir()
73 |
74 | self.pypi_packages = {
75 | "bandersnatch": {
76 | "bandersnatch-0.6.9": {
77 | "filename": "bandersnatch-0.6.9.tar.gz",
78 | "contents": "69",
79 | "sha256": "b35e87b5838011a3637be660e4238af9a55e4edc74404c990f7a558e7f416658", # noqa: E501
80 | "url": "https://test.pypi.org/packages/8f/1a/6969/bandersnatch-0.6.9.tar.gz", # noqa: E501
81 | }
82 | },
83 | "black": {
84 | "black-2018.6.9": {
85 | "filename": "black-2018.6.9.tar.gz",
86 | "contents": "69",
87 | "sha256": "b35e87b5838011a3637be660e4238af9a55e4edc74404c990f7a558e7f416658", # noqa: E501
88 | "url": "https://test.pypi.org/packages/8f/1a/6969/black-2018.6.9.tar.gz", # noqa: E501
89 | },
90 | "black-2019.6.9": {
91 | "filename": "black-2019.6.9.tar.gz",
92 | "contents": "1469",
93 | "sha256": "c896470f5975bd5dc7d173871faca19848855b01bacf3171e9424b8a993b528b", # noqa: E501
94 | "url": "https://test.pypi.org/packages/8f/1a/1aa0/black-2019.6.9.tar.gz", # noqa: E501
95 | },
96 | },
97 | }
98 |
99 | # Create each subdir of web
100 | self.setup_json()
101 | self.setup_simple()
102 | self.setup_packages()
103 |
104 | def clean_up(self) -> None:
105 | if self.mirror_base.exists():
106 | rmtree(self.mirror_base)
107 |
108 | def setup_json(self) -> None:
109 | for pkg in self.pypi_packages.keys():
110 | pkg_json = self.json_path / pkg
111 | pkg_json.touch()
112 | pkg_legacy_json = self.pypi_path / pkg / "json"
113 | pkg_legacy_json.parent.mkdir()
114 | pkg_legacy_json.symlink_to(str(pkg_json))
115 |
116 | def setup_packages(self) -> None:
117 | for _pkg, dists in self.pypi_packages.items():
118 | for _version, metadata in dists.items():
119 | dist_file = self.web_base / convert_url_to_path(metadata["url"])
120 | dist_file.parent.mkdir(exist_ok=True, parents=True)
121 | with dist_file.open("w") as dfp:
122 | dfp.write(metadata["contents"])
123 |
124 | def setup_simple(self) -> None:
125 | for pkg in self.pypi_packages.keys():
126 | pkg_dir = self.simple_path / pkg
127 | pkg_dir.mkdir()
128 | index_path = pkg_dir / "index.html"
129 | index_path.touch()
130 |
131 |
132 | @pytest.mark.asyncio
133 | async def test_verify_producer(monkeypatch: MonkeyPatch) -> None:
134 | fm = FakeMirror("test_async_verify")
135 | fc = configparser.ConfigParser()
136 | fc["mirror"] = {}
137 | fc["mirror"]["verifiers"] = "2"
138 | master = Master("https://unittest.org")
139 | json_files = ["web/json/bandersnatch", "web/json/black"]
140 | monkeypatch.setattr(bandersnatch.verify, "verify", do_nothing)
141 | await verify_producer(master, fc, [], fm.mirror_base, json_files, mock.Mock(), None)
142 |
143 |
144 | def test_fake_mirror() -> None:
145 | expected_mirror_layout = """\
146 | web
147 | web{0}json
148 | web{0}json{0}bandersnatch
149 | web{0}json{0}black
150 | web{0}packages
151 | web{0}packages{0}8f
152 | web{0}packages{0}8f{0}1a
153 | web{0}packages{0}8f{0}1a{0}1aa0
154 | web{0}packages{0}8f{0}1a{0}1aa0{0}black-2019.6.9.tar.gz
155 | web{0}packages{0}8f{0}1a{0}6969
156 | web{0}packages{0}8f{0}1a{0}6969{0}bandersnatch-0.6.9.tar.gz
157 | web{0}packages{0}8f{0}1a{0}6969{0}black-2018.6.9.tar.gz
158 | web{0}pypi
159 | web{0}pypi{0}bandersnatch
160 | web{0}pypi{0}bandersnatch{0}json
161 | web{0}pypi{0}black
162 | web{0}pypi{0}black{0}json
163 | web{0}simple
164 | web{0}simple{0}bandersnatch
165 | web{0}simple{0}bandersnatch{0}index.html
166 | web{0}simple{0}black
167 | web{0}simple{0}black{0}index.html""".format(
168 | os.sep
169 | )
170 | fm = FakeMirror("_mirror_base_test")
171 | assert expected_mirror_layout == find(str(fm.mirror_base), True)
172 | fm.clean_up()
173 |
174 |
175 | @pytest.mark.asyncio
176 | async def test_delete_unowned_files() -> None:
177 | executor = ThreadPoolExecutor(max_workers=2)
178 | fm = FakeMirror("_test_delete_files")
179 | # Leave out black-2018.6.9.tar.gz so it gets deleted
180 | all_pkgs = [
181 | fm.mirror_base / "web/packages/8f/1a/1aa0/black-2019.6.9.tar.gz",
182 | fm.mirror_base / "web/packages/8f/1a/6969/bandersnatch-0.6.9.tar.gz",
183 | ]
184 | await delete_unowned_files(fm.mirror_base, executor, all_pkgs, True)
185 | await delete_unowned_files(fm.mirror_base, executor, all_pkgs, False)
186 | deleted_path = fm.mirror_base / "web/packages/8f/1a/6969/black-2018.6.9.tar.gz"
187 | assert not deleted_path.exists()
188 | fm.clean_up()
189 |
190 |
191 | @pytest.mark.asyncio
192 | async def test_get_latest_json(monkeypatch: MonkeyPatch) -> None:
193 | config = FakeConfig()
194 | executor = ThreadPoolExecutor(max_workers=2)
195 | json_path = Path(gettempdir()) / f"unittest_{os.getpid()}.json"
196 | master = Master("https://unittest.org")
197 | master.url_fetch = do_nothing # type: ignore
198 | await get_latest_json(master, json_path, config, executor) # type: ignore
199 |
200 |
201 | @pytest.mark.asyncio
202 | async def test_metadata_verify(monkeypatch: MonkeyPatch) -> None:
203 | fa = FakeArgs()
204 | fc = FakeConfig()
205 | monkeypatch.setattr(bandersnatch.verify, "verify_producer", do_nothing)
206 | monkeypatch.setattr(bandersnatch.verify, "delete_unowned_files", do_nothing)
207 | monkeypatch.setattr(bandersnatch.verify.os, "listdir", some_dirs)
208 | await metadata_verify(fc, fa) # type: ignore
209 |
210 |
211 | if __name__ == "__main__":
212 | pytest.main(sys.argv)
213 |
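`test_delete_unowned_files` above depends on the idea that any file under `web/packages` that is not in the expected list counts as unowned. Below is a small standalone sketch of that set difference. It assumes `Path.rglob` as a stand-in for the project's `recursive_find_files` helper and skips the executor. The `find_unowned` and `demo` names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def find_unowned(packages_path: Path, owned: Iterable[Path]) -> Set[Path]:
    # Everything on disk minus everything we expect to be there.
    on_disk = {p for p in packages_path.rglob("*") if p.is_file()}
    return on_disk - set(owned)


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        packages = Path(tmp) / "web" / "packages"
        keep = packages / "8f" / "1a" / "1aa0" / "black-2019.6.9.tar.gz"
        stale = packages / "8f" / "1a" / "6969" / "black-2018.6.9.tar.gz"
        for f in (keep, stale):
            f.parent.mkdir(parents=True, exist_ok=True)
            f.write_text("x")
        # Only "keep" is declared as owned, so "stale" is reported.
        return sorted(p.name for p in find_unowned(packages, [keep]))
```

A dry run would only print the returned paths, and a real run would unlink them. The sketch stops at computing the set.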
--------------------------------------------------------------------------------
/src/bandersnatch/tests/plugins/test_blocklist_name.py:
--------------------------------------------------------------------------------
1 | import os
2 | from pathlib import Path
3 | from tempfile import TemporaryDirectory
4 | from unittest import TestCase
5 |
6 | from mock_config import mock_config
7 |
8 | import bandersnatch.filter
9 | from bandersnatch.master import Master
10 | from bandersnatch.mirror import BandersnatchMirror
11 | from bandersnatch.package import Package
12 |
13 |
14 | class TestBlockListProject(TestCase):
15 | """
16 | Tests for the bandersnatch filtering classes
17 | """
18 |
19 | tempdir = None
20 | cwd = None
21 |
22 | def setUp(self) -> None:
23 | self.cwd = os.getcwd()
24 | self.tempdir = TemporaryDirectory()
25 | os.chdir(self.tempdir.name)
26 |
27 | def tearDown(self) -> None:
28 | if self.tempdir:
29 | assert self.cwd
30 | os.chdir(self.cwd)
31 | self.tempdir.cleanup()
32 | self.tempdir = None
33 |
34 | def test__plugin__loads__explicitly_enabled(self) -> None:
35 | mock_config(
36 | """\
37 | [plugins]
38 | enabled =
39 | blocklist_project
40 | """
41 | )
42 |
43 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
44 | names = [plugin.name for plugin in plugins]
45 | self.assertListEqual(names, ["blocklist_project"])
46 | self.assertEqual(len(plugins), 1)
47 |
48 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None:
49 | mock_config(
50 | """\
51 | [plugins]
52 | enabled =
53 | blocklist_release
54 | """
55 | )
56 |
57 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
58 | names = [plugin.name for plugin in plugins]
59 | self.assertNotIn("blocklist_project", names)
60 |
61 | def test__plugin__loads__default(self) -> None:
62 | mock_config(
63 | """\
64 | [blocklist]
65 | """
66 | )
67 |
68 | plugins = bandersnatch.filter.LoadedFilters().filter_project_plugins()
69 | names = [plugin.name for plugin in plugins]
70 | self.assertNotIn("blocklist_project", names)
71 |
72 | def test__filter__matches__package(self) -> None:
73 | mock_config(
74 | """\
75 | [plugins]
76 | enabled =
77 | blocklist_project
78 | [blocklist]
79 | packages =
80 | foo
81 | """
82 | )
83 |
84 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
85 | mirror.packages_to_sync = {"foo": ""}
86 | mirror._filter_packages()
87 |
88 | self.assertNotIn("foo", mirror.packages_to_sync.keys())
89 |
90 | def test__filter__nomatch_package(self) -> None:
91 | mock_config(
92 | """\
93 | [plugins]
94 | enabled =
95 | blocklist_project
96 | [blocklist]
97 | packages =
98 | foo
99 | )
100 |
101 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
102 | mirror.packages_to_sync = {"foo2": ""}
103 | mirror._filter_packages()
104 |
105 | self.assertIn("foo2", mirror.packages_to_sync.keys())
106 |
107 | def test__filter__name_only(self) -> None:
108 | mock_config(
109 | """\
110 | [mirror]
111 | storage-backend = filesystem
112 |
113 | [plugins]
114 | enabled =
115 | blocklist_project
116 |
117 | [blocklist]
118 | packages =
119 | foo==1.2.3
120 | """
121 | )
122 |
123 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
124 | mirror.packages_to_sync = {"foo": "", "foo2": ""}
125 | mirror._filter_packages()
126 |
127 | self.assertIn("foo", mirror.packages_to_sync.keys())
128 | self.assertIn("foo2", mirror.packages_to_sync.keys())
129 |
130 | def test__filter__varying__specifiers(self) -> None:
131 | mock_config(
132 | """\
133 | [mirror]
134 | storage-backend = filesystem
135 |
136 | [plugins]
137 | enabled =
138 | blocklist_project
139 |
140 | [blocklist]
141 | packages =
142 | foo==1.2.3
143 | bar~=3.0,<=1.5
144 | snu
145 | """
146 | )
147 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
148 | mirror.packages_to_sync = {
149 | "foo": "",
150 | "foo2": "",
151 | "bar": "",
152 | "snu": "",
153 | }
154 | mirror._filter_packages()
155 |
156 | self.assertEqual({"foo": "", "foo2": "", "bar": ""}, mirror.packages_to_sync)
157 |
158 |
159 | class TestBlockListRelease(TestCase):
160 | """
161 | Tests for the bandersnatch filtering classes
162 | """
163 |
164 | tempdir = None
165 | cwd = None
166 |
167 | def setUp(self) -> None:
168 | self.cwd = os.getcwd()
169 | self.tempdir = TemporaryDirectory()
170 | os.chdir(self.tempdir.name)
171 |
172 | def tearDown(self) -> None:
173 | if self.tempdir:
174 | assert self.cwd
175 | os.chdir(self.cwd)
176 | self.tempdir.cleanup()
177 | self.tempdir = None
178 |
179 | def test__plugin__loads__explicitly_enabled(self) -> None:
180 | mock_config(
181 | """\
182 | [plugins]
183 | enabled =
184 | blocklist_release
185 | """
186 | )
187 |
188 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
189 | names = [plugin.name for plugin in plugins]
190 | self.assertListEqual(names, ["blocklist_release"])
191 | self.assertEqual(len(plugins), 1)
192 |
193 | def test__plugin__doesnt_load__explicitly__disabled(self) -> None:
194 | mock_config(
195 | """\
196 | [plugins]
197 | enabled =
198 | blocklist_package
199 | """
200 | )
201 |
202 | plugins = bandersnatch.filter.LoadedFilters().filter_release_plugins()
203 | names = [plugin.name for plugin in plugins]
204 | self.assertNotIn("blocklist_release", names)
205 |
206 | def test__filter__matches__release(self) -> None:
207 | mock_config(
208 | """\
209 | [plugins]
210 | enabled =
211 | blocklist_release
212 | [blocklist]
213 | packages =
214 | foo==1.2.0
215 | """
216 | )
217 |
218 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
219 | pkg = Package("foo", 1)
220 | pkg._metadata = {
221 | "info": {"name": "foo"},
222 | "releases": {"1.2.0": {}, "1.2.1": {}},
223 | }
224 |
225 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
226 |
227 | self.assertEqual(pkg.releases, {"1.2.1": {}})
228 |
229 | def test__dont__filter__prereleases(self) -> None:
230 | mock_config(
231 | """\
232 | [plugins]
233 | enabled =
234 | blocklist_release
235 | [blocklist]
236 | packages =
237 | foo<=1.2.0
238 | """
239 | )
240 |
241 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
242 | pkg = Package("foo", 1)
243 | pkg._metadata = {
244 | "info": {"name": "foo"},
245 | "releases": {
246 | "1.1.0a2": {},
247 | "1.1.1beta1": {},
248 | "1.2.0": {},
249 | "1.2.1": {},
250 | "1.2.2alpha3": {},
251 | "1.2.3rc1": {},
252 | },
253 | }
254 |
255 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
256 |
257 | self.assertEqual(pkg.releases, {"1.2.1": {}, "1.2.2alpha3": {}, "1.2.3rc1": {}})
258 |
259 | def test__casing__no__affect(self) -> None:
260 | mock_config(
261 | """\
262 | [plugins]
263 | enabled =
264 | blocklist_release
265 | [blocklist]
266 | packages =
267 | Foo<=1.2.0
268 | """
269 | )
270 |
271 | mirror = BandersnatchMirror(Path("."), Master(url="https://foo.bar.com"))
272 | pkg = Package("foo", 1)
273 | pkg._metadata = {
274 | "info": {"name": "foo"},
275 | "releases": {"1.2.0": {}, "1.2.1": {}},
276 | }
277 |
278 | pkg.filter_all_releases(mirror.filters.filter_release_plugins())
279 |
280 | self.assertEqual(pkg.releases, {"1.2.1": {}})
281 |
--------------------------------------------------------------------------------
/src/bandersnatch/verify.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import asyncio
3 | import concurrent.futures
4 | import json
5 | import logging
6 | import os
7 | import shutil
8 | from argparse import Namespace
9 | from asyncio.queues import Queue
10 | from configparser import ConfigParser
11 | from pathlib import Path
12 | from sys import stderr
13 | from typing import List, Optional, Set
14 | from urllib.parse import urlparse
15 |
16 | from .filter import LoadedFilters
17 | from .master import Master
18 | from .storage import storage_backend_plugins
19 | from .utils import convert_url_to_path, hash, recursive_find_files, unlink_parent_dir
20 |
21 | logger = logging.getLogger(__name__)
22 |
23 |
24 | async def get_latest_json(
25 | master: Master,
26 | json_path: Path,
27 | config: ConfigParser,
28 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
29 | delete_removed_packages: bool = False,
30 | ) -> None:
31 | url_parts = urlparse(config.get("mirror", "master"))
32 | url = f"{url_parts.scheme}://{url_parts.netloc}/pypi/{json_path.name}/json"
33 | logger.debug(f"Updating {json_path.name} json from {url}")
34 | new_json_path = json_path.parent / f"{json_path.name}.new"
35 | await master.url_fetch(url, new_json_path, executor)
36 | if new_json_path.exists():
37 | shutil.move(str(new_json_path), json_path)
38 | else:
39 | logger.error(
40 | f"{str(new_json_path)} does not exist - Did not get new JSON metadata"
41 | )
42 | if delete_removed_packages and json_path.exists():
43 | logger.debug(f"Unlinking {json_path} - assuming it does not exist upstream")
44 | json_path.unlink()
45 |
46 |
47 | async def delete_unowned_files(
48 | mirror_base: Path,
49 | executor: concurrent.futures.ThreadPoolExecutor,
50 | all_package_files: List[Path],
51 | dry_run: bool,
52 | ) -> int:
53 | loop = asyncio.get_event_loop()
54 | packages_path = mirror_base / "web" / "packages"
55 | all_fs_files: Set[Path] = set()
56 | await loop.run_in_executor(
57 | executor, recursive_find_files, all_fs_files, packages_path
58 | )
59 |
60 | all_package_files_set = set(all_package_files)
61 | unowned_files = all_fs_files - all_package_files_set
62 | logger.info(
63 | f"We have {len(all_package_files_set)} files. "
64 | + f"{len(unowned_files)} unowned files"
65 | )
66 | if not unowned_files:
67 | logger.info(f"{mirror_base} has no files to delete")
68 | return 0
69 |
70 | if dry_run:
71 | print("[DRY RUN] Unowned file list:", file=stderr)
72 | for f in sorted(unowned_files):
73 | print(f)
74 | else:
75 | del_coros = []
76 | for file_path in unowned_files:
77 | del_coros.append(
78 | loop.run_in_executor(executor, unlink_parent_dir, file_path)
79 | )
80 | await asyncio.gather(*del_coros)
81 |
82 | return 0
83 |
84 |
85 | async def verify(
86 | master: Master,
87 | config: ConfigParser,
88 | json_file: str,
89 | mirror_base_path: Path,
90 | all_package_files: List[Path],
91 | args: argparse.Namespace,
92 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
93 | releases_key: str = "releases",
94 | ) -> None:
95 | json_base = mirror_base_path / "web" / "json"
96 | json_full_path = json_base / json_file
97 | loop = asyncio.get_event_loop()
98 | logger.info(f"Parsing {json_file}")
99 |
100 | if args.json_update:
101 | if not args.dry_run:
102 | await get_latest_json(master, json_full_path, config, executor, args.delete)
103 | else:
104 | logger.info(f"[DRY RUN] Would have grabbed latest json for {json_file}")
105 |
106 | if not json_full_path.exists():
107 | logger.debug(f"Not trying to sync package as {json_full_path} does not exist")
108 | return
109 |
110 | try:
111 | with json_full_path.open("r") as jfp:
112 | pkg = json.load(jfp)
113 | except json.decoder.JSONDecodeError as jde:
114 | logger.error(f"Failed to load {json_full_path}: {jde} - skipping ...")
115 | return
116 |
117 | # apply releases filter plugins like class Package
118 | for plugin in LoadedFilters().filter_release_plugins() or []:
119 | plugin.filter(pkg)
120 |
121 | for release_version in pkg[releases_key]:
122 | for jpkg in pkg[releases_key][release_version]:
123 | pkg_file = mirror_base_path / "web" / convert_url_to_path(jpkg["url"])
124 | if not pkg_file.exists():
125 | if args.dry_run:
126 | logger.info(f"{jpkg['url']} would be fetched")
127 | all_package_files.append(pkg_file)
128 | continue
129 | else:
130 | await master.url_fetch(jpkg["url"], pkg_file, executor)
131 |
132 | calc_sha256 = await loop.run_in_executor(executor, hash, str(pkg_file))
133 | if calc_sha256 != jpkg["digests"]["sha256"]:
134 | if not args.dry_run:
135 | await loop.run_in_executor(None, pkg_file.unlink)
136 | await master.url_fetch(jpkg["url"], pkg_file, executor)
137 | else:
138 | logger.info(
139 | f"[DRY RUN] {pkg['info']['name']} has a sha256 mismatch."
140 | )
141 |
142 | all_package_files.append(pkg_file)
143 |
144 | logger.info(f"Finished validating {json_file}")
145 |
146 |
147 | async def verify_producer(
148 | master: Master,
149 | config: ConfigParser,
150 | all_package_files: List[Path],
151 | mirror_base_path: Path,
152 | json_files: List[str],
153 | args: argparse.Namespace,
154 | executor: Optional[concurrent.futures.ThreadPoolExecutor] = None,
155 | ) -> None:
156 | queue: asyncio.Queue = asyncio.Queue()
157 | for jf in json_files:
158 | await queue.put(jf)
159 |
160 | async def consume(q: Queue) -> None:
161 | while not q.empty():
162 | json_file = await q.get()
163 | await verify(
164 | master,
165 | config,
166 | json_file,
167 | mirror_base_path,
168 | all_package_files,
169 | args,
170 | executor,
171 | )
172 |
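173 | # One coroutine object per verifier: gather() de-duplicates repeated objects
174 | verifiers = config.getint("mirror", "verifiers", fallback=3)
175 | await asyncio.gather(*[consume(queue) for _ in range(verifiers)])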
176 |
177 |
178 | async def metadata_verify(config: ConfigParser, args: Namespace) -> int:
179 | """Crawl all saved JSON metadata or online to check we have all packages
180 | if delete - generate a diff of unowned files"""
181 | all_package_files: List[Path] = []
182 | loop = asyncio.get_event_loop()
183 |
184 | storage_backend = next(
185 | iter(storage_backend_plugins(config=config, clear_cache=True))
186 | )
187 |
188 | mirror_base_path = storage_backend.PATH_BACKEND(config.get("mirror", "directory"))
189 | json_base = mirror_base_path / "web" / "json"
190 | workers = args.workers or config.getint("mirror", "workers")
191 | executor = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
192 |
193 | logger.info(f"Starting verify for {mirror_base_path} with {workers} workers")
194 | try:
195 | json_files = await loop.run_in_executor(executor, os.listdir, json_base)
196 | except FileNotFoundError as fnfe:
197 | logger.error(f"Metadata base dir {json_base} does not exist: {fnfe}")
198 | return 2
199 | if not json_files:
200 | logger.error("No JSON metadata files found. Can not verify")
201 | return 3
202 |
203 | logger.debug(f"Found {len(json_files)} objects in {json_base}")
204 | logger.debug(f"Using a {workers} thread ThreadPoolExecutor")
205 | async with Master(
206 | config.get("mirror", "master"),
207 | config.getfloat("mirror", "timeout"),
208 | config.getfloat("mirror", "global-timeout", fallback=None),
209 | ) as master:
210 | await verify_producer(
211 | master,
212 | config,
213 | all_package_files,
214 | mirror_base_path,
215 | json_files,
216 | args,
217 | executor,
218 | )
219 |
220 | if not args.delete:
221 | return 0
222 |
223 | return await delete_unowned_files(
224 | mirror_base_path, executor, all_package_files, args.dry_run
225 | )
226 |
--------------------------------------------------------------------------------
/src/bandersnatch/filter.py:
--------------------------------------------------------------------------------
1 | """
2 | Filter plugin management
3 | """
4 | from collections import defaultdict
5 | from typing import TYPE_CHECKING, Any, Dict, List
6 |
7 | import pkg_resources
8 |
9 | from .configuration import BandersnatchConfig
10 |
11 | if TYPE_CHECKING:
12 | from configparser import SectionProxy
13 |
14 |
15 | # The API_REVISION is incremented if the plugin class is modified in a
16 | # backwards incompatible way. This prevents loading older plugins that
17 | # may still be installed and would break due to changes to the methods
18 | # of the classes.
19 | PLUGIN_API_REVISION = 2
20 | PROJECT_PLUGIN_RESOURCE = f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.project"
21 | METADATA_PLUGIN_RESOURCE = (
22 | f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.metadata"
23 | )
24 | RELEASE_PLUGIN_RESOURCE = f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.release"
25 | RELEASE_FILE_PLUGIN_RESOURCE = (
26 | f"bandersnatch_filter_plugins.v{PLUGIN_API_REVISION}.release_file"
27 | )
28 |
29 |
30 | class Filter:
31 | """
32 | Base Filter class
33 | """
34 |
35 | name = "filter"
36 | deprecated_name: str = ""
37 |
38 | def __init__(self, *args: Any, **kwargs: Any) -> None:
39 | self.configuration = BandersnatchConfig().config
40 | if (
41 | "plugins" not in self.configuration
42 | or "enabled" not in self.configuration["plugins"]
43 | ):
44 | return
45 |
46 | split_plugins = self.configuration["plugins"]["enabled"].split("\n")
47 | if (
48 | "all" not in split_plugins
49 | and self.name not in split_plugins
50 | # NOTE: Remove after 5.0
51 | and not (self.deprecated_name and self.deprecated_name in split_plugins)
52 | ):
53 | return
54 |
55 | self.initialize_plugin()
56 |
57 | def initialize_plugin(self) -> None:
58 | """
59 | Code to initialize the plugin
60 | """
61 | # The initialize_plugin method is run once to initialize the plugin. This should
62 | # contain all code to set up the plugin.
63 | # This method is not run in the fast path and should be used to do things like
64 | # indexing filter databases, etc that will speed the operation of the filter
65 | # and check_match methods that are called in the fast path.
66 | pass
67 |
68 | def filter(self, metadata: dict) -> bool:
69 | """
70 | Check if the plugin matches based on the package's metadata.
71 |
72 | Returns
73 | =======
74 | bool:
75 | True if the values match a filter rule, False otherwise
76 | """
77 | return False
78 |
79 | def check_match(self, **kwargs: Any) -> bool:
80 | """
81 | Check if the plugin matches based on the arguments provided.
82 |
83 | Returns
84 | =======
85 | bool:
86 | True if the values match a filter rule, False otherwise
87 | """
88 | return False
89 |
90 | # NOTE: These two can be removed in 5.0
91 | @property
92 | def allowlist(self) -> "SectionProxy":
93 | return (
94 | self.configuration["whitelist"]
95 | if self.configuration.has_section("whitelist")
96 | else self.configuration["allowlist"]
97 | )
98 |
99 | @property
100 | def blocklist(self) -> "SectionProxy":
101 | return (
102 | self.configuration["blacklist"]
103 | if self.configuration.has_section("blacklist")
104 | else self.configuration["blocklist"]
105 | )
106 |
107 |
108 | class FilterProjectPlugin(Filter):
109 | """
110 | Plugin that blocks sync operations for an entire project
111 | """
112 |
113 | name = "project_plugin"
114 |
115 |
116 | class FilterMetadataPlugin(Filter):
117 | """
118 | Plugin that blocks sync operations for an entire project based on info fields.
119 | """
120 |
121 | name = "metadata_plugin"
122 |
123 |
124 | class FilterReleasePlugin(Filter):
125 | """
126 | Plugin that modifies the download of specific releases or dist files
127 | """
128 |
129 | name = "release_plugin"
130 |
131 |
132 | class FilterReleaseFilePlugin(Filter):
133 | """
134 | Plugin that modifies the download of specific release or dist files
135 | """
136 |
137 | name = "release_file_plugin"
138 |
139 |
140 | class LoadedFilters:
141 | """
142 | A class to load all of the filters enabled
143 | """
144 |
145 | ENTRYPOINT_GROUPS = [
146 | PROJECT_PLUGIN_RESOURCE,
147 | METADATA_PLUGIN_RESOURCE,
148 | RELEASE_PLUGIN_RESOURCE,
149 | RELEASE_FILE_PLUGIN_RESOURCE,
150 | ]
151 |
152 | def __init__(self, load_all: bool = False) -> None:
153 | """
154 | Loads and stores all of the specified filters from the config file
155 | """
156 | self.config = BandersnatchConfig().config
157 | self.loaded_filter_plugins: Dict[str, List["Filter"]] = defaultdict(list)
158 | self.enabled_plugins = self._load_enabled()
159 | if load_all:
160 | self._load_filters(self.ENTRYPOINT_GROUPS)
161 |
162 | def _load_enabled(self) -> List[str]:
163 | """
164 | Reads the config and returns all the enabled plugins
165 | """
166 | enabled_plugins: List[str] = []
167 | try:
168 | config_plugins = self.config["plugins"]["enabled"]
169 | split_plugins = config_plugins.split("\n")
170 | if "all" in split_plugins:
171 | enabled_plugins = ["all"]
172 | else:
173 | for plugin in split_plugins:
174 | if not plugin:
175 | continue
176 | enabled_plugins.append(plugin)
177 | except KeyError:
178 | pass
179 | return enabled_plugins
180 |
181 | def _load_filters(self, groups: List[str]) -> None:
182 | """
183 | Loads filters from the entry-point groups specified in groups
184 | """
185 | for group in groups:
186 | plugins = set()
187 | for entry_point in pkg_resources.iter_entry_points(group=group):
188 | plugin_class = entry_point.load()
189 | plugin_instance = plugin_class()
190 | if (
191 | "all" in self.enabled_plugins
192 | or plugin_instance.name in self.enabled_plugins
193 | or plugin_instance.deprecated_name in self.enabled_plugins
194 | ):
195 | plugins.add(plugin_instance)
196 |
197 | self.loaded_filter_plugins[group] = list(plugins)
198 |
199 | def filter_project_plugins(self) -> List[Filter]:
200 | """
201 | Load and return the project filtering plugin objects
202 |
203 | Returns
204 | -------
205 | list of bandersnatch.filter.Filter:
206 | List of objects derived from the bandersnatch.filter.Filter class
207 | """
208 | if PROJECT_PLUGIN_RESOURCE not in self.loaded_filter_plugins:
209 | self._load_filters([PROJECT_PLUGIN_RESOURCE])
210 | return self.loaded_filter_plugins[PROJECT_PLUGIN_RESOURCE]
211 |
212 | def filter_metadata_plugins(self) -> List[Filter]:
213 | """
214 | Load and return the metadata filtering plugin objects
215 |
216 | Returns
217 | -------
218 | list of bandersnatch.filter.Filter:
219 | List of objects derived from the bandersnatch.filter.Filter class
220 | """
221 | if METADATA_PLUGIN_RESOURCE not in self.loaded_filter_plugins:
222 | self._load_filters([METADATA_PLUGIN_RESOURCE])
223 | return self.loaded_filter_plugins[METADATA_PLUGIN_RESOURCE]
224 |
225 | def filter_release_plugins(self) -> List[Filter]:
226 | """
227 | Load and return the release filtering plugin objects
228 |
229 | Returns
230 | -------
231 | list of bandersnatch.filter.Filter:
232 | List of objects derived from the bandersnatch.filter.Filter class
233 | """
234 | if RELEASE_PLUGIN_RESOURCE not in self.loaded_filter_plugins:
235 | self._load_filters([RELEASE_PLUGIN_RESOURCE])
236 | return self.loaded_filter_plugins[RELEASE_PLUGIN_RESOURCE]
237 |
238 | def filter_release_file_plugins(self) -> List[Filter]:
239 | """
240 | Load and return the release file filtering plugin objects
241 |
242 | Returns
243 | -------
244 | list of bandersnatch.filter.Filter:
245 | List of objects derived from the bandersnatch.filter.Filter class
246 | """
247 | if RELEASE_FILE_PLUGIN_RESOURCE not in self.loaded_filter_plugins:
248 | self._load_filters([RELEASE_FILE_PLUGIN_RESOURCE])
249 | return self.loaded_filter_plugins[RELEASE_FILE_PLUGIN_RESOURCE]
250 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
8 |
9 | ----
10 |
11 | This is a PyPI mirror client according to `PEP 381` + `PEP 503`
12 | http://www.python.org/dev/peps/pep-0381/.
13 |
14 | - bandersnatch >=4.0 supports *Linux*, *MacOSX* + *Windows*
15 | - [Documentation](https://bandersnatch.readthedocs.io/en/latest/)
16 |
17 | **bandersnatch maintainers** are looking for more **help**! Please refer to our
18 | [MAINTAINER](https://github.com/pypa/bandersnatch/blob/master/MAINTAINERS.md)
19 | documentation to see the roles and responsibilities. We would also
20 | ask that you read our **Mission Statement** to ensure it aligns with your thoughts for
21 | this project.
22 |
23 | - If interested contact @cooperlees
24 |
25 | `bandersnatch` has its dependencies kept up to date by **[pyup.io](https://pyup.io/)**!
26 |
27 | - If you'd like to have your dependencies kept up to date in your `requirements.txt` or `setup.cfg`,
28 | this is the service for you!
29 |
30 | ## Installation
31 |
32 | The following instructions will place the bandersnatch executable in a
33 | virtualenv under `bandersnatch/bin/bandersnatch`.
34 |
35 | - bandersnatch **requires** `>= Python 3.6.1`
36 |
37 | ### Docker
38 |
39 | This will pull the latest build. Please use a specific tag if desired.
40 |
41 | - Docker image includes `/bandersnatch/src/runner.py` to periodically
42 | run a `bandersnatch mirror`
43 | - Please run `/bandersnatch/src/runner.py --help` for usage
44 | - With docker, we recommend bind mounting in a read only `bandersnatch.conf`
45 | - Defaults to `/conf/bandersnatch.conf`
46 |
47 | ```shell
48 | docker pull pypa/bandersnatch
49 | docker run pypa/bandersnatch bandersnatch --help
50 | ```
51 |
52 | ### pip
53 |
54 | This installs the latest stable, released version.
55 |
56 | ```shell
57 | python3.6 -m venv bandersnatch
58 | bandersnatch/bin/pip install bandersnatch
59 | bandersnatch/bin/bandersnatch --help
60 | ```
61 |
62 | ## Quickstart
63 |
64 | - Run ``bandersnatch mirror`` - it will create an empty configuration file
65 | for you in ``/etc/bandersnatch.conf``.
66 | - Review ``/etc/bandersnatch.conf`` and adapt to your needs.
67 | - Run ``bandersnatch mirror`` again. It will populate your mirror with the
68 | current status of all PyPI packages.
69 | Current mirror package size can be seen here: https://pypi.org/stats/
70 | - A ``blacklist`` or ``whitelist`` can be created to cut down your mirror size.
71 | You might want to [Analyze PyPI downloads](https://packaging.python.org/guides/analyzing-pypi-package-downloads/)
72 | to determine which packages to add to your list.
73 | - Run ``bandersnatch mirror`` regularly to update your mirror with any
74 | intermediate changes.
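
The list-based filtering mentioned in the steps above is driven by the `[plugins]` section of the configuration file. As a rough sketch using only the standard library, the snippet below shows how a multi-line `enabled` value splits into plugin names, in the same spirit as `bandersnatch.filter`. The plugin name `blocklist_project`, the `packages` key, and the package names are illustrative assumptions; check the filtering documentation for the names your version accepts.

```python
from configparser import ConfigParser
from typing import List

# Hypothetical configuration fragment; names are placeholders, not defaults.
SAMPLE_CONF = """
[plugins]
enabled =
    blocklist_project

[blocklist]
packages =
    example-package
    another-example
"""


def enabled_plugins(config: ConfigParser) -> List[str]:
    """Return the enabled plugin names, or ["all"] if "all" is listed."""
    if not config.has_option("plugins", "enabled"):
        return []
    raw = config.get("plugins", "enabled").split("\n")
    names = [name.strip() for name in raw if name.strip()]
    return ["all"] if "all" in names else names


config = ConfigParser()
config.read_string(SAMPLE_CONF)

plugins = enabled_plugins(config)
# Multi-line values start with an empty line, so drop empty entries.
packages = [p for p in config.get("blocklist", "packages").split("\n") if p]

print(plugins)
print(packages)
```

Each entry sits on its own indented line because the value is split on newlines rather than commas.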
75 |
76 | ### Webserver
77 |
78 | Configure your webserver to serve the ``web/`` sub-directory of the mirror.
79 | For nginx it should look something like this:
80 |
81 | ```conf
82 | server {
83 | listen 127.0.0.1:80;
84 | listen [::1]:80;
85 | server_name ;
86 | root /web;
87 | autoindex on;
88 | charset utf-8;
89 | }
90 | ```
91 |
92 | * Note that it is a good idea to have your webserver publish the HTML index
93 | files correctly with UTF-8 as the charset. The index pages will work without
94 | it but if humans look at the pages the characters will end up looking funny.
95 |
96 | * Make sure that the webserver uses UTF-8 to look up unicode path names. nginx
97 | gets this right by default - not sure about others.
98 |
99 |
100 | ### Cron jobs
101 |
102 | You need to set up one cron job to run the mirror itself.
103 |
104 | Here's a sample that you could place in `/etc/cron.d/bandersnatch`:
105 |
106 | ```
107 | LC_ALL=en_US.utf8
108 | */2 * * * * root bandersnatch mirror |& logger -t bandersnatch[mirror]
109 | ```
110 |
111 | This assumes that you have a ``logger`` utility installed that will convert the
112 | output of the commands to syslog entries.
113 |
114 |
115 | ### Maintenance
116 |
117 | bandersnatch does not keep much local state in addition to the mirrored data.
118 | In general you can just keep rerunning `bandersnatch mirror` to make it fix
119 | errors.
120 |
121 | If you want to force bandersnatch to check everything against the master PyPI:
122 |
123 | * run `bandersnatch mirror --force-check` to move status files if they exist in your mirror directory in order to get a full sync.
124 |
125 | Be aware that full syncs likely take hours depending on PyPI's performance and your network latency and bandwidth.
126 |
127 | #### Other Commands
128 |
129 | * `bandersnatch delete --help` - Allows you to specify package(s) to be removed from your mirror (*dangerous*)
130 | * `bandersnatch verify --help` - Crawls your repo and fixes any missed files + deletes any unowned files found (*dangerous*)
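
To give a feel for what `verify` checks, here is a minimal standard-library sketch of the two ideas involved: comparing a file's SHA-256 digest with the value recorded in the JSON metadata, and finding "unowned" files as a set difference. The helper names (`sha256_of`, `needs_refetch`, `unowned_files`) are hypothetical and are not part of bandersnatch's API.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, Set


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks so large archives are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_refetch(path: Path, expected_sha256: str) -> bool:
    """A file must be fetched again if it is missing or its digest differs."""
    return not path.exists() or sha256_of(path) != expected_sha256


def unowned_files(on_disk: Iterable[Path], owned: Iterable[Path]) -> Set[Path]:
    """Files present on disk that no package metadata refers to."""
    return set(on_disk) - set(owned)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.whl"
    good.write_bytes(b"payload")
    expected = hashlib.sha256(b"payload").hexdigest()

    intact = not needs_refetch(good, expected)
    corrupted = needs_refetch(good, "0" * 64)
    missing = needs_refetch(Path(tmp) / "missing.whl", expected)
    stray = unowned_files({good, Path(tmp) / "stray.whl"}, [good])
    stray_names = sorted(p.name for p in stray)
```

Because deletion is driven by the set difference, running with a dry-run option first is a sensible way to review the list before anything is removed.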
131 |
132 | ### Operational notes
133 |
134 | #### Case-sensitive filesystem needed
135 |
136 | You need to run bandersnatch on a case-sensitive filesystem.
137 |
138 | OS X natively does this OK even though the filesystem is not strictly
139 | case-sensitive and bandersnatch will work fine when running on OS X. However,
140 | tarring a bandersnatch data directory and moving it to, e.g. Linux with a
141 | case-sensitive filesystem will lead to inconsistencies. You can fix those by
142 | deleting the status files and have bandersnatch run a full check on your data.
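
If you are unsure whether a target directory is case-sensitive, a quick probe along these lines can help. This is a sketch using only the standard library; `is_case_sensitive` is a hypothetical helper, not something bandersnatch provides.

```python
import tempfile
from pathlib import Path


def is_case_sensitive(directory: Path) -> bool:
    """Create a lowercase file and check whether its uppercase name resolves."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        probe = Path(tmp) / "case_probe"
        probe.touch()
        # On a case-insensitive filesystem the uppercase name finds the same file.
        return not (Path(tmp) / "CASE_PROBE").exists()


result = is_case_sensitive(Path(tempfile.gettempdir()))
print(f"case-sensitive: {result}")
```

Pass the directory that will hold the mirror rather than the system temporary directory, since the two may live on different filesystems.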
143 |
144 | #### Windows requires elevated prompt
145 |
146 | Bandersnatch makes use of symbolic links. On Windows, this permission is turned off by default for non-admin users. In order to run bandersnatch on Windows either call it from an elevated command prompt (i.e. right-click, run-as Administrator) or give yourself symlink permissions in the group policy editor.
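
One way to confirm that the current user may create symbolic links is to try making one in a scratch directory. The sketch below uses only the standard library; `can_create_symlinks` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def can_create_symlinks(directory: Path) -> bool:
    """Try to create a symlink inside a temporary sub-directory."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = Path(tmp) / "target"
        target.touch()
        link = Path(tmp) / "link"
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Typically raised on Windows when the privilege is missing.
            return False
        return link.is_symlink()


symlinks_ok = can_create_symlinks(Path(tempfile.gettempdir()))
print(f"symlinks allowed: {symlinks_ok}")
```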
147 |
148 | #### Many sub-directories needed
149 |
150 | PyPI has quite an extensive list of packages that we need to maintain in a
151 | flat directory. Filesystems with small limits on the number of sub-directories
152 | per directory can run into a problem like this:
153 |
154 | 2013-07-09 16:11:33,331 ERROR: Error syncing package: zweb@802449
155 | OSError: [Errno 31] Too many links: '../pypi/web/simple/zweb'
156 |
157 | Specifically, we recommend avoiding ext3. Ext4 and newer do not have the
158 | limitation of 32k sub-directories.
159 |
160 | #### Client Compatibility
161 |
162 | A bandersnatch static mirror is compatible only with the "static", cacheable
163 | parts of PyPI that are needed to support package installation. It does not
164 | support the more dynamic APIs of PyPI that may be used by various clients for
165 | other purposes.
166 |
167 | An example of an unsupported API is [PyPI's XML-RPC interface](https://warehouse.readthedocs.io/api-reference/xml-rpc/), which is used when running `pip search`.
168 |
169 | ### Bandersnatch Mission
170 | The bandersnatch project strives to:
171 | - Mirror all static objects of the Python Package Index (https://pypi.org/)
172 | - bandersnatch's main goal is to support the main global index to local syncing **only**
173 | - This will allow organizations to have lower latency access to PyPI and
174 | save bandwidth on their WAN connections and more importantly the PyPI CDN
175 | - Custom features and requests may be accepted if they can be of a *plugin* form
176 | - e.g. refer to the `blacklist` and `whitelist` plugins
177 |
178 | ### Contact
179 |
180 | If you have questions or comments, please submit a bug report to
181 | https://github.com/pypa/bandersnatch/issues/new
182 | - IRC: #bandersnatch on *Freenode* (You can use [webchat](https://webchat.freenode.net/?channels=%23bandersnatch) if you don't have an IRC client)
183 |
184 | ### Code of Conduct
185 |
186 | Everyone interacting in the bandersnatch project's codebases, issue trackers,
187 | chat rooms, and mailing lists is expected to follow the
188 | [PSF Code of Conduct](https://github.com/pypa/.github/blob/main/CODE_OF_CONDUCT.md).
189 |
190 | ### Kudos
191 |
192 | This client is based on the original pep381client by *Martin v. Loewis*.
193 |
194 | *Richard Jones* was very patient answering questions at PyCon 2013 and made the
195 | protocol more reliable by implementing some PyPI enhancements.
196 |
197 | *Christian Theune* for creating and maintaining `bandersnatch` for many years!
198 |
--------------------------------------------------------------------------------