├── .codex └── skills ├── collectfasta ├── py.typed ├── tests │ ├── __init__.py │ ├── command │ │ ├── __init__.py │ │ ├── utils.py │ │ ├── test_post_process.py │ │ ├── test_disable.py │ │ └── test_command.py │ ├── strategies │ │ ├── __init__.py │ │ ├── test_hash_strategy.py │ │ └── test_caching_hash_strategy.py │ ├── test_settings.py │ ├── settings.py │ ├── conftest.py │ └── utils.py ├── management │ ├── __init__.py │ └── commands │ │ ├── __init__.py │ │ └── collectstatic.py ├── __init__.py ├── strategies │ ├── __init__.py │ ├── filesystem.py │ ├── gcloud.py │ ├── azure.py │ ├── hashing.py │ ├── base.py │ └── boto3.py └── settings.py ├── .github ├── skills └── workflows │ ├── lint.yaml │ ├── test-suite-unreleased-django.yaml │ ├── release.yaml │ └── test-suite.yaml ├── setup.py ├── localstack └── init.sh ├── .gitignore ├── conftest.py ├── test-requirements.txt ├── .codeclimate.yml ├── MANIFEST.in ├── manage.py ├── Makefile ├── .skills ├── collectfasta-test-verification │ └── SKILL.md └── collectfasta-version-updates │ └── SKILL.md ├── LICENSE ├── docker-compose.yml ├── .pre-commit-config.yaml ├── CHANGELOG.md ├── CODE_OF_CONDUCT.md ├── setup.cfg └── README.md /.codex/skills: -------------------------------------------------------------------------------- 1 | ../.skills -------------------------------------------------------------------------------- /collectfasta/py.typed: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.github/skills: -------------------------------------------------------------------------------- 1 | ../.skills -------------------------------------------------------------------------------- /collectfasta/tests/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /collectfasta/management/__init__.py: 
-------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /collectfasta/tests/command/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /collectfasta/tests/strategies/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /collectfasta/management/commands/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /collectfasta/__init__.py: -------------------------------------------------------------------------------- 1 | __version__ = "3.3.2" 2 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup() 4 | -------------------------------------------------------------------------------- /localstack/init.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | awslocal s3 mb s3://collectfasta 3 | -------------------------------------------------------------------------------- /collectfasta/strategies/__init__.py: -------------------------------------------------------------------------------- 1 | from .base import DisabledStrategy 2 | from .base import Strategy 3 | from .base import load_strategy 4 | 5 | __all__ = ("load_strategy", "Strategy", "DisabledStrategy") 6 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .coverage 3 | .DS_Store 4 | 
.coveragerc.swp 5 | .tox 6 | *.egg-info 7 | /static/ 8 | /static_root/ 9 | /build/ 10 | dist/ 11 | storage-credentials 12 | .mypy_cache 13 | .idea 14 | .python-version 15 | .dmypy.json 16 | .cache/ 17 | -------------------------------------------------------------------------------- /conftest.py: -------------------------------------------------------------------------------- 1 | def pytest_addoption(parser): 2 | parser.addoption( 3 | "--speedtest", 4 | action="store_true", 5 | dest="speedtest", 6 | default=False, 7 | help="run the test on many files (not for live environments)", 8 | ) 9 | -------------------------------------------------------------------------------- /test-requirements.txt: -------------------------------------------------------------------------------- 1 | typing-extensions 2 | mock 3 | coveralls 4 | django-storages[azure,google,s3] 5 | boto3 6 | google-cloud-storage 7 | pytest 8 | pytest-mock 9 | pytest-django 10 | django-stubs 11 | boto3-stubs[s3] 12 | types-s3transfer 13 | pytest-uncollect-if>=0.1.2 14 | -------------------------------------------------------------------------------- /.codeclimate.yml: -------------------------------------------------------------------------------- 1 | version: "2" 2 | plugins: 3 | duplication: 4 | enabled: true 5 | config: 6 | languages: 7 | python: 8 | python_version: 3 9 | sonar-python: 10 | enabled: true 11 | radon: 12 | enabled: true 13 | config: 14 | python_version: 3 15 | -------------------------------------------------------------------------------- /collectfasta/tests/command/utils.py: -------------------------------------------------------------------------------- 1 | from io import StringIO 2 | from typing import Any 3 | 4 | from django.core.management import call_command 5 | 6 | 7 | def call_collectstatic(*args: Any, **kwargs: Any) -> str: 8 | out = StringIO() 9 | call_command( 10 | "collectstatic", *args, verbosity=3, interactive=False, stdout=out, **kwargs 11 | ) 12 | return out.getvalue() 13 | 
-------------------------------------------------------------------------------- /.github/workflows/lint.yaml: -------------------------------------------------------------------------------- 1 | name: Static analysis 2 | on: 3 | workflow_call: 4 | push: 5 | branches: 6 | - master 7 | pull_request: 8 | jobs: 9 | lint: 10 | runs-on: ubuntu-latest 11 | steps: 12 | - uses: actions/checkout@v4 13 | - uses: actions/setup-python@v5 14 | with: 15 | python-version: 3.11 16 | - uses: pre-commit/action@v3.0.1 17 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.md 2 | include LICENSE 3 | include collectfasta/py.typed 4 | exclude .codex/skills 5 | recursive-exclude .skills *.md 6 | exclude conftest.py 7 | exclude collectfasta/tests 8 | recursive-exclude localstack * 9 | exclude manage.py 10 | exclude Makefile 11 | exclude *.yml 12 | exclude *.yaml 13 | exclude CODE_OF_CONDUCT.md 14 | exclude CHANGELOG.md 15 | exclude test-requirements.txt 16 | -------------------------------------------------------------------------------- /manage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | """Needed for tests.""" 3 | import sys 4 | 5 | 6 | def main(): 7 | """Run administrative tasks.""" 8 | try: 9 | from django.core.management import execute_from_command_line 10 | except ImportError as exc: 11 | raise ImportError( 12 | "Couldn't import Django. Are you sure it's installed and " 13 | "available on your PYTHONPATH environment variable? Did you " 14 | "forget to activate a virtual environment?" 
15 | ) from exc 16 | execute_from_command_line(sys.argv) 17 | 18 | 19 | if __name__ == "__main__": 20 | main() 21 | -------------------------------------------------------------------------------- /collectfasta/strategies/filesystem.py: -------------------------------------------------------------------------------- 1 | from typing import Optional 2 | 3 | from django.core.files.storage import FileSystemStorage 4 | 5 | from .base import CachingHashStrategy 6 | from .base import HashStrategy 7 | 8 | 9 | class FileSystemStrategy(HashStrategy[FileSystemStorage]): 10 | def get_remote_file_hash(self, prefixed_path: str) -> Optional[str]: 11 | try: 12 | return self.get_local_file_hash(prefixed_path, self.remote_storage) 13 | except FileNotFoundError: 14 | return None 15 | 16 | 17 | class CachingFileSystemStrategy( 18 | CachingHashStrategy[FileSystemStorage], FileSystemStrategy 19 | ): ... 20 | -------------------------------------------------------------------------------- /collectfasta/strategies/gcloud.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import binascii 3 | from typing import Optional 4 | 5 | from google.api_core.exceptions import NotFound 6 | from storages.backends.gcloud import GoogleCloudStorage 7 | 8 | from .base import CachingHashStrategy 9 | 10 | 11 | class GoogleCloudStrategy(CachingHashStrategy[GoogleCloudStorage]): 12 | delete_not_found_exception = (NotFound,) 13 | 14 | def get_remote_file_hash(self, prefixed_path: str) -> Optional[str]: 15 | normalized_path = prefixed_path.replace("\\", "/") 16 | blob = self.remote_storage.bucket.get_blob(normalized_path) 17 | if blob is None: 18 | return blob 19 | md5_base64 = blob._properties["md5Hash"] 20 | return binascii.hexlify(base64.urlsafe_b64decode(md5_base64)).decode() 21 | -------------------------------------------------------------------------------- /.github/workflows/test-suite-unreleased-django.yaml: 
-------------------------------------------------------------------------------- 1 | name: Test unreleased Django 2 | on: 3 | schedule: 4 | - cron: '30 10 * * 1-5' 5 | workflow_dispatch: 6 | jobs: 7 | test: 8 | runs-on: ubuntu-latest 9 | name: Test latest stable Python with unreleased Django 10 | steps: 11 | - uses: actions/checkout@v4 12 | - uses: actions/setup-python@v5 13 | - uses: actions/cache@v4 14 | with: 15 | path: ~/.cache/pip 16 | key: ${{ runner.os }}-pip-${{ hashFiles('setup.cfg') }} 17 | restore-keys: ${{ runner.os }}-pip 18 | - name: Install dependencies 19 | run: | 20 | pip install --upgrade https://github.com/django/django/archive/main.tar.gz 21 | pip install --upgrade -r test-requirements.txt 22 | pip install . 23 | - name: Run tests 24 | run: make docker-up && coverage run -m pytest -svv --speedtest 25 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SHELL := /usr/bin/env bash 2 | 3 | docker-up: 4 | docker compose up -d --wait 5 | test: 6 | pytest -svv 7 | 8 | test-docker: docker-up 9 | pytest -svv; docker compose down 10 | 11 | test-docker-all: docker-up 12 | pytest -x --speedtest -svv; docker compose down 13 | 14 | test-docker-ff: docker-up 15 | pytest -svv -x; docker compose down 16 | 17 | test-speed: docker-up 18 | pytest -x --speedtest -m speed_test -svv; docker compose down 19 | 20 | test-skip-live: 21 | SKIP_LIVE_TESTS=true pytest 22 | 23 | test-coverage: 24 | coverage run --source collectfasta -m pytest 25 | 26 | clean: 27 | rm -rf Collectfasta.egg-info __pycache__ build dist 28 | 29 | build: clean 30 | python3 -m pip install --upgrade wheel twine setuptools 31 | python3 setup.py sdist bdist_wheel 32 | 33 | distribute: build 34 | python3 -m twine upload dist/* 35 | 36 | test-distribute: build 37 | python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/* 38 | 39 | checks: 40 | pre-commit run 
--all-files 41 | -------------------------------------------------------------------------------- /.skills/collectfasta-test-verification/SKILL.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: collectfasta-test-verification 3 | description: Verify Collectfasta changes by running the project test commands defined in the Makefile. Use when asked to run or verify tests, or whenever you make changes. 4 | --- 5 | 6 | # Collectfasta Test Verification 7 | 8 | ## Choose the right test target 9 | 10 | - Read `Makefile` to find the appropriate test target. 11 | - For linting and formatting, run `make checks`. 12 | - Since you don't have credentials, only use the Docker tests. 13 | - Default to `make test-docker` for a standard run. 14 | - To test performance, run `make test-speed`; otherwise avoid it, as it is slower. 15 | - To investigate an error, run `make test-docker-ff`, which fails fast and logs verbosely. 16 | 17 | ## Run and report 18 | 19 | - Run the selected `make` target. 20 | - If a command times out or fails, report the partial output and the exit status. 21 | - Suggest a narrower target if the default run is too slow. 
22 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2013-2020 Anton Agestam 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /collectfasta/tests/command/test_post_process.py: -------------------------------------------------------------------------------- 1 | from unittest import mock 2 | 3 | from django.contrib.staticfiles.storage import StaticFilesStorage 4 | from django.test import override_settings as override_django_settings 5 | 6 | from collectfasta.management.commands.collectstatic import Command 7 | from collectfasta.tests.utils import clean_static_dir 8 | from collectfasta.tests.utils import create_static_file 9 | from collectfasta.tests.utils import override_setting 10 | 11 | 12 | class MockPostProcessing(StaticFilesStorage): 13 | def __init__(self): 14 | super().__init__() 15 | self.post_process = mock.MagicMock() 16 | 17 | 18 | @override_setting("threads", 2) 19 | @override_django_settings( 20 | STORAGES={ 21 | "staticfiles": { 22 | "BACKEND": "collectfasta.tests.command.test_post_process.MockPostProcessing", 23 | }, 24 | }, 25 | ) 26 | def test_calls_post_process_with_collected_files() -> None: 27 | clean_static_dir() 28 | path = create_static_file() 29 | 30 | cmd = Command() 31 | cmd.run_from_argv(["manage.py", "collectstatic", "--noinput"]) 32 | cmd.storage.post_process.assert_called_once_with( 33 | {path.name: (mock.ANY, path.name)}, dry_run=False 34 | ) 35 | -------------------------------------------------------------------------------- /collectfasta/settings.py: -------------------------------------------------------------------------------- 1 | from typing import Container 2 | from typing import Type 3 | from typing import TypeVar 4 | 5 | from django.conf import settings 6 | from typing_extensions import Final 7 | 8 | T = TypeVar("T") 9 | 10 | 11 | def _get_setting(type_: Type[T], key: str, default: T) -> T: 12 | value = getattr(settings, key, default) 13 | if not isinstance(value, type_): 14 | raise ValueError( 15 | f"The {key!r} setting must be of type {type_!r}, found 
{type(value)}" 16 | ) 17 | return value 18 | 19 | 20 | debug: Final = _get_setting( 21 | bool, "COLLECTFASTA_DEBUG", _get_setting(bool, "DEBUG", False) 22 | ) 23 | cache_key_prefix: Final = _get_setting( 24 | str, "COLLECTFASTA_CACHE_KEY_PREFIX", "collectfasta06_asset_" 25 | ) 26 | cache: Final = _get_setting(str, "COLLECTFASTA_CACHE", "default") 27 | threads: Final = _get_setting(int, "COLLECTFASTA_THREADS", 0) 28 | enabled: Final = _get_setting(bool, "COLLECTFASTA_ENABLED", True) 29 | aws_is_gzipped: Final = _get_setting(bool, "AWS_IS_GZIPPED", False) 30 | gzip_content_types: Final[Container] = _get_setting( 31 | tuple, 32 | "GZIP_CONTENT_TYPES", 33 | ( 34 | "text/css", 35 | "text/javascript", 36 | "application/javascript", 37 | "application/x-javascript", 38 | "image/svg+xml", 39 | ), 40 | ) 41 | -------------------------------------------------------------------------------- /.github/workflows/release.yaml: -------------------------------------------------------------------------------- 1 | name: Release 2 | on: 3 | release: 4 | types: [released] 5 | jobs: 6 | call-lint: 7 | uses: jasongi/collectfasta/.github/workflows/lint.yaml@master 8 | secrets: inherit 9 | call-test-suite: 10 | uses: jasongi/collectfasta/.github/workflows/test-suite.yaml@master 11 | secrets: inherit 12 | build: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - uses: actions/checkout@v4 16 | - uses: actions/setup-python@v5 17 | with: 18 | python-version: 3.11 19 | - run: make build 20 | - name: Store the distribution packages 21 | uses: actions/upload-artifact@v4 22 | with: 23 | name: python-package-distributions 24 | path: dist/ 25 | publish-to-pypi: 26 | name: >- 27 | Publish Python 🐍 distribution 📦 to PyPI 28 | needs: 29 | - build 30 | - call-lint 31 | - call-test-suite 32 | runs-on: ubuntu-latest 33 | environment: 34 | name: pypi 35 | url: https://pypi.org/p/collectfasta 36 | permissions: 37 | id-token: write # IMPORTANT: mandatory for trusted publishing 38 | steps: 39 | - name: Download 
all the dists 40 | uses: actions/download-artifact@v4 41 | with: 42 | name: python-package-distributions 43 | path: dist/ 44 | - name: Publish distribution 📦 to PyPI 45 | uses: pypa/gh-action-pypi-publish@release/v1 46 | -------------------------------------------------------------------------------- /.skills/collectfasta-version-updates/SKILL.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: collectfasta-version-updates 3 | description: Update Collectfasta's supported Python/Django versions across CI, packaging metadata, and changelog when new runtimes are released or deprecated. Use for tasks like adjusting GitHub Actions test matrices, setup.cfg classifiers, python_requires, and documenting support changes. 4 | --- 5 | 6 | # Collectfasta Version Updates 7 | 8 | ## Update the CI matrix 9 | 10 | - Edit `.github/workflows/test-suite.yaml`. 11 | - Verify supported Django/Python combinations using https://docs.djangoproject.com/en/dev/faq/install/#what-python-version-can-i-use-with-django and only include compatible pairs. 12 | - Add new Python/Django versions to the matrix and live-test job. 13 | - Remove EOL Python versions and unsupported Django versions. 14 | - Keep version ranges pinned to minor series (e.g., `>=6.0,<6.1`). 15 | 16 | ## Update package metadata 17 | 18 | - Edit `setup.cfg`. 19 | - Update `python_requires` to match the minimum supported Python. 20 | - Update `Programming Language :: Python :: X.Y` classifiers. 21 | - Add `Framework :: Django :: X.Y` classifiers for supported series. 22 | - Update `mypy` `python_version` to the minimum supported Python. 23 | - Update the version in `collectfasta/__init__.py`. 24 | ## Update the changelog 25 | 26 | - Add a new entry at the top of `CHANGELOG.md`. 27 | - Mention added/removed Python and Django versions and CI/metadata updates. 28 | 29 | ## Create a branch and push changes 30 | 31 | - Create a new git branch for the change and commit to that branch. 
32 | - Push the branch to the remote; never push directly to `master`. 33 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | services: 2 | localstack-collectfasta: 3 | networks: 4 | localstack: 5 | # Set the container IP address in the 10.0.4.0/24 subnet 6 | ipv4_address: 10.0.4.20 7 | container_name: "localstack-collectfasta" # Container name in your docker 8 | image: localstack/localstack:3 9 | # Will download latest version of localstack 10 | #image: localstack/localstack-full:latest # Full image support WebUI 11 | ports: 12 | - "4567:4566" # Default port forward 13 | - "9201:4571" # Elasticsearch port forward 14 | - "8082:8080" # WebUI port forward 15 | environment: 16 | - SERVICES=s3 #AWS Services that you want in your localstack 17 | - DEBUG=1 # Debug level 1 if you want logs, 0 if you want to disable them 18 | - START_WEB=1 # Flag to control whether the Web UI should be started in Docker 19 | - LAMBDA_REMOTE_DOCKER=0 20 | - DEFAULT_REGION=ap-southeast-2 21 | - DOCKER_HOST=unix:///var/run/docker.sock 22 | volumes: 23 | - ./localstack:/etc/localstack/init/ready.d 24 | - '/var/run/docker.sock:/var/run/docker.sock' 25 | gcloudstorage: 26 | image: fsouza/fake-gcs-server 27 | ports: 28 | - '6050:4443' 29 | command: -scheme http 30 | azurite: 31 | image: mcr.microsoft.com/azure-storage/azurite 32 | command: azurite-blob --blobHost 0.0.0.0 --blobPort 10000 --skipApiVersionCheck --loose 33 | ports: 34 | - '10000:10000' # Blob service 35 | 36 | networks: 37 | localstack: 38 | ipam: 39 | config: 40 | # Specify the subnet range for IP address allocation 41 | - subnet: 10.0.4.0/24 42 | -------------------------------------------------------------------------------- /collectfasta/tests/strategies/test_hash_strategy.py: -------------------------------------------------------------------------------- 1 | import re 2 | import tempfile 3 | 4 
| from django.contrib.staticfiles.storage import StaticFilesStorage 5 | from django.core.files.storage import FileSystemStorage 6 | from pytest_mock import MockerFixture 7 | 8 | from collectfasta.strategies.base import HashStrategy 9 | 10 | 11 | class Strategy(HashStrategy[FileSystemStorage]): 12 | def __init__(self) -> None: 13 | super().__init__(FileSystemStorage()) 14 | 15 | def get_remote_file_hash(self, prefixed_path: str) -> None: 16 | pass 17 | 18 | 19 | def test_get_file_hash() -> None: 20 | strategy = Strategy() 21 | local_storage = StaticFilesStorage() 22 | 23 | with tempfile.NamedTemporaryFile(dir=local_storage.base_location) as f: 24 | f.write(b"spam") 25 | hash_ = strategy.get_local_file_hash(f.name, local_storage) 26 | assert re.fullmatch(r"^[A-Za-z0-9]{32}$", hash_) is not None 27 | 28 | 29 | def test_should_copy_file(mocker: MockerFixture) -> None: 30 | strategy = Strategy() 31 | local_storage = StaticFilesStorage() 32 | remote_hash = "foo" 33 | mocker.patch.object( 34 | strategy, "get_remote_file_hash", mocker.MagicMock(return_value=remote_hash) 35 | ) 36 | mocker.patch.object( 37 | strategy, "get_local_file_hash", mocker.MagicMock(return_value=remote_hash) 38 | ) 39 | assert not strategy.should_copy_file("path", "prefixed_path", local_storage) 40 | mocker.patch.object( 41 | strategy, "get_local_file_hash", mocker.MagicMock(return_value="bar") 42 | ) 43 | assert strategy.should_copy_file("path", "prefixed_path", local_storage) 44 | -------------------------------------------------------------------------------- /collectfasta/tests/command/test_disable.py: -------------------------------------------------------------------------------- 1 | from django.test import override_settings as override_django_settings 2 | from pytest_mock import MockerFixture 3 | 4 | from collectfasta.tests.conftest import StrategyFixture 5 | from collectfasta.tests.conftest import live_test 6 | from collectfasta.tests.utils import clean_static_dir 7 | from 
collectfasta.tests.utils import create_static_file 8 | from collectfasta.tests.utils import override_setting 9 | 10 | from .utils import call_collectstatic 11 | 12 | 13 | @override_django_settings( 14 | STORAGES={ 15 | "staticfiles": { 16 | "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage", 17 | }, 18 | }, 19 | ) 20 | def test_disable_collectfasta_with_default_storage() -> None: 21 | clean_static_dir() 22 | create_static_file() 23 | assert "1 static file copied" in call_collectstatic(disable_collectfasta=True) 24 | 25 | 26 | @live_test 27 | def test_disable_collectfasta(strategy: StrategyFixture) -> None: 28 | clean_static_dir() 29 | create_static_file() 30 | assert "1 static file copied" in call_collectstatic(disable_collectfasta=True) 31 | 32 | 33 | @override_setting("enabled", False) 34 | def test_no_load_with_disable_setting(mocker: MockerFixture) -> None: 35 | mocked_load_strategy = mocker.patch( 36 | "collectfasta.management.commands.collectstatic.Command._load_strategy" 37 | ) 38 | clean_static_dir() 39 | call_collectstatic() 40 | mocked_load_strategy.assert_not_called() 41 | 42 | 43 | def test_no_load_with_disable_flag(mocker: MockerFixture) -> None: 44 | mocked_load_strategy = mocker.patch( 45 | "collectfasta.management.commands.collectstatic.Command._load_strategy" 46 | ) 47 | clean_static_dir() 48 | call_collectstatic(disable_collectfasta=True) 49 | mocked_load_strategy.assert_not_called() 50 | -------------------------------------------------------------------------------- /collectfasta/tests/test_settings.py: -------------------------------------------------------------------------------- 1 | from importlib import reload 2 | 3 | import pytest 4 | from django.test.utils import override_settings 5 | 6 | from collectfasta import settings 7 | 8 | 9 | @override_settings(FOO=2) 10 | def test_get_setting_returns_valid_value(): 11 | assert 2 == settings._get_setting(int, "FOO", 1) 12 | 13 | 14 | def 
test_get_setting_returns_default_value_for_missing_setting(): 15 | assert 1 == settings._get_setting(int, "FOO", 1) 16 | 17 | 18 | @override_settings(FOO="bar") 19 | def test_get_setting_raises_for_invalid_type(): 20 | with pytest.raises(ValueError): 21 | settings._get_setting(int, "FOO", 1) 22 | 23 | 24 | def test_basic_settings(): 25 | with override_settings( 26 | COLLECTFASTA_DEBUG=True, 27 | COLLECTFASTA_CACHE="custom", 28 | COLLECTFASTA_ENABLED=False, 29 | AWS_IS_GZIPPED=True, 30 | GZIP_CONTENT_TYPES=("text/css", "text/javascript"), 31 | COLLECTFASTA_THREADS=0, 32 | ): 33 | reload(settings) 34 | assert settings.debug is True 35 | assert isinstance(settings.cache_key_prefix, str) 36 | assert settings.cache == "custom" 37 | assert settings.enabled is False 38 | assert isinstance(settings.gzip_content_types, tuple) 39 | assert settings.threads == 0 40 | 41 | 42 | def test_settings_with_threads(): 43 | with override_settings(COLLECTFASTA_THREADS=22): 44 | reload(settings) 45 | assert settings.threads == 22 46 | 47 | 48 | @pytest.mark.parametrize( 49 | "django_settings", 50 | ( 51 | {"COLLECTFASTA_DEBUG": "True"}, 52 | {"COLLECTFASTA_CACHE_KEY_PREFIX": 1}, 53 | {"COLLECTFASTA_CACHE": None}, 54 | {"COLLECTFASTA_THREADS": None}, 55 | {"COLLECTFASTA_ENABLED": 1}, 56 | {"AWS_IS_GZIPPED": "yes"}, 57 | {"GZIP_CONTENT_TYPES": "not tuple"}, 58 | ), 59 | ids=lambda x: list(x.keys())[0], 60 | ) 61 | def test_invalid_setting_type_raises_value_error(django_settings): 62 | with override_settings(**django_settings): 63 | with pytest.raises(ValueError): 64 | reload(settings) 65 | -------------------------------------------------------------------------------- /collectfasta/strategies/azure.py: -------------------------------------------------------------------------------- 1 | import binascii 2 | import hashlib 3 | from functools import lru_cache 4 | from pathlib import Path 5 | from typing import Union 6 | 7 | from azure.core.exceptions import ResourceNotFoundError 8 | from 
django.core.files.storage import FileSystemStorage 9 | from storages.backends.azure_storage import AzureStorage 10 | 11 | from .base import CachingHashStrategy 12 | 13 | 14 | class AzureBlobStrategy(CachingHashStrategy[AzureStorage]): 15 | delete_not_found_exception = (ResourceNotFoundError,) 16 | 17 | def get_remote_file_hash(self, prefixed_path: str) -> Union[str, None]: 18 | normalized_path = prefixed_path.replace("\\", "/") 19 | 20 | blob_client = self.remote_storage.service_client.get_blob_client( 21 | container=self.remote_storage.azure_container, 22 | blob=normalized_path, 23 | ) 24 | 25 | try: 26 | properties = blob_client.get_blob_properties() 27 | 28 | # If content_md5 is available (<4MiB), use it 29 | if properties.content_settings.content_md5: 30 | return binascii.hexlify( 31 | properties.content_settings.content_md5 32 | ).decode() 33 | 34 | # For larger files, create a hash from size and path 35 | size = properties.size 36 | 37 | return self._create_composite_hash(normalized_path, size) 38 | 39 | except ResourceNotFoundError: 40 | return None 41 | 42 | def _create_composite_hash(self, path: str, size: int) -> str: 43 | hash_components = f"{path}|{size}" 44 | return hashlib.md5(hash_components.encode()).hexdigest() 45 | 46 | @lru_cache(maxsize=None) 47 | def get_local_file_hash(self, path: str, local_storage: FileSystemStorage) -> str: 48 | stat = (Path(local_storage.base_location) / path).stat() 49 | file_size = stat.st_size 50 | 51 | # For smaller files (<4MiB), use MD5 of content 52 | # https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob?tabs=microsoft-entra-id#response-headers 53 | if file_size < 4 * 1024 * 1024: # 4MiB 54 | return super().get_local_file_hash(path, local_storage) 55 | 56 | # For larger files, create a composite hash using only size and path 57 | return self._create_composite_hash(path, file_size) 58 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: 
-------------------------------------------------------------------------------- 1 | default_language_version: 2 | python: python3.11 3 | repos: 4 | - repo: https://github.com/pre-commit/pre-commit-hooks 5 | rev: v4.5.0 6 | hooks: 7 | - id: check-case-conflict 8 | - id: check-merge-conflict 9 | - id: end-of-file-fixer 10 | - id: trailing-whitespace 11 | - id: debug-statements 12 | - id: detect-private-key 13 | - repo: https://github.com/asottile/pyupgrade 14 | rev: v3.15.1 15 | hooks: 16 | - id: pyupgrade 17 | args: 18 | - --py36-plus 19 | - repo: https://github.com/myint/autoflake 20 | rev: v2.3.0 21 | hooks: 22 | - id: autoflake 23 | args: 24 | - --in-place 25 | - --remove-all-unused-imports 26 | - --ignore-init-module-imports 27 | - repo: https://github.com/pycqa/isort 28 | rev: 5.13.2 29 | hooks: 30 | - id: isort 31 | - repo: https://github.com/psf/black 32 | rev: 24.2.0 33 | hooks: 34 | - id: black 35 | - repo: https://github.com/asottile/blacken-docs 36 | rev: 1.16.0 37 | hooks: 38 | - id: blacken-docs 39 | additional_dependencies: [black==24.2.0] 40 | - repo: https://github.com/PyCQA/flake8 41 | rev: 7.0.0 42 | hooks: 43 | - id: flake8 44 | additional_dependencies: 45 | - flake8-bugbear 46 | - flake8-comprehensions 47 | - flake8-tidy-imports 48 | - repo: https://github.com/sirosen/check-jsonschema 49 | rev: 0.28.0 50 | hooks: 51 | - id: check-github-workflows 52 | - repo: https://github.com/pre-commit/mirrors-mypy 53 | rev: v1.16.0 54 | hooks: 55 | - id: mypy 56 | args: [] 57 | additional_dependencies: 58 | - typing-extensions 59 | - mock 60 | - coveralls 61 | - django-storages[azure,google,s3] 62 | - boto3 63 | - google-cloud-storage 64 | - pytest 65 | - pytest-mock 66 | - pytest-django 67 | - django-stubs==5.1.0 68 | - boto3-stubs[s3] 69 | - types-s3transfer 70 | - pytest-uncollect-if>=0.1.2 71 | - repo: https://github.com/mgedmin/check-manifest 72 | rev: "0.49" 73 | hooks: 74 | - id: check-manifest 75 | 76 | exclude: | 77 | (?x)( 78 | /( 79 | \.eggs 80 | 
| \.git 81 | | \.hg 82 | | \.mypy_cache 83 | | \.pytest_cache 84 | | \.nox 85 | | \.tox 86 | | \.venv 87 | | _build 88 | | buck-out 89 | | build 90 | | dist 91 | )/ 92 | ) 93 | -------------------------------------------------------------------------------- /collectfasta/tests/strategies/test_caching_hash_strategy.py: -------------------------------------------------------------------------------- 1 | import string 2 | 3 | from django.core.files.storage import FileSystemStorage 4 | from pytest_mock import MockerFixture 5 | 6 | from collectfasta import settings 7 | from collectfasta.strategies.base import CachingHashStrategy 8 | 9 | hash_characters = string.ascii_letters + string.digits 10 | 11 | 12 | class Strategy(CachingHashStrategy[FileSystemStorage]): 13 | def __init__(self) -> None: 14 | super().__init__(FileSystemStorage()) 15 | 16 | def get_remote_file_hash(self, prefixed_path: str) -> None: 17 | pass 18 | 19 | 20 | def test_get_cache_key() -> None: 21 | strategy = Strategy() 22 | cache_key = strategy.get_cache_key("/some/random/path") 23 | prefix_len = len(settings.cache_key_prefix) 24 | # case.assertTrue(cache_key.startswith(settings.cache_key_prefix)) 25 | assert cache_key.startswith(settings.cache_key_prefix) 26 | assert len(cache_key) == 32 + prefix_len 27 | expected_chars = hash_characters + "_" 28 | for c in cache_key: 29 | assert c in expected_chars 30 | 31 | 32 | def test_gets_and_invalidates_hash(mocker: MockerFixture) -> None: 33 | strategy = Strategy() 34 | expected_hash = "hash" 35 | mocked = mocker.patch.object( 36 | strategy, 37 | "get_remote_file_hash", 38 | new=mocker.MagicMock(return_value=expected_hash), 39 | ) 40 | # empty cache 41 | result_hash = strategy.get_cached_remote_file_hash("path", "prefixed_path") 42 | assert result_hash == expected_hash 43 | mocked.assert_called_once_with("prefixed_path") 44 | 45 | # populated cache 46 | mocked.reset_mock() 47 | result_hash = strategy.get_cached_remote_file_hash("path", "prefixed_path") 48 | 
assert result_hash == expected_hash 49 | mocked.assert_not_called() 50 | 51 | # test destroy_etag 52 | mocked.reset_mock() 53 | strategy.invalidate_cached_hash("path") 54 | result_hash = strategy.get_cached_remote_file_hash("path", "prefixed_path") 55 | assert result_hash == expected_hash 56 | mocked.assert_called_once_with("prefixed_path") 57 | 58 | 59 | def test_post_copy_hook_primes_cache(mocker: MockerFixture) -> None: 60 | filename = "123abc" 61 | expected_hash = "abc123" 62 | strategy = Strategy() 63 | 64 | mocker.patch.object( 65 | strategy, "get_local_file_hash", return_value=expected_hash, autospec=True 66 | ) 67 | strategy.post_copy_hook(filename, filename, strategy.remote_storage) 68 | 69 | assert strategy.get_cached_remote_file_hash(filename, filename) == expected_hash 70 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | ## 3.3.2 3 | - add support for Django 6.0 and Python 3.14 4 | - drop support for Python 3.9 and Django 5.1 5 | - add Django framework classifiers and update CI matrix 6 | 7 | ## 3.3.1 8 | - fix for upcoming release of django 6.0 changes in django commit c1aa4a7a79a56fa46793d128bcf42795e2149595 which returns a "deleted" key from the collect command 9 | 10 | ## 3.3.0 11 | - added support for `storages.backends.azure_storage.AzureStorage` - contributed by @jgoedeke 12 | - support for Django 5.2 13 | 14 | ## 3.2.1 15 | - change minimum django-storages version 16 | - support for Django 5.1 and python 3.13 17 | 18 | ## 3.2.0 19 | - Add support for custom S3ManifestStaticStorage subclasses with location set. 
20 | - Fix edge case where location is in the filename 21 | 22 | ## 3.1.3 23 | - fixed 2-pass to copy subdirectories 24 | 25 | ## 3.1.2 26 | - fix types to work with python 3.12 27 | 28 | ## 3.1.1 29 | - removed type ignores, updated tests 30 | 31 | ## 3.1.0 32 | - add new strategies for two-pass collectstatic where the first pass is file or memory based 33 | 34 | ## 3.0.1 35 | - Refactor boto3 strategy to wrap the storage classes to re-introduce preloading of metadata 36 | 37 | ## 3.0.0 38 | 39 | - Rename to collectfasta with new maintainer/repo 40 | - Remove some deprecated settings 41 | - Ability to run live tests against localstack/fake GCP instead of the real APIs 42 | - refactor tests to use the STORAGES config 43 | - implement preloading of S3 metadata for boto3 strategy as it was removed by django-storages 44 | - dropped support for Python 3.6-3.8 45 | - dropped support for Django < 4.2 46 | 47 | ## 2.2.0 48 | 49 | - Add `post_copy_hook` and `on_skip_hook` to 50 | `collectfasta.strategies.base.Strategy`. 51 | - Add `collectfasta.strategies.filesystem.CachingFileSystemStrategy`. 52 | - Fix a bug where files weren't properly closed when read for hashing. 53 | - Fix a bug where gzip compression level was inconsistent with S3. 54 | 55 | 56 | ## 2.1.0 57 | 58 | - Use `concurrent.futures.ThreadPoolExecutor` instead of 59 | `multiprocessing.dummy.Pool` for parallel uploads. 60 | - Support `post_process()`. 61 | 62 | ## 2.0.1 63 | 64 | - Fix and add regression test for #178 (wrong type for `COLLECTFAST_THREADS`). 65 | - Add tests for strictly typed settings. 66 | 67 | ## 2.0.0 68 | 69 | - Drop support for Python 3.5. 70 | - Drop support for Django 1.11. 71 | - Drop support for `storages.backends.s3boto.S3BotoStorage` (remove 72 | `collectfasta.strategies.boto.BotoStrategy`). 73 | - Drop support for guessing strategies, e.g. require 74 | `COLLECTFASTA_STRATEGY` to be set. 75 | - Package type hints. 76 | - Support django-storages 1.9+. 
77 | - Validate types of settings. 78 | 79 | ## Previous versions 80 | 81 | For changes in previous versions see [releases on Github][releases]. 82 | 83 | [releases]: https://github.com/jasongi/collectfasta/releases 84 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | ## Our Pledge 4 | 5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of 
unacceptable behavior. 28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at jasongiancono+github@gmail.com. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 
40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /.github/workflows/test-suite.yaml: -------------------------------------------------------------------------------- 1 | name: Test Suite 2 | on: 3 | workflow_call: 4 | secrets: 5 | AWS_ACCESS_KEY_ID: 6 | description: 'AWS Access Key ID' 7 | required: true 8 | AWS_SECRET_ACCESS_KEY: 9 | description: 'AWS Secret Access Key' 10 | required: true 11 | GCLOUD_API_CREDENTIALS_BASE64: 12 | description: 'Base64 encoded GCloud API credentials' 13 | required: true 14 | AZURE_CONNECTION_STRING: 15 | description: 'Azure Storage Connection String' 16 | required: true 17 | push: 18 | branches: 19 | - master 20 | pull_request: 21 | jobs: 22 | live-test: 23 | runs-on: ubuntu-latest 24 | strategy: 25 | matrix: 26 | python-version: ['3.14'] 27 | django-version: ['>=6.0,<6.1'] 28 | name: Live test on latest Python (${{ matrix.python-version }}) and latest Django (${{ matrix.django-version }}) 29 | steps: 30 | - uses: actions/checkout@v4 31 | - uses: actions/setup-python@v5 32 | with: 33 | python-version: ${{ matrix.python-version }} 34 | - uses: actions/cache@v4 35 | with: 36 | path: ~/.cache/pip 37 | key: ${{ runner.os }}-pip-${{ hashFiles('setup.cfg') }} 38 | restore-keys: ${{ runner.os }}-pip 39 | - name: Install dependencies 40 | run: | 41 | pip install --upgrade django'${{ matrix.django-version }}' 42 | pip install --upgrade -r test-requirements.txt 43 | pip install . 
44 | - name: Run tests against live env 45 | env: 46 | AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} 47 | AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} 48 | GCLOUD_API_CREDENTIALS_BASE64: ${{ secrets.GCLOUD_API_CREDENTIALS_BASE64 }} 49 | AZURE_CONNECTION_STRING: ${{ secrets.AZURE_CONNECTION_STRING }} 50 | if: github.event_name == 'push' && github.repository == 'jasongi/collectfasta' 51 | run: coverage run -m pytest -svv 52 | test-django: 53 | runs-on: ubuntu-latest 54 | strategy: 55 | matrix: 56 | python-version: ['3.10', '3.11', '3.12', '3.13', '3.14'] 57 | django-version: ['>=4.2,<4.3', '>=5.2,<5.3', '>=6.0,<6.1'] 58 | exclude: 59 | - python-version: '3.10' 60 | django-version: '>=6.0,<6.1' 61 | - python-version: '3.11' 62 | django-version: '>=6.0,<6.1' 63 | - python-version: '3.13' 64 | django-version: '>=4.2,<4.3' 65 | - python-version: '3.14' 66 | django-version: '>=4.2,<4.3' 67 | name: Test Python ${{ matrix.python-version }} Django ${{ matrix.django-version }} 68 | steps: 69 | - uses: actions/checkout@v4 70 | - uses: actions/setup-python@v5 71 | with: 72 | python-version: ${{ matrix.python-version }} 73 | - uses: actions/cache@v4 74 | with: 75 | path: ~/.cache/pip 76 | key: ${{ runner.os }}-pip-${{ hashFiles('setup.cfg') }} 77 | restore-keys: ${{ runner.os }}-pip 78 | - name: Install dependencies 79 | run: | 80 | pip install --upgrade django'${{ matrix.django-version }}' 81 | pip install --upgrade -r test-requirements.txt 82 | pip install . 
83 | - name: Run tests against docker env 84 | run: make docker-up && coverage run -m pytest -svv --speedtest 85 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | name = Collectfasta 3 | version = attr: collectfasta.__version__ 4 | description = A Faster Collectstatic 5 | long_description = file: README.md 6 | long_description_content_type = text/markdown; charset=UTF-8 7 | license = MIT License 8 | license_file = LICENSE 9 | classifiers = 10 | Environment :: Web Environment 11 | Intended Audience :: Developers 12 | Operating System :: OS Independent 13 | Programming Language :: Python 14 | Programming Language :: Python :: 3 15 | Programming Language :: Python :: 3.10 16 | Programming Language :: Python :: 3.11 17 | Programming Language :: Python :: 3.12 18 | Programming Language :: Python :: 3.13 19 | Programming Language :: Python :: 3.14 20 | Framework :: Django 21 | Framework :: Django :: 4.2 22 | Framework :: Django :: 5.2 23 | Framework :: Django :: 6.0 24 | author = Anton Agestam, Jason Giancono 25 | author_email = jasongiancono+github@gmail.com 26 | url = https://github.com/jasongi/collectfasta/ 27 | 28 | [options] 29 | include_package_data = True 30 | packages = find: 31 | install_requires = 32 | Django>=4.2 33 | django-storages>=1.13.2 34 | typing-extensions 35 | python_requires = >=3.10 36 | 37 | [options.package_data] 38 | collectfasta = py.typed 39 | 40 | [bdist_wheel] 41 | universal = true 42 | 43 | [tool:pytest] 44 | DJANGO_SETTINGS_MODULE = collectfasta.tests.settings 45 | markers = 46 | live_test: tests that interact with real APIs if credentials are set, docker if they aren't 47 | speed_test: tests that test processing many static files 48 | backend(backend): test that runs on a specific backend - generated by the strategy fixture 49 | strategy(strategy): test that runs on a specific strategy - generated
by the strategy fixture 50 | 51 | [flake8] 52 | exclude = appveyor, .idea, .git, .venv, .tox, __pycache__, *.egg-info, build 53 | max-complexity = 8 54 | max-line-length = 120 55 | # ignore F821 until mypy-0.730 compatibility is released 56 | # https://github.com/PyCQA/pyflakes/issues/475 57 | # see this discussion as to why we're ignoring E722 58 | # https://github.com/PyCQA/pycodestyle/issues/703 59 | # ignore E701 because it conflicts with black 60 | # ignore B019 because we don't really care about memory leaks during collectstatic 61 | extend-ignore = E722 F821 E701 B019 62 | 63 | [isort] 64 | profile = black 65 | src_paths = collectfasta 66 | force_single_line = True 67 | 68 | [mypy] 69 | python_version = 3.10 70 | show_error_codes = True 71 | pretty = True 72 | files = . 73 | 74 | no_implicit_reexport = True 75 | no_implicit_optional = True 76 | strict_equality = True 77 | strict_optional = True 78 | check_untyped_defs = True 79 | disallow_incomplete_defs = True 80 | ignore_missing_imports = False 81 | 82 | warn_unused_configs = True 83 | warn_redundant_casts = True 84 | warn_unused_ignores = True 85 | warn_return_any = True 86 | warn_unreachable = True 87 | 88 | plugins = 89 | mypy_django_plugin.main 90 | 91 | [mypy.plugins.django-stubs] 92 | django_settings_module = collectfasta.tests.settings 93 | 94 | [mypy-storages.*,google.*,botocore.*,setuptools.*,pytest.*] 95 | ignore_missing_imports = True 96 | 97 | [coverage:run] 98 | source = collectfasta 99 | 100 | [coverage:report] 101 | omit = */tests/* 102 | exclude_lines = 103 | pragma: no cover 104 | # ignore non-implementations 105 | \.\.\. 
106 | -------------------------------------------------------------------------------- /collectfasta/tests/settings.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import os 3 | import pathlib 4 | import sys 5 | import tempfile 6 | 7 | # import python and django versions 8 | from django import get_version 9 | from google.cloud import storage 10 | from google.oauth2 import service_account 11 | 12 | base_path = pathlib.Path.cwd() 13 | 14 | # Set USE_TZ to True to work around bug in django-storages 15 | USE_TZ = True 16 | 17 | SECRET_KEY = "nonsense" 18 | CACHES = { 19 | "default": { 20 | "BACKEND": "django.core.cache.backends.locmem.LocMemCache", 21 | "LOCATION": "test-collectfasta", 22 | } 23 | } 24 | TEMPLATE_LOADERS = ( 25 | "django.template.loaders.filesystem.Loader", 26 | "django.template.loaders.app_directories.Loader", 27 | "django.template.loaders.eggs.Loader", 28 | ) 29 | TEMPLATE_DIRS = [str(base_path / "collectfasta/templates")] 30 | INSTALLED_APPS = ("collectfasta", "django.contrib.staticfiles") 31 | STATIC_URL = "/staticfiles/" 32 | # python then django version 33 | AWS_LOCATION = sys.version.split(" ")[0] + "-" + get_version() 34 | GS_LOCATION = sys.version.split(" ")[0] + "-" + get_version() 35 | STATIC_ROOT = str(base_path / "static_root") 36 | MEDIA_ROOT = str(base_path / "fs_remote") 37 | STATICFILES_DIRS = [str(base_path / "static")] 38 | STORAGES = { 39 | "staticfiles": { 40 | "BACKEND": "storages.backends.s3.S3Storage", 41 | }, 42 | } 43 | COLLECTFASTA_STRATEGY = "collectfasta.strategies.boto3.Boto3Strategy" 44 | COLLECTFASTA_DEBUG = True 45 | 46 | GZIP_CONTENT_TYPES = ("text/plain",) 47 | AWS_IS_GZIPPED = False 48 | AWS_QUERYSTRING_AUTH = False 49 | AWS_DEFAULT_ACL: None = None 50 | S3_USE_SIGV4 = True 51 | AWS_S3_SIGNATURE_VERSION = "s3v4" 52 | 53 | # AWS 54 | AWS_STORAGE_BUCKET_NAME = "collectfasta" 55 | 56 | FAKE_AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE" 57 | 
FAKE_AWS_SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" 58 | AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", default=FAKE_AWS_ACCESS_KEY_ID) 59 | AWS_SECRET_ACCESS_KEY = os.environ.get( 60 | "AWS_SECRET_ACCESS_KEY", 61 | default=FAKE_AWS_SECRET_ACCESS_KEY, 62 | ) 63 | AWS_S3_REGION_NAME = "ap-southeast-2" 64 | if AWS_ACCESS_KEY_ID == FAKE_AWS_ACCESS_KEY_ID: 65 | AWS_ENDPOINT_URL = "http://localhost.localstack.cloud:4567" 66 | AWS_S3_ENDPOINT_URL = "http://localhost.localstack.cloud:4567" 67 | 68 | GCLOUD_API_CREDENTIALS_BASE64 = os.environ.get( 69 | "GCLOUD_API_CREDENTIALS_BASE64", default=None 70 | ) 71 | # Google Cloud 72 | if GCLOUD_API_CREDENTIALS_BASE64: 73 | # live test 74 | GS_CUSTOM_ENDPOINT = None 75 | with tempfile.NamedTemporaryFile() as file: 76 | gcloud_credentials_json = base64.b64decode(GCLOUD_API_CREDENTIALS_BASE64) 77 | file.write(gcloud_credentials_json) 78 | file.read() 79 | GS_CREDENTIALS = service_account.Credentials.from_service_account_file( 80 | file.name 81 | ) 82 | else: 83 | GS_CUSTOM_ENDPOINT = "http://127.0.0.1:6050" 84 | try: 85 | storage.Client( 86 | client_options={"api_endpoint": GS_CUSTOM_ENDPOINT}, 87 | use_auth_w_custom_endpoint=False, 88 | ).create_bucket("collectfasta") 89 | except Exception: 90 | pass 91 | GS_BUCKET_NAME = "collectfasta" 92 | 93 | AZURE_CONTAINER = "collectfasta" 94 | 95 | AZURE_CONNECTION_STRING = os.environ.get( 96 | "AZURE_CONNECTION_STRING", 97 | ( 98 | "DefaultEndpointsProtocol=http;" 99 | "AccountName=devstoreaccount1;" 100 | "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;" 101 | "BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;" 102 | ), 103 | ) 104 | -------------------------------------------------------------------------------- /collectfasta/strategies/hashing.py: -------------------------------------------------------------------------------- 1 | from typing import Any 2 | from typing import Callable 3 | from typing 
import Optional 4 | from typing import Protocol 5 | from typing import Tuple 6 | from typing import Type 7 | from typing import Union 8 | from typing import cast 9 | from typing import runtime_checkable 10 | 11 | from django.contrib.staticfiles.storage import ManifestFilesMixin 12 | from django.core.files.storage import FileSystemStorage 13 | from django.core.files.storage import Storage 14 | from django.core.files.storage.memory import InMemoryStorage 15 | from django.utils.functional import LazyObject 16 | 17 | from .base import HashStrategy 18 | from .base import Strategy 19 | 20 | 21 | @runtime_checkable 22 | class LocationConstructorProtocol(Protocol): 23 | 24 | def __init__(self, location: Optional[str]) -> None: ... 25 | 26 | 27 | @runtime_checkable 28 | class HasLocationProtocol(Protocol): 29 | location: str 30 | 31 | 32 | class InMemoryManifestFilesStorage(ManifestFilesMixin, InMemoryStorage): 33 | url: Callable[..., Any] 34 | 35 | 36 | class FileSystemManifestFilesStorage(ManifestFilesMixin, FileSystemStorage): 37 | url: Callable[..., Any] 38 | 39 | 40 | OriginalStorage = Union[ 41 | LocationConstructorProtocol, 42 | HasLocationProtocol, 43 | Storage, 44 | ManifestFilesMixin, 45 | LazyObject, 46 | ] 47 | 48 | 49 | class HashingTwoPassStrategy(HashStrategy[Storage]): 50 | """ 51 | Hashing strategies interact a lot with the remote storage as a part of post-processing 52 | This strategy will instead run the hashing strategy using InMemoryStorage first, then copy 53 | the files to the remote storage 54 | """ 55 | 56 | first_manifest_storage: Type[OriginalStorage] 57 | second_strategy: Type[Strategy[Storage]] 58 | original_storage: OriginalStorage 59 | memory_storage: OriginalStorage 60 | 61 | def __init__(self, remote_storage: OriginalStorage) -> None: 62 | assert issubclass(self.first_manifest_storage, ManifestFilesMixin) 63 | assert isinstance(remote_storage, ManifestFilesMixin) 64 | self.first_pass = True 65 | self.original_storage = remote_storage 66 | 
self.memory_storage = self._get_tmp_storage() 67 | assert isinstance(self.memory_storage, Storage) 68 | self.remote_storage = self.memory_storage 69 | super().__init__(self.memory_storage) 70 | 71 | def _get_tmp_storage(self) -> OriginalStorage: 72 | # python 3.12 freezes types at runtime, which does not play nicely with 73 | # LazyObject so we need to cast the type here 74 | location = cast(HasLocationProtocol, self.original_storage).location 75 | assert issubclass(self.first_manifest_storage, LocationConstructorProtocol) 76 | return self.first_manifest_storage(location=location) 77 | 78 | def wrap_storage(self, remote_storage: Storage) -> Storage: 79 | return self.remote_storage 80 | 81 | def get_remote_file_hash(self, prefixed_path: str) -> Optional[str]: 82 | try: 83 | return super().get_local_file_hash(prefixed_path, self.remote_storage) 84 | except FileNotFoundError: 85 | return None 86 | 87 | def second_pass_strategy(self): 88 | """ 89 | Strategy that is used after the first pass of hashing is done - to copy the files 90 | to the remote destination. 
91 | """ 92 | if self.second_strategy is None: 93 | raise NotImplementedError( 94 | "second_strategy must be set to a valid strategy class" 95 | ) 96 | else: 97 | assert isinstance(self.original_storage, Storage) 98 | return self.second_strategy(self.original_storage) 99 | 100 | 101 | Task = Tuple[str, str, Storage] 102 | 103 | 104 | class StrategyWithLocationProtocol: 105 | remote_storage: Any 106 | 107 | 108 | class WithoutPrefixMixin(StrategyWithLocationProtocol): 109 | 110 | def copy_args_hook(self, args: Task) -> Task: 111 | assert isinstance(self.remote_storage, HasLocationProtocol) 112 | if self.remote_storage.location == "" or self.remote_storage.location.endswith( 113 | "/" 114 | ): 115 | location = self.remote_storage.location 116 | else: 117 | location = f"{self.remote_storage.location}/" 118 | return ( 119 | args[0].replace(location, ""), 120 | args[1].replace(location, ""), 121 | args[2], 122 | ) 123 | 124 | 125 | class TwoPassInMemoryStrategy(HashingTwoPassStrategy): 126 | first_manifest_storage = InMemoryManifestFilesStorage 127 | 128 | 129 | class TwoPassFileSystemStrategy(HashingTwoPassStrategy): 130 | first_manifest_storage = FileSystemManifestFilesStorage 131 | -------------------------------------------------------------------------------- /collectfasta/tests/conftest.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | 4 | import pytest 5 | from django.conf import settings 6 | from django.test import override_settings as override_django_settings 7 | from pytest_uncollect_if import uncollect_if 8 | 9 | 10 | def composed(*decs): 11 | def deco(f): 12 | for dec in reversed(decs): 13 | f = dec(f) 14 | return f 15 | 16 | return deco 17 | 18 | 19 | S3_STORAGE_BACKEND = "storages.backends.s3.S3Storage" 20 | S3_STATIC_STORAGE_BACKEND = "storages.backends.s3.S3StaticStorage" 21 | S3_MANIFEST_STATIC_STORAGE_BACKEND = "storages.backends.s3.S3ManifestStaticStorage" 22 | 
S3_CUSTOM_MANIFEST_STATIC_STORAGE_BACKEND = ( 23 | "collectfasta.tests.utils.S3ManifestCustomStaticStorage" 24 | ) 25 | 26 | GOOGLE_CLOUD_STORAGE_BACKEND = "collectfasta.tests.utils.GoogleCloudStorageTest" 27 | AZURE_BLOB_STORAGE_BACKEND = "collectfasta.tests.utils.AzureBlobStorageTest" 28 | FILE_SYSTEM_STORAGE_BACKEND = "django.core.files.storage.FileSystemStorage" 29 | 30 | BOTO3_STRATEGY = "collectfasta.strategies.boto3.Boto3Strategy" 31 | BOTO3_MANIFEST_MEMORY_STRATEGY = ( 32 | "collectfasta.strategies.boto3.Boto3ManifestMemoryStrategy" 33 | ) 34 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY = ( 35 | "collectfasta.strategies.boto3.Boto3ManifestFileSystemStrategy" 36 | ) 37 | GOOGLE_CLOUD_STRATEGY = "collectfasta.strategies.gcloud.GoogleCloudStrategy" 38 | AZURE_BLOB_STRATEGY = "collectfasta.strategies.azure.AzureBlobStrategy" 39 | FILE_SYSTEM_STRATEGY = "collectfasta.strategies.filesystem.FileSystemStrategy" 40 | CACHING_FILE_SYSTEM_STRATEGY = ( 41 | "collectfasta.strategies.filesystem.CachingFileSystemStrategy" 42 | ) 43 | 44 | S3_BACKENDS = [ 45 | S3_STORAGE_BACKEND, 46 | S3_STATIC_STORAGE_BACKEND, 47 | S3_MANIFEST_STATIC_STORAGE_BACKEND, 48 | S3_CUSTOM_MANIFEST_STATIC_STORAGE_BACKEND, 49 | ] 50 | BACKENDS = [ 51 | *S3_BACKENDS, 52 | GOOGLE_CLOUD_STORAGE_BACKEND, 53 | AZURE_BLOB_STORAGE_BACKEND, 54 | FILE_SYSTEM_STORAGE_BACKEND, 55 | ] 56 | 57 | STRATEGIES = [ 58 | BOTO3_STRATEGY, 59 | BOTO3_MANIFEST_MEMORY_STRATEGY, 60 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 61 | GOOGLE_CLOUD_STRATEGY, 62 | AZURE_BLOB_STRATEGY, 63 | FILE_SYSTEM_STRATEGY, 64 | CACHING_FILE_SYSTEM_STRATEGY, 65 | ] 66 | 67 | COMPATIBLE_STRATEGIES_FOR_BACKENDS = { 68 | S3_STORAGE_BACKEND: [BOTO3_STRATEGY], 69 | S3_STATIC_STORAGE_BACKEND: [BOTO3_STRATEGY], 70 | S3_MANIFEST_STATIC_STORAGE_BACKEND: [ 71 | BOTO3_STRATEGY, 72 | BOTO3_MANIFEST_MEMORY_STRATEGY, 73 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 74 | ], 75 | S3_CUSTOM_MANIFEST_STATIC_STORAGE_BACKEND: [ 76 | BOTO3_STRATEGY, 77 | 
BOTO3_MANIFEST_MEMORY_STRATEGY, 78 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 79 | ], 80 | GOOGLE_CLOUD_STORAGE_BACKEND: [GOOGLE_CLOUD_STRATEGY], 81 | AZURE_BLOB_STORAGE_BACKEND: [AZURE_BLOB_STRATEGY], 82 | FILE_SYSTEM_STORAGE_BACKEND: [FILE_SYSTEM_STRATEGY, CACHING_FILE_SYSTEM_STRATEGY], 83 | } 84 | 85 | 86 | def two_n_plus_1(files): 87 | return files * 2 + 1 88 | 89 | 90 | def n(files): 91 | return files 92 | 93 | 94 | def short_name(backend, strategy): 95 | return f"{backend.split('.')[-1]}:{strategy.split('.')[-1]}" 96 | 97 | 98 | def params_for_backends(): 99 | for backend in BACKENDS: 100 | for strategy in COMPATIBLE_STRATEGIES_FOR_BACKENDS[backend]: 101 | yield pytest.param( 102 | (backend, strategy), 103 | marks=[pytest.mark.backend(backend), pytest.mark.strategy(strategy)], 104 | id=short_name(backend, strategy), 105 | ) 106 | 107 | 108 | class StrategyFixture: 109 | def __init__(self, expected_copied_files, backend, strategy, two_pass): 110 | self.backend = backend 111 | self.strategy = strategy 112 | self.expected_copied_files = expected_copied_files 113 | self.two_pass = two_pass 114 | 115 | 116 | @pytest.fixture(params=params_for_backends()) 117 | def strategy(request): 118 | backend, strategy = request.param 119 | if strategy in ( 120 | BOTO3_MANIFEST_MEMORY_STRATEGY, 121 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 122 | ) and backend in ( 123 | S3_MANIFEST_STATIC_STORAGE_BACKEND, 124 | S3_CUSTOM_MANIFEST_STATIC_STORAGE_BACKEND, 125 | ): 126 | expected_copied_files = two_n_plus_1 127 | else: 128 | expected_copied_files = n 129 | with override_django_settings( 130 | STORAGES={"staticfiles": {"BACKEND": backend}}, 131 | COLLECTFASTA_STRATEGY=strategy, 132 | ): 133 | yield StrategyFixture( 134 | expected_copied_files, 135 | backend, 136 | strategy, 137 | two_pass=strategy 138 | in ( 139 | BOTO3_MANIFEST_MEMORY_STRATEGY, 140 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 141 | ), 142 | ) 143 | 144 | 145 | def uncollect_if_not_s3(strategy: tuple[str, str], **kwargs: dict) 
-> bool: 146 | backend, _ = strategy 147 | return backend not in S3_BACKENDS 148 | 149 | 150 | def uncollect_if_not_cloud(strategy: tuple[str, str], **kwargs: dict) -> bool: 151 | backend, _ = strategy 152 | return ( 153 | backend not in S3_BACKENDS 154 | and backend != GOOGLE_CLOUD_STORAGE_BACKEND 155 | and backend != AZURE_BLOB_STORAGE_BACKEND 156 | ) 157 | 158 | 159 | def uncollect_if_not_azure(strategy: tuple[str, str], **kwargs: dict) -> bool: 160 | backend, _ = strategy 161 | return backend != AZURE_BLOB_STORAGE_BACKEND 162 | 163 | 164 | live_test = pytest.mark.live_test 165 | speed_test_mark = pytest.mark.speed_test 166 | 167 | speed_test = composed( 168 | live_test, 169 | speed_test_mark, 170 | pytest.mark.skipif( 171 | "not config.getoption('speedtest')", 172 | reason="no --speedtest flag", 173 | ), 174 | ) 175 | 176 | aws_backends_only = uncollect_if(func=uncollect_if_not_s3) 177 | cloud_backends_only = uncollect_if(func=uncollect_if_not_cloud) 178 | azure_backends_only = uncollect_if(func=uncollect_if_not_azure) 179 | 180 | 181 | def uncollect_if_not_two_pass(strategy: tuple[str, str], **kwargs: dict) -> bool: 182 | _, strategy_str = strategy 183 | return strategy_str not in ( 184 | BOTO3_MANIFEST_MEMORY_STRATEGY, 185 | BOTO3_MANIFEST_FILE_SYSTEM_STRATEGY, 186 | ) 187 | 188 | 189 | def uncollect_if_two_pass(strategy: tuple[str, str], **kwargs: dict) -> bool: 190 | return not uncollect_if_not_two_pass(strategy, **kwargs) 191 | 192 | 193 | two_pass_only = uncollect_if(func=uncollect_if_not_two_pass) 194 | exclude_two_pass = uncollect_if(func=uncollect_if_two_pass) 195 | 196 | 197 | @pytest.fixture(autouse=True) 198 | def create_test_directories(): 199 | paths = (settings.STATICFILES_DIRS[0], settings.STATIC_ROOT, settings.MEDIA_ROOT) 200 | for path in paths: 201 | if not os.path.exists(path): 202 | os.makedirs(path) 203 | try: 204 | yield 205 | finally: 206 | for path in paths: 207 | shutil.rmtree(path) 208 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Collectfasta 2 | 3 | A faster collectstatic command. This is a fork of the archived collectfast by @antonagestam and a drop-in replacement - you must not have both installed at the same time. 4 | 5 | [![Test Suite](https://github.com/jasongi/collectfasta/workflows/Test%20Suite/badge.svg)](https://github.com/jasongi/collectfasta/actions?query=workflow%3A%22Test+Suite%22+branch%3Amaster) 6 | [![Static analysis](https://github.com/jasongi/collectfasta/workflows/Static%20analysis/badge.svg?branch=master)](https://github.com/jasongi/collectfasta/actions?query=workflow%3A%22Static+analysis%22+branch%3Amaster) 7 | 8 | **Features** 9 | 10 | - Efficiently decide what files to upload using cached checksums 11 | - Two-pass uploads for Manifest storage which can be slow using a single pass - files are hashed/post-processed in Memory/Local filesystem and then the result is copied. 12 | - Parallel file uploads 13 | 14 | **Supported Storage Backends** 15 | - `storages.backends.s3boto3.S3Boto3Storage` 16 | - `storages.backends.s3boto3.S3StaticStorage` 17 | - `storages.backends.s3boto3.S3ManifestStaticStorage` 18 | - `storages.backends.gcloud.GoogleCloudStorage` 19 | - `storages.backends.azure_storage.AzureStorage` 20 | - `django.core.files.storage.FileSystemStorage` 21 | 22 | Running Django's `collectstatic` command can become painfully slow as more and 23 | more files are added to a project, especially when heavy libraries such as 24 | jQuery UI are included in source code. Collectfasta customizes the builtin 25 | `collectstatic` command, adding different optimizations to make uploading large 26 | amounts of files much faster. 
27 | 28 | 29 | ## Installation 30 | 31 | Install the app using pip: 32 | 33 | ```bash 34 | $ python3 -m pip install Collectfasta 35 | ``` 36 | 37 | Configure the storage backend and strategy in your settings file, and add `'collectfasta'` to your 38 | `INSTALLED_APPS`, before `'django.contrib.staticfiles'`: 39 | 40 | ```python 41 | STORAGES = { 42 | "staticfiles": { 43 | "BACKEND": "storages.backends.s3.S3Storage", 44 | }, 45 | } 46 | COLLECTFASTA_STRATEGY = "collectfasta.strategies.boto3.Boto3Strategy" 47 | INSTALLED_APPS = ( 48 | # ... 49 | "collectfasta", 50 | "django.contrib.staticfiles", 51 | # ... 52 | ) 53 | ``` 54 | 55 | **Note:** `'collectfasta'` must come before `'django.contrib.staticfiles'` in 56 | `INSTALLED_APPS`. 57 | 58 | ##### Upload Strategies 59 | 60 | Collectfasta Strategy|Storage Backend 61 | ---|--- 62 | collectfasta.strategies.boto3.Boto3Strategy|storages.backends.s3.S3Storage 63 | collectfasta.strategies.boto3.Boto3Strategy|storages.backends.s3.S3StaticStorage 64 | collectfasta.strategies.boto3.Boto3ManifestMemoryStrategy (recommended)|storages.backends.s3.S3ManifestStaticStorage 65 | collectfasta.strategies.boto3.Boto3ManifestFileSystemStrategy|storages.backends.s3.S3ManifestStaticStorage 66 | collectfasta.strategies.gcloud.GoogleCloudStrategy|storages.backends.gcloud.GoogleCloudStorage 67 | collectfasta.strategies.azure.AzureBlobStrategy|storages.backends.azure_storage.AzureStorage 68 | collectfasta.strategies.filesystem.FileSystemStrategy|django.core.files.storage.FileSystemStorage 69 | 70 | Custom strategies can also be written for backends not listed above by 71 | implementing the `collectfasta.strategies.Strategy` ABC. 72 | 73 | 74 | ## Usage 75 | 76 | Collectfasta overrides Django's builtin `collectstatic` command, so just run 77 | `python manage.py collectstatic` as normal. 78 | 79 | You can disable Collectfasta by using the `--disable-collectfasta` option or by 80 | setting `COLLECTFASTA_ENABLED = False` in your settings file.
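The pairings in the Upload Strategies table can also be kept as a lookup in a settings file, so that the strategy follows the configured storage backend. This is only a sketch: the `STRATEGY_FOR_BACKEND` mapping and the `strategy_for` helper are hypothetical names introduced here, and the dotted paths are copied from the table (using the recommended strategy where the table lists two).

```python
from typing import Optional

# Backend -> strategy pairs copied from the Upload Strategies table.
STRATEGY_FOR_BACKEND = {
    "storages.backends.s3.S3Storage": "collectfasta.strategies.boto3.Boto3Strategy",
    "storages.backends.s3.S3StaticStorage": "collectfasta.strategies.boto3.Boto3Strategy",
    "storages.backends.s3.S3ManifestStaticStorage": (
        "collectfasta.strategies.boto3.Boto3ManifestMemoryStrategy"
    ),
    "storages.backends.gcloud.GoogleCloudStorage": (
        "collectfasta.strategies.gcloud.GoogleCloudStrategy"
    ),
    "storages.backends.azure_storage.AzureStorage": (
        "collectfasta.strategies.azure.AzureBlobStrategy"
    ),
    "django.core.files.storage.FileSystemStorage": (
        "collectfasta.strategies.filesystem.FileSystemStrategy"
    ),
}


def strategy_for(backend: str) -> Optional[str]:
    """Return the strategy path for a backend, or None if the table has no match."""
    return STRATEGY_FOR_BACKEND.get(backend)


STORAGES = {
    "staticfiles": {"BACKEND": "storages.backends.s3.S3ManifestStaticStorage"},
}
COLLECTFASTA_STRATEGY = strategy_for(STORAGES["staticfiles"]["BACKEND"])
```

A result of `None` means the backend has no built-in pairing. In that case the options described above are a custom `Strategy` subclass or disabling Collectfasta.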
81 | 82 | ### Setting Up a Dedicated Cache Backend 83 | 84 | It's recommended to set up a dedicated cache backend for Collectfasta. Every time 85 | Collectfasta does not find an entry for a file in the cache, it triggers a 86 | lookup against the storage backend, so it's recommended to have a fairly high 87 | `TIMEOUT` setting. 88 | 89 | Configure your dedicated cache with the `COLLECTFASTA_CACHE` setting: 90 | 91 | ```python 92 | CACHES = { 93 | "default": { 94 | # Your default cache 95 | }, 96 | "collectfasta": { 97 | # Your dedicated Collectfasta cache 98 | }, 99 | } 100 | 101 | COLLECTFASTA_CACHE = "collectfasta" 102 | ``` 103 | 104 | If `COLLECTFASTA_CACHE` isn't set, the `default` cache will be used. 105 | 106 | **Note:** Collectfasta will never clean the cache of obsolete files. To clean 107 | out the entire cache, use `cache.clear()`. [See docs for Django's cache 108 | framework][django-cache]. 109 | 110 | **Note:** We recommend setting the `MAX_ENTRIES` option if you have more 111 | than 300 static files, see [#47][issue-47]. 112 | 113 | [django-cache]: https://docs.djangoproject.com/en/stable/topics/cache/ 114 | [issue-47]: https://github.com/antonagestam/collectfast/issues/47 115 | 116 | ### Enable Parallel Uploads 117 | 118 | The parallelization feature enables parallel file uploads using Python's 119 | `concurrent.futures` module. Enable it with the `COLLECTFASTA_THREADS` 120 | setting. 121 | 122 | To enable parallel uploads, a dedicated cache backend must be set up and it must 123 | use a backend that is thread-safe, i.e. something other than Django's default 124 | LocMemCache. 125 | 126 | ```python 127 | COLLECTFASTA_THREADS = 20 128 | ``` 129 | 130 | 131 | ## Debugging 132 | 133 | By default, Collectfasta will suppress any exceptions that happen when copying 134 | and let Django's `collectstatic` handle them. To debug those suppressed errors 135 | you can set `COLLECTFASTA_DEBUG = True` in your Django settings file.
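Putting the cache, threading and debug settings from the sections above together, a settings fragment might look like the sketch below. The backend choice, `LOCATION` path, `TIMEOUT` and `MAX_ENTRIES` values are assumptions picked for illustration. Django's file-based cache is used only as one possible non-LocMemCache backend, and Redis or Memcached are common alternatives.

```python
# settings.py fragment; every concrete value here is illustrative.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
    "collectfasta": {
        # A dedicated cache that is not LocMemCache, as parallel uploads require.
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/collectfasta_cache",
        # A fairly high timeout (30 days) avoids repeated remote lookups.
        "TIMEOUT": 60 * 60 * 24 * 30,
        # Raise this above Django's default of 300 for larger projects.
        "OPTIONS": {"MAX_ENTRIES": 10000},
    },
}

COLLECTFASTA_CACHE = "collectfasta"
COLLECTFASTA_THREADS = 20
# Surface exceptions that would otherwise be suppressed while copying.
COLLECTFASTA_DEBUG = True
```

`COLLECTFASTA_DEBUG` is probably best left off outside of troubleshooting, since the default behavior is to let Django's `collectstatic` handle failed copies.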
136 | 137 | 138 | ## Contribution 139 | 140 | Please feel free to contribute by using issues and pull requests. Discussion is 141 | open and welcome. 142 | 143 | ### Versioning policy 144 | 145 | We follow semantic versioning with the following support policy: 146 | unsupported Django and Python versions are dropped after their EOL date. When 147 | dropping support for an unsupported Django or Python version, Collectfasta only 148 | bumps a patch version. 149 | 150 | ### Testing 151 | 152 | The test suite is built to run against localstack / fake-gcs-server OR live S3 and GCloud buckets. 153 | To run live tests locally you need to provide API credentials to test against as environment variables. 154 | 155 | ```bash 156 | export AWS_ACCESS_KEY_ID='...' 157 | export AWS_SECRET_ACCESS_KEY='...' 158 | export GCLOUD_API_CREDENTIALS_BASE64='{...}' # Google Cloud credentials as Base64'd json 159 | ``` 160 | 161 | Install test dependencies and target Django version: 162 | 163 | ```bash 164 | python3 -m pip install -r test-requirements.txt 165 | python3 -m pip install django==5.2.3 166 | ``` 167 | 168 | Run test suite: 169 | 170 | ```bash 171 | make test 172 | ``` 173 | 174 | Run test against localstack/fake-gcs-server: 175 | 176 | ```bash 177 | make test-docker 178 | ``` 179 | 180 | Code quality tools are broken out from test requirements because some of them 181 | only install on Python >= 3.7. 182 | 183 | ```bash 184 | python3 -m pip install -r lint-requirements.txt 185 | ``` 186 | 187 | Run linters and static type check: 188 | 189 | ```bash 190 | make checks 191 | ``` 192 | 193 | 194 | ## License 195 | 196 | Collectfasta is licensed under the MIT License, see LICENSE file for more 197 | information. 
198 | -------------------------------------------------------------------------------- /collectfasta/tests/utils.py: -------------------------------------------------------------------------------- 1 | import functools 2 | import os 3 | import pathlib 4 | import random 5 | import uuid 6 | from concurrent.futures import ThreadPoolExecutor 7 | from typing import Any 8 | from typing import Callable 9 | from typing import TypeVar 10 | from typing import cast 11 | 12 | from django.conf import settings as django_settings 13 | from django.utils.module_loading import import_string 14 | from storages.backends.azure_storage import AzureStorage 15 | from storages.backends.gcloud import GoogleCloudStorage 16 | from storages.backends.s3boto3 import S3ManifestStaticStorage 17 | from typing_extensions import Final 18 | 19 | from collectfasta import settings 20 | 21 | static_dir: Final = pathlib.Path(django_settings.STATICFILES_DIRS[0]) 22 | 23 | F = TypeVar("F", bound=Callable[..., Any]) 24 | 25 | 26 | def make_100_files(): 27 | with ThreadPoolExecutor(max_workers=5) as executor: 28 | for _ in range(50): 29 | executor.submit(create_big_referenced_static_file) 30 | executor.shutdown(wait=True) 31 | 32 | 33 | def get_fake_client(): 34 | from google.api_core.client_options import ClientOptions 35 | from google.auth.credentials import AnonymousCredentials 36 | from google.cloud import storage 37 | 38 | client = storage.Client( 39 | credentials=AnonymousCredentials(), 40 | project="test", 41 | client_options=ClientOptions(api_endpoint=django_settings.GS_CUSTOM_ENDPOINT), 42 | ) 43 | return client 44 | 45 | 46 | class GoogleCloudStorageTest(GoogleCloudStorage): 47 | def __init__(self, *args, **kwargs): 48 | super().__init__(*args, **kwargs) 49 | if django_settings.GS_CUSTOM_ENDPOINT: 50 | # Use the fake client if we are using the fake endpoint 51 | self._client = get_fake_client() 52 | 53 | 54 | class AzureBlobStorageTest(AzureStorage): 55 | def __init__(self, *args, **kwargs): 56 | 
super().__init__(*args, **kwargs) 57 | self.create_container() 58 | 59 | def create_container(self): 60 | from azure.core.exceptions import ResourceExistsError 61 | from azure.storage.blob import BlobServiceClient 62 | 63 | connection_string = django_settings.AZURE_CONNECTION_STRING 64 | container_name = django_settings.AZURE_CONTAINER 65 | 66 | client = BlobServiceClient.from_connection_string(connection_string) 67 | if "DefaultEndpointsProtocol=http;" in connection_string: 68 | try: 69 | client.create_container(container_name) 70 | except ResourceExistsError: 71 | # recreate orphaned containers 72 | client.delete_container(container_name) 73 | client.create_container(container_name) 74 | 75 | 76 | class S3ManifestCustomStaticStorage(S3ManifestStaticStorage): 77 | location = f"prefix-{django_settings.AWS_LOCATION}" 78 | manifest_name = "prefixfiles.json" 79 | 80 | 81 | def create_two_referenced_static_files() -> tuple[pathlib.Path, pathlib.Path]: 82 | """Create a static file, then another file with a reference to the file""" 83 | path = create_static_file() 84 | folder_path = static_dir / (path.stem + "_folder") 85 | folder_path.mkdir() 86 | reference_path = folder_path / f"{uuid.uuid4().hex}.html" 87 | reference_path.write_text(f"{{% static '../{path.name}' %}}") 88 | return (path, reference_path) 89 | 90 | 91 | def create_static_file() -> pathlib.Path: 92 | """Write random characters to a file in the static directory.""" 93 | path = static_dir / f"{uuid.uuid4().hex}.html" 94 | path.write_text("".join(chr(random.randint(0, 64)) for _ in range(500))) 95 | return path 96 | 97 | 98 | def create_big_referenced_static_file() -> tuple[pathlib.Path, pathlib.Path]: 99 | """Create a big static file, then another file with a reference to the file""" 100 | path = create_big_static_file() 101 | reference_path = static_dir / f"{uuid.uuid4().hex}.html" 102 | reference_path.write_text(f"{{% static '{path.name}' %}}") 103 | return (path, reference_path) 104 | 105 | 106 | def
create_big_static_file() -> pathlib.Path: 107 | """Write random characters to a file in the static directory.""" 108 | path = static_dir / f"{uuid.uuid4().hex}.html" 109 | path.write_text("".join(chr(random.randint(0, 64)) for _ in range(100000))) 110 | return path 111 | 112 | 113 | def create_larger_than_4mb_referenced_static_file() -> ( 114 | tuple[pathlib.Path, pathlib.Path] 115 | ): 116 | """Create a larger than 4mb static file, then another file with a reference to the file""" 117 | path = create_larger_than_4mb_file() 118 | reference_path = static_dir / f"{uuid.uuid4().hex}.html" 119 | reference_path.write_text(f"{{% static '{path.name}' %}}") 120 | return (path, reference_path) 121 | 122 | 123 | def create_larger_than_4mb_file() -> pathlib.Path: 124 | """Write random characters to a file larger than 4mb in the static directory.""" 125 | size_bytes = 4 * 1024 * 1024 + 1 # 4MB + 1 byte 126 | path = static_dir / f"{uuid.uuid4().hex}.html" 127 | path.write_text("".join(chr(random.randint(0, 64)) for _ in range(size_bytes))) 128 | return path 129 | 130 | 131 | def clean_static_dir() -> None: 132 | clean_static_dir_recurse(static_dir.as_posix()) 133 | clean_static_dir_recurse(django_settings.AWS_LOCATION) 134 | clean_static_dir_recurse(S3ManifestCustomStaticStorage.location) 135 | 136 | 137 | def clean_static_dir_recurse(location: str) -> None: 138 | try: 139 | for filename in os.listdir(location): 140 | file = pathlib.Path(location) / filename 141 | # don't accidentally wipe the whole drive if someone puts / as location. 142 | if ( 143 | "collectfasta" in str(file.absolute()) 144 | and ".." 
not in str(file.as_posix()) 145 | and len(list(filter(lambda x: x == "/", str(file.absolute())))) > 2 146 | ): 147 | if file.is_file(): 148 | file.unlink() 149 | elif file.is_dir(): 150 | clean_static_dir_recurse(file.as_posix()) 151 | file.rmdir() 152 | except FileNotFoundError: 153 | pass 154 | 155 | 156 | def override_setting(name: str, value: Any) -> Callable[[F], F]: 157 | def decorator(fn: F) -> F: 158 | @functools.wraps(fn) 159 | def wrapper(*args, **kwargs): 160 | original = getattr(settings, name) 161 | setattr(settings, name, value) 162 | try: 163 | return fn(*args, **kwargs) 164 | finally: 165 | setattr(settings, name, original) 166 | 167 | return cast(F, wrapper) 168 | 169 | return decorator 170 | 171 | 172 | def override_storage_attr(name: str, value: Any) -> Callable[[F], F]: 173 | def decorator(fn: F) -> F: 174 | @functools.wraps(fn) 175 | def wrapper(*args, **kwargs): 176 | storage = import_string(django_settings.STORAGES["staticfiles"]["BACKEND"]) 177 | if hasattr(storage, name): 178 | original = getattr(storage, name) 179 | else: 180 | original = None 181 | setattr(storage, name, value) 182 | try: 183 | return fn(*args, **kwargs) 184 | finally: 185 | setattr(storage, name, original) 186 | 187 | return cast(F, wrapper) 188 | 189 | return decorator 190 | -------------------------------------------------------------------------------- /collectfasta/strategies/base.py: -------------------------------------------------------------------------------- 1 | import abc 2 | import gzip 3 | import hashlib 4 | import logging 5 | import mimetypes 6 | import pydoc 7 | from functools import lru_cache 8 | from io import BytesIO 9 | from typing import ClassVar 10 | from typing import Generic 11 | from typing import NoReturn 12 | from typing import Optional 13 | from typing import Tuple 14 | from typing import Type 15 | from typing import TypeVar 16 | from typing import Union 17 | 18 | from django.core.cache import caches 19 | from django.core.exceptions import 
ImproperlyConfigured 20 | from django.core.files.storage import Storage 21 | from django.utils.encoding import force_bytes 22 | 23 | from collectfasta import settings 24 | 25 | _RemoteStorage = TypeVar("_RemoteStorage", bound=Storage) 26 | 27 | 28 | cache = caches[settings.cache] 29 | logger = logging.getLogger(__name__) 30 | 31 | 32 | class Strategy(abc.ABC, Generic[_RemoteStorage]): 33 | # Exceptions raised by storage backend for delete calls to non-existing 34 | # objects. The command silently catches these. 35 | delete_not_found_exception: ClassVar[Tuple[Type[Exception], ...]] = () 36 | 37 | def __init__(self, remote_storage: _RemoteStorage) -> None: 38 | self.remote_storage = remote_storage 39 | 40 | def wrap_storage(self, remote_storage: _RemoteStorage) -> _RemoteStorage: 41 | """ 42 | Wrap the remote storage. 43 | Allows you to change the remote storage behavior 44 | just for collectstatic. 45 | """ 46 | return remote_storage 47 | 48 | @abc.abstractmethod 49 | def should_copy_file( 50 | self, path: str, prefixed_path: str, local_storage: Storage 51 | ) -> bool: 52 | """ 53 | Called for each file before copying happens, this method decides 54 | whether a file should be copied or not. Return False to indicate that 55 | the file is already up-to-date and should not be copied, or True to 56 | indicate that it is stale and needs updating. 57 | """ 58 | ... 59 | 60 | def pre_should_copy_hook(self) -> None: 61 | """Hook called before calling should_copy_file.""" 62 | ... 63 | 64 | def post_copy_hook( 65 | self, path: str, prefixed_path: str, local_storage: Storage 66 | ) -> None: 67 | """Hook called after a file is copied.""" 68 | ... 69 | 70 | def on_skip_hook( 71 | self, path: str, prefixed_path: str, local_storage: Storage 72 | ) -> None: 73 | """Hook called when a file copy is skipped.""" 74 | ... 
75 | 76 | def second_pass_strategy(self) -> "Optional[Strategy[_RemoteStorage]]": 77 | """ 78 | Strategy that is used after the first pass of hashing is done - to copy the files 79 | to the remote destination. 80 | """ 81 | return None 82 | 83 | def copy_args_hook( 84 | self, args: Tuple[str, str, Storage] 85 | ) -> Tuple[str, str, Storage]: 86 | """Hook called before copying a file. Use this to modify the path or storage.""" 87 | return args 88 | 89 | 90 | class HashStrategy(Strategy[_RemoteStorage], abc.ABC): 91 | use_gzip = False 92 | 93 | def should_copy_file( 94 | self, path: str, prefixed_path: str, local_storage: Storage 95 | ) -> bool: 96 | local_hash = self.get_local_file_hash(path, local_storage) 97 | remote_hash = self.get_remote_file_hash(prefixed_path) 98 | return local_hash != remote_hash 99 | 100 | def get_gzipped_local_file_hash( 101 | self, uncompressed_file_hash: str, path: str, contents: str 102 | ) -> str: 103 | buffer = BytesIO() 104 | zf = gzip.GzipFile(mode="wb", fileobj=buffer, mtime=0.0) 105 | zf.write(force_bytes(contents)) 106 | zf.close() 107 | return hashlib.md5(buffer.getvalue()).hexdigest() 108 | 109 | @lru_cache(maxsize=None) 110 | def get_local_file_hash(self, path: str, local_storage: Storage) -> str: 111 | """Create md5 hash from file contents.""" 112 | # Read file contents and handle file closing 113 | file = local_storage.open(path) 114 | try: 115 | contents = file.read() 116 | finally: 117 | file.close() 118 | 119 | file_hash = hashlib.md5(contents).hexdigest() 120 | 121 | # Check if content should be gzipped and hash gzipped content 122 | content_type = mimetypes.guess_type(path)[0] or "application/octet-stream" 123 | if self.use_gzip and content_type in settings.gzip_content_types: 124 | file_hash = self.get_gzipped_local_file_hash(file_hash, path, contents) 125 | 126 | return file_hash 127 | 128 | @abc.abstractmethod 129 | def get_remote_file_hash(self, prefixed_path: str) -> Optional[str]: ... 
130 | 131 | 132 | class CachingHashStrategy(HashStrategy[_RemoteStorage], abc.ABC): 133 | @lru_cache(maxsize=None) 134 | def get_cache_key(self, path: str) -> str: 135 | path_hash = hashlib.md5(path.encode()).hexdigest() 136 | return settings.cache_key_prefix + path_hash 137 | 138 | def invalidate_cached_hash(self, path: str) -> None: 139 | cache.delete(self.get_cache_key(path)) 140 | 141 | def should_copy_file( 142 | self, path: str, prefixed_path: str, local_storage: Storage 143 | ) -> bool: 144 | local_hash = self.get_local_file_hash(path, local_storage) 145 | remote_hash = self.get_cached_remote_file_hash(path, prefixed_path) 146 | if local_hash != remote_hash: 147 | # invalidate cached hash, since we expect its corresponding file to 148 | # be overwritten 149 | self.invalidate_cached_hash(path) 150 | return True 151 | return False 152 | 153 | def get_cached_remote_file_hash(self, path: str, prefixed_path: str) -> str: 154 | """Cache the hash of the remote storage file.""" 155 | cache_key = self.get_cache_key(path) 156 | hash_ = cache.get(cache_key, False) 157 | if hash_ is False: 158 | hash_ = self.get_remote_file_hash(prefixed_path) 159 | cache.set(cache_key, hash_) 160 | return str(hash_) 161 | 162 | def get_gzipped_local_file_hash( 163 | self, uncompressed_file_hash: str, path: str, contents: str 164 | ) -> str: 165 | """Cache the hash of the gzipped local file.""" 166 | cache_key = self.get_cache_key("gzip_hash_%s" % uncompressed_file_hash) 167 | file_hash = cache.get(cache_key, False) 168 | if file_hash is False: 169 | file_hash = super().get_gzipped_local_file_hash( 170 | uncompressed_file_hash, path, contents 171 | ) 172 | cache.set(cache_key, file_hash) 173 | return str(file_hash) 174 | 175 | def post_copy_hook( 176 | self, path: str, prefixed_path: str, local_storage: Storage 177 | ) -> None: 178 | """Cache the hash of the just copied local file.""" 179 | super().post_copy_hook(path, prefixed_path, local_storage) 180 | key = self.get_cache_key(path) 
181 | value = self.get_local_file_hash(path, local_storage) 182 | cache.set(key, value) 183 | 184 | 185 | class DisabledStrategy(Strategy): 186 | def should_copy_file( 187 | self, path: str, prefixed_path: str, local_storage: Storage 188 | ) -> NoReturn: 189 | raise NotImplementedError 190 | 191 | def pre_should_copy_hook(self) -> NoReturn: 192 | raise NotImplementedError 193 | 194 | 195 | def load_strategy(klass: Union[str, type, object]) -> Type[Strategy[Storage]]: 196 | if isinstance(klass, str): 197 | klass = pydoc.locate(klass) 198 | if not isinstance(klass, type) or not issubclass(klass, Strategy): 199 | raise ImproperlyConfigured( 200 | "Configured strategies must be subclasses of %s.%s" 201 | % (Strategy.__module__, Strategy.__qualname__) 202 | ) 203 | return klass 204 | -------------------------------------------------------------------------------- /collectfasta/tests/command/test_command.py: -------------------------------------------------------------------------------- 1 | import timeit 2 | 3 | import pytest 4 | from django.core.exceptions import ImproperlyConfigured 5 | from django.test import override_settings as override_django_settings 6 | from django.test.utils import override_settings 7 | from pytest_mock import MockerFixture 8 | 9 | from collectfasta.management.commands.collectstatic import Command 10 | from collectfasta.tests.conftest import StrategyFixture 11 | from collectfasta.tests.conftest import aws_backends_only 12 | from collectfasta.tests.conftest import azure_backends_only 13 | from collectfasta.tests.conftest import cloud_backends_only 14 | from collectfasta.tests.conftest import exclude_two_pass 15 | from collectfasta.tests.conftest import live_test 16 | from collectfasta.tests.conftest import speed_test 17 | from collectfasta.tests.conftest import two_pass_only 18 | from collectfasta.tests.utils import clean_static_dir 19 | from collectfasta.tests.utils import create_larger_than_4mb_referenced_static_file 20 | from 
collectfasta.tests.utils import create_static_file 21 | from collectfasta.tests.utils import create_two_referenced_static_files 22 | from collectfasta.tests.utils import make_100_files 23 | from collectfasta.tests.utils import override_setting 24 | from collectfasta.tests.utils import override_storage_attr 25 | 26 | from .utils import call_collectstatic 27 | 28 | 29 | @live_test 30 | def test_basics(strategy: StrategyFixture) -> None: 31 | clean_static_dir() 32 | create_two_referenced_static_files() 33 | assert ( 34 | f"{strategy.expected_copied_files(2)} static files copied." 35 | in call_collectstatic() 36 | ) 37 | # file state should now be cached 38 | assert "0 static files copied." in call_collectstatic() 39 | 40 | 41 | @live_test 42 | def test_only_copies_new(strategy: StrategyFixture) -> None: 43 | clean_static_dir() 44 | create_two_referenced_static_files() 45 | assert ( 46 | f"{strategy.expected_copied_files(2)} static files copied." 47 | in call_collectstatic() 48 | ) 49 | create_two_referenced_static_files() 50 | # The two files from the first run are cached and should be skipped, 51 | # so only the two newly created files are expected to be copied. 52 | assert ( 53 | f"{strategy.expected_copied_files(2)} static files copied." 54 | in call_collectstatic() 55 | ) 56 | 57 | 58 | @live_test 59 | @override_setting("threads", 5) 60 | def test_threads(strategy: StrategyFixture) -> None: 61 | clean_static_dir() 62 | create_two_referenced_static_files() 63 | assert ( 64 | f"{strategy.expected_copied_files(2)} static files copied." 65 | in call_collectstatic() 66 | ) 67 | # file state should now be cached 68 | assert "0 static files copied." in call_collectstatic() 69 | 70 | 71 | @cloud_backends_only 72 | @speed_test 73 | def test_basics_cloud_speed(strategy: StrategyFixture) -> None: 74 | clean_static_dir() 75 | make_100_files() 76 | assert ( 77 | f"{strategy.expected_copied_files(100)} static files copied."
78 | in call_collectstatic() 79 | ) 80 | 81 | def collectstatic_one(): 82 | assert ( 83 | f"{strategy.expected_copied_files(2)} static files copied." 84 | in call_collectstatic() 85 | ) 86 | 87 | create_two_referenced_static_files() 88 | ittook = timeit.timeit(collectstatic_one, number=1) 89 | print(f"it took {ittook} seconds") 90 | 91 | 92 | @cloud_backends_only 93 | @speed_test 94 | @override_settings( 95 | INSTALLED_APPS=["django.contrib.staticfiles"], COLLECTFASTA_STRATEGY=None 96 | ) 97 | def test_no_collectfasta_cloud_speed(strategy: StrategyFixture) -> None: 98 | clean_static_dir() 99 | make_100_files() 100 | assert "100 static files copied" in call_collectstatic() 101 | 102 | def collectstatic_one(): 103 | assert "2 static files copied" in call_collectstatic() 104 | 105 | create_two_referenced_static_files() 106 | ittook = timeit.timeit(collectstatic_one, number=1) 107 | print(f"it took {ittook} seconds") 108 | 109 | 110 | @exclude_two_pass 111 | def test_dry_run(strategy: StrategyFixture) -> None: 112 | clean_static_dir() 113 | create_static_file() 114 | result = call_collectstatic(dry_run=True) 115 | assert "1 static file copied." in result 116 | assert "Pretending to copy" in result 117 | result = call_collectstatic(dry_run=True) 118 | assert "1 static file copied." in result 119 | assert "Pretending to copy" in result 120 | assert "Pretending to delete" in result 121 | 122 | 123 | @two_pass_only 124 | def test_dry_run_two_pass(strategy: StrategyFixture) -> None: 125 | clean_static_dir() 126 | create_static_file() 127 | result = call_collectstatic(dry_run=True) 128 | assert "0 static files copied." in result 129 | assert "Pretending to copy" in result 130 | result = call_collectstatic(dry_run=True) 131 | assert "0 static files copied." 
in result 132 | assert "Pretending to copy" in result 133 | assert "Pretending to delete" in result 134 | 135 | 136 | @aws_backends_only 137 | @live_test 138 | @override_storage_attr("gzip", True) 139 | @override_setting("aws_is_gzipped", True) 140 | def test_aws_is_gzipped(strategy: StrategyFixture) -> None: 141 | clean_static_dir() 142 | create_two_referenced_static_files() 143 | assert ( 144 | f"{strategy.expected_copied_files(2)} static files copied." 145 | in call_collectstatic() 146 | ) 147 | 148 | # file state should now be cached 149 | assert "0 static files copied." in call_collectstatic() 150 | 151 | 152 | @override_django_settings(STORAGES={}, COLLECTFASTA_STRATEGY=None) 153 | def test_raises_for_no_configured_strategy() -> None: 154 | with pytest.raises(ImproperlyConfigured): 155 | Command._load_strategy() 156 | 157 | 158 | @live_test 159 | def test_calls_post_copy_hook(strategy: StrategyFixture, mocker: MockerFixture) -> None: 160 | post_copy_hook = mocker.patch( 161 | "collectfasta.strategies.base.Strategy.post_copy_hook", autospec=True 162 | ) 163 | clean_static_dir() 164 | (path_one, path_two) = create_two_referenced_static_files() 165 | cmd = Command() 166 | cmd.run_from_argv(["manage.py", "collectstatic", "--noinput"]) 167 | post_copy_hook.assert_has_calls( 168 | [ 169 | mocker.call(mocker.ANY, path_one.name, path_one.name, mocker.ANY), 170 | mocker.call( 171 | mocker.ANY, 172 | f"{path_one.name.replace('.html','')}_folder/{path_two.name}", 173 | f"{path_one.name.replace('.html','')}_folder/{path_two.name}", 174 | mocker.ANY, 175 | ), 176 | ], 177 | any_order=True, 178 | ) 179 | 180 | 181 | @live_test 182 | def test_calls_on_skip_hook(strategy: StrategyFixture, mocker: MockerFixture) -> None: 183 | on_skip_hook = mocker.patch( 184 | "collectfasta.strategies.base.Strategy.on_skip_hook", autospec=True 185 | ) 186 | clean_static_dir() 187 | (path_one, path_two) = create_two_referenced_static_files() 188 | cmd = Command() 189 | 
cmd.run_from_argv(["manage.py", "collectstatic", "--noinput"]) 190 | on_skip_hook.assert_not_called() 191 | cmd.run_from_argv(["manage.py", "collectstatic", "--noinput"]) 192 | on_skip_hook.assert_has_calls( 193 | [ 194 | mocker.call(mocker.ANY, path_one.name, path_one.name, mocker.ANY), 195 | mocker.call( 196 | mocker.ANY, 197 | f"{path_one.name.replace('.html','')}_folder/{path_two.name}", 198 | f"{path_one.name.replace('.html','')}_folder/{path_two.name}", 199 | mocker.ANY, 200 | ), 201 | ], 202 | any_order=True, 203 | ) 204 | 205 | 206 | @azure_backends_only 207 | @live_test 208 | def test_azure_large_file_hashing( 209 | strategy: StrategyFixture, mocker: MockerFixture 210 | ) -> None: 211 | from collectfasta.strategies.azure import AzureBlobStrategy 212 | 213 | create_composite_hash_spy = mocker.spy(AzureBlobStrategy, "_create_composite_hash") 214 | 215 | clean_static_dir() 216 | create_two_referenced_static_files() 217 | assert ( 218 | f"{strategy.expected_copied_files(2)} static files copied." 219 | in call_collectstatic() 220 | ) 221 | # files are < 4mb no composite hash should be created 222 | assert create_composite_hash_spy.call_count == 0 223 | 224 | create_larger_than_4mb_referenced_static_file() 225 | # the small files should be cached now 226 | assert ( 227 | f"{strategy.expected_copied_files(2)} static files copied." 228 | in call_collectstatic() 229 | ) 230 | # one file is > 4mb a composite hash should be created 231 | assert create_composite_hash_spy.call_count == 1 232 | # file state should now be cached 233 | assert "0 static files copied." 
in call_collectstatic() 234 | # again the > 4mb file triggers a hash creation 235 | assert create_composite_hash_spy.call_count == 2 236 | -------------------------------------------------------------------------------- /collectfasta/strategies/boto3.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import os 3 | import tempfile 4 | from typing import Any 5 | from typing import Dict 6 | from typing import Optional 7 | from typing import TypeVar 8 | from typing import Union 9 | from typing import cast 10 | 11 | import botocore.exceptions 12 | from boto3.s3.transfer import TransferConfig 13 | from django.contrib.staticfiles.storage import ManifestFilesMixin 14 | from django.core.files.storage import Storage 15 | from django.utils.timezone import make_naive 16 | from storages.backends.s3boto3 import S3Boto3Storage 17 | from storages.backends.s3boto3 import S3ManifestStaticStorage 18 | from storages.backends.s3boto3 import S3StaticStorage 19 | from storages.utils import clean_name 20 | from storages.utils import is_seekable 21 | from storages.utils import safe_join 22 | from storages.utils import setting 23 | 24 | from collectfasta import settings 25 | 26 | from .base import CachingHashStrategy 27 | from .hashing import TwoPassFileSystemStrategy 28 | from .hashing import TwoPassInMemoryStrategy 29 | from .hashing import WithoutPrefixMixin 30 | 31 | logger = logging.getLogger(__name__) 32 | 33 | 34 | class S3StorageWrapperBase(S3Boto3Storage): 35 | def __init__(self, *args: Any, original: Any, **kwargs: Any) -> None: 36 | default_settings = original.get_default_settings() 37 | for name in default_settings.keys(): 38 | setattr(self, name, default_settings[name]) 39 | if hasattr(original, name): 40 | setattr(self, name, getattr(original, name)) 41 | for arg in [ 42 | "_bucket", 43 | "_connections", 44 | "access_key", 45 | "secret_key", 46 | "security_token", 47 | "config", 48 | "_transfer_config", 49 | ]: 50 | 
            if hasattr(original, arg):
                setattr(self, arg, getattr(original, arg))
        if not hasattr(self, "_transfer_config"):
            # not sure why, but the original doesn't have this attribute
            self._transfer_config = TransferConfig(use_threads=self.use_threads)

        self.preload_metadata = True
        self._entries: Dict[str, str] = {}

    # restores the "preload_metadata" method that was removed in django-storages 1.10
    def _save(self, name, content):
        cleaned_name = clean_name(name)
        name = self._normalize_name(cleaned_name)
        params = self._get_write_parameters(name, content)

        if is_seekable(content):
            content.seek(0, os.SEEK_SET)
        if (
            self.gzip
            and params["ContentType"] in self.gzip_content_types
            and "ContentEncoding" not in params
        ):
            content = self._compress_content(content)
            params["ContentEncoding"] = "gzip"

        obj = self.bucket.Object(name)
        if self.preload_metadata:
            self._entries[name] = obj
        obj.upload_fileobj(content, ExtraArgs=params, Config=self._transfer_config)
        return cleaned_name

    @property
    def entries(self):
        if self.preload_metadata and not self._entries:
            self._entries = {
                entry.key: entry
                for entry in self.bucket.objects.filter(Prefix=self.location)
            }
        return self._entries

    def delete(self, name):
        super().delete(name)
        if name in self._entries:
            del self._entries[name]

    def exists(self, name):
        cleaned_name = self._normalize_name(clean_name(name))
        if self.entries:
            return cleaned_name in self.entries
        return super().exists(name)

    def size(self, name):
        cleaned_name = self._normalize_name(clean_name(name))
        if self.entries:
            entry = self.entries.get(cleaned_name)
            if entry:
                return entry.size if hasattr(entry, "size") else entry.content_length
            return 0
        return super().size(name)

    def get_modified_time(self, name):
        """
        Returns an (aware) datetime object containing the last modified time if
        USE_TZ is True, otherwise returns a naive datetime in the local timezone.
        """
        name = self._normalize_name(clean_name(name))
        entry = self.entries.get(name)
        if entry is None:
            entry = self.bucket.Object(name)
        if setting("USE_TZ"):
            # boto3 returns TZ aware timestamps
            return entry.last_modified
        else:
            return make_naive(entry.last_modified)


class ManifestFilesWrapper(ManifestFilesMixin):
    def __init__(self, *args: Any, original: Any, **kwargs: Any) -> None:
        super().__init__(*args, original=original, **kwargs)
        if original.manifest_storage == original:
            self.manifest_storage = cast(Storage, self)
        else:
            self.manifest_storage = original.manifest_storage
        for arg in [
            "hashed_files",
            "manifest_hash",
            "support_js_module_import_aggregation",
            "patterns",
            "_patterns",
            "hashed_files",
        ]:
            if hasattr(original, arg):
                setattr(self, arg, getattr(original, arg))


class S3StorageWrapper(S3StorageWrapperBase, S3Boto3Storage):
    pass


class S3StaticStorageWrapper(S3StorageWrapperBase, S3StaticStorage):
    pass


class S3ManifestStaticStorageWrapper(
    ManifestFilesWrapper,
    S3StorageWrapperBase,
    S3ManifestStaticStorage,
):
    def _save(self, name, content):
        content.seek(0)
        with tempfile.SpooledTemporaryFile() as tmp:
            tmp.write(content.read())
            return super()._save(name, tmp)


S3Storage = TypeVar(
    "S3Storage", bound=Union[S3Boto3Storage, S3StaticStorage, S3ManifestStaticStorage]
)

S3StorageWrapped = Union[
    S3StaticStorageWrapper, S3ManifestStaticStorageWrapper, S3StorageWrapper
]


class Boto3Strategy(CachingHashStrategy[S3Storage]):
    def __init__(self, remote_storage: S3Storage) -> None:
        self.remote_storage = self.wrapped_storage(remote_storage)
        super().__init__(self.remote_storage)
        self.use_gzip = settings.aws_is_gzipped

    def wrapped_storage(self, remote_storage: S3Storage) -> S3StorageWrapped:
        if isinstance(remote_storage, S3ManifestStaticStorage):
            return S3ManifestStaticStorageWrapper(original=remote_storage)
        elif isinstance(remote_storage, S3StaticStorage):
            return S3StaticStorageWrapper(original=remote_storage)
        elif isinstance(remote_storage, S3Boto3Storage):
            return S3StorageWrapper(original=remote_storage)
        return remote_storage

    def wrap_storage(self, remote_storage: S3Storage) -> S3StorageWrapped:
        return self.remote_storage

    def _normalize_path(self, prefixed_path: str) -> str:
        path = str(safe_join(self.remote_storage.location, prefixed_path))
        return path.replace("\\", "")

    @staticmethod
    def _clean_hash(quoted_hash: Optional[str]) -> Optional[str]:
        """boto returns hashes wrapped in quotes that need to be stripped."""
        if quoted_hash is None:
            return None
        assert quoted_hash[0] == quoted_hash[-1] == '"'
        return quoted_hash[1:-1]

    def get_remote_file_hash(self, prefixed_path: str) -> Optional[str]:
        normalized_path = self._normalize_path(prefixed_path)
        logger.debug("Getting file hash", extra={"normalized_path": normalized_path})
        try:
            hash_: str
            if normalized_path in self.remote_storage.entries:
                hash_ = self.remote_storage.entries[normalized_path].e_tag
            else:
                hash_ = self.remote_storage.bucket.Object(normalized_path).e_tag
        except botocore.exceptions.ClientError:
            logger.debug("Error on remote hash request", exc_info=True)
            return None
        return self._clean_hash(hash_)

    def pre_should_copy_hook(self) -> None:
        if settings.threads:
            logger.info("Resetting connection")
            self.remote_storage._connection = None


class Boto3WithoutPrefixStrategy(WithoutPrefixMixin, Boto3Strategy):
    pass


class Boto3ManifestMemoryStrategy(TwoPassInMemoryStrategy):
    second_strategy = Boto3WithoutPrefixStrategy


class Boto3ManifestFileSystemStrategy(TwoPassFileSystemStrategy):
    second_strategy = Boto3WithoutPrefixStrategy

--------------------------------------------------------------------------------
/collectfasta/management/commands/collectstatic.py:
--------------------------------------------------------------------------------
from concurrent.futures import ThreadPoolExecutor
from typing import Any
from typing import Dict
from typing import Generator
from typing import List
from typing import Optional
from typing import Tuple
from typing import Type

from django.conf import settings as django_settings
from django.contrib.staticfiles.management.commands import collectstatic
from django.core.exceptions import ImproperlyConfigured
from django.core.files.storage import Storage
from django.core.management.base import CommandParser

from collectfasta import __version__
from collectfasta import settings
from collectfasta.strategies import DisabledStrategy
from collectfasta.strategies import Strategy
from collectfasta.strategies import load_strategy

Task = Tuple[str, str, Storage]


def collect_from_folder(
    storage: Storage, path: str = ""
) -> Generator[tuple[str, str], str, None]:
    folders, files = storage.listdir(path)
    for thefile in files:
        if path:
            prefixed = f"{path}/{thefile}"
        else:
            prefixed = thefile
        yield prefixed, prefixed
    for folder in folders:
        if path:
            folder = f"{path}/{folder}"
        yield from collect_from_folder(storage, folder)


class Command(collectstatic.Command):
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.num_copied_files = 0
        self.tasks: List[Task] = []
        self.collectfasta_enabled = settings.enabled
        self.strategy: Strategy = DisabledStrategy(Storage())
        self.found_files: Dict[str, Tuple[Storage, str]] = {}

    @staticmethod
    def _load_strategy() -> Type[Strategy[Storage]]:
        strategy_str = getattr(django_settings, "COLLECTFASTA_STRATEGY", None)
        if strategy_str is not None:
            return load_strategy(strategy_str)

        raise ImproperlyConfigured(
            "No strategy configured, please make sure COLLECTFASTA_STRATEGY is set."
        )

    def get_version(self) -> str:
        return __version__

    def add_arguments(self, parser: CommandParser) -> None:
        super().add_arguments(parser)
        parser.add_argument(
            "--disable-collectfasta",
            action="store_true",
            dest="disable_collectfasta",
            default=False,
            help="Disable Collectfasta.",
        )

    def set_options(self, **options: Any) -> None:
        self.collectfasta_enabled = self.collectfasta_enabled and not options.pop(
            "disable_collectfasta"
        )
        if self.collectfasta_enabled:
            self.strategy = self._load_strategy()(self.storage)
            self.storage = self.strategy.wrap_storage(self.storage)
        super().set_options(**options)

    def second_pass(self, stats: Dict[str, List[str]]) -> Dict[str, List[str]]:
        second_pass_strategy = self.strategy.second_pass_strategy()
        if self.collectfasta_enabled and second_pass_strategy:
            self.copied_files = []
            self.symlinked_files = []
            self.unmodified_files = []
            self.deleted_files: list[str] = []
            self.skipped_files: list[str] = []
            self.num_copied_files = 0
            source_storage = self.storage
            self.storage = second_pass_strategy.wrap_storage(self.storage)
            self.strategy = second_pass_strategy
            self.log(f"Running second pass with {self.strategy.__class__.__name__}...")
            for f, prefixed in collect_from_folder(source_storage):
                self.maybe_copy_file((f, prefixed, source_storage))
            return {
                "modified": self.copied_files + self.symlinked_files,
                "unmodified": self.unmodified_files,
                "post_processed": self.post_processed_files,
                "deleted": self.deleted_files,
                "skipped": self.skipped_files,
            }

        return stats

    def collect(self) -> Dict[str, List[str]]:
        """
        Override collect to copy files concurrently. The tasks are populated by
        Command.copy_file() which is called by super().collect().
        """
        if not self.collectfasta_enabled or not settings.threads:
            return self.second_pass(super().collect())

        # Store original value of post_process in super_post_process and always
        # set the value to False to prevent the default behavior from
        # interfering when using threads. See maybe_post_process().
        super_post_process = self.post_process
        self.post_process = False

        return_value = super().collect()

        with ThreadPoolExecutor(settings.threads) as pool:
            pool.map(self.maybe_copy_file, self.tasks)

        self.maybe_post_process(super_post_process)
        return_value["post_processed"] = self.post_processed_files
        return self.second_pass(return_value)

    def handle(self, *args: Any, **options: Any) -> Optional[str]:
        """Override handle to suppress summary output."""
        ret = super().handle(**options)
        if not self.collectfasta_enabled:
            return ret
        plural = "" if self.num_copied_files == 1 else "s"
        return f"{self.num_copied_files} static file{plural} copied."

    def maybe_copy_file(self, args: Task) -> None:
        """Determine if file should be copied or not and handle exceptions."""
        path, prefixed_path, source_storage = self.strategy.copy_args_hook(args)
        # Build up found_files to look identical to how it's created in the
        # builtin command's collect() method so that we can run post_process
        # after all parallel uploads finish.
        self.found_files[prefixed_path] = (source_storage, path)

        if self.collectfasta_enabled and not self.dry_run:
            self.strategy.pre_should_copy_hook()

            if not self.strategy.should_copy_file(path, prefixed_path, source_storage):
                self.log(f"Skipping '{path}'")
                self.strategy.on_skip_hook(path, prefixed_path, source_storage)
                return

        self.num_copied_files += 1

        existed = prefixed_path in self.copied_files
        super().copy_file(path, prefixed_path, source_storage)
        copied = not existed and prefixed_path in self.copied_files
        if copied:
            self.strategy.post_copy_hook(path, prefixed_path, source_storage)
        else:
            self.strategy.on_skip_hook(path, prefixed_path, source_storage)

    def copy_file(self, path: str, prefixed_path: str, source_storage: Storage) -> None:
        """
        Append path to task queue if threads are enabled, otherwise copy the
        file with a blocking call.
        """
        args = (path, prefixed_path, source_storage)
        if settings.threads and self.collectfasta_enabled:
            self.tasks.append(args)
        else:
            self.maybe_copy_file(args)

    def delete_file(
        self, path: str, prefixed_path: str, source_storage: Storage
    ) -> bool:
        """Override delete_file to skip modified time and exists lookups."""
        if not self.collectfasta_enabled:
            return super().delete_file(path, prefixed_path, source_storage)

        if self.dry_run:
            self.log(f"Pretending to delete '{path}'")
            return True

        self.log(f"Deleting '{path}' on remote storage")

        try:
            self.storage.delete(prefixed_path)
        except self.strategy.delete_not_found_exception:
            pass

        return True

    def maybe_post_process(self, super_post_process: bool) -> None:
        # This method is extracted and modified from the collect() method of the
        # builtin collectstatic command.
        # https://github.com/django/django/blob/5320ba98f3d253afcaa76b4b388a8982f87d4f1a/django/contrib/staticfiles/management/commands/collectstatic.py#L124

        if not super_post_process or not hasattr(self.storage, "post_process"):
            return

        processor = self.storage.post_process(self.found_files, dry_run=self.dry_run)

        for original_path, processed_path, processed in processor:
            if isinstance(processed, Exception):
                self.stderr.write("Post-processing '%s' failed!" % original_path)
                # Add a blank line before the traceback, otherwise it's
                # too easy to miss the relevant part of the error message.
                self.stderr.write("")
                raise processed
            if processed:
                self.log(
                    f"Post-processed '{original_path}' as '{processed_path}'",
                    level=2,
                )
                self.post_processed_files.append(original_path)
            else:
                self.log("Skipped post-processing '%s'" % original_path)

--------------------------------------------------------------------------------