├── tests
    ├── __init__.py
    ├── auth
    │   ├── __init__.py
    │   └── test_planetary_computer.py
    ├── store
    │   ├── __init__.py
    │   ├── test_memory.py
    │   ├── test_config.py
    │   ├── test_http.py
    │   ├── test_gcs.py
    │   ├── test_azure.py
    │   ├── test_from_url.py
    │   └── test_local.py
    ├── test_version.py
    ├── test_attributes.py
    ├── test_backoff.py
    ├── obspec
    │   └── test-store.yml
    ├── test_delete.py
    ├── test_put.py
    ├── test_buffered.py
    ├── test_bytes.py
    └── conftest.py
├── docs
    ├── index.md
    ├── CHANGELOG.md
    ├── blog
    │   ├── index.md
    │   ├── .authors.yml
    │   └── posts
    │   │   └── obstore-0.7.md
    ├── dev
    │   ├── DEVELOP.md
    │   ├── overridden-defaults.md
    │   └── functional-api.md
    ├── assets
    │   ├── example.gif
    │   ├── zarr-example.png
    │   ├── logo_no_text.png
    │   ├── aws_type_hint1.png
    │   ├── aws_type_hint2.png
    │   ├── fsspec-type-hinting.png
    │   ├── cloudflare-r2-bucket-info.png
    │   ├── cloudflare-r2-credentials.jpg
    │   ├── class-methods-vscode-suggestions.jpg
    │   ├── planetary-computer-naip-thumbnail.jpg
    │   └── sentinel2-grca-thumbnail-obstore-04.jpg
    ├── api
    │   ├── exceptions.md
    │   ├── copy.md
    │   ├── head.md
    │   ├── delete.md
    │   ├── rename.md
    │   ├── attributes.md
    │   ├── fsspec.md
    │   ├── store
    │   │   ├── index.md
    │   │   ├── http.md
    │   │   ├── config.md
    │   │   ├── local.md
    │   │   ├── memory.md
    │   │   ├── aws.md
    │   │   ├── gcs.md
    │   │   └── azure.md
    │   ├── auth
    │   │   ├── earthdata.md
    │   │   ├── boto3.md
    │   │   ├── google.md
    │   │   ├── azure.md
    │   │   └── planetary-computer.md
    │   ├── sign.md
    │   ├── put.md
    │   ├── list.md
    │   ├── get.md
    │   └── file.md
    ├── overrides
    │   ├── main.html
    │   └── stylesheets
    │   │   └── extra.css
    ├── integrations
    │   └── index.md
    ├── obspec.md
    ├── advanced
    │   └── pickle.md
    ├── examples
    │   ├── tqdm.md
    │   ├── minio.md
    │   ├── r2.md
    │   ├── pyarrow.md
    │   ├── zarr.md
    │   ├── stream-zip.md
    │   └── fastapi.md
    ├── troubleshooting
    │   └── aws.md
    ├── performance.md
    └── alternatives.md
├── obstore
    ├── README.md
    ├── python
    │   └── obstore
    │   │   ├── py.typed
    │   │   ├── _bytes.pyi
    │   │   ├── auth
    │   │   │   ├── __init__.py
    │   │   │   └── _http.py
    │   │   ├── _scheme.pyi
    │   │   ├── __init__.py
    │   │   ├── _head.pyi
    │   │   ├── _delete.pyi
    │   │   ├── _rename.pyi
    │   │   ├── _copy.pyi
    │   │   ├── exceptions
    │   │   │   └── __init__.pyi
    │   │   ├── _store
    │   │   │   ├── _http.pyi
    │   │   │   ├── _retry.pyi
    │   │   │   └── _client.pyi
    │   │   ├── _obstore.pyi
    │   │   ├── _attributes.pyi
    │   │   └── _sign.pyi
    ├── src
    │   ├── utils.rs
    │   ├── tags.rs
    │   ├── scheme.rs
    │   ├── path.rs
    │   ├── head.rs
    │   ├── copy.rs
    │   ├── rename.rs
    │   ├── delete.rs
    │   ├── attributes.rs
    │   └── lib.rs
    ├── pyproject.toml
    ├── build.rs
    └── Cargo.toml
├── examples
    ├── zarr
    │   ├── .python-version
    │   ├── zarr-example.png
    │   ├── README.md
    │   ├── pyproject.toml
    │   └── main.py
    ├── fastapi
    │   ├── .python-version
    │   ├── pyproject.toml
    │   ├── README.md
    │   └── main.py
    ├── minio
    │   ├── .python-version
    │   ├── pyproject.toml
    │   ├── README.md
    │   └── main.py
    ├── stream-zip
    │   ├── .gitignore
    │   ├── .python-version
    │   ├── pyproject.toml
    │   ├── README.md
    │   └── main.py
    └── progress-bar
    │   ├── .python-version
    │   ├── example.gif
    │   ├── pyproject.toml
    │   ├── README.md
    │   └── main.py
├── pyo3-object_store
    ├── LICENSE
    ├── type-hints
    ├── src
    │   ├── aws
    │   │   └── mod.rs
    │   ├── gcp
    │   │   └── mod.rs
    │   ├── azure
    │   │   ├── mod.rs
    │   │   └── error.rs
    │   ├── lib.rs
    │   ├── memory.rs
    │   ├── config.rs
    │   ├── path.rs
    │   ├── url.rs
    │   ├── retry.rs
    │   ├── credentials.rs
    │   ├── simple.rs
    │   └── http.rs
    ├── CHANGELOG.md
    ├── Cargo.toml
    └── README.md
├── pyo3-bytes
    ├── src
    │   └── lib.rs
    ├── Cargo.toml
    ├── LICENSE
    ├── README.md
    └── bytes.pyi
├── .github
    └── workflows
    │   ├── conventional-commits.yml
    │   ├── ci.yml
    │   ├── docs.yml
    │   └── test-python.yml
├── .pre-commit-config.yaml
├── LICENSE
├── Cargo.toml
├── DEVELOP.md
├── README.md
├── pyproject.toml
└── .gitignore
/tests/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/tests/auth/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/tests/store/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | ../README.md
--------------------------------------------------------------------------------
/docs/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | ../CHANGELOG.md
--------------------------------------------------------------------------------
/docs/blog/index.md:
--------------------------------------------------------------------------------
1 | # Blog
2 | 
--------------------------------------------------------------------------------
/obstore/README.md:
--------------------------------------------------------------------------------
1 | ../README.md
--------------------------------------------------------------------------------
/obstore/python/obstore/py.typed:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/docs/dev/DEVELOP.md:
--------------------------------------------------------------------------------
1 | ../../DEVELOP.md
--------------------------------------------------------------------------------
/examples/zarr/.python-version:
--------------------------------------------------------------------------------
1 | 3.12
2 | 
--------------------------------------------------------------------------------
/pyo3-object_store/LICENSE:
--------------------------------------------------------------------------------
1 | ../LICENSE
--------------------------------------------------------------------------------
/examples/fastapi/.python-version:
--------------------------------------------------------------------------------
1 | 3.12
2 | 
--------------------------------------------------------------------------------
/examples/minio/.python-version:
--------------------------------------------------------------------------------
1 | 3.12
2 | 
--------------------------------------------------------------------------------
/examples/stream-zip/.gitignore:
--------------------------------------------------------------------------------
1 | *.zip
2 | 
--------------------------------------------------------------------------------
/examples/progress-bar/.python-version:
--------------------------------------------------------------------------------
1 | 3.12
2 | 
--------------------------------------------------------------------------------
/examples/stream-zip/.python-version:
--------------------------------------------------------------------------------
1 | 3.12
2 | 
--------------------------------------------------------------------------------
/docs/assets/example.gif:
--------------------------------------------------------------------------------
1 | ../../examples/progress-bar/example.gif
--------------------------------------------------------------------------------
/obstore/python/obstore/_bytes.pyi:
--------------------------------------------------------------------------------
1 | ../../../pyo3-bytes/bytes.pyi
--------------------------------------------------------------------------------
/pyo3-object_store/type-hints:
--------------------------------------------------------------------------------
1 | ../obstore/python/obstore/_store
--------------------------------------------------------------------------------
/docs/assets/zarr-example.png:
--------------------------------------------------------------------------------
1 | ../../examples/zarr/zarr-example.png
--------------------------------------------------------------------------------
/docs/api/exceptions.md:
--------------------------------------------------------------------------------
1 | # Exceptions
2 | 
3 | ::: obstore.exceptions
4 | 
--------------------------------------------------------------------------------
/docs/api/copy.md:
--------------------------------------------------------------------------------
1 | # Copy
2 | 
3 | ::: obstore.copy
4 | ::: obstore.copy_async
5 | 
--------------------------------------------------------------------------------
/docs/api/head.md:
--------------------------------------------------------------------------------
1 | # Head
2 | 
3 | ::: obstore.head
4 | ::: obstore.head_async
5 | 
--------------------------------------------------------------------------------
/docs/api/delete.md:
--------------------------------------------------------------------------------
1 | # Delete
2 | 
3 | ::: obstore.delete
4 | ::: obstore.delete_async
5 | 
--------------------------------------------------------------------------------
/docs/api/rename.md:
--------------------------------------------------------------------------------
1 | # Rename
2 | 
3 | ::: obstore.rename
4 | ::: obstore.rename_async
5 | 
--------------------------------------------------------------------------------
/obstore/python/obstore/auth/__init__.py:
--------------------------------------------------------------------------------
1 | """A collection of credential providers."""
2 | 
--------------------------------------------------------------------------------
/docs/api/attributes.md:
--------------------------------------------------------------------------------
1 | # Attributes
2 | 
3 | ::: obstore.Attribute
4 | ::: obstore.Attributes
5 | 
--------------------------------------------------------------------------------
/docs/api/fsspec.md:
--------------------------------------------------------------------------------
1 | ::: obstore.fsspec
2 |     options:
3 |         inherited_members: true
4 | 
--------------------------------------------------------------------------------
/pyo3-object_store/src/aws/mod.rs:
--------------------------------------------------------------------------------
1 | mod credentials;
2 | mod store;
3 | 
4 | pub use store::PyS3Store;
5 | 
--------------------------------------------------------------------------------
/pyo3-object_store/src/gcp/mod.rs:
--------------------------------------------------------------------------------
1 | mod credentials;
2 | mod store;
3 | 
4 | pub use store::PyGCSStore;
5 | 
--------------------------------------------------------------------------------
/docs/api/store/index.md:
--------------------------------------------------------------------------------
1 | # ObjectStore
2 | 
3 | ::: obstore.store.from_url
4 | ::: obstore.store.ObjectStore
5 | 
--------------------------------------------------------------------------------
/docs/assets/logo_no_text.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/logo_no_text.png
--------------------------------------------------------------------------------
/docs/assets/aws_type_hint1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/aws_type_hint1.png
--------------------------------------------------------------------------------
/docs/assets/aws_type_hint2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/aws_type_hint2.png
--------------------------------------------------------------------------------
/examples/zarr/zarr-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/examples/zarr/zarr-example.png
--------------------------------------------------------------------------------
/pyo3-object_store/src/azure/mod.rs:
--------------------------------------------------------------------------------
1 | mod credentials;
2 | mod error;
3 | mod store;
4 | 
5 | pub use store::PyAzureStore;
6 | 
--------------------------------------------------------------------------------
/examples/progress-bar/example.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/examples/progress-bar/example.gif
--------------------------------------------------------------------------------
/docs/api/auth/earthdata.md:
--------------------------------------------------------------------------------
1 | # NASA Earthdata
2 | 
3 | ::: obstore.auth.earthdata
4 |     options:
5 |         members_order: source
6 | 
--------------------------------------------------------------------------------
/docs/assets/fsspec-type-hinting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/fsspec-type-hinting.png
--------------------------------------------------------------------------------
/docs/api/sign.md:
--------------------------------------------------------------------------------
1 | # Sign
2 | 
3 | ::: obstore.sign
4 | ::: obstore.sign_async
5 | ::: obstore.SignCapableStore
6 | ::: obstore.HTTP_METHOD
7 | 
--------------------------------------------------------------------------------
/docs/assets/cloudflare-r2-bucket-info.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/cloudflare-r2-bucket-info.png
--------------------------------------------------------------------------------
/docs/assets/cloudflare-r2-credentials.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/cloudflare-r2-credentials.jpg
--------------------------------------------------------------------------------
/pyo3-bytes/src/lib.rs:
--------------------------------------------------------------------------------
1 | #![doc = include_str!("../README.md")]
2 | #![warn(missing_docs)]
3 | 
4 | mod bytes;
5 | 
6 | pub use bytes::PyBytes;
7 | 
--------------------------------------------------------------------------------
/docs/api/store/http.md:
--------------------------------------------------------------------------------
1 | # HTTP
2 | 
3 | ::: obstore.store.HTTPStore
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | 
--------------------------------------------------------------------------------
/docs/api/put.md:
--------------------------------------------------------------------------------
1 | # Put
2 | 
3 | ::: obstore.put
4 | ::: obstore.put_async
5 | ::: obstore.PutResult
6 | ::: obstore.UpdateVersion
7 | ::: obstore.PutMode
8 | 
--------------------------------------------------------------------------------
/docs/api/store/config.md:
--------------------------------------------------------------------------------
1 | # Configuration
2 | 
3 | ::: obstore.store.ClientConfig
4 | ::: obstore.store.BackoffConfig
5 | ::: obstore.store.RetryConfig
6 | 
--------------------------------------------------------------------------------
/docs/api/store/local.md:
--------------------------------------------------------------------------------
1 | # Local
2 | 
3 | ::: obstore.store.LocalStore
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | 
--------------------------------------------------------------------------------
/docs/assets/class-methods-vscode-suggestions.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/class-methods-vscode-suggestions.jpg
--------------------------------------------------------------------------------
/docs/assets/planetary-computer-naip-thumbnail.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/planetary-computer-naip-thumbnail.jpg
--------------------------------------------------------------------------------
/docs/blog/.authors.yml:
--------------------------------------------------------------------------------
1 | authors:
2 |   kylebarron:
3 |     name: Kyle Barron
4 |     description: Creator
5 |     avatar: https://github.com/kylebarron.png
6 | 
--------------------------------------------------------------------------------
/docs/api/store/memory.md:
--------------------------------------------------------------------------------
1 | # Memory
2 | 
3 | ::: obstore.store.MemoryStore
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | 
--------------------------------------------------------------------------------
/docs/assets/sentinel2-grca-thumbnail-obstore-04.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/developmentseed/obstore/HEAD/docs/assets/sentinel2-grca-thumbnail-obstore-04.jpg
--------------------------------------------------------------------------------
/obstore/python/obstore/_scheme.pyi:
--------------------------------------------------------------------------------
1 | from typing import Literal
2 | 
3 | def parse_scheme(
4 |     url: str,
5 | ) -> Literal["s3", "gcs", "http", "local", "memory", "azure"]: ...
6 | 
--------------------------------------------------------------------------------
/docs/api/auth/boto3.md:
--------------------------------------------------------------------------------
1 | # Boto3
2 | 
3 | ::: obstore.auth.boto3.Boto3CredentialProvider
4 |     options:
5 |         members_order: source
6 | ::: obstore.auth.boto3.StsCredentialProvider
7 |     options:
8 |         members_order: source
9 | 
--------------------------------------------------------------------------------
/tests/store/test_memory.py:
--------------------------------------------------------------------------------
1 | from obstore.store import MemoryStore
2 | 
3 | 
4 | def test_eq():
5 |     store = MemoryStore()
6 |     store2 = MemoryStore()
7 |     assert store == store  # noqa: PLR0124
8 |     assert store != store2
9 | 
--------------------------------------------------------------------------------
/docs/api/auth/google.md:
--------------------------------------------------------------------------------
1 | # Google
2 | 
3 | ::: obstore.auth.google.GoogleCredentialProvider
4 |     options:
5 |         members_order: source
6 | ::: obstore.auth.google.GoogleAsyncCredentialProvider
7 |     options:
8 |         members_order: source
9 | 
--------------------------------------------------------------------------------
/docs/api/list.md:
--------------------------------------------------------------------------------
1 | # List
2 | 
3 | ::: obstore.list
4 | ::: obstore.list_with_delimiter
5 | ::: obstore.list_with_delimiter_async
6 | ::: obstore.ObjectMeta
7 | ::: obstore.ListResult
8 | ::: obstore.ListStream
9 | ::: obstore.ListChunkType
10 | 
--------------------------------------------------------------------------------
/examples/fastapi/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "fastapi-example"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | dependencies = ["fastapi[standard]>=0.115.6", "obstore>=0.6"]
8 | 
--------------------------------------------------------------------------------
/docs/api/auth/azure.md:
--------------------------------------------------------------------------------
1 | # Azure
2 | 
3 | ::: obstore.auth.azure.DEFAULT_SCOPES
4 | ::: obstore.auth.azure.AzureCredentialProvider
5 |     options:
6 |         members_order: source
7 | ::: obstore.auth.azure.AzureAsyncCredentialProvider
8 |     options:
9 |         members_order: source
10 | 
--------------------------------------------------------------------------------
/tests/store/test_config.py:
--------------------------------------------------------------------------------
1 | from datetime import timedelta
2 | 
3 | from obstore.store import HTTPStore
4 | 
5 | 
6 | def test_config_timedelta():
7 |     HTTPStore.from_url(
8 |         "https://example.com",
9 |         client_options={"timeout": timedelta(seconds=30)},
10 |     )
11 | 
--------------------------------------------------------------------------------
/tests/test_version.py:
--------------------------------------------------------------------------------
1 | from obstore import __version__, _object_store_source, _object_store_version
2 | 
3 | 
4 | def test_versions_are_str():
5 |     assert isinstance(__version__, str)
6 |     assert isinstance(_object_store_version, str)
7 |     assert isinstance(_object_store_source, str)
8 | 
--------------------------------------------------------------------------------
/examples/minio/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "minio-example"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | dependencies = ["obstore>=0.6"]
8 | 
9 | [dependency-groups]
10 | dev = ["ipykernel>=6.29.5"]
11 | 
--------------------------------------------------------------------------------
/examples/progress-bar/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "progress-bar"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | dependencies = ["obstore>=0.6", "tqdm"]
8 | 
9 | [dependency-groups]
10 | dev = ["ipykernel>=6.29.5"]
11 | 
--------------------------------------------------------------------------------
/docs/api/auth/planetary-computer.md:
--------------------------------------------------------------------------------
1 | # Planetary Computer
2 | 
3 | ::: obstore.auth.planetary_computer.PlanetaryComputerCredentialProvider
4 |     options:
5 |         members_order: source
6 | ::: obstore.auth.planetary_computer.PlanetaryComputerAsyncCredentialProvider
7 |     options:
8 |         members_order: source
9 | 
--------------------------------------------------------------------------------
/examples/fastapi/README.md:
--------------------------------------------------------------------------------
1 | # obstore FastAPI example
2 | 
3 | Example returning a streaming response via FastAPI.
4 | 
5 | ```
6 | uv run fastapi dev main.py
7 | ```
8 | 
9 | Note that FastAPI's responses wrap `starlette.responses`, so any web server
10 | that uses Starlette for responses can use this same code.
11 | 
--------------------------------------------------------------------------------
/docs/api/store/aws.md:
--------------------------------------------------------------------------------
1 | # AWS S3
2 | 
3 | ::: obstore.store.S3Store
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | ::: obstore.store.S3Config
8 |     options:
9 |         show_if_no_docstring: true
10 | ::: obstore.store.S3Credential
11 | ::: obstore.store.S3CredentialProvider
12 | 
--------------------------------------------------------------------------------
/examples/stream-zip/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "stream-zip-example"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | dependencies = ["obstore>=0.6", "stream-zip>=0.0.83"]
8 | 
9 | [dependency-groups]
10 | dev = ["ipykernel>=6.29.5"]
11 | 
--------------------------------------------------------------------------------
/docs/api/store/gcs.md:
--------------------------------------------------------------------------------
1 | # Google Cloud Storage
2 | 
3 | ::: obstore.store.GCSStore
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | ::: obstore.store.GCSConfig
8 |     options:
9 |         show_if_no_docstring: true
10 | ::: obstore.store.GCSCredential
11 | ::: obstore.store.GCSCredentialProvider
12 | 
--------------------------------------------------------------------------------
/examples/zarr/README.md:
--------------------------------------------------------------------------------
1 | # Obstore Zarr example
2 | 
3 | Example using Zarr with the Obstore backend.
4 | 
5 | ```
6 | uv run main.py
7 | ```
8 | 
9 | ![](./zarr-example.png)
10 | 
11 | This is a port of the Zarr example in the [Planetary Computer documentation](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-zarr-data/).
12 | 
--------------------------------------------------------------------------------
/docs/dev/overridden-defaults.md:
--------------------------------------------------------------------------------
1 | # Overridden Defaults
2 | 
3 | In general, we wish to follow the upstream `object_store` as closely as possible, which should reduce the maintenance overhead here.
4 | 
5 | However, there are occasionally places where we want to diverge from the upstream decision making, and we document those here.
6 | 
7 | (Currently none).
8 | 
--------------------------------------------------------------------------------
/docs/api/get.md:
--------------------------------------------------------------------------------
1 | # Get
2 | 
3 | ::: obstore.get
4 | ::: obstore.get_async
5 | ::: obstore.get_range
6 | ::: obstore.get_range_async
7 | ::: obstore.get_ranges
8 | ::: obstore.get_ranges_async
9 | ::: obstore.GetOptions
10 | ::: obstore.GetResult
11 | ::: obstore.BytesStream
12 | ::: obstore.Bytes
13 | ::: obstore.OffsetRange
14 | ::: obstore.SuffixRange
15 | 
--------------------------------------------------------------------------------
/examples/stream-zip/README.md:
--------------------------------------------------------------------------------
1 | # Obstore stream-zip example
2 | 
3 | This example demonstrates how to create a zip archive from files in one store and upload it to another store using the [`stream_zip`](https://github.com/uktrade/stream-zip) library.
4 | 
5 | This never stores any entire source file or the target zip file in memory, so you can zip large files with low memory overhead.
--------------------------------------------------------------------------------
/obstore/python/obstore/__init__.py:
--------------------------------------------------------------------------------
1 | from typing import TYPE_CHECKING
2 | 
3 | from . import _obstore, store  # pyright:ignore[reportMissingModuleSource]
4 | from ._obstore import *  # noqa: F403 # pyright:ignore[reportMissingModuleSource]
5 | 
6 | if TYPE_CHECKING:
7 |     from . import exceptions  # noqa: TC004
8 | 
9 | 
10 | __all__ = ["exceptions", "store"]
11 | __all__ += _obstore.__all__
12 | 
--------------------------------------------------------------------------------
/examples/zarr/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "zarr-example"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.11"
7 | dependencies = [
8 |     "matplotlib>=3.10.1",
9 |     "obstore>=0.6.0",
10 |     "pystac-client>=0.8.6",
11 |     "xarray>=2025.3.0",
12 |     "zarr>=3.0.8",
13 | ]
14 | 
15 | [dependency-groups]
16 | dev = ["ipykernel>=6.29.5"]
17 | 
--------------------------------------------------------------------------------
/docs/api/store/azure.md:
--------------------------------------------------------------------------------
1 | # Microsoft Azure
2 | 
3 | ::: obstore.store.AzureStore
4 |     options:
5 |         inherited_members: true
6 |         show_bases: false
7 | ::: obstore.store.AzureConfig
8 |     options:
9 |         show_if_no_docstring: true
10 | ::: obstore.store.AzureAccessKey
11 | ::: obstore.store.AzureSASToken
12 | ::: obstore.store.AzureBearerToken
13 | ::: obstore.store.AzureCredential
14 | ::: obstore.store.AzureCredentialProvider
15 | 
--------------------------------------------------------------------------------
/.github/workflows/conventional-commits.yml:
--------------------------------------------------------------------------------
1 | name: PR Conventional Commit Validation
2 | 
3 | on:
4 |   pull_request_target:
5 |     types: [opened, synchronize, reopened, edited]
6 | 
7 | jobs:
8 |   validate-pr-title:
9 |     runs-on: ubuntu-latest
10 |     steps:
11 |       - name: PR Conventional Commit Validation
12 |         uses: ytanikin/pr-conventional-commits@1.4.0
13 |         with:
14 |           task_types: '["feat","fix","docs","test","ci","refactor","perf","chore","revert"]'
15 | 
--------------------------------------------------------------------------------
/obstore/src/utils.rs:
--------------------------------------------------------------------------------
1 | use pyo3::prelude::*;
2 | 
3 | /// Returning `()` from `future_into_py` returns an empty tuple instead of None
4 | /// https://github.com/developmentseed/obstore/issues/240
5 | pub(crate) struct PyNone;
6 | 
7 | impl<'py> IntoPyObject<'py> for PyNone {
8 |     type Target = PyAny;
9 |     type Output = Bound<'py, PyAny>;
10 |     type Error = PyErr;
11 | 
12 |     fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> {
13 |         Ok(py.None().bind(py).clone())
14 |     }
15 | }
16 | 
--------------------------------------------------------------------------------
/examples/progress-bar/README.md:
--------------------------------------------------------------------------------
1 | # Obstore progress bar example
2 | 
3 | Example displaying a progress bar from a streaming response using [`tqdm`](https://tqdm.github.io/).
4 | 
5 | ![](./example.gif)
6 | 
7 | ```shell
8 | uv run python main.py
9 | ```
10 | 
11 | You can also pass an arbitrary URL from the command line for testing:
12 | 
13 | ```shell
14 | uv run python main.py https://ookla-open-data.s3.us-west-2.amazonaws.com/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet
15 | ```
16 | 
--------------------------------------------------------------------------------
/docs/api/file.md:
--------------------------------------------------------------------------------
1 | # File-like Object
2 | 
3 | Native support for reading from object stores as a file-like object.
4 | 
5 | Use `obstore.open_reader` or `obstore.open_reader_async` to open readable files. Use `obstore.open_writer` or `obstore.open_writer_async` to open writable files.
6 | 
7 | ::: obstore.open_reader
8 | ::: obstore.open_reader_async
9 | ::: obstore.open_writer
10 | ::: obstore.open_writer_async
11 | ::: obstore.ReadableFile
12 | ::: obstore.AsyncReadableFile
13 | ::: obstore.WritableFile
14 | ::: obstore.AsyncWritableFile
15 | 
--------------------------------------------------------------------------------
/docs/overrides/main.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 | 
3 | {% block content %}
4 | {% if page.nb_url %}
5 | 
6 | {% include ".icons/material/download.svg" %}
7 | 
8 | {% endif %}
9 | 
10 | {{ super() }}
11 | {% endblock content %}
12 | 
13 | {% block outdated %}
14 | You're not viewing the latest version.
15 | 
16 | Click here to go to latest.
17 | 
18 | {% endblock %}
19 | 
--------------------------------------------------------------------------------
/obstore/python/obstore/_head.pyi:
--------------------------------------------------------------------------------
1 | from ._list import ObjectMeta
2 | from .store import ObjectStore
3 | 
4 | def head(store: ObjectStore, path: str) -> ObjectMeta:
5 |     """Return the metadata for the specified location.
6 | 
7 |     Args:
8 |         store: The ObjectStore instance to use.
9 |         path: The path within ObjectStore to retrieve.
10 | 
11 |     Returns:
12 |         ObjectMeta
13 | 
14 |     """
15 | 
16 | async def head_async(store: ObjectStore, path: str) -> ObjectMeta:
17 |     """Call `head` asynchronously.
18 | 
19 |     Refer to the documentation for [head][obstore.head].
20 |     """
21 | 
--------------------------------------------------------------------------------
/pyo3-bytes/Cargo.toml:
--------------------------------------------------------------------------------
1 | [package]
2 | name = "pyo3-bytes"
3 | version = "0.5.0"
4 | authors = [
5 |     "Kyle Barron ",
6 |     "jesse rubin ",
7 | ]
8 | edition = "2021"
9 | description = "bytes integration for pyo3."
10 | readme = "README.md" 11 | repository = "https://github.com/developmentseed/obstore" 12 | license = "MIT OR Apache-2.0" 13 | keywords = ["python", "pyo3", "buffers", "zero-copy"] 14 | categories = [] 15 | rust-version = "1.75" 16 | 17 | [dependencies] 18 | # https://github.com/tokio-rs/bytes/releases/tag/v1.10.1 19 | bytes = "1.10.1" 20 | pyo3 = "0.27" 21 | 22 | [lib] 23 | crate-type = ["rlib"] 24 | -------------------------------------------------------------------------------- /pyo3-object_store/src/azure/error.rs: -------------------------------------------------------------------------------- 1 | const STORE: &str = "MicrosoftAzure"; 2 | 3 | // Vendored from upstream 4 | /// A specialized `Error` for Azure builder-related errors 5 | #[derive(Debug, thiserror::Error)] 6 | pub(crate) enum Error { 7 | #[error("Failed parsing an SAS key")] 8 | DecodeSasKey { source: std::str::Utf8Error }, 9 | 10 | #[error("Missing component in SAS query pair")] 11 | MissingSasComponent {}, 12 | } 13 | 14 | impl From for object_store::Error { 15 | fn from(source: Error) -> Self { 16 | Self::Generic { 17 | store: STORE, 18 | source: Box::new(source), 19 | } 20 | } 21 | } 22 | -------------------------------------------------------------------------------- /tests/test_attributes.py: -------------------------------------------------------------------------------- 1 | from obstore.store import MemoryStore 2 | 3 | 4 | def test_content_type(): 5 | store = MemoryStore() 6 | store.put("test.txt", b"Hello, World!", attributes={"Content-Type": "text/plain"}) 7 | result = store.get("test.txt") 8 | assert result.attributes.get("Content-Type") == "text/plain" 9 | 10 | 11 | def test_custom_attribute(): 12 | store = MemoryStore() 13 | store.put( 14 | "test.txt", 15 | b"Hello, World!", 16 | attributes={"My-Custom-Attribute": "CustomValue"}, 17 | ) 18 | result = store.get("test.txt") 19 | assert result.attributes.get("My-Custom-Attribute") == "CustomValue" 20 | 
-------------------------------------------------------------------------------- /tests/store/test_http.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | 3 | from obstore.store import HTTPStore 4 | 5 | 6 | def test_pickle(): 7 | store = HTTPStore.from_url("https://example.com") 8 | new_store: HTTPStore = pickle.loads(pickle.dumps(store)) 9 | assert store.url == new_store.url 10 | 11 | 12 | def test_eq(): 13 | store = HTTPStore.from_url("https://example.com", client_options={"timeout": "10s"}) 14 | store2 = HTTPStore.from_url( 15 | "https://example.com", 16 | client_options={"timeout": "10s"}, 17 | ) 18 | store3 = HTTPStore.from_url("https://example2.com") 19 | assert store == store # noqa: PLR0124 20 | assert store == store2 21 | assert store != store3 22 | -------------------------------------------------------------------------------- /.pre-commit-config.yaml: -------------------------------------------------------------------------------- 1 | # See https://pre-commit.com for more information 2 | # See https://pre-commit.com/hooks.html for more hooks 3 | 4 | # Default to Python 3 5 | default_language_version: 6 | python: python3 7 | 8 | # Optionally both commit and push 9 | default_stages: [pre-commit] 10 | 11 | repos: 12 | - repo: https://github.com/pre-commit/pre-commit-hooks 13 | rev: v5.0.0 14 | hooks: 15 | - id: trailing-whitespace 16 | - id: end-of-file-fixer 17 | - id: check-added-large-files 18 | args: ["--maxkb=500"] 19 | 20 | - repo: https://github.com/astral-sh/ruff-pre-commit 21 | rev: v0.13.0 22 | hooks: 23 | - id: ruff 24 | args: ["--fix"] 25 | - id: ruff-format 26 | -------------------------------------------------------------------------------- /obstore/src/tags.rs: -------------------------------------------------------------------------------- 1 | use std::collections::HashMap; 2 | 3 | use object_store::TagSet; 4 | use pyo3::prelude::*; 5 | use pyo3::pybacked::PyBackedStr; 6 | 7 | 
pub(crate) struct PyTagSet(TagSet); 8 | 9 | impl PyTagSet { 10 | pub fn into_inner(self) -> TagSet { 11 | self.0 12 | } 13 | } 14 | 15 | impl<'py> FromPyObject<'_, 'py> for PyTagSet { 16 | type Error = PyErr; 17 | 18 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result<Self, Self::Error> { 19 | let input = obj.extract::<HashMap<PyBackedStr, PyBackedStr>>()?; 20 | let mut tag_set = TagSet::default(); 21 | for (key, value) in input.into_iter() { 22 | tag_set.push(&key, &value); 23 | } 24 | Ok(Self(tag_set)) 25 | } 26 | } 27 | -------------------------------------------------------------------------------- /obstore/src/scheme.rs: -------------------------------------------------------------------------------- 1 | use object_store::ObjectStoreScheme; 2 | use pyo3::exceptions::PyValueError; 3 | use pyo3::prelude::*; 4 | use pyo3_object_store::{PyObjectStoreResult, PyUrl}; 5 | 6 | #[pyfunction] 7 | pub(crate) fn parse_scheme(url: PyUrl) -> PyObjectStoreResult<&'static str> { 8 | let (scheme, _) = 9 | object_store::ObjectStoreScheme::parse(url.as_ref()).map_err(object_store::Error::from)?; 10 | match scheme { 11 | ObjectStoreScheme::AmazonS3 => Ok("s3"), 12 | ObjectStoreScheme::GoogleCloudStorage => Ok("gcs"), 13 | ObjectStoreScheme::Http => Ok("http"), 14 | ObjectStoreScheme::Local => Ok("local"), 15 | ObjectStoreScheme::Memory => Ok("memory"), 16 | ObjectStoreScheme::MicrosoftAzure => Ok("azure"), 17 | _ => Err(PyValueError::new_err(format!("Unknown scheme: {scheme:?}")).into()), 18 | } 19 | } 20 | -------------------------------------------------------------------------------- /examples/minio/README.md: -------------------------------------------------------------------------------- 1 | # Minio example 2 | 3 | [MinIO](https://github.com/minio/minio) is a high-performance, S3-compatible object store, open-sourced under the GNU AGPLv3 license. It's often used for testing or self-hosting S3-compatible storage.
4 | 5 | We can run MinIO locally using Docker: 6 | 7 | ```shell 8 | docker run -p 9000:9000 -p 9001:9001 \ 9 | quay.io/minio/minio server /data --console-address ":9001" 10 | ``` 11 | 12 | `obstore` isn't able to create a bucket, so we need to do that manually through the MinIO web UI. After running the above Docker command, go to <http://localhost:9001>. 13 | 14 | Log in with the username `minioadmin` and the password `minioadmin`. 15 | 16 | Then click "Create a Bucket" and create a bucket named `test-bucket`. 17 | 18 | Now, run the Python script: 19 | 20 | ```shell 21 | uv run python main.py 22 | ``` 23 | -------------------------------------------------------------------------------- /obstore/src/path.rs: -------------------------------------------------------------------------------- 1 | use object_store::path::Path; 2 | use pyo3::exceptions::PyTypeError; 3 | use pyo3::prelude::*; 4 | use pyo3_object_store::PyPath; 5 | 6 | pub(crate) enum PyPaths { 7 | One(Path), 8 | // TODO: also support an Arrow String Array here.
9 | Many(Vec<Path>), 10 | } 11 | 12 | impl<'py> FromPyObject<'_, 'py> for PyPaths { 13 | type Error = PyErr; 14 | 15 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result<Self, Self::Error> { 16 | if let Ok(path) = obj.extract::<PyPath>() { 17 | Ok(Self::One(path.into_inner())) 18 | } else if let Ok(paths) = obj.extract::<Vec<PyPath>>() { 19 | Ok(Self::Many( 20 | paths.into_iter().map(|path| path.into_inner()).collect(), 21 | )) 22 | } else { 23 | Err(PyTypeError::new_err( 24 | "Expected string path or sequence of string paths.", 25 | )) 26 | } 27 | } 28 | } 29 | -------------------------------------------------------------------------------- /pyo3-object_store/src/lib.rs: -------------------------------------------------------------------------------- 1 | #![doc = include_str!("../README.md")] 2 | #![warn(missing_docs)] 3 | 4 | mod api; 5 | mod aws; 6 | mod azure; 7 | mod client; 8 | mod config; 9 | mod credentials; 10 | pub(crate) mod error; 11 | mod gcp; 12 | mod http; 13 | mod local; 14 | mod memory; 15 | mod path; 16 | mod prefix; 17 | mod retry; 18 | mod simple; 19 | mod store; 20 | mod url; 21 | 22 | pub use api::{register_exceptions_module, register_store_module}; 23 | pub use aws::PyS3Store; 24 | pub use azure::PyAzureStore; 25 | pub use client::{PyClientConfigKey, PyClientOptions}; 26 | pub use error::{PyObjectStoreError, PyObjectStoreResult}; 27 | pub use gcp::PyGCSStore; 28 | pub use http::PyHttpStore; 29 | pub use local::PyLocalStore; 30 | pub use memory::PyMemoryStore; 31 | pub use path::PyPath; 32 | pub use prefix::MaybePrefixedStore; 33 | pub use simple::from_url; 34 | pub use store::{AnyObjectStore, PyExternalObjectStore, PyObjectStore}; 35 | pub use url::PyUrl; 36 | -------------------------------------------------------------------------------- /docs/integrations/index.md: -------------------------------------------------------------------------------- 1 | # External Integrations 2 | 3 | Various integrations with external libraries exist: 4 | 5 | - [`dagster`](https://dagster.io/): Refer to
[`dagster-obstore`](https://github.com/dagster-io/community-integrations/tree/main/libraries/dagster-obstore). 6 | - [`fsspec`](https://github.com/fsspec/filesystem_spec): Use the [`obstore.fsspec`][obstore.fsspec] module. 7 | - [`zarr-python`](https://zarr.readthedocs.io/en/stable/): Use [`zarr.storage.ObjectStore`](https://zarr.readthedocs.io/en/stable/user-guide/storage.html#object-store), included since Zarr version `3.0.7`. See also the [Obstore-Zarr example](../examples/zarr.md). 8 | 9 | Obstore is also used internally by other projects: 10 | 11 | - [litData](https://github.com/Lightning-AI/litData) 12 | - [VirtualiZarr](https://github.com/zarr-developers/VirtualiZarr) 13 | 14 | Know of an integration that isn't listed here? [Edit this document](https://github.com/developmentseed/obstore/edit/main/docs/integrations/index.md). 15 | -------------------------------------------------------------------------------- /examples/minio/main.py: -------------------------------------------------------------------------------- 1 | # ruff: noqa 2 | import asyncio 3 | 4 | import obstore as obs 5 | from obstore.store import S3Store 6 | 7 | 8 | async def main(): 9 | store = S3Store( 10 | "test-bucket", 11 | endpoint="http://localhost:9000", 12 | access_key_id="minioadmin", 13 | secret_access_key="minioadmin", 14 | virtual_hosted_style_request=False, 15 | client_options={"allow_http": True}, 16 | ) 17 | 18 | print("Put file:") 19 | await obs.put_async(store, "a.txt", b"foo") 20 | await obs.put_async(store, "b.txt", b"bar") 21 | await obs.put_async(store, "c/d.txt", b"baz") 22 | 23 | print("\nList files:") 24 | files = await obs.list(store).collect_async() 25 | print(files) 26 | 27 | print("\nFetch a.txt") 28 | resp = await obs.get_async(store, "a.txt") 29 | print(await resp.bytes_async()) 30 | 31 | print("\nDelete a.txt") 32 | await obs.delete_async(store, "a.txt") 33 | 34 | 35 | if __name__ == "__main__": 36 | asyncio.run(main()) 37 |
-------------------------------------------------------------------------------- /obstore/src/head.rs: -------------------------------------------------------------------------------- 1 | use pyo3::prelude::*; 2 | use pyo3_async_runtimes::tokio::get_runtime; 3 | use pyo3_object_store::{PyObjectStore, PyObjectStoreError, PyObjectStoreResult, PyPath}; 4 | 5 | use crate::list::PyObjectMeta; 6 | 7 | #[pyfunction] 8 | pub fn head(py: Python, store: PyObjectStore, path: PyPath) -> PyObjectStoreResult<PyObjectMeta> { 9 | let runtime = get_runtime(); 10 | let store = store.into_inner(); 11 | 12 | py.detach(|| { 13 | let meta = runtime.block_on(store.head(path.as_ref()))?; 14 | Ok::<_, PyObjectStoreError>(PyObjectMeta::new(meta)) 15 | }) 16 | } 17 | 18 | #[pyfunction] 19 | pub fn head_async(py: Python, store: PyObjectStore, path: PyPath) -> PyResult<Bound<PyAny>> { 20 | let store = store.into_inner().clone(); 21 | pyo3_async_runtimes::tokio::future_into_py(py, async move { 22 | let meta = store 23 | .head(path.as_ref()) 24 | .await 25 | .map_err(PyObjectStoreError::ObjectStoreError)?; 26 | Ok(PyObjectMeta::new(meta)) 27 | }) 28 | } 29 | -------------------------------------------------------------------------------- /docs/overrides/stylesheets/extra.css: -------------------------------------------------------------------------------- 1 | :root, 2 | [data-md-color-scheme="default"] { 3 | /* --md-heading-font: "Oswald"; */ 4 | --md-primary-fg-color: #cf3f02; 5 | --md-default-fg-color: #443f3f; 6 | --boxShadowD: 0px 12px 24px 0px rgba(68, 63, 63, 0.08), 7 | 0px 0px 4px 0px rgba(68, 63, 63, 0.08); 8 | } 9 | body { 10 | margin: 0; 11 | padding: 0; 12 | /* font-size: 16px; */ 13 | } 14 | h1, 15 | h2, 16 | h3, 17 | h4, 18 | h5, 19 | h6 { 20 | font-family: var(--md-heading-font); 21 | font-weight: bold; 22 | } 23 | .md-typeset h1, 24 | .md-typeset h2 { 25 | font-weight: normal; 26 | color: var(--md-default-fg-color); 27 | } 28 | .md-typeset h3, 29 | .md-typeset h4 { 30 | font-weight: bold; 31 | color:
var(--md-default-fg-color); 32 | } 33 | .md-button, 34 | .md-typeset .md-button { 35 | font-family: var(--md-heading-font); 36 | } 37 | .md-content .supheading { 38 | font-family: var(--md-heading-font); 39 | text-transform: uppercase; 40 | color: var(--md-primary-fg-color); 41 | font-size: 0.75rem; 42 | font-weight: bold; 43 | } 44 | -------------------------------------------------------------------------------- /obstore/python/obstore/_delete.pyi: -------------------------------------------------------------------------------- 1 | from collections.abc import Sequence 2 | 3 | from ._store import ObjectStore 4 | 5 | def delete(store: ObjectStore, paths: str | Sequence[str]) -> None: 6 | """Delete the object at the specified location(s). 7 | 8 | Args: 9 | store: The ObjectStore instance to use. 10 | paths: The path or paths within the store to delete. 11 | 12 | When supported by the underlying store, this method will use bulk operations 13 | that delete more than one object per request. 14 | 15 | If the object did not exist, the result may be an error or a success, 16 | depending on the behavior of the underlying store. For example, local 17 | filesystems, GCP, and Azure return an error, while S3 and in-memory will 18 | return Ok. 19 | 20 | """ 21 | 22 | async def delete_async(store: ObjectStore, paths: str | Sequence[str]) -> None: 23 | """Call `delete` asynchronously. 24 | 25 | Refer to the documentation for [delete][obstore.delete].
26 | """ 27 | -------------------------------------------------------------------------------- /obstore/pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | requires = ["maturin>=1.4.0,<2.0"] 3 | build-backend = "maturin" 4 | 5 | [project] 6 | name = "obstore" 7 | requires-python = ">=3.9" 8 | dependencies = ["typing-extensions; python_version < '3.13'"] 9 | classifiers = [ 10 | "Development Status :: 4 - Beta", 11 | "Framework :: AsyncIO", 12 | "Framework :: FastAPI", 13 | "Framework :: aiohttp", 14 | "Intended Audience :: Developers", 15 | "Intended Audience :: Science/Research", 16 | "License :: OSI Approved :: MIT License", 17 | "Operating System :: MacOS", 18 | "Operating System :: Microsoft :: Windows", 19 | "Operating System :: Unix", 20 | "Programming Language :: Rust", 21 | "Programming Language :: Python :: Implementation :: CPython", 22 | "Programming Language :: Python :: Implementation :: PyPy", 23 | "Topic :: Internet", 24 | "Typing :: Typed", 25 | ] 26 | dynamic = ["version"] 27 | 28 | [tool.maturin] 29 | features = ["pyo3/extension-module"] 30 | module-name = "obstore._obstore" 31 | python-source = "python" 32 | strip = true 33 | -------------------------------------------------------------------------------- /obstore/python/obstore/_rename.pyi: -------------------------------------------------------------------------------- 1 | from ._store import ObjectStore 2 | 3 | def rename(store: ObjectStore, from_: str, to: str, *, overwrite: bool = True) -> None: 4 | """Move an object from one path to another in the same object store. 5 | 6 | By default, this is implemented as a copy and then delete source. It may not check 7 | when deleting source that it was the same object that was originally copied. 8 | 9 | Args: 10 | store: The ObjectStore instance to use. 
11 | from_: Source path 12 | to: Destination path 13 | 14 | Keyword Args: 15 | overwrite: If `True`, if there exists an object at the destination, it will be 16 | overwritten. If `False`, will return an error if the destination already has 17 | an object. 18 | 19 | """ 20 | 21 | async def rename_async( 22 | store: ObjectStore, 23 | from_: str, 24 | to: str, 25 | *, 26 | overwrite: bool = True, 27 | ) -> None: 28 | """Call `rename` asynchronously. 29 | 30 | Refer to the documentation for [rename][obstore.rename]. 31 | """ 32 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Development Seed 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /obstore/python/obstore/_copy.pyi: -------------------------------------------------------------------------------- 1 | from ._store import ObjectStore 2 | 3 | def copy(store: ObjectStore, from_: str, to: str, *, overwrite: bool = True) -> None: 4 | """Copy an object from one path to another in the same object store. 5 | 6 | Args: 7 | store: The ObjectStore instance to use. 8 | from_: Source path 9 | to: Destination path 10 | 11 | Keyword Args: 12 | overwrite: If `True`, any existing object at the destination will 13 | be overwritten. 14 | 15 | If `False`, will copy only if the destination is empty. The copy is performed atomically if the underlying object storage supports it; if atomic operations are not supported (as with S3), an error is returned. 16 | 17 | An error is also returned if the destination already has an object. 18 | 19 | """ 20 | 21 | async def copy_async( 22 | store: ObjectStore, 23 | from_: str, 24 | to: str, 25 | *, 26 | overwrite: bool = True, 27 | ) -> None: 28 | """Call `copy` asynchronously. 29 | 30 | Refer to the documentation for [copy][obstore.copy].
31 | """ 32 | -------------------------------------------------------------------------------- /pyo3-bytes/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Development Seed 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /tests/store/test_gcs.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from obstore.exceptions import BaseError, GenericError 4 | from obstore.store import GCSStore 5 | 6 | 7 | def test_overlapping_config_keys(): 8 | with pytest.raises(BaseError, match="Duplicate key"): 9 | GCSStore(google_bucket="bucket", GOOGLE_BUCKET="bucket") # type: ignore intentional test 10 | 11 | with pytest.raises(BaseError, match="Duplicate key"): 12 | GCSStore(config={"google_bucket": "test", "GOOGLE_BUCKET": "test"}) # type: ignore intentional test 13 | 14 | 15 | def test_eq(): 16 | store = GCSStore("bucket", client_options={"timeout": "10s"}) 17 | store2 = GCSStore("bucket", client_options={"timeout": "10s"}) 18 | store3 = GCSStore("bucket") 19 | assert store == store # noqa: PLR0124 20 | assert store == store2 21 | assert store != store3 22 | 23 | 24 | def test_application_credentials(): 25 | # The application_credentials parameter should be correctly passed down 26 | # Finalizing the GCSBuilder should try to load and parse those credentials, which 27 | # should error here. 28 | with pytest.raises(GenericError, match="source: OpenCredentials"): 29 | GCSStore("bucket", application_credentials="path/to/creds.json") 30 | -------------------------------------------------------------------------------- /pyo3-object_store/src/memory.rs: -------------------------------------------------------------------------------- 1 | use std::sync::Arc; 2 | 3 | use object_store::memory::InMemory; 4 | use pyo3::intern; 5 | use pyo3::prelude::*; 6 | use pyo3::types::PyString; 7 | 8 | /// A Python-facing wrapper around an [`InMemory`]. 
9 | #[derive(Debug, Clone)] 10 | #[pyclass(name = "MemoryStore", frozen, subclass)] 11 | pub struct PyMemoryStore(Arc<InMemory>); 12 | 13 | impl AsRef<Arc<InMemory>> for PyMemoryStore { 14 | fn as_ref(&self) -> &Arc<InMemory> { 15 | &self.0 16 | } 17 | } 18 | 19 | impl From<Arc<InMemory>> for PyMemoryStore { 20 | fn from(value: Arc<InMemory>) -> Self { 21 | Self(value) 22 | } 23 | } 24 | 25 | impl<'py> PyMemoryStore { 26 | /// Consume self and return the underlying [`InMemory`]. 27 | pub fn into_inner(self) -> Arc<InMemory> { 28 | self.0 29 | } 30 | 31 | fn __repr__(&'py self, py: Python<'py>) -> &'py Bound<'py, PyString> { 32 | intern!(py, "MemoryStore") 33 | } 34 | } 35 | 36 | #[pymethods] 37 | impl PyMemoryStore { 38 | #[new] 39 | fn py_new() -> Self { 40 | Self(Arc::new(InMemory::new())) 41 | } 42 | 43 | fn __eq__(slf: Py<Self>, other: &Bound<PyAny>) -> bool { 44 | // Two memory stores are equal only if they are the same object 45 | slf.is(other) 46 | } 47 | } 48 | -------------------------------------------------------------------------------- /docs/obspec.md: -------------------------------------------------------------------------------- 1 | # Generic storage abstractions with Obspec 2 | 3 | Obstore provides an implementation for accessing Amazon S3, Google Cloud Storage, and Azure Storage, but some libraries may also want to support other backends, such as HTTP clients or more obscure things like SFTP or HDFS filesystems. 4 | 5 | Additionally, there's a bunch of useful behavior that could exist on top of Obstore: caching, metrics, globbing, bulk operations. While all of those operations are useful, we want to keep the core Obstore library as small as possible, tightly coupled with the underlying Rust `object_store` library. 6 | 7 | [Obspec](https://developmentseed.org/obspec/) exists to provide the abstractions for generic programming against object store backends. Obspec is essentially a formalization and generalization of the Obstore API, so if you're already using Obstore, very few changes are needed to use Obspec instead.
8 | 9 | Downstream libraries can program against the Obspec API to be fully generic over which underlying backend is used at runtime. 10 | 11 | For further information, refer to the [Obspec documentation](https://developmentseed.org/obspec/latest/) and the [Obspec announcement blog post](https://developmentseed.org/obspec/latest/blog/2025/06/25/introducing-obspec-a-python-protocol-for-interfacing-with-object-storage/). 12 | -------------------------------------------------------------------------------- /examples/zarr/main.py: -------------------------------------------------------------------------------- 1 | """Example using Zarr with the Obstore backend.""" 2 | 3 | import matplotlib.pyplot as plt 4 | import pystac_client 5 | import xarray as xr 6 | from zarr.storage import ObjectStore 7 | 8 | from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider 9 | from obstore.store import AzureStore 10 | 11 | # These first lines are specific to Zarr stored in the Microsoft Planetary 12 | # Computer. We use pystac-client to find the metadata for this specific Zarr 13 | # store. 14 | catalog = pystac_client.Client.open( 15 | "https://planetarycomputer.microsoft.com/api/stac/v1/", 16 | ) 17 | collection = catalog.get_collection("daymet-daily-hi") 18 | asset = collection.assets["zarr-abfs"] 19 | 20 | # We construct an AzureStore because this Zarr dataset is stored in Azure 21 | # storage 22 | azure_store = AzureStore( 23 | credential_provider=PlanetaryComputerCredentialProvider.from_asset(asset), 24 | ) 25 | 26 | # Next we use the Zarr ObjectStore adapter and pass it to xarray.
27 | zarr_store = ObjectStore(azure_store, read_only=True) 28 | ds = xr.open_dataset(zarr_store, consolidated=True, engine="zarr") 29 | 30 | # And plot with matplotlib 31 | fig, ax = plt.subplots(figsize=(12, 12)) 32 | ds.sel(time="2009")["tmax"].mean(dim="time").plot.imshow(ax=ax, cmap="inferno") 33 | fig.savefig("zarr-example.png") 34 | -------------------------------------------------------------------------------- /docs/advanced/pickle.md: -------------------------------------------------------------------------------- 1 | # Pickle Support 2 | 3 | Obstore supports [pickle](https://docs.python.org/3/library/pickle.html), which is commonly used from inside [Dask](https://www.dask.org/) and similar libraries to manage state across distributed workers. 4 | 5 | ## Not for persistence 6 | 7 | The format used to pickle stores may change across versions. Pickle support is intended for execution frameworks like [Dask](https://www.dask.org/) that need to share state across workers that are using the same environments, including the same version of Python and obstore. 8 | 9 | ## Middlewares 10 | 11 | Obstore expects to support some sort of middleware in the future, such as for recording request metrics. It's unlikely that middlewares will support pickle. 12 | 13 | ## MemoryStore not implemented 14 | 15 | Pickling isn't supported for [`MemoryStore`][obstore.store.MemoryStore] because we don't have a way to access the raw state of the store. 16 | 17 | ## Custom authentication 18 | 19 | As of obstore 0.5.0, [custom authentication](../authentication.md#custom-authentication) is supported. 20 | 21 | Pickling works with a custom authentication provider so long as that Python callback can itself be pickled. 22 | 23 | So, for example, the [boto3 provider][obstore.auth.boto3.Boto3CredentialProvider] cannot be pickled, because a [`boto3.session.Session`][] cannot be pickled, but a simple function can be. 
24 | -------------------------------------------------------------------------------- /Cargo.toml: -------------------------------------------------------------------------------- 1 | [workspace] 2 | members = ["obstore"] 3 | # Note: pyo3-object_store is _not_ a member of this workspace because we need to 4 | # patch the object_store version for Python to export a list stream. This list 5 | # stream is implemented in https://github.com/apache/arrow-rs/pull/6619 and will 6 | # be included in object_store's next major release. 7 | # 8 | # But pyo3-object_store gets published to crates.io, which can't have git 9 | # dependencies. 10 | exclude = ["pyo3-object_store", "pyo3-bytes"] 11 | resolver = "2" 12 | 13 | [workspace.package] 14 | authors = ["Kyle Barron "] 15 | edition = "2021" 16 | homepage = "https://developmentseed.org/obstore" 17 | repository = "https://github.com/developmentseed/obstore" 18 | license = "MIT OR Apache-2.0" 19 | keywords = ["python"] 20 | categories = [] 21 | rust-version = "1.75" 22 | 23 | [workspace.dependencies] 24 | bytes = "1.10.1" 25 | chrono = "0.4.38" 26 | futures = "0.3.31" 27 | http = "1.2" 28 | indexmap = "2" 29 | object_store = "0.12.4" 30 | pyo3 = { version = "0.27.1", features = ["macros", "indexmap"] } 31 | pyo3-async-runtimes = { version = "0.27", features = ["tokio-runtime"] } 32 | pyo3-file = { git = "https://github.com/kylebarron/pyo3-file", rev = "aacc18816591f9987247bac8b7011b452b4eeb3e" } 33 | thiserror = "1" 34 | tokio = "1.40" 35 | url = "2" 36 | 37 | [profile.release] 38 | lto = true 39 | codegen-units = 1 40 | -------------------------------------------------------------------------------- /pyo3-object_store/CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | ## [0.7.0] - 2025-10-23 4 | 5 | - Bump to pyo3 0.27. 6 | 7 | ## [0.6.0] - 2025-09-02 8 | 9 | ### Breaking changes :wrench: 10 | 11 | - Don't percent-encode paths. 
The implementation of `FromPyObject` for `PyPath` now uses `Path::parse` instead of `Path::from` under the hood. #524 12 | - Bump to pyo3 0.26. 13 | 14 | ### Other 15 | 16 | - Configurable warning on PyExternalObjectStore creation #550 17 | 18 | ## [0.5.0] - 2025-05-19 19 | 20 | - Bump to pyo3 0.25. 21 | 22 | ## [0.4.0] - 2025-03-24 23 | 24 | Compatibility release to use `pyo3-object_store` with `object_store` 0.11 and `pyo3` 0.24. 25 | 26 | ## [0.3.0] - 2025-03-24 27 | 28 | Compatibility release to use `pyo3-object_store` with `object_store` 0.11 and `pyo3` 0.23. 29 | 30 | ### Breaking changes :wrench: 31 | 32 | #### Store constructors 33 | 34 | - In the `AzureStore` constructor, the `container` positional argument was renamed to `container_name` to match the `container_name` key in `AzureConfig`. 35 | 36 | This is a breaking change if you had been calling `AzureStore(container="my container name")`. This is not breaking if you had been using it as a positional argument `AzureStore("my container name")` or if you had already been using `AzureStore(container_name="my container name")`. 37 | 38 | ## [0.2.0] - 2025-03-14 39 | 40 | - Bump to pyo3 0.24. 41 | 42 | ## [0.1.0] - 2025-03-14 43 | 44 | - Initial release. 
45 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: Rust 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | pull_request: 8 | 9 | permissions: 10 | contents: read 11 | 12 | jobs: 13 | lint-test: 14 | name: Lint and Test 15 | runs-on: ubuntu-latest 16 | steps: 17 | - uses: actions/checkout@v4 18 | with: 19 | submodules: "recursive" 20 | 21 | - uses: actions/setup-python@v5 22 | with: 23 | python-version: "3.11" 24 | 25 | - name: Install Rust 26 | uses: dtolnay/rust-toolchain@stable 27 | with: 28 | components: rustfmt, clippy 29 | 30 | - uses: Swatinem/rust-cache@v2 31 | 32 | - name: Cargo fmt 33 | run: | 34 | cargo fmt --all -- --check 35 | cd pyo3-object_store && cargo fmt --all -- --check && cd .. 36 | cd pyo3-bytes && cargo fmt --all -- --check && cd .. 37 | 38 | - name: "clippy --all" 39 | run: | 40 | cargo clippy --all --all-features --tests -- -D warnings 41 | cd pyo3-object_store && cargo clippy --all --all-features --tests -- -D warnings && cd .. 42 | cd pyo3-bytes && cargo clippy --all --all-features --tests -- -D warnings && cd .. 43 | 44 | - name: "cargo check" 45 | run: cargo check --all --all-features 46 | 47 | - name: "cargo test" 48 | run: | 49 | cargo test --all 50 | cargo test --all --all-features 51 | -------------------------------------------------------------------------------- /pyo3-object_store/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "pyo3-object_store" 3 | version = "0.7.0" 4 | authors = ["Kyle Barron "] 5 | edition = "2021" 6 | description = "object_store integration for pyo3." 
7 | readme = "README.md" 8 | repository = "https://github.com/developmentseed/obstore" 9 | license = "MIT OR Apache-2.0" 10 | keywords = [] 11 | categories = [] 12 | rust-version = "1.75" 13 | # Include the Python type hints as part of the cargo distribution 14 | include = ["src", "type-hints", "README.md", "LICENSE"] 15 | 16 | [features] 17 | default = ["external-store-warning"] 18 | external-store-warning = [] 19 | 20 | [dependencies] 21 | async-trait = "0.1.85" 22 | bytes = "1" 23 | chrono = "0.4" 24 | futures = "0.3" 25 | # This is already an object_store dependency 26 | humantime = "2.1" 27 | # This is already an object_store dependency 28 | http = "1" 29 | # This is already an object_store dependency 30 | itertools = "0.14.0" 31 | object_store = { version = "0.12.4", features = [ 32 | "aws", 33 | "azure", 34 | "gcp", 35 | "http", 36 | ] } 37 | # This is already an object_store dependency 38 | percent-encoding = "2.1" 39 | pyo3 = { version = "0.27", features = ["chrono", "indexmap"] } 40 | pyo3-async-runtimes = { version = "0.27", features = ["tokio-runtime"] } 41 | serde = "1" 42 | thiserror = "1" 43 | tokio = { version = "1.40", features = ["rt-multi-thread"] } 44 | url = "2" 45 | 46 | [lib] 47 | crate-type = ["rlib"] 48 | -------------------------------------------------------------------------------- /tests/test_backoff.py: -------------------------------------------------------------------------------- 1 | from datetime import timedelta 2 | 3 | from obstore.store import HTTPStore 4 | 5 | 6 | def test_construction_with_backoff_config(): 7 | HTTPStore.from_url( 8 | "https://...", 9 | client_options={ 10 | "connect_timeout": "4 seconds", 11 | "timeout": "16 seconds", 12 | }, 13 | retry_config={ 14 | "max_retries": 10, 15 | "backoff": { 16 | "base": 2, 17 | "init_backoff": timedelta(seconds=1), 18 | "max_backoff": timedelta(seconds=16), 19 | }, 20 | "retry_timeout": timedelta(minutes=3), 21 | }, 22 | ) 23 | 24 | 25 | def 
test_construction_partial_retry_config(): 26 | HTTPStore.from_url( 27 | "https://...", 28 | client_options={ 29 | "connect_timeout": "4 seconds", 30 | "timeout": "16 seconds", 31 | }, 32 | retry_config={ 33 | "max_retries": 10, 34 | }, 35 | ) 36 | HTTPStore.from_url( 37 | "https://...", 38 | client_options={ 39 | "connect_timeout": "4 seconds", 40 | "timeout": "16 seconds", 41 | }, 42 | retry_config={ 43 | "max_retries": 10, 44 | "backoff": { 45 | "init_backoff": timedelta(seconds=1), 46 | }, 47 | "retry_timeout": timedelta(minutes=3), 48 | }, 49 | ) 50 | -------------------------------------------------------------------------------- /docs/examples/tqdm.md: -------------------------------------------------------------------------------- 1 | # tqdm (Progress Bar) 2 | 3 | [tqdm](https://tqdm.github.io/) provides an interactive progress bar for Python. 4 | 5 | ![](../assets/example.gif) 6 | 7 | It's easy to wrap obstore downloads with a tqdm progress bar: 8 | 9 | ```py 10 | import obstore as obs 11 | from obstore.store import HTTPStore 12 | from tqdm import tqdm 13 | store = HTTPStore.from_url("https://ookla-open-data.s3.us-west-2.amazonaws.com") 14 | path = "parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet" 15 | response = obs.get(store, path) 16 | file_size = response.meta["size"] 17 | with tqdm(total=file_size) as pbar: 18 | for bytes_chunk in response: 19 | # Do something with buffer 20 | pbar.update(len(bytes_chunk)) 21 | ``` 22 | 23 | Or, if you're using the async API: 24 | 25 | ```py 26 | import obstore as obs 27 | from obstore.store import HTTPStore 28 | from tqdm import tqdm 29 | store = HTTPStore.from_url("https://ookla-open-data.s3.us-west-2.amazonaws.com") 30 | path = "parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet" 31 | response = await obs.get_async(store, path) 32 | file_size = response.meta["size"] 33 | with tqdm(total=file_size) as pbar: 34 | async for bytes_chunk in response: 35 | # Do something with
buffer 36 | pbar.update(len(bytes_chunk)) 37 | ``` 38 | 39 | There's a [full example](https://github.com/developmentseed/obstore/tree/main/examples/progress-bar) in the obstore repository. 40 | -------------------------------------------------------------------------------- /tests/store/test_azure.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | from obstore.exceptions import BaseError 4 | from obstore.store import AzureStore 5 | 6 | 7 | def test_overlapping_config_keys(): 8 | with pytest.raises(BaseError, match="Duplicate key"): 9 | AzureStore(container_name="test", AZURE_CONTAINER_NAME="test") # type: ignore intentional test 10 | 11 | with pytest.raises(BaseError, match="Duplicate key"): 12 | AzureStore( 13 | config={"azure_container_name": "test", "AZURE_CONTAINER_NAME": "test"}, # type: ignore (intentional test) 14 | ) 15 | 16 | 17 | def test_eq(): 18 | store = AzureStore( 19 | "container", 20 | account_name="account_name", 21 | client_options={"timeout": "10s"}, 22 | ) 23 | store2 = AzureStore( 24 | "container", 25 | account_name="account_name", 26 | client_options={"timeout": "10s"}, 27 | ) 28 | store3 = AzureStore( 29 | "container", 30 | account_name="account_name", 31 | ) 32 | assert store == store # noqa: PLR0124 33 | assert store == store2 34 | assert store != store3 35 | 36 | 37 | def test_from_url(): 38 | # https://github.com/developmentseed/obstore/issues/477 39 | url = "https://overturemapswestus2.blob.core.windows.net/release" 40 | store = AzureStore.from_url(url, skip_signature=True) 41 | 42 | assert store.config.get("container_name") == "release" 43 | assert store.config.get("account_name") == "overturemapswestus2" 44 | assert store.prefix is None 45 | -------------------------------------------------------------------------------- /obstore/build.rs: -------------------------------------------------------------------------------- 1 | use cargo_lock::{Lockfile, SourceId, Version}; 2 | use 
std::ffi::OsString; 3 | use std::io::ErrorKind; 4 | use std::path::{Path, PathBuf}; 5 | use std::{env, io}; 6 | 7 | fn main() { 8 | let lockfile_location = get_lockfile_location().unwrap(); 9 | let (version, source) = read_lockfile(&lockfile_location); 10 | 11 | println!("cargo:rustc-env=OBJECT_STORE_VERSION={version}"); 12 | println!( 13 | "cargo:rustc-env=OBJECT_STORE_SOURCE={}", 14 | source.map(|s| s.to_string()).unwrap_or("".to_string()) 15 | ); 16 | } 17 | 18 | fn get_lockfile_location() -> io::Result<PathBuf> { 19 | let path = PathBuf::from(env!("CARGO_MANIFEST_DIR")); 20 | 21 | let cargo_lock = OsString::from("Cargo.lock"); 22 | 23 | for ancestor in path.as_path().ancestors() { 24 | for entry in ancestor.read_dir()? { 25 | let entry = entry?; 26 | if entry.file_name() == cargo_lock { 27 | return Ok(entry.path()); 28 | } 29 | } 30 | } 31 | 32 | Err(io::Error::new( 33 | ErrorKind::NotFound, 34 | "Ran out of places to find Cargo.lock", 35 | )) 36 | } 37 | 38 | fn read_lockfile(path: &Path) -> (Version, Option<SourceId>) { 39 | let lockfile = Lockfile::load(path).unwrap(); 40 | let idx = lockfile 41 | .packages 42 | .iter() 43 | .position(|p| p.name.as_str() == "object_store") 44 | .unwrap(); 45 | let package = &lockfile.packages[idx]; 46 | (package.version.clone(), package.source.clone()) 47 | } 48 | -------------------------------------------------------------------------------- /obstore/src/copy.rs: -------------------------------------------------------------------------------- 1 | use object_store::ObjectStore; 2 | use pyo3::prelude::*; 3 | use pyo3_async_runtimes::tokio::get_runtime; 4 | use pyo3_object_store::{PyObjectStore, PyObjectStoreError, PyObjectStoreResult}; 5 | 6 | use crate::utils::PyNone; 7 | 8 | #[pyfunction] 9 | #[pyo3(signature = (store, from_, to, *, overwrite=true))] 10 | pub(crate) fn copy( 11 | py: Python, 12 | store: PyObjectStore, 13 | from_: String, 14 | to: String, 15 | overwrite: bool, 16 | ) -> PyObjectStoreResult<()> { 17 | let runtime =
get_runtime(); 18 | let from_ = from_.into(); 19 | let to = to.into(); 20 | py.detach(|| { 21 | let fut = if overwrite { 22 | store.as_ref().copy(&from_, &to) 23 | } else { 24 | store.as_ref().copy_if_not_exists(&from_, &to) 25 | }; 26 | runtime.block_on(fut)?; 27 | Ok::<_, PyObjectStoreError>(()) 28 | }) 29 | } 30 | 31 | #[pyfunction] 32 | #[pyo3(signature = (store, from_, to, *, overwrite=true))] 33 | pub(crate) fn copy_async( 34 | py: Python, 35 | store: PyObjectStore, 36 | from_: String, 37 | to: String, 38 | overwrite: bool, 39 | ) -> PyResult<Bound<PyAny>> { 40 | let from_ = from_.into(); 41 | let to = to.into(); 42 | pyo3_async_runtimes::tokio::future_into_py(py, async move { 43 | let fut = if overwrite { 44 | store.as_ref().copy(&from_, &to) 45 | } else { 46 | store.as_ref().copy_if_not_exists(&from_, &to) 47 | }; 48 | fut.await.map_err(PyObjectStoreError::ObjectStoreError)?; 49 | Ok(PyNone) 50 | }) 51 | } 52 | -------------------------------------------------------------------------------- /pyo3-object_store/src/config.rs: -------------------------------------------------------------------------------- 1 | use std::time::Duration; 2 | 3 | use humantime::format_duration; 4 | use pyo3::prelude::*; 5 | 6 | /// A wrapper around `String` used to store config values.
7 | /// 8 | /// Supported Python input: 9 | /// 10 | /// - `True` and `False` (becomes `"true"` and `"false"`) 11 | /// - `timedelta` 12 | /// - `str` 13 | #[derive(Clone, Debug, PartialEq, Eq, Hash, IntoPyObject, IntoPyObjectRef)] 14 | pub struct PyConfigValue(pub String); 15 | 16 | impl PyConfigValue { 17 | pub(crate) fn new(val: impl Into<String>) -> Self { 18 | Self(val.into()) 19 | } 20 | } 21 | 22 | impl AsRef<str> for PyConfigValue { 23 | fn as_ref(&self) -> &str { 24 | &self.0 25 | } 26 | } 27 | 28 | impl<'py> FromPyObject<'_, 'py> for PyConfigValue { 29 | type Error = PyErr; 30 | 31 | fn extract(obj: Borrowed<'_, 'py, pyo3::PyAny>) -> PyResult<Self> { 32 | if let Ok(val) = obj.extract::<bool>() { 33 | Ok(val.into()) 34 | } else if let Ok(duration) = obj.extract::<Duration>() { 35 | Ok(duration.into()) 36 | } else { 37 | Ok(Self(obj.extract()?)) 38 | } 39 | } 40 | } 41 | 42 | impl From<PyConfigValue> for String { 43 | fn from(value: PyConfigValue) -> Self { 44 | value.0 45 | } 46 | } 47 | 48 | impl From<bool> for PyConfigValue { 49 | fn from(value: bool) -> Self { 50 | Self(value.to_string()) 51 | } 52 | } 53 | 54 | impl From<Duration> for PyConfigValue { 55 | fn from(value: Duration) -> Self { 56 | Self(format_duration(value).to_string()) 57 | } 58 | } 59 | -------------------------------------------------------------------------------- /obstore/src/rename.rs: -------------------------------------------------------------------------------- 1 | use object_store::ObjectStore; 2 | use pyo3::prelude::*; 3 | use pyo3_async_runtimes::tokio::get_runtime; 4 | use pyo3_object_store::{PyObjectStore, PyObjectStoreError, PyObjectStoreResult}; 5 | 6 | use crate::utils::PyNone; 7 | 8 | #[pyfunction] 9 | #[pyo3(signature = (store, from_, to, *, overwrite=true))] 10 | pub(crate) fn rename( 11 | py: Python, 12 | store: PyObjectStore, 13 | from_: String, 14 | to: String, 15 | overwrite: bool, 16 | ) -> PyObjectStoreResult<()> { 17 | let runtime = get_runtime(); 18 | let from_ = from_.into(); 19 | let to = to.into(); 20 | py.detach(|| { 21
| let fut = if overwrite { 22 | store.as_ref().rename(&from_, &to) 23 | } else { 24 | store.as_ref().rename_if_not_exists(&from_, &to) 25 | }; 26 | runtime.block_on(fut)?; 27 | Ok::<_, PyObjectStoreError>(()) 28 | }) 29 | } 30 | 31 | #[pyfunction] 32 | #[pyo3(signature = (store, from_, to, *, overwrite=true))] 33 | pub(crate) fn rename_async( 34 | py: Python, 35 | store: PyObjectStore, 36 | from_: String, 37 | to: String, 38 | overwrite: bool, 39 | ) -> PyResult<Bound<PyAny>> { 40 | let from_ = from_.into(); 41 | let to = to.into(); 42 | pyo3_async_runtimes::tokio::future_into_py(py, async move { 43 | let fut = if overwrite { 44 | store.as_ref().rename(&from_, &to) 45 | } else { 46 | store.as_ref().rename_if_not_exists(&from_, &to) 47 | }; 48 | fut.await.map_err(PyObjectStoreError::ObjectStoreError)?; 49 | Ok(PyNone) 50 | }) 51 | } 52 | -------------------------------------------------------------------------------- /examples/fastapi/main.py: -------------------------------------------------------------------------------- 1 | # ruff: noqa 2 | from fastapi import FastAPI 3 | from fastapi.responses import StreamingResponse 4 | 5 | import obstore as obs 6 | from obstore.store import HTTPStore, S3Store 7 | 8 | app = FastAPI() 9 | 10 | 11 | @app.get("/") 12 | def read_root(): 13 | return {"Hello": "World"} 14 | 15 | 16 | @app.get("/example.parquet") 17 | async def download_example() -> StreamingResponse: 18 | store = HTTPStore.from_url("https://raw.githubusercontent.com") 19 | path = "opengeospatial/geoparquet/refs/heads/main/examples/example.parquet" 20 | 21 | # Make the request. This only begins the download; it does not wait for the download 22 | # to finish. 23 | resp = await obs.get_async(store, path) 24 | 25 | # Passing `GetResult` directly to `StreamingResponse` calls `GetResult.stream()` 26 | # under the hood and thus uses the default chunking behavior of 27 | # `GetResult.stream()`.
28 | return StreamingResponse(resp) 29 | 30 | 31 | @app.get("/large.parquet") 32 | async def large_example() -> StreamingResponse: 33 | # Example large Parquet file hosted in AWS open data 34 | store = S3Store("ookla-open-data", region="us-west-2", skip_signature=True) 35 | path = "parquet/performance/type=fixed/year=2024/quarter=1/2024-01-01_performance_fixed_tiles.parquet" 36 | 37 | # Make the request 38 | # Note: for large file downloads you may need to increase the timeout in the client 39 | # configuration 40 | resp = await obs.get_async(store, path) 41 | 42 | # Example: Ensure the stream returns at least 10MB of data in each chunk. 43 | return StreamingResponse(resp.stream(min_chunk_size=10 * 1024 * 1024)) 44 | -------------------------------------------------------------------------------- /obstore/Cargo.toml: -------------------------------------------------------------------------------- 1 | [package] 2 | name = "obstore" 3 | version = "0.8.2" 4 | authors = { workspace = true } 5 | edition = { workspace = true } 6 | description = "The simplest, highest-throughput interface to Amazon S3, Google Cloud Storage, Azure Blob Storage, and S3-compliant APIs like Cloudflare R2." 
7 | readme = "README.md" 8 | repository = { workspace = true } 9 | homepage = { workspace = true } 10 | license = { workspace = true } 11 | keywords = { workspace = true } 12 | categories = { workspace = true } 13 | rust-version = { workspace = true } 14 | 15 | # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html 16 | [lib] 17 | name = "_obstore" 18 | crate-type = ["cdylib"] 19 | 20 | [dependencies] 21 | arrow = "57" 22 | bytes = { workspace = true } 23 | chrono = { workspace = true } 24 | futures = { workspace = true } 25 | http = { workspace = true } 26 | indexmap = { workspace = true } 27 | object_store = { workspace = true } 28 | pyo3 = { workspace = true, features = ["chrono"] } 29 | pyo3-arrow = "0.15" 30 | pyo3-async-runtimes = { workspace = true, features = ["tokio-runtime"] } 31 | pyo3-bytes = "0.5" 32 | pyo3-file = { workspace = true } 33 | pyo3-object_store = { path = "../pyo3-object_store" } 34 | tokio = { workspace = true, features = [ 35 | "macros", 36 | "rt", 37 | "rt-multi-thread", 38 | "sync", 39 | ] } 40 | url = { workspace = true } 41 | 42 | # We opt-in to using rustls as the TLS provider for reqwest, which is the HTTP 43 | # library used by object_store. 44 | # https://github.com/seanmonstar/reqwest/issues/2025 45 | reqwest = { version = "*", default-features = false, features = [ 46 | "rustls-tls-native-roots", 47 | ] } 48 | 49 | [build-dependencies] 50 | cargo-lock = "10.1.0" 51 | -------------------------------------------------------------------------------- /pyo3-object_store/src/path.rs: -------------------------------------------------------------------------------- 1 | use object_store::path::Path; 2 | use pyo3::exceptions::PyValueError; 3 | use pyo3::prelude::*; 4 | use pyo3::pybacked::PyBackedStr; 5 | use pyo3::types::PyString; 6 | 7 | /// A Python-facing wrapper around a [`Path`]. 
8 | #[derive(Clone, Debug, Default, PartialEq)] 9 | pub struct PyPath(Path); 10 | 11 | impl<'py> FromPyObject<'_, 'py> for PyPath { 12 | type Error = PyErr; 13 | 14 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result<Self, Self::Error> { 15 | let path = Path::parse(obj.extract::<PyBackedStr>()?) 16 | .map_err(|err| PyValueError::new_err(format!("Could not parse path: {err}")))?; 17 | Ok(Self(path)) 18 | } 19 | } 20 | 21 | impl PyPath { 22 | /// Consume self and return the underlying [`Path`]. 23 | pub fn into_inner(self) -> Path { 24 | self.0 25 | } 26 | } 27 | 28 | impl<'py> IntoPyObject<'py> for PyPath { 29 | type Target = PyString; 30 | type Output = Bound<'py, PyString>; 31 | type Error = PyErr; 32 | 33 | fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> { 34 | Ok(PyString::new(py, self.0.as_ref())) 35 | } 36 | } 37 | 38 | impl<'py> IntoPyObject<'py> for &PyPath { 39 | type Target = PyString; 40 | type Output = Bound<'py, PyString>; 41 | type Error = PyErr; 42 | 43 | fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> { 44 | Ok(PyString::new(py, self.0.as_ref())) 45 | } 46 | } 47 | 48 | impl AsRef<Path> for PyPath { 49 | fn as_ref(&self) -> &Path { 50 | &self.0 51 | } 52 | } 53 | 54 | impl From<PyPath> for Path { 55 | fn from(value: PyPath) -> Self { 56 | value.0 57 | } 58 | } 59 | 60 | impl From<Path> for PyPath { 61 | fn from(value: Path) -> Self { 62 | Self(value) 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /pyo3-object_store/src/url.rs: -------------------------------------------------------------------------------- 1 | use pyo3::exceptions::PyValueError; 2 | use pyo3::prelude::*; 3 | use pyo3::pybacked::PyBackedStr; 4 | use pyo3::types::PyString; 5 | use pyo3::FromPyObject; 6 | use url::Url; 7 | 8 | /// A wrapper around [`url::Url`] that implements [`FromPyObject`].
9 | #[derive(Debug, Clone, PartialEq)] 10 | pub struct PyUrl(Url); 11 | 12 | impl PyUrl { 13 | /// Create a new PyUrl from a [Url] 14 | pub fn new(url: Url) -> Self { 15 | Self(url) 16 | } 17 | 18 | /// Consume self and return the underlying [Url] 19 | pub fn into_inner(self) -> Url { 20 | self.0 21 | } 22 | } 23 | 24 | impl<'py> FromPyObject<'_, 'py> for PyUrl { 25 | type Error = PyErr; 26 | 27 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result<Self, Self::Error> { 28 | let s = obj.extract::<PyBackedStr>()?; 29 | let url = Url::parse(&s).map_err(|err| PyValueError::new_err(err.to_string()))?; 30 | Ok(Self(url)) 31 | } 32 | } 33 | 34 | impl<'py> IntoPyObject<'py> for PyUrl { 35 | type Target = PyString; 36 | type Output = Bound<'py, PyString>; 37 | type Error = std::convert::Infallible; 38 | 39 | fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> { 40 | Ok(PyString::new(py, self.0.as_str())) 41 | } 42 | } 43 | 44 | impl<'py> IntoPyObject<'py> for &PyUrl { 45 | type Target = PyString; 46 | type Output = Bound<'py, PyString>; 47 | type Error = std::convert::Infallible; 48 | 49 | fn into_pyobject(self, py: Python<'py>) -> Result<Self::Output, Self::Error> { 50 | Ok(PyString::new(py, self.0.as_str())) 51 | } 52 | } 53 | 54 | impl AsRef<Url> for PyUrl { 55 | fn as_ref(&self) -> &Url { 56 | &self.0 57 | } 58 | } 59 | 60 | impl From<PyUrl> for String { 61 | fn from(value: PyUrl) -> Self { 62 | value.0.into() 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /docs/examples/minio.md: -------------------------------------------------------------------------------- 1 | # Minio 2 | 3 | [MinIO](https://github.com/minio/minio) is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license. It's often used for testing or self-hosting S3-compatible storage. 5 | 6 | ## Example 7 | 8 | !!! note 9 | 10 | This example is also [available on Github](https://github.com/developmentseed/obstore/blob/main/examples/minio/README.md) if you'd like to test it out locally.
10 | 11 | We can run minio locally using docker: 12 | 13 | ```shell 14 | docker run -p 9000:9000 -p 9001:9001 \ 15 | quay.io/minio/minio server /data --console-address ":9001" 16 | ``` 17 | 18 | `obstore` isn't able to create a bucket, so we need to do that manually through the minio web UI. After running the above docker command, go to <http://localhost:9001> and log in with username `minioadmin` and password `minioadmin`. Then click "Create a Bucket" and create a bucket with the name `"test-bucket"`. 19 | 20 | Now we can create an `S3Store` to interact with minio: 21 | 22 | ```py 23 | import obstore as obs 24 | from obstore.store import S3Store 25 | 26 | store = S3Store( 27 | "test-bucket", 28 | endpoint="http://localhost:9000", 29 | access_key_id="minioadmin", 30 | secret_access_key="minioadmin", 31 | virtual_hosted_style_request=False, 32 | client_options={"allow_http": True}, 33 | ) 34 | 35 | # Add files 36 | obs.put(store, "a.txt", b"foo") 37 | obs.put(store, "b.txt", b"bar") 38 | obs.put(store, "c/d.txt", b"baz") 39 | 40 | # List files 41 | files = obs.list(store).collect() 42 | print(files) 43 | 44 | # Download a file 45 | resp = obs.get(store, "a.txt") 46 | print(resp.bytes()) 47 | 48 | # Delete a file 49 | obs.delete(store, "a.txt") 50 | ``` 51 | 52 | There's a [full example](https://github.com/developmentseed/obstore/tree/main/examples/minio) in the obstore repository.
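If you'd rather script the bucket creation than click through the web UI, any S3 client can create it. Here's a sketch using boto3, assuming the same local endpoint and default `minioadmin` credentials used above (and a running MinIO server):

```py
import boto3

# Point boto3 at the local MinIO server instead of AWS
client = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

# Create the bucket that the obstore example reads from and writes to
client.create_bucket(Bucket="test-bucket")
```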
53 | -------------------------------------------------------------------------------- /examples/progress-bar/main.py: -------------------------------------------------------------------------------- 1 | # ruff: noqa 2 | import asyncio 3 | import sys 4 | from urllib.parse import urlsplit 5 | 6 | from tqdm import tqdm 7 | 8 | import obstore as obs 9 | from obstore.store import HTTPStore 10 | 11 | # https://registry.opendata.aws/speedtest-global-performance/ 12 | DEFAULT_URL = "https://ookla-open-data.s3.us-west-2.amazonaws.com/parquet/performance/type=fixed/year=2019/quarter=1/2019-01-01_performance_fixed_tiles.parquet" 13 | 14 | 15 | def sync_download_progress_bar(url: str): 16 | store, path = parse_url(url) 17 | resp = obs.get(store, path) 18 | file_size = resp.meta["size"] 19 | with tqdm(total=file_size) as pbar: 20 | for bytes_chunk in resp: 21 | # Do something with buffer 22 | pbar.update(len(bytes_chunk)) 23 | 24 | 25 | async def async_download_progress_bar(url: str): 26 | store, path = parse_url(url) 27 | resp = await obs.get_async(store, path) 28 | file_size = resp.meta["size"] 29 | with tqdm(total=file_size) as pbar: 30 | async for bytes_chunk in resp: 31 | # Do something with buffer 32 | pbar.update(len(bytes_chunk)) 33 | 34 | 35 | def parse_url(url: str) -> tuple[HTTPStore, str]: 36 | parsed = urlsplit(url) 37 | if parsed.query or parsed.fragment: 38 | raise ValueError("Invalid URL: query or fragment not supported in HTTPStore") 39 | 40 | base = f"{parsed.scheme}://{parsed.netloc}" 41 | store = HTTPStore.from_url(base) 42 | return store, parsed.path.lstrip("/") 43 | 44 | 45 | def main(): 46 | if len(sys.argv) >= 2: 47 | url = sys.argv[1] 48 | else: 49 | url = DEFAULT_URL 50 | 51 | print("Synchronous download:") 52 | sync_download_progress_bar(url) 53 | print("Asynchronous download:") 54 | asyncio.run(async_download_progress_bar(url)) 55 | 56 | 57 | if __name__ == "__main__": 58 | main() 59 | 
-------------------------------------------------------------------------------- /obstore/python/obstore/exceptions/__init__.pyi: -------------------------------------------------------------------------------- 1 | # Note: This should be able to be an `exceptions.pyi` file one level above, however 2 | # pylance isn't able to find that. So this is an exceptions module with only 3 | # `__init__.pyi` to work around pylance's bug. 4 | 5 | import sys 6 | 7 | if sys.version_info >= (3, 13): 8 | from warnings import deprecated 9 | else: 10 | from typing_extensions import deprecated 11 | 12 | class BaseError(Exception): 13 | """The base exception class. 14 | 15 | !!! note 16 | Some operations also raise a built-in `ValueError` or `FileNotFoundError`. 17 | """ 18 | 19 | class GenericError(BaseError): 20 | """A fallback error type when no variant matches.""" 21 | 22 | @deprecated("builtins.FileNotFoundError is emitted instead.") 23 | class NotFoundError(BaseError): 24 | """Error when the object is not found at given location.""" 25 | 26 | class InvalidPathError(BaseError): 27 | """Error for invalid path.""" 28 | 29 | class JoinError(BaseError): 30 | """Error when `tokio::spawn` failed.""" 31 | 32 | class NotSupportedError(BaseError): 33 | """Error when the attempted operation is not supported.""" 34 | 35 | class AlreadyExistsError(BaseError): 36 | """Error when the object already exists.""" 37 | 38 | class PreconditionError(BaseError): 39 | """Error when the required conditions failed for the operation.""" 40 | 41 | class NotModifiedError(BaseError): 42 | """Error when the object at the location isn't modified.""" 43 | 44 | class PermissionDeniedError(BaseError): 45 | """Permission denied. 46 | 47 | Error when the used credentials don't have enough permission to perform the 48 | requested operation. 
49 | """ 50 | 51 | class UnauthenticatedError(BaseError): 52 | """Error when the used credentials lack valid authentication.""" 53 | 54 | class UnknownConfigurationKeyError(BaseError): 55 | """Error when a configuration key is invalid for the store used.""" 56 | -------------------------------------------------------------------------------- /docs/examples/r2.md: -------------------------------------------------------------------------------- 1 | # Cloudflare R2 2 | 3 | [Cloudflare R2](https://www.cloudflare.com/developer-platform/products/r2/) is Cloudflare's object storage solution, designed to be compatible with the S3 API. Some developers may choose to use Cloudflare R2 because it has no egress fees. 4 | 5 | It's easy to read and write data to and from Cloudflare R2 with obstore's [`S3Store`][obstore.store.S3Store] with three steps: 6 | 7 | 1. [Create an API token](https://dash.cloudflare.com/?to=/:account/r2/api-tokens) with read or read/write access to one or more buckets. 8 | 9 | Copy the `Access Key ID` and `Secret Access Key` to use in your code. 10 | 11 | ![](../assets/cloudflare-r2-credentials.jpg) 12 | 13 | 2. On the general settings of a bucket, take note of the `S3 API` URL. In my case it's `https://f0b62eebfbdde1133378bfe3958325f6.r2.cloudflarestorage.com/kylebarron-public`. 14 | 15 | ![](../assets/cloudflare-r2-bucket-info.png) 16 | 17 | 3. Pass this information to [`S3Store.from_url`][obstore.store.S3Store.from_url]: 18 | 19 | ```py 20 | from obstore.store import S3Store 21 | 22 | access_key_id = "..." 23 | secret_access_key = "..." 
24 | store = S3Store.from_url( 25 | "https://f0b62eebfbdde1133378bfe3958325f6.r2.cloudflarestorage.com/kylebarron-public", 26 | access_key_id=access_key_id, 27 | secret_access_key=secret_access_key, 28 | ) 29 | store.list_with_delimiter() 30 | ``` 31 | 32 | Or you can construct a store manually with the `endpoint` and `bucket` parameters: 33 | 34 | ```py 35 | from obstore.store import S3Store 36 | 37 | access_key_id = "..." 38 | secret_access_key = "..." 39 | bucket = "kylebarron-public" 40 | endpoint = "https://f0b62eebfbdde1133378bfe3958325f6.r2.cloudflarestorage.com" 41 | store = S3Store( 42 | bucket, 43 | access_key_id=access_key_id, 44 | secret_access_key=secret_access_key, 45 | endpoint=endpoint, 46 | ) 47 | store.list_with_delimiter() 48 | ``` 49 | -------------------------------------------------------------------------------- /examples/stream-zip/main.py: -------------------------------------------------------------------------------- 1 | """Example for using stream-zip with obstore.""" 2 | 3 | from __future__ import annotations 4 | 5 | import asyncio 6 | from pathlib import Path 7 | from stat import S_IFREG 8 | from typing import TYPE_CHECKING 9 | 10 | import stream_zip 11 | from stream_zip import ZIP_32, AsyncMemberFile 12 | 13 | from obstore.store import LocalStore, MemoryStore 14 | 15 | if TYPE_CHECKING: 16 | from collections.abc import AsyncIterable, Iterable 17 | 18 | from obstore.store import ObjectStore 19 | 20 | 21 | async def member_file(store: ObjectStore, path: str) -> AsyncMemberFile: 22 | """Create a member file for the zip archive.""" 23 | resp = await store.get_async(path) 24 | last_modified = resp.meta["last_modified"] 25 | mode = S_IFREG | 0o644 26 | # Unclear why but we need to wrap the response in an async generator 27 | return (path, last_modified, mode, ZIP_32, (byte async for byte in resp.stream())) 28 | 29 | 30 | async def member_files( 31 | store: ObjectStore, 32 | paths: Iterable[str], 33 | ) -> AsyncIterable[AsyncMemberFile]:
34 | """Create an async iterable of files for the zip archive.""" 35 | for path in paths: 36 | yield await member_file(store, path) 37 | 38 | 39 | async def zip_copy() -> None: 40 | """Copy files from one store into a zip archive that we upload to another store.""" 41 | # Input store with source data 42 | input_store = MemoryStore() 43 | input_store.put("foo", b"hello") 44 | input_store.put("bar", b"world") 45 | 46 | # Output store where the zip file will be saved 47 | output_store = LocalStore(Path()) 48 | 49 | # We can pass the streaming zip directly to `put` 50 | await output_store.put_async( 51 | "my.zip", 52 | stream_zip.async_stream_zip( 53 | member_files(input_store, ["foo", "bar"]), 54 | chunk_size=10 * 1024 * 1024, 55 | ), 56 | ) 57 | 58 | 59 | def main() -> None: 60 | """Run the zip copy example.""" 61 | asyncio.run(zip_copy()) 62 | 63 | 64 | if __name__ == "__main__": 65 | main() 66 | -------------------------------------------------------------------------------- /obstore/src/delete.rs: -------------------------------------------------------------------------------- 1 | use futures::{StreamExt, TryStreamExt}; 2 | use pyo3::prelude::*; 3 | use pyo3_async_runtimes::tokio::get_runtime; 4 | use pyo3_object_store::{PyObjectStore, PyObjectStoreError, PyObjectStoreResult}; 5 | 6 | use crate::path::PyPaths; 7 | use crate::utils::PyNone; 8 | 9 | #[pyfunction] 10 | pub(crate) fn delete(py: Python, store: PyObjectStore, paths: PyPaths) -> PyObjectStoreResult<()> { 11 | let runtime = get_runtime(); 12 | let store = store.into_inner(); 13 | py.detach(|| { 14 | match paths { 15 | PyPaths::One(path) => { 16 | runtime.block_on(store.delete(&path))?; 17 | } 18 | PyPaths::Many(paths) => { 19 | // TODO: add option to allow some errors here? 
20 | let stream = 21 | store.delete_stream(futures::stream::iter(paths.into_iter().map(Ok)).boxed()); 22 | runtime.block_on(stream.try_collect::<Vec<_>>())?; 23 | } 24 | }; 25 | Ok::<_, PyObjectStoreError>(()) 26 | }) 27 | } 28 | 29 | #[pyfunction] 30 | pub(crate) fn delete_async( 31 | py: Python, 32 | store: PyObjectStore, 33 | paths: PyPaths, 34 | ) -> PyResult<Bound<PyAny>> { 35 | let store = store.into_inner(); 36 | pyo3_async_runtimes::tokio::future_into_py(py, async move { 37 | match paths { 38 | PyPaths::One(path) => { 39 | store 40 | .delete(&path) 41 | .await 42 | .map_err(PyObjectStoreError::ObjectStoreError)?; 43 | } 44 | PyPaths::Many(paths) => { 45 | // TODO: add option to allow some errors here? 46 | let stream = 47 | store.delete_stream(futures::stream::iter(paths.into_iter().map(Ok)).boxed()); 48 | stream 49 | .try_collect::<Vec<_>>() 50 | .await 51 | .map_err(PyObjectStoreError::ObjectStoreError)?; 52 | } 53 | } 54 | Ok(PyNone) 55 | }) 56 | } 57 | -------------------------------------------------------------------------------- /obstore/python/obstore/_store/_http.pyi: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from ._client import ClientConfig 4 | from ._retry import RetryConfig 5 | 6 | if sys.version_info >= (3, 11): 7 | from typing import Self 8 | else: 9 | from typing_extensions import Self 10 | 11 | class HTTPStore: 12 | """Configure a connection to a generic HTTP server.""" 13 | 14 | def __init__( 15 | self, 16 | url: str, 17 | *, 18 | client_options: ClientConfig | None = None, 19 | retry_config: RetryConfig | None = None, 20 | ) -> None: 21 | """Construct a new HTTPStore from a URL. 22 | 23 | Any path on the URL will be assigned as the `prefix` for the store. So if you 24 | pass `https://example.com/path/to/directory`, the store will be created with a 25 | prefix of `path/to/directory`, and all further operations will use paths 26 | relative to that prefix.
27 | 28 | Args: 29 | url: The base URL to use for the store. 30 | 31 | Keyword Args: 32 | client_options: HTTP Client options. Defaults to None. 33 | retry_config: Retry configuration. Defaults to None. 34 | 35 | Returns: 36 | HTTPStore 37 | 38 | """ 39 | 40 | @classmethod 41 | def from_url( 42 | cls, 43 | url: str, 44 | *, 45 | client_options: ClientConfig | None = None, 46 | retry_config: RetryConfig | None = None, 47 | ) -> Self: 48 | """Construct a new HTTPStore from a URL. 49 | 50 | This is an alias of [`HTTPStore.__init__`][obstore.store.HTTPStore.__init__]. 51 | """ 52 | 53 | def __eq__(self, value: object) -> bool: ... 54 | def __getnewargs_ex__(self): ... 55 | @property 56 | def url(self) -> str: 57 | """Get the base url of this store.""" 58 | @property 59 | def client_options(self) -> ClientConfig | None: 60 | """Get the store's client configuration.""" 61 | @property 62 | def retry_config(self) -> RetryConfig | None: 63 | """Get the store's retry configuration.""" 64 | -------------------------------------------------------------------------------- /tests/store/test_from_url.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | from pathlib import Path 4 | from typing import TYPE_CHECKING 5 | 6 | import pytest 7 | 8 | from obstore.exceptions import BaseError, UnknownConfigurationKeyError 9 | from obstore.store import from_url 10 | 11 | if TYPE_CHECKING: 12 | from obstore.store import S3Credential 13 | 14 | 15 | def test_local(): 16 | cwd = Path().absolute() 17 | url = f"file://{cwd}" 18 | _store = from_url(url) 19 | 20 | 21 | def test_memory(): 22 | url = "memory:///" 23 | _store = from_url(url) 24 | 25 | with pytest.raises(BaseError): 26 | from_url(url, access_key_id="test") 27 | 28 | 29 | def test_s3_params(): 30 | from_url( 31 | "s3://bucket/path", 32 | access_key_id="access_key_id", 33 | secret_access_key="secret_access_key", # noqa: S106 34 | ) 35 | 36 | with 
pytest.raises(UnknownConfigurationKeyError): 37 | from_url("s3://bucket/path", tenant_id="") 38 | 39 | 40 | def test_gcs_params(): 41 | # Just to test the params. In practice, the bucket shouldn't be passed 42 | # Note: we can't pass the bucket name here as a kwarg because it would conflict with 43 | # the bucket name in the URL. 44 | from_url("gs://test.example.com/path") 45 | 46 | with pytest.raises(UnknownConfigurationKeyError): 47 | from_url("gs://test.example.com/path", tenant_id="") 48 | 49 | 50 | def test_azure_params(): 51 | url = "abfs://container@account.dfs.core.windows.net/path" 52 | from_url(url, skip_signature=True) 53 | 54 | with pytest.raises(UnknownConfigurationKeyError): 55 | from_url(url, bucket="test") 56 | 57 | 58 | def test_http(): 59 | url = "https://mydomain/path" 60 | from_url(url) 61 | 62 | with pytest.raises(BaseError): 63 | from_url(url, bucket="test") 64 | 65 | 66 | def test_credential_provider_to_http_store_raises(): 67 | def s3_credential_provider() -> S3Credential: 68 | return {"access_key_id": "", "secret_access_key": "", "expires_at": None} 69 | 70 | with pytest.raises(BaseError): 71 | from_url("http://mydomain/path", credential_provider=s3_credential_provider) 72 | -------------------------------------------------------------------------------- /docs/examples/pyarrow.md: -------------------------------------------------------------------------------- 1 | # PyArrow 2 | 3 | [PyArrow](https://arrow.apache.org/docs/python/index.html) is the canonical Python implementation for the Apache Arrow project. 4 | 5 | PyArrow also supports reading and writing various file formats, including Parquet, CSV, JSON, and Arrow IPC. 6 | 7 | PyArrow integration is supported [via its fsspec integration](https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow), since Obstore [exposes an fsspec-compatible API](../integrations/fsspec.md). 
8 | 
9 | ```py
10 | import pyarrow.parquet as pq
11 | 
12 | from obstore.fsspec import FsspecStore
13 | 
14 | fs = FsspecStore("s3", skip_signature=True, region="us-west-2")
15 | 
16 | url = "s3://overturemaps-us-west-2/release/2025-02-19.0/theme=addresses/type=address/part-00010-e084a2d7-fea9-41e5-a56f-e638a3307547-c000.zstd.parquet"
17 | parquet_file = pq.ParquetFile(url, filesystem=fs)
18 | print(parquet_file.schema_arrow)
19 | ```
20 | prints:
21 | ```
22 | id: string
23 | geometry: binary
24 | bbox: struct<xmin: float, xmax: float, ymin: float, ymax: float> not null
25 |   child 0, xmin: float
26 |   child 1, xmax: float
27 |   child 2, ymin: float
28 |   child 3, ymax: float
29 | country: string
30 | postcode: string
31 | street: string
32 | number: string
33 | unit: string
34 | address_levels: list<element: struct<value: string>>
35 |   child 0, element: struct<value: string>
36 |       child 0, value: string
37 | postal_city: string
38 | version: int32 not null
39 | sources: list<element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>>
40 |   child 0, element: struct<property: string, dataset: string, record_id: string, update_time: string, confidence: double>
41 |       child 0, property: string
42 |       child 1, dataset: string
43 |       child 2, record_id: string
44 |       child 3, update_time: string
45 |       child 4, confidence: double
46 | -- schema metadata --
47 | geo: '{"version":"1.1.0","primary_column":"geometry","columns":{"geometry' + 230
48 | org.apache.spark.legacyINT96: ''
49 | org.apache.spark.version: '3.4.1'
50 | org.apache.spark.sql.parquet.row.metadata: '{"type":"struct","fields":[{"' + 1586
51 | org.apache.spark.legacyDateTime: ''
52 | ```
53 | 
-------------------------------------------------------------------------------- /docs/examples/zarr.md: --------------------------------------------------------------------------------
1 | # Zarr
2 | 
3 | [Zarr-Python](https://zarr.readthedocs.io/en/stable/index.html) is a Python library for reading and writing the [Zarr file format](https://zarr.dev/) for N-dimensional arrays. Zarr-Python is often used in conjunction with [Xarray](https://xarray.dev/).
4 | 
5 | Zarr datasets are often very large and thus stored in object storage for cost effectiveness.
As of Zarr-Python version 3.0.7, you can [use Obstore as a backend](https://zarr.readthedocs.io/en/stable/user-guide/storage.html#object-store) for Zarr-Python. For large queries this [can be significantly faster](https://github.com/maxrjones/zarr-obstore-performance) than the default fsspec-based backend.
6 | 
7 | ## Example
8 | 
9 | !!! note
10 | 
11 |     This example is also [available on Github](https://github.com/developmentseed/obstore/blob/main/examples/zarr/README.md) if you'd like to test it out locally.
12 | 
13 | ```py
14 | import matplotlib.pyplot as plt
15 | import pystac_client
16 | import xarray as xr
17 | from zarr.storage import ObjectStore
18 | 
19 | from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
20 | from obstore.store import AzureStore
21 | 
22 | # These first lines are specific to Zarr stored in the Microsoft Planetary
23 | # Computer. We use pystac-client to find the metadata for this specific Zarr
24 | # store.
25 | catalog = pystac_client.Client.open(
26 |     "https://planetarycomputer.microsoft.com/api/stac/v1/",
27 | )
28 | collection = catalog.get_collection("daymet-daily-hi")
29 | asset = collection.assets["zarr-abfs"]
30 | 
31 | # We construct an AzureStore because this Zarr dataset is stored in Azure
32 | # storage.
33 | azure_store = AzureStore(
34 |     credential_provider=PlanetaryComputerCredentialProvider.from_asset(asset),
35 | )
36 | 
37 | # Next we use the Zarr ObjectStore adapter and pass it to xarray.
38 | zarr_store = ObjectStore(azure_store, read_only=True) 39 | ds = xr.open_dataset(zarr_store, consolidated=True, engine="zarr") 40 | 41 | # And plot with matplotlib 42 | fig, ax = plt.subplots(figsize=(12, 12)) 43 | ds.sel(time="2009")["tmax"].mean(dim="time").plot.imshow(ax=ax, cmap="inferno") 44 | fig.savefig("zarr-example.png") 45 | ``` 46 | 47 | This plots: 48 | 49 | ![](../assets/zarr-example.png) 50 | -------------------------------------------------------------------------------- /obstore/python/obstore/auth/_http.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | from typing import TYPE_CHECKING 4 | from warnings import warn 5 | 6 | if TYPE_CHECKING: 7 | import aiohttp 8 | import aiohttp_retry 9 | import requests 10 | 11 | 12 | def default_requests_session() -> requests.Session: 13 | import requests 14 | import requests.adapters 15 | import urllib3 16 | import urllib3.util.retry 17 | 18 | # retry_total: The number of allowable retry attempts for REST API calls. 19 | # Use retry_total=0 to disable retries. A backoff factor to apply 20 | # between attempts. 21 | # retry_backoff_factor: A backoff factor to apply between attempts 22 | # after the second try (most errors are resolved immediately by a second 23 | # try without a delay). Retry policy will sleep for: 24 | 25 | # ``{backoff factor} * (2 ** ({number of total retries} - 1))`` seconds. 26 | # If the backoff_factor is 0.1, then the retry will sleep for 27 | # [0.0s, 0.2s, 0.4s, ...] between retries. The default value is 0.8. 
28 | retry_total = 10 29 | retry_backoff_factor = 0.8 30 | 31 | session = requests.Session() 32 | retry = urllib3.util.retry.Retry( 33 | total=retry_total, 34 | backoff_factor=retry_backoff_factor, 35 | status_forcelist=[429, 500, 502, 503, 504], 36 | ) 37 | 38 | adapter = requests.adapters.HTTPAdapter(max_retries=retry) 39 | session.mount("http://", adapter) 40 | session.mount("https://", adapter) 41 | 42 | return session 43 | 44 | 45 | def default_aiohttp_session() -> aiohttp_retry.RetryClient | aiohttp.ClientSession: 46 | try: 47 | from aiohttp_retry import ExponentialRetry, RetryClient 48 | 49 | return RetryClient( 50 | raise_for_status=False, 51 | retry_options=ExponentialRetry(attempts=1), 52 | ) 53 | except ImportError: 54 | from aiohttp import ClientSession 55 | 56 | # Put this after validating that we can import aiohttp 57 | warn( 58 | "aiohttp_retry not installed and custom aiohttp session not provided. " 59 | "Authentication will not be retried.", 60 | RuntimeWarning, 61 | stacklevel=3, 62 | ) 63 | 64 | return ClientSession() 65 | -------------------------------------------------------------------------------- /tests/obspec/test-store.yml: -------------------------------------------------------------------------------- 1 | # yaml-language-server: $schema=https://raw.githubusercontent.com/typeddjango/pytest-mypy-plugins/master/pytest_mypy_plugins/schema.json 2 | - case: assignable_to_sync_store 3 | parametrized: 4 | - store: AzureStore 5 | - store: GCSStore 6 | - store: LocalStore 7 | - store: MemoryStore 8 | - store: S3Store 9 | main: | 10 | from typing import Protocol 11 | 12 | from obspec import ( 13 | Copy, 14 | Delete, 15 | Get, 16 | GetRange, 17 | GetRanges, 18 | Head, 19 | List, 20 | ListWithDelimiter, 21 | Put, 22 | Rename, 23 | ) 24 | from typing_extensions import assert_type 25 | 26 | from obstore.store import {{ store }} 27 | 28 | class SyncStore( 29 | Copy, 30 | Delete, 31 | Get, 32 | GetRange, 33 | GetRanges, 34 | Head, 35 | List, 36 | 
ListWithDelimiter, 37 | Put, 38 | Rename, 39 | Protocol, 40 | ): ... 41 | 42 | def accepts_sync_store(store: SyncStore) -> None: 43 | assert_type(store, SyncStore) 44 | 45 | def assignable_to_sync_store(store: {{ store }}) -> None: 46 | accepts_sync_store(store) 47 | 48 | - case: assignable_to_async_store 49 | parametrized: 50 | - store: AzureStore 51 | - store: GCSStore 52 | - store: LocalStore 53 | - store: MemoryStore 54 | - store: S3Store 55 | main: | 56 | from typing import Protocol 57 | 58 | from obspec import ( 59 | CopyAsync, 60 | DeleteAsync, 61 | GetAsync, 62 | GetRangeAsync, 63 | GetRangesAsync, 64 | HeadAsync, 65 | ListAsync, 66 | ListWithDelimiterAsync, 67 | PutAsync, 68 | RenameAsync, 69 | ) 70 | from typing_extensions import assert_type 71 | 72 | from obstore.store import {{ store }} 73 | 74 | class AsyncStore( 75 | CopyAsync, 76 | DeleteAsync, 77 | GetAsync, 78 | GetRangeAsync, 79 | GetRangesAsync, 80 | HeadAsync, 81 | ListAsync, 82 | ListWithDelimiterAsync, 83 | PutAsync, 84 | RenameAsync, 85 | Protocol, 86 | ): ... 
87 | 88 | def accepts_async_store(store: AsyncStore) -> None: 89 | assert_type(store, AsyncStore) 90 | 91 | def assignable_to_async_store(store: {{ store }}) -> None: 92 | accepts_async_store(store) 93 | -------------------------------------------------------------------------------- /tests/test_delete.py: -------------------------------------------------------------------------------- 1 | from tempfile import TemporaryDirectory 2 | 3 | import pytest 4 | 5 | import obstore as obs 6 | from obstore.store import LocalStore, MemoryStore 7 | 8 | 9 | def test_delete_one(): 10 | store = MemoryStore() 11 | 12 | store.put("file1.txt", b"foo") 13 | store.put("file2.txt", b"bar") 14 | store.put("file3.txt", b"baz") 15 | 16 | assert len(store.list().collect()) == 3 17 | store.delete("file1.txt") 18 | store.delete("file2.txt") 19 | store.delete("file3.txt") 20 | assert len(store.list().collect()) == 0 21 | 22 | 23 | @pytest.mark.asyncio 24 | async def test_delete_async(): 25 | store = MemoryStore() 26 | 27 | await obs.put_async(store, "file1.txt", b"foo") 28 | result = await obs.delete_async(store, "file1.txt") 29 | assert result is None 30 | 31 | 32 | def test_delete_many(): 33 | store = MemoryStore() 34 | 35 | store.put("file1.txt", b"foo") 36 | store.put("file2.txt", b"bar") 37 | store.put("file3.txt", b"baz") 38 | 39 | assert len(store.list().collect()) == 3 40 | obs.delete( 41 | store, 42 | ["file1.txt", "file2.txt", "file3.txt"], 43 | ) 44 | assert len(store.list().collect()) == 0 45 | 46 | 47 | # Local filesystem errors if the file does not exist. 
48 | def test_delete_one_local_fs(): 49 | with TemporaryDirectory() as tmpdir: 50 | store = LocalStore(tmpdir) 51 | 52 | store.put("file1.txt", b"foo") 53 | store.put("file2.txt", b"bar") 54 | store.put("file3.txt", b"baz") 55 | 56 | assert len(store.list().collect()) == 3 57 | obs.delete(store, "file1.txt") 58 | obs.delete(store, "file2.txt") 59 | obs.delete(store, "file3.txt") 60 | assert len(store.list().collect()) == 0 61 | 62 | with pytest.raises(FileNotFoundError): 63 | obs.delete(store, "file1.txt") 64 | 65 | 66 | def test_delete_many_local_fs(): 67 | with TemporaryDirectory() as tmpdir: 68 | store = LocalStore(tmpdir) 69 | 70 | store.put("file1.txt", b"foo") 71 | store.put("file2.txt", b"bar") 72 | store.put("file3.txt", b"baz") 73 | 74 | assert len(store.list().collect()) == 3 75 | obs.delete( 76 | store, 77 | ["file1.txt", "file2.txt", "file3.txt"], 78 | ) 79 | 80 | with pytest.raises(FileNotFoundError): 81 | obs.delete( 82 | store, 83 | ["file1.txt", "file2.txt", "file3.txt"], 84 | ) 85 | -------------------------------------------------------------------------------- /.github/workflows/docs.yml: -------------------------------------------------------------------------------- 1 | name: Publish Python docs 2 | 3 | # Only run on new tags starting with `py-v` 4 | on: 5 | push: 6 | tags: 7 | - "py-v*" 8 | workflow_dispatch: 9 | 10 | # https://stackoverflow.com/a/77412363 11 | permissions: 12 | contents: write 13 | pages: write 14 | 15 | jobs: 16 | build: 17 | name: Deploy Python docs 18 | runs-on: ubuntu-latest 19 | # Used for configuring social plugin in mkdocs.yml 20 | # Unclear if this is always set in github actions 21 | env: 22 | CI: "TRUE" 23 | steps: 24 | - uses: actions/checkout@v4 25 | # We need to additionally fetch the gh-pages branch for mike deploy 26 | with: 27 | fetch-depth: 0 28 | 29 | - name: Install Rust 30 | uses: dtolnay/rust-toolchain@stable 31 | 32 | - uses: Swatinem/rust-cache@v2 33 | 34 | - name: Install a specific version of uv 35 | 
uses: astral-sh/setup-uv@v5 36 | with: 37 | enable-cache: true 38 | version: "0.5.x" 39 | 40 | - name: Set up Python 3.11 41 | run: uv python install 3.11 42 | 43 | - name: Install dependencies 44 | run: uv sync 45 | 46 | - name: Build python packages 47 | run: | 48 | uv run maturin develop -m obstore/Cargo.toml 49 | 50 | - name: Deploy docs 51 | env: 52 | GIT_COMMITTER_NAME: CI 53 | GIT_COMMITTER_EMAIL: ci-bot@example.com 54 | run: | 55 | # Get most recent git tag 56 | # https://stackoverflow.com/a/7261049 57 | # https://stackoverflow.com/a/3867811 58 | # We don't use {{github.ref_name}} because if triggered manually, it 59 | # will be a branch name instead of a tag version. 60 | # Then remove `py-` from the tag 61 | VERSION=$(git describe --tags --match="py-*" --abbrev=0 | cut -c 4-) 62 | 63 | # Only push publish docs as latest version if no letters in git tag 64 | # after the first character 65 | # (usually the git tag will have v as the first character) 66 | # Note the `cut` index is 1-ordered 67 | if echo $VERSION | cut -c 2- | grep -q "[A-Za-z]"; then 68 | echo "Is beta version" 69 | # For beta versions publish but don't set as latest 70 | uv run mike deploy $VERSION --update-aliases --push 71 | else 72 | echo "Is NOT beta version" 73 | uv run mike deploy $VERSION latest --update-aliases --push 74 | fi 75 | -------------------------------------------------------------------------------- /.github/workflows/test-python.yml: -------------------------------------------------------------------------------- 1 | name: Python 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | pull_request: 8 | 9 | permissions: 10 | contents: read 11 | 12 | concurrency: 13 | group: ${{ github.workflow }}-${{ github.ref }} 14 | cancel-in-progress: true 15 | 16 | jobs: 17 | pre-commit: 18 | name: Run pre-commit on Python code 19 | runs-on: ubuntu-latest 20 | steps: 21 | - uses: actions/checkout@v4 22 | 23 | - uses: actions/setup-python@v5 24 | with: 25 | python-version: "3.11" 26 | 27 | 
# Use ruff-action so we get annotations in the Github UI 28 | - uses: astral-sh/ruff-action@v3 29 | 30 | - name: Cache pre-commit virtualenvs 31 | uses: actions/cache@v4 32 | with: 33 | path: ~/.cache/pre-commit 34 | key: pre-commit-3|${{ hashFiles('.pre-commit-config.yaml') }} 35 | 36 | - name: run pre-commit 37 | run: | 38 | python -m pip install pre-commit 39 | pre-commit run --all-files 40 | 41 | test-python: 42 | name: Build and test Python 43 | runs-on: ubuntu-latest 44 | strategy: 45 | fail-fast: true 46 | matrix: 47 | python-version: ["3.9", "3.10", "3.11", "3.12"] 48 | steps: 49 | - uses: actions/checkout@v4 50 | 51 | - name: Install Rust 52 | uses: dtolnay/rust-toolchain@stable 53 | 54 | - uses: Swatinem/rust-cache@v2 55 | 56 | - name: Install uv 57 | uses: astral-sh/setup-uv@v5 58 | with: 59 | enable-cache: true 60 | version: "0.5.x" 61 | 62 | - name: Set up Python 63 | run: uv python install ${{ matrix.python-version }} 64 | 65 | - name: Build rust submodules 66 | run: | 67 | uv run maturin develop -m obstore/Cargo.toml 68 | 69 | - name: Run python tests 70 | run: | 71 | uv run pytest 72 | 73 | # Ensure docs build without warnings 74 | - name: Check docs 75 | if: "${{ matrix.python-version == 3.11 }}" 76 | run: uv run mkdocs build --strict 77 | 78 | - name: Add venv to PATH (for pyright action) 79 | run: echo "$PWD/.venv/bin" >> $GITHUB_PATH 80 | 81 | - name: Run pyright 82 | uses: jakebailey/pyright-action@v2.3.3 83 | with: 84 | # Restore pylance-version: latest-release 85 | # once it uses >1.1.405 86 | # https://github.com/microsoft/pyright/issues/10906 87 | version: 1.1.406 88 | # pylance-version: latest-release 89 | -------------------------------------------------------------------------------- /obstore/python/obstore/_obstore.pyi: -------------------------------------------------------------------------------- 1 | from . 
import _store 2 | from ._attributes import Attribute, Attributes 3 | from ._buffered import ( 4 | AsyncReadableFile, 5 | AsyncWritableFile, 6 | ReadableFile, 7 | WritableFile, 8 | open_reader, 9 | open_reader_async, 10 | open_writer, 11 | open_writer_async, 12 | ) 13 | from ._bytes import Bytes 14 | from ._copy import copy, copy_async 15 | from ._delete import delete, delete_async 16 | from ._get import ( 17 | BytesStream, 18 | GetOptions, 19 | GetResult, 20 | OffsetRange, 21 | SuffixRange, 22 | get, 23 | get_async, 24 | get_range, 25 | get_range_async, 26 | get_ranges, 27 | get_ranges_async, 28 | ) 29 | from ._head import head, head_async 30 | from ._list import ( 31 | ListChunkType, 32 | ListResult, 33 | ListStream, 34 | ObjectMeta, 35 | list, # noqa: A004 36 | list_with_delimiter, 37 | list_with_delimiter_async, 38 | ) 39 | from ._put import PutMode, PutResult, UpdateVersion, put, put_async 40 | from ._rename import rename, rename_async 41 | from ._scheme import parse_scheme 42 | from ._sign import HTTP_METHOD, SignCapableStore, sign, sign_async 43 | 44 | __version__: str 45 | _object_store_version: str 46 | _object_store_source: str 47 | 48 | __all__ = [ 49 | "HTTP_METHOD", 50 | "AsyncReadableFile", 51 | "AsyncWritableFile", 52 | "Attribute", 53 | "Attributes", 54 | "Bytes", 55 | "BytesStream", 56 | "GetOptions", 57 | "GetResult", 58 | "ListChunkType", 59 | "ListResult", 60 | "ListStream", 61 | "ObjectMeta", 62 | "OffsetRange", 63 | "PutMode", 64 | "PutResult", 65 | "ReadableFile", 66 | "SignCapableStore", 67 | "SuffixRange", 68 | "UpdateVersion", 69 | "WritableFile", 70 | "__version__", 71 | "_object_store_source", 72 | "_object_store_version", 73 | "_store", 74 | "copy", 75 | "copy_async", 76 | "delete", 77 | "delete_async", 78 | "get", 79 | "get_async", 80 | "get_range", 81 | "get_range_async", 82 | "get_ranges", 83 | "get_ranges_async", 84 | "head", 85 | "head_async", 86 | "list", 87 | "list_with_delimiter", 88 | "list_with_delimiter_async", 89 | 
"open_reader", 90 | "open_reader_async", 91 | "open_writer", 92 | "open_writer_async", 93 | "parse_scheme", 94 | "put", 95 | "put_async", 96 | "rename", 97 | "rename_async", 98 | "sign", 99 | "sign_async", 100 | ] 101 | -------------------------------------------------------------------------------- /obstore/python/obstore/_attributes.pyi: -------------------------------------------------------------------------------- 1 | import sys 2 | from typing import Literal 3 | 4 | if sys.version_info >= (3, 10): 5 | from typing import TypeAlias 6 | else: 7 | from typing_extensions import TypeAlias 8 | 9 | Attribute: TypeAlias = ( 10 | Literal[ 11 | "Content-Disposition", 12 | "Content-Encoding", 13 | "Content-Language", 14 | "Content-Type", 15 | "Cache-Control", 16 | ] 17 | | str 18 | ) 19 | """Additional object attribute types. 20 | 21 | - `"Content-Disposition"`: Specifies how the object should be handled by a browser. 22 | 23 | See [Content-Disposition](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition). 24 | 25 | - `"Content-Encoding"`: Specifies the encodings applied to the object. 26 | 27 | See [Content-Encoding](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding). 28 | 29 | - `"Content-Language"`: Specifies the language of the object. 30 | 31 | See [Content-Language](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Language). 32 | 33 | - `"Content-Type"`: Specifies the MIME type of the object. 34 | 35 | This takes precedence over any client configuration. 36 | 37 | See [Content-Type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type). 38 | 39 | - `"Cache-Control"`: Overrides cache control policy of the object. 40 | 41 | See [Cache-Control](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control). 42 | 43 | Any other string key specifies a user-defined metadata field for the object. 44 | 45 | !!! 
warning "Not importable at runtime" 46 | 47 | To use this type hint in your code, import it within a `TYPE_CHECKING` block: 48 | 49 | ```py 50 | from __future__ import annotations 51 | from typing import TYPE_CHECKING 52 | if TYPE_CHECKING: 53 | from obstore import Attribute 54 | ``` 55 | """ 56 | 57 | Attributes: TypeAlias = dict[Attribute, str] 58 | """Additional attributes of an object 59 | 60 | Attributes can be specified in [`put`][obstore.put]/[`put_async`][obstore.put_async] and 61 | retrieved from [`get`][obstore.get]/[`get_async`][obstore.get_async]. 62 | 63 | Unlike ObjectMeta, Attributes are not returned by listing APIs 64 | 65 | !!! warning "Not importable at runtime" 66 | 67 | To use this type hint in your code, import it within a `TYPE_CHECKING` block: 68 | 69 | ```py 70 | from __future__ import annotations 71 | from typing import TYPE_CHECKING 72 | if TYPE_CHECKING: 73 | from obstore import Attributes 74 | ``` 75 | """ 76 | -------------------------------------------------------------------------------- /DEVELOP.md: -------------------------------------------------------------------------------- 1 | # Contributor Documentation 2 | 3 | ## Prerequisites 4 | 5 | Install [uv](https://docs.astral.sh/uv/) and [Rust](https://www.rust-lang.org/tools/install). 6 | 7 | ## Layout 8 | 9 | - `pyo3-object_store/`: Logic for constructing `object_store` instances lives here, so that it can potentially be shared with other Rust-Python libraries in the future. 10 | - `obstore/`: The primary Python-facing bindings of the `obstore` library. This re-exports the classes defined in `pyo3-object_store`. It also adds the top-level functions that form the `obstore` API. 11 | - `pyo3-bytes`: A wrapper of [`bytes::Bytes`](https://docs.rs/bytes/latest/bytes/struct.Bytes.html) that is used inside `obstore` for zero-copy buffer exchange between Rust and Python but also is intended to be reusable for other Rust-Python libraries. 
12 | 
13 | ## Developing obstore
14 | 
15 | From the top-level directory, run
16 | 
17 | ```
18 | uv run maturin develop -m obstore/Cargo.toml
19 | ```
20 | 
21 | This will compile `obstore` and add it to the uv-managed Python environment.
22 | 
23 | If you wish to do any benchmarking, run
24 | 
25 | ```
26 | uv run maturin develop -m obstore/Cargo.toml --release
27 | ```
28 | 
29 | to compile `obstore` with release optimizations turned on.
30 | 
31 | ### Maturin import hook
32 | 
33 | Run
34 | ```
35 | uv run python -m maturin_import_hook site install
36 | ```
37 | 
38 | to ensure that obstore is automatically recompiled if changed whenever you
39 | import `obstore` in Python.
40 | 
41 | See [import hook docs](https://www.maturin.rs/import_hook) for more information.
42 | 
43 | ### Tests
44 | 
45 | All obstore tests should go into the top-level `tests` directory.
46 | 
47 | ## Publishing
48 | 
49 | Push a new tag to the main branch of the format `py-v*`. A new version will be published to PyPI automatically.
50 | 
51 | ## Documentation website
52 | 
53 | The documentation website is generated with `mkdocs` and [`mkdocs-material`](https://squidfunk.github.io/mkdocs-material). You can serve the docs website locally with
54 | 
55 | ```
56 | uv run mkdocs serve
57 | ```
58 | 
59 | Publishing documentation happens automatically via CI when a new tag is published of the format `py-v*`. It can also be triggered manually through the Github Actions dashboard on [this page](https://github.com/developmentseed/obstore/actions/workflows/docs.yml). Note that publishing docs manually is **not advised if there have been new code additions since the last release** as the new functionality will be associated in the documentation with the tag of the _previous_ release. In this case, prefer publishing a new patch or minor release, which will publish both a new Python package and the new documentation for it.
60 | -------------------------------------------------------------------------------- /pyo3-bytes/README.md: -------------------------------------------------------------------------------- 1 | # pyo3-bytes 2 | 3 | Integration between [`bytes`](https://docs.rs/bytes) and [`pyo3`](https://github.com/PyO3/pyo3). 4 | 5 | This provides [`PyBytes`], a wrapper around [`Bytes`][::bytes::Bytes] that supports the [Python buffer protocol](https://docs.python.org/3/c-api/buffer.html). 6 | 7 | This uses the new [`Bytes::from_owner` API](https://docs.rs/bytes/latest/bytes/struct.Bytes.html#method.from_owner) introduced in `bytes` 1.9. 8 | 9 | Since this integration uses the Python buffer protocol, any library that uses `pyo3-bytes` must set the feature flags for the `pyo3` dependency correctly. `pyo3` must either _not_ have an [`abi3` feature flag](https://pyo3.rs/v0.23.4/features.html#abi3) (in which case maturin will generate wheels per Python version), or have `abi3-py311` (which supports only Python 3.11+), since the buffer protocol became part of the Python stable ABI [as of Python 3.11](https://docs.python.org/3/c-api/buffer.html#c.Py_buffer). 10 | 11 | ## Importing buffers to Rust 12 | 13 | Just use `PyBytes` as a type in your functions or methods exposed to Python. 14 | 15 | ```rs 16 | use pyo3_bytes::PyBytes; 17 | use bytes::Bytes; 18 | 19 | #[pyfunction] 20 | pub fn use_bytes(buffer: PyBytes) { 21 | let buffer: Bytes = buffer.into_inner(); 22 | } 23 | ``` 24 | 25 | ## Exporting buffers to Python 26 | 27 | Return the `PyBytes` class from your function. 28 | 29 | ```rs 30 | use pyo3_bytes::PyBytes; 31 | use bytes::Bytes; 32 | 33 | #[pyfunction] 34 | pub fn return_bytes() -> PyBytes { 35 | let buffer = Bytes::from_static(b"hello"); 36 | PyBytes::new(buffer) 37 | } 38 | ``` 39 | 40 | ## Safety 41 | 42 | Unfortunately, this interface cannot be 100% safe, as the Python buffer protocol does not enforce buffer immutability. 
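As a pure-Python illustration of the problem — a sketch with no Rust involved, where `memoryview` plays the role of the zero-copy view that Rust receives through the buffer protocol:

```py
# A memoryview is a zero-copy view into memory that the caller still owns,
# much like the Bytes view Rust obtains via the buffer protocol.
data = bytearray(b"hello")
view = memoryview(data)

# Mutating the owner is visible through the supposedly read-only view.
data[0] = ord("H")
assert bytes(view) == b"Hello"
```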
43 | 44 | The Python user must take care to not mutate the buffers that have been passed 45 | to Rust. 46 | 47 | For more reading: 48 | 49 | - 50 | - 51 | - 52 | 53 | ## Python type hints 54 | 55 | On the Python side, the exported `Bytes` class implements many of the same 56 | methods (with the same signature) as the Python `bytes` object. 57 | 58 | The Python type hints are available in the Github repo in the file `bytes.pyi`. 59 | I don't know the best way to distribute this to downstream projects. If you have 60 | an idea, create an issue to discuss. 61 | 62 | ## Version compatibility 63 | 64 | | pyo3-bytes | pyo3 | 65 | | ---------- | ---- | 66 | | 0.1.x | 0.23 | 67 | | 0.2.x | 0.24 | 68 | | 0.3.x | 0.25 | 69 | | 0.4.x | 0.26 | 70 | | 0.5.x | 0.27 | 71 | -------------------------------------------------------------------------------- /obstore/python/obstore/_sign.pyi: -------------------------------------------------------------------------------- 1 | import sys 2 | from collections.abc import Sequence 3 | from datetime import timedelta 4 | from typing import Literal, overload 5 | 6 | from .store import AzureStore, GCSStore, S3Store 7 | 8 | if sys.version_info >= (3, 10): 9 | from typing import TypeAlias 10 | else: 11 | from typing_extensions import TypeAlias 12 | 13 | HTTP_METHOD: TypeAlias = Literal[ 14 | "GET", 15 | "PUT", 16 | "POST", 17 | "HEAD", 18 | "PATCH", 19 | "TRACE", 20 | "DELETE", 21 | "OPTIONS", 22 | "CONNECT", 23 | ] 24 | """Allowed HTTP Methods for signing.""" 25 | 26 | SignCapableStore: TypeAlias = AzureStore | GCSStore | S3Store 27 | """ObjectStore instances that are capable of signing.""" 28 | 29 | @overload 30 | def sign( 31 | store: SignCapableStore, 32 | method: HTTP_METHOD, 33 | paths: str, 34 | expires_in: timedelta, 35 | ) -> str: ... 36 | @overload 37 | def sign( 38 | store: SignCapableStore, 39 | method: HTTP_METHOD, 40 | paths: Sequence[str], 41 | expires_in: timedelta, 42 | ) -> Sequence[str]: ... 
43 | def sign(  # type: ignore[misc] # docstring in pyi file
44 |     store: SignCapableStore,
45 |     method: HTTP_METHOD,
46 |     paths: str | Sequence[str],
47 |     expires_in: timedelta,
48 | ) -> str | Sequence[str]:
49 |     """Create a signed URL.
50 | 
51 |     Given the intended `method` and `paths` to use and the desired length of time for
52 |     which the URL should be valid, return a signed URL created with the object store
53 |     implementation's credentials such that the URL can be handed to something that
54 |     doesn't have access to the object store's credentials, to allow limited access to
55 |     the object store.
56 | 
57 |     Args:
58 |         store: The ObjectStore instance to use.
59 |         method: The HTTP method to use.
60 |         paths: The path(s) within the store to sign; one signed URL is created per path.
61 |         expires_in: How long the signed URL(s) should be valid.
62 | 
63 |     Returns:
64 |         The signed URL if a single path was passed, or a sequence of signed URLs.
65 | 
66 |     """
67 | 
68 | @overload
69 | async def sign_async(
70 |     store: SignCapableStore,
71 |     method: HTTP_METHOD,
72 |     paths: str,
73 |     expires_in: timedelta,
74 | ) -> str: ...
75 | @overload
76 | async def sign_async(
77 |     store: SignCapableStore,
78 |     method: HTTP_METHOD,
79 |     paths: Sequence[str],
80 |     expires_in: timedelta,
81 | ) -> Sequence[str]: ...
82 | async def sign_async(  # type: ignore[misc] # docstring in pyi file
83 |     store: SignCapableStore,
84 |     method: HTTP_METHOD,
85 |     paths: str | Sequence[str],
86 |     expires_in: timedelta,
87 | ) -> str | Sequence[str]:
88 |     """Call `sign` asynchronously.
89 | 
90 |     Refer to the documentation for [sign][obstore.sign].
91 |     """
92 | 
-------------------------------------------------------------------------------- /docs/troubleshooting/aws.md: --------------------------------------------------------------------------------
1 | # Troubleshooting Amazon S3
2 | 
3 | ## Region required
4 | 
5 | All requests to S3 must include the region. Requests will fail if you don't pass the correct region.
6 | 7 | For example, trying to list the [`sentinel-cogs`](https://registry.opendata.aws/sentinel-2-l2a-cogs/) open bucket without passing a region will fail: 8 | 9 | ```py 10 | import obstore as obs 11 | from obstore.store import S3Store 12 | 13 | store = S3Store("sentinel-cogs", skip_signature=True) 14 | next(obs.list(store)) 15 | ``` 16 | 17 | raises 18 | 19 | ``` 20 | GenericError: Generic S3 error: Error performing list request: 21 | Received redirect without LOCATION, this normally indicates an incorrectly 22 | configured region 23 | ``` 24 | 25 | We can fix this by passing the correct region: 26 | 27 | ```py 28 | import obstore as obs 29 | from obstore.store import S3Store 30 | 31 | store = S3Store("sentinel-cogs", skip_signature=True, region="us-west-2") 32 | next(obs.list(store)) 33 | ``` 34 | 35 | this prints: 36 | 37 | ```py 38 | [{'path': 'sentinel-s2-l2a-cogs/1/C/CV/2018/10/S2B_1CCV_20181004_0_L2A/AOT.tif', 39 | 'last_modified': datetime.datetime(2020, 9, 30, 20, 25, 56, tzinfo=datetime.timezone.utc), 40 | 'size': 50510, 41 | 'e_tag': '"2e24c2ee324ea478f2f272dbd3f5ce69"', 42 | 'version': None}, 43 | ... 44 | ``` 45 | 46 | ### Inferring the bucket region 47 | 48 | Note that it's possible to infer the S3 bucket region from an arbitrary `HEAD` request. 
49 | 50 | Here, we show an example of using `requests` to find the bucket region, but you can use any HTTP client: 51 | 52 | ```py 53 | import requests 54 | 55 | def find_bucket_region(bucket_name: str) -> str: 56 | resp = requests.head(f"https://{bucket_name}.s3.amazonaws.com") 57 | return resp.headers["x-amz-bucket-region"] 58 | ``` 59 | 60 | Applying this to our previous example, we can find the region of the `sentinel-cogs` bucket: 61 | 62 | ```py 63 | find_bucket_region("sentinel-cogs") 64 | # 'us-west-2' 65 | ``` 66 | 67 | Or we can pass the result directly as the `region` parameter: 68 | 69 | ```py 70 | bucket_name = "sentinel-cogs" 71 | store = S3Store( 72 | bucket_name, skip_signature=True, region=find_bucket_region(bucket_name) 73 | ) 74 | ``` 75 | 76 | Finding the bucket region in this way works **both for public and non-public buckets**. 77 | 78 | This `HEAD` request can also tell you whether the bucket is public by checking the [HTTP response code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) (accessible in `requests` via [`resp.status_code`](https://requests.readthedocs.io/en/latest/api/#requests.Response.status_code)): 79 | 80 | - [`200`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/200): public bucket. 81 | - [`403`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403): private bucket. 
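Building on the status-code check, here is a sketch of a hypothetical helper (the function name `is_bucket_public` is our own, not part of obstore) that combines the `HEAD` request with the status-code mapping above:

```python
import requests


def is_bucket_public(bucket_name: str) -> bool:
    """Return True if an anonymous HEAD request to the bucket succeeds."""
    resp = requests.head(f"https://{bucket_name}.s3.amazonaws.com")
    if resp.status_code == 200:
        # Public bucket: anonymous requests are allowed
        return True
    if resp.status_code == 403:
        # Private bucket: the bucket exists, but anonymous access is denied
        return False
    # 404 means the bucket doesn't exist; raise for that and anything else
    resp.raise_for_status()
    raise ValueError(f"Unexpected status code: {resp.status_code}")
```

With this helper, `is_bucket_public("sentinel-cogs")` should return `True`, since we were able to list that bucket with `skip_signature=True` earlier.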
82 | -------------------------------------------------------------------------------- /tests/auth/test_planetary_computer.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import pystac_client 4 | import pytest 5 | 6 | from obstore.auth.planetary_computer import ( 7 | PlanetaryComputerAsyncCredentialProvider, 8 | PlanetaryComputerCredentialProvider, 9 | ) 10 | from obstore.store import AzureStore 11 | 12 | catalog = pystac_client.Client.open( 13 | "https://planetarycomputer.microsoft.com/api/stac/v1/", 14 | ) 15 | 16 | 17 | @pytest.mark.parametrize( 18 | "cls", 19 | [PlanetaryComputerCredentialProvider, PlanetaryComputerAsyncCredentialProvider], 20 | ) 21 | @pytest.mark.asyncio 22 | async def test_from_asset( 23 | cls: type[ 24 | PlanetaryComputerCredentialProvider | PlanetaryComputerAsyncCredentialProvider 25 | ], 26 | ): 27 | collection = catalog.get_collection("daymet-daily-hi") 28 | 29 | abfs_asset = collection.assets["zarr-abfs"] 30 | cls.from_asset(abfs_asset) 31 | 32 | cls.from_asset(abfs_asset.__dict__) 33 | 34 | blob_asset = collection.assets["zarr-https"] 35 | cls.from_asset(blob_asset) 36 | 37 | cls.from_asset(blob_asset.__dict__) 38 | 39 | collection = catalog.get_collection("landsat-c2-l2") 40 | gpq_asset = collection.assets["geoparquet-items"] 41 | cls.from_asset(gpq_asset) 42 | 43 | cls.from_asset(gpq_asset.__dict__) 44 | 45 | 46 | @pytest.mark.parametrize( 47 | "cls", 48 | [PlanetaryComputerCredentialProvider, PlanetaryComputerAsyncCredentialProvider], 49 | ) 50 | @pytest.mark.asyncio 51 | async def test_pass_config_to_store( 52 | cls: type[ 53 | PlanetaryComputerCredentialProvider | PlanetaryComputerAsyncCredentialProvider 54 | ], 55 | ): 56 | url = "https://naipeuwest.blob.core.windows.net/naip/v002/mt/2023/mt_060cm_2023/" 57 | store = AzureStore(credential_provider=cls(url)) 58 | assert store.config == {"account_name": "naipeuwest", "container_name": "naip"} 59 | 
assert store.prefix == "v002/mt/2023/mt_060cm_2023" 60 | 61 | 62 | @pytest.mark.parametrize( 63 | "cls", 64 | [PlanetaryComputerCredentialProvider, PlanetaryComputerAsyncCredentialProvider], 65 | ) 66 | @pytest.mark.asyncio 67 | async def test_url_account_container_params( 68 | cls: type[ 69 | PlanetaryComputerCredentialProvider | PlanetaryComputerAsyncCredentialProvider 70 | ], 71 | ): 72 | url = "https://naipeuwest.blob.core.windows.net/naip/v002/mt/2023/mt_060cm_2023/" 73 | account_name = "naipeuwest" 74 | container_name = "naip" 75 | 76 | cls(url) 77 | 78 | with pytest.raises(ValueError, match="Cannot pass container_name"): 79 | cls(url, container_name=container_name) 80 | 81 | with pytest.raises(ValueError, match="Cannot pass account_name"): 82 | cls(url, account_name=account_name) 83 | 84 | cls( 85 | account_name=account_name, 86 | container_name=container_name, 87 | ) 88 | -------------------------------------------------------------------------------- /docs/examples/stream-zip.md: -------------------------------------------------------------------------------- 1 | # Streaming ZIP file creation 2 | 3 | This example demonstrates how to create a zip archive from files in one store and upload it to another store using the [`stream_zip`](https://github.com/uktrade/stream-zip) library. 4 | 5 | This never stores any entire source file or the target zip file in memory, so you can zip large files with low memory overhead. 6 | 7 | ## Example 8 | 9 | !!! note 10 | 11 | This example is also [available on Github](https://github.com/developmentseed/obstore/blob/main/examples/stream-zip/README.md) if you'd like to test it out locally. 
12 | 13 | ```py 14 | from __future__ import annotations 15 | 16 | import asyncio 17 | from pathlib import Path 18 | from stat import S_IFREG 19 | from typing import TYPE_CHECKING 20 | 21 | import stream_zip 22 | from stream_zip import ZIP_32, AsyncMemberFile 23 | 24 | from obstore.store import LocalStore, MemoryStore 25 | 26 | if TYPE_CHECKING: 27 | from collections.abc import AsyncIterable, Iterable 28 | 29 | from obstore.store import ObjectStore 30 | 31 | 32 | async def member_file(store: ObjectStore, path: str) -> AsyncMemberFile: 33 | """Create a member file for the zip archive.""" 34 | resp = await store.get_async(path) 35 | last_modified = resp.meta["last_modified"] 36 | mode = S_IFREG | 0o644 37 | # Unclear why but we need to wrap the response in an async generator 38 | return (path, last_modified, mode, ZIP_32, (byte async for byte in resp.stream())) 39 | 40 | 41 | async def member_files( 42 | store: ObjectStore, 43 | paths: Iterable[str], 44 | ) -> AsyncIterable[AsyncMemberFile]: 45 | """Create an async iterable of files for the zip archive.""" 46 | for path in paths: 47 | yield await member_file(store, path) 48 | 49 | 50 | async def zip_copy() -> None: 51 | """Copy files from one store into a zip archive that we upload to another store.""" 52 | # Input store with source data 53 | input_store = MemoryStore() 54 | input_store.put("foo", b"hello") 55 | input_store.put("bar", b"world") 56 | 57 | # Output store where the zip file will be saved 58 | output_store = LocalStore(Path()) 59 | 60 | # We can pass the streaming zip directly to `put` 61 | await output_store.put_async( 62 | "my.zip", 63 | stream_zip.async_stream_zip( 64 | member_files(input_store, ["foo", "bar"]), 65 | chunk_size=10 * 1024 * 1024, 66 | ), 67 | ) 68 | ``` 69 | 70 | This creates a zip file in the current directory: 71 | 72 | ``` 73 | > unzip -l my.zip 74 | Archive: my.zip 75 | Length Date Time Name 76 | --------- ---------- ----- ---- 77 | 5 05-22-2025 13:37 foo 78 | 5 05-22-2025 13:37 bar 
79 | --------- ------- 80 | 10 2 files 81 | ``` 82 | 83 | And we can read a file: 84 | 85 | ``` 86 | > unzip -p my.zip foo 87 | hello 88 | ``` 89 | -------------------------------------------------------------------------------- /obstore/python/obstore/_store/_retry.pyi: -------------------------------------------------------------------------------- 1 | from datetime import timedelta 2 | from typing import TypedDict 3 | 4 | class BackoffConfig(TypedDict, total=False): 5 | """Exponential backoff with jitter. 6 | 7 | See 8 | 9 | !!! warning "Not importable at runtime" 10 | 11 | To use this type hint in your code, import it within a `TYPE_CHECKING` block: 12 | 13 | ```py 14 | from __future__ import annotations 15 | from typing import TYPE_CHECKING 16 | if TYPE_CHECKING: 17 | from obstore.store import BackoffConfig 18 | ``` 19 | """ 20 | 21 | init_backoff: timedelta 22 | """The initial backoff duration. 23 | 24 | Defaults to 100 milliseconds. 25 | """ 26 | 27 | max_backoff: timedelta 28 | """The maximum backoff duration. 29 | 30 | Defaults to 15 seconds. 31 | """ 32 | 33 | base: int | float 34 | """The base of the exponential to use. 35 | 36 | Defaults to `2`. 37 | """ 38 | 39 | class RetryConfig(TypedDict, total=False): 40 | """The configuration for how to respond to request errors. 41 | 42 | The following categories of error will be retried: 43 | 44 | * 5xx server errors 45 | * Connection errors 46 | * Dropped connections 47 | * Timeouts for [safe] / read-only requests 48 | 49 | Requests will be retried up to some limit, using exponential 50 | backoff with jitter. See [`BackoffConfig`][obstore.store.BackoffConfig] for 51 | more information 52 | 53 | [safe]: https://datatracker.ietf.org/doc/html/rfc7231#section-4.2.1 54 | 55 | !!! 
warning "Not importable at runtime" 56 | 57 | To use this type hint in your code, import it within a `TYPE_CHECKING` block: 58 | 59 | ```py 60 | from __future__ import annotations 61 | from typing import TYPE_CHECKING 62 | if TYPE_CHECKING: 63 | from obstore.store import RetryConfig 64 | ``` 65 | """ 66 | 67 | backoff: BackoffConfig 68 | """The backoff configuration. 69 | 70 | Defaults to the values listed above if not provided. 71 | """ 72 | 73 | max_retries: int 74 | """ 75 | The maximum number of times to retry a request 76 | 77 | Set to 0 to disable retries. 78 | 79 | Defaults to 10. 80 | """ 81 | 82 | retry_timeout: timedelta 83 | """ 84 | The maximum length of time from the initial request 85 | after which no further retries will be attempted 86 | 87 | This not only bounds the length of time before a server 88 | error will be surfaced to the application, but also bounds 89 | the length of time a request's credentials must remain valid. 90 | 91 | As requests are retried without renewing credentials or 92 | regenerating request payloads, this number should be kept 93 | below 5 minutes to avoid errors due to expired credentials 94 | and/or request payloads. 95 | 96 | Defaults to 3 minutes. 97 | """ 98 | -------------------------------------------------------------------------------- /docs/examples/fastapi.md: -------------------------------------------------------------------------------- 1 | # FastAPI 2 | 3 | [FastAPI](https://fastapi.tiangolo.com/) is a modern, high-performance, web framework for building APIs with Python based on standard Python type hints. 4 | 5 | It's easy to integrate obstore with FastAPI routes, where you want to download a file from an object store and return it to the user. 6 | 7 | FastAPI has a [`StreamingResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse), which neatly integrates with [`BytesStream`][obstore.BytesStream] to stream the response to the user. 8 | 9 | ## Example 10 | 11 | !!! 
note 12 | 13 | This example is also [available on Github](https://github.com/developmentseed/obstore/blob/main/examples/fastapi/README.md) if you'd like to test it out locally. 14 | 15 | First, import `fastapi` and `obstore` and create the FastAPI application. 16 | 17 | ```py 18 | from fastapi import FastAPI 19 | from fastapi.responses import StreamingResponse 20 | 21 | import obstore as obs 22 | from obstore.store import HTTPStore, S3Store 23 | 24 | app = FastAPI() 25 | ``` 26 | 27 | Next, we can add our route. Here, we create a simple route that fetches a small 28 | Parquet file from an HTTP url and returns it to the user. 29 | 30 | Passing `resp` directly to `StreamingResponse` calls 31 | [`GetResult.stream()`][obstore.GetResult.stream] under the hood and thus uses 32 | the default chunking behavior of `GetResult.stream()`. 33 | 34 | ```py 35 | @app.get("/example.parquet") 36 | async def download_example() -> StreamingResponse: 37 | store = HTTPStore.from_url("https://raw.githubusercontent.com") 38 | path = "opengeospatial/geoparquet/refs/heads/main/examples/example.parquet" 39 | 40 | # Make the request. This only begins the download; it does not wait for the 41 | # download to finish. 42 | resp = await obs.get_async(store, path) 43 | return StreamingResponse(resp) 44 | ``` 45 | 46 | You may also want to customize the chunking behavior of the async stream. To do 47 | this, call [`GetResult.stream()`][obstore.GetResult.stream] before passing to 48 | `StreamingResponse`. 
49 | 50 | ```py 51 | @app.get("/large.parquet") 52 | async def large_example() -> StreamingResponse: 53 | # Example large Parquet file hosted in AWS open data 54 | store = S3Store("ookla-open-data", region="us-west-2", skip_signature=True) 55 | path = "parquet/performance/type=fixed/year=2024/quarter=1/2024-01-01_performance_fixed_tiles.parquet" 56 | 57 | # Note: for large file downloads you may need to increase the timeout in 58 | # the client configuration 59 | resp = await obs.get_async(store, path) 60 | 61 | # Example: Ensure the stream returns at least 5MB of data in each chunk. 62 | return StreamingResponse(resp.stream(min_chunk_size=5 * 1024 * 1024)) 63 | ``` 64 | 65 | Note that here FastAPI wraps 66 | [`starlette.responses.StreamingResponse`](https://www.starlette.io/responses/#streamingresponse). 67 | So any web server that uses [Starlette](https://www.starlette.io/) for responses 68 | can use this same code. 69 | -------------------------------------------------------------------------------- /tests/store/test_local.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | from pathlib import Path 3 | 4 | import pytest 5 | 6 | import obstore as obs 7 | from obstore.exceptions import GenericError 8 | from obstore.store import LocalStore 9 | 10 | HERE = Path() 11 | 12 | 13 | def test_local_store(): 14 | store = LocalStore(HERE) 15 | list_result = obs.list(store).collect() 16 | assert any("test_local.py" in x["path"] for x in list_result) 17 | 18 | 19 | def test_repr(): 20 | store = LocalStore(HERE) 21 | assert repr(store).startswith("LocalStore") 22 | 23 | 24 | def test_local_from_url(): 25 | with pytest.raises(ValueError, match="relative URL without a base"): 26 | LocalStore.from_url("") 27 | 28 | LocalStore.from_url("file://") 29 | LocalStore.from_url("file:///") 30 | 31 | url = f"file://{HERE.absolute()}" 32 | store = LocalStore.from_url(url) 33 | list_result = obs.list(store).collect() 34 | assert 
any("test_local.py" in x["path"] for x in list_result) 35 | 36 | # Test with trailing slash 37 | url = f"file://{HERE.absolute()}/" 38 | store = LocalStore.from_url(url) 39 | list_result = obs.list(store).collect() 40 | assert any("test_local.py" in x["path"] for x in list_result) 41 | 42 | # Test with two trailing slashes 43 | url = f"file://{HERE.absolute()}//" 44 | with pytest.raises(GenericError): 45 | store = LocalStore.from_url(url) 46 | 47 | 48 | def test_create_prefix(tmp_path: Path): 49 | tmpdir = tmp_path / "abc" 50 | assert not tmpdir.exists() 51 | LocalStore(tmpdir, mkdir=True) 52 | assert tmpdir.exists() 53 | 54 | # Assert that mkdir=True works even when the dir already exists 55 | LocalStore(tmpdir, mkdir=True) 56 | assert tmpdir.exists() 57 | 58 | 59 | def test_prefix_property(tmp_path: Path): 60 | store = LocalStore(tmp_path) 61 | assert store.prefix == tmp_path 62 | assert isinstance(store.prefix, Path) 63 | # Can pass it back to the store init 64 | LocalStore(store.prefix) 65 | 66 | 67 | def test_pickle(tmp_path: Path): 68 | store = LocalStore(tmp_path) 69 | obs.put(store, "path.txt", b"foo") 70 | new_store: LocalStore = pickle.loads(pickle.dumps(store)) 71 | assert obs.get(new_store, "path.txt").bytes() == b"foo" 72 | 73 | 74 | def test_eq(): 75 | store = LocalStore(HERE, automatic_cleanup=True) 76 | store2 = LocalStore(HERE, automatic_cleanup=True) 77 | store3 = LocalStore(HERE) 78 | assert store == store # noqa: PLR0124 79 | assert store == store2 80 | assert store != store3 81 | 82 | 83 | def test_local_store_percent_encoded(tmp_path: Path): 84 | fname1 = "hello%20world.txt" 85 | content1 = b"Hello, World!" 86 | with (tmp_path / fname1).open("wb") as f: 87 | f.write(content1) 88 | 89 | store = LocalStore(tmp_path) 90 | assert store.get(fname1).bytes() == content1 91 | 92 | fname2 = "hello world.txt" 93 | content2 = b"Hello, World! 
(with spaces)" 94 | with (tmp_path / fname2).open("wb") as f: 95 | f.write(content2) 96 | 97 | assert store.get(fname2).bytes() == content2 98 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # obstore 2 | 3 | 4 | [![PyPI][pypi_badge]][pypi_link] 5 | [![Conda Version][conda_version_badge]][conda_version] 6 | [![PyPI - Downloads][pypi-img]][pypi-link] 7 | 8 | [pypi_badge]: https://badge.fury.io/py/obstore.svg 9 | [pypi_link]: https://pypi.org/project/obstore/ 10 | [conda_version_badge]: https://img.shields.io/conda/vn/conda-forge/obstore.svg 11 | [conda_version]: https://prefix.dev/channels/conda-forge/packages/obstore 12 | [pypi-img]: https://static.pepy.tech/badge/obstore/month 13 | [pypi-link]: https://pypi.org/project/obstore/ 14 | 15 | The simplest, highest-throughput [^1] Python interface to [Amazon S3][s3], [Google Cloud Storage][gcs], [Azure Storage][azure_storage], & other S3-compliant APIs, powered by Rust. 16 | 17 | [s3]: https://aws.amazon.com/s3/ 18 | [gcs]: https://cloud.google.com/storage 19 | [azure_storage]: https://learn.microsoft.com/en-us/azure/storage/common/storage-introduction 20 | 21 | - **One interface** for all backends with **no required Python dependencies**. 22 | - Sync and async API with **full type hinting**. 23 | - **Streaming downloads** with configurable chunking. 24 | - **Streaming uploads** from files or async or sync iterators. 25 | - **Streaming list**, with no need to paginate. 26 | - Automatic [**multipart uploads**](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) for large file objects. 27 | - Automatic **credential refresh** before expiration. 28 | - File-like object API and [fsspec](https://github.com/fsspec/filesystem_spec) integration. 
29 | - Optionally return list results in [Apache Arrow](https://arrow.apache.org/) format, which is faster and more memory-efficient than materializing Python `dict`s. 30 | - Zero-copy data exchange between Rust and Python via the [buffer protocol](https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/). 31 | 32 | For Rust developers looking to add `object_store` support to their own Python packages, refer to [`pyo3-object_store`](https://docs.rs/pyo3-object_store/latest/pyo3_object_store/). 33 | 34 | [^1]: Benchmarking is ongoing, but preliminary results indicate roughly [9x higher throughput than fsspec](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/fe8f290cb3282dcc3bc96cae06ed5f90ad326eff/test_results/cog_header_results.csv) and [2.8x higher throughput than aioboto3](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/40e67509a248c5102a6b1608bcb9773295691213/test_results/20250218_results/ec2_m5/aggregated_results.csv) for many concurrent, small, get requests from an async context. 35 | 36 | ## Installation 37 | 38 | To install obstore using pip: 39 | 40 | ```sh 41 | pip install obstore 42 | ``` 43 | 44 | Obstore is on [conda-forge](https://prefix.dev/channels/conda-forge/packages/obstore) and can be installed using [conda](https://docs.conda.io), [mamba](https://mamba.readthedocs.io/), or [pixi](https://pixi.sh/). To install obstore using conda: 45 | 46 | ``` 47 | conda install -c conda-forge obstore 48 | ``` 49 | 50 | ## Documentation 51 | 52 | [Full documentation is available on the website](https://developmentseed.org/obstore). 53 | 54 | Head to [Getting Started](https://developmentseed.org/obstore/latest/getting-started/) to dig in. 
55 | -------------------------------------------------------------------------------- /tests/test_put.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | from tempfile import TemporaryDirectory 3 | 4 | import pytest 5 | 6 | from obstore.exceptions import AlreadyExistsError 7 | from obstore.store import LocalStore, MemoryStore 8 | 9 | 10 | def test_put_non_multipart(): 11 | store = MemoryStore() 12 | 13 | store.put("file1.txt", b"foo", use_multipart=False) 14 | assert store.get("file1.txt").bytes() == b"foo" 15 | 16 | 17 | def test_put_non_multipart_sync_iterable(): 18 | store = MemoryStore() 19 | 20 | b = b"the quick brown fox jumps over the lazy dog," 21 | iterator = itertools.repeat(b, 5) 22 | store.put("file1.txt", iterator, use_multipart=False) 23 | assert store.get("file1.txt").bytes() == (b * 5) 24 | 25 | 26 | @pytest.mark.asyncio 27 | async def test_put_non_multipart_async_iterable(): 28 | store = MemoryStore() 29 | 30 | b = b"the quick brown fox jumps over the lazy dog," 31 | 32 | async def it(): 33 | for _ in range(5): 34 | yield b"the quick brown fox jumps over the lazy dog," 35 | 36 | await store.put_async("file1.txt", it(), use_multipart=False) 37 | assert store.get("file1.txt").bytes() == (b * 5) 38 | 39 | 40 | def test_put_multipart_one_chunk(): 41 | store = MemoryStore() 42 | 43 | store.put("file1.txt", b"foo", use_multipart=True) 44 | assert store.get("file1.txt").bytes() == b"foo" 45 | 46 | 47 | def test_put_multipart_large(): 48 | store = MemoryStore() 49 | 50 | data = b"the quick brown fox jumps over the lazy dog," * 5000 51 | path = "big-data.txt" 52 | 53 | store.put(path, data, use_multipart=True) 54 | assert store.get(path).bytes() == data 55 | 56 | 57 | def test_put_mode(): 58 | store = MemoryStore() 59 | 60 | store.put("file1.txt", b"foo") 61 | store.put("file1.txt", b"bar", mode="overwrite") 62 | 63 | with pytest.raises(AlreadyExistsError): 64 | store.put("file1.txt", b"foo", 
mode="create") 65 | 66 | assert store.get("file1.txt").bytes() == b"bar" 67 | 68 | 69 | @pytest.mark.asyncio 70 | async def test_put_async_iterable(): 71 | store = MemoryStore() 72 | 73 | data = b"the quick brown fox jumps over the lazy dog," * 50_000 74 | path = "big-data.txt" 75 | 76 | await store.put_async(path, data) 77 | 78 | resp = await store.get_async(path) 79 | stream = resp.stream(min_chunk_size=0) 80 | new_path = "new-path.txt" 81 | await store.put_async(new_path, stream) 82 | 83 | assert store.get(new_path).bytes() == data 84 | 85 | 86 | def test_put_sync_iterable(): 87 | store = MemoryStore() 88 | 89 | b = b"the quick brown fox jumps over the lazy dog," 90 | iterator = itertools.repeat(b, 50_000) 91 | data = b * 50_000 92 | path = "big-data.txt" 93 | 94 | store.put(path, iterator) 95 | 96 | assert store.get(path).bytes() == data 97 | 98 | 99 | def test_put_sync_iterable_local_store(): 100 | """Issue #450.""" 101 | with TemporaryDirectory() as tmpdir: 102 | store = LocalStore(tmpdir) 103 | 104 | b = b"the quick brown fox jumps over the lazy dog," 105 | iterator = itertools.repeat(b, 50_000) 106 | data = b * 50_000 107 | path = "big-data.txt" 108 | 109 | store.put(path, iterator) 110 | 111 | assert store.get(path).bytes() == data 112 | -------------------------------------------------------------------------------- /obstore/src/attributes.rs: -------------------------------------------------------------------------------- 1 | use std::borrow::Cow; 2 | use std::collections::HashMap; 3 | 4 | use indexmap::IndexMap; 5 | use object_store::{Attribute, AttributeValue, Attributes}; 6 | use pyo3::prelude::*; 7 | use pyo3::pybacked::PyBackedStr; 8 | use pyo3::types::PyDict; 9 | 10 | #[derive(Debug, PartialEq, Eq, Hash)] 11 | pub(crate) struct PyAttribute(Attribute); 12 | 13 | impl<'py> FromPyObject<'_, 'py> for PyAttribute { 14 | type Error = PyErr; 15 | 16 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result { 17 | let s = obj.extract::()?; 18 | match 
s.to_ascii_lowercase().as_str() { 19 | "content-disposition" | "contentdisposition" => Ok(Self(Attribute::ContentDisposition)), 20 | "content-encoding" | "contentencoding" => Ok(Self(Attribute::ContentEncoding)), 21 | "content-language" | "contentlanguage" => Ok(Self(Attribute::ContentLanguage)), 22 | "content-type" | "contenttype" => Ok(Self(Attribute::ContentType)), 23 | "cache-control" | "cachecontrol" => Ok(Self(Attribute::CacheControl)), 24 | _ => Ok(Self(Attribute::Metadata(Cow::Owned(s.to_string())))), 25 | } 26 | } 27 | } 28 | 29 | fn attribute_to_string(attribute: &Attribute) -> Cow<'static, str> { 30 | match attribute { 31 | Attribute::ContentDisposition => Cow::Borrowed("Content-Disposition"), 32 | Attribute::ContentEncoding => Cow::Borrowed("Content-Encoding"), 33 | Attribute::ContentLanguage => Cow::Borrowed("Content-Language"), 34 | Attribute::ContentType => Cow::Borrowed("Content-Type"), 35 | Attribute::CacheControl => Cow::Borrowed("Cache-Control"), 36 | Attribute::Metadata(x) => x.clone(), 37 | other => panic!("Unexpected attribute: {other:?}"), 38 | } 39 | } 40 | 41 | #[derive(Debug, PartialEq, Eq, Hash)] 42 | pub(crate) struct PyAttributeValue(AttributeValue); 43 | 44 | impl<'py> FromPyObject<'_, 'py> for PyAttributeValue { 45 | type Error = PyErr; 46 | 47 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result { 48 | Ok(Self(obj.extract::()?.into())) 49 | } 50 | } 51 | 52 | #[derive(Debug, PartialEq, Eq)] 53 | pub(crate) struct PyAttributes(Attributes); 54 | 55 | impl PyAttributes { 56 | pub fn new(attributes: Attributes) -> Self { 57 | Self(attributes) 58 | } 59 | 60 | pub fn into_inner(self) -> Attributes { 61 | self.0 62 | } 63 | } 64 | 65 | impl<'py> FromPyObject<'_, 'py> for PyAttributes { 66 | type Error = PyErr; 67 | 68 | fn extract(obj: Borrowed<'_, 'py, PyAny>) -> Result { 69 | let d = obj.extract::>()?; 70 | let mut attributes = Attributes::with_capacity(d.len()); 71 | for (k, v) in d.into_iter() { 72 | attributes.insert(k.0, v.0); 73 | 
} 74 | Ok(Self(attributes)) 75 | } 76 | } 77 | 78 | impl<'py> IntoPyObject<'py> for PyAttributes { 79 | type Target = PyDict; 80 | type Output = Bound<'py, PyDict>; 81 | type Error = PyErr; 82 | 83 | fn into_pyobject(self, py: Python<'py>) -> Result { 84 | let mut d = IndexMap::with_capacity(self.0.len()); 85 | for (k, v) in self.0.into_iter() { 86 | d.insert(attribute_to_string(k), v.as_ref()); 87 | } 88 | d.into_pyobject(py) 89 | } 90 | } 91 | -------------------------------------------------------------------------------- /obstore/python/obstore/_store/_client.pyi: -------------------------------------------------------------------------------- 1 | from datetime import timedelta 2 | from typing import TypedDict 3 | 4 | class ClientConfig(TypedDict, total=False): 5 | """HTTP client configuration. 6 | 7 | For timeout values (`connect_timeout`, `http2_keep_alive_timeout`, 8 | `pool_idle_timeout`, and `timeout`), values can either be Python `timedelta` 9 | objects, or they can be "human-readable duration strings". 10 | 11 | The human-readable duration string is a concatenation of time spans. Where each time 12 | span is an integer number and a suffix. Supported suffixes: 13 | 14 | - `nsec`, `ns` -- nanoseconds 15 | - `usec`, `us` -- microseconds 16 | - `msec`, `ms` -- milliseconds 17 | - `seconds`, `second`, `sec`, `s` 18 | - `minutes`, `minute`, `min`, `m` 19 | - `hours`, `hour`, `hr`, `h` 20 | - `days`, `day`, `d` 21 | - `weeks`, `week`, `w` 22 | - `months`, `month`, `M` -- defined as 30.44 days 23 | - `years`, `year`, `y` -- defined as 365.25 days 24 | 25 | For example: 26 | 27 | - `"2h 37min"` 28 | - `"32ms"` 29 | 30 | !!! 
warning "Not importable at runtime" 31 | 32 | To use this type hint in your code, import it within a `TYPE_CHECKING` block: 33 | 34 | ```py 35 | from __future__ import annotations 36 | from typing import TYPE_CHECKING 37 | if TYPE_CHECKING: 38 | from obstore.store import ClientConfig 39 | ``` 40 | """ 41 | 42 | allow_http: bool 43 | """Allow non-TLS, i.e. non-HTTPS connections.""" 44 | allow_invalid_certificates: bool 45 | """Skip certificate validation on https connections. 46 | 47 | !!! warning 48 | 49 | You should think very carefully before using this method. If 50 | invalid certificates are trusted, *any* certificate for *any* site 51 | will be trusted for use. This includes expired certificates. This 52 | introduces significant vulnerabilities, and should only be used 53 | as a last resort or for testing 54 | """ 55 | connect_timeout: str | timedelta 56 | """Timeout for only the connect phase of a Client""" 57 | default_content_type: str 58 | """Default `CONTENT_TYPE` for uploads""" 59 | default_headers: dict[str, str] | dict[str, bytes] 60 | """Default headers to be sent with each request""" 61 | http1_only: bool 62 | """Only use http1 connections.""" 63 | http2_keep_alive_interval: str 64 | """Interval for HTTP2 Ping frames should be sent to keep a connection alive.""" 65 | http2_keep_alive_timeout: str | timedelta 66 | """Timeout for receiving an acknowledgement of the keep-alive ping.""" 67 | http2_keep_alive_while_idle: str 68 | """Enable HTTP2 keep alive pings for idle connections""" 69 | http2_only: bool 70 | """Only use http2 connections""" 71 | pool_idle_timeout: str | timedelta 72 | """The pool max idle timeout. 73 | 74 | This is the length of time an idle connection will be kept alive. 75 | """ 76 | pool_max_idle_per_host: str 77 | """Maximum number of idle connections per host.""" 78 | proxy_url: str 79 | """HTTP proxy to use for requests.""" 80 | timeout: str | timedelta 81 | """Request timeout. 
82 | 83 | The timeout is applied from when the request starts connecting until the 84 | response body has finished. 85 | """ 86 | user_agent: str 87 | """User-Agent header to be used by this client.""" 88 | -------------------------------------------------------------------------------- /obstore/src/lib.rs: -------------------------------------------------------------------------------- 1 | // Except for explicit areas where we enable unsafe 2 | #![deny(unsafe_code)] 3 | 4 | mod attributes; 5 | mod buffered; 6 | mod copy; 7 | mod delete; 8 | mod get; 9 | mod head; 10 | mod list; 11 | mod path; 12 | mod put; 13 | mod rename; 14 | mod scheme; 15 | mod signer; 16 | mod tags; 17 | mod utils; 18 | 19 | use pyo3::prelude::*; 20 | 21 | const VERSION: &str = env!("CARGO_PKG_VERSION"); 22 | const OBJECT_STORE_VERSION: &str = env!("OBJECT_STORE_VERSION"); 23 | const OBJECT_STORE_SOURCE: &str = env!("OBJECT_STORE_SOURCE"); 24 | 25 | /// Raise RuntimeWarning for debug builds 26 | #[pyfunction] 27 | fn check_debug_build(_py: Python) -> PyResult<()> { 28 | #[cfg(debug_assertions)] 29 | { 30 | use pyo3::exceptions::PyRuntimeWarning; 31 | use pyo3::intern; 32 | use pyo3::types::PyTuple; 33 | 34 | let warnings_mod = _py.import(intern!(_py, "warnings"))?; 35 | let warning = PyRuntimeWarning::new_err( 36 | "obstore has not been compiled in release mode. Performance will be degraded.", 37 | ); 38 | let args = PyTuple::new(_py, vec![warning])?; 39 | warnings_mod.call_method1(intern!(_py, "warn"), args)?; 40 | } 41 | 42 | Ok(()) 43 | } 44 | 45 | /// A Python module implemented in Rust. 
46 | #[pymodule] 47 | fn _obstore(py: Python, m: &Bound<PyModule>) -> PyResult<()> { 48 | check_debug_build(py)?; 49 | 50 | m.add("__version__", VERSION)?; 51 | m.add("_object_store_version", OBJECT_STORE_VERSION)?; 52 | m.add("_object_store_source", OBJECT_STORE_SOURCE)?; 53 | 54 | pyo3_object_store::register_store_module(py, m, "obstore", "_store")?; 55 | pyo3_object_store::register_exceptions_module(py, m, "obstore", "exceptions")?; 56 | 57 | m.add_class::<pyo3_bytes::PyBytes>()?; 58 | // Set the value of `__module__` correctly on PyBytes 59 | m.getattr("Bytes")?.setattr("__module__", "obstore")?; 60 | 61 | m.add_wrapped(wrap_pyfunction!(buffered::open_reader))?; 62 | m.add_wrapped(wrap_pyfunction!(buffered::open_reader_async))?; 63 | m.add_wrapped(wrap_pyfunction!(buffered::open_writer))?; 64 | m.add_wrapped(wrap_pyfunction!(buffered::open_writer_async))?; 65 | m.add_wrapped(wrap_pyfunction!(copy::copy_async))?; 66 | m.add_wrapped(wrap_pyfunction!(copy::copy))?; 67 | m.add_wrapped(wrap_pyfunction!(delete::delete_async))?; 68 | m.add_wrapped(wrap_pyfunction!(delete::delete))?; 69 | m.add_wrapped(wrap_pyfunction!(get::get_async))?; 70 | m.add_wrapped(wrap_pyfunction!(get::get_range_async))?; 71 | m.add_wrapped(wrap_pyfunction!(get::get_range))?; 72 | m.add_wrapped(wrap_pyfunction!(get::get_ranges_async))?; 73 | m.add_wrapped(wrap_pyfunction!(get::get_ranges))?; 74 | m.add_wrapped(wrap_pyfunction!(get::get))?; 75 | m.add_wrapped(wrap_pyfunction!(head::head_async))?; 76 | m.add_wrapped(wrap_pyfunction!(head::head))?; 77 | m.add_wrapped(wrap_pyfunction!(list::list_with_delimiter_async))?; 78 | m.add_wrapped(wrap_pyfunction!(list::list_with_delimiter))?; 79 | m.add_wrapped(wrap_pyfunction!(list::list))?; 80 | m.add_wrapped(wrap_pyfunction!(put::put_async))?; 81 | m.add_wrapped(wrap_pyfunction!(put::put))?; 82 | m.add_wrapped(wrap_pyfunction!(rename::rename_async))?; 83 | m.add_wrapped(wrap_pyfunction!(rename::rename))?; 84 | m.add_wrapped(wrap_pyfunction!(scheme::parse_scheme))?; 85 |
m.add_wrapped(wrap_pyfunction!(signer::sign_async))?; 86 | m.add_wrapped(wrap_pyfunction!(signer::sign))?; 87 | 88 | Ok(()) 89 | } 90 | -------------------------------------------------------------------------------- /tests/test_buffered.py: -------------------------------------------------------------------------------- 1 | from io import BytesIO 2 | 3 | import pytest 4 | 5 | import obstore as obs 6 | from obstore.store import MemoryStore 7 | 8 | 9 | def test_readable_file_sync(): 10 | store = MemoryStore() 11 | 12 | line = b"the quick brown fox jumps over the lazy dog\n" 13 | data = line * 5000 14 | path = "big-data.txt" 15 | 16 | obs.put(store, path, data) 17 | 18 | file = obs.open_reader(store, path) 19 | assert line == file.readline().to_bytes() 20 | 21 | file = obs.open_reader(store, path) 22 | buffer = file.read() 23 | assert memoryview(data) == memoryview(buffer) 24 | 25 | file = obs.open_reader(store, path) 26 | assert line == file.readline().to_bytes() 27 | 28 | file = obs.open_reader(store, path) 29 | assert memoryview(data[:20]) == memoryview(file.read(20)) 30 | 31 | 32 | @pytest.mark.asyncio 33 | async def test_readable_file_async(): 34 | store = MemoryStore() 35 | 36 | line = b"the quick brown fox jumps over the lazy dog\n" 37 | data = line * 5000 38 | path = "big-data.txt" 39 | 40 | await obs.put_async(store, path, data) 41 | 42 | file = await obs.open_reader_async(store, path) 43 | assert line == (await file.readline()).to_bytes() 44 | 45 | file = await obs.open_reader_async(store, path) 46 | buffer = await file.read() 47 | assert memoryview(data) == memoryview(buffer) 48 | 49 | file = await obs.open_reader_async(store, path) 50 | assert line == (await file.readline()).to_bytes() 51 | 52 | file = await obs.open_reader_async(store, path) 53 | assert memoryview(data[:20]) == memoryview(await file.read(20)) 54 | 55 | 56 | def test_writable_file_sync(): 57 | store = MemoryStore() 58 | 59 | line = b"the quick brown fox jumps over the lazy dog\n" 60 | 
path = "big-data.txt" 61 | with obs.open_writer(store, path) as writer: 62 | for _ in range(50): 63 | writer.write(line) 64 | 65 | retour = obs.get(store, path).bytes() 66 | assert retour == line * 50 67 | 68 | 69 | @pytest.mark.asyncio 70 | async def test_writable_file_async(): 71 | store = MemoryStore() 72 | 73 | line = b"the quick brown fox jumps over the lazy dog\n" 74 | path = "big-data.txt" 75 | async with obs.open_writer_async(store, path) as writer: 76 | for _ in range(50): 77 | await writer.write(line) 78 | 79 | resp = await obs.get_async(store, path) 80 | retour = await resp.bytes_async() 81 | assert retour == line * 50 82 | 83 | 84 | def test_read_past_eof_sync(): 85 | store = MemoryStore() 86 | 87 | data = b"Hello, World!" 88 | path = "greeting.txt" 89 | obs.put(store, path, data) 90 | 91 | file = obs.open_reader(store, path) 92 | buffer = file.read(20) 93 | assert memoryview(data) == memoryview(buffer) 94 | 95 | buf = BytesIO(data) 96 | expected = buf.read(20) 97 | assert memoryview(expected) == memoryview(buffer) 98 | 99 | 100 | @pytest.mark.asyncio 101 | async def test_read_past_eof_async(): 102 | store = MemoryStore() 103 | 104 | data = b"Hello, World!" 
105 | path = "greeting.txt" 106 | await obs.put_async(store, path, data) 107 | 108 | file = await obs.open_reader_async(store, path) 109 | buffer = await file.read(20) 110 | assert memoryview(data) == memoryview(buffer) 111 | 112 | buf = BytesIO(data) 113 | expected = buf.read(20) 114 | assert memoryview(expected) == memoryview(buffer) 115 | -------------------------------------------------------------------------------- /pyo3-object_store/src/retry.rs: -------------------------------------------------------------------------------- 1 | use std::time::Duration; 2 | 3 | use object_store::{BackoffConfig, RetryConfig}; 4 | use pyo3::intern; 5 | use pyo3::prelude::*; 6 | 7 | #[derive(Clone, Debug, IntoPyObject, IntoPyObjectRef, PartialEq)] 8 | pub struct PyBackoffConfig { 9 | #[pyo3(item)] 10 | init_backoff: Duration, 11 | #[pyo3(item)] 12 | max_backoff: Duration, 13 | #[pyo3(item)] 14 | base: f64, 15 | } 16 | 17 | impl<'py> FromPyObject<'_, 'py> for PyBackoffConfig { 18 | type Error = PyErr; 19 | 20 | fn extract(obj: Borrowed<'_, 'py, pyo3::PyAny>) -> PyResult<Self> { 21 | let mut backoff_config = BackoffConfig::default(); 22 | let py = obj.py(); 23 | if let Ok(init_backoff) = obj.get_item(intern!(py, "init_backoff")) { 24 | backoff_config.init_backoff = init_backoff.extract()?; 25 | } 26 | if let Ok(max_backoff) = obj.get_item(intern!(py, "max_backoff")) { 27 | backoff_config.max_backoff = max_backoff.extract()?; 28 | } 29 | if let Ok(base) = obj.get_item(intern!(py, "base")) { 30 | backoff_config.base = base.extract()?; 31 | } 32 | Ok(backoff_config.into()) 33 | } 34 | } 35 | 36 | impl From<PyBackoffConfig> for BackoffConfig { 37 | fn from(value: PyBackoffConfig) -> Self { 38 | BackoffConfig { 39 | init_backoff: value.init_backoff, 40 | max_backoff: value.max_backoff, 41 | base: value.base, 42 | } 43 | } 44 | } 45 | 46 | impl From<BackoffConfig> for PyBackoffConfig { 47 | fn from(value: BackoffConfig) -> Self { 48 | PyBackoffConfig { 49 | init_backoff: value.init_backoff, 50 | max_backoff:
value.max_backoff, 51 | base: value.base, 52 | } 53 | } 54 | } 55 | 56 | #[derive(Clone, Debug, IntoPyObject, IntoPyObjectRef, PartialEq)] 57 | pub struct PyRetryConfig { 58 | #[pyo3(item)] 59 | backoff: PyBackoffConfig, 60 | #[pyo3(item)] 61 | max_retries: usize, 62 | #[pyo3(item)] 63 | retry_timeout: Duration, 64 | } 65 | 66 | impl<'py> FromPyObject<'_, 'py> for PyRetryConfig { 67 | type Error = PyErr; 68 | 69 | fn extract(obj: Borrowed<'_, 'py, pyo3::PyAny>) -> PyResult<Self> { 70 | let mut retry_config = RetryConfig::default(); 71 | let py = obj.py(); 72 | if let Ok(backoff) = obj.get_item(intern!(py, "backoff")) { 73 | retry_config.backoff = backoff.extract::<PyBackoffConfig>()?.into(); 74 | } 75 | if let Ok(max_retries) = obj.get_item(intern!(py, "max_retries")) { 76 | retry_config.max_retries = max_retries.extract()?; 77 | } 78 | if let Ok(retry_timeout) = obj.get_item(intern!(py, "retry_timeout")) { 79 | retry_config.retry_timeout = retry_timeout.extract()?; 80 | } 81 | Ok(retry_config.into()) 82 | } 83 | } 84 | 85 | impl From<PyRetryConfig> for RetryConfig { 86 | fn from(value: PyRetryConfig) -> Self { 87 | RetryConfig { 88 | backoff: value.backoff.into(), 89 | max_retries: value.max_retries, 90 | retry_timeout: value.retry_timeout, 91 | } 92 | } 93 | } 94 | 95 | impl From<RetryConfig> for PyRetryConfig { 96 | fn from(value: RetryConfig) -> Self { 97 | PyRetryConfig { 98 | backoff: value.backoff.into(), 99 | max_retries: value.max_retries, 100 | retry_timeout: value.retry_timeout, 101 | } 102 | } 103 | } 104 | -------------------------------------------------------------------------------- /tests/test_bytes.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import pickle 4 | from typing import TYPE_CHECKING 5 | 6 | import pytest 7 | 8 | from obstore import Bytes 9 | 10 | if TYPE_CHECKING: 11 | from collections.abc import Iterable 12 | 13 | ALL_BYTES = b"".join([bytes([i]) for i in range(256)]) 14 | 15 | 16 | def
test_empty_eq() -> None: 17 | """Test that empty bytes and Bytes are equal.""" 18 | assert Bytes(b"") == b"" 19 | 20 | 21 | def test_repr(): 22 | """Test the repr of Bytes and bytes.""" 23 | py_buf = b"foo\nbar\nbaz" 24 | rust_buf = Bytes(py_buf) 25 | # Assert reprs are the same excluding the prefix and suffix 26 | assert repr(py_buf)[2:-1] == repr(rust_buf)[8:-2] 27 | 28 | 29 | @pytest.mark.parametrize( 30 | "b", 31 | [bytes([i]) for i in range(256)], 32 | ) 33 | def test_uno_byte_bytes_repr(b: bytes) -> None: 34 | """Test the repr of Bytes and bytes for single byte values.""" 35 | rust_bytes = Bytes(b) 36 | rust_bytes_str = repr(rust_bytes) 37 | rust_bytes_str_eval = eval(rust_bytes_str) # noqa: S307 38 | assert rust_bytes_str_eval == rust_bytes == b 39 | 40 | 41 | class TestBytesRemovePrefixSuffix: 42 | """Test the remove_prefix and remove_suffix methods.""" 43 | 44 | def test_remove_prefix(self) -> None: 45 | """Test that remove_prefix works as expected.""" 46 | rust_bytes = Bytes(b"asdf") 47 | assert rust_bytes.removeprefix(b"as") == Bytes(b"df") 48 | assert rust_bytes.removeprefix(b"asdf") == Bytes(b"") 49 | 50 | def test_remove_suffix(self) -> None: 51 | """Test that remove_suffix works as expected.""" 52 | rust_bytes = Bytes(b"asdf") 53 | assert rust_bytes.removesuffix(b"df") == Bytes(b"as") 54 | assert rust_bytes.removesuffix(b"asdf") == Bytes(b"") 55 | 56 | 57 | class TestBytesSlice: 58 | """Test suite for Bytes slicing.""" 59 | 60 | def test_zero_step_value_err(self) -> None: 61 | """Test that slicing with step=0 raises ValueError.""" 62 | rs_bytes = Bytes(b"abcdefg") 63 | py_bytes = b"abcdefg" 64 | with pytest.raises(ValueError, match="slice step cannot be zero"): 65 | _py_new = py_bytes[0:4:0] 66 | 67 | with pytest.raises(ValueError, match="slice step cannot be zero"): 68 | _rs_bytes = rs_bytes[0:4:0] 69 | 70 | @pytest.mark.parametrize( 71 | "py_bytes", 72 | [b"abcdefg", b"", ALL_BYTES], 73 | ) 74 | def test_slice_o_bytes(self, py_bytes: bytes) -> 
None: 75 | """Run slicing on both bytes and Bytes and assert they are equal.""" 76 | rs_bytes = Bytes(py_bytes) 77 | for start, stop, step, _sliced in self._bytes_slices(py_bytes): 78 | new_py = py_bytes[start:stop:step] 79 | new_rs = rs_bytes[start:stop:step] 80 | assert new_rs == new_py 81 | 82 | @staticmethod 83 | def _bytes_slices( 84 | b: bytes, 85 | range_buffer: int = 3, 86 | ) -> Iterable[tuple[int, int, int, bytes]]: 87 | """Yield tuples (start, stop, step, sliced_result) for all slices of b.""" 88 | b_len = len(b) 89 | indices_range = range(-b_len - (range_buffer - 1), b_len + range_buffer) 90 | steps = [i for i in range(-(b_len + 2), b_len + 3) if i != 0]  # list, not generator: re-iterated for each (start, stop) pair 91 | return ( 92 | (start, stop, step, b[start:stop:step]) 93 | for start in indices_range 94 | for stop in indices_range 95 | for step in steps 96 | ) 97 | 98 | 99 | def test_pickle(): 100 | b = Bytes(b"hello_world") 101 | assert b == pickle.loads(pickle.dumps(b)) 102 | -------------------------------------------------------------------------------- /pyo3-object_store/src/credentials.rs: -------------------------------------------------------------------------------- 1 | use chrono::Utc; 2 | use chrono::{DateTime, TimeDelta}; 3 | use pyo3::intern; 4 | use pyo3::prelude::*; 5 | use pyo3::types::PyTuple; 6 | use std::future::Future; 7 | use tokio::sync::Mutex; 8 | 9 | /// A temporary authentication token with an associated expiry 10 | #[derive(Debug, Clone)] 11 | pub(crate) struct TemporaryToken<T> { 12 | /// The temporary credential 13 | pub token: T, 14 | /// The instant at which this credential is no longer valid 15 | /// None means the credential does not expire 16 | pub expiry: Option<DateTime<Utc>>, 17 | } 18 | 19 | /// Provides [`TokenCache::get_or_insert_with`] which can be used to cache a 20 | /// [`TemporaryToken`] based on its expiry 21 | #[derive(Debug)] 22 | pub(crate) struct TokenCache<T> { 23 | /// A temporary token and the instant at which it was fetched 24 | cache: Mutex<Option<(TemporaryToken<T>, DateTime<Utc>)>>, 25 | min_ttl:
TimeDelta, 26 | /// How long to wait before re-attempting a token fetch after receiving one that 27 | /// is still within the min-ttl 28 | fetch_backoff: TimeDelta, 29 | } 30 | 31 | impl<T> Default for TokenCache<T> { 32 | fn default() -> Self { 33 | Self { 34 | cache: Default::default(), 35 | min_ttl: TimeDelta::seconds(300), 36 | fetch_backoff: TimeDelta::milliseconds(100), 37 | } 38 | } 39 | } 40 | 41 | impl<T> Clone for TokenCache<T> { 42 | /// Cloning the token cache invalidates the cache. 43 | fn clone(&self) -> Self { 44 | Self { 45 | cache: Default::default(), 46 | min_ttl: self.min_ttl, 47 | fetch_backoff: self.fetch_backoff, 48 | } 49 | } 50 | } 51 | 52 | impl<T: Clone + Send> TokenCache<T> { 53 | /// Override the minimum remaining TTL for a cached token to be used 54 | pub(crate) fn with_min_ttl(self, min_ttl: TimeDelta) -> Self { 55 | Self { min_ttl, ..self } 56 | } 57 | 58 | pub(crate) async fn get_or_insert_with<F, Fut, E>(&self, f: F) -> Result<T, E> 59 | where 60 | F: FnOnce() -> Fut + Send, 61 | Fut: Future<Output = Result<TemporaryToken<T>, E>> + Send, 62 | { 63 | let now = Utc::now(); 64 | 65 | let mut locked = self.cache.lock().await; 66 | 67 | if let Some((cached, fetched_at)) = locked.as_ref() { 68 | match cached.expiry { 69 | Some(expiry_time) => { 70 | if expiry_time - now > self.min_ttl || 71 | // if we've recently attempted to fetch this token and it's not actually 72 | // expired, we'll wait to re-fetch it and return the cached one 73 | (Utc::now() - fetched_at < self.fetch_backoff && expiry_time - now > TimeDelta::zero()) 74 | { 75 | return Ok(cached.token.clone()); 76 | } 77 | } 78 | None => return Ok(cached.token.clone()), 79 | } 80 | } 81 | 82 | let cached = f().await?; 83 | let token = cached.token.clone(); 84 | *locked = Some((cached, Utc::now())); 85 | 86 | Ok(token) 87 | } 88 | } 89 | 90 | /// Check whether a Python object is awaitable 91 |
pub(crate) fn is_awaitable(ob: &Bound<PyAny>) -> PyResult<bool> { 96 | let py = ob.py(); 97 | let inspect_mod = py.import(intern!(py, "inspect"))?; 98 | inspect_mod 99 | .call_method1(intern!(py, "isawaitable"), PyTuple::new(py, [ob])?)? 100 | .extract::<bool>() 101 | } 102 | -------------------------------------------------------------------------------- /docs/performance.md: -------------------------------------------------------------------------------- 1 | # Performance 2 | 3 | > Last edited 2025-02-05. 4 | 5 | Performance is a primary goal of Obstore. Benchmarking is still ongoing, so this document is a mix of what we've learned so far and our untested expectations. 6 | 7 | **tl;dr**: Obstore can't magically make your networking hardware faster, but it can reduce overhead, and in cases where that overhead is the limiting factor it can better utilize your available hardware and improve performance. 8 | 9 | ## Non-performance benefits 10 | 11 | Before we get into the weeds of performance, keep in mind that performance is not the _only_ feature of Obstore. There's a strong focus on developer experience as well: 12 | 13 | - Simple to install with no required Python dependencies. 14 | - Works the same across AWS S3, Google Cloud Storage, and Azure Storage. 15 | - Full type hinting, including all store configuration and operations. 16 | - Downloads that automatically act as iterators and uploads that automatically accept iterators. 17 | - Automatic pagination of `list` calls behind the scenes. 18 | 19 | So you might enjoy using Obstore even in a case where it only marginally improves your performance. 20 | 21 | ## Defining performance 22 | 23 | "Fast" can have several definitions in a networking context. 24 | 25 | - **Download latency**: the time until the first byte of a download is received. 26 | - **Single-request throughput**: the download or upload bytes per second of a single request.
27 | - **Many-request throughput**: the combined download or upload bytes per second of multiple concurrent requests. 28 | 29 | Furthermore, performance can be different when using obstore's synchronous or asynchronous API. 30 | 31 | Let's consider the areas where we expect improved, possibly-improved, and equal performance. 32 | 33 | ## Improved performance 34 | 35 | **Many-request throughput with the asynchronous API** is the primary place where we expect significantly improved performance. Especially when making many requests of relatively small files, we find that obstore can provide significantly higher throughput. 36 | 37 | For example, preliminary results indicate roughly [9x higher throughput than fsspec](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/fe8f290cb3282dcc3bc96cae06ed5f90ad326eff/test_results/cog_header_results.csv) and [2.8x higher throughput than aioboto3](https://github.com/geospatial-jeff/pyasyncio-benchmark/blob/40e67509a248c5102a6b1608bcb9773295691213/test_results/20250218_results/ec2_m5/aggregated_results.csv). That specific benchmark considered fetching the first 16KB of a file many times from an async context. 38 | 39 | ## Possibly improved performance 40 | 41 | **Using the synchronous API**. We haven't benchmarked the synchronous API. However, we do release the Python [Global Interpreter Lock (GIL)](https://en.wikipedia.org/wiki/Global_interpreter_lock) for all synchronous operations, so it may perform better in a thread pool than other Python request libraries. 42 | 43 | ## Equal performance 44 | 45 | - Single-request throughput: if you're making _one request_, the limiting factor is likely network conditions, not Python overhead, so it's unlikely that obstore will be faster. 46 | 47 | Keep in mind, however, that what looks like a single request may actually be multiple requests under the hood. 
[`obstore.put`][obstore.put] will use multipart uploads by default, meaning that various parts of a file will be uploaded concurrently, and there may be efficiency gains here. 48 | - Latency: this is primarily driven by hardware and network conditions, and we expect Obstore to have similar latency as other Python request libraries. 49 | 50 | ## Future research 51 | 52 | In the future, we'd like to benchmark: 53 | 54 | - Alternate Python event loops, e.g. [`uvloop`](https://github.com/MagicStack/uvloop) 55 | - The obstore synchronous API 56 | 57 | If you have any interest in collaborating on this, [open an issue](https://github.com/developmentseed/obstore/issues/new/choose). 58 | -------------------------------------------------------------------------------- /docs/alternatives.md: -------------------------------------------------------------------------------- 1 | # Alternatives to Obstore 2 | 3 | ## Obstore vs fsspec 4 | 5 | [Fsspec](https://github.com/fsspec/filesystem_spec) is a generic specification for pythonic filesystems. It includes implementations for several cloud storage providers, including [s3fs](https://github.com/fsspec/s3fs) for Amazon S3, [gcsfs](https://github.com/fsspec/gcsfs) for Google Cloud Storage, and [adlfs](https://github.com/fsspec/adlfs) for Azure Storage. 6 | 7 | ### API Differences 8 | 9 | Like Obstore, fsspec presents an abstraction layer that allows you to write code once to interface to multiple cloud providers. However, the abstracted API each presents is different. Obstore tries to mirror **native object store** APIs while fsspec tries to mirror a **file-like** API. 
10 | 11 | The upstream Rust library powering obstore, [`object_store`](https://docs.rs/object_store), documents why [it intentionally avoids](https://docs.rs/object_store/latest/object_store/index.html#why-not-a-filesystem-interface) a primary file-like API: 12 | 13 | > The `ObjectStore` interface is designed to mirror the APIs of object stores and not filesystems, and thus has stateless APIs instead of cursor based interfaces such as `Read` or `Seek` available in filesystems. 14 | > 15 | > This design provides the following advantages: 16 | > 17 | > - All operations are atomic, and readers cannot observe partial and/or failed writes 18 | > - Methods map directly to object store APIs, providing both efficiency and predictability 19 | > - Abstracts away filesystem and operating system specific quirks, ensuring portability 20 | > - Allows for functionality not native to filesystems, such as operation preconditions and atomic multipart uploads 21 | 22 | Obstore's primary APIs, like [`get`][obstore.get], [`put`][obstore.put], and [`list`][obstore.list], mirror such object store APIs. However, if you still need to use a file-like API, Obstore provides such APIs with [`open_reader`][obstore.open_reader] and [`open_writer`][obstore.open_writer]. 23 | 24 | Obstore also includes a best-effort [fsspec compatibility layer][obstore.fsspec], which allows you to use obstore in applications that expect an fsspec-compatible API. 25 | 26 | ### Performance 27 | 28 | Beyond API design, performance can also be a consideration. [Initial benchmarks](./performance.md) show that obstore's async API can provide 9x higher throughput than fsspec's async API. 29 | 30 | ## Obstore vs boto3 31 | 32 | [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) is the official Python client for working with AWS services, including S3. 33 | 34 | boto3 supports all features of S3, including some features that obstore doesn't provide, like creating or deleting buckets. 
35 | 36 | However, boto3 is synchronous and specific to AWS. To support multiple clouds you'd need to use boto3 and another library and abstract away those differences yourself. With obstore you can interface with data in multiple clouds, changing only configuration settings. 37 | 38 | ## Obstore vs aioboto3 39 | 40 | [aioboto3](https://github.com/terricain/aioboto3) is an async Python client for S3, wrapping boto3 and [aiobotocore](https://github.com/aio-libs/aiobotocore). 41 | 42 | aioboto3 presents largely the same API as boto3, but async. As above, this means that it may support more S3-specific features than what obstore supports. 43 | 44 | But it's still specific to AWS, and in early [benchmarks](./performance.md) we've measured obstore to provide significantly higher throughput than aioboto3. 45 | 46 | ## Obstore vs Google Cloud Storage Python Client 47 | 48 | The official [Google Cloud Storage Python client](https://cloud.google.com/python/docs/reference/storage/latest) [uses requests](https://github.com/googleapis/python-storage/blob/f2cc9c5a2b1cc9724ca1269b8d452304da96bf03/setup.py#L42) as its HTTP client. This means that the GCS Python client supports only synchronous requests. 49 | 50 | It also presents a Google-specific API, so you'd need to re-implement your code if you want to use multiple cloud providers. 
51 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "test-env" 3 | version = "0.1.0" 4 | description = "Add your description here" 5 | readme = "README.md" 6 | requires-python = ">=3.9" 7 | dependencies = [] 8 | 9 | [tool.uv] 10 | dev-dependencies = [ 11 | "aiohttp-retry>=2.9.1", 12 | "aiohttp>=3.11.13", 13 | "arro3-core>=0.4.2", 14 | "azure-identity>=1.21.0", 15 | "boto3>=1.38.21", 16 | "docker>=7.1.0", 17 | "fastapi>=0.115.12", # used in example but added here for pyright CI 18 | "fsspec>=2024.10.0", 19 | "google-auth>=2.38.0", 20 | "griffe-inherited-docstrings>=1.0.1", 21 | "griffe>=1.6.0", 22 | "ipykernel>=6.29.5", 23 | "maturin-import-hook>=0.2.0", 24 | "maturin>=1.7.4", 25 | "mike>=2.1.3", 26 | "minio>=7.2.16", 27 | "mkdocs-material[imaging]>=9.6.3", 28 | "mkdocs-redirects>=1.2.2", 29 | "mkdocs>=1.6.1", 30 | "mkdocstrings-python>=1.13.0", 31 | "mkdocstrings>=0.27.0", 32 | "mypy>=1.15.0", 33 | "obspec>=0.1.0", 34 | "pip>=24.2", 35 | "polars>=1.30.0", 36 | "pyarrow>=17.0.0", 37 | "pystac-client>=0.8.3", 38 | "pystac>=1.10.1", 39 | "pytest-asyncio>=0.24.0", 40 | "pytest-mypy-plugins>=3.2.0", 41 | "pytest>=8.3.3", 42 | "python-dotenv>=1.0.1", 43 | "ruff>=0.13.0", 44 | "tqdm>=4.67.1", 45 | "types-boto3[s3,sts]>=1.36.23", 46 | "types-requests>=2.31.0.6", 47 | ] 48 | constraint-dependencies = [ 49 | # ensure lockfile grabs wheels for pyproj for each Python version 50 | "urllib; python_version == '3.9'", 51 | "urllib>=2.0; python_version >= '3.10'", 52 | ] 53 | 54 | [tool.uv.workspace] 55 | members = ["examples/fastapi-example"] 56 | 57 | [tool.ruff] 58 | target-version = "py39" 59 | 60 | [tool.ruff.lint] 61 | select = ["ALL"] 62 | ignore = [ 63 | "D104", # Missing docstring in public package 64 | "D203", # 1 blank line required before class docstring (conflicts with D211) 65 | "D213", # Multi-line docstring 
summary should start at the second line (conflicts with D212) 66 | "EM101", 67 | "FIX002", # Line contains TODO, consider resolving the issue 68 | "INP001", # File is part of an implicit namespace package. 69 | "PLC0415", # `import` should be at the top-level of a file 70 | "PYI021", # docstring-in-stub 71 | "PYI051", # redundant-literal-union 72 | "PYI011", # typed-argument-default-in-stub 73 | "S101", # allow assert 74 | "TD", # Todo comments 75 | "TRY003", # Avoid specifying long messages outside the exception class 76 | ] 77 | 78 | [tool.ruff.lint.per-file-ignores] 79 | "examples/*" = [ 80 | "PGH004", # Use specific rule codes when using `ruff: noqa` 81 | ] 82 | "*.pyi" = [ 83 | "ANN204", # Missing return type annotation for special method 84 | "E501", # Line too long 85 | ] 86 | "tests/*" = [ 87 | "ANN201", # Missing return type annotation for public function 88 | "ANN202", # Missing return type annotation for private function `it` 89 | "D100", # Missing docstring in public module 90 | "D103", # Missing docstring in public function 91 | "PLR2004", # Magic value used in comparison, consider replacing `100` with a constant variable 92 | "S301", # `pickle` and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue 93 | "SLF001", # Private member accessed 94 | ] 95 | 96 | [tool.ruff.lint.isort] 97 | known-first-party = ["obstore"] 98 | 99 | [tool.pyright] 100 | exclude = ["**/__pycache__", "examples", ".venv"] 101 | executionEnvironments = [ 102 | { root = "./", extraPaths = ["./obstore/python"] }, # Tests. 
103 | { root = "./obstore/python" }, 104 | ] 105 | 106 | [tool.pytest.ini_options] 107 | addopts = "-v --mypy-only-local-stub" 108 | asyncio_default_fixture_loop_scope = "function" 109 | testpaths = ["tests"] 110 | markers = ["network: mark the test as requiring a network connection"] 111 | 112 | [tool.mypy] 113 | files = ["obstore/python"] 114 | 115 | [[tool.mypy.overrides]] 116 | module = ["fsspec.*"] 117 | ignore_missing_imports = true 118 | -------------------------------------------------------------------------------- /pyo3-object_store/README.md: -------------------------------------------------------------------------------- 1 | # pyo3-object_store 2 | 3 | Integration between [`object_store`](https://docs.rs/object_store) and [`pyo3`](https://github.com/PyO3/pyo3). 4 | 5 | This provides Python builder classes so that Python users can easily create [`Arc<dyn ObjectStore>`][object_store::ObjectStore] instances, which can then be used in pure-Rust code. 6 | 7 | ## Usage 8 | 9 | 1. Register the builders. 10 | 11 | ```rs 12 | #[pymodule] 13 | fn python_module(py: Python, m: &Bound<PyModule>) -> PyResult<()> { 14 | pyo3_object_store::register_store_module(py, m, "python_module", "store")?; 15 | pyo3_object_store::register_exceptions_module(py, m, "python_module", "exceptions")?; 16 | } 17 | ``` 18 | 19 | This exports the underlying Python classes from your own Rust-Python library. 20 | 21 | Refer to [`register_store_module`] and [`register_exceptions_module`] for more information. 22 | 23 | 2. Accept [`PyObjectStore`] as a parameter in your function exported to Python. Its [`into_dyn`][PyObjectStore::into_dyn] method (or `Into` impl) gives you an [`Arc<dyn ObjectStore>`][object_store::ObjectStore]. 24 | 25 | ```rs 26 | #[pyfunction] 27 | pub fn use_object_store(store: PyObjectStore) { 28 | let store: Arc<dyn ObjectStore> = store.into_dyn(); 29 | } 30 | ``` 31 | 32 | You can also accept [`AnyObjectStore`] as a parameter, which wraps [`PyObjectStore`] and [`PyExternalObjectStore`].
This allows you to seamlessly recreate `ObjectStore` instances that users pass in from other Python libraries (like [`obstore`][obstore]) that themselves export `pyo3-object_store` builders. 33 | 34 | Note however that due to lack of [ABI stability](#abi-stability), `ObjectStore` instances will be **recreated**, and so there will be no connection pooling across the external store. 35 | 36 | ## Example 37 | 38 | The [`obstore`][obstore] Python library gives a full real-world example of using `pyo3-object_store`, exporting a Python API that mimics the Rust [`ObjectStore`][object_store::ObjectStore] API. 39 | 40 | [obstore]: https://developmentseed.org/obstore/latest/ 41 | 42 | ## Feature flags 43 | 44 | - `external-store-warning` (enabled by default): Emit a user warning when constructing a `PyExternalObjectStore` (or `AnyObjectStore::PyExternalObjectStore`), to inform users that there may be performance implications due to lack of connection pooling across separately-compiled Python libraries. Disable this feature if you don't want the warning. 45 | 46 | ## ABI stability 47 | 48 | It's [not currently possible](https://github.com/PyO3/pyo3/issues/1444) to share a `#[pyclass]` across multiple Python libraries, except in special cases where the underlying data has a stable ABI. 49 | 50 | As `object_store` does not currently have a stable ABI, we can't share `PyObjectStore` instances across multiple separately-compiled Python libraries. 51 | 52 | We have two ways to get around this: 53 | 54 | - Export your own Python classes so that users can construct `ObjectStore` instances that were compiled _with your library_. See [`register_store_module`]. 55 | - Accept [`AnyObjectStore`] or [`PyExternalObjectStore`] as a parameter, which allows for seamlessly **reconstructing** stores from an external Python library, like [`obstore`][obstore]. This has some overhead and removes any possibility of connection pooling across the two instances. 
56 | 57 | Note that these classes cannot be shared across Python packages: they must be used with the classes exported from your own library. 58 | 59 | ## Python Type hints 60 | 61 | We don't yet have a _great_ solution for reusing the store builder type hints in your own library. Type hints are shipped with the Cargo dependency, or you can use a git submodule on the `obstore` repo. See [`async-tiff` for an example](https://github.com/developmentseed/async-tiff/blob/35eaf116d9b1ab31232a1e23298b3102d2879e9c/python/python/async_tiff/store). 62 | 63 | ## Version compatibility 64 | 65 | | pyo3-object_store | pyo3 | object_store | 66 | | ----------------- | ------------------ | ------------------ | 67 | | 0.1.x | 0.23 | 0.12 | 68 | | 0.2.x | 0.24 | 0.12 | 69 | | 0.3.x | **0.23** :warning: | **0.11** :warning: | 70 | | 0.4.x | 0.24 | **0.11** :warning: | 71 | | 0.5.x | 0.25 | 0.12 | 72 | | 0.6.x | 0.26 | 0.12 | 73 | | 0.7.x | 0.27 | 0.12 | 74 | 75 | Note that 0.3.x and 0.4.x are compatibility releases to use `pyo3-object_store` with older versions of `pyo3` and `object_store`. 76 | -------------------------------------------------------------------------------- /pyo3-object_store/src/simple.rs: -------------------------------------------------------------------------------- 1 | use std::sync::Arc; 2 | 3 | use object_store::memory::InMemory; 4 | use object_store::ObjectStoreScheme; 5 | use pyo3::prelude::*; 6 | use pyo3::types::{PyDict, PyType}; 7 | use pyo3::{intern, IntoPyObjectExt}; 8 | 9 | use crate::error::GenericError; 10 | use crate::retry::PyRetryConfig; 11 | use crate::url::PyUrl; 12 | use crate::{ 13 | PyAzureStore, PyClientOptions, PyGCSStore, PyHttpStore, PyLocalStore, PyMemoryStore, 14 | PyObjectStoreResult, PyS3Store, 15 | }; 16 | 17 | /// Simple construction of stores by url. 18 | // Note: We don't extract the PyObject in the function signature because it's possible that 19 | // AWS/Azure/Google config keys could overlap.
And so we don't want to accidentally parse a config 20 | // as an AWS config before knowing that the URL scheme is AWS. 21 | #[pyfunction] 22 | #[pyo3(signature = (url, *, config=None, client_options=None, retry_config=None, credential_provider=None, **kwargs))] 23 | pub fn from_url<'py>( 24 | py: Python<'py>, 25 | url: PyUrl, 26 | config: Option<Bound<'py, PyAny>>, 27 | client_options: Option<PyClientOptions>, 28 | retry_config: Option<PyRetryConfig>, 29 | credential_provider: Option<Bound<'py, PyAny>>, 30 | kwargs: Option<Bound<'py, PyAny>>, 31 | ) -> PyObjectStoreResult<Bound<'py, PyAny>> { 32 | let (scheme, _) = ObjectStoreScheme::parse(url.as_ref()).map_err(object_store::Error::from)?; 33 | match scheme { 34 | ObjectStoreScheme::AmazonS3 => PyS3Store::from_url( 35 | &PyType::new::<PyS3Store>(py), 36 | url, 37 | config.map(|x| x.extract()).transpose()?, 38 | client_options, 39 | retry_config, 40 | credential_provider.map(|x| x.extract()).transpose()?, 41 | kwargs.map(|x| x.extract()).transpose()?, 42 | ), 43 | ObjectStoreScheme::GoogleCloudStorage => PyGCSStore::from_url( 44 | &PyType::new::<PyGCSStore>(py), 45 | url, 46 | config.map(|x| x.extract()).transpose()?, 47 | client_options, 48 | retry_config, 49 | credential_provider.map(|x| x.extract()).transpose()?, 50 | kwargs.map(|x| x.extract()).transpose()?, 51 | ), 52 | ObjectStoreScheme::MicrosoftAzure => PyAzureStore::from_url( 53 | &PyType::new::<PyAzureStore>(py), 54 | url, 55 | config.map(|x| x.extract()).transpose()?, 56 | client_options, 57 | retry_config, 58 | credential_provider.map(|x| x.extract()).transpose()?, 59 | kwargs.map(|x| x.extract()).transpose()?, 60 | ), 61 | ObjectStoreScheme::Http => { 62 | raise_if_config_passed(config, kwargs, "http")?; 63 | PyHttpStore::from_url( 64 | &PyType::new::<PyHttpStore>(py), 65 | py, 66 | url, 67 | client_options, 68 | retry_config, 69 | ) 70 | } 71 | ObjectStoreScheme::Local => { 72 | let mut automatic_cleanup = false; 73 | let mut mkdir = false; 74 | if let Some(kwargs) = kwargs { 75 | let kwargs = kwargs.cast::<PyDict>()?; 76 | if let Some(val) = kwargs.get_item(intern!(py, "automatic_cleanup"))?
{ 77 | automatic_cleanup = val.extract()?; 78 | } 79 | if let Some(val) = kwargs.get_item(intern!(py, "mkdir"))? { 80 | mkdir = val.extract()?; 81 | } 82 | } 83 | 84 | PyLocalStore::from_url( 85 | &PyType::new::<PyLocalStore>(py), 86 | url, 87 | automatic_cleanup, 88 | mkdir, 89 | ) 90 | } 91 | ObjectStoreScheme::Memory => { 92 | raise_if_config_passed(config, kwargs, "memory")?; 93 | let store: PyMemoryStore = Arc::new(InMemory::new()).into(); 94 | Ok(store.into_bound_py_any(py)?) 95 | } 96 | scheme => Err(GenericError::new_err(format!("Unknown URL scheme {scheme:?}")).into()), 97 | } 98 | } 99 | 100 | fn raise_if_config_passed( 101 | config: Option<Bound<'_, PyAny>>, 102 | kwargs: Option<Bound<'_, PyAny>>, 103 | scheme: &str, 104 | ) -> PyObjectStoreResult<()> { 105 | if config.is_some() || kwargs.is_some() { 106 | return Err(GenericError::new_err(format!( 107 | "Cannot pass config or keyword parameters for scheme {scheme:?}" 108 | )) 109 | .into()); 110 | } 111 | Ok(()) 112 | } 113 | -------------------------------------------------------------------------------- /tests/conftest.py: -------------------------------------------------------------------------------- 1 | from __future__ import annotations 2 | 3 | import socket 4 | import time 5 | import warnings 6 | from typing import TYPE_CHECKING, Any 7 | 8 | import docker 9 | import pytest 10 | import requests 11 | from minio import Minio 12 | from requests.exceptions import RequestException 13 | 14 | from obstore.store import S3Store 15 | 16 | if TYPE_CHECKING: 17 | from collections.abc import Generator 18 | 19 | from obstore.store import ClientConfig, S3Config 20 | 21 | TEST_BUCKET_NAME = "test-bucket" 22 | 23 | 24 | def find_available_port() -> int: 25 | """Find a free port on localhost. 26 | 27 | Note that this is susceptible to race conditions. 28 | """ 29 | # https://stackoverflow.com/a/36331860 30 | 31 | with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: 32 | # Bind to a free port provided by the host.
33 | s.bind(("", 0)) 34 | 35 | # Return the port number assigned. 36 | return s.getsockname()[1] 37 | 38 | 39 | @pytest.fixture(scope="session") 40 | def minio_config() -> Generator[tuple[S3Config, ClientConfig], Any, None]: 41 | warnings.warn( 42 | "Creating Docker client...", 43 | UserWarning, 44 | stacklevel=1, 45 | ) 46 | docker_client = docker.from_env() 47 | warnings.warn( 48 | "Finished creating Docker client...", 49 | UserWarning, 50 | stacklevel=1, 51 | ) 52 | 53 | username = "minioadmin" 54 | password = "minioadmin" # noqa: S105 55 | port = find_available_port() 56 | console_port = find_available_port() 57 | 58 | print(f"Using ports: {port=}, {console_port=}") # noqa: T201 59 | print( # noqa: T201 60 | f"Log on to MinIO console at http://localhost:{console_port} with " 61 | f"{username=} and {password=}", 62 | ) 63 | 64 | warnings.warn( 65 | "Starting MinIO container...", 66 | UserWarning, 67 | stacklevel=1, 68 | ) 69 | minio_container = docker_client.containers.run( 70 | "quay.io/minio/minio", 71 | "server /data --console-address :9001", 72 | detach=True, 73 | ports={ 74 | "9000/tcp": port, 75 | "9001/tcp": console_port, 76 | }, 77 | environment={ 78 | "MINIO_ROOT_USER": username, 79 | "MINIO_ROOT_PASSWORD": password, 80 | }, 81 | ) 82 | warnings.warn( 83 | "Finished starting MinIO container...", 84 | UserWarning, 85 | stacklevel=1, 86 | ) 87 | 88 | # Wait for MinIO to be ready 89 | endpoint = f"http://localhost:{port}" 90 | wait_for_minio(endpoint, timeout=30) 91 | 92 | minio_client = Minio( 93 | f"localhost:{port}", 94 | access_key=username, 95 | secret_key=password, 96 | secure=False, 97 | ) 98 | minio_client.make_bucket(TEST_BUCKET_NAME) 99 | 100 | s3_config: S3Config = { 101 | "bucket": TEST_BUCKET_NAME, 102 | "endpoint": endpoint, 103 | "access_key_id": username, 104 | "secret_access_key": password, 105 | "virtual_hosted_style_request": False, 106 | } 107 | client_options: ClientConfig = {"allow_http": True} 108 | 109 | yield (s3_config, 
client_options) 110 | 111 | minio_container.stop() 112 | minio_container.remove() 113 | 114 | 115 | @pytest.fixture 116 | def minio_bucket( 117 | minio_config: tuple[S3Config, ClientConfig], 118 | ) -> Generator[tuple[S3Config, ClientConfig], Any, None]: 119 | yield minio_config 120 | 121 | # Remove all files from bucket 122 | store = S3Store(config=minio_config[0], client_options=minio_config[1]) 123 | objects = store.list().collect() 124 | paths = [obj["path"] for obj in objects] 125 | store.delete(paths) 126 | 127 | 128 | @pytest.fixture 129 | def minio_store(minio_bucket: tuple[S3Config, ClientConfig]) -> S3Store: 130 | """Create an S3Store configured for MinIO integration testing.""" 131 | return S3Store(config=minio_bucket[0], client_options=minio_bucket[1]) 132 | 133 | 134 | def wait_for_minio(endpoint: str, timeout: int): 135 | start_time = time.time() 136 | while time.time() - start_time < timeout: 137 | try: 138 | # MinIO health check endpoint 139 | response = requests.get(f"{endpoint}/minio/health/live", timeout=2) 140 | if response.status_code == 200: 141 | return 142 | except RequestException: 143 | pass 144 | time.sleep(0.5) 145 | 146 | exc_str = f"MinIO failed to start within {timeout} seconds" 147 | raise TimeoutError(exc_str) 148 | -------------------------------------------------------------------------------- /pyo3-bytes/bytes.pyi: -------------------------------------------------------------------------------- 1 | # ruff: noqa: D205 2 | 3 | import sys 4 | from typing import overload 5 | 6 | if sys.version_info >= (3, 12): 7 | from collections.abc import Buffer 8 | else: 9 | from typing_extensions import Buffer 10 | 11 | class Bytes(Buffer): 12 | """A `bytes`-like buffer. 13 | 14 | This implements the Python buffer protocol, allowing zero-copy access 15 | to underlying Rust memory. 16 | 17 | You can pass this to `memoryview` for a zero-copy view into the underlying 18 | data or to `bytes` to copy the underlying data into a Python `bytes`. 
19 | 20 | Many methods from the Python `bytes` class are implemented on this. 21 | """ 22 | 23 | def __init__(self, buf: Buffer = b"") -> None: 24 | """Construct a new Bytes object. 25 | 26 | This will be a zero-copy view on the Python byte slice. 27 | """ 28 | def __add__(self, other: Buffer) -> Bytes: ... 29 | def __buffer__(self, flags: int) -> memoryview[int]: ... 30 | def __contains__(self, other: Buffer) -> bool: ... 31 | def __eq__(self, other: object) -> bool: ... 32 | @overload 33 | def __getitem__(self, key: int, /) -> int: ... 34 | @overload 35 | def __getitem__(self, key: slice[int, int, int], /) -> Bytes: ... 36 | def __getitem__(self, key: int | slice, /) -> int | Bytes: ...  # type: ignore[misc] # docstring in pyi file 37 | def __mul__(self, other: int) -> Bytes: ... 38 | def __len__(self) -> int: ... 39 | def removeprefix(self, prefix: Buffer, /) -> Bytes: 40 | """If the binary data starts with the prefix string, return `bytes[len(prefix):]`. 41 | Otherwise, return the original binary data. 42 | """ 43 | def removesuffix(self, suffix: Buffer, /) -> Bytes: 44 | """If the binary data ends with the suffix string and that suffix is not empty, 45 | return `bytes[:-len(suffix)]`. Otherwise, return the original binary data. 46 | """ 47 | def isalnum(self) -> bool: 48 | """Return `True` if all bytes in the sequence are alphabetical ASCII characters or 49 | ASCII decimal digits and the sequence is not empty, `False` otherwise. 50 | 51 | Alphabetic ASCII characters are those byte values in the sequence 52 | `b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'`. ASCII decimal digits 53 | are those byte values in the sequence `b'0123456789'`. 54 | """ 55 | def isalpha(self) -> bool: 56 | """Return `True` if all bytes in the sequence are alphabetic ASCII characters and 57 | the sequence is not empty, `False` otherwise. 58 | 59 | Alphabetic ASCII characters are those byte values in the sequence 60 | `b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'`.
61 | """ 62 | def isascii(self) -> bool: 63 | """Return `True` if the sequence is empty or all bytes in the sequence are ASCII, 64 | `False` otherwise. 65 | 66 | ASCII bytes are in the range `0-0x7F`. 67 | """ 68 | def isdigit(self) -> bool: 69 | """Return `True` if all bytes in the sequence are ASCII decimal digits and the 70 | sequence is not empty, `False` otherwise. 71 | 72 | ASCII decimal digits are those byte values in the sequence `b'0123456789'`. 73 | """ 74 | def islower(self) -> bool: 75 | """Return `True` if there is at least one lowercase ASCII character in the sequence 76 | and no uppercase ASCII characters, `False` otherwise. 77 | """ 78 | def isspace(self) -> bool: 79 | r"""Return `True` if all bytes in the sequence are ASCII whitespace and the sequence 80 | is not empty, `False` otherwise. 81 | 82 | ASCII whitespace characters are those byte values 83 | in the sequence `b' \t\n\r\x0b\f'` (space, tab, newline, carriage return, 84 | vertical tab, form feed). 85 | """ 86 | def isupper(self) -> bool: 87 | """Return `True` if there is at least one uppercase alphabetic ASCII character in 88 | the sequence and no lowercase ASCII characters, `False` otherwise. 89 | """ 90 | 91 | def lower(self) -> Bytes: 92 | """Return a copy of the sequence with all the uppercase ASCII characters converted 93 | to their corresponding lowercase counterpart. 94 | """ 95 | 96 | def upper(self) -> Bytes: 97 | """Return a copy of the sequence with all the lowercase ASCII characters converted 98 | to their corresponding uppercase counterpart. 
99 | """ 100 | 101 | def to_bytes(self) -> bytes: 102 | """Copy this buffer's contents into a Python `bytes` object.""" 103 | -------------------------------------------------------------------------------- /pyo3-object_store/src/http.rs: -------------------------------------------------------------------------------- 1 | use std::sync::Arc; 2 | 3 | use object_store::http::{HttpBuilder, HttpStore}; 4 | use pyo3::prelude::*; 5 | use pyo3::types::{PyDict, PyTuple, PyType}; 6 | use pyo3::{intern, IntoPyObjectExt}; 7 | 8 | use crate::error::PyObjectStoreResult; 9 | use crate::retry::PyRetryConfig; 10 | use crate::{PyClientOptions, PyUrl}; 11 | 12 | #[derive(Debug, Clone, PartialEq)] 13 | struct HTTPConfig { 14 | url: PyUrl, 15 | client_options: Option<PyClientOptions>, 16 | retry_config: Option<PyRetryConfig>, 17 | } 18 | 19 | impl HTTPConfig { 20 | fn __getnewargs_ex__<'py>(&'py self, py: Python<'py>) -> PyResult<Bound<'py, PyTuple>> { 21 | let args = PyTuple::new(py, vec![self.url.clone()])?.into_bound_py_any(py)?; 22 | let kwargs = PyDict::new(py); 23 | 24 | if let Some(client_options) = &self.client_options { 25 | kwargs.set_item(intern!(py, "client_options"), client_options.clone())?; 26 | } 27 | if let Some(retry_config) = &self.retry_config { 28 | kwargs.set_item(intern!(py, "retry_config"), retry_config.clone())?; 29 | } 30 | 31 | PyTuple::new(py, [args, kwargs.into_bound_py_any(py)?]) 32 | } 33 | } 34 | 35 | /// A Python-facing wrapper around a [`HttpStore`]. 36 | #[derive(Debug, Clone)] 37 | #[pyclass(name = "HTTPStore", frozen, subclass)] 38 | pub struct PyHttpStore { 39 | // Note: we don't need to wrap this in a MaybePrefixedStore because the HttpStore manages its 40 | // own prefix. 41 | store: Arc<HttpStore>, 42 | /// A config used for pickling. This must stay in sync with the underlying store's config.
43 | config: HTTPConfig, 44 | } 45 | 46 | impl AsRef<Arc<HttpStore>> for PyHttpStore { 47 | fn as_ref(&self) -> &Arc<HttpStore> { 48 | &self.store 49 | } 50 | } 51 | 52 | impl PyHttpStore { 53 | /// Consume self and return the underlying [`HttpStore`]. 54 | pub fn into_inner(self) -> Arc<HttpStore> { 55 | self.store 56 | } 57 | } 58 | 59 | #[pymethods] 60 | impl PyHttpStore { 61 | #[new] 62 | #[pyo3(signature = (url, *, client_options=None, retry_config=None))] 63 | fn new( 64 | url: PyUrl, 65 | client_options: Option<PyClientOptions>, 66 | retry_config: Option<PyRetryConfig>, 67 | ) -> PyObjectStoreResult<Self> { 68 | let mut builder = HttpBuilder::new().with_url(url.clone()); 69 | if let Some(client_options) = client_options.clone() { 70 | builder = builder.with_client_options(client_options.into()) 71 | } 72 | if let Some(retry_config) = retry_config.clone() { 73 | builder = builder.with_retry(retry_config.into()) 74 | } 75 | Ok(Self { 76 | store: Arc::new(builder.build()?), 77 | config: HTTPConfig { 78 | url, 79 | client_options, 80 | retry_config, 81 | }, 82 | }) 83 | } 84 | 85 | #[classmethod] 86 | #[pyo3(signature = (url, *, client_options=None, retry_config=None))] 87 | pub(crate) fn from_url<'py>( 88 | cls: &Bound<'py, PyType>, 89 | py: Python<'py>, 90 | url: PyUrl, 91 | client_options: Option<PyClientOptions>, 92 | retry_config: Option<PyRetryConfig>, 93 | ) -> PyObjectStoreResult<Bound<'py, PyAny>> { 94 | // Note: we pass **back** through Python so that if cls is a subclass, we instantiate the 95 | // subclass 96 | let kwargs = PyDict::new(py); 97 | kwargs.set_item("url", url)?; 98 | kwargs.set_item("client_options", client_options)?; 99 | kwargs.set_item("retry_config", retry_config)?; 100 | Ok(cls.call((), Some(&kwargs))?)
101 | } 102 | 103 | fn __eq__(&self, other: &Bound<PyAny>) -> bool { 104 | // Ensure we never error on __eq__ by returning false if the other object is not the same 105 | // type 106 | other 107 | .cast::<PyHttpStore>() 108 | .map(|other| self.config == other.get().config) 109 | .unwrap_or(false) 110 | } 111 | 112 | fn __getnewargs_ex__<'py>(&'py self, py: Python<'py>) -> PyResult<Bound<'py, PyTuple>> { 113 | self.config.__getnewargs_ex__(py) 114 | } 115 | 116 | fn __repr__(&self) -> String { 117 | format!("HTTPStore(\"{}\")", &self.config.url.as_ref()) 118 | } 119 | 120 | #[getter] 121 | fn url(&self) -> &PyUrl { 122 | &self.config.url 123 | } 124 | 125 | #[getter] 126 | fn client_options(&self) -> Option<PyClientOptions> { 127 | self.config.client_options.clone() 128 | } 129 | 130 | #[getter] 131 | fn retry_config(&self) -> Option<PyRetryConfig> { 132 | self.config.retry_config.clone() 133 | } 134 | } 135 | -------------------------------------------------------------------------------- /docs/dev/functional-api.md: -------------------------------------------------------------------------------- 1 | # Functional API Design Choice 2 | 3 | > Last edited 2025-02-04. 4 | > 5 | > See further discussion in [this issue](https://github.com/developmentseed/obstore/issues/160). 6 | 7 | Obstore intentionally presents its main API as top-level functions. E.g. users must use the top level `obstore.put` function: 8 | 9 | ```py 10 | import obstore as obs 11 | from obstore.store import AzureStore 12 | 13 | store = AzureStore() 14 | obs.put(store, ....) 15 | ``` 16 | 17 | instead of a method on the store itself: 18 | 19 | ```py 20 | import obstore as obs 21 | from obstore.store import AzureStore 22 | 23 | store = AzureStore() 24 | store.put(....) 25 | ``` 26 | 27 | This page documents the design decisions for this API. 28 | 29 | ## Store-specific vs generic API 30 | 31 | This presents a nice separation of concerns, in my opinion, between store-specific properties and a generic API that works for _every_ `ObjectStore`.
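As a toy sketch of that split (hypothetical `ToyMemoryStore` names, not the real obstore classes): store classes hold only store-specific state, while the generic operations are top-level functions that take the store as their first argument:

```py
from dataclasses import dataclass, field


@dataclass
class ToyMemoryStore:
    """Store-specific state/config lives on the class itself."""

    objects: dict[str, bytes] = field(default_factory=dict)


def put(store: ToyMemoryStore, path: str, data: bytes) -> None:
    """A generic operation, shaped like `obs.put(store, ...)`."""
    store.objects[path] = data


def get(store: ToyMemoryStore, path: str) -> bytes:
    """A generic operation, shaped like `obs.get(store, ...)`."""
    return store.objects[path]


store = ToyMemoryStore()
put(store, "a.txt", b"hello")
assert get(store, "a.txt") == b"hello"
```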
32 | 33 | Python store classes such as `S3Store` have a few properties to access the _store-specific_ configuration, e.g. `S3Store.config` accesses the S3 credentials. Anything that's a property/method of the store class is specific to that type of store. Whereas any top-level method should work on _any_ store equally well. 34 | 35 | ## Simpler Rust code 36 | 37 | On the Rust side, each Python class is a separate `struct`. A pyo3 `#[pyclass]` can't implement a trait, so the only way to implement the same methods on multiple Rust structs without copy-pasting is by having a macro. That isn't out of the question, however it does hamper extensibility, and having one and only one way to call commands is simpler to maintain. 38 | 39 | ## Simpler Middlewares 40 | 41 | > The `PrefixStore` concept has since been taken out, in favor of natively handling store prefixes, but this argument still holds for other potential middlewares in the future. 42 | 43 | In https://github.com/developmentseed/obstore/pull/117 we added a binding for `PrefixStore`. Because we use object store classes functionally, we only needed 20 lines of Rust code: 44 | https://github.com/developmentseed/obstore/blob/b40d59b4e060ba4fd3dc69468b3ba7da1149758e/pyo3-object_store/src/prefix.rs#L10-L25 45 | 46 | If we exposed methods on an `S3Store`, then those methods would be lost whenever you apply a middleware around it, such as `PrefixStore(S3Store(...))`. So we'd have to ensure those same methods are also installed onto every middleware or other wrapper. 47 | 48 | ## External FFI for ObjectStore 49 | 50 | There was recently [discussion on Discord](https://discord.com/channels/885562378132000778/885562378132000781/1328392836353360007) about the merits of having a stable FFI for `ObjectStore`. If this comes to fruition in the future, then by having a functional API we could seamlessly use _third party_ ObjectStore implementations or middlewares, with no Python overhead. 
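The middleware argument above can be sketched in plain Python (a toy illustration with hypothetical classes, not obstore's implementation): because `put` is a free function, a wrapper such as a prefix middleware participates without re-exposing any methods on the wrapper itself:

```py
from functools import singledispatch


class MemStore:
    """Toy in-memory store."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}


class PrefixStore:
    """Toy middleware: rewrites paths, then delegates to the inner store."""

    def __init__(self, inner, prefix: str) -> None:
        self.inner = inner
        self.prefix = prefix


@singledispatch
def put(store, path: str, data: bytes) -> None:
    raise TypeError(f"unsupported store type: {type(store).__name__}")


@put.register
def _(store: MemStore, path: str, data: bytes) -> None:
    store.objects[path] = data


@put.register
def _(store: PrefixStore, path: str, data: bytes) -> None:
    # The wrapper needs no put() method of its own; dispatch handles it.
    put(store.inner, f"{store.prefix}/{path}", data)


inner = MemStore()
put(PrefixStore(inner, "data"), "a.txt", b"hello")
assert inner.objects["data/a.txt"] == b"hello"
```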
51 | 52 | I use a similar functional API in other Python bindings, especially in cases with zero-copy FFI, such as https://kylebarron.dev/geo-index/latest/api/rtree/#geoindex_rs.rtree.search (where the spatial index is passed in as the first argument instead) and https://kylebarron.dev/arro3/latest/api/compute/#arro3.compute.cast where the `cast` is not a method on the Arrow Array. 53 | 54 | ## Smaller core for third-party Rust bindings 55 | 56 | This repo has twin goals: 57 | 58 | 1. Provide bindings to `object_store` for _Python users_ who want a _Python API_. 59 | 2. Make it easier for other Rust developers who are making Python bindings, who are using `object_store` on the Rust side already, and who want to expose `ObjectStore` bindings to Python in their own projects. 60 | 61 | The first goal is served by the `obstore` Python package and the second is served by the `pyo3-object_store` Rust crate. The latter provides builders for `S3Store`, `AzureStore`, `GCSStore`, which means that those third party Rust-Python bindings can have code as simple as: 62 | 63 | ```rs 64 | #[pyfunction] 65 | fn use_object_store(store: PyObjectStore) { 66 | let store: Arc<dyn ObjectStore> = store.into_inner(); 67 | } 68 | ``` 69 | 70 | Those third party bindings don't need the Python bindings to perform arbitrary `get`, `list`, `put` from Python. Instead, they use this to access a raw `Arc<dyn ObjectStore>` from the Rust side. 71 | 72 | You'll notice that `S3Store`, `GCSStore`, and `AzureStore` **aren't** in the `obstore` library; they're in `pyo3-object_store`. We can't add methods to a pyclass from an external crate, so we couldn't leave those builders in `pyo3_object_store` while having the Python-facing operations live in `obstore`. Instead we'd have to put the entire content of the Python bindings in the `pyo3-object_store` crate. Then this would expose whatever class methods from the `obstore` Python API onto any external Rust-Python library that uses `pyo3-object_store`.
I don't want to leak this abstraction nor make that public to other Rust consumers. 73 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.so 2 | *.whl 3 | 4 | # Generated by Cargo 5 | # will have compiled files and executables 6 | debug/ 7 | target/ 8 | 9 | # Remove Cargo.lock from gitignore if creating an executable, leave it for libraries 10 | # More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html 11 | Cargo.lock 12 | 13 | # These are backup files generated by rustfmt 14 | **/*.rs.bk 15 | 16 | # MSVC Windows builds of rustc generate these, which store debugging information 17 | *.pdb 18 | 19 | # RustRover 20 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 21 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 22 | # and can be added to the global gitignore or merged into this file. For a more nuclear 23 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 24 | #.idea/ 25 | 26 | 27 | # Byte-compiled / optimized / DLL files 28 | __pycache__/ 29 | *.py[cod] 30 | *$py.class 31 | 32 | # C extensions 33 | *.so 34 | 35 | # Distribution / packaging 36 | .Python 37 | build/ 38 | develop-eggs/ 39 | dist/ 40 | downloads/ 41 | eggs/ 42 | .eggs/ 43 | lib/ 44 | lib64/ 45 | parts/ 46 | sdist/ 47 | var/ 48 | wheels/ 49 | share/python-wheels/ 50 | *.egg-info/ 51 | .installed.cfg 52 | *.egg 53 | MANIFEST 54 | 55 | # PyInstaller 56 | # Usually these files are written by a python script from a template 57 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
58 | *.manifest 59 | *.spec 60 | 61 | # Installer logs 62 | pip-log.txt 63 | pip-delete-this-directory.txt 64 | 65 | # Unit test / coverage reports 66 | htmlcov/ 67 | .tox/ 68 | .nox/ 69 | .coverage 70 | .coverage.* 71 | .cache 72 | nosetests.xml 73 | coverage.xml 74 | *.cover 75 | *.py,cover 76 | .hypothesis/ 77 | .pytest_cache/ 78 | cover/ 79 | 80 | # Translations 81 | *.mo 82 | *.pot 83 | 84 | # Django stuff: 85 | *.log 86 | local_settings.py 87 | db.sqlite3 88 | db.sqlite3-journal 89 | 90 | # Flask stuff: 91 | instance/ 92 | .webassets-cache 93 | 94 | # Scrapy stuff: 95 | .scrapy 96 | 97 | # Sphinx documentation 98 | docs/_build/ 99 | 100 | # PyBuilder 101 | .pybuilder/ 102 | target/ 103 | 104 | # Jupyter Notebook 105 | .ipynb_checkpoints 106 | 107 | # IPython 108 | profile_default/ 109 | ipython_config.py 110 | 111 | # pyenv 112 | # For a library or package, you might want to ignore these files since the code is 113 | # intended to run in multiple environments; otherwise, check them in: 114 | # .python-version 115 | 116 | # pipenv 117 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 118 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 119 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 120 | # install all needed dependencies. 121 | #Pipfile.lock 122 | 123 | # poetry 124 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 125 | # This is especially recommended for binary packages to ensure reproducibility, and is more 126 | # commonly ignored for libraries. 127 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 128 | #poetry.lock 129 | 130 | # pdm 131 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 
132 | #pdm.lock 133 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 134 | # in version control. 135 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control 136 | .pdm.toml 137 | .pdm-python 138 | .pdm-build/ 139 | 140 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 141 | __pypackages__/ 142 | 143 | # Celery stuff 144 | celerybeat-schedule 145 | celerybeat.pid 146 | 147 | # SageMath parsed files 148 | *.sage.py 149 | 150 | # Environments 151 | .env 152 | .venv 153 | env/ 154 | venv/ 155 | ENV/ 156 | env.bak/ 157 | venv.bak/ 158 | 159 | # Spyder project settings 160 | .spyderproject 161 | .spyproject 162 | 163 | # Rope project settings 164 | .ropeproject 165 | 166 | # mkdocs documentation 167 | /site 168 | 169 | # mypy 170 | .mypy_cache/ 171 | .dmypy.json 172 | dmypy.json 173 | 174 | # Pyre type checker 175 | .pyre/ 176 | 177 | # pytype static type analyzer 178 | .pytype/ 179 | 180 | # Cython debug symbols 181 | cython_debug/ 182 | 183 | # PyCharm 184 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 185 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 186 | # and can be added to the global gitignore or merged into this file. For a more nuclear 187 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 188 | #.idea/ 189 | -------------------------------------------------------------------------------- /docs/blog/posts/obstore-0.7.md: -------------------------------------------------------------------------------- 1 | --- 2 | draft: false 3 | date: 2025-06-25 4 | categories: 5 | - Release 6 | authors: 7 | - kylebarron 8 | links: 9 | - CHANGELOG.md 10 | --- 11 | 12 | # Releasing obstore 0.7! 13 | 14 | Obstore is the simplest, highest-throughput Python interface to Amazon S3, Google Cloud Storage, and Azure Storage, powered by Rust. 
15 | 16 | This post gives an overview of what's new in obstore version 0.7. 17 | 18 | 19 | 20 | Refer to the [changelog](../../CHANGELOG.md) for all updates. 21 | 22 | ## Anonymous connections to Google Cloud Storage 23 | 24 | Obstore now supports anonymous connections to GCS. Pass [`skip_signature=True`][obstore.store.GCSConfig.skip_signature] to configure an anonymous connection. 25 | 26 | ```py 27 | from obstore.store import GCSStore 28 | 29 | store = GCSStore( 30 | "weatherbench2", 31 | prefix="datasets/era5/1959-2023_01_10-full_37-1h-0p25deg-chunk-1.zarr", 32 | # Anonymous connection 33 | skip_signature=True, 34 | ) 35 | store.list_with_delimiter()["objects"] 36 | ``` 37 | 38 | Now prints: 39 | 40 | ```py 41 | [{'path': '.zattrs', 42 | 'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 54, 481000, tzinfo=datetime.timezone.utc), 43 | 'size': 2, 44 | 'e_tag': '"99914b932bd37a50b983c5e7c90ae93b"', 45 | 'version': None}, 46 | {'path': '.zgroup', 47 | 'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 53, 465000, tzinfo=datetime.timezone.utc), 48 | 'size': 24, 49 | 'e_tag': '"e20297935e73dd0154104d4ea53040ab"', 50 | 'version': None}, 51 | {'path': '.zmetadata', 52 | 'last_modified': datetime.datetime(2023, 11, 22, 9, 4, 54, 947000, tzinfo=datetime.timezone.utc), 53 | 'size': 46842, 54 | 'e_tag': '"9d287796ca614bfec4f1bb20a4ac1ba3"', 55 | 'version': None}] 56 | ``` 57 | 58 | ## Obspec v0.1 compatibility 59 | 60 | Obstore provides an implementation for accessing Amazon S3, Google Cloud Storage, and Azure Storage, but some libraries may want to also support other backends, such as HTTP clients or more obscure things like SFTP or HDFS filesystems. 61 | 62 | Additionally, there's a bunch of useful behavior that could exist on top of Obstore: caching, metrics, globbing, bulk operations. While all of those operations are useful, we want to keep the core Obstore library as small as possible, tightly coupled with the underlying Rust `object_store` library. 
63 | 64 | [Obspec](https://developmentseed.org/obspec/) exists to provide the abstractions for generic programming against object store backends. Obspec is essentially a formalization and generalization of the Obstore API, so if you're already using Obstore, very few changes are needed to use Obspec instead. 65 | 66 | Downstream libraries can program against the Obspec API to be fully generic around what underlying backend is used at runtime. 67 | 68 | For further information, refer to the [Obspec documentation](https://developmentseed.org/obspec/latest/) and the [Obspec announcement blog post](https://developmentseed.org/obspec/latest/blog/2025/06/25/introducing-obspec-a-python-protocol-for-interfacing-with-object-storage/). 69 | 70 | ## Customize headers sent in requests 71 | 72 | `ClientConfig` now accepts a [`default_headers` key][obstore.store.ClientConfig.default_headers]. This allows you to add additional headers that will be sent by the HTTP client on every request. 73 | 74 | ## Improvements to NASA Earthdata credential provider 75 | 76 | The NASA Earthdata credential provider now allows users to customize the host that handles credentialization. 77 | 78 | It also supports more ways of passing credentials. Authentication information can be a NASA Earthdata token, a NASA Earthdata username/password (tuple), or `None`, in which case environment variables or a `~/.netrc` file are used, if set. 79 | 80 | See updated documentation on the [NASA Earthdata page](../../../../../api/auth/earthdata/). 81 | 82 | ## Fixed creation of `AzureStore` from HTTPS URL 83 | 84 | Previously, this would create an incorrect AzureStore configuration: 85 | 86 | ```py 87 | url = "https://overturemapswestus2.blob.core.windows.net/release" 88 | store = AzureStore.from_url(url, skip_signature=True) 89 | ``` 90 | 91 | because it would interpret `release` as part of the within-bucket _prefix_, when it should really be interpreted as the _container name_.
92 | 93 | This is now fixed and this test passes: 94 | 95 | ```py 96 | url = "https://overturemapswestus2.blob.core.windows.net/release" 97 | store = AzureStore.from_url(url, skip_signature=True) 98 | 99 | assert store.config.get("container_name") == "release" 100 | assert store.config.get("account_name") == "overturemapswestus2" 101 | assert store.prefix is None 102 | ``` 103 | 104 | ## Improved documentation 105 | 106 | - [New `Zarr` example](../../examples/zarr.md) 107 | - [New `stream-zip` example](../../examples/stream-zip.md) 108 | 109 | ## All updates 110 | 111 | Refer to the [changelog](../../CHANGELOG.md) for all updates. 112 | --------------------------------------------------------------------------------