├── .gitignore
├── .readthedocs.yaml
├── LICENSE
├── README.md
└── docs
    ├── Makefile
    ├── _static
    │   ├── css
    │   │   └── custom.css
    │   ├── draft-watermark.png
    │   └── logo.png
    ├── conf.py
    ├── index.rst
    ├── make.bat
    ├── requirements.txt
    ├── specs.rst
    ├── v1
    │   └── v1.0.rst
    ├── v2
    │   └── v2.0.rst
    └── v3
        ├── chunk-grids
        │   ├── index.rst
        │   └── regular-grid
        │       └── index.rst
        ├── chunk-key-encodings
        │   ├── default
        │   │   └── index.rst
        │   ├── index.rst
        │   └── v2
        │       └── index.rst
        ├── codecs
        │   ├── blosc
        │   │   └── index.rst
        │   ├── bytes
        │   │   └── index.rst
        │   ├── crc32c
        │   │   └── index.rst
        │   ├── gzip
        │   │   └── index.rst
        │   ├── index.rst
        │   ├── sharding-indexed
        │   │   ├── index.rst
        │   │   └── sharding.png
        │   └── transpose
        │       └── index.rst
        ├── core
        │   ├── index.rst
        │   ├── terminology-hierarchy.excalidraw.png
        │   └── terminology-read.excalidraw.png
        ├── data-types
        │   └── index.rst
        ├── storage-transformers
        │   └── index.rst
        └── stores
            ├── filesystem
            │   └── index.rst
            └── index.rst

/.gitignore:
--------------------------------------------------------------------------------
1 | # emacs temp files
2 | *~
3 |
4 | # sphinx build files
5 | docs/_build
6 |
7 | # pycharm
8 | .idea
9 |
10 | # virtual environments
11 | .venv
12 |
13 | # visual studio code
14 | .vscode
--------------------------------------------------------------------------------
/.readthedocs.yaml:
--------------------------------------------------------------------------------
1 | version: 2
2 |
3 | build:
4 | os: ubuntu-22.04
5 | tools:
6 | python: "3.11"
7 |
8 | sphinx:
9 | configuration: docs/conf.py
10 |
11 | python:
12 | install:
13 | - requirements: docs/requirements.txt
14 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution 4.0 International Public License
58 |
59 | By exercising the Licensed Rights (defined below), You accept and agree
60 | to be bound by the terms and conditions of this Creative Commons
61 | Attribution 4.0 International Public License ("Public License"). To the
62 | extent this Public License may be interpreted as a contract, You are
63 | granted the Licensed Rights in consideration of Your acceptance of
64 | these terms and conditions, and the Licensor grants You such rights in
65 | consideration of benefits the Licensor receives from making the
66 | Licensed Material available under these terms and conditions.
67 |
68 |
69 | Section 1 -- Definitions.
70 |
71 | a. Adapted Material means material subject to Copyright and Similar
72 | Rights that is derived from or based upon the Licensed Material
73 | and in which the Licensed Material is translated, altered,
74 | arranged, transformed, or otherwise modified in a manner requiring
75 | permission under the Copyright and Similar Rights held by the
76 | Licensor. For purposes of this Public License, where the Licensed
77 | Material is a musical work, performance, or sound recording,
78 | Adapted Material is always produced where the Licensed Material is
79 | synched in timed relation with a moving image.
80 |
81 | b. Adapter's License means the license You apply to Your Copyright
82 | and Similar Rights in Your contributions to Adapted Material in
83 | accordance with the terms and conditions of this Public License.
84 |
85 | c. Copyright and Similar Rights means copyright and/or similar rights
86 | closely related to copyright including, without limitation,
87 | performance, broadcast, sound recording, and Sui Generis Database
88 | Rights, without regard to how the rights are labeled or
89 | categorized. For purposes of this Public License, the rights
90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
91 | Rights.
92 |
93 | d. Effective Technological Measures means those measures that, in the
94 | absence of proper authority, may not be circumvented under laws
95 | fulfilling obligations under Article 11 of the WIPO Copyright
96 | Treaty adopted on December 20, 1996, and/or similar international
97 | agreements.
98 |
99 | e. Exceptions and Limitations means fair use, fair dealing, and/or
100 | any other exception or limitation to Copyright and Similar Rights
101 | that applies to Your use of the Licensed Material.
102 |
103 | f. Licensed Material means the artistic or literary work, database,
104 | or other material to which the Licensor applied this Public
105 | License.
106 |
107 | g. Licensed Rights means the rights granted to You subject to the
108 | terms and conditions of this Public License, which are limited to
109 | all Copyright and Similar Rights that apply to Your use of the
110 | Licensed Material and that the Licensor has authority to license.
111 |
112 | h. Licensor means the individual(s) or entity(ies) granting rights
113 | under this Public License.
114 |
115 | i. Share means to provide material to the public by any means or
116 | process that requires permission under the Licensed Rights, such
117 | as reproduction, public display, public performance, distribution,
118 | dissemination, communication, or importation, and to make material
119 | available to the public including in ways that members of the
120 | public may access the material from a place and at a time
121 | individually chosen by them.
122 |
123 | j. Sui Generis Database Rights means rights other than copyright
124 | resulting from Directive 96/9/EC of the European Parliament and of
125 | the Council of 11 March 1996 on the legal protection of databases,
126 | as amended and/or succeeded, as well as other essentially
127 | equivalent rights anywhere in the world.
128 |
129 | k. You means the individual or entity exercising the Licensed Rights
130 | under this Public License. Your has a corresponding meaning.
131 |
132 |
133 | Section 2 -- Scope.
134 |
135 | a. License grant.
136 |
137 | 1. Subject to the terms and conditions of this Public License,
138 | the Licensor hereby grants You a worldwide, royalty-free,
139 | non-sublicensable, non-exclusive, irrevocable license to
140 | exercise the Licensed Rights in the Licensed Material to:
141 |
142 | a. reproduce and Share the Licensed Material, in whole or
143 | in part; and
144 |
145 | b. produce, reproduce, and Share Adapted Material.
146 |
147 | 2. Exceptions and Limitations. For the avoidance of doubt, where
148 | Exceptions and Limitations apply to Your use, this Public
149 | License does not apply, and You do not need to comply with
150 | its terms and conditions.
151 |
152 | 3. Term. The term of this Public License is specified in Section
153 | 6(a).
154 |
155 | 4. Media and formats; technical modifications allowed. The
156 | Licensor authorizes You to exercise the Licensed Rights in
157 | all media and formats whether now known or hereafter created,
158 | and to make technical modifications necessary to do so. The
159 | Licensor waives and/or agrees not to assert any right or
160 | authority to forbid You from making technical modifications
161 | necessary to exercise the Licensed Rights, including
162 | technical modifications necessary to circumvent Effective
163 | Technological Measures. For purposes of this Public License,
164 | simply making modifications authorized by this Section 2(a)
165 | (4) never produces Adapted Material.
166 |
167 | 5. Downstream recipients.
168 |
169 | a. Offer from the Licensor -- Licensed Material. Every
170 | recipient of the Licensed Material automatically
171 | receives an offer from the Licensor to exercise the
172 | Licensed Rights under the terms and conditions of this
173 | Public License.
174 |
175 | b. No downstream restrictions. You may not offer or impose
176 | any additional or different terms or conditions on, or
177 | apply any Effective Technological Measures to, the
178 | Licensed Material if doing so restricts exercise of the
179 | Licensed Rights by any recipient of the Licensed
180 | Material.
181 |
182 | 6. No endorsement. Nothing in this Public License constitutes or
183 | may be construed as permission to assert or imply that You
184 | are, or that Your use of the Licensed Material is, connected
185 | with, or sponsored, endorsed, or granted official status by,
186 | the Licensor or others designated to receive attribution as
187 | provided in Section 3(a)(1)(A)(i).
188 |
189 | b. Other rights.
190 |
191 | 1. Moral rights, such as the right of integrity, are not
192 | licensed under this Public License, nor are publicity,
193 | privacy, and/or other similar personality rights; however, to
194 | the extent possible, the Licensor waives and/or agrees not to
195 | assert any such rights held by the Licensor to the limited
196 | extent necessary to allow You to exercise the Licensed
197 | Rights, but not otherwise.
198 |
199 | 2. Patent and trademark rights are not licensed under this
200 | Public License.
201 |
202 | 3. To the extent possible, the Licensor waives any right to
203 | collect royalties from You for the exercise of the Licensed
204 | Rights, whether directly or through a collecting society
205 | under any voluntary or waivable statutory or compulsory
206 | licensing scheme. In all other cases the Licensor expressly
207 | reserves any right to collect such royalties.
208 |
209 |
210 | Section 3 -- License Conditions.
211 |
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 |
215 | a. Attribution.
216 |
217 | 1. If You Share the Licensed Material (including in modified
218 | form), You must:
219 |
220 | a. retain the following if it is supplied by the Licensor
221 | with the Licensed Material:
222 |
223 | i. identification of the creator(s) of the Licensed
224 | Material and any others designated to receive
225 | attribution, in any reasonable manner requested by
226 | the Licensor (including by pseudonym if
227 | designated);
228 |
229 | ii. a copyright notice;
230 |
231 | iii. a notice that refers to this Public License;
232 |
233 | iv. a notice that refers to the disclaimer of
234 | warranties;
235 |
236 | v. a URI or hyperlink to the Licensed Material to the
237 | extent reasonably practicable;
238 |
239 | b. indicate if You modified the Licensed Material and
240 | retain an indication of any previous modifications; and
241 |
242 | c. indicate the Licensed Material is licensed under this
243 | Public License, and include the text of, or the URI or
244 | hyperlink to, this Public License.
245 |
246 | 2. You may satisfy the conditions in Section 3(a)(1) in any
247 | reasonable manner based on the medium, means, and context in
248 | which You Share the Licensed Material. For example, it may be
249 | reasonable to satisfy the conditions by providing a URI or
250 | hyperlink to a resource that includes the required
251 | information.
252 |
253 | 3. If requested by the Licensor, You must remove any of the
254 | information required by Section 3(a)(1)(A) to the extent
255 | reasonably practicable.
256 |
257 | 4. If You Share Adapted Material You produce, the Adapter's
258 | License You apply must not prevent recipients of the Adapted
259 | Material from complying with this Public License.
260 |
261 |
262 | Section 4 -- Sui Generis Database Rights.
263 |
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 |
267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 | to extract, reuse, reproduce, and Share all or a substantial
269 | portion of the contents of the database;
270 |
271 | b. if You include all or a substantial portion of the database
272 | contents in a database in which You have Sui Generis Database
273 | Rights, then the database in which You have Sui Generis Database
274 | Rights (but not its individual contents) is Adapted Material; and
275 |
276 | c. You must comply with the conditions in Section 3(a) if You Share
277 | all or a substantial portion of the contents of the database.
278 |
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 |
283 |
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 |
286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 |
297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 |
307 | c. The disclaimer of warranties and limitation of liability provided
308 | above shall be interpreted in a manner that, to the extent
309 | possible, most closely approximates an absolute disclaimer and
310 | waiver of all liability.
311 |
312 |
313 | Section 6 -- Term and Termination.
314 |
315 | a. This Public License applies for the term of the Copyright and
316 | Similar Rights licensed here. However, if You fail to comply with
317 | this Public License, then Your rights under this Public License
318 | terminate automatically.
319 |
320 | b. Where Your right to use the Licensed Material has terminated under
321 | Section 6(a), it reinstates:
322 |
323 | 1. automatically as of the date the violation is cured, provided
324 | it is cured within 30 days of Your discovery of the
325 | violation; or
326 |
327 | 2. upon express reinstatement by the Licensor.
328 |
329 | For the avoidance of doubt, this Section 6(b) does not affect any
330 | right the Licensor may have to seek remedies for Your violations
331 | of this Public License.
332 |
333 | c. For the avoidance of doubt, the Licensor may also offer the
334 | Licensed Material under separate terms or conditions or stop
335 | distributing the Licensed Material at any time; however, doing so
336 | will not terminate this Public License.
337 |
338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 | License.
340 |
341 |
342 | Section 7 -- Other Terms and Conditions.
343 |
344 | a. The Licensor shall not be bound by any additional or different
345 | terms or conditions communicated by You unless expressly agreed.
346 |
347 | b. Any arrangements, understandings, or agreements regarding the
348 | Licensed Material not stated herein are separate from and
349 | independent of the terms and conditions of this Public License.
350 |
351 |
352 | Section 8 -- Interpretation.
353 |
354 | a. For the avoidance of doubt, this Public License does not, and
355 | shall not be interpreted to, reduce, limit, restrict, or impose
356 | conditions on any use of the Licensed Material that could lawfully
357 | be made without permission under this Public License.
358 |
359 | b. To the extent possible, if any provision of this Public License is
360 | deemed unenforceable, it shall be automatically reformed to the
361 | minimum extent necessary to make it enforceable. If the provision
362 | cannot be reformed, it shall be severed from this Public License
363 | without affecting the enforceability of the remaining terms and
364 | conditions.
365 |
366 | c. No term or condition of this Public License will be waived and no
367 | failure to comply consented to unless expressly agreed to by the
368 | Licensor.
369 |
370 | d. Nothing in this Public License constitutes or may be interpreted
371 | as a limitation upon, or waiver of, any privileges and immunities
372 | that apply to the Licensor or You, including from the legal
373 | processes of any jurisdiction or authority.
374 |
375 |
376 | =======================================================================
377 |
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 |
395 | Creative Commons may be contacted at creativecommons.org.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Zarr Specification
2 |
3 | **Zarr core protocol for storage and retrieval of N-dimensional typed arrays**
4 |
5 |
6 |
7 | For the v1 and v2 specs, please see
8 | https://github.com/zarr-developers/zarr-python/tree/main/docs/spec.
9 |
10 | The rendered docs of the `main` branch are available at https://zarr-specs.readthedocs.io
11 |
12 | ## Usage
13 |
14 | The following steps install the necessary packages to render the specs with
15 | automatic updating and reloading of changes:
16 |
17 | ```shell
18 | ## optionally set up a venv
19 | # python3 -m venv .venv
20 | # . .venv/bin/activate
21 | pip install -r docs/requirements.txt
22 | pip install sphinx-autobuild
23 | sphinx-autobuild -a docs docs/_build/html
24 | ```
25 |
--------------------------------------------------------------------------------
/docs/Makefile:
--------------------------------------------------------------------------------
1 | # Minimal makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line.
5 | SPHINXOPTS =
6 | SPHINXBUILD = sphinx-build
7 | SOURCEDIR = .
8 | BUILDDIR = _build
9 |
10 | # Put it first so that "make" without argument is like "make help".
11 | help:
12 | @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
13 |
14 | .PHONY: help Makefile
15 |
16 | # Catch-all target: route all unknown targets to Sphinx using the new
17 | # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
18 | %: Makefile
19 | @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--------------------------------------------------------------------------------
/docs/_static/css/custom.css:
--------------------------------------------------------------------------------
1 | .bd-main .bd-content .bd-article-container {
2 | flex-grow: 1;
3 | max-width: 100%;
4 | }
5 |
6 | @media (min-width:960px) {
7 | .bd-page-width {
8 | max-width: 100rem;
9 | }
10 | }
11 |
12 | footer {
13 | display: none;
14 | }
15 |
16 | .sidebar-end-items {
17 | margin-top: 0% !important;
18 | }
19 |
20 | /* Remove ↗ for external links in top bar menu */
21 | .nav-link.nav-external:after {
22 | display: none;
23 | }
24 |
25 | div.bd-article-container:has(.draft) {
26 | background-image: url(../draft-watermark.png) !important;
27 | background-repeat: repeat-y !important;
28 | background-position: center top !important;
29 | background-attachment: scroll !important;
30 | }
31 |
--------------------------------------------------------------------------------
/docs/_static/draft-watermark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zarr-developers/zarr-specs/b880fb385bedb18dd78ffef1bd683e7e93270c74/docs/_static/draft-watermark.png
--------------------------------------------------------------------------------
/docs/_static/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zarr-developers/zarr-specs/b880fb385bedb18dd78ffef1bd683e7e93270c74/docs/_static/logo.png
--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
1 | # Configuration file for the Sphinx documentation builder.
2 | #
3 | # This file only contains a selection of the most common options. For a full
4 | # list see the documentation:
5 | # https://www.sphinx-doc.org/en/master/usage/configuration.html
6 |
7 | # -- Path setup --------------------------------------------------------------
8 |
9 | # If extensions (or modules to document with autodoc) are in another directory,
10 | # add these directories to sys.path here. If the directory is relative to the
11 | # documentation root, use os.path.abspath to make it absolute, like shown here.
12 | #
13 | # import os
14 | # import sys
15 | # sys.path.insert(0, os.path.abspath('.'))
16 |
17 |
18 | # -- Project information -----------------------------------------------------
19 |
20 | project = 'Zarr specs'
21 | copyright = '2024, Zarr Developers'
22 | author = 'Zarr Developers'
23 |
24 |
25 | # -- General configuration ---------------------------------------------------
26 |
27 | # Add any Sphinx extension module names here, as strings. They can be
28 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
29 | # ones.
30 | extensions = [
31 | 'sphinx.ext.todo',
32 | 'sphinxcontrib.mermaid',
33 | 'sphinx_reredirects',
34 | ]
35 |
36 | # Display todos by setting to True
37 | todo_include_todos = True
38 |
39 | # Add any paths that contain templates here, relative to this directory.
40 | templates_path = ['_templates']
41 |
42 | # List of patterns, relative to source directory, that match files and
43 | # directories to ignore when looking for source files.
44 | # This pattern also affects html_static_path and html_extra_path.
45 | exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
46 |
47 |
48 | # -- Options for HTML output -------------------------------------------------
49 |
50 | # The theme to use for HTML and HTML Help pages. See the documentation for
51 | # a list of builtin themes.
52 | #
53 | html_theme = "pydata_sphinx_theme"
54 | html_logo = '_static/logo.png'
55 |
56 | html_theme_options = {
57 | "github_url": "https://github.com/zarr-developers/zarr-specs",
58 | "icon_links": [
59 | {
60 | "name": "Bluesky",
61 | "url": "https://bsky.app/profile/zarr.dev",
62 | "icon": "fa-brands fa-bluesky",
63 | },
64 | {
65 | "name": "Zulip",
66 | "url": "https://ossci.zulipchat.com/",
67 | "icon": "fas fa-comments",
68 | },
69 | ],
70 | "show_prev_next": False,
71 | "secondary_sidebar_items": ["page-toc"],
72 | }
73 |
74 | # Add any paths that contain custom static files (such as style sheets) here,
75 | # relative to this directory. They are copied after the builtin static files,
76 | # so a file named "default.css" will overwrite the builtin "default.css".
77 | html_static_path = ['_static']
78 |
79 | html_css_files = [
80 | 'css/custom.css',
81 | ]
82 |
83 | suppress_warnings = [
84 | # suppress "duplicate citation" warnings
85 | 'ref.citation',
86 | ]
87 |
88 | redirects = {
89 | "index": "specs.html",
90 | "v3/core/v3.0.html": "./index.html",
91 | "v3/codecs/blosc/v1.0.rst": "./index.html",
92 | "v3/codecs/bytes/v1.0.rst": "./index.html",
93 | "v3/codecs/crc32c/v1.0.rst": "./index.html",
94 | "v3/codecs/gzip/v1.0.rst": "./index.html",
95 | "v3/codecs/sharding-indexed/v1.0.rst": "./index.html",
96 | "v3/codecs/transpose/v1.0.rst": "./index.html",
97 | "v3/stores/filesystem/v1.0.rst": "./index.html",
98 | "v3/chunk-grid.rst": "chunk-grids/index.html",
99 | "v3/chunk-key-encoding.rst": "chunk-key-encodings/index.html",
100 | "v3/codecs.rst": "codecs/index.html",
101 | "v3/data-types.rst": "data-types/index.html",
102 | "v3/array-storage-transformers.rst": "storage-transformers/index.html",
103 | "v3/stores.rst": "stores/index.html",
104 | }
105 |
--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
1 | =====
2 | Specs
3 | =====
4 |
5 | A good starting point is the :ref:`zarr-core-specification-v3`.
6 |
7 | .. toctree::
8 |
9 | Home
10 | specs
11 | ZEPs
12 | Implementations
13 |
14 |
15 | Indices and tables
16 | ==================
17 |
18 | * :ref:`genindex`
19 | * :ref:`modindex`
20 | * :ref:`search`
21 |
--------------------------------------------------------------------------------
/docs/make.bat:
--------------------------------------------------------------------------------
1 | @ECHO OFF
2 |
3 | pushd %~dp0
4 |
5 | REM Command file for Sphinx documentation
6 |
7 | if "%SPHINXBUILD%" == "" (
8 | set SPHINXBUILD=sphinx-build
9 | )
10 | set SOURCEDIR=.
11 | set BUILDDIR=_build
12 |
13 | if "%1" == "" goto help
14 |
15 | %SPHINXBUILD% >NUL 2>NUL
16 | if errorlevel 9009 (
17 | echo.
18 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
19 | echo.installed, then set the SPHINXBUILD environment variable to point
20 | echo.to the full path of the 'sphinx-build' executable. Alternatively you
21 | echo.may add the Sphinx directory to PATH.
22 | echo.
23 | echo.If you don't have Sphinx installed, grab it from
24 | echo.https://www.sphinx-doc.org/
25 | exit /b 1
26 | )
27 |
28 | %SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
29 | goto end
30 |
31 | :help
32 | %SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
33 |
34 | :end
35 | popd
36 |
--------------------------------------------------------------------------------
/docs/requirements.txt:
--------------------------------------------------------------------------------
1 | sphinx
2 | pydata-sphinx-theme
3 | sphinxcontrib-mermaid
4 | sphinx-reredirects
5 |
--------------------------------------------------------------------------------
/docs/specs.rst:
--------------------------------------------------------------------------------
1 | ==============
2 | Specifications
3 | ==============
4 |
5 | .. _zarr-specs:
6 |
7 | .. toctree::
8 | :maxdepth: 1
9 | :caption: v3
10 |
11 | Core
12 | v3/codecs/index
13 | v3/chunk-grids/index
14 | v3/chunk-key-encodings/index
15 | v3/data-types/index
16 | v3/stores/index
17 | v3/storage-transformers/index
18 |
19 | .. toctree::
20 | :maxdepth: 1
21 | :caption: v2
22 |
23 | Zarr spec v2
24 |
25 | .. toctree::
26 | :maxdepth: 1
27 | :caption: v1
28 |
29 | Zarr spec v1
30 |
--------------------------------------------------------------------------------
/docs/v1/v1.0.rst:
--------------------------------------------------------------------------------
1 | .. _spec_v1:
2 |
3 | Zarr Storage Specification Version 1
4 | ====================================
5 |
6 | This document provides a technical specification of the protocol and
7 | format used for storing a Zarr array. The key words "MUST", "MUST
8 | NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
9 | "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
10 | interpreted as described in `RFC 2119
11 | `_.
12 |
13 | Status
14 | ------
15 |
16 | This specification is deprecated. See :ref:`zarr-specs` for the latest version.
17 |
18 | Storage
19 | -------
20 |
21 | A Zarr array can be stored in any storage system that provides a
22 | key/value interface, where a key is an ASCII string and a value is an
23 | arbitrary sequence of bytes, and the supported operations are read
24 | (get the sequence of bytes associated with a given key), write (set
25 | the sequence of bytes associated with a given key) and delete (remove
26 | a key/value pair).
27 |
28 | For example, a directory in a file system can provide this interface,
29 | where keys are file names, values are file contents, and files can be
30 | read, written or deleted via the operating system. Equally, an S3
31 | bucket can provide this interface, where keys are resource names,
32 | values are resource contents, and resources can be read, written or
33 | deleted via HTTP.
34 |
35 | Below, an "array store" refers to any system implementing this
36 | interface.
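
As an illustration only, the interface described above can be sketched with a dict-backed class. The class name ``MemoryStore`` and its method names are hypothetical and are not defined by this specification; any system offering the same three operations would qualify.

```python
class MemoryStore:
    """Hypothetical in-memory array store: ASCII string keys, bytes values."""

    def __init__(self):
        self._data = {}

    def read(self, key: str) -> bytes:
        # Get the sequence of bytes associated with a given key.
        return self._data[key]

    def write(self, key: str, value: bytes) -> None:
        # Keys are ASCII strings; encoding raises UnicodeEncodeError otherwise.
        key.encode("ascii")
        self._data[key] = bytes(value)

    def delete(self, key: str) -> None:
        # Remove a key/value pair.
        del self._data[key]


store = MemoryStore()
store.write("meta", b"{}")
print(store.read("meta"))  # b'{}'
store.delete("meta")
```

A directory or an S3 bucket would replace the dict with file or HTTP operations while keeping the same three methods.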
37 |
38 | Metadata
39 | --------
40 |
41 | Each array requires essential configuration metadata to be stored,
42 | enabling correct interpretation of the stored data. This metadata is
43 | encoded using JSON and stored as the value of the 'meta' key within an
44 | array store.
45 |
46 | The metadata resource is a JSON object. The following keys MUST be
47 | present within the object:
48 |
49 | zarr_format
50 | An integer defining the version of the storage specification to which the
51 | array store adheres.
52 | shape
53 | A list of integers defining the length of each dimension of the array.
54 | chunks
55 | A list of integers defining the length of each dimension of a chunk of the
56 | array. Note that all chunks within a Zarr array have the same shape.
57 | dtype
58 | A string or list defining a valid data type for the array. See also
59 | the subsection below on data type encoding.
60 | compression
61 | A string identifying the primary compression library used to compress
62 | each chunk of the array.
63 | compression_opts
64 | An integer, string or dictionary providing options to the primary
65 | compression library.
66 | fill_value
67 | A scalar value providing the default value to use for uninitialized
68 | portions of the array.
69 | order
70 | Either 'C' or 'F', defining the layout of bytes within each chunk of the
71 | array. 'C' means row-major order, i.e., the last dimension varies fastest;
72 | 'F' means column-major order, i.e., the first dimension varies fastest.
73 |
74 | Other keys MAY be present within the metadata object; however, they MUST
75 | NOT alter the interpretation of the required fields defined above.
76 |
77 | For example, the JSON object below defines a 2-dimensional array of
78 | 64-bit little-endian floating point numbers with 10000 rows and 10000
79 | columns, divided into chunks of 1000 rows and 1000 columns (so there
80 | will be 100 chunks in total arranged in a 10 by 10 grid). Within each
81 | chunk the data are laid out in C contiguous order, and each chunk is
82 | compressed using the Blosc compression library::
83 |
84 | {
85 | "chunks": [
86 | 1000,
87 | 1000
88 | ],
89 | "compression": "blosc",
90 | "compression_opts": {
91 | "clevel": 5,
92 | "cname": "lz4",
93 | "shuffle": 1
94 | },
95 | "dtype": "`_. The
112 | format consists of 3 parts: a character describing the byteorder of
113 | the data (``<``: little-endian, ``>``: big-endian, ``|``:
114 | not-relevant), a character code giving the basic type of the array,
115 | and an integer providing the number of bytes the type uses. The byte
116 | order MUST be specified. E.g., ``"<f8"``, ``">i4"``, ``"|b1"`` and
117 | ``"|S12"`` are valid data types.
118 |
119 | Structured data types (i.e., with multiple named fields) are encoded as
120 | a list of two-element lists, following `NumPy array protocol type
121 | descriptions (descr)
122 | `_.
123 | For example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b",
124 | "|u1"]]`` defines a data type composed of three single-byte unsigned
125 | integers labelled 'r', 'g' and 'b'.
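
The following is a minimal sketch of how such encodings might be turned into NumPy data types. It assumes NumPy is available and that the metadata has already been parsed from JSON; the variable names are illustrative only.

```python
import numpy as np

# Simple data type: byte order character, type code, size in bytes.
simple = np.dtype("<f8")
print(simple.kind, simple.itemsize)  # f 8

# Structured data type: JSON yields lists, whereas NumPy expects tuples.
descr = [["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]
rgb = np.dtype([tuple(field) for field in descr])
print(rgb.names, rgb.itemsize)  # ('r', 'g', 'b') 3
```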
126 |
127 | Chunks
128 | ------
129 |
130 | Each chunk of the array is compressed by passing the raw bytes for the
131 | chunk through the primary compression library to obtain a new sequence
132 | of bytes comprising the compressed chunk data. No header is added to
133 | the compressed bytes or any other modification made. The internal
134 | structure of the compressed bytes will depend on which primary
135 | compressor was used. For example, the `Blosc compressor
136 | `_
137 | produces a sequence of bytes that begins with a 16-byte header
138 | followed by compressed data.
139 |
140 | The compressed sequence of bytes for each chunk is stored under a key
141 | formed from the index of the chunk within the grid of chunks
142 | representing the array. To form a string key for a chunk, the indices
143 | are converted to strings and concatenated with the period character
144 | ('.') separating each index. For example, given an array with shape
145 | (10000, 10000) and chunk shape (1000, 1000) there will be 100 chunks
146 | laid out in a 10 by 10 grid. The chunk with indices (0, 0) provides
147 | data for rows 0-999 and columns 0-999 and is stored under the key
148 | '0.0'; the chunk with indices (2, 4) provides data for rows 2000-2999
149 | and columns 4000-4999 and is stored under the key '2.4'; etc.
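
The key formation described above can be sketched as follows. The function names ``chunk_key`` and ``chunk_extent`` are hypothetical helpers introduced for illustration.

```python
def chunk_key(indices):
    """Join chunk grid indices with '.', e.g. (2, 4) -> '2.4'."""
    return ".".join(str(i) for i in indices)


def chunk_extent(indices, chunk_shape):
    """Return the (start, stop) element range per dimension, stop exclusive."""
    return [(i * c, (i + 1) * c) for i, c in zip(indices, chunk_shape)]


print(chunk_key((0, 0)))                   # 0.0
print(chunk_key((2, 4)))                   # 2.4
print(chunk_extent((2, 4), (1000, 1000)))  # [(2000, 3000), (4000, 5000)]
```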
150 |
151 | There is no need for all chunks to be present within an array
152 | store. If a chunk is not present then it is considered to be in an
153 | uninitialized state. An uninitialized chunk MUST be treated as if it
154 | was uniformly filled with the value of the 'fill_value' field in the
155 | array metadata. If the 'fill_value' field is ``null`` then the
156 | contents of the chunk are undefined.
157 |
158 | Note that all chunks in an array have the same shape. If the length of
159 | any array dimension is not exactly divisible by the length of the
160 | corresponding chunk dimension then some chunks will overhang the edge
161 | of the array. The contents of any chunk region falling outside the
162 | array are undefined.
163 |
164 | Attributes
165 | ----------
166 |
167 | Each array can also be associated with custom attributes, which are
168 | simple key/value items with application-specific meaning. Custom
169 | attributes are encoded as a JSON object and stored under the 'attrs'
170 | key within an array store. Even if the attributes are empty, the
171 | 'attrs' key MUST be present within an array store.
172 |
173 | For example, the JSON object below encodes three attributes named
174 | 'foo', 'bar' and 'baz'::
175 |
176 | {
177 | "foo": 42,
178 | "bar": "apples",
179 | "baz": [1, 2, 3, 4]
180 | }
181 |
182 | Example
183 | -------
184 |
185 | Below is an example of storing a Zarr array, using a directory on the
186 | local file system as storage.
187 |
188 | Initialize the store::
189 |
190 | >>> import zarr
191 | >>> store = zarr.DirectoryStore('example.zarr')
192 | >>> zarr.init_store(store, shape=(20, 20), chunks=(10, 10),
193 | ... dtype='i4', fill_value=42, compression='zlib',
194 | ... compression_opts=1, overwrite=True)
195 |
196 | No chunks are initialized yet, so only the 'meta' and 'attrs' keys
197 | have been set::
198 |
199 | >>> import os
200 | >>> sorted(os.listdir('example.zarr'))
201 | ['attrs', 'meta']
202 |
203 | Inspect the array metadata::
204 |
205 | >>> print(open('example.zarr/meta').read())
206 | {
207 | "chunks": [
208 | 10,
209 | 10
210 | ],
211 | "compression": "zlib",
212 | "compression_opts": 1,
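213 | "dtype": "<i4",
214 | "fill_value": 42,
215 | "order": "C",
216 | "shape": [
217 | 20,
218 | 20
219 | ],
220 | "zarr_format": 1
221 | }
222 |
223 | Inspect the array attributes::
224 |
225 | >>> print(open('example.zarr/attrs').read())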
226 | {}
227 |
228 | Set some data::
229 |
230 | >>> z = zarr.Array(store)
231 | >>> z[0:10, 0:10] = 1
232 | >>> sorted(os.listdir('example.zarr'))
233 | ['0.0', 'attrs', 'meta']
234 |
235 | Set some more data::
236 |
237 | >>> z[0:10, 10:20] = 2
238 | >>> z[10:20, :] = 3
239 | >>> sorted(os.listdir('example.zarr'))
240 | ['0.0', '0.1', '1.0', '1.1', 'attrs', 'meta']
241 |
242 | Manually decompress a single chunk for illustration::
243 |
244 | >>> import zlib
245 | >>> b = zlib.decompress(open('example.zarr/0.0', 'rb').read())
246 | >>> import numpy as np
247 | >>> a = np.frombuffer(b, dtype='<i4')
248 | >>> a
249 | array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
250 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
251 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
252 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
253 | 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
254 |
255 | Modify the array attributes::
256 |
257 | >>> z.attrs['foo'] = 42
258 | >>> z.attrs['bar'] = 'apples'
259 | >>> z.attrs['baz'] = [1, 2, 3, 4]
260 | >>> print(open('example.zarr/attrs').read())
261 | {
262 | "bar": "apples",
263 | "baz": [
264 | 1,
265 | 2,
266 | 3,
267 | 4
268 | ],
269 | "foo": 42
270 | }
271 |
--------------------------------------------------------------------------------
/docs/v2/v2.0.rst:
--------------------------------------------------------------------------------
1 | .. _spec_v2:
2 |
3 | Zarr Storage Specification Version 2
4 | ====================================
5 |
6 | This document provides a technical specification of the protocol and format
7 | used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
8 | "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
9 | "OPTIONAL" in this document are to be interpreted as described in `RFC 2119
10 | `_.
11 |
12 | Status
13 | ------
14 |
15 | This specification has been superseded. See :ref:`zarr-specs` for the latest
16 | version.
17 |
18 | .. _spec_v2_storage:
19 |
20 | Storage
21 | -------
22 |
23 | A Zarr array can be stored in any storage system that provides a key/value
24 | interface, where a key is an ASCII string and a value is an arbitrary sequence
25 | of bytes, and the supported operations are read (get the sequence of bytes
26 | associated with a given key), write (set the sequence of bytes associated with
27 | a given key) and delete (remove a key/value pair).
28 |
29 | For example, a directory in a file system can provide this interface, where
30 | keys are file names, values are file contents, and files can be read, written
31 | or deleted via the operating system. Equally, an S3 bucket can provide this
32 | interface, where keys are resource names, values are resource contents, and
33 | resources can be read, written or deleted via HTTP.
34 |
35 | Below, an "array store" refers to any system implementing this interface.
36 |
37 | .. _spec_v2_array:
38 |
39 | Arrays
40 | ------
41 |
42 | .. _spec_v2_array_metadata:
43 |
44 | Metadata
45 | ~~~~~~~~
46 |
47 | Each array requires essential configuration metadata to be stored, enabling
48 | correct interpretation of the stored data. This metadata is encoded using JSON
49 | and stored as the value of the ".zarray" key within an array store.
50 |
51 | The metadata resource is a JSON object. The following keys MUST be present
52 | within the object:
53 |
54 | zarr_format
55 | An integer defining the version of the storage specification to which the
56 | array store adheres.
57 | shape
58 | A list of integers defining the length of each dimension of the array.
59 | chunks
60 | A list of integers defining the length of each dimension of a chunk of the
61 | array. Note that all chunks within a Zarr array have the same shape.
62 | dtype
63 | A string or list defining a valid data type for the array. See also
64 | the subsection below on data type encoding.
65 | compressor
66 | A JSON object identifying the primary compression codec and providing
67 | configuration parameters, or ``null`` if no compressor is to be used.
68 | The object MUST contain an ``"id"`` key identifying the codec to be used.
69 | fill_value
70 | A scalar value providing the default value to use for uninitialized
71 | portions of the array, or ``null`` if no fill_value is to be used.
72 | order
73 | Either "C" or "F", defining the layout of bytes within each chunk of the
74 | array. "C" means row-major order, i.e., the last dimension varies fastest;
75 | "F" means column-major order, i.e., the first dimension varies fastest.
76 | filters
77 | A list of JSON objects providing codec configurations, or ``null`` if no
78 | filters are to be applied. Each codec configuration object MUST contain a
79 | ``"id"`` key identifying the codec to be used.
80 |
81 | The following keys MAY be present within the object:
82 |
83 | dimension_separator
84 | If present, either the string ``"."`` or ``"/"`` defining the separator placed
85 | between the dimensions of a chunk. If the value is not set, then the
86 | default MUST be assumed to be ``"."``, leading to chunk keys of the form "0.0".
87 | Arrays defined with ``"/"`` as the dimension separator can be considered to have
88 | nested, or hierarchical, keys of the form "0/0" that SHOULD where possible
89 | produce a directory-like structure.
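
As a sketch of the rule above, a reader could derive chunk keys from parsed array metadata as follows. The function name ``v2_chunk_key`` is hypothetical, and ``metadata`` is assumed to be the ".zarray" JSON object already loaded into a dict.

```python
def v2_chunk_key(metadata, indices):
    """Form a chunk key from parsed .zarray metadata and chunk grid indices."""
    # If the key is not set, the default MUST be assumed to be ".".
    separator = metadata.get("dimension_separator", ".")
    if separator not in (".", "/"):
        raise ValueError(f"invalid dimension_separator: {separator!r}")
    return separator.join(str(i) for i in indices)


print(v2_chunk_key({}, (0, 0)))                            # 0.0
print(v2_chunk_key({"dimension_separator": "/"}, (2, 4)))  # 2/4
```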
90 |
91 | Other keys SHOULD NOT be present within the metadata object and SHOULD be
92 | ignored by implementations.
93 |
94 | For example, the JSON object below defines a 2-dimensional array of 64-bit
95 | little-endian floating point numbers with 10000 rows and 10000 columns, divided
96 | into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
97 | arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
98 | contiguous order. Each chunk is encoded using a delta filter and compressed
99 | using the Blosc compression library prior to storage::
100 |
101 | {
102 | "chunks": [
103 | 1000,
104 | 1000
105 | ],
106 | "compressor": {
107 | "id": "blosc",
108 | "cname": "lz4",
109 | "clevel": 5,
110 | "shuffle": 1
111 | },
112 | "dtype": "`_. The format
132 | consists of 3 parts:
133 |
134 | * One character describing the byteorder of the data (``"<"``: little-endian;
135 | ``">"``: big-endian; ``"|"``: not-relevant)
136 | * One character code giving the basic type of the array (``"b"``: Boolean (integer
137 | type where all values are only True or False); ``"i"``: integer; ``"u"``: unsigned
138 | integer; ``"f"``: floating point; ``"c"``: complex floating point; ``"m"``: timedelta;
139 | ``"M"``: datetime; ``"S"``: string (fixed-length sequence of char); ``"U"``: unicode
140 | (fixed-length sequence of Py_UNICODE); ``"V"``: other (void * – each item is a
141 | fixed-size chunk of memory))
142 | * An integer specifying the number of bytes the type uses.
143 |
144 | The byte order MUST be specified. E.g., ``"<f8"``, ``">i4"``, ``"|b1"`` and
145 | ``"|S12"`` are valid data type encodings.
146 |
147 | Encodings of the datetime64 ("M") and timedelta64 ("m") data types MUST also include the
148 | units within square brackets. A list of valid units and their definitions is given in
149 | the `NumPy documentation on Datetimes and Timedeltas `_.
150 | For example, ``"`_. Each
155 | sub-list has the form ``[fieldname, datatype, shape]`` where ``shape``
156 | is optional. ``fieldname`` is a string, ``datatype`` is a string
157 | specifying a simple data type (see above), and ``shape`` is a list of
158 | integers specifying subarray shape. For example, the JSON list below
159 | defines a data type composed of three single-byte unsigned integer
160 | fields named "r", "g" and "b"::
161 |
162 | [["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]
163 |
164 | For example, the JSON list below defines a data type composed of three
165 | fields named "x", "y" and "z", where "x" and "y" each contain 32-bit
166 | floats, and each item in "z" is a 2 by 2 array of floats::
167 |
168 | [["x", "`_
207 | produces a sequence of bytes that begins with a 16-byte header followed by
208 | compressed data.
209 |
210 | The compressed sequence of bytes for each chunk is stored under a key formed
211 | from the index of the chunk within the grid of chunks representing the array.
212 | To form a string key for a chunk, the indices are converted to strings and
213 | concatenated with the period character (".") separating each index. For
214 | example, given an array with shape (10000, 10000) and chunk shape (1000, 1000)
215 | there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices
216 | (0, 0) provides data for rows 0-999 and columns 0-999 and is stored under the
217 | key "0.0"; the chunk with indices (2, 4) provides data for rows 2000-2999 and
218 | columns 4000-4999 and is stored under the key "2.4"; etc.
219 |
220 | There is no need for all chunks to be present within an array store. If a chunk
221 | is not present then it is considered to be in an uninitialized state. An
222 | uninitialized chunk MUST be treated as if it was uniformly filled with the value
223 | of the "fill_value" field in the array metadata. If the "fill_value" field is
224 | ``null`` then the contents of the chunk are undefined.
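
A reader might handle absent chunks along the following lines. This is a sketch under several assumptions: the store is a plain dict, the compressor is zlib, no filters are configured, the layout is "C" order, and ``load_chunk`` is a hypothetical helper name.

```python
import zlib

import numpy as np


def load_chunk(store, key, chunk_shape, dtype, fill_value):
    """Return the chunk stored under `key`, or a filled chunk if the key is absent."""
    if key in store:
        raw = zlib.decompress(store[key])
        return np.frombuffer(raw, dtype=dtype).reshape(chunk_shape)
    if fill_value is None:
        # A null fill_value leaves the contents undefined; zeros are one valid choice.
        return np.zeros(chunk_shape, dtype=dtype)
    # An uninitialized chunk is treated as uniformly filled with fill_value.
    return np.full(chunk_shape, fill_value, dtype=dtype)


store = {"0.0": zlib.compress(np.ones((10, 10), dtype="<i4").tobytes())}
present = load_chunk(store, "0.0", (10, 10), "<i4", 42)
missing = load_chunk(store, "0.1", (10, 10), "<i4", 42)
print(present[0, 0], missing[0, 0])  # 1 42
```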
225 |
226 | Note that all chunks in an array have the same shape. If the length of any
227 | array dimension is not exactly divisible by the length of the corresponding
228 | chunk dimension then some chunks will overhang the edge of the array. The
229 | contents of any chunk region falling outside the array are undefined.
230 |
231 | .. _spec_v2_array_filters:
232 |
233 | Filters
234 | ~~~~~~~
235 |
236 | Optionally a sequence of one or more filters can be used to transform chunk
237 | data prior to compression. When storing data, filters are applied in the order
238 | specified in array metadata to encode data, then the encoded data are passed to
239 | the primary compressor. When retrieving data, stored chunk data are
240 | decompressed by the primary compressor then decoded using filters in the
241 | reverse order.
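
The ordering described above can be sketched as a pair of functions. The ``Delta`` class below is a toy stand-in written for this example and is not the codec of any particular library; zlib plays the role of the primary compressor.

```python
import zlib

import numpy as np


class Delta:
    """Toy delta filter: stores differences between consecutive elements."""

    def __init__(self, dtype):
        self.dtype = np.dtype(dtype)

    def encode(self, buf):
        arr = np.frombuffer(buf, dtype=self.dtype)
        out = np.empty_like(arr)
        out[:1] = arr[:1]        # first element is stored unchanged
        out[1:] = np.diff(arr)   # remaining elements are differences
        return out.tobytes()

    def decode(self, buf):
        enc = np.frombuffer(buf, dtype=self.dtype)
        return np.cumsum(enc, dtype=self.dtype).tobytes()


def encode_chunk(raw, filters, compress):
    for f in filters:            # filters in the order given in metadata
        raw = f.encode(raw)
    return compress(raw)         # then the primary compressor


def decode_chunk(stored, filters, decompress):
    raw = decompress(stored)     # primary compressor first
    for f in reversed(filters):  # then filters in reverse order
        raw = f.decode(raw)
    return raw


data = np.arange(100, dtype="<i4").tobytes()
stored = encode_chunk(data, [Delta("<i4")], zlib.compress)
print(decode_chunk(stored, [Delta("<i4")], zlib.decompress) == data)  # True
```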
242 |
243 | .. _spec_v2_hierarchy:
244 |
245 | Hierarchies
246 | -----------
247 |
248 | .. _spec_v2_hierarchy_paths:
249 |
250 | Logical storage paths
251 | ~~~~~~~~~~~~~~~~~~~~~
252 |
253 | Multiple arrays can be stored in the same array store by associating each array
254 | with a different logical path. A logical path is simply an ASCII string. The
255 | logical path is used to form a prefix for keys used by the array. For example,
256 | if an array is stored at logical path "foo/bar" then the array metadata will be
257 | stored under the key "foo/bar/.zarray", the user-defined attributes will be
258 | stored under the key "foo/bar/.zattrs", and the chunks will be stored under
259 | keys like "foo/bar/0.0", "foo/bar/0.1", etc.
260 |
261 | To ensure consistent behaviour across different storage systems, logical paths
262 | MUST be normalized as follows:
263 |
264 | * Replace all backward slash characters ("\\\\") with forward slash characters
265 | ("/")
266 | * Strip any leading "/" characters
267 | * Strip any trailing "/" characters
268 | * Collapse any sequence of more than one "/" character into a single "/"
269 | character
270 |
271 | The key prefix is then obtained by appending a single "/" character to the
272 | normalized logical path.
273 |
274 | After normalization, if splitting a logical path by the "/" character results
275 | in any path segment equal to the string "." or the string ".." then an error
276 | MUST be raised.
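
The normalization steps above can be sketched as follows. The function names are hypothetical, and the empty prefix returned for the root path is an assumption based on the root group metadata being stored under the bare ".zgroup" key.

```python
import re


def normalize_path(path):
    """Normalize a logical path following the steps listed above."""
    path = path.replace("\\", "/")   # backward slashes become forward slashes
    path = path.strip("/")           # strip leading and trailing "/"
    path = re.sub(r"/+", "/", path)  # collapse repeated "/"
    if any(segment in (".", "..") for segment in path.split("/")):
        raise ValueError(f"invalid path segment in {path!r}")
    return path


def key_prefix(path):
    """Return the key prefix for a logical path ('' for the root, by assumption)."""
    normalized = normalize_path(path)
    return normalized + "/" if normalized else ""


print(normalize_path("\\foo//bar/"))     # foo/bar
print(key_prefix("foo/bar") + ".zarray")  # foo/bar/.zarray
```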
277 |
278 | N.B., how the underlying array store processes requests to store values under
279 | keys containing the "/" character is entirely up to the store implementation
280 | and is not constrained by this specification. E.g., an array store could simply
281 | treat all keys as opaque ASCII strings; equally, an array store could map
282 | logical paths onto some kind of hierarchical storage (e.g., directories on a
283 | file system).
284 |
285 | .. _spec_v2_hierarchy_groups:
286 |
287 | Groups
288 | ~~~~~~
289 |
290 | Arrays can be organized into groups which can also contain other groups. A
291 | group is created by storing group metadata under the ".zgroup" key under some
292 | logical path. E.g., a group exists at the root of an array store if the
293 | ".zgroup" key exists in the store, and a group exists at logical path "foo/bar"
294 | if the "foo/bar/.zgroup" key exists in the store.
295 |
296 | If the user requests a group to be created under some logical path, then groups
297 | MUST also be created at all ancestor paths. E.g., if the user requests group
298 | creation at path "foo/bar" then groups MUST be created at path "foo" and the
299 | root of the store, if they don't already exist.
300 |
301 | If the user requests an array to be created under some logical path, then
302 | groups MUST also be created at all ancestor paths. E.g., if the user requests
303 | array creation at path "foo/bar/baz" then groups MUST be created at path
304 | "foo/bar", path "foo", and the root of the store, if they don't already exist.
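
One way to satisfy the ancestor requirement is sketched below. It assumes a dict-like store and an already normalized logical path; ``ancestor_paths`` and ``ensure_groups`` are hypothetical helper names.

```python
def ancestor_paths(path):
    """Return logical paths of all ancestors, nearest first, ending at the root ('')."""
    if not path:
        return []  # the root has no ancestors
    segments = path.split("/")
    return ["/".join(segments[:n]) for n in range(len(segments) - 1, -1, -1)]


def ensure_groups(store, path):
    """Create group metadata at every ancestor of `path` that lacks it."""
    for parent in ancestor_paths(path):
        key = (parent + "/" if parent else "") + ".zgroup"
        store.setdefault(key, b'{"zarr_format": 2}')


store = {}
ensure_groups(store, "foo/bar/baz")
print(sorted(store))  # ['.zgroup', 'foo/.zgroup', 'foo/bar/.zgroup']
```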
305 |
306 | The group metadata resource is a JSON object. The following keys MUST be present
307 | within the object:
308 |
309 | zarr_format
310 | An integer defining the version of the storage specification to which the
311 | array store adheres.
312 |
313 | Other keys MUST NOT be present within the metadata object.
314 |
315 | The members of a group are arrays and groups stored under logical paths that
316 | are direct children of the parent group's logical path. E.g., if groups exist
317 | under the logical paths "foo" and "foo/bar" and an array exists at logical path
318 | "foo/baz" then the members of the group at path "foo" are the group at path
319 | "foo/bar" and the array at path "foo/baz".
320 |
321 | .. _spec_v2_attrs:
322 |
323 | Attributes
324 | ----------
325 |
326 | An array or group can be associated with custom attributes, which are arbitrary
327 | key/value pairs with application-specific meaning. Custom attributes are encoded
328 | as a JSON object and stored under the ".zattrs" key within an array store. The
329 | ".zattrs" key does not have to be present, and if it is absent the attributes
330 | should be treated as empty.
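
Reading attributes with the empty default could look like the sketch below, which assumes a dict-like store and a key prefix that is either empty or ends with "/". The name ``read_attrs`` is hypothetical.

```python
import json


def read_attrs(store, prefix=""):
    """Return custom attributes for the node at `prefix`, or {} if ".zattrs" is absent."""
    raw = store.get(prefix + ".zattrs")
    return {} if raw is None else json.loads(raw)


store = {"foo/.zattrs": b'{"foo": 42}'}
print(read_attrs(store, "foo/"))  # {'foo': 42}
print(read_attrs(store, "bar/"))  # {}
```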
331 |
332 | For example, the JSON object below encodes three attributes named
333 | "foo", "bar" and "baz"::
334 |
335 | {
336 | "foo": 42,
337 | "bar": "apples",
338 | "baz": [1, 2, 3, 4]
339 | }
340 |
341 | .. _spec_v2_examples:
342 |
343 | Examples
344 | --------
345 |
346 | Storing a single array
347 | ~~~~~~~~~~~~~~~~~~~~~~
348 |
349 | Below is an example of storing a Zarr array, using a directory on the
350 | local file system as storage.
351 |
352 | Create an array::
353 |
354 | >>> import zarr
355 | >>> store = zarr.DirectoryStore('data/example.zarr')
356 | >>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
357 | ... fill_value=42, compressor=zarr.Zlib(level=1),
358 | ... store=store, overwrite=True)
359 |
360 | No chunks are initialized yet, so only the ".zarray" key has been set
361 | in the store::
362 |
363 | >>> import os
364 | >>> sorted(os.listdir('data/example.zarr'))
365 | ['.zarray']
366 |
367 | Inspect the array metadata::
368 |
369 | >>> print(open('data/example.zarr/.zarray').read())
370 | {
371 | "chunks": [
372 | 10,
373 | 10
374 | ],
375 | "compressor": {
376 | "id": "zlib",
377 | "level": 1
378 | },
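379 | "dtype": "<i4",
380 | "fill_value": 42,
381 | "filters": null,
382 | "order": "C",
383 | "shape": [
384 | 20,
385 | 20
386 | ],
387 | "zarr_format": 2
388 | }
389 |
390 | Set some data::
391 |
392 | >>> a[0:10, 0:10] = 1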
393 | >>> sorted(os.listdir('data/example.zarr'))
394 | ['.zarray', '0.0']
395 |
396 | Set some more data::
397 |
398 | >>> a[0:10, 10:20] = 2
399 | >>> a[10:20, :] = 3
400 | >>> sorted(os.listdir('data/example.zarr'))
401 | ['.zarray', '0.0', '0.1', '1.0', '1.1']
402 |
403 | Manually decompress a single chunk for illustration::
404 |
405 | >>> import zlib
406 | >>> buf = zlib.decompress(open('data/example.zarr/0.0', 'rb').read())
407 | >>> import numpy as np
408 | >>> chunk = np.frombuffer(buf, dtype='<i4')
409 | >>> chunk
410 | array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
411 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
412 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
413 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
414 | 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
415 |
416 | Modify the array attributes::
417 |
418 | >>> a.attrs['foo'] = 42
419 | >>> a.attrs['bar'] = 'apples'
420 | >>> a.attrs['baz'] = [1, 2, 3, 4]
421 | >>> sorted(os.listdir('data/example.zarr'))
422 | ['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']
423 | >>> print(open('data/example.zarr/.zattrs').read())
424 | {
425 | "bar": "apples",
426 | "baz": [
427 | 1,
428 | 2,
429 | 3,
430 | 4
431 | ],
432 | "foo": 42
433 | }
434 |
435 | Storing multiple arrays in a hierarchy
436 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
437 |
438 | Below is an example of storing multiple Zarr arrays organized into a group
439 | hierarchy, using a directory on the local file system as storage. This storage
440 | implementation maps logical paths onto directory paths on the file system;
441 | however, this is an implementation choice and is not required.
442 |
443 | Set up the store::
444 |
445 | >>> import zarr
446 | >>> store = zarr.DirectoryStore('data/group.zarr')
447 |
448 | Create the root group::
449 |
450 | >>> root_grp = zarr.group(store, overwrite=True)
451 |
452 | The metadata resource for the root group has been created::
453 |
454 | >>> import os
455 | >>> sorted(os.listdir('data/group.zarr'))
456 | ['.zgroup']
457 |
458 | Inspect the group metadata::
459 |
460 | >>> print(open('data/group.zarr/.zgroup').read())
461 | {
462 | "zarr_format": 2
463 | }
464 |
465 | Create a sub-group::
466 |
467 | >>> sub_grp = root_grp.create_group('foo')
468 |
469 | What has been stored::
470 |
471 | >>> sorted(os.listdir('data/group.zarr'))
472 | ['.zgroup', 'foo']
473 | >>> sorted(os.listdir('data/group.zarr/foo'))
474 | ['.zgroup']
475 |
476 | Create an array within the sub-group::
477 |
478 | >>> a = sub_grp.create_dataset('bar', shape=(20, 20), chunks=(10, 10))
479 | >>> a[:] = 42
480 |
481 | Set a custom attribute::
482 |
483 | >>> a.attrs['comment'] = 'answer to life, the universe and everything'
484 |
485 | What has been stored::
486 |
487 | >>> sorted(os.listdir('data/group.zarr'))
488 | ['.zgroup', 'foo']
489 | >>> sorted(os.listdir('data/group.zarr/foo'))
490 | ['.zgroup', 'bar']
491 | >>> sorted(os.listdir('data/group.zarr/foo/bar'))
492 | ['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']
493 |
494 | Here is the same example using a Zip file as storage::
495 |
496 | >>> store = zarr.ZipStore('data/group.zip', mode='w')
497 | >>> root_grp = zarr.group(store)
498 | >>> sub_grp = root_grp.create_group('foo')
499 | >>> a = sub_grp.create_dataset('bar', shape=(20, 20), chunks=(10, 10))
500 | >>> a[:] = 42
501 | >>> a.attrs['comment'] = 'answer to life, the universe and everything'
502 | >>> store.close()
503 |
504 | What has been stored::
505 |
506 | >>> import zipfile
507 | >>> zf = zipfile.ZipFile('data/group.zip', mode='r')
508 | >>> for name in sorted(zf.namelist()):
509 | ... print(name)
510 | .zgroup
511 | foo/.zgroup
512 | foo/bar/.zarray
513 | foo/bar/.zattrs
514 | foo/bar/0.0
515 | foo/bar/0.1
516 | foo/bar/1.0
517 | foo/bar/1.1
518 |
519 | .. _spec_v2_changes:
520 |
521 | Changes
522 | -------
523 |
524 | Version 2 clarifications
525 | ~~~~~~~~~~~~~~~~~~~~~~~~
526 |
527 | The following changes have been made to the version 2 specification since it was
528 | initially published to clarify ambiguities and add some missing information.
529 |
530 | * The specification now describes how bytes fill values should be encoded and
531 | decoded for arrays with a fixed-length byte string data type (`#165
532 | `_, `#176
533 | `_).
534 |
535 | * The specification now clarifies that units must be specified for datetime64 and
536 | timedelta64 data types (`#85
537 | `_, `#215
538 | `_).
539 |
540 | * The specification now clarifies that the '.zattrs' key does not have to be present for
541 | either arrays or groups, and if absent then custom attributes should be treated as
542 | empty.
543 |
544 | * The specification now describes how structured datatypes with
545 | subarray shapes and/or with nested structured data types are encoded
546 | in array metadata (`#111
547 | `_, `#296
548 | `_).
549 |
550 | * Clarified the key/value pairs of custom attributes as "arbitrary" rather than
551 | "simple".
552 |
553 | Changes from version 1 to version 2
554 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
555 |
556 | The following changes were made between version 1 and version 2 of this specification:
557 |
558 | * Added support for storing multiple arrays in the same store and organising
559 | arrays into hierarchies using groups.
560 | * Array metadata is now stored under the ".zarray" key instead of the "meta"
561 | key.
562 | * Custom attributes are now stored under the ".zattrs" key instead of the
563 | "attrs" key.
564 | * Added support for filters.
565 | * Changed encoding of "fill_value" field within array metadata.
566 | * Changed encoding of compressor information within array metadata to be
567 | consistent with representation of filter information.
568 |
--------------------------------------------------------------------------------
/docs/v3/chunk-grids/index.rst:
--------------------------------------------------------------------------------
1 | .. _chunk-grid-list:
2 |
3 | ===========
4 | Chunk Grids
5 | ===========
6 |
7 | The following documents specify chunk grids which SHOULD
8 | be implemented by all implementations.
9 |
10 | .. toctree::
11 | :glob:
12 | :maxdepth: 1
13 | :titlesonly:
14 | :caption: Contents:
15 |
16 | */*
17 |
18 | Extensions
19 | ----------
20 |
21 | Registered chunk grid extensions can be found under
22 | `zarr-extensions::chunk-grids `_.
23 |
--------------------------------------------------------------------------------
/docs/v3/chunk-grids/regular-grid/index.rst:
--------------------------------------------------------------------------------
1 |
2 | .. _regular-chunkgrid:
3 |
4 | ==================
5 | Regular chunk grid
6 | ==================
7 |
8 | Version:
9 | 1.0
10 | Specification URI:
11 | https://zarr-specs.readthedocs.io/en/latest/v3/chunk-grids/regular-grid/
12 | Corresponding ZEP:
13 | `ZEP0001 — Zarr specification version 3 `_
14 | Issue tracking:
15 | `GitHub issues `_
16 | Suggest an edit for this spec:
17 | `GitHub editor `_
18 |
19 | Copyright 2020-Present Zarr core development team. This work
20 | is licensed under a `Creative Commons Attribution 3.0 Unported License
21 | `_.
22 |
23 | ----
24 |
25 | Abstract
26 | ========
27 |
28 | A regular grid is a type of grid where an array is divided into chunks
29 | such that each chunk is a hyperrectangle of the same shape. The
30 | dimensionality of the grid is the same as the dimensionality of the
31 | array. Each chunk in the grid can be addressed by a tuple of non-negative
32 | integers (`k`, `j`, `i`, ...) corresponding to the indices of the
33 | chunk along each dimension.
34 |
35 | Description
36 | ===========
37 |
38 | The origin element of a chunk has coordinates in the array space (`k` *
39 | `dz`, `j` * `dy`, `i` * `dx`, ...) where (`dz`, `dy`, `dx`, ...) are
40 | the chunk sizes along each dimension.
41 | Thus the origin element of the chunk at grid index (0, 0, 0,
42 | ...) is at coordinate (0, 0, 0, ...) in the array space, i.e., the
43 | grid is aligned with the origin of the array. If the length of any
44 | array dimension is not perfectly divisible by the chunk length along
45 | the same dimension, then the grid will overhang the edge of the array
46 | space.
47 |
48 | The shape of the chunk grid will be (ceil(`z` / `dz`), ceil(`y` /
49 | `dy`), ceil(`x` / `dx`), ...) where (`z`, `y`, `x`, ...) is the array
50 | shape, "/" is the division operator and "ceil" is the ceiling
51 | function. For example, if a 3 dimensional array has shape (10, 200,
52 | 3000), and has chunk shape (5, 20, 400), then the shape of the chunk
53 | grid will be (2, 10, 8), meaning that there will be 2 chunks along the
54 | first dimension, 10 along the second dimension, and 8 along the third
55 | dimension.
56 |
57 | .. list-table:: Regular Grid Example
58 | :header-rows: 1
59 |
60 | * - Array Shape
61 | - Chunk Shape
62 | - Chunk Grid Shape
63 | - Notes
64 | * - (10, 200, 3000)
65 | - (5, 20, 400)
66 | - (2, 10, 8)
67 | - The grid does overhang the edge of the array on the 3rd dimension.
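
The grid shape calculation can be sketched as follows. The function names are hypothetical; integer ceiling division is used here in place of a floating point "ceil" as an implementation choice.

```python
def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension: ceil(length / chunk length)."""
    # -(-n // c) is integer ceiling division, avoiding floating point.
    return tuple(-(-n // c) for n, c in zip(array_shape, chunk_shape))


def overhang(array_shape, chunk_shape):
    """Elements by which the grid extends past the array edge, per dimension."""
    grid = chunk_grid_shape(array_shape, chunk_shape)
    return tuple(g * c - n for g, c, n in zip(grid, chunk_shape, array_shape))


print(chunk_grid_shape((10, 200, 3000), (5, 20, 400)))  # (2, 10, 8)
print(overhang((10, 200, 3000), (5, 20, 400)))          # (0, 0, 200)
```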
68 |
69 | An element of an array with coordinates (`c`, `b`, `a`, ...) will
70 | occur within the chunk at grid index (`c` // `dz`, `b` // `dy`, `a` //
71 | `dx`, ...), where "//" is the floor division operator. The element
72 | will have coordinates (`c` % `dz`, `b` % `dy`, `a` % `dx`, ...) within
73 | that chunk, where "%" is the modulo operator. For example, if a
74 | 3 dimensional array has shape (10, 200, 3000), and has chunk shape
75 | (5, 20, 400), then the element of the array with coordinates (7, 150, 900)
76 | is contained within the chunk at grid index (1, 7, 2) and has coordinates
77 | (2, 10, 100) within that chunk.
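
The same mapping can be written as a short function. The name ``locate`` is a hypothetical helper for illustration.

```python
def locate(coords, chunk_shape):
    """Map array coordinates to (chunk grid index, coordinates within that chunk)."""
    grid_index = tuple(c // d for c, d in zip(coords, chunk_shape))  # floor division
    within = tuple(c % d for c, d in zip(coords, chunk_shape))       # modulo
    return grid_index, within


print(locate((7, 150, 900), (5, 20, 400)))  # ((1, 7, 2), (2, 10, 100))
```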
78 |
79 | The store key corresponding to a given grid cell is determined based on the
80 | :ref:`array-metadata-chunk-key-encoding` member of the :ref:`array-metadata`.
81 |
82 | Note that this specification does not consider the case where the
83 | chunk grid and the array space are not aligned at the origin vertices
84 | of the array and the chunk at grid index (0, 0, 0, ...). However,
85 | extensions may define variations on the regular grid type
86 | such that the grid indices may include negative integers, and the
87 | origin element of the array may occur at an arbitrary position within
88 | any chunk, which is required to allow arrays to be extended by an
89 | arbitrary length in a "negative" direction along any dimension.
90 |
91 | .. note:: Chunks at the border of an array always have the full chunk size, even when
92 | the array only covers part of them. For example, for an array with ``"shape": [30, 30]`` and
93 | ``"chunk_shape": [16, 16]``, the chunk ``0,1`` also contains unused values at the indices
94 | ``0-15, 30-31``. When writing such chunks it is recommended to use the current fill value
95 | for elements outside the bounds of the array.
96 |
97 |
98 |
99 | Status of this document
100 | =======================
101 |
102 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
103 |
104 |
105 | Document conventions
106 | ====================
107 |
108 | Conformance requirements are expressed with a combination of
109 | descriptive assertions and [RFC2119]_ terminology. The key words
110 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
111 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
112 | parts of this document are to be interpreted as described in
113 | [RFC2119]_. However, for readability, these words do not appear in all
114 | uppercase letters in this specification.
115 |
116 | All of the text of this specification is normative except sections
117 | explicitly marked as non-normative, examples, and notes. Examples in
118 |
--------------------------------------------------------------------------------
/docs/v3/chunk-key-encodings/default/index.rst:
--------------------------------------------------------------------------------
1 | .. _default-chunkkeyencoding:
2 |
3 | ==========================
4 | Default chunk key encoding
5 | ==========================
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/default/
11 | Corresponding ZEP:
12 | `ZEP0001 — Zarr specification version 3 `_
13 | Issue tracking:
14 | `GitHub issues `_
15 | Suggest an edit for this spec:
16 | `GitHub editor `_
17 |
18 | Copyright 2020-Present Zarr core development team. This work
19 | is licensed under a `Creative Commons Attribution 3.0 Unported License
20 | `_.
21 |
22 | ----
23 |
24 | Description
25 | ===========
26 |
27 | The ``configuration`` object may contain one optional member,
28 | ``separator``, which must be either ``"/"`` or ``"."``. If not specified,
29 | ``separator`` defaults to ``"/"``.
30 |
31 | The key for a chunk with grid index (``k``, ``j``, ``i``, ...) is
32 | formed by taking the initial prefix ``c``, and appending for each dimension:
33 |
34 | - the ``separator`` character, followed by,
35 |
36 | - the ASCII decimal string representation of the chunk index within that dimension.
37 |
38 | For example, in a 3 dimensional array, with a separator of ``/`` the identifier
39 | for the chunk at grid index (1, 23, 45) is the string ``"c/1/23/45"``. With a
40 | separator of ``.``, the identifier is the string ``"c.1.23.45"``. The initial prefix
41 | ``c`` ensures that metadata documents and chunks have separate prefixes.
42 |
43 | .. note:: A main difference from spec v2 is that the default chunk separator
44 | changed from ``.`` to ``/``, as in N5. This decreases the maximum number of
45 | items per directory in hierarchical stores such as directory stores.
46 |
47 | .. note:: Arrays may have 0 dimensions (when for example representing scalars),
48 | in which case the coordinate of a chunk is the empty tuple, and the chunk key
49 | will consist of the string ``c``.
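The key construction described above can be sketched in Python as follows. This is an illustrative sketch only; the function name ``encode_default_chunk_key`` is hypothetical.

```python
def encode_default_chunk_key(grid_index, separator="/"):
    """Build a chunk key from a grid index using the default encoding."""
    if separator not in ("/", "."):
        raise ValueError('separator must be "/" or "."')
    # Start from the prefix "c" and append separator + index per dimension.
    return "c" + "".join(f"{separator}{index}" for index in grid_index)


print(encode_default_chunk_key((1, 23, 45)))       # c/1/23/45
print(encode_default_chunk_key((1, 23, 45), "."))  # c.1.23.45
print(encode_default_chunk_key(()))                # c
```

Because the separator precedes every index, the 0-dimensional case needs no special handling and yields the bare prefix ``c``.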
50 |
51 |
52 | Status of this document
53 | =======================
54 |
55 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
56 |
57 |
58 | Document conventions
59 | ====================
60 |
61 | Conformance requirements are expressed with a combination of
62 | descriptive assertions and [RFC2119]_ terminology. The key words
63 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
64 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
65 | parts of this document are to be interpreted as described in
66 | [RFC2119]_. However, for readability, these words do not appear in all
67 | uppercase letters in this specification.
68 |
69 | All of the text of this specification is normative except sections
70 | explicitly marked as non-normative, examples, and notes. Examples in
71 |
--------------------------------------------------------------------------------
/docs/v3/chunk-key-encodings/index.rst:
--------------------------------------------------------------------------------
1 | .. _chunk-key-encoding-list:
2 |
3 | ===================
4 | Chunk Key Encodings
5 | ===================
6 |
7 | The following documents specify chunk key encodings which SHOULD
8 | be implemented by all implementations.
9 |
10 | .. toctree::
11 | :glob:
12 | :maxdepth: 1
13 | :titlesonly:
14 | :caption: Contents:
15 |
16 | */*
17 |
18 | Extensions
19 | ----------
20 |
21 | Registered chunk key encoding extensions can be found under
22 | `zarr-extensions::chunk-key-encodings `_.
23 |
--------------------------------------------------------------------------------
/docs/v3/chunk-key-encodings/v2/index.rst:
--------------------------------------------------------------------------------
1 | .. _v2-chunkkeyencoding:
2 |
3 | =====================
4 | v2 chunk key encoding
5 | =====================
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/chunk-key-encodings/v2/
11 | Corresponding ZEP:
12 | `ZEP0001 — Zarr specification version 3 `_
13 | Issue tracking:
14 | `GitHub issues `_
15 | Suggest an edit for this spec:
16 | `GitHub editor `_
17 |
18 | Copyright 2020-Present Zarr core development team. This work
19 | is licensed under a `Creative Commons Attribution 3.0 Unported License
20 | `_.
21 |
22 | ----
23 |
24 | Description
25 | ===========
26 |
27 | The ``configuration`` object may contain one optional member,
28 | ``separator``, which must be either ``"/"`` or ``"."``. If not specified,
29 | ``separator`` defaults to ``"."``.
30 |
31 | The identifier for a chunk with at least one dimension is formed by
32 | concatenating for each dimension:
33 |
34 | - the ASCII decimal string representation of the chunk index within that
35 | dimension, followed by
36 |
37 | - the ``separator`` character, except that it is omitted for the last
38 | dimension.
39 |
40 | For example, in a 3 dimensional array, with a separator of ``.`` the identifier
41 | for the chunk at grid index (1, 23, 45) is the string ``"1.23.45"``. With a
42 | separator of ``/``, the identifier is the string ``"1/23/45"``.
43 |
44 | For chunk grids with 0 dimensions, the single chunk has the key ``"0"``.
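As a minimal sketch of the rules above, the v2 encoding can be written in Python as follows. The function name ``encode_v2_chunk_key`` is hypothetical.

```python
def encode_v2_chunk_key(grid_index, separator="."):
    """Build a chunk key from a grid index using the v2 encoding."""
    if separator not in ("/", "."):
        raise ValueError('separator must be "/" or "."')
    if len(grid_index) == 0:
        # 0-dimensional chunk grids have a single chunk with the key "0".
        return "0"
    # Joining places the separator between indices, omitting it after the last.
    return separator.join(str(index) for index in grid_index)


print(encode_v2_chunk_key((1, 23, 45)))       # 1.23.45
print(encode_v2_chunk_key((1, 23, 45), "/"))  # 1/23/45
print(encode_v2_chunk_key(()))                # 0
```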
45 |
46 | .. warning::
47 |
48 | This encoding is intended only to allow existing v2 arrays to be
49 | converted to v3 without having to rename chunks. It is not recommended
50 | to be used when writing new arrays.
51 |
52 |
53 | Status of this document
54 | =======================
55 |
56 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
57 |
58 |
59 | Document conventions
60 | ====================
61 |
62 | Conformance requirements are expressed with a combination of
63 | descriptive assertions and [RFC2119]_ terminology. The key words
64 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
65 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
66 | parts of this document are to be interpreted as described in
67 | [RFC2119]_. However, for readability, these words do not appear in all
68 | uppercase letters in this specification.
69 |
70 | All of the text of this specification is normative except sections
71 | explicitly marked as non-normative, examples, and notes. Examples in
72 |
--------------------------------------------------------------------------------
/docs/v3/codecs/blosc/index.rst:
--------------------------------------------------------------------------------
1 | ===========
2 | Blosc codec
3 | ===========
4 |
5 | Version:
6 | 1.0
7 | Specification URI:
8 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/blosc/
9 | Corresponding ZEP:
10 | `ZEP0001 — Zarr specification version 3 `_
11 | Issue tracking:
12 | `GitHub issues `_
13 | Suggest an edit for this spec:
14 | `GitHub editor `_
15 |
16 | Copyright 2020-Present Zarr core development team. This work
17 | is licensed under a `Creative Commons Attribution 3.0 Unported License
18 | `_.
19 |
20 | ----
21 |
22 |
23 | Abstract
24 | ========
25 |
26 | Defines a ``bytes -> bytes`` codec that uses the blosc container format.
27 |
28 |
29 | Status of this document
30 | =======================
31 |
32 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
33 |
34 |
35 | Document conventions
36 | ====================
37 |
38 | Conformance requirements are expressed with a combination of
39 | descriptive assertions and [RFC2119]_ terminology. The key words
40 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
41 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
42 | parts of this document are to be interpreted as described in
43 | [RFC2119]_. However, for readability, these words do not appear in all
44 | uppercase letters in this specification.
45 |
46 | All of the text of this specification is normative except sections
47 | explicitly marked as non-normative, examples, and notes. Examples in
48 | this specification are introduced with the words "for example".
49 |
50 |
51 | Codec name
52 | ==========
53 |
54 | The value of the ``name`` member in the codec object MUST be ``blosc``.
55 |
56 |
57 | Configuration parameters
58 | ========================
59 |
60 | cname:
61 | A string identifying the internal compression algorithm to be
62 | used. At the time of writing, the following values are supported
63 | by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd",
64 | "snappy", "zlib".
65 |
66 | clevel:
67 | An integer from 0 to 9 which controls the speed and level of
68 | compression. A level of 1 is the fastest compression method and
69 | produces the least compression, while 9 is the slowest and produces
70 | the most compression. Compression is turned off completely when
71 | level is 0.
72 |
73 | shuffle:
74 | Specifies the type of shuffling to perform, if any, prior to compression.
75 | Must be one of:
76 |
77 | - ``"noshuffle"``, to indicate no shuffling;
78 | - ``"shuffle"``, to indicate byte-wise shuffling;
79 | - ``"bitshuffle"``, to indicate bit-wise shuffling.
80 |
81 | Zarr implementations MAY provide users an option to choose a shuffle mode
82 | automatically based on the typesize or other information, but MUST record in
83 | the metadata the mode that is chosen.
84 |
85 | typesize:
86 | Positive integer specifying the stride in bytes over which shuffling is
87 | performed. Required unless ``shuffle`` is ``"noshuffle"``, in which case the value
88 | is ignored.
89 |
90 | Zarr implementations MAY allow users to leave this unspecified and have the
91 | implementation choose a value automatically based on the array data type and
92 | previous codecs in the chain, but MUST record in the metadata the value that
93 | is chosen.
94 |
95 | blocksize:
96 | An integer giving the size in bytes of blocks into which a
97 | buffer is divided before compression. A value of 0
98 | indicates that an automatic size will be used.
99 |
100 | For example, the array metadata document below specifies that the compressor is
101 | the Blosc codec configured with a compression level of 1, byte-wise shuffling
102 | with a stride of 4, the ``lz4`` compression algorithm and the default block
103 | size::
104 |
105 | {
106 | "codecs": [{
107 | "name": "blosc",
108 | "configuration": {
109 | "cname": "lz4",
110 | "clevel": 1,
111 | "shuffle": "shuffle",
112 | "typesize": 4,
113 | "blocksize": 0
114 | }
115 | }]
116 | }
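The constraints on the configuration parameters can be checked with a small validator such as the Python sketch below. The function name ``validate_blosc_configuration`` is hypothetical. Treating ``blocksize`` as required and non-negative is an assumption of this sketch rather than something stated above, and ``cname`` is only checked to be a string because the set of supported values depends on the c-blosc library.

```python
VALID_SHUFFLE_MODES = ("noshuffle", "shuffle", "bitshuffle")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_blosc_configuration(config):
    """Raise ValueError if the configuration violates the stated constraints."""
    if not isinstance(config.get("cname"), str):
        raise ValueError("cname must be a string")
    clevel = config.get("clevel")
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    shuffle = config.get("shuffle")
    if shuffle not in VALID_SHUFFLE_MODES:
        raise ValueError(f"shuffle must be one of {VALID_SHUFFLE_MODES}")
    if shuffle != "noshuffle":
        # typesize is required whenever shuffling is enabled.
        typesize = config.get("typesize")
        if not _is_int(typesize) or typesize <= 0:
            raise ValueError("typesize must be a positive integer")
    blocksize = config.get("blocksize")
    # Assumption of this sketch: blocksize is present and not negative.
    if not _is_int(blocksize) or blocksize < 0:
        raise ValueError("blocksize must be a non-negative integer")


validate_blosc_configuration({
    "cname": "lz4",
    "clevel": 1,
    "shuffle": "shuffle",
    "typesize": 4,
    "blocksize": 0,
})
```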
117 |
118 |
119 | Format and algorithm
120 | ====================
121 |
122 | This is a ``bytes -> bytes`` codec.
123 |
124 | Blosc is a meta-compressor, which divides an input buffer into blocks,
125 | then applies an internal compression algorithm to each block, then
126 | packs the encoded blocks together into a single output buffer with a
127 | header. The format of the encoded buffer is defined in [BLOSC]_. The
128 | reference implementation is provided by the `c-blosc library
129 | `_.
130 |
131 |
132 | Comparison to Zarr v2
133 | =====================
134 |
135 | While the binary format is identical, the JSON metadata differs from that used
136 | by the Zarr v2 ``blosc`` codec in the following ways:
137 |
138 | - The ``shuffle`` mode is now specified more clearly as ``"noshuffle"`` (0 in Zarr v2),
139 | ``"shuffle"`` (1 in Zarr v2), or ``"bitshuffle"`` (2 in Zarr v2). Using these constants
140 | rather than numbers makes it much easier to know what shuffle mode will be
141 | used from manual inspection of the metadata.
142 |
143 | - When shuffling is enabled, the ``typesize`` must now be specified explicitly in
144 | the metadata, rather than determined implicitly from the input data. This
145 | allows Blosc to function as a pure "bytes -> bytes" codec rather than an
146 | "array -> bytes" codec.
147 |
148 | - There is no option to choose between bit-wise and byte-wise shuffling
149 | automatically, as supported in Zarr v2 via a ``shuffle`` value of ``-1``.
150 |
151 | References
152 | ==========
153 |
154 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
155 | Requirement Levels. March 1997. Best Current Practice. URL:
156 | https://tools.ietf.org/html/rfc2119
157 |
158 | .. [BLOSC] F. Alted. Blosc Chunk Format. URL:
159 | https://github.com/Blosc/c-blosc/blob/HEAD/README_CHUNK_FORMAT.rst
160 |
161 |
162 | Change log
163 | ==========
164 |
165 | No changes yet.
166 |
--------------------------------------------------------------------------------
/docs/v3/codecs/bytes/index.rst:
--------------------------------------------------------------------------------
1 | .. _bytes-codec-v1:
2 |
3 | ===========
4 | Bytes codec
5 | ===========
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/bytes/
11 | Corresponding ZEP:
12 | `ZEP0001 — Zarr specification version 3 `_
13 | Issue tracking:
14 | `GitHub issues `_
15 | Suggest an edit for this spec:
16 | `GitHub editor `_
17 |
18 | Copyright 2020-Present Zarr core development team. This work
19 | is licensed under a `Creative Commons Attribution 3.0 Unported License
20 | `_.
21 |
22 | ----
23 |
24 |
25 | Abstract
26 | ========
27 |
28 | Defines an ``array -> bytes`` codec that encodes arrays of fixed-size numeric
29 | data types as a sequence of bytes in lexicographical order. For multi-byte data
30 | types, it encodes the array either in little endian or big endian.
31 |
32 |
33 | Status of this document
34 | =======================
35 |
36 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
37 |
38 |
39 | Document conventions
40 | ====================
41 |
42 | Conformance requirements are expressed with a combination of
43 | descriptive assertions and [RFC2119]_ terminology. The key words
44 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
45 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
46 | parts of this document are to be interpreted as described in
47 | [RFC2119]_. However, for readability, these words do not appear in all
48 | uppercase letters in this specification.
49 |
50 | All of the text of this specification is normative except sections
51 | explicitly marked as non-normative, examples, and notes. Examples in
52 | this specification are introduced with the words "for example".
53 |
54 |
55 | Codec name
56 | ==========
57 |
58 | The value of the ``name`` member in the codec object MUST be ``bytes``.
59 |
60 |
61 | Configuration parameters
62 | ========================
63 |
64 | endian:
65 | Required for data types for which endianness is applicable. For example,
66 | this includes multi-byte data types, such as ``uint16`` and ``int32``,
67 | but not single-byte data types, such as ``uint8`` or ``bool``.
68 | If present, the value MUST be a string equal to either ``"big"`` or
69 | ``"little"``.
70 |
71 |
72 | Format and algorithm
73 | ====================
74 |
75 | This is an ``array -> bytes`` codec.
76 |
77 | Each element of the array is encoded using the specified endian variant of its
78 | binary representation listed below. Array elements are encoded in
79 | lexicographical order. For example, with ``endian`` specified as ``big``, the
80 | ``int32`` data type is encoded as a 4-byte big endian two's complement integer,
81 | and the ``complex128`` data type is encoded as two consecutive 8-byte big endian
82 | IEEE 754 binary64 values.
83 |
84 | .. list-table:: Supported data types
85 | :header-rows: 1
86 |
87 | * - Identifier
88 | - Binary representation
89 | * - ``bool``
90 | - Single byte, with false encoded as ``\\x00`` and true encoded as
91 | ``\\x01``. Does not depend on ``endian`` parameter.
92 | * - ``int8``
93 | - 1 byte two's complement. Does not depend on ``endian`` parameter.
94 | * - ``int16``
95 | - 2-byte two's complement
96 | * - ``int32``
97 | - 4-byte two's complement
98 | * - ``int64``
99 | - 8-byte two's complement
100 | * - ``uint8``
101 | - 1 byte. Does not depend on ``endian`` parameter.
102 | * - ``uint16``
103 | - 2-byte
104 | * - ``uint32``
105 | - 4-byte
106 | * - ``uint64``
107 | - 8-byte
108 | * - ``float16`` (optionally supported)
109 | - 2-byte IEEE 754 binary16
110 | * - ``float32``
111 | - 4-byte IEEE 754 binary32
112 | * - ``float64``
113 | - 8-byte IEEE 754 binary64
114 | * - ``complex64``
115 | - 2 consecutive 4-byte IEEE 754 binary32 values (real component followed by imaginary component)
116 | * - ``complex128``
117 | - 2 consecutive 8-byte IEEE 754 binary64 values (real component followed by imaginary component)
118 | * - ``r*``
119 | - number of bits, which must be a multiple of 8, given by ``*``.
120 |
121 | .. note::
122 |
123 | To encode elements in a different order than lexicographical order (C
124 | order/row major), the :ref:`transpose codec` may be
125 | specified.
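As an illustration of the encoding rules above, the following Python sketch encodes ``int32`` and ``complex128`` elements with the standard ``struct`` module. The helper names are hypothetical, and the sketch assumes the elements are supplied as nested Python lists for a 2-dimensional array.

```python
import struct

# struct byte-order prefixes for the two permitted ``endian`` values.
_BYTE_ORDER = {"big": ">", "little": "<"}


def flatten_c_order(rows):
    """Flatten a 2-dimensional nested list in lexicographical (C) order."""
    return [value for row in rows for value in row]


def encode_int32(values, endian):
    """Encode a flat sequence of integers as two's complement int32."""
    return struct.pack(f"{_BYTE_ORDER[endian]}{len(values)}i", *values)


def encode_complex128(values, endian):
    """Encode complex values as pairs of binary64: real, then imaginary."""
    parts = []
    for value in values:
        parts.extend((value.real, value.imag))
    return struct.pack(f"{_BYTE_ORDER[endian]}{len(parts)}d", *parts)


encoded = encode_int32(flatten_c_order([[1, 2], [3, -4]]), "big")
print(encoded.hex())  # 000000010000000200000003fffffffc
```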
126 |
127 | References
128 | ==========
129 |
130 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
131 | Requirement Levels. March 1997. Best Current Practice. URL:
132 | https://tools.ietf.org/html/rfc2119
133 |
134 |
135 | Change log
136 | ==========
137 |
138 | - ``endian`` codec was renamed to ``bytes`` codec. `PR #263
139 | `_
140 |
--------------------------------------------------------------------------------
/docs/v3/codecs/crc32c/index.rst:
--------------------------------------------------------------------------------
1 | .. _crc32c-codec:
2 |
3 | =====================
4 | CRC32C checksum codec
5 | =====================
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/crc32c/
11 | Editors:
12 | * Jonathan Striebel (`@jstriebel `_), Scalable Minds
13 | * Norman Rzepka (`@normanrz `_), Scalable Minds
14 | * Jeremy Maitin-Shepard (`@jbms `_), Google
15 | Corresponding ZEP:
16 | `ZEP0002 — Sharding codec `_
17 | Issue tracking:
18 | `GitHub issues `_
19 | Suggest an edit for this spec:
20 | `GitHub editor `_
21 |
22 | Copyright 2022-Present `Zarr core development team
23 | `_. This work
24 | is licensed under a `Creative Commons Attribution 3.0 Unported License
25 | `_.
26 |
27 | ----
28 |
29 |
30 | Abstract
31 | ========
32 |
33 | Defines a ``bytes -> bytes`` codec that appends a CRC32C checksum of the input bytestream.
34 |
35 |
36 | Status of this document
37 | =======================
38 |
39 | ZEP0002 was accepted on November 1st, 2023 via https://github.com/zarr-developers/zarr-specs/issues/254.
40 |
41 | Document conventions
42 | ====================
43 |
44 | Conformance requirements are expressed with a combination of
45 | descriptive assertions and [RFC2119]_ terminology. The key words
46 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
47 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
48 | parts of this document are to be interpreted as described in
49 | [RFC2119]_. However, for readability, these words do not appear in all
50 | uppercase letters in this specification.
51 |
52 | All of the text of this specification is normative except sections
53 | explicitly marked as non-normative, examples, and notes. Examples in
54 | this specification are introduced with the words "for example".
55 |
56 |
57 | Codec name
58 | ==========
59 |
60 | The value of the ``name`` member in the codec object MUST be ``crc32c``.
61 |
62 |
63 | Configuration parameters
64 | ========================
65 |
66 | None.
67 |
68 |
69 | Format and algorithm
70 | ====================
71 |
72 | This is a ``bytes -> bytes`` codec.
73 |
74 | The codec computes the CRC32C checksum of the input bytestream, as defined
75 | in [RFC3720]_. The output bytestream consists of the unchanged input
76 | bytestream followed by the checksum. The checksum is encoded as a 32-bit
77 | unsigned integer in little endian.
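The following Python sketch illustrates the format. It uses a simple bit-by-bit CRC32C (Castagnoli polynomial, reflected form ``0x82F63B78``) written for clarity rather than speed; the function names are hypothetical, and verifying the checksum on decode is an assumption of this sketch rather than a stated requirement.

```python
import struct


def crc32c(data):
    """Compute the CRC32C checksum of a bytes object, bit by bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the reflected polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc32c_encode(data):
    """Return the input followed by its checksum as a little-endian uint32."""
    return data + struct.pack("<I", crc32c(data))


def crc32c_decode(encoded):
    """Strip the trailing checksum, raising ValueError if it does not match."""
    if len(encoded) < 4:
        raise ValueError("encoded data is too short to contain a checksum")
    payload, (stored,) = encoded[:-4], struct.unpack("<I", encoded[-4:])
    if crc32c(payload) != stored:
        raise ValueError("CRC32C checksum mismatch")
    return payload


print(hex(crc32c(b"123456789")))  # 0xe3069283
```

Production code would normally use a table-driven or hardware-accelerated implementation instead of the bitwise loop.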
78 |
79 |
80 | References
81 | ==========
82 |
83 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
84 | Requirement Levels. March 1997. Best Current Practice. URL:
85 | https://tools.ietf.org/html/rfc2119
86 |
87 | .. [RFC3720] J. Satran et al. Internet Small Computer Systems
88 | Interface (iSCSI). April 2004. Proposed Standard. URL:
89 | https://tools.ietf.org/html/rfc3720
90 |
91 |
92 | Change log
93 | ==========
94 |
95 | No changes yet.
96 |
--------------------------------------------------------------------------------
/docs/v3/codecs/gzip/index.rst:
--------------------------------------------------------------------------------
1 | ==========
2 | Gzip codec
3 | ==========
4 |
5 | Version:
6 | 1.0
7 | Specification URI:
8 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/gzip/
9 | Corresponding ZEP:
10 | `ZEP0001 — Zarr specification version 3 `_
11 | Issue tracking:
12 | `GitHub issues `_
13 | Suggest an edit for this spec:
14 | `GitHub editor `_
15 |
16 | Copyright 2020-Present Zarr core development team. This work
17 | is licensed under a `Creative Commons Attribution 3.0 Unported License
18 | `_.
19 |
20 | ----
21 |
22 |
23 | Abstract
24 | ========
25 |
26 | Defines a ``bytes -> bytes`` codec that applies gzip compression.
27 |
28 |
29 | Status of this document
30 | =======================
31 |
32 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
33 |
34 |
35 | Document conventions
36 | ====================
37 |
38 | Conformance requirements are expressed with a combination of
39 | descriptive assertions and [RFC2119]_ terminology. The key words
40 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
41 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
42 | parts of this document are to be interpreted as described in
43 | [RFC2119]_. However, for readability, these words do not appear in all
44 | uppercase letters in this specification.
45 |
46 | All of the text of this specification is normative except sections
47 | explicitly marked as non-normative, examples, and notes. Examples in
48 | this specification are introduced with the words "for example".
49 |
50 |
51 | Codec name
52 | ==========
53 |
54 | The value of the ``name`` member in the codec object MUST be ``gzip``.
55 |
56 |
57 | Configuration parameters
58 | ========================
59 |
60 | level:
61 | An integer from 0 to 9 which controls the speed and level of
62 | compression. A level of 1 is the fastest compression method and
63 | produces the least compression, while 9 is the slowest and produces
64 | the most compression. Compression is turned off completely when
65 | level is 0.
66 |
67 | For example, the array metadata below specifies that the compressor is
68 | the Gzip codec configured with a compression level of 1::
69 |
70 | {
71 | "codecs": [{
72 | "name": "gzip",
73 | "configuration": {
74 | "level": 1
75 | }
76 | }]
77 | }
78 |
79 |
80 | Format and algorithm
81 | ====================
82 |
83 | This is a ``bytes -> bytes`` codec.
84 |
85 | Encoding and decoding are performed using the algorithm defined in
86 | [RFC1951]_.
87 |
88 | Encoded data should conform to the Gzip file format [RFC1952]_.
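A minimal Python sketch of this codec, assuming the standard library ``gzip`` module (which writes the gzip file format) is acceptable, could look as follows. The function names are hypothetical.

```python
import gzip


def gzip_encode(data, level):
    """Compress bytes into the gzip file format at the given level."""
    if not 0 <= level <= 9:
        raise ValueError("level must be an integer from 0 to 9")
    return gzip.compress(data, compresslevel=level)


def gzip_decode(data):
    """Decompress gzip-formatted bytes."""
    return gzip.decompress(data)


raw = b"zarr" * 1000
encoded = gzip_encode(raw, level=1)
assert gzip_decode(encoded) == raw
```

Note that the encoded bytes need not be identical between implementations or levels; only the decoded result has to match the input.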
89 |
90 |
91 | References
92 | ==========
93 |
94 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
95 | Requirement Levels. March 1997. Best Current Practice. URL:
96 | https://tools.ietf.org/html/rfc2119
97 |
98 | .. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version
99 | 1.3. May 1996. Informational. URL:
100 | https://tools.ietf.org/html/rfc1951
101 |
102 | .. [RFC1952] P. Deutsch. GZIP file format specification version 4.3.
103 | May 1996. Informational. URL:
104 | https://tools.ietf.org/html/rfc1952
105 |
106 |
107 | Change log
108 | ==========
109 |
110 | No changes yet.
111 |
--------------------------------------------------------------------------------
/docs/v3/codecs/index.rst:
--------------------------------------------------------------------------------
1 | .. _codec-list:
2 |
3 | ======
4 | Codecs
5 | ======
6 |
7 | The following documents specify codecs which SHOULD
8 | be implemented by all implementations.
9 |
10 | .. toctree::
11 | :glob:
12 | :maxdepth: 1
13 | :titlesonly:
14 | :caption: Contents:
15 |
16 | */*
17 |
18 | Extensions
19 | ----------
20 |
21 | Registered codec extensions can be found under
22 | `zarr-extensions::codecs `_.
23 |
--------------------------------------------------------------------------------
/docs/v3/codecs/sharding-indexed/index.rst:
--------------------------------------------------------------------------------
1 | .. _sharding-indexed-codec:
2 |
3 | ==============
4 | Sharding codec
5 | ==============
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/sharding-indexed/
11 | Editors:
12 | * Jonathan Striebel (`@jstriebel `_), Scalable Minds
13 | * Norman Rzepka (`@normanrz `_), Scalable Minds
14 | * Jeremy Maitin-Shepard (`@jbms `_), Google
15 | Corresponding ZEP:
16 | `ZEP0002 — Sharding codec `_
17 | Issue tracking:
18 | `GitHub issues `_
19 | Suggest an edit for this spec:
20 | `GitHub editor `_
21 |
22 | Copyright 2022-Present `Zarr core development team
23 | `_. This work
24 | is licensed under a `Creative Commons Attribution 3.0 Unported License
25 | `_.
26 |
27 | ----
28 |
29 |
30 | Abstract
31 | ========
32 |
33 | This specification defines a Zarr ``array -> bytes`` codec for sharding.
34 |
35 | Sharding logically splits chunks ("shards") into sub-chunks ("inner chunks")
36 | that can be individually compressed and accessed. This allows multiple chunks
37 | to be colocated within one storage object, bundling them in shards.
38 |
39 | Status of this document
40 | =======================
41 |
42 | ZEP0002 was accepted on November 1st, 2023 via https://github.com/zarr-developers/zarr-specs/issues/254.
43 |
44 | Motivation
45 | ==========
46 |
47 | In many cases, it becomes inefficient or impractical to store a large number of
48 | chunks as separate files or objects due to the design constraints of the
49 | underlying storage. For example, the file block size and maximum inode number
50 | restrict the use of numerous small files on typical file systems. Cloud
51 | storage such as S3 and GCS, and various distributed filesystems, also do not
52 | efficiently handle large numbers of small files or objects.
53 |
54 | Increasing the chunk size works only up to a certain point, as chunk sizes need
55 | to be small for read efficiency requirements, for example to stream data in
56 | browser-based visualization software.
57 |
58 | Therefore, chunks may need to be smaller than the minimum size that is
59 | practical for one storage key. In those cases, it is efficient to store
60 | objects at a coarser granularity than the chunks that are read.
61 |
62 | **Sharding solves this by allowing multiple chunks to be stored in one storage
63 | key, which is called a shard**:
64 |
65 | .. image:: sharding.png
66 |
67 |
68 | Document conventions
69 | ====================
70 |
71 | Conformance requirements are expressed with a combination of descriptive
72 | assertions and [RFC2119]_ terminology. The key words "MUST", "MUST NOT",
73 | "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
74 | and "OPTIONAL" in the normative parts of this document are to be interpreted as
75 | described in [RFC2119]_. However, for readability, these words do not appear in
76 | all uppercase letters in this specification.
77 |
78 | All of the text of this specification is normative except sections explicitly
79 | marked as non-normative, examples, and notes. Examples in this specification are
80 | introduced with the words "for example".
81 |
82 |
83 | Codec name
84 | ==========
85 |
86 | The value of the ``name`` member in the codec object MUST be ``sharding_indexed``.
87 |
88 |
89 | Configuration parameters
90 | ========================
91 |
92 | Sharding can be configured per array in the :ref:`array-metadata` as follows::
93 |
94 | {
95 | "codecs": [
96 | {
97 | "name": "sharding_indexed",
98 | "configuration": {
99 | "chunk_shape": [32, 32],
100 | "codecs": [
101 | {
102 | "name": "bytes",
103 | "configuration": {
104 | "endian": "little"
105 | }
106 | },
107 | {
108 | "name": "gzip",
109 | "configuration": {
110 | "level": 1
111 | }
112 | }
113 | ],
114 | "index_codecs": [
115 | {
116 | "name": "bytes",
117 | "configuration": {
118 | "endian": "little"
119 | }
120 | },
121 | { "name": "crc32c" }
122 | ],
123 | "index_location": "end"
124 | }
125 | }
126 | ]
127 | }
128 |
129 | ``chunk_shape``
130 |
131 | An array of integers specifying the shape of the inner chunks in a shard
132 | along each dimension of the outer array. The length of the ``chunk_shape``
133 | array must match the number of dimensions of the shard shape to which this
134 | sharding codec is applied, and the inner chunk shape along each dimension must
135 | evenly divide the size of the shard shape. For example, an inner chunk
136 | shape of ``[32, 2]`` with a shard shape of ``[64, 64]`` indicates that
137 | 64 inner chunks are combined in one shard: 2 along the first dimension and,
138 | for each of those, 32 along the second dimension.
139 |
140 | ``codecs``
141 |
142 | Specifies a list of codecs to be used for encoding and decoding inner chunks.
143 | The value must be an array of objects, as specified in the
144 | :ref:`array-metadata`. The ``codecs`` member is required and needs to contain
145 | exactly one ``array -> bytes`` codec.
146 |
147 | ``index_codecs``
148 |
149 | Specifies a list of codecs to be used for encoding and decoding the shard index.
150 | The value must be an array of objects, as specified in the
151 | :ref:`array-metadata`. The ``index_codecs`` member is required and needs to
152 | contain exactly one ``array -> bytes`` codec. Codecs that produce
153 | variable-sized encoded representation, such as compression codecs, MUST NOT
154 | be used for index codecs. It is RECOMMENDED to use a little-endian ``bytes``
155 | codec followed by a ``crc32c`` checksum codec as index codecs.
156 |
157 | ``index_location``
158 |
159 | Specifies whether the shard index is located at the beginning or end of the
160 | file. The parameter value must be either the string ``start`` or ``end``.
161 | If the parameter is not present, the value defaults to ``end``.
162 |
163 | Definitions
164 | ===========
165 |
166 | * **Shard** is a chunk of the outer array that corresponds to one storage object.
167 | As described in this document, shards MAY have multiple inner chunks.
168 | * **Inner chunk** is a chunk within the shard.
169 | * **Shard shape** is the chunk shape of the outer array.
170 | * **Inner chunk shape** is defined by the ``chunk_shape`` configuration of the codec.
171 | The inner chunk shape needs to have the same number of dimensions as the shard shape and the
172 | inner chunk shape along each dimension must evenly divide the size of the shard shape.
173 | * **Chunks per shard** is the element-wise division of the shard shape by the
174 | inner chunk shape.
175 |
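For example, the last two definitions can be sketched as follows. This is a non-normative illustration; the function name ``chunks_per_shard`` is hypothetical and not defined by this specification.

```python
from math import prod


def chunks_per_shard(shard_shape, inner_chunk_shape):
    """Element-wise division of the shard shape by the inner chunk shape."""
    if len(shard_shape) != len(inner_chunk_shape):
        raise ValueError("inner chunk shape and shard shape differ in rank")
    for s, c in zip(shard_shape, inner_chunk_shape):
        # The inner chunk shape must evenly divide the shard shape.
        if c <= 0 or s % c != 0:
            raise ValueError(f"inner chunk length {c} does not evenly divide {s}")
    return tuple(s // c for s, c in zip(shard_shape, inner_chunk_shape))


grid = chunks_per_shard((64, 64), (32, 2))
print(grid, prod(grid))  # (2, 32) 64
```

The product of the resulting tuple is the number of inner chunks in the shard, which also determines the number of entries in the shard index.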
176 |
177 | Binary shard format
178 | ===================
179 |
180 | This is an ``array -> bytes`` codec.
181 |
182 | In the ``sharding_indexed`` binary format, inner chunks are written successively in a
183 | shard, where unused space between them is allowed, followed by an index
184 | referencing them.
185 |
186 | The index is an array of 64-bit unsigned integers with a shape that matches the
187 | chunks per shard tuple with an appended dimension of size 2.
188 | For example, given a shard shape of ``[128, 128]`` and chunk shape of ``[32, 32]``,
189 | there are ``[4, 4]`` inner chunks in a shard. The corresponding shard index has a
190 | shape of ``[4, 4, 2]``.
191 |
192 | The index contains the ``offset`` and ``nbytes`` values for each inner chunk.
193 | The ``offset[i]`` specifies the byte offset within the shard at which the
194 | encoded representation of chunk ``i`` begins, and ``nbytes[i]`` specifies the
195 | encoded length in bytes.
196 |
197 | Empty inner chunks are denoted by setting both offset and nbytes to ``2^64 - 1``.
198 | Empty inner chunks are interpreted as being filled with the fill value. The index
199 | always has the full shape of all possible inner chunks per shard, even if they extend
200 | beyond the array shape.
201 |
202 | The index is either placed at the end of the file or at the beginning of the file,
203 | as configured by the ``index_location`` parameter. The index is encoded into binary
204 | representations using the specified index codecs. The byte size of the index is
205 | determined by the number of inner chunks in the shard ``n``, i.e. the product of
206 | chunks per shard, and the choice of index codecs.
207 |
208 | For example, consider a shard shape of ``[64, 64]``, an inner chunk shape of
209 | ``[32, 32]`` and an index codec combination of a little-endian codec followed by
210 | a crc32c checksum codec. The size of the corresponding index is
211 | ``16 (2x uint64) * 4 (chunks per shard) + 4 (crc32c checksum) = 68 bytes``.
212 | The index would look like::
213 |
214 | | chunk (0, 0) | chunk (0, 1) | chunk (1, 0) | chunk (1, 1) | |
215 | | offset | nbytes | offset | nbytes | offset | nbytes | offset | nbytes | checksum |
216 | | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint64 | uint32 |
217 |
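For example, the 68-byte index described above could be produced as sketched below. This is a non-normative illustration under stated assumptions: the index entries are laid out in C order of the chunk grid, the checksum is appended as a little-endian 32-bit unsigned integer, and the helper names ``crc32c`` and ``encode_index`` are hypothetical. The offsets and lengths are made-up values.

```python
import struct

EMPTY = 2**64 - 1  # offset and nbytes value that marks an empty inner chunk


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode_index(entries):
    """Encode (offset, nbytes) pairs, given in C order of the chunk grid."""
    # Little-endian codec: two uint64 values per inner chunk.
    body = b"".join(struct.pack("<QQ", offset, nbytes) for offset, nbytes in entries)
    # crc32c codec: append the checksum of the preceding bytes.
    return body + struct.pack("<I", crc32c(body))


# Shard shape [64, 64], inner chunk shape [32, 32]: four index entries.
# Chunk (1, 0) is empty in this made-up example.
index = encode_index([(0, 100), (100, 80), (EMPTY, EMPTY), (180, 120)])
print(len(index))  # 68
```

A bitwise checksum loop is used only to keep the sketch free of third-party dependencies; a real implementation would more likely rely on an optimized CRC-32C routine.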
218 |
219 | The actual order of the chunk content is not fixed and may be chosen by the
220 | implementation. All possible write orders are valid according to this
221 | specification and therefore can be read by any other implementation. When
222 | writing partial inner chunks into an existing shard, no specific order of the existing
223 | inner chunks may be expected. Some writing strategies might be:
224 |
225 | * **Fixed order**: Specify a fixed order (e.g. row-, column-major, or Morton
226 | order). When replacing existing inner chunks, larger or equal-sized inner chunks may be
227 | replaced in-place, leaving unused space up to an upper limit that might
228 | be specified. Please note that, for regular-sized uncompressed data,
229 | all inner chunks have the same size and can therefore be replaced in-place.
230 | * **Append-only**: Any chunk to write is appended to the existing shard,
231 | followed by an updated index. If previous inner chunks are updated, their storage
232 | space becomes unused, as well as the previous index. This might be useful for
233 | storage that only allows append-only updates.
234 | * **Other formats**: Other formats that accept additional bytes at the end of
235 | the file (such as HDF) could be used for storing shards, by writing the inner chunks
236 | in the order the format prescribes and appending a binary index derived from
237 | the byte offsets and lengths at the end of the file.
238 |
239 | Any configuration parameters for the write strategy must not be part of the
240 | metadata document; instead they need to be configured at runtime, as this is
241 | implementation specific.
242 |
243 |
244 | Implementation notes
245 | ====================
246 |
247 | This section suggests a non-normative implementation of the codec including
248 | common optimizations.
249 |
250 | * **Decoding**: A simple implementation to decode inner chunks in a shard would (a)
251 | read the entire value from the store into a byte buffer, (b) parse the shard
252 | index as specified above from the beginning or end (according to the
253 | ``index_location``) of the buffer and (c) cut out the relevant bytes that belong
254 | to the requested chunk. The relevant bytes are determined by the
255 | ``offset,nbytes`` pair in the shard index. This bytestream then needs to be
256 | decoded with the inner codecs as specified in the sharding configuration applying
257 | the :ref:`decoding_procedure`. This is similar to how an implementation would
258 | access a sub-slice of a chunk.
259 |
260 | The size of the index can be determined by applying ``c.compute_encoded_size``
261 | for each index codec recursively. The initial size is the byte size of the index
262 | array, i.e. ``16 * n``, where ``n`` is the number of inner chunks in the shard.
263 |
264 | When reading all inner chunks of a shard at once, a useful optimization would be to
265 | read the entire shard once into a byte buffer and then cut out and decode all
266 | inner chunks from that buffer in one pass.
267 |
268 | If the underlying store supports partial reads, the decoding of single inner
269 | chunks can be optimized. In that case, the shard index can be read from the
270 | store by requesting the ``n`` first or last bytes (according to the
271 | ``index_location``), where ``n`` is the size of the index as determined by
272 | the number of inner chunks in the shard and choice of index codecs. After
273 | parsing the shard index, single inner chunks can be requested from the store
274 | by specifying the byte range. The bytestream, then, needs to be decoded as above.
275 |
276 | * **Encoding**: A simple implementation to encode a chunk in a shard would (a)
277 | encode the new chunk per :ref:`encoding_procedure` in a byte buffer using the
278 | shard's inner codecs, (b) read an existing shard from the store, (c) create a
279 | new bytestream with all encoded inner chunks of that shard including the overwritten
280 | chunk, (d) generate a new shard index that is prepended or appended (according
281 | to the ``index_location``) to the chunk bytestream and (e) write the shard to
282 | the store. If there was no existing shard, an empty shard is assumed. When
283 | writing entire inner chunks, reading the existing shard first may be skipped.
284 |
285 | When working with inner chunks that have a fixed byte size (e.g., uncompressed) and
286 | a store that supports partial writes, an optimization would be to replace the
287 | existing chunk by writing the new chunk to the store at the specified byte range.
288 |
289 | On stores with random-write capabilities, it may be useful to (a) place the shard
290 | index at the beginning of the file, (b) write out inner chunks in
291 | application-specific order, and (c) update the shard index accordingly.
292 | Synchronization of inner chunks written in parallel needs to be handled by the
293 | application.
294 |
295 | Other use case-specific optimizations may be available, e.g., for append-only
296 | workloads.
297 |
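For example, the simple decoding path described above (read the whole shard, parse the index, cut out one inner chunk) might look like the following sketch. It is non-normative and assumes the recommended index codecs (little-endian bytes followed by a crc32c checksum stored as a little-endian 32-bit unsigned integer) and a C-order layout of the index entries. The names ``crc32c`` and ``read_inner_chunk`` are hypothetical.

```python
import struct
from math import prod

EMPTY = 2**64 - 1  # offset and nbytes value that marks an empty inner chunk


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def read_inner_chunk(shard, chunks_per_shard, chunk_coords, index_location="end"):
    """Return the encoded bytes of one inner chunk, or None if it is empty."""
    n = prod(chunks_per_shard)
    index_size = 16 * n + 4  # n pairs of uint64 plus the 4-byte checksum
    raw = shard[-index_size:] if index_location == "end" else shard[:index_size]
    body, (checksum,) = raw[:-4], struct.unpack("<I", raw[-4:])
    if crc32c(body) != checksum:
        raise ValueError("shard index checksum mismatch")
    # Linear position of the inner chunk in C order within the chunk grid.
    i = 0
    for coord, size in zip(chunk_coords, chunks_per_shard):
        i = i * size + coord
    offset, nbytes = struct.unpack_from("<QQ", body, 16 * i)
    if offset == EMPTY and nbytes == EMPTY:
        return None  # to be interpreted as filled with the fill value
    return shard[offset:offset + nbytes]


# Build a tiny made-up shard with a 2 x 2 chunk grid and the index at the end.
entries = [(0, 4), (EMPTY, EMPTY), (EMPTY, EMPTY), (4, 2)]
body = b"".join(struct.pack("<QQ", o, s) for o, s in entries)
shard = b"aaaabb" + body + struct.pack("<I", crc32c(body))

print(read_inner_chunk(shard, (2, 2), (0, 0)))  # b'aaaa'
print(read_inner_chunk(shard, (2, 2), (0, 1)))  # None
print(read_inner_chunk(shard, (2, 2), (1, 1)))  # b'bb'
```

The returned bytes are still encoded; they would subsequently be decoded with the inner ``codecs`` of the sharding configuration. With a store that supports partial reads, the two slices of ``shard`` would be replaced by byte-range requests.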
298 |
299 | References
300 | ==========
301 |
302 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
303 | Requirement Levels. March 1997. Best Current Practice. URL:
304 | https://tools.ietf.org/html/rfc2119
305 |
306 | Change log
307 | ==========
308 |
309 | * Adds ``index_location`` parameter. `PR 280 `_
310 |
311 | * ZEP0002 was accepted. `Issue 254 `_
312 |
--------------------------------------------------------------------------------
/docs/v3/codecs/sharding-indexed/sharding.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zarr-developers/zarr-specs/b880fb385bedb18dd78ffef1bd683e7e93270c74/docs/v3/codecs/sharding-indexed/sharding.png
--------------------------------------------------------------------------------
/docs/v3/codecs/transpose/index.rst:
--------------------------------------------------------------------------------
1 | .. _transpose-codec-v1:
2 |
3 | ===============
4 | Transpose codec
5 | ===============
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/codecs/transpose/
11 | Corresponding ZEP:
12 | `ZEP0001 — Zarr specification version 3 `_
13 | Issue tracking:
14 | `GitHub issues `_
15 | Suggest an edit for this spec:
16 | `GitHub editor `_
17 |
18 | Copyright 2020-Present Zarr core development team. This work
19 | is licensed under a `Creative Commons Attribution 3.0 Unported License
20 | `_.
21 |
22 | ----
23 |
24 |
25 | Abstract
26 | ========
27 |
28 | Defines an ``array -> array`` codec that permutes the dimensions of the chunk
29 | array.
30 |
31 |
32 | Status of this document
33 | =======================
34 |
35 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
36 |
37 |
38 | Document conventions
39 | ====================
40 |
41 | Conformance requirements are expressed with a combination of
42 | descriptive assertions and [RFC2119]_ terminology. The key words
43 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
44 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
45 | parts of this document are to be interpreted as described in
46 | [RFC2119]_. However, for readability, these words do not appear in all
47 | uppercase letters in this specification.
48 |
49 | All of the text of this specification is normative except sections
50 | explicitly marked as non-normative, examples, and notes. Examples in
51 | this specification are introduced with the words "for example".
52 |
53 |
54 | Codec name
55 | ==========
56 |
57 | The value of the ``name`` member in the codec object MUST be ``transpose``.
58 |
59 |
60 | Configuration parameters
61 | ========================
62 |
63 | order:
64 | Required. Must be an array of integers specifying a permutation of ``0``, ``1``, ...,
65 | ``n-1``, where ``n`` is the number of dimensions in the decoded chunk
66 | representation provided as input to this codec.
67 |
68 | Format and algorithm
69 | ====================
70 |
71 | This is an ``array -> array`` codec.
72 |
73 | Given a chunk array ``A`` with shape ``A_shape`` as the decoded representation,
74 | the encoded representation is an array ``B`` with the same data type as ``A``
75 | and shape ``B_shape``, where:
76 |
77 | - ``B_shape[i] = A_shape[order[i]]`` for all dimension indices ``i``, and
78 | - ``B[B_pos] = A[A_pos]``, where ``B_pos[i] = A_pos[order[i]]``, for all chunk
79 | positions ``A_pos`` and dimension indices ``i``.
80 |
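For example, the two rules above can be applied to a chunk array held as a flat list in C order, as in the following non-normative sketch. The function name ``transpose_encode`` and the flat-list representation are assumptions made for illustration only.

```python
from itertools import product


def transpose_encode(flat, a_shape, order):
    """Return (B as a flat C-order list, B_shape) for A given as a flat C-order list."""
    ndim = len(a_shape)
    if sorted(order) != list(range(ndim)):
        raise ValueError("order must be a permutation of 0, 1, ..., n-1")
    # B_shape[i] = A_shape[order[i]]
    b_shape = tuple(a_shape[k] for k in order)
    # C-order strides of A, in units of elements.
    a_strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        a_strides[d] = a_strides[d + 1] * a_shape[d + 1]
    out = []
    for b_pos in product(*(range(n) for n in b_shape)):
        # B_pos[i] = A_pos[order[i]], so coordinate i of B indexes
        # dimension order[i] of A.
        a_index = sum(b_pos[i] * a_strides[order[i]] for i in range(ndim))
        out.append(flat[a_index])
    return out, b_shape


# A has shape (2, 3) and holds 0..5 in C order; order [1, 0] swaps the axes.
b, b_shape = transpose_encode([0, 1, 2, 3, 4, 5], (2, 3), [1, 0])
print(b_shape, b)  # (3, 2) [0, 3, 1, 4, 2, 5]
```

This sketch copies the data; as the note below points out, an implementation may instead expose a virtual view with permuted strides.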
81 | .. note::
82 |
83 | Implementations of this codec may simply construct a virtual view that
84 | represents the transposed result, and avoid physically transposing the
85 | in-memory representation when possible.
86 |
87 | References
88 | ==========
89 |
90 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
91 | Requirement Levels. March 1997. Best Current Practice. URL:
92 | https://tools.ietf.org/html/rfc2119
93 |
94 |
95 | Change log
96 | ==========
97 |
98 | Changes after acceptance of ZEP 1
99 | ---------------------------------
100 |
101 | The ``order`` configuration parameter no longer supports the constants ``"C"``
102 | or ``"F"`` and must instead always be specified as an explicit permutation.
103 |
--------------------------------------------------------------------------------
/docs/v3/core/index.rst:
--------------------------------------------------------------------------------
1 | .. This file is in restructured text format: https://docutils.sourceforge.io/rst.html
2 | .. _zarr-core-specification-v3:
3 |
4 | =======================
5 | Zarr core specification
6 | =======================
7 |
8 | Version:
9 | 3.1
10 | Specification URI:
11 | https://zarr-specs.readthedocs.io/en/latest/v3/core/
12 |
13 | Editors:
14 | * Alistair Miles (`@alimanfoo `_), Wellcome Sanger Institute
15 | * Jonathan Striebel (`@jstriebel `_), Scalable Minds
16 | * Norman Rzepka (`@normanrz `_), Scalable Minds
17 | * Jeremy Maitin-Shepard (`@jbms `_), Google
18 | * Josh Moore (`@joshmoore `_), German BioImaging
19 |
20 | Corresponding ZEPs:
21 | * `ZEP0001 — Zarr specification version 3 `_
22 | * `ZEP0009 — Zarr extension naming `_
23 |
24 | Issue tracking:
25 | `GitHub issues `_
26 |
27 | Suggest an edit for this spec:
28 | `GitHub editor `_
29 |
30 | Copyright 2019-Present Zarr core development team. This work
31 | is licensed under a `Creative Commons Attribution 3.0 Unported License
32 | `_.
33 |
34 | ----
35 |
36 |
37 | Abstract
38 | ========
39 |
40 | This specification defines the Zarr format for N-dimensional typed arrays.
41 |
42 |
43 | Status of this document
44 | =======================
45 |
46 | * ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
47 |
48 | This specification is the latest version.
49 |
50 |
51 | Introduction
52 | ============
53 |
54 | This specification defines a format for multidimensional array data. This
55 | type of data is common in scientific and numerical computing
56 | applications. Many domains face computational challenges as
57 | increasingly large volumes of data are being generated, for example,
58 | via high resolution microscopy, remote sensing imagery, genome
59 | sequencing or numerical simulation. The primary motivation for the
60 | development of Zarr is to address this challenge by
61 | enabling the storage of large multidimensional arrays in a way that is
62 | compatible with parallel and/or distributed computing applications.
63 |
64 | This specification supersedes the `Zarr storage
65 | specification version 2
66 | `_ (Zarr v2). The
67 | Zarr v2 specification is implemented in several programming
68 | languages and is used to store and analyse large
69 | scientific datasets from a variety of domains. However, it has become
70 | clear that there are several opportunities for modest but useful
71 | improvements to be made in the format, and for establishing a foundation
72 | that allows for greater interoperability, whilst also enabling a variety
73 | of more advanced and specialised features to be explored and developed.
74 |
75 | This specification also draws heavily on the `N5 API and
76 | file-system specification `_, which
77 | was developed in parallel to Zarr v2 with similar
78 | goals and features. This specification defines a core set of features
79 | at the intersection of both Zarr v2 and N5, and so aims to provide a
80 | common target that can be fully implemented across multiple
81 | programming environments and serve a wide range of applications.
82 |
83 | We highlight the following areas motivating the
84 | development of this specification.
85 |
86 | Extensibility
87 | -------------
88 |
89 | The development of systems for storage of very large array-like data
90 | is a very active area of research and development, and there are many
91 | possibilities that remain to be explored. A goal of this specification
92 | is to define a format with a number of clear extension points and
93 | mechanisms, in order to provide a framework for freely building on and
94 | exploring these possibilities. We aim to make this possible, whilst
95 | also providing pathways for a graceful degradation of functionality
96 | where possible, in order to retain interoperability. We also aim to
97 | provide a framework for community-defined extensions, which can be
98 | developed and published independently without requiring centralised
99 | coordination of all specifications.
100 |
101 | See :ref:`extension points ` below.
102 |
103 | Interoperability
104 | ----------------
105 |
106 | While the Zarr v2 and N5 specifications have each been implemented in
107 | multiple programming languages, there is currently not feature parity
108 | across all implementations. This is in part because the feature set
109 | includes some features that are not easily translated or supported
110 | across different programming languages. This specification aims to
111 | define a set of core features that are useful and sufficient to
112 | address a significant fraction of use cases, but are also
113 | straightforward to implement fully across different programming
114 | languages. Additional functionality can then be layered via
115 | extensions, some of which may aim for wide adoption, some of which may
116 | be more specialised and have more limited implementation.
117 |
118 |
119 | Stability Policy
120 | ----------------
121 |
122 | This core specification adheres to a ``MAJOR.MINOR`` version
123 | number format. When incrementing the minor version, only additional features
124 | can be added. Breaking changes require incrementing the major version.
125 |
126 | A Zarr implementation that provides the read and write API by
127 | implementing a specification ``X.Y`` can be considered compatible with all
128 | datasets which only use features contained in version ``X.Y``.
129 |
130 | For example, spec ``X.1`` adds core feature "foo" compared to ``X.0``. Assuming
131 | implementation A implements ``X.1`` and implementation B implements ``X.0``,
132 | data using feature "foo" can only be read with implementation A. B fails to open
133 | it, as the key "foo" is unknown.
134 |
135 | Data not using "foo" can be used with both implementations, even if it's written
136 | with implementation B.
137 |
138 | Therefore, data is only marked with the respective major version; unknown
139 | features are auto-discovered via the metadata document.
140 |
141 | :ref:`Extensions` defined in subpages of this specification
142 | follow the same stability policy but do so with their own version number.
143 |
144 | Document conventions
145 | ====================
146 |
147 | Conformance requirements are expressed with a combination of
148 | descriptive assertions and [RFC2119]_ terminology. The key words
149 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
150 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
151 | parts of this document are to be interpreted as described in
152 | [RFC2119]_. However, for readability, these words do not appear in all
153 | uppercase letters in this specification.
154 |
155 | All of the text of this specification is normative except sections
156 | explicitly marked as non-normative, examples, and notes. Examples in
157 | this specification are introduced with the words "for example".
158 |
159 | Concepts and terminology
160 | ========================
161 |
162 | This section introduces and defines some key terms and explains the
163 | conceptual model underpinning the Zarr format.
164 |
165 | The following figure illustrates the first part of the terminology:
166 |
167 | ..
168 | The following image was produced with https://excalidraw.com/
169 | and can be loaded there, as the source is embedded in the png.
170 | .. image:: terminology-hierarchy.excalidraw.png
171 | :width: 600
172 |
173 | .. _hierarchy:
174 |
175 | *Hierarchy*
176 |
177 | A Zarr hierarchy is a tree structure, where each node in the tree
178 | is either a group_ or an array_. Group nodes may have children but
179 | array nodes may not. All nodes in a hierarchy have a name_ and a
180 | path_. The root of a Zarr hierarchy may be either a group_ or an array_.
181 | In the latter case, the hierarchy consists of just the single array.
182 |
183 | .. _array:
184 | .. _arrays:
185 |
186 | *Array*
187 |
188 | An array is a node in a hierarchy_. An array is a data structure
189 | with zero or more dimensions_ whose lengths define the shape_ of
190 | the array. An array contains zero or more data elements_. All
191 | elements_ in an array conform to the same `data type`_. An array
192 | may not have child nodes.
193 |
194 | .. _group:
195 | .. _groups:
196 |
197 | *Group*
198 |
199 | A group is a node in a hierarchy_ that may have child nodes.
200 |
201 | .. _name:
202 | .. _names:
203 |
204 | *Name*
205 |
206 | Each child node of a group has a name, which is a string of
207 | characters with some additional constraints defined in the section
208 | on `node names`_ below. Two sibling nodes cannot have the same
209 | name.
210 |
211 | .. _path:
212 | .. _paths:
213 |
214 | *Path*
215 |
216 | Each node in a hierarchy_ has a path, a Unicode string that uniquely
217 | identifies the node and defines its location within the hierarchy_. The root
218 | node has a path of ``/``. The path of a non-root node is equal to the
219 | concatenation of:
220 |
221 | - the path of its parent node;
222 | - the ``/`` character, unless the parent is the root node;
223 | - the name_ of the node itself.
224 |
225 | For example, the path ``"/foo/bar"`` identifies a node named ``"bar"``,
226 | whose parent is named ``"foo"``, whose parent is the root of the hierarchy.
227 |
228 | A path always starts with ``/``, and a non-root path cannot end with ``/``,
229 | because node names_ must be non-empty and cannot contain ``/``.
230 |
231 | .. _dimension:
232 | .. _dimensions:
233 |
234 | *Dimension*
235 |
236 | An array_ has a fixed number of zero or more dimensions. Each dimension has
237 | an integer length. This specification only considers the case where the
238 | lengths of all dimensions are finite. However,
239 | :ref:`extensions` may be defined which allow a dimension
240 | to have an infinite or variable length.
241 |
242 | .. _shape:
243 |
244 | *Shape*
245 |
246 | The shape of an array_ is the tuple of dimension_ lengths. For
247 | example, if an array_ has 2 dimensions_, where the length of the
248 | first dimension_ is 100 and the length of the second dimension_ is
249 | 20, then the shape of the array_ is (100, 20). A shape can be the empty
250 | tuple in the case of zero-dimensional arrays (scalars).
251 |
252 | .. _element:
253 | .. _elements:
254 |
255 | *Element*
256 |
257 | An array_ contains zero or more elements. Each element is
258 | identified by a tuple of integer coordinates, one for each
259 | dimension_ of the array_. If all dimensions_ of an array_ have
260 | finite length, then the number of elements in the array_ is given
261 | by the product of the dimension_ lengths.
262 |
263 | .. _data type:
264 |
265 | *Data type*
266 |
267 | A data type defines the set of possible values that an array_ may
268 | contain. For example, the 32-bit signed integer data type defines binary
269 | representations for all integers in the range −2,147,483,648 to
270 | 2,147,483,647. This specification only defines a limited set of data types,
271 | but additional data types can be defined as :ref:`extensions`.
272 |
273 | .. _chunk:
274 | .. _chunks:
275 |
276 | *Chunk*
277 |
278 | An array_ is divided into a set of chunks, where each chunk is a
279 | hyperrectangle defined by a tuple of intervals, one for each
280 | dimension_ of the array_. The chunk shape is the tuple of interval
281 | lengths, and the chunk size (i.e., number of elements_ contained
282 | within the chunk) is the product of its interval lengths.
283 |
284 | The chunk shape elements are non-zero when the corresponding dimensions of
285 | the array have non-zero length.
286 |
287 | .. _grid:
288 | .. _grids:
289 |
290 | *Grid*
291 |
292 | The chunks_ of an array_ are organised into a grid. This
293 | specification only considers the case where all chunks_ have the
294 | same chunk shape and the chunks form a regular grid. However,
295 | additional chunk grids can be defined as :ref:`extensions`.
296 |
297 | .. _codec:
298 | .. _codecs:
299 |
300 | *Codec*
301 |
302 | The list of *codecs* specified for an array_ determines the encoded byte
303 | representation of each chunk in the store_.
304 |
305 | .. _metadata document:
306 | .. _metadata documents:
307 |
308 | *Metadata document*
309 |
310 | Each array_ or group_ in a hierarchy_ is represented by a metadata document,
311 | which is a machine-readable document containing essential
312 | processing information about the node. For example, an array_
313 | metadata document specifies the number of dimensions_, shape_,
314 | `data type`_, grid_, and codec_ for that array_.
315 |
316 | .. _store:
317 | .. _stores:
318 |
319 | *Store*
320 |
321 | The `metadata documents`_ and encoded chunk_ data for all nodes in a
322 | hierarchy_ are held in a store as raw bytes. To enable a variety
323 | of different store types to be used, this specification defines an
324 | `Abstract store interface`_ which is a common set of operations that stores
325 | may provide. For example, a directory in a file system can be a Zarr store,
326 | where keys are file names, values are file contents, and files can be read,
327 | written, listed or deleted via the operating system. Equally, an S3 bucket
328 | can provide this interface, where keys are resource names, values are
329 | resource contents, and resources can be read, written or deleted via HTTP.
330 |
331 | .. _storage transformer:
332 | .. _storage transformers:
333 |
334 | *Storage transformer*
335 |
336 | To provide performance enhancements or other optimizations,
337 | storage transformers may intercept and alter the storage keys and bytes
338 | of an array_ before they reach the underlying physical storage.
339 | Upon retrieval, the original keys and bytes are restored within the
340 | transformer. Any number of storage transformers can be registered and
341 | stacked. In contrast to codecs, storage transformers can act on the
342 | complete array, rather than individual chunks. See the
343 | `storage transformers details`_ below.
344 |
345 | .. _`storage transformers details`: #storage-transformers-1
346 |
347 | The following figure illustrates the codec, store and storage transformer
348 | terminology for a use case of reading from an array:
349 |
350 | ..
351 | The following image was produced with https://excalidraw.com/
352 | and can be loaded there, as the source is embedded in the png.
353 | .. image:: terminology-read.excalidraw.png
354 | :width: 600
355 |
356 | *Extension point*
357 |
358 | A field in a `metadata document`_ that can be extended to allow values
359 | not defined in this specification.
360 | See :ref:`extension points ` below.
361 |
362 | *Extension*
363 |
364 | An implementation of an extension point which can be referenced
365 | by :ref:`name `.
366 | See the linked lists of extensions under :ref:`extension points ` below.
367 |
368 | *Core*
369 |
370 | Core refers to features or concepts defined within this specification. The
371 | designation of a feature as core does not imply that it is mandatory for
372 | all implementations.
373 |
374 | .. _stored-representation:
375 |
376 | Stored representation
377 | =====================
378 |
379 | A Zarr hierarchy_ is represented by the following set of key/value entries in an
380 | underlying store_:
381 |
382 | - The array_ or group_ metadata document for the root of a Zarr hierarchy_ is
383 | stored under the key ``zarr.json``.
384 |
385 | - The metadata document of a non-root array or group with hierarchy path ``P``
386 | is obtained by stripping the leading ``/`` of the path and appending
387 | ``/zarr.json``. For example, the metadata document of an array or group with
388 | path ``/foo/bar`` is ``foo/bar/zarr.json``.
389 |
390 | - All chunk or other data of an array is stored under the key prefix determined
391 | by its path. The key prefix is obtained from the metadata
392 | document key by stripping the trailing ``zarr.json``. For example, for a root
393 | array, the prefix is the empty string. For a non-root array with hierarchy
394 | path ``/foo/bar``, the prefix is ``foo/bar/``.
395 |
396 | .. list-table:: Metadata Storage Key example
397 | :header-rows: 1
398 |
399 | * - Type
400 | - Path "P"
401 | - Key for Metadata at path `P`
402 | * - Array (Root)
403 | - `/`
404 | - `zarr.json`
405 | * - Group (Root)
406 | - `/`
407 | - `zarr.json`
408 | * - Group
409 | - `/foo`
410 | - `foo/zarr.json`
411 | * - Array
412 | - `/foo`
413 | - `foo/zarr.json`
414 | * - Group
415 | - `/foo/bar`
416 | - `foo/bar/zarr.json`
417 | * - Array
418 | - `/foo/baz`
419 | - `foo/baz/zarr.json`
420 |
421 |
422 | .. list-table:: Data Storage Key example
423 | :header-rows: 1
424 |
425 | * - Path `P` of array
426 | - Chunk grid indices
427 | - Data key
428 | * - `/foo/baz`
429 | - `(1, 0)`
430 | - `foo/baz/c/1/0`
431 |
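For example, the key derivation rules and the two tables above can be expressed as the following non-normative sketch. The function names are hypothetical. ``chunk_key`` only reproduces the ``c/1/0`` layout shown in the data key table and assumes a ``/`` separator; the actual mapping is determined by the ``chunk_key_encoding`` of the array.

```python
def metadata_key(path: str) -> str:
    """Store key of the metadata document for the node at hierarchy path `path`."""
    if not path.startswith("/"):
        raise ValueError("a hierarchy path always starts with '/'")
    if path == "/":
        return "zarr.json"  # root array or group
    # Strip the leading "/" and append "/zarr.json".
    return path[1:] + "/zarr.json"


def data_prefix(path: str) -> str:
    """Key prefix under which chunk and other data of an array is stored."""
    # Strip the trailing "zarr.json" from the metadata document key.
    return metadata_key(path)[: -len("zarr.json")]


def chunk_key(path: str, grid_indices) -> str:
    """Data key in the layout of the table above (assumed "c/..." encoding)."""
    return data_prefix(path) + "/".join(["c", *map(str, grid_indices)])


print(metadata_key("/foo/bar"))        # foo/bar/zarr.json
print(repr(data_prefix("/")))          # ''
print(chunk_key("/foo/baz", (1, 0)))   # foo/baz/c/1/0
```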
432 | .. note::
433 |
434 | When storing a Zarr hierarchy in a filesystem-like store (e.g. the local
435 | filesystem or S3) as a sub-directory, it is recommended that the
436 | sub-directory name ends with ``.zarr`` to indicate the start of a hierarchy
437 | to users.
438 |
439 | .. _metadata:
440 |
441 | Metadata
442 | ========
443 |
444 | This section defines the structure of metadata documents for Zarr hierarchies,
445 | which consists of two types of metadata documents: array metadata documents, and
446 | group metadata documents. Both types of metadata documents are stored under the
447 | key ``zarr.json`` within the prefix of the array or group. Each type of
448 | metadata document is described in the following subsections.
449 |
450 | Metadata documents are defined here using the JSON
451 | type system defined in [RFC8259]_. In this section, the terms "value",
452 | "number", "string" and "object" are used to denote the types as
453 | defined in [RFC8259]_. The term "array" is also used as defined in
454 | [RFC8259]_, except where qualified as "Zarr array". Following
455 | [RFC8259]_, this section also describes an object as a set of
456 | name/value pairs. This section also defines how metadata documents are
457 | encoded for storage.
458 |
459 | .. _array-metadata:
460 |
461 | Array metadata
462 | --------------
463 |
464 | Each Zarr array in a hierarchy must have an array metadata document, named
465 | ``zarr.json``.
466 |
467 | Mandatory
468 | ^^^^^^^^^
469 |
470 | This document must contain a single object with the following
471 | mandatory names:
472 |
473 | .. _array-metadata-zarr-format:
474 |
475 | ``zarr_format``
476 | """"""""""""""""
477 |
478 | An integer defining the version of the storage specification to which the
479 | array store adheres, must be ``3`` here.
480 |
481 | .. _array-metadata-node-type:
482 |
483 | ``node_type``
484 | """""""""""""""
485 |
486 | A string defining the type of hierarchy node element, must be ``array``
487 | here.
488 |
489 | .. _array-metadata-shape:
490 |
491 | ``shape``
492 | """""""""
493 |
494 | An array of integers providing the length of each dimension of the
495 | Zarr array. For example, a value ``[10, 20]`` indicates a
496 | two-dimensional Zarr array, where the first dimension has length
497 | 10 and the second dimension has length 20.
498 |
499 | .. _array-metadata-data-type:
500 |
501 | ``data_type``
502 | """""""""""""
503 |
504 | The data type of the Zarr array.
505 |
506 | ``data_type`` is an :ref:`extension point`
507 | and MUST conform to the :ref:`extension-definition`.
508 |
509 | If the data type is defined in :ref:`this specification `,
510 | then the value must be the data type
511 | identifier provided as a string. For example, ``"float64"`` for
512 | little-endian 64-bit floating point number.
513 |
514 | Because the ``fill_value`` metadata key is dependent on the data type,
515 | extension data types SHOULD specify permitted values for the ``fill_value`` in
516 | their specification.
517 |
518 | .. _array-metadata-chunk-grid:
519 |
520 | ``chunk_grid``
521 | """"""""""""""
522 |
523 | The chunk grid of the Zarr array.
524 |
525 | ``chunk_grid`` is an :ref:`extension point`
526 | and MUST conform to the :ref:`extension-definition`.
527 |
528 | If the chunk grid is a regular chunk grid
529 | as defined in this specification, then the value must be an object with the
530 | names ``name`` and ``configuration``. The value of ``name`` must be the
531 | string ``"regular"``, and the value of ``configuration`` an object with the
532 | member ``chunk_shape``. ``chunk_shape`` must be an array of
533 | integers providing the lengths of the chunk along each dimension of the
534 | array. For example,
535 | ``{"name": "regular", "configuration": {"chunk_shape": [2, 5]}}``
536 | means a regular grid where the chunks have length 2 along the first
537 | dimension and length 5 along the second dimension.
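
As an illustration of the ``chunk_shape`` example above, the following minimal sketch maps an element index to the grid cell that contains it. It assumes that the regular grid is aligned to the array origin (index 0 in every dimension), which is not restated in this section; the function name ``chunk_coords_for_element`` is hypothetical.

```python
def chunk_coords_for_element(element_index, chunk_shape):
    """Return (chunk grid coordinates, offset inside that chunk).

    Assumes a regular grid whose first chunk starts at index 0 in
    every dimension (an assumption of this sketch).
    """
    if len(element_index) != len(chunk_shape):
        raise ValueError("index and chunk_shape must have the same length")
    # Integer division selects the grid cell, the remainder the position in it.
    chunk = tuple(i // c for i, c in zip(element_index, chunk_shape))
    offset = tuple(i % c for i, c in zip(element_index, chunk_shape))
    return chunk, offset


# With chunk_shape [2, 5], element (7, 12) lies in grid cell (3, 2).
cell, offset = chunk_coords_for_element((7, 12), (2, 5))
```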
538 |
539 |
540 | .. _array-metadata-chunk-key-encoding:
541 |
542 | ``chunk_key_encoding``
543 | """"""""""""""""""""""
544 |
545 | The mapping from chunk grid cell coordinates to keys in the underlying
546 | store.
547 |
548 | ``chunk_key_encoding`` is an :ref:`extension point`
549 | and MUST conform to the :ref:`extension-definition`.
550 |
551 | .. _array-metadata-fill-value:
552 |
553 | ``fill_value``
554 | """"""""""""""
555 |
556 | Provides an element value to use for uninitialised portions of the
557 | Zarr array.
558 |
559 | The permitted values depend on the data type. Fill values for core
560 | data types are listed in :ref:`fill-value-list`.
561 |
562 | Extension data types MUST also define the JSON fill value representation.
563 |
564 | .. note::
565 |
566 | The ``fill_value`` metadata field is required, but Zarr implementations
567 | may provide an interface for creating a new array with which users can
568 | leave the fill value unspecified, in which case a default fill value for
569 | the data type will be chosen. However, the default fill value that is
570 | chosen MUST be recorded in the metadata.
571 |
572 | .. _array-metadata-codecs:
573 |
574 | ``codecs``
575 | """"""""""
576 |
577 | Specifies a list of codecs to be used for encoding and decoding chunks.
578 |
579 | Each codec is an :ref:`extension point`
580 | and MUST conform to the :ref:`extension-definition`.
581 |
582 | Because ``codecs`` MUST contain an ``array -> bytes`` codec, the list
583 | cannot be empty (see :ref:`codecs `).
584 |
585 | Optional
586 | ^^^^^^^^
587 |
588 | The following members are optional:
589 |
590 | .. _array-metadata-attributes:
591 |
592 | ``attributes``
593 | """"""""""""""
594 |
595 | The value must be an object. The object may contain any key/value
596 | pairs, where the key must be a string and the value can be an arbitrary
597 | JSON value. It is intended to allow storage of arbitrary user metadata.
598 |
599 |
600 | .. note::
601 | An extension to store user attributes in a separate document is being
602 | discussed in https://github.com/zarr-developers/zarr-specs/issues/72.
603 |
604 | .. note::
605 | A proposal to specify metadata conventions (ZEP 4) is being discussed in
606 | https://github.com/zarr-developers/zeps/pull/28.
607 |
608 | .. _array-metadata-storage-transformers:
609 |
610 | ``storage_transformers``
611 | """"""""""""""""""""""""
612 |
613 | Specifies a list of `storage transformers`_.
614 |
615 | Each storage transformer is an :ref:`extension point`
616 | and MUST conform to the :ref:`extension-definition`.
617 |
618 | If the ``storage_transformers`` name is absent, or its value is an
619 | empty list, no storage transformer is used.
620 |
621 | .. _array-metadata-dimension-names:
622 |
623 | ``dimension_names``
624 | """""""""""""""""""
625 |
626 | Specifies dimension names, e.g. ``["x", "y", "z"]``. If specified, must be
627 | an array of strings or null objects with the same length as ``shape``. An
628 | unnamed dimension is indicated by the null object. If ``dimension_names`` is
629 | not specified, all dimensions are unnamed.
630 |
631 | For compatibility with Zarr implementations and applications that support
632 | using dimension names to uniquely identify dimensions, it is recommended but
633 | not required that all non-null dimension names are distinct (no two
634 | dimensions have the same non-null name).
635 |
636 | This specification also does not place any restrictions on the use of the
637 | same dimension name across multiple arrays within the same Zarr hierarchy,
638 | but extensions or specific applications may do so.
639 |
640 | .. _array-metadata-extensions:
641 |
642 | Unknown
643 | ^^^^^^^
644 |
645 | All other keys found in the metadata object MUST be interpreted
646 | following the :ref:`Extensions section `.
647 |
648 | Example
649 | ^^^^^^^
650 |
651 | For example, the array metadata JSON document below defines a
652 | two-dimensional array of 64-bit little-endian floating point numbers,
653 | with 10000 rows and 1000 columns, divided into a regular chunk grid where
654 | each chunk has 1000 rows and 100 columns, and thus there will be 100
655 | chunks in total arranged into a 10 by 10 grid. Within each chunk the
656 | binary values are laid out in C contiguous order::
657 |
658 | {
659 | "zarr_format": 3,
660 | "node_type": "array",
661 | "shape": [10000, 1000],
662 | "dimension_names": ["rows", "columns"],
663 | "data_type": "float64",
664 | "chunk_grid": {
665 | "name": "regular",
666 | "configuration": {
667 | "chunk_shape": [1000, 100]
668 | }
669 | },
670 | "chunk_key_encoding": {
671 | "name": "default",
672 | "configuration": {
673 | "separator": "/"
674 | }
675 | },
676 | "codecs": [{
677 | "name": "bytes",
678 | "configuration": {
679 | "endian": "little"
680 | }
681 | }],
682 | "fill_value": "NaN",
683 | "attributes": {
684 | "foo": 42,
685 | "bar": "apples",
686 | "baz": [1, 2, 3, 4]
687 | }
688 | }
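
The chunk count quoted for this example can be checked with a short calculation. The sketch below is illustrative only: the function name ``chunk_grid_shape`` is hypothetical, and rounding up for a partially filled edge chunk is an assumption about regular grids that this section does not spell out.

```python
import math


def chunk_grid_shape(shape, chunk_shape):
    """Number of chunks along each dimension of a regular chunk grid."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same length")
    # -(-s // c) is integer ceiling division; it rounds up when the last
    # chunk along a dimension is only partially covered by the array.
    return tuple(-(-s // c) for s, c in zip(shape, chunk_shape))


grid = chunk_grid_shape([10000, 1000], [1000, 100])
total_chunks = math.prod(grid)
```

For the metadata above, ``grid`` is ``(10, 10)`` and ``total_chunks`` is ``100``.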
689 |
690 | The following example illustrates an array with the same shape and chunking as
691 | above, but using a (currently made up) extension data type::
692 |
693 | {
694 | "zarr_format": 3,
695 | "node_type": "array",
696 | "shape": [10000, 1000],
697 | "data_type": {
698 | "name": "urn:example:datetime",
699 | "configuration": {
700 | "unit": "ns"
701 | }
702 | },
703 | "chunk_grid": {
704 | "name": "regular",
705 | "configuration": {
706 | "chunk_shape": [1000, 100]
707 | }
708 | },
709 | "chunk_key_encoding": {
710 | "name": "default",
711 | "configuration": {
712 | "separator": "/"
713 | }
714 | },
715 | "codecs": [{
716 | "name": "bytes",
717 | "configuration": {
718 | "endian": "big"
719 | }
720 | }],
721 | "fill_value": null
722 | }
723 |
724 | .. note::
725 |
726 | Comparison with Zarr spec v2:
727 |
728 | - ``dtype`` has been renamed to ``data_type``,
729 | - ``chunks`` has been replaced with ``chunk_grid``,
730 | - ``dimension_separator`` has been replaced with ``chunk_key_encoding``,
731 | - ``order`` has been replaced by the :ref:`transpose ` codec,
732 | - the separate ``filters`` and ``compressor`` fields have been combined into the single ``codecs`` field.
733 |
734 | .. _group-metadata:
735 |
736 | Group metadata
737 | --------------
738 |
739 | Mandatory
740 | ^^^^^^^^^
741 |
742 | A Zarr group metadata object must contain the following mandatory keys:
743 |
744 | ``zarr_format``
745 | """""""""""""""
746 |
747 | An integer defining the version of the storage specification to which the
748 | array store adheres; must be ``3`` here.
749 |
750 | ``node_type``
751 | """""""""""""""
752 |
753 | A string defining the type of hierarchy node element; must be ``group``
754 | here.
755 |
756 | Optional
757 | ^^^^^^^^
758 |
759 | Optional keys:
760 |
761 | ``attributes``
762 | """"""""""""""
763 |
764 | The value must be an object. The object may contain any key/value
765 | pairs, where the key must be a string and the value can be an arbitrary
766 | JSON value. It is intended to allow storage of arbitrary user metadata.
767 |
768 | .. _group-metadata-extensions:
769 |
770 | Unknown
771 | ^^^^^^^
772 |
773 | All other keys found in the metadata object MUST be interpreted
774 | following the :ref:`Extensions section `.
775 |
776 | Example
777 | ^^^^^^^
778 |
779 | For example, the JSON document below defines a group::
780 |
781 | {
782 | "zarr_format": 3,
783 | "node_type": "group",
784 | "attributes": {
785 | "spam": "ham",
786 | "eggs": 42
787 | }
788 | }
789 |
790 | Node names
791 | ==========
792 |
793 | The root node does not have a name; it is identified by the empty string ``""``.
794 | Except for the root node, each node in a hierarchy must have a name,
795 | which is a string of unicode code points. The following constraints
796 | apply to node names:
797 |
798 | * must not be the empty string (``""``)
799 | * must not include the character ``"/"``
800 | * must not be a string composed only of period characters, e.g. ``"."`` or ``".."``
801 | * must not start with the reserved prefix ``"__"``
802 |
803 | To ensure consistent behaviour across different storage systems and programming
804 | languages, we recommend that users only use characters in the sets ``a-z``,
805 | ``A-Z``, ``0-9``, ``-``, ``_``, ``.``.
806 |
807 | Node names are case sensitive, e.g., the names "foo" and "FOO" are **not**
808 | identical.
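
The constraints and the recommended character set above can be expressed as two small checks. This is a sketch rather than a reference implementation; the function names are hypothetical, and restrictions imposed by a particular store are not covered.

```python
import re

# Recommended characters: a-z, A-Z, 0-9, "-", "_" and ".".
_RECOMMENDED = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_node_name(name: str) -> bool:
    """Check the four mandatory constraints on non-root node names."""
    if name == "":
        return False              # must not be the empty string
    if "/" in name:
        return False              # must not include "/"
    if set(name) == {"."}:
        return False              # must not consist only of periods
    if name.startswith("__"):
        return False              # "__" is a reserved prefix
    return True


def uses_recommended_characters(name: str) -> bool:
    """Check the (non-mandatory) recommended character set."""
    return _RECOMMENDED.fullmatch(name) is not None
```

A name such as ``"data.v1"`` passes both checks, whereas ``".."`` and ``"__zarr"`` fail the mandatory constraints.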
809 |
810 | When using non-ASCII Unicode characters, we recommend that users use
811 | case-folded NFKC-normalized strings following the
812 | `General Security Profile for Identifiers of the Unicode Security Mechanisms (Unicode Technical Standard #39) `_.
813 | This follows the
814 | `Recommendations for Programmers (B) of the Unicode Security Considerations (Unicode Technical Report #36) `_.
815 |
816 | .. note::
817 | A storage transformer for unicode normalization might be added later, see
818 | https://github.com/zarr-developers/zarr-specs/issues/201.
819 |
820 | .. note::
821 | The underlying store might impose additional restrictions on node names,
822 | such as the following:
823 |
824 | * `260 characters path length limit in Windows `_
825 | * 1,024-byte UTF-8 object key limit for
826 | `AWS S3 `_
827 | and `GCS `_, with
828 | additional constraints.
829 | * `Windows paths are case-insensitive by default `_
830 | * `MacOS paths are case-insensitive by default `_
831 |
832 | .. note::
833 | If a store requires an explicit byte string representation the default
834 | representation is the ``UTF-8`` encoded Unicode string.
835 |
836 | .. note::
837 | The prefix ``__zarr`` is reserved for core Zarr data, and extensions
838 | can use other files and folders starting with ``__``.
839 |
840 |
841 | Data types
842 | ==========
843 |
844 | A data type describes the set of possible binary values that an array
845 | element may take, along with some information about how the values
846 | should be interpreted.
847 |
848 | This specification defines a limited set of data types to
849 | represent boolean values, integers, and floating point
850 | numbers. These can be found under :ref:`Data Types`.
851 |
852 | All of the data types defined here have a fixed size, in the sense that all values
853 | require the same number of bytes.
854 |
855 | Additional data types may be defined as :ref:`extensions`;
856 | such data types MAY be variable-sized.
857 |
858 | Note that the Zarr specification is intended to enable communication
859 | of data between a variety of computing environments. The native byte
860 | order may differ between machines used to write and read the data.
861 |
862 | Each data type is associated with an identifier, which can be used in
863 | metadata documents to refer to the data type. For the data types
864 | defined in this specification, the identifier is a simple ASCII
865 | string. However, extensions may use any JSON value to identify a data
866 | type.
867 |
868 | In addition to these base types, an implementation should also handle the
869 | raw/opaque pass-through type designated by the lower-case letter ``r`` followed
870 | by the number of bits, which must be a multiple of 8. For example, ``r8``, ``r16``, and ``r24``
871 | should be understood as fall-back types of 1, 2, and 3 bytes in length, respectively.
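
A minimal sketch of how an implementation might recognise these identifiers follows. The function name is hypothetical, and rejecting leading zeros (for example ``r08``) is an assumption of the sketch; the text above does not address that case.

```python
import re


def raw_type_size_in_bytes(identifier: str) -> int:
    """Return the element size in bytes for an ``r<bits>`` identifier."""
    # Assumption: no leading zeros in the bit count.
    match = re.fullmatch(r"r([1-9][0-9]*)", identifier)
    if match is None:
        raise ValueError(f"not a raw data type identifier: {identifier!r}")
    bits = int(match.group(1))
    if bits % 8 != 0:
        raise ValueError("the number of bits must be a multiple of 8")
    return bits // 8


sizes = [raw_type_size_in_bytes(name) for name in ("r8", "r16", "r24")]
```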
872 |
873 | Zarr v3 is limited to type sizes that are a multiple of 8 bits but may support
874 | other type sizes in later versions of this specification.
875 |
876 | .. note::
877 |
878 | We are explicitly looking for more feedback and prototypes of code using the ``r*``,
879 | raw bits, for various endianness and whether the spec could be made clearer.
880 |
881 | .. note::
882 |
883 | Currently only fixed size elements are supported as a core data type.
884 | There are many requests for variable length element encoding. There are many
885 | ways to encode variable length and we want to keep flexibility. While we seem
886 | to agree that for random access the most likely contender is to have two
887 | arrays, one with the actual variable length data and one with fixed size
888 | (pointer + length) to the variable size data, we do not want to commit to such
889 | a structure.
890 | See https://github.com/zarr-developers/zarr-specs/issues/62.
891 |
892 |
893 | Chunk grids
894 | ===========
895 |
896 | A chunk grid defines a set of chunks which contain the elements of an
897 | array. The chunks of a grid form a tessellation of the array space,
898 | which is a space defined by the dimensionality and shape of the
899 | array. This means that every element of the array is a member of one
900 | chunk, and there are no gaps or overlaps between chunks.
901 |
902 | In general there are different possible types of grids. Those defined
903 | under the core specification can be found under :ref:`chunk-grid-list`.
904 | Additional grid types MAY be defined as :ref:`extensions`,
905 | such as rectilinear grids where chunks are still
906 | hyperrectangles but do not all share the same shape.
907 |
908 | A grid type must also define rules for constructing an identifier for
909 | each chunk that is unique within the grid, which is a string of ASCII
910 | characters that can be used to construct keys to save and retrieve
911 | chunk data in a store, see also the `Storage`_ section.
912 |
913 | Chunk encoding
914 | ==============
915 |
916 | Chunks are encoded into a binary representation for storage in a store_, using
917 | the chain of codecs_ specified by the ``codecs`` metadata field.
918 |
919 | Codecs
920 | ------
921 |
922 | An array_ has an associated list of *codecs*. Each codec specifies a
923 | bidirectional transform (an *encode* transform and a *decode* transform).
924 |
925 | Each codec has an *encoded representation* and a *decoded representation*;
926 | each of these two representations is defined to be either:
927 |
928 | - a multi-dimensional array of some shape and data type, or
929 | - a byte string.
930 |
931 | Based on the input and output representations for the encode transform,
932 | codecs can be classified as one of three kinds:
933 |
934 | - ``array -> array``
935 | - ``array -> bytes``
936 | - ``bytes -> bytes``
937 |
938 | .. note::
939 |
940 | ``bytes -> array`` codecs, where after encoding an array as a byte
941 | string, it is subsequently transformed back into an array, to then later
942 | be transformed back into a byte string, are not currently allowed, due to
943 | the lack of a clear use case.
944 |
945 | If multiple codecs are specified for an array, each codec is applied
946 | sequentially; when encoding, the encoded output of codec ``i`` serves as the
947 | decoded input of codec ``i+1``, and similarly when decoding, the decoded output
948 | of codec ``i+1`` serves as the encoded input to codec ``i``. Since ``bytes ->
949 | array`` codecs are not supported, it follows that the list of codecs must be of
950 | the following form:
951 |
952 | - zero or more ``array -> array`` codecs; followed by
953 | - exactly one ``array -> bytes`` codec; followed by
954 | - zero or more ``bytes -> bytes`` codecs.
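
The required ordering can be checked mechanically. In the sketch below each codec is reduced to a string naming its kind; how an implementation learns the kind of a given codec is outside the scope of the sketch, and the constant and function names are hypothetical.

```python
ARRAY_TO_ARRAY = "array -> array"
ARRAY_TO_BYTES = "array -> bytes"
BYTES_TO_BYTES = "bytes -> bytes"
KINDS = {ARRAY_TO_ARRAY, ARRAY_TO_BYTES, BYTES_TO_BYTES}


def is_valid_codec_chain(kinds) -> bool:
    """Check that a list of codec kinds has the permitted form."""
    if any(kind not in KINDS for kind in kinds):
        return False
    # Exactly one array -> bytes codec; this also rejects an empty list.
    if kinds.count(ARRAY_TO_BYTES) != 1:
        return False
    pivot = kinds.index(ARRAY_TO_BYTES)
    before_ok = all(kind == ARRAY_TO_ARRAY for kind in kinds[:pivot])
    after_ok = all(kind == BYTES_TO_BYTES for kind in kinds[pivot + 1:])
    return before_ok and after_ok
```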
955 |
956 | Logically, a codec ``c`` must define three properties:
957 |
958 | - ``c.compute_encoded_representation_type(decoded_representation_type)``, a
959 | procedure that determines the encoded representation based on the decoded
960 | representation and any codec parameters. In the case of a decoded
961 | representation that is a multi-dimensional array, the shape and data type
962 | of the encoded representation must be computable based only on the shape
963 | and data type, but not the actual element values, of the decoded
964 | representation. If the ``decoded_representation_type`` is not supported,
965 | this procedure must fail with an error.
966 |
967 | - ``c.encode(decoded_value)``, a procedure that computes the encoded
968 | representation, and is used when writing an array.
969 |
970 | - ``c.decode(encoded_value, decoded_representation_type)``, a procedure that
971 | computes the decoded representation, and is used when reading an array.
972 |
973 | Implementations MAY support partial decoding for certain codecs
974 | (e.g. sharding, blosc). Logically, partial decoding may be defined in terms
975 | of an additional operation:
976 |
977 | - ``c.partial_decode(input_handle, decoded_representation_type,
978 | decoded_regions)``, where:
979 |
980 | - ``input_handle`` provides an interface for requesting partial reads of
981 | the encoded representation and itself supports the same
982 | ``partial_decode`` interface;
983 | - ``decoded_representation_type`` is the same as for ``c.decode``;
984 | - ``decoded_regions`` specifies the regions of the decoded representation
985 | that must be returned.
986 |
987 | If the decoded representation is a multi-dimensional array, then
988 | ``decoded_regions`` specifies a subset of the array's domain. If the
989 | decoded representation is a byte string, then ``decoded_regions``
990 | specifies a list of byte ranges.
991 |
992 | - ``c.compute_encoded_size(input_size)``, a procedure that determines the
993 | size of the encoded representation given a size of the decoded representation.
994 | This procedure cannot be implemented for codecs that produce variable-sized
995 | encoded representations, such as compression algorithms. Depending on the
996 | type of the codec, the signature could differ:
997 |
998 | - ``c.compute_encoded_size(array_size, data_type) -> (array_size, data_type)``
999 | for ``array -> array`` codecs, where ``array_size`` is the number of items
1000 | in the array, i.e., the product of the components of the array's shape;
1001 | - ``c.compute_encoded_size(array_size, data_type) -> byte_size``
1002 | for ``array -> bytes`` codecs;
1003 | - ``c.compute_encoded_size(byte_size) -> byte_size``
1004 | for ``bytes -> bytes`` codecs.
1005 |
1006 | .. note::
1007 |
1008 | If ``partial_decode`` is not supported by a particular codec, it can
1009 | always be implemented in terms of ``decode`` by simply decoding in full
1010 | and then satisfying any ``decoded_regions`` requests directly from the
1011 | cached decoded representation.
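
The fallback described in this note can be sketched for a ``bytes -> bytes`` codec as follows. The signature is deliberately simplified compared with the logical ``partial_decode`` operation above: the function takes a plain ``decode`` callable and the complete encoded value rather than an ``input_handle``, and regions are given as ``(start, length)`` pairs. All of these choices are assumptions made for illustration.

```python
import gzip


def partial_decode_via_full_decode(decode, encoded_value, byte_ranges):
    """Serve byte-range requests by decoding the whole value once."""
    decoded = decode(encoded_value)  # full decode, kept for all requests
    return [decoded[start:start + length] for start, length in byte_ranges]


encoded = gzip.compress(b"hello zarr world")
parts = partial_decode_via_full_decode(
    gzip.decompress, encoded, [(0, 5), (6, 4)]
)
# parts == [b"hello", b"zarr"]
```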
1012 |
1013 | Determination of encoded representations
1014 | ----------------------------------------
1015 |
1016 | To encode or decode a chunk, the encoded and decoded representations for each
1017 | codec in the chain must first be determined as follows:
1018 |
1019 | 1. The initial decoded representation, ``decoded_representation[0]``, is a
1020 | multi-dimensional array with the same data type as the Zarr array, and shape
1021 | equal to the chunk shape.
1022 |
1023 | 2. For each codec ``i``, the encoded representation is equal to the decoded
1024 | representation ``decoded_representation[i+1]`` of the next codec, and is
1025 | computed from
1026 | ``codecs[i].compute_encoded_representation_type(decoded_representation[i])``.
1027 | If ``compute_encoded_representation_type`` fails because of an incompatible
1028 | decoded representation, an implementation should indicate an error.
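
The two steps above amount to folding ``compute_encoded_representation_type`` over the codec list. The sketch below uses two toy codecs, ``TransposeLike`` and ``BytesLike``, which are hypothetical stand-ins and not the codecs defined elsewhere in this specification; representation types are modelled as ``(kind, shape, data_type)`` tuples purely for illustration.

```python
ARRAY, BYTES = "array", "bytes"


class TransposeLike:
    """Toy array -> array codec that permutes dimensions."""

    def __init__(self, order):
        self.order = order

    def compute_encoded_representation_type(self, rep):
        kind, shape, data_type = rep
        if kind != ARRAY or len(shape) != len(self.order):
            raise TypeError("unsupported decoded representation")
        return (ARRAY, tuple(shape[i] for i in self.order), data_type)


class BytesLike:
    """Toy array -> bytes codec."""

    def compute_encoded_representation_type(self, rep):
        kind, _shape, _data_type = rep
        if kind != ARRAY:
            raise TypeError("unsupported decoded representation")
        return (BYTES, None, None)


def resolve_representations(codecs, chunk_shape, data_type):
    # Step 1: the initial decoded representation is the chunk array type.
    reps = [(ARRAY, tuple(chunk_shape), data_type)]
    # Step 2: each encoded representation is the next decoded representation.
    for codec in codecs:
        reps.append(codec.compute_encoded_representation_type(reps[-1]))
    return reps


reps = resolve_representations(
    [TransposeLike((1, 0)), BytesLike()], (1000, 100), "float64"
)
```

Swapping the two codecs makes ``resolve_representations`` raise ``TypeError``, which corresponds to the error case in step 2.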
1029 |
1030 | .. _encoding_procedure:
1031 |
1032 | Encoding procedure
1033 | ------------------
1034 |
1035 | Based on the computed ``decoded_representation`` list, a chunk is encoded using
1036 | the following procedure:
1037 |
1038 | 1. The initial *encoded chunk* ``EC[0]`` of the type specified by
1039 | ``decoded_representation[0]`` is equal to the chunk array ``A`` (with a shape
1040 | equal to the chunk shape, and data type equal to the Zarr array data type).
1041 |
1042 | 2. For each codec ``codecs[i]`` in ``codecs``, ``EC[i+1] :=
1043 | codecs[i].encode(EC[i])``.
1044 |
1045 | 3. The final encoded chunk representation ``EC_final := EC[codecs.length]``.
1046 | This is always a byte string due to the requirement that the list of codecs
1047 | include an ``array -> bytes`` codec.
1048 |
1049 | 4. ``EC_final`` is written to the store_.
1050 |
1051 | .. _decoding_procedure:
1052 |
1053 | Decoding procedure
1054 | ------------------
1055 |
1056 | Based on the computed ``decoded_representation`` list, a chunk is decoded using
1057 | the following procedure:
1058 |
1059 | 1. The encoded chunk representation ``EC_final`` is read from the store_.
1060 |
1061 | 2. ``EC[codecs.length] := EC_final``.
1062 |
1063 | 3. For each codec ``codecs[i]`` in ``codecs``, iterating in reverse order,
1064 | ``EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])``.
1065 |
1066 | 4. The chunk array ``A`` is equal to ``EC[0]``.
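
The encoding and decoding procedures can be sketched together as a round trip. The codecs ``PackInt32`` and ``Gzip`` below are toy stand-ins written for this example, the chunk is modelled as a flat list of integers, a plain ``dict`` plays the role of the store, and the key ``"c/0"`` is arbitrary; none of these are prescribed by this specification.

```python
import gzip
import struct


class PackInt32:
    """Toy array -> bytes codec: little-endian 32-bit integers."""

    def encode(self, decoded_value):
        return struct.pack(f"<{len(decoded_value)}i", *decoded_value)

    def decode(self, encoded_value, decoded_representation):
        count = decoded_representation["count"]
        return list(struct.unpack(f"<{count}i", encoded_value))


class Gzip:
    """Toy bytes -> bytes codec."""

    def encode(self, decoded_value):
        return gzip.compress(decoded_value)

    def decode(self, encoded_value, decoded_representation):
        return gzip.decompress(encoded_value)


def encode_chunk(codecs, chunk):
    ec = chunk                    # EC[0] is the chunk array A
    for codec in codecs:          # EC[i+1] := codecs[i].encode(EC[i])
        ec = codec.encode(ec)
    return ec                     # EC_final, a byte string


def decode_chunk(codecs, ec_final, decoded_representation):
    ec = ec_final                 # EC[codecs.length] := EC_final
    for i in reversed(range(len(codecs))):
        ec = codecs[i].decode(ec, decoded_representation[i])
    return ec                     # EC[0], the chunk array A


codecs = [PackInt32(), Gzip()]
decoded_representation = [
    {"kind": "array", "count": 4},  # input of PackInt32
    {"kind": "bytes"},              # input of Gzip
    {"kind": "bytes"},              # final encoded form
]
store = {}
store["c/0"] = encode_chunk(codecs, [1, 2, 3, 4])
roundtrip = decode_chunk(codecs, store["c/0"], decoded_representation)
```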
1067 |
1068 | .. _codec-specification:
1069 |
1070 | Core codecs
1071 | -----------
1072 |
1073 | This specification defines a set of codecs ("core codecs") which all Zarr implementations SHOULD implement in
1074 | order to ensure a minimal level of interoperability between Zarr implementations.
1075 | The list of core codecs is part of the Zarr v3 specification.
1076 | Changes to the list of core codecs MUST be made via the same protocol used for
1077 | changing the Zarr v3 specification. Changes to the list of core codecs SHOULD be made
1078 | in close collaboration with extant Zarr v3 implementations. A new core codec SHOULD be added to the
1079 | list when a sufficient number of Zarr implementations support or intend to support that codec.
1080 | An existing core codec SHOULD be removed from the list when a sufficient number of implementation
1081 | developers and Zarr users deem the codec worth removing, e.g. because of a technical flaw in the
1082 | algorithm underlying the codec.
1083 |
1084 | Extension codecs
1085 | ----------------
1086 |
1087 | To allow for flexibility to define and implement new codecs, the
1088 | list of codecs defined for an array MAY contain codecs which are
1089 | defined in separate specifications. In order to refer to codecs in array metadata
1090 | documents, each codec must have a conformant identifier as specified under
1091 | "`extension naming `_" below.
1092 | For ease of discovery, it is
1093 | recommended that codec specifications are contributed to the
1094 | registry of extensions
1095 | (`zarr-extensions`_).
1096 |
1097 | A codec specification MUST declare the codec identifier, and describe
1098 | (or cite documents that describe) the encoding and decoding algorithms
1099 | and the format of the encoded data.
1100 | A codec MAY have configuration parameters which modify the behaviour
1101 | of the codec in some way. For example, a compression codec may have a
1102 | compression level parameter, which is an integer that affects the
1103 | resulting compression ratio of the data. Configuration parameters must
1104 | be declared in the codec specification, including a definition of how
1105 | configuration parameters are represented as JSON.
1106 |
1107 | Further details of how codecs are configured for an array are given in the
1108 | `Array metadata`_ section.
1109 |
1110 | Stores
1111 | ======
1112 |
1113 | A Zarr store is a system that can be used to store and retrieve data
1114 | from a Zarr hierarchy. For a store to be compatible with this
1115 | specification, it must support a set of operations defined in the `Abstract store
1116 | interface`_ subsection. The store interface can be implemented using a
1117 | variety of underlying storage technologies, described in the
1118 | subsection on `Store implementations`_.
1119 |
1120 | Additionally, a store should specify a canonical URI format that can be used to
1121 | identify nodes in this store. Implementations should use the specified formats
1122 | when opening a Zarr hierarchy to automatically determine the appropriate store.
1123 |
1124 | .. _abstract-store-interface:
1125 |
1126 | Abstract store interface
1127 | ------------------------
1128 |
1129 | The store interface is intended to be simple to implement using a
1130 | variety of different underlying storage technologies. It is defined in
1131 | a general way here, but it should be straightforward to translate into
1132 | a software interface in any given programming language. The goal is
1133 | that an implementation of this specification could be modular and
1134 | allow for different store implementations to be used.
1135 |
1136 | The store interface defines a set of operations involving `keys` and
1137 | `values`. In the context of this interface, a `key` is a Unicode
1138 | string, where the final character is **not** a ``/`` character.
1139 | In general, a `value` is a sequence of bytes. Specific stores
1140 | may choose more specific storage formats, which must be stated in the
1141 | specification of the respective store. E.g. a database store might
1142 | encode values of ``*.json`` keys with a database-native JSON type.
1143 |
1144 | It is assumed that the store holds (`key`, `value`) pairs, with only
1145 | one such pair for any given `key`. I.e., a store is a mapping from
1146 | keys to values. It is also assumed that keys are case sensitive, i.e.,
1147 | the keys "foo" and "FOO" are different.
1148 |
1149 | To read and write partial values, a `range` specifies two integers
1150 | `range_start` and `range_length`, that specify a part of the value
1151 | starting at byte `range_start` (inclusive) and having a length of
1152 | `range_length` bytes. `range_length` may be none, indicating all
1153 | available data until the end of the referenced value. For example
1154 | `range` ``[0, none]`` specifies the full value. Stores that do not
1155 | support partial access can still fulfill partial requests by first extracting
1156 | the full value and then returning a subset of bytes.
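
The meaning of a `range` can be sketched as a slice over the full value, which is also the fallback available to stores without native partial access. The function name ``apply_range`` is hypothetical, and Python's ``None`` stands in for "none".

```python
def apply_range(value: bytes, range_start: int, range_length=None) -> bytes:
    """Return the part of ``value`` selected by a range."""
    if range_length is None:
        # No length given: everything from range_start to the end.
        return value[range_start:]
    return value[range_start:range_start + range_length]


full = apply_range(b"abcdefgh", 0, None)   # the full value
middle = apply_range(b"abcdefgh", 2, 3)    # three bytes starting at byte 2
```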
1157 |
1158 | The store interface also defines some operations involving
1159 | `prefixes`. In the context of this interface, a prefix is a string
1160 | containing only characters that are valid for use in `keys` and ending
1161 | with a trailing ``/`` character.
1162 |
1163 | The store operations are grouped into three sets of capabilities:
1164 | **readable**, **writeable** and **listable**. It is not necessary for
1165 | a store implementation to support all of these capabilities.
1166 |
1167 | A **readable store** supports the following operations:
1168 |
1169 |
1170 | ``get`` - Retrieve the `value` associated with a given `key`.
1171 |
1172 | | Parameters: `key`
1173 | | Output: `value`
1174 |
1175 | ``get_partial_values`` - Retrieve possibly partial `values` from given `key_ranges`.
1176 |
1177 | | Parameters: `key_ranges`: ordered set of `key`, `range` pairs,
1178 | | a `key` may occur multiple times with different `ranges`
1179 | | Output: list of `values`, in the order of the `key_ranges`,
1180 | | may contain null/none for missing keys
1181 |
1182 | A **writeable store** supports the following operations:
1183 |
1184 | ``set`` - Store a (`key`, `value`) pair.
1185 |
1186 | | Parameters: `key`, `value`
1187 | | Output: none
1188 |
1189 | ``set_partial_values`` - Store `values` at a given `key`, starting at byte `range_start`.
1190 |
1191 | | Parameters: `key_start_values`: set of `key`,
1192 | | `range_start`, `values` triples, a `key` may occur multiple
1193 | | times with different `range_starts`, `range_starts` (considering
1194 | | the length of the respective `values`) must not specify overlapping
1195 | | ranges for the same `key`
1196 | | Output: none
1197 |
1198 | ``erase`` - Erase the given key/value pair from the store.
1199 |
1200 | | Parameters: `key`
1201 | | Output: none
1202 |
1203 | ``erase_values`` - Erase the given key/value pairs from the store.
1204 |
1205 | | Parameters: `keys`: set of `keys`
1206 | | Output: none
1207 |
1208 | ``erase_prefix`` - Erase all keys with the given prefix from the store.
1209 |
1210 | | Parameters: `prefix`
1211 | | Output: none
1212 |
1213 | .. note::
1214 |
1215 | Some stores allow creating and updating keys, but not deleting them. For
1216 | example, Zip archives do not allow removal of content without recreating the
1217 | full archive.
1218 |
1219 | Inability to delete can impair the ability to rename keys, as a rename
1220 | is often a sequence or atomic combination of a deletion and a creation.
1221 |
1222 | A **listable store** supports any one or more of the following
1223 | operations:
1224 |
1225 | ``list`` - Retrieve all `keys` in the store.
1226 |
1227 | | Parameters: none
1228 | | Output: set of `keys`
1229 |
1230 | ``list_prefix`` - Retrieve all keys with a given prefix.
1231 |
1232 | | Parameters: `prefix`
1233 | | Output: set of `keys` with the given `prefix`
1234 |
1235 | For example, if a store contains the keys "a/b", "a/c/d" and
1236 | "e/f/g", then ``list_prefix("a/")`` would return "a/b" and "a/c/d".
1237 |
1238 | Note: the behaviour of ``list_prefix`` is undefined if ``prefix`` does not end
1239 | with a trailing slash ``/`` and the store can assume there is at least one key
1240 | that starts with ``prefix``.
1241 |
1242 | ``list_dir`` - Retrieve all keys and prefixes with a given prefix and
1243 | which do not contain the character "/" after the given prefix.
1244 |
1245 | | Parameters: `prefix`
1246 | | Output: set of `keys` and set of `prefixes`
1247 |
1248 | For example, if a store contains the keys "a/b", "a/c", "a/d/e",
1249 | "a/f/g", then ``list_dir("a/")`` would return keys "a/b" and "a/c"
1250 | and prefixes "a/d/" and "a/f/". ``list_dir("b/")`` would return
1251 | the empty set.
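
Both listing examples above can be reproduced with a few lines over an in-memory set of keys. This is a sketch of the semantics only; the free functions below are hypothetical, and a real store would normally answer such queries without scanning every key.

```python
def list_prefix(keys, prefix):
    """All keys that start with ``prefix``."""
    return {key for key in keys if key.startswith(prefix)}


def list_dir(keys, prefix):
    """Keys and prefixes directly below ``prefix``."""
    child_keys, child_prefixes = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # A further "/" follows, so only the next-level prefix is reported.
            child_prefixes.add(prefix + head + "/")
        else:
            child_keys.add(key)
    return child_keys, child_prefixes


keys = {"a/b", "a/c", "a/d/e", "a/f/g"}
dir_keys, dir_prefixes = list_dir(keys, "a/")
```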
1252 |
1253 |
1254 | Note that because keys are case sensitive, it is assumed that the
1255 | operations ``set("foo", a)`` and ``set("FOO", b)`` will result in two
1256 | separate (key, value) pairs being stored. Subsequently ``get("foo")``
1257 | will return *a* and ``get("FOO")`` will return *b*.
1258 |
1259 | It is recommended that the implementation of the
1260 | ``get_partial_values``, ``set_partial_values`` and
1261 | ``erase_values`` methods is made optional, providing fallbacks
1262 | for them by default. However, it is recommended to supply those operations
1263 | where possible for efficiency. Also, the ``get``, ``set`` and ``erase``
1264 | operations can easily be mapped onto their `partial_values` counterparts.
1265 | Therefore, it is also recommended to supply fallbacks for those if the
1266 | `partial_values` operations can be implemented.
1267 | An entity containing those fallbacks could be named ``StoreWithPartialAccess``.
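
One possible shape for such fallbacks is sketched below, building ``get_partial_values`` and ``set_partial_values`` on top of plain ``get`` and ``set``. ``DictStore`` is a hypothetical in-memory store used only for the example, ranges are modelled as ``(range_start, range_length)`` tuples, and padding with zero bytes when writing past the current end of a value is an assumption that this specification does not make.

```python
class DictStore:
    """Hypothetical in-memory store offering only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None for a missing key

    def set(self, key, value):
        self._data[key] = bytes(value)


class StoreWithPartialAccess:
    """Fallback partial operations expressed through get and set."""

    def __init__(self, store):
        self.store = store

    def get_partial_values(self, key_ranges):
        results = []
        for key, (start, length) in key_ranges:
            value = self.store.get(key)
            if value is None:
                results.append(None)
            elif length is None:
                results.append(value[start:])
            else:
                results.append(value[start:start + length])
        return results

    def set_partial_values(self, key_start_values):
        for key, start, data in key_start_values:
            current = bytearray(self.store.get(key) or b"")
            if len(current) < start:
                # Assumption of this sketch: pad the gap with zero bytes.
                current.extend(b"\x00" * (start - len(current)))
            current[start:start + len(data)] = data
            self.store.set(key, bytes(current))


base = DictStore()
base.set("k", b"0123456789")
partial = StoreWithPartialAccess(base)
values = partial.get_partial_values(
    [("k", (2, 3)), ("missing", (0, None)), ("k", (8, None))]
)
partial.set_partial_values([("k", 4, b"XY")])
```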
1268 |
1269 | Store implementations
1270 | ---------------------
1271 |
1272 | (This subsection is not normative.)
1273 |
1274 | A store implementation maps the abstract operations of the store
1275 | interface onto concrete operations on some underlying storage
1276 | system. This specification does not constrain or make any assumptions
1277 | about the nature of the underlying storage system. Thus it is possible
1278 | to implement the store interface in a variety of different ways.
1279 |
1280 | For example, a store implementation might use a conventional file
1281 | system as the underlying storage system, mapping keys onto file paths
1282 | and values onto file contents. The ``get`` operation could then be
1283 | implemented by reading a file, the ``set`` operation implemented by
1284 | writing a file, and the ``list_dir`` operation implemented by listing
1285 | a directory.
1286 |
1287 | For example, a store implementation might use a key-value database
1288 | such as BerkeleyDB or LMDB as the underlying storage system. In this
1289 | case the implementation of ``get`` and ``set`` operations would be
1290 | whatever native operations are provided by the
1291 | database for getting and setting key/value pairs. Such a store
1292 | implementation might natively support the ``list`` operation but might
1293 | not support ``list_prefix`` or ``list_dir``, although these could be
1294 | implemented via ``list`` with post-processing of the returned keys.
1295 |
1296 | For example, a store implementation might use a cloud object storage
1297 | service such as Amazon S3, Azure Blob Storage, or Google Cloud Storage
1298 | as the underlying storage system, mapping keys to object names and
1299 | values to object contents. The store interface operations would then
1300 | be implemented via concrete operations of the service's REST API,
1301 | i.e., via HTTP requests. E.g., the ``get`` operation could be
1302 | implemented via an HTTP GET request to an object URL, the ``set``
1303 | operation could be implemented via an HTTP PUT request to an object
1304 | URL, and the list operations could be implemented via an HTTP GET
1305 | request to a bucket URL (i.e., listing a bucket).
1306 |
1307 | The examples above are meant to be illustrative only, and other
1308 | implementations are possible. This specification does not attempt to
1309 | standardise any store implementations; however, where a store
1310 | implementation is expected to be widely used, it is recommended to
1311 | create a store implementation spec and contribute it to the `zarr-specs GitHub repository`_.
1312 | For an example of a store implementation spec, see the
1313 | :ref:`file-system-store-v1` specification.
1314 |
1315 |
1316 | Storage
1317 | =======
1318 |
1319 | This section describes how to translate high level operations to
1320 | create, erase or modify Zarr hierarchies, groups or arrays, into low
1321 | level operations on the key/value store interface defined above.
1322 |
1323 | In this section a "hierarchy path" is a logical path which identifies
1324 | a group or array node within a Zarr hierarchy, and a "storage key" is
1325 | a key used to store and retrieve data via the store interface. There
1326 | is a further distinction between "metadata keys" which are storage
1327 | keys used to store metadata documents, and "chunk keys" which are
1328 | storage keys used to store encoded chunks.
1329 |
1330 | Note that any non-root hierarchy path will have ancestor paths that
1331 | identify ancestor nodes in the hierarchy. For example, the path
1332 | "/foo/bar" has ancestor paths "/foo" and "/".
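As an illustration, ancestor paths can be computed by repeatedly dropping the last path segment. The function name below is hypothetical and the sketch assumes paths are written with a leading "/" as in the example above.

```python
def ancestor_paths(path):
    """Return the ancestor hierarchy paths of ``path``, nearest ancestor first."""
    if path == "/":
        return []  # the root has no ancestors
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    ancestors.append("/")
    return ancestors
```

For "/foo/bar" this returns "/foo" followed by "/".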
1333 |
1334 |
1335 | Operations
1336 | ----------
1337 |
1338 | The following section describes possible operations of an implementation as a
1339 | non-normative guideline.
1340 |
1341 | Let `P` be an arbitrary hierarchy path.
1342 |
1343 | Let ``meta_key(P)`` be the metadata key for `P`, ``P/zarr.json``.
1344 |
1345 | Let ``data_key(P, j, i, ...)`` be the data key for `P` for the chunk
1346 | with grid coordinates (`j`, `i`, ...).
1347 |
1348 | Let "+" be the string concatenation operator.
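A possible Python rendering of these helpers is sketched below. Two points are assumptions of the sketch rather than statements of this section: storage keys are written without the leading "/" of the hierarchy path (as in the example keys such as "foo/bar/zarr.json" later in this section), and ``data_key`` uses a "c"-prefixed key with a configurable separator in the style of the default chunk key encoding. The actual data key is determined by the array's chunk key encoding.

```python
def meta_key(path):
    """Metadata key for hierarchy path ``path``."""
    prefix = path.strip("/")
    # The root path "/" has an empty prefix, so its key is just "zarr.json".
    return prefix + "/zarr.json" if prefix else "zarr.json"


def data_key(path, *coords, separator="/"):
    """Data key for the chunk at grid coordinates ``coords`` (assumed encoding)."""
    prefix = path.strip("/")
    chunk = separator.join(["c", *map(str, coords)])
    return prefix + "/" + chunk if prefix else chunk
```

With these assumptions, ``meta_key("/foo/bar")`` gives "foo/bar/zarr.json" and ``data_key("/foo", 1, 0)`` gives "foo/c/1/0".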
1349 |
1350 |
1351 | **Create a group**
1352 |
1353 | To create a group at hierarchy path `P`, perform
1354 | ``set(meta_key(P), value)``, where `value` is the
1355 | serialization of a valid group metadata document, and
1356 | ensure the existence of groups at all ancestor paths of `P`.
1357 |
1358 | **Create an array**
1359 |
1360 | To create an array at hierarchy path `P`, perform
1361 | ``set(meta_key(P), value)``, where `value` is the serialisation of a valid
1362 | array metadata document.
1363 |
1364 | Creating an array at path `P` implies the existence of groups at all
1365 | ancestor paths of `P`.
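The group creation step, including the ancestor groups, could be sketched as follows. A plain ``dict`` stands in for the store (item assignment plays the role of ``set``), the helper names are hypothetical, and the minimal group document is an assumption for illustration. The sketch does not check whether an ancestor is already an array, which a real implementation would need to reject.

```python
import json

# Minimal group metadata document (assumed content, for illustration only).
GROUP_DOCUMENT = json.dumps({"zarr_format": 3, "node_type": "group"}).encode("utf-8")


def meta_key(path):
    prefix = path.strip("/")
    return prefix + "/zarr.json" if prefix else "zarr.json"


def create_group(store, path):
    """Write group metadata at ``path`` and at every ancestor lacking metadata."""
    parts = [part for part in path.split("/") if part]
    # Walk from the root (depth 0) down to ``path`` itself.
    for depth in range(len(parts) + 1):
        key = meta_key("/" + "/".join(parts[:depth]))
        if key not in store:  # never overwrite existing metadata
            store[key] = GROUP_DOCUMENT


store = {}
create_group(store, "/foo/bar")
```

After the call, the store holds the keys "zarr.json", "foo/zarr.json" and "foo/bar/zarr.json".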
1366 |
1367 | **Store chunk data in an array**
1368 |
1369 | To store chunk data in an array at path `P` and chunk coordinate (`j`, `i`,
1370 | ...), perform ``set(data_key(P, j, i, ...), value)``, where `value` is the
1371 | serialisation of the corresponding chunk, encoded according to the
1372 | information in the array metadata stored under the key ``meta_key(P)``.
1373 |
1374 | **Retrieve chunk data in an array**
1375 |
1376 | To retrieve chunk data in an array at path `P` and chunk coordinate (`j`,
1377 | `i`, ...), perform ``get(data_key(P, j, i, ...))``. The returned
1378 | value is the serialisation of the corresponding chunk, encoded according to
1379 | the array metadata stored at ``meta_key(P)``.
1380 |
1381 | **Discover children of a group**
1382 |
1383 | To discover the children of a group at hierarchy path `P`, perform
1384 | ``list_dir(P + "/")``. Any returned prefix ``Q`` not starting with ``__``
1385 | indicates a child array or group. To determine whether the child is
1386 | an array or group, the document ``meta_key(Q)`` must be checked.
1387 |
1388 | For example, if a group is created at path "/foo/bar" and an array
1389 | is created at path "/foo/baz/qux", then the store will contain the
1390 | keys "foo/bar/zarr.json" and "foo/baz/qux/zarr.json".
1391 | Groups at paths "/", "/foo" and "/foo/baz" have not been explicitly
1392 | created but are implied by their descendants. To list the children
1393 | of the group at path "/foo", perform ``list_dir("/foo/")``,
1394 | which will return the prefixes "foo/bar/" and "foo/baz/".
1395 | From this it can be inferred that child groups or arrays
1396 | "/foo/bar" and "/foo/baz" are present.
1397 |
1398 | If a store does not support any of the list operations then discovery of
1399 | group children is not possible, and the contents of the hierarchy must be
1400 | communicated by some other means, such as via an extension (see
1401 | https://github.com/zarr-developers/zarr-specs/issues/15) or via some out of
1402 | band communication.
1403 |
1404 | **Discover all nodes in a hierarchy**
1405 |
1406 | To discover all nodes in a hierarchy, one should discover the children of
1407 | the root of the hierarchy and then recursively list children of child
1408 | groups.
1409 |
1410 | For hierarchies without group storage transformers one may also call
1411 | ``list_prefix("/")``. All ``zarr.json`` keys represent either groups or arrays.
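A sketch of the second approach: given the keys returned by a prefix listing, every key named ``zarr.json`` identifies one node. The function name is hypothetical, and the sketch assumes storage keys carry no leading "/".

```python
def discover_nodes(keys):
    """Map listed storage keys to the hierarchy paths of the nodes they describe."""
    suffix = "/zarr.json"
    nodes = set()
    for key in keys:
        if key == "zarr.json":
            nodes.add("/")  # metadata document of the root node
        elif key.endswith(suffix):
            nodes.add("/" + key[: -len(suffix)])
    return nodes
```

Whether each discovered node is a group or an array still has to be read from its metadata document.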
1412 |
1413 | **Erase a group or array**
1414 |
1415 | To erase an array at path `P`, erase the metadata document and array data
1416 | for the array, ``erase_prefix(P + "/")``.
1417 |
1418 | To erase a group at path `P`, erase all nodes under
1419 | this group and its metadata document; it should be sufficient to perform
1420 | ``erase_prefix(P + "/")``.
1421 |
1422 | **Determine if a node exists**
1423 |
1424 | To determine if a node exists at path ``P``, perform the following check:
1425 |
1426 | - ``get(meta_key(P))``
1427 | (success implies an array or group at ``P``).
1428 |
1429 | .. note::
1430 |     For listable stores, ``list_dir(parent(P))`` can be an alternative.
1431 |
1432 |
1433 | Storage transformers
1434 | ====================
1435 |
1436 | A Zarr storage transformer modifies a request to read or write data before passing
1437 | that request to the following transformer or store.
1438 | The stored transformed data is restored to its original state whenever data is requested
1439 | by the Array. Storage transformers can be configured per array via the
1440 | `storage_transformers `_ name in the `array metadata`_. Storage transformers which do
1441 | not change the storage layout (e.g. for caching) may be specified at runtime without
1442 | adding them to the array metadata.
1443 |
1444 | .. note::
1445 |     It is planned to add storage transformers also to groups in a later revision
1446 |     of this spec, see https://github.com/zarr-developers/zarr-specs/issues/215.
1447 |
1448 | A storage transformer serves the same `abstract store interface`_ as the store_.
1449 | However, it should not persistently store any information necessary to restore the original data,
1450 | but instead propagate this to the next storage transformer or the final store.
1451 | From the perspective of an array or a previous stage transformer, both store and storage transformer follow the same
1452 | protocol and are therefore interchangeable. The behaviour can still be different,
1453 | e.g. requests may be cached or the form of the underlying data can change.
1454 |
1455 | Storage transformers may be stacked to combine different functionalities:
1456 |
1457 | .. mermaid::
1458 |
1459 |     graph LR
1460 |       Array --> t1
1461 |       subgraph stack [Storage transformers]
1462 |         t1[Transformer 1] --> t2[...] --> t3[Transformer N]
1463 |       end
1464 |       t3 --> Store
1465 |
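The stacking shown in the diagram above can be sketched with objects that all expose the same ``get``/``set`` operations. Every class name here is hypothetical and only two store operations are modelled; the sketch is meant to show the delegation pattern, not any registered storage transformer.

```python
class MemoryStore:
    """Final store in the stack, backed by a dictionary."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value


class KeyPrefixTransformer:
    """Hypothetical transformer that changes the storage layout by prefixing keys."""

    def __init__(self, inner, prefix):
        self.inner = inner
        self.prefix = prefix

    def get(self, key):
        return self.inner.get(self.prefix + key)

    def set(self, key, value):
        self.inner.set(self.prefix + key, value)


class ReadCounter:
    """Hypothetical runtime-only transformer; the storage layout is unchanged."""

    def __init__(self, inner):
        self.inner = inner
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.inner.get(key)

    def set(self, key, value):
        self.inner.set(key, value)


store = MemoryStore()
stack = ReadCounter(KeyPrefixTransformer(store, "v1/"))
stack.set("zarr.json", b"{}")
```

The array only talks to ``stack``; the value ends up in the final store under "v1/zarr.json", and neither transformer keeps a copy of the data itself.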
1466 |
1467 | .. _extensions_section:
1468 |
1469 | Extensions
1470 | ==========
1471 |
1472 | Additional functionality and features can be enabled in Zarr datasets through
1473 | extensions defined in `metadata documents`_. Each extension corresponds to a
1474 | specific extension point, such as data types or codecs. Extensions may include
1475 | optional configuration, which can be provided via structured objects. Proper
1476 | naming is essential for cross-implementation interoperability, ensuring
1477 | extensions are recognized and used consistently. This section outlines
1478 | available extension points, the structural constraints on extensions, and
1479 | naming conventions.
1480 |
1481 | .. _extension-points:
1482 |
1483 | Extension points
1484 | ----------------
1485 |
1486 | Different types of extensions can exist and they can be grouped as follows:
1487 |
1488 | =========== ======================= ============================== ===============================
1489 | node_type   extension point         metadata definition            list of core extensions
1490 | =========== ======================= ============================== ===============================
1491 | array       data type               :ref:`data-type `              :ref:`data-type-list`
1492 | array       chunk grid              :ref:`chunk-grid `             :ref:`chunk-grid-list`
1493 | array       chunk key encoding      :ref:`chunk-key-encoding `     :ref:`chunk-key-encoding-list`
1494 | array       codecs                  :ref:`codecs `                 :ref:`codec-list`
1495 | array       storage transformer     :ref:`storage-transformers `   :ref:`storage-transformer-list`
1496 | =========== ======================= ============================== ===============================
1497 |
1498 | Note that ``fill_value`` is not its own extension point, but is dependent on the data type.
1499 |
1500 | New extension points may be proposed to the Zarr community through the ZEP
1501 | process. See `ZEP 0 `_ for more information.
1502 |
1503 | .. _extension-definition:
1504 |
1505 | Extension definition
1506 | --------------------
1507 |
1508 | .. _extension-definition-object:
1509 |
1510 | Objects
1511 | ^^^^^^^
1512 |
1513 | In `metadata documents`_, extensions can be encoded either as objects or as
1514 | short-hand names.
1515 |
1516 | If using an object definition, the member ``name``
1517 | MUST be a plain string which conforms to :ref:`extension name `.
1518 | Optionally, the member ``configuration`` MAY be present but if so MUST be
1519 | an object.
1520 |
1521 | For example::
1522 |
1523 |     {
1524 |         "name": "", # conformant name
1525 |         "configuration": { ... } # optional object
1526 |     }
1527 |
1528 | .. _extension-definition-short-hand-name:
1529 |
1530 | Short-hand names
1531 | ^^^^^^^^^^^^^^^^
1532 |
1533 | Instead of extension objects, short-hand names MAY be used if no
1534 | configuration metadata is required. They are equivalent to extension
1535 | objects with just a `name` key.
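The equivalence between short-hand names and extension objects suggests a simple normalisation step when reading metadata. The sketch below is one way to do it; the function name and the choice of ``ValueError`` are assumptions made for illustration.

```python
def normalize_extension(ext):
    """Return the extension in object form, whichever form it was written in."""
    if isinstance(ext, str):
        # A short-hand name is equivalent to an object with just a name.
        return {"name": ext}
    if isinstance(ext, dict) and isinstance(ext.get("name"), str):
        if "configuration" in ext and not isinstance(ext["configuration"], dict):
            raise ValueError("'configuration' must be an object when present")
        return ext
    raise ValueError("extension must be a name or an object with a 'name' member")
```

For example, ``normalize_extension("string")`` returns ``{"name": "string"}``, while an object with ``name`` and ``configuration`` members is returned unchanged.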
1536 |
1537 | .. _extension-definition-must-understand:
1538 |
1539 | `must_understand`
1540 | ^^^^^^^^^^^^^^^^^
1541 |
1542 | An extension object is interpreted to have an implicit field `must_understand` set to
1543 | `True`, unless otherwise stated. An extension object MAY explicitly set `must_understand=False` if
1544 | implementations can ignore its presence.
1545 |
1546 | An implementation MUST fail to open Zarr groups or arrays if any
1547 | metadata fields are present which (a) the
1548 | implementation does not recognize and (b) are not explicitly
1549 | set to ``"must_understand": false``.
1550 |
1551 | `must_understand=False` is not supported for the following extension points:
1552 | data type, chunk grid, and chunk key encoding.
1553 |
1554 | Use of `must_understand=False` to add top-level keys is discouraged in favor
1555 | of the explicit use of :ref:`extension-points`.
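The rule for unrecognised metadata fields can be sketched as a check over the top-level keys of a metadata document. The set of known field names below is an illustrative assumption (an implementation would use the fields it actually supports), and the function name is hypothetical.

```python
# Illustrative set of recognised top-level fields (an assumption of this sketch).
KNOWN_FIELDS = {
    "zarr_format", "node_type", "shape", "data_type", "chunk_grid",
    "chunk_key_encoding", "fill_value", "codecs", "attributes",
    "storage_transformers", "dimension_names",
}


def ignorable_fields(metadata, known=KNOWN_FIELDS):
    """Return unknown fields that may be ignored; raise if any must be understood."""
    ignored = []
    for name, value in metadata.items():
        if name in known:
            continue
        if isinstance(value, dict) and value.get("must_understand") is False:
            ignored.append(name)
            continue
        # Unknown and not explicitly optional: opening must fail.
        raise ValueError(f"unrecognized metadata field: {name!r}")
    return ignored
```

A document with an unknown field ``{"must_understand": false}`` passes with that field reported as ignorable, while any other unknown field causes a failure.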
1556 |
1557 | .. _extension-naming:
1558 |
1559 | Extension naming
1560 | ----------------
1561 |
1562 | The `name` field of an extension is an identifier that has been registered
1563 | prior to release in any implementation within the `zarr-extensions`_ GitHub
1564 | repository, where extensions and their specifications are listed. The Zarr
1565 | Steering Council, or by delegation a maintainer team, reserves the right to
1566 | refuse name assignment at its own discretion.
1567 |
1568 | .. _extension-naming-registered-names:
1569 |
1570 | Registered names consist of a single string that is unique within the Zarr ecosystem.
1571 | Registered names are intended for well-known extensions aimed at broad adoption and maximum interoperability.
1572 | Registered names are unique and immutable.
1573 |
1574 | Registered names MUST start with one lower case letter a-z and then be followed
1575 | by only lower case letters a-z, numerals 0-9, underscores, dots and dashes.
1576 |
1577 | - **Accepted regex:** ``^[a-z][a-z0-9-_.]+$``
1578 | - **Valid examples:**
1579 |     - ``zstd``
1580 |     - ``numcodecs.adler32``
1581 | - **Invalid examples:**
1582 |     - ``foo/bar``
1583 |     - ``foo:bar``
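The accepted regex can be applied directly, for instance with Python's ``re`` module. The function name below is hypothetical; the pattern is copied verbatim from the list above.

```python
import re

# Pattern copied from the "Accepted regex" entry above.
REGISTERED_NAME = re.compile(r"^[a-z][a-z0-9-_.]+$")


def is_registered_name(name):
    """Return True if ``name`` has the form required of a registered name."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return REGISTERED_NAME.fullmatch(name) is not None
```

This check only covers the syntax of a name; whether a name is actually registered has to be looked up in the `zarr-extensions`_ repository.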
1584 |
1585 | .. note::
1586 |     In previous versions of the v3 spec, the name of an extension was required
1587 |     to be a URI. That is now discouraged for new extensions, though, for
1588 |     backwards compatibility with existing extensions, URI names are still
1589 |     permitted.
1590 |
1591 | A proposal to additionally support multiple registration mechanisms is under
1592 | discussion in https://github.com/zarr-developers/zarr-specs/pull/330 .
1593 |
1594 | .. _extension-guidance:
1595 |
1596 | Guidance for extension authors
1597 | ------------------------------
1598 |
1599 | *This section is non-normative and provides assistance for the authors of
1600 | extensions, especially those who are just getting started.*
1601 |
1602 | The Zarr maintainers endeavor to make the registration of names as
1603 | straightforward as possible. We encourage all authors to make use of the extensions
1604 | repository to prevent duplicate efforts across the community where possible.
1605 |
1606 | * **During development**: Authors should use whatever name makes sense
1607 | for their extension, provided it is not already reserved in the registry.
1608 | Once there is a working implementation of the extension (e.g. a PR to an
1609 | existing Zarr implementation), the extension should be submitted to the registry.
1610 |
1611 | * **Well-known extensions**: Authors implementing a well-known extension
1612 | like a data type or codec that is already referred to by name in the
1613 | community may want to check the `zarr-extensions`_ repository to see if
1614 | someone has already implemented the extension.
1615 |
1616 | * **Production extensions**: Authors intending to create significant amounts of
1617 | data or widely distributed data should consider registering all extensions in
1618 | the extension registry to increase the long-term maintainability of the data.
1619 |
1620 | Extension versioning
1621 | --------------------
1622 |
1623 | Registered extensions SHOULD follow the compatibility and versioning `stability policy`_.
1624 |
1625 | Extension example
1626 | -----------------
1627 |
1628 | The following example of array metadata demonstrates these extension naming schemes::
1629 |
1630 |     {
1631 |         "zarr_format": 3,
1632 |         "data_type": "string", // registered, short-hand name
1633 |         "chunk_key_encoding": {
1634 |             "name": "default", // core
1635 |             "configuration": { "separator": "." }
1636 |         },
1637 |         "codecs": [
1638 |             {
1639 |                 "name": "vlen-utf8" // registered name
1640 |             },
1641 |             {
1642 |                 "name": "zstd", // registered name
1643 |                 "configuration": { ... }
1644 |             }
1645 |         ],
1646 |         "chunk_grid": {
1647 |             "name": "regular", // core
1648 |             "configuration": { "chunk_shape": [ 32 ] }
1649 |         },
1650 |         "shape": [ 128 ],
1651 |         "dimension_names": [ "x" ],
1652 |         "attributes": { ... }
1653 |     }
1654 |
1655 | Extension specifications
1656 | ------------------------
1657 |
1658 | Extensions SHOULD have a published specification. A published specification
1659 | facilitates multiple implementations of an extension.
1660 |
1661 | For extensions with registered names, the `zarr-extensions`_ repository
1662 | SHOULD either contain the specification or link to it.
1663 |
1664 | Implementation Notes
1665 | ====================
1666 |
1667 | This section is non-normative and presents notes from implementers about cases
1668 | that need to be carefully considered but do not strictly fall into the spec.
1669 |
1670 | Resizing
1671 | --------
1672 |
1673 | In general, arrays can be resized for writable (and, if necessary, deletable)
1674 | stores. In the most basic case, two scenarios can be considered: shrinking along
1675 | an array dimension, or increasing its size.
1676 |
1677 | When shrinking, implementations can consider whether to delete chunks if the
1678 | store allows this, or keep them. This should either be configurable, or be
1679 | communicated to the user appropriately.
1680 |
1681 | When increasing an array along a dimension, chunks may or may not have existed
1682 | in the new area. For areas where no chunks existed previously, they implicitly
1683 | have the fill value after updating the metadata; no new chunks need to be
1684 | written in this case. Previous partial chunks will contain the fill value at the
1685 | time of writing them by default. If there was chunk data in the new area which
1686 | was not deleted when shrinking the array, this data will be shown by default.
1687 | The latter case should be signalled to the user appropriately. An implementation
1688 | can also allow the user to choose to delete previous data explicitly when
1689 | increasing the array (by writing the fill value into partial chunks and deleting
1690 | others), but this should not be the default behaviour.
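For the shrinking case, an implementation that chooses to delete chunks needs to know which chunks fall entirely outside the new shape. A minimal sketch, assuming a regular chunk grid and a hypothetical function name:

```python
from itertools import product


def chunks_outside(old_shape, new_shape, chunk_shape):
    """Grid coordinates of existing chunks lying entirely outside ``new_shape``."""
    # Number of chunks per dimension (integer ceiling division).
    old_grid = [-(-size // chunk) for size, chunk in zip(old_shape, chunk_shape)]
    new_grid = [-(-size // chunk) for size, chunk in zip(new_shape, chunk_shape)]
    return [
        coords
        for coords in product(*(range(count) for count in old_grid))
        # A chunk is outside as soon as one coordinate exceeds the new grid.
        if any(index >= count for index, count in zip(coords, new_grid))
    ]
```

Shrinking a one-dimensional array from 10 to 5 elements with chunks of 4 elements leaves chunk 1 partially inside the array, so only chunk 2 is reported. Partial chunks such as chunk 1 are the ones whose stale data may become visible again if the array later grows.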
1691 |
1692 |
1693 | Comparison with Zarr v2
1694 | =======================
1695 |
1696 | This section is informative.
1697 |
1698 | Below is a summary of the key differences between this specification
1699 | (v3) and Zarr v2.
1700 |
1701 | - v3 has explicit support for extensions via defined
1702 | extension points and mechanisms.
1703 |
1704 | - The set of data types specified in v3 is smaller than in v2. Additional
1705 | data types will be defined via extensions.
1706 |
1707 | References
1708 | ==========
1709 |
1710 | .. [RFC8259] T. Bray, Ed. The JavaScript Object Notation (JSON) Data
1711 | Interchange Format. December 2017. Best Current Practice. URL:
1712 | https://tools.ietf.org/html/rfc8259
1713 |
1714 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
1715 | Requirement Levels. March 1997. Best Current Practice. URL:
1716 | https://tools.ietf.org/html/rfc2119
1717 |
1718 |
1719 | Change log
1720 | ==========
1721 |
1722 | All notable and possibly implementation-affecting changes to this specification
1723 | are documented in this section, grouped by the specification status and ordered
1724 | by time.
1725 |
1726 | 3.1
1727 | ---
1728 |
1729 | - Clarification of extensions. `PR #330
1730 | `_. With this change,
1731 | it is now possible to add user-defined extensions.
1732 | Additionally, extensions may be marked with `must_understand=False` in case
1733 | a non-implementing library can safely ignore them.
1734 | Please see the new :ref:`Extensions section `
1735 | for details.
1736 |
1737 | Changes after Provisional Acceptance
1738 | ------------------------------------
1739 | - Support for implicit groups was removed. `PR #292
1740 | `_
1741 | - ``endian`` codec was renamed to ``bytes`` codec. `PR #263
1742 | `_
1743 | - Fallback data type support was removed. `PR #248
1744 | `_
1745 | - It is now required to specify an ``array -> bytes`` codec in the ``codecs``
1746 | array metadata field. `PR #249
1747 | `_
1748 | - The representation of fill values for floating point numbers was changed to
1749 | avoid ambiguity. `PR #236
1750 | `_
1751 |
1752 | Draft Changes
1753 | -------------
1754 |
1755 | - Removed `extensions` field and clarified extension point behaviour, changing the config format of
1756 | data-types, chunk-grid, storage-transformers and codecs. `PR #204
1757 | `_
1758 | - Changed `format_version` to the int ``3``, added key ``node_type`` to group and array metadata. `PR #204
1759 | `_
1760 | - Restructured keys and removed entry-point metadata. `PR #200
1761 | `_
1762 | - Added the ``dimension_names`` array metadata field. `PR #162
1763 | `_
1764 | - Replaced ``chunk_memory_layout`` with transpose codec. `PR #189
1765 | `_
1766 | - Allowed to have a list of fallback data types. `PR #167
1767 | `_
1768 | - Removed the 255 character limit for paths. `PR #175
1769 | `_
1770 | - Removed the ``/root`` prefix for paths. `PR #175
1771 | `_
1772 |
1773 |   * ``meta/root.array.json`` is now ``meta/array.json``
1774 |   * ``meta/root/foo/bar.group.json`` is now ``meta/foo/bar.group.json``
1775 | - Moved the ``metadata_key_suffix`` entrypoint metadata key into ``metadata_encoding``,
1776 | which now just specifies `"json"` via the `type` key and is an extension point.
1777 | `PR #171 `_
1778 | - Changed data type names and changed endianness to be handled by a codec.
1779 | `PR #155 `_
1780 | - Replaced the ``compressor`` field in the array metadata with a ``codecs``
1781 | field that can specify a list of codecs. `PR #153
1782 | `_
1783 | - Required ``fill_value`` in the array metadata to be defined.
1784 | `PR #145 `_
1785 | - Added array storage transformers which can be configured per array via the
1786 | storage_transformers name in the array metadata.
1787 | `PR #134 `_
1788 | - The changelog is incomplete before 2022; please refer to the commits on
1789 | GitHub.
1790 |
1791 | .. _zarr-specs GitHub repository: https://github.com/zarr-developers/zarr-specs
1792 | .. _zarr-extensions: https://github.com/zarr-developers/zarr-extensions
1793 |
--------------------------------------------------------------------------------
/docs/v3/core/terminology-hierarchy.excalidraw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zarr-developers/zarr-specs/b880fb385bedb18dd78ffef1bd683e7e93270c74/docs/v3/core/terminology-hierarchy.excalidraw.png
--------------------------------------------------------------------------------
/docs/v3/core/terminology-read.excalidraw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zarr-developers/zarr-specs/b880fb385bedb18dd78ffef1bd683e7e93270c74/docs/v3/core/terminology-read.excalidraw.png
--------------------------------------------------------------------------------
/docs/v3/data-types/index.rst:
--------------------------------------------------------------------------------
1 | .. _data-type-list:
2 |
3 | ==========
4 | Data Types
5 | ==========
6 |
7 | The following section specifies data types which SHOULD
8 | be implemented by all implementations.
9 |
10 | Core data types
11 | ---------------
12 |
13 | .. list-table:: Data types
14 |    :header-rows: 1
15 |
16 |    * - Identifier
17 |      - Numerical Type
18 |    * - ``bool``
19 |      - Boolean
20 |    * - ``int8``
21 |      - Integer in ``[-2^7, 2^7-1]``
22 |    * - ``int16``
23 |      - Integer in ``[-2^15, 2^15-1]``
24 |    * - ``int32``
25 |      - Integer in ``[-2^31, 2^31-1]``
26 |    * - ``int64``
27 |      - Integer in ``[-2^63, 2^63-1]``
28 |    * - ``uint8``
29 |      - Integer in ``[0, 2^8-1]``
30 |    * - ``uint16``
31 |      - Integer in ``[0, 2^16-1]``
32 |    * - ``uint32``
33 |      - Integer in ``[0, 2^32-1]``
34 |    * - ``uint64``
35 |      - Integer in ``[0, 2^64-1]``
36 |    * - ``float16`` (optionally supported)
37 |      - IEEE 754 half-precision floating point: sign bit, 5 bits exponent, 10 bits mantissa
38 |    * - ``float32``
39 |      - IEEE 754 single-precision floating point: sign bit, 8 bits exponent, 23 bits mantissa
40 |    * - ``float64``
41 |      - IEEE 754 double-precision floating point: sign bit, 11 bits exponent, 52 bits mantissa
42 |    * - ``complex64``
43 |      - real and imaginary components are each IEEE 754 single-precision floating point
44 |    * - ``complex128``
45 |      - real and imaginary components are each IEEE 754 double-precision floating point
46 |    * - ``r*`` (Optional)
47 |      - raw bits, variable size given by ``*``, limited to be a multiple of 8
48 |
49 | .. _fill-value-list:
50 |
51 | Permitted fill values
52 | ^^^^^^^^^^^^^^^^^^^^^
53 |
54 | The permitted values depend on the data type:
55 |
56 | ``bool``
57 |     The value must be a JSON boolean (``false`` or ``true``).
58 |
59 | Integers (``{uint,int}{8,16,32,64}``)
60 |     The value must be a JSON number with no fraction or exponent part that is
61 |     within the representable range of the data type.
62 |
63 | IEEE 754 floating point numbers (``float{16,32,64}``)
64 |     The value may be either:
65 |
66 |     - A JSON number, which will be rounded to the nearest representable value.
67 |
68 |     - A JSON string of the form:
69 |
70 |       - ``"Infinity"``, denoting positive infinity;
71 |       - ``"-Infinity"``, denoting negative infinity;
72 |       - ``"NaN"``, denoting the not-a-number (NaN) value where the sign bit is
73 |         0 (positive), the most significant bit (MSB) of the mantissa is 1, and
74 |         all other bits of the mantissa are zero;
75 |       - ``"0xYYYYYYYY"``, specifying the byte representation of the floating
76 |         point number as an unsigned integer. For example, for ``float32``,
77 |         ``"NaN"`` is equivalent to ``"0x7fc00000"``. This representation is
78 |         the only way to specify a NaN value other than the specific NaN value
79 |         denoted by ``"NaN"``.
80 |
81 |       .. warning::
82 |
83 |          While this NaN syntax is consistent with the syntax accepted by the
84 |          C99 ``strtod`` function, C99 leaves the meaning of the NaN payload
85 |          string implementation defined, which may not match the Zarr
86 |          definition.
87 |
88 | Complex numbers (``complex{64,128}``)
89 |     The value must be a two-element array, specifying the real and imaginary
90 |     components respectively, where each component is specified as defined
91 |     above for floating point numbers.
92 |
93 |     For example, ``[1, 2]`` indicates ``1 + 2i`` and ``["-Infinity", "NaN"]``
94 |     indicates a complex number with real component of -inf and imaginary
95 |     component of NaN.
96 |
97 | Raw data types (``r``)
98 |     An array of integers, with length equal to ````, where each integer is
99 |     in the range ``[0, 255]``.
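To illustrate the floating point rules above, the following sketch maps a JSON fill value for ``float32`` to its 32-bit pattern. Working with the bit pattern rather than a Python ``float`` is a design choice of this sketch, made so that NaN payloads given in hexadecimal form are kept exactly; the function and constant names are hypothetical.

```python
import struct

# Bit patterns of the named special values for float32.
SPECIAL_FLOAT32 = {
    "Infinity": 0x7F800000,
    "-Infinity": 0xFF800000,
    "NaN": 0x7FC00000,  # sign 0, mantissa MSB 1, all other mantissa bits 0
}


def float32_fill_bits(value):
    """Return the 32-bit pattern encoded by a JSON ``float32`` fill value."""
    if isinstance(value, bool):
        raise ValueError("a JSON boolean is not a valid float fill value")
    if isinstance(value, str):
        if value in SPECIAL_FLOAT32:
            return SPECIAL_FLOAT32[value]
        if value.startswith("0x"):
            bits = int(value, 16)
            if bits >= 2 ** 32:
                raise ValueError("hexadecimal fill value does not fit in 32 bits")
            return bits
        raise ValueError(f"unsupported fill value string: {value!r}")
    # JSON number: packing as float32 rounds to the nearest representable value.
    return struct.unpack(">I", struct.pack(">f", value))[0]
```

With this sketch, ``float32_fill_bits("NaN")`` and ``float32_fill_bits("0x7fc00000")`` give the same result, as stated above. A ``complex64`` fill value could be handled by applying the same function to each of its two components.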
100 |
101 | Extensions
102 | ----------
103 |
104 | Registered data type extensions can be found under
105 | `zarr-extensions::data-types `_.
106 |
--------------------------------------------------------------------------------
/docs/v3/storage-transformers/index.rst:
--------------------------------------------------------------------------------
1 | .. _storage-transformer-list:
2 |
3 | ==========================
4 | Array Storage Transformers
5 | ==========================
6 |
7 | .. COMMENT TO BE REMOVED WHEN ONE IS ADDED
8 |
9 |     The following documents specify core storage transformers which SHOULD
10 |     be implemented by all implementations.
11 |
12 |     toctree::
13 |        :glob:
14 |        :maxdepth: 1
15 |        :titlesonly:
16 |        :caption: Contents:
17 |
18 |        */*
19 |
20 | Currently, no core storage transformers are defined by this specification.
21 |
22 | Extensions
23 | ----------
24 |
25 | Registered storage transformer extensions can be found under
26 | `zarr-extensions::storage-transformers `_.
27 |
--------------------------------------------------------------------------------
/docs/v3/stores/filesystem/index.rst:
--------------------------------------------------------------------------------
1 | .. _file-system-store-v1:
2 |
3 | =================
4 | File system store
5 | =================
6 |
7 | Version:
8 | 1.0
9 | Specification URI:
10 | https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/
11 | Corresponding ZEP:
12 | `ZEP0001 — Zarr specification version 3 `_
13 | Issue tracking:
14 | `GitHub issues `_
15 | Suggest an edit for this spec:
16 | `GitHub editor `_
17 |
18 | Copyright 2019-Present Zarr core development team. This work is
19 | licensed under a `Creative Commons Attribution 3.0 Unported License
20 | `_.
21 |
22 | ----
23 |
24 |
25 | Abstract
26 | ========
27 |
28 | This specification defines an implementation of the Zarr abstract
29 | store API using a file system.
30 |
31 |
32 | Status of this document
33 | =======================
34 |
35 | ZEP0001 was accepted on May 15th, 2023 via https://github.com/zarr-developers/zarr-specs/issues/227.
36 |
37 |
38 | Notes about design decisions for the native File System Store
39 | =============================================================
40 |
41 | The original file system store is designed for simplicity and easy manipulation
42 | and transfer by external tools not aware of the store structure. In particular,
43 | tools like ``gsutil`` can be used to transfer a local directory store to
44 | cloud-based storage, hence the choice of keys will be preserved.
45 |
46 |
47 | Document conventions
48 | ====================
49 |
50 | Conformance requirements are expressed with a combination of
51 | descriptive assertions and [RFC2119]_ terminology. The key words
52 | "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
53 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
54 | parts of this document are to be interpreted as described in
55 | [RFC2119]_. However, for readability, these words do not appear in all
56 | uppercase letters in this specification.
57 |
58 | All of the text of this specification is normative except sections
59 | explicitly marked as non-normative, examples, and notes. Examples in
60 | this specification are introduced with the words "for example".
61 |
62 |
63 | Native storage operations
64 | =========================
65 |
66 | Here we consider a file system to be any system comprised of files and
67 | directories, where:
68 |
69 | * Each file has a name (sequence of characters) and contents
70 | (sequence of bytes).
71 |
72 | * Each directory has a name (sequence of characters) and children (set
73 | of zero or more files and/or directories).
74 |
75 | * Each file or directory can be addressed by a path, comprised of its
76 | name and the names of all ancestor directories, which uniquely
77 | identifies it within the file system.
78 |
79 | … and where the following native operations are supported:
80 |
81 | * Create a file.
82 |
83 | * Write the contents of a file.
84 |
85 | * Read the contents of a file.
86 |
87 | * Delete a file.
88 |
89 | * Create a directory.
90 |
91 | * List the children of a directory, returning the name and type (file
92 | or directory) of each child.
93 |
94 | * Delete a directory.
95 |
96 |
97 | Key translation
98 | ===============
99 |
100 | The Zarr store interface is defined in terms of `keys` and `values`,
101 | where a `key` is a sequence of characters and a `value` is a sequence
102 | of bytes. A file system store translates keys into file system
103 | paths. This translation assumes that the store has been defined
104 | relative to a base directory. The translation is as follows:
105 |
106 | * Replace any forward slash characters ('/') in the key with the
107 | native directory separator for the file system.
108 |
109 | * Join the result to the base directory path, using the native
110 | directory separator.
111 |
112 | For example, if the file system is a POSIX file system, and the base
113 | directory path is "/data", then the key "foo/bar" is translated to the
114 | file system path "/data/foo/bar".
115 |
116 | For example, if the file system is a Windows file system, and the base
117 | directory path is "C:\\data", then the key "foo/bar" is translated to
118 | the file system path "C:\\data\\foo\\bar".
119 |
120 | When returning information about available keys, a file system store
121 | performs the reverse translation from file system paths to keys. This
122 | translation is as follows:
123 |
124 | * Replace any native directory separator characters with the forward
125 | slash character.
126 |
127 | * Strip the base directory path from the beginning of the path.
128 |
129 | For example, if the file system is a POSIX file system, and the base
130 | directory path is "/data", then the file system path "/data/foo/bar"
131 | is translated to the key "foo/bar".
132 |
133 | For example, if the file system is a Windows file system, and the base
134 | directory path is "C:\\data", then the file system path
135 | "C:\\data\\foo\\bar" is translated to the key "foo/bar".
136 |
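The two translations above can be sketched in a few lines of Python. This is a minimal illustration, not a normative implementation: the explicit ``sep`` and ``base_dir`` parameters are assumptions introduced here so that both the POSIX and the Windows examples can be checked on any platform.

```python
def key_to_fspath(key, base_dir, sep="/"):
    """Translate a store key into a file system path under base_dir."""
    # Step 1: replace forward slashes with the native directory separator.
    native = key.replace("/", sep)
    # Step 2: join the result to the base directory path.
    return base_dir.rstrip(sep) + sep + native


def fspath_to_key(fspath, base_dir, sep="/"):
    """Translate a file system path under base_dir back into a store key."""
    # Step 1: replace native separators with forward slashes.
    path = fspath.replace(sep, "/")
    prefix = base_dir.replace(sep, "/").rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{fspath!r} is not located under {base_dir!r}")
    # Step 2: strip the base directory path from the beginning.
    return path[len(prefix):]


posix_path = key_to_fspath("foo/bar", "/data")
windows_path = key_to_fspath("foo/bar", "C:\\data", sep="\\")
posix_key = fspath_to_key("/data/foo/bar", "/data")
windows_key = fspath_to_key("C:\\data\\foo\\bar", "C:\\data", sep="\\")
```

Here ``posix_path`` and ``windows_path`` reproduce the paths from the examples above, and both reverse translations give back the key "foo/bar".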
137 |
138 | Store API implementation
139 | ========================
140 |
141 | The section below defines an implementation of the Zarr
142 | :ref:`abstract-store-interface` in terms of the native operations of this
143 | storage system. Below ``fspath_to_key()`` is a function that
144 | translates file system paths to store keys, and ``key_to_fspath()`` is
145 | a function that translates store keys to file system paths, as defined
146 | in the section above.
147 |
148 | * ``get(key) -> value`` : Read and return the contents of the file at
149 | file system path ``key_to_fspath(key)``.
150 |
151 | * ``set(key, value)`` : Write ``value`` as the contents of the file at
152 | file system path ``key_to_fspath(key)``.
153 |
154 | * ``delete(key)`` : Delete the file or directory at file system path
155 | ``key_to_fspath(key)``.
156 |
157 | * ``list()`` : Recursively walk the file system from the base
158 | directory, returning an iterator over keys obtained by calling
159 | ``fspath_to_key(fp)`` for each descendant file path ``fp``.
160 |
161 | * ``list_prefix(prefix)`` : Obtain a file system path by calling
162 | ``key_to_fspath(prefix)``. If the result is a directory path,
163 | recursively walk the file system from this directory, returning an
164 | iterator over keys obtained by calling ``fspath_to_key(fp)`` for
165 | each descendant file path ``fp``.
166 |
167 | * ``list_dir(prefix)`` : Obtain a file system path by calling
168 | ``key_to_fspath(prefix)``. If the result is a directory path, list
169 | the directory children. Return a set of keys obtained by calling
170 | ``fspath_to_key(fp)`` for each child file path ``fp``, and a set of
171 | prefixes obtained by calling ``fspath_to_key(dp)`` for each child
172 | directory path ``dp``.
173 |
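As a non-normative illustration, the operations above could be mapped onto Python's standard library roughly as follows. The class name ``FileSystemStoreSketch`` is hypothetical, and two behaviours are assumptions of this sketch rather than requirements stated above: ``set`` creates missing parent directories, and ``delete`` silently ignores a missing path.

```python
import os
import shutil
import tempfile
from pathlib import Path


class FileSystemStoreSketch:
    """Hypothetical store built on the native file system operations."""

    def __init__(self, base_dir):
        self.base = Path(base_dir)

    def key_to_fspath(self, key):
        # Empty segments (e.g. from a trailing slash) are dropped.
        return self.base.joinpath(*[part for part in key.split("/") if part])

    def fspath_to_key(self, fspath):
        return Path(fspath).relative_to(self.base).as_posix()

    def get(self, key):
        return self.key_to_fspath(key).read_bytes()

    def set(self, key, value):
        path = self.key_to_fspath(key)
        # Assumption: parent directories are created on demand.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def delete(self, key):
        path = self.key_to_fspath(key)
        if path.is_dir():
            shutil.rmtree(path)
        elif path.exists():
            path.unlink()

    def list(self):
        return self.list_prefix("")

    def list_prefix(self, prefix):
        root = self.key_to_fspath(prefix)
        if root.is_dir():
            # Recursively walk the directory and yield one key per file.
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    yield self.fspath_to_key(Path(dirpath) / name)

    def list_dir(self, prefix):
        root = self.key_to_fspath(prefix)
        children = sorted(root.iterdir()) if root.is_dir() else []
        keys = {self.fspath_to_key(c) for c in children if not c.is_dir()}
        prefixes = {self.fspath_to_key(c) for c in children if c.is_dir()}
        return keys, prefixes


# Usage against a temporary base directory.
with tempfile.TemporaryDirectory() as tmp:
    store = FileSystemStoreSketch(tmp)
    store.set("foo/bar", b"abc")
    store.set("baz", b"x")
    value = store.get("foo/bar")
    listed = sorted(store.list())
    under_foo = sorted(store.list_prefix("foo"))
    keys, prefixes = store.list_dir("")
    store.delete("foo")
    remaining = sorted(store.list())
```

The sketch returns plain Python sets and generators; a real implementation would follow whatever return types the abstract store interface prescribes.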
174 |
175 | Canonical URI
176 | =============
177 |
178 | The canonical URI format for this store follows the file URI scheme of the base
179 | directory path, as defined in [RFC8089]_. For a Windows base directory path
180 | "c:\\my data" the canonical URI would be "file:///c:/my%20data"; for a POSIX
181 | base directory "/my data" it would be "file:///my%20data".
182 |
183 | When a URI string is expected but no scheme is present, implementations may
184 | assume a file system store and treat the string as the base directory
185 | path.
186 |
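The two URI examples above can be reproduced with a short Python sketch. The helper name ``canonical_uri`` and its ``windows`` flag are hypothetical, introduced only for illustration; the sketch assumes an absolute base directory path and does not handle UNC paths or other cases covered by [RFC8089]_.

```python
from urllib.parse import quote


def canonical_uri(base_dir, windows=False):
    """Build a file URI for an absolute base directory path (sketch)."""
    # On Windows, use forward slashes in the URI path.
    path = base_dir.replace("\\", "/") if windows else base_dir
    # A drive-letter path such as "c:/my data" needs a leading slash.
    if not path.startswith("/"):
        path = "/" + path
    # Percent-encode everything except the slashes and the drive colon.
    return "file://" + quote(path, safe="/:")


windows_uri = canonical_uri("c:\\my data", windows=True)
posix_uri = canonical_uri("/my data")
```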
187 |
188 | Store limitations
189 | =================
190 |
191 | The following limitations for this store are known:
192 |
193 | * Windows has a 260-character path length limit.
194 | * Windows paths are case-insensitive by default.
195 | * macOS paths are case-insensitive by default.
196 |
197 |
198 | References
199 | ==========
200 |
201 | .. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
202 | Requirement Levels. March 1997. Best Current Practice. URL:
203 | https://tools.ietf.org/html/rfc2119
204 |
205 | .. [RFC8089] M. Kerwin. The "file" URI Scheme. February 2017. Proposed Standard.
206 | URL: https://tools.ietf.org/html/rfc8089
207 |
208 |
209 | Change log
210 | ==========
211 |
212 | No changes yet.
213 |
--------------------------------------------------------------------------------
/docs/v3/stores/index.rst:
--------------------------------------------------------------------------------
1 | .. _store-list:
2 |
3 | ======
4 | Stores
5 | ======
6 |
7 | The following documents specify stores which SHOULD
8 | be implemented by all implementations.
9 |
10 | .. toctree::
11 | :glob:
12 | :maxdepth: 1
13 | :titlesonly:
14 | :caption: Contents:
15 |
16 | */*
17 |
18 | .. note::
19 | Stores are *not* extension points since they define the mechanism
20 | for loading metadata documents such that extensions can be loaded.
21 |
--------------------------------------------------------------------------------