├── .nojekyll
├── styles.css
├── CNAME
├── .gitignore
├── janelia.jpg
├── template_image.png
├── review_process_image.png
├── posts.qmd
├── _data
└── navigation.yml
├── index.qmd
├── file_formats.qmd
├── metadata.qmd
├── theme-dark.scss
├── .github
└── workflows
│ └── publish.yml
├── _quarto.yml
├── CONTRIBUTING.md
├── LICENSE
├── about.qmd
├── README.md
├── posts
├── title-metadata.html
└── file_formats_introduction.qmd
├── docs
└── site_libs
│ └── quarto-html
│ ├── quarto-syntax-highlighting-dark-8ea72dc5fed832574809a9c94082fbbb.css
│ └── quarto-syntax-highlighting-549806ee2085284f45b00abea8c6df48.css
├── definitions.qmd
└── REVIEW_PROCESS.md
/.nojekyll:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/styles.css:
--------------------------------------------------------------------------------
1 | /* css styles */
2 |
--------------------------------------------------------------------------------
/CNAME:
--------------------------------------------------------------------------------
1 | datastandards.janelia.org
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # For macOS:
2 | .DS_Store
3 |
4 | # quarto:
5 | /_site/
6 | /.quarto/
7 |
--------------------------------------------------------------------------------
/janelia.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JaneliaSciComp/JaneliaDataStandards/HEAD/janelia.jpg
--------------------------------------------------------------------------------
/template_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JaneliaSciComp/JaneliaDataStandards/HEAD/template_image.png
--------------------------------------------------------------------------------
/review_process_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JaneliaSciComp/JaneliaDataStandards/HEAD/review_process_image.png
--------------------------------------------------------------------------------
/posts.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Posts"
3 | listing:
4 | contents: posts
5 | sort: "date desc"
6 | type: default
7 | categories: false
8 | ---
9 |
--------------------------------------------------------------------------------
/_data/navigation.yml:
--------------------------------------------------------------------------------
1 | main:
2 | - title: "About"
3 | url: /about/
4 | - title: "Posts"
5 | url: /posts/
6 | - title: "File Formats"
7 | url: /file_formats/
8 | - title: "Definitions"
9 | url: /definitions/
10 |
--------------------------------------------------------------------------------
/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Welcome!"
3 | ---
4 |
5 | :::{.column-page}
6 | This is a nascent project to unify bioimaging data conventions at HHMI's Janelia Research Campus.
7 |
8 | 
9 | :::
10 |
--------------------------------------------------------------------------------
/file_formats.qmd:
--------------------------------------------------------------------------------
1 | # Janelia's Bioimaging File Formats
2 |
3 | :::{.column-screen}
4 |
5 | :::
6 |
--------------------------------------------------------------------------------
/metadata.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Metadata: supported features"
3 | ---
4 |
5 | :::{.column-screen}
6 |
7 | :::
8 |
--------------------------------------------------------------------------------
/theme-dark.scss:
--------------------------------------------------------------------------------
1 | /*-- scss:defaults --*/
2 | // Base document colors
3 | $body-bg: #181818;
4 | $body-color: white;
5 | $link-color: #75AADB;
6 | $popover-bg: #75aadb;
7 |
8 | // Code blocks
9 | $code-block-bg-alpha: -.8;
10 |
11 | // Navigation bar
12 | $navbar-bg: #181818;
13 | $navbar-hl: #75aadb;
14 | $navbar-fg: #75aadb;
15 | $dropdown-link-hover-color: #75aadb;
16 |
--------------------------------------------------------------------------------
/.github/workflows/publish.yml:
--------------------------------------------------------------------------------
1 | on:
2 | workflow_dispatch:
3 | push:
4 | branches: main
5 |
6 | name: Quarto Publish
7 |
8 | jobs:
9 | build-deploy:
10 | runs-on: ubuntu-latest
11 | permissions:
12 | contents: write
13 | steps:
14 | - name: Check out repository
15 | uses: actions/checkout@v4
16 |
17 | - name: Set up Quarto
18 | uses: quarto-dev/quarto-actions/setup@v2
19 |
20 | - name: Render and Publish
21 | uses: quarto-dev/quarto-actions/publish@v2
22 | with:
23 | target: gh-pages
24 | env:
25 | GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
26 |
--------------------------------------------------------------------------------
/_quarto.yml:
--------------------------------------------------------------------------------
1 | project:
2 | type: website
3 |
4 | execute:
5 | freeze: auto
6 |
7 | website:
8 | title: "Janelia Data Standards"
9 | navbar:
10 | tools:
11 | - icon: github
12 | href: https://github.com/JaneliaSciComp/JaneliaDataStandards
13 | left:
14 | - text: About
15 | href: about.qmd
16 | - text: Posts
17 | href: posts.qmd
18 | - text: File Formats
19 | menu:
20 | - text: Overview
21 | href: file_formats.qmd
22 | - text: Metadata features
23 | href: metadata.qmd
24 | - text: Definitions
25 | href: definitions.qmd
26 |
27 | format:
28 | html:
29 | theme:
30 | light: cosmo
31 | dark: [cosmo,theme-dark.scss]
32 | css: styles.css
33 | toc: false
34 |
35 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to contribute
2 |
3 | Thanks for considering contributing to Janelia's data standards!
4 |
5 | In order to keep things simple, please refrain from forking this repository.
6 | Instead, create a new branch with your changes in this repository and create a pull request against the main branch (or any other suitable branch).
7 | Alternatively, if your changes are small and reasonable, you may commit them to the main branch yourself.
8 | If you don't have edit rights in this repository but would like them, please contact Virginia Scarlett.
9 |
10 | Posts in a format suitable for [Quarto](https://quarto.org), such as Quarto markdown (.qmd) or a notebook (.ipynb), can be added to /posts/.
11 | Use the existing posts as a template, at least for the header block. Quarto will render these files to HTML, i.e., build the static site files.
12 |
13 | The workflow for submitting a PR (should you wish to do so) is as follows: \
14 | Clone the repo > create your feature branch > do some work > optionally run `quarto preview` to serve the site locally and view it in your browser > `git commit` and `git push` as usual.
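
The steps above can be sketched as shell commands. The branch name and commit message are illustrative only:

```shell
# Sketch of the contribution workflow described above.
# "my-feature" is an example branch name; pick your own.
git clone https://github.com/JaneliaSciComp/JaneliaDataStandards.git
cd JaneliaDataStandards
git checkout -b my-feature
# ... edit or add .qmd files under posts/ ...
quarto preview            # optional: serve the site locally in your browser
git add .
git commit -m "Add new post"
git push origin my-feature
# then open a pull request against main on GitHub
```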
15 |
16 | To preview or build the site locally, you will need Quarto installed on your computer. Neither step is strictly necessary, since a GitHub action will render the site files remotely for you. Changes may take up to 20 minutes to be reflected on the website.
17 |
18 | The rendered site pages are automatically stored in, and deployed from, the gh-pages branch, so please do not modify that branch.
19 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 3-Clause License
2 |
3 | Copyright (c) 2023, Howard Hughes Medical Institute
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 |
8 | 1. Redistributions of source code must retain the above copyright notice, this
9 | list of conditions and the following disclaimer.
10 |
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 | this list of conditions and the following disclaimer in the documentation
13 | and/or other materials provided with the distribution.
14 |
15 | 3. Neither the name of the copyright holder nor the names of its
16 | contributors may be used to endorse or promote products derived from
17 | this software without specific prior written permission.
18 |
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 |
--------------------------------------------------------------------------------
/about.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "About Janelia Data Standards"
3 | ---
4 | ## Purpose
5 |
6 | The Janelia Data Standards group was formed by bioimaging developers who have encountered specific, practical bioimaging data dilemmas for which there is little or no guidance from international community standards. Janelia is excited about international standardization efforts, particularly OME-NGFF. Janelia is contributing to OME-NGFF and is rooting for its success. However, such efforts are not enough on their own, since research experiments often outpace, or even diverge from, contemporary community standards. The essays in this collection are meant to fill that gap.
7 |
8 | **This website is Janelia’s bioimaging developers’ manifesto.** It is a collection of essays written by developers, for developers, on the advanced technical challenges they’ve encountered. It records the choices Janelia’s developers have made when encountering exotic data, so that those encountering similar situations can make consistent choices.
9 |
10 | ## Style
11 |
12 | The articles in this collection will be vetted, and their conclusions authoritative, for Janelia’s purposes. Where swift, unambiguous decisions are needed, such decisions will be made. The rationale behind those decisions will be explained, and they will become standard practice at Janelia. Contributors are encouraged to remain pragmatic, to describe their use cases, and to share their example data. That being said, where applicable, developers should speak to the abstract design principles that drove their choices.
13 |
14 | ## Contributing
15 |
16 | This project aims to develop conventions that Janelians need to do their work, and to disseminate those conventions across Janelia. It is not this group’s goal to create a comprehensive textbook, nor to create an international standard. However, as this project matures, contributions to and from the community may be considered. Individuals outside of Janelia who are interested in writing an article should create a GitHub issue to explore this possibility before investing time in it. It is this group’s hope that the rapidly evolving conventions developed here may ultimately, gradually, be considered for incorporation into the OME-NGFF standard as well.
17 |
18 | ## Structure
19 |
20 | This effort is in its infancy. Ultimately, the project is expected to consist of four components:
21 |
22 | - Written articles.
23 | - An accessible and easily readable website that hosts the articles.
24 | - A glossary and/or thesaurus.
25 | - A directory of example data that the public can view and browse.
26 |
27 | We appreciate the community’s interest in this social experiment.
28 |
29 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Janelia Data Standards
2 | [https://datastandards.janelia.org](https://datastandards.janelia.org)
3 |
4 | ## Purpose
5 |
6 | The Janelia Data Standards group was formed by bioimaging developers who have encountered specific, practical bioimaging data dilemmas for which there is little or no guidance from international community standards. Janelia is excited about international standardization efforts, particularly OME-NGFF. Janelia is contributing to OME-NGFF and is rooting for its success. However, such efforts are not enough on their own, since research experiments often outpace, or even diverge from, contemporary community standards. The essays in this collection are meant to fill that gap.
7 |
8 | **This website is Janelia’s bioimaging developers’ manifesto.** It is a collection of essays written by developers, for developers, on the advanced technical challenges they’ve encountered. It records the choices Janelia’s developers have made when encountering exotic data, so that those encountering similar situations can make consistent choices.
9 |
10 | ## Style
11 |
12 | The articles in this collection will be vetted, and their conclusions authoritative, for Janelia’s purposes. Where swift, unambiguous decisions are needed, such decisions will be made. The rationale behind those decisions will be explained, and they will become standard practice at Janelia. Contributors are encouraged to remain pragmatic, to describe their use cases, and to share their example data. That being said, where applicable, developers should speak to the abstract design principles that drove their choices.
13 |
14 | ## Contributing
15 |
16 | This project aims to develop conventions that Janelians need to do their work, and to disseminate those conventions across Janelia. It is not this group’s goal to create a comprehensive textbook, nor to create an international standard. However, as this project matures, contributions to and from the community may be considered. Individuals outside of Janelia who are interested in writing an article should create a GitHub issue to explore this possibility before investing time in it. It is this group’s hope that the rapidly evolving conventions developed here may ultimately, gradually, be considered for incorporation into the OME-NGFF standard as well.
17 |
18 | ## Structure
19 |
20 | This effort is in its infancy. Ultimately, the project is expected to consist of four components:
21 |
22 | - Written articles.
23 | - An accessible and easily readable website that hosts the articles.
24 | - A glossary and/or thesaurus.
25 | - A directory of example data that the public can view and browse.
26 |
27 | We appreciate the community’s interest in this social experiment.
28 |
29 |
--------------------------------------------------------------------------------
/posts/title-metadata.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | $if(by-affiliation/first)$
4 |
27 | $endif$
28 |
29 |
116 |
117 | $if(abstract)$
118 |
119 |
120 |
$labels.abstract$
121 | $abstract$
122 |
123 |
124 | $endif$
125 |
126 | $if(keywords)$
127 |
128 |
129 |
$labels.keywords$
130 |
$for(keywords)$$it$$sep$, $endfor$
131 |
132 |
133 | $endif$
134 |
135 |
--------------------------------------------------------------------------------
/docs/site_libs/quarto-html/quarto-syntax-highlighting-dark-8ea72dc5fed832574809a9c94082fbbb.css:
--------------------------------------------------------------------------------
1 | /* quarto syntax highlight colors */
2 | :root {
3 | --quarto-hl-al-color: #f07178;
4 | --quarto-hl-an-color: #d4d0ab;
5 | --quarto-hl-at-color: #00e0e0;
6 | --quarto-hl-bn-color: #d4d0ab;
7 | --quarto-hl-bu-color: #abe338;
8 | --quarto-hl-ch-color: #abe338;
9 | --quarto-hl-co-color: #f8f8f2;
10 | --quarto-hl-cv-color: #ffd700;
11 | --quarto-hl-cn-color: #ffd700;
12 | --quarto-hl-cf-color: #ffa07a;
13 | --quarto-hl-dt-color: #ffa07a;
14 | --quarto-hl-dv-color: #d4d0ab;
15 | --quarto-hl-do-color: #f8f8f2;
16 | --quarto-hl-er-color: #f07178;
17 | --quarto-hl-ex-color: #00e0e0;
18 | --quarto-hl-fl-color: #d4d0ab;
19 | --quarto-hl-fu-color: #ffa07a;
20 | --quarto-hl-im-color: #abe338;
21 | --quarto-hl-in-color: #d4d0ab;
22 | --quarto-hl-kw-color: #ffa07a;
23 | --quarto-hl-op-color: #ffa07a;
24 | --quarto-hl-ot-color: #00e0e0;
25 | --quarto-hl-pp-color: #dcc6e0;
26 | --quarto-hl-re-color: #00e0e0;
27 | --quarto-hl-sc-color: #abe338;
28 | --quarto-hl-ss-color: #abe338;
29 | --quarto-hl-st-color: #abe338;
30 | --quarto-hl-va-color: #00e0e0;
31 | --quarto-hl-vs-color: #abe338;
32 | --quarto-hl-wa-color: #dcc6e0;
33 | }
34 |
35 | /* other quarto variables */
36 | :root {
37 | --quarto-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
38 | }
39 |
40 | code span.al {
41 | background-color: #2a0f15;
42 | font-weight: bold;
43 | color: #f07178;
44 | }
45 |
46 | code span.an {
47 | color: #d4d0ab;
48 | }
49 |
50 | code span.at {
51 | color: #00e0e0;
52 | }
53 |
54 | code span.bn {
55 | color: #d4d0ab;
56 | }
57 |
58 | code span.bu {
59 | color: #abe338;
60 | }
61 |
62 | code span.ch {
63 | color: #abe338;
64 | }
65 |
66 | code span.co {
67 | font-style: italic;
68 | color: #f8f8f2;
69 | }
70 |
71 | code span.cv {
72 | color: #ffd700;
73 | }
74 |
75 | code span.cn {
76 | color: #ffd700;
77 | }
78 |
79 | code span.cf {
80 | font-weight: bold;
81 | color: #ffa07a;
82 | }
83 |
84 | code span.dt {
85 | color: #ffa07a;
86 | }
87 |
88 | code span.dv {
89 | color: #d4d0ab;
90 | }
91 |
92 | code span.do {
93 | color: #f8f8f2;
94 | }
95 |
96 | code span.er {
97 | color: #f07178;
98 | text-decoration: underline;
99 | }
100 |
101 | code span.ex {
102 | font-weight: bold;
103 | color: #00e0e0;
104 | }
105 |
106 | code span.fl {
107 | color: #d4d0ab;
108 | }
109 |
110 | code span.fu {
111 | color: #ffa07a;
112 | }
113 |
114 | code span.im {
115 | color: #abe338;
116 | }
117 |
118 | code span.in {
119 | color: #d4d0ab;
120 | }
121 |
122 | code span.kw {
123 | font-weight: bold;
124 | color: #ffa07a;
125 | }
126 |
127 | pre > code.sourceCode > span {
128 | color: #f8f8f2;
129 | }
130 |
131 | code span {
132 | color: #f8f8f2;
133 | }
134 |
135 | code.sourceCode > span {
136 | color: #f8f8f2;
137 | }
138 |
139 | div.sourceCode,
140 | div.sourceCode pre.sourceCode {
141 | color: #f8f8f2;
142 | }
143 |
144 | code span.op {
145 | color: #ffa07a;
146 | }
147 |
148 | code span.ot {
149 | color: #00e0e0;
150 | }
151 |
152 | code span.pp {
153 | color: #dcc6e0;
154 | }
155 |
156 | code span.re {
157 | background-color: #f8f8f2;
158 | color: #00e0e0;
159 | }
160 |
161 | code span.sc {
162 | color: #abe338;
163 | }
164 |
165 | code span.ss {
166 | color: #abe338;
167 | }
168 |
169 | code span.st {
170 | color: #abe338;
171 | }
172 |
173 | code span.va {
174 | color: #00e0e0;
175 | }
176 |
177 | code span.vs {
178 | color: #abe338;
179 | }
180 |
181 | code span.wa {
182 | color: #dcc6e0;
183 | }
184 |
185 | .prevent-inlining {
186 | content: "";
187 | }
188 |
189 | /*# sourceMappingURL=74548907a2aa49896d8738a3cf5e8124.css.map */
190 |
--------------------------------------------------------------------------------
/docs/site_libs/quarto-html/quarto-syntax-highlighting-549806ee2085284f45b00abea8c6df48.css:
--------------------------------------------------------------------------------
1 | /* quarto syntax highlight colors */
2 | :root {
3 | --quarto-hl-ot-color: #003B4F;
4 | --quarto-hl-at-color: #657422;
5 | --quarto-hl-ss-color: #20794D;
6 | --quarto-hl-an-color: #5E5E5E;
7 | --quarto-hl-fu-color: #4758AB;
8 | --quarto-hl-st-color: #20794D;
9 | --quarto-hl-cf-color: #003B4F;
10 | --quarto-hl-op-color: #5E5E5E;
11 | --quarto-hl-er-color: #AD0000;
12 | --quarto-hl-bn-color: #AD0000;
13 | --quarto-hl-al-color: #AD0000;
14 | --quarto-hl-va-color: #111111;
15 | --quarto-hl-bu-color: inherit;
16 | --quarto-hl-ex-color: inherit;
17 | --quarto-hl-pp-color: #AD0000;
18 | --quarto-hl-in-color: #5E5E5E;
19 | --quarto-hl-vs-color: #20794D;
20 | --quarto-hl-wa-color: #5E5E5E;
21 | --quarto-hl-do-color: #5E5E5E;
22 | --quarto-hl-im-color: #00769E;
23 | --quarto-hl-ch-color: #20794D;
24 | --quarto-hl-dt-color: #AD0000;
25 | --quarto-hl-fl-color: #AD0000;
26 | --quarto-hl-co-color: #5E5E5E;
27 | --quarto-hl-cv-color: #5E5E5E;
28 | --quarto-hl-cn-color: #8f5902;
29 | --quarto-hl-sc-color: #5E5E5E;
30 | --quarto-hl-dv-color: #AD0000;
31 | --quarto-hl-kw-color: #003B4F;
32 | }
33 |
34 | /* other quarto variables */
35 | :root {
36 | --quarto-font-monospace: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
37 | }
38 |
39 | pre > code.sourceCode > span {
40 | color: #003B4F;
41 | }
42 |
43 | code span {
44 | color: #003B4F;
45 | }
46 |
47 | code.sourceCode > span {
48 | color: #003B4F;
49 | }
50 |
51 | div.sourceCode,
52 | div.sourceCode pre.sourceCode {
53 | color: #003B4F;
54 | }
55 |
56 | code span.ot {
57 | color: #003B4F;
58 | font-style: inherit;
59 | }
60 |
61 | code span.at {
62 | color: #657422;
63 | font-style: inherit;
64 | }
65 |
66 | code span.ss {
67 | color: #20794D;
68 | font-style: inherit;
69 | }
70 |
71 | code span.an {
72 | color: #5E5E5E;
73 | font-style: inherit;
74 | }
75 |
76 | code span.fu {
77 | color: #4758AB;
78 | font-style: inherit;
79 | }
80 |
81 | code span.st {
82 | color: #20794D;
83 | font-style: inherit;
84 | }
85 |
86 | code span.cf {
87 | color: #003B4F;
88 | font-weight: bold;
89 | font-style: inherit;
90 | }
91 |
92 | code span.op {
93 | color: #5E5E5E;
94 | font-style: inherit;
95 | }
96 |
97 | code span.er {
98 | color: #AD0000;
99 | font-style: inherit;
100 | }
101 |
102 | code span.bn {
103 | color: #AD0000;
104 | font-style: inherit;
105 | }
106 |
107 | code span.al {
108 | color: #AD0000;
109 | font-style: inherit;
110 | }
111 |
112 | code span.va {
113 | color: #111111;
114 | font-style: inherit;
115 | }
116 |
117 | code span.bu {
118 | font-style: inherit;
119 | }
120 |
121 | code span.ex {
122 | font-style: inherit;
123 | }
124 |
125 | code span.pp {
126 | color: #AD0000;
127 | font-style: inherit;
128 | }
129 |
130 | code span.in {
131 | color: #5E5E5E;
132 | font-style: inherit;
133 | }
134 |
135 | code span.vs {
136 | color: #20794D;
137 | font-style: inherit;
138 | }
139 |
140 | code span.wa {
141 | color: #5E5E5E;
142 | font-style: italic;
143 | }
144 |
145 | code span.do {
146 | color: #5E5E5E;
147 | font-style: italic;
148 | }
149 |
150 | code span.im {
151 | color: #00769E;
152 | font-style: inherit;
153 | }
154 |
155 | code span.ch {
156 | color: #20794D;
157 | font-style: inherit;
158 | }
159 |
160 | code span.dt {
161 | color: #AD0000;
162 | font-style: inherit;
163 | }
164 |
165 | code span.fl {
166 | color: #AD0000;
167 | font-style: inherit;
168 | }
169 |
170 | code span.co {
171 | color: #5E5E5E;
172 | font-style: inherit;
173 | }
174 |
175 | code span.cv {
176 | color: #5E5E5E;
177 | font-style: italic;
178 | }
179 |
180 | code span.cn {
181 | color: #8f5902;
182 | font-style: inherit;
183 | }
184 |
185 | code span.sc {
186 | color: #5E5E5E;
187 | font-style: inherit;
188 | }
189 |
190 | code span.dv {
191 | color: #AD0000;
192 | font-style: inherit;
193 | }
194 |
195 | code span.kw {
196 | color: #003B4F;
197 | font-weight: bold;
198 | font-style: inherit;
199 | }
200 |
201 | .prevent-inlining {
202 | content: "";
203 | }
204 |
205 | /*# sourceMappingURL=ae99138f4fbc2fb4c9f5e5cd69c22459.css.map */
206 |
--------------------------------------------------------------------------------
/definitions.qmd:
--------------------------------------------------------------------------------
1 | # Definitions
2 | Please note that this is a living document. Definitions are preliminary and subject to change.
3 |
4 | - Basic definitions
5 | - [array](#array)
6 | - [image](#image)
7 | - [pixel](#pixel)
8 | - [sample](#sample)
9 | - [voxel](#voxel)
10 | - Other definitions
11 | - [axis](#axis)
12 | - [bit-depth](#bit-depth)
13 | - [domain](#domain)
14 | - [downsampling](#downsampling)
15 | - [field of view](#field-of-view)
16 | - [filtering](#filtering)
17 | - [group](#group)
18 | - [hierarchy](#hierarchy)
19 | - [interpolation](#interpolation)
20 | - [origin](#origin)
21 | - [physical](#physical)
22 | - [quantization](#quantization)
23 | - [resampling](#resampling)
24 | - [resolution](#resolution)
25 |
26 |
27 | ## Basic definitions
28 |
29 | ### array
30 | An n-dimensional collection of discrete samples whose domain is a regular discrete (integer) grid.
31 |
32 | Related terms: [sample](#sample), [image](#image), [hierarchy](#hierarchy)
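
As a minimal illustration (assuming NumPy, which is not mandated by this glossary), a 2D array holds discrete samples addressed by integer grid indices:

```python
import numpy as np

# A 2D array: discrete samples indexed on a regular integer grid.
arr = np.zeros((3, 4), dtype=np.uint8)
arr[1, 2] = 7        # sample at grid location (row 1, column 2)
print(arr.shape)     # (3, 4)
print(arr[1, 2])     # 7
```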
33 |
34 | ### image
35 | A static set of coherent visual information. For our purposes, ‘image’ and ‘digital image’ may be used interchangeably.
36 |
37 | Images can be represented in compact forms, for example as a compressed sequence of bytes or as a discrete function over a finite domain. For our purposes, the word ‘image’ by itself refers to raster images produced by displaying arrays and array-like data structures on a screen. Unless otherwise specified, this rastering occurs at regular equispaced intervals, the pixel pitch.
38 |
39 | An image is an abstract notion distinct from its representation, e.g. a discrete digital array. Colloquially, 'array' and 'image' are often used interchangeably. However, a rigorous technical definition separates the two, so that an image (a static set of coherent visual information) may be constituted by several arrays, for example.
40 |
41 | Related terms: [array](#array), [sample](#sample), [pixel](#pixel), [voxel](#voxel), [axis](#axis), [dimension](#dimension)
42 |
43 | ### pixel
44 | A single sample of a two-dimensional image.
45 |
46 | Related terms: [sample](#sample), [voxel](#voxel)
47 |
48 | ### sample
49 | A digital number representing a measurement of the energy sensed by a particular cell on a sensor at a discrete point in time. Because cells on a sensor correspond to elements of an array and pixels of an image, sample is often used interchangeably with pixel.
50 |
51 | Related terms: [pixel](#pixel), [voxel](#voxel), [image](#image), [array](#array)
52 |
53 | ### voxel
54 | A single sample of a three-dimensional image.
55 |
56 | Related terms: [sample](#sample), [pixel](#pixel)
57 |
58 | ## Other definitions
59 |
60 | ### axis
61 | The physical interpretation of a discrete, numeric, finite dimension. Generally represented with a 1D variable that is strictly monotonic and has the same name as the axis it represents. An axis must have physical units.
62 |
63 | ### bit-depth
64 | The number of bits used in the quantization of a digital image, which defines the number of unique values that can be represented by samples. For example, samples of images with a bit depth of 8 ("8-bit images") can take up to 256 unique values.
65 |
66 | Related terms: [quantization](#quantization)
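
The relationship between bit depth and the number of representable values can be sketched in a line of Python (the helper name is illustrative):

```python
# Number of unique sample values representable at a given bit depth.
def n_levels(bit_depth: int) -> int:
    return 2 ** bit_depth

print(n_levels(8))    # 256
print(n_levels(16))   # 65536
```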
67 |
68 | ### dimension
69 | An independent extent of a domain. A domain has $N$ dimensions where $N$ is the minimum number of coordinates needed to identify any particular point within the domain. The length of a discrete, numeric, finite dimension establishes the number of indexable locations along that dimension.
70 |
71 | ### domain
72 | A set of discrete locations in abstract space. A domain, or any location within a domain, may be described by multiple variables, but any given variable has only one domain. A domain has zero or more dimensions. The component dimensions of a domain need not be numeric, but when they are, the domain may be thought of as situated in a coordinate space. If a domain's dimensions are all axes, then that domain is situated in a physical space.
73 |
74 | ### downsampling
75 | The act of resampling an image to a lower sample density (higher pixel spacing), often by an integer factor.
76 | Sometimes this can require interpolation.
77 |
78 | Related terms: [resampling](#resampling), [resolution](#resolution), [interpolation](#interpolation)
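
A minimal sketch of integer-factor downsampling by subsampling (assuming NumPy; real pipelines typically filter before subsampling to avoid aliasing):

```python
import numpy as np

img = np.arange(16).reshape(4, 4)
# Downsample by an integer factor of 2 by keeping every second sample
# along each axis (plain subsampling, no interpolation or filtering).
ds = img[::2, ::2]
print(ds.shape)   # (2, 2)
```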
79 |
80 | ### field of view
81 | The physical extent of the observed space. In microscopy, FOV may be expressed as the diameter of the circular view seen through the eyepiece. In scientific bioimaging, FOV is typically expressed as the horizontal, vertical, and/or diagonal extent of the space captured by the digital sensor. For example, the FOV for a 2D image may be $44mm$ by $22mm$, where $44mm$ is the width and $22mm$ is the height of the observed space.
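
The arithmetic is simply sample count times sample spacing along each axis. A sketch with hypothetical values chosen to roughly reproduce the $44mm$ by $22mm$ example:

```python
# FOV = number of samples x sample spacing, per axis.
# These particular values are hypothetical.
n_samples = (2048, 1024)         # width, height in samples
spacing_mm = (0.0215, 0.0215)    # pixel pitch in mm

fov_mm = tuple(n * s for n, s in zip(n_samples, spacing_mm))
print(fov_mm)   # approximately (44.0, 22.0) mm
```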
82 |
83 | ### filtering
84 | 1. Usually refers to a convolution operation (a local, linear operation on the intensity values of an image).
85 | 2. Any operation that modifies the intensity values of an image.
86 |
87 | ### group
88 | See [hierarchy](#hierarchy).
89 |
90 | ### hierarchy
91 | A collection of nodes, connected in a tree-like structure.
92 | A node can be either:
93 | 1. A group, i.e., a node that can have child nodes, and can contain metadata, but cannot contain array data.
94 | 2. An array. Array nodes cannot have child nodes.
95 |
96 | Related terms: [group](#group), [array](#array)
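
A toy sketch of such a tree using plain Python dicts (illustrative only; real hierarchical stores such as Zarr or HDF5 have richer APIs): groups carry metadata and children, arrays are leaves.

```python
# Groups may hold metadata and child nodes; arrays are leaf nodes.
hierarchy = {
    "type": "group",
    "metadata": {"description": "root"},
    "children": {
        "images": {
            "type": "group",
            "metadata": {},
            "children": {
                "raw": {"type": "array", "shape": (512, 512)},
            },
        },
    },
}

raw = hierarchy["children"]["images"]["children"]["raw"]
print(raw["shape"])   # (512, 512)
```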
97 |
98 | ### interpolation
99 | A process that, given an image, produces new samples at points in the domain not on the discrete image grid.
100 |
101 | The most common methods for interpolation are 'nearest-neighbor', 'bi-/tri-/n-linear', 'cubic', and 'windowed sinc'.
102 |
103 | Related terms: [resampling](#resampling), [downsampling](#downsampling)
104 |
105 | ### origin
106 | A special location that acts as a reference point, relative to which other locations are defined. Unless otherwise specified, the image's origin is the same as the array's origin (assuming the image is produced from an array). An array's origin is typically the point in the discrete domain with the minimum index (usually zero) for all dimensions. Physical or anatomical spaces can also have origins; for example, in MR imaging, the anterior/posterior commissure is commonly regarded as an origin for the brain.
107 |
108 | The term 'offset' is sometimes used to refer to the origin.
109 |
110 | ### physical
111 | Relating to quantities or measurements of the real world.
112 |
113 | Examples:
114 |
115 | * sample intensities measured by a physical sensor
116 | + photon count
117 | + [Hounsfield unit](https://en.wikipedia.org/wiki/Hounsfield_scale)
118 | * distances / areas / volumes / times measured in images in physical units ($\mu m$, $mm$, seconds)
119 | + "the area of segment $A$ is $12 mm^2$"
120 | + "mitosis begins at time = $3.2 s$"
121 |
122 | Non-examples:
123 |
124 | * sample intensities not derived from sensors
125 | + segmentation id
126 | + the output of a deep neural network model
127 | * distances / areas / volumes / times described by sample / array indexes
128 | + "the area of segment $B$ is $85$ pixels"
129 | + "mitosis begins at frame $51$"
130 |
131 | ### quantization
132 | A process that converts a physical or continuous value to a digital representation with a particular precision. Samples of a quantized image can take one of a finite set of values defined by its bit depth.
133 |
134 | Related terms: [bit-depth](#bit-depth)
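
A sketch of quantizing continuous values in $[0, 1]$ to an 8-bit representation (assuming NumPy; the scaling scheme is one common choice, not the only one):

```python
import numpy as np

signal = np.array([0.0, 0.25, 0.5, 1.0])   # continuous values in [0, 1]
# Quantize to 8 bits: scale to [0, 255], round, and store as uint8.
quantized = np.round(signal * 255).astype(np.uint8)
print(quantized)   # [  0  64 128 255]
```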
135 |
136 | ### resampling
137 | A process that generates a new array representing an image at a new resolution.
138 |
139 | The new resolution is often an integer multiple or fraction of the original image resolution, but need not be. Resampling methods often
140 | consist of filtering and interpolation steps.
141 |
142 | Related terms: [downsampling](#downsampling), [interpolation](#interpolation), [resolution](#resolution)
143 |
144 | ### resolution
145 |
146 | The smallest difference in signal quantity that can be discriminated by a device or system. In the case of analog to digital signal conversion, resolution is determined by the number of bits used to represent the signal.
147 |
148 | Colloquial uses of the term ‘resolution’ include the total number of samples in each dimension of an image (e.g., 640 by 480), and the set of spatial sampling intervals for an image (e.g. ‘spacing’, ‘pixel spacing’, or ‘pixel resolution’). For clarity, it may be prudent to avoid using the term ‘resolution’ to describe either the number of samples in an image or the spacing between samples.
149 |
150 | Related terms: [bit-depth](#bit-depth), [resampling](#resampling)
--------------------------------------------------------------------------------
/REVIEW_PROCESS.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | Articles posted to the Janelia Data Standards website should be vetted, authoritative, well-considered, and of high stylistic and technical quality. For this reason, each new article will undergo a review, using GitHub’s PR and review tooling. This process is intended to help the author refine their ideas.
4 |
5 | # Review process
6 |
7 | Each new article must be approved by at least two reviewers. Reviewers must complete all *three* sections of the template below. Reviewers are selected by the author. The author may select more than two reviewers, if desired, but only the approval of two reviewers is needed. The review *may* consist of in-line comments, but it *must* include the template.
8 |
9 | Anybody who wishes to give feedback on the PR may do so, and the author is encouraged to consider all feedback thoughtfully, whether it comes from casual commenters or official reviewers. There is no deadline between initial submission of the PR and assignment of reviewers. If the author wishes, they may submit the PR, wait a while, see who comments on it, and then invite the commenters to review.
10 |
11 | Each reviewer must respond to each of the three big questions on the template, but there is no length requirement for the responses. Reviewers are encouraged to guide their thinking with the provided sub-questions. Reviewers have **three weeks** to complete their review. There is no minimum time frame for which the PR must be open--the review process could be done in a day if the author and reviewers act promptly. If a reviewer takes more than three weeks, it is up to the author’s discretion whether they wish to give the tardy reviewer more time, or select a new reviewer.
12 |
13 | If either reviewer requests changes, the author should not spend more than three weeks addressing reviewer feedback. This is a soft deadline, as some reviews may require intensive work. The author should address every reviewer comment, but need not implement all of them. In other words, if the author does not want to implement a recommended change, they should explain why. Ultimately, all that matters is that the template is filled out and that two reviewers approve the article. Reviewers are encouraged to be reasonable and to understand that not all of their comments will be heeded.
14 |
15 | If an article fails to achieve the approval of two reviewers in a reasonable time frame, the PR will be closed. (Who decides this, and how, will be played by ear.) The author may wish to re-submit a modified version of the article. Authors are discouraged from simply re-submitting the same article with more agreeable reviewers. In the event of disagreement on a data standard, the disagreeing parties are encouraged to meet—preferably over food and drink—and have one or more in-person discussions to try and achieve consensus. If consensus is simply impossible, the disagreeing parties should each write an article arguing for their position. The two articles should link to each other on the website, and should respond to each other’s ideas.
16 |
17 |
18 | Here is a diagram illustrating the review process.
19 |
20 | 
21 |
22 | ---
23 |
24 | # Review template
25 |
26 | The questions below do not need to be answered line-by-line. They are not a checklist; rather, they are meant to guide the reviewer in their subjective evaluation of the article's quality. It is up to the reviewer to decide how thoroughly they wish to address each section.
27 |
28 | 
29 |
30 | ## 1. The Big Picture
31 |
32 | Is this article appropriate for the Janelia Data Standards project?
33 | 1. Yes
34 | 2. No
35 | 3. Comments: ...
36 |
37 | Is the general premise of the article sound?
38 | 1. Yes
39 | 2. No
40 | 3. Comments: ...
41 |
42 | Is the proposed data standard generalizable beyond the author’s particular use case?
43 | 1. Yes
44 | 2. No
45 | 3. Comments: ...
46 |
47 | ## 2. Technical Choices
48 |
49 | Is the proposed standard elegant, straightforward, and focused?
50 | 1. Yes
51 | 2. No
52 | 3. Comments: ...
53 |
54 | Does the author adequately explain the rationale behind the standard?
55 | 1. Yes
56 | 2. No
57 | 3. Comments: ...
58 |
59 | Is the author making any flawed implicit assumptions, either about the problem or about their audience?
60 | 1. Yes
61 | 2. No
62 | 3. Comments: ...
63 |
64 | Does the author provide an implementation and/or example data (preferable but not required)?
65 | 1. Yes
66 | 2. No
67 | 3. N/A
68 | 4. Comments: ...
69 |
70 | ## 3. Writing Style
71 |
72 | Is the post readable to bio-imaging developers who may come from a different sub-field, or code in a different language?
73 | 1. Yes
74 | 2. No
75 | 3. Comments: ...
76 |
77 | Does it contain any typos or awkward sentences?
78 | 1. Yes
79 | 2. No
80 | 3. Comments: ...
81 |
82 | Are the ideas organized in a logical flow?
83 | 1. Yes
84 | 2. No
85 | 3. Comments: ...
86 |
87 | ---
88 |
89 | # Modifying existing articles
90 |
91 | Ideas and standards evolve over time, so some articles will need to be modified after publication. The modification history will be noted clearly on the post itself, so that no modification will be silent.
92 |
93 | If the author is making a *minor* modification to their own article, they may skip the review process. A minor modification does not change the audience or the use case for the standard, and poses no danger of rendering existing implementations of the standard obsolete.
94 |
95 | If the author is making a moderate or significant modification to their own published article, they must submit a new PR, and the PR will undergo the same review process described above.
96 |
97 | The line between ‘minor’ and ‘moderate’ modifications will be fuzzy. For example, altering a standard to be more permissive may not pose a danger of rendering existing implementations obsolete, but the author may wish to submit it for review anyway. These judgment calls are up to the author.
98 |
99 | Anyone may modify *any* article, even one they didn’t write, subject to the following condition: **any modification—even a minor one—requires the modifier to reach out to the original author.** Beyond that, the procedure then depends on the size of the modification:
100 |
101 | - Minor modifications only require a quick assent from the original author. If person A wrote the original article, and person B wishes to make a minor modification, then person B should reach out to person A. If person A replies, ‘go ahead’, then no further review is necessary.
102 | - If the original author cannot be reached, then minor modifications can be made with no review.
103 | - Moderate or significant modifications require formal review. In these cases, the modifier must make a good-faith effort to invite the original author to be a reviewer. If the original author wants to review, then the new author must select them as a reviewer. If the original author can’t be reached, then the article can be modified without their consent, with the usual review process.
104 |
105 | ---
106 |
107 | # Implementation in GitHub
108 |
109 | When the author is ready to submit their PR for review, they must use GitHub's [Request Review](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/requesting-a-pull-request-review) feature to assign TWO reviewers to the PR. Some instructions for reviewers are available [here](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request).
110 |
111 | On GitHub, a review has three possible statuses (or no status, e.g. pending reviewer response):
112 | - Comment: Submit general feedback without explicitly approving the changes or requesting additional changes.
113 | - Approve: Submit feedback and approve merging the changes proposed in the pull request.
114 | - Request changes: Submit feedback that must be addressed before the pull request can be merged.
115 |
116 | Reviews-in-progress should have no status or "Comment" status. A status of "Approve" or "Request Changes" indicates that the reviewer has made their choice, and the status must not be changed after that. Reviewers have at most three weeks to select either "Approve" or "Request Changes".
117 |
118 | If either reviewer requests changes, the author should implement those changes as described above, and then [re-request review](https://i.sstatic.net/H2XaO.gif).
119 |
120 | Note: All authors and reviewers should have write access to the repository. Write access is required to assign reviewers, and while GitHub does not require write access to leave reviews, a merge can proceed without approval from a read-only reviewer. For this reason, DSG authors and reviewers should request write permissions from the repository owner(s).
121 |
122 | ---
123 |
124 | # Attribution of Roles
125 |
126 | Contributions to the Data Standards project can be categorized in five roles:
127 | * Author(s) - Wrote the article or made a major modification to the article
128 | * Reviewer(s) - Reviewed the article or reviewed a major modification to the article
129 | * Contributor(s) - Commented on a draft of the article or provided minor modifications
130 | * Endorser(s) - Agree with the article's recommendations
131 | * Maintainer - Party to whom correspondence should be addressed
132 |
133 | People who contributed in each of these roles should be added to the post metadata using the template provided. (See previous posts for examples.) Endorsers may be added at any time, and endorsers should feel free to add themselves with no review process.
134 |
--------------------------------------------------------------------------------
/posts/file_formats_introduction.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Introduction to Microscopy File Formats"
3 | description: "An opinionated overview of file formats used in microscopy."
4 | author:
5 | - name: "Virginia Scarlett"
6 | contributors: [
7 | "Mark Kittisopikul",
8 | "John Bogovic"
9 | ]
10 | reviewers: [
11 | "Mark Kittisopikul",
12 | "John Bogovic",
13 | "Stephan Preibisch"
14 | ]
15 | maintainer:
16 | - name: ""
17 | endorsers: []
18 | date: "01/29/2025"
19 | format:
20 | html:
21 | template-partials:
22 | - title-metadata.html
23 | ---
24 |
25 | ## Outline
26 |
27 | - Introduction
28 | - [Ease of use](#ease-of-use)
29 | - [Scalability](#scalability)
30 | - [FAIRness](#fairness)
31 | - Ultra base formats
32 | - [Binary/Hard to classify](#binaryhard-to-classify)
33 | - [Text](#text)
34 | - Base formats
35 | - [XML and JSON](#xml-and-json)
36 | - [TIFF](#tiff)
37 | - [HDF5](#hdf5)
38 | - [Zarr](#zarr)
39 | - [N5](#n5)
40 | - [TileDB](#tiledb)
41 | - General-purpose formats
42 | - [OME-TIFF](#ome-tiff)
43 | - [OME-NGFF](#ome-ngff)
44 | - [Proprietary](#proprietary)
45 | - [BIDS](#bids)
46 | - Specialty file formats
47 | - [BigDataViewer](#bigdataviewer)
48 | - [H5J](#h5j)
49 |
50 |
51 |
52 | # Introduction
53 |
54 | A biology professor of mine used to say, "All biologists are visual learners. We need to see the cells or the molecules with our own eyes." It's no surprise, then, that biologists excel at imagining new ways to harness microscopy, which brings the biological world into view. However, the excited bench biologist may be in for a shock when they find their microscope acquiring gigantic images that take up half their hard drive and crash their software. When this happens, the visual biologist is suddenly thrust into an abstract world: the bits-and-bytes world of microscopy data management.
55 |
56 | Choosing the right file format for your microscopy project can be overwhelming. There are hundreds of microscopy file formats out there. The [Bio-Formats](http://www.openmicroscopy.org/bio-formats/) interoperability project supports 160 of them. Why are there so many formats? Trade-offs between size and speed are a big reason, as is the broad landscape of proprietary vendors and open-source developers. Luckily, you don’t need to weigh all the options yourself. You can use this guide to identify the few file formats that are likely to be useful for you.
57 |
58 | When choosing a file format to work with, we recommend considering three factors: ease of use, scalability, and FAIRness. It's unlikely that any one format will excel perfectly in all these areas. Rather, you should pick a format that meets your specific needs.
59 |
60 | ### Ease of use
61 |
62 | The ease of use of a microscopy file format largely depends on the quality and availability of the tools designed to read and write it. An ideal tool should be simple to install (if installation is required at all), intuitive to learn, and capable of functioning with minimal or no programming -- though having the option to customize through code is a valuable bonus. If ease of use is your top priority, you only need one reliable tool that works seamlessly with your chosen format. Most people are biased toward tools they already know, and for good reason, but it's always worth considering learning new skills. [Moore et al. (2023)](https://doi.org/10.1007/s00418-023-02209-1) provides an excellent overview of popular tools for viewing and annotating scientific images.
63 |
64 | Your specific needs may require tools with specialized features not universally available. For instance, you might want to store points or regions of interest (ROIs) overlaid on the image, time points for time-series data, or details about transformations applied to the data. An in-depth comparison of these kinds of features is beyond the scope of this introduction, but please see the [metadata features chart](https://datastandards.janelia.org/metadata.html) we have created, which provides a quick comparison of features we know about for the file formats we've used.
65 |
66 | ### Scalability
67 |
68 | Scalability is closely related to ease of use. A scalable format is easy to use... at scale. Scalability encompasses storing, viewing, and sharing gigantic images.
69 |
70 | Historically, images were limited to non-chunked or 'monolithic' file formats. In these formats, an image is stored as a linear string of bytes in one file, row by row or plane by plane. If you want to view a local region of an image stored in this format, you must traverse many bytes that you do not need. Compression complicates matters further, since variable compressed-block sizes make the locations of the needed bytes hard to predict.
71 |
72 | The scalability of monolithic formats may be improved by storing multiple copies of the same 2D image at different resolutions in the same file. This type of file is often referred to as a "multiscale pyramid". The different resolutions are generated through a technique called downsampling, in which data are typically either uniformly discarded, or in which subsets of pixels are averaged to produce a new, smaller image. Storing an image as a multiscale pyramid means that the software only needs to load into memory the lowest-resolution plane that suffices for the task at hand. However, even though that plane is smaller, it must still be loaded into memory in its entirety.
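
As a rough sketch of the averaging approach (using NumPy, with function names of our own choosing), a pyramid can be built by repeated 2x block averaging:

```python
import numpy as np

def downsample_2x(plane):
    """Average each 2x2 block of pixels into one, halving both dimensions."""
    h, w = plane.shape
    # Trim any odd row/column, then group pixels into 2x2 blocks and average.
    trimmed = plane[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(plane, levels):
    """Return the original plane plus `levels` successively downsampled copies."""
    pyramid = [plane]
    for _ in range(levels):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid

pyramid = build_pyramid(np.arange(64.0).reshape(8, 8), levels=2)
print([p.shape for p in pyramid])   # [(8, 8), (4, 4), (2, 2)]
```

Real formats store each of these levels alongside the original, so a viewer can pick the smallest one that fits the current zoom.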
73 |
74 | Another technique for improved scalability is chunking. Chunking involves breaking up an image into local regions called chunks. The images can be any number of dimensions, so it's easier to think of these data as n-dimensional arrays rather than 2D grids of pixels. The arrays can be broken up into uniform, but otherwise arbitrary chunk sizes that reflect the dimensionality of the array. A large 3D volume can be divided into small subvolumes, for example. The groups of chunks may be organized as a hierarchy of nested directories on a file system or web service, making it easy to locate the desired chunk. The metadata, which are usually in a dedicated file, contain information describing the layout of the chunks. Chunked formats can store more complex images than non-chunked formats, and can be viewed more efficiently, because software can retrieve small chunks of data as needed. Chunking and multiscale pyramids are not mutually exclusive; many file formats recommend both.
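
The bookkeeping a reader performs to locate a sample in a chunked array is simple integer arithmetic. A minimal sketch (the names are ours, not any particular format's API):

```python
def locate_chunk(coord, chunk_shape):
    """For an n-D array coordinate, return which chunk holds it and
    the coordinate's offset within that chunk."""
    chunk_index = tuple(c // s for c, s in zip(coord, chunk_shape))
    within_chunk = tuple(c % s for c, s in zip(coord, chunk_shape))
    return chunk_index, within_chunk

# A 3D volume divided into 64x64x64 subvolumes:
index, offset = locate_chunk((200, 31, 70), chunk_shape=(64, 64, 64))
print(index, offset)   # (3, 0, 1) (8, 31, 6)
```

Because the chunk index maps directly to a file or key in the hierarchy, software can fetch just the chunks covering the region of interest.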
75 |
76 | A final scalability mechanism is compression. While a detailed discussion of compression is beyond the scope of this article, it's worth noting that there are two broad categories of compression algorithms: lossy and lossless. Lossy compression algorithms discard small amounts of data, while lossless algorithms do not discard any data. A good rule of thumb is to use lossless compression for analysis, while lossy compression is usually fine for visualization.
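
As a small illustration of the lossless case, a round trip through the general-purpose DEFLATE codec in Python's standard library recovers the input exactly (chunked image formats typically wire codecs like this into their chunk pipelines):

```python
import zlib

data = bytes(range(256)) * 100          # 25,600 bytes of repetitive "image" data
compressed = zlib.compress(data, level=9)

# Lossless: decompression recovers every byte exactly.
restored = zlib.decompress(compressed)
print(len(compressed) < len(data), restored == data)   # True True
```

A lossy codec would shrink the data further, at the cost of `restored == data` no longer holding.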
77 |
78 | ### FAIRness
79 |
80 | Though not a wholly new idea, [Wilkinson et al. (2016)](https://doi.org/10.1038/sdata.2016.18) gave us a memorable, one-syllable word for best practice in research data management. FAIR stands for Findable, Accessible, Interoperable, and Reusable. All researchers should strive to make their data as FAIR as possible, though optimal FAIRness can be hard to achieve.
81 |
82 | FAIR data means the data can be viewed by anyone in the world relatively easily, and far into the future. FAIRness in microscopy mostly comes down to the I -- interoperability. Interoperability in microscopy is a huge challenge ([Nature Methods, 2021](https://doi.org/10.1038/s41592-021-01347-5)), but it's also a prerequisite for optimal FAIRness. We believe FAIR practice means relying on open-source tools and formats that are either led by the microscopy community, or are very receptive to feedback from the community. The Open Microscopy Environment is a long-time leader in this space, so we highlight OME's file formats below.
83 |
84 | Many scientists find themselves with image data that are too large to share easily, so they make the data available upon request, often via shipping a hard drive. For the busy scientist who doesn't have time to learn about chunked formats, databases, or object stores, or whose data aren't a great fit for existing repositories such as the [Image Data Resource](https://idr.openmicroscopy.org/), [EMPIAR](https://www.ebi.ac.uk/empiar/), the [DANDI Archive](https://dandiarchive.org/), or the [AWS Open Data](https://registry.opendata.aws/) registry, shipping hard drives is a reasonable option. However, it is not as FAIR as making the data accessible on the web. In the future, we are likely to see more big-data repositories based on chunked formats stored in the cloud.
85 |
86 | # Microscopy File Formats
87 |
88 | Below, we present an opinionated overview of some of the most common microscopy file formats.
89 |
90 | ## Ultra base file formats
91 |
92 | ### Binary/Hard to classify
93 |
94 | We use the term "binary format" generically to refer to file formats that can be easily read by computers. Files that defy classification into more standardized categories may simply be referred to as "binary". Most of the file formats described in other sections are, at least in part, binary in the sense that they have text-encoded and non-text components (e.g. metadata files and chunk files, respectively). By 'binary formats', we mean prescribed arrangements of bytes in a single file designed for specific applications.
95 |
96 | Two examples of binary formats we use at Janelia are Enhanced FIBSEM DAT ([Chris Barnes' DAT toolkit](https://github.com/clbarnes/jeiss-convert), [Janelia's DAT toolkit](https://github.com/janelia-cellmap/fibsem-tools)) and [Keller Lab Block (KLB)](https://doi.org/10.1038/nprot.2015.111) ([bitbucket repository](https://bitbucket.org/fernandoamat/keller-lab-block-filetype/src/master/)). These formats are not derived from any broader standard and can only be interpreted by dedicated software specifically written for them.
97 |
98 | Binary formats can be useful at acquisition, especially in live cell imaging or in fluorescence imaging, where speed is paramount. They may also provide storage solutions, for example, by allowing for the use of completely novel compression schemes.
99 |
100 | In our experience, these formats are usually not very FAIR. Often, they are only maintained by a handful of people who designed the format for a special use case. The Keller Lab has made significant efforts to enhance the interoperability of KLB by providing converters for platforms like ImageJ, MATLAB, and some proprietary libraries. While these efforts improve accessibility, we still recommend against publishing data in these specialized formats.
101 |
102 | A more widely adopted binary format is MRC ([CCP-EM](https://www.ccpem.ac.uk/mrc_format/mrc2014.php), [Cheng et al. 2015](https://doi.org/10.1016/j.jsb.2015.04.002)), designed for electron cryo-microscopy and tomography. The accompanying [python library](https://pypi.org/project/mrcfile/) is being actively maintained. However, the format and tooling are still maintained by a relatively small community. Multi-language support is lacking, and MRC does not offer the flexibility of formats that separate the storage backend from the user API (like Zarr or HDF5). This limited flexibility is a barrier to widespread adoption. We encourage users of the MRC format to consider using Zarr or HDF5-based formats instead, to promote convergence on a small number of technologies with robust development communities.
103 |
104 | ### Text
105 |
106 | While all files on a computer are ultimately binary, text-based files are binary files that can be decoded into human-readable text using character encodings. ASCII and Unicode are widely used standards for mapping bytes to letters and symbols. Unicode, which includes ASCII as a subset, supports a large range of characters across many languages. These encodings enable human-readable text to be represented as a binary sequence (0s and 1s) that computers can store and transmit. Text formats generally use encodings such as UTF-8, UTF-16, or ASCII to store textual data.
107 |
108 | In microscopy, the two main text formats are XML and JSON, discussed below. Another example worth mentioning is RDF (Resource Description Framework), which, although not strictly a text format, is often represented as text. RDF enables more descriptive relationships beyond simple key-value pairs, making it a powerful tool for modeling semantic data.
109 |
110 | For small amounts of data, storing the data as structured text is convenient. However, for large amounts of data, say, megabytes, text formats become unwieldy. This is why, in microscopy, text formats are usually reserved for metadata, while the image data themselves are stored in more space-efficient formats.
111 |
112 | ## Base file formats
113 |
114 | Next, we summarize some "base" formats that are commonly used in bioimaging and that we find noteworthy. These formats serve as generic storage technologies that have often been further refined to suit a particular application. This list covers a wide variety of use cases, but it is not exhaustive -- some applications might benefit from a format not listed here.
115 |
116 | ### XML and JSON
117 |
118 | XML (eXtensible Markup Language) and JSON (JavaScript Object Notation) are standardized text formats designed for exchanging data between programs. Both are human-readable and machine-readable with the appropriate libraries. XML is older and more verbose, and the tools for structuring and parsing it are more mature. JSON is newer and more readable, and its schema language, JSON Schema, is only moderately mature. XML is a bit more expressive, allowing for more complex hierarchies and dedicated namespaces. JSON is more lightweight and straightforward to parse.
119 |
120 | The simplest way to store metadata (attributes of the data such as resolution, axis order, units, etc.) is in the form of plain text, as in the methods section of a paper. However, plain text has two drawbacks: (1) it separates the data and the metadata into separate files, and (2) it is not directly interpretable by image-viewing tools. XML and JSON solve these problems by embedding machine-readable metadata directly within the data files or folders. XML or JSON metadata usually follow a schema, which defines the structure and organization of the metadata. With a schema, such as the OME schemas, software tools can reliably locate specific metadata fields. For example, if the viewer tool knows where to extract the pixel spacing information, it can then display the position of your cursor in real-world units, e.g. nanometers.
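
For instance, given a small invented (non-OME) JSON metadata document, a viewer could convert a cursor position from pixel indices to physical units like this:

```python
import json

# A hypothetical metadata document; the field names are illustrative,
# not part of any real schema.
metadata = json.loads("""
{
  "axes": ["y", "x"],
  "pixel_spacing_nm": [4.0, 4.0],
  "units": "nanometer"
}
""")

def cursor_position_nm(pixel_coord, meta):
    """Convert a cursor position in pixel indices to nanometers."""
    return [i * s for i, s in zip(pixel_coord, meta["pixel_spacing_nm"])]

print(cursor_position_nm([100, 250], metadata))   # [400.0, 1000.0]
```

A schema's job is to guarantee that fields like the pixel spacing are always found at the same place, so tools can do this lookup reliably.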
121 |
122 | XML and JSON are both useful formats for storing microscopy metadata. While the original OME-XML standard recommended storing the actual image data in XML, this practice was abandoned decades ago because it's very unwieldy. Nowadays, most image formats have the data in a binary component and the metadata in an XML or JSON component.
123 |
124 | As JSON is newer, more lightweight, and human-readable, newer formats tend to store their metadata in JSON. However, microscopy formats that store their metadata in XML are still widespread.
125 |
126 | ### TIFF
127 |
128 | TIFF is a well-established, non-chunked image format that has been widely trusted since its first release in 1986. It is supported by most image analysis tools and remains a staple in scientific and industrial imaging. TIFFs can be compressed with or without loss of data (lossy or lossless), though compression is lossless by default. JPEG compression is often more straightforward for TIFF than for chunked formats.
129 |
130 | A single TIFF file can store multiple 2D images. In these multi-page TIFFs, metadata embedded in the file indicate to the viewer software where each image starts and ends. Multi-page TIFFs are particularly useful for storing sets of related images, such as multiscale pyramids, z-stacks, or image-thumbnails pairs. If your top priorities are ease and FAIRness, TIFF may be a good choice, because it's so widespread and well-supported.
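
At the byte level, every TIFF begins with a fixed 8-byte header giving the byte order, the magic number 42, and the offset of the first image file directory (IFD); multi-page files chain IFDs together through such offsets. A minimal header parser, as a sketch:

```python
import struct

def read_tiff_header(header_bytes):
    """Parse the 8-byte TIFF header: byte order, magic number 42,
    and the offset of the first image file directory (IFD)."""
    order = header_bytes[:2]
    if order == b"II":
        endian = "<"          # little-endian ("Intel" order)
    elif order == b"MM":
        endian = ">"          # big-endian ("Motorola" order)
    else:
        raise ValueError("not a TIFF file")
    magic, first_ifd = struct.unpack(endian + "HI", header_bytes[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    return endian, first_ifd

# A little-endian header whose first IFD starts at byte 8:
print(read_tiff_header(b"II" + struct.pack("<HI", 42, 8)))   # ('<', 8)
```

In practice you would use a mature library rather than parsing TIFF yourself; the point is that each page's location is spelled out explicitly in the file.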
131 |
132 | The scalability of TIFF is limited, however. Once a TIFF file is more than a few GB, image viewers are likely to struggle with it, because they are limited by the size of the computer's memory. For large datasets or multi-dimensional images, chunked formats may be more suitable.
133 |
134 | ### HDF5
135 |
136 | [HDF5](https://www.hdfgroup.org/solutions/hdf5/) is a chunked file format for storing n-dimensional arrays, dating back to 1998. Like TIFF, it can be read by many tools, and many varieties exist. It is possible to store extremely complex datasets in a single HDF5 file, including heterogeneous arrays where each element is itself a complex object. You can think of HDF5 as an entire file system contained within a single file. The main differences between HDF5 and a folder on your file system are that (1) the individual HDF5 chunks can be accessed with far less overhead than individual files, and (2) data are not accessed through your operating system but rather through the GUI HDFView or the HDF5 API, which has been implemented in all the major programming languages. Smart use of data buffers and caches makes HDF5 quite good at loading giant datasets into memory efficiently.
137 |
138 | The crucial drawback of HDF5 is that it is slow to access in the cloud, specifically object storage (such as Amazon S3, Google GCS, or Microsoft ADL). In an unoptimized HDF5 file, it can be time consuming to locate the chunks inside an HDF5 container, particularly over the internet. One way to cloud-optimize an HDF5 file is to put the chunk location information into a single header so that it all can be read in one operation. (For more tips on cloud-optimized HDF5, see [this forum discussion](https://forum.hdfgroup.org/t/retroactive-cloud-optimization-using-h5repack/11320) and [this talk](https://www.youtube.com/watch?v=R5ok4fdYqBs&list=PL8X9E6I5_i8g7IcCHyC-_XdLyGRKXTSc-&index=43).) Additionally, there are intermediate services you can use (notably [HSDS](https://www.hdfgroup.org/solutions/hsds/), also see [this article](https://www.hdfgroup.org/2022/08/08/cloud-storage-options-for-hdf5/)). However, any of these options is an extra layer of effort for you and/or your users if you want to share HDF5 files over the cloud.
139 |
140 | We consider HDF5 reasonably FAIR. It is maintained by a non-profit organization, the HDF Group. Despite this top-down governance structure, the HDF Group is receptive to feedback from the microscopy community. Many scientific image viewers are equipped to read and write HDF5, making it a sensible choice for data that are meant to be downloaded locally or interacted with on an HPC cluster.
141 |
142 | ### Zarr
143 |
144 | [Zarr](https://zarr.dev/) is a relatively new project created by Alistair Miles in 2015, and shares many similarities with HDF5. Both projects offer APIs in a variety of languages for reading, writing, and compressing chunks of huge n-dimensional arrays. Both use a hierarchical data model consisting of groups of chunked arrays. Both are based on a layered computation model: the storage, storage access/transformers, filters/codecs, and application layers are conceptually separated, allowing the developer to mix and match them. Both come in many variants tailored to specific scientific domains.
145 |
146 | A key difference is that while an HDF5 dataset is a single file that must be read with an HDF5 library, an analogous Zarr dataset would be a folder containing many small files that can theoretically be read by anything that can read files. This creates a different kind of overhead than HDF5's. Zarr is more cloud-friendly than HDF5, but it can be less friendly on a local computer or HPC cluster. For example, it can be very slow to move a Zarr dataset from one location to another on your computer. Zarr attempts to mitigate this issue using a technique called sharding, which was recently introduced with the latest version of Zarr ([v.3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)). Sharding is another layer of data organization in addition to chunking, where chunks reside within shards. If the storage backend is a file system, each shard is stored as a single file containing multiple chunks. Nesting chunks within shards can substantially reduce the number of files, improving local performance, but it does make implementation more complex.
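
The payoff of sharding is easy to see by counting storage objects. A hedged sketch (an illustrative function, not the Zarr API):

```python
from math import prod

def n_storage_objects(chunk_grid, chunks_per_shard=None):
    """Count the files/objects needed to store a grid of chunks,
    with and without grouping chunks into shards."""
    if chunks_per_shard is None:
        return prod(chunk_grid)                                  # one file per chunk
    # One file per shard; ceiling division counts partial shards at the edges.
    shards = [-(-g // s) for g, s in zip(chunk_grid, chunks_per_shard)]
    return prod(shards)

grid = (100, 100, 100)                                           # one million chunks
print(n_storage_objects(grid))                                   # 1000000
print(n_storage_objects(grid, chunks_per_shard=(10, 10, 10)))    # 1000
```

Grouping 10x10x10 chunks per shard turns a million tiny files into a thousand, which is the kind of reduction that makes local copies and HPC file systems tractable.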
147 |
148 | Because Zarr is such a new project, there are some limits to its ease of use. If you are using it for a highly novel, specialized scientific application, you may encounter issues with missing features or incomplete standardization. For example, the existing compression schemes may be inadequate, or it may not support complex transformations. Still, there are perhaps a dozen scientific image viewers that support [OME-Zarr](https://doi.org/10.1007/s00418-023-02209-1), which bodes well for its future.
149 |
150 | ### N5
151 |
152 | [N5](https://github.com/saalfeldlab/n5) is a format created by Stephan Saalfeld in 2017. The main motivation behind it was to
153 | simplify the parallel writing of chunks relative to HDF5. As noted earlier, this was also a motivating factor in the creation
154 | of Zarr. Since N5 was created independently of Zarr at around the same time and with similar motivations, the two formats are
155 | functionally quite similar. Like Zarr, it stores chunks in separate files on a file system (or keys in a cloud store) and
156 | metadata as JSON files. Furthermore, it comes with an API that can read and write the HDF5 and Zarr formats in addition to
157 | the N5 format, thereby unifying access to chunked data formats.
158 |
159 | Because of their close similarity, N5 and Zarr generally interoperate well: many libraries that can read and write Zarr can read
160 | and write N5 (and vice versa). As Zarr has become more prevalent and has a wider community, we generally recommend it over N5,
161 | unless your application would benefit from an N5-specific feature:
162 |
163 | 1) N5 allows "partial" chunks, whereas all Zarr chunks must be the same size. The difference appears near the upper boundary of the
164 | array if the array size is not evenly divisible by the chunk size: N5 chunks simply end where the array does, but Zarr chunks must
165 | be padded out to full size with "empty" fill-value data.
166 |
167 | 2) N5 can read and write the special-case [label multi-set](https://github.com/saalfeldlab/paintera?tab=readme-ov-file#label-multisets)
168 | data type used by [Paintera](https://github.com/saalfeldlab/paintera). Zarr does not currently support label multi-sets.
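The boundary-handling difference in point 1 is easy to see with a little arithmetic. A sketch, with hypothetical array and chunk sizes:

```python
import math

def chunk_sizes_1d(array_len, chunk_len, partial_edge_chunks):
    """Sizes of the chunks along one dimension of an array.

    partial_edge_chunks=True models N5 (the edge chunk ends with the array);
    False models Zarr (every chunk is full size, padded with fill values).
    """
    n_chunks = math.ceil(array_len / chunk_len)
    if partial_edge_chunks:
        full = [chunk_len] * (n_chunks - 1)
        return full + [array_len - chunk_len * (n_chunks - 1)]
    return [chunk_len] * n_chunks

print(chunk_sizes_1d(222, 64, True))   # N5:   [64, 64, 64, 30]
print(chunk_sizes_1d(222, 64, False))  # Zarr: [64, 64, 64, 64]
```

In both cases the logical array is 222 elements long; the formats differ only in whether the final stored chunk is truncated or padded.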
169 |
170 | Also like Zarr, the ease of use of N5 varies across software ecosystems. At this time, Fiji and Neuroglancer have
171 | good support for N5. In particular, the [N5 plugins for Fiji](https://github.com/saalfeldlab/n5-ij) enable users to read and
172 | write a variety of formats, including OME-Zarr, without needing to write code. For that reason, the N5 API may be a friendly
173 | entry point to chunked formats for many researchers.
174 |
175 | ### TileDB
176 |
177 | [TileDB](https://tiledb.com/) is a modern file format and database management system optimized for highly parallel reads and writes of n-dimensional arrays. It offers excellent performance on sparse arrays, although this use case is not common in bioimaging. Like Zarr, it's very cloud-friendly. Similar to Zarr and HDF5, it separates the storage, compression, and compute layers, providing flexibility in storage options (e.g., file systems or object storage) and access methods (e.g., Spark, Dask, or MariaDB). TileDB also supports basic versioning: its chunks are immutable, enabling users to preserve and inspect previous versions of data.
178 |
179 | One standout feature of TileDB is that it will be familiar to data engineers accustomed to tabular data. When data are stored following recommended conventions, users can query "tiles" (chunks) of n-dimensional arrays using SQL. It also has a very robust development community and strong multi-language support.
180 |
181 | TileDB is gaining traction in scientific fields like geospatial analysis and genomics, but adoption in microscopy has been slower. While the paid TileDB Cloud tier offers an image viewer, among other features, and there is a [Napari](https://www.napari.org/) [plugin](https://github.com/TileDB-Inc/napari-tiledb-bioimg) for TileDB, the format can feel overly complex for many microscopy developers' needs. If you're just sharing some data on S3 or in a repository, we recommend Zarr as a more FAIR choice; the OME-Zarr format in particular is designed with microscopy in mind. However, TileDB may excel in enterprise-scale scenarios, such as building repositories or handling large, diverse datasets with complex access requirements. In such cases, its scalability and versatility could make it a stronger choice than Zarr.
182 |
183 | ## General-purpose file formats
184 |
185 | Below we describe a selection of versatile file formats that are appropriate for publication or everyday use. Building on the base formats described above, they are specifically designed to accommodate microscopy data. With the exception of BIDS, all are widely used at Janelia.
186 |
187 | ### OME-TIFF
188 |
189 | OME-TIFF is a specification of the widely used TIFF format that is tailored for microscopy images. It integrates metadata based on the OME-XML specification, embedding an OME-XML header block directly into the TIFF file. Developed in the early 2000s, OME-TIFF provides a robust framework for storing both image data and richly detailed metadata ([Goldberg et al. 2005](https://doi.org/10.1186/gb-2005-6-5-r47)). OME-TIFF was originally designed for fluorescence imaging, and was adapted in 2019 to accommodate whole slide imaging ([Besson et al. 2019](https://doi.org/10.1007/978-3-030-23937-4_1)). However, new development of the OME-TIFF specification is limited, as the community shifts to OME-NGFF and considers newer metadata frameworks ([Hammer et al. 2021](https://doi.org/10.1038/s41592-021-01327-9)) that may be based on RDF and JSON-LD.
190 |
191 | The OME data model supports a wide range of metadata, including image characteristics (e.g., resolution, number of focal planes, time points, and channels), as well as details about the acquisition instrument, experimenters, experimental design, and more. Any software that can read TIFF files can also open OME-TIFFs. Additionally, most scientific image viewers are equipped to interpret at least the core metadata of OME-TIFF, making this format broadly interoperable. Since OME-TIFFs are TIFFs, they can store multiple image planes, but they also inherit TIFF's lack of chunking.
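Because the OME-XML header is plain XML embedded in the TIFF's ImageDescription tag, its core fields can be inspected with any XML parser. A sketch using Python's standard library; the OME-XML snippet below is heavily abbreviated and invented for illustration:

```python
import xml.etree.ElementTree as ET

# A heavily abbreviated OME-XML block of the kind embedded in an OME-TIFF;
# real headers carry much more metadata (instrument, channels, experimenters).
ome_xml = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="example">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint16"
            SizeX="512" SizeY="512" SizeC="2" SizeZ="30" SizeT="1"
            PhysicalSizeX="0.1" PhysicalSizeY="0.1" PhysicalSizeZ="0.5"/>
  </Image>
</OME>"""

ns = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}
pixels = ET.fromstring(ome_xml).find("ome:Image/ome:Pixels", ns)
shape = {d: int(pixels.get("Size" + d)) for d in "XYCZT"}
print(shape)  # {'X': 512, 'Y': 512, 'C': 2, 'Z': 30, 'T': 1}
print(pixels.get("PhysicalSizeZ"))  # '0.5'
```

In practice you would use a TIFF-aware library to extract the header first; the point here is only that the metadata is ordinary, openly documented XML.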
192 |
193 | If you choose to save your data as a TIFF, it's best practice to use the OME-TIFF specification. OME-TIFF can accommodate more of your metadata than a standard TIFF, which lacks microscopy-specific fields. Even if you don't currently need all the metadata contained in the OME-XML, users analyzing or processing your data in the future may benefit from having access to this additional information. In contrast, custom metadata schemas or plain text notes won't be supported by existing tools.
194 |
195 | ### OME-NGFF
196 |
197 | Like OME-TIFF, OME-NGFF is a specification of a more generic file format that is tailored for microscopy. OME-NGFF is, for the time being, synonymous with OME-Zarr; "NGFF" stands for "Next Generation File Format". An OME-NGFF dataset consists of a Zarr hierarchy that includes JSON metadata files at standard locations. OME-NGFF is much newer than OME-TIFF; its first release was in 2021 ([Moore et al. 2023](https://doi.org/10.1007/s00418-023-02209-1)). It supports some types of ancillary data, namely high-content screening (HCS) data and label images (e.g. from image segmentation).
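For a sense of what those standard metadata files look like: in version 0.4 of the specification, an image group's multiscale layout is described in its `.zattrs` JSON. The abbreviated, hypothetical example below lists the axes and the scale of each pyramid level:

```python
import json

# An abbreviated `.zattrs` from the root of an OME-NGFF (v0.4) image group;
# real files carry additional required fields.
zattrs = json.loads("""{
  "multiscales": [{
    "version": "0.4",
    "axes": [
      {"name": "z", "type": "space", "unit": "micrometer"},
      {"name": "y", "type": "space", "unit": "micrometer"},
      {"name": "x", "type": "space", "unit": "micrometer"}
    ],
    "datasets": [
      {"path": "0", "coordinateTransformations":
         [{"type": "scale", "scale": [0.5, 0.1, 0.1]}]},
      {"path": "1", "coordinateTransformations":
         [{"type": "scale", "scale": [0.5, 0.2, 0.2]}]}
    ]
  }]
}""")

ms = zattrs["multiscales"][0]
axis_names = [a["name"] for a in ms["axes"]]
print(axis_names)  # ['z', 'y', 'x']
for ds in ms["datasets"]:
    # each entry names a Zarr array ("0", "1", ...) and its voxel spacing
    print(ds["path"], ds["coordinateTransformations"][0]["scale"])
```

Because the metadata is just JSON in a well-known location, any tool that can list files and parse JSON can discover the pyramid structure without a Zarr library.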
198 |
199 | OME-NGFF is much more scalable than OME-TIFF, being a chunked format. It is also designed to handle multi-dimensional data, though it specifies a limit of 5D. Being Zarr-based, OME-NGFF is optimized for the cloud, so it's a great format for data publication. It will soon have strong support for complex coordinate transformations, a feature that is lacking in all the other microscopy formats, to our knowledge.
200 |
201 | A weakness of OME-NGFF is that the schema is changing rapidly (though perhaps this is also a strength). Tools for working with it are still immature, which makes the format that much harder to learn. The metadata schema also omits much that is present in OME-XML, for example instrument settings and experiment details.
202 |
203 | Even though OME-NGFF is still an emerging standard, we encourage the community to use and contribute to this format. OME is a long-standing leader in microscopy analysis standards and tools, and we are optimistic that the community will come together around this new format.
204 |
205 | ### Proprietary
206 |
207 | Proprietary file formats are developed by microscopy manufacturers and are designed to be viewed with the manufacturer's software. Examples include .nd2 for Nikon, .lif for Leica, .oib or .oif for Olympus, .lsm or .czi for Zeiss, and .ims for Imaris. Some of these formats are variants of the base formats described above; for example, Imaris files are HDF5-based. They usually perform well in their corresponding proprietary viewers, but cannot readily be opened by open-source viewers. Luckily, the [Bio-Formats](https://www.openmicroscopy.org/bio-formats/) project has made it possible to open these formats in several open-source viewers, though sometimes metadata can be lost in translation. Viewers that use the Bio-Formats library often provide functionality to either translate on the fly or convert the file to a new format. There are also free versions of some proprietary viewers, such as Zeiss Zen Lite and Leica LAS AF Lite.
208 |
209 | The FAIRness of these formats is a tricky question. On the one hand, converting to an open-source format such as OME-TIFF or OME-NGFF is appealing because it makes the data accessible to users who are used to, or limited to, free, open-source viewers. On the other hand, some metadata may be lost during conversion, and conversion duplicates your data, doubling storage costs. When considering publishing data in a proprietary format, there is no one-size-fits-all solution: consider your audience and your budget. You may try conversion, then assess whether any metadata have been lost by comparing what you see in the vendor's interface against what you see in, say, ImageJ after conversion. The vendor may also offer tools to export the metadata in a text format, such as plain text or XML. In that case, you could convert the image to an open-source format, which should retain at least the pixel/voxel data in full, and then archive and/or publish the much smaller text export of the metadata.
210 |
211 | ### BIDS
212 |
213 | The Brain Imaging Data Structure (BIDS) is a widely used specification for organizing medical image data. It is not itself a storage backend; rather, it defines a consistent and structured way to arrange files and folders and to format JSON metadata. Originally developed by the MRI community, BIDS quickly gained traction in that field and has since been extended to support other imaging modalities, including EEG, iEEG, PET, and qMRI. Efforts to add microscopy to that list are relatively recent, having started in earnest in 2021 (see [this forum discussion](https://forum.image.sc/t/call-for-comments-on-brain-imaging-data-structure/50701)), and led to the publication of the microscopy extension in 2022 ([Bourget et al. 2022](https://doi.org/10.3389/fnins.2022.871228)).
214 |
215 | The [BIDS microscopy specification](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html) adds additional standards on top of OME-NGFF and OME-TIFF. The BIDS metadata complement, rather than replace, the metadata embedded in OME-TIFF headers or OME-NGFF JSON files. As a result, researchers are required to maintain metadata in two locations, which can introduce redundancy and potential inconsistencies. The authors worked to mitigate these drawbacks by specifying only the minimal metadata needed for image analysis.
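As a rough illustration, a minimal BIDS microscopy dataset might be laid out as in the sketch below. The subject, sample, and file names are hypothetical, and the tree is simplified; consult the specification for the required files and naming rules:

```
my_dataset/
├── dataset_description.json      # required dataset-level metadata
├── participants.tsv
└── sub-01/
    └── micr/
        ├── sub-01_sample-A_SPIM.ome.tif   # pixel data (OME-TIFF)
        └── sub-01_sample-A_SPIM.json      # BIDS sidecar metadata
```

The sidecar JSON holds the BIDS-level metadata, while the OME-TIFF header continues to hold the OME metadata, which is the duplication discussed above.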
216 |
217 | Adopting BIDS can help your data integrate seamlessly with BIDS-compatible tools and repositories, such as the [DANDI Archive](https://www.dandiarchive.org/), which encourages the use of BIDS for many data types. Even if you do not require BIDS-specific tooling, modeling your Zarr or HDF5 data hierarchies on BIDS principles can be a sensible approach to organization. However, it is worth noting that the BIDS microscopy standard is still maturing: learning it takes time, and managing redundant metadata carries risks. For these reasons, we do not currently use BIDS for our microscopy data, though it is a robust and valuable standard, particularly for medical imaging applications.
218 |
219 | ## Specialty file formats
220 |
221 | These are formats you're more likely to inherit than select, unless you have a particular reason to seek them out. They were created to solve specific problems that the other formats couldn't.
222 |
223 | ### BigDataViewer
224 |
225 | The [BigDataViewer](https://imagej.net/plugins/bdv/) format is based on HDF5 or N5 for data and XML for metadata, and is designed for 3D multi-view light sheet microscopy data. As the name suggests, it allows for seamless viewing of terabyte-scale images. It uses both the pyramid and chunking techniques -- each dataset is stored as a series of downsampled volumes, and each volume is stored as a chunked array. This allows the lower-resolution slices to be rendered almost instantly, and the higher-resolution slices to be loaded soon after, if the user continues browsing in the same region.
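The payoff of the pyramid can be sketched numerically. Assuming factor-of-two downsampling per level (typical, though other factors are possible), the extra levels add only about 14% to the full-resolution volume. The array sizes below are hypothetical:

```python
def pyramid_shapes(shape, n_levels):
    """Shapes of a multiscale pyramid, halving each axis per level."""
    shapes = [shape]
    for _ in range(n_levels - 1):
        shape = tuple(max(1, s // 2) for s in shape)
        shapes.append(shape)
    return shapes

levels = pyramid_shapes((4096, 4096, 2048), 4)
voxels = [x * y * z for x, y, z in levels]
for shape, v in zip(levels, voxels):
    print(shape, v)

# Overhead of the downsampled levels relative to full resolution:
print(sum(voxels[1:]) / voxels[0])  # 0.142578125 (= 1/8 + 1/64 + 1/512)
```

Each halving of all three axes shrinks the voxel count eightfold, which is why a viewer can afford to keep many levels and still pay only a small storage premium.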
226 |
227 | The BigDataViewer software, which comes with Fiji, can view any file format supported by the [Bio-Formats](http://www.openmicroscopy.org/bio-formats/) library using the LOCI Bio-Formats plugin, and also natively supports the Imaris (.ims) file format. However, most of those formats don't readily support large, multiview image datasets such as those from Selective Plane Illumination Microscopy (SPIM) systems. For SPIM or light sheet fluorescence microscopy (LSFM) data, the BigDataViewer format may be a good choice. It integrates well with Fiji's Multiview Reconstruction and BigStitcher plugins, and is more standardized than inventing your own custom HDF5 structure and metadata schema.
228 |
229 | ### H5J
230 |
231 | [H5J](https://github.com/JaneliaSciComp/workstation/blob/master/docs/H5JFileFormat.md) is a "visually lossless" file format developed at Janelia Research Campus for storing multichannel 3D image stacks. An H5J file is simply a standard HDF5 file with a specific hierarchy structure. It uses the H.265 codec (a.k.a. HEVC or High Efficiency Video Coding) and different compression ratios per channel to obtain better compression than is readily achievable with OME-Zarr. H5J files can be read by [VVD Viewer](https://github.com/takashi310/VVD_Viewer), the [Janelia Workstation](https://github.com/JaneliaSciComp/workstation), [web-vol-viewer](https://github.com/JaneliaSciComp/web-vol-viewer), and [web-h5j-loader](https://github.com/JaneliaSciComp/web-h5j-loader). They can be read and written using [Fiji](https://fiji.sc/) or [Vaa3D](https://github.com/Vaa3D/release). You can print the metadata using any HDF5-compliant library, and/or export them to JSON.
232 |
233 | H5J is arguably more space-efficient than OME-Zarr, but its much lower adoption is a significant drawback: it is not nearly as widespread, and its development community is limited. If you're tied to HDF5, H5J is a reasonably accessible way to achieve high compression without reinventing the wheel. However, even at Janelia, significant effort is being made to improve the performance of OME-Zarr, to support the community effort to shift to a small number of well-supported formats.
234 |
235 |